# hub.solver.ldfs.ldfs
# LDFS
Labeled Depth-First Search (LDFS) solver for MDPs.
From: Bonet & Geffner, "Learning Depth-First Search: A Unified Approach to Heuristic Search in Deterministic and Non-Deterministic Settings, and Its Application to MDPs", ICAPS 2006.
LDFS is a systematic alternative to LRTDP for solving stochastic shortest path problems (SSPs) and discounted MDPs. Instead of running random trials, it performs a depth-first search from the initial state, following the greedy policy and expanding states on the fly. When the DFS returns from a state, the check_solved procedure labels that state as solved if it has converged.
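The search-and-label loop described above can be sketched in a few lines of plain Python. This is a minimal sketch of the acyclic variant of LDFS on a hypothetical toy model: it omits the Tarjan/SCC bookkeeping that cyclic greedy policies need, uses a zero heuristic, and is not the library's C++ implementation. The names `MDP`, `q_value`, `ldfs` and `ldfs_driver` are illustrative assumptions.

```python
# Hypothetical toy model: state -> action -> (cost, [(probability, successor), ...]).
MDP = {
    "s0": {"a": (1.0, [(0.5, "s1"), (0.5, "s2")]), "b": (5.0, [(1.0, "goal")])},
    "s1": {"go": (1.0, [(1.0, "goal")])},
    "s2": {"go": (2.0, [(1.0, "goal")])},
    "goal": {},  # terminal (absorbing) state, value fixed to 0
}
EPSILON = 1e-3  # tolerance on the Bellman residual


def q_value(V, s, a):
    """C(s,a) + sum_s' P(s'|s,a) * V(s'); unseen states use the zero heuristic."""
    cost, outcomes = MDP[s][a]
    return cost + sum(p * V.get(nxt, 0.0) for p, nxt in outcomes)


def ldfs(s, V, policy, solved):
    """One depth-first pass from s. Returns True if s can be labeled as solved."""
    if s in solved:
        return True
    if not MDP[s]:  # terminal state: fix its value and label it
        V[s] = 0.0
        solved.add(s)
        return True
    V.setdefault(s, 0.0)  # initialize with the (zero) heuristic
    flag = False
    for a in MDP[s]:
        if q_value(V, s, a) > V[s] + EPSILON:
            continue  # a is not greedy with respect to the current V
        flag = True
        for _, nxt in MDP[s][a][1]:
            # Recurse first, then re-check that a is still greedy.
            flag = ldfs(nxt, V, policy, solved) and q_value(V, s, a) <= V[s] + EPSILON
            if not flag:
                break
        if flag:
            policy[s] = a
            break
    if flag:
        solved.add(s)
    else:
        V[s] = min(q_value(V, s, a) for a in MDP[s])  # Bellman update
    return flag


def ldfs_driver(s0):
    """Repeat DFS passes from s0 until it is labeled as solved."""
    V, policy, solved = {}, {}, set()
    passes = 1
    while not ldfs(s0, V, policy, solved):
        passes += 1
    return V, policy, passes


V, policy, passes = ldfs_driver("s0")
```

Each failed pass raises at least one value estimate, so with a lower-bound heuristic the estimates grow monotonically toward the optimal values until every state under the greedy policy is labeled.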
Uses cost minimization: V(s) = min_a [C(s,a) + gamma * sum_s' P(s'|s,a) * V(s')]. Terminal states are absorbing with value set by the terminal_value functor (defaults to Value(cost=0)).
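As an illustration of the backup formula above, here is a minimal sketch of a single cost-minimizing Bellman backup. The dictionaries `actions`, `cost` and `transitions`, and the terminal values in `V`, are hypothetical toy data, not part of the library API.

```python
def bellman_backup(state, actions, cost, transitions, V, gamma=1.0):
    """Return min_a [C(s,a) + gamma * sum_s' P(s'|s,a) * V(s')] for one state."""
    return min(
        cost[(state, a)]
        + gamma * sum(p * V[nxt] for nxt, p in transitions[(state, a)].items())
        for a in actions[state]
    )


# Hypothetical example: a safe but costly action versus a cheap but risky one.
actions = {"s": ["safe", "risky"]}
cost = {("s", "safe"): 4.0, ("s", "risky"): 1.0}
transitions = {
    ("s", "safe"): {"goal": 1.0},
    ("s", "risky"): {"goal": 0.5, "pit": 0.5},
}
# Terminal values: 0 for the goal, a large penalty for the dead end.
V = {"goal": 0.0, "pit": 10.0}

undiscounted = bellman_backup("s", actions, cost, transitions, V)
discounted = bellman_backup("s", actions, cost, transitions, V, gamma=0.5)
```

With gamma = 1 the risky action costs 1 + 0.5 * 10 = 6, so the safe action (cost 4) wins; with gamma = 0.5 the penalty is discounted and the risky action (cost 3.5) becomes the minimizer.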
Unlike value iteration (VI) and policy iteration (PI), which enumerate the full reachable state space, LDFS only explores states reachable under the evolving greedy policy. On large domains this set can be much smaller.
# Constructor LDFS
LDFS(
domain_factory: Callable[[], Domain],
heuristic: Callable[[Domain, D.T_state], StrDict[Value[D.T_value]]] = <lambda function>,
terminal_value: Callable[[D.T_state], Value[D.T_value]] = <lambda function>,
discount: float = 1.0,
epsilon: float = 0.001,
max_depth: int = 0,
parallel: bool = False,
shared_memory_proxy = None,
callback: Callable[[LDFS], bool] = <lambda function>,
verbose: bool = False
) -> None
Construct an LDFS solver instance.
# Parameters
- domain_factory: The lambda function to create a domain instance.
- heuristic: Function h(domain, state) -> Value used to initialize V(s) = h(s).cost for newly discovered states. An admissible heuristic (h(s) <= V*(s)) accelerates convergence. Defaults to Value(cost=0).
- terminal_value: Function f(state) -> Value assigning a fixed value to terminal (absorbing) states. Use Value(cost=0) for goal-like terminals and Value(cost=large_penalty) for dead-end-like terminals. Defaults to Value(cost=0).
- discount: Value function's discount factor. Defaults to 1.0.
- epsilon: Maximum Bellman error allowed to label a state as solved. Defaults to 0.001.
- max_depth: Maximum DFS depth per driver iteration; 0 means unlimited. When the limit is reached, the DFS backtracks as if the state were unsolved. Because the driver loop retries, correctness is preserved and only the per-iteration work is bounded. Defaults to 0 (unlimited).
- parallel: Parallelize action-transition generation. Defaults to False.
- shared_memory_proxy: The optional shared memory proxy. Defaults to None.
- callback: Lambda function called at the end of each LDFS pass, taking the solver as argument and returning True to stop. Defaults to never stopping.
- verbose: Whether verbose messages should be logged. Defaults to False.
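The callback contract in the parameter list (called after each pass, return True to stop) can be sketched with a stand-in class. `ToySolver`, `max_passes` and `passes_done` are hypothetical names used only for illustration; the real solver runs its passes in C++ and exposes its own accessors, such as get_solving_time() or get_nb_of_explored_states(), that a callback could query instead.

```python
class ToySolver:
    """Hypothetical stand-in for a solver that runs passes until told to stop."""

    def __init__(self, callback=lambda solver: False, max_passes=100):
        self.callback = callback
        self.max_passes = max_passes
        self.passes_done = 0

    def solve(self):
        while self.passes_done < self.max_passes:
            self.passes_done += 1  # stands in for one full search pass
            if self.callback(self):  # returning True requests an early stop
                break


# Stop after five passes.
limited = ToySolver(callback=lambda solver: solver.passes_done >= 5)
limited.solve()

# The default callback never stops early, so the toy pass budget is exhausted.
unlimited = ToySolver(max_passes=7)
unlimited.solve()
```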
# call_domain_method ParallelSolver
call_domain_method(
self,
name,
*args
)
Calls a parallel domain's method. This is the only way to get a domain method for a parallel domain.
# check_domain Solver
check_domain(
domain: Domain
) -> bool
Check whether a domain is compliant with this solver type.
By default, Solver.check_domain() provides some boilerplate code and internally
calls Solver._check_domain_additional() (which returns True by default but can be overridden to define
specific checks in addition to the "domain requirements"). The boilerplate code automatically checks whether all
domain requirements are met.
# Parameters
- domain: The domain to check.
# Returns
True if the domain is compliant with the solver type (False otherwise).
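The split between the boilerplate check and the overridable hook can be sketched as follows. This is a simplified illustration that assumes requirement checking amounts to `isinstance` tests; the requirement classes and `ToySolverType` are hypothetical, and the library's actual check may be more involved.

```python
class Goals:  # hypothetical domain requirement
    pass


class Markovian:  # hypothetical domain requirement
    pass


class ToySolverType:
    @classmethod
    def get_domain_requirements(cls):
        return [Goals, Markovian]

    @classmethod
    def _check_domain_additional(cls, domain):
        return True  # override to add solver-specific checks

    @classmethod
    def check_domain(cls, domain):
        # Boilerplate part: every requirement class must be inherited.
        meets_requirements = all(
            isinstance(domain, req) for req in cls.get_domain_requirements()
        )
        # Hook part: solver-specific checks on top of the requirements.
        return meets_requirements and cls._check_domain_additional(domain)


class GoodDomain(Goals, Markovian):
    pass


class PartialDomain(Goals):
    pass


good_ok = ToySolverType.check_domain(GoodDomain())
partial_ok = ToySolverType.check_domain(PartialDomain())
```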
# close ParallelSolver
close(
self
)
Joins the parallel domains' processes.
# complete_with_default_hyperparameters Hyperparametrizable
complete_with_default_hyperparameters(
kwargs: dict[str, Any],
names: Optional[list[str]] = None
)
Add missing hyperparameters to kwargs by using default values
Args:
kwargs: keyword arguments to complete (e.g. for __init__, init_model, or solve)
names: names of the hyperparameters to add if missing.
By default, all available hyperparameters.
Returns: a new dictionary, completion of kwargs
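The completion semantics described above (missing entries filled from defaults, a new dictionary returned) can be sketched in plain Python. `DEFAULTS` and `complete_with_defaults` are hypothetical stand-ins; which hyperparameters the solver actually registers is not stated here.

```python
# Hypothetical defaults, borrowed from the constructor signature for illustration.
DEFAULTS = {"discount": 1.0, "epsilon": 0.001, "max_depth": 0}


def complete_with_defaults(kwargs, names=None):
    """Return a copy of kwargs with missing hyperparameters set to their defaults."""
    selected = DEFAULTS if names is None else {n: DEFAULTS[n] for n in names}
    completed = dict(kwargs)  # copy, so the caller's dictionary is left untouched
    for name, default in selected.items():
        completed.setdefault(name, default)  # explicit values take precedence
    return completed


user_kwargs = {"epsilon": 0.01}
full_kwargs = complete_with_defaults(user_kwargs)
only_depth = complete_with_defaults({}, names=["max_depth"])
```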
# copy_and_update_hyperparameters Hyperparametrizable
copy_and_update_hyperparameters(
names: Optional[list[str]] = None,
**kwargs_by_name: dict[str, Any]
) -> list[Hyperparameter]
Copy hyperparameters definition of this class and update them with specified kwargs.
This is useful to define hyperparameters for a child class for which only choices of the hyperparameter change for instance.
Args:
names: names of hyperparameters to copy. Default to all.
**kwargs_by_name: for each hyperparameter specified by its name, the attributes to update.
If a given hyperparameter name is not specified, the hyperparameter is copied without further update.
Returns: the list of copied (and possibly updated) hyperparameters
# get_default_hyperparameters Hyperparametrizable
get_default_hyperparameters(
names: Optional[list[str]] = None
) -> dict[str, Any]
Get hyperparameters default values.
Args: names: names of the hyperparameters to choose. By default, all available hyperparameters will be suggested.
Returns: a mapping between hyperparameter's name_in_kwargs and its default value (None if not specified)
# get_domain ParallelSolver
get_domain(
self
)
Returns the domain, optionally creating a parallel domain if not already created.
# get_domain_requirements Solver
get_domain_requirements(
) -> list[type]
Get domain requirements for this solver class to be applicable.
Domain requirements are classes from the skdecide.builders.domain package that the domain needs to inherit from.
# Returns
A list of classes to inherit from.
# get_explored_states LDFS
get_explored_states(
self
) -> set[StrDict[D.T_observation]]
Get all states explored so far
# get_hyperparameter Hyperparametrizable
get_hyperparameter(
name: str
) -> Hyperparameter
Get hyperparameter from given name.
# get_hyperparameters_by_name Hyperparametrizable
get_hyperparameters_by_name(
) -> dict[str, Hyperparameter]
Mapping from name to corresponding hyperparameter.
# get_hyperparameters_names Hyperparametrizable
get_hyperparameters_names(
) -> list[str]
List of hyperparameters names.
# get_nb_of_explored_states LDFS
get_nb_of_explored_states(
self
) -> int
Get the number of states explored so far
# get_nb_tip_states LDFS
get_nb_tip_states(
self
) -> int
Get the number of tip states expanded during search
# get_next_action DeterministicPolicies
get_next_action(
self,
observation: StrDict[D.T_observation],
domain: Optional[Domain] = None
) -> StrDict[list[D.T_event]]
Get the next deterministic action (from the solver's current policy).
# Parameters
- observation: The observation for which next action is requested.
- domain: the domain source of the observation. Typically used to get current applicable actions or action mask. NB: Be careful that the domain has not been autocast, so may not respect the T_domain specs.
# Returns
The next deterministic action.
# get_next_action_distribution UncertainPolicies
get_next_action_distribution(
self,
observation: StrDict[D.T_observation],
domain: Optional[Domain] = None
) -> Distribution[StrDict[list[D.T_event]]]
Get the probabilistic distribution of next action for the given observation (from the solver's current policy).
# Parameters
- observation: The observation to consider.
- domain: the domain source of the observation. Typically used to get current applicable actions or action mask.
# Returns
The probabilistic distribution of next action.
# get_policy LDFS
get_policy(
self
) -> dict[StrDict[D.T_observation], tuple[StrDict[list[D.T_event]], float]]
Get the (partial) solution policy
# get_solved_states LDFS
get_solved_states(
self
) -> set[StrDict[D.T_observation]]
Get states labeled as solved (converged)
# get_solving_time LDFS
get_solving_time(
self
) -> int
Get the solving time in milliseconds
# get_strongly_connected_components LDFS
get_strongly_connected_components(
self
) -> list[set[StrDict[D.T_observation]]]
Get the strongly connected components discovered so far by Tarjan's algorithm during the DFS. Each SCC is a set of states that form a cycle under the greedy policy. Useful for monitoring convergence within cyclic components.
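For readers unfamiliar with Tarjan's algorithm, here is a minimal recursive sketch applied to a hypothetical greedy-policy graph. The function name and the `greedy_graph` data are illustrative assumptions; the solver computes its components internally during the DFS.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm. graph maps each node to a list of successor nodes."""
    index, lowlink = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = [0]

    def visit(v):
        index[v] = lowlink[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:  # back edge into the current DFS stack
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:  # v is the root of a component
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            components.append(component)

    for v in graph:
        if v not in index:
            visit(v)
    return components


# Hypothetical greedy-policy graph: a -> b -> c -> a is a cycle, d is a sink.
greedy_graph = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": []}
sccs = strongly_connected_components(greedy_graph)
```

Note that in this sketch a state outside any cycle, such as `d`, forms a singleton component of its own.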
# get_utility Utilities
get_utility(
self,
observation: StrDict[D.T_observation]
) -> D.T_value
Get the estimated on-policy utility of the given observation.
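In mathematical terms, for a fully observable domain, this function estimates V^pi(s) = E_{tau ~ pi}[R(tau) | s_0 = s], the expected return R(tau) of trajectories tau that start in state s and follow the current policy pi.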
# Parameters
- observation: The observation to consider.
# Returns
The estimated on-policy utility of the given observation.
# is_policy_defined_for Policies
is_policy_defined_for(
self,
observation: StrDict[D.T_observation]
) -> bool
Check whether the solver's current policy is defined for the given observation.
# Parameters
- observation: The observation to consider.
# Returns
True if the policy is defined for the given observation memory (False otherwise).
# reset Solver
reset(
self
) -> None
Reset whatever is needed on this solver before running a new episode.
This function does nothing by default but can be overridden if needed (e.g. to reset the hidden state of a LSTM policy network, which carries information about past observations seen in the previous episode).
# sample_action Policies
sample_action(
self,
observation: StrDict[D.T_observation],
domain: Optional[Domain] = None
) -> StrDict[list[D.T_event]]
Sample an action for the given observation (from the solver's current policy).
# Parameters
- observation: The observation for which an action must be sampled.
- domain: the domain source of the observation. Typically used to get current applicable actions or action mask.
# Returns
The sampled action.
# solve FromInitialState
solve(
self,
from_memory: Optional[Memory[D.T_state]] = None
) -> None
Run the solving process.
# Parameters
- from_memory: The source memory (state or history) from which we begin the solving process. If None, initial state is used if the domain is initializable, else a ValueError is raised.
TIP
The nature of the solutions produced here depends on the solver's other characteristics, such as
policy and assessability.
# solve_from FromAnyState
solve_from(
self,
memory: Memory[D.T_state]
) -> None
Run the solving process from a given state.
# Parameters
- memory: The source memory (state or history) of the transition.
TIP
The nature of the solutions produced here depends on the solver's other characteristics, such as
policy and assessability.
# suggest_hyperparameter_with_optuna Hyperparametrizable
suggest_hyperparameter_with_optuna(
trial: optuna.trial.Trial,
name: str,
prefix: str,
**kwargs
) -> Any
Suggest hyperparameter value during an Optuna trial.
This can be used during Optuna hyperparameters tuning.
Args:
trial: optuna trial during hyperparameters tuning
name: name of the hyperparameter to choose
prefix: prefix to add to optuna corresponding parameter name
(useful for disambiguating hyperparameters from subsolvers in case of meta-solvers)
**kwargs: options for optuna hyperparameter suggestions
Returns: the suggested value for the hyperparameter
kwargs can be used to pass relevant arguments to
- trial.suggest_float()
- trial.suggest_int()
- trial.suggest_categorical()
For instance it can
- add a low/high value if not existing for the hyperparameter or override it to narrow the search. (for float or int hyperparameters)
- add a step or log argument (for float or int hyperparameters, see optuna.trial.Trial.suggest_float())
- override choices for categorical or enum parameters to narrow the search
# suggest_hyperparameters_with_optuna Hyperparametrizable
suggest_hyperparameters_with_optuna(
trial: optuna.trial.Trial,
names: Optional[list[str]] = None,
kwargs_by_name: Optional[dict[str, dict[str, Any]]] = None,
fixed_hyperparameters: Optional[dict[str, Any]] = None,
prefix: str
) -> dict[str, Any]
Suggest hyperparameters values during an Optuna trial.
Args:
trial: optuna trial during hyperparameters tuning
names: names of the hyperparameters to choose.
By default, all available hyperparameters will be suggested.
If fixed_hyperparameters is provided, the corresponding names are removed from names.
kwargs_by_name: options for optuna hyperparameter suggestions, by hyperparameter name
fixed_hyperparameters: values of fixed hyperparameters, useful for suggesting subbrick hyperparameters,
if the subbrick class is not suggested by this method, but already fixed.
Will be added to the suggested hyperparameters.
prefix: prefix to add to optuna corresponding parameters
(useful for disambiguating hyperparameters from subsolvers in case of meta-solvers)
Returns:
mapping between the hyperparameter name and its suggested value.
If the hyperparameter has an attribute name_in_kwargs, this is used as the key in the mapping
instead of the actual hyperparameter name.
the mapping is updated with fixed_hyperparameters.
kwargs_by_name[some_name] will be passed as **kwargs to suggest_hyperparameter_with_optuna(name=some_name)
# _check_domain_additional Solver
_check_domain_additional(
domain: Domain
) -> bool
Check whether the given domain is compliant with the specific requirements of this solver type (i.e. the ones in addition to "domain requirements").
This is a helper function called by default from Solver.check_domain(). It focuses on specific checks, as
opposed to taking also into account the domain requirements for the latter.
# Parameters
- domain: The domain to check.
# Returns
True if the domain is compliant with the specific requirements of this solver type (False otherwise).
# _get_next_action DeterministicPolicies
_get_next_action(
self,
observation: StrDict[D.T_observation],
domain: Optional[Domain] = None
) -> StrDict[list[D.T_event]]
Get the next deterministic action (from the solver's current policy).
# Parameters
- observation: The observation for which next action is requested.
- domain: the domain source of the observation. Typically used to get current applicable actions or action mask. NB: Be careful that the domain has not been autocast, so may not respect the T_domain specs.
# Returns
The next deterministic action.
# _get_next_action_distribution UncertainPolicies
_get_next_action_distribution(
self,
observation: StrDict[D.T_observation],
domain: Optional[Domain] = None
) -> Distribution[StrDict[list[D.T_event]]]
Get the probabilistic distribution of next action for the given observation (from the solver's current policy).
# Parameters
- observation: The observation to consider.
- domain: the domain source of the observation. Typically used to get current applicable actions or action mask. NB: Be careful that the domain has not been autocast, so may not respect the T_domain specs.
# Returns
The probabilistic distribution of next action.
# _get_utility Utilities
_get_utility(
self,
observation: StrDict[D.T_observation]
) -> D.T_value
Get the estimated on-policy utility of the given observation.
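In mathematical terms, for a fully observable domain, this function estimates V^pi(s) = E_{tau ~ pi}[R(tau) | s_0 = s], the expected return R(tau) of trajectories tau that start in state s and follow the current policy pi.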
# Parameters
- observation: The observation to consider.
# Returns
The estimated on-policy utility of the given observation.
# _initialize Solver
_initialize(
self
)
Launches the parallel domains. Before this method is called, the solver must have recorded self._domain_factory, the set of lambda functions passed to the solver's constructor (e.g. the heuristic lambda for heuristic-based solvers), and whether the parallel domain jobs should notify their status via the IPC protocol (required when interacting with other programming languages like C++).
# _is_policy_defined_for Policies
_is_policy_defined_for(
self,
observation: StrDict[D.T_observation]
) -> bool
Check whether the solver's current policy is defined for the given observation.
# Parameters
- observation: The observation to consider.
# Returns
True if the policy is defined for the given observation memory (False otherwise).
# _reset Solver
_reset(
self
) -> None
Reset whatever is needed on this solver before running a new episode.
This function does nothing by default but can be overridden if needed (e.g. to reset the hidden state of a LSTM policy network, which carries information about past observations seen in the previous episode).
# _sample_action Policies
_sample_action(
self,
observation: StrDict[D.T_observation],
domain: Optional[Domain] = None
) -> StrDict[list[D.T_event]]
Sample an action for the given observation (from the solver's current policy).
# Parameters
- observation: The observation for which an action must be sampled.
- domain: the domain source of the observation. Typically used to get current applicable actions or action mask. NB: Be careful that the domain has not been autocast, so may not respect the T_domain specs.
# Returns
The sampled action.
# _solve FromInitialState
_solve(
self,
from_memory: Optional[Memory[D.T_state]] = None
) -> None
Run the solving process.
# Parameters
- from_memory: The source memory (state or history) from which we begin the solving process. If None, initial state is used if the domain is initializable, else a ValueError is raised.
TIP
The nature of the solutions produced here depends on the solver's other characteristics, such as
policy and assessability.
# _solve_from FromAnyState
_solve_from(
self,
memory: Memory[D.T_state]
) -> None
Run the solving process from a given state.
# Parameters
- memory: The source memory (state or history) of the transition.
TIP
The nature of the solutions produced here depends on the solver's other characteristics, such as
policy and assessability.
# IDAstar
IDA* solver for deterministic planning problems.
From: Bonet & Geffner, "Learning Depth-First Search: A Unified Approach to Heuristic Search in Deterministic and Non-Deterministic Settings, and Its Application to MDPs", ICAPS 2006, Proposition 6.
IDA* is a specialization of LDFS for deterministic domains. When all transitions are deterministic and the heuristic is admissible and monotone, LDFS reduces to IDA* with transposition tables. The SCC and Tarjan bookkeeping of the general algorithm simplifies away since deterministic policies cannot have cycles, leaving pure iterative-deepening depth-first search.
Uses cost minimization with goals, same as LDFS.
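To make the iterative-deepening idea concrete, here is a minimal sketch of textbook IDA* on a hypothetical deterministic graph. It deliberately omits transposition tables and is not the library's implementation; `GRAPH`, `GOAL`, `H` and `ida_star` are illustrative names.

```python
import math

# Hypothetical deterministic graph: state -> list of (successor, step cost).
GRAPH = {
    "A": [("B", 1), ("C", 4)],
    "B": [("C", 1), ("D", 5)],
    "C": [("D", 1)],
    "D": [],
}
GOAL = "D"
H = {"A": 2, "B": 2, "C": 1, "D": 0}  # admissible and monotone cost-to-go estimates


def ida_star(start):
    """Return (cost, path) for a cheapest path to GOAL, or (inf, None) if none exists."""
    path = [start]

    def search(g, bound):
        node = path[-1]
        f = g + H[node]
        if f > bound:
            return f, False  # prune; report the f-value that exceeded the bound
        if node == GOAL:
            return g, True
        smallest = math.inf
        for nxt, step in GRAPH[node]:
            if nxt in path:  # avoid cycles along the current path
                continue
            path.append(nxt)
            value, found = search(g + step, bound)
            if found:
                return value, True
            smallest = min(smallest, value)
            path.pop()
        return smallest, False

    bound = H[start]
    while True:
        value, found = search(0, bound)
        if found:
            return value, list(path)
        if value == math.inf:
            return math.inf, None
        bound = value  # deepen to the smallest f-value that was pruned


plan_cost, plan_path = ida_star("A")
```

Each iteration is a depth-first search bounded by an f-value threshold, and the threshold grows to the smallest pruned f-value, which mirrors how failed LDFS passes raise value estimates before retrying.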
# Constructor IDAstar
IDAstar(
domain_factory: Callable[[], Domain],
heuristic: Callable[[Domain, D_IDAstar.T_state], D_IDAstar.T_agent[Value[D_IDAstar.T_value]]] = <lambda function>,
max_depth: int = 0,
parallel: bool = False,
shared_memory_proxy = None,
callback: Callable[[LDFS], bool] = <lambda function>,
verbose: bool = False
) -> None
Construct an IDA* solver instance.
# Parameters
- domain_factory: The lambda function to create a domain instance.
- heuristic: Function h(domain, state) -> Value used to initialize V(s) = h(s).cost for newly discovered states. An admissible heuristic (h(s) <= V*(s)) is required for optimality. Defaults to Value(cost=0).
- max_depth: Maximum DFS depth per iteration. 0 means unlimited. Defaults to 0.
- parallel: Parallelize action-transition generation. Defaults to False.
- shared_memory_proxy: The optional shared memory proxy. Defaults to None.
- callback: Lambda function called at the end of each IDA* pass, taking the solver as argument and returning True to stop. Defaults to never stopping.
- verbose: Whether verbose messages should be logged. Defaults to False.
# call_domain_method ParallelSolver
call_domain_method(
self,
name,
*args
)
Calls a parallel domain's method. This is the only way to get a domain method for a parallel domain.
# check_domain Solver
check_domain(
domain: Domain
) -> bool
Check whether a domain is compliant with this solver type.
By default, Solver.check_domain() provides some boilerplate code and internally
calls Solver._check_domain_additional() (which returns True by default but can be overridden to define
specific checks in addition to the "domain requirements"). The boilerplate code automatically checks whether all
domain requirements are met.
# Parameters
- domain: The domain to check.
# Returns
True if the domain is compliant with the solver type (False otherwise).
# close ParallelSolver
close(
self
)
Joins the parallel domains' processes.
# complete_with_default_hyperparameters Hyperparametrizable
complete_with_default_hyperparameters(
kwargs: dict[str, Any],
names: Optional[list[str]] = None
)
Add missing hyperparameters to kwargs by using default values
Args:
kwargs: keyword arguments to complete (e.g. for __init__, init_model, or solve)
names: names of the hyperparameters to add if missing.
By default, all available hyperparameters.
Returns: a new dictionary, completion of kwargs
# copy_and_update_hyperparameters Hyperparametrizable
copy_and_update_hyperparameters(
names: Optional[list[str]] = None,
**kwargs_by_name: dict[str, Any]
) -> list[Hyperparameter]
Copy hyperparameters definition of this class and update them with specified kwargs.
This is useful to define hyperparameters for a child class for which only choices of the hyperparameter change for instance.
Args:
names: names of hyperparameters to copy. Default to all.
**kwargs_by_name: for each hyperparameter specified by its name, the attributes to update.
If a given hyperparameter name is not specified, the hyperparameter is copied without further update.
Returns: the list of copied (and possibly updated) hyperparameters
# get_default_hyperparameters Hyperparametrizable
get_default_hyperparameters(
names: Optional[list[str]] = None
) -> dict[str, Any]
Get hyperparameters default values.
Args: names: names of the hyperparameters to choose. By default, all available hyperparameters will be suggested.
Returns: a mapping between hyperparameter's name_in_kwargs and its default value (None if not specified)
# get_domain ParallelSolver
get_domain(
self
)
Returns the domain, optionally creating a parallel domain if not already created.
# get_domain_requirements Solver
get_domain_requirements(
) -> list[type]
Get domain requirements for this solver class to be applicable.
Domain requirements are classes from the skdecide.builders.domain package that the domain needs to inherit from.
# Returns
A list of classes to inherit from.
# get_explored_states LDFS
get_explored_states(
self
) -> set[StrDict[D.T_observation]]
Get all states explored so far
# get_hyperparameter Hyperparametrizable
get_hyperparameter(
name: str
) -> Hyperparameter
Get hyperparameter from given name.
# get_hyperparameters_by_name Hyperparametrizable
get_hyperparameters_by_name(
) -> dict[str, Hyperparameter]
Mapping from name to corresponding hyperparameter.
# get_hyperparameters_names Hyperparametrizable
get_hyperparameters_names(
) -> list[str]
List of hyperparameters names.
# get_nb_of_explored_states LDFS
get_nb_of_explored_states(
self
) -> int
Get the number of states explored so far
# get_nb_tip_states LDFS
get_nb_tip_states(
self
) -> int
Get the number of tip states expanded during search
# get_next_action DeterministicPolicies
get_next_action(
self,
observation: StrDict[D.T_observation],
domain: Optional[Domain] = None
) -> StrDict[list[D.T_event]]
Get the next deterministic action (from the solver's current policy).
# Parameters
- observation: The observation for which next action is requested.
- domain: the domain source of the observation. Typically used to get current applicable actions or action mask. NB: Be careful that the domain has not been autocast, so may not respect the T_domain specs.
# Returns
The next deterministic action.
# get_next_action_distribution UncertainPolicies
get_next_action_distribution(
self,
observation: StrDict[D.T_observation],
domain: Optional[Domain] = None
) -> Distribution[StrDict[list[D.T_event]]]
Get the probabilistic distribution of next action for the given observation (from the solver's current policy).
# Parameters
- observation: The observation to consider.
- domain: the domain source of the observation. Typically used to get current applicable actions or action mask.
# Returns
The probabilistic distribution of next action.
# get_plan IDAstar
get_plan(
self,
from_memory: Optional[D_IDAstar.T_memory[D_IDAstar.T_state]] = None
) -> list[D_IDAstar.T_agent[D_IDAstar.T_concurrency[D_IDAstar.T_event]]]
Get the solution plan (sequence of actions to goal).
Since the domain is deterministic, the greedy policy defines a unique action sequence. Calls the C++ IDAstarSolver::get_plan(), which follows the greedy policy through the search graph.
# Parameters
- from_memory: State from which to extract the plan. If None, uses the domain's initial state.
# Returns
The sequence of actions leading to the goal, or an empty list if no solution is defined.
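Following a greedy policy through a deterministic model to recover a plan can be sketched as below. The helper `extract_plan`, its `max_steps` guard and the toy line-world are hypothetical; the actual extraction is performed by the C++ solver.

```python
def extract_plan(policy, next_state, start, is_goal, max_steps=1000):
    """Follow policy from start until a goal; return [] if no complete plan exists."""
    plan = []
    state = start
    visited = set()
    while not is_goal(state):
        # Give up if the policy is undefined here, loops, or runs too long.
        if state not in policy or state in visited or len(plan) >= max_steps:
            return []
        visited.add(state)
        action = policy[state]
        plan.append(action)
        state = next_state(state, action)  # deterministic transition
    return plan


# Hypothetical line-world: integer positions, goal at position 3.
policy = {0: "right", 1: "right", 2: "right"}


def step(state, action):
    return state + 1 if action == "right" else state - 1


plan_from_start = extract_plan(policy, step, 0, lambda s: s == 3)
plan_from_unknown = extract_plan(policy, step, 5, lambda s: s == 3)
```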
# get_policy LDFS
get_policy(
self
) -> dict[StrDict[D.T_observation], tuple[StrDict[list[D.T_event]], float]]
Get the (partial) solution policy
# get_solved_states LDFS
get_solved_states(
self
) -> set[StrDict[D.T_observation]]
Get states labeled as solved (converged)
# get_solving_time LDFS
get_solving_time(
self
) -> int
Get the solving time in milliseconds
# get_strongly_connected_components LDFS
get_strongly_connected_components(
self
) -> list[set[StrDict[D.T_observation]]]
Get the strongly connected components discovered so far by Tarjan's algorithm during the DFS. Each SCC is a set of states that form a cycle under the greedy policy. Useful for monitoring convergence within cyclic components.
# get_utility Utilities
get_utility(
self,
observation: StrDict[D.T_observation]
) -> D.T_value
Get the estimated on-policy utility of the given observation.
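In mathematical terms, for a fully observable domain, this function estimates V^pi(s) = E_{tau ~ pi}[R(tau) | s_0 = s], the expected return R(tau) of trajectories tau that start in state s and follow the current policy pi.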
# Parameters
- observation: The observation to consider.
# Returns
The estimated on-policy utility of the given observation.
# is_policy_defined_for Policies
is_policy_defined_for(
self,
observation: StrDict[D.T_observation]
) -> bool
Check whether the solver's current policy is defined for the given observation.
# Parameters
- observation: The observation to consider.
# Returns
True if the policy is defined for the given observation memory (False otherwise).
# reset Solver
reset(
self
) -> None
Reset whatever is needed on this solver before running a new episode.
This function does nothing by default but can be overridden if needed (e.g. to reset the hidden state of a LSTM policy network, which carries information about past observations seen in the previous episode).
# sample_action Policies
sample_action(
self,
observation: StrDict[D.T_observation],
domain: Optional[Domain] = None
) -> StrDict[list[D.T_event]]
Sample an action for the given observation (from the solver's current policy).
# Parameters
- observation: The observation for which an action must be sampled.
- domain: the domain source of the observation. Typically used to get current applicable actions or action mask.
# Returns
The sampled action.
# solve FromInitialState
solve(
self,
from_memory: Optional[Memory[D.T_state]] = None
) -> None
Run the solving process.
# Parameters
- from_memory: The source memory (state or history) from which we begin the solving process. If None, initial state is used if the domain is initializable, else a ValueError is raised.
TIP
The nature of the solutions produced here depends on the solver's other characteristics, such as
policy and assessability.
# solve_from FromAnyState
solve_from(
self,
memory: Memory[D.T_state]
) -> None
Run the solving process from a given state.
# Parameters
- memory: The source memory (state or history) of the transition.
TIP
The nature of the solutions produced here depends on the solver's other characteristics, such as
policy and assessability.
# suggest_hyperparameter_with_optuna Hyperparametrizable
suggest_hyperparameter_with_optuna(
trial: optuna.trial.Trial,
name: str,
prefix: str,
**kwargs
) -> Any
Suggest hyperparameter value during an Optuna trial.
This can be used during Optuna hyperparameters tuning.
Args:
trial: optuna trial during hyperparameters tuning
name: name of the hyperparameter to choose
prefix: prefix to add to optuna corresponding parameter name
(useful for disambiguating hyperparameters from subsolvers in case of meta-solvers)
**kwargs: options for optuna hyperparameter suggestions
Returns: the suggested value for the hyperparameter
kwargs can be used to pass relevant arguments to
- trial.suggest_float()
- trial.suggest_int()
- trial.suggest_categorical()
For instance it can
- add a low/high value if not existing for the hyperparameter or override it to narrow the search. (for float or int hyperparameters)
- add a step or log argument (for float or int hyperparameters, see optuna.trial.Trial.suggest_float())
- override choices for categorical or enum parameters to narrow the search
# suggest_hyperparameters_with_optuna Hyperparametrizable
suggest_hyperparameters_with_optuna(
trial: optuna.trial.Trial,
names: Optional[list[str]] = None,
kwargs_by_name: Optional[dict[str, dict[str, Any]]] = None,
fixed_hyperparameters: Optional[dict[str, Any]] = None,
prefix: str
) -> dict[str, Any]
Suggest hyperparameters values during an Optuna trial.
Args:
trial: optuna trial during hyperparameters tuning
names: names of the hyperparameters to choose.
By default, all available hyperparameters will be suggested.
If fixed_hyperparameters is provided, the corresponding names are removed from names.
kwargs_by_name: options for optuna hyperparameter suggestions, by hyperparameter name
fixed_hyperparameters: values of fixed hyperparameters, useful for suggesting subbrick hyperparameters,
if the subbrick class is not suggested by this method, but already fixed.
Will be added to the suggested hyperparameters.
prefix: prefix to add to optuna corresponding parameters
(useful for disambiguating hyperparameters from subsolvers in case of meta-solvers)
Returns:
mapping between the hyperparameter name and its suggested value.
If the hyperparameter has an attribute name_in_kwargs, this is used as the key in the mapping
instead of the actual hyperparameter name.
the mapping is updated with fixed_hyperparameters.
kwargs_by_name[some_name] will be passed as **kwargs to suggest_hyperparameter_with_optuna(name=some_name)
# _check_domain_additional Solver
_check_domain_additional(
domain: Domain
) -> bool
Check whether the given domain is compliant with the specific requirements of this solver type (i.e. the ones in addition to "domain requirements").
This is a helper function called by default from Solver.check_domain(). It focuses on specific checks, as
opposed to taking also into account the domain requirements for the latter.
# Parameters
- domain: The domain to check.
# Returns
True if the domain is compliant with the specific requirements of this solver type (False otherwise).
# _get_next_action DeterministicPolicies
_get_next_action(
self,
observation: StrDict[D.T_observation],
domain: Optional[Domain] = None
) -> StrDict[list[D.T_event]]
Get the next deterministic action (from the solver's current policy).
# Parameters
- observation: The observation for which next action is requested.
- domain: the domain source of the observation. Typically used to get current applicable actions or action mask. NB: Be careful that the domain has not been autocast, so may not respect the T_domain specs.
# Returns
The next deterministic action.
# _get_next_action_distribution UncertainPolicies
_get_next_action_distribution(
self,
observation: StrDict[D.T_observation],
domain: Optional[Domain] = None
) -> Distribution[StrDict[list[D.T_event]]]
Get the probabilistic distribution of next action for the given observation (from the solver's current policy).
# Parameters
- observation: The observation to consider.
- domain: the domain source of the observation. Typically used to get current applicable actions or action mask. NB: Be careful that the domain has not been autocast, so may not respect the T_domain specs.
# Returns
The probabilistic distribution of next action.
# _get_utility Utilities
_get_utility(
self,
observation: StrDict[D.T_observation]
) -> D.T_value
Get the estimated on-policy utility of the given observation.
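In mathematical terms, for a fully observable domain, this function estimates V^pi(s) = E_{tau ~ pi}[R(tau) | s_0 = s], the expected return R(tau) of trajectories tau that start in state s and follow the current policy pi.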
# Parameters
- observation: The observation to consider.
# Returns
The estimated on-policy utility of the given observation.
# _initialize Solver
_initialize(
self
)
Launches the parallel domains. Before this method is called, the solver must have recorded self._domain_factory, the set of lambda functions passed to the solver's constructor (e.g. the heuristic lambda for heuristic-based solvers), and whether the parallel domain jobs should notify their status via the IPC protocol (required when interacting with other programming languages like C++).
# _is_policy_defined_for Policies
_is_policy_defined_for(
self,
observation: StrDict[D.T_observation]
) -> bool
Check whether the solver's current policy is defined for the given observation.
# Parameters
- observation: The observation to consider.
# Returns
True if the policy is defined for the given observation memory (False otherwise).
# _reset Solver
_reset(
self
) -> None
Reset whatever is needed on this solver before running a new episode.
This function does nothing by default but can be overridden if needed (e.g. to reset the hidden state of a LSTM policy network, which carries information about past observations seen in the previous episode).
# _sample_action Policies
_sample_action(
self,
observation: StrDict[D.T_observation],
domain: Optional[Domain] = None
) -> StrDict[list[D.T_event]]
Sample an action for the given observation (from the solver's current policy).
# Parameters
- observation: The observation for which an action must be sampled.
- domain: the domain source of the observation. Typically used to get current applicable actions or action mask. NB: Be careful that the domain has not been autocast, so may not respect the T_domain specs.
# Returns
The sampled action.
# _solve FromInitialState
_solve(
self,
from_memory: Optional[Memory[D.T_state]] = None
) -> None
Run the solving process.
# Parameters
- from_memory: The source memory (state or history) from which we begin the solving process. If None, initial state is used if the domain is initializable, else a ValueError is raised.
TIP
The nature of the solutions produced here depends on the solver's other characteristics, such as
policy and assessability.
# _solve_from FromAnyState
_solve_from(
self,
memory: Memory[D.T_state]
) -> None
Run the solving process from a given state.
# Parameters
- memory: The source memory (state or history) of the transition.
TIP
The nature of the solutions produced here depends on the solver's other characteristics, such as
policy and assessability.