# hub.solver.idual.idual
Domain specification
# IDual
Heuristic search in dual space for SSPs (i-dual algorithm).
Incrementally builds the search space from the initial state, solving growing dual LPs with heuristic terminal costs on fringe states. Focuses on promising regions reachable from s0, unlike full-enumeration LP solvers.
Based on: Trevizan et al., "Heuristic Search in Dual Space for Constrained Stochastic Shortest Path Problems", ICAPS 2016.
# Constructor IDual
IDual(
domain_factory: Callable[[], Domain],
goal_checker: Callable[[Domain, D_SSP.T_state], D_SSP.T_agent[D_SSP.T_predicate]] = <lambda function>,
heuristic: Callable[[Domain, D_SSP.T_state], D_SSP.T_agent[Value[D_SSP.T_value]]] = <lambda function>,
terminal_value: Callable[[D_SSP.T_state], D_SSP.T_agent[Value[D_SSP.T_value]]] = <lambda function>,
lp_infinity: float = 1e+20,
lp_tolerance: float = 1e-15,
default_dead_end_cost: float = 1000.0,
lp_callback_interval: int = 0,
parallel: bool = False,
callback: Callable[[IDual], bool] = <lambda function>,
verbose: bool = False
) -> None
Construct an IDual solver instance.
# Parameters
- domain_factory: Lambda returning a domain instance.
- goal_checker: Function gc(domain, state) -> bool. Identifies goal states where V(g)=0. Defaults to domain.is_goal().
- heuristic: Function h(domain, state) -> Value. Admissible heuristic for primary cost. Defaults to Value(cost=0).
- terminal_value: Function tv(state) -> Value. Penalty for dead-end (non-goal terminal) states. Defaults to 1000.
- lp_infinity: Upper bound for LP variable bounds and constraint lower bounds used with HiGHS. Defaults to 1e20.
- lp_tolerance: Sparsity threshold for LP coefficients. Values below this are treated as zero. Defaults to 1e-15.
- default_dead_end_cost: Default per-constraint dead-end cost when no explicit dead_end_costs vector is provided (only used in the constrained variant). Defaults to 1000.0.
- lp_callback_interval: Fire callback every N simplex iterations during each LP solve, reporting intermediate V(s) and policy. 0 disables LP-level callbacks. Defaults to 0.
- parallel: If True, explore action transitions in parallel.
- callback: Called after each LP iteration. Returns True to stop.
- verbose: Enable verbose logging.
# call_domain_method ParallelSolver
call_domain_method(
self,
name,
*args
)
Calls a parallel domain's method. This is the only way to get a domain method for a parallel domain.
# check_domain Solver
check_domain(
domain: Domain
) -> bool
Check whether a domain is compliant with this solver type.
By default, Solver.check_domain() provides some boilerplate code and internally
calls Solver._check_domain_additional() (which returns True by default but can be overridden to define
specific checks in addition to the "domain requirements"). The boilerplate code automatically checks whether all
domain requirements are met.
# Parameters
- domain: The domain to check.
# Returns
True if the domain is compliant with the solver type (False otherwise).
# close ParallelSolver
close(
self
)
Joins the parallel domains' processes. Not calling this method (or not using the 'with' context statement) results in the solver forever waiting for the domain processes to exit.
# complete_with_default_hyperparameters Hyperparametrizable
complete_with_default_hyperparameters(
kwargs: dict[str, Any],
names: Optional[list[str]] = None
)
Add missing hyperparameters to kwargs by using default values
Args:
kwargs: keyword arguments to complete (e.g. for __init__, init_model, or solve)
names: names of the hyperparameters to add if missing.
By default, all available hyperparameters.
Returns: a new dictionary, completion of kwargs
# copy_and_update_hyperparameters Hyperparametrizable
copy_and_update_hyperparameters(
names: Optional[list[str]] = None,
**kwargs_by_name: dict[str, Any]
) -> list[Hyperparameter]
Copy hyperparameters definition of this class and update them with specified kwargs.
This is useful to define hyperparameters for a child class for which only choices of the hyperparameter change for instance.
Args: names: names of hyperparameters to copy. Default to all. **kwargs_by_name: for each hyperparameter specified by its name, the attributes to update. If a given hyperparameter name is not specified, the hyperparameter is copied without further update.
Returns:
# get_default_hyperparameters Hyperparametrizable
get_default_hyperparameters(
names: Optional[list[str]] = None
) -> dict[str, Any]
Get hyperparameters default values.
Args: names: names of the hyperparameters to choose. By default, all available hyperparameters will be suggested.
Returns: a mapping between hyperparameter's name_in_kwargs and its default value (None if not specified)
# get_domain ParallelSolver
get_domain(
self
)
Returns the domain, optionally creating a parallel domain if not already created.
# get_domain_requirements Solver
get_domain_requirements(
) -> list[type]
Get domain requirements for this solver class to be applicable.
Domain requirements are classes from the skdecide.builders.domain package that the domain needs to inherit from.
# Returns
A list of classes to inherit from.
# get_hyperparameter Hyperparametrizable
get_hyperparameter(
name: str
) -> Hyperparameter
Get hyperparameter from given name.
# get_hyperparameters_by_name Hyperparametrizable
get_hyperparameters_by_name(
) -> dict[str, Hyperparameter]
Mapping from name to corresponding hyperparameter.
# get_hyperparameters_names Hyperparametrizable
get_hyperparameters_names(
) -> list[str]
List of hyperparameters names.
# get_next_action DeterministicPolicies
get_next_action(
self,
observation: StrDict[D.T_observation],
domain: Optional[Domain] = None
) -> StrDict[list[D.T_event]]
Get the next deterministic action (from the solver's current policy).
# Parameters
- observation: The observation for which next action is requested.
- domain: the domain source of the observation. Typically used to get current applicable actions or action mask. NB: Be careful that the domain has not been autocast, so may not respect the T_domain specs.
# Returns
The next deterministic action.
# get_next_action_distribution UncertainPolicies
get_next_action_distribution(
self,
observation: StrDict[D.T_observation],
domain: Optional[Domain] = None
) -> Distribution[StrDict[list[D.T_event]]]
Get the probabilistic distribution of next action for the given observation (from the solver's current policy).
# Parameters
- observation: The observation to consider.
- domain: the domain source of the observation. Typically used to get current applicable actions or action mask.
# Returns
The probabilistic distribution of next action.
# get_utility Utilities
get_utility(
self,
observation: StrDict[D.T_observation]
) -> D.T_value
Get the estimated on-policy utility of the given observation.
In mathematical terms, for a fully observable domain, this function estimates:
# Parameters
- observation: The observation to consider.
# Returns
The estimated on-policy utility of the given observation.
# is_policy_defined_for Policies
is_policy_defined_for(
self,
observation: StrDict[D.T_observation]
) -> bool
Check whether the solver's current policy is defined for the given observation.
# Parameters
- observation: The observation to consider.
# Returns
True if the policy is defined for the given observation memory (False otherwise).
# reset Solver
reset(
self
) -> None
Reset whatever is needed on this solver before running a new episode.
This function does nothing by default but can be overridden if needed (e.g. to reset the hidden state of a LSTM policy network, which carries information about past observations seen in the previous episode).
# sample_action Policies
sample_action(
self,
observation: StrDict[D.T_observation],
domain: Optional[Domain] = None
) -> StrDict[list[D.T_event]]
Sample an action for the given observation (from the solver's current policy).
# Parameters
- observation: The observation for which an action must be sampled.
- domain: the domain source of the observation. Typically used to get current applicable actions or action mask.
# Returns
The sampled action.
# solve FromInitialState
solve(
self,
from_memory: Optional[Memory[D.T_state]] = None
) -> None
Run the solving process.
# Parameters
- from_memory: The source memory (state or history) from which we begin the solving process. If None, initial state is used if the domain is initializable, else a ValueError is raised.
TIP
The nature of the solutions produced here depends on other solver's characteristics like
policy and assessibility.
# solve_from FromAnyState
solve_from(
self,
memory: Memory[D.T_state]
) -> None
Run the solving process from a given state.
# Parameters
- memory: The source memory (state or history) of the transition.
TIP
The nature of the solutions produced here depends on other solver's characteristics like
policy and assessibility.
# suggest_hyperparameter_with_optuna Hyperparametrizable
suggest_hyperparameter_with_optuna(
trial: optuna.trial.Trial,
name: str,
prefix: str,
**kwargs
) -> Any
Suggest hyperparameter value during an Optuna trial.
This can be used during Optuna hyperparameters tuning.
Args: trial: optuna trial during hyperparameters tuning name: name of the hyperparameter to choose prefix: prefix to add to optuna corresponding parameter name (useful for disambiguating hyperparameters from subsolvers in case of meta-solvers) **kwargs: options for optuna hyperparameter suggestions
Returns:
kwargs can be used to pass relevant arguments to
- trial.suggest_float()
- trial.suggest_int()
- trial.suggest_categorical()
For instance it can
- add a low/high value if not existing for the hyperparameter or override it to narrow the search. (for float or int hyperparameters)
- add a step or log argument (for float or int hyperparameters, see optuna.trial.Trial.suggest_float())
- override choices for categorical or enum parameters to narrow the search
# suggest_hyperparameters_with_optuna Hyperparametrizable
suggest_hyperparameters_with_optuna(
trial: optuna.trial.Trial,
names: Optional[list[str]] = None,
kwargs_by_name: Optional[dict[str, dict[str, Any]]] = None,
fixed_hyperparameters: Optional[dict[str, Any]] = None,
prefix: str
) -> dict[str, Any]
Suggest hyperparameters values during an Optuna trial.
Args:
trial: optuna trial during hyperparameters tuning
names: names of the hyperparameters to choose.
By default, all available hyperparameters will be suggested.
If fixed_hyperparameters is provided, the corresponding names are removed from names.
kwargs_by_name: options for optuna hyperparameter suggestions, by hyperparameter name
fixed_hyperparameters: values of fixed hyperparameters, useful for suggesting subbrick hyperparameters,
if the subbrick class is not suggested by this method, but already fixed.
Will be added to the suggested hyperparameters.
prefix: prefix to add to optuna corresponding parameters
(useful for disambiguating hyperparameters from subsolvers in case of meta-solvers)
Returns:
mapping between the hyperparameter name and its suggested value.
If the hyperparameter has an attribute name_in_kwargs, this is used as the key in the mapping
instead of the actual hyperparameter name.
the mapping is updated with fixed_hyperparameters.
kwargs_by_name[some_name] will be passed as **kwargs to suggest_hyperparameter_with_optuna(name=some_name)
# _check_domain_additional Solver
_check_domain_additional(
domain: Domain
) -> bool
Check whether the given domain is compliant with the specific requirements of this solver type (i.e. the ones in addition to "domain requirements").
This is a helper function called by default from Solver.check_domain(). It focuses on specific checks, as
opposed to taking also into account the domain requirements for the latter.
# Parameters
- domain: The domain to check.
# Returns
True if the domain is compliant with the specific requirements of this solver type (False otherwise).
# _get_next_action DeterministicPolicies
_get_next_action(
self,
observation: D_SSP.T_agent[D_SSP.T_observation],
domain: Optional[Domain] = None
) -> D_SSP.T_agent[D_SSP.T_concurrency[D_SSP.T_event]]
Get the next deterministic action (from the solver's current policy).
# Parameters
- observation: The observation for which next action is requested.
- domain: the domain source of the observation. Typically used to get current applicable actions or action mask. NB: Be careful that the domain has not been autocast, so may not respect the T_domain specs.
# Returns
The next deterministic action.
# _get_next_action_distribution UncertainPolicies
_get_next_action_distribution(
self,
observation: StrDict[D.T_observation],
domain: Optional[Domain] = None
) -> Distribution[StrDict[list[D.T_event]]]
Get the probabilistic distribution of next action for the given observation (from the solver's current policy).
# Parameters
- observation: The observation to consider.
- domain: the domain source of the observation. Typically used to get current applicable actions or action mask. NB: Be careful that the domain has not been autocast, so may not respect the T_domain specs.
# Returns
The probabilistic distribution of next action.
# _get_utility Utilities
_get_utility(
self,
observation: D_SSP.T_agent[D_SSP.T_observation]
) -> D_SSP.T_value
Get the estimated on-policy utility of the given observation.
In mathematical terms, for a fully observable domain, this function estimates:
# Parameters
- observation: The observation to consider.
# Returns
The estimated on-policy utility of the given observation.
# _initialize Solver
_initialize(
self
)
Launches the parallel domains. This method requires to have previously recorded the self._domain_factory, the set of lambda functions passed to the solver's constructor (e.g. heuristic lambda for heuristic-based solvers), and whether the parallel domain jobs should notify their status via the IPC protocol (required when interacting with other programming languages like C++)
# _is_policy_defined_for Policies
_is_policy_defined_for(
self,
observation: StrDict[D.T_observation]
) -> bool
Check whether the solver's current policy is defined for the given observation.
# Parameters
- observation: The observation to consider.
# Returns
True if the policy is defined for the given observation memory (False otherwise).
# _reset Solver
_reset(
self
) -> None
Reset whatever is needed on this solver before running a new episode.
This function does nothing by default but can be overridden if needed (e.g. to reset the hidden state of a LSTM policy network, which carries information about past observations seen in the previous episode).
# _sample_action Policies
_sample_action(
self,
observation: StrDict[D.T_observation],
domain: Optional[Domain] = None
) -> StrDict[list[D.T_event]]
Sample an action for the given observation (from the solver's current policy).
# Parameters
- observation: The observation for which an action must be sampled.
- domain: the domain source of the observation. Typically used to get current applicable actions or action mask. NB: Be careful that the domain has not been autocast, so may not respect the T_domain specs.
# Returns
The sampled action.
# _solve FromInitialState
_solve(
self,
from_memory: Optional[Memory[D.T_state]] = None
) -> None
Run the solving process.
# Parameters
- from_memory: The source memory (state or history) from which we begin the solving process. If None, initial state is used if the domain is initializable, else a ValueError is raised.
TIP
The nature of the solutions produced here depends on other solver's characteristics like
policy and assessibility.
# _solve_from FromAnyState
_solve_from(
self,
memory: D_SSP.T_memory[D_SSP.T_state]
) -> None
Run the solving process from a given state.
# Parameters
- memory: The source memory (state or history) of the transition.
TIP
The nature of the solutions produced here depends on other solver's characteristics like
policy and assessibility.
# CIDual
Heuristic search in dual space for constrained SSPs (i-dual).
Like IDual but for constrained SSPs: the domain provides secondary cost constraints via the Constrained mixin with BoundConstraint objects. The optimal policy may be stochastic.
Based on: Trevizan et al., ICAPS 2016 / IJCAI 2017.
# Constructor CIDual
CIDual(
domain_factory: Callable[[], Domain],
goal_checker: Callable[[Domain, D_CSSP.T_state], D_CSSP.T_agent[D_CSSP.T_predicate]] = <lambda function>,
heuristic: Callable[[Domain, D_CSSP.T_state], D_CSSP.T_agent[Value[D_CSSP.T_value]]] = <lambda function>,
terminal_value: Callable[[D_CSSP.T_state], D_CSSP.T_agent[Value[D_CSSP.T_value]]] = <lambda function>,
secondary_heuristics: Optional[list[Callable[[Domain, D_CSSP.T_state], D_CSSP.T_agent[Value[D_CSSP.T_value]]]]] = None,
dead_end_costs: Optional[list[float]] = None,
lp_infinity: float = 1e+20,
lp_tolerance: float = 1e-15,
default_dead_end_cost: float = 1000.0,
lp_callback_interval: int = 0,
parallel: bool = False,
callback: Callable[[CIDual], bool] = <lambda function>,
verbose: bool = False
) -> None
Construct a CIDual solver instance.
# Parameters
- domain_factory: Lambda returning a domain instance.
- goal_checker: Function gc(domain, state) -> bool.
- heuristic: Function h(domain, state) -> Value. Primary cost heuristic.
- terminal_value: Function tv(state) -> Value. Dead-end penalty.
- secondary_heuristics: List of h_j(domain, state) -> Value, one per constraint. Admissible heuristics for secondary costs. Defaults to h=0 (always admissible).
- dead_end_costs: List of d_j per constraint for dead-end penalty in secondary cost constraints. Defaults to default_dead_end_cost.
- lp_infinity: Upper bound for LP variable bounds and constraint lower bounds used with HiGHS. Defaults to 1e20.
- lp_tolerance: Sparsity threshold for LP coefficients. Values below this are treated as zero. Defaults to 1e-15.
- default_dead_end_cost: Default per-constraint dead-end cost when no explicit dead_end_costs is provided. Defaults to 1000.0.
- lp_callback_interval: Fire callback every N simplex iterations during each LP solve, reporting intermediate V(s) and policy. 0 disables LP-level callbacks. Defaults to 0.
- parallel: If True, explore action transitions in parallel.
- callback: Called after each LP iteration. Returns True to stop.
- verbose: Enable verbose logging.
# call_domain_method ParallelSolver
call_domain_method(
self,
name,
*args
)
Calls a parallel domain's method. This is the only way to get a domain method for a parallel domain.
# check_domain Solver
check_domain(
domain: Domain
) -> bool
Check whether a domain is compliant with this solver type.
By default, Solver.check_domain() provides some boilerplate code and internally
calls Solver._check_domain_additional() (which returns True by default but can be overridden to define
specific checks in addition to the "domain requirements"). The boilerplate code automatically checks whether all
domain requirements are met.
# Parameters
- domain: The domain to check.
# Returns
True if the domain is compliant with the solver type (False otherwise).
# close ParallelSolver
close(
self
)
Joins the parallel domains' processes. Not calling this method (or not using the 'with' context statement) results in the solver forever waiting for the domain processes to exit.
# complete_with_default_hyperparameters Hyperparametrizable
complete_with_default_hyperparameters(
kwargs: dict[str, Any],
names: Optional[list[str]] = None
)
Add missing hyperparameters to kwargs by using default values
Args:
kwargs: keyword arguments to complete (e.g. for __init__, init_model, or solve)
names: names of the hyperparameters to add if missing.
By default, all available hyperparameters.
Returns: a new dictionary, completion of kwargs
# copy_and_update_hyperparameters Hyperparametrizable
copy_and_update_hyperparameters(
names: Optional[list[str]] = None,
**kwargs_by_name: dict[str, Any]
) -> list[Hyperparameter]
Copy hyperparameters definition of this class and update them with specified kwargs.
This is useful to define hyperparameters for a child class for which only choices of the hyperparameter change for instance.
Args: names: names of hyperparameters to copy. Default to all. **kwargs_by_name: for each hyperparameter specified by its name, the attributes to update. If a given hyperparameter name is not specified, the hyperparameter is copied without further update.
Returns:
# get_default_hyperparameters Hyperparametrizable
get_default_hyperparameters(
names: Optional[list[str]] = None
) -> dict[str, Any]
Get hyperparameters default values.
Args: names: names of the hyperparameters to choose. By default, all available hyperparameters will be suggested.
Returns: a mapping between hyperparameter's name_in_kwargs and its default value (None if not specified)
# get_domain ParallelSolver
get_domain(
self
)
Returns the domain, optionally creating a parallel domain if not already created.
# get_domain_requirements Solver
get_domain_requirements(
) -> list[type]
Get domain requirements for this solver class to be applicable.
Domain requirements are classes from the skdecide.builders.domain package that the domain needs to inherit from.
# Returns
A list of classes to inherit from.
# get_hyperparameter Hyperparametrizable
get_hyperparameter(
name: str
) -> Hyperparameter
Get hyperparameter from given name.
# get_hyperparameters_by_name Hyperparametrizable
get_hyperparameters_by_name(
) -> dict[str, Hyperparameter]
Mapping from name to corresponding hyperparameter.
# get_hyperparameters_names Hyperparametrizable
get_hyperparameters_names(
) -> list[str]
List of hyperparameters names.
# get_next_action_distribution UncertainPolicies
get_next_action_distribution(
self,
observation: StrDict[D.T_observation],
domain: Optional[Domain] = None
) -> Distribution[StrDict[list[D.T_event]]]
Get the probabilistic distribution of next action for the given observation (from the solver's current policy).
# Parameters
- observation: The observation to consider.
- domain: the domain source of the observation. Typically used to get current applicable actions or action mask.
# Returns
The probabilistic distribution of next action.
# get_utility Utilities
get_utility(
self,
observation: StrDict[D.T_observation]
) -> D.T_value
Get the estimated on-policy utility of the given observation.
In mathematical terms, for a fully observable domain, this function estimates:
# Parameters
- observation: The observation to consider.
# Returns
The estimated on-policy utility of the given observation.
# is_policy_defined_for Policies
is_policy_defined_for(
self,
observation: StrDict[D.T_observation]
) -> bool
Check whether the solver's current policy is defined for the given observation.
# Parameters
- observation: The observation to consider.
# Returns
True if the policy is defined for the given observation memory (False otherwise).
# reset Solver
reset(
self
) -> None
Reset whatever is needed on this solver before running a new episode.
This function does nothing by default but can be overridden if needed (e.g. to reset the hidden state of a LSTM policy network, which carries information about past observations seen in the previous episode).
# sample_action Policies
sample_action(
self,
observation: StrDict[D.T_observation],
domain: Optional[Domain] = None
) -> StrDict[list[D.T_event]]
Sample an action for the given observation (from the solver's current policy).
# Parameters
- observation: The observation for which an action must be sampled.
- domain: the domain source of the observation. Typically used to get current applicable actions or action mask.
# Returns
The sampled action.
# solve FromInitialState
solve(
self,
from_memory: Optional[Memory[D.T_state]] = None
) -> None
Run the solving process.
# Parameters
- from_memory: The source memory (state or history) from which we begin the solving process. If None, initial state is used if the domain is initializable, else a ValueError is raised.
TIP
The nature of the solutions produced here depends on other solver's characteristics like
policy and assessibility.
# solve_from FromAnyState
solve_from(
self,
memory: Memory[D.T_state]
) -> None
Run the solving process from a given state.
# Parameters
- memory: The source memory (state or history) of the transition.
TIP
The nature of the solutions produced here depends on other solver's characteristics like
policy and assessibility.
# suggest_hyperparameter_with_optuna Hyperparametrizable
suggest_hyperparameter_with_optuna(
trial: optuna.trial.Trial,
name: str,
prefix: str,
**kwargs
) -> Any
Suggest hyperparameter value during an Optuna trial.
This can be used during Optuna hyperparameters tuning.
Args: trial: optuna trial during hyperparameters tuning name: name of the hyperparameter to choose prefix: prefix to add to optuna corresponding parameter name (useful for disambiguating hyperparameters from subsolvers in case of meta-solvers) **kwargs: options for optuna hyperparameter suggestions
Returns:
kwargs can be used to pass relevant arguments to
- trial.suggest_float()
- trial.suggest_int()
- trial.suggest_categorical()
For instance it can
- add a low/high value if not existing for the hyperparameter or override it to narrow the search. (for float or int hyperparameters)
- add a step or log argument (for float or int hyperparameters, see optuna.trial.Trial.suggest_float())
- override choices for categorical or enum parameters to narrow the search
# suggest_hyperparameters_with_optuna Hyperparametrizable
suggest_hyperparameters_with_optuna(
trial: optuna.trial.Trial,
names: Optional[list[str]] = None,
kwargs_by_name: Optional[dict[str, dict[str, Any]]] = None,
fixed_hyperparameters: Optional[dict[str, Any]] = None,
prefix: str
) -> dict[str, Any]
Suggest hyperparameters values during an Optuna trial.
Args:
trial: optuna trial during hyperparameters tuning
names: names of the hyperparameters to choose.
By default, all available hyperparameters will be suggested.
If fixed_hyperparameters is provided, the corresponding names are removed from names.
kwargs_by_name: options for optuna hyperparameter suggestions, by hyperparameter name
fixed_hyperparameters: values of fixed hyperparameters, useful for suggesting subbrick hyperparameters,
if the subbrick class is not suggested by this method, but already fixed.
Will be added to the suggested hyperparameters.
prefix: prefix to add to optuna corresponding parameters
(useful for disambiguating hyperparameters from subsolvers in case of meta-solvers)
Returns:
mapping between the hyperparameter name and its suggested value.
If the hyperparameter has an attribute name_in_kwargs, this is used as the key in the mapping
instead of the actual hyperparameter name.
the mapping is updated with fixed_hyperparameters.
kwargs_by_name[some_name] will be passed as **kwargs to suggest_hyperparameter_with_optuna(name=some_name)
# _check_domain_additional Solver
_check_domain_additional(
domain: Domain
) -> bool
Check whether the given domain is compliant with the specific requirements of this solver type (i.e. the ones in addition to "domain requirements").
This is a helper function called by default from Solver.check_domain(). It focuses on specific checks, as
opposed to taking also into account the domain requirements for the latter.
# Parameters
- domain: The domain to check.
# Returns
True if the domain is compliant with the specific requirements of this solver type (False otherwise).
# _get_next_action_distribution UncertainPolicies
_get_next_action_distribution(
self,
observation: D_CSSP.T_agent[D_CSSP.T_observation],
domain: Optional[Domain] = None
)
Get the probabilistic distribution of next action for the given observation (from the solver's current policy).
# Parameters
- observation: The observation to consider.
- domain: the domain source of the observation. Typically used to get current applicable actions or action mask. NB: Be careful that the domain has not been autocast, so may not respect the T_domain specs.
# Returns
The probabilistic distribution of next action.
# _get_utility Utilities
_get_utility(
self,
observation: D_CSSP.T_agent[D_CSSP.T_observation]
) -> D_CSSP.T_value
Get the estimated on-policy utility of the given observation.
In mathematical terms, for a fully observable domain, this function estimates:
# Parameters
- observation: The observation to consider.
# Returns
The estimated on-policy utility of the given observation.
# _initialize Solver
_initialize(
self
)
Launches the parallel domains. This method requires to have previously recorded the self._domain_factory, the set of lambda functions passed to the solver's constructor (e.g. heuristic lambda for heuristic-based solvers), and whether the parallel domain jobs should notify their status via the IPC protocol (required when interacting with other programming languages like C++)
# _is_policy_defined_for Policies
_is_policy_defined_for(
self,
observation: StrDict[D.T_observation]
) -> bool
Check whether the solver's current policy is defined for the given observation.
# Parameters
- observation: The observation to consider.
# Returns
True if the policy is defined for the given observation memory (False otherwise).
# _reset Solver
_reset(
self
) -> None
Reset whatever is needed on this solver before running a new episode.
This function does nothing by default but can be overridden if needed (e.g. to reset the hidden state of a LSTM policy network, which carries information about past observations seen in the previous episode).
# _sample_action Policies
_sample_action(
self,
observation: StrDict[D.T_observation],
domain: Optional[Domain] = None
) -> StrDict[list[D.T_event]]
Sample an action for the given observation (from the solver's current policy).
# Parameters
- observation: The observation for which an action must be sampled.
- domain: the domain source of the observation. Typically used to get current applicable actions or action mask. NB: Be careful that the domain has not been autocast, so may not respect the T_domain specs.
# Returns
The sampled action.
# _solve FromInitialState
_solve(
self,
from_memory: Optional[Memory[D.T_state]] = None
) -> None
Run the solving process.
# Parameters
- from_memory: The source memory (state or history) from which we begin the solving process. If None, initial state is used if the domain is initializable, else a ValueError is raised.
TIP
The nature of the solutions produced here depends on other solver's characteristics like
policy and assessibility.
# _solve_from FromAnyState
_solve_from(
self,
memory: D_CSSP.T_memory[D_CSSP.T_state]
) -> None
Run the solving process from a given state.
# Parameters
- memory: The source memory (state or history) of the transition.
TIP
The nature of the solutions produced here depends on other solver's characteristics like
policy and assessibility.