# hub.solver.ssipp.ssipp

# SSiPP

SSiPP (Short-Sighted Probabilistic Planner) from Trevizan & Veloso (ICAPS 2012).

SSiPP repeatedly builds short-sighted sub-SSPs by BFS to a fixed depth t from the current state, solves each optimally with a configurable inner solver (LRTDP, ILAOstar, or LDFS), and accumulates a global value function across iterations. Boundary states at distance t receive V(s) as goal cost, guiding the search toward the original goals.

SSiPP is asymptotically optimal: V converges to V* over relevant states as the number of iterations grows.
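The loop described above can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: the toy chain SSP (`transitions`, `GOAL`, `ACTIONS`), the helper names, and the use of plain value iteration as the inner solver are all assumptions made for illustration. Each round builds a depth-limited sub-SSP by BFS, solves it while treating the current values of boundary states as terminal costs, then follows the greedy policy until a boundary state is reached.

```python
import random
from collections import deque

GOAL = 6                      # hypothetical toy problem: states 0..GOAL
ACTIONS = ("step", "jump")    # hypothetical action names

def transitions(s, a):
    """Toy SSP: list of (probability, next_state, cost) for action a in state s."""
    if a == "step":  # reliable, advances by one
        return [(0.9, min(s + 1, GOAL), 1.0), (0.1, s, 1.0)]
    return [(0.5, min(s + 2, GOAL), 1.0), (0.5, s, 1.0)]  # risky, advances by two

def short_sighted_states(s0, depth):
    """BFS from s0; returns (interior, boundary) of the short-sighted sub-SSP."""
    dist, queue = {s0: 0}, deque([s0])
    while queue:
        s = queue.popleft()
        if s == GOAL or dist[s] == depth:
            continue  # goals and states at distance `depth` are not expanded
        for a in ACTIONS:
            for _, s2, _ in transitions(s, a):
                if s2 not in dist:
                    dist[s2] = dist[s] + 1
                    queue.append(s2)
    interior = {s for s, d in dist.items() if d < depth and s != GOAL}
    return interior, set(dist) - interior

def q_value(s, a, V):
    """Expected cost of taking a in s and then paying V at the successor."""
    return sum(p * (c + V[s2]) for p, s2, c in transitions(s, a))

def solve_sub_ssp(interior, V, epsilon):
    """In-place value iteration; boundary values in V stay fixed (terminal costs)."""
    residual = epsilon
    while residual >= epsilon:
        residual = 0.0
        for s in interior:
            best = min(q_value(s, a, V) for a in ACTIONS)
            residual = max(residual, abs(best - V[s]))
            V[s] = best

def ssipp(s0, depth=3, epsilon=1e-3, trials=5, seed=0):
    """Sketch of the outer loop (assumes depth >= 1); V is shared across rounds."""
    rng = random.Random(seed)
    V = {s: 0.0 for s in range(GOAL + 1)}  # zero heuristic for every state
    for _ in range(trials):
        s = s0
        while s != GOAL:
            interior, _ = short_sighted_states(s, depth)
            solve_sub_ssp(interior, V, epsilon)
            while s in interior:  # follow the greedy policy up to the boundary
                a = min(ACTIONS, key=lambda act: q_value(s, act, V))
                outcomes = transitions(s, a)
                s = rng.choices([o[1] for o in outcomes], [o[0] for o in outcomes])[0]
    return V
```

Calling `ssipp(0)` returns the accumulated value function as a dictionary. On this toy problem, the values approach those obtained by running value iteration over the whole state space.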

# Constructor SSiPP

SSiPP(
  domain_factory: Callable[[], Domain],
heuristic: Callable[[Domain, D.T_state], StrDict[Value[D.T_value]]] = <lambda function>,
depth: int = 3,
inner_solver_factory: Optional[Callable[[], tuple[str, dict]]] = None,
discount: float = 1.0,
epsilon: float = 0.001,
max_iterations: int = 10000,
parallel: bool = False,
shared_memory_proxy = None,
callback: Callable[[SSiPP], bool] = <lambda function>,
verbose: bool = False
) -> None

Construct an SSiPP solver instance.

# Parameters

  • domain_factory: Lambda returning a domain instance.
  • heuristic: Function h(domain, state) -> Value returning the heuristic cost estimate. Defaults to Value(cost=0).
  • depth: Short-sighted depth t. Larger values explore more states per sub-SSP but take longer. Defaults to 3.
  • inner_solver_factory: Callable returning a (name, params) tuple specifying the inner solver and its parameters. Available inner solvers: "LRTDP", "ILAOstar", "LDFS", "VI". Defaults to lambda: ("LRTDP", {}).
  • discount: Value function's discount factor. Defaults to 1.0.
  • epsilon: Bellman residual threshold for convergence. Defaults to 0.001.
  • max_iterations: Maximum number of sub-SSP solve iterations per call to solve(). Defaults to 10000.
  • parallel: Parallelize the inner solver. Defaults to False.
  • shared_memory_proxy: Optional shared memory proxy.
  • callback: Called after each sub-SSP solve. Returns True to stop.
  • verbose: Enable verbose logging. Defaults to False.
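The `inner_solver_factory` and `callback` parameters are both plain callables, so they can be prepared without touching the solver itself. The sketch below is only an illustration of the two contracts described above: the helper names `make_inner_solver_factory` and `make_iteration_budget_callback` are hypothetical, and the keys accepted in the params dictionary are not documented on this page, so the `epsilon` key shown in the usage note is an assumption.

```python
def make_inner_solver_factory(name="LRTDP", **params):
    """Return a zero-argument callable producing a (name, params) tuple."""
    allowed = {"LRTDP", "ILAOstar", "LDFS", "VI"}  # names listed in the parameter docs
    if name not in allowed:
        raise ValueError(f"unknown inner solver: {name!r}")
    return lambda: (name, dict(params))

def make_iteration_budget_callback(budget):
    """Return a callback that asks the solver to stop after `budget` calls."""
    calls = 0

    def callback(solver):
        nonlocal calls
        calls += 1
        return calls >= budget  # True means "stop"

    return callback
```

Under these assumptions, the results would be passed as `inner_solver_factory=make_inner_solver_factory("LDFS", epsilon=1e-4)` and `callback=make_iteration_budget_callback(50)` when constructing the solver.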

# call_domain_method ParallelSolver

call_domain_method(
  self,
name,
*args
)

Calls a parallel domain's method. This is the only way to get a domain method for a parallel domain.

# check_domain Solver

check_domain(
  domain: Domain
) -> bool

Check whether a domain is compliant with this solver type.

By default, Solver.check_domain() provides some boilerplate code and internally calls Solver._check_domain_additional() (which returns True by default but can be overridden to define specific checks in addition to the "domain requirements"). The boilerplate code automatically checks whether all domain requirements are met.
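The split between the boilerplate check and the overridable hook can be pictured with stand-in classes. This is a sketch of the pattern only, not the library's code: `Goals`, `SketchSolver`, `SmallDomainsOnly`, and the `n_states` attribute are all hypothetical names introduced for illustration.

```python
class Goals:
    """Stand-in for a domain characteristic class (hypothetical)."""

class SketchSolver:
    @classmethod
    def get_domain_requirements(cls):
        return [Goals]  # classes the domain must inherit from

    @classmethod
    def _check_domain_additional(cls, domain):
        return True  # no extra constraint by default

    @classmethod
    def check_domain(cls, domain):
        # Boilerplate: every requirement must be met, then the hook is consulted.
        meets = all(isinstance(domain, req) for req in cls.get_domain_requirements())
        return meets and cls._check_domain_additional(domain)

class SmallDomainsOnly(SketchSolver):
    @classmethod
    def _check_domain_additional(cls, domain):
        # Solver-specific check on top of the domain requirements.
        return getattr(domain, "n_states", 0) <= 100

class ToyDomain(Goals):
    n_states = 10
```

With these definitions, `SmallDomainsOnly.check_domain(ToyDomain())` passes both stages, while an object that does not inherit from `Goals` is rejected before the hook is reached.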

# Parameters

  • domain: The domain to check.

# Returns

True if the domain is compliant with the solver type (False otherwise).

# close ParallelSolver

close(
  self
)

Joins the parallel domains' processes. Not calling this method (or not using the 'with' context statement) results in the solver forever waiting for the domain processes to exit.
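The 'with' pattern mentioned above guarantees that close() runs even if solving raises. The stand-in class below is a hypothetical sketch of that pattern, not the library's class; it only records whether close() was called.

```python
class SketchParallelSolver:
    """Hypothetical stand-in showing how 'with' triggers close()."""

    def __init__(self):
        self.closed = False

    def close(self):
        # In the real solver, this is where the domain processes are joined.
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions


with SketchParallelSolver() as solver:
    pass  # solving would happen here
```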

# complete_with_default_hyperparameters Hyperparametrizable

complete_with_default_hyperparameters(
  kwargs: dict[str, Any],
names: Optional[list[str]] = None
)

Add missing hyperparameters to kwargs by using default values.

# Parameters

  • kwargs: Keyword arguments to complete (e.g. for __init__, init_model, or solve).
  • names: Names of the hyperparameters to add if missing. By default, all available hyperparameters.

# Returns

A new dictionary, completion of kwargs.
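The completion behaviour described above amounts to a dictionary merge in which user-supplied values win. The function below is a sketch of that behaviour under stated assumptions: `complete_with_defaults` and its explicit `defaults` argument are hypothetical, since the real method reads the defaults from the class's hyperparameter definitions.

```python
def complete_with_defaults(kwargs, defaults, names=None):
    """Return a new dict: kwargs plus defaults for any missing names."""
    names = list(defaults) if names is None else names
    completed = dict(kwargs)  # copy, so the caller's dict is left untouched
    for name in names:
        completed.setdefault(name, defaults[name])
    return completed
```

For example, with hypothetical defaults `{"depth": 3, "epsilon": 0.001}`, completing `{"depth": 5}` keeps the user's depth and adds the default epsilon.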

# copy_and_update_hyperparameters Hyperparametrizable

copy_and_update_hyperparameters(
  names: Optional[list[str]] = None,
**kwargs_by_name: dict[str, Any]
) -> list[Hyperparameter]

Copy the hyperparameter definitions of this class and update them with the specified kwargs.

This is useful, for instance, to define hyperparameters for a child class in which only the choices of a hyperparameter change.

# Parameters

  • names: Names of the hyperparameters to copy. Defaults to all.
  • **kwargs_by_name: For each hyperparameter specified by its name, the attributes to update. If a given hyperparameter name is not specified, the hyperparameter is copied without further update.

# Returns

The list of copied (and possibly updated) hyperparameters.

# get_default_hyperparameters Hyperparametrizable

get_default_hyperparameters(
  names: Optional[list[str]] = None
) -> dict[str, Any]

Get the default values of hyperparameters.

# Parameters

  • names: Names of the hyperparameters to include. By default, all available hyperparameters are included.

# Returns

A mapping between each hyperparameter's name_in_kwargs and its default value (None if not specified).

# get_domain ParallelSolver

get_domain(
  self
)

Returns the domain, optionally creating a parallel domain if not already created.

# get_domain_requirements Solver

get_domain_requirements(
) -> list[type]

Get domain requirements for this solver class to be applicable.

Domain requirements are classes from the skdecide.builders.domain package that the domain needs to inherit from.

# Returns

A list of classes to inherit from.

# get_hyperparameter Hyperparametrizable

get_hyperparameter(
  name: str
) -> Hyperparameter

Get the hyperparameter with the given name.

# get_hyperparameters_by_name Hyperparametrizable

get_hyperparameters_by_name(
) -> dict[str, Hyperparameter]

Mapping from name to corresponding hyperparameter.

# get_hyperparameters_names Hyperparametrizable

get_hyperparameters_names(
) -> list[str]

List of hyperparameter names.

# get_next_action DeterministicPolicies

get_next_action(
  self,
observation: StrDict[D.T_observation],
domain: Optional[Domain] = None
) -> StrDict[list[D.T_event]]

Get the next deterministic action (from the solver's current policy).

# Parameters

  • observation: The observation for which next action is requested.
  • domain: the domain source of the observation. Typically used to get current applicable actions or action mask. NB: the domain has not been autocast, so it may not respect the T_domain specs.

# Returns

The next deterministic action.

# get_next_action_distribution UncertainPolicies

get_next_action_distribution(
  self,
observation: StrDict[D.T_observation],
domain: Optional[Domain] = None
) -> Distribution[StrDict[list[D.T_event]]]

Get the probabilistic distribution of next action for the given observation (from the solver's current policy).

# Parameters

  • observation: The observation to consider.
  • domain: the domain source of the observation. Typically used to get current applicable actions or action mask.

# Returns

The probabilistic distribution of next action.

# get_utility Utilities

get_utility(
  self,
observation: StrDict[D.T_observation]
) -> D.T_value

Get the estimated on-policy utility of the given observation.

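In mathematical terms, for a fully observable domain, this function estimates:

$$V^\pi(s) = \underset{\tau \sim \pi}{\mathbb{E}}[R(\tau) \mid s_0 = s]$$

where $\pi$ is the current policy, any $\tau = (s_0, a_0, s_1, a_1, ...)$ represents a trajectory sampled from the policy, $R(\tau)$ is the return (cumulative reward) and $s_0$ the initial state for the trajectories.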
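The on-policy utility is the expected return of trajectories that start in the given state and follow the current policy. A Monte Carlo estimate of that expectation can be sketched as follows. This is an illustration rather than how the solver computes its value function: the toy `step` function, the policy, and the helper names are hypothetical.

```python
import random

def sample_return(policy, step, s0, rng, horizon=1000):
    """Return of one trajectory sampled by following `policy` from s0."""
    s, total = s0, 0.0
    for _ in range(horizon):
        s, reward, done = step(s, policy(s), rng)
        total += reward
        if done:
            break
    return total

def estimate_utility(policy, step, s0, n=20000, seed=0):
    """Average the returns of n sampled trajectories starting at s0."""
    rng = random.Random(seed)
    return sum(sample_return(policy, step, s0, rng) for _ in range(n)) / n

def step(s, a, rng):
    """Toy dynamics: each step costs 1 and terminates with probability 0.5."""
    done = rng.random() < 0.5
    return (1 if done else 0), -1.0, done
```

In this toy problem the number of steps is geometric with mean 2, so `estimate_utility(lambda s: "go", step, 0)` should be close to -2.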

# Parameters

  • observation: The observation to consider.

# Returns

The estimated on-policy utility of the given observation.

# is_policy_defined_for Policies

is_policy_defined_for(
  self,
observation: StrDict[D.T_observation]
) -> bool

Check whether the solver's current policy is defined for the given observation.

# Parameters

  • observation: The observation to consider.

# Returns

True if the policy is defined for the given observation (False otherwise).

# reset Solver

reset(
  self
) -> None

Reset whatever is needed on this solver before running a new episode.

This function does nothing by default but can be overridden if needed (e.g. to reset the hidden state of an LSTM policy network, which carries information about past observations seen in the previous episode).

# sample_action Policies

sample_action(
  self,
observation: StrDict[D.T_observation],
domain: Optional[Domain] = None
) -> StrDict[list[D.T_event]]

Sample an action for the given observation (from the solver's current policy).

# Parameters

  • observation: The observation for which an action must be sampled.
  • domain: the domain source of the observation. Typically used to get current applicable actions or action mask.

# Returns

The sampled action.

# solve FromInitialState

solve(
  self,
from_memory: Optional[Memory[D.T_state]] = None
) -> None

Run the solving process.

# Parameters

  • from_memory: The source memory (state or history) from which we begin the solving process. If None, initial state is used if the domain is initializable, else a ValueError is raised.

TIP

The nature of the solutions produced here depends on the solver's other characteristics, such as policy and assessability.

# solve_from FromAnyState

solve_from(
  self,
memory: Memory[D.T_state]
) -> None

Run the solving process from a given state.

# Parameters

  • memory: The source memory (state or history) of the transition.

TIP

The nature of the solutions produced here depends on the solver's other characteristics, such as policy and assessability.

# suggest_hyperparameter_with_optuna Hyperparametrizable

suggest_hyperparameter_with_optuna(
  trial: optuna.trial.Trial,
name: str,
prefix: str,
**kwargs
) -> Any

Suggest hyperparameter value during an Optuna trial.

This can be used during Optuna hyperparameters tuning.

# Parameters

  • trial: Optuna trial during hyperparameters tuning.
  • name: Name of the hyperparameter to choose.
  • prefix: Prefix to add to the corresponding Optuna parameter name (useful for disambiguating hyperparameters from subsolvers in case of meta-solvers).
  • **kwargs: Options for Optuna hyperparameter suggestions.

# Returns

The suggested hyperparameter value.

kwargs can be used to pass relevant arguments to

  • trial.suggest_float()
  • trial.suggest_int()
  • trial.suggest_categorical()

For instance it can

  • add a low/high value if not existing for the hyperparameter or override it to narrow the search. (for float or int hyperparameters)
  • add a step or log argument (for float or int hyperparameters, see optuna.trial.Trial.suggest_float())
  • override choices for categorical or enum parameters to narrow the search

# suggest_hyperparameters_with_optuna Hyperparametrizable

suggest_hyperparameters_with_optuna(
  trial: optuna.trial.Trial,
names: Optional[list[str]] = None,
kwargs_by_name: Optional[dict[str, dict[str, Any]]] = None,
fixed_hyperparameters: Optional[dict[str, Any]] = None,
prefix: str
) -> dict[str, Any]

Suggest hyperparameter values during an Optuna trial.

# Parameters

  • trial: Optuna trial during hyperparameters tuning.
  • names: Names of the hyperparameters to choose. By default, all available hyperparameters will be suggested. If fixed_hyperparameters is provided, the corresponding names are removed from names.
  • kwargs_by_name: Options for Optuna hyperparameter suggestions, by hyperparameter name.
  • fixed_hyperparameters: Values of fixed hyperparameters, useful for suggesting subbrick hyperparameters if the subbrick class is not suggested by this method but already fixed. They are added to the suggested hyperparameters.
  • prefix: Prefix to add to the corresponding Optuna parameter names (useful for disambiguating hyperparameters from subsolvers in case of meta-solvers).

# Returns

A mapping between each hyperparameter name and its suggested value. If the hyperparameter has an attribute name_in_kwargs, this is used as the key in the mapping instead of the actual hyperparameter name. The mapping is updated with fixed_hyperparameters.

kwargs_by_name[some_name] will be passed as **kwargs to suggest_hyperparameter_with_optuna(name=some_name)

# _check_domain_additional Solver

_check_domain_additional(
  domain: Domain
) -> bool

Check whether the given domain is compliant with the specific requirements of this solver type (i.e. the ones in addition to "domain requirements").

This is a helper function called by default from Solver.check_domain(). It focuses on specific checks, as opposed to taking also into account the domain requirements for the latter.

# Parameters

  • domain: The domain to check.

# Returns

True if the domain is compliant with the specific requirements of this solver type (False otherwise).

# _get_next_action DeterministicPolicies

_get_next_action(
  self,
observation: StrDict[D.T_observation],
domain: Optional[Domain] = None
) -> StrDict[list[D.T_event]]

Get the next deterministic action (from the solver's current policy).

# Parameters

  • observation: The observation for which next action is requested.
  • domain: the domain source of the observation. Typically used to get current applicable actions or action mask. NB: the domain has not been autocast, so it may not respect the T_domain specs.

# Returns

The next deterministic action.

# _get_next_action_distribution UncertainPolicies

_get_next_action_distribution(
  self,
observation: StrDict[D.T_observation],
domain: Optional[Domain] = None
) -> Distribution[StrDict[list[D.T_event]]]

Get the probabilistic distribution of next action for the given observation (from the solver's current policy).

# Parameters

  • observation: The observation to consider.
  • domain: the domain source of the observation. Typically used to get current applicable actions or action mask. NB: the domain has not been autocast, so it may not respect the T_domain specs.

# Returns

The probabilistic distribution of next action.

# _get_utility Utilities

_get_utility(
  self,
observation: StrDict[D.T_observation]
) -> D.T_value

Get the estimated on-policy utility of the given observation.

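In mathematical terms, for a fully observable domain, this function estimates:

$$V^\pi(s) = \underset{\tau \sim \pi}{\mathbb{E}}[R(\tau) \mid s_0 = s]$$

where $\pi$ is the current policy, any $\tau = (s_0, a_0, s_1, a_1, ...)$ represents a trajectory sampled from the policy, $R(\tau)$ is the return (cumulative reward) and $s_0$ the initial state for the trajectories.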

# Parameters

  • observation: The observation to consider.

# Returns

The estimated on-policy utility of the given observation.

# _initialize Solver

_initialize(
  self
)

Launches the parallel domains. This method requires that the following have been recorded beforehand: self._domain_factory, the set of lambda functions passed to the solver's constructor (e.g. the heuristic lambda for heuristic-based solvers), and whether the parallel domain jobs should notify their status via the IPC protocol (required when interacting with other programming languages such as C++).

# _is_policy_defined_for Policies

_is_policy_defined_for(
  self,
observation: StrDict[D.T_observation]
) -> bool

Check whether the solver's current policy is defined for the given observation.

# Parameters

  • observation: The observation to consider.

# Returns

True if the policy is defined for the given observation (False otherwise).

# _reset Solver

_reset(
  self
) -> None

Reset whatever is needed on this solver before running a new episode.

This function does nothing by default but can be overridden if needed (e.g. to reset the hidden state of an LSTM policy network, which carries information about past observations seen in the previous episode).

# _sample_action Policies

_sample_action(
  self,
observation: StrDict[D.T_observation],
domain: Optional[Domain] = None
) -> StrDict[list[D.T_event]]

Sample an action for the given observation (from the solver's current policy).

# Parameters

  • observation: The observation for which an action must be sampled.
  • domain: the domain source of the observation. Typically used to get current applicable actions or action mask. NB: the domain has not been autocast, so it may not respect the T_domain specs.

# Returns

The sampled action.

# _solve FromInitialState

_solve(
  self,
from_memory: Optional[Memory[D.T_state]] = None
) -> None

Run the solving process.

# Parameters

  • from_memory: The source memory (state or history) from which we begin the solving process. If None, initial state is used if the domain is initializable, else a ValueError is raised.

TIP

The nature of the solutions produced here depends on the solver's other characteristics, such as policy and assessability.

# _solve_from FromAnyState

_solve_from(
  self,
memory: Memory[D.T_state]
) -> None

Run the solving process from a given state.

# Parameters

  • memory: The source memory (state or history) of the transition.

TIP

The nature of the solutions produced here depends on the solver's other characteristics, such as policy and assessability.