# hub.domain.graph_domain.GraphDomain

Domain specification


# GraphDomainUncertain

This domain is for goal-oriented MDPs with uncertain transitions, where all transitions, probabilities, and costs are already computed. In this case, storing them in dictionary structures speeds up the domain's computations and therefore reduces its solving time.

# Constructor GraphDomainUncertain

next_state_map: dict[D.T_state, dict[D.T_event, dict[D.T_state, tuple[float, float]]]],
state_terminal: dict[D.T_state, bool],
state_goal: dict[D.T_state, bool]

# Parameters

  • next_state_map: a dictionary mapping each state to a dictionary keyed by action, whose values are dictionaries mapping each next state to a (probability, cost) tuple. This format may change in the future.
  • state_terminal: a dictionary indicating for each state if it's terminal or not
  • state_goal: a dictionary indicating for each state if it's a goal or not
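
As an illustration of the nested layout described above, here is a minimal sketch in plain Python. The state names, action names, and the helper `check_probabilities` are hypothetical and not part of scikit-decide; only the shape of the three dictionaries follows the parameter descriptions.

```python
import math

# state -> action -> next state -> (probability, cost)
next_state_map = {
    "s0": {"go": {"s1": (0.8, 1.0), "s0": (0.2, 1.0)}},
    "s1": {"go": {"goal": (1.0, 2.0)}},
    "goal": {},
}
state_terminal = {"s0": False, "s1": False, "goal": True}
state_goal = {"s0": False, "s1": False, "goal": True}


def check_probabilities(transitions):
    """Return True if every (state, action) distribution sums to 1."""
    for actions in transitions.values():
        for outcomes in actions.values():
            total = sum(proba for proba, _cost in outcomes.values())
            if not math.isclose(total, 1.0):
                return False
    return True


print(check_probabilities(next_state_map))
```

A sanity check of this kind can be useful before handing precomputed dictionaries to the constructor, since the domain is described as relying on them as-is.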

# check_value Rewards

value: Value[D.T_value]
) -> bool

Check that a value is compliant with its reward specification.


This function always returns True by default because any kind of reward should be accepted at this level.

# Parameters

  • value: The value to check.

# Returns

True if the value is compliant (False otherwise).

# get_action_space Events

) -> StrDict[Space[D.T_event]]

Get the (cached) domain action space (finite or infinite set).

By default, Events.get_action_space() internally calls Events._get_action_space_() the first time and automatically caches its value to make future calls more efficient (since the action space is assumed to be constant).

# Returns

The action space.
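
The caching behaviour described above can be sketched as follows. This is a hypothetical stand-in class written for illustration, not the scikit-decide implementation; the attribute `_action_space`, the counter `compute_calls`, and the example actions are assumptions.

```python
class CachedSpaceSketch:
    """Hypothetical stand-in showing the cache-on-first-call pattern."""

    def __init__(self):
        self._action_space = None  # cache slot, filled on the first call
        self.compute_calls = 0     # counts how often the helper runs

    def _get_action_space_(self):
        # Uncached helper: assumed to return the same value every time.
        self.compute_calls += 1
        return frozenset({"left", "right"})

    def get_action_space(self):
        # Compute once, then reuse the stored value on later calls.
        if self._action_space is None:
            self._action_space = self._get_action_space_()
        return self._action_space


demo = CachedSpaceSketch()
first = demo.get_action_space()
second = demo.get_action_space()
print(first is second, demo.compute_calls)
```

The sketch only stays correct if the helper's result is constant, which is the assumption stated above.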

# get_agents MultiAgent

) -> set[str]

Return the set of available agents ids.

# get_applicable_actions Events

memory: Optional[Memory[D.T_state]] = None
) -> StrDict[Space[D.T_event]]

Get the space (finite or infinite set) of applicable actions in the given memory (state or history), or in the internal one if omitted.

By default, Events.get_applicable_actions() provides some boilerplate code and internally calls Events._get_applicable_actions(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.

# Parameters

  • memory: The memory to consider (if None, the internal memory attribute _memory is used instead).

# Returns

The space of applicable actions.
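
The memory-defaulting boilerplate described above can be sketched in plain Python. The class below is hypothetical and collapses the intermediate `_get_applicable_actions()` layer for brevity; it only illustrates how a `None` argument falls back to the internal `_memory` attribute.

```python
class MemoryDefaultSketch:
    """Hypothetical stand-in for the 'use _memory when memory is None' pattern."""

    def __init__(self, applicable, memory):
        self._applicable = applicable  # state -> set of applicable actions
        self._memory = memory          # internal memory (here simply a state)

    def _get_applicable_actions_from(self, memory):
        # Helper with a mandatory memory parameter.
        return self._applicable[memory]

    def get_applicable_actions(self, memory=None):
        # Boilerplate: fall back to the internal memory when none is given.
        if memory is None:
            memory = self._memory
        return self._get_applicable_actions_from(memory)


sketch = MemoryDefaultSketch({"s0": {"go"}, "s1": {"stay", "go"}}, memory="s0")
print(sketch.get_applicable_actions())
print(sketch.get_applicable_actions("s1"))
```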

# get_enabled_events Events

memory: Optional[Memory[D.T_state]] = None
) -> Space[D.T_event]

Get the space (finite or infinite set) of enabled uncontrollable events in the given memory (state or history), or in the internal one if omitted.

By default, Events.get_enabled_events() provides some boilerplate code and internally calls Events._get_enabled_events(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.

# Parameters

  • memory: The memory to consider (if None, the internal memory attribute _memory is used instead).

# Returns

The space of enabled events.

# get_goals Goals

) -> StrDict[Space[D.T_observation]]

Get the (cached) domain goals space (finite or infinite set).

By default, Goals.get_goals() internally calls Goals._get_goals_() the first time and automatically caches its value to make future calls more efficient (since the goals space is assumed to be constant).


Goal states are assumed to be fully observable (i.e. observation = state) so that there is never uncertainty about whether the goal has been reached or not. This assumption guarantees that any policy that does not reach the goal with certainty incurs infinite expected cost. - Geffner, 2013: A Concise Introduction to Models and Methods for Automated Planning

# Returns

The goals space.

# get_next_state_distribution UncertainTransitions

memory: Memory[D.T_state],
action: StrDict[list[D.T_event]]
) -> Distribution[D.T_state]

Get the probability distribution of next state given a memory and action.

# Parameters

  • memory: The source memory (state or history) of the transition.
  • action: The action taken in the given memory (state or history) triggering the transition.

# Returns

The probability distribution of next state.
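
For a dictionary-backed domain, a next-state distribution amounts to a lookup in the nested transition map. The sketch below is plain Python under that assumption; the function names, the `transitions` table, and the list-of-pairs return type are illustrative and do not reproduce the library's `Distribution` type.

```python
import random

# state -> action -> next state -> (probability, cost)
transitions = {"s0": {"go": {"s1": (0.8, 1.0), "s0": (0.2, 1.0)}}}


def next_state_distribution(state, action):
    """Return [(next_state, probability), ...] for a (state, action) pair."""
    return [(ns, p) for ns, (p, _cost) in transitions[state][action].items()]


def sample_next_state(state, action, rng):
    """Draw one next state according to the stored probabilities."""
    dist = next_state_distribution(state, action)
    states = [ns for ns, _ in dist]
    weights = [p for _, p in dist]
    return rng.choices(states, weights=weights, k=1)[0]


rng = random.Random(0)
sampled = sample_next_state("s0", "go", rng)
print(next_state_distribution("s0", "go"), sampled)
```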

# get_observation TransformedObservable

state: D.T_state,
action: Optional[StrDict[list[D.T_event]]] = None
) -> StrDict[D.T_observation]

Get the deterministic observation given a state and action.

# Parameters

  • state: The state to be observed.
  • action: The last applied action (or None if the state is an initial state).

# Returns

The observation.

# get_observation_distribution PartiallyObservable

state: D.T_state,
action: Optional[StrDict[list[D.T_event]]] = None
) -> Distribution[StrDict[D.T_observation]]

Get the probability distribution of the observation given a state and action.

In mathematical terms (discrete case), given an action $a$, this function represents: $P(O|s, a)$, where $O$ is the random variable of the observation.

# Parameters

  • state: The state to be observed.
  • action: The last applied action (or None if the state is an initial state).

# Returns

The probability distribution of the observation.

# get_observation_space PartiallyObservable

) -> StrDict[Space[D.T_observation]]

Get the (cached) observation space (finite or infinite set).

By default, PartiallyObservable.get_observation_space() internally calls PartiallyObservable._get_observation_space_() the first time and automatically caches its value to make future calls more efficient (since the observation space is assumed to be constant).

# Returns

The observation space.

# get_transition_value UncertainTransitions

memory: Memory[D.T_state],
action: StrDict[list[D.T_event]],
next_state: Optional[D.T_state] = None
) -> StrDict[Value[D.T_value]]

Get the value (reward or cost) of a transition.

The transition to consider is defined by the function parameters.


If this function never depends on the next_state parameter for its computation, it is recommended to indicate it by overriding UncertainTransitions._is_transition_value_dependent_on_next_state_() to return False. This information can then be exploited by solvers to avoid computing next state to evaluate a transition value (more efficient).

# Parameters

  • memory: The source memory (state or history) of the transition.
  • action: The action taken in the given memory (state or history) triggering the transition.
  • next_state: The next state in which the transition ends (if needed for the computation).

# Returns

The transition value (reward or cost).
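
To make the note on next-state dependence concrete, here is a small sketch over a hypothetical transition table. The helper `cost_depends_on_next_state` is an illustration of how one might decide what the override should return; it is not a scikit-decide function.

```python
# state -> action -> next state -> (probability, cost)
transitions = {
    "s0": {"go": {"s1": (0.8, 1.0), "s0": (0.2, 1.0)}},
    "s1": {"go": {"goal": (0.5, 2.0), "s0": (0.5, 5.0)}},
}


def transition_cost(state, action, next_state):
    """Look up the cost attached to one (state, action, next_state) triple."""
    _proba, cost = transitions[state][action][next_state]
    return cost


def cost_depends_on_next_state(table):
    """True if some (state, action) pair has outcomes with different costs."""
    for actions in table.values():
        for outcomes in actions.values():
            costs = {cost for _proba, cost in outcomes.values()}
            if len(costs) > 1:
                return True
    return False


print(transition_cost("s1", "go", "s0"))
print(cost_depends_on_next_state(transitions))
```

When such a check yields False for a given table, the cost could be read from any outcome of the (state, action) pair without computing the next state first.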

# is_action Events

event: D.T_event
) -> bool

Indicate whether an event is an action (i.e. a controllable event for the agents).


By default, this function is implemented using the skdecide.core.Space.contains() function on the domain action space provided by Events.get_action_space(), but it can be overridden for faster implementations.

# Parameters

  • event: The event to consider.

# Returns

True if the event is an action (False otherwise).

# is_applicable_action Events

action: StrDict[D.T_event],
memory: Optional[Memory[D.T_state]] = None
) -> bool

Indicate whether an action is applicable in the given memory (state or history), or in the internal one if omitted.

By default, Events.is_applicable_action() provides some boilerplate code and internally calls Events._is_applicable_action(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.

# Parameters

  • action: The action to consider.
  • memory: The memory to consider (if None, the internal memory attribute _memory is used instead).

# Returns

True if the action is applicable (False otherwise).

# is_enabled_event Events

event: D.T_event,
memory: Optional[Memory[D.T_state]] = None
) -> bool

Indicate whether an uncontrollable event is enabled in the given memory (state or history), or in the internal one if omitted.

By default, Events.is_enabled_event() provides some boilerplate code and internally calls Events._is_enabled_event(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.

# Parameters

  • event: The event to consider.
  • memory: The memory to consider (if None, the internal memory attribute _memory is used instead).

# Returns

True if the event is enabled (False otherwise).

# is_goal Goals

observation: StrDict[D.T_observation]
) -> StrDict[D.T_predicate]

Indicate whether an observation belongs to the goals.


By default, this function is implemented using the skdecide.core.Space.contains() function on the domain goals space provided by Goals.get_goals(), but it can be overridden for faster implementations.

# Parameters

  • observation: The observation to consider.

# Returns

True if the observation is a goal (False otherwise).

# is_observation PartiallyObservable

observation: StrDict[D.T_observation]
) -> bool

Check that an observation indeed belongs to the domain observation space.


By default, this function is implemented using the skdecide.core.Space.contains() function on the domain observation space provided by PartiallyObservable.get_observation_space(), but it can be overridden for faster implementations.

# Parameters

  • observation: The observation to consider.

# Returns

True if the observation belongs to the domain observation space (False otherwise).

# is_terminal UncertainTransitions

state: D.T_state
) -> StrDict[D.T_predicate]

Indicate whether a state is terminal.

A terminal state is a state with no outgoing transition (except to itself with value 0).

# Parameters

  • state: The state to consider.

# Returns

True if the state is terminal (False otherwise).
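
The definition of a terminal state given above can be checked structurally on a transition table. The following is a sketch under the assumption that transitions are stored as state -> action -> next state -> (probability, cost); the function name and the example table are hypothetical.

```python
def is_terminal_by_structure(state, table):
    """True if every transition out of `state` is a zero-value self-loop."""
    for outcomes in table.get(state, {}).values():
        for next_state, (proba, cost) in outcomes.items():
            # Any reachable other state, or a costly self-loop, breaks terminality.
            if proba > 0 and (next_state != state or cost != 0):
                return False
    return True


table = {
    "s0": {"go": {"goal": (1.0, 3.0)}},
    "goal": {"stay": {"goal": (1.0, 0.0)}},
    "dead_end": {},
}
print([s for s in table if is_terminal_by_structure(s, table)])
```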

# is_transition_value_dependent_on_next_state UncertainTransitions

) -> bool

Indicate whether get_transition_value() requires the next_state parameter for its computation (cached).

By default, UncertainTransitions.is_transition_value_dependent_on_next_state() internally calls UncertainTransitions._is_transition_value_dependent_on_next_state_() the first time and automatically caches its value to make future calls more efficient (since the returned value is assumed to be constant).

# Returns

True if the transition value computation depends on next_state (False otherwise).

# sample Simulation

memory: Memory[D.T_state],
action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]

Sample one transition of the simulator's dynamics.

By default, Simulation.sample() provides some boilerplate code and internally calls Simulation._sample() (which returns a transition outcome). The boilerplate code automatically samples an observation corresponding to the sampled next state.


Whenever an existing simulator needs to be wrapped instead of implemented fully in scikit-decide (e.g. an external simulator), it is recommended to overwrite Simulation.sample() to call the external simulator and not use the Simulation._sample() helper function.

# Parameters

  • memory: The source memory (state or history) of the transition.
  • action: The action taken in the given memory (state or history) triggering the transition.

# Returns

The environment outcome of the sampled transition.

# set_memory Simulation

memory: Memory[D.T_state]
) -> None

Set internal memory attribute _memory to given one.

This can be useful to set a specific "starting point" before doing a rollout with successive Environment.step() calls.

# Parameters

  • memory: The memory to set internally.

# Example

# Set simulation_domain memory to my_state (assuming Markovian domain)
simulation_domain.set_memory(my_state)

# Start a 100-step rollout from here (applying my_action at every step)
for _ in range(100):
    simulation_domain.step(my_action)

# step Environment

action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]

Run one step of the environment's dynamics.

By default, Environment.step() provides some boilerplate code and internally calls Environment._step() (which returns a transition outcome). The boilerplate code automatically stores next state into the _memory attribute and samples a corresponding observation.


Whenever an existing environment needs to be wrapped instead of implemented fully in scikit-decide (e.g. compiled ATARI games), it is recommended to overwrite Environment.step() to call the external environment and not use the Environment._step() helper function.


Before calling Environment.step() the first time or when the end of an episode is reached, Initializable.reset() must be called to reset the environment's state.

# Parameters

  • action: The action taken in the current memory (state or history) triggering the transition.

# Returns

The environment outcome of this step.

# _check_value Rewards

value: Value[D.T_value]
) -> bool

Check that a value is compliant with its cost specification (must be positive).


This function calls PositiveCosts._is_positive() to determine if a value is positive (can be overridden for advanced value types).

# Parameters

  • value: The value to check.

# Returns

True if the value is compliant (False otherwise).

# _get_action_space Events

) -> StrDict[Space[D.T_event]]

Get the (cached) domain action space (finite or infinite set).

By default, Events._get_action_space() internally calls Events._get_action_space_() the first time and automatically caches its value to make future calls more efficient (since the action space is assumed to be constant).

# Returns

The action space.

# _get_action_space_ Events

) -> Space[D.T_event]

Get the domain action space (finite or infinite set).

This is a helper function called by default from Events._get_action_space(), the difference being that the result is not cached here.


The underscore at the end of this function's name is a convention that serves as a reminder that its result should be constant.

# Returns

The action space.

# _get_applicable_actions Events

memory: Optional[Memory[D.T_state]] = None
) -> StrDict[Space[D.T_event]]

Get the space (finite or infinite set) of applicable actions in the given memory (state or history), or in the internal one if omitted.

By default, Events._get_applicable_actions() provides some boilerplate code and internally calls Events._get_applicable_actions_from(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.

# Parameters

  • memory: The memory to consider (if None, the internal memory attribute _memory is used instead).

# Returns

The space of applicable actions.

# _get_applicable_actions_from Events

memory: Memory[D.T_state]
) -> Space[D.T_event]

Get the space (finite or infinite set) of applicable actions in the given memory (state or history).

This is a helper function called by default from Events._get_applicable_actions(), the difference being that the memory parameter is mandatory here.

# Parameters

  • memory: The memory to consider.

# Returns

The space of applicable actions.

# _get_enabled_events Events

memory: Optional[Memory[D.T_state]] = None
) -> Space[D.T_event]

Get the space (finite or infinite set) of enabled uncontrollable events in the given memory (state or history), or in the internal one if omitted.

By default, Events._get_enabled_events() provides some boilerplate code and internally calls Events._get_enabled_events_from(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.

# Parameters

  • memory: The memory to consider (if None, the internal memory attribute _memory is used instead).

# Returns

The space of enabled events.

# _get_enabled_events_from Events

memory: Memory[D.T_state]
) -> Space[D.T_event]

Get the space (finite or infinite set) of enabled uncontrollable events in the given memory (state or history).

This is a helper function called by default from Events._get_enabled_events(), the difference being that the memory parameter is mandatory here.

# Parameters

  • memory: The memory to consider.

# Returns

The space of enabled events.

# _get_goals Goals

) -> StrDict[Space[D.T_observation]]

Get the (cached) domain goals space (finite or infinite set).

By default, Goals._get_goals() internally calls Goals._get_goals_() the first time and automatically caches its value to make future calls more efficient (since the goals space is assumed to be constant).


Goal states are assumed to be fully observable (i.e. observation = state) so that there is never uncertainty about whether the goal has been reached or not. This assumption guarantees that any policy that does not reach the goal with certainty incurs infinite expected cost. - Geffner, 2013: A Concise Introduction to Models and Methods for Automated Planning

# Returns

The goals space.

# _get_goals_ Goals

) -> Space[D.T_observation]

Get the domain goals space (finite or infinite set).

This is a helper function called by default from Goals._get_goals(), the difference being that the result is not cached here.


The underscore at the end of this function's name is a convention that serves as a reminder that its result should be constant.

# Returns

The goals space.

# _get_memory_maxlen History

) -> int

Get the (cached) memory max length.

By default, FiniteHistory._get_memory_maxlen() internally calls FiniteHistory._get_memory_maxlen_() the first time and automatically caches its value to make future calls more efficient (since the memory max length is assumed to be constant).

# Returns

The memory max length.

# _get_memory_maxlen_ FiniteHistory

) -> int

Get the memory max length.

This is a helper function called by default from FiniteHistory._get_memory_maxlen(), the difference being that the result is not cached here.


The underscore at the end of this function's name is a convention that serves as a reminder that its result should be constant.

# Returns

The memory max length.

# _get_next_state_distribution UncertainTransitions

memory: Memory[D.T_state],
action: StrDict[list[D.T_event]]
) -> Distribution[D.T_state]

Get the probability distribution of next state given a memory and action.

# Parameters

  • memory: The source memory (state or history) of the transition.
  • action: The action taken in the given memory (state or history) triggering the transition.

# Returns

The probability distribution of next state.

# _get_observation TransformedObservable

state: D.T_state,
action: Optional[StrDict[list[D.T_event]]] = None
) -> StrDict[D.T_observation]

Get the deterministic observation given a state and action.

# Parameters

  • state: The state to be observed.
  • action: The last applied action (or None if the state is an initial state).

# Returns

The observation.

# _get_observation_distribution PartiallyObservable

state: D.T_state,
action: Optional[StrDict[list[D.T_event]]] = None
) -> Distribution[StrDict[D.T_observation]]

Get the probability distribution of the observation given a state and action.

In mathematical terms (discrete case), given an action $a$, this function represents: $P(O|s, a)$, where $O$ is the random variable of the observation.

# Parameters

  • state: The state to be observed.
  • action: The last applied action (or None if the state is an initial state).

# Returns

The probability distribution of the observation.

# _get_observation_space PartiallyObservable

) -> StrDict[Space[D.T_observation]]

Get the (cached) observation space (finite or infinite set).

By default, PartiallyObservable._get_observation_space() internally calls PartiallyObservable._get_observation_space_() the first time and automatically caches its value to make future calls more efficient (since the observation space is assumed to be constant).

# Returns

The observation space.

# _get_observation_space_ PartiallyObservable

) -> StrDict[Space[D.T_observation]]

Get the observation space (finite or infinite set).

This is a helper function called by default from PartiallyObservable._get_observation_space(), the difference being that the result is not cached here.


The underscore at the end of this function's name is a convention that serves as a reminder that its result should be constant.

# Returns

The observation space.

# _get_transition_value UncertainTransitions

memory: Memory[D.T_state],
event: D.T_event,
next_state: Optional[D.T_state] = None
) -> Value[D.T_value]

Get the value (reward or cost) of a transition.

The transition to consider is defined by the function parameters.


If this function never depends on the next_state parameter for its computation, it is recommended to indicate it by overriding UncertainTransitions._is_transition_value_dependent_on_next_state_() to return False. This information can then be exploited by solvers to avoid computing next state to evaluate a transition value (more efficient).

# Parameters

  • memory: The source memory (state or history) of the transition.
  • event: The event (action) taken in the given memory (state or history) triggering the transition.
  • next_state: The next state in which the transition ends (if needed for the computation).

# Returns

The transition value (reward or cost).

# _init_memory History

state: Optional[D.T_state] = None
) -> Memory[D.T_state]

Initialize memory (possibly with a state) according to its specification and return it.

This function is automatically called by Initializable._reset() to reinitialize the internal memory whenever the domain is used as an environment.

# Parameters

  • state: An optional state to initialize the memory with (typically the initial state).

# Returns

The new initialized memory.

# _is_action Events

event: D.T_event
) -> bool

Indicate whether an event is an action (i.e. a controllable event for the agents).


By default, this function is implemented using the skdecide.core.Space.contains() function on the domain action space provided by Events._get_action_space(), but it can be overridden for faster implementations.

# Parameters

  • event: The event to consider.

# Returns

True if the event is an action (False otherwise).

# _is_applicable_action Events

action: StrDict[D.T_event],
memory: Optional[Memory[D.T_state]] = None
) -> bool

Indicate whether an action is applicable in the given memory (state or history), or in the internal one if omitted.

By default, Events._is_applicable_action() provides some boilerplate code and internally calls Events._is_applicable_action_from(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.

# Parameters

  • action: The action to consider.
  • memory: The memory to consider (if None, the internal memory attribute _memory is used instead).

# Returns

True if the action is applicable (False otherwise).

# _is_applicable_action_from Events

action: StrDict[D.T_event],
memory: Memory[D.T_state]
) -> bool

Indicate whether an action is applicable in the given memory (state or history).

This is a helper function called by default from Events._is_applicable_action(), the difference being that the memory parameter is mandatory here.


By default, this function is implemented using the skdecide.core.Space.contains() function on the space of applicable actions provided by Events._get_applicable_actions_from(), but it can be overridden for faster implementations.

# Parameters

  • action: The action to consider.
  • memory: The memory to consider.

# Returns

True if the action is applicable (False otherwise).

# _is_enabled_event Events

event: D.T_event,
memory: Optional[Memory[D.T_state]] = None
) -> bool

Indicate whether an uncontrollable event is enabled in the given memory (state or history), or in the internal one if omitted.

By default, Events._is_enabled_event() provides some boilerplate code and internally calls Events._is_enabled_event_from(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.

# Parameters

  • event: The event to consider.
  • memory: The memory to consider (if None, the internal memory attribute _memory is used instead).

# Returns

True if the event is enabled (False otherwise).

# _is_enabled_event_from Events

event: D.T_event,
memory: Memory[D.T_state]
) -> bool

Indicate whether an event is enabled in the given memory (state or history).

This is a helper function called by default from Events._is_enabled_event(), the difference being that the memory parameter is mandatory here.


By default, this function is implemented using the skdecide.core.Space.contains() function on the space of enabled events provided by Events._get_enabled_events_from(), but it can be overridden for faster implementations.

# Parameters

  • event: The event to consider.
  • memory: The memory to consider.

# Returns

True if the event is enabled (False otherwise).

# _is_goal Goals

state: D.T_state
) -> bool

Indicate whether an observation belongs to the goals.


By default, this function is implemented using the skdecide.core.Space.contains() function on the domain goals space provided by Goals._get_goals(), but it can be overridden for faster implementations.

# Parameters

  • state: The state to consider (goal states are assumed to be fully observable, so the observation equals the state).

# Returns

True if the observation is a goal (False otherwise).

# _is_observation PartiallyObservable

observation: StrDict[D.T_observation]
) -> bool

Check that an observation indeed belongs to the domain observation space.


By default, this function is implemented using the skdecide.core.Space.contains() function on the domain observation space provided by PartiallyObservable._get_observation_space(), but it can be overridden for faster implementations.

# Parameters

  • observation: The observation to consider.

# Returns

True if the observation belongs to the domain observation space (False otherwise).

# _is_positive PositiveCosts

cost: D.T_value
) -> bool

Determine if a value is positive (can be overridden for advanced value types).

# Parameters

  • cost: The cost to evaluate.

# Returns

True if the cost is positive (False otherwise).

# _is_terminal UncertainTransitions

state: D.T_state
) -> bool

Indicate whether a state is terminal.

A terminal state is a state with no outgoing transition (except to itself with value 0).

# Parameters

  • state: The state to consider.

# Returns

True if the state is terminal (False otherwise).

# _is_transition_value_dependent_on_next_state UncertainTransitions

) -> bool

Indicate whether _get_transition_value() requires the next_state parameter for its computation (cached).

By default, UncertainTransitions._is_transition_value_dependent_on_next_state() internally calls UncertainTransitions._is_transition_value_dependent_on_next_state_() the first time and automatically caches its value to make future calls more efficient (since the returned value is assumed to be constant).

# Returns

True if the transition value computation depends on next_state (False otherwise).

# _is_transition_value_dependent_on_next_state_ UncertainTransitions

) -> bool

Indicate whether _get_transition_value() requires the next_state parameter for its computation.

This is a helper function called by default from UncertainTransitions._is_transition_value_dependent_on_next_state(), the difference being that the result is not cached here.


The underscore at the end of this function's name is a convention that serves as a reminder that its result should be constant.

# Returns

True if the transition value computation depends on next_state (False otherwise).

# _sample Simulation

memory: Memory[D.T_state],
action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]

Sample one transition of the simulator's dynamics.

By default, Simulation._sample() provides some boilerplate code and internally calls Simulation._state_sample() (which returns a transition outcome). The boilerplate code automatically samples an observation corresponding to the sampled next state.


Whenever an existing simulator needs to be wrapped instead of implemented fully in scikit-decide (e.g. an external simulator), it is recommended to overwrite Simulation._sample() to call the external simulator and not use the Simulation._state_sample() helper function.

# Parameters

  • memory: The source memory (state or history) of the transition.
  • action: The action taken in the given memory (state or history) triggering the transition.

# Returns

The environment outcome of the sampled transition.

# _set_memory Simulation

memory: Memory[D.T_state]
) -> None

Set internal memory attribute _memory to given one.

This can be useful to set a specific "starting point" before doing a rollout with successive Environment._step() calls.

# Parameters

  • memory: The memory to set internally.

# Example

# Set simulation_domain memory to my_state (assuming Markovian domain)
simulation_domain._set_memory(my_state)

# Start a 100-step rollout from here (applying my_action at every step)
for _ in range(100):
    simulation_domain._step(my_action)

# _state_sample Simulation

memory: Memory[D.T_state],
action: StrDict[list[D.T_event]]
) -> TransitionOutcome[D.T_state, StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]

Compute one sample of the transition's dynamics.

This is a helper function called by default from Simulation._sample(). It focuses on the state level, as opposed to the observation one for the latter.

# Parameters

  • memory: The source memory (state or history) of the transition.
  • action: The action taken in the given memory (state or history) triggering the transition.

# Returns

The transition outcome of the sampled transition.

# _state_step Environment

action: StrDict[list[D.T_event]]
) -> TransitionOutcome[D.T_state, StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]

Compute one step of the transition's dynamics.

This is a helper function called by default from Environment._step(). It focuses on the state level, as opposed to the observation one for the latter.

# Parameters

  • action: The action taken in the current memory (state or history) triggering the transition.

# Returns

The transition outcome of this step.

# _step Environment

action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]

Run one step of the environment's dynamics.

By default, Environment._step() provides some boilerplate code and internally calls Environment._state_step() (which returns a transition outcome). The boilerplate code automatically stores next state into the _memory attribute and samples a corresponding observation.


Whenever an existing environment needs to be wrapped instead of implemented fully in scikit-decide (e.g. compiled ATARI games), it is recommended to overwrite Environment._step() to call the external environment and not use the Environment._state_step() helper function.


Before calling Environment._step() the first time or when the end of an episode is reached, Initializable._reset() must be called to reset the environment's state.

# Parameters

  • action: The action taken in the current memory (state or history) triggering the transition.

# Returns

The environment outcome of this step.

# GraphDomain

This domain is for deterministic planning problems where all transitions and costs are already computed. In this case, storing them in dictionary structures speeds up the domain's computations and therefore reduces its solving time.

# Constructor GraphDomain

next_state_map: dict[D.T_state, dict[D.T_event, D.T_state]],
next_state_attributes: dict[D.T_state, dict[D.T_event, dict[str, float]]],
targets: Optional[set[D.T_state]] = None,
attribute_weight: str = "weight"

# Parameters

  • next_state_map: a dictionary mapping each state to a dictionary whose keys are actions and whose values are next states.
  • next_state_attributes: for each transition, the float attributes to store (typically the cost of the transition).
  • targets: the set of goal states.
  • attribute_weight: key in next_state_attributes to consider as the cost attribute.
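
The two maps can be pictured with a small deterministic graph. The sketch below is plain Python; the state and action names, the attribute key "weight", and the helper `plan_cost` are assumptions made for illustration and are not part of scikit-decide.

```python
# state -> action -> next state
next_state_map = {
    "A": {"to_B": "B", "to_C": "C"},
    "B": {"to_D": "D"},
    "C": {"to_D": "D"},
    "D": {},
}
# state -> action -> {attribute name: value}
next_state_attributes = {
    "A": {"to_B": {"weight": 1.0}, "to_C": {"weight": 4.0}},
    "B": {"to_D": {"weight": 2.0}},
    "C": {"to_D": {"weight": 1.0}},
    "D": {},
}
targets = {"D"}
attribute_weight = "weight"  # key used as the cost attribute in this sketch


def plan_cost(start, actions):
    """Follow a sequence of actions and accumulate the cost attribute."""
    state, total = start, 0.0
    for action in actions:
        total += next_state_attributes[state][action][attribute_weight]
        state = next_state_map[state][action]
    return state, total


final_state, cost = plan_cost("A", ["to_B", "to_D"])
print(final_state, cost, final_state in targets)
```

Both dictionaries share the same state and action keys, so a single (state, action) pair indexes the successor and its attributes alike.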

# check_value Rewards

value: Value[D.T_value]
) -> bool

Check that a value is compliant with its reward specification.


This function always returns True by default because any kind of reward should be accepted at this level.

# Parameters

  • value: The value to check.

# Returns

True if the value is compliant (False otherwise).

# get_action_space Events

) -> StrDict[Space[D.T_event]]

Get the (cached) domain action space (finite or infinite set).

By default, Events.get_action_space() internally calls Events._get_action_space_() the first time and automatically caches its value to make future calls more efficient (since the action space is assumed to be constant).

# Returns

The action space.
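
The "compute once, then reuse" behaviour described above can be pictured with the following minimal sketch. It is not scikit-decide's implementation: the class `CachedActionSpace`, its `calls` counter, and the attribute `_action_space_cache` are hypothetical, and only the two method names echo the documentation.

```python
class CachedActionSpace:
    """Toy stand-in showing a cached accessor backed by an uncached helper."""

    def __init__(self):
        self.calls = 0  # counts how often the uncached helper runs
        self._action_space_cache = None

    def _get_action_space_(self):
        # Uncached helper: its result is assumed to be constant.
        self.calls += 1
        return frozenset({"left", "right"})

    def get_action_space(self):
        # Cached accessor: delegates to the helper on the first call only.
        if self._action_space_cache is None:
            self._action_space_cache = self._get_action_space_()
        return self._action_space_cache


domain = CachedActionSpace()
first = domain.get_action_space()
second = domain.get_action_space()
print(first is second, domain.calls)
```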

# get_agents MultiAgent

) -> set[str]

Return a singleton for single agent domains.

This must be consistent with skdecide.core.autocast(), which transforms a single-agent domain into a multi-agent domain whose only agent has the id "agent".

# get_applicable_actions Events

memory: Optional[Memory[D.T_state]] = None
) -> StrDict[Space[D.T_event]]

Get the space (finite or infinite set) of applicable actions in the given memory (state or history), or in the internal one if omitted.

By default, Events.get_applicable_actions() provides some boilerplate code and internally calls Events._get_applicable_actions(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.

# Parameters

  • memory: The memory to consider (if None, the internal memory attribute _memory is used instead).

# Returns

The space of applicable actions.
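
The memory-defaulting boilerplate can be sketched as follows. This is a simplified stand-in rather than the library's code: the class `ToyEvents` and its dictionary of applicable actions are hypothetical, and the memory is reduced to a single state (Markovian case).

```python
from typing import Optional


class ToyEvents:
    """Toy stand-in: applicable actions are looked up per state in a dict."""

    def __init__(self, applicable, initial_state):
        self._applicable = applicable
        self._memory = initial_state  # internal memory (last state only)

    def _get_applicable_actions_from(self, memory):
        # Helper with a mandatory memory argument.
        return self._applicable[memory]

    def get_applicable_actions(self, memory: Optional[str] = None):
        # Boilerplate: fall back to the internal memory when none is given.
        if memory is None:
            memory = self._memory
        return self._get_applicable_actions_from(memory)


toy = ToyEvents({"A": {"go_b", "go_c"}, "B": {"go_c"}}, initial_state="A")
print(toy.get_applicable_actions())     # uses the internal memory "A"
print(toy.get_applicable_actions("B"))  # uses the explicit argument
```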

# get_enabled_events Events

memory: Optional[Memory[D.T_state]] = None
) -> Space[D.T_event]

Get the space (finite or infinite set) of enabled uncontrollable events in the given memory (state or history), or in the internal one if omitted.

By default, Events.get_enabled_events() provides some boilerplate code and internally calls Events._get_enabled_events(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.

# Parameters

  • memory: The memory to consider (if None, the internal memory attribute _memory is used instead).

# Returns

The space of enabled events.

# get_goals Goals

) -> StrDict[Space[D.T_observation]]

Get the (cached) domain goals space (finite or infinite set).

By default, Goals.get_goals() internally calls Goals._get_goals_() the first time and automatically caches its value to make future calls more efficient (since the goals space is assumed to be constant).


Goal states are assumed to be fully observable (i.e. observation = state) so that there is never uncertainty about whether the goal has been reached or not. This assumption guarantees that any policy that does not reach the goal with certainty incurs in infinite expected cost. - Geffner, 2013: A Concise Introduction to Models and Methods for Automated Planning

# Returns

The goals space.

# get_initial_state DeterministicInitialized

) -> D.T_state

Get the (cached) initial state.

By default, DeterministicInitialized.get_initial_state() internally calls DeterministicInitialized._get_initial_state_() the first time and automatically caches its value to make future calls more efficient (since the initial state is assumed to be constant).

# Returns

The initial state.

# get_initial_state_distribution UncertainInitialized

) -> Distribution[D.T_state]

Get the (cached) probability distribution of initial states.

By default, UncertainInitialized.get_initial_state_distribution() internally calls UncertainInitialized._get_initial_state_distribution_() the first time and automatically caches its value to make future calls more efficient (since the initial state distribution is assumed to be constant).

# Returns

The probability distribution of initial states.

# get_next_state DeterministicTransitions

memory: Memory[D.T_state],
action: StrDict[list[D.T_event]]
) -> D.T_state

Get the next state given a memory and action.

# Parameters

  • memory: The source memory (state or history) of the transition.
  • action: The action taken in the given memory (state or history) triggering the transition.

# Returns

The deterministic next state.

# get_next_state_distribution UncertainTransitions

memory: Memory[D.T_state],
action: StrDict[list[D.T_event]]
) -> DiscreteDistribution[D.T_state]

Get the discrete probability distribution of next state given a memory and action.


In the Markovian case (memory only holds the last state $s$), given an action $a$, this function can be mathematically represented by $P(S'|s, a)$, where $S'$ is the next state random variable.

# Parameters

  • memory: The source memory (state or history) of the transition.
  • action: The action taken in the given memory (state or history) triggering the transition.

# Returns

The discrete probability distribution of next state.
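
To make the notion of a discrete next-state distribution concrete, here is a small sketch that stores $P(S'|s, a)$ in a plain dictionary and samples from it. The table `transition_probabilities` and both helper functions are hypothetical and do not use scikit-decide's distribution classes; a deterministic transition is just the special case with a single outcome of probability 1.

```python
import random

# Hypothetical table: {(state, action): {next_state: probability}}.
transition_probabilities = {
    ("A", "go"): {"B": 0.8, "A": 0.2},
    ("B", "go"): {"C": 1.0},  # deterministic case: a single next state
}


def next_state_distribution(state, action):
    """Return the next-state probabilities for (state, action)."""
    dist = transition_probabilities[(state, action)]
    # A valid distribution must sum to one.
    assert abs(sum(dist.values()) - 1.0) < 1e-9
    return dist


def sample_next_state(state, action, rng):
    """Draw one next state according to the stored probabilities."""
    dist = next_state_distribution(state, action)
    states = list(dist)
    return rng.choices(states, weights=[dist[s] for s in states], k=1)[0]


rng = random.Random(0)
print(next_state_distribution("A", "go"))
print(sample_next_state("B", "go", rng))
```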

# get_observation TransformedObservable

state: D.T_state,
action: Optional[StrDict[list[D.T_event]]] = None
) -> StrDict[D.T_observation]

Get the deterministic observation given a state and action.

# Parameters

  • state: The state to be observed.
  • action: The last applied action (or None if the state is an initial state).

# Returns

The deterministic observation.

# get_observation_distribution PartiallyObservable

state: D.T_state,
action: Optional[StrDict[list[D.T_event]]] = None
) -> Distribution[StrDict[D.T_observation]]

Get the probability distribution of the observation given a state and action.

In mathematical terms (discrete case), given an action $a$, this function represents $P(O|s, a)$, where $O$ is the random variable of the observation.

# Parameters

  • state: The state to be observed.
  • action: The last applied action (or None if the state is an initial state).

# Returns

The probability distribution of the observation.

# get_observation_space PartiallyObservable

) -> StrDict[Space[D.T_observation]]

Get the (cached) observation space (finite or infinite set).

By default, PartiallyObservable.get_observation_space() internally calls PartiallyObservable._get_observation_space_() the first time and automatically caches its value to make future calls more efficient (since the observation space is assumed to be constant).

# Returns

The observation space.

# get_transition_value UncertainTransitions

memory: Memory[D.T_state],
action: StrDict[list[D.T_event]],
next_state: Optional[D.T_state] = None
) -> StrDict[Value[D.T_value]]

Get the value (reward or cost) of a transition.

The transition to consider is defined by the function parameters.


If this function never depends on the next_state parameter for its computation, it is recommended to indicate it by overriding UncertainTransitions._is_transition_value_dependent_on_next_state_() to return False. This information can then be exploited by solvers to avoid computing next state to evaluate a transition value (more efficient).

# Parameters

  • memory: The source memory (state or history) of the transition.
  • action: The action taken in the given memory (state or history) triggering the transition.
  • next_state: The next state in which the transition ends (if needed for the computation).

# Returns

The transition value (reward or cost).
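
The remark about next_state can be illustrated with a small sketch in which the cost of a transition is keyed by state and action only. The dictionary `costs` and both functions are hypothetical stand-ins, not the library's methods; they only show why a solver can skip computing the next state when the value does not depend on it.

```python
from typing import Optional

# Hypothetical cost table indexed by source state, then action.
costs = {"A": {"go_b": 1.0, "go_c": 5.0}, "B": {"go_c": 2.0}}


def transition_value(state: str, action: str, next_state: Optional[str] = None) -> float:
    # next_state is accepted for interface compatibility but never read.
    return costs[state][action]


def is_transition_value_dependent_on_next_state() -> bool:
    # Declares that callers may omit next_state.
    return False


without_next = transition_value("A", "go_b")
with_next = transition_value("A", "go_b", next_state="B")
print(without_next == with_next)
```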

# is_action Events

event: D.T_event
) -> bool

Indicate whether an event is an action (i.e. a controllable event for the agents).


By default, this function is implemented using the skdecide.core.Space.contains() function on the domain action space provided by Events.get_action_space(), but it can be overridden for faster implementations.

# Parameters

  • event: The event to consider.

# Returns

True if the event is an action (False otherwise).

# is_applicable_action Events

action: StrDict[D.T_event],
memory: Optional[Memory[D.T_state]] = None
) -> bool

Indicate whether an action is applicable in the given memory (state or history), or in the internal one if omitted.

By default, Events.is_applicable_action() provides some boilerplate code and internally calls Events._is_applicable_action(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.

# Parameters

  • action: The action to consider.
  • memory: The memory to consider (if None, the internal memory attribute _memory is used instead).

# Returns

True if the action is applicable (False otherwise).

# is_enabled_event Events

event: D.T_event,
memory: Optional[Memory[D.T_state]] = None
) -> bool

Indicate whether an uncontrollable event is enabled in the given memory (state or history), or in the internal one if omitted.

By default, Events.is_enabled_event() provides some boilerplate code and internally calls Events._is_enabled_event(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.

# Parameters

  • event: The event to consider.
  • memory: The memory to consider (if None, the internal memory attribute _memory is used instead).

# Returns

True if the event is enabled (False otherwise).

# is_goal Goals

state: D.T_state
) -> bool

Indicate whether an observation belongs to the goals.


By default, this function is implemented using the skdecide.core.Space.contains() function on the domain goals space provided by Goals.get_goals(), but it can be overridden for faster implementations.

# Parameters

  • observation: The observation to consider.

# Returns

True if the observation is a goal (False otherwise).

# is_observation PartiallyObservable

observation: StrDict[D.T_observation]
) -> bool

Check that an observation indeed belongs to the domain observation space.


By default, this function is implemented using the skdecide.core.Space.contains() function on the domain observation space provided by PartiallyObservable.get_observation_space(), but it can be overridden for faster implementations.

# Parameters

  • observation: The observation to consider.

# Returns

True if the observation belongs to the domain observation space (False otherwise).

# is_terminal UncertainTransitions

state: D.T_state
) -> bool

Indicate whether a state is terminal.

A terminal state is a state with no outgoing transition (except to itself with value 0).

# Parameters

  • state: The state to consider.

# Returns

True if the state is terminal (False otherwise).
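
The definition of a terminal state given above can be checked directly on a dictionary-based deterministic graph. The sketch below is an illustration only: the function `is_terminal`, the example graph, and the `costs` table are hypothetical and independent of scikit-decide.

```python
def is_terminal(state, next_state_map, costs):
    """Return True if `state` has no outgoing transition other than zero-cost self-loops."""
    for action, next_state in next_state_map.get(state, {}).items():
        if next_state != state or costs[state][action] != 0:
            return False
    return True


next_state_map = {
    "A": {"go": "B"},
    "B": {"stay": "B"},  # self-loop with a non-zero cost: not terminal
    "C": {"stay": "C"},  # zero-cost self-loop: terminal
    "D": {},             # no outgoing transition: terminal
}
costs = {"A": {"go": 1.0}, "B": {"stay": 1.0}, "C": {"stay": 0.0}, "D": {}}

print({state: is_terminal(state, next_state_map, costs) for state in next_state_map})
```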

# is_transition_value_dependent_on_next_state UncertainTransitions

) -> bool

Indicate whether get_transition_value() requires the next_state parameter for its computation (cached).

By default, UncertainTransitions.is_transition_value_dependent_on_next_state() internally calls UncertainTransitions._is_transition_value_dependent_on_next_state_() the first time and automatically caches its value to make future calls more efficient (since the returned value is assumed to be constant).

# Returns

True if the transition value computation depends on next_state (False otherwise).

# merge GraphDomain

graph_domain: GraphDomain

Return a new graph domain merged from self and another instance of GraphDomain.

# reset Initializable

) -> StrDict[D.T_observation]

Reset the state of the environment and return an initial observation.

By default, Initializable.reset() provides some boilerplate code and internally calls Initializable._reset() (which returns an initial state). The boilerplate code automatically stores the initial state into the _memory attribute and samples a corresponding observation.

# Returns

An initial observation.

# sample Simulation

memory: Memory[D.T_state],
action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]

Sample one transition of the simulator's dynamics.

By default, Simulation.sample() provides some boilerplate code and internally calls Simulation._sample() (which returns a transition outcome). The boilerplate code automatically samples an observation corresponding to the sampled next state.


Whenever an existing simulator needs to be wrapped instead of implemented fully in scikit-decide, it is recommended to override Simulation.sample() to call the external simulator and not use the Simulation._sample() helper function.

# Parameters

  • memory: The source memory (state or history) of the transition.
  • action: The action taken in the given memory (state or history) triggering the transition.

# Returns

The environment outcome of the sampled transition.

# set_memory Simulation

memory: Memory[D.T_state]
) -> None

Set internal memory attribute _memory to given one.

This can be useful to set a specific "starting point" before doing a rollout with successive Environment.step() calls.

# Parameters

  • memory: The memory to set internally.

# Example

    # Set simulation_domain memory to my_state (assuming Markovian domain)
    simulation_domain.set_memory(my_state)

    # Start a 100-step rollout from here (applying my_action at every step)
    for _ in range(100):
        simulation_domain.step(my_action)

# set_nodes_target GraphDomain


Change the sources and targets attributes.

# set_sources_targets GraphDomain


Change the sources and targets attributes.

# step Environment

action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]

Run one step of the environment's dynamics.

By default, Environment.step() provides some boilerplate code and internally calls Environment._step() (which returns a transition outcome). The boilerplate code automatically stores the next state into the _memory attribute and samples a corresponding observation.


Whenever an existing environment needs to be wrapped instead of implemented fully in scikit-decide (e.g. compiled ATARI games), it is recommended to override Environment.step() to call the external environment and not use the Environment._step() helper function.


Before calling Environment.step() the first time or when the end of an episode is reached, Initializable.reset() must be called to reset the environment's state.

# Parameters

  • action: The action taken in the current memory (state or history) triggering the transition.

# Returns

The environment outcome of this step.
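
The reset-then-step protocol described above can be sketched with a toy deterministic environment. `ToyEnvironment` is a hypothetical class written for this illustration; it returns a plain (observation, terminated) pair instead of scikit-decide's EnvironmentOutcome and assumes full observability.

```python
class ToyEnvironment:
    """Toy deterministic environment over a dictionary graph (not scikit-decide)."""

    def __init__(self, next_state_map, initial_state, targets):
        self._next_state_map = next_state_map
        self._initial_state = initial_state
        self._targets = targets
        self._memory = None  # no internal state until reset() is called

    def reset(self):
        self._memory = self._initial_state
        return self._memory  # fully observable: observation == state

    def step(self, action):
        if self._memory is None:
            raise RuntimeError("call reset() before the first step()")
        # Store the next state internally, then report it.
        self._memory = self._next_state_map[self._memory][action]
        terminated = self._memory in self._targets
        return self._memory, terminated


env = ToyEnvironment({"A": {"go": "B"}, "B": {"go": "C"}, "C": {}}, "A", {"C"})
observation = env.reset()
trajectory = [observation]
terminated = False
while not terminated:
    observation, terminated = env.step("go")
    trajectory.append(observation)
print(trajectory)
observation = env.reset()  # episode ended: reset before stepping again
```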

# _check_value Rewards

value: Value[D.T_value]
) -> bool

Check that a value is compliant with its cost specification (must be positive).


This function calls PositiveCosts._is_positive() to determine if a value is positive (can be overridden for advanced value types).

# Parameters

  • value: The value to check.

# Returns

True if the value is compliant (False otherwise).
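
As an illustration of a positivity check on costs, the sketch below scans a dictionary of transition attributes for non-compliant entries. Both functions are hypothetical, and treating zero as compliant is an assumption of this sketch rather than something stated in this documentation.

```python
def is_positive(cost):
    # Assumption of this sketch: zero is accepted as a valid cost.
    return cost >= 0


def find_invalid_costs(next_state_attributes, attribute_weight="weight"):
    """Return the (state, action) pairs whose cost fails the positivity check."""
    return [
        (state, action)
        for state, per_action in next_state_attributes.items()
        for action, attributes in per_action.items()
        if not is_positive(attributes[attribute_weight])
    ]


attributes = {
    "A": {"go": {"weight": 1.0}, "back": {"weight": -2.0}},
    "B": {"stay": {"weight": 0.0}},
}
print(find_invalid_costs(attributes))
```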

# _get_action_space Events

) -> StrDict[Space[D.T_event]]

Get the (cached) domain action space (finite or infinite set).

By default, Events._get_action_space() internally calls Events._get_action_space_() the first time and automatically caches its value to make future calls more efficient (since the action space is assumed to be constant).

# Returns

The action space.

# _get_action_space_ Events

) -> Space[D.T_event]

Get the domain action space (finite or infinite set).

This is a helper function called by default from Events._get_action_space(), the difference being that the result is not cached here.


The trailing underscore in this function's name is a convention signalling that its result should be constant.

# Returns

The action space.

# _get_applicable_actions Events

memory: Optional[Memory[D.T_state]] = None
) -> StrDict[Space[D.T_event]]

Get the space (finite or infinite set) of applicable actions in the given memory (state or history), or in the internal one if omitted.

By default, Events._get_applicable_actions() provides some boilerplate code and internally calls Events._get_applicable_actions_from(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.

# Parameters

  • memory: The memory to consider (if None, the internal memory attribute _memory is used instead).

# Returns

The space of applicable actions.

# _get_applicable_actions_from Events

memory: D.T_state
) -> Space[D.T_event]

Get the space (finite or infinite set) of applicable actions in the given memory (state or history).

This is a helper function called by default from Events._get_applicable_actions(), the difference being that the memory parameter is mandatory here.

# Parameters

  • memory: The memory to consider.

# Returns

The space of applicable actions.

# _get_enabled_events Events

memory: Optional[Memory[D.T_state]] = None
) -> Space[D.T_event]

Get the space (finite or infinite set) of enabled uncontrollable events in the given memory (state or history), or in the internal one if omitted.

By default, Events._get_enabled_events() provides some boilerplate code and internally calls Events._get_enabled_events_from(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.

# Parameters

  • memory: The memory to consider (if None, the internal memory attribute _memory is used instead).

# Returns

The space of enabled events.

# _get_enabled_events_from Events

memory: Memory[D.T_state]
) -> Space[D.T_event]

Get the space (finite or infinite set) of enabled uncontrollable events in the given memory (state or history).

This is a helper function called by default from Events._get_enabled_events(), the difference being that the memory parameter is mandatory here.

# Parameters

  • memory: The memory to consider.

# Returns

The space of enabled events.

# _get_goals Goals

) -> StrDict[Space[D.T_observation]]

Get the (cached) domain goals space (finite or infinite set).

By default, Goals._get_goals() internally calls Goals._get_goals_() the first time and automatically caches its value to make future calls more efficient (since the goals space is assumed to be constant).


Goal states are assumed to be fully observable (i.e. observation = state) so that there is never uncertainty about whether the goal has been reached or not. This assumption guarantees that any policy that does not reach the goal with certainty incurs in infinite expected cost. - Geffner, 2013: A Concise Introduction to Models and Methods for Automated Planning

# Returns

The goals space.

# _get_goals_ Goals

) -> Space[D.T_observation]

Get the domain goals space (finite or infinite set).

This is a helper function called by default from Goals._get_goals(), the difference being that the result is not cached here.


The trailing underscore in this function's name is a convention signalling that its result should be constant.

# Returns

The goals space.

# _get_initial_state DeterministicInitialized

) -> D.T_state

Get the (cached) initial state.

By default, DeterministicInitialized._get_initial_state() internally calls DeterministicInitialized._get_initial_state_() the first time and automatically caches its value to make future calls more efficient (since the initial state is assumed to be constant).

# Returns

The initial state.

# _get_initial_state_ DeterministicInitialized

) -> D.T_state

Get the initial state.

This is a helper function called by default from DeterministicInitialized._get_initial_state(), the difference being that the result is not cached here.

# Returns

The initial state.

# _get_initial_state_distribution UncertainInitialized

) -> Distribution[D.T_state]

Get the (cached) probability distribution of initial states.

By default, UncertainInitialized._get_initial_state_distribution() internally calls UncertainInitialized._get_initial_state_distribution_() the first time and automatically caches its value to make future calls more efficient (since the initial state distribution is assumed to be constant).

# Returns

The probability distribution of initial states.

# _get_initial_state_distribution_ UncertainInitialized

) -> Distribution[D.T_state]

Get the probability distribution of initial states.

This is a helper function called by default from UncertainInitialized._get_initial_state_distribution(), the difference being that the result is not cached here.


The trailing underscore in this function's name is a convention signalling that its result should be constant.

# Returns

The probability distribution of initial states.

# _get_memory_maxlen History

) -> int

Get the (cached) memory max length.

By default, FiniteHistory._get_memory_maxlen() internally calls FiniteHistory._get_memory_maxlen_() the first time and automatically caches its value to make future calls more efficient (since the memory max length is assumed to be constant).

# Returns

The memory max length.

# _get_memory_maxlen_ FiniteHistory

) -> int

Get the memory max length.

This is a helper function called by default from FiniteHistory._get_memory_maxlen(), the difference being that the result is not cached here.


The trailing underscore in this function's name is a convention signalling that its result should be constant.

# Returns

The memory max length.

# _get_next_state DeterministicTransitions

memory: Memory[D.T_state],
event: D.T_event
) -> D.T_state

Get the next state given a memory and action.

# Parameters

  • memory: The source memory (state or history) of the transition.
  • action: The action taken in the given memory (state or history) triggering the transition.

# Returns

The deterministic next state.

# _get_next_state_distribution UncertainTransitions

memory: Memory[D.T_state],
action: StrDict[list[D.T_event]]
) -> SingleValueDistribution[D.T_state]

Get the discrete probability distribution of next state given a memory and action.


In the Markovian case (memory only holds the last state $s$), given an action $a$, this function can be mathematically represented by $P(S'|s, a)$, where $S'$ is the next state random variable.

# Parameters

  • memory: The source memory (state or history) of the transition.
  • action: The action taken in the given memory (state or history) triggering the transition.

# Returns

The discrete probability distribution of next state.

# _get_observation TransformedObservable

state: D.T_state,
action: Optional[StrDict[list[D.T_event]]] = None
) -> StrDict[D.T_observation]

Get the deterministic observation given a state and action.

# Parameters

  • state: The state to be observed.
  • action: The last applied action (or None if the state is an initial state).

# Returns

The deterministic observation.

# _get_observation_distribution PartiallyObservable

state: D.T_state,
action: Optional[StrDict[list[D.T_event]]] = None
) -> Distribution[StrDict[D.T_observation]]

Get the probability distribution of the observation given a state and action.

In mathematical terms (discrete case), given an action $a$, this function represents $P(O|s, a)$, where $O$ is the random variable of the observation.

# Parameters

  • state: The state to be observed.
  • action: The last applied action (or None if the state is an initial state).

# Returns

The probability distribution of the observation.

# _get_observation_space PartiallyObservable

) -> StrDict[Space[D.T_observation]]

Get the (cached) observation space (finite or infinite set).

By default, PartiallyObservable._get_observation_space() internally calls PartiallyObservable._get_observation_space_() the first time and automatically caches its value to make future calls more efficient (since the observation space is assumed to be constant).

# Returns

The observation space.

# _get_observation_space_ PartiallyObservable

) -> Space[D.T_observation]

Get the observation space (finite or infinite set).

This is a helper function called by default from PartiallyObservable._get_observation_space(), the difference being that the result is not cached here.


The trailing underscore in this function's name is a convention signalling that its result should be constant.

# Returns

The observation space.

# _get_transition_value UncertainTransitions

memory: Memory[D.T_state],
event: D.T_event,
next_state: Optional[D.T_state] = None
) -> Value[D.T_value]

Get the value (reward or cost) of a transition.

The transition to consider is defined by the function parameters.


If this function never depends on the next_state parameter for its computation, it is recommended to indicate it by overriding UncertainTransitions._is_transition_value_dependent_on_next_state_() to return False. This information can then be exploited by solvers to avoid computing next state to evaluate a transition value (more efficient).

# Parameters

  • memory: The source memory (state or history) of the transition.
  • action: The action taken in the given memory (state or history) triggering the transition.
  • next_state: The next state in which the transition ends (if needed for the computation).

# Returns

The transition value (reward or cost).

# _init_memory History

state: Optional[D.T_state] = None
) -> Memory[D.T_state]

Initialize memory (possibly with a state) according to its specification and return it.

This function is automatically called by Initializable._reset() to reinitialize the internal memory whenever the domain is used as an environment.

# Parameters

  • state: An optional state to initialize the memory with (typically the initial state).

# Returns

The new initialized memory.
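
One way to picture a memory with a maximum length is a bounded double-ended queue. The sketch below uses Python's standard `collections.deque` for this purpose; the function `init_memory` and its `maxlen` argument are hypothetical, and nothing here claims to reproduce how scikit-decide builds its Memory objects.

```python
from collections import deque


def init_memory(state=None, maxlen=1):
    """Create a bounded history, optionally seeded with one state."""
    return deque([] if state is None else [state], maxlen=maxlen)


memory = init_memory("A", maxlen=2)
memory.append("B")
memory.append("C")  # "A" is dropped: only the last 2 states are kept
print(list(memory))
```

With `maxlen=1` the memory holds only the last state, which corresponds to the Markovian case mentioned elsewhere in this reference.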

# _is_action Events

event: D.T_event
) -> bool

Indicate whether an event is an action (i.e. a controllable event for the agents).


By default, this function is implemented using the skdecide.core.Space.contains() function on the domain action space provided by Events._get_action_space(), but it can be overridden for faster implementations.

# Parameters

  • event: The event to consider.

# Returns

True if the event is an action (False otherwise).

# _is_applicable_action Events

action: StrDict[D.T_event],
memory: Optional[Memory[D.T_state]] = None
) -> bool

Indicate whether an action is applicable in the given memory (state or history), or in the internal one if omitted.

By default, Events._is_applicable_action() provides some boilerplate code and internally calls Events._is_applicable_action_from(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.

# Parameters

  • action: The action to consider.
  • memory: The memory to consider (if None, the internal memory attribute _memory is used instead).

# Returns

True if the action is applicable (False otherwise).

# _is_applicable_action_from Events

action: StrDict[D.T_event],
memory: Memory[D.T_state]
) -> bool

Indicate whether an action is applicable in the given memory (state or history).

This is a helper function called by default from Events._is_applicable_action(), the difference being that the memory parameter is mandatory here.


By default, this function is implemented using the skdecide.core.Space.contains() function on the space of applicable actions provided by Events._get_applicable_actions_from(), but it can be overridden for faster implementations.

# Parameters

  • action: The action to consider.
  • memory: The memory to consider.

# Returns

True if the action is applicable (False otherwise).

# _is_enabled_event Events

event: D.T_event,
memory: Optional[Memory[D.T_state]] = None
) -> bool

Indicate whether an uncontrollable event is enabled in the given memory (state or history), or in the internal one if omitted.

By default, Events._is_enabled_event() provides some boilerplate code and internally calls Events._is_enabled_event_from(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.

# Parameters

  • event: The event to consider.
  • memory: The memory to consider (if None, the internal memory attribute _memory is used instead).

# Returns

True if the event is enabled (False otherwise).

# _is_enabled_event_from Events

event: D.T_event,
memory: Memory[D.T_state]
) -> bool

Indicate whether an event is enabled in the given memory (state or history).

This is a helper function called by default from Events._is_enabled_event(), the difference being that the memory parameter is mandatory here.


By default, this function is implemented using the skdecide.core.Space.contains() function on the space of enabled events provided by Events._get_enabled_events_from(), but it can be overridden for faster implementations.

# Parameters

  • event: The event to consider.
  • memory: The memory to consider.

# Returns

True if the event is enabled (False otherwise).

# _is_goal Goals

observation: StrDict[D.T_observation]
) -> StrDict[D.T_predicate]

Indicate whether an observation belongs to the goals.


By default, this function is implemented using the skdecide.core.Space.contains() function on the domain goals space provided by Goals._get_goals(), but it can be overridden for faster implementations.

# Parameters

  • observation: The observation to consider.

# Returns

True if the observation is a goal (False otherwise).

# _is_observation PartiallyObservable

observation: StrDict[D.T_observation]
) -> bool

Check that an observation indeed belongs to the domain observation space.


By default, this function is implemented using the skdecide.core.Space.contains() function on the domain observation space provided by PartiallyObservable._get_observation_space(), but it can be overridden for faster implementations.

# Parameters

  • observation: The observation to consider.

# Returns

True if the observation belongs to the domain observation space (False otherwise).

# _is_positive PositiveCosts

cost: D.T_value
) -> bool

Determine if a value is positive (can be overridden for advanced value types).

# Parameters

  • cost: The cost to evaluate.

# Returns

True if the cost is positive (False otherwise).

# _is_terminal UncertainTransitions

state: D.T_state
) -> StrDict[D.T_predicate]

Indicate whether a state is terminal.

A terminal state is a state with no outgoing transition (except to itself with value 0).

# Parameters

  • state: The state to consider.

# Returns

True if the state is terminal (False otherwise).

# _is_transition_value_dependent_on_next_state UncertainTransitions

) -> bool

Indicate whether _get_transition_value() requires the next_state parameter for its computation (cached).

By default, UncertainTransitions._is_transition_value_dependent_on_next_state() internally calls UncertainTransitions._is_transition_value_dependent_on_next_state_() the first time and automatically caches its value to make future calls more efficient (since the returned value is assumed to be constant).

# Returns

True if the transition value computation depends on next_state (False otherwise).

# _is_transition_value_dependent_on_next_state_ UncertainTransitions

) -> bool

Indicate whether _get_transition_value() requires the next_state parameter for its computation.

This is a helper function called by default from UncertainTransitions._is_transition_value_dependent_on_next_state(), the difference being that the result is not cached here.


The trailing underscore in this function's name is a convention signalling that its result should be constant.

# Returns

True if the transition value computation depends on next_state (False otherwise).

# _reset Initializable

) -> StrDict[D.T_observation]

Reset the state of the environment and return an initial observation.

By default, Initializable._reset() provides some boilerplate code and internally calls Initializable._state_reset() (which returns an initial state). The boilerplate code automatically stores the initial state into the _memory attribute and samples a corresponding observation.

# Returns

An initial observation.

# _sample Simulation

memory: Memory[D.T_state],
action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]

Sample one transition of the simulator's dynamics.

By default, Simulation._sample() provides some boilerplate code and internally calls Simulation._state_sample() (which returns a transition outcome). The boilerplate code automatically samples an observation corresponding to the sampled next state.


Whenever an existing simulator needs to be wrapped instead of being implemented fully in scikit-decide, it is recommended to override Simulation._sample() so that it calls the external simulator directly, rather than relying on the Simulation._state_sample() helper function.

# Parameters

  • memory: The source memory (state or history) of the transition.
  • action: The action taken in the given memory (state or history) triggering the transition.

# Returns

The environment outcome of the sampled transition.
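
As an illustration of the wrapping recommendation, the sketch below delegates `_sample()` directly to an external simulator. Everything here is hypothetical: `ExternalSimulator`, its `advance()` method, and `OutcomeSketch` are invented for the example, and the real `EnvironmentOutcome` type and multi-agent `StrDict` wrapping are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class OutcomeSketch:
    """Simplified stand-in for an environment outcome."""
    observation: int
    value: float
    termination: bool


class ExternalSimulator:
    """Hypothetical third-party simulator with its own stepping API."""

    def advance(self, state: int, command: int) -> tuple[int, float, bool]:
        next_state = state + command
        return next_state, -1.0, next_state >= 3


class WrappedSimulationSketch:
    def __init__(self, simulator: ExternalSimulator) -> None:
        self._simulator = simulator

    def _sample(self, memory: int, action: int) -> OutcomeSketch:
        # Delegate to the external simulator instead of going through
        # a _state_sample() helper.
        observation, reward, done = self._simulator.advance(memory, action)
        return OutcomeSketch(observation, reward, done)


domain = WrappedSimulationSketch(ExternalSimulator())
outcome = domain._sample(memory=2, action=1)
```

The design choice is that the external simulator already produces observations, so there is no separate state-level step to implement.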

# _set_memory Simulation

memory: Memory[D.T_state]
) -> None

Set the internal memory attribute _memory to the given one.

This can be useful to set a specific "starting point" before doing a rollout with successive Environment._step() calls.

# Parameters

  • memory: The memory to set internally.

# Example

# Set simulation_domain memory to my_state (assuming Markovian domain)
simulation_domain._set_memory(my_state)

# Start a 100-step rollout from here (applying my_action at every step)
for _ in range(100):
    simulation_domain._step(my_action)

# _state_reset Initializable

) -> D.T_state

Reset the state of the environment and return an initial state.

This is a helper function called by default from Initializable._reset(). It focuses on the state level, as opposed to the observation one for the latter.

# Returns

An initial state.

# _state_sample Simulation

memory: Memory[D.T_state],
action: StrDict[list[D.T_event]]
) -> TransitionOutcome[D.T_state, StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]

Compute one sample of the transition's dynamics.

This is a helper function called by default from Simulation._sample(). It focuses on the state level, as opposed to the observation one for the latter.

# Parameters

  • memory: The source memory (state or history) of the transition.
  • action: The action taken in the given memory (state or history) triggering the transition.

# Returns

The transition outcome of the sampled transition.
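
For a domain whose transitions are stored in the `next_state_map` layout described in the constructor (state, then action, then next state mapped to a `(proba, cost)` tuple), a state-level sample could be drawn roughly as follows. This is a single-agent sketch under that assumption, not the library's code; the table contents and the function name `state_sample_sketch` are made up for the example.

```python
import random

# Hypothetical transition table: state -> action -> next state -> (proba, cost)
next_state_map = {
    "s0": {"go": {"s1": (0.8, 1.0), "s2": (0.2, 5.0)}},
}


def state_sample_sketch(state: str, action: str, rng: random.Random) -> tuple[str, float]:
    """Draw one next state according to the stored probabilities."""
    outcomes = next_state_map[state][action]
    candidates = list(outcomes)
    weights = [outcomes[s][0] for s in candidates]
    next_state = rng.choices(candidates, weights=weights, k=1)[0]
    cost = outcomes[next_state][1]
    return next_state, cost


rng = random.Random(0)  # seeded only to make the run repeatable
next_state, cost = state_sample_sketch("s0", "go", rng)
```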

# _state_step Environment

action: StrDict[list[D.T_event]]
) -> TransitionOutcome[D.T_state, StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]

Compute one step of the transition's dynamics.

This is a helper function called by default from Environment._step(). It focuses on the state level, as opposed to the observation one for the latter.

# Parameters

  • action: The action taken in the current memory (state or history) triggering the transition.

# Returns

The transition outcome of this step.

# _step Environment

action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]

Run one step of the environment's dynamics.

By default, Environment._step() provides some boilerplate code and internally calls Environment._state_step() (which returns a transition outcome). The boilerplate code automatically stores the next state into the _memory attribute and samples a corresponding observation.


Whenever an existing environment needs to be wrapped instead of being implemented fully in scikit-decide (e.g. compiled ATARI games), it is recommended to override Environment._step() so that it calls the external environment directly, rather than relying on the Environment._state_step() helper function.


Before calling Environment._step() for the first time, or once the end of an episode has been reached, Initializable._reset() must be called to reset the environment's state.

# Parameters

  • action: The action taken in the current memory (state or history) triggering the transition.

# Returns

The environment outcome of this step.
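
The interplay between `_reset()`, `_step()`, and `_state_step()` described above can be sketched with a toy environment. This is a simplified single-agent illustration, not the scikit-decide source: outcomes are plain tuples, and the observation is assumed to equal the state.

```python
class StepSketch:
    """Toy stand-in illustrating the described _step() flow."""

    def _reset(self) -> int:
        self._memory = 0  # initial state stored in _memory
        return self._memory

    def _state_step(self, action: int) -> tuple[int, float, bool]:
        # Domain-specific part: compute next state, value and termination.
        next_state = self._memory + action
        return next_state, -1.0, next_state >= 2

    def _step(self, action: int) -> tuple[int, float, bool]:
        next_state, value, done = self._state_step(action)
        self._memory = next_state       # boilerplate: store the next state
        return next_state, value, done  # observation equals state in this toy


env = StepSketch()
env._reset()  # must precede the first _step() call
trajectory = []
done = False
while not done:
    observation, value, done = env._step(1)
    trajectory.append(observation)
```

Once `done` is true, another call to `_reset()` would be needed before stepping again.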