# builders.domain.dynamics

This page documents the dynamics characteristics a domain can implement, from the most general (black-box Environment) to the most specific (white-box DeterministicTransitions).

# Environment

A domain must inherit this class if agents interact with it like a black-box environment.

Examples of black-box environments include the real world, compiled Atari games, etc.

TIP

Environment domains are typically stateful: they must keep the current state or history in memory to compute the next steps (this is handled automatically by default via the _memory attribute).
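
For illustration, here is a sketch of the resulting interaction loop (assuming a single-agent domain; my_environment_domain and choose_action are hypothetical placeholders, not part of the scikit-decide API):

```python
# Hypothetical rollout loop against an Environment domain (sketch).
# `my_environment_domain` and `choose_action` are placeholders.
observation = my_environment_domain.reset()  # required before the first step()
for _ in range(1000):
    action = choose_action(observation)           # placeholder policy
    outcome = my_environment_domain.step(action)  # environment outcome
    observation = outcome.observation
    if outcome.termination:  # episode over: reset before stepping again
        observation = my_environment_domain.reset()
```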

# step Environment

```python
step(
    self,
    action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```

Run one step of the environment's dynamics.

By default, Environment.step() provides some boilerplate code and internally calls Environment._step() (which returns an environment outcome). The boilerplate code automatically stores the next state in the _memory attribute and samples a corresponding observation.

TIP

Whenever an existing environment needs to be wrapped instead of implemented fully in scikit-decide (e.g. compiled Atari games), it is recommended to override Environment.step() to call the external environment and not use the Environment._step() helper function.

WARNING

Before calling Environment.step() for the first time, or when the end of an episode is reached, Initializable.reset() must be called to reset the environment's state.

# Parameters

  • action: The action taken in the current memory (state or history) triggering the transition.

# Returns

The environment outcome of this step.

# _state_step Environment

```python
_state_step(
    self,
    action: StrDict[list[D.T_event]]
) -> TransitionOutcome[D.T_state, StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```

Compute one step of the transition's dynamics.

This is a helper function called by default from Environment._step(). It operates at the state level, whereas the latter also handles the observation level.

# Parameters

  • action: The action taken in the current memory (state or history) triggering the transition.

# Returns

The transition outcome of this step.
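
As an illustration, a toy implementation could look as follows. This is a minimal sketch assuming a Markovian, single-agent domain; the countdown dynamics are invented for the example, and a real domain would define this method on a class combining Environment with the other required characteristics:

```python
from skdecide.core import TransitionOutcome, Value

class CountdownDynamics:
    """Toy dynamics: integer states counting down towards the terminal state 0."""

    def _state_step(self, action):
        state = self._memory  # current state, maintained by the Environment boilerplate
        next_state = max(state - action, 0)
        return TransitionOutcome(
            state=next_state,
            value=Value(cost=1),           # every step costs 1
            termination=(next_state == 0),
            info=None,
        )
```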

# _step Environment

```python
_step(
    self,
    action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```

Run one step of the environment's dynamics.

By default, Environment._step() provides some boilerplate code and internally calls Environment._state_step() (which returns a transition outcome). The boilerplate code automatically stores the next state in the _memory attribute and samples a corresponding observation.

TIP

Whenever an existing environment needs to be wrapped instead of implemented fully in scikit-decide (e.g. compiled Atari games), it is recommended to override Environment._step() to call the external environment and not use the Environment._state_step() helper function.

WARNING

Before calling Environment._step() for the first time, or when the end of an episode is reached, Initializable._reset() must be called to reset the environment's state.

# Parameters

  • action: The action taken in the current memory (state or history) triggering the transition.

# Returns

The environment outcome of this step.

# Simulation

A domain must inherit this class if agents interact with it like a simulation.

Compared to pure environment domains, simulation ones have the additional ability to sample transitions from any given state.

TIP

Simulation domains are typically stateless: they do not need to store the current state or history in memory since it is usually passed as parameter of their functions. By default, they only become stateful whenever they are used as environments (e.g. via Initializable.reset() and Environment.step() functions).
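
This sampling access is what enables lookahead-style reasoning without resetting any internal state, as in the hedged sketch below (my_simulation_domain, current_state and candidate_actions are hypothetical placeholders; a single-agent, reward-based domain is assumed):

```python
# Hypothetical one-step lookahead using a Simulation domain (sketch).
# `my_simulation_domain`, `current_state` and `candidate_actions` are placeholders.
best_action, best_reward = None, float("-inf")
for action in candidate_actions:
    # sample() takes the source memory explicitly: no reset()/step() needed
    outcome = my_simulation_domain.sample(current_state, action)
    if outcome.value.reward > best_reward:
        best_action, best_reward = action, outcome.value.reward
```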

# sample Simulation

```python
sample(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```

Sample one transition of the simulator's dynamics.

By default, Simulation.sample() provides some boilerplate code and internally calls Simulation._sample() (which returns an environment outcome). The boilerplate code automatically samples an observation corresponding to the sampled next state.

TIP

Whenever an existing simulator needs to be wrapped instead of implemented fully in scikit-decide, it is recommended to override Simulation.sample() to call the external simulator and not use the Simulation._sample() helper function.

# Parameters

  • memory: The source memory (state or history) of the transition.
  • action: The action taken in the given memory (state or history) triggering the transition.

# Returns

The environment outcome of the sampled transition.

# set_memory Simulation

```python
set_memory(self, memory: Memory[D.T_state]) -> None
```

Set the internal memory attribute _memory to the given one.

This can be useful to set a specific "starting point" before doing a rollout with successive Environment.step() calls.

# Parameters

  • memory: The memory to set internally.

# Example

```python
# Set simulation_domain memory to my_state (assuming a Markovian domain)
simulation_domain.set_memory(my_state)

# Start a 100-step rollout from here (applying my_action at every step)
for _ in range(100):
    simulation_domain.step(my_action)
```

# step Environment

```python
step(
    self,
    action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```

Run one step of the environment's dynamics.

By default, Environment.step() provides some boilerplate code and internally calls Environment._step() (which returns an environment outcome). The boilerplate code automatically stores the next state in the _memory attribute and samples a corresponding observation.

TIP

Whenever an existing environment needs to be wrapped instead of implemented fully in scikit-decide (e.g. compiled Atari games), it is recommended to override Environment.step() to call the external environment and not use the Environment._step() helper function.

WARNING

Before calling Environment.step() for the first time, or when the end of an episode is reached, Initializable.reset() must be called to reset the environment's state.

# Parameters

  • action: The action taken in the current memory (state or history) triggering the transition.

# Returns

The environment outcome of this step.

# _sample Simulation

```python
_sample(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```

Sample one transition of the simulator's dynamics.

By default, Simulation._sample() provides some boilerplate code and internally calls Simulation._state_sample() (which returns a transition outcome). The boilerplate code automatically samples an observation corresponding to the sampled next state.

TIP

Whenever an existing simulator needs to be wrapped instead of implemented fully in scikit-decide, it is recommended to override Simulation._sample() to call the external simulator and not use the Simulation._state_sample() helper function.

# Parameters

  • memory: The source memory (state or history) of the transition.
  • action: The action taken in the given memory (state or history) triggering the transition.

# Returns

The environment outcome of the sampled transition.

# _set_memory Simulation

```python
_set_memory(self, memory: Memory[D.T_state]) -> None
```

Set the internal memory attribute _memory to the given one.

This can be useful to set a specific "starting point" before doing a rollout with successive Environment._step() calls.

# Parameters

  • memory: The memory to set internally.

# Example

```python
# Set simulation_domain memory to my_state (assuming a Markovian domain)
simulation_domain._set_memory(my_state)

# Start a 100-step rollout from here (applying my_action at every step)
for _ in range(100):
    simulation_domain._step(my_action)
```

# _state_sample Simulation

```python
_state_sample(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]]
) -> TransitionOutcome[D.T_state, StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```

Compute one sample of the transition's dynamics.

This is a helper function called by default from Simulation._sample(). It operates at the state level, whereas the latter also handles the observation level.

# Parameters

  • memory: The source memory (state or history) of the transition.
  • action: The action taken in the given memory (state or history) triggering the transition.

# Returns

The transition outcome of the sampled transition.

# _state_step Environment

```python
_state_step(
    self,
    action: StrDict[list[D.T_event]]
) -> TransitionOutcome[D.T_state, StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```

Compute one step of the transition's dynamics.

This is a helper function called by default from Environment._step(). It operates at the state level, whereas the latter also handles the observation level.

# Parameters

  • action: The action taken in the current memory (state or history) triggering the transition.

# Returns

The transition outcome of this step.

# _step Environment

```python
_step(
    self,
    action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```

Run one step of the environment's dynamics.

By default, Environment._step() provides some boilerplate code and internally calls Environment._state_step() (which returns a transition outcome). The boilerplate code automatically stores the next state in the _memory attribute and samples a corresponding observation.

TIP

Whenever an existing environment needs to be wrapped instead of implemented fully in scikit-decide (e.g. compiled Atari games), it is recommended to override Environment._step() to call the external environment and not use the Environment._state_step() helper function.

WARNING

Before calling Environment._step() for the first time, or when the end of an episode is reached, Initializable._reset() must be called to reset the environment's state.

# Parameters

  • action: The action taken in the current memory (state or history) triggering the transition.

# Returns

The environment outcome of this step.

# UncertainTransitions

A domain must inherit this class if its dynamics is uncertain and provided as a white-box model.

Compared to pure simulation domains, uncertain transition ones additionally provide the full probability distribution of next states given a memory and action.

TIP

Uncertain transition domains are typically stateless: they do not need to store the current state or history in memory since it is usually passed as parameter of their functions. By default, they only become stateful whenever they are used as environments (e.g. via Initializable.reset() and Environment.step() functions).

# get_next_state_distribution UncertainTransitions

```python
get_next_state_distribution(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]]
) -> Distribution[D.T_state]
```

Get the probability distribution of next state given a memory and action.

# Parameters

  • memory: The source memory (state or history) of the transition.
  • action: The action taken in the given memory (state or history) triggering the transition.

# Returns

The probability distribution of next state.
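
For example, "slippery" dynamics on integer positions could be expressed as follows, implemented at the private level as _get_next_state_distribution(). This is a minimal sketch assuming a Markovian, single-agent domain; DiscreteDistribution from skdecide.core is one possible concrete Distribution, and the dynamics are invented:

```python
from skdecide.core import DiscreteDistribution

class SlipperyLineDynamics:
    """Toy dynamics: a move succeeds with probability 0.8, else the state is unchanged."""

    def _get_next_state_distribution(self, memory, action):
        # Markovian assumption: memory is just the current (integer) state
        return DiscreteDistribution([(memory + action, 0.8), (memory, 0.2)])
```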

# get_transition_value UncertainTransitions

```python
get_transition_value(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]],
    next_state: Optional[D.T_state] = None
) -> StrDict[Value[D.T_value]]
```

Get the value (reward or cost) of a transition.

The transition to consider is defined by the function parameters.

TIP

If this function never depends on the next_state parameter for its computation, it is recommended to indicate it by overriding UncertainTransitions._is_transition_value_dependent_on_next_state_() to return False. Solvers can then exploit this information to avoid computing the next state when evaluating a transition value, which is more efficient.

# Parameters

  • memory: The source memory (state or history) of the transition.
  • action: The action taken in the given memory (state or history) triggering the transition.
  • next_state: The next state in which the transition ends (if needed for the computation).

# Returns

The transition value (reward or cost).
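
Following the tip above, a domain whose value depends only on the action could both implement the value function and advertise its independence from next_state (an illustrative sketch; the cost model is invented):

```python
from skdecide.core import Value

class ActionCostDynamics:
    """Toy value model: the transition cost depends on the action only."""

    def _get_transition_value(self, memory, action, next_state=None):
        return Value(cost=abs(action))  # next_state deliberately unused

    def _is_transition_value_dependent_on_next_state_(self):
        return False  # lets solvers skip computing next_state for value queries
```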

# is_terminal UncertainTransitions

```python
is_terminal(self, state: D.T_state) -> StrDict[D.T_predicate]
```

Indicate whether a state is terminal.

A terminal state is a state with no outgoing transition (except to itself with value 0).

# Parameters

  • state: The state to consider.

# Returns

True if the state is terminal (False otherwise).

# is_transition_value_dependent_on_next_state UncertainTransitions

```python
is_transition_value_dependent_on_next_state(self) -> bool
```

Indicate whether get_transition_value() requires the next_state parameter for its computation (cached).

By default, UncertainTransitions.is_transition_value_dependent_on_next_state() internally calls UncertainTransitions._is_transition_value_dependent_on_next_state_() the first time and automatically caches its value to make future calls more efficient (since the returned value is assumed to be constant).

# Returns

True if the transition value computation depends on next_state (False otherwise).

# sample Simulation

```python
sample(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```

Sample one transition of the simulator's dynamics.

By default, Simulation.sample() provides some boilerplate code and internally calls Simulation._sample() (which returns an environment outcome). The boilerplate code automatically samples an observation corresponding to the sampled next state.

TIP

Whenever an existing simulator needs to be wrapped instead of implemented fully in scikit-decide, it is recommended to override Simulation.sample() to call the external simulator and not use the Simulation._sample() helper function.

# Parameters

  • memory: The source memory (state or history) of the transition.
  • action: The action taken in the given memory (state or history) triggering the transition.

# Returns

The environment outcome of the sampled transition.

# set_memory Simulation

```python
set_memory(self, memory: Memory[D.T_state]) -> None
```

Set the internal memory attribute _memory to the given one.

This can be useful to set a specific "starting point" before doing a rollout with successive Environment.step() calls.

# Parameters

  • memory: The memory to set internally.

# Example

```python
# Set simulation_domain memory to my_state (assuming a Markovian domain)
simulation_domain.set_memory(my_state)

# Start a 100-step rollout from here (applying my_action at every step)
for _ in range(100):
    simulation_domain.step(my_action)
```

# step Environment

```python
step(
    self,
    action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```

Run one step of the environment's dynamics.

By default, Environment.step() provides some boilerplate code and internally calls Environment._step() (which returns an environment outcome). The boilerplate code automatically stores the next state in the _memory attribute and samples a corresponding observation.

TIP

Whenever an existing environment needs to be wrapped instead of implemented fully in scikit-decide (e.g. compiled Atari games), it is recommended to override Environment.step() to call the external environment and not use the Environment._step() helper function.

WARNING

Before calling Environment.step() for the first time, or when the end of an episode is reached, Initializable.reset() must be called to reset the environment's state.

# Parameters

  • action: The action taken in the current memory (state or history) triggering the transition.

# Returns

The environment outcome of this step.

# _get_next_state_distribution UncertainTransitions

```python
_get_next_state_distribution(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]]
) -> Distribution[D.T_state]
```

Get the probability distribution of next state given a memory and action.

# Parameters

  • memory: The source memory (state or history) of the transition.
  • action: The action taken in the given memory (state or history) triggering the transition.

# Returns

The probability distribution of next state.

# _get_transition_value UncertainTransitions

```python
_get_transition_value(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]],
    next_state: Optional[D.T_state] = None
) -> StrDict[Value[D.T_value]]
```

Get the value (reward or cost) of a transition.

The transition to consider is defined by the function parameters.

TIP

If this function never depends on the next_state parameter for its computation, it is recommended to indicate it by overriding UncertainTransitions._is_transition_value_dependent_on_next_state_() to return False. Solvers can then exploit this information to avoid computing the next state when evaluating a transition value, which is more efficient.

# Parameters

  • memory: The source memory (state or history) of the transition.
  • action: The action taken in the given memory (state or history) triggering the transition.
  • next_state: The next state in which the transition ends (if needed for the computation).

# Returns

The transition value (reward or cost).

# _is_terminal UncertainTransitions

```python
_is_terminal(self, state: D.T_state) -> StrDict[D.T_predicate]
```

Indicate whether a state is terminal.

A terminal state is a state with no outgoing transition (except to itself with value 0).

# Parameters

  • state: The state to consider.

# Returns

True if the state is terminal (False otherwise).

# _is_transition_value_dependent_on_next_state UncertainTransitions

```python
_is_transition_value_dependent_on_next_state(self) -> bool
```

Indicate whether _get_transition_value() requires the next_state parameter for its computation (cached).

By default, UncertainTransitions._is_transition_value_dependent_on_next_state() internally calls UncertainTransitions._is_transition_value_dependent_on_next_state_() the first time and automatically caches its value to make future calls more efficient (since the returned value is assumed to be constant).

# Returns

True if the transition value computation depends on next_state (False otherwise).

# _is_transition_value_dependent_on_next_state_ UncertainTransitions

```python
_is_transition_value_dependent_on_next_state_(self) -> bool
```

Indicate whether _get_transition_value() requires the next_state parameter for its computation.

This is a helper function called by default from UncertainTransitions._is_transition_value_dependent_on_next_state(), the difference being that the result is not cached here.

TIP

The underscore at the end of this function's name is a convention indicating that its result should be constant.

# Returns

True if the transition value computation depends on next_state (False otherwise).

# _sample Simulation

```python
_sample(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```

Sample one transition of the simulator's dynamics.

By default, Simulation._sample() provides some boilerplate code and internally calls Simulation._state_sample() (which returns a transition outcome). The boilerplate code automatically samples an observation corresponding to the sampled next state.

TIP

Whenever an existing simulator needs to be wrapped instead of implemented fully in scikit-decide, it is recommended to override Simulation._sample() to call the external simulator and not use the Simulation._state_sample() helper function.

# Parameters

  • memory: The source memory (state or history) of the transition.
  • action: The action taken in the given memory (state or history) triggering the transition.

# Returns

The environment outcome of the sampled transition.

# _set_memory Simulation

```python
_set_memory(self, memory: Memory[D.T_state]) -> None
```

Set the internal memory attribute _memory to the given one.

This can be useful to set a specific "starting point" before doing a rollout with successive Environment._step() calls.

# Parameters

  • memory: The memory to set internally.

# Example

```python
# Set simulation_domain memory to my_state (assuming a Markovian domain)
simulation_domain._set_memory(my_state)

# Start a 100-step rollout from here (applying my_action at every step)
for _ in range(100):
    simulation_domain._step(my_action)
```

# _state_sample Simulation

```python
_state_sample(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]]
) -> TransitionOutcome[D.T_state, StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```

Compute one sample of the transition's dynamics.

This is a helper function called by default from Simulation._sample(). It operates at the state level, whereas the latter also handles the observation level.

# Parameters

  • memory: The source memory (state or history) of the transition.
  • action: The action taken in the given memory (state or history) triggering the transition.

# Returns

The transition outcome of the sampled transition.

# _state_step Environment

```python
_state_step(
    self,
    action: StrDict[list[D.T_event]]
) -> TransitionOutcome[D.T_state, StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```

Compute one step of the transition's dynamics.

This is a helper function called by default from Environment._step(). It operates at the state level, whereas the latter also handles the observation level.

# Parameters

  • action: The action taken in the current memory (state or history) triggering the transition.

# Returns

The transition outcome of this step.

# _step Environment

```python
_step(
    self,
    action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```

Run one step of the environment's dynamics.

By default, Environment._step() provides some boilerplate code and internally calls Environment._state_step() (which returns a transition outcome). The boilerplate code automatically stores the next state in the _memory attribute and samples a corresponding observation.

TIP

Whenever an existing environment needs to be wrapped instead of implemented fully in scikit-decide (e.g. compiled Atari games), it is recommended to override Environment._step() to call the external environment and not use the Environment._state_step() helper function.

WARNING

Before calling Environment._step() for the first time, or when the end of an episode is reached, Initializable._reset() must be called to reset the environment's state.

# Parameters

  • action: The action taken in the current memory (state or history) triggering the transition.

# Returns

The environment outcome of this step.

# EnumerableTransitions

A domain must inherit this class if its dynamics is uncertain (with enumerable transitions) and provided as a white-box model.

Compared to pure uncertain transition domains, enumerable transition ones guarantee that all probability distributions of next state are discrete.

TIP

Enumerable transition domains are typically stateless: they do not need to store the current state or history in memory since it is usually passed as parameter of their functions. By default, they only become stateful whenever they are used as environments (e.g. via Initializable.reset() and Environment.step() functions).
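
Concretely, discreteness means solvers can expand every successor of a given memory and action exhaustively, e.g. for dynamic programming. A hedged sketch (my_enumerable_domain, state and action are hypothetical placeholders; get_values() is assumed to be the DiscreteDistribution accessor returning (value, probability) pairs):

```python
# Hypothetical exhaustive expansion of one (state, action) pair (sketch).
# `my_enumerable_domain`, `state` and `action` are placeholders.
distribution = my_enumerable_domain.get_next_state_distribution(state, action)
for next_state, probability in distribution.get_values():
    value = my_enumerable_domain.get_transition_value(state, action, next_state)
    print(next_state, probability, value.cost)
```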

# get_next_state_distribution UncertainTransitions

```python
get_next_state_distribution(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]]
) -> DiscreteDistribution[D.T_state]
```

Get the discrete probability distribution of next state given a memory and action.

TIP

In the Markovian case (memory only holds the last state $s$), given an action $a$, this function can be mathematically represented by $P(S'|s, a)$, where $S'$ is the next state random variable.

# Parameters

  • memory: The source memory (state or history) of the transition.
  • action: The action taken in the given memory (state or history) triggering the transition.

# Returns

The discrete probability distribution of next state.

# get_transition_value UncertainTransitions

```python
get_transition_value(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]],
    next_state: Optional[D.T_state] = None
) -> StrDict[Value[D.T_value]]
```

Get the value (reward or cost) of a transition.

The transition to consider is defined by the function parameters.

TIP

If this function never depends on the next_state parameter for its computation, it is recommended to indicate it by overriding UncertainTransitions._is_transition_value_dependent_on_next_state_() to return False. Solvers can then exploit this information to avoid computing the next state when evaluating a transition value, which is more efficient.

# Parameters

  • memory: The source memory (state or history) of the transition.
  • action: The action taken in the given memory (state or history) triggering the transition.
  • next_state: The next state in which the transition ends (if needed for the computation).

# Returns

The transition value (reward or cost).

# is_terminal UncertainTransitions

```python
is_terminal(self, state: D.T_state) -> StrDict[D.T_predicate]
```

Indicate whether a state is terminal.

A terminal state is a state with no outgoing transition (except to itself with value 0).

# Parameters

  • state: The state to consider.

# Returns

True if the state is terminal (False otherwise).

# is_transition_value_dependent_on_next_state UncertainTransitions

```python
is_transition_value_dependent_on_next_state(self) -> bool
```

Indicate whether get_transition_value() requires the next_state parameter for its computation (cached).

By default, UncertainTransitions.is_transition_value_dependent_on_next_state() internally calls UncertainTransitions._is_transition_value_dependent_on_next_state_() the first time and automatically caches its value to make future calls more efficient (since the returned value is assumed to be constant).

# Returns

True if the transition value computation depends on next_state (False otherwise).

# sample Simulation

```python
sample(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```

Sample one transition of the simulator's dynamics.

By default, Simulation.sample() provides some boilerplate code and internally calls Simulation._sample() (which returns an environment outcome). The boilerplate code automatically samples an observation corresponding to the sampled next state.

TIP

Whenever an existing simulator needs to be wrapped instead of implemented fully in scikit-decide, it is recommended to override Simulation.sample() to call the external simulator and not use the Simulation._sample() helper function.

# Parameters

  • memory: The source memory (state or history) of the transition.
  • action: The action taken in the given memory (state or history) triggering the transition.

# Returns

The environment outcome of the sampled transition.

# set_memory Simulation

```python
set_memory(self, memory: Memory[D.T_state]) -> None
```

Set the internal memory attribute _memory to the given one.

This can be useful to set a specific "starting point" before doing a rollout with successive Environment.step() calls.

# Parameters

  • memory: The memory to set internally.

# Example

```python
# Set simulation_domain memory to my_state (assuming a Markovian domain)
simulation_domain.set_memory(my_state)

# Start a 100-step rollout from here (applying my_action at every step)
for _ in range(100):
    simulation_domain.step(my_action)
```

# step Environment

```python
step(
    self,
    action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```

Run one step of the environment's dynamics.

By default, Environment.step() provides some boilerplate code and internally calls Environment._step() (which returns an environment outcome). The boilerplate code automatically stores the next state in the _memory attribute and samples a corresponding observation.

TIP

Whenever an existing environment needs to be wrapped instead of implemented fully in scikit-decide (e.g. compiled Atari games), it is recommended to override Environment.step() to call the external environment and not use the Environment._step() helper function.

WARNING

Before calling Environment.step() for the first time, or when the end of an episode is reached, Initializable.reset() must be called to reset the environment's state.

# Parameters

  • action: The action taken in the current memory (state or history) triggering the transition.

# Returns

The environment outcome of this step.

# _get_next_state_distribution UncertainTransitions

```python
_get_next_state_distribution(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]]
) -> DiscreteDistribution[D.T_state]
```

Get the discrete probability distribution of next state given a memory and action.

TIP

In the Markovian case (memory only holds the last state $s$), given an action $a$, this function can be mathematically represented by $P(S'|s, a)$, where $S'$ is the next state random variable.

# Parameters

  • memory: The source memory (state or history) of the transition.
  • action: The action taken in the given memory (state or history) triggering the transition.

# Returns

The discrete probability distribution of next state.

# _get_transition_value UncertainTransitions

```python
_get_transition_value(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]],
    next_state: Optional[D.T_state] = None
) -> StrDict[Value[D.T_value]]
```

Get the value (reward or cost) of a transition.

The transition to consider is defined by the function parameters.

TIP

If this function never depends on the next_state parameter for its computation, it is recommended to indicate it by overriding UncertainTransitions._is_transition_value_dependent_on_next_state_() to return False. Solvers can then exploit this information to avoid computing the next state when evaluating a transition value, which is more efficient.

# Parameters

  • memory: The source memory (state or history) of the transition.
  • action: The action taken in the given memory (state or history) triggering the transition.
  • next_state: The next state in which the transition ends (if needed for the computation).

# Returns

The transition value (reward or cost).

# _is_terminal UncertainTransitions

```python
_is_terminal(self, state: D.T_state) -> StrDict[D.T_predicate]
```

Indicate whether a state is terminal.

A terminal state is a state with no outgoing transition (except to itself with value 0).

# Parameters

  • state: The state to consider.

# Returns

True if the state is terminal (False otherwise).

# _is_transition_value_dependent_on_next_state UncertainTransitions

```python
_is_transition_value_dependent_on_next_state(self) -> bool
```

Indicate whether _get_transition_value() requires the next_state parameter for its computation (cached).

By default, UncertainTransitions._is_transition_value_dependent_on_next_state() internally calls UncertainTransitions._is_transition_value_dependent_on_next_state_() the first time and automatically caches its value to make future calls more efficient (since the returned value is assumed to be constant).

# Returns

True if the transition value computation depends on next_state (False otherwise).

# _is_transition_value_dependent_on_next_state_ UncertainTransitions

```python
_is_transition_value_dependent_on_next_state_(self) -> bool
```

Indicate whether _get_transition_value() requires the next_state parameter for its computation.

This is a helper function called by default from UncertainTransitions._is_transition_value_dependent_on_next_state(), the difference being that the result is not cached here.

TIP

The underscore at the end of this function's name is a convention indicating that its result should be constant.

# Returns

True if the transition value computation depends on next_state (False otherwise).

# _sample Simulation

```python
_sample(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```

Sample one transition of the simulator's dynamics.

By default, Simulation._sample() provides some boilerplate code and internally calls Simulation._state_sample() (which returns a transition outcome). The boilerplate code automatically samples an observation corresponding to the sampled next state.

TIP

Whenever an existing simulator needs to be wrapped instead of implemented fully in scikit-decide, it is recommended to override Simulation._sample() to call the external simulator and not use the Simulation._state_sample() helper function.

# Parameters

  • memory: The source memory (state or history) of the transition.
  • action: The action taken in the given memory (state or history) triggering the transition.

# Returns

The environment outcome of the sampled transition.

# _set_memory Simulation

```python
_set_memory(self, memory: Memory[D.T_state]) -> None
```

Set the internal memory attribute _memory to the given one.

This can be useful to set a specific "starting point" before doing a rollout with successive Environment._step() calls.

# Parameters

  • memory: The memory to set internally.

# Example

```python
# Set simulation_domain memory to my_state (assuming a Markovian domain)
simulation_domain._set_memory(my_state)

# Start a 100-step rollout from here (applying my_action at every step)
for _ in range(100):
    simulation_domain._step(my_action)
```

# _state_sample Simulation

```python
_state_sample(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]]
) -> TransitionOutcome[D.T_state, StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```

Compute one sample of the transition's dynamics.

This is a helper function called by default from Simulation._sample(). It operates at the state level, whereas the latter also handles the observation level.

# Parameters

  • memory: The source memory (state or history) of the transition.
  • action: The action taken in the given memory (state or history) triggering the transition.

# Returns

The transition outcome of the sampled transition.

# _state_step Environment

```python
_state_step(
    self,
    action: StrDict[list[D.T_event]]
) -> TransitionOutcome[D.T_state, StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```

Compute one step of the transition's dynamics.

This is a helper function called by default from Environment._step(). It operates at the state level, whereas the latter also handles the observation level.

# Parameters

  • action: The action taken in the current memory (state or history) triggering the transition.

# Returns

The transition outcome of this step.

# _step Environment

```python
_step(
    self,
    action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```

Run one step of the environment's dynamics.

By default, Environment._step() provides some boilerplate code and internally calls Environment._state_step() (which returns a transition outcome). The boilerplate code automatically stores the next state in the _memory attribute and samples a corresponding observation.

TIP

Whenever an existing environment needs to be wrapped instead of implemented fully in scikit-decide (e.g. compiled Atari games), it is recommended to override Environment._step() to call the external environment and not use the Environment._state_step() helper function.

WARNING

Before calling Environment._step() for the first time, or when the end of an episode is reached, Initializable._reset() must be called to reset the environment's state.

# Parameters

  • action: The action taken in the current memory (state or history) triggering the transition.

# Returns

The environment outcome of this step.

# DeterministicTransitions

A domain must inherit this class if its dynamics is deterministic and provided as a white-box model.

Compared to pure enumerable transition domains, deterministic transition ones guarantee that there is only one next state for a given source memory (state or history) and action.

TIP

Deterministic transition domains are typically stateless: they do not need to store the current state or history in memory since it is usually passed as parameter of their functions. By default, they only become stateful whenever they are used as environments (e.g. via Initializable.reset() and Environment.step() functions).

# get_next_state DeterministicTransitions

```python
get_next_state(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]]
) -> D.T_state
```

Get the next state given a memory and action.

# Parameters

  • memory: The source memory (state or history) of the transition.
  • action: The action taken in the given memory (state or history) triggering the transition.

# Returns

The deterministic next state.
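
For example, deterministic dynamics on a bounded 1-D grid could be implemented at the private level as _get_next_state() (a minimal sketch assuming a Markovian, single-agent domain with integer states and -1/+1 actions; the bounds are invented):

```python
class LineWalkDynamics:
    """Toy deterministic dynamics: integer positions clamped to [0, 10]."""

    def _get_next_state(self, memory, action):
        # Markovian assumption: memory is just the current state
        return min(max(memory + action, 0), 10)
```

The default _get_next_state_distribution() then simply wraps this result in a SingleValueDistribution, as reflected in its return type below.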

# get_next_state_distribution UncertainTransitions

```python
get_next_state_distribution(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]]
) -> DiscreteDistribution[D.T_state]
```

Get the discrete probability distribution of next state given a memory and action.

TIP

In the Markovian case (memory only holds the last state $s$), given an action $a$, this function can be mathematically represented by $P(S'|s, a)$, where $S'$ is the next state random variable.

# Parameters

  • memory: The source memory (state or history) of the transition.
  • action: The action taken in the given memory (state or history) triggering the transition.

# Returns

The discrete probability distribution of next state.

# get_transition_value UncertainTransitions

```python
get_transition_value(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]],
    next_state: Optional[D.T_state] = None
) -> StrDict[Value[D.T_value]]
```

Get the value (reward or cost) of a transition.

The transition to consider is defined by the function parameters.

TIP

If this function never depends on the next_state parameter for its computation, it is recommended to indicate it by overriding UncertainTransitions._is_transition_value_dependent_on_next_state_() to return False. Solvers can then exploit this information to avoid computing the next state when evaluating a transition value, which is more efficient.

# Parameters

  • memory: The source memory (state or history) of the transition.
  • action: The action taken in the given memory (state or history) triggering the transition.
  • next_state: The next state in which the transition ends (if needed for the computation).

# Returns

The transition value (reward or cost).

# is_terminal UncertainTransitions

```python
is_terminal(self, state: D.T_state) -> StrDict[D.T_predicate]
```

Indicate whether a state is terminal.

A terminal state is a state with no outgoing transition (except to itself with value 0).

# Parameters

  • state: The state to consider.

# Returns

True if the state is terminal (False otherwise).

# is_transition_value_dependent_on_next_state UncertainTransitions

```python
is_transition_value_dependent_on_next_state(self) -> bool
```

Indicate whether get_transition_value() requires the next_state parameter for its computation (cached).

By default, UncertainTransitions.is_transition_value_dependent_on_next_state() internally calls UncertainTransitions._is_transition_value_dependent_on_next_state_() the first time and automatically caches its value to make future calls more efficient (since the returned value is assumed to be constant).

# Returns

True if the transition value computation depends on next_state (False otherwise).

# sample Simulation

```python
sample(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```

Sample one transition of the simulator's dynamics.

By default, Simulation.sample() provides some boilerplate code and internally calls Simulation._sample() (which returns an environment outcome). The boilerplate code automatically samples an observation corresponding to the sampled next state.

TIP

Whenever an existing simulator needs to be wrapped instead of implemented fully in scikit-decide, it is recommended to override Simulation.sample() to call the external simulator and not use the Simulation._sample() helper function.

# Parameters

  • memory: The source memory (state or history) of the transition.
  • action: The action taken in the given memory (state or history) triggering the transition.

# Returns

The environment outcome of the sampled transition.

# set_memory Simulation

```python
set_memory(self, memory: Memory[D.T_state]) -> None
```

Set the internal memory attribute _memory to the given one.

This can be useful to set a specific "starting point" before doing a rollout with successive Environment.step() calls.

# Parameters

  • memory: The memory to set internally.

# Example

```python
# Set simulation_domain memory to my_state (assuming a Markovian domain)
simulation_domain.set_memory(my_state)

# Start a 100-step rollout from here (applying my_action at every step)
for _ in range(100):
    simulation_domain.step(my_action)
```

# step Environment

```python
step(
    self,
    action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```

Run one step of the environment's dynamics.

By default, Environment.step() provides some boilerplate code and internally calls Environment._step() (which returns an environment outcome). The boilerplate code automatically stores the next state in the _memory attribute and samples a corresponding observation.

TIP

Whenever an existing environment needs to be wrapped instead of implemented fully in scikit-decide (e.g. compiled Atari games), it is recommended to override Environment.step() to call the external environment and not use the Environment._step() helper function.

WARNING

Before calling Environment.step() for the first time, or when the end of an episode is reached, Initializable.reset() must be called to reset the environment's state.

# Parameters

  • action: The action taken in the current memory (state or history) triggering the transition.

# Returns

The environment outcome of this step.

# _get_next_state DeterministicTransitions

```python
_get_next_state(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]]
) -> D.T_state
```

Get the next state given a memory and action.

# Parameters

  • memory: The source memory (state or history) of the transition.
  • action: The action taken in the given memory (state or history) triggering the transition.

# Returns

The deterministic next state.

# _get_next_state_distribution UncertainTransitions

```python
_get_next_state_distribution(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]]
) -> SingleValueDistribution[D.T_state]
```

Get the discrete probability distribution of next state given a memory and action.

TIP

In the Markovian case (memory only holds the last state $s$), given an action $a$, this function can be mathematically represented by $P(S'|s, a)$, where $S'$ is the next state random variable.

# Parameters

  • memory: The source memory (state or history) of the transition.
  • action: The action taken in the given memory (state or history) triggering the transition.

# Returns

The discrete probability distribution of next state.

# _get_transition_value UncertainTransitions

```python
_get_transition_value(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]],
    next_state: Optional[D.T_state] = None
) -> StrDict[Value[D.T_value]]
```

Get the value (reward or cost) of a transition.

The transition to consider is defined by the function parameters.

TIP

If this function never depends on the next_state parameter for its computation, it is recommended to indicate it by overriding UncertainTransitions._is_transition_value_dependent_on_next_state_() to return False. Solvers can then exploit this information to avoid computing the next state when evaluating a transition value, which is more efficient.

# Parameters

  • memory: The source memory (state or history) of the transition.
  • action: The action taken in the given memory (state or history) triggering the transition.
  • next_state: The next state in which the transition ends (if needed for the computation).

# Returns

The transition value (reward or cost).

# _is_terminal UncertainTransitions

```python
_is_terminal(self, state: D.T_state) -> StrDict[D.T_predicate]
```

Indicate whether a state is terminal.

A terminal state is a state with no outgoing transition (except to itself with value 0).

# Parameters

  • state: The state to consider.

# Returns

True if the state is terminal (False otherwise).

# _is_transition_value_dependent_on_next_state UncertainTransitions

```python
_is_transition_value_dependent_on_next_state(self) -> bool
```

Indicate whether _get_transition_value() requires the next_state parameter for its computation (cached).

By default, UncertainTransitions._is_transition_value_dependent_on_next_state() internally calls UncertainTransitions._is_transition_value_dependent_on_next_state_() the first time and automatically caches its value to make future calls more efficient (since the returned value is assumed to be constant).

# Returns

True if the transition value computation depends on next_state (False otherwise).

# _is_transition_value_dependent_on_next_state_ UncertainTransitions

```python
_is_transition_value_dependent_on_next_state_(self) -> bool
```

Indicate whether _get_transition_value() requires the next_state parameter for its computation.

This is a helper function called by default from UncertainTransitions._is_transition_value_dependent_on_next_state(), the difference being that the result is not cached here.

TIP

The underscore at the end of this function's name is a convention indicating that its result should be constant.

# Returns

True if the transition value computation depends on next_state (False otherwise).

# _sample Simulation

```python
_sample(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```

Sample one transition of the simulator's dynamics.

By default, Simulation._sample() provides some boilerplate code and internally calls Simulation._state_sample() (which returns a transition outcome). The boilerplate code automatically samples an observation corresponding to the sampled next state.

TIP

Whenever an existing simulator needs to be wrapped instead of implemented fully in scikit-decide, it is recommended to override Simulation._sample() to call the external simulator and not use the Simulation._state_sample() helper function.

# Parameters

  • memory: The source memory (state or history) of the transition.
  • action: The action taken in the given memory (state or history) triggering the transition.

# Returns

The environment outcome of the sampled transition.

# _set_memory Simulation

```python
_set_memory(self, memory: Memory[D.T_state]) -> None
```

Set the internal memory attribute _memory to the given one.

This can be useful to set a specific "starting point" before doing a rollout with successive Environment._step() calls.

# Parameters

  • memory: The memory to set internally.

# Example

```python
# Set simulation_domain memory to my_state (assuming a Markovian domain)
simulation_domain._set_memory(my_state)

# Start a 100-step rollout from here (applying my_action at every step)
for _ in range(100):
    simulation_domain._step(my_action)
```

# _state_sample Simulation

```python
_state_sample(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]]
) -> TransitionOutcome[D.T_state, StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```

Compute one sample of the transition's dynamics.

This is a helper function called by default from Simulation._sample(). It operates at the state level, whereas the latter also handles the observation level.

# Parameters

  • memory: The source memory (state or history) of the transition.
  • action: The action taken in the given memory (state or history) triggering the transition.

# Returns

The transition outcome of the sampled transition.

# _state_step Environment

```python
_state_step(
    self,
    action: StrDict[list[D.T_event]]
) -> TransitionOutcome[D.T_state, StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```

Compute one step of the transition's dynamics.

This is a helper function called by default from Environment._step(). It operates at the state level, whereas the latter also handles the observation level.

# Parameters

  • action: The action taken in the current memory (state or history) triggering the transition.

# Returns

The transition outcome of this step.

# _step Environment

```python
_step(
    self,
    action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```

Run one step of the environment's dynamics.

By default, Environment._step() provides some boilerplate code and internally calls Environment._state_step() (which returns a transition outcome). The boilerplate code automatically stores the next state in the _memory attribute and samples a corresponding observation.

TIP

Whenever an existing environment needs to be wrapped instead of implemented fully in scikit-decide (e.g. compiled Atari games), it is recommended to override Environment._step() to call the external environment and not use the Environment._state_step() helper function.

WARNING

Before calling Environment._step() for the first time, or when the end of an episode is reached, Initializable._reset() must be called to reset the environment's state.

# Parameters

  • action: The action taken in the current memory (state or history) triggering the transition.

# Returns

The environment outcome of this step.