# builders.domain.dynamics
Domain specification
# Environment
A domain must inherit this class if agents interact with it like a black-box environment.
Examples of black-box environments include the real world, compiled ATARI games, etc.
TIP
Environment domains are typically stateful: they must keep the current state or history in their memory to compute next steps (automatically done by default in the _memory attribute).
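To make this concrete, here is a minimal sketch of an environment-style domain assembled from the builder classes, in the spirit of the scikit-decide tutorials. The domain name, its dynamics, and the exact characteristic mix are illustrative assumptions (not part of the library); only the dynamics and initialization helpers are implemented, and space-related methods are omitted for brevity.

```python
import random

from skdecide import Domain, TransitionOutcome, Value
from skdecide.builders.domain import (
    Environment, FullyObservable, Initializable, Markovian, Rewards, Sequential, SingleAgent
)

# Hypothetical black-box environment: a noisy 1-D walk.
class D(Domain, SingleAgent, Sequential, Environment, Initializable, Markovian, FullyObservable, Rewards):
    T_state = int        # position on a line
    T_observation = int  # fully observable: observation == state
    T_event = int        # action: -1 or +1
    T_value = float
    T_predicate = bool
    T_info = None

class NoisyWalkDomain(D):
    def _state_reset(self) -> D.T_state:
        self._x = 0  # internal "black-box" state
        return self._x

    def _state_step(self, action: D.T_event) -> TransitionOutcome[D.T_state, Value[D.T_value], D.T_predicate, D.T_info]:
        self._x += action + random.choice([-1, 0, 1])  # hidden stochastic dynamics
        return TransitionOutcome(
            state=self._x,
            value=Value(reward=-abs(self._x)),
            termination=abs(self._x) >= 10,
            info=None,
        )
```

Since the domain is declared SingleAgent and Markovian, the StrDict and Memory wrappers in the generic signatures below reduce to plain values.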
# step Environment
```python
step(
    self,
    action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```
Run one step of the environment's dynamics.
By default, Environment.step() provides some boilerplate code and internally calls Environment._step() (which returns an environment outcome). The boilerplate code automatically stores the next state into the _memory attribute and samples a corresponding observation.
TIP
Whenever an existing environment needs to be wrapped instead of implemented fully in scikit-decide (e.g. compiled ATARI games), it is recommended to override Environment.step() to call the external environment rather than use the Environment._step() helper function.
WARNING
Before calling Environment.step() for the first time or when the end of an episode is reached, Initializable.reset() must be called to reset the environment's state.
# Parameters
- action: The action taken in the current memory (state or history) triggering the transition.
# Returns
The environment outcome of this step.
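A short usage sketch of the reset/step contract described above (NoisyWalkDomain is the hypothetical domain from the earlier sketch):

```python
domain = NoisyWalkDomain()
observation = domain.reset()  # required before the first step() and after each episode

for _ in range(1000):
    outcome = domain.step(+1)  # EnvironmentOutcome with .observation, .value, .termination, .info
    if outcome.termination:
        observation = domain.reset()
```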
# _state_step Environment
```python
_state_step(
    self,
    action: StrDict[list[D.T_event]]
) -> TransitionOutcome[D.T_state, StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```
Compute one step of the transition's dynamics.
This is a helper function called by default from Environment._step(). It works at the state level, whereas the latter works at the observation level.
# Parameters
- action: The action taken in the current memory (state or history) triggering the transition.
# Returns
The transition outcome of this step.
# _step Environment
```python
_step(
    self,
    action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```
Run one step of the environment's dynamics.
By default, Environment._step() provides some boilerplate code and internally calls Environment._state_step() (which returns a transition outcome). The boilerplate code automatically stores the next state into the _memory attribute and samples a corresponding observation.
TIP
Whenever an existing environment needs to be wrapped instead of implemented fully in scikit-decide (e.g. compiled ATARI games), it is recommended to override Environment._step() to call the external environment rather than use the Environment._state_step() helper function.
WARNING
Before calling Environment._step() for the first time or when the end of an episode is reached, Initializable._reset() must be called to reset the environment's state.
# Parameters
- action: The action taken in the current memory (state or history) triggering the transition.
# Returns
The environment outcome of this step.
# Simulation
A domain must inherit this class if agents interact with it like a simulation.
Compared to pure environment domains, simulation ones have the additional ability to sample transitions from any given state.
TIP
Simulation domains are typically stateless: they do not need to store the current state or history in memory since it is usually passed as a parameter of their functions. By default, they only become stateful whenever they are used as environments (e.g. via the Initializable.reset() and Environment.step() functions).
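Because transitions can be sampled from any given state, simulation domains support simple lookahead schemes without disturbing the current episode. A sketch, assuming a single-agent Markovian simulation domain (so memory is just a state) and illustrative names:

```python
# Greedy one-step lookahead: sample each candidate action once from the same state.
def best_action(simulation_domain, state, candidate_actions):
    outcomes = {a: simulation_domain.sample(state, a) for a in candidate_actions}
    # Pick the action whose sampled outcome has the highest reward.
    return max(outcomes, key=lambda a: outcomes[a].value.reward)
```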
# sample Simulation
```python
sample(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```
Sample one transition of the simulator's dynamics.
By default, Simulation.sample() provides some boilerplate code and internally calls Simulation._sample() (which returns an environment outcome). The boilerplate code automatically samples an observation corresponding to the sampled next state.
TIP
Whenever an existing simulator needs to be wrapped instead of implemented fully in scikit-decide, it is recommended to override Simulation.sample() to call the external simulator rather than use the Simulation._sample() helper function.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The environment outcome of the sampled transition.
# set_memory Simulation
```python
set_memory(
    self,
    memory: Memory[D.T_state]
) -> None
```
Set the internal memory attribute _memory to the given one.
This can be useful to set a specific "starting point" before doing a rollout with successive Environment.step() calls.
# Parameters
- memory: The memory to set internally.
# Example
```python
# Set simulation_domain memory to my_state (assuming a Markovian domain)
simulation_domain.set_memory(my_state)

# Start a 100-step rollout from here (applying my_action at every step)
for _ in range(100):
    simulation_domain.step(my_action)
```
# step Environment
```python
step(
    self,
    action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```
Run one step of the environment's dynamics.
By default, Environment.step() provides some boilerplate code and internally calls Environment._step() (which returns an environment outcome). The boilerplate code automatically stores the next state into the _memory attribute and samples a corresponding observation.
TIP
Whenever an existing environment needs to be wrapped instead of implemented fully in scikit-decide (e.g. compiled ATARI games), it is recommended to override Environment.step() to call the external environment rather than use the Environment._step() helper function.
WARNING
Before calling Environment.step() for the first time or when the end of an episode is reached, Initializable.reset() must be called to reset the environment's state.
# Parameters
- action: The action taken in the current memory (state or history) triggering the transition.
# Returns
The environment outcome of this step.
# _sample Simulation
```python
_sample(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```
Sample one transition of the simulator's dynamics.
By default, Simulation._sample() provides some boilerplate code and internally calls Simulation._state_sample() (which returns a transition outcome). The boilerplate code automatically samples an observation corresponding to the sampled next state.
TIP
Whenever an existing simulator needs to be wrapped instead of implemented fully in scikit-decide, it is recommended to override Simulation._sample() to call the external simulator rather than use the Simulation._state_sample() helper function.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The environment outcome of the sampled transition.
# _set_memory Simulation
```python
_set_memory(
    self,
    memory: Memory[D.T_state]
) -> None
```
Set the internal memory attribute _memory to the given one.
This can be useful to set a specific "starting point" before doing a rollout with successive Environment._step() calls.
# Parameters
- memory: The memory to set internally.
# Example
```python
# Set simulation_domain memory to my_state (assuming a Markovian domain)
simulation_domain._set_memory(my_state)

# Start a 100-step rollout from here (applying my_action at every step)
for _ in range(100):
    simulation_domain._step(my_action)
```
# _state_sample Simulation
```python
_state_sample(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]]
) -> TransitionOutcome[D.T_state, StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```
Compute one sample of the transition's dynamics.
This is a helper function called by default from Simulation._sample(). It works at the state level, whereas the latter works at the observation level.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The transition outcome of the sampled transition.
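For a white-box stochastic model, _state_sample() is typically the only dynamics method to implement. A sketch for the hypothetical noisy walk used earlier (single-agent and Markovian, so memory is the last state and the generic wrappers disappear):

```python
import random

from skdecide import TransitionOutcome, Value

class NoisyWalkSimulation(D):  # D: a Simulation-based mix of builder classes, analogous to the earlier sketch
    def _state_sample(self, memory: D.T_state, action: D.T_event) -> TransitionOutcome[D.T_state, Value[D.T_value], D.T_predicate, D.T_info]:
        next_state = memory + action + random.choice([-1, 0, 1])
        return TransitionOutcome(
            state=next_state,
            value=Value(reward=-abs(next_state)),
            termination=abs(next_state) >= 10,
            info=None,
        )
```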
# _state_step Environment
```python
_state_step(
    self,
    action: StrDict[list[D.T_event]]
) -> TransitionOutcome[D.T_state, StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```
Compute one step of the transition's dynamics.
This is a helper function called by default from Environment._step(). It works at the state level, whereas the latter works at the observation level.
# Parameters
- action: The action taken in the current memory (state or history) triggering the transition.
# Returns
The transition outcome of this step.
# _step Environment
```python
_step(
    self,
    action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```
Run one step of the environment's dynamics.
By default, Environment._step() provides some boilerplate code and internally calls Environment._state_step() (which returns a transition outcome). The boilerplate code automatically stores the next state into the _memory attribute and samples a corresponding observation.
TIP
Whenever an existing environment needs to be wrapped instead of implemented fully in scikit-decide (e.g. compiled ATARI games), it is recommended to override Environment._step() to call the external environment rather than use the Environment._state_step() helper function.
WARNING
Before calling Environment._step() for the first time or when the end of an episode is reached, Initializable._reset() must be called to reset the environment's state.
# Parameters
- action: The action taken in the current memory (state or history) triggering the transition.
# Returns
The environment outcome of this step.
# UncertainTransitions
A domain must inherit this class if its dynamics is uncertain and provided as a white-box model.
Compared to pure simulation domains, uncertain transition ones additionally provide the full probability distribution of next states given a memory and action.
TIP
Uncertain transition domains are typically stateless: they do not need to store the current state or history in memory since it is usually passed as a parameter of their functions. By default, they only become stateful whenever they are used as environments (e.g. via the Initializable.reset() and Environment.step() functions).
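A sketch of the extra ability this class adds: describing the next-state distribution explicitly rather than merely sampling it. It assumes ImplicitDistribution (a Distribution defined by a sampling function) is available from skdecide.core; the drifting dynamics and names are illustrative, and only the distribution method is shown.

```python
import random

from skdecide.core import ImplicitDistribution

class DriftingWalkDomain(D):  # D: an UncertainTransitions-based mix of builder classes (assumed)
    def _get_next_state_distribution(self, memory: D.T_state, action: D.T_event):
        # Next state = intended move plus rounded Gaussian drift, given as a sampling function.
        return ImplicitDistribution(lambda: memory + action + round(random.gauss(0, 1)))
```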
# get_next_state_distribution UncertainTransitions
```python
get_next_state_distribution(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]]
) -> Distribution[D.T_state]
```
Get the probability distribution of next state given a memory and action.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The probability distribution of next state.
# get_transition_value UncertainTransitions
```python
get_transition_value(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]],
    next_state: Optional[D.T_state] = None
) -> StrDict[Value[D.T_value]]
```
Get the value (reward or cost) of a transition.
The transition to consider is defined by the function parameters.
TIP
If this function never depends on the next_state parameter for its computation, it is recommended to indicate it by overriding UncertainTransitions._is_transition_value_dependent_on_next_state_() to return False. This information can then be exploited by solvers to avoid computing the next state when evaluating a transition value, which is more efficient.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
- next_state: The next state in which the transition ends (if needed for the computation).
# Returns
The transition value (reward or cost).
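A sketch of a typical override: a unit-cost model that ignores next_state (class names assumed, single-agent signature):

```python
from typing import Optional

from skdecide import Value

class UnitCostWalkDomain(D):  # D: an UncertainTransitions-based mix of builder classes (assumed)
    def _get_transition_value(
        self, memory: D.T_state, action: D.T_event, next_state: Optional[D.T_state] = None
    ) -> Value[D.T_value]:
        return Value(cost=1)  # every transition costs 1, regardless of next_state
```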
# is_terminal UncertainTransitions
```python
is_terminal(
    self,
    state: D.T_state
) -> StrDict[D.T_predicate]
```
Indicate whether a state is terminal.
A terminal state is a state with no outgoing transition (except to itself with value 0).
# Parameters
- state: The state to consider.
# Returns
True if the state is terminal (False otherwise).
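For instance, in the hypothetical walk domain used throughout these sketches, the boundary states would be terminal:

```python
class BoundedWalkDomain(D):  # D as in the earlier sketches (assumed)
    def _is_terminal(self, state: D.T_state) -> bool:
        return abs(state) >= 10  # absorbing boundary: no outgoing transitions from here
```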
# is_transition_value_dependent_on_next_state UncertainTransitions
```python
is_transition_value_dependent_on_next_state(
    self
) -> bool
```
Indicate whether get_transition_value() requires the next_state parameter for its computation (cached).
By default, UncertainTransitions.is_transition_value_dependent_on_next_state() internally calls UncertainTransitions._is_transition_value_dependent_on_next_state_() the first time and automatically caches its value to make future calls more efficient (since the returned value is assumed to be constant).
# Returns
True if the transition value computation depends on next_state (False otherwise).
# sample Simulation
```python
sample(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```
Sample one transition of the simulator's dynamics.
By default, Simulation.sample() provides some boilerplate code and internally calls Simulation._sample() (which returns an environment outcome). The boilerplate code automatically samples an observation corresponding to the sampled next state.
TIP
Whenever an existing simulator needs to be wrapped instead of implemented fully in scikit-decide, it is recommended to override Simulation.sample() to call the external simulator rather than use the Simulation._sample() helper function.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The environment outcome of the sampled transition.
# set_memory Simulation
```python
set_memory(
    self,
    memory: Memory[D.T_state]
) -> None
```
Set the internal memory attribute _memory to the given one.
This can be useful to set a specific "starting point" before doing a rollout with successive Environment.step() calls.
# Parameters
- memory: The memory to set internally.
# Example
```python
# Set simulation_domain memory to my_state (assuming a Markovian domain)
simulation_domain.set_memory(my_state)

# Start a 100-step rollout from here (applying my_action at every step)
for _ in range(100):
    simulation_domain.step(my_action)
```
# step Environment
```python
step(
    self,
    action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```
Run one step of the environment's dynamics.
By default, Environment.step() provides some boilerplate code and internally calls Environment._step() (which returns an environment outcome). The boilerplate code automatically stores the next state into the _memory attribute and samples a corresponding observation.
TIP
Whenever an existing environment needs to be wrapped instead of implemented fully in scikit-decide (e.g. compiled ATARI games), it is recommended to override Environment.step() to call the external environment rather than use the Environment._step() helper function.
WARNING
Before calling Environment.step() for the first time or when the end of an episode is reached, Initializable.reset() must be called to reset the environment's state.
# Parameters
- action: The action taken in the current memory (state or history) triggering the transition.
# Returns
The environment outcome of this step.
# _get_next_state_distribution UncertainTransitions
```python
_get_next_state_distribution(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]]
) -> Distribution[D.T_state]
```
Get the probability distribution of next state given a memory and action.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The probability distribution of next state.
# _get_transition_value UncertainTransitions
```python
_get_transition_value(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]],
    next_state: Optional[D.T_state] = None
) -> StrDict[Value[D.T_value]]
```
Get the value (reward or cost) of a transition.
The transition to consider is defined by the function parameters.
TIP
If this function never depends on the next_state parameter for its computation, it is recommended to indicate it by overriding UncertainTransitions._is_transition_value_dependent_on_next_state_() to return False. This information can then be exploited by solvers to avoid computing the next state when evaluating a transition value, which is more efficient.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
- next_state: The next state in which the transition ends (if needed for the computation).
# Returns
The transition value (reward or cost).
# _is_terminal UncertainTransitions
```python
_is_terminal(
    self,
    state: D.T_state
) -> StrDict[D.T_predicate]
```
Indicate whether a state is terminal.
A terminal state is a state with no outgoing transition (except to itself with value 0).
# Parameters
- state: The state to consider.
# Returns
True if the state is terminal (False otherwise).
# _is_transition_value_dependent_on_next_state UncertainTransitions
```python
_is_transition_value_dependent_on_next_state(
    self
) -> bool
```
Indicate whether _get_transition_value() requires the next_state parameter for its computation (cached).
By default, UncertainTransitions._is_transition_value_dependent_on_next_state() internally calls UncertainTransitions._is_transition_value_dependent_on_next_state_() the first time and automatically caches its value to make future calls more efficient (since the returned value is assumed to be constant).
# Returns
True if the transition value computation depends on next_state (False otherwise).
# _is_transition_value_dependent_on_next_state_ UncertainTransitions
```python
_is_transition_value_dependent_on_next_state_(
    self
) -> bool
```
Indicate whether _get_transition_value() requires the next_state parameter for its computation.
This is a helper function called by default from UncertainTransitions._is_transition_value_dependent_on_next_state(), the difference being that the result is not cached here.
TIP
The trailing underscore in this function's name is a convention indicating that its result should be constant.
# Returns
True if the transition value computation depends on next_state (False otherwise).
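Following the TIP above, the unit-cost sketch shown earlier (whose transition value ignores next_state) would override the helper like this:

```python
class UnitCostWalkDomain(D):  # continuing the hypothetical unit-cost sketch
    def _is_transition_value_dependent_on_next_state_(self) -> bool:
        return False  # lets solvers evaluate transition values without computing next states
```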
# _sample Simulation
```python
_sample(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```
Sample one transition of the simulator's dynamics.
By default, Simulation._sample() provides some boilerplate code and internally calls Simulation._state_sample() (which returns a transition outcome). The boilerplate code automatically samples an observation corresponding to the sampled next state.
TIP
Whenever an existing simulator needs to be wrapped instead of implemented fully in scikit-decide, it is recommended to override Simulation._sample() to call the external simulator rather than use the Simulation._state_sample() helper function.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The environment outcome of the sampled transition.
# _set_memory Simulation
```python
_set_memory(
    self,
    memory: Memory[D.T_state]
) -> None
```
Set the internal memory attribute _memory to the given one.
This can be useful to set a specific "starting point" before doing a rollout with successive Environment._step() calls.
# Parameters
- memory: The memory to set internally.
# Example
```python
# Set simulation_domain memory to my_state (assuming a Markovian domain)
simulation_domain._set_memory(my_state)

# Start a 100-step rollout from here (applying my_action at every step)
for _ in range(100):
    simulation_domain._step(my_action)
```
# _state_sample Simulation
```python
_state_sample(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]]
) -> TransitionOutcome[D.T_state, StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```
Compute one sample of the transition's dynamics.
This is a helper function called by default from Simulation._sample(). It works at the state level, whereas the latter works at the observation level.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The transition outcome of the sampled transition.
# _state_step Environment
```python
_state_step(
    self,
    action: StrDict[list[D.T_event]]
) -> TransitionOutcome[D.T_state, StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```
Compute one step of the transition's dynamics.
This is a helper function called by default from Environment._step(). It works at the state level, whereas the latter works at the observation level.
# Parameters
- action: The action taken in the current memory (state or history) triggering the transition.
# Returns
The transition outcome of this step.
# _step Environment
```python
_step(
    self,
    action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```
Run one step of the environment's dynamics.
By default, Environment._step() provides some boilerplate code and internally calls Environment._state_step() (which returns a transition outcome). The boilerplate code automatically stores the next state into the _memory attribute and samples a corresponding observation.
TIP
Whenever an existing environment needs to be wrapped instead of implemented fully in scikit-decide (e.g. compiled ATARI games), it is recommended to override Environment._step() to call the external environment rather than use the Environment._state_step() helper function.
WARNING
Before calling Environment._step() for the first time or when the end of an episode is reached, Initializable._reset() must be called to reset the environment's state.
# Parameters
- action: The action taken in the current memory (state or history) triggering the transition.
# Returns
The environment outcome of this step.
# EnumerableTransitions
A domain must inherit this class if its dynamics is uncertain (with enumerable transitions) and provided as a white-box model.
Compared to pure uncertain transition domains, enumerable transition ones guarantee that all probability distributions of next state are discrete.
TIP
Enumerable transition domains are typically stateless: they do not need to store the current state or history in memory since it is usually passed as a parameter of their functions. By default, they only become stateful whenever they are used as environments (e.g. via the Initializable.reset() and Environment.step() functions).
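A sketch of the discreteness guarantee, assuming DiscreteDistribution (built from a list of (value, probability) pairs) is available from skdecide.core; the slipping dynamics are illustrative:

```python
from skdecide.core import DiscreteDistribution

class SlipperyWalkDomain(D):  # D: an EnumerableTransitions-based mix of builder classes (assumed)
    def _get_next_state_distribution(self, memory: D.T_state, action: D.T_event) -> DiscreteDistribution[D.T_state]:
        # Move as intended with probability 0.8, otherwise stay in place.
        return DiscreteDistribution([(memory + action, 0.8), (memory, 0.2)])
```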
# get_next_state_distribution UncertainTransitions
```python
get_next_state_distribution(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]]
) -> DiscreteDistribution[D.T_state]
```
Get the discrete probability distribution of next state given a memory and action.
TIP
In the Markovian case (memory only holds the last state s), given an action a, this function describes the distribution P(S'|s, a), where S' is the random variable of the next state.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The discrete probability distribution of next state.
# get_transition_value UncertainTransitions
```python
get_transition_value(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]],
    next_state: Optional[D.T_state] = None
) -> StrDict[Value[D.T_value]]
```
Get the value (reward or cost) of a transition.
The transition to consider is defined by the function parameters.
TIP
If this function never depends on the next_state parameter for its computation, it is recommended to indicate it by overriding UncertainTransitions._is_transition_value_dependent_on_next_state_() to return False. This information can then be exploited by solvers to avoid computing the next state when evaluating a transition value, which is more efficient.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
- next_state: The next state in which the transition ends (if needed for the computation).
# Returns
The transition value (reward or cost).
# is_terminal UncertainTransitions
```python
is_terminal(
    self,
    state: D.T_state
) -> StrDict[D.T_predicate]
```
Indicate whether a state is terminal.
A terminal state is a state with no outgoing transition (except to itself with value 0).
# Parameters
- state: The state to consider.
# Returns
True if the state is terminal (False otherwise).
# is_transition_value_dependent_on_next_state UncertainTransitions
```python
is_transition_value_dependent_on_next_state(
    self
) -> bool
```
Indicate whether get_transition_value() requires the next_state parameter for its computation (cached).
By default, UncertainTransitions.is_transition_value_dependent_on_next_state() internally calls UncertainTransitions._is_transition_value_dependent_on_next_state_() the first time and automatically caches its value to make future calls more efficient (since the returned value is assumed to be constant).
# Returns
True if the transition value computation depends on next_state (False otherwise).
# sample Simulation
```python
sample(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```
Sample one transition of the simulator's dynamics.
By default, Simulation.sample() provides some boilerplate code and internally calls Simulation._sample() (which returns an environment outcome). The boilerplate code automatically samples an observation corresponding to the sampled next state.
TIP
Whenever an existing simulator needs to be wrapped instead of implemented fully in scikit-decide, it is recommended to override Simulation.sample() to call the external simulator rather than use the Simulation._sample() helper function.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The environment outcome of the sampled transition.
# set_memory Simulation
```python
set_memory(
    self,
    memory: Memory[D.T_state]
) -> None
```
Set the internal memory attribute _memory to the given one.
This can be useful to set a specific "starting point" before doing a rollout with successive Environment.step() calls.
# Parameters
- memory: The memory to set internally.
# Example
```python
# Set simulation_domain memory to my_state (assuming a Markovian domain)
simulation_domain.set_memory(my_state)

# Start a 100-step rollout from here (applying my_action at every step)
for _ in range(100):
    simulation_domain.step(my_action)
```
# step Environment
```python
step(
    self,
    action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```
Run one step of the environment's dynamics.
By default, Environment.step() provides some boilerplate code and internally calls Environment._step() (which returns an environment outcome). The boilerplate code automatically stores the next state into the _memory attribute and samples a corresponding observation.
TIP
Whenever an existing environment needs to be wrapped instead of implemented fully in scikit-decide (e.g. compiled ATARI games), it is recommended to override Environment.step() to call the external environment rather than use the Environment._step() helper function.
WARNING
Before calling Environment.step() for the first time or when the end of an episode is reached, Initializable.reset() must be called to reset the environment's state.
# Parameters
- action: The action taken in the current memory (state or history) triggering the transition.
# Returns
The environment outcome of this step.
# _get_next_state_distribution UncertainTransitions
```python
_get_next_state_distribution(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]]
) -> DiscreteDistribution[D.T_state]
```
Get the discrete probability distribution of next state given a memory and action.
TIP
In the Markovian case (memory only holds the last state s), given an action a, this function describes the distribution P(S'|s, a), where S' is the random variable of the next state.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The discrete probability distribution of next state.
# _get_transition_value UncertainTransitions
```python
_get_transition_value(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]],
    next_state: Optional[D.T_state] = None
) -> StrDict[Value[D.T_value]]
```
Get the value (reward or cost) of a transition.
The transition to consider is defined by the function parameters.
TIP
If this function never depends on the next_state parameter for its computation, it is recommended to indicate it by overriding UncertainTransitions._is_transition_value_dependent_on_next_state_() to return False. This information can then be exploited by solvers to avoid computing the next state when evaluating a transition value, which is more efficient.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
- next_state: The next state in which the transition ends (if needed for the computation).
# Returns
The transition value (reward or cost).
# _is_terminal UncertainTransitions
```python
_is_terminal(
    self,
    state: D.T_state
) -> StrDict[D.T_predicate]
```
Indicate whether a state is terminal.
A terminal state is a state with no outgoing transition (except to itself with value 0).
# Parameters
- state: The state to consider.
# Returns
True if the state is terminal (False otherwise).
# _is_transition_value_dependent_on_next_state UncertainTransitions
```python
_is_transition_value_dependent_on_next_state(
    self
) -> bool
```
Indicate whether _get_transition_value() requires the next_state parameter for its computation (cached).
By default, UncertainTransitions._is_transition_value_dependent_on_next_state() internally calls UncertainTransitions._is_transition_value_dependent_on_next_state_() the first time and automatically caches its value to make future calls more efficient (since the returned value is assumed to be constant).
# Returns
True if the transition value computation depends on next_state (False otherwise).
# _is_transition_value_dependent_on_next_state_ UncertainTransitions
```python
_is_transition_value_dependent_on_next_state_(
    self
) -> bool
```
Indicate whether _get_transition_value() requires the next_state parameter for its computation.
This is a helper function called by default from UncertainTransitions._is_transition_value_dependent_on_next_state(), the difference being that the result is not cached here.
TIP
The trailing underscore in this function's name is a convention indicating that its result should be constant.
# Returns
True if the transition value computation depends on next_state (False otherwise).
# _sample Simulation
```python
_sample(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```
Sample one transition of the simulator's dynamics.
By default, Simulation._sample() provides some boilerplate code and internally calls Simulation._state_sample() (which returns a transition outcome). The boilerplate code automatically samples an observation corresponding to the sampled next state.
TIP
Whenever an existing simulator needs to be wrapped instead of implemented fully in scikit-decide, it is recommended to override Simulation._sample() to call the external simulator rather than use the Simulation._state_sample() helper function.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The environment outcome of the sampled transition.
# _set_memory Simulation
```python
_set_memory(
    self,
    memory: Memory[D.T_state]
) -> None
```
Set the internal memory attribute _memory to the given one.
This can be useful to set a specific "starting point" before doing a rollout with successive Environment._step() calls.
# Parameters
- memory: The memory to set internally.
# Example
```python
# Set simulation_domain memory to my_state (assuming a Markovian domain)
simulation_domain._set_memory(my_state)

# Start a 100-step rollout from here (applying my_action at every step)
for _ in range(100):
    simulation_domain._step(my_action)
```
# _state_sample Simulation
```python
_state_sample(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]]
) -> TransitionOutcome[D.T_state, StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```
Compute one sample of the transition's dynamics.
This is a helper function called by default from Simulation._sample(). It works at the state level, whereas the latter works at the observation level.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The transition outcome of the sampled transition.
# _state_step Environment
```python
_state_step(
    self,
    action: StrDict[list[D.T_event]]
) -> TransitionOutcome[D.T_state, StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```
Compute one step of the transition's dynamics.
This is a helper function called by default from Environment._step(). It works at the state level, whereas the latter works at the observation level.
# Parameters
- action: The action taken in the current memory (state or history) triggering the transition.
# Returns
The transition outcome of this step.
# _step Environment
```python
_step(
    self,
    action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```
Run one step of the environment's dynamics.
By default, Environment._step() provides some boilerplate code and internally calls Environment._state_step() (which returns a transition outcome). The boilerplate code automatically stores the next state into the _memory attribute and samples a corresponding observation.
TIP
Whenever an existing environment needs to be wrapped instead of implemented fully in scikit-decide (e.g. compiled ATARI games), it is recommended to override Environment._step() to call the external environment rather than use the Environment._state_step() helper function.
WARNING
Before calling Environment._step() for the first time or when the end of an episode is reached, Initializable._reset() must be called to reset the environment's state.
# Parameters
- action: The action taken in the current memory (state or history) triggering the transition.
# Returns
The environment outcome of this step.
# DeterministicTransitions
A domain must inherit this class if its dynamics is deterministic and provided as a white-box model.
Compared to pure enumerable transition domains, deterministic transition ones guarantee that there is only one next state for a given source memory (state or history) and action.
TIP
Deterministic transition domains are typically stateless: they do not need to store the current state or history in memory since it is usually passed as a parameter of their functions. By default, they only become stateful whenever they are used as environments (e.g. via the Initializable.reset() and Environment.step() functions).
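A sketch of the single method that defines deterministic dynamics, in the spirit of the scikit-decide maze tutorial (the clamped-walk dynamics and names are illustrative):

```python
class ClampedWalkDomain(D):  # D: a DeterministicTransitions-based mix of builder classes (assumed)
    def _get_next_state(self, memory: D.T_state, action: D.T_event) -> D.T_state:
        # Deterministic: exactly one successor per (state, action) pair.
        return max(-10, min(10, memory + action))
```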
# get_next_state DeterministicTransitions
```python
get_next_state(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]]
) -> D.T_state
```
Get the next state given a memory and action.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The deterministic next state.
# get_next_state_distribution UncertainTransitions
```python
get_next_state_distribution(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]]
) -> DiscreteDistribution[D.T_state]
```
Get the discrete probability distribution of next state given a memory and action.
TIP
In the Markovian case (memory only holds the last state s), given an action a, this function describes the distribution P(S'|s, a), where S' is the random variable of the next state.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The discrete probability distribution of next state.
# get_transition_value UncertainTransitions
```python
get_transition_value(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]],
    next_state: Optional[D.T_state] = None
) -> StrDict[Value[D.T_value]]
```
Get the value (reward or cost) of a transition.
The transition to consider is defined by the function parameters.
TIP
If this function never depends on the next_state parameter for its computation, it is recommended to indicate it by overriding UncertainTransitions._is_transition_value_dependent_on_next_state_() to return False. This information can then be exploited by solvers to avoid computing the next state when evaluating a transition value, which is more efficient.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
- next_state: The next state in which the transition ends (if needed for the computation).
# Returns
The transition value (reward or cost).
# is_terminal UncertainTransitions
```python
is_terminal(
    self,
    state: D.T_state
) -> StrDict[D.T_predicate]
```
Indicate whether a state is terminal.
A terminal state is a state with no outgoing transition (except to itself with value 0).
# Parameters
- state: The state to consider.
# Returns
True if the state is terminal (False otherwise).
# is_transition_value_dependent_on_next_state UncertainTransitions
```python
is_transition_value_dependent_on_next_state(
    self
) -> bool
```
Indicate whether get_transition_value() requires the next_state parameter for its computation (cached).
By default, UncertainTransitions.is_transition_value_dependent_on_next_state() internally calls UncertainTransitions._is_transition_value_dependent_on_next_state_() the first time and automatically caches its value to make future calls more efficient (since the returned value is assumed to be constant).
# Returns
True if the transition value computation depends on next_state (False otherwise).
# sample Simulation
```python
sample(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```
Sample one transition of the simulator's dynamics.
By default, Simulation.sample() provides some boilerplate code and internally calls Simulation._sample() (which returns an environment outcome). The boilerplate code automatically samples an observation corresponding to the sampled next state.
TIP
Whenever an existing simulator needs to be wrapped instead of implemented fully in scikit-decide, it is recommended to override Simulation.sample() to call the external simulator rather than use the Simulation._sample() helper function.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The environment outcome of the sampled transition.
# set_memory Simulation
```python
set_memory(
    self,
    memory: Memory[D.T_state]
) -> None
```
Set the internal memory attribute _memory to the given one.
This can be useful to set a specific "starting point" before doing a rollout with successive Environment.step() calls.
# Parameters
- memory: The memory to set internally.
# Example
```python
# Set simulation_domain memory to my_state (assuming a Markovian domain)
simulation_domain.set_memory(my_state)

# Start a 100-step rollout from here (applying my_action at every step)
for _ in range(100):
    simulation_domain.step(my_action)
```
# step Environment
```python
step(
    self,
    action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```
Run one step of the environment's dynamics.
By default, Environment.step() provides some boilerplate code and internally calls Environment._step() (which returns an environment outcome). The boilerplate code automatically stores the next state into the _memory attribute and samples a corresponding observation.
TIP
Whenever an existing environment needs to be wrapped instead of implemented fully in scikit-decide (e.g. compiled ATARI games), it is recommended to override Environment.step() to call the external environment rather than use the Environment._step() helper function.
WARNING
Before calling Environment.step() for the first time or when the end of an episode is reached, Initializable.reset() must be called to reset the environment's state.
# Parameters
- action: The action taken in the current memory (state or history) triggering the transition.
# Returns
The environment outcome of this step.
# _get_next_state DeterministicTransitions
```python
_get_next_state(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]]
) -> D.T_state
```
Get the next state given a memory and action.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The deterministic next state.
# _get_next_state_distribution UncertainTransitions
```python
_get_next_state_distribution(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]]
) -> SingleValueDistribution[D.T_state]
```
Get the discrete probability distribution of next state given a memory and action.
TIP
In the Markovian case (memory only holds the last state s), given an action a, this function describes the distribution P(S'|s, a), where S' is the random variable of the next state.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The discrete probability distribution of next state.
# _get_transition_value UncertainTransitions
```python
_get_transition_value(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]],
    next_state: Optional[D.T_state] = None
) -> StrDict[Value[D.T_value]]
```
Get the value (reward or cost) of a transition.
The transition to consider is defined by the function parameters.
TIP
If this function never depends on the next_state parameter for its computation, it is recommended to indicate it by overriding UncertainTransitions._is_transition_value_dependent_on_next_state_() to return False. This information can then be exploited by solvers to avoid computing the next state when evaluating a transition value, which is more efficient.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
- next_state: The next state in which the transition ends (if needed for the computation).
# Returns
The transition value (reward or cost).
# _is_terminal UncertainTransitions
```python
_is_terminal(
    self,
    state: D.T_state
) -> StrDict[D.T_predicate]
```
Indicate whether a state is terminal.
A terminal state is a state with no outgoing transition (except to itself with value 0).
# Parameters
- state: The state to consider.
# Returns
True if the state is terminal (False otherwise).
# _is_transition_value_dependent_on_next_state UncertainTransitions
```python
_is_transition_value_dependent_on_next_state(
    self
) -> bool
```
Indicate whether _get_transition_value() requires the next_state parameter for its computation (cached).
By default, UncertainTransitions._is_transition_value_dependent_on_next_state() internally calls UncertainTransitions._is_transition_value_dependent_on_next_state_() the first time and automatically caches its value to make future calls more efficient (since the returned value is assumed to be constant).
# Returns
True if the transition value computation depends on next_state (False otherwise).
# _is_transition_value_dependent_on_next_state_ UncertainTransitions
```python
_is_transition_value_dependent_on_next_state_(
    self
) -> bool
```
Indicate whether _get_transition_value() requires the next_state parameter for its computation.
This is a helper function called by default from UncertainTransitions._is_transition_value_dependent_on_next_state(), the difference being that the result is not cached here.
TIP
The trailing underscore in this function's name is a convention indicating that its result should be constant.
# Returns
True if the transition value computation depends on next_state (False otherwise).
# _sample Simulation
```python
_sample(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```
Sample one transition of the simulator's dynamics.
By default, Simulation._sample() provides some boilerplate code and internally calls Simulation._state_sample() (which returns a transition outcome). The boilerplate code automatically samples an observation corresponding to the sampled next state.
TIP
Whenever an existing simulator needs to be wrapped instead of implemented fully in scikit-decide, it is recommended to override Simulation._sample() to call the external simulator rather than use the Simulation._state_sample() helper function.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The environment outcome of the sampled transition.
# _set_memory Simulation
```python
_set_memory(
    self,
    memory: Memory[D.T_state]
) -> None
```
Set the internal memory attribute _memory to the given one.
This can be useful to set a specific "starting point" before doing a rollout with successive Environment._step() calls.
# Parameters
- memory: The memory to set internally.
# Example
```python
# Set simulation_domain memory to my_state (assuming a Markovian domain)
simulation_domain._set_memory(my_state)

# Start a 100-step rollout from here (applying my_action at every step)
for _ in range(100):
    simulation_domain._step(my_action)
```
# _state_sample Simulation
```python
_state_sample(
    self,
    memory: Memory[D.T_state],
    action: StrDict[list[D.T_event]]
) -> TransitionOutcome[D.T_state, StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```
Compute one sample of the transition's dynamics.
This is a helper function called by default from Simulation._sample(). It works at the state level, whereas the latter works at the observation level.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The transition outcome of the sampled transition.
# _state_step Environment
```python
_state_step(
    self,
    action: StrDict[list[D.T_event]]
) -> TransitionOutcome[D.T_state, StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```
Compute one step of the transition's dynamics.
This is a helper function called by default from Environment._step(). It works at the state level, whereas the latter works at the observation level.
# Parameters
- action: The action taken in the current memory (state or history) triggering the transition.
# Returns
The transition outcome of this step.
# _step Environment
```python
_step(
    self,
    action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
```
Run one step of the environment's dynamics.
By default, Environment._step() provides some boilerplate code and internally calls Environment._state_step() (which returns a transition outcome). The boilerplate code automatically stores the next state into the _memory attribute and samples a corresponding observation.
TIP
Whenever an existing environment needs to be wrapped instead of implemented fully in scikit-decide (e.g. compiled ATARI games), it is recommended to override Environment._step() to call the external environment rather than use the Environment._state_step() helper function.
WARNING
Before calling Environment._step() for the first time or when the end of an episode is reached, Initializable._reset() must be called to reset the environment's state.
# Parameters
- action: The action taken in the current memory (state or history) triggering the transition.
# Returns
The environment outcome of this step.