# domains
This module contains base classes for quickly building domains.
Domain specification
# Domain
This is the highest-level domain class (it inherits the top-level class of each mandatory domain characteristic).
This helper class can be used as the main base class for domains.
Typical use:
class D(Domain, ...)
where "..." is replaced, when needed, by classes from the following domain characteristics (those in parentheses are optional):
- agent: MultiAgent -> SingleAgent
- concurrency: Parallel -> Sequential
- (constraints): Constrained
- dynamics: Environment -> Simulation -> UncertainTransitions -> EnumerableTransitions -> DeterministicTransitions
- events: Events -> Actions -> UnrestrictedActions
- (goals): Goals
- (initialization): Initializable -> UncertainInitialized -> DeterministicInitialized
- memory: History -> FiniteHistory -> Markovian -> Memoryless
- observability: PartiallyObservable -> TransformedObservable -> FullyObservable
- (renderability): Renderable
- value: Rewards -> PositiveCosts
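As a minimal sketch of the composition pattern above, the snippet below refines two characteristics of a top-level domain class. All classes are empty stand-ins defined locally so that the example runs without scikit-decide installed; the `agents` attribute is hypothetical and only there to make the effect visible. The arrows in the list are read here as "the class on the right refines the class on the left", which is an assumption of this sketch.

```python
# Stand-in classes (hypothetical, defined locally for illustration only).
class MultiAgent:
    agents = "many"


class SingleAgent(MultiAgent):  # agent: MultiAgent -> SingleAgent
    agents = "one"


class Parallel:
    pass


class Sequential(Parallel):  # concurrency: Parallel -> Sequential
    pass


class Rewards:
    pass


class Domain(MultiAgent, Parallel, Rewards):
    """Stand-in for the top-level class: the most general choice everywhere."""


# Refine only the characteristics that need it; the value characteristic
# keeps the general Rewards class inherited through Domain.
class D(Domain, SingleAgent, Sequential):
    pass


mro_names = [cls.__name__ for cls in D.__mro__]
```

In the resulting method resolution order, `SingleAgent` comes before `MultiAgent`, so the more specific class wins whenever both define the same attribute.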
# check_value Rewards
check_value(
self,
value: Value[D.T_value]
) -> bool
Check that a value is compliant with its reward specification.
TIP
By default, this function always returns True, because any kind of reward should be accepted at this level.
# Parameters
- value: The value to check.
# Returns
True if the value is compliant (False otherwise).
# get_action_mask Events
get_action_mask(
self,
memory: Optional[Memory[D.T_state]] = None
) -> StrDict[Mask]
Get action mask for the given memory or internal one if omitted.
An action mask is another (more specific) format for applicable actions, which is meaningful only if the action space can be iterated over in some way. It is represented by a flat array of 0s and 1s, ordered as the actions are when enumerated: 1 for an applicable action and 0 for a non-applicable one.
More precisely, this implementation assumes that each agent's action space is an EnumerableSpace, and internally calls self.get_applicable_actions().
The action mask is used, for instance, by RL solvers to shut down the logits associated with non-applicable actions in the output of their internal neural network.
# Parameters
- memory: The memory to consider. If None, the internal memory of the domain is used.
# Returns
A numpy array (or a dict mapping each agent to a numpy array for multi-agent domains) of 0s and 1s indicating the applicability of each action (1 meaning applicable and 0 not applicable).
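The following sketch illustrates the idea of an action mask and how a solver might use it. It uses plain Python lists rather than numpy arrays to stay dependency-free, and the helper names `action_mask` and `mask_logits` are hypothetical (they are not part of the API documented here). Replacing masked logits with negative infinity is one common way to "shut down" them, assumed here for illustration.

```python
import math


def action_mask(all_actions, applicable):
    """Return a flat 0/1 list aligned with the enumeration order of all_actions."""
    applicable = set(applicable)
    return [1 if a in applicable else 0 for a in all_actions]


def mask_logits(logits, mask):
    """Replace the logits of non-applicable actions with -inf."""
    return [x if m == 1 else -math.inf for x, m in zip(logits, mask)]


actions = ["up", "down", "left", "right"]  # enumeration order of the action space
mask = action_mask(actions, applicable=["up", "right"])
masked = mask_logits([0.3, 1.2, -0.5, 0.8], mask)
```

After a softmax, the entries set to negative infinity receive probability zero, so only applicable actions can be selected.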
# get_action_space Events
get_action_space(
self
) -> StrDict[Space[D.T_event]]
Get the (cached) domain action space (finite or infinite set).
By default, Events.get_action_space() internally calls Events._get_action_space_() the first time and automatically caches its value to make future calls more efficient (since the action space is assumed to be constant).
# Returns
The action space.
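The caching behaviour described above can be sketched as follows. The class is a hypothetical stand-in, not the library implementation: the attribute name `_action_space`, the `calls` counter and the use of a frozenset as the space are all assumptions made for the example.

```python
class CachedActionSpace:
    """Hypothetical stand-in mimicking the compute-once-then-cache pattern."""

    def __init__(self):
        self.calls = 0  # counts how often the uncached helper runs

    def _get_action_space_(self):
        # Uncached helper: a real domain would build its action space here.
        self.calls += 1
        return frozenset({"left", "right"})

    def get_action_space(self):
        # Compute on the first call, then reuse the stored value.
        if not hasattr(self, "_action_space"):
            self._action_space = self._get_action_space_()
        return self._action_space


domain = CachedActionSpace()
first = domain.get_action_space()
second = domain.get_action_space()
```

Both calls return the same object and the helper runs only once, which is why the result of the helper should be constant.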
# get_agents MultiAgent
get_agents(
self
) -> set[str]
Return the set of available agent ids.
# get_applicable_actions Events
get_applicable_actions(
self,
memory: Optional[Memory[D.T_state]] = None
) -> StrDict[Space[D.T_event]]
Get the space (finite or infinite set) of applicable actions in the given memory (state or history), or in the internal one if omitted.
By default, Events.get_applicable_actions() provides some boilerplate code and internally calls Events._get_applicable_actions(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
The space of applicable actions.
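A minimal sketch of the "fall back to the internal memory" boilerplate is given below. The `Corridor` class is hypothetical and collapses the public and private layers into a single method for brevity; a plain Python set stands in for the returned space.

```python
class Corridor:
    """Hypothetical 1-D corridor with positions 0..2 (not a scikit-decide class)."""

    def __init__(self):
        self._memory = 0  # internal memory: here simply the current position

    def _get_applicable_actions_from(self, memory):
        # The memory argument is mandatory at this level.
        actions = set()
        if memory > 0:
            actions.add("left")
        if memory < 2:
            actions.add("right")
        return actions

    def get_applicable_actions(self, memory=None):
        # Boilerplate: use the internal memory when none is given.
        if memory is None:
            memory = self._memory
        return self._get_applicable_actions_from(memory)


corridor = Corridor()
from_internal = corridor.get_applicable_actions()
from_given = corridor.get_applicable_actions(1)
```

The check uses `is None` rather than truthiness, so a legitimate memory value such as position 0 is not mistaken for a missing argument.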
# get_enabled_events Events
get_enabled_events(
self,
memory: Optional[Memory[D.T_state]] = None
) -> Space[D.T_event]
Get the space (finite or infinite set) of enabled uncontrollable events in the given memory (state or history), or in the internal one if omitted.
By default, Events.get_enabled_events() provides some boilerplate code and internally calls Events._get_enabled_events(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
The space of enabled events.
# get_observation_distribution PartiallyObservable
get_observation_distribution(
self,
state: D.T_state,
action: Optional[StrDict[list[D.T_event]]] = None
) -> Distribution[StrDict[D.T_observation]]
Get the probability distribution of the observation given a state and action.
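In mathematical terms (discrete case), given an action $a$, this function represents $P(O \mid s, a)$, where $O$ is the random variable of the observation.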
# Parameters
- state: The state to be observed.
- action: The last applied action (or None if the state is an initial state).
# Returns
The probability distribution of the observation.
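As an illustration of a discrete observation distribution conditioned on a state and an action, the sketch below models a hypothetical noisy sensor. The distribution is represented as a plain dict mapping observations to probabilities, which is an assumption of this example and not the library's Distribution type; the `sample` helper is likewise hypothetical.

```python
import random


def get_observation_distribution(state, action=None):
    """Hypothetical noisy sensor: reports the true state with probability 0.8."""
    other = "wet" if state == "dry" else "dry"
    return {state: 0.8, other: 0.2}


def sample(distribution, rng):
    """Draw one observation according to the given probabilities."""
    observations = list(distribution)
    weights = [distribution[o] for o in observations]
    return rng.choices(observations, weights=weights, k=1)[0]


dist = get_observation_distribution("dry")
obs = sample(dist, random.Random(0))
```

Here the action is ignored, which corresponds to a sensor whose reading depends only on the state being observed.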
# get_observation_space PartiallyObservable
get_observation_space(
self
) -> StrDict[Space[D.T_observation]]
Get the (cached) observation space (finite or infinite set).
By default, PartiallyObservable.get_observation_space() internally calls PartiallyObservable._get_observation_space_() the first time and automatically caches its value to make future calls more efficient (since the observation space is assumed to be constant).
# Returns
The observation space.
# is_action Events
is_action(
self,
event: D.T_event
) -> bool
Indicate whether an event is an action (i.e. a controllable event for the agents).
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the domain action space provided by Events.get_action_space(), but it can be overridden for faster implementations.
# Parameters
- event: The event to consider.
# Returns
True if the event is an action (False otherwise).
# is_applicable_action Events
is_applicable_action(
self,
action: StrDict[D.T_event],
memory: Optional[Memory[D.T_state]] = None
) -> bool
Indicate whether an action is applicable in the given memory (state or history), or in the internal one if omitted.
By default, Events.is_applicable_action() provides some boilerplate code and internally calls Events._is_applicable_action(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- action: The action to consider.
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
True if the action is applicable (False otherwise).
# is_enabled_event Events
is_enabled_event(
self,
event: D.T_event,
memory: Optional[Memory[D.T_state]] = None
) -> bool
Indicate whether an uncontrollable event is enabled in the given memory (state or history), or in the internal one if omitted.
By default, Events.is_enabled_event() provides some boilerplate code and internally calls Events._is_enabled_event(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- event: The event to consider.
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
True if the event is enabled (False otherwise).
# is_observation PartiallyObservable
is_observation(
self,
observation: StrDict[D.T_observation]
) -> bool
Check that an observation indeed belongs to the domain observation space.
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the domain observation space provided by PartiallyObservable.get_observation_space(), but it can be overridden for faster implementations.
# Parameters
- observation: The observation to consider.
# Returns
True if the observation belongs to the domain observation space (False otherwise).
# step Environment
step(
self,
action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
Run one step of the environment's dynamics.
By default, Environment.step() provides some boilerplate code and internally calls Environment._step() (which returns an environment outcome). The boilerplate code automatically stores the next state into the _memory attribute and samples a corresponding observation.
TIP
Whenever an existing environment needs to be wrapped instead of implemented fully in scikit-decide (e.g. compiled ATARI games), it is recommended to override Environment.step() to call the external environment and not use the Environment._step() helper function.
WARNING
Before calling Environment.step() for the first time, or when the end of an episode is reached, Initializable.reset() must be called to reset the environment's state.
# Parameters
- action: The action taken in the current memory (state or history) triggering the transition.
# Returns
The environment outcome of this step.
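The reset-then-step protocol can be sketched with a toy environment. `CountdownEnv` is hypothetical, and for brevity its `step()` returns a plain `(observation, reward, done)` tuple instead of an EnvironmentOutcome object; the guard that raises when `reset()` was not called is an assumption of this sketch, not documented library behaviour.

```python
class CountdownEnv:
    """Hypothetical environment: the state counts down from 3 to 0."""

    def __init__(self):
        self._memory = None  # no state until reset() is called

    def reset(self):
        self._memory = 3
        return self._memory  # fully observable here: observation == state

    def step(self, action):
        if self._memory is None:
            raise RuntimeError("call reset() before step()")
        self._memory -= 1  # store the next state in the internal memory
        reward = -1
        done = self._memory == 0
        return self._memory, reward, done


env = CountdownEnv()
obs = env.reset()  # mandatory before the first step
total, done = 0, False
while not done:
    obs, reward, done = env.step("tick")
    total += reward
```

Once `done` is true, the episode is over and `reset()` has to be called again before stepping further.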
# _check_value Rewards
_check_value(
self,
value: Value[D.T_value]
) -> bool
Check that a value is compliant with its reward specification.
TIP
By default, this function always returns True, because any kind of reward should be accepted at this level.
# Parameters
- value: The value to check.
# Returns
True if the value is compliant (False otherwise).
# _get_action_mask Events
_get_action_mask(
self,
memory: Optional[Memory[D.T_state]] = None
) -> StrDict[Mask]
Get action mask for the given memory or internal one if omitted.
An action mask is another (more specific) format for applicable actions, which is meaningful only if the action space can be iterated over in some way. It is represented by a flat array of 0s and 1s, ordered as the actions are when enumerated: 1 for an applicable action and 0 for a non-applicable one.
More precisely, this implementation assumes that each agent's action space is an EnumerableSpace, and internally calls self.get_applicable_actions().
The action mask is used, for instance, by RL solvers to shut down the logits associated with non-applicable actions in the output of their internal neural network.
# Parameters
- memory: The memory to consider. If None, the internal memory of the domain is used.
# Returns
A numpy array (or a dict mapping each agent to a numpy array for multi-agent domains) of 0s and 1s indicating the applicability of each action (1 meaning applicable and 0 not applicable).
# _get_action_space Events
_get_action_space(
self
) -> StrDict[Space[D.T_event]]
Get the (cached) domain action space (finite or infinite set).
By default, Events._get_action_space() internally calls Events._get_action_space_() the first time and automatically caches its value to make future calls more efficient (since the action space is assumed to be constant).
# Returns
The action space.
# _get_action_space_ Events
_get_action_space_(
self
) -> StrDict[Space[D.T_event]]
Get the domain action space (finite or infinite set).
This is a helper function called by default from Events._get_action_space(), the difference being that the result is not cached here.
TIP
The underscore at the end of this function's name is a convention that serves as a reminder that its result should be constant.
# Returns
The action space.
# _get_applicable_actions Events
_get_applicable_actions(
self,
memory: Optional[Memory[D.T_state]] = None
) -> StrDict[Space[D.T_event]]
Get the space (finite or infinite set) of applicable actions in the given memory (state or history), or in the internal one if omitted.
By default, Events._get_applicable_actions() provides some boilerplate code and internally calls Events._get_applicable_actions_from(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
The space of applicable actions.
# _get_applicable_actions_from Events
_get_applicable_actions_from(
self,
memory: Memory[D.T_state]
) -> StrDict[Space[D.T_event]]
Get the space (finite or infinite set) of applicable actions in the given memory (state or history).
This is a helper function called by default from Events._get_applicable_actions(), the difference being that the memory parameter is mandatory here.
# Parameters
- memory: The memory to consider.
# Returns
The space of applicable actions.
# _get_enabled_events Events
_get_enabled_events(
self,
memory: Optional[Memory[D.T_state]] = None
) -> Space[D.T_event]
Get the space (finite or infinite set) of enabled uncontrollable events in the given memory (state or history), or in the internal one if omitted.
By default, Events._get_enabled_events() provides some boilerplate code and internally calls Events._get_enabled_events_from(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
The space of enabled events.
# _get_enabled_events_from Events
_get_enabled_events_from(
self,
memory: Memory[D.T_state]
) -> Space[D.T_event]
Get the space (finite or infinite set) of enabled uncontrollable events in the given memory (state or history).
This is a helper function called by default from Events._get_enabled_events(), the difference being that the memory parameter is mandatory here.
# Parameters
- memory: The memory to consider.
# Returns
The space of enabled events.
# _get_memory_maxlen History
_get_memory_maxlen(
self
) -> Optional[int]
Get the memory max length (or None if unbounded).
TIP
By default, this function always returns None, because the memory length is unbounded at this level.
# Returns
The memory max length (or None if unbounded).
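To make the notion of a memory max length concrete, the sketch below uses a bounded double-ended queue as a stand-in for the domain memory. That the memory behaves like such a bounded buffer, and that a max length of 1 corresponds to a Markovian domain, are assumptions of this example; `init_memory` is a hypothetical helper and not the `_init_memory` method documented here.

```python
from collections import deque


def init_memory(maxlen=None, state=None):
    """Build a history buffer; maxlen=None means unbounded."""
    return deque([] if state is None else [state], maxlen=maxlen)


unbounded = init_memory(maxlen=None, state="s0")  # keeps the full history
markovian = init_memory(maxlen=1, state="s0")     # keeps only the last state

for s in ("s1", "s2"):
    unbounded.append(s)
    markovian.append(s)
```

With a bounded buffer, appending a new state silently drops the oldest one, so the memory never grows beyond its max length.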
# _get_observation_distribution PartiallyObservable
_get_observation_distribution(
self,
state: D.T_state,
action: Optional[StrDict[list[D.T_event]]] = None
) -> Distribution[StrDict[D.T_observation]]
Get the probability distribution of the observation given a state and action.
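In mathematical terms (discrete case), given an action $a$, this function represents $P(O \mid s, a)$, where $O$ is the random variable of the observation.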
# Parameters
- state: The state to be observed.
- action: The last applied action (or None if the state is an initial state).
# Returns
The probability distribution of the observation.
# _get_observation_space PartiallyObservable
_get_observation_space(
self
) -> StrDict[Space[D.T_observation]]
Get the (cached) observation space (finite or infinite set).
By default, PartiallyObservable._get_observation_space() internally calls PartiallyObservable._get_observation_space_() the first time and automatically caches its value to make future calls more efficient (since the observation space is assumed to be constant).
# Returns
The observation space.
# _get_observation_space_ PartiallyObservable
_get_observation_space_(
self
) -> StrDict[Space[D.T_observation]]
Get the observation space (finite or infinite set).
This is a helper function called by default from PartiallyObservable._get_observation_space(), the difference being that the result is not cached here.
TIP
The underscore at the end of this function's name is a convention that serves as a reminder that its result should be constant.
# Returns
The observation space.
# _init_memory History
_init_memory(
self,
state: Optional[D.T_state] = None
) -> Memory[D.T_state]
Initialize memory (possibly with a state) according to its specification and return it.
This function is automatically called by Initializable._reset() to reinitialize the internal memory whenever the domain is used as an environment.
# Parameters
- state: An optional state to initialize the memory with (typically the initial state).
# Returns
The new initialized memory.
# _is_action Events
_is_action(
self,
event: D.T_event
) -> bool
Indicate whether an event is an action (i.e. a controllable event for the agents).
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the domain action space provided by Events._get_action_space(), but it can be overridden for faster implementations.
# Parameters
- event: The event to consider.
# Returns
True if the event is an action (False otherwise).
# _is_applicable_action Events
_is_applicable_action(
self,
action: StrDict[D.T_event],
memory: Optional[Memory[D.T_state]] = None
) -> bool
Indicate whether an action is applicable in the given memory (state or history), or in the internal one if omitted.
By default, Events._is_applicable_action() provides some boilerplate code and internally calls Events._is_applicable_action_from(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- action: The action to consider.
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
True if the action is applicable (False otherwise).
# _is_applicable_action_from Events
_is_applicable_action_from(
self,
action: StrDict[D.T_event],
memory: Memory[D.T_state]
) -> bool
Indicate whether an action is applicable in the given memory (state or history).
This is a helper function called by default from Events._is_applicable_action(), the difference being that the memory parameter is mandatory here.
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the space of applicable actions provided by Events._get_applicable_actions_from(), but it can be overridden for faster implementations.
# Parameters
- action: The action to consider.
- memory: The memory to consider.
# Returns
True if the action is applicable (False otherwise).
# _is_enabled_event Events
_is_enabled_event(
self,
event: D.T_event,
memory: Optional[Memory[D.T_state]] = None
) -> bool
Indicate whether an uncontrollable event is enabled in the given memory (state or history), or in the internal one if omitted.
By default, Events._is_enabled_event() provides some boilerplate code and internally calls Events._is_enabled_event_from(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- event: The event to consider.
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
True if the event is enabled (False otherwise).
# _is_enabled_event_from Events
_is_enabled_event_from(
self,
event: D.T_event,
memory: Memory[D.T_state]
) -> bool
Indicate whether an event is enabled in the given memory (state or history).
This is a helper function called by default from Events._is_enabled_event(), the difference being that the memory parameter is mandatory here.
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the space of enabled events provided by Events._get_enabled_events_from(), but it can be overridden for faster implementations.
# Parameters
- event: The event to consider.
- memory: The memory to consider.
# Returns
True if the event is enabled (False otherwise).
# _is_observation PartiallyObservable
_is_observation(
self,
observation: StrDict[D.T_observation]
) -> bool
Check that an observation indeed belongs to the domain observation space.
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the domain observation space provided by PartiallyObservable._get_observation_space(), but it can be overridden for faster implementations.
# Parameters
- observation: The observation to consider.
# Returns
True if the observation belongs to the domain observation space (False otherwise).
# _state_step Environment
_state_step(
self,
action: StrDict[list[D.T_event]]
) -> TransitionOutcome[D.T_state, StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
Compute one step of the transition's dynamics.
This is a helper function called by default from Environment._step(). It works at the state level, whereas Environment._step() works at the observation level.
# Parameters
- action: The action taken in the current memory (state or history) triggering the transition.
# Returns
The transition outcome of this step.
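The split between the state level and the observation level can be sketched as follows. `TwoLayerEnv` is a hypothetical stand-in: outcomes are plain dicts rather than TransitionOutcome or EnvironmentOutcome objects, and the `_get_observation` helper (which exposes only the parity of the state) is invented for this example.

```python
class TwoLayerEnv:
    """Hypothetical stand-in showing the state level vs. the observation level."""

    def __init__(self, start=0):
        self._memory = start

    def _state_step(self, action):
        # State level: compute the next state and the value only.
        next_state = self._memory + action
        return {"state": next_state, "value": -abs(action)}

    def _get_observation(self, state):
        # Hypothetical observation function: only the parity is visible.
        return state % 2

    def _step(self, action):
        # Observation level: store the next state, then derive an observation.
        outcome = self._state_step(action)
        self._memory = outcome["state"]
        return {
            "observation": self._get_observation(self._memory),
            "value": outcome["value"],
        }


env = TwoLayerEnv()
out = env._step(3)
```

The caller of `_step()` never sees the raw state, only the observation derived from it, while the state itself is kept in the internal memory.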
# _step Environment
_step(
self,
action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
Run one step of the environment's dynamics.
By default, Environment._step() provides some boilerplate code and internally calls Environment._state_step() (which returns a transition outcome). The boilerplate code automatically stores the next state into the _memory attribute and samples a corresponding observation.
TIP
Whenever an existing environment needs to be wrapped instead of implemented fully in scikit-decide (e.g. compiled ATARI games), it is recommended to override Environment._step() to call the external environment and not use the Environment._state_step() helper function.
WARNING
Before calling Environment._step() for the first time, or when the end of an episode is reached, Initializable._reset() must be called to reset the environment's state.
# Parameters
- action: The action taken in the current memory (state or history) triggering the transition.
# Returns
The environment outcome of this step.
# RLDomain
This is a typical Reinforcement Learning domain class.
This helper class can be used as an alternate base class for domains, inheriting the following:
- Domain
- SingleAgent
- Sequential
- Environment
- Actions
- Initializable
- Markovian
- TransformedObservable
- Rewards
Typical use:
class D(RLDomain)
TIP
Any alternate base class can also be refined, for instance:
class D(RLDomain, FullyObservable)
# check_value Rewards
check_value(
self,
value: Value[D.T_value]
) -> bool
Check that a value is compliant with its reward specification.
TIP
By default, this function always returns True, because any kind of reward should be accepted at this level.
# Parameters
- value: The value to check.
# Returns
True if the value is compliant (False otherwise).
# get_action_mask Events
get_action_mask(
self,
memory: Optional[Memory[D.T_state]] = None
) -> StrDict[Mask]
Get action mask for the given memory or internal one if omitted.
An action mask is another (more specific) format for applicable actions, which is meaningful only if the action space can be iterated over in some way. It is represented by a flat array of 0s and 1s, ordered as the actions are when enumerated: 1 for an applicable action and 0 for a non-applicable one.
More precisely, this implementation assumes that each agent's action space is an EnumerableSpace, and internally calls self.get_applicable_actions().
The action mask is used, for instance, by RL solvers to shut down the logits associated with non-applicable actions in the output of their internal neural network.
# Parameters
- memory: The memory to consider. If None, the internal memory of the domain is used.
# Returns
A numpy array (or a dict mapping each agent to a numpy array for multi-agent domains) of 0s and 1s indicating the applicability of each action (1 meaning applicable and 0 not applicable).
# get_action_space Events
get_action_space(
self
) -> StrDict[Space[D.T_event]]
Get the (cached) domain action space (finite or infinite set).
By default, Events.get_action_space() internally calls Events._get_action_space_() the first time and automatically caches its value to make future calls more efficient (since the action space is assumed to be constant).
# Returns
The action space.
# get_agents MultiAgent
get_agents(
self
) -> set[str]
Return a singleton for single-agent domains.
This must be consistent with skdecide.core.autocast(), which transforms a single-agent domain into a multi-agent domain whose only agent has the id "agent".
# get_applicable_actions Events
get_applicable_actions(
self,
memory: Optional[Memory[D.T_state]] = None
) -> StrDict[Space[D.T_event]]
Get the space (finite or infinite set) of applicable actions in the given memory (state or history), or in the internal one if omitted.
By default, Events.get_applicable_actions() provides some boilerplate code and internally calls Events._get_applicable_actions(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
The space of applicable actions.
# get_enabled_events Events
get_enabled_events(
self,
memory: Optional[Memory[D.T_state]] = None
) -> Space[D.T_event]
Get the space (finite or infinite set) of enabled uncontrollable events in the given memory (state or history), or in the internal one if omitted.
By default, Events.get_enabled_events() provides some boilerplate code and internally calls Events._get_enabled_events(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
The space of enabled events.
# get_observation TransformedObservable
get_observation(
self,
state: D.T_state,
action: Optional[StrDict[list[D.T_event]]] = None
) -> StrDict[D.T_observation]
Get the deterministic observation given a state and action.
# Parameters
- state: The state to be observed.
- action: The last applied action (or None if the state is an initial state).
# Returns
The observation.
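A deterministic observation can be viewed as the special case of an observation distribution that puts all its mass on a single value. The sketch below illustrates this with hypothetical free functions (not the methods of the class documented here), a tuple as state, and a plain dict standing in for a distribution.

```python
def get_observation(state, action=None):
    """Hypothetical deterministic observation: only the x coordinate is visible."""
    return state[0]


def get_observation_distribution(state, action=None):
    # Deterministic case: all probability mass sits on one observation.
    return {get_observation(state, action): 1.0}


state = (2, 5)
obs = get_observation(state)
dist = get_observation_distribution(state)
```

Implementing only the deterministic function is enough in this setting, since the distribution can always be derived from it.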
# get_observation_distribution PartiallyObservable
get_observation_distribution(
self,
state: D.T_state,
action: Optional[StrDict[list[D.T_event]]] = None
) -> Distribution[StrDict[D.T_observation]]
Get the probability distribution of the observation given a state and action.
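In mathematical terms (discrete case), given an action $a$, this function represents $P(O \mid s, a)$, where $O$ is the random variable of the observation.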
# Parameters
- state: The state to be observed.
- action: The last applied action (or None if the state is an initial state).
# Returns
The probability distribution of the observation.
# get_observation_space PartiallyObservable
get_observation_space(
self
) -> StrDict[Space[D.T_observation]]
Get the (cached) observation space (finite or infinite set).
By default, PartiallyObservable.get_observation_space() internally calls PartiallyObservable._get_observation_space_() the first time and automatically caches its value to make future calls more efficient (since the observation space is assumed to be constant).
# Returns
The observation space.
# is_action Events
is_action(
self,
event: D.T_event
) -> bool
Indicate whether an event is an action (i.e. a controllable event for the agents).
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the domain action space provided by Events.get_action_space(), but it can be overridden for faster implementations.
# Parameters
- event: The event to consider.
# Returns
True if the event is an action (False otherwise).
# is_applicable_action Events
is_applicable_action(
self,
action: StrDict[D.T_event],
memory: Optional[Memory[D.T_state]] = None
) -> bool
Indicate whether an action is applicable in the given memory (state or history), or in the internal one if omitted.
By default, Events.is_applicable_action() provides some boilerplate code and internally calls Events._is_applicable_action(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- action: The action to consider.
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
True if the action is applicable (False otherwise).
# is_enabled_event Events
is_enabled_event(
self,
event: D.T_event,
memory: Optional[Memory[D.T_state]] = None
) -> bool
Indicate whether an uncontrollable event is enabled in the given memory (state or history), or in the internal one if omitted.
By default, Events.is_enabled_event() provides some boilerplate code and internally calls Events._is_enabled_event(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- event: The event to consider.
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
True if the event is enabled (False otherwise).
# is_observation PartiallyObservable
is_observation(
self,
observation: StrDict[D.T_observation]
) -> bool
Check that an observation indeed belongs to the domain observation space.
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the domain observation space provided by PartiallyObservable.get_observation_space(), but it can be overridden for faster implementations.
# Parameters
- observation: The observation to consider.
# Returns
True if the observation belongs to the domain observation space (False otherwise).
# reset Initializable
reset(
self
) -> StrDict[D.T_observation]
Reset the state of the environment and return an initial observation.
By default, Initializable.reset() provides some boilerplate code and internally calls Initializable._reset() (which returns an initial state). The boilerplate code automatically stores the initial state into the _memory attribute and samples a corresponding observation.
# Returns
An initial observation.
# step Environment
step(
self,
action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
Run one step of the environment's dynamics.
By default, Environment.step() provides some boilerplate code and internally calls Environment._step() (which returns an environment outcome). The boilerplate code automatically stores the next state into the _memory attribute and samples a corresponding observation.
TIP
Whenever an existing environment needs to be wrapped instead of implemented fully in scikit-decide (e.g. compiled ATARI games), it is recommended to override Environment.step() to call the external environment and not use the Environment._step() helper function.
WARNING
Before calling Environment.step() for the first time, or when the end of an episode is reached, Initializable.reset() must be called to reset the environment's state.
# Parameters
- action: The action taken in the current memory (state or history) triggering the transition.
# Returns
The environment outcome of this step.
# _check_value Rewards
_check_value(
self,
value: Value[D.T_value]
) -> bool
Check that a value is compliant with its reward specification.
TIP
By default, this function always returns True, because any kind of reward should be accepted at this level.
# Parameters
- value: The value to check.
# Returns
True if the value is compliant (False otherwise).
# _get_action_mask Events
_get_action_mask(
self,
memory: Optional[Memory[D.T_state]] = None
) -> StrDict[Mask]
Get action mask for the given memory or internal one if omitted.
An action mask is another (more specific) format for applicable actions, which is meaningful only if the action space can be iterated over in some way. It is represented by a flat array of 0s and 1s, ordered as the actions are when enumerated: 1 for an applicable action and 0 for a non-applicable one.
More precisely, this implementation assumes that each agent's action space is an EnumerableSpace, and internally calls self.get_applicable_actions().
The action mask is used, for instance, by RL solvers to shut down the logits associated with non-applicable actions in the output of their internal neural network.
# Parameters
- memory: The memory to consider. If None, the internal memory of the domain is used.
# Returns
A numpy array (or a dict mapping each agent to a numpy array for multi-agent domains) of 0s and 1s indicating the applicability of each action (1 meaning applicable and 0 not applicable).
# _get_action_space Events
_get_action_space(
self
) -> StrDict[Space[D.T_event]]
Get the (cached) domain action space (finite or infinite set).
By default, Events._get_action_space() internally calls Events._get_action_space_() the first time and automatically caches its value to make future calls more efficient (since the action space is assumed to be constant).
# Returns
The action space.
# _get_action_space_ Events
_get_action_space_(
self
) -> StrDict[Space[D.T_event]]
Get the domain action space (finite or infinite set).
This is a helper function called by default from Events._get_action_space(), the difference being that the result is not cached here.
TIP
The underscore at the end of this function's name is a convention to remind you that its result should be constant.
# Returns
The action space.
# _get_applicable_actions Events
_get_applicable_actions(
self,
memory: Optional[Memory[D.T_state]] = None
) -> StrDict[Space[D.T_event]]
Get the space (finite or infinite set) of applicable actions in the given memory (state or history), or in the internal one if omitted.
By default, Events._get_applicable_actions() provides some boilerplate code and internally calls Events._get_applicable_actions_from(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
The space of applicable actions.
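The memory-defaulting boilerplate described above might look like the sketch below. `MemoryDefaultingDomain` and its toy applicability rule are hypothetical; only the `_memory` attribute name and the two method names come from this page.

```python
class MemoryDefaultingDomain:
    """Hypothetical stand-in for the optional-memory boilerplate."""

    def __init__(self, initial_state):
        self._memory = initial_state  # internal memory of the domain

    def _get_applicable_actions_from(self, memory):
        # Toy rule (assumption): from state 0 one can only move right.
        return {"right"} if memory == 0 else {"left", "right"}

    def _get_applicable_actions(self, memory=None):
        # Boilerplate: fall back to the internal memory when none is given.
        if memory is None:
            memory = self._memory
        return self._get_applicable_actions_from(memory)


domain = MemoryDefaultingDomain(initial_state=0)
from_internal = domain._get_applicable_actions()
from_explicit = domain._get_applicable_actions(memory=5)
```

With this split, a subclass only needs to implement the `_from` variant, where the memory is always provided.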
# _get_applicable_actions_from Events
_get_applicable_actions_from(
self,
memory: Memory[D.T_state]
) -> StrDict[Space[D.T_event]]
Get the space (finite or infinite set) of applicable actions in the given memory (state or history).
This is a helper function called by default from Events._get_applicable_actions(), the difference being that the memory parameter is mandatory here.
# Parameters
- memory: The memory to consider.
# Returns
The space of applicable actions.
# _get_enabled_events Events
_get_enabled_events(
self,
memory: Optional[Memory[D.T_state]] = None
) -> Space[D.T_event]
Get the space (finite or infinite set) of enabled uncontrollable events in the given memory (state or history), or in the internal one if omitted.
By default, Events._get_enabled_events() provides some boilerplate code and internally calls Events._get_enabled_events_from(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
The space of enabled events.
# _get_enabled_events_from Events
_get_enabled_events_from(
self,
memory: Memory[D.T_state]
) -> Space[D.T_event]
Get the space (finite or infinite set) of enabled uncontrollable events in the given memory (state or history).
This is a helper function called by default from Events._get_enabled_events(), the difference being that the memory parameter is mandatory here.
# Parameters
- memory: The memory to consider.
# Returns
The space of enabled events.
# _get_memory_maxlen History
_get_memory_maxlen(
self
) -> int
Get the (cached) memory max length.
By default, FiniteHistory._get_memory_maxlen() internally calls FiniteHistory._get_memory_maxlen_() the first time and automatically caches its value to make future calls more efficient (since the memory max length is assumed to be constant).
# Returns
The memory max length.
# _get_memory_maxlen_ FiniteHistory
_get_memory_maxlen_(
self
) -> int
Get the memory max length.
This is a helper function called by default from FiniteHistory._get_memory_maxlen(), the difference being that the result is not cached here.
TIP
The underscore at the end of this function's name is a convention to remind you that its result should be constant.
# Returns
The memory max length.
# _get_observation TransformedObservable
_get_observation(
self,
state: D.T_state,
action: Optional[StrDict[list[D.T_event]]] = None
) -> StrDict[D.T_observation]
Get the deterministic observation given a state and action.
# Parameters
- state: The state to be observed.
- action: The last applied action (or None if the state is an initial state).
# Returns
The deterministic observation.
# _get_observation_distribution PartiallyObservable
_get_observation_distribution(
self,
state: D.T_state,
action: Optional[StrDict[list[D.T_event]]] = None
) -> Distribution[StrDict[D.T_observation]]
Get the probability distribution of the observation given a state and action.
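In mathematical terms (discrete case), given an action $a$, this function represents $P(O \mid s, a)$, where $O$ is the random variable of the observation.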
# Parameters
- state: The state to be observed.
- action: The last applied action (or None if the state is an initial state).
# Returns
The probability distribution of the observation.
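As a rough illustration of the difference between an observation distribution and a deterministic observation, the sketch below models a noisy sensor as a dictionary of probabilities. All names (`observation_distribution`, `sample`, `deterministic_observation`) and the 0.8/0.1/0.1 noise model are assumptions made for this example; the library's `Distribution` type is not used.

```python
import random


def observation_distribution(state, action=None):
    """Hypothetical noisy sensor: report the true state with probability 0.8."""
    return {state: 0.8, state + 1: 0.1, state - 1: 0.1}


def sample(dist, rng):
    """Draw one observation from a {value: probability} dictionary."""
    values = list(dist)
    weights = [dist[v] for v in values]
    return rng.choices(values, weights=weights, k=1)[0]


def deterministic_observation(state, action=None):
    """Special case: the observation is a fixed function of the state."""
    return state


dist = observation_distribution(3)
obs = sample(dist, random.Random(0))
```

The deterministic case can be seen as a distribution that puts all of its probability mass on a single observation.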
# _get_observation_space PartiallyObservable
_get_observation_space(
self
) -> StrDict[Space[D.T_observation]]
Get the (cached) observation space (finite or infinite set).
By default, PartiallyObservable._get_observation_space() internally calls PartiallyObservable._get_observation_space_() the first time and automatically caches its value to make future calls more efficient (since the observation space is assumed to be constant).
# Returns
The observation space.
# _get_observation_space_ PartiallyObservable
_get_observation_space_(
self
) -> StrDict[Space[D.T_observation]]
Get the observation space (finite or infinite set).
This is a helper function called by default from PartiallyObservable._get_observation_space(), the difference being that the result is not cached here.
TIP
The underscore at the end of this function's name is a convention to remind you that its result should be constant.
# Returns
The observation space.
# _init_memory History
_init_memory(
self,
state: Optional[D.T_state] = None
) -> Memory[D.T_state]
Initialize memory (possibly with a state) according to its specification and return it.
This function is automatically called by Initializable._reset() to reinitialize the internal memory whenever the domain is used as an environment.
# Parameters
- state: An optional state to initialize the memory with (typically the initial state).
# Returns
The new initialized memory.
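A bounded memory can be sketched with a standard `collections.deque`. This assumes a deque-like memory whose maximum length plays the role of the memory max length; `init_memory` is a hypothetical helper, not the library function.

```python
from collections import deque


def init_memory(state=None, maxlen=None):
    """Hypothetical helper: build a bounded memory, optionally seeded with a state."""
    content = [] if state is None else [state]
    return deque(content, maxlen=maxlen)


# Memory of length 1: only the latest state is kept.
markovian = init_memory(state="s0", maxlen=1)
markovian.append("s1")

# Finite history of length 3: the oldest state is dropped when full.
finite = init_memory(state="s0", maxlen=3)
for s in ["s1", "s2", "s3"]:
    finite.append(s)
```

With `maxlen=None` the deque is unbounded, which corresponds to keeping the full history.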
# _is_action Events
_is_action(
self,
event: D.T_event
) -> bool
Indicate whether an event is an action (i.e. a controllable event for the agents).
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the domain action space provided by Events._get_action_space(), but it can be overridden for faster implementations.
# Parameters
- event: The event to consider.
# Returns
True if the event is an action (False otherwise).
# _is_applicable_action Events
_is_applicable_action(
self,
action: StrDict[D.T_event],
memory: Optional[Memory[D.T_state]] = None
) -> bool
Indicate whether an action is applicable in the given memory (state or history), or in the internal one if omitted.
By default, Events._is_applicable_action() provides some boilerplate code and internally calls Events._is_applicable_action_from(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- action: The action to consider.
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
True if the action is applicable (False otherwise).
# _is_applicable_action_from Events
_is_applicable_action_from(
self,
action: StrDict[D.T_event],
memory: Memory[D.T_state]
) -> bool
Indicate whether an action is applicable in the given memory (state or history).
This is a helper function called by default from Events._is_applicable_action(), the difference being that the memory parameter is mandatory here.
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the space of applicable actions provided by Events._get_applicable_actions_from(), but it can be overridden for faster implementations.
# Parameters
- action: The action to consider.
- memory: The memory to consider.
# Returns
True if the action is applicable (False otherwise).
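The TIP above contrasts a generic membership test with a faster override. A minimal sketch, assuming integer actions and a toy applicability rule; `ContainsDomain` and `FastDomain` are hypothetical classes, and Python's `in` operator stands in for a space's `contains()` check.

```python
class ContainsDomain:
    """Hypothetical default: test membership in the space of applicable actions."""

    def _get_applicable_actions_from(self, memory):
        # Toy rule (assumption): actions 0..memory are applicable.
        return set(range(0, memory + 1))  # builds a potentially large set

    def _is_applicable_action_from(self, action, memory):
        # Generic default: build the space, then check membership.
        return action in self._get_applicable_actions_from(memory)


class FastDomain(ContainsDomain):
    """Override that answers the same question without building the space."""

    def _is_applicable_action_from(self, action, memory):
        return 0 <= action <= memory


slow, fast = ContainsDomain(), FastDomain()
agree = all(
    slow._is_applicable_action_from(a, 10) == fast._is_applicable_action_from(a, 10)
    for a in range(-3, 15)
)
```

Both versions give the same answers here, but the override avoids materializing the whole space on every call.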
# _is_enabled_event Events
_is_enabled_event(
self,
event: D.T_event,
memory: Optional[Memory[D.T_state]] = None
) -> bool
Indicate whether an uncontrollable event is enabled in the given memory (state or history), or in the internal one if omitted.
By default, Events._is_enabled_event() provides some boilerplate code and internally calls Events._is_enabled_event_from(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- event: The event to consider.
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
True if the event is enabled (False otherwise).
# _is_enabled_event_from Events
_is_enabled_event_from(
self,
event: D.T_event,
memory: Memory[D.T_state]
) -> bool
Indicate whether an event is enabled in the given memory (state or history).
This is a helper function called by default from Events._is_enabled_event(), the difference being that the memory parameter is mandatory here.
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the space of enabled events provided by Events._get_enabled_events_from(), but it can be overridden for faster implementations.
# Parameters
- event: The event to consider.
- memory: The memory to consider.
# Returns
True if the event is enabled (False otherwise).
# _is_observation PartiallyObservable
_is_observation(
self,
observation: StrDict[D.T_observation]
) -> bool
Check that an observation indeed belongs to the domain observation space.
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the domain observation space provided by PartiallyObservable._get_observation_space(), but it can be overridden for faster implementations.
# Parameters
- observation: The observation to consider.
# Returns
True if the observation belongs to the domain observation space (False otherwise).
# _reset Initializable
_reset(
self
) -> StrDict[D.T_observation]
Reset the state of the environment and return an initial observation.
By default, Initializable._reset() provides some boilerplate code and internally calls Initializable._state_reset() (which returns an initial state). The boilerplate code automatically stores the initial state into the _memory attribute and samples a corresponding observation.
# Returns
An initial observation.
# _state_reset Initializable
_state_reset(
self
) -> D.T_state
Reset the state of the environment and return an initial state.
This is a helper function called by default from Initializable._reset(). It works at the state level, whereas Initializable._reset() works at the observation level.
# Returns
An initial state.
# _state_step Environment
_state_step(
self,
action: StrDict[list[D.T_event]]
) -> TransitionOutcome[D.T_state, StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
Compute one step of the transition's dynamics.
This is a helper function called by default from Environment._step(). It works at the state level, whereas Environment._step() works at the observation level.
# Parameters
- action: The action taken in the current memory (state or history) triggering the transition.
# Returns
The transition outcome of this step.
# _step Environment
_step(
self,
action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
Run one step of the environment's dynamics.
By default, Environment._step() provides some boilerplate code and internally calls Environment._state_step() (which returns a transition outcome). The boilerplate code automatically stores the next state into the _memory attribute and samples a corresponding observation.
TIP
Whenever an existing environment needs to be wrapped instead of implemented fully in scikit-decide (e.g. compiled ATARI games), it is recommended to override Environment._step() to call the external environment and not use the Environment._state_step() helper function.
WARNING
Before calling Environment._step() the first time or when the end of an episode is reached, Initializable._reset() must be called to reset the environment's state.
# Parameters
- action: The action taken in the current memory (state or history) triggering the transition.
# Returns
The environment outcome of this step.
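The interplay between the state-level helpers and the observation-level boilerplate can be sketched as follows. `CounterEnv` is a hypothetical toy environment, plain tuples stand in for the outcome objects, and the explicit `RuntimeError` check is an assumption added to make the reset-before-step requirement visible.

```python
class CounterEnv:
    """Hypothetical environment: the state is a counter, the episode ends at 3."""

    def __init__(self):
        self._memory = None  # no state until the first reset

    def _state_reset(self):
        # State-level helper: return an initial state.
        return 0

    def _state_step(self, action):
        # State-level helper: return (next state, value, termination flag).
        next_state = self._memory + action
        return next_state, -1.0, next_state >= 3

    def _get_observation(self, state):
        return f"counter={state}"

    def _reset(self):
        # Boilerplate: store the initial state, then return an observation.
        state = self._state_reset()
        self._memory = state
        return self._get_observation(state)

    def _step(self, action):
        # Boilerplate: store the next state, then return an observation.
        if self._memory is None:
            raise RuntimeError("call _reset() before the first _step()")
        state, value, done = self._state_step(action)
        self._memory = state
        return self._get_observation(state), value, done


env = CounterEnv()
first_obs = env._reset()
obs, value, done = env._step(1)
```

In this sketch the subclass author only writes the two state-level helpers and the observation function; memory bookkeeping lives in one place.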
# MultiAgentRLDomain
This is a typical multi-agent Reinforcement Learning domain class.
This helper class can be used as an alternate base class for domains, inheriting the following:
- Domain
- MultiAgent
- Sequential
- Environment
- Actions
- Initializable
- Markovian
- TransformedObservable
- Rewards
Typical use:
class D(MultiAgentRLDomain)
TIP
It is also possible to refine any alternate base class, for instance:
class D(MultiAgentRLDomain, FullyObservable)
# check_value Rewards
check_value(
self,
value: Value[D.T_value]
) -> bool
Check that a value is compliant with its reward specification.
TIP
This function always returns True by default, because any kind of reward should be accepted at this level.
# Parameters
- value: The value to check.
# Returns
True if the value is compliant (False otherwise).
# get_action_mask Events
get_action_mask(
self,
memory: Optional[Memory[D.T_state]] = None
) -> StrDict[Mask]
Get action mask for the given memory or internal one if omitted.
An action mask is another (more specific) format for applicable actions, which is meaningful only if the action space can be iterated over in some way. It is represented by a flat array of 0's and 1's ordered as the actions when enumerated: 1 for an applicable action, and 0 for a non-applicable action.
More precisely, this implementation assumes that each agent action space is an EnumerableSpace, and internally calls self.get_applicable_actions().
The action mask is used, for instance, by RL solvers to shut down the logits associated with non-applicable actions in the output of their internal neural network.
# Parameters
- memory: The memory to consider. If None, works on the internal memory of the domain.
# Returns
A numpy array (or a dict mapping each agent to a numpy array for multi-agent domains) of 0's and 1's indicating the applicability of each action (1 meaning applicable and 0 not applicable).
# get_action_space Events
get_action_space(
self
) -> StrDict[Space[D.T_event]]
Get the (cached) domain action space (finite or infinite set).
By default, Events.get_action_space() internally calls Events._get_action_space_() the first time and automatically caches its value to make future calls more efficient (since the action space is assumed to be constant).
# Returns
The action space.
# get_agents MultiAgent
get_agents(
self
) -> set[str]
Return the set of available agent ids.
# get_applicable_actions Events
get_applicable_actions(
self,
memory: Optional[Memory[D.T_state]] = None
) -> StrDict[Space[D.T_event]]
Get the space (finite or infinite set) of applicable actions in the given memory (state or history), or in the internal one if omitted.
By default, Events.get_applicable_actions() provides some boilerplate code and internally calls Events._get_applicable_actions(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
The space of applicable actions.
# get_enabled_events Events
get_enabled_events(
self,
memory: Optional[Memory[D.T_state]] = None
) -> Space[D.T_event]
Get the space (finite or infinite set) of enabled uncontrollable events in the given memory (state or history), or in the internal one if omitted.
By default, Events.get_enabled_events() provides some boilerplate code and internally calls Events._get_enabled_events(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
The space of enabled events.
# get_observation TransformedObservable
get_observation(
self,
state: D.T_state,
action: Optional[StrDict[list[D.T_event]]] = None
) -> StrDict[D.T_observation]
Get the deterministic observation given a state and action.
# Parameters
- state: The state to be observed.
- action: The last applied action (or None if the state is an initial state).
# Returns
The deterministic observation.
# get_observation_distribution PartiallyObservable
get_observation_distribution(
self,
state: D.T_state,
action: Optional[StrDict[list[D.T_event]]] = None
) -> Distribution[StrDict[D.T_observation]]
Get the probability distribution of the observation given a state and action.
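In mathematical terms (discrete case), given an action $a$, this function represents $P(O \mid s, a)$, where $O$ is the random variable of the observation.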
# Parameters
- state: The state to be observed.
- action: The last applied action (or None if the state is an initial state).
# Returns
The probability distribution of the observation.
# get_observation_space PartiallyObservable
get_observation_space(
self
) -> StrDict[Space[D.T_observation]]
Get the (cached) observation space (finite or infinite set).
By default, PartiallyObservable.get_observation_space() internally calls PartiallyObservable._get_observation_space_() the first time and automatically caches its value to make future calls more efficient (since the observation space is assumed to be constant).
# Returns
The observation space.
# is_action Events
is_action(
self,
event: D.T_event
) -> bool
Indicate whether an event is an action (i.e. a controllable event for the agents).
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the domain action space provided by Events.get_action_space(), but it can be overridden for faster implementations.
# Parameters
- event: The event to consider.
# Returns
True if the event is an action (False otherwise).
# is_applicable_action Events
is_applicable_action(
self,
action: StrDict[D.T_event],
memory: Optional[Memory[D.T_state]] = None
) -> bool
Indicate whether an action is applicable in the given memory (state or history), or in the internal one if omitted.
By default, Events.is_applicable_action() provides some boilerplate code and internally calls Events._is_applicable_action(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- action: The action to consider.
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
True if the action is applicable (False otherwise).
# is_enabled_event Events
is_enabled_event(
self,
event: D.T_event,
memory: Optional[Memory[D.T_state]] = None
) -> bool
Indicate whether an uncontrollable event is enabled in the given memory (state or history), or in the internal one if omitted.
By default, Events.is_enabled_event() provides some boilerplate code and internally calls Events._is_enabled_event(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- event: The event to consider.
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
True if the event is enabled (False otherwise).
# is_observation PartiallyObservable
is_observation(
self,
observation: StrDict[D.T_observation]
) -> bool
Check that an observation indeed belongs to the domain observation space.
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the domain observation space provided by PartiallyObservable.get_observation_space(), but it can be overridden for faster implementations.
# Parameters
- observation: The observation to consider.
# Returns
True if the observation belongs to the domain observation space (False otherwise).
# reset Initializable
reset(
self
) -> StrDict[D.T_observation]
Reset the state of the environment and return an initial observation.
By default, Initializable.reset() provides some boilerplate code and internally calls Initializable._reset() (which returns an initial state). The boilerplate code automatically stores the initial state into the _memory attribute and samples a corresponding observation.
# Returns
An initial observation.
# step Environment
step(
self,
action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
Run one step of the environment's dynamics.
By default, Environment.step()
provides some boilerplate code and internally calls Environment._step() (which returns a transition outcome). The boilerplate code automatically stores the next state into the _memory attribute and samples a corresponding observation.
TIP
Whenever an existing environment needs to be wrapped instead of implemented fully in scikit-decide (e.g. compiled ATARI games), it is recommended to override Environment.step() to call the external environment and not use the Environment._step() helper function.
WARNING
Before calling Environment.step() the first time or when the end of an episode is reached, Initializable.reset() must be called to reset the environment's state.
# Parameters
- action: The action taken in the current memory (state or history) triggering the transition.
# Returns
The environment outcome of this step.
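The wrapping advice in the TIP above can be sketched as follows. Everything here is hypothetical: `ExternalGame` plays the role of a third-party environment with its own API, `WrappedGame` is a plain class rather than a scikit-decide domain, and tuples stand in for outcome objects.

```python
class ExternalGame:
    """Hypothetical third-party environment with its own reset/step API."""

    def __init__(self):
        self.position = 0

    def restart(self):
        self.position = 0
        return self.position

    def advance(self, move):
        self.position += move
        # Returns (observation, value, termination flag).
        return self.position, float(self.position), self.position >= 2


class WrappedGame:
    """Wrapper whose reset/step delegate straight to the external environment."""

    def __init__(self, game):
        self._game = game

    def reset(self):
        return self._game.restart()

    def step(self, action):
        # No state-level helper is involved: the external environment
        # owns the state and already returns observations.
        observation, value, done = self._game.advance(action)
        return observation, value, done


wrapped = WrappedGame(ExternalGame())
start = wrapped.reset()
outcome = wrapped.step(1)
```

The design choice is that the wrapper never needs access to the external environment's internal state, which is often unavailable for compiled simulators.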
# _check_value Rewards
_check_value(
self,
value: Value[D.T_value]
) -> bool
Check that a value is compliant with its reward specification.
TIP
This function always returns True by default, because any kind of reward should be accepted at this level.
# Parameters
- value: The value to check.
# Returns
True if the value is compliant (False otherwise).
# _get_action_mask Events
_get_action_mask(
self,
memory: Optional[Memory[D.T_state]] = None
) -> StrDict[Mask]
Get action mask for the given memory or internal one if omitted.
An action mask is another (more specific) format for applicable actions, which is meaningful only if the action space can be iterated over in some way. It is represented by a flat array of 0's and 1's ordered as the actions when enumerated: 1 for an applicable action, and 0 for a non-applicable action.
More precisely, this implementation assumes that each agent action space is an EnumerableSpace, and internally calls self.get_applicable_actions().
The action mask is used, for instance, by RL solvers to shut down the logits associated with non-applicable actions in the output of their internal neural network.
# Parameters
- memory: The memory to consider. If None, works on the internal memory of the domain.
# Returns
A numpy array (or a dict mapping each agent to a numpy array for multi-agent domains) of 0's and 1's indicating the applicability of each action (1 meaning applicable and 0 not applicable).
# _get_action_space Events
_get_action_space(
self
) -> StrDict[Space[D.T_event]]
Get the (cached) domain action space (finite or infinite set).
By default, Events._get_action_space() internally calls Events._get_action_space_() the first time and automatically caches its value to make future calls more efficient (since the action space is assumed to be constant).
# Returns
The action space.
# _get_action_space_ Events
_get_action_space_(
self
) -> StrDict[Space[D.T_event]]
Get the domain action space (finite or infinite set).
This is a helper function called by default from Events._get_action_space(), the difference being that the result is not cached here.
TIP
The underscore at the end of this function's name is a convention to remind you that its result should be constant.
# Returns
The action space.
# _get_applicable_actions Events
_get_applicable_actions(
self,
memory: Optional[Memory[D.T_state]] = None
) -> StrDict[Space[D.T_event]]
Get the space (finite or infinite set) of applicable actions in the given memory (state or history), or in the internal one if omitted.
By default, Events._get_applicable_actions() provides some boilerplate code and internally calls Events._get_applicable_actions_from(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
The space of applicable actions.
# _get_applicable_actions_from Events
_get_applicable_actions_from(
self,
memory: Memory[D.T_state]
) -> StrDict[Space[D.T_event]]
Get the space (finite or infinite set) of applicable actions in the given memory (state or history).
This is a helper function called by default from Events._get_applicable_actions(), the difference being that the memory parameter is mandatory here.
# Parameters
- memory: The memory to consider.
# Returns
The space of applicable actions.
# _get_enabled_events Events
_get_enabled_events(
self,
memory: Optional[Memory[D.T_state]] = None
) -> Space[D.T_event]
Get the space (finite or infinite set) of enabled uncontrollable events in the given memory (state or history), or in the internal one if omitted.
By default, Events._get_enabled_events() provides some boilerplate code and internally calls Events._get_enabled_events_from(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
The space of enabled events.
# _get_enabled_events_from Events
_get_enabled_events_from(
self,
memory: Memory[D.T_state]
) -> Space[D.T_event]
Get the space (finite or infinite set) of enabled uncontrollable events in the given memory (state or history).
This is a helper function called by default from Events._get_enabled_events(), the difference being that the memory parameter is mandatory here.
# Parameters
- memory: The memory to consider.
# Returns
The space of enabled events.
# _get_memory_maxlen History
_get_memory_maxlen(
self
) -> int
Get the (cached) memory max length.
By default, FiniteHistory._get_memory_maxlen() internally calls FiniteHistory._get_memory_maxlen_() the first time and automatically caches its value to make future calls more efficient (since the memory max length is assumed to be constant).
# Returns
The memory max length.
# _get_memory_maxlen_ FiniteHistory
_get_memory_maxlen_(
self
) -> int
Get the memory max length.
This is a helper function called by default from FiniteHistory._get_memory_maxlen(), the difference being that the result is not cached here.
TIP
The underscore at the end of this function's name is a convention to remind you that its result should be constant.
# Returns
The memory max length.
# _get_observation TransformedObservable
_get_observation(
self,
state: D.T_state,
action: Optional[StrDict[list[D.T_event]]] = None
) -> StrDict[D.T_observation]
Get the deterministic observation given a state and action.
# Parameters
- state: The state to be observed.
- action: The last applied action (or None if the state is an initial state).
# Returns
The deterministic observation.
# _get_observation_distribution PartiallyObservable
_get_observation_distribution(
self,
state: D.T_state,
action: Optional[StrDict[list[D.T_event]]] = None
) -> Distribution[StrDict[D.T_observation]]
Get the probability distribution of the observation given a state and action.
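In mathematical terms (discrete case), given an action $a$, this function represents $P(O \mid s, a)$, where $O$ is the random variable of the observation.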
# Parameters
- state: The state to be observed.
- action: The last applied action (or None if the state is an initial state).
# Returns
The probability distribution of the observation.
# _get_observation_space PartiallyObservable
_get_observation_space(
self
) -> StrDict[Space[D.T_observation]]
Get the (cached) observation space (finite or infinite set).
By default, PartiallyObservable._get_observation_space() internally calls PartiallyObservable._get_observation_space_() the first time and automatically caches its value to make future calls more efficient (since the observation space is assumed to be constant).
# Returns
The observation space.
# _get_observation_space_ PartiallyObservable
_get_observation_space_(
self
) -> StrDict[Space[D.T_observation]]
Get the observation space (finite or infinite set).
This is a helper function called by default from PartiallyObservable._get_observation_space(), the difference being that the result is not cached here.
TIP
The underscore at the end of this function's name is a convention to remind you that its result should be constant.
# Returns
The observation space.
# _init_memory History
_init_memory(
self,
state: Optional[D.T_state] = None
) -> Memory[D.T_state]
Initialize memory (possibly with a state) according to its specification and return it.
This function is automatically called by Initializable._reset() to reinitialize the internal memory whenever the domain is used as an environment.
# Parameters
- state: An optional state to initialize the memory with (typically the initial state).
# Returns
The new initialized memory.
# _is_action Events
_is_action(
self,
event: D.T_event
) -> bool
Indicate whether an event is an action (i.e. a controllable event for the agents).
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the domain action space provided by Events._get_action_space(), but it can be overridden for faster implementations.
# Parameters
- event: The event to consider.
# Returns
True if the event is an action (False otherwise).
# _is_applicable_action Events
_is_applicable_action(
self,
action: StrDict[D.T_event],
memory: Optional[Memory[D.T_state]] = None
) -> bool
Indicate whether an action is applicable in the given memory (state or history), or in the internal one if omitted.
By default, Events._is_applicable_action() provides some boilerplate code and internally calls Events._is_applicable_action_from(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- action: The action to consider.
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
True if the action is applicable (False otherwise).
# _is_applicable_action_from Events
_is_applicable_action_from(
self,
action: StrDict[D.T_event],
memory: Memory[D.T_state]
) -> bool
Indicate whether an action is applicable in the given memory (state or history).
This is a helper function called by default from Events._is_applicable_action(), the difference being that the memory parameter is mandatory here.
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the space of applicable actions provided by Events._get_applicable_actions_from(), but it can be overridden for faster implementations.
# Parameters
- action: The action to consider.
- memory: The memory to consider.
# Returns
True if the action is applicable (False otherwise).
# _is_enabled_event Events
_is_enabled_event(
self,
event: D.T_event,
memory: Optional[Memory[D.T_state]] = None
) -> bool
Indicate whether an uncontrollable event is enabled in the given memory (state or history), or in the internal one if omitted.
By default, Events._is_enabled_event() provides some boilerplate code and internally calls Events._is_enabled_event_from(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- event: The event to consider.
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
True if the event is enabled (False otherwise).
# _is_enabled_event_from Events
_is_enabled_event_from(
self,
event: D.T_event,
memory: Memory[D.T_state]
) -> bool
Indicate whether an event is enabled in the given memory (state or history).
This is a helper function called by default from Events._is_enabled_event(), the difference being that the memory parameter is mandatory here.
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the space of enabled events provided by Events._get_enabled_events_from(), but it can be overridden for faster implementations.
# Parameters
- event: The event to consider.
- memory: The memory to consider.
# Returns
True if the event is enabled (False otherwise).
# _is_observation PartiallyObservable
_is_observation(
self,
observation: StrDict[D.T_observation]
) -> bool
Check that an observation indeed belongs to the domain observation space.
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the domain observation space provided by PartiallyObservable._get_observation_space(), but it can be overridden for faster implementations.
# Parameters
- observation: The observation to consider.
# Returns
True if the observation belongs to the domain observation space (False otherwise).
# _reset Initializable
_reset(
self
) -> StrDict[D.T_observation]
Reset the state of the environment and return an initial observation.
By default, Initializable._reset() provides some boilerplate code and internally calls Initializable._state_reset() (which returns an initial state). The boilerplate code automatically stores the initial state into the _memory attribute and samples a corresponding observation.
# Returns
An initial observation.
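The split between the state-level helper and the observation-level reset can be sketched as below. The class ToyInitializable and its _observe() helper are hypothetical names chosen for illustration, and the observation is deterministic here for simplicity, whereas the real boilerplate samples it.

```python
from typing import Dict, Optional


class ToyInitializable:
    """Hypothetical sketch of the reset boilerplate (not the library code)."""

    def __init__(self) -> None:
        self._memory: Optional[int] = None

    def _state_reset(self) -> int:
        # State-level helper: only decides what the initial state is.
        return 0

    def _observe(self, state: int) -> Dict[str, int]:
        # Assumed observation function, deterministic for simplicity.
        return {"position": state}

    def _reset(self) -> Dict[str, int]:
        state = self._state_reset()   # 1. get an initial state
        self._memory = state          # 2. store it in the internal memory
        return self._observe(state)   # 3. return a matching observation


domain = ToyInitializable()
initial_observation = domain._reset()
```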
# _state_reset Initializable
_state_reset(
self
) -> D.T_state
Reset the state of the environment and return an initial state.
This is a helper function called by default from Initializable._reset(). It focuses on the state level, as opposed to the observation one for the latter.
# Returns
An initial state.
# _state_step Environment
_state_step(
self,
action: StrDict[list[D.T_event]]
) -> TransitionOutcome[D.T_state, StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
Compute one step of the transition's dynamics.
This is a helper function called by default from Environment._step(). It focuses on the state level, as opposed to the observation one for the latter.
# Parameters
- action: The action taken in the current memory (state or history) triggering the transition.
# Returns
The transition outcome of this step.
# _step Environment
_step(
self,
action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
Run one step of the environment's dynamics.
By default, Environment._step() provides some boilerplate code and internally calls Environment._state_step() (which returns a transition outcome). The boilerplate code automatically stores the next state into the _memory attribute and samples a corresponding observation.
TIP
Whenever an existing environment needs to be wrapped instead of implemented fully in scikit-decide (e.g. compiled Atari games), it is recommended to override Environment._step() to call the external environment and not use the Environment._state_step() helper function.
WARNING
Before calling Environment._step() for the first time, or when the end of an episode is reached, Initializable._reset() must be called to reset the environment's state.
# Parameters
- action: The action taken in the current memory (state or history) triggering the transition.
# Returns
The environment outcome of this step.
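The wrapping advice in the TIP above can be sketched as follows. Everything here is an assumption made for illustration: ExternalCounterGame stands in for an external environment with its own API, Outcome is a simplified stand-in for an environment outcome, and WrappedDomain is not a scikit-decide class.

```python
from typing import NamedTuple, Tuple


class ExternalCounterGame:
    """Hypothetical external environment with its own API."""

    def __init__(self) -> None:
        self.counter = 0

    def restart(self) -> int:
        self.counter = 0
        return self.counter

    def advance(self, delta: int) -> Tuple[int, float, bool]:
        self.counter += delta
        reward = -1.0             # toy cost per move
        done = self.counter >= 3  # toy termination rule
        return self.counter, reward, done


class Outcome(NamedTuple):
    """Simplified stand-in for an environment outcome."""
    observation: int
    value: float
    termination: bool


class WrappedDomain:
    def __init__(self) -> None:
        self._game = ExternalCounterGame()

    def _reset(self) -> int:
        return self._game.restart()

    def _step(self, action: int) -> Outcome:
        # Delegate to the external environment instead of a state-level helper.
        observation, reward, done = self._game.advance(action)
        return Outcome(observation, reward, done)


domain = WrappedDomain()
first_observation = domain._reset()  # reset before the first step
outcomes = [domain._step(1) for _ in range(3)]
```

The wrapper never sees the external state, which is why the state-level helper is bypassed in this kind of design.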
# StatelessSimulatorDomain
This is a typical stateless simulator domain class.
This helper class can be used as an alternate base class for domains, inheriting the following:
- Domain
- SingleAgent
- Sequential
- Simulation
- Actions
- Markovian
- TransformedObservable
- Rewards
Typical use:
class D(StatelessSimulatorDomain)
TIP
It is also possible to refine any alternate base class, like for instance:
class D(RLDomain, FullyObservable)
# check_value Rewards
check_value(
self,
value: Value[D.T_value]
) -> bool
Check that a value is compliant with its reward specification.
TIP
This function always returns True by default because any kind of reward should be accepted at this level.
# Parameters
- value: The value to check.
# Returns
True if the value is compliant (False otherwise).
# get_action_mask Events
get_action_mask(
self,
memory: Optional[Memory[D.T_state]] = None
) -> StrDict[Mask]
Get action mask for the given memory or internal one if omitted.
An action mask is another (more specific) format for applicable actions, that has a meaning only if the action space can be iterated over in some way. It is represented by a flat array of 0's and 1's ordered as the actions when enumerated: 1 for an applicable action, and 0 for a not applicable action.
More precisely, this implementation assumes that each agent's action space is an EnumerableSpace, and internally calls self.get_applicable_actions().
The action mask is used, for instance, by RL solvers to mask out the logits associated with non-applicable actions in the output of their internal neural network.
# Parameters
- memory: The memory to consider. If None, works on the internal memory of the domain.
# Returns
A numpy array (or a dict mapping each agent to a numpy array for multi-agent domains) of 0s and 1s indicating the applicability of each action (1 meaning applicable, 0 not applicable).
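As a rough illustration of the mask format and of how a solver might use it, the sketch below builds a mask with plain Python lists rather than numpy arrays. The function action_mask() and the action names are hypothetical, not part of scikit-decide.

```python
from typing import Hashable, Iterable, List, Sequence


def action_mask(
    all_actions: Sequence[Hashable], applicable: Iterable[Hashable]
) -> List[int]:
    """Return 1 for each enumerated action that is applicable, 0 otherwise."""
    applicable_set = set(applicable)
    return [1 if action in applicable_set else 0 for action in all_actions]


# Enumeration order of the (assumed) action space.
actions = ["left", "right", "up", "down"]
mask = action_mask(actions, ["right", "down"])

# A solver can then discard non-applicable actions before picking one.
logits = [2.0, 0.5, 1.0, -1.0]
masked_logits = [x if m else float("-inf") for x, m in zip(logits, mask)]
best_index = max(range(len(actions)), key=masked_logits.__getitem__)
best_action = actions[best_index]
```

Although "left" has the highest raw logit, it is masked out, so the selection falls back to the best applicable action.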
# get_action_space Events
get_action_space(
self
) -> StrDict[Space[D.T_event]]
Get the (cached) domain action space (finite or infinite set).
By default, Events.get_action_space() internally calls Events._get_action_space_() the first time and automatically caches its value to make future calls more efficient (since the action space is assumed to be constant).
# Returns
The action space.
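The "compute once, then cache" convention can be sketched as below. This is one possible way to cache the value, under the assumption that the space never changes; the class ToyDomain and the attribute name _action_space are illustrative only.

```python
from typing import FrozenSet


class ToyDomain:
    """Hypothetical sketch of a cached getter and its uncached helper."""

    def __init__(self) -> None:
        self.helper_calls = 0

    def _get_action_space_(self) -> FrozenSet[str]:
        # Uncached helper: the trailing underscore signals a constant result.
        self.helper_calls += 1
        return frozenset({"left", "right"})

    def get_action_space(self) -> FrozenSet[str]:
        # Cached getter: the helper runs only on the first call.
        try:
            return self._action_space
        except AttributeError:
            self._action_space = self._get_action_space_()
            return self._action_space


domain = ToyDomain()
first = domain.get_action_space()
second = domain.get_action_space()
```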
# get_agents MultiAgent
get_agents(
self
) -> set[str]
Return a singleton for single-agent domains.
This must stay consistent with skdecide.core.autocast(), which transforms a single-agent domain into a multi-agent domain whose only agent has the id "agent".
# get_applicable_actions Events
get_applicable_actions(
self,
memory: Optional[Memory[D.T_state]] = None
) -> StrDict[Space[D.T_event]]
Get the space (finite or infinite set) of applicable actions in the given memory (state or history), or in the internal one if omitted.
By default, Events.get_applicable_actions() provides some boilerplate code and internally calls Events._get_applicable_actions(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
The space of applicable actions.
# get_enabled_events Events
get_enabled_events(
self,
memory: Optional[Memory[D.T_state]] = None
) -> Space[D.T_event]
Get the space (finite or infinite set) of enabled uncontrollable events in the given memory (state or history), or in the internal one if omitted.
By default, Events.get_enabled_events() provides some boilerplate code and internally calls Events._get_enabled_events(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
The space of enabled events.
# get_observation TransformedObservable
get_observation(
self,
state: D.T_state,
action: Optional[StrDict[list[D.T_event]]] = None
) -> StrDict[D.T_observation]
Get the deterministic observation given a state and action.
# Parameters
- state: The state to be observed.
- action: The last applied action (or None if the state is an initial state).
# Returns
The deterministic observation.
# get_observation_distribution PartiallyObservable
get_observation_distribution(
self,
state: D.T_state,
action: Optional[StrDict[list[D.T_event]]] = None
) -> Distribution[StrDict[D.T_observation]]
Get the probability distribution of the observation given a state and action.
In mathematical terms (discrete case), given an action a, this function represents P(O | s, a), where O is the random variable of the observation.
# Parameters
- state: The state to be observed.
- action: The last applied action (or None if the state is an initial state).
# Returns
The probability distribution of the observation.
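A discrete observation distribution of this kind can be pictured as a mapping from observations to probabilities. The sketch below uses a plain dict rather than a scikit-decide Distribution object, and the noisy-sensor model (0.8 for the true state, 0.1 for each neighbour) is purely an assumption for illustration.

```python
import random
from typing import Dict, Optional


def observation_distribution(state: int, action: Optional[str] = None) -> Dict[int, float]:
    """Toy P(observation | state, action): a noisy reading of an integer state."""
    # Assumed sensor model: correct with probability 0.8, off by one otherwise.
    return {state - 1: 0.1, state: 0.8, state + 1: 0.1}


distribution = observation_distribution(4)
total_probability = sum(distribution.values())

# Draw one observation according to the distribution (seeded for repeatability).
rng = random.Random(0)
sampled_observation = rng.choices(
    list(distribution), weights=list(distribution.values()), k=1
)[0]
```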
# get_observation_space PartiallyObservable
get_observation_space(
self
) -> StrDict[Space[D.T_observation]]
Get the (cached) observation space (finite or infinite set).
By default, PartiallyObservable.get_observation_space() internally calls PartiallyObservable._get_observation_space_() the first time and automatically caches its value to make future calls more efficient (since the observation space is assumed to be constant).
# Returns
The observation space.
# is_action Events
is_action(
self,
event: D.T_event
) -> bool
Indicate whether an event is an action (i.e. a controllable event for the agents).
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the domain action space provided by Events.get_action_space(), but it can be overridden for faster implementations.
# Parameters
- event: The event to consider.
# Returns
True if the event is an action (False otherwise).
# is_applicable_action Events
is_applicable_action(
self,
action: StrDict[D.T_event],
memory: Optional[Memory[D.T_state]] = None
) -> bool
Indicate whether an action is applicable in the given memory (state or history), or in the internal one if omitted.
By default, Events.is_applicable_action() provides some boilerplate code and internally calls Events._is_applicable_action(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- action: The action to consider.
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
True if the action is applicable (False otherwise).
# is_enabled_event Events
is_enabled_event(
self,
event: D.T_event,
memory: Optional[Memory[D.T_state]] = None
) -> bool
Indicate whether an uncontrollable event is enabled in the given memory (state or history), or in the internal one if omitted.
By default, Events.is_enabled_event() provides some boilerplate code and internally calls Events._is_enabled_event(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- event: The event to consider.
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
True if the event is enabled (False otherwise).
# is_observation PartiallyObservable
is_observation(
self,
observation: StrDict[D.T_observation]
) -> bool
Check that an observation indeed belongs to the domain observation space.
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the domain observation space provided by PartiallyObservable.get_observation_space(), but it can be overridden for faster implementations.
# Parameters
- observation: The observation to consider.
# Returns
True if the observation belongs to the domain observation space (False otherwise).
# sample Simulation
sample(
self,
memory: Memory[D.T_state],
action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
Sample one transition of the simulator's dynamics.
By default, Simulation.sample() provides some boilerplate code and internally calls Simulation._sample() (which returns a transition outcome). The boilerplate code automatically samples an observation corresponding to the sampled next state.
TIP
Whenever an existing simulator needs to be wrapped instead of implemented fully in scikit-decide (e.g. an external simulator), it is recommended to override Simulation.sample() to call the external simulator and not use the Simulation._sample() helper function.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The environment outcome of the sampled transition.
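The difference between sampling from an explicit memory and stepping the internal one can be sketched as follows. ToySimulation is a hypothetical class with integer states and deterministic dynamics, and building step() on top of sample() is an assumption of this sketch.

```python
class ToySimulation:
    """Hypothetical sketch: sample() is stateless, step() updates _memory."""

    def __init__(self) -> None:
        self._memory = 0  # internal memory, here a single integer state

    def sample(self, memory: int, action: int) -> int:
        # Works on the memory passed in and leaves self._memory untouched.
        return memory + action

    def step(self, action: int) -> int:
        # Works on the internal memory and stores the next state into it.
        self._memory = self.sample(self._memory, action)
        return self._memory


simulation = ToySimulation()
sampled_next_state = simulation.sample(10, 1)  # explores from state 10
memory_after_sample = simulation._memory       # still the initial state
stepped_next_state = simulation.step(2)        # advances the internal state
```

This is what lets a planner query arbitrary states without disturbing an ongoing episode.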
# set_memory Simulation
set_memory(
self,
memory: Memory[D.T_state]
) -> None
Set internal memory attribute _memory to given one.
This can be useful to set a specific "starting point" before doing a rollout with successive Environment.step() calls.
# Parameters
- memory: The memory to set internally.
# Example
# Set simulation_domain memory to my_state (assuming Markovian domain)
simulation_domain.set_memory(my_state)
# Start a 100-step rollout from here (applying my_action at every step)
for _ in range(100):
    simulation_domain.step(my_action)
# step Environment
step(
self,
action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
Run one step of the environment's dynamics.
By default, Environment.step() provides some boilerplate code and internally calls Environment._step() (which returns a transition outcome). The boilerplate code automatically stores the next state into the _memory attribute and samples a corresponding observation.
TIP
Whenever an existing environment needs to be wrapped instead of implemented fully in scikit-decide (e.g. compiled Atari games), it is recommended to override Environment.step() to call the external environment and not use the Environment._step() helper function.
WARNING
Before calling Environment.step() for the first time, or when the end of an episode is reached, Initializable.reset() must be called to reset the environment's state.
# Parameters
- action: The action taken in the current memory (state or history) triggering the transition.
# Returns
The environment outcome of this step.
# _check_value Rewards
_check_value(
self,
value: Value[D.T_value]
) -> bool
Check that a value is compliant with its reward specification.
TIP
This function always returns True by default because any kind of reward should be accepted at this level.
# Parameters
- value: The value to check.
# Returns
True if the value is compliant (False otherwise).
# _get_action_mask Events
_get_action_mask(
self,
memory: Optional[Memory[D.T_state]] = None
) -> StrDict[Mask]
Get action mask for the given memory or internal one if omitted.
An action mask is another (more specific) format for applicable actions, that has a meaning only if the action space can be iterated over in some way. It is represented by a flat array of 0's and 1's ordered as the actions when enumerated: 1 for an applicable action, and 0 for a not applicable action.
More precisely, this implementation assumes that each agent's action space is an EnumerableSpace, and internally calls self.get_applicable_actions().
The action mask is used, for instance, by RL solvers to mask out the logits associated with non-applicable actions in the output of their internal neural network.
# Parameters
- memory: The memory to consider. If None, works on the internal memory of the domain.
# Returns
A numpy array (or a dict mapping each agent to a numpy array for multi-agent domains) of 0s and 1s indicating the applicability of each action (1 meaning applicable, 0 not applicable).
# _get_action_space Events
_get_action_space(
self
) -> StrDict[Space[D.T_event]]
Get the (cached) domain action space (finite or infinite set).
By default, Events._get_action_space() internally calls Events._get_action_space_() the first time and automatically caches its value to make future calls more efficient (since the action space is assumed to be constant).
# Returns
The action space.
# _get_action_space_ Events
_get_action_space_(
self
) -> StrDict[Space[D.T_event]]
Get the domain action space (finite or infinite set).
This is a helper function called by default from Events._get_action_space(), the difference being that the result is not cached here.
TIP
The underscore at the end of this function's name is a convention serving as a reminder that its result should be constant.
# Returns
The action space.
# _get_applicable_actions Events
_get_applicable_actions(
self,
memory: Optional[Memory[D.T_state]] = None
) -> StrDict[Space[D.T_event]]
Get the space (finite or infinite set) of applicable actions in the given memory (state or history), or in the internal one if omitted.
By default, Events._get_applicable_actions() provides some boilerplate code and internally calls Events._get_applicable_actions_from(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
The space of applicable actions.
# _get_applicable_actions_from Events
_get_applicable_actions_from(
self,
memory: Memory[D.T_state]
) -> StrDict[Space[D.T_event]]
Get the space (finite or infinite set) of applicable actions in the given memory (state or history).
This is a helper function called by default from Events._get_applicable_actions(), the difference being that the memory parameter is mandatory here.
# Parameters
- memory: The memory to consider.
# Returns
The space of applicable actions.
# _get_enabled_events Events
_get_enabled_events(
self,
memory: Optional[Memory[D.T_state]] = None
) -> Space[D.T_event]
Get the space (finite or infinite set) of enabled uncontrollable events in the given memory (state or history), or in the internal one if omitted.
By default, Events._get_enabled_events() provides some boilerplate code and internally calls Events._get_enabled_events_from(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
The space of enabled events.
# _get_enabled_events_from Events
_get_enabled_events_from(
self,
memory: Memory[D.T_state]
) -> Space[D.T_event]
Get the space (finite or infinite set) of enabled uncontrollable events in the given memory (state or history).
This is a helper function called by default from Events._get_enabled_events(), the difference being that the memory parameter is mandatory here.
# Parameters
- memory: The memory to consider.
# Returns
The space of enabled events.
# _get_memory_maxlen History
_get_memory_maxlen(
self
) -> int
Get the (cached) memory max length.
By default, FiniteHistory._get_memory_maxlen() internally calls FiniteHistory._get_memory_maxlen_() the first time and automatically caches its value to make future calls more efficient (since the memory max length is assumed to be constant).
# Returns
The memory max length.
# _get_memory_maxlen_ FiniteHistory
_get_memory_maxlen_(
self
) -> int
Get the memory max length.
This is a helper function called by default from FiniteHistory._get_memory_maxlen(), the difference being that the result is not cached here.
TIP
The underscore at the end of this function's name is a convention serving as a reminder that its result should be constant.
# Returns
The memory max length.
# _get_observation TransformedObservable
_get_observation(
self,
state: D.T_state,
action: Optional[StrDict[list[D.T_event]]] = None
) -> StrDict[D.T_observation]
Get the deterministic observation given a state and action.
# Parameters
- state: The state to be observed.
- action: The last applied action (or None if the state is an initial state).
# Returns
The deterministic observation.
# _get_observation_distribution PartiallyObservable
_get_observation_distribution(
self,
state: D.T_state,
action: Optional[StrDict[list[D.T_event]]] = None
) -> Distribution[StrDict[D.T_observation]]
Get the probability distribution of the observation given a state and action.
In mathematical terms (discrete case), given an action a, this function represents P(O | s, a), where O is the random variable of the observation.
# Parameters
- state: The state to be observed.
- action: The last applied action (or None if the state is an initial state).
# Returns
The probability distribution of the observation.
# _get_observation_space PartiallyObservable
_get_observation_space(
self
) -> StrDict[Space[D.T_observation]]
Get the (cached) observation space (finite or infinite set).
By default, PartiallyObservable._get_observation_space() internally calls PartiallyObservable._get_observation_space_() the first time and automatically caches its value to make future calls more efficient (since the observation space is assumed to be constant).
# Returns
The observation space.
# _get_observation_space_ PartiallyObservable
_get_observation_space_(
self
) -> StrDict[Space[D.T_observation]]
Get the observation space (finite or infinite set).
This is a helper function called by default from PartiallyObservable._get_observation_space(), the difference being that the result is not cached here.
TIP
The underscore at the end of this function's name is a convention serving as a reminder that its result should be constant.
# Returns
The observation space.
# _init_memory History
_init_memory(
self,
state: Optional[D.T_state] = None
) -> Memory[D.T_state]
Initialize memory (possibly with a state) according to its specification and return it.
This function is automatically called by Initializable._reset() to reinitialize the internal memory whenever the domain is used as an environment.
# Parameters
- state: An optional state to initialize the memory with (typically the initial state).
# Returns
The new initialized memory.
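One way to picture a memory with a maximum length is a bounded deque, as sketched below. The function init_memory() is hypothetical and the use of collections.deque is an assumption of this sketch; it only illustrates how a max length of 1 yields Markovian behaviour while a larger value keeps a sliding window of past states.

```python
from collections import deque
from typing import Deque, Optional


def init_memory(state: Optional[str] = None, maxlen: Optional[int] = None) -> Deque[str]:
    """Create a memory, optionally seeded with a state and bounded in length."""
    return deque([] if state is None else [state], maxlen=maxlen)


# Markovian case: only the last state is kept.
markovian_memory = init_memory("s0", maxlen=1)
markovian_memory.append("s1")

# Finite history: a sliding window over the 3 most recent states.
finite_memory = init_memory("s0", maxlen=3)
for next_state in ["s1", "s2", "s3"]:
    finite_memory.append(next_state)

# Memoryless-style start: no state stored at all.
empty_memory = init_memory(maxlen=0)
```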
# _is_action Events
_is_action(
self,
event: D.T_event
) -> bool
Indicate whether an event is an action (i.e. a controllable event for the agents).
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the domain action space provided by Events._get_action_space(), but it can be overridden for faster implementations.
# Parameters
- event: The event to consider.
# Returns
True if the event is an action (False otherwise).
# _is_applicable_action Events
_is_applicable_action(
self,
action: StrDict[D.T_event],
memory: Optional[Memory[D.T_state]] = None
) -> bool
Indicate whether an action is applicable in the given memory (state or history), or in the internal one if omitted.
By default, Events._is_applicable_action() provides some boilerplate code and internally calls Events._is_applicable_action_from(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- action: The action to consider.
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
True if the action is applicable (False otherwise).
# _is_applicable_action_from Events
_is_applicable_action_from(
self,
action: StrDict[D.T_event],
memory: Memory[D.T_state]
) -> bool
Indicate whether an action is applicable in the given memory (state or history).
This is a helper function called by default from Events._is_applicable_action(), the difference being that the memory parameter is mandatory here.
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the space of applicable actions provided by Events._get_applicable_actions_from(), but it can be overridden for faster implementations.
# Parameters
- action: The action to consider.
- memory: The memory to consider.
# Returns
True if the action is applicable (False otherwise).
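The TIP above suggests overriding the default membership test when a direct check is cheaper. The sketch below contrasts both approaches on a hypothetical 3x3 grid; the function names, the grid, and the move set are assumptions for illustration and do not come from scikit-decide.

```python
from typing import Dict, Set, Tuple

State = Tuple[int, int]
MOVES: Dict[str, Tuple[int, int]] = {
    "left": (-1, 0), "right": (1, 0), "up": (0, -1), "down": (0, 1),
}
WIDTH, HEIGHT = 3, 3


def in_grid(x: int, y: int) -> bool:
    return 0 <= x < WIDTH and 0 <= y < HEIGHT


def applicable_actions_from(memory: State) -> Set[str]:
    """Build the full set of moves that stay inside the grid."""
    x, y = memory
    return {name for name, (dx, dy) in MOVES.items() if in_grid(x + dx, y + dy)}


def is_applicable_default(action: str, memory: State) -> bool:
    # Default-style check: build the whole applicable set, then test membership.
    return action in applicable_actions_from(memory)


def is_applicable_fast(action: str, memory: State) -> bool:
    # Overridden check: look only at the requested action.
    if action not in MOVES:
        return False
    dx, dy = MOVES[action]
    return in_grid(memory[0] + dx, memory[1] + dy)


corner = (0, 0)
both_agree = all(
    is_applicable_default(a, corner) == is_applicable_fast(a, corner) for a in MOVES
)
```

The gain is negligible with four moves, but it grows with the size of the applicable set that the default check would otherwise have to build on every call.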
# _is_enabled_event Events
_is_enabled_event(
self,
event: D.T_event,
memory: Optional[Memory[D.T_state]] = None
) -> bool
Indicate whether an uncontrollable event is enabled in the given memory (state or history), or in the internal one if omitted.
By default, Events._is_enabled_event() provides some boilerplate code and internally calls Events._is_enabled_event_from(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- event: The event to consider.
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
True if the event is enabled (False otherwise).
# _is_enabled_event_from Events
_is_enabled_event_from(
self,
event: D.T_event,
memory: Memory[D.T_state]
) -> bool
Indicate whether an event is enabled in the given memory (state or history).
This is a helper function called by default from Events._is_enabled_event(), the difference being that the memory parameter is mandatory here.
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the space of enabled events provided by Events._get_enabled_events_from(), but it can be overridden for faster implementations.
# Parameters
- event: The event to consider.
- memory: The memory to consider.
# Returns
True if the event is enabled (False otherwise).
# _is_observation PartiallyObservable
_is_observation(
self,
observation: StrDict[D.T_observation]
) -> bool
Check that an observation indeed belongs to the domain observation space.
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the domain observation space provided by PartiallyObservable._get_observation_space(), but it can be overridden for faster implementations.
# Parameters
- observation: The observation to consider.
# Returns
True if the observation belongs to the domain observation space (False otherwise).
# _sample Simulation
_sample(
self,
memory: Memory[D.T_state],
action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
Sample one transition of the simulator's dynamics.
By default, Simulation._sample() provides some boilerplate code and internally calls Simulation._state_sample() (which returns a transition outcome). The boilerplate code automatically samples an observation corresponding to the sampled next state.
TIP
Whenever an existing simulator needs to be wrapped instead of implemented fully in scikit-decide (e.g. an external simulator), it is recommended to override Simulation._sample() to call the external simulator and not use the Simulation._state_sample() helper function.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The environment outcome of the sampled transition.
# _set_memory Simulation
_set_memory(
self,
memory: Memory[D.T_state]
) -> None
Set internal memory attribute _memory to given one.
This can be useful to set a specific "starting point" before doing a rollout with successive Environment._step() calls.
# Parameters
- memory: The memory to set internally.
# Example
# Set simulation_domain memory to my_state (assuming Markovian domain)
simulation_domain._set_memory(my_state)
# Start a 100-step rollout from here (applying my_action at every step)
for _ in range(100):
    simulation_domain._step(my_action)
# _state_sample Simulation
_state_sample(
self,
memory: Memory[D.T_state],
action: StrDict[list[D.T_event]]
) -> TransitionOutcome[D.T_state, StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
Compute one sample of the transition's dynamics.
This is a helper function called by default from Simulation._sample(). It focuses on the state level, as opposed to the observation one for the latter.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The transition outcome of the sampled transition.
# _state_step Environment
_state_step(
self,
action: StrDict[list[D.T_event]]
) -> TransitionOutcome[D.T_state, StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
Compute one step of the transition's dynamics.
This is a helper function called by default from Environment._step(). It focuses on the state level, as opposed to the observation one for the latter.
# Parameters
- action: The action taken in the current memory (state or history) triggering the transition.
# Returns
The transition outcome of this step.
# _step Environment
_step(
self,
action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
Run one step of the environment's dynamics.
By default, Environment._step() provides some boilerplate code and internally calls Environment._state_step() (which returns a transition outcome). The boilerplate code automatically stores the next state into the _memory attribute and samples a corresponding observation.
TIP
Whenever an existing environment needs to be wrapped instead of implemented fully in scikit-decide (e.g. compiled Atari games), it is recommended to override Environment._step() to call the external environment and not use the Environment._state_step() helper function.
WARNING
Before calling Environment._step() for the first time, or when the end of an episode is reached, Initializable._reset() must be called to reset the environment's state.
# Parameters
- action: The action taken in the current memory (state or history) triggering the transition.
# Returns
The environment outcome of this step.
# MDPDomain
This is a typical Markov Decision Process domain class.
This helper class can be used as an alternate base class for domains, inheriting the following:
- Domain
- SingleAgent
- Sequential
- EnumerableTransitions
- Actions
- DeterministicInitialized
- Markovian
- FullyObservable
- Rewards
Typical use:
class D(MDPDomain)
TIP
It is also possible to refine any alternate base class, like for instance:
class D(RLDomain, FullyObservable)
# check_value Rewards
check_value(
self,
value: Value[D.T_value]
) -> bool
Check that a value is compliant with its reward specification.
TIP
This function always returns True by default because any kind of reward should be accepted at this level.
# Parameters
- value: The value to check.
# Returns
True if the value is compliant (False otherwise).
# get_action_mask Events
get_action_mask(
self,
memory: Optional[Memory[D.T_state]] = None
) -> StrDict[Mask]
Get action mask for the given memory or internal one if omitted.
An action mask is another (more specific) format for applicable actions, that has a meaning only if the action space can be iterated over in some way. It is represented by a flat array of 0's and 1's ordered as the actions when enumerated: 1 for an applicable action, and 0 for a not applicable action.
More precisely, this implementation assumes that each agent's action space is an EnumerableSpace, and internally calls self.get_applicable_actions().
The action mask is used, for instance, by RL solvers to mask out the logits associated with non-applicable actions in the output of their internal neural network.
# Parameters
- memory: The memory to consider. If None, works on the internal memory of the domain.
# Returns
A numpy array (or a dict mapping each agent to a numpy array for multi-agent domains) of 0s and 1s indicating the applicability of each action (1 meaning applicable, 0 not applicable).
# get_action_space Events
get_action_space(
self
) -> StrDict[Space[D.T_event]]
Get the (cached) domain action space (finite or infinite set).
By default, Events.get_action_space() internally calls Events._get_action_space_() the first time and automatically caches its value to make future calls more efficient (since the action space is assumed to be constant).
# Returns
The action space.
# get_agents MultiAgent
get_agents(
self
) -> set[str]
Return a singleton for single-agent domains.
This must stay consistent with skdecide.core.autocast(), which transforms a single-agent domain into a multi-agent domain whose only agent has the id "agent".
# get_applicable_actions Events
get_applicable_actions(
self,
memory: Optional[Memory[D.T_state]] = None
) -> StrDict[Space[D.T_event]]
Get the space (finite or infinite set) of applicable actions in the given memory (state or history), or in the internal one if omitted.
By default, Events.get_applicable_actions() provides some boilerplate code and internally calls Events._get_applicable_actions(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
The space of applicable actions.
# get_enabled_events Events
get_enabled_events(
self,
memory: Optional[Memory[D.T_state]] = None
) -> Space[D.T_event]
Get the space (finite or infinite set) of enabled uncontrollable events in the given memory (state or history), or in the internal one if omitted.
By default, Events.get_enabled_events() provides some boilerplate code and internally calls Events._get_enabled_events(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
The space of enabled events.
# get_initial_state DeterministicInitialized
get_initial_state(
self
) -> D.T_state
Get the (cached) initial state.
By default, DeterministicInitialized.get_initial_state() internally calls DeterministicInitialized._get_initial_state_() the first time and automatically caches its value to make future calls more efficient (since the initial state is assumed to be constant).
# Returns
The initial state.
# get_initial_state_distribution UncertainInitialized
get_initial_state_distribution(
self
) -> Distribution[D.T_state]
Get the (cached) probability distribution of initial states.
By default, UncertainInitialized.get_initial_state_distribution() internally calls UncertainInitialized._get_initial_state_distribution_() the first time and automatically caches its value to make future calls more efficient (since the initial state distribution is assumed to be constant).
# Returns
The probability distribution of initial states.
# get_next_state_distribution UncertainTransitions
get_next_state_distribution(
self,
memory: Memory[D.T_state],
action: StrDict[list[D.T_event]]
) -> DiscreteDistribution[D.T_state]
Get the discrete probability distribution of next state given a memory and action.
TIP
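In the Markovian case (memory only holds the last state $s$), given an action $a$, this function can be mathematically represented by $P(S'|s, a)$, where $S'$ is the next state random variable.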
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The discrete probability distribution of next state.
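As a concrete illustration of a discrete next-state distribution in the Markovian case, the sketch below uses a plain dictionary mapping next states to probabilities. The noisy-walk dynamics (the intended move succeeds with probability 0.8) and the function name are assumptions for the example; the library's own DiscreteDistribution type is not used here.

```python
import random
from typing import Dict


def next_state_distribution(state: int, action: str) -> Dict[int, float]:
    """Hypothetical noisy walk: the move succeeds with probability 0.8, else the state is unchanged."""
    target = state + 1 if action == "right" else state - 1
    return {target: 0.8, state: 0.2}


dist = next_state_distribution(2, "right")
total = sum(dist.values())  # probabilities of a distribution sum to 1

# Draw one next state according to the distribution (seeded for repeatability).
rng = random.Random(0)
sampled = rng.choices(list(dist.keys()), weights=list(dist.values()), k=1)[0]
```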
# get_observation TransformedObservable
get_observation(
self,
state: D.T_state,
action: Optional[StrDict[list[D.T_event]]] = None
) -> StrDict[D.T_observation]
Get the deterministic observation given a state and action.
# Parameters
- state: The state to be observed.
- action: The last applied action (or None if the state is an initial state).
# Returns
The observation.
# get_observation_distribution PartiallyObservable
get_observation_distribution(
self,
state: D.T_state,
action: Optional[StrDict[list[D.T_event]]] = None
) -> Distribution[StrDict[D.T_observation]]
Get the probability distribution of the observation given a state and action.
In mathematical terms (discrete case), given an action $a$, this function represents $P(O|s, a)$, where $O$ is the random variable of the observation.
# Parameters
- state: The state to be observed.
- action: The last applied action (or None if the state is an initial state).
# Returns
The probability distribution of the observation.
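A small worked example of an observation distribution in the discrete case: the hypothetical sensor below reports the true state with probability 0.9 and the neighbouring state otherwise. The function name, the integer states and the noise model are assumptions made purely for illustration.

```python
from typing import Dict, Optional


def observation_distribution(state: int, action: Optional[str] = None) -> Dict[int, float]:
    """Hypothetical noisy sensor: correct reading with probability 0.9."""
    # The action is unused in this toy model; None denotes an initial state.
    return {state: 0.9, state + 1: 0.1}


obs_dist = observation_distribution(4)
most_likely = max(obs_dist, key=obs_dist.get)
obs_total = sum(obs_dist.values())
```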
# get_observation_space PartiallyObservable
get_observation_space(
self
) -> StrDict[Space[D.T_observation]]
Get the (cached) observation space (finite or infinite set).
By default, PartiallyObservable.get_observation_space() internally calls PartiallyObservable._get_observation_space_() the first time and automatically caches its value to make future calls more efficient (since the observation space is assumed to be constant).
# Returns
The observation space.
# get_transition_value UncertainTransitions
get_transition_value(
self,
memory: Memory[D.T_state],
action: StrDict[list[D.T_event]],
next_state: Optional[D.T_state] = None
) -> StrDict[Value[D.T_value]]
Get the value (reward or cost) of a transition.
The transition to consider is defined by the function parameters.
TIP
If this function never depends on the next_state parameter for its computation, it is recommended to indicate it by overriding UncertainTransitions._is_transition_value_dependent_on_next_state_() to return False. This information can then be exploited by solvers to avoid computing the next state when evaluating a transition value (more efficient).
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
- next_state: The next state in which the transition ends (if needed for the computation).
# Returns
The transition value (reward or cost).
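The sketch below shows how a solver might exploit the "dependent on next state" flag to skip next-state computation. All functions here are hypothetical stand-ins with simplified signatures (plain integers instead of the library's types), and the constant cost of 1 per step is an assumption.

```python
from typing import Optional

calls = {"next_state": 0}  # counts how often a next state is computed


def sample_next_state(state: int, action: int) -> int:
    calls["next_state"] += 1
    return state + action


def is_transition_value_dependent_on_next_state() -> bool:
    # Assumed domain: the value never depends on the next state.
    return False


def get_transition_value(state: int, action: int, next_state: Optional[int] = None) -> float:
    return -1.0  # constant cost per step, ignoring next_state


def evaluate(state: int, action: int) -> float:
    # Only compute the next state when the value actually needs it.
    if is_transition_value_dependent_on_next_state():
        return get_transition_value(state, action, sample_next_state(state, action))
    return get_transition_value(state, action)


value = evaluate(0, 1)
```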
# is_action Events
is_action(
self,
event: D.T_event
) -> bool
Indicate whether an event is an action (i.e. a controllable event for the agents).
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the domain action space provided by Events.get_action_space(), but it can be overridden for faster implementations.
# Parameters
- event: The event to consider.
# Returns
True if the event is an action (False otherwise).
# is_applicable_action Events
is_applicable_action(
self,
action: StrDict[D.T_event],
memory: Optional[Memory[D.T_state]] = None
) -> bool
Indicate whether an action is applicable in the given memory (state or history), or in the internal one if omitted.
By default, Events.is_applicable_action() provides some boilerplate code and internally calls Events._is_applicable_action(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- action: The action to consider.
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
True if the action is applicable (False otherwise).
# is_enabled_event Events
is_enabled_event(
self,
event: D.T_event,
memory: Optional[Memory[D.T_state]] = None
) -> bool
Indicate whether an uncontrollable event is enabled in the given memory (state or history), or in the internal one if omitted.
By default, Events.is_enabled_event() provides some boilerplate code and internally calls Events._is_enabled_event(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- event: The event to consider.
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
True if the event is enabled (False otherwise).
# is_observation PartiallyObservable
is_observation(
self,
observation: StrDict[D.T_observation]
) -> bool
Check that an observation indeed belongs to the domain observation space.
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the domain observation space provided by PartiallyObservable.get_observation_space(), but it can be overridden for faster implementations.
# Parameters
- observation: The observation to consider.
# Returns
True if the observation belongs to the domain observation space (False otherwise).
# is_terminal UncertainTransitions
is_terminal(
self,
state: D.T_state
) -> StrDict[D.T_predicate]
Indicate whether a state is terminal.
A terminal state is a state with no outgoing transition (except to itself with value 0).
# Parameters
- state: The state to consider.
# Returns
True if the state is terminal (False otherwise).
# is_transition_value_dependent_on_next_state UncertainTransitions
is_transition_value_dependent_on_next_state(
self
) -> bool
Indicate whether get_transition_value() requires the next_state parameter for its computation (cached).
By default, UncertainTransitions.is_transition_value_dependent_on_next_state() internally calls UncertainTransitions._is_transition_value_dependent_on_next_state_() the first time and automatically caches its value to make future calls more efficient (since the returned value is assumed to be constant).
# Returns
True if the transition value computation depends on next_state (False otherwise).
# reset Initializable
reset(
self
) -> StrDict[D.T_observation]
Reset the state of the environment and return an initial observation.
By default, Initializable.reset() provides some boilerplate code and internally calls Initializable._reset() (which returns an initial state). The boilerplate code automatically stores the initial state into the _memory attribute and samples a corresponding observation.
# Returns
An initial observation.
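The reset boilerplate (obtain an initial state, store it in the internal memory, return an observation of it) can be sketched as below. ResetSketch and its parity-only observation are hypothetical, and the two-level helper chain of the real class is collapsed into a single helper for brevity.

```python
class ResetSketch:
    """Hypothetical stand-in for the reset boilerplate."""

    def __init__(self) -> None:
        self._memory = None  # nothing stored before the first reset

    def _state_reset(self) -> int:
        # State-level helper: returns the initial state.
        return 4

    def _get_observation(self, state: int) -> str:
        # Assumed partial observability: only the parity of the state is seen.
        return "even" if state % 2 == 0 else "odd"

    def reset(self) -> str:
        state = self._state_reset()
        self._memory = state  # store the initial state internally
        return self._get_observation(state)  # return an observation, not the state


env = ResetSketch()
initial_observation = env.reset()
```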
# sample Simulation
sample(
self,
memory: Memory[D.T_state],
action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
Sample one transition of the simulator's dynamics.
By default, Simulation.sample() provides some boilerplate code and internally calls Simulation._sample() (which returns a transition outcome). The boilerplate code automatically samples an observation corresponding to the sampled next state.
TIP
Whenever an existing simulator needs to be wrapped instead of implemented fully in scikit-decide (e.g. a third-party simulator), it is recommended to override Simulation.sample() to call the external simulator and not use the Simulation._sample() helper function.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The environment outcome of the sampled transition.
# set_memory Simulation
set_memory(
self,
memory: Memory[D.T_state]
) -> None
Set the internal memory attribute _memory to the given one.
This can be useful to set a specific "starting point" before doing a rollout with successive Environment.step() calls.
# Parameters
- memory: The memory to set internally.
# Example
# Set simulation_domain memory to my_state (assuming Markovian domain)
simulation_domain.set_memory(my_state)
# Start a 100-step rollout from here (applying my_action at every step)
for _ in range(100):
    simulation_domain.step(my_action)
# step Environment
step(
self,
action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
Run one step of the environment's dynamics.
By default, Environment.step() provides some boilerplate code and internally calls Environment._step() (which returns a transition outcome). The boilerplate code automatically stores the next state into the _memory attribute and samples a corresponding observation.
TIP
Whenever an existing environment needs to be wrapped instead of implemented fully in scikit-decide (e.g. compiled ATARI games), it is recommended to override Environment.step() to call the external environment and not use the Environment._step() helper function.
WARNING
Before calling Environment.step() the first time or when the end of an episode is reached, Initializable.reset() must be called to reset the environment's state.
# Parameters
- action: The action taken in the current memory (state or history) triggering the transition.
# Returns
The environment outcome of this step.
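The reset-before-step requirement can be illustrated with a toy episode loop. CounterEnv is a hypothetical environment, and its step() returns a plain (observation, value, terminated) tuple rather than the library's EnvironmentOutcome; both are simplifying assumptions.

```python
class CounterEnv:
    """Hypothetical environment: count up to a goal, paying a cost of 1 per step."""

    def __init__(self, goal: int = 3) -> None:
        self._goal = goal
        self._memory = None  # no state until reset() is called

    def reset(self) -> int:
        self._memory = 0
        return self._memory

    def step(self, action: int):
        if self._memory is None:
            raise RuntimeError("reset() must be called before step()")
        self._memory += action  # store the next state internally
        terminated = self._memory >= self._goal
        return self._memory, -1.0, terminated


env = CounterEnv(goal=3)
observation = env.reset()  # mandatory before the first step
total_value = 0.0
steps = 0
terminated = False
while not terminated:
    observation, value, terminated = env.step(1)
    total_value += value
    steps += 1
```

Once the loop ends, another call to reset() would be needed before stepping again.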
# _check_value Rewards
_check_value(
self,
value: Value[D.T_value]
) -> bool
Check that a value is compliant with its reward specification.
TIP
This function always returns True by default because any kind of reward should be accepted at this level.
# Parameters
- value: The value to check.
# Returns
True if the value is compliant (False otherwise).
# _get_action_mask Events
_get_action_mask(
self,
memory: Optional[Memory[D.T_state]] = None
) -> StrDict[Mask]
Get action mask for the given memory or internal one if omitted.
An action mask is another (more specific) format for applicable actions, which is meaningful only if the action space can be iterated over in some way. It is represented by a flat array of 0's and 1's ordered as the actions when enumerated: 1 for an applicable action, and 0 for a non-applicable action.
More precisely, this implementation assumes that each agent action space is an EnumerableSpace, and internally calls self.get_applicable_actions().
The action mask is used for instance by RL solvers to mask out the logits associated with non-applicable actions in the output of their internal neural network.
# Parameters
- memory: The memory to consider. If None, works on the internal memory of the domain.
# Returns
A numpy array (or a dict mapping each agent to a numpy array for multi-agent domains) of 0's and 1's indicating the applicability of each action (1 meaning applicable and 0 not applicable).
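A minimal sketch of how such a mask is built and then used to disable logits follows. To stay dependency-free it uses plain Python lists instead of numpy arrays, and the action names, logits and helper function are assumptions for the example only.

```python
from typing import List, Sequence, Set


def action_mask(all_actions: Sequence[str], applicable: Set[str]) -> List[int]:
    """1 for each applicable action, 0 otherwise, in enumeration order."""
    return [1 if action in applicable else 0 for action in all_actions]


all_actions = ["left", "stay", "right"]  # enumeration order of the action space
mask = action_mask(all_actions, applicable={"stay", "right"})

# Hypothetical policy logits, one per enumerated action.
logits = [2.0, 0.5, 1.0]

# Replace logits of non-applicable actions by -inf so they can never be chosen.
masked_logits = [l if m == 1 else float("-inf") for l, m in zip(logits, mask)]
best_index = max(range(len(all_actions)), key=lambda i: masked_logits[i])
best_action = all_actions[best_index]
```

Without the mask, the highest logit would select "left", which is not applicable in this example.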
# _get_action_space Events
_get_action_space(
self
) -> StrDict[Space[D.T_event]]
Get the (cached) domain action space (finite or infinite set).
By default, Events._get_action_space() internally calls Events._get_action_space_() the first time and automatically caches its value to make future calls more efficient (since the action space is assumed to be constant).
# Returns
The action space.
# _get_action_space_ Events
_get_action_space_(
self
) -> StrDict[Space[D.T_event]]
Get the domain action space (finite or infinite set).
This is a helper function called by default from Events._get_action_space(), the difference being that the result is not cached here.
TIP
The underscore at the end of this function's name is a convention serving as a reminder that its result should be constant.
# Returns
The action space.
# _get_applicable_actions Events
_get_applicable_actions(
self,
memory: Optional[Memory[D.T_state]] = None
) -> StrDict[Space[D.T_event]]
Get the space (finite or infinite set) of applicable actions in the given memory (state or history), or in the internal one if omitted.
By default, Events._get_applicable_actions() provides some boilerplate code and internally calls Events._get_applicable_actions_from(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
The space of applicable actions.
# _get_applicable_actions_from Events
_get_applicable_actions_from(
self,
memory: Memory[D.T_state]
) -> StrDict[Space[D.T_event]]
Get the space (finite or infinite set) of applicable actions in the given memory (state or history).
This is a helper function called by default from Events._get_applicable_actions(), the difference being that the memory parameter is mandatory here.
# Parameters
- memory: The memory to consider.
# Returns
The space of applicable actions.
# _get_enabled_events Events
_get_enabled_events(
self,
memory: Optional[Memory[D.T_state]] = None
) -> Space[D.T_event]
Get the space (finite or infinite set) of enabled uncontrollable events in the given memory (state or history), or in the internal one if omitted.
By default, Events._get_enabled_events() provides some boilerplate code and internally calls Events._get_enabled_events_from(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
The space of enabled events.
# _get_enabled_events_from Events
_get_enabled_events_from(
self,
memory: Memory[D.T_state]
) -> Space[D.T_event]
Get the space (finite or infinite set) of enabled uncontrollable events in the given memory (state or history).
This is a helper function called by default from Events._get_enabled_events(), the difference being that the memory parameter is mandatory here.
# Parameters
- memory: The memory to consider.
# Returns
The space of enabled events.
# _get_initial_state DeterministicInitialized
_get_initial_state(
self
) -> D.T_state
Get the (cached) initial state.
By default, DeterministicInitialized._get_initial_state() internally calls DeterministicInitialized._get_initial_state_() the first time and automatically caches its value to make future calls more efficient (since the initial state is assumed to be constant).
# Returns
The initial state.
# _get_initial_state_ DeterministicInitialized
_get_initial_state_(
self
) -> D.T_state
Get the initial state.
This is a helper function called by default from DeterministicInitialized._get_initial_state(), the difference being that the result is not cached here.
# Returns
The initial state.
# _get_initial_state_distribution UncertainInitialized
_get_initial_state_distribution(
self
) -> Distribution[D.T_state]
Get the (cached) probability distribution of initial states.
By default, UncertainInitialized._get_initial_state_distribution() internally calls UncertainInitialized._get_initial_state_distribution_() the first time and automatically caches its value to make future calls more efficient (since the initial state distribution is assumed to be constant).
# Returns
The probability distribution of initial states.
# _get_initial_state_distribution_ UncertainInitialized
_get_initial_state_distribution_(
self
) -> Distribution[D.T_state]
Get the probability distribution of initial states.
This is a helper function called by default from UncertainInitialized._get_initial_state_distribution(), the difference being that the result is not cached here.
TIP
The underscore at the end of this function's name is a convention serving as a reminder that its result should be constant.
# Returns
The probability distribution of initial states.
# _get_memory_maxlen History
_get_memory_maxlen(
self
) -> int
Get the (cached) memory max length.
By default, FiniteHistory._get_memory_maxlen() internally calls FiniteHistory._get_memory_maxlen_() the first time and automatically caches its value to make future calls more efficient (since the memory max length is assumed to be constant).
# Returns
The memory max length.
# _get_memory_maxlen_ FiniteHistory
_get_memory_maxlen_(
self
) -> int
Get the memory max length.
This is a helper function called by default from FiniteHistory._get_memory_maxlen(), the difference being that the result is not cached here.
TIP
The underscore at the end of this function's name is a convention serving as a reminder that its result should be constant.
# Returns
The memory max length.
# _get_next_state_distribution UncertainTransitions
_get_next_state_distribution(
self,
memory: Memory[D.T_state],
action: StrDict[list[D.T_event]]
) -> DiscreteDistribution[D.T_state]
Get the discrete probability distribution of next state given a memory and action.
TIP
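In the Markovian case (memory only holds the last state $s$), given an action $a$, this function can be mathematically represented by $P(S'|s, a)$, where $S'$ is the next state random variable.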
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The discrete probability distribution of next state.
# _get_observation TransformedObservable
_get_observation(
self,
state: D.T_state,
action: Optional[StrDict[list[D.T_event]]] = None
) -> StrDict[D.T_observation]
Get the deterministic observation given a state and action.
# Parameters
- state: The state to be observed.
- action: The last applied action (or None if the state is an initial state).
# Returns
The observation.
# _get_observation_distribution PartiallyObservable
_get_observation_distribution(
self,
state: D.T_state,
action: Optional[StrDict[list[D.T_event]]] = None
) -> Distribution[StrDict[D.T_observation]]
Get the probability distribution of the observation given a state and action.
In mathematical terms (discrete case), given an action $a$, this function represents $P(O|s, a)$, where $O$ is the random variable of the observation.
# Parameters
- state: The state to be observed.
- action: The last applied action (or None if the state is an initial state).
# Returns
The probability distribution of the observation.
# _get_observation_space PartiallyObservable
_get_observation_space(
self
) -> StrDict[Space[D.T_observation]]
Get the (cached) observation space (finite or infinite set).
By default, PartiallyObservable._get_observation_space() internally calls PartiallyObservable._get_observation_space_() the first time and automatically caches its value to make future calls more efficient (since the observation space is assumed to be constant).
# Returns
The observation space.
# _get_observation_space_ PartiallyObservable
_get_observation_space_(
self
) -> StrDict[Space[D.T_observation]]
Get the observation space (finite or infinite set).
This is a helper function called by default from PartiallyObservable._get_observation_space(), the difference being that the result is not cached here.
TIP
The underscore at the end of this function's name is a convention serving as a reminder that its result should be constant.
# Returns
The observation space.
# _get_transition_value UncertainTransitions
_get_transition_value(
self,
memory: Memory[D.T_state],
action: StrDict[list[D.T_event]],
next_state: Optional[D.T_state] = None
) -> StrDict[Value[D.T_value]]
Get the value (reward or cost) of a transition.
The transition to consider is defined by the function parameters.
TIP
If this function never depends on the next_state parameter for its computation, it is recommended to indicate it by overriding UncertainTransitions._is_transition_value_dependent_on_next_state_() to return False. This information can then be exploited by solvers to avoid computing the next state when evaluating a transition value (more efficient).
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
- next_state: The next state in which the transition ends (if needed for the computation).
# Returns
The transition value (reward or cost).
# _init_memory History
_init_memory(
self,
state: Optional[D.T_state] = None
) -> Memory[D.T_state]
Initialize memory (possibly with a state) according to its specification and return it.
This function is automatically called by Initializable._reset() to reinitialize the internal memory whenever the domain is used as an environment.
# Parameters
- state: An optional state to initialize the memory with (typically the initial state).
# Returns
The new initialized memory.
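Memory initialization with a bounded history can be sketched with a standard-library deque. This assumes the memory behaves like a bounded double-ended queue whose maximum length is the memory max length (1 in the Markovian case); the init_memory function and the state labels are hypothetical.

```python
from collections import deque
from typing import Deque, Optional


def init_memory(state: Optional[str] = None, maxlen: Optional[int] = None) -> Deque[str]:
    """Create a fresh memory, optionally seeded with one state."""
    return deque([] if state is None else [state], maxlen=maxlen)


# Markovian case: only the last state is kept (max length 1).
memory = init_memory(state="s0", maxlen=1)
memory.append("s1")  # the older state "s0" is dropped automatically

# Finite history of length 2 keeps the two most recent states.
history = init_memory(state="s0", maxlen=2)
history.append("s1")
history.append("s2")
```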
# _is_action Events
_is_action(
self,
event: D.T_event
) -> bool
Indicate whether an event is an action (i.e. a controllable event for the agents).
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the domain action space provided by Events._get_action_space(), but it can be overridden for faster implementations.
# Parameters
- event: The event to consider.
# Returns
True if the event is an action (False otherwise).
# _is_applicable_action Events
_is_applicable_action(
self,
action: StrDict[D.T_event],
memory: Optional[Memory[D.T_state]] = None
) -> bool
Indicate whether an action is applicable in the given memory (state or history), or in the internal one if omitted.
By default, Events._is_applicable_action() provides some boilerplate code and internally calls Events._is_applicable_action_from(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- action: The action to consider.
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
True if the action is applicable (False otherwise).
# _is_applicable_action_from Events
_is_applicable_action_from(
self,
action: StrDict[D.T_event],
memory: Memory[D.T_state]
) -> bool
Indicate whether an action is applicable in the given memory (state or history).
This is a helper function called by default from Events._is_applicable_action(), the difference being that the memory parameter is mandatory here.
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the space of applicable actions provided by Events._get_applicable_actions_from(), but it can be overridden for faster implementations.
# Parameters
- action: The action to consider.
- memory: The memory to consider.
# Returns
True if the action is applicable (False otherwise).
# _is_enabled_event Events
_is_enabled_event(
self,
event: D.T_event,
memory: Optional[Memory[D.T_state]] = None
) -> bool
Indicate whether an uncontrollable event is enabled in the given memory (state or history), or in the internal one if omitted.
By default, Events._is_enabled_event() provides some boilerplate code and internally calls Events._is_enabled_event_from(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- event: The event to consider.
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
True if the event is enabled (False otherwise).
# _is_enabled_event_from Events
_is_enabled_event_from(
self,
event: D.T_event,
memory: Memory[D.T_state]
) -> bool
Indicate whether an event is enabled in the given memory (state or history).
This is a helper function called by default from Events._is_enabled_event(), the difference being that the memory parameter is mandatory here.
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the space of enabled events provided by Events._get_enabled_events_from(), but it can be overridden for faster implementations.
# Parameters
- event: The event to consider.
- memory: The memory to consider.
# Returns
True if the event is enabled (False otherwise).
# _is_observation PartiallyObservable
_is_observation(
self,
observation: StrDict[D.T_observation]
) -> bool
Check that an observation indeed belongs to the domain observation space.
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the domain observation space provided by PartiallyObservable._get_observation_space(), but it can be overridden for faster implementations.
# Parameters
- observation: The observation to consider.
# Returns
True if the observation belongs to the domain observation space (False otherwise).
# _is_terminal UncertainTransitions
_is_terminal(
self,
state: D.T_state
) -> StrDict[D.T_predicate]
Indicate whether a state is terminal.
A terminal state is a state with no outgoing transition (except to itself with value 0).
# Parameters
- state: The state to consider.
# Returns
True if the state is terminal (False otherwise).
# _is_transition_value_dependent_on_next_state UncertainTransitions
_is_transition_value_dependent_on_next_state(
self
) -> bool
Indicate whether _get_transition_value() requires the next_state parameter for its computation (cached).
By default, UncertainTransitions._is_transition_value_dependent_on_next_state() internally calls UncertainTransitions._is_transition_value_dependent_on_next_state_() the first time and automatically caches its value to make future calls more efficient (since the returned value is assumed to be constant).
# Returns
True if the transition value computation depends on next_state (False otherwise).
# _is_transition_value_dependent_on_next_state_ UncertainTransitions
_is_transition_value_dependent_on_next_state_(
self
) -> bool
Indicate whether _get_transition_value() requires the next_state parameter for its computation.
This is a helper function called by default from UncertainTransitions._is_transition_value_dependent_on_next_state(), the difference being that the result is not cached here.
TIP
The underscore at the end of this function's name is a convention serving as a reminder that its result should be constant.
# Returns
True if the transition value computation depends on next_state (False otherwise).
# _reset Initializable
_reset(
self
) -> StrDict[D.T_observation]
Reset the state of the environment and return an initial observation.
By default, Initializable._reset() provides some boilerplate code and internally calls Initializable._state_reset() (which returns an initial state). The boilerplate code automatically stores the initial state into the _memory attribute and samples a corresponding observation.
# Returns
An initial observation.
# _sample Simulation
_sample(
self,
memory: Memory[D.T_state],
action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
Sample one transition of the simulator's dynamics.
By default, Simulation._sample() provides some boilerplate code and internally calls Simulation._state_sample() (which returns a transition outcome). The boilerplate code automatically samples an observation corresponding to the sampled next state.
TIP
Whenever an existing simulator needs to be wrapped instead of implemented fully in scikit-decide (e.g. a third-party simulator), it is recommended to override Simulation._sample() to call the external simulator and not use the Simulation._state_sample() helper function.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The environment outcome of the sampled transition.
# _set_memory Simulation
_set_memory(
self,
memory: Memory[D.T_state]
) -> None
Set the internal memory attribute _memory to the given one.
This can be useful to set a specific "starting point" before doing a rollout with successive Environment._step() calls.
# Parameters
- memory: The memory to set internally.
# Example
# Set simulation_domain memory to my_state (assuming Markovian domain)
simulation_domain._set_memory(my_state)
# Start a 100-step rollout from here (applying my_action at every step)
for _ in range(100):
    simulation_domain._step(my_action)
# _state_reset Initializable
_state_reset(
self
) -> D.T_state
Reset the state of the environment and return an initial state.
This is a helper function called by default from Initializable._reset(). It focuses on the state level, as opposed to the observation one for the latter.
# Returns
An initial state.
# _state_sample Simulation
_state_sample(
self,
memory: Memory[D.T_state],
action: StrDict[list[D.T_event]]
) -> TransitionOutcome[D.T_state, StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
Compute one sample of the transition's dynamics.
This is a helper function called by default from Simulation._sample(). It focuses on the state level, as opposed to the observation one for the latter.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The transition outcome of the sampled transition.
# _state_step Environment
_state_step(
self,
action: StrDict[list[D.T_event]]
) -> TransitionOutcome[D.T_state, StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
Compute one step of the transition's dynamics.
This is a helper function called by default from Environment._step(). It focuses on the state level, as opposed to the observation one for the latter.
# Parameters
- action: The action taken in the current memory (state or history) triggering the transition.
# Returns
The transition outcome of this step.
# _step Environment
_step(
self,
action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
Run one step of the environment's dynamics.
By default, Environment._step() provides some boilerplate code and internally calls Environment._state_step() (which returns a transition outcome). The boilerplate code automatically stores the next state into the _memory attribute and samples a corresponding observation.
TIP
Whenever an existing environment needs to be wrapped instead of implemented fully in scikit-decide (e.g. compiled ATARI games), it is recommended to override Environment._step() to call the external environment and not use the Environment._state_step() helper function.
WARNING
Before calling Environment._step() the first time or when the end of an episode is reached, Initializable._reset() must be called to reset the environment's state.
# Parameters
- action: The action taken in the current memory (state or history) triggering the transition.
# Returns
The environment outcome of this step.
# POMDPDomain
This is a typical Partially Observable Markov Decision Process domain class.
This helper class can be used as an alternate base class for domains, inheriting the following:
- Domain
- SingleAgent
- Sequential
- EnumerableTransitions
- Actions
- UncertainInitialized
- Markovian
- PartiallyObservable
- Rewards
Typical use:
class D(POMDPDomain)
TIP
It is also possible to refine any alternate base class, like for instance:
class D(RLDomain, FullyObservable)
# check_value Rewards
check_value(
self,
value: Value[D.T_value]
) -> bool
Check that a value is compliant with its reward specification.
TIP
This function always returns True by default because any kind of reward should be accepted at this level.
# Parameters
- value: The value to check.
# Returns
True if the value is compliant (False otherwise).
# get_action_mask Events
get_action_mask(
self,
memory: Optional[Memory[D.T_state]] = None
) -> StrDict[Mask]
Get action mask for the given memory or internal one if omitted.
An action mask is another (more specific) format for applicable actions, which is meaningful only if the action space can be iterated over in some way. It is represented by a flat array of 0's and 1's ordered like the enumerated actions: 1 for an applicable action, and 0 for a non-applicable one.
More precisely, this implementation assumes that each agent's action space is an EnumerableSpace, and internally calls self.get_applicable_actions().
The action mask is used, for instance, by RL solvers to shut down the logits associated with non-applicable actions in the output of their internal neural network.
# Parameters
- memory: The memory to consider. If None, works on the internal memory of the domain.
# Returns
A numpy array (or a dict mapping each agent to a numpy array for multi-agent domains) of 0's and 1's indicating the applicability of each action (1 meaning applicable, 0 not applicable).
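As a rough illustration of the mask layout described above, the following standalone sketch builds a 0/1 mask from an enumerable list of actions and a collection of applicable ones. It does not use scikit-decide: the build_action_mask helper and the action names are hypothetical, and a plain list stands in for the numpy array.

```python
from typing import Hashable, Iterable, Sequence


def build_action_mask(
    all_actions: Sequence[Hashable], applicable: Iterable[Hashable]
) -> list[int]:
    """Return a flat 0/1 list aligned with the enumeration order of all_actions."""
    applicable_set = set(applicable)
    # 1 marks an applicable action, 0 a non-applicable one.
    return [1 if action in applicable_set else 0 for action in all_actions]


# Hypothetical action space, enumerated in a fixed order.
actions = ["left", "right", "up", "down"]
mask = build_action_mask(actions, applicable=["left", "up"])
print(mask)  # [1, 0, 1, 0]
```

A solver could then use such a mask to ignore the entries at positions holding a 0.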
# get_action_space Events
get_action_space(
self
) -> StrDict[Space[D.T_event]]
Get the (cached) domain action space (finite or infinite set).
By default, Events.get_action_space() internally calls Events._get_action_space_() the first time and automatically caches its value to make future calls more efficient (since the action space is assumed to be constant).
# Returns
The action space.
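The cached/uncached method pair described above can be pictured with the following minimal sketch. It is a standalone toy, not the library's implementation: the CachedSpaceDemo class, its calls counter and the frozenset standing in for a space are all assumptions made for illustration.

```python
class CachedSpaceDemo:
    """Hypothetical class mimicking a cached getter and its uncached helper."""

    def __init__(self) -> None:
        self.calls = 0      # counts how often the uncached helper runs
        self._cache = None  # holds the value after the first call

    def get_action_space(self) -> frozenset:
        # Compute on the first call only, then reuse the cached value.
        if self._cache is None:
            self._cache = self._get_action_space_()
        return self._cache

    def _get_action_space_(self) -> frozenset:
        # Uncached helper; its result is expected to stay constant.
        self.calls += 1
        return frozenset({"left", "right"})


demo = CachedSpaceDemo()
first = demo.get_action_space()
second = demo.get_action_space()
print(demo.calls)  # 1
```

The same pattern applies to the other getters documented as "(cached)" in this section.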
# get_agents MultiAgent
get_agents(
self
) -> set[str]
Return a singleton for single-agent domains.
We must be consistent here with skdecide.core.autocast(), which transforms a single-agent domain into a multi-agent domain whose only agent has the id "agent".
# get_applicable_actions Events
get_applicable_actions(
self,
memory: Optional[Memory[D.T_state]] = None
) -> StrDict[Space[D.T_event]]
Get the space (finite or infinite set) of applicable actions in the given memory (state or history), or in the internal one if omitted.
By default, Events.get_applicable_actions() provides some boilerplate code and internally calls Events._get_applicable_actions(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
The space of applicable actions.
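The fallback to the internal memory can be sketched as follows. This is a standalone toy under stated assumptions: the DefaultMemoryDemo class, its line of four cells and the plain set used as a space are hypothetical, and the layering of the real methods is collapsed into two functions.

```python
from typing import Optional


class DefaultMemoryDemo:
    """Hypothetical class: falls back to the internal _memory when memory is None."""

    def __init__(self, initial_state: int) -> None:
        self._memory = initial_state  # Markovian case: memory is the last state

    def get_applicable_actions(self, memory: Optional[int] = None) -> set[str]:
        # Boilerplate: use the internal memory when none is given.
        if memory is None:
            memory = self._memory
        return self._get_applicable_actions_from(memory)

    def _get_applicable_actions_from(self, memory: int) -> set[str]:
        # Toy rule on a line of cells 0..3: no move past either end.
        actions = set()
        if memory > 0:
            actions.add("left")
        if memory < 3:
            actions.add("right")
        return actions


domain = DefaultMemoryDemo(initial_state=0)
from_internal = domain.get_applicable_actions()          # uses _memory (cell 0)
from_explicit = domain.get_applicable_actions(memory=2)  # uses the given cell
```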
# get_enabled_events Events
get_enabled_events(
self,
memory: Optional[Memory[D.T_state]] = None
) -> Space[D.T_event]
Get the space (finite or infinite set) of enabled uncontrollable events in the given memory (state or history), or in the internal one if omitted.
By default, Events.get_enabled_events() provides some boilerplate code and internally calls Events._get_enabled_events(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
The space of enabled events.
# get_initial_state_distribution UncertainInitialized
get_initial_state_distribution(
self
) -> Distribution[D.T_state]
Get the (cached) probability distribution of initial states.
By default, UncertainInitialized.get_initial_state_distribution() internally calls UncertainInitialized._get_initial_state_distribution_() the first time and automatically caches its value to make future calls more efficient (since the initial state distribution is assumed to be constant).
# Returns
The probability distribution of initial states.
# get_next_state_distribution UncertainTransitions
get_next_state_distribution(
self,
memory: Memory[D.T_state],
action: StrDict[list[D.T_event]]
) -> DiscreteDistribution[D.T_state]
Get the discrete probability distribution of next state given a memory and action.
TIP
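In the Markovian case (memory only holds the last state $s$), given an action $a$, this function can be mathematically represented by $P(S'|s, a)$, where $S'$ is the next state random variable.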
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The discrete probability distribution of next state.
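To make the notion of a discrete next-state distribution concrete, here is a minimal standalone sketch for the Markovian case, where the probability of each next state depends only on the current state and action. The TRANSITIONS table, the state and action names, and the dict-based representation are assumptions for illustration; they are not the library's DiscreteDistribution type.

```python
import math

# Hypothetical transition table: (state, action) -> {next_state: probability}.
TRANSITIONS = {
    ("s0", "go"): {"s1": 0.8, "s0": 0.2},
    ("s1", "go"): {"s1": 1.0},
}


def next_state_distribution(state: str, action: str) -> dict[str, float]:
    """Return the probability of each next state given a state and an action."""
    distribution = TRANSITIONS[(state, action)]
    # A valid discrete distribution must sum to 1.
    assert math.isclose(sum(distribution.values()), 1.0)
    return distribution


dist = next_state_distribution("s0", "go")
```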
# get_observation_distribution PartiallyObservable
get_observation_distribution(
self,
state: D.T_state,
action: Optional[StrDict[list[D.T_event]]] = None
) -> Distribution[StrDict[D.T_observation]]
Get the probability distribution of the observation given a state and action.
In mathematical terms (discrete case), given an action $a$, this function represents $P(O|s, a)$, where $O$ is the random variable of the observation.
# Parameters
- state: The state to be observed.
- action: The last applied action (or None if the state is an initial state).
# Returns
The probability distribution of the observation.
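A partially observable domain can be pictured as a noisy sensor. The standalone sketch below returns a discrete observation distribution for a given state. The three-state world, the noise parameter and the dict representation are hypothetical, and the sketch ignores the action for simplicity.

```python
def observation_distribution(state: int, noise: float = 0.1) -> dict[int, float]:
    """Hypothetical noisy sensor over states 0..2.

    The true state is reported with probability 1 - noise; the remaining
    probability mass is split evenly among the other states.
    """
    states = [0, 1, 2]
    others = [s for s in states if s != state]
    dist = {state: 1.0 - noise}
    for other in others:
        dist[other] = noise / len(others)
    return dist


obs_dist = observation_distribution(1)
```

With noise set to 0, the sensor always reports the true state, which corresponds to the fully observable case.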
# get_observation_space PartiallyObservable
get_observation_space(
self
) -> StrDict[Space[D.T_observation]]
Get the (cached) observation space (finite or infinite set).
By default, PartiallyObservable.get_observation_space() internally calls PartiallyObservable._get_observation_space_() the first time and automatically caches its value to make future calls more efficient (since the observation space is assumed to be constant).
# Returns
The observation space.
# get_transition_value UncertainTransitions
get_transition_value(
self,
memory: Memory[D.T_state],
action: StrDict[list[D.T_event]],
next_state: Optional[D.T_state] = None
) -> StrDict[Value[D.T_value]]
Get the value (reward or cost) of a transition.
The transition to consider is defined by the function parameters.
TIP
If this function never depends on the next_state parameter for its computation, it is recommended to indicate it by overriding UncertainTransitions._is_transition_value_dependent_on_next_state_() to return False. This information can then be exploited by solvers to avoid computing the next state when evaluating a transition value (more efficient).
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
- next_state: The next state in which the transition ends (if needed for the computation).
# Returns
The transition value (reward or cost).
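The efficiency argument in the tip above can be sketched from the solver's side. In this standalone example, a hypothetical expected_transition_value helper skips the enumeration of next states when the value function is declared independent of next_state. All names, value functions and numbers are assumptions for illustration.

```python
from typing import Callable, Optional


def expected_transition_value(
    value_fn: Callable[[str, str, Optional[str]], float],
    depends_on_next_state: bool,
    state: str,
    action: str,
    next_state_dist: dict[str, float],
) -> float:
    """Expected value of taking action in state (hypothetical solver helper)."""
    if not depends_on_next_state:
        # Shortcut: no need to enumerate next states at all.
        return value_fn(state, action, None)
    # General case: weight each next state's value by its probability.
    return sum(p * value_fn(state, action, ns) for ns, p in next_state_dist.items())


def step_cost(state: str, action: str, next_state: Optional[str] = None) -> float:
    return -1.0  # constant cost, independent of next_state


def goal_bonus(state: str, action: str, next_state: Optional[str] = None) -> float:
    return 10.0 if next_state == "goal" else 0.0  # depends on next_state


dist = {"goal": 0.5, "s0": 0.5}
flat = expected_transition_value(step_cost, False, "s0", "go", dist)
bonus = expected_transition_value(goal_bonus, True, "s0", "go", dist)
```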
# is_action Events
is_action(
self,
event: D.T_event
) -> bool
Indicate whether an event is an action (i.e. a controllable event for the agents).
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the domain action space provided by Events.get_action_space(), but it can be overridden for faster implementations.
# Parameters
- event: The event to consider.
# Returns
True if the event is an action (False otherwise).
# is_applicable_action Events
is_applicable_action(
self,
action: StrDict[D.T_event],
memory: Optional[Memory[D.T_state]] = None
) -> bool
Indicate whether an action is applicable in the given memory (state or history), or in the internal one if omitted.
By default, Events.is_applicable_action() provides some boilerplate code and internally calls Events._is_applicable_action(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- action: The action to consider.
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
True if the action is applicable (False otherwise).
# is_enabled_event Events
is_enabled_event(
self,
event: D.T_event,
memory: Optional[Memory[D.T_state]] = None
) -> bool
Indicate whether an uncontrollable event is enabled in the given memory (state or history), or in the internal one if omitted.
By default, Events.is_enabled_event() provides some boilerplate code and internally calls Events._is_enabled_event(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- event: The event to consider.
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
True if the event is enabled (False otherwise).
# is_observation PartiallyObservable
is_observation(
self,
observation: StrDict[D.T_observation]
) -> bool
Check that an observation indeed belongs to the domain observation space.
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the domain observation space provided by PartiallyObservable.get_observation_space(), but it can be overridden for faster implementations.
# Parameters
- observation: The observation to consider.
# Returns
True if the observation belongs to the domain observation space (False otherwise).
# is_terminal UncertainTransitions
is_terminal(
self,
state: D.T_state
) -> StrDict[D.T_predicate]
Indicate whether a state is terminal.
A terminal state is a state with no outgoing transition (except to itself with value 0).
# Parameters
- state: The state to consider.
# Returns
True if the state is terminal (False otherwise).
# is_transition_value_dependent_on_next_state UncertainTransitions
is_transition_value_dependent_on_next_state(
self
) -> bool
Indicate whether get_transition_value() requires the next_state parameter for its computation (cached).
By default, UncertainTransitions.is_transition_value_dependent_on_next_state() internally calls UncertainTransitions._is_transition_value_dependent_on_next_state_() the first time and automatically caches its value to make future calls more efficient (since the returned value is assumed to be constant).
# Returns
True if the transition value computation depends on next_state (False otherwise).
# reset Initializable
reset(
self
) -> StrDict[D.T_observation]
Reset the state of the environment and return an initial observation.
By default, Initializable.reset() provides some boilerplate code and internally calls Initializable._reset() (which returns an initial state). The boilerplate code automatically stores the initial state into the _memory attribute and samples a corresponding observation.
# Returns
An initial observation.
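The division of labour described above (a state-level helper plus boilerplate that stores the state and derives an observation) can be sketched as follows. This is a standalone toy, not the library's code: the ResetDemo class, the _observe helper and the "cell-N" observation format are assumptions, and the public/private layering is collapsed into a single reset method.

```python
import random


class ResetDemo:
    """Hypothetical environment: reset stores the initial state, returns an observation."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self._memory = None  # unset until reset() is called

    def reset(self) -> str:
        state = self._state_reset()
        self._memory = state          # boilerplate: remember the initial state
        return self._observe(state)   # boilerplate: derive an observation

    def _state_reset(self) -> int:
        # State-level helper: draw an initial state.
        return self._rng.choice([0, 1])

    def _observe(self, state: int) -> str:
        # Deterministic observation for simplicity.
        return f"cell-{state}"


env = ResetDemo(seed=0)
obs = env.reset()
```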
# sample Simulation
sample(
self,
memory: Memory[D.T_state],
action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
Sample one transition of the simulator's dynamics.
By default, Simulation.sample() provides some boilerplate code and internally calls Simulation._sample() (which returns a transition outcome). The boilerplate code automatically samples an observation corresponding to the sampled next state.
TIP
Whenever an existing simulator needs to be wrapped instead of implemented fully in scikit-decide (e.g. a simulator), it is recommended to override Simulation.sample() to call the external simulator and not use the Simulation._sample() helper function.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The environment outcome of the sampled transition.
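Sampling one transition amounts to drawing a next state according to its probability. The standalone sketch below does this with the standard library only. The sample_transition helper, the state names and the probabilities are hypothetical, and the sketch returns only the next state rather than a full environment outcome.

```python
import random


def sample_transition(dist: dict[str, float], rng: random.Random) -> str:
    """Draw one next state from a discrete distribution (hypothetical helper)."""
    states = list(dist)
    weights = [dist[s] for s in states]
    # random.choices performs a weighted draw; k=1 returns a one-element list.
    return rng.choices(states, weights=weights, k=1)[0]


rng = random.Random(42)  # seeded for reproducibility
samples = [sample_transition({"s0": 0.2, "s1": 0.8}, rng) for _ in range(1000)]
share_s1 = samples.count("s1") / len(samples)
```

Over many draws, share_s1 should be close to the stated probability of 0.8.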
# set_memory Simulation
set_memory(
self,
memory: Memory[D.T_state]
) -> None
Set the internal memory attribute _memory to the given one.
This can be useful to set a specific "starting point" before doing a rollout with successive Environment.step() calls.
# Parameters
- memory: The memory to set internally.
# Example
# Set simulation_domain memory to my_state (assuming Markovian domain)
simulation_domain.set_memory(my_state)
# Start a 100-step rollout from here (applying my_action at every step)
for _ in range(100):
    simulation_domain.step(my_action)
# step Environment
step(
self,
action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
Run one step of the environment's dynamics.
By default, Environment.step() provides some boilerplate code and internally calls Environment._step() (which returns a transition outcome). The boilerplate code automatically stores the next state into the _memory attribute and samples a corresponding observation.
TIP
Whenever an existing environment needs to be wrapped instead of implemented fully in scikit-decide (e.g. compiled ATARI games), it is recommended to override Environment.step() to call the external environment and not use the Environment._step() helper function.
WARNING
Before calling Environment.step() the first time or when the end of an episode is reached, Initializable.reset() must be called to reset the environment's state.
# Parameters
- action: The action taken in the current memory (state or history) triggering the transition.
# Returns
The environment outcome of this step.
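The reset-then-step protocol and the boilerplate around the state-level helper can be sketched with a toy counter environment. This is a standalone illustration under stated assumptions: the StepDemo class, its tuple outcome (next state, value, done flag) and the RuntimeError guard are hypothetical and do not reproduce the library's EnvironmentOutcome type or error handling.

```python
class StepDemo:
    """Hypothetical counter environment illustrating the step/_state_step split."""

    def __init__(self) -> None:
        self._memory = None  # unset until reset() is called

    def reset(self) -> int:
        self._memory = 0
        return self._memory

    def step(self, action: int) -> tuple[int, float, bool]:
        if self._memory is None:
            raise RuntimeError("call reset() before step()")
        next_state, value, done = self._state_step(action)
        self._memory = next_state       # boilerplate: store the next state
        return next_state, value, done  # observation equals state in this toy

    def _state_step(self, action: int) -> tuple[int, float, bool]:
        # State-level dynamics: add the action, pay a unit cost, stop at 3.
        next_state = self._memory + action
        return next_state, -1.0, next_state >= 3


env = StepDemo()
env.reset()
outcomes = [env.step(1) for _ in range(3)]
```

Once the last outcome reports the end of the episode, reset() would be called again before any further step.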
# _check_value Rewards
_check_value(
self,
value: Value[D.T_value]
) -> bool
Check that a value is compliant with its reward specification.
TIP
This function always returns True by default because any kind of reward should be accepted at this level.
# Parameters
- value: The value to check.
# Returns
True if the value is compliant (False otherwise).
# _get_action_mask Events
_get_action_mask(
self,
memory: Optional[Memory[D.T_state]] = None
) -> StrDict[Mask]
Get action mask for the given memory or internal one if omitted.
An action mask is another (more specific) format for applicable actions, which is meaningful only if the action space can be iterated over in some way. It is represented by a flat array of 0's and 1's ordered like the enumerated actions: 1 for an applicable action, and 0 for a non-applicable one.
More precisely, this implementation assumes that each agent's action space is an EnumerableSpace, and internally calls self.get_applicable_actions().
The action mask is used, for instance, by RL solvers to shut down the logits associated with non-applicable actions in the output of their internal neural network.
# Parameters
- memory: The memory to consider. If None, works on the internal memory of the domain.
# Returns
A numpy array (or a dict mapping each agent to a numpy array for multi-agent domains) of 0's and 1's indicating the applicability of each action (1 meaning applicable, 0 not applicable).
# _get_action_space Events
_get_action_space(
self
) -> StrDict[Space[D.T_event]]
Get the (cached) domain action space (finite or infinite set).
By default, Events._get_action_space() internally calls Events._get_action_space_() the first time and automatically caches its value to make future calls more efficient (since the action space is assumed to be constant).
# Returns
The action space.
# _get_action_space_ Events
_get_action_space_(
self
) -> StrDict[Space[D.T_event]]
Get the domain action space (finite or infinite set).
This is a helper function called by default from Events._get_action_space(), the difference being that the result is not cached here.
TIP
The underscore at the end of this function's name is a convention serving as a reminder that its result should be constant.
# Returns
The action space.
# _get_applicable_actions Events
_get_applicable_actions(
self,
memory: Optional[Memory[D.T_state]] = None
) -> StrDict[Space[D.T_event]]
Get the space (finite or infinite set) of applicable actions in the given memory (state or history), or in the internal one if omitted.
By default, Events._get_applicable_actions() provides some boilerplate code and internally calls Events._get_applicable_actions_from(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
The space of applicable actions.
# _get_applicable_actions_from Events
_get_applicable_actions_from(
self,
memory: Memory[D.T_state]
) -> StrDict[Space[D.T_event]]
Get the space (finite or infinite set) of applicable actions in the given memory (state or history).
This is a helper function called by default from Events._get_applicable_actions(), the difference being that the memory parameter is mandatory here.
# Parameters
- memory: The memory to consider.
# Returns
The space of applicable actions.
# _get_enabled_events Events
_get_enabled_events(
self,
memory: Optional[Memory[D.T_state]] = None
) -> Space[D.T_event]
Get the space (finite or infinite set) of enabled uncontrollable events in the given memory (state or history), or in the internal one if omitted.
By default, Events._get_enabled_events() provides some boilerplate code and internally calls Events._get_enabled_events_from(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
The space of enabled events.
# _get_enabled_events_from Events
_get_enabled_events_from(
self,
memory: Memory[D.T_state]
) -> Space[D.T_event]
Get the space (finite or infinite set) of enabled uncontrollable events in the given memory (state or history).
This is a helper function called by default from Events._get_enabled_events(), the difference being that the memory parameter is mandatory here.
# Parameters
- memory: The memory to consider.
# Returns
The space of enabled events.
# _get_initial_state_distribution UncertainInitialized
_get_initial_state_distribution(
self
) -> Distribution[D.T_state]
Get the (cached) probability distribution of initial states.
By default, UncertainInitialized._get_initial_state_distribution() internally calls UncertainInitialized._get_initial_state_distribution_() the first time and automatically caches its value to make future calls more efficient (since the initial state distribution is assumed to be constant).
# Returns
The probability distribution of initial states.
# _get_initial_state_distribution_ UncertainInitialized
_get_initial_state_distribution_(
self
) -> Distribution[D.T_state]
Get the probability distribution of initial states.
This is a helper function called by default from UncertainInitialized._get_initial_state_distribution(), the difference being that the result is not cached here.
TIP
The underscore at the end of this function's name is a convention serving as a reminder that its result should be constant.
# Returns
The probability distribution of initial states.
# _get_memory_maxlen History
_get_memory_maxlen(
self
) -> int
Get the (cached) memory max length.
By default, FiniteHistory._get_memory_maxlen() internally calls FiniteHistory._get_memory_maxlen_() the first time and automatically caches its value to make future calls more efficient (since the memory max length is assumed to be constant).
# Returns
The memory max length.
# _get_memory_maxlen_ FiniteHistory
_get_memory_maxlen_(
self
) -> int
Get the memory max length.
This is a helper function called by default from FiniteHistory._get_memory_maxlen(), the difference being that the result is not cached here.
TIP
The underscore at the end of this function's name is a convention serving as a reminder that its result should be constant.
# Returns
The memory max length.
# _get_next_state_distribution UncertainTransitions
_get_next_state_distribution(
self,
memory: Memory[D.T_state],
action: StrDict[list[D.T_event]]
) -> DiscreteDistribution[D.T_state]
Get the discrete probability distribution of next state given a memory and action.
TIP
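In the Markovian case (memory only holds the last state $s$), given an action $a$, this function can be mathematically represented by $P(S'|s, a)$, where $S'$ is the next state random variable.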
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The discrete probability distribution of next state.
# _get_observation_distribution PartiallyObservable
_get_observation_distribution(
self,
state: D.T_state,
action: Optional[StrDict[list[D.T_event]]] = None
) -> Distribution[StrDict[D.T_observation]]
Get the probability distribution of the observation given a state and action.
In mathematical terms (discrete case), given an action $a$, this function represents $P(O|s, a)$, where $O$ is the random variable of the observation.
# Parameters
- state: The state to be observed.
- action: The last applied action (or None if the state is an initial state).
# Returns
The probability distribution of the observation.
# _get_observation_space PartiallyObservable
_get_observation_space(
self
) -> StrDict[Space[D.T_observation]]
Get the (cached) observation space (finite or infinite set).
By default, PartiallyObservable._get_observation_space() internally calls PartiallyObservable._get_observation_space_() the first time and automatically caches its value to make future calls more efficient (since the observation space is assumed to be constant).
# Returns
The observation space.
# _get_observation_space_ PartiallyObservable
_get_observation_space_(
self
) -> StrDict[Space[D.T_observation]]
Get the observation space (finite or infinite set).
This is a helper function called by default from PartiallyObservable._get_observation_space(), the difference being that the result is not cached here.
TIP
The underscore at the end of this function's name is a convention serving as a reminder that its result should be constant.
# Returns
The observation space.
# _get_transition_value UncertainTransitions
_get_transition_value(
self,
memory: Memory[D.T_state],
action: StrDict[list[D.T_event]],
next_state: Optional[D.T_state] = None
) -> StrDict[Value[D.T_value]]
Get the value (reward or cost) of a transition.
The transition to consider is defined by the function parameters.
TIP
If this function never depends on the next_state parameter for its computation, it is recommended to indicate it by overriding UncertainTransitions._is_transition_value_dependent_on_next_state_() to return False. This information can then be exploited by solvers to avoid computing the next state when evaluating a transition value (more efficient).
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
- next_state: The next state in which the transition ends (if needed for the computation).
# Returns
The transition value (reward or cost).
# _init_memory History
_init_memory(
self,
state: Optional[D.T_state] = None
) -> Memory[D.T_state]
Initialize memory (possibly with a state) according to its specification and return it.
This function is automatically called by Initializable._reset() to reinitialize the internal memory whenever the domain is used as an environment.
# Parameters
- state: An optional state to initialize the memory with (typically the initial state).
# Returns
The new initialized memory.
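A bounded history can be pictured with collections.deque, whose maxlen argument discards the oldest entries automatically. The standalone sketch below is an analogy only: the init_memory helper is hypothetical and a deque stands in for the library's Memory type, with maxlen=1 mimicking the Markovian case where only the last state is kept.

```python
from collections import deque
from typing import Optional


def init_memory(state: Optional[str] = None, maxlen: Optional[int] = None) -> deque:
    """Create a bounded history, optionally seeded with one state (hypothetical)."""
    content = [state] if state is not None else []
    return deque(content, maxlen=maxlen)


# Markovian-like memory: only the last state survives.
markovian = init_memory("s0", maxlen=1)
markovian.append("s1")

# Finite history of length 3: the oldest state is dropped when full.
history = init_memory("s0", maxlen=3)
for s in ["s1", "s2", "s3"]:
    history.append(s)

print(list(markovian))  # ['s1']
print(list(history))    # ['s1', 's2', 's3']
```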
# _is_action Events
_is_action(
self,
event: D.T_event
) -> bool
Indicate whether an event is an action (i.e. a controllable event for the agents).
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the domain action space provided by Events._get_action_space(), but it can be overridden for faster implementations.
# Parameters
- event: The event to consider.
# Returns
True if the event is an action (False otherwise).
# _is_applicable_action Events
_is_applicable_action(
self,
action: StrDict[D.T_event],
memory: Optional[Memory[D.T_state]] = None
) -> bool
Indicate whether an action is applicable in the given memory (state or history), or in the internal one if omitted.
By default, Events._is_applicable_action() provides some boilerplate code and internally calls Events._is_applicable_action_from(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- action: The action to consider.
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
True if the action is applicable (False otherwise).
# _is_applicable_action_from Events
_is_applicable_action_from(
self,
action: StrDict[D.T_event],
memory: Memory[D.T_state]
) -> bool
Indicate whether an action is applicable in the given memory (state or history).
This is a helper function called by default from Events._is_applicable_action(), the difference being that the memory parameter is mandatory here.
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the space of applicable actions provided by Events._get_applicable_actions_from(), but it can be overridden for faster implementations.
# Parameters
- action: The action to consider.
- memory: The memory to consider.
# Returns
True if the action is applicable (False otherwise).
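The tip above contrasts a default membership test on the full space of applicable actions with a faster direct check. The following standalone sketch shows both styles side by side on a toy 1-D grid. The GridDemo class and its method names are hypothetical and are not part of scikit-decide.

```python
class GridDemo:
    """Hypothetical 1-D grid with cells 0..size-1 and moves -1 / +1."""

    def __init__(self, size: int) -> None:
        self.size = size

    def applicable_actions_from(self, cell: int) -> set[int]:
        # Builds the whole set of applicable moves for the given cell.
        return {move for move in (-1, 1) if 0 <= cell + move < self.size}

    def is_applicable_default(self, move: int, cell: int) -> bool:
        # Default style: membership test on the full applicable set.
        return move in self.applicable_actions_from(cell)

    def is_applicable_fast(self, move: int, cell: int) -> bool:
        # Override style: direct check without building the set.
        return move in (-1, 1) and 0 <= cell + move < self.size


grid = GridDemo(size=4)
# Both styles must agree on every (cell, move) pair, including the invalid move 0.
agree = all(
    grid.is_applicable_default(m, c) == grid.is_applicable_fast(m, c)
    for c in range(4)
    for m in (-1, 0, 1)
)
```

The gain from the direct check grows with the cost of building the applicable set, which is negligible in this toy but can matter for large action spaces.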
# _is_enabled_event Events
_is_enabled_event(
self,
event: D.T_event,
memory: Optional[Memory[D.T_state]] = None
) -> bool
Indicate whether an uncontrollable event is enabled in the given memory (state or history), or in the internal one if omitted.
By default, Events._is_enabled_event() provides some boilerplate code and internally calls Events._is_enabled_event_from(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- event: The event to consider.
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
True if the event is enabled (False otherwise).
# _is_enabled_event_from Events
_is_enabled_event_from(
self,
event: D.T_event,
memory: Memory[D.T_state]
) -> bool
Indicate whether an event is enabled in the given memory (state or history).
This is a helper function called by default from Events._is_enabled_event(), the difference being that the memory parameter is mandatory here.
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the space of enabled events provided by Events._get_enabled_events_from(), but it can be overridden for faster implementations.
# Parameters
- event: The event to consider.
- memory: The memory to consider.
# Returns
True if the event is enabled (False otherwise).
# _is_observation PartiallyObservable
_is_observation(
self,
observation: StrDict[D.T_observation]
) -> bool
Check that an observation indeed belongs to the domain observation space.
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the domain observation space provided by PartiallyObservable._get_observation_space(), but it can be overridden for faster implementations.
# Parameters
- observation: The observation to consider.
# Returns
True if the observation belongs to the domain observation space (False otherwise).
# _is_terminal UncertainTransitions
_is_terminal(
self,
state: D.T_state
) -> StrDict[D.T_predicate]
Indicate whether a state is terminal.
A terminal state is a state with no outgoing transition (except to itself with value 0).
# Parameters
- state: The state to consider.
# Returns
True if the state is terminal (False otherwise).
# _is_transition_value_dependent_on_next_state UncertainTransitions
_is_transition_value_dependent_on_next_state(
self
) -> bool
Indicate whether _get_transition_value() requires the next_state parameter for its computation (cached).
By default, UncertainTransitions._is_transition_value_dependent_on_next_state() internally calls UncertainTransitions._is_transition_value_dependent_on_next_state_() the first time and automatically caches its value to make future calls more efficient (since the returned value is assumed to be constant).
# Returns
True if the transition value computation depends on next_state (False otherwise).
# _is_transition_value_dependent_on_next_state_ UncertainTransitions
_is_transition_value_dependent_on_next_state_(
self
) -> bool
Indicate whether _get_transition_value() requires the next_state parameter for its computation.
This is a helper function called by default from UncertainTransitions._is_transition_value_dependent_on_next_state(), the difference being that the result is not cached here.
TIP
The underscore at the end of this function's name is a convention serving as a reminder that its result should be constant.
# Returns
True if the transition value computation depends on next_state (False otherwise).
# _reset Initializable
_reset(
self
) -> StrDict[D.T_observation]
Reset the state of the environment and return an initial observation.
By default, Initializable._reset() provides some boilerplate code and internally calls Initializable._state_reset() (which returns an initial state). The boilerplate code automatically stores the initial state into the _memory attribute and samples a corresponding observation.
# Returns
An initial observation.
# _sample Simulation
_sample(
self,
memory: Memory[D.T_state],
action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
Sample one transition of the simulator's dynamics.
By default, Simulation._sample() provides some boilerplate code and internally calls Simulation._state_sample() (which returns a transition outcome). The boilerplate code automatically samples an observation corresponding to the sampled next state.
TIP
Whenever an existing simulator needs to be wrapped instead of implemented fully in scikit-decide (e.g. a simulator), it is recommended to override Simulation._sample() to call the external simulator and not use the Simulation._state_sample() helper function.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The environment outcome of the sampled transition.
# _set_memory Simulation
_set_memory(
self,
memory: Memory[D.T_state]
) -> None
Set the internal memory attribute _memory to the given one.
This can be useful to set a specific "starting point" before doing a rollout with successive Environment._step() calls.
# Parameters
- memory: The memory to set internally.
# Example
# Set simulation_domain memory to my_state (assuming Markovian domain)
simulation_domain._set_memory(my_state)
# Start a 100-step rollout from here (applying my_action at every step)
for _ in range(100):
    simulation_domain._step(my_action)
# _state_reset Initializable
_state_reset(
self
) -> D.T_state
Reset the state of the environment and return an initial state.
This is a helper function called by default from Initializable._reset(). It focuses on the state level, as opposed to the observation one for the latter.
# Returns
An initial state.
# _state_sample Simulation
_state_sample(
self,
memory: Memory[D.T_state],
action: StrDict[list[D.T_event]]
) -> TransitionOutcome[D.T_state, StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
Compute one sample of the transition's dynamics.
This is a helper function called by default from Simulation._sample(). It focuses on the state level, as opposed to the observation one for the latter.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The transition outcome of the sampled transition.
# _state_step Environment
_state_step(
self,
action: StrDict[list[D.T_event]]
) -> TransitionOutcome[D.T_state, StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
Compute one step of the transition's dynamics.
This is a helper function called by default from Environment._step(). It focuses on the state level, as opposed to the observation one for the latter.
# Parameters
- action: The action taken in the current memory (state or history) triggering the transition.
# Returns
The transition outcome of this step.
# _step Environment
_step(
self,
action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
Run one step of the environment's dynamics.
By default, Environment._step() provides some boilerplate code and internally calls Environment._state_step() (which returns a transition outcome). The boilerplate code automatically stores the next state into the _memory attribute and samples a corresponding observation.
TIP
Whenever an existing environment needs to be wrapped instead of implemented fully in scikit-decide (e.g. compiled ATARI games), it is recommended to override Environment._step() to call the external environment and not use the Environment._state_step() helper function.
WARNING
Before calling Environment._step() the first time or when the end of an episode is reached, Initializable._reset() must be called to reset the environment's state.
# Parameters
- action: The action taken in the current memory (state or history) triggering the transition.
# Returns
The environment outcome of this step.
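The split between the state-level helper and the boilerplate wrapper can be pictured with a minimal, self-contained sketch. It does not use scikit-decide: ToyCounterEnv, ToyOutcome and _observe are hypothetical names invented for illustration, and the fully observable assumption (observation equals state) is a simplification of this sketch.

```python
from dataclasses import dataclass


@dataclass
class ToyOutcome:
    # Hypothetical stand-in for a transition/environment outcome.
    state_or_obs: int
    value: float
    termination: bool


class ToyCounterEnv:
    """Hypothetical environment, not part of scikit-decide."""

    def __init__(self) -> None:
        self._memory = None  # mirrors the _memory attribute described above

    def _observe(self, state: int) -> int:
        # Assumption of this sketch: observation equals state.
        return state

    def _reset(self) -> int:
        self._memory = 0
        return self._observe(self._memory)

    def _state_step(self, action: int) -> ToyOutcome:
        # State-level dynamics only: no memory update, no observation.
        next_state = self._memory + action
        return ToyOutcome(next_state, value=1.0, termination=next_state >= 3)

    def _step(self, action: int) -> ToyOutcome:
        if self._memory is None:
            raise RuntimeError("call _reset() before the first _step()")
        outcome = self._state_step(action)
        self._memory = outcome.state_or_obs  # boilerplate: store next state
        obs = self._observe(self._memory)  # boilerplate: derive observation
        return ToyOutcome(obs, outcome.value, outcome.termination)


env = ToyCounterEnv()
first_obs = env._reset()
outcomes = [env._step(1) for _ in range(3)]
```

Wrapping an external environment would, under the same assumptions, amount to replacing the body of _step with a call to that environment and leaving _state_step unused.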
# GoalMDPDomain
This is a typical Goal Markov Decision Process domain class.
This helper class can be used as an alternate base class for domains, inheriting the following:
- Domain
- SingleAgent
- Sequential
- EnumerableTransitions
- Actions
- Goals
- DeterministicInitialized
- Markovian
- FullyObservable
- PositiveCosts
Typical use:
class D(GoalMDPDomain)
TIP
It is also possible to refine any alternate base class, for instance:
class D(RLDomain, FullyObservable)
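Why adding a more specific characteristic after an alternate base class takes effect can be seen from Python's method resolution order. The sketch below uses toy stand-ins (ToyPartiallyObservable, ToyFullyObservable, ToyRLDomain are hypothetical and unrelated to the real scikit-decide classes) and only illustrates the inheritance mechanism.

```python
class ToyPartiallyObservable:
    def observability(self) -> str:
        return "partial"


class ToyFullyObservable(ToyPartiallyObservable):
    def observability(self) -> str:
        return "full"


class ToyRLDomain(ToyPartiallyObservable):
    """Hypothetical alternate base class bundling default characteristics."""


class D(ToyRLDomain, ToyFullyObservable):
    """Same bundle, refined to be fully observable."""


d = D()
# The more specific characteristic is placed before the one it refines.
mro_names = [cls.__name__ for cls in D.__mro__]
```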
# check_value Rewards
check_value(
self,
value: Value[D.T_value]
) -> bool
Check that a value is compliant with its reward specification.
TIP
This function always returns True by default because any kind of reward should be accepted at this level.
# Parameters
- value: The value to check.
# Returns
True if the value is compliant (False otherwise).
# get_action_mask Events
get_action_mask(
self,
memory: Optional[Memory[D.T_state]] = None
) -> StrDict[Mask]
Get action mask for the given memory or internal one if omitted.
An action mask is another (more specific) format for applicable actions, which is meaningful only if the action space can be iterated over in some way. It is represented by a flat array of 0's and 1's ordered as the actions when enumerated: 1 for an applicable action, and 0 for a non-applicable action.
More precisely, this implementation assumes that each agent action space is an EnumerableSpace, and internally calls self.get_applicable_actions().
The action mask is used for instance by RL solvers to shut down logits associated with non-applicable actions in the output of their internal neural network.
# Parameters
- memory: The memory to consider. If None, works on the internal memory of the domain.
# Returns
A numpy array (or a dict mapping each agent to a numpy array for multi-agent domains) of 0's and 1's indicating the applicability of each action (1 meaning applicable and 0 not applicable).
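As a rough illustration of how such a mask can be built and consumed, the sketch below uses plain Python lists instead of numpy arrays. The action names, the applicable subset and the logits are made-up values; the masking-by-negative-infinity step is one common way to shut down logits, not necessarily what any particular solver does.

```python
import math

# Hypothetical enumerable action space and applicable subset.
all_actions = ["up", "down", "left", "right"]
applicable = {"up", "left"}

# 1 for an applicable action, 0 otherwise, in enumeration order.
mask = [1 if a in applicable else 0 for a in all_actions]

# Made-up logits, as a policy network might output them.
logits = [0.3, 2.0, -1.0, 0.5]
masked_logits = [l if m == 1 else -math.inf for l, m in zip(logits, mask)]

# Softmax over the masked logits: masked actions get probability 0.
exps = [math.exp(l) for l in masked_logits]
total = sum(exps)
probs = [e / total for e in exps]
```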
# get_action_space Events
get_action_space(
self
) -> StrDict[Space[D.T_event]]
Get the (cached) domain action space (finite or infinite set).
By default, Events.get_action_space() internally calls Events._get_action_space_() the first time and automatically caches its value to make future calls more efficient (since the action space is assumed to be constant).
# Returns
The action space.
# get_agents MultiAgent
get_agents(
self
) -> set[str]
Return a singleton for single agent domains.
This must be consistent with skdecide.core.autocast(), which transforms a single agent domain into a multi-agent domain whose only agent has the id "agent".
# get_applicable_actions Events
get_applicable_actions(
self,
memory: Optional[Memory[D.T_state]] = None
) -> StrDict[Space[D.T_event]]
Get the space (finite or infinite set) of applicable actions in the given memory (state or history), or in the internal one if omitted.
By default, Events.get_applicable_actions() provides some boilerplate code and internally calls Events._get_applicable_actions(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
The space of applicable actions.
# get_enabled_events Events
get_enabled_events(
self,
memory: Optional[Memory[D.T_state]] = None
) -> Space[D.T_event]
Get the space (finite or infinite set) of enabled uncontrollable events in the given memory (state or history), or in the internal one if omitted.
By default, Events.get_enabled_events() provides some boilerplate code and internally calls Events._get_enabled_events(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
The space of enabled events.
# get_goals Goals
get_goals(
self
) -> StrDict[Space[D.T_observation]]
Get the (cached) domain goals space (finite or infinite set).
By default, Goals.get_goals() internally calls Goals._get_goals_() the first time and automatically caches its value to make future calls more efficient (since the goals space is assumed to be constant).
WARNING
Goal states are assumed to be fully observable (i.e. observation = state) so that there is never uncertainty about whether the goal has been reached or not. This assumption guarantees that any policy that does not reach the goal with certainty incurs in infinite expected cost. - Geffner, 2013: A Concise Introduction to Models and Methods for Automated Planning
# Returns
The goals space.
# get_initial_state DeterministicInitialized
get_initial_state(
self
) -> D.T_state
Get the (cached) initial state.
By default, DeterministicInitialized.get_initial_state() internally calls DeterministicInitialized._get_initial_state_() the first time and automatically caches its value to make future calls more efficient (since the initial state is assumed to be constant).
# Returns
The initial state.
# get_initial_state_distribution UncertainInitialized
get_initial_state_distribution(
self
) -> Distribution[D.T_state]
Get the (cached) probability distribution of initial states.
By default, UncertainInitialized.get_initial_state_distribution() internally calls UncertainInitialized._get_initial_state_distribution_() the first time and automatically caches its value to make future calls more efficient (since the initial state distribution is assumed to be constant).
# Returns
The probability distribution of initial states.
# get_next_state_distribution UncertainTransitions
get_next_state_distribution(
self,
memory: Memory[D.T_state],
action: StrDict[list[D.T_event]]
) -> DiscreteDistribution[D.T_state]
Get the discrete probability distribution of next state given a memory and action.
TIP
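In the Markovian case (memory only holds last state $s$), given an action $a$, this function can be mathematically represented by $P(S' \mid s, a)$, where $S'$ is the next state random variable.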
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The discrete probability distribution of next state.
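To make the idea of a discrete next-state distribution concrete, here is a small sketch that represents it as a list of (next state, probability) pairs. This representation, the toy dynamics and the 0.8/0.2 split are assumptions of the example, not the library's DiscreteDistribution type.

```python
import random


def next_state_distribution(state: int, action: str) -> list[tuple[int, float]]:
    # Toy Markovian dynamics: the move succeeds with probability 0.8,
    # otherwise the agent stays in place.
    intended = state + 1 if action == "right" else state - 1
    return [(intended, 0.8), (state, 0.2)]


dist = next_state_distribution(0, "right")
total_probability = sum(p for _, p in dist)

# Drawing one next state from the distribution (seeded for repeatability).
rng = random.Random(0)
states, weights = zip(*dist)
sample = rng.choices(states, weights=weights, k=1)[0]
```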
# get_observation TransformedObservable
get_observation(
self,
state: D.T_state,
action: Optional[StrDict[list[D.T_event]]] = None
) -> StrDict[D.T_observation]
Get the deterministic observation given a state and action.
# Parameters
- state: The state to be observed.
- action: The last applied action (or None if the state is an initial state).
# Returns
The observation.
# get_observation_distribution PartiallyObservable
get_observation_distribution(
self,
state: D.T_state,
action: Optional[StrDict[list[D.T_event]]] = None
) -> Distribution[StrDict[D.T_observation]]
Get the probability distribution of the observation given a state and action.
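In mathematical terms (discrete case), given an action $a$, this function represents $P(O \mid s, a)$, where $O$ is the random variable of the observation.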
# Parameters
- state: The state to be observed.
- action: The last applied action (or None if the state is an initial state).
# Returns
The probability distribution of the observation.
# get_observation_space PartiallyObservable
get_observation_space(
self
) -> StrDict[Space[D.T_observation]]
Get the (cached) observation space (finite or infinite set).
By default, PartiallyObservable.get_observation_space() internally calls PartiallyObservable._get_observation_space_() the first time and automatically caches its value to make future calls more efficient (since the observation space is assumed to be constant).
# Returns
The observation space.
# get_transition_value UncertainTransitions
get_transition_value(
self,
memory: Memory[D.T_state],
action: StrDict[list[D.T_event]],
next_state: Optional[D.T_state] = None
) -> StrDict[Value[D.T_value]]
Get the value (reward or cost) of a transition.
The transition to consider is defined by the function parameters.
TIP
If this function never depends on the next_state parameter for its computation, it is recommended to indicate it by overriding UncertainTransitions._is_transition_value_dependent_on_next_state_() to return False. This information can then be exploited by solvers to avoid computing next state to evaluate a transition value (more efficient).
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
- next_state: The next state in which the transition ends (if needed for the computation).
# Returns
The transition value (reward or cost).
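The benefit of declaring that the value does not depend on next_state can be sketched as follows. The function names, the flag and the cost figures are hypothetical; the point is only that a caller can skip iterating over next states when the flag is False.

```python
def transition_value(state, action, next_state=None) -> float:
    # Toy cost that depends on the action only, never on next_state.
    return 2.0 if action == "jump" else 1.0


# Hypothetical flag playing the role of the "dependent on next state" query.
VALUE_DEPENDS_ON_NEXT_STATE = False


def expected_cost(state, action, distribution) -> float:
    if not VALUE_DEPENDS_ON_NEXT_STATE:
        # Shortcut: no need to enumerate next states.
        return transition_value(state, action)
    # General case: probability-weighted average over next states.
    return sum(p * transition_value(state, action, ns) for ns, p in distribution)


cost_jump = expected_cost(0, "jump", [(1, 0.5), (2, 0.5)])
cost_walk = expected_cost(0, "walk", [(1, 1.0)])
```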
# is_action Events
is_action(
self,
event: D.T_event
) -> bool
Indicate whether an event is an action (i.e. a controllable event for the agents).
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the domain action space provided by Events.get_action_space(), but it can be overridden for faster implementations.
# Parameters
- event: The event to consider.
# Returns
True if the event is an action (False otherwise).
# is_applicable_action Events
is_applicable_action(
self,
action: StrDict[D.T_event],
memory: Optional[Memory[D.T_state]] = None
) -> bool
Indicate whether an action is applicable in the given memory (state or history), or in the internal one if omitted.
By default, Events.is_applicable_action() provides some boilerplate code and internally calls Events._is_applicable_action(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- action: The action to consider.
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
True if the action is applicable (False otherwise).
# is_enabled_event Events
is_enabled_event(
self,
event: D.T_event,
memory: Optional[Memory[D.T_state]] = None
) -> bool
Indicate whether an uncontrollable event is enabled in the given memory (state or history), or in the internal one if omitted.
By default, Events.is_enabled_event() provides some boilerplate code and internally calls Events._is_enabled_event(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- event: The event to consider.
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
True if the event is enabled (False otherwise).
# is_goal Goals
is_goal(
self,
observation: StrDict[D.T_observation]
) -> StrDict[D.T_predicate]
Indicate whether an observation belongs to the goals.
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the domain goals space provided by Goals.get_goals(), but it can be overridden for faster implementations.
# Parameters
- observation: The observation to consider.
# Returns
True if the observation is a goal (False otherwise).
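The default membership test and a faster override can be contrasted with a short sketch. ToySetSpace is a hypothetical minimal space exposing only a contains() method, and the single goal cell (2, 2) is a made-up example.

```python
class ToySetSpace:
    """Hypothetical finite space exposing a contains() method."""

    def __init__(self, elements) -> None:
        self._elements = frozenset(elements)

    def contains(self, x) -> bool:
        return x in self._elements


goals = ToySetSpace({(2, 2)})


def is_goal(observation) -> bool:
    # Default style: delegate to the goals space membership test.
    return goals.contains(observation)


def is_goal_fast(observation) -> bool:
    # Override style: exploit knowledge of the single goal cell directly.
    return observation == (2, 2)


checks = [(is_goal(o), is_goal_fast(o)) for o in [(0, 0), (2, 2)]]
```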
# is_observation PartiallyObservable
is_observation(
self,
observation: StrDict[D.T_observation]
) -> bool
Check that an observation indeed belongs to the domain observation space.
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the domain observation space provided by PartiallyObservable.get_observation_space(), but it can be overridden for faster implementations.
# Parameters
- observation: The observation to consider.
# Returns
True if the observation belongs to the domain observation space (False otherwise).
# is_terminal UncertainTransitions
is_terminal(
self,
state: D.T_state
) -> StrDict[D.T_predicate]
Indicate whether a state is terminal.
A terminal state is a state with no outgoing transition (except to itself with value 0).
# Parameters
- state: The state to consider.
# Returns
True if the state is terminal (False otherwise).
# is_transition_value_dependent_on_next_state UncertainTransitions
is_transition_value_dependent_on_next_state(
self
) -> bool
Indicate whether get_transition_value() requires the next_state parameter for its computation (cached).
By default, UncertainTransitions.is_transition_value_dependent_on_next_state() internally calls UncertainTransitions._is_transition_value_dependent_on_next_state_() the first time and automatically caches its value to make future calls more efficient (since the returned value is assumed to be constant).
# Returns
True if the transition value computation depends on next_state (False otherwise).
# reset Initializable
reset(
self
) -> StrDict[D.T_observation]
Reset the state of the environment and return an initial observation.
By default, Initializable.reset() provides some boilerplate code and internally calls Initializable._reset() (which returns an initial state). The boilerplate code automatically stores the initial state into the _memory attribute and samples a corresponding observation.
# Returns
An initial observation.
# sample Simulation
sample(
self,
memory: Memory[D.T_state],
action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
Sample one transition of the simulator's dynamics.
By default, Simulation.sample() provides some boilerplate code and internally calls Simulation._sample() (which returns a transition outcome). The boilerplate code automatically samples an observation corresponding to the sampled next state.
TIP
Whenever an existing simulator needs to be wrapped instead of implemented fully in scikit-decide, it is recommended to override Simulation.sample() to call the external simulator and not use the Simulation._sample() helper function.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The environment outcome of the sampled transition.
# set_memory Simulation
set_memory(
self,
memory: Memory[D.T_state]
) -> None
Set the internal memory attribute _memory to the given one.
This can be useful to set a specific "starting point" before doing a rollout with successive Environment.step() calls.
# Parameters
- memory: The memory to set internally.
# Example
# Set simulation_domain memory to my_state (assuming Markovian domain)
simulation_domain.set_memory(my_state)
# Start a 100-step rollout from here (applying my_action at every step)
for _ in range(100):
    simulation_domain.step(my_action)
# step Environment
step(
self,
action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
Run one step of the environment's dynamics.
By default, Environment.step() provides some boilerplate code and internally calls Environment._step() (which returns a transition outcome). The boilerplate code automatically stores next state into the _memory attribute and samples a corresponding observation.
TIP
Whenever an existing environment needs to be wrapped instead of implemented fully in scikit-decide (e.g. compiled ATARI games), it is recommended to override Environment.step() to call the external environment and not use the Environment._step() helper function.
WARNING
Before calling Environment.step() the first time or when the end of an episode is reached, Initializable.reset() must be called to reset the environment's state.
# Parameters
- action: The action taken in the current memory (state or history) triggering the transition.
# Returns
The environment outcome of this step.
# _check_value Rewards
_check_value(
self,
value: Value[D.T_value]
) -> bool
Check that a value is compliant with its cost specification (must be positive).
TIP
This function calls PositiveCosts._is_positive() to determine if a value is positive (can be overridden for advanced value types).
# Parameters
- value: The value to check.
# Returns
True if the value is compliant (False otherwise).
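A positivity check of this kind can be sketched in a few lines. ToyValue is a hypothetical container with a single cost field, and treating zero as acceptable is an assumption of this sketch rather than a statement about the library's behaviour.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyValue:
    # Hypothetical value container holding a cost.
    cost: float


def is_positive(cost: float) -> bool:
    # Assumption of this sketch: zero counts as a valid (non-negative) cost.
    return cost >= 0


def check_value(value: ToyValue) -> bool:
    # Delegates to is_positive, which could be replaced for richer cost types.
    return is_positive(value.cost)


results = [check_value(ToyValue(c)) for c in (0.0, 2.5, -1.0)]
```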
# _get_action_mask Events
_get_action_mask(
self,
memory: Optional[Memory[D.T_state]] = None
) -> StrDict[Mask]
Get action mask for the given memory or internal one if omitted.
An action mask is another (more specific) format for applicable actions, which is meaningful only if the action space can be iterated over in some way. It is represented by a flat array of 0's and 1's ordered as the actions when enumerated: 1 for an applicable action, and 0 for a non-applicable action.
More precisely, this implementation assumes that each agent action space is an EnumerableSpace, and internally calls self.get_applicable_actions().
The action mask is used for instance by RL solvers to shut down logits associated with non-applicable actions in the output of their internal neural network.
# Parameters
- memory: The memory to consider. If None, works on the internal memory of the domain.
# Returns
A numpy array (or a dict mapping each agent to a numpy array for multi-agent domains) of 0's and 1's indicating the applicability of each action (1 meaning applicable and 0 not applicable).
# _get_action_space Events
_get_action_space(
self
) -> StrDict[Space[D.T_event]]
Get the (cached) domain action space (finite or infinite set).
By default, Events._get_action_space() internally calls Events._get_action_space_() the first time and automatically caches its value to make future calls more efficient (since the action space is assumed to be constant).
# Returns
The action space.
# _get_action_space_ Events
_get_action_space_(
self
) -> StrDict[Space[D.T_event]]
Get the domain action space (finite or infinite set).
This is a helper function called by default from Events._get_action_space(), the difference being that the result is not cached here.
TIP
The underscore at the end of this function's name is a convention serving as a reminder that its result should be constant.
# Returns
The action space.
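The relationship between a cached getter and its trailing-underscore helper can be illustrated with a self-contained sketch. ToyEvents and its compute_calls counter are hypothetical, and the attribute-based cache is just one simple way to obtain compute-once behaviour.

```python
class ToyEvents:
    """Hypothetical class illustrating the cached-getter convention."""

    def __init__(self) -> None:
        self.compute_calls = 0  # counts how often the helper actually runs

    def _get_action_space_(self) -> frozenset:
        # Uncached helper: its result is expected to be constant.
        self.compute_calls += 1
        return frozenset({"left", "right"})

    def get_action_space(self) -> frozenset:
        # Cached wrapper: compute on first call, then reuse the stored value.
        try:
            return self._action_space
        except AttributeError:
            self._action_space = self._get_action_space_()
            return self._action_space


events = ToyEvents()
first = events.get_action_space()
second = events.get_action_space()
```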
# _get_applicable_actions Events
_get_applicable_actions(
self,
memory: Optional[Memory[D.T_state]] = None
) -> StrDict[Space[D.T_event]]
Get the space (finite or infinite set) of applicable actions in the given memory (state or history), or in the internal one if omitted.
By default, Events._get_applicable_actions() provides some boilerplate code and internally calls Events._get_applicable_actions_from(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
The space of applicable actions.
# _get_applicable_actions_from Events
_get_applicable_actions_from(
self,
memory: Memory[D.T_state]
) -> StrDict[Space[D.T_event]]
Get the space (finite or infinite set) of applicable actions in the given memory (state or history).
This is a helper function called by default from Events._get_applicable_actions(), the difference being that the memory parameter is mandatory here.
# Parameters
- memory: The memory to consider.
# Returns
The space of applicable actions.
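The optional-memory wrapper and its mandatory-memory helper can be sketched as below. ToyLineDomain is a hypothetical one-dimensional world with positions 0 to 2, and plain Python sets stand in for action spaces.

```python
from typing import Optional


class ToyLineDomain:
    """Hypothetical domain: an agent on positions 0..2 of a line."""

    def __init__(self) -> None:
        self._memory = 0  # internal memory, here just the current position

    def _get_applicable_actions_from(self, memory: int) -> set:
        # Memory is mandatory here: pure function of the given position.
        actions = set()
        if memory > 0:
            actions.add("left")
        if memory < 2:
            actions.add("right")
        return actions

    def get_applicable_actions(self, memory: Optional[int] = None) -> set:
        # Boilerplate: fall back to the internal memory when none is given.
        if memory is None:
            memory = self._memory
        return self._get_applicable_actions_from(memory)


domain = ToyLineDomain()
from_internal = domain.get_applicable_actions()
from_given = domain.get_applicable_actions(1)
```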
# _get_enabled_events Events
_get_enabled_events(
self,
memory: Optional[Memory[D.T_state]] = None
) -> Space[D.T_event]
Get the space (finite or infinite set) of enabled uncontrollable events in the given memory (state or history), or in the internal one if omitted.
By default, Events._get_enabled_events() provides some boilerplate code and internally calls Events._get_enabled_events_from(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
The space of enabled events.
# _get_enabled_events_from Events
_get_enabled_events_from(
self,
memory: Memory[D.T_state]
) -> Space[D.T_event]
Get the space (finite or infinite set) of enabled uncontrollable events in the given memory (state or history).
This is a helper function called by default from Events._get_enabled_events(), the difference being that the memory parameter is mandatory here.
# Parameters
- memory: The memory to consider.
# Returns
The space of enabled events.
# _get_goals Goals
_get_goals(
self
) -> StrDict[Space[D.T_observation]]
Get the (cached) domain goals space (finite or infinite set).
By default, Goals._get_goals() internally calls Goals._get_goals_() the first time and automatically caches its value to make future calls more efficient (since the goals space is assumed to be constant).
WARNING
Goal states are assumed to be fully observable (i.e. observation = state) so that there is never uncertainty about whether the goal has been reached or not. This assumption guarantees that any policy that does not reach the goal with certainty incurs in infinite expected cost. - Geffner, 2013: A Concise Introduction to Models and Methods for Automated Planning
# Returns
The goals space.
# _get_goals_ Goals
_get_goals_(
self
) -> StrDict[Space[D.T_observation]]
Get the domain goals space (finite or infinite set).
This is a helper function called by default from Goals._get_goals(), the difference being that the result is not cached here.
TIP
The underscore at the end of this function's name is a convention serving as a reminder that its result should be constant.
# Returns
The goals space.
# _get_initial_state DeterministicInitialized
_get_initial_state(
self
) -> D.T_state
Get the (cached) initial state.
By default, DeterministicInitialized._get_initial_state() internally calls DeterministicInitialized._get_initial_state_() the first time and automatically caches its value to make future calls more efficient (since the initial state is assumed to be constant).
# Returns
The initial state.
# _get_initial_state_ DeterministicInitialized
_get_initial_state_(
self
) -> D.T_state
Get the initial state.
This is a helper function called by default from DeterministicInitialized._get_initial_state(), the difference being that the result is not cached here.
# Returns
The initial state.
# _get_initial_state_distribution UncertainInitialized
_get_initial_state_distribution(
self
) -> Distribution[D.T_state]
Get the (cached) probability distribution of initial states.
By default, UncertainInitialized._get_initial_state_distribution() internally calls UncertainInitialized._get_initial_state_distribution_() the first time and automatically caches its value to make future calls more efficient (since the initial state distribution is assumed to be constant).
# Returns
The probability distribution of initial states.
# _get_initial_state_distribution_ UncertainInitialized
_get_initial_state_distribution_(
self
) -> Distribution[D.T_state]
Get the probability distribution of initial states.
This is a helper function called by default from UncertainInitialized._get_initial_state_distribution(), the difference being that the result is not cached here.
TIP
The underscore at the end of this function's name is a convention serving as a reminder that its result should be constant.
# Returns
The probability distribution of initial states.
# _get_memory_maxlen History
_get_memory_maxlen(
self
) -> int
Get the (cached) memory max length.
By default, FiniteHistory._get_memory_maxlen() internally calls FiniteHistory._get_memory_maxlen_() the first time and automatically caches its value to make future calls more efficient (since the memory max length is assumed to be constant).
# Returns
The memory max length.
# _get_memory_maxlen_ FiniteHistory
_get_memory_maxlen_(
self
) -> int
Get the memory max length.
This is a helper function called by default from FiniteHistory._get_memory_maxlen(), the difference being that the result is not cached here.
TIP
The underscore at the end of this function's name is a convention serving as a reminder that its result should be constant.
# Returns
The memory max length.
# _get_next_state_distribution UncertainTransitions
_get_next_state_distribution(
self,
memory: Memory[D.T_state],
action: StrDict[list[D.T_event]]
) -> DiscreteDistribution[D.T_state]
Get the discrete probability distribution of next state given a memory and action.
TIP
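In the Markovian case (memory only holds last state $s$), given an action $a$, this function can be mathematically represented by $P(S' \mid s, a)$, where $S'$ is the next state random variable.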
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The discrete probability distribution of next state.
# _get_observation TransformedObservable
_get_observation(
self,
state: D.T_state,
action: Optional[StrDict[list[D.T_event]]] = None
) -> StrDict[D.T_observation]
Get the deterministic observation given a state and action.
# Parameters
- state: The state to be observed.
- action: The last applied action (or None if the state is an initial state).
# Returns
The observation.
# _get_observation_distribution PartiallyObservable
_get_observation_distribution(
self,
state: D.T_state,
action: Optional[StrDict[list[D.T_event]]] = None
) -> Distribution[StrDict[D.T_observation]]
Get the probability distribution of the observation given a state and action.
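In mathematical terms (discrete case), given an action $a$, this function represents $P(O \mid s, a)$, where $O$ is the random variable of the observation.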
# Parameters
- state: The state to be observed.
- action: The last applied action (or None if the state is an initial state).
# Returns
The probability distribution of the observation.
# _get_observation_space PartiallyObservable
_get_observation_space(
self
) -> StrDict[Space[D.T_observation]]
Get the (cached) observation space (finite or infinite set).
By default, PartiallyObservable._get_observation_space() internally calls PartiallyObservable._get_observation_space_() the first time and automatically caches its value to make future calls more efficient (since the observation space is assumed to be constant).
# Returns
The observation space.
# _get_observation_space_ PartiallyObservable
_get_observation_space_(
self
) -> StrDict[Space[D.T_observation]]
Get the observation space (finite or infinite set).
This is a helper function called by default from PartiallyObservable._get_observation_space(), the difference being that the result is not cached here.
TIP
The underscore at the end of this function's name is a convention serving as a reminder that its result should be constant.
# Returns
The observation space.
# _get_transition_value UncertainTransitions
_get_transition_value(
self,
memory: Memory[D.T_state],
action: StrDict[list[D.T_event]],
next_state: Optional[D.T_state] = None
) -> StrDict[Value[D.T_value]]
Get the value (reward or cost) of a transition.
The transition to consider is defined by the function parameters.
TIP
If this function never depends on the next_state parameter for its computation, it is recommended to indicate it by overriding UncertainTransitions._is_transition_value_dependent_on_next_state_() to return False. This information can then be exploited by solvers to avoid computing next state to evaluate a transition value (more efficient).
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
- next_state: The next state in which the transition ends (if needed for the computation).
# Returns
The transition value (reward or cost).
# _init_memory History
_init_memory(
self,
state: Optional[D.T_state] = None
) -> Memory[D.T_state]
Initialize memory (possibly with a state) according to its specification and return it.
This function is automatically called by Initializable._reset() to reinitialize the internal memory whenever the domain is used as an environment.
# Parameters
- state: An optional state to initialize the memory with (typically the initial state).
# Returns
The new initialized memory.
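How a memory specification with a maximum length might behave can be sketched with a bounded deque from the standard library. The init_memory helper is hypothetical, and modelling memory as a deque is an assumption of this example: a max length of 1 mimics the Markovian case, a larger one a finite history.

```python
from collections import deque


def init_memory(maxlen: int, state=None) -> deque:
    # Start empty, or with the given state as the only element.
    content = () if state is None else (state,)
    return deque(content, maxlen=maxlen)


# Markovian-like memory: only the last state is kept.
markovian = init_memory(1, "s0")
markovian.append("s1")

# Finite history of length 3: the oldest state is dropped when full.
history = init_memory(3, "s0")
for s in ["s1", "s2", "s3"]:
    history.append(s)

empty = init_memory(3)
```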
# _is_action Events
_is_action(
self,
event: D.T_event
) -> bool
Indicate whether an event is an action (i.e. a controllable event for the agents).
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the domain action space provided by Events._get_action_space(), but it can be overridden for faster implementations.
# Parameters
- event: The event to consider.
# Returns
True if the event is an action (False otherwise).
# _is_applicable_action Events
_is_applicable_action(
self,
action: StrDict[D.T_event],
memory: Optional[Memory[D.T_state]] = None
) -> bool
Indicate whether an action is applicable in the given memory (state or history), or in the internal one if omitted.
By default, Events._is_applicable_action() provides some boilerplate code and internally calls Events._is_applicable_action_from(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- action: The action to consider.
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
True if the action is applicable (False otherwise).
# _is_applicable_action_from Events
_is_applicable_action_from(
self,
action: StrDict[D.T_event],
memory: Memory[D.T_state]
) -> bool
Indicate whether an action is applicable in the given memory (state or history).
This is a helper function called by default from Events._is_applicable_action(), the difference being that the memory parameter is mandatory here.
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the space of applicable actions provided by Events._get_applicable_actions_from(), but it can be overridden for faster implementations.
# Parameters
- action: The action to consider.
- memory: The memory to consider.
# Returns
True if the action is applicable (False otherwise).
# _is_enabled_event Events
_is_enabled_event(
self,
event: D.T_event,
memory: Optional[Memory[D.T_state]] = None
) -> bool
Indicate whether an uncontrollable event is enabled in the given memory (state or history), or in the internal one if omitted.
By default, Events._is_enabled_event() provides some boilerplate code and internally calls Events._is_enabled_event_from(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- event: The event to consider.
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
True if the event is enabled (False otherwise).
# _is_enabled_event_from Events
_is_enabled_event_from(
self,
event: D.T_event,
memory: Memory[D.T_state]
) -> bool
Indicate whether an event is enabled in the given memory (state or history).
This is a helper function called by default from Events._is_enabled_event(), the difference being that the memory parameter is mandatory here.
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the space of enabled events provided by Events._get_enabled_events_from(), but it can be overridden for faster implementations.
# Parameters
- event: The event to consider.
- memory: The memory to consider.
# Returns
True if the event is enabled (False otherwise).
# _is_goal Goals
_is_goal(
self,
observation: StrDict[D.T_observation]
) -> StrDict[D.T_predicate]
Indicate whether an observation belongs to the goals.
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the domain goals space provided by Goals._get_goals(), but it can be overridden for faster implementations.
# Parameters
- observation: The observation to consider.
# Returns
True if the observation is a goal (False otherwise).
# _is_observation PartiallyObservable
_is_observation(
self,
observation: StrDict[D.T_observation]
) -> bool
Check that an observation indeed belongs to the domain observation space.
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the domain observation space provided by PartiallyObservable._get_observation_space(), but it can be overridden for faster implementations.
# Parameters
- observation: The observation to consider.
# Returns
True if the observation belongs to the domain observation space (False otherwise).
# _is_positive PositiveCosts
_is_positive(
self,
cost: D.T_value
) -> bool
Determine if a value is positive (can be overridden for advanced value types).
# Parameters
- cost: The cost to evaluate.
# Returns
True if the cost is positive (False otherwise).
# _is_terminal UncertainTransitions
_is_terminal(
self,
state: D.T_state
) -> StrDict[D.T_predicate]
Indicate whether a state is terminal.
A terminal state is a state with no outgoing transition (except to itself with value 0).
# Parameters
- state: The state to consider.
# Returns
True if the state is terminal (False otherwise).
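As a rough illustration of the definition above, the check below treats a state as terminal when all of its outgoing transitions are zero-value self-loops (which includes having no transition at all). The `TRANSITIONS` table and the `is_terminal` helper are hypothetical and only meant to make the definition concrete.

```python
# Hypothetical transition table: state -> list of (next_state, value) pairs.
TRANSITIONS = {
    "start": [("middle", 1.0)],
    "middle": [("end", 2.0), ("start", 1.0)],
    "end": [("end", 0.0)],  # only a zero-value self-loop
    "sink": [],             # no outgoing transition at all
}


def is_terminal(state, transitions=TRANSITIONS):
    """Return True if every outgoing transition is a zero-value self-loop."""
    return all(nxt == state and value == 0 for nxt, value in transitions[state])
```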
# _is_transition_value_dependent_on_next_state UncertainTransitions
_is_transition_value_dependent_on_next_state(
self
) -> bool
Indicate whether _get_transition_value() requires the next_state parameter for its computation (cached).
By default, UncertainTransitions._is_transition_value_dependent_on_next_state() internally calls UncertainTransitions._is_transition_value_dependent_on_next_state_() the first time and automatically caches its value to make future calls more efficient (since the returned value is assumed to be constant).
# Returns
True if the transition value computation depends on next_state (False otherwise).
# _is_transition_value_dependent_on_next_state_ UncertainTransitions
_is_transition_value_dependent_on_next_state_(
self
) -> bool
Indicate whether _get_transition_value() requires the next_state parameter for its computation.
This is a helper function called by default from UncertainTransitions._is_transition_value_dependent_on_next_state(), the difference being that the result is not cached here.
TIP
The underscore at the end of this function's name is a convention to remind you that its result should be constant.
# Returns
True if the transition value computation depends on next_state (False otherwise).
# _reset Initializable
_reset(
self
) -> StrDict[D.T_observation]
Reset the state of the environment and return an initial observation.
By default, Initializable._reset() provides some boilerplate code and internally calls Initializable._state_reset() (which returns an initial state). The boilerplate code automatically stores the initial state into the _memory attribute and samples a corresponding observation.
# Returns
An initial observation.
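The split between the state-level helper and the observation-level boilerplate might look roughly like the sketch below. `CoinDomain` and its `_observe()` method are hypothetical, and the internal memory is assumed here to behave like a bounded deque holding the last state; the actual library code may differ.

```python
import random
from collections import deque


class CoinDomain:
    """Hypothetical domain mimicking the _reset() / _state_reset() split."""

    def _state_reset(self):
        # State level: choose and return an initial state.
        return random.choice(["heads", "tails"])

    def _observe(self, state):
        # Hypothetical observation function (fully observable here).
        return state

    def _reset(self):
        # Boilerplate: get the initial state, store it as memory,
        # then derive the corresponding observation.
        initial_state = self._state_reset()
        self._memory = deque([initial_state], maxlen=1)
        return self._observe(initial_state)
```

A domain author would then only need to write `_state_reset()`, leaving memory handling to the boilerplate.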
# _sample Simulation
_sample(
self,
memory: Memory[D.T_state],
action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
Sample one transition of the simulator's dynamics.
By default, Simulation._sample() provides some boilerplate code and internally calls Simulation._state_sample() (which returns a transition outcome). The boilerplate code automatically samples an observation corresponding to the sampled next state.
TIP
Whenever an existing simulator needs to be wrapped instead of being fully implemented in scikit-decide, it is recommended to override Simulation._sample() to call the external simulator directly rather than use the Simulation._state_sample() helper function.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The environment outcome of the sampled transition.
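The wrapping advice in the TIP could be sketched as follows. Everything here is hypothetical: `ExternalSimulator` stands for some third-party simulator with its own API, and `EnvironmentOutcome` is a simplified named tuple rather than the library's class. The point is only that `_sample()` is overridden directly and `_state_sample()` is never involved.

```python
from collections import namedtuple

# Simplified stand-in for the library's outcome container.
EnvironmentOutcome = namedtuple(
    "EnvironmentOutcome", "observation value termination info"
)


class ExternalSimulator:
    """Hypothetical third-party simulator with its own API."""

    def simulate(self, position, move):
        new_position = position + move
        return new_position, -abs(move), new_position >= 10


class WrappedSimulation:
    """Hypothetical domain that delegates sampling to the external simulator."""

    def __init__(self, simulator):
        self._simulator = simulator

    def _sample(self, memory, action):
        # Overridden directly: the external simulator already returns
        # an observation, so no state-level helper is needed.
        observation, reward, done = self._simulator.simulate(memory, action)
        return EnvironmentOutcome(observation, reward, done, None)
```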
# _set_memory Simulation
_set_memory(
self,
memory: Memory[D.T_state]
) -> None
Set the internal memory attribute _memory to the given one.
This can be useful to set a specific "starting point" before doing a rollout with successive Environment._step() calls.
# Parameters
- memory: The memory to set internally.
# Example
# Set simulation_domain memory to my_state (assuming Markovian domain)
simulation_domain._set_memory(my_state)
# Start a 100-step rollout from here (applying my_action at every step)
for _ in range(100):
    simulation_domain._step(my_action)
# _state_reset Initializable
_state_reset(
self
) -> D.T_state
Reset the state of the environment and return an initial state.
This is a helper function called by default from Initializable._reset(). It focuses on the state level, as opposed to the observation one for the latter.
# Returns
An initial state.
# _state_sample Simulation
_state_sample(
self,
memory: Memory[D.T_state],
action: StrDict[list[D.T_event]]
) -> TransitionOutcome[D.T_state, StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
Compute one sample of the transition's dynamics.
This is a helper function called by default from Simulation._sample(). It focuses on the state level, as opposed to the observation one for the latter.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The transition outcome of the sampled transition.
# _state_step Environment
_state_step(
self,
action: StrDict[list[D.T_event]]
) -> TransitionOutcome[D.T_state, StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
Compute one step of the transition's dynamics.
This is a helper function called by default from Environment._step(). It focuses on the state level, as opposed to the observation one for the latter.
# Parameters
- action: The action taken in the current memory (state or history) triggering the transition.
# Returns
The transition outcome of this step.
# _step Environment
_step(
self,
action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
Run one step of the environment's dynamics.
By default, Environment._step() provides some boilerplate code and internally calls Environment._state_step() (which returns a transition outcome). The boilerplate code automatically stores the next state into the _memory attribute and samples a corresponding observation.
TIP
Whenever an existing environment needs to be wrapped instead of being fully implemented in scikit-decide (e.g. compiled ATARI games), it is recommended to override Environment._step() to call the external environment directly rather than use the Environment._state_step() helper function.
WARNING
Before calling Environment._step() the first time or when the end of an episode is reached, Initializable._reset() must be called to reset the environment's state.
# Parameters
- action: The action taken in the current memory (state or history) triggering the transition.
# Returns
The environment outcome of this step.
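A minimal sketch of the step boilerplate, under stated assumptions: `CounterEnv` is a hypothetical environment, the outcome containers are simplified named tuples, and memory is modeled as a length-1 deque (Markovian case). It also shows why a reset is needed before the first step, since `_state_step()` reads the current state from `_memory`.

```python
from collections import deque, namedtuple

# Simplified stand-ins for the library's outcome containers.
TransitionOutcome = namedtuple("TransitionOutcome", "state value termination info")
EnvironmentOutcome = namedtuple(
    "EnvironmentOutcome", "observation value termination info"
)


class CounterEnv:
    """Hypothetical environment counting up to 3."""

    def _reset(self):
        self._memory = deque([0], maxlen=1)
        return 0

    def _state_step(self, action):
        # State level: compute the next state from the current memory.
        next_state = self._memory[-1] + action
        return TransitionOutcome(
            next_state, value=-1.0, termination=next_state >= 3, info=None
        )

    def _step(self, action):
        # Boilerplate: store the next state, then build the observation.
        outcome = self._state_step(action)
        self._memory.append(outcome.state)
        observation = outcome.state  # fully observable in this sketch
        return EnvironmentOutcome(
            observation, outcome.value, outcome.termination, outcome.info
        )
```

Calling `_step()` on a fresh `CounterEnv` without `_reset()` would fail because `_memory` does not exist yet, which mirrors the WARNING above.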
# GoalPOMDPDomain
This is a typical Goal Partially Observable Markov Decision Process domain class.
This helper class can be used as an alternate base class for domains, inheriting the following:
- Domain
- SingleAgent
- Sequential
- EnumerableTransitions
- Actions
- Goals
- UncertainInitialized
- Markovian
- PartiallyObservable
- PositiveCosts
Typical use:
class D(GoalPOMDPDomain)
TIP
It is also possible to refine any alternate base class, like for instance:
class D(RLDomain, FullyObservable)
# check_value Rewards
check_value(
self,
value: Value[D.T_value]
) -> bool
Check that a value is compliant with its reward specification.
TIP
This function always returns True by default because any kind of reward should be accepted at this level.
# Parameters
- value: The value to check.
# Returns
True if the value is compliant (False otherwise).
# get_action_mask Events
get_action_mask(
self,
memory: Optional[Memory[D.T_state]] = None
) -> StrDict[Mask]
Get action mask for the given memory or internal one if omitted.
An action mask is another (more specific) format for applicable actions, which is meaningful only if the action space can be iterated over in some way. It is represented by a flat array of 0's and 1's ordered as the actions are when enumerated: 1 for an applicable action and 0 for a non-applicable one.
More precisely, this implementation assumes that each agent's action space is an EnumerableSpace and internally calls self.get_applicable_actions().
The action mask is used, for instance, by RL solvers to mask out the logits associated with non-applicable actions in the output of their internal neural network.
# Parameters
- memory: The memory to consider. If None, works on the internal memory of the domain.
# Returns
A numpy array (or a dict mapping each agent to a numpy array for multi-agent domains) of 0's and 1's indicating the applicability of each action (1 meaning applicable, 0 not applicable).
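The mask layout described above can be illustrated with a small dependency-free sketch. Plain Python lists stand in for numpy arrays, and `action_mask`, `ALL_ACTIONS` and the logits are hypothetical values chosen for the example (single-agent case).

```python
import math


def action_mask(all_actions, applicable_actions):
    """Return a flat 0/1 list aligned with the enumeration order of all_actions."""
    applicable = set(applicable_actions)
    return [1 if action in applicable else 0 for action in all_actions]


ALL_ACTIONS = ["up", "down", "left", "right"]
mask = action_mask(ALL_ACTIONS, ["up", "right"])

# Typical use: neutralize the logits of non-applicable actions.
logits = [0.2, 1.5, -0.3, 0.7]
masked = [logit if m else -math.inf for logit, m in zip(logits, mask)]
best_action = ALL_ACTIONS[masked.index(max(masked))]
```

Here "down" has the largest raw logit but is not applicable, so the masked argmax falls back to "right".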
# get_action_space Events
get_action_space(
self
) -> StrDict[Space[D.T_event]]
Get the (cached) domain action space (finite or infinite set).
By default, Events.get_action_space() internally calls Events._get_action_space_() the first time and automatically caches its value to make future calls more efficient (since the action space is assumed to be constant).
# Returns
The action space.
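The caching convention (a trailing-underscore function computed once, then reused) might be sketched like this. `CachedSpaceDomain`, its `calls` counter and the `_action_space` attribute name are assumptions made for the illustration, not the library's implementation.

```python
class CachedSpaceDomain:
    """Hypothetical domain mimicking the cached / uncached naming convention."""

    def __init__(self):
        self.calls = 0  # counts how often the uncached function runs

    def _get_action_space_(self):
        # Uncached: expected to return a constant result.
        self.calls += 1
        return ("left", "right")

    def _get_action_space(self):
        # Cached: compute on first access, then reuse the stored value.
        try:
            return self._action_space
        except AttributeError:
            self._action_space = self._get_action_space_()
            return self._action_space

    def get_action_space(self):
        return self._get_action_space()
```

This is also why the uncached function should return a constant: later changes would never be seen by callers of the cached one.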
# get_agents MultiAgent
get_agents(
self
) -> set[str]
Return a singleton for single agent domains.
This must be consistent with skdecide.core.autocast(), which transforms a single-agent domain into a multi-agent domain whose only agent has the id "agent".
# get_applicable_actions Events
get_applicable_actions(
self,
memory: Optional[Memory[D.T_state]] = None
) -> StrDict[Space[D.T_event]]
Get the space (finite or infinite set) of applicable actions in the given memory (state or history), or in the internal one if omitted.
By default, Events.get_applicable_actions() provides some boilerplate code and internally calls Events._get_applicable_actions(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
The space of applicable actions.
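The "use the internal memory when none is given" pattern can be sketched as below. `MemoryDefaultDomain` and its applicability rule are hypothetical, and a bare integer plays the role of the memory to keep the example short.

```python
class MemoryDefaultDomain:
    """Hypothetical domain showing the memory-defaulting boilerplate."""

    def __init__(self, current_state):
        self._memory = current_state

    def _get_applicable_actions_from(self, memory):
        # Hypothetical rule: moving left is impossible at position 0.
        return ["right"] if memory == 0 else ["left", "right"]

    def _get_applicable_actions(self, memory=None):
        # Boilerplate: fall back to the internal memory when omitted.
        if memory is None:
            memory = self._memory
        return self._get_applicable_actions_from(memory)

    def get_applicable_actions(self, memory=None):
        return self._get_applicable_actions(memory)
```

Note the explicit `is None` test: a falsy memory such as `0` must not trigger the fallback.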
# get_enabled_events Events
get_enabled_events(
self,
memory: Optional[Memory[D.T_state]] = None
) -> Space[D.T_event]
Get the space (finite or infinite set) of enabled uncontrollable events in the given memory (state or history), or in the internal one if omitted.
By default, Events.get_enabled_events() provides some boilerplate code and internally calls Events._get_enabled_events(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
The space of enabled events.
# get_goals Goals
get_goals(
self
) -> StrDict[Space[D.T_observation]]
Get the (cached) domain goals space (finite or infinite set).
By default, Goals.get_goals() internally calls Goals._get_goals_() the first time and automatically caches its value to make future calls more efficient (since the goals space is assumed to be constant).
WARNING
Goal states are assumed to be fully observable (i.e. observation = state) so that there is never uncertainty about whether the goal has been reached or not. This assumption guarantees that any policy that does not reach the goal with certainty incurs infinite expected cost. - Geffner, 2013: A Concise Introduction to Models and Methods for Automated Planning
# Returns
The goals space.
# get_initial_state_distribution UncertainInitialized
get_initial_state_distribution(
self
) -> Distribution[D.T_state]
Get the (cached) probability distribution of initial states.
By default, UncertainInitialized.get_initial_state_distribution() internally calls UncertainInitialized._get_initial_state_distribution_() the first time and automatically caches its value to make future calls more efficient (since the initial state distribution is assumed to be constant).
# Returns
The probability distribution of initial states.
# get_next_state_distribution UncertainTransitions
get_next_state_distribution(
self,
memory: Memory[D.T_state],
action: StrDict[list[D.T_event]]
) -> DiscreteDistribution[D.T_state]
Get the discrete probability distribution of next state given a memory and action.
TIP
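In the Markovian case (memory only holds the last state $s$), given an action $a$, this function can be mathematically represented by $P(S' \mid s, a)$, where $S'$ is the next state random variable.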
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The discrete probability distribution of next state.
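To make the notion of a discrete next-state distribution concrete, the sketch below represents it as a list of (state, probability) pairs for a noisy one-dimensional walk. The function names mirror the documentation for readability only; the 0.8/0.2 dynamics and the list-of-pairs representation are assumptions, not the library's `DiscreteDistribution` class.

```python
import random


def get_next_state_distribution(state, action):
    """Hypothetical noisy 1-D walk: the move succeeds with probability 0.8."""
    intended = state + (1 if action == "right" else -1)
    return [(intended, 0.8), (state, 0.2)]


def sample(distribution, rng=random):
    """Draw one next state according to the listed probabilities."""
    states, probabilities = zip(*distribution)
    return rng.choices(states, weights=probabilities, k=1)[0]


distribution = get_next_state_distribution(2, "right")
next_state = sample(distribution)
```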
# get_observation_distribution PartiallyObservable
get_observation_distribution(
self,
state: D.T_state,
action: Optional[StrDict[list[D.T_event]]] = None
) -> Distribution[StrDict[D.T_observation]]
Get the probability distribution of the observation given a state and action.
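In mathematical terms (discrete case), given an action $a$, this function represents $P(O \mid s, a)$, where $O$ is the random variable of the observation.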
# Parameters
- state: The state to be observed.
- action: The last applied action (or None if the state is an initial state).
# Returns
The probability distribution of the observation.
# get_observation_space PartiallyObservable
get_observation_space(
self
) -> StrDict[Space[D.T_observation]]
Get the (cached) observation space (finite or infinite set).
By default, PartiallyObservable.get_observation_space() internally calls PartiallyObservable._get_observation_space_() the first time and automatically caches its value to make future calls more efficient (since the observation space is assumed to be constant).
# Returns
The observation space.
# get_transition_value UncertainTransitions
get_transition_value(
self,
memory: Memory[D.T_state],
action: StrDict[list[D.T_event]],
next_state: Optional[D.T_state] = None
) -> StrDict[Value[D.T_value]]
Get the value (reward or cost) of a transition.
The transition to consider is defined by the function parameters.
TIP
If this function never depends on the next_state parameter for its computation, it is recommended to indicate it by overriding UncertainTransitions._is_transition_value_dependent_on_next_state_() to return False. This information can then be exploited by solvers to avoid computing the next state when evaluating a transition value (more efficient).
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
- next_state: The next state in which the transition ends (if needed for the computation).
# Returns
The transition value (reward or cost).
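The efficiency gain mentioned in the TIP can be sketched from a solver's point of view. The `expected_value` helper and the two toy domains are hypothetical; values are plain floats and distributions are lists of (state, probability) pairs to keep the example self-contained.

```python
def expected_value(domain, state, action):
    """Expected transition value, skipping next states when they are irrelevant."""
    if not domain.is_transition_value_dependent_on_next_state():
        # Shortcut: no need to enumerate the next-state distribution.
        return domain.get_transition_value(state, action)
    return sum(
        probability * domain.get_transition_value(state, action, next_state)
        for next_state, probability in domain.get_next_state_distribution(state, action)
    )


class UnitCostDomain:
    """Hypothetical domain whose value ignores the next state."""

    def is_transition_value_dependent_on_next_state(self):
        return False

    def get_transition_value(self, state, action, next_state=None):
        return 1.0

    def get_next_state_distribution(self, state, action):
        raise AssertionError("not needed when the value ignores next_state")


class LandingCostDomain:
    """Hypothetical domain whose value equals the reached state."""

    def is_transition_value_dependent_on_next_state(self):
        return True

    def get_transition_value(self, state, action, next_state=None):
        return float(next_state)

    def get_next_state_distribution(self, state, action):
        return [(1, 0.5), (2, 0.5)]
```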
# is_action Events
is_action(
self,
event: D.T_event
) -> bool
Indicate whether an event is an action (i.e. a controllable event for the agents).
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the domain action space provided by Events.get_action_space(), but it can be overridden for faster implementations.
# Parameters
- event: The event to consider.
# Returns
True if the event is an action (False otherwise).
# is_applicable_action Events
is_applicable_action(
self,
action: StrDict[D.T_event],
memory: Optional[Memory[D.T_state]] = None
) -> bool
Indicate whether an action is applicable in the given memory (state or history), or in the internal one if omitted.
By default, Events.is_applicable_action() provides some boilerplate code and internally calls Events._is_applicable_action(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- action: The action to consider.
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
True if the action is applicable (False otherwise).
# is_enabled_event Events
is_enabled_event(
self,
event: D.T_event,
memory: Optional[Memory[D.T_state]] = None
) -> bool
Indicate whether an uncontrollable event is enabled in the given memory (state or history), or in the internal one if omitted.
By default, Events.is_enabled_event() provides some boilerplate code and internally calls Events._is_enabled_event(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- event: The event to consider.
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
True if the event is enabled (False otherwise).
# is_goal Goals
is_goal(
self,
observation: StrDict[D.T_observation]
) -> StrDict[D.T_predicate]
Indicate whether an observation belongs to the goals.
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the domain goals space provided by Goals.get_goals(), but it can be overridden for faster implementations.
# Parameters
- observation: The observation to consider.
# Returns
True if the observation is a goal (False otherwise).
# is_observation PartiallyObservable
is_observation(
self,
observation: StrDict[D.T_observation]
) -> bool
Check that an observation indeed belongs to the domain observation space.
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the domain observation space provided by PartiallyObservable.get_observation_space(), but it can be overridden for faster implementations.
# Parameters
- observation: The observation to consider.
# Returns
True if the observation belongs to the domain observation space (False otherwise).
# is_terminal UncertainTransitions
is_terminal(
self,
state: D.T_state
) -> StrDict[D.T_predicate]
Indicate whether a state is terminal.
A terminal state is a state with no outgoing transition (except to itself with value 0).
# Parameters
- state: The state to consider.
# Returns
True if the state is terminal (False otherwise).
# is_transition_value_dependent_on_next_state UncertainTransitions
is_transition_value_dependent_on_next_state(
self
) -> bool
Indicate whether get_transition_value() requires the next_state parameter for its computation (cached).
By default, UncertainTransitions.is_transition_value_dependent_on_next_state() internally calls UncertainTransitions._is_transition_value_dependent_on_next_state_() the first time and automatically caches its value to make future calls more efficient (since the returned value is assumed to be constant).
# Returns
True if the transition value computation depends on next_state (False otherwise).
# reset Initializable
reset(
self
) -> StrDict[D.T_observation]
Reset the state of the environment and return an initial observation.
By default, Initializable.reset() provides some boilerplate code and internally calls Initializable._reset() (which returns an initial observation). The boilerplate code automatically stores the initial state into the _memory attribute and samples a corresponding observation.
# Returns
An initial observation.
# sample Simulation
sample(
self,
memory: Memory[D.T_state],
action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
Sample one transition of the simulator's dynamics.
By default, Simulation.sample() provides some boilerplate code and internally calls Simulation._sample() (which returns an environment outcome). The boilerplate code automatically samples an observation corresponding to the sampled next state.
TIP
Whenever an existing simulator needs to be wrapped instead of being fully implemented in scikit-decide, it is recommended to override Simulation.sample() to call the external simulator directly rather than use the Simulation._sample() helper function.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The environment outcome of the sampled transition.
# set_memory Simulation
set_memory(
self,
memory: Memory[D.T_state]
) -> None
Set the internal memory attribute _memory to the given one.
This can be useful to set a specific "starting point" before doing a rollout with successive Environment.step() calls.
# Parameters
- memory: The memory to set internally.
# Example
# Set simulation_domain memory to my_state (assuming Markovian domain)
simulation_domain.set_memory(my_state)
# Start a 100-step rollout from here (applying my_action at every step)
for _ in range(100):
    simulation_domain.step(my_action)
# step Environment
step(
self,
action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
Run one step of the environment's dynamics.
By default, Environment.step() provides some boilerplate code and internally calls Environment._step() (which returns an environment outcome). The boilerplate code automatically stores the next state into the _memory attribute and samples a corresponding observation.
TIP
Whenever an existing environment needs to be wrapped instead of being fully implemented in scikit-decide (e.g. compiled ATARI games), it is recommended to override Environment.step() to call the external environment directly rather than use the Environment._step() helper function.
WARNING
Before calling Environment.step() the first time or when the end of an episode is reached, Initializable.reset() must be called to reset the environment's state.
# Parameters
- action: The action taken in the current memory (state or history) triggering the transition.
# Returns
The environment outcome of this step.
# _check_value Rewards
_check_value(
self,
value: Value[D.T_value]
) -> bool
Check that a value is compliant with its cost specification (must be positive).
TIP
This function calls PositiveCosts._is_positive() to determine if a value is positive (can be overridden for advanced value types).
# Parameters
- value: The value to check.
# Returns
True if the value is compliant (False otherwise).
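The delegation from the value check to the positivity test could look like the sketch below. The `Value` dataclass with a single `cost` field and the `PositiveCostsSketch` class are hypothetical simplifications, and treating zero as positive is an assumption of this example rather than something stated above.

```python
from dataclasses import dataclass


@dataclass
class Value:
    """Hypothetical value container holding only a cost."""

    cost: float


class PositiveCostsSketch:
    """Hypothetical class mimicking _check_value() / _is_positive()."""

    def _is_positive(self, cost):
        # Assumption: zero counts as positive (non-negative test).
        return cost >= 0

    def _check_value(self, value):
        # The compliance check delegates to the positivity test.
        return self._is_positive(value.cost)
```

A domain with a richer cost type (for example a vector of costs) would override only `_is_positive()`.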
# _get_action_mask Events
_get_action_mask(
self,
memory: Optional[Memory[D.T_state]] = None
) -> StrDict[Mask]
Get action mask for the given memory or internal one if omitted.
An action mask is another (more specific) format for applicable actions, which is meaningful only if the action space can be iterated over in some way. It is represented by a flat array of 0's and 1's ordered as the actions are when enumerated: 1 for an applicable action and 0 for a non-applicable one.
More precisely, this implementation assumes that each agent's action space is an EnumerableSpace and internally calls self.get_applicable_actions().
The action mask is used, for instance, by RL solvers to mask out the logits associated with non-applicable actions in the output of their internal neural network.
# Parameters
- memory: The memory to consider. If None, works on the internal memory of the domain.
# Returns
A numpy array (or a dict mapping each agent to a numpy array for multi-agent domains) of 0's and 1's indicating the applicability of each action (1 meaning applicable, 0 not applicable).
# _get_action_space Events
_get_action_space(
self
) -> StrDict[Space[D.T_event]]
Get the (cached) domain action space (finite or infinite set).
By default, Events._get_action_space() internally calls Events._get_action_space_() the first time and automatically caches its value to make future calls more efficient (since the action space is assumed to be constant).
# Returns
The action space.
# _get_action_space_ Events
_get_action_space_(
self
) -> StrDict[Space[D.T_event]]
Get the domain action space (finite or infinite set).
This is a helper function called by default from Events._get_action_space(), the difference being that the result is not cached here.
TIP
The underscore at the end of this function's name is a convention to remind you that its result should be constant.
# Returns
The action space.
# _get_applicable_actions Events
_get_applicable_actions(
self,
memory: Optional[Memory[D.T_state]] = None
) -> StrDict[Space[D.T_event]]
Get the space (finite or infinite set) of applicable actions in the given memory (state or history), or in the internal one if omitted.
By default, Events._get_applicable_actions() provides some boilerplate code and internally calls Events._get_applicable_actions_from(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
The space of applicable actions.
# _get_applicable_actions_from Events
_get_applicable_actions_from(
self,
memory: Memory[D.T_state]
) -> StrDict[Space[D.T_event]]
Get the space (finite or infinite set) of applicable actions in the given memory (state or history).
This is a helper function called by default from Events._get_applicable_actions(), the difference being that the memory parameter is mandatory here.
# Parameters
- memory: The memory to consider.
# Returns
The space of applicable actions.
# _get_enabled_events Events
_get_enabled_events(
self,
memory: Optional[Memory[D.T_state]] = None
) -> Space[D.T_event]
Get the space (finite or infinite set) of enabled uncontrollable events in the given memory (state or history), or in the internal one if omitted.
By default, Events._get_enabled_events() provides some boilerplate code and internally calls Events._get_enabled_events_from(). The boilerplate code automatically passes the _memory attribute instead of the memory parameter whenever the latter is None.
# Parameters
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
The space of enabled events.
# _get_enabled_events_from Events
_get_enabled_events_from(
self,
memory: Memory[D.T_state]
) -> Space[D.T_event]
Get the space (finite or infinite set) of enabled uncontrollable events in the given memory (state or history).
This is a helper function called by default from Events._get_enabled_events(), the difference being that the memory parameter is mandatory here.
# Parameters
- memory: The memory to consider.
# Returns
The space of enabled events.
# _get_goals Goals
_get_goals(
self
) -> StrDict[Space[D.T_observation]]
Get the (cached) domain goals space (finite or infinite set).
By default, Goals._get_goals() internally calls Goals._get_goals_() the first time and automatically caches its value to make future calls more efficient (since the goals space is assumed to be constant).
WARNING
Goal states are assumed to be fully observable (i.e. observation = state) so that there is never uncertainty about whether the goal has been reached or not. This assumption guarantees that any policy that does not reach the goal with certainty incurs infinite expected cost. - Geffner, 2013: A Concise Introduction to Models and Methods for Automated Planning
# Returns
The goals space.
# _get_goals_ Goals
_get_goals_(
self
) -> StrDict[Space[D.T_observation]]
Get the domain goals space (finite or infinite set).
This is a helper function called by default from Goals._get_goals(), the difference being that the result is not cached here.
TIP
The underscore at the end of this function's name is a convention to remind you that its result should be constant.
# Returns
The goals space.
# _get_initial_state_distribution UncertainInitialized
_get_initial_state_distribution(
self
) -> Distribution[D.T_state]
Get the (cached) probability distribution of initial states.
By default, UncertainInitialized._get_initial_state_distribution() internally calls UncertainInitialized._get_initial_state_distribution_() the first time and automatically caches its value to make future calls more efficient (since the initial state distribution is assumed to be constant).
# Returns
The probability distribution of initial states.
# _get_initial_state_distribution_ UncertainInitialized
_get_initial_state_distribution_(
self
) -> Distribution[D.T_state]
Get the probability distribution of initial states.
This is a helper function called by default from UncertainInitialized._get_initial_state_distribution(), the difference being that the result is not cached here.
TIP
The underscore at the end of this function's name is a convention to remind you that its result should be constant.
# Returns
The probability distribution of initial states.
# _get_memory_maxlen History
_get_memory_maxlen(
self
) -> int
Get the (cached) memory max length.
By default, FiniteHistory._get_memory_maxlen() internally calls FiniteHistory._get_memory_maxlen_() the first time and automatically caches its value to make future calls more efficient (since the memory max length is assumed to be constant).
# Returns
The memory max length.
# _get_memory_maxlen_ FiniteHistory
_get_memory_maxlen_(
self
) -> int
Get the memory max length.
This is a helper function called by default from FiniteHistory._get_memory_maxlen(), the difference being that the result is not cached here.
TIP
The underscore at the end of this function's name is a convention to remind you that its result should be constant.
# Returns
The memory max length.
# _get_next_state_distribution UncertainTransitions
_get_next_state_distribution(
self,
memory: Memory[D.T_state],
action: StrDict[list[D.T_event]]
) -> DiscreteDistribution[D.T_state]
Get the discrete probability distribution of next state given a memory and action.
TIP
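In the Markovian case (memory only holds the last state $s$), given an action $a$, this function can be mathematically represented by $P(S' \mid s, a)$, where $S'$ is the next state random variable.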
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The discrete probability distribution of next state.
# _get_observation_distribution PartiallyObservable
_get_observation_distribution(
self,
state: D.T_state,
action: Optional[StrDict[list[D.T_event]]] = None
) -> Distribution[StrDict[D.T_observation]]
Get the probability distribution of the observation given a state and action.
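In mathematical terms (discrete case), given an action $a$, this function represents $P(O \mid s, a)$, where $O$ is the random variable of the observation.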
# Parameters
- state: The state to be observed.
- action: The last applied action (or None if the state is an initial state).
# Returns
The probability distribution of the observation.
# _get_observation_space PartiallyObservable
_get_observation_space(
self
) -> StrDict[Space[D.T_observation]]
Get the (cached) observation space (finite or infinite set).
By default, PartiallyObservable._get_observation_space() internally calls PartiallyObservable._get_observation_space_() the first time and automatically caches its value to make future calls more efficient (since the observation space is assumed to be constant).
# Returns
The observation space.
# _get_observation_space_ PartiallyObservable
_get_observation_space_(
self
) -> StrDict[Space[D.T_observation]]
Get the observation space (finite or infinite set).
This is a helper function called by default from PartiallyObservable._get_observation_space(), the difference being that the result is not cached here.
TIP
The underscore at the end of this function's name is a convention to remind you that its result should be constant.
# Returns
The observation space.
# _get_transition_value UncertainTransitions
_get_transition_value(
self,
memory: Memory[D.T_state],
action: StrDict[list[D.T_event]],
next_state: Optional[D.T_state] = None
) -> StrDict[Value[D.T_value]]
Get the value (reward or cost) of a transition.
The transition to consider is defined by the function parameters.
TIP
If this function never depends on the next_state parameter for its computation, it is recommended to indicate it by overriding UncertainTransitions._is_transition_value_dependent_on_next_state_() to return False. This information can then be exploited by solvers to avoid computing the next state when evaluating a transition value (more efficient).
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
- next_state: The next state in which the transition ends (if needed for the computation).
# Returns
The transition value (reward or cost).
# _init_memory History
_init_memory(
self,
state: Optional[D.T_state] = None
) -> Memory[D.T_state]
Initialize memory (possibly with a state) according to its specification and return it.
This function is automatically called by Initializable._reset() to reinitialize the internal memory whenever the domain is used as an environment.
# Parameters
- state: An optional state to initialize the memory with (typically the initial state).
# Returns
The new initialized memory.
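One way to picture memory initialization is as a bounded queue whose maximum length depends on the memory characteristic (for instance, length 1 in the Markovian case). The `init_memory` helper below is a hypothetical sketch built on `collections.deque`; it is not the library's `Memory` class.

```python
from collections import deque


def init_memory(state=None, maxlen=1):
    """Return a fresh bounded memory, optionally seeded with a state."""
    content = [] if state is None else [state]
    return deque(content, maxlen=maxlen)


# A finite history of length 2 keeps only the two most recent states.
memory = init_memory("s0", maxlen=2)
memory.append("s1")
memory.append("s2")
```

After the two appends, the oldest state "s0" has been discarded automatically, which is the behavior a finite history relies on.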
# _is_action Events
_is_action(
self,
event: D.T_event
) -> bool
Indicate whether an event is an action (i.e. a controllable event for the agents).
TIP
By default, this function is implemented using the skdecide.core.Space.contains() function on the domain action space provided by Events._get_action_space(), but it can be overridden for faster implementations.
# Parameters
- event: The event to consider.
# Returns
True if the event is an action (False otherwise).
# _is_applicable_action Events
_is_applicable_action(
self,
action: StrDict[D.T_event],
memory: Optional[Memory[D.T_state]] = None
) -> bool
Indicate whether an action is applicable in the given memory (state or history), or in the internal one if omitted.
By default, Events._is_applicable_action()
provides some boilerplate code and internally
calls Events._is_applicable_action_from()
. The boilerplate code automatically passes the _memory
attribute
instead of the memory parameter whenever the latter is None.
# Parameters
- action: The action to consider.
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
True if the action is applicable (False otherwise).
# _is_applicable_action_from Events
_is_applicable_action_from(
self,
action: StrDict[D.T_event],
memory: Memory[D.T_state]
) -> bool
Indicate whether an action is applicable in the given memory (state or history).
This is a helper function called by default from Events._is_applicable_action()
, the difference being that the
memory parameter is mandatory here.
TIP
By default, this function is implemented using the skdecide.core.Space.contains()
function on the space of
applicable actions provided by Events._get_applicable_actions_from()
, but it can be overridden for faster
implementations.
# Parameters
- action: The action to consider.
- memory: The memory to consider.
# Returns
True if the action is applicable (False otherwise).
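The relationship between the two functions (optional memory versus mandatory memory) can be sketched as follows. `ToyEvents` is a hypothetical class that only borrows the method names from this page, and a plain Python set stands in for the space of applicable actions.

```python
from typing import Optional, Set


class ToyEvents:
    """Hypothetical stand-in; only the method names come from this page."""

    def __init__(self) -> None:
        self._memory = 0  # internal memory (here simply the current state)

    def _is_applicable_action(self, action: str, memory: Optional[int] = None) -> bool:
        # Boilerplate: fall back to the internal memory when none is given.
        if memory is None:
            memory = self._memory
        return self._is_applicable_action_from(action, memory)

    def _is_applicable_action_from(self, action: str, memory: int) -> bool:
        # Membership test on the set of applicable actions.
        return action in self._get_applicable_actions_from(memory)

    def _get_applicable_actions_from(self, memory: int) -> Set[str]:
        # Toy rule: "left" is not applicable in state 0.
        return {"right"} if memory == 0 else {"left", "right"}


domain = ToyEvents()
```

Overriding only the `_from` variant keeps the memory fallback logic in one place.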
# _is_enabled_event Events
_is_enabled_event(
self,
event: D.T_event,
memory: Optional[Memory[D.T_state]] = None
) -> bool
Indicate whether an uncontrollable event is enabled in the given memory (state or history), or in the internal one if omitted.
By default, Events._is_enabled_event()
provides some boilerplate code and internally
calls Events._is_enabled_event_from()
. The boilerplate code automatically passes the _memory
attribute instead
of the memory parameter whenever the latter is None.
# Parameters
- event: The event to consider.
- memory: The memory to consider (if None, the internal memory attribute _memory is used instead).
# Returns
True if the event is enabled (False otherwise).
# _is_enabled_event_from Events
_is_enabled_event_from(
self,
event: D.T_event,
memory: Memory[D.T_state]
) -> bool
Indicate whether an event is enabled in the given memory (state or history).
This is a helper function called by default from Events._is_enabled_event()
, the difference being that the
memory parameter is mandatory here.
TIP
By default, this function is implemented using the skdecide.core.Space.contains()
function on the space of
enabled events provided by Events._get_enabled_events_from()
, but it can be overridden for faster
implementations.
# Parameters
- event: The event to consider.
- memory: The memory to consider.
# Returns
True if the event is enabled (False otherwise).
# _is_goal Goals
_is_goal(
self,
observation: StrDict[D.T_observation]
) -> StrDict[D.T_predicate]
Indicate whether an observation belongs to the goals.
TIP
By default, this function is implemented using the skdecide.core.Space.contains()
function on the domain
goals space provided by Goals._get_goals()
, but it can be overridden for faster implementations.
# Parameters
- observation: The observation to consider.
# Returns
True if the observation is a goal (False otherwise).
# _is_observation PartiallyObservable
_is_observation(
self,
observation: StrDict[D.T_observation]
) -> bool
Check that an observation indeed belongs to the domain observation space.
TIP
By default, this function is implemented using the skdecide.core.Space.contains()
function on the domain
observation space provided by PartiallyObservable._get_observation_space()
, but it can be overridden for
faster implementations.
# Parameters
- observation: The observation to consider.
# Returns
True if the observation belongs to the domain observation space (False otherwise).
# _is_positive PositiveCosts
_is_positive(
self,
cost: D.T_value
) -> bool
Determine if a value is positive (can be overridden for advanced value types).
# Parameters
- cost: The cost to evaluate.
# Returns
True if the cost is positive (False otherwise).
# _is_terminal UncertainTransitions
_is_terminal(
self,
state: D.T_state
) -> StrDict[D.T_predicate]
Indicate whether a state is terminal.
A terminal state is a state with no outgoing transition (except to itself with value 0).
# Parameters
- state: The state to consider.
# Returns
True if the state is terminal (False otherwise).
# _is_transition_value_dependent_on_next_state UncertainTransitions
_is_transition_value_dependent_on_next_state(
self
) -> bool
Indicate whether _get_transition_value() requires the next_state parameter for its computation (cached).
By default, UncertainTransitions._is_transition_value_dependent_on_next_state()
internally
calls UncertainTransitions._is_transition_value_dependent_on_next_state_()
the first time and automatically
caches its value to make future calls more efficient (since the returned value is assumed to be constant).
# Returns
True if the transition value computation depends on next_state (False otherwise).
# _is_transition_value_dependent_on_next_state_ UncertainTransitions
_is_transition_value_dependent_on_next_state_(
self
) -> bool
Indicate whether _get_transition_value() requires the next_state parameter for its computation.
This is a helper function called by default
from UncertainTransitions._is_transition_value_dependent_on_next_state()
, the difference being that the result
is not cached here.
TIP
The underscore at the end of this function's name is a convention serving as a reminder that its result should be constant.
# Returns
True if the transition value computation depends on next_state (False otherwise).
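The cached/uncached pairing described for these two functions can be sketched with a small stand-alone class. `ToyCached`, its `calls` counter and the `_cached_flag` attribute are hypothetical; the library's actual caching mechanism may differ.

```python
class ToyCached:
    """Hypothetical stand-in illustrating the trailing-underscore convention."""

    def __init__(self) -> None:
        self.calls = 0  # counts how often the uncached helper runs

    def _is_transition_value_dependent_on_next_state(self) -> bool:
        # Cached wrapper: compute once, then reuse the stored value.
        try:
            return self._cached_flag
        except AttributeError:
            self._cached_flag = self._is_transition_value_dependent_on_next_state_()
            return self._cached_flag

    def _is_transition_value_dependent_on_next_state_(self) -> bool:
        # Uncached helper: expected to return a constant.
        self.calls += 1
        return True


domain = ToyCached()
results = [domain._is_transition_value_dependent_on_next_state() for _ in range(3)]
```

Because the first result is reused, a helper that returned varying values would silently be ignored after the first call, which is why its result should be constant.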
# _reset Initializable
_reset(
self
) -> StrDict[D.T_observation]
Reset the state of the environment and return an initial observation.
By default, Initializable._reset()
provides some boilerplate code and internally
calls Initializable._state_reset()
(which returns an initial state). The boilerplate code automatically stores
the initial state into the _memory
attribute and samples a corresponding observation.
# Returns
An initial observation.
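The division of labour between the observation-level and state-level functions might look like the sketch below. `ToyInitializable` is hypothetical, `_observe` is an invented helper name (not a documented method), and the list-based memory is a simplification.

```python
class ToyInitializable:
    """Hypothetical stand-in; the real boilerplate may differ in detail."""

    def _reset(self) -> str:
        state = self._state_reset()   # state-level helper
        self._memory = [state]        # store the initial state internally
        return self._observe(state)   # derive the matching observation

    def _state_reset(self) -> int:
        return 0  # toy initial state

    def _observe(self, state: int) -> str:
        # Hypothetical observation function (not a documented method name).
        return f"obs({state})"


domain = ToyInitializable()
observation = domain._reset()
```

In this pattern a domain author only has to supply the state-level part.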
# _sample Simulation
_sample(
self,
memory: Memory[D.T_state],
action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
Sample one transition of the simulator's dynamics.
By default, Simulation._sample()
provides some boilerplate code and internally
calls Simulation._state_sample()
(which returns a transition outcome). The boilerplate code automatically
samples an observation corresponding to the sampled next state.
TIP
Whenever an existing simulator needs to be wrapped instead of implemented fully in scikit-decide (e.g. a third-party simulator), it is recommended to override Simulation._sample() to call the external simulator and not use the Simulation._state_sample() helper function.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The environment outcome of the sampled transition.
# _set_memory Simulation
_set_memory(
self,
memory: Memory[D.T_state]
) -> None
Set internal memory attribute _memory
to given one.
This can be useful to set a specific "starting point" before doing a rollout with
successive Environment._step()
calls.
# Parameters
- memory: The memory to set internally.
# Example
# Set simulation_domain memory to my_state (assuming Markovian domain)
simulation_domain._set_memory(my_state)
# Start a 100-step rollout from here (applying my_action at every step)
for _ in range(100):
    simulation_domain._step(my_action)
# _state_reset Initializable
_state_reset(
self
) -> D.T_state
Reset the state of the environment and return an initial state.
This is a helper function called by default from Initializable._reset()
. It focuses on the state level, as
opposed to the observation one for the latter.
# Returns
An initial state.
# _state_sample Simulation
_state_sample(
self,
memory: Memory[D.T_state],
action: StrDict[list[D.T_event]]
) -> TransitionOutcome[D.T_state, StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
Compute one sample of the transition's dynamics.
This is a helper function called by default from Simulation._sample()
. It focuses on the state level, as
opposed to the observation one for the latter.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The transition outcome of the sampled transition.
# _state_step Environment
_state_step(
self,
action: StrDict[list[D.T_event]]
) -> TransitionOutcome[D.T_state, StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
Compute one step of the transition's dynamics.
This is a helper function called by default from Environment._step()
. It focuses on the state level, as opposed
to the observation one for the latter.
# Parameters
- action: The action taken in the current memory (state or history) triggering the transition.
# Returns
The transition outcome of this step.
# _step Environment
_step(
self,
action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
Run one step of the environment's dynamics.
By default, Environment._step()
provides some boilerplate code and internally
calls Environment._state_step()
(which returns a transition outcome). The boilerplate code automatically stores
next state into the _memory
attribute and samples a corresponding observation.
TIP
Whenever an existing environment needs to be wrapped instead of implemented fully in scikit-decide (e.g. compiled ATARI games), it is recommended to override Environment._step() to call the external environment and not use the Environment._state_step() helper function.
WARNING
Before calling Environment._step() for the first time, or once the end of an episode is reached, Initializable._reset() must be called to reset the environment's state.
# Parameters
- action: The action taken in the current memory (state or history) triggering the transition.
# Returns
The environment outcome of this step.
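The wrapping advice and the reset-before-step requirement can be sketched together. Both classes below are hypothetical: `ExternalGame` stands for any third-party environment, and the plain tuple returned by `_step` is a simplification of the EnvironmentOutcome type documented above.

```python
class ExternalGame:
    """Hypothetical third-party environment with its own reset/step API."""

    def reset(self) -> int:
        self.position = 0
        return self.position

    def step(self, move: int):
        self.position += move
        done = self.position >= 3
        return self.position, -1.0, done  # observation, reward, termination flag


class WrappedGame:
    """Hypothetical wrapper overriding _reset()/_step() directly."""

    def __init__(self) -> None:
        self._game = ExternalGame()

    def _reset(self) -> int:
        return self._game.reset()

    def _step(self, action: int):
        # Delegate to the external environment; _state_step() is not used.
        return self._game.step(action)


domain = WrappedGame()
observation = domain._reset()  # must happen before the first _step()
outcomes = []
done = False
while not done:
    observation, reward, done = domain._step(1)
    outcomes.append((observation, reward, done))
```

Calling `_step` before `_reset` in this toy would fail because the external game has no position yet, which mirrors the warning above.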
# DeterministicPlanningDomain
This is a typical deterministic planning domain class.
This helper class can be used as an alternate base class for domains, inheriting the following:
- Domain
- SingleAgent
- Sequential
- DeterministicTransitions
- Actions
- Goals
- DeterministicInitialized
- Markovian
- FullyObservable
- PositiveCosts
Typical use:
class D(DeterministicPlanningDomain)
TIP
It is also possible to refine any alternate base class, for instance:
class D(RLDomain, FullyObservable)
# check_value Rewards
check_value(
self,
value: Value[D.T_value]
) -> bool
Check that a value is compliant with its reward specification.
TIP
This function always returns True by default because any kind of reward should be accepted at this level.
# Parameters
- value: The value to check.
# Returns
True if the value is compliant (False otherwise).
# get_action_mask Events
get_action_mask(
self,
memory: Optional[Memory[D.T_state]] = None
) -> StrDict[Mask]
Get action mask for the given memory or internal one if omitted.
An action mask is another (more specific) format for applicable actions, which is meaningful only if the action space can be iterated over in some way. It is represented by a flat array of 0's and 1's ordered as the actions when enumerated: 1 for an applicable action, and 0 for a non-applicable action.
More precisely, this implementation assumes that each agent action space is an EnumerableSpace, and internally calls self.get_applicable_actions().
The action mask is used for instance by RL solvers to mask out the logits associated with non-applicable actions in the output of their internal neural network.
# Parameters
- memory: The memory to consider. If None, works on the internal memory of the domain.
# Returns
A NumPy array (or a dict mapping each agent to a NumPy array for multi-agent domains) of 0's and 1's indicating the applicability of each action (1 meaning applicable and 0 not applicable).
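To make the mask format concrete, here is a small sketch for a single agent. The `action_mask` helper is hypothetical, and a plain Python list is used instead of a NumPy array to keep the example dependency-free.

```python
from typing import List, Sequence, Set


def action_mask(all_actions: Sequence[str], applicable: Set[str]) -> List[int]:
    """Hypothetical helper: 1 where the enumerated action is applicable, else 0."""
    return [1 if action in applicable else 0 for action in all_actions]


actions = ["up", "down", "left", "right"]  # enumeration order of the action space
mask = action_mask(actions, applicable={"up", "right"})
```

The mask positions follow the enumeration order of the action space, so the same order must be used wherever the mask is consumed.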
# get_action_space Events
get_action_space(
self
) -> StrDict[Space[D.T_event]]
Get the (cached) domain action space (finite or infinite set).
By default, Events.get_action_space()
internally calls Events._get_action_space_()
the first time and
automatically caches its value to make future calls more efficient (since the action space is assumed to be
constant).
# Returns
The action space.
# get_agents MultiAgent
get_agents(
self
) -> set[str]
Return a singleton for single-agent domains.
This must be consistent with skdecide.core.autocast(), which transforms a single-agent domain into a multi-agent domain whose only agent has the id "agent".
# get_applicable_actions Events
get_applicable_actions(
self,
memory: Optional[Memory[D.T_state]] = None
) -> StrDict[Space[D.T_event]]
Get the space (finite or infinite set) of applicable actions in the given memory (state or history), or in the internal one if omitted.
By default, Events.get_applicable_actions()
provides some boilerplate code and internally
calls Events._get_applicable_actions()
. The boilerplate code automatically passes the _memory
attribute
instead of the memory parameter whenever the latter is None.
# Parameters
- memory: The memory to consider (if None, the internal memory attribute
_memory
is used instead).
# Returns
The space of applicable actions.
# get_enabled_events Events
get_enabled_events(
self,
memory: Optional[Memory[D.T_state]] = None
) -> Space[D.T_event]
Get the space (finite or infinite set) of enabled uncontrollable events in the given memory (state or history), or in the internal one if omitted.
By default, Events.get_enabled_events()
provides some boilerplate code and internally
calls Events._get_enabled_events()
. The boilerplate code automatically passes the _memory
attribute instead of
the memory parameter whenever the latter is None.
# Parameters
- memory: The memory to consider (if None, the internal memory attribute
_memory
is used instead).
# Returns
The space of enabled events.
# get_goals Goals
get_goals(
self
) -> StrDict[Space[D.T_observation]]
Get the (cached) domain goals space (finite or infinite set).
By default, Goals.get_goals()
internally calls Goals._get_goals_()
the first time and automatically caches its
value to make future calls more efficient (since the goals space is assumed to be constant).
WARNING
Goal states are assumed to be fully observable (i.e. observation = state) so that there is never uncertainty about whether the goal has been reached or not. This assumption guarantees that any policy that does not reach the goal with certainty incurs in infinite expected cost. - Geffner, 2013: A Concise Introduction to Models and Methods for Automated Planning
# Returns
The goals space.
# get_initial_state DeterministicInitialized
get_initial_state(
self
) -> D.T_state
Get the (cached) initial state.
By default, DeterministicInitialized.get_initial_state()
internally
calls DeterministicInitialized._get_initial_state_()
the first time and automatically caches its value to make
future calls more efficient (since the initial state is assumed to be constant).
# Returns
The initial state.
# get_initial_state_distribution UncertainInitialized
get_initial_state_distribution(
self
) -> Distribution[D.T_state]
Get the (cached) probability distribution of initial states.
By default, UncertainInitialized.get_initial_state_distribution()
internally
calls UncertainInitialized._get_initial_state_distribution_()
the first time and automatically caches its value
to make future calls more efficient (since the initial state distribution is assumed to be constant).
# Returns
The probability distribution of initial states.
# get_next_state DeterministicTransitions
get_next_state(
self,
memory: Memory[D.T_state],
action: StrDict[list[D.T_event]]
) -> D.T_state
Get the next state given a memory and action.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The deterministic next state.
# get_next_state_distribution UncertainTransitions
get_next_state_distribution(
self,
memory: Memory[D.T_state],
action: StrDict[list[D.T_event]]
) -> DiscreteDistribution[D.T_state]
Get the discrete probability distribution of next state given a memory and action.
TIP
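In the Markovian case (memory only holds last state $s$), given an action $a$, this function can be mathematically represented by $P(S'|s, a)$, where $S'$ is the next state random variable.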
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The discrete probability distribution of next state.
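As an illustration of a discrete next-state distribution in the Markovian case (where the memory is just the last state), consider the sketch below. The `next_state_distribution` function, its list-of-pairs representation and the 0.8/0.2 dynamics are all assumptions made for this example; they are not the library's DiscreteDistribution type.

```python
from typing import List, Tuple


def next_state_distribution(state: int, action: str) -> List[Tuple[int, float]]:
    """Hypothetical dynamics: "move" succeeds with probability 0.8, else stays."""
    if action == "move":
        return [(state + 1, 0.8), (state, 0.2)]
    return [(state, 1.0)]


distribution = next_state_distribution(state=2, action="move")
total = sum(probability for _, probability in distribution)
```

Each pair associates a candidate next state with its probability, and the probabilities sum to 1.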
# get_observation TransformedObservable
get_observation(
self,
state: D.T_state,
action: Optional[StrDict[list[D.T_event]]] = None
) -> StrDict[D.T_observation]
Get the deterministic observation given a state and action.
# Parameters
- state: The state to be observed.
- action: The last applied action (or None if the state is an initial state).
# Returns
The deterministic observation.
# get_observation_distribution PartiallyObservable
get_observation_distribution(
self,
state: D.T_state,
action: Optional[StrDict[list[D.T_event]]] = None
) -> Distribution[StrDict[D.T_observation]]
Get the probability distribution of the observation given a state and action.
In mathematical terms (discrete case), given an action $a$, this function represents $P(O|s, a)$, where $O$ is the random variable of the observation.
# Parameters
- state: The state to be observed.
- action: The last applied action (or None if the state is an initial state).
# Returns
The probability distribution of the observation.
# get_observation_space PartiallyObservable
get_observation_space(
self
) -> StrDict[Space[D.T_observation]]
Get the (cached) observation space (finite or infinite set).
By default, PartiallyObservable.get_observation_space()
internally
calls PartiallyObservable._get_observation_space_()
the first time and automatically caches its value to make
future calls more efficient (since the observation space is assumed to be constant).
# Returns
The observation space.
# get_transition_value UncertainTransitions
get_transition_value(
self,
memory: Memory[D.T_state],
action: StrDict[list[D.T_event]],
next_state: Optional[D.T_state] = None
) -> StrDict[Value[D.T_value]]
Get the value (reward or cost) of a transition.
The transition to consider is defined by the function parameters.
TIP
If this function never depends on the next_state parameter for its computation, it is recommended to
indicate it by overriding UncertainTransitions._is_transition_value_dependent_on_next_state_()
to return
False. This information can then be exploited by solvers to avoid computing next state to evaluate a
transition value (more efficient).
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
- next_state: The next state in which the transition ends (if needed for the computation).
# Returns
The transition value (reward or cost).
# is_action Events
is_action(
self,
event: D.T_event
) -> bool
Indicate whether an event is an action (i.e. a controllable event for the agents).
TIP
By default, this function is implemented using the skdecide.core.Space.contains()
function on the domain
action space provided by Events.get_action_space()
, but it can be overridden for faster implementations.
# Parameters
- event: The event to consider.
# Returns
True if the event is an action (False otherwise).
# is_applicable_action Events
is_applicable_action(
self,
action: StrDict[D.T_event],
memory: Optional[Memory[D.T_state]] = None
) -> bool
Indicate whether an action is applicable in the given memory (state or history), or in the internal one if omitted.
By default, Events.is_applicable_action()
provides some boilerplate code and internally
calls Events._is_applicable_action()
. The boilerplate code automatically passes the _memory
attribute instead
of the memory parameter whenever the latter is None.
# Parameters
- memory: The memory to consider (if None, the internal memory attribute
_memory
is used instead).
# Returns
True if the action is applicable (False otherwise).
# is_enabled_event Events
is_enabled_event(
self,
event: D.T_event,
memory: Optional[Memory[D.T_state]] = None
) -> bool
Indicate whether an uncontrollable event is enabled in the given memory (state or history), or in the internal one if omitted.
By default, Events.is_enabled_event()
provides some boilerplate code and internally
calls Events._is_enabled_event()
. The boilerplate code automatically passes the _memory
attribute instead of
the memory parameter whenever the latter is None.
# Parameters
- memory: The memory to consider (if None, the internal memory attribute
_memory
is used instead).
# Returns
True if the event is enabled (False otherwise).
# is_goal Goals
is_goal(
self,
observation: StrDict[D.T_observation]
) -> StrDict[D.T_predicate]
Indicate whether an observation belongs to the goals.
TIP
By default, this function is implemented using the skdecide.core.Space.contains()
function on the domain
goals space provided by Goals.get_goals()
, but it can be overridden for faster implementations.
# Parameters
- observation: The observation to consider.
# Returns
True if the observation is a goal (False otherwise).
# is_observation PartiallyObservable
is_observation(
self,
observation: StrDict[D.T_observation]
) -> bool
Check that an observation indeed belongs to the domain observation space.
TIP
By default, this function is implemented using the skdecide.core.Space.contains()
function on the domain
observation space provided by PartiallyObservable.get_observation_space()
, but it can be overridden for
faster implementations.
# Parameters
- observation: The observation to consider.
# Returns
True if the observation belongs to the domain observation space (False otherwise).
# is_terminal UncertainTransitions
is_terminal(
self,
state: D.T_state
) -> StrDict[D.T_predicate]
Indicate whether a state is terminal.
A terminal state is a state with no outgoing transition (except to itself with value 0).
# Parameters
- state: The state to consider.
# Returns
True if the state is terminal (False otherwise).
# is_transition_value_dependent_on_next_state UncertainTransitions
is_transition_value_dependent_on_next_state(
self
) -> bool
Indicate whether get_transition_value() requires the next_state parameter for its computation (cached).
By default, UncertainTransitions.is_transition_value_dependent_on_next_state()
internally
calls UncertainTransitions._is_transition_value_dependent_on_next_state_()
the first time and automatically
caches its value to make future calls more efficient (since the returned value is assumed to be constant).
# Returns
True if the transition value computation depends on next_state (False otherwise).
# reset Initializable
reset(
self
) -> StrDict[D.T_observation]
Reset the state of the environment and return an initial observation.
By default, Initializable.reset() provides some boilerplate code and internally calls Initializable._reset() (which returns an initial observation). The boilerplate code automatically stores the initial state into the _memory attribute and samples a corresponding observation.
# Returns
An initial observation.
# sample Simulation
sample(
self,
memory: Memory[D.T_state],
action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
Sample one transition of the simulator's dynamics.
By default, Simulation.sample() provides some boilerplate code and internally calls Simulation._sample() (which returns an environment outcome). The boilerplate code automatically samples an observation corresponding to the sampled next state.
TIP
Whenever an existing simulator needs to be wrapped instead of implemented fully in scikit-decide (e.g. a third-party simulator), it is recommended to override Simulation.sample() to call the external simulator and not use the Simulation._sample() helper function.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The environment outcome of the sampled transition.
# set_memory Simulation
set_memory(
self,
memory: Memory[D.T_state]
) -> None
Set internal memory attribute _memory
to given one.
This can be useful to set a specific "starting point" before doing a rollout with successive Environment.step()
calls.
# Parameters
- memory: The memory to set internally.
# Example
# Set simulation_domain memory to my_state (assuming Markovian domain)
simulation_domain.set_memory(my_state)
# Start a 100-step rollout from here (applying my_action at every step)
for _ in range(100):
    simulation_domain.step(my_action)
# step Environment
step(
self,
action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
Run one step of the environment's dynamics.
By default, Environment.step() provides some boilerplate code and internally calls Environment._step() (which returns an environment outcome). The boilerplate code automatically stores the next state into the _memory attribute and samples a corresponding observation.
TIP
Whenever an existing environment needs to be wrapped instead of implemented fully in scikit-decide (e.g. compiled ATARI games), it is recommended to override Environment.step() to call the external environment and not use the Environment._step() helper function.
WARNING
Before calling Environment.step() for the first time, or once the end of an episode is reached, Initializable.reset() must be called to reset the environment's state.
# Parameters
- action: The action taken in the current memory (state or history) triggering the transition.
# Returns
The environment outcome of this step.
# _check_value Rewards
_check_value(
self,
value: Value[D.T_value]
) -> bool
Check that a value is compliant with its cost specification (must be positive).
TIP
This function calls PositiveCosts._is_positive() to determine if a value is positive (can be overridden for advanced value types).
# Parameters
- value: The value to check.
# Returns
True if the value is compliant (False otherwise).
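The delegation from the compliance check to the positivity test can be sketched as follows. `ToyPositiveCosts` is hypothetical, costs are plain floats here, and treating zero as compliant is an assumption of this sketch.

```python
class ToyPositiveCosts:
    """Hypothetical stand-in; costs are plain floats here."""

    def _check_value(self, cost: float) -> bool:
        # A cost is compliant when the positivity test accepts it.
        return self._is_positive(cost)

    def _is_positive(self, cost: float) -> bool:
        # Zero is treated as compliant here (an assumption of this sketch).
        return cost >= 0


domain = ToyPositiveCosts()
```

For a custom value type (for example a vector of costs), only `_is_positive` would need to be overridden.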
# _get_action_mask Events
_get_action_mask(
self,
memory: Optional[Memory[D.T_state]] = None
) -> StrDict[Mask]
Get action mask for the given memory or internal one if omitted.
An action mask is another (more specific) format for applicable actions, which is meaningful only if the action space can be iterated over in some way. It is represented by a flat array of 0's and 1's ordered as the actions when enumerated: 1 for an applicable action, and 0 for a non-applicable action.
More precisely, this implementation assumes that each agent action space is an EnumerableSpace, and internally calls self.get_applicable_actions().
The action mask is used for instance by RL solvers to mask out the logits associated with non-applicable actions in the output of their internal neural network.
# Parameters
- memory: The memory to consider. If None, works on the internal memory of the domain.
# Returns
A NumPy array (or a dict mapping each agent to a NumPy array for multi-agent domains) of 0's and 1's indicating the applicability of each action (1 meaning applicable and 0 not applicable).
# _get_action_space Events
_get_action_space(
self
) -> StrDict[Space[D.T_event]]
Get the (cached) domain action space (finite or infinite set).
By default, Events._get_action_space()
internally calls Events._get_action_space_()
the first time and
automatically caches its value to make future calls more efficient (since the action space is assumed to be
constant).
# Returns
The action space.
# _get_action_space_ Events
_get_action_space_(
self
) -> StrDict[Space[D.T_event]]
Get the domain action space (finite or infinite set).
This is a helper function called by default from Events._get_action_space()
, the difference being that the
result is not cached here.
TIP
The underscore at the end of this function's name is a convention serving as a reminder that its result should be constant.
# Returns
The action space.
# _get_applicable_actions Events
_get_applicable_actions(
self,
memory: Optional[Memory[D.T_state]] = None
) -> StrDict[Space[D.T_event]]
Get the space (finite or infinite set) of applicable actions in the given memory (state or history), or in the internal one if omitted.
By default, Events._get_applicable_actions()
provides some boilerplate code and internally
calls Events._get_applicable_actions_from()
. The boilerplate code automatically passes the _memory
attribute
instead of the memory parameter whenever the latter is None.
# Parameters
- memory: The memory to consider (if None, the internal memory attribute
_memory
is used instead).
# Returns
The space of applicable actions.
# _get_applicable_actions_from Events
_get_applicable_actions_from(
self,
memory: Memory[D.T_state]
) -> StrDict[Space[D.T_event]]
Get the space (finite or infinite set) of applicable actions in the given memory (state or history).
This is a helper function called by default from Events._get_applicable_actions()
, the difference being that
the memory parameter is mandatory here.
# Parameters
- memory: The memory to consider.
# Returns
The space of applicable actions.
# _get_enabled_events Events
_get_enabled_events(
self,
memory: Optional[Memory[D.T_state]] = None
) -> Space[D.T_event]
Get the space (finite or infinite set) of enabled uncontrollable events in the given memory (state or history), or in the internal one if omitted.
By default, Events._get_enabled_events()
provides some boilerplate code and internally
calls Events._get_enabled_events_from()
. The boilerplate code automatically passes the _memory
attribute
instead of the memory parameter whenever the latter is None.
# Parameters
- memory: The memory to consider (if None, the internal memory attribute
_memory
is used instead).
# Returns
The space of enabled events.
# _get_enabled_events_from Events
_get_enabled_events_from(
self,
memory: Memory[D.T_state]
) -> Space[D.T_event]
Get the space (finite or infinite set) of enabled uncontrollable events in the given memory (state or history).
This is a helper function called by default from Events._get_enabled_events()
, the difference being that the
memory parameter is mandatory here.
# Parameters
- memory: The memory to consider.
# Returns
The space of enabled events.
# _get_goals Goals
_get_goals(
self
) -> StrDict[Space[D.T_observation]]
Get the (cached) domain goals space (finite or infinite set).
By default, Goals._get_goals()
internally calls Goals._get_goals_()
the first time and automatically caches
its value to make future calls more efficient (since the goals space is assumed to be constant).
WARNING
Goal states are assumed to be fully observable (i.e. observation = state) so that there is never uncertainty about whether the goal has been reached or not. This assumption guarantees that any policy that does not reach the goal with certainty incurs in infinite expected cost. - Geffner, 2013: A Concise Introduction to Models and Methods for Automated Planning
# Returns
The goals space.
# _get_goals_ Goals
_get_goals_(
self
) -> StrDict[Space[D.T_observation]]
Get the domain goals space (finite or infinite set).
This is a helper function called by default from Goals._get_goals()
, the difference being that the result is
not cached here.
TIP
The underscore at the end of this function's name is a convention serving as a reminder that its result should be constant.
# Returns
The goals space.
# _get_initial_state DeterministicInitialized
_get_initial_state(
self
) -> D.T_state
Get the (cached) initial state.
By default, DeterministicInitialized._get_initial_state()
internally
calls DeterministicInitialized._get_initial_state_()
the first time and automatically caches its value to make
future calls more efficient (since the initial state is assumed to be constant).
# Returns
The initial state.
# _get_initial_state_ DeterministicInitialized
_get_initial_state_(
self
) -> D.T_state
Get the initial state.
This is a helper function called by default from DeterministicInitialized._get_initial_state()
, the difference
being that the result is not cached here.
# Returns
The initial state.
# _get_initial_state_distribution UncertainInitialized
_get_initial_state_distribution(
self
) -> Distribution[D.T_state]
Get the (cached) probability distribution of initial states.
By default, UncertainInitialized._get_initial_state_distribution()
internally
calls UncertainInitialized._get_initial_state_distribution_()
the first time and automatically caches its value
to make future calls more efficient (since the initial state distribution is assumed to be constant).
# Returns
The probability distribution of initial states.
# _get_initial_state_distribution_ UncertainInitialized
_get_initial_state_distribution_(
self
) -> Distribution[D.T_state]
Get the probability distribution of initial states.
This is a helper function called by default from UncertainInitialized._get_initial_state_distribution()
, the
difference being that the result is not cached here.
TIP
The underscore at the end of this function's name is a conventional reminder that its result should be constant.
# Returns
The probability distribution of initial states.
# _get_memory_maxlen History
_get_memory_maxlen(
self
) -> int
Get the (cached) memory max length.
By default, FiniteHistory._get_memory_maxlen()
internally calls FiniteHistory._get_memory_maxlen_()
the first
time and automatically caches its value to make future calls more efficient (since the memory max length is
assumed to be constant).
# Returns
The memory max length.
# _get_memory_maxlen_ FiniteHistory
_get_memory_maxlen_(
self
) -> int
Get the memory max length.
This is a helper function called by default from FiniteHistory._get_memory_maxlen()
, the difference being that
the result is not cached here.
TIP
The underscore at the end of this function's name is a conventional reminder that its result should be constant.
# Returns
The memory max length.
# _get_next_state DeterministicTransitions
_get_next_state(
self,
memory: Memory[D.T_state],
action: StrDict[list[D.T_event]]
) -> D.T_state
Get the next state given a memory and action.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The deterministic next state.
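As an illustration, a deterministic transition function for a small grid world could look like the sketch below. It assumes a single-agent, Markovian setting (the memory is just the current state); State, MOVES and GridDynamicsSketch are hypothetical names introduced for this example only.

```python
from typing import NamedTuple


class State(NamedTuple):
    x: int
    y: int


# Hypothetical action names and their (dx, dy) offsets.
MOVES = {"left": (-1, 0), "right": (1, 0), "up": (0, -1), "down": (0, 1)}


class GridDynamicsSketch:
    """Illustrative stand-in; it does not inherit from any scikit-decide class."""

    def __init__(self, width: int, height: int) -> None:
        self.width = width
        self.height = height

    def _get_next_state(self, memory: State, action: str) -> State:
        # Markovian assumption: the memory is the current state itself.
        dx, dy = MOVES[action]
        # Clamp to the grid so that moving into a wall leaves the agent in place.
        x = min(max(memory.x + dx, 0), self.width - 1)
        y = min(max(memory.y + dy, 0), self.height - 1)
        return State(x, y)


grid = GridDynamicsSketch(width=3, height=3)
moved = grid._get_next_state(State(0, 0), "right")
blocked = grid._get_next_state(State(0, 0), "left")
```

The same inputs always yield the same next state, which is what distinguishes this characteristic from the uncertain-transition variants.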
# _get_next_state_distribution UncertainTransitions
_get_next_state_distribution(
self,
memory: Memory[D.T_state],
action: StrDict[list[D.T_event]]
) -> SingleValueDistribution[D.T_state]
Get the discrete probability distribution of next state given a memory and action.
TIP
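In the Markovian case (memory only holds the last state s), given an action a, this function can be represented mathematically as P(S' | s, a), where S' is the next-state random variable.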
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The discrete probability distribution of next state.
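One way to picture a discrete next-state distribution is as a mapping from next states to probabilities that sum to 1. The sketch below uses a plain dict rather than any scikit-decide distribution class; the function next_state_distribution and its slip parameter are hypothetical.

```python
from typing import Dict, Tuple

StateT = Tuple[int, int]


def next_state_distribution(
    state: StateT, action: Tuple[int, int], slip: float = 0.2
) -> Dict[StateT, float]:
    """Return {next_state: probability} for a hypothetical slippery move."""
    intended = (state[0] + action[0], state[1] + action[1])
    if intended == state:
        # A null move is certain to stay in place.
        return {state: 1.0}
    # With probability `slip` the agent stays where it is.
    return {intended: 1.0 - slip, state: slip}


dist = next_state_distribution((0, 0), (1, 0))
```

A deterministic domain corresponds to the special case where the mapping has a single entry with probability 1.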
# _get_observation TransformedObservable
_get_observation(
self,
state: D.T_state,
action: Optional[StrDict[list[D.T_event]]] = None
) -> StrDict[D.T_observation]
Get the deterministic observation given a state and action.
# Parameters
- state: The state to be observed.
- action: The last applied action (or None if the state is an initial state).
# Returns
The deterministic observation.
# _get_observation_distribution PartiallyObservable
_get_observation_distribution(
self,
state: D.T_state,
action: Optional[StrDict[list[D.T_event]]] = None
) -> Distribution[StrDict[D.T_observation]]
Get the probability distribution of the observation given a state and action.
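In mathematical terms (discrete case), given a state s and an action a, this function represents P(O | s, a), where O is the random variable of the observation.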
# Parameters
- state: The state to be observed.
- action: The last applied action (or None if the state is an initial state).
# Returns
The probability distribution of the observation.
# _get_observation_space PartiallyObservable
_get_observation_space(
self
) -> StrDict[Space[D.T_observation]]
Get the (cached) observation space (finite or infinite set).
By default, PartiallyObservable._get_observation_space()
internally
calls PartiallyObservable._get_observation_space_()
the first time and automatically caches its value to make
future calls more efficient (since the observation space is assumed to be constant).
# Returns
The observation space.
# _get_observation_space_ PartiallyObservable
_get_observation_space_(
self
) -> StrDict[Space[D.T_observation]]
Get the observation space (finite or infinite set).
This is a helper function called by default from PartiallyObservable._get_observation_space()
, the difference
being that the result is not cached here.
TIP
The underscore at the end of this function's name is a conventional reminder that its result should be constant.
# Returns
The observation space.
# _get_transition_value UncertainTransitions
_get_transition_value(
self,
memory: Memory[D.T_state],
action: StrDict[list[D.T_event]],
next_state: Optional[D.T_state] = None
) -> StrDict[Value[D.T_value]]
Get the value (reward or cost) of a transition.
The transition to consider is defined by the function parameters.
TIP
If this function never depends on the next_state parameter for its computation, it is recommended to
indicate it by overriding UncertainTransitions._is_transition_value_dependent_on_next_state_()
to return
False. This information can then be exploited by solvers to avoid computing next state to evaluate a
transition value (more efficient).
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
- next_state: The next state in which the transition ends (if needed for the computation).
# Returns
The transition value (reward or cost).
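The tip above can be sketched as follows: a domain whose transition value ignores next_state declares this, and a caller then skips the next-state computation. StepCostSketch, evaluate and expensive_next_state are hypothetical names, and functools.lru_cache stands in for the caching mechanism.

```python
import functools


class StepCostSketch:
    """Illustrative stand-in; it does not inherit from any scikit-decide class."""

    def _get_transition_value(self, memory, action, next_state=None) -> float:
        # Hypothetical unit cost per move: next_state is never consulted.
        return 1.0

    @functools.lru_cache(maxsize=None)
    def _is_transition_value_dependent_on_next_state(self) -> bool:
        return self._is_transition_value_dependent_on_next_state_()

    def _is_transition_value_dependent_on_next_state_(self) -> bool:
        # Overridden to advertise that next_state is not needed.
        return False


def evaluate(domain, memory, action, compute_next_state):
    """Hypothetical solver-side helper that avoids needless work."""
    if domain._is_transition_value_dependent_on_next_state():
        return domain._get_transition_value(memory, action, compute_next_state())
    return domain._get_transition_value(memory, action)


next_state_calls = []


def expensive_next_state():
    next_state_calls.append(1)
    return "s1"


value = evaluate(StepCostSketch(), "s0", "a", expensive_next_state)
```

Here the next-state callback is never invoked, which is the saving a solver can exploit.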
# _init_memory History
_init_memory(
self,
state: Optional[D.T_state] = None
) -> Memory[D.T_state]
Initialize memory (possibly with a state) according to its specification and return it.
This function is automatically called by Initializable._reset()
to reinitialize the internal memory whenever
the domain is used as an environment.
# Parameters
- state: An optional state to initialize the memory with (typically the initial state).
# Returns
The new initialized memory.
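A bounded memory can be pictured as a double-ended queue with a maximum length, where the oldest state is dropped once the limit is reached. The sketch below uses collections.deque as an assumed stand-in for the memory container; the function init_memory is hypothetical and is not the library's implementation.

```python
from collections import deque


def init_memory(state=None, maxlen=None):
    """Build a fresh memory, optionally seeded with one state."""
    content = [state] if state is not None else []
    # maxlen=None means unbounded history; maxlen=1 would be the Markovian case.
    return deque(content, maxlen=maxlen)


memory = init_memory("s0", maxlen=2)
memory.append("s1")
memory.append("s2")  # "s0" is evicted because only two states are kept
empty = init_memory()
```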
# _is_action Events
_is_action(
self,
event: D.T_event
) -> bool
Indicate whether an event is an action (i.e. a controllable event for the agents).
TIP
By default, this function is implemented using the skdecide.core.Space.contains()
function on the domain
action space provided by Events._get_action_space()
, but it can be overridden for faster implementations.
# Parameters
- event: The event to consider.
# Returns
True if the event is an action (False otherwise).
# _is_applicable_action Events
_is_applicable_action(
self,
action: StrDict[D.T_event],
memory: Optional[Memory[D.T_state]] = None
) -> bool
Indicate whether an action is applicable in the given memory (state or history), or in the internal one if omitted.
By default, Events._is_applicable_action()
provides some boilerplate code and internally
calls Events._is_applicable_action_from()
. The boilerplate code automatically passes the _memory
attribute
instead of the memory parameter whenever the latter is None.
# Parameters
- action: The action to consider.
- memory: The memory to consider (if None, the internal memory attribute
_memory
is used instead).
# Returns
True if the action is applicable (False otherwise).
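The split between the public-facing function (optional memory) and its _from helper (mandatory memory) can be sketched as below. ApplicabilitySketch and its applicability rule are hypothetical; only the delegation pattern reflects the description above.

```python
class ApplicabilitySketch:
    """Illustrative stand-in; it does not inherit from any scikit-decide class."""

    def __init__(self, position: int) -> None:
        self._memory = position  # internal memory (Markovian assumption)

    def _is_applicable_action(self, action, memory=None) -> bool:
        # Boilerplate: fall back to the internal memory when none is given.
        if memory is None:
            memory = self._memory
        return self._is_applicable_action_from(action, memory)

    def _is_applicable_action_from(self, action, memory) -> bool:
        return action in self._get_applicable_actions_from(memory)

    def _get_applicable_actions_from(self, memory):
        # Hypothetical rule: "left" is not applicable at position 0.
        return {"right"} if memory == 0 else {"left", "right"}


domain = ApplicabilitySketch(position=0)
uses_internal = domain._is_applicable_action("left")
uses_explicit = domain._is_applicable_action("left", memory=3)
```

A domain author would typically implement only the _from helper and inherit the boilerplate.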
# _is_applicable_action_from Events
_is_applicable_action_from(
self,
action: StrDict[D.T_event],
memory: Memory[D.T_state]
) -> bool
Indicate whether an action is applicable in the given memory (state or history).
This is a helper function called by default from Events._is_applicable_action()
, the difference being that the
memory parameter is mandatory here.
TIP
By default, this function is implemented using the skdecide.core.Space.contains()
function on the space of
applicable actions provided by Events._get_applicable_actions_from()
, but it can be overridden for faster
implementations.
# Parameters
- action: The action to consider.
- memory: The memory to consider.
# Returns
True if the action is applicable (False otherwise).
# _is_enabled_event Events
_is_enabled_event(
self,
event: D.T_event,
memory: Optional[Memory[D.T_state]] = None
) -> bool
Indicate whether an uncontrollable event is enabled in the given memory (state or history), or in the internal one if omitted.
By default, Events._is_enabled_event()
provides some boilerplate code and internally
calls Events._is_enabled_event_from()
. The boilerplate code automatically passes the _memory
attribute instead
of the memory parameter whenever the latter is None.
# Parameters
- event: The event to consider.
- memory: The memory to consider (if None, the internal memory attribute
_memory
is used instead).
# Returns
True if the event is enabled (False otherwise).
# _is_enabled_event_from Events
_is_enabled_event_from(
self,
event: D.T_event,
memory: Memory[D.T_state]
) -> bool
Indicate whether an event is enabled in the given memory (state or history).
This is a helper function called by default from Events._is_enabled_event()
, the difference being that the
memory parameter is mandatory here.
TIP
By default, this function is implemented using the skdecide.core.Space.contains()
function on the space of
enabled events provided by Events._get_enabled_events_from()
, but it can be overridden for faster
implementations.
# Parameters
- event: The event to consider.
- memory: The memory to consider.
# Returns
True if the event is enabled (False otherwise).
# _is_goal Goals
_is_goal(
self,
observation: StrDict[D.T_observation]
) -> StrDict[D.T_predicate]
Indicate whether an observation belongs to the goals.
TIP
By default, this function is implemented using the skdecide.core.Space.contains()
function on the domain
goals space provided by Goals._get_goals()
, but it can be overridden for faster implementations.
# Parameters
- observation: The observation to consider.
# Returns
True if the observation is a goal (False otherwise).
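The trade-off mentioned in the tip (generic membership test versus a faster override) might look like the following sketch. SetSpace is a hypothetical space exposing a contains() method, and both goal-check classes are illustrative stand-ins for a single-agent domain.

```python
class SetSpace:
    """Hypothetical finite space exposing a contains() method."""

    def __init__(self, elements) -> None:
        self._elements = list(elements)

    def contains(self, x) -> bool:
        return x in self._elements  # linear scan over the stored elements


class DefaultGoalCheck:
    def __init__(self, goal) -> None:
        self._goal = goal

    def _get_goals(self) -> SetSpace:
        return SetSpace([self._goal])

    def _is_goal(self, observation) -> bool:
        # Generic default: delegate to the goals space.
        return self._get_goals().contains(observation)


class FastGoalCheck(DefaultGoalCheck):
    def _is_goal(self, observation) -> bool:
        # Override: a direct comparison avoids building or scanning the space.
        return observation == self._goal


default_domain = DefaultGoalCheck(goal=(2, 2))
fast_domain = FastGoalCheck(goal=(2, 2))
```

Both variants must agree on every observation; the override only changes how the answer is computed.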
# _is_observation PartiallyObservable
_is_observation(
self,
observation: StrDict[D.T_observation]
) -> bool
Check that an observation indeed belongs to the domain observation space.
TIP
By default, this function is implemented using the skdecide.core.Space.contains()
function on the domain
observation space provided by PartiallyObservable._get_observation_space()
, but it can be overridden for
faster implementations.
# Parameters
- observation: The observation to consider.
# Returns
True if the observation belongs to the domain observation space (False otherwise).
# _is_positive PositiveCosts
_is_positive(
self,
cost: D.T_value
) -> bool
Determine if a value is positive (can be overridden for advanced value types).
# Parameters
- cost: The cost to evaluate.
# Returns
True if the cost is positive (False otherwise).
# _is_terminal UncertainTransitions
_is_terminal(
self,
state: D.T_state
) -> StrDict[D.T_predicate]
Indicate whether a state is terminal.
A terminal state is a state with no outgoing transition (except to itself with value 0).
# Parameters
- state: The state to consider.
# Returns
True if the state is terminal (False otherwise).
# _is_transition_value_dependent_on_next_state UncertainTransitions
_is_transition_value_dependent_on_next_state(
self
) -> bool
Indicate whether _get_transition_value() requires the next_state parameter for its computation (cached).
By default, UncertainTransitions._is_transition_value_dependent_on_next_state()
internally
calls UncertainTransitions._is_transition_value_dependent_on_next_state_()
the first time and automatically
caches its value to make future calls more efficient (since the returned value is assumed to be constant).
# Returns
True if the transition value computation depends on next_state (False otherwise).
# _is_transition_value_dependent_on_next_state_ UncertainTransitions
_is_transition_value_dependent_on_next_state_(
self
) -> bool
Indicate whether _get_transition_value() requires the next_state parameter for its computation.
This is a helper function called by default
from UncertainTransitions._is_transition_value_dependent_on_next_state()
, the difference being that the result
is not cached here.
TIP
The underscore at the end of this function's name is a conventional reminder that its result should be constant.
# Returns
True if the transition value computation depends on next_state (False otherwise).
# _reset Initializable
_reset(
self
) -> StrDict[D.T_observation]
Reset the state of the environment and return an initial observation.
By default, Initializable._reset()
provides some boilerplate code and internally
calls Initializable._state_reset()
(which returns an initial state). The boilerplate code automatically stores
the initial state into the _memory
attribute and samples a corresponding observation.
# Returns
An initial observation.
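The division of labour between the reset boilerplate and the state-level helper can be sketched as below. ResetSketch, its _observe helper and the state layout are hypothetical; a Markovian memory (a single state) is assumed.

```python
class ResetSketch:
    """Illustrative stand-in; it does not inherit from any scikit-decide class."""

    def __init__(self) -> None:
        self._memory = None

    def _reset(self):
        # Boilerplate: obtain the state, store it, then derive an observation.
        initial_state = self._state_reset()
        self._memory = initial_state
        return self._observe(initial_state)

    def _state_reset(self):
        # State-level helper: the part a domain author would write.
        return {"x": 0, "y": 0, "hidden": 42}

    def _observe(self, state):
        # Hypothetical observation function that hides part of the state.
        return {"x": state["x"], "y": state["y"]}


domain = ResetSketch()
observation = domain._reset()
```

The caller only sees the observation, while the full state stays in the internal memory.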
# _sample Simulation
_sample(
self,
memory: Memory[D.T_state],
action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
Sample one transition of the simulator's dynamics.
By default, Simulation._sample()
provides some boilerplate code and internally
calls Simulation._state_sample()
(which returns a transition outcome). The boilerplate code automatically
samples an observation corresponding to the sampled next state.
TIP
Whenever an existing external simulator needs to be wrapped rather than implemented fully in scikit-decide,
it is recommended to override Simulation._sample()
to call the external simulator and not use
the Simulation._state_sample()
helper function.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The environment outcome of the sampled transition.
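Wrapping an external simulator by overriding the sampling function directly might look like the sketch below. ExternalSimulator, its simulate() method, the Outcome tuple and WrappedSimulationSketch are all hypothetical; the point is only that the state-level helper is bypassed.

```python
import random
from typing import NamedTuple


class Outcome(NamedTuple):
    observation: int
    value: float
    termination: bool


class ExternalSimulator:
    """Hypothetical third-party simulator with its own API."""

    def __init__(self, seed: int) -> None:
        self._rng = random.Random(seed)

    def simulate(self, state: int, command: int):
        # Noisy progress: the command plus an occasional extra unit.
        next_state = state + command + self._rng.choice([0, 1])
        return next_state, -1.0, next_state >= 5


class WrappedSimulationSketch:
    """Illustrative stand-in; it does not inherit from any scikit-decide class."""

    def __init__(self, simulator: ExternalSimulator) -> None:
        self._simulator = simulator

    def _sample(self, memory, action) -> Outcome:
        # Delegate directly to the external simulator and repackage its result.
        observation, reward, done = self._simulator.simulate(memory, action)
        return Outcome(observation, reward, done)


domain = WrappedSimulationSketch(ExternalSimulator(seed=0))
outcome = domain._sample(memory=0, action=1)
```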
# _set_memory Simulation
_set_memory(
self,
memory: Memory[D.T_state]
) -> None
Set the internal memory attribute _memory
to the given one.
This can be useful to set a specific "starting point" before doing a rollout with
successive Environment._step()
calls.
# Parameters
- memory: The memory to set internally.
# Example
# Set simulation_domain memory to my_state (assuming a Markovian domain)
simulation_domain._set_memory(my_state)
# Start a 100-step rollout from here (applying my_action at every step)
for _ in range(100):
    simulation_domain._step(my_action)
# _state_reset Initializable
_state_reset(
self
) -> D.T_state
Reset the state of the environment and return an initial state.
This is a helper function called by default from Initializable._reset()
. It focuses on the state level, as
opposed to the observation one for the latter.
# Returns
An initial state.
# _state_sample Simulation
_state_sample(
self,
memory: Memory[D.T_state],
action: StrDict[list[D.T_event]]
) -> TransitionOutcome[D.T_state, StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
Compute one sample of the transition's dynamics.
This is a helper function called by default from Simulation._sample()
. It focuses on the state level, as
opposed to the observation one for the latter.
# Parameters
- memory: The source memory (state or history) of the transition.
- action: The action taken in the given memory (state or history) triggering the transition.
# Returns
The transition outcome of the sampled transition.
# _state_step Environment
_state_step(
self,
action: StrDict[list[D.T_event]]
) -> TransitionOutcome[D.T_state, StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
Compute one step of the transition's dynamics.
This is a helper function called by default from Environment._step()
. It focuses on the state level, as opposed
to the observation one for the latter.
# Parameters
- action: The action taken in the current memory (state or history) triggering the transition.
# Returns
The transition outcome of this step.
# _step Environment
_step(
self,
action: StrDict[list[D.T_event]]
) -> EnvironmentOutcome[StrDict[D.T_observation], StrDict[Value[D.T_value]], StrDict[D.T_predicate], StrDict[D.T_info]]
Run one step of the environment's dynamics.
By default, Environment._step()
provides some boilerplate code and internally
calls Environment._state_step()
(which returns a transition outcome). The boilerplate code automatically stores
the next state into the _memory
attribute and samples a corresponding observation.
TIP
Whenever an existing environment needs to be wrapped rather than implemented fully in scikit-decide (e.g. compiled
ATARI games), it is recommended to override Environment._step()
to call the external environment and not
use the Environment._state_step()
helper function.
WARNING
Before calling Environment._step()
the first time or when the end of an episode is
reached, Initializable._reset()
must be called to reset the environment's state.
# Parameters
- action: The action taken in the current memory (state or history) triggering the transition.
# Returns
The environment outcome of this step.
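The step boilerplate and the reset requirement from the warning can be sketched together as follows. CounterEnvSketch is a hypothetical, fully observable, Markovian toy environment (observation = state); the tuple it returns is a simplification and not the library's EnvironmentOutcome type.

```python
class CounterEnvSketch:
    """Illustrative stand-in; it does not inherit from any scikit-decide class."""

    def __init__(self, horizon: int) -> None:
        self._horizon = horizon
        self._memory = None  # no state until _reset() is called

    def _reset(self) -> int:
        self._memory = 0
        return self._memory

    def _step(self, action: int):
        if self._memory is None:
            raise RuntimeError("call _reset() before the first _step()")
        # Boilerplate: compute the transition, then store the next state.
        next_state, value, done = self._state_step(action)
        self._memory = next_state
        return next_state, value, done

    def _state_step(self, action: int):
        # State-level helper: the part a domain author would write.
        next_state = self._memory + action
        return next_state, -1.0, next_state >= self._horizon


env = CounterEnvSketch(horizon=3)
env._reset()
trace = []
done = False
while not done:
    observation, value, done = env._step(1)
    trace.append(observation)
```

Once done is True, another call to _reset() is needed before stepping again.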