# builders.solver.assessability


## Utilities

A solver must inherit this class if it can provide the utility function (i.e. value function).

### get_utility (Utilities)

get_utility(
  self,
  observation: StrDict[D.T_observation]
) -> D.T_value

Get the estimated on-policy utility of the given observation.

In mathematical terms, for a fully observable domain, this function estimates:

$$V^\pi(s) = \underset{\tau \sim \pi}{\mathbb{E}}\left[R(\tau) \mid s_0 = s\right]$$

where $\pi$ is the current policy, any $\tau = (s_0, a_0, s_1, a_1, \ldots)$ represents a trajectory sampled from the policy, $R(\tau)$ is the return (cumulative reward) and $s_0$ the initial state for the trajectories.

#### Parameters

- observation: The observation to consider.

#### Returns

The estimated on-policy utility of the given observation.

### _get_utility (Utilities)

_get_utility(
  self,
  observation: StrDict[D.T_observation]
) -> D.T_value

Get the estimated on-policy utility of the given observation.

In mathematical terms, for a fully observable domain, this function estimates:

$$V^\pi(s) = \underset{\tau \sim \pi}{\mathbb{E}}\left[R(\tau) \mid s_0 = s\right]$$

where $\pi$ is the current policy, any $\tau = (s_0, a_0, s_1, a_1, \ldots)$ represents a trajectory sampled from the policy, $R(\tau)$ is the return (cumulative reward) and $s_0$ the initial state for the trajectories.

#### Parameters

- observation: The observation to consider.

#### Returns

The estimated on-policy utility of the given observation.
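As an illustration of how a solver can plug into this mixin, here is a minimal, hypothetical sketch (not part of scikit-decide). It assumes Utilities is importable from this module, that get_utility() simply dispatches to _get_utility(), and that a separate (omitted) solving loop fills the value table.

```python
# Hypothetical sketch: a solver fragment mixing in Utilities and backing the
# value estimate with a plain dictionary filled during solving.
from typing import Any, Dict

from skdecide.builders.solver.assessability import Utilities  # assumed import path


class TabularValueSolver(Utilities):
    """Illustrative fragment exposing an estimated state value V(observation)."""

    def __init__(self) -> None:
        # Value table produced by the (omitted) solving code, e.g. averaged
        # Monte-Carlo returns per observation under the current policy.
        self._values: Dict[Any, float] = {}

    def _get_utility(self, observation: Any) -> float:
        # Estimated V^pi(observation); defaults to 0.0 for unseen observations.
        return self._values.get(observation, 0.0)


# Usage, assuming `obs` is an observation of the solved domain:
#   solver = TabularValueSolver()
#   ...  # run the actual solving code that fills solver._values
#   v = solver.get_utility(obs)  # public method documented above
```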

## QValues

A solver must inherit this class if it can provide the Q function (i.e. action-value function).

### get_q_value (QValues)

get_q_value(
  self,
  observation: StrDict[D.T_observation],
  action: StrDict[list[D.T_event]]
) -> D.T_value

Get the estimated on-policy Q value of the given observation and action.

In mathematical terms, for a fully observable domain, this function estimates:

$$Q^\pi(s, a) = \underset{\tau \sim \pi}{\mathbb{E}}\left[R(\tau) \mid s_0 = s, a_0 = a\right]$$

where $\pi$ is the current policy, any $\tau = (s_0, a_0, s_1, a_1, \ldots)$ represents a trajectory sampled from the policy, $R(\tau)$ is the return (cumulative reward) and $s_0$/$a_0$ the initial state/action for the trajectories.

#### Parameters

- observation: The observation to consider.
- action: The action to consider.

#### Returns

The estimated on-policy Q value of the given observation and action.
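For orientation (a standard reinforcement-learning identity, not part of the original docstring): the utility estimated by get_utility and the Q value estimated here are related by

$$V^\pi(s) = \underset{a \sim \pi(\cdot \mid s)}{\mathbb{E}}\left[Q^\pi(s, a)\right]$$

so a solver that can estimate on-policy Q values can in principle derive the corresponding state utilities from them.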

### get_utility (Utilities)

get_utility(
  self,
  observation: StrDict[D.T_observation]
) -> D.T_value

Get the estimated on-policy utility of the given observation.

In mathematical terms, for a fully observable domain, this function estimates:

$$V^\pi(s) = \underset{\tau \sim \pi}{\mathbb{E}}\left[R(\tau) \mid s_0 = s\right]$$

where $\pi$ is the current policy, any $\tau = (s_0, a_0, s_1, a_1, \ldots)$ represents a trajectory sampled from the policy, $R(\tau)$ is the return (cumulative reward) and $s_0$ the initial state for the trajectories.

#### Parameters

- observation: The observation to consider.

#### Returns

The estimated on-policy utility of the given observation.

### _get_q_value (QValues)

_get_q_value(
  self,
  observation: StrDict[D.T_observation],
  action: StrDict[list[D.T_event]]
) -> D.T_value

Get the estimated on-policy Q value of the given observation and action.

In mathematical terms, for a fully observable domain, this function estimates:

$$Q^\pi(s, a) = \underset{\tau \sim \pi}{\mathbb{E}}\left[R(\tau) \mid s_0 = s, a_0 = a\right]$$

where $\pi$ is the current policy, any $\tau = (s_0, a_0, s_1, a_1, \ldots)$ represents a trajectory sampled from the policy, $R(\tau)$ is the return (cumulative reward) and $s_0$/$a_0$ the initial state/action for the trajectories.

#### Parameters

- observation: The observation to consider.
- action: The action to consider.

#### Returns

The estimated on-policy Q value of the given observation and action.
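Below is a hypothetical sketch (not part of scikit-decide) of a solver mixing in QValues. Since this page also lists get_utility/_get_utility under QValues, the sketch assumes such a solver provides both, and it derives the state value greedily from its Q-table, i.e. V(s) = max_a Q(s, a), which matches V^pi(s) only when the policy is greedy.

```python
# Hypothetical sketch: a solver fragment mixing in QValues, backed by a Q-table.
from typing import Any, Dict, Tuple

from skdecide.builders.solver.assessability import QValues  # assumed import path


class TabularQSolver(QValues):
    """Illustrative fragment exposing Q(observation, action) and a derived utility."""

    def __init__(self) -> None:
        # Q-table filled by the (omitted) solving code, e.g. tabular Q-learning.
        self._q: Dict[Tuple[Any, Any], float] = {}

    def _get_q_value(self, observation: Any, action: Any) -> float:
        # Estimated Q^pi(observation, action); 0.0 for unseen pairs.
        return self._q.get((observation, action), 0.0)

    def _get_utility(self, observation: Any) -> float:
        # Greedy state value derived from the Q-table: max over known actions.
        q_values = [q for (obs, _), q in self._q.items() if obs == observation]
        return max(q_values, default=0.0)


# Usage, assuming `obs` and `act` come from the solved domain:
#   solver = TabularQSolver()
#   ...  # run the actual solving code that fills solver._q
#   q = solver.get_q_value(obs, act)  # public method documented above
#   v = solver.get_utility(obs)
```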

### _get_utility (Utilities)

_get_utility(
  self,
  observation: StrDict[D.T_observation]
) -> D.T_value

Get the estimated on-policy utility of the given observation.

In mathematical terms, for a fully observable domain, this function estimates:

$$V^\pi(s) = \underset{\tau \sim \pi}{\mathbb{E}}\left[R(\tau) \mid s_0 = s\right]$$

where $\pi$ is the current policy, any $\tau = (s_0, a_0, s_1, a_1, \ldots)$ represents a trajectory sampled from the policy, $R(\tau)$ is the return (cumulative reward) and $s_0$ the initial state for the trajectories.

#### Parameters

- observation: The observation to consider.

#### Returns

The estimated on-policy utility of the given observation.