# builders.solver.assessability


## Utilities

A solver must inherit this class if it can provide the utility function (i.e. value function).

### get_utility (Utilities)

get_utility(
  self,
  observation: StrDict[D.T_observation]
) -> D.T_value

Get the estimated on-policy utility of the given observation.

In mathematical terms, for a fully observable domain, this function estimates:

$$V^\pi(s) = \underset{\tau \sim \pi}{\mathbb{E}}\left[R(\tau) \mid s_0 = s\right]$$

where $\pi$ is the current policy, any $\tau = (s_0, a_0, s_1, a_1, \ldots)$ represents a trajectory sampled from the policy, $R(\tau)$ is the return (cumulative reward) and $s_0$ the initial state for the trajectories.

#### Parameters

- observation: The observation to consider.

#### Returns

The estimated on-policy utility of the given observation.

### _get_utility (Utilities)

_get_utility(
  self,
  observation: StrDict[D.T_observation]
) -> D.T_value

Get the estimated on-policy utility of the given observation.

In mathematical terms, for a fully observable domain, this function estimates:

$$V^\pi(s) = \underset{\tau \sim \pi}{\mathbb{E}}\left[R(\tau) \mid s_0 = s\right]$$

where $\pi$ is the current policy, any $\tau = (s_0, a_0, s_1, a_1, \ldots)$ represents a trajectory sampled from the policy, $R(\tau)$ is the return (cumulative reward) and $s_0$ the initial state for the trajectories.

#### Parameters

- observation: The observation to consider.

#### Returns

The estimated on-policy utility of the given observation.
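As an illustration of how a solver can plug into this mixin, here is a minimal, hypothetical sketch (not part of scikit-decide). It assumes Utilities is importable from this module, that get_utility() simply dispatches to _get_utility(), and that a separate (omitted) solving loop fills the value table.

```python
# Hypothetical sketch: a solver fragment mixing in Utilities and backing the
# value estimate with a plain dictionary filled during solving.
from typing import Any, Dict

from skdecide.builders.solver.assessability import Utilities  # assumed import path


class TabularValueSolver(Utilities):
    """Illustrative fragment exposing an estimated state value V(observation)."""

    def __init__(self) -> None:
        # Value table produced by the (omitted) solving code, e.g. averaged
        # Monte-Carlo returns per observation under the current policy.
        self._values: Dict[Any, float] = {}

    def _get_utility(self, observation: Any) -> float:
        # Estimated V^pi(observation); defaults to 0.0 for unseen observations.
        return self._values.get(observation, 0.0)


# Usage, assuming `obs` is an observation of the solved domain:
#   solver = TabularValueSolver()
#   ...  # run the actual solving code that fills solver._values
#   v = solver.get_utility(obs)  # public method documented above
```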

## QValues

A solver must inherit this class if it can provide the Q function (i.e. action-value function).

### get_q_value (QValues)

get_q_value(
  self,
  observation: StrDict[D.T_observation],
  action: StrDict[list[D.T_event]]
) -> D.T_value

Get the estimated on-policy Q value of the given observation and action.

In mathematical terms, for a fully observable domain, this function estimates:

$$Q^\pi(s, a) = \underset{\tau \sim \pi}{\mathbb{E}}\left[R(\tau) \mid s_0 = s, a_0 = a\right]$$

where $\pi$ is the current policy, any $\tau = (s_0, a_0, s_1, a_1, \ldots)$ represents a trajectory sampled from the policy, $R(\tau)$ is the return (cumulative reward) and $s_0$/$a_0$ the initial state/action for the trajectories.

#### Parameters

- observation: The observation to consider.
- action: The action to consider.

#### Returns

The estimated on-policy Q value of the given observation and action.
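For orientation (a standard reinforcement-learning identity, not part of the original docstring): the utility estimated by get_utility and the Q value estimated here are related by

$$V^\pi(s) = \underset{a \sim \pi(\cdot \mid s)}{\mathbb{E}}\left[Q^\pi(s, a)\right]$$

so a solver that can estimate on-policy Q values can in principle derive the corresponding state utilities from them.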

### get_utility (Utilities)

get_utility(
  self,
  observation: StrDict[D.T_observation]
) -> D.T_value

Get the estimated on-policy utility of the given observation.

In mathematical terms, for a fully observable domain, this function estimates:

$$V^\pi(s) = \underset{\tau \sim \pi}{\mathbb{E}}\left[R(\tau) \mid s_0 = s\right]$$

where $\pi$ is the current policy, any $\tau = (s_0, a_0, s_1, a_1, \ldots)$ represents a trajectory sampled from the policy, $R(\tau)$ is the return (cumulative reward) and $s_0$ the initial state for the trajectories.

#### Parameters

- observation: The observation to consider.

#### Returns

The estimated on-policy utility of the given observation.

### _get_q_value (QValues)

_get_q_value(
  self,
  observation: StrDict[D.T_observation],
  action: StrDict[list[D.T_event]]
) -> D.T_value

Get the estimated on-policy Q value of the given observation and action.

In mathematical terms, for a fully observable domain, this function estimates:

$$Q^\pi(s, a) = \underset{\tau \sim \pi}{\mathbb{E}}\left[R(\tau) \mid s_0 = s, a_0 = a\right]$$

where $\pi$ is the current policy, any $\tau = (s_0, a_0, s_1, a_1, \ldots)$ represents a trajectory sampled from the policy, $R(\tau)$ is the return (cumulative reward) and $s_0$/$a_0$ the initial state/action for the trajectories.

#### Parameters

- observation: The observation to consider.
- action: The action to consider.

#### Returns

The estimated on-policy Q value of the given observation and action.
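Below is a hypothetical sketch (not part of scikit-decide) of a solver mixing in QValues. Since this page also lists get_utility/_get_utility under QValues, the sketch assumes such a solver provides both, and it derives the state value greedily from its Q-table, i.e. V(s) = max_a Q(s, a), which matches V^pi(s) only when the policy is greedy.

```python
# Hypothetical sketch: a solver fragment mixing in QValues, backed by a Q-table.
from typing import Any, Dict, Tuple

from skdecide.builders.solver.assessability import QValues  # assumed import path


class TabularQSolver(QValues):
    """Illustrative fragment exposing Q(observation, action) and a derived utility."""

    def __init__(self) -> None:
        # Q-table filled by the (omitted) solving code, e.g. tabular Q-learning.
        self._q: Dict[Tuple[Any, Any], float] = {}

    def _get_q_value(self, observation: Any, action: Any) -> float:
        # Estimated Q^pi(observation, action); 0.0 for unseen pairs.
        return self._q.get((observation, action), 0.0)

    def _get_utility(self, observation: Any) -> float:
        # Greedy state value derived from the Q-table: max over known actions.
        q_values = [q for (obs, _), q in self._q.items() if obs == observation]
        return max(q_values, default=0.0)


# Usage, assuming `obs` and `act` come from the solved domain:
#   solver = TabularQSolver()
#   ...  # run the actual solving code that fills solver._q
#   q = solver.get_q_value(obs, act)  # public method documented above
#   v = solver.get_utility(obs)
```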

### _get_utility (Utilities)

_get_utility(
  self,
  observation: StrDict[D.T_observation]
) -> D.T_value

Get the estimated on-policy utility of the given observation.

In mathematical terms, for a fully observable domain, this function estimates:

$$V^\pi(s) = \underset{\tau \sim \pi}{\mathbb{E}}\left[R(\tau) \mid s_0 = s\right]$$

where $\pi$ is the current policy, any $\tau = (s_0, a_0, s_1, a_1, \ldots)$ represents a trajectory sampled from the policy, $R(\tau)$ is the return (cumulative reward) and $s_0$ the initial state for the trajectories.

#### Parameters

- observation: The observation to consider.

#### Returns

The estimated on-policy utility of the given observation.