decision_functions#

Functions#

prt_rl.common.decision_functions.stochastic_selection(action_pmf: Tensor) Tensor[source]#

Perform a stochastic selection of an action based on a given PMF.

Samples \(\pi(a \mid s) \rightarrow a\)

Args:
action_pmf (torch.Tensor): 1D tensor containing probabilities for each action. Must sum to 1 and have non-negative values.

Returns:

torch.Tensor: The index of the selected action.
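Sampling an index from a PMF can be sketched with inverse-CDF sampling. This is a minimal illustration, not the library's implementation: the name `sample_from_pmf` and the `rng` argument are hypothetical, and a plain Python list stands in for the 1D tensor so the sketch runs without dependencies.

```python
import random
from itertools import accumulate


def sample_from_pmf(action_pmf, rng=random):
    """Return an action index i drawn with probability action_pmf[i].

    Hypothetical helper for illustration; action_pmf is a plain list of floats.
    """
    if any(p < 0 for p in action_pmf):
        raise ValueError("probabilities must be non-negative")
    if abs(sum(action_pmf) - 1.0) > 1e-6:
        raise ValueError("probabilities must sum to 1")

    # Inverse-CDF sampling: pick the first bin whose cumulative mass exceeds u.
    u = rng.random()  # uniform in [0, 1)
    for index, cumulative in enumerate(accumulate(action_pmf)):
        if u < cumulative:
            return index

    # Floating-point round-off can leave the final cumulative sum slightly
    # below 1; fall back to the last action with positive probability.
    return max(i for i, p in enumerate(action_pmf) if p > 0)
```

A degenerate PMF such as `[0.0, 1.0, 0.0]` always yields index 1, which is a quick sanity check for this kind of sampler.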

Classes#

DecisionFunction

A decision function takes in the state-action values from a Q function and returns a selected action.

EpsilonGreedy

Epsilon-greedy is a soft policy version of greedy action selection, where a random action is chosen with probability epsilon and the maximum value action otherwise.

Greedy

Greedy policy chooses the action with the highest value.

Softmax

Softmax (Boltzmann) action selection with temperature tau.

UpperConfidenceBound

class prt_rl.common.decision_functions.DecisionFunction[source]#

A decision function takes in the state-action values from a Q function and returns a selected action.

Input: Tensor of action values with shape (# env, # action values)

Output: Tensor of selected actions with shape (# env, 1)

classmethod from_dict(data: dict) DecisionFunction[source]#

Reconstruct the decision function from a dictionary. Child classes should override this if they have custom parameters.

Parameters:

data (dict) – dictionary containing parameter values

Returns:

Decision function object

Return type:

DecisionFunction

abstractmethod select_action(action_values: Tensor) Tensor[source]#

Selects an action from a vector of q values.

Parameters:

action_values (torch.Tensor) – tensor of q values with shape (# environments, # actions)

Returns:

tensor of selected actions with shape (# environments, 1)

Return type:

torch.Tensor

set_parameter(name: str, value: Any) None[source]#

Sets a named parameter in the decision function.

Parameters:
  • name (str) – name of the parameter

  • value (Any) – value to set

to_dict() dict[source]#

Serialize the decision function to a dictionary. Child classes should override this if they have custom parameters.

Returns:

dictionary containing class type and parameter values

Return type:

dict
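The serialization contract described above (a base class with default `to_dict` / `from_dict`, overridden by children that have custom parameters) can be sketched as follows. Everything here is a stand-in for illustration: `BaseSketch` and `ConstantChoice` are hypothetical classes, the dictionary keys `"type"` and `"action"` are assumptions, and the error raised by `set_parameter` for an unknown name is a guess at reasonable behavior rather than the library's documented behavior.

```python
from abc import ABC, abstractmethod


class BaseSketch(ABC):
    """Hypothetical stand-in for a decision function base class."""

    @abstractmethod
    def select_action(self, action_values):
        """Map (# env, # actions) values to (# env, 1) selected actions."""

    def set_parameter(self, name, value):
        # Assumption: only existing attributes may be set.
        if not hasattr(self, name):
            raise ValueError(f"unknown parameter: {name}")
        setattr(self, name, value)

    def to_dict(self):
        # Assumption: the class type is stored under the key "type".
        return {"type": type(self).__name__}

    @classmethod
    def from_dict(cls, data):
        return cls()


class ConstantChoice(BaseSketch):
    """Hypothetical child with one custom parameter, so it overrides both methods."""

    def __init__(self, action):
        self.action = action

    def select_action(self, action_values):
        # One selected action per environment row.
        return [[self.action] for _ in action_values]

    def to_dict(self):
        data = super().to_dict()
        data["action"] = self.action
        return data

    @classmethod
    def from_dict(cls, data):
        return cls(action=data["action"])


original = ConstantChoice(action=2)
original.set_parameter("action", 1)
restored = ConstantChoice.from_dict(original.to_dict())
```

After the round trip, `restored` carries the updated parameter value, which is the property the override is meant to preserve.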

class prt_rl.common.decision_functions.EpsilonGreedy(epsilon: float)[source]#

Epsilon-greedy is a soft policy version of greedy action selection, where a random action is chosen with probability epsilon and the maximum value action otherwise.

Parameters:
  • epsilon (float) – probability of selecting a random action

classmethod from_dict(data: dict) EpsilonGreedy[source]#

Reconstruct the decision function from a dictionary. Child classes should override this if they have custom parameters.

Parameters:

data (dict) – dictionary containing parameter values

Returns:

Decision function object

Return type:

EpsilonGreedy

select_action(action_values: Tensor) Tensor[source]#

Epsilon-greedy policy acts greedily with probability \(1 - \epsilon\) and otherwise samples randomly from all actions.

Let \(b\) be a random number drawn uniformly from \([0, 1)\). If \(b > \epsilon\), use Greedy; otherwise choose randomly from among all actions.

Parameters:

action_values (torch.Tensor) – Tensor of action values.

Returns:

Selected action index.

Return type:

torch.Tensor
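The rule above can be sketched in a few lines. This is an illustration under stated assumptions, not the library's code: `epsilon_greedy` and `rng` are hypothetical names, nested Python lists stand in for the `(# env, # actions)` tensor, and drawing a separate \(b\) for each environment row is an assumption.

```python
import random


def epsilon_greedy(action_values, epsilon, rng=random):
    """Select one action per environment row.

    Hypothetical helper: action_values is a list of rows of Q-values and the
    result has shape (# env, 1) as nested lists.
    """
    selected = []
    for row in action_values:
        b = rng.random()  # uniform draw in [0, 1), one per environment (assumption)
        if b > epsilon:
            # Greedy branch: highest value, ties broken at random.
            best = max(row)
            candidates = [i for i, q in enumerate(row) if q == best]
            action = rng.choice(candidates)
        else:
            # Exploration branch: any action, including the greedy one.
            action = rng.randrange(len(row))
        selected.append([action])
    return selected
```

Because the exploration branch can also land on the greedy action, the greedy action is chosen slightly more often than \(1 - \epsilon\) of the time.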

set_parameter(name: str, value: Any) None#

Sets a named parameter in the decision function.

Parameters:
  • name (str) – name of the parameter

  • value (Any) – value to set

to_dict() dict[source]#

Serialize the decision function to a dictionary. Child classes should override this if they have custom parameters.

Returns:

dictionary containing class type and parameter values

Return type:

dict

class prt_rl.common.decision_functions.Greedy[source]#

Greedy policy chooses the action with the highest value.

\[A_t \equiv \operatorname*{argmax}_a Q_t(a)\]

Notes

If multiple actions share the same maximum value, one of them is chosen at random.

Parameters:

action_values (torch.Tensor) – 1D tensor of state-action values.

Returns:

Selected action index.

Return type:

torch.Tensor
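Greedy selection with random tie-breaking, as described in the note above, might look like the following sketch. The function name `greedy_with_random_ties` and the `rng` argument are hypothetical, nested lists stand in for tensors, and choosing uniformly among tied actions is an assumption about what "randomly" means here.

```python
import random


def greedy_with_random_ties(action_values, rng=random):
    """Pick the argmax of each row, choosing at random among tied maxima.

    Hypothetical helper: returns shape (# env, 1) as nested lists.
    """
    selected = []
    for row in action_values:
        best = max(row)
        # Collect every index that attains the maximum value.
        ties = [i for i, q in enumerate(row) if q == best]
        selected.append([rng.choice(ties)])
    return selected
```

A plain `argmax` always returns the first maximal index; sampling among ties avoids systematically favoring low-numbered actions when values are initialized equal.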

classmethod from_dict(data: dict) DecisionFunction#

Reconstruct the decision function from a dictionary. Child classes should override this if they have custom parameters.

Parameters:

data (dict) – dictionary containing parameter values

Returns:

Decision function object

Return type:

DecisionFunction

select_action(action_values: Tensor) Tensor[source]#

Selects an action from a vector of q values.

Parameters:

action_values (torch.Tensor) – tensor of q values with shape (# environments, # actions)

Returns:

tensor of selected actions with shape (# environments, 1)

Return type:

torch.Tensor

set_parameter(name: str, value: Any) None#

Sets a named parameter in the decision function.

Parameters:
  • name (str) – name of the parameter

  • value (Any) – value to set

to_dict() dict#

Serialize the decision function to a dictionary. Child classes should override this if they have custom parameters.

Returns:

dictionary containing class type and parameter values

Return type:

dict

class prt_rl.common.decision_functions.Softmax(tau: float)[source]#

Softmax (Boltzmann) action selection with temperature tau.

classmethod from_dict(data: dict) Softmax[source]#

Reconstruct the decision function from a dictionary. Child classes should override this if they have custom parameters.

Parameters:

data (dict) – dictionary containing parameter values

Returns:

Decision function object

Return type:

Softmax

select_action(action_values: Tensor) Tensor[source]#

Softmax policy selects an action probabilistically from a Boltzmann (or Gibbs) distribution over the action values, so higher-valued actions are more likely to be chosen.

Parameters:
  • action_values (torch.Tensor) – 1D tensor of action values.

  • tau (float) – Temperature parameter controlling exploration, set in the constructor.

Returns:

Selected action index.

Return type:

torch.Tensor
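A Boltzmann distribution over action values is commonly written as \(P(a) = e^{Q(a)/\tau} / \sum_b e^{Q(b)/\tau}\). The sketch below assumes that standard form; whether the library uses exactly this expression is not stated here. The names `boltzmann_pmf`, `softmax_select`, and `rng` are hypothetical, and nested lists stand in for tensors.

```python
import math
import random


def boltzmann_pmf(row, tau):
    """Return softmax probabilities of one row of action values at temperature tau."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    # Subtracting the maximum leaves the result unchanged but avoids overflow.
    m = max(row)
    weights = [math.exp((q - m) / tau) for q in row]
    total = sum(weights)
    return [w / total for w in weights]


def softmax_select(action_values, tau, rng=random):
    """Sample one action per environment row; returns shape (# env, 1) as nested lists."""
    selected = []
    for row in action_values:
        pmf = boltzmann_pmf(row, tau)
        action = rng.choices(range(len(row)), weights=pmf, k=1)[0]
        selected.append([action])
    return selected
```

A small `tau` concentrates probability on the highest-valued action (close to greedy), while a large `tau` flattens the distribution toward uniform random selection.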

set_parameter(name: str, value: Any) None#

Sets a named parameter in the decision function.

Parameters:
  • name (str) – name of the parameter

  • value (Any) – value to set

to_dict() dict[source]#

Serialize the decision function to a dictionary. Child classes should override this if they have custom parameters.

Returns:

dictionary containing class type and parameter values

Return type:

dict

class prt_rl.common.decision_functions.UpperConfidenceBound(c: float, t: float)[source]#

classmethod from_dict(data: dict) UpperConfidenceBound[source]#

Reconstruct the decision function from a dictionary. Child classes should override this if they have custom parameters.

Parameters:

data (dict) – dictionary containing parameter values

Returns:

Decision function object

Return type:

UpperConfidenceBound

select_action(action_values: Tensor, action_selections: Tensor | None = None) Tensor[source]#

Upper Confidence Bound selects among the non-greedy actions based on their potential for being optimal.

\[A_t \equiv \operatorname*{argmax}_a \left[ Q_t(a) + c\sqrt{\frac{\ln t}{N_t(a)}} \right]\]

Args:

action_values (torch.Tensor): 1D tensor of action values.

action_selections (torch.Tensor): 1D tensor of the number of times each action has been selected.

c (float): Constant controlling degree of exploration.

t (int): Current time step.

Returns:

torch.Tensor: Selected action index.
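The UCB score \(Q_t(a) + c\sqrt{\ln t / N_t(a)}\) can be computed per action as in the sketch below. It is an illustration only: `ucb_select` is a hypothetical name, plain lists stand in for the 1D tensors, and treating an action with \(N_t(a) = 0\) as having an infinite score (so it is tried first) is an assumption borrowed from the usual textbook convention, not confirmed behavior of this class.

```python
import math


def ucb_select(action_values, action_selections, c, t):
    """Return the index maximizing Q(a) + c * sqrt(ln(t) / N(a)).

    Hypothetical helper: inputs are plain lists of equal length.
    """
    if t < 1:
        raise ValueError("t must be at least 1 so that ln(t) is non-negative")
    scores = []
    for q, n in zip(action_values, action_selections):
        if n == 0:
            # Assumption: an untried action is treated as maximally uncertain.
            scores.append(math.inf)
        else:
            scores.append(q + c * math.sqrt(math.log(t) / n))
    # index() returns the first maximal score, so ties go to the lowest index.
    return scores.index(max(scores))
```

With `c = 0` the exploration bonus vanishes and the rule reduces to greedy selection over actions that have been tried at least once; larger `c` favors rarely selected actions.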

set_parameter(name: str, value: Any) None#

Sets a named parameter in the decision function.

Parameters:
  • name (str) – name of the parameter

  • value (Any) – value to set

to_dict() dict[source]#

Serialize the decision function to a dictionary. Child classes should override this if they have custom parameters.

Returns:

dictionary containing class type and parameter values

Return type:

dict