decision_functions

Functions

prt_rl.common.decision_functions.stochastic_selection(action_pmf: Tensor) → Tensor

Perform a stochastic selection of an action based on a given PMF.

Samples \(\pi(a \mid s) \rightarrow a\).

Parameters:: action_pmf (torch.Tensor) – 1D tensor containing the probability of each action. Must sum to 1 and have non-negative values.
Returns:: The index of the selected action.
Return type:: torch.Tensor
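The sampling step can be sketched with `torch.multinomial`, which draws indices in proportion to the given weights. This is a minimal illustration, not the library's implementation; the name `stochastic_selection_sketch` and the explicit input checks are assumptions added for clarity.

```python
import torch


def stochastic_selection_sketch(action_pmf: torch.Tensor) -> torch.Tensor:
    """Sample one action index from a 1D PMF (hypothetical sketch, not prt_rl code)."""
    if action_pmf.dim() != 1:
        raise ValueError("action_pmf must be a 1D tensor")
    if bool((action_pmf < 0).any()):
        raise ValueError("action_pmf must be non-negative")
    if abs(action_pmf.sum().item() - 1.0) > 1e-6:
        raise ValueError("action_pmf must sum to 1")
    # torch.multinomial draws indices with probability proportional to the weights;
    # num_samples=1 returns a tensor of shape (1,).
    return torch.multinomial(action_pmf, num_samples=1)


pmf = torch.tensor([0.1, 0.0, 0.9])
action = stochastic_selection_sketch(pmf)  # index 0 or 2, never 1
```

Actions with zero probability are never drawn, so the PMF can also be used to mask out invalid actions.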

Classes

DecisionFunction

A decision function takes in the state-action values from a Q function and returns a selected action.

EpsilonGreedy

Epsilon-greedy is a soft policy version of greedy action selection, where a random action is chosen with probability epsilon and the maximum value action otherwise.

Greedy

Greedy policy chooses the action with the highest value.

Softmax

Softmax (Boltzmann) action selection with temperature tau.

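UpperConfidenceBound

Upper Confidence Bound selects among the non-greedy actions based on their potential for being optimal.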

class prt_rl.common.decision_functions.DecisionFunction

A decision function takes in the state-action values from a Q function and returns a selected action.

Input: Tensor of action values with shape (# env, # action values)

Output: Tensor of selected actions with shape (# env, 1)

classmethod from_dict(data: dict) → DecisionFunction

Reconstruct the decision function from a dictionary. Child classes should override this if they have custom parameters.

Parameters:: data (dict) – dictionary containing parameter values
Returns:: Decision function object
Return type:: DecisionFunction

abstractmethod select_action(action_values: Tensor) → Tensor

Selects an action for each environment from a tensor of Q-values.

Parameters:: action_values (torch.Tensor) – tensor of Q-values with shape (# environments, # actions)
Returns:: tensor of selected actions with shape (# environments, 1)
Return type:: torch.Tensor

set_parameter(name: str, value: Any) → None

Sets a named parameter in the decision function.

Parameters:

name (str) – name of the parameter
value (Any) – value to set

to_dict() → dict

Serialize the decision function to a dictionary. Child classes should override this if they have custom parameters.

Returns:: dictionary containing class type and parameter values
Return type:: dict
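The contract above can be sketched as a small abstract base class. Everything here is hypothetical: `DecisionFunctionSketch`, the name-based registry, the `{"type": ..., "parameters": ...}` dictionary layout, and the toy subclasses `ArgmaxSketch` and `ConstantActionSketch` are illustrative assumptions, not the prt_rl source. The sketch follows the documented shapes: (# env, # actions) in, (# env, 1) out.

```python
import abc
from typing import Any, Dict, Type

import torch


class DecisionFunctionSketch(abc.ABC):
    """Hypothetical stand-in for DecisionFunction; not the prt_rl implementation."""

    # Hypothetical registry so from_dict can find a subclass by its class name.
    registry: Dict[str, Type["DecisionFunctionSketch"]] = {}

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        DecisionFunctionSketch.registry[cls.__name__] = cls

    @abc.abstractmethod
    def select_action(self, action_values: torch.Tensor) -> torch.Tensor:
        """Map (num_envs, num_actions) values to (num_envs, 1) action indices."""

    def set_parameter(self, name: str, value: Any) -> None:
        # Only allow updating attributes that already exist on the object.
        if not hasattr(self, name):
            raise AttributeError(f"unknown parameter: {name}")
        setattr(self, name, value)

    def to_dict(self) -> dict:
        # Base version: no custom parameters, only the class name.
        return {"type": type(self).__name__, "parameters": {}}

    @classmethod
    def from_dict(cls, data: dict) -> "DecisionFunctionSketch":
        # Look up the subclass by name and pass the stored parameters to its constructor.
        subclass = DecisionFunctionSketch.registry[data["type"]]
        return subclass(**data["parameters"])


class ArgmaxSketch(DecisionFunctionSketch):
    """Parameterless toy subclass: the base to_dict/from_dict are enough."""

    def select_action(self, action_values: torch.Tensor) -> torch.Tensor:
        return torch.argmax(action_values, dim=1, keepdim=True)


class ConstantActionSketch(DecisionFunctionSketch):
    """Toy subclass with a custom parameter, so it overrides to_dict."""

    def __init__(self, action: int) -> None:
        self.action = action

    def select_action(self, action_values: torch.Tensor) -> torch.Tensor:
        num_envs = action_values.shape[0]
        return torch.full((num_envs, 1), self.action, dtype=torch.long)

    def to_dict(self) -> dict:
        data = super().to_dict()
        data["parameters"] = {"action": self.action}
        return data


values = torch.tensor([[0.1, 0.5, 0.2], [0.9, 0.0, 0.3]])  # 2 envs, 3 actions
greedy_actions = ArgmaxSketch().select_action(values)  # indices [[1], [0]], shape (2, 1)

const = ConstantActionSketch(action=2)
const.set_parameter("action", 1)
restored = DecisionFunctionSketch.from_dict(const.to_dict())
```

In this sketch a generic `from_dict` forwards the stored parameters to the subclass constructor, so a child only needs to override `to_dict` when it adds parameters; the real library's docstrings suggest overriding both.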

class prt_rl.common.decision_functions.EpsilonGreedy(epsilon: float)

Epsilon-greedy is a soft policy version of greedy action selection, where a random action is chosen with probability epsilon and the maximum value action otherwise.

Parameters:

epsilon (float) – probability of selecting a random action

classmethod from_dict(data: dict) → EpsilonGreedy

Reconstruct the decision function from a dictionary. Child classes should override this if they have custom parameters.

Parameters:: data (dict) – dictionary containing parameter values
Returns:: Decision function object
Return type:: DecisionFunction

select_action(action_values: Tensor) → Tensor

With probability epsilon, the epsilon-greedy policy selects an action at random from among all actions; otherwise it chooses the action with the highest value.

Draw \(b\) uniformly from \([0, 1)\). If \(b > \epsilon\), use Greedy; otherwise choose randomly from among all actions.

Parameters:: action_values (torch.Tensor) – Tensor of action values.
Returns:: Selected action index.
Return type:: torch.Tensor
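A vectorized sketch of the rule above, assuming the (# environments, # actions) input layout documented on `DecisionFunction.select_action` and one uniform draw \(b\) per environment. The function name and the optional `generator` argument are hypothetical. For brevity the greedy branch uses `torch.argmax`, which breaks ties by the lowest index rather than at random as `Greedy` does.

```python
from typing import Optional

import torch


def epsilon_greedy_sketch(
    action_values: torch.Tensor,
    epsilon: float,
    generator: Optional[torch.Generator] = None,
) -> torch.Tensor:
    """Epsilon-greedy over (num_envs, num_actions) values; returns (num_envs, 1) indices."""
    num_envs, num_actions = action_values.shape
    device = action_values.device
    # Greedy choice per environment (torch.argmax returns the first maximal index).
    greedy = torch.argmax(action_values, dim=1, keepdim=True)
    # A uniformly random action per environment, possibly equal to the greedy one.
    random_actions = torch.randint(
        0, num_actions, (num_envs, 1), generator=generator, device=device
    )
    # One draw b ~ U[0, 1) per environment; exploit when b > epsilon.
    b = torch.rand(num_envs, 1, generator=generator, device=device)
    return torch.where(b > epsilon, greedy, random_actions)


values = torch.tensor([[1.0, 5.0, 2.0], [3.0, 0.0, 1.0]])  # 2 envs, 3 actions
actions = epsilon_greedy_sketch(values, epsilon=0.1)  # shape (2, 1)
```

Because the random branch samples from all actions, the greedy action is actually chosen with probability slightly above \(1 - \epsilon\).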

set_parameter(name: str, value: Any) → None

Sets a named parameter in the decision function.

Parameters:

name (str) – name of the parameter
value (Any) – value to set

to_dict() → dict

Serialize the decision function to a dictionary. Child classes should override this if they have custom parameters.

Returns:: dictionary containing class type and parameter values
Return type:: dict

class prt_rl.common.decision_functions.Greedy

Greedy policy chooses the action with the highest value.

\[A_t \equiv \arg\max_a Q_t(a)\]

Notes

If multiple actions share the maximum value, one of them is chosen at random.

Parameters:: action_values (torch.Tensor) – 1D tensor of state-action values.
Returns:: Selected action index.
Return type:: torch.Tensor
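Random tie-breaking can be sketched by building a mask of the maximal entries and sampling from it with `torch.multinomial`. This is an illustrative sketch that assumes the (# environments, # actions) layout used by `select_action` and uniform sampling among tied actions; `greedy_random_ties_sketch` is a hypothetical name.

```python
import torch


def greedy_random_ties_sketch(action_values: torch.Tensor) -> torch.Tensor:
    """Greedy selection over (num_envs, num_actions) values with random tie-breaking."""
    # Row-wise maximum, kept as (num_envs, 1) so it broadcasts against the values.
    max_values = action_values.max(dim=1, keepdim=True).values
    # 1.0 where an action attains its row's maximum, 0.0 elsewhere.
    is_max = (action_values == max_values).float()
    # multinomial treats each row as unnormalized weights, so it samples
    # uniformly among the tied maxima; the result has shape (num_envs, 1).
    return torch.multinomial(is_max, num_samples=1)


values = torch.tensor([[1.0, 3.0, 3.0], [2.0, 0.0, -1.0]])
picks = greedy_random_ties_sketch(values)  # row 0 -> 1 or 2, row 1 -> 0
```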

classmethod from_dict(data: dict) → DecisionFunction

Reconstruct the decision function from a dictionary. Child classes should override this if they have custom parameters.

Parameters:: data (dict) – dictionary containing parameter values
Returns:: Decision function object
Return type:: DecisionFunction

select_action(action_values: Tensor) → Tensor

Selects an action for each environment from a tensor of Q-values.

Parameters:: action_values (torch.Tensor) – tensor of Q-values with shape (# environments, # actions)
Returns:: tensor of selected actions with shape (# environments, 1)
Return type:: torch.Tensor

set_parameter(name: str, value: Any) → None

Sets a named parameter in the decision function.

Parameters:

name (str) – name of the parameter
value (Any) – value to set

to_dict() → dict

Serialize the decision function to a dictionary. Child classes should override this if they have custom parameters.

Returns:: dictionary containing class type and parameter values
Return type:: dict

class prt_rl.common.decision_functions.Softmax(tau: float)

Softmax (Boltzmann) action selection with temperature tau.

classmethod from_dict(data: dict) → Softmax

Reconstruct the decision function from a dictionary. Child classes should override this if they have custom parameters.

Parameters:: data (dict) – dictionary containing parameter values
Returns:: Decision function object
Return type:: DecisionFunction

select_action(action_values: Tensor) → Tensor

Softmax policy samples an action from a Boltzmann (or Gibbs) distribution over the action values, so actions with higher values are more likely to be selected.

Parameters:

action_values (torch.Tensor) – 1D tensor of action values.
tau (float) – Temperature parameter controlling exploration (constructor argument).

Returns:: Selected action index.
Return type:: torch.Tensor
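Boltzmann selection draws action \(a\) with probability \(\exp(Q(a)/\tau) / \sum_b \exp(Q(b)/\tau)\). The sketch below assumes the (# environments, # actions) layout from `DecisionFunction` (the parameter list above mentions a 1D tensor); `softmax_select_sketch` is a hypothetical name, not the library's function.

```python
import torch


def softmax_select_sketch(action_values: torch.Tensor, tau: float) -> torch.Tensor:
    """Boltzmann selection over (num_envs, num_actions) values; returns (num_envs, 1)."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    # Subtracting each row's maximum leaves the probabilities unchanged
    # but keeps exp() from overflowing for large values or small tau.
    shifted = action_values - action_values.max(dim=1, keepdim=True).values
    probs = torch.softmax(shifted / tau, dim=1)
    # Sample one action per environment according to the Boltzmann probabilities.
    return torch.multinomial(probs, num_samples=1)


values = torch.tensor([[0.0, 1.0, 2.0]])
actions = softmax_select_sketch(values, tau=0.5)  # shape (1, 1)
```

A small tau concentrates probability on the highest-value action (approaching greedy selection), while a large tau approaches uniform random selection.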

set_parameter(name: str, value: Any) → None

Sets a named parameter in the decision function.

Parameters:

name (str) – name of the parameter
value (Any) – value to set

to_dict() → dict

Serialize the decision function to a dictionary. Child classes should override this if they have custom parameters.

Returns:: dictionary containing class type and parameter values
Return type:: dict

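class prt_rl.common.decision_functions.UpperConfidenceBound(c: float, t: float)

Upper Confidence Bound action selection, where c controls the degree of exploration and t is the current time step.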

classmethod from_dict(data: dict) → UpperConfidenceBound

Reconstruct the decision function from a dictionary. Child classes should override this if they have custom parameters.

Parameters:: data (dict) – dictionary containing parameter values
Returns:: Decision function object
Return type:: DecisionFunction

select_action(action_values: Tensor, action_selections: Tensor | None = None) → Tensor

Upper Confidence Bound selects among the non-greedy actions based on their potential for being optimal.

\[A_t \equiv \arg\max_a \left[ Q_t(a) + c\sqrt{\frac{\ln t}{N_t(a)}} \right]\]

Parameters:

action_values (torch.Tensor) – 1D tensor of action values.
action_selections (torch.Tensor) – 1D tensor of the number of times each action has been selected.
c (float) – Constant controlling the degree of exploration (constructor argument).
t (int) – Current time step (constructor argument).

Returns:: Selected action index.
Return type:: torch.Tensor
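A minimal sketch of the UCB rule for the 1D case described above. The name `ucb_select_sketch` is hypothetical, and the handling of untried actions (any action with \(N_t(a) = 0\) is treated as maximizing, following the common textbook convention) is an assumption, since the docstring does not say what the library does when a count is zero or `action_selections` is None.

```python
import math

import torch


def ucb_select_sketch(
    action_values: torch.Tensor,
    action_selections: torch.Tensor,
    c: float,
    t: int,
) -> torch.Tensor:
    """UCB over 1D values and selection counts; returns a 0-dim index tensor."""
    # Assumed convention: an action that was never selected is maximizing,
    # so return the first such action before evaluating the bonus.
    untried = torch.nonzero(action_selections == 0)
    if untried.numel() > 0:
        return untried[0, 0]
    # Exploration bonus c * sqrt(ln t / N_t(a)) for every action.
    bonus = c * torch.sqrt(math.log(t) / action_selections.float())
    return torch.argmax(action_values + bonus)


q = torch.tensor([1.0, 0.5, 0.2])
counts = torch.tensor([10, 1, 3])
choice = ucb_select_sketch(q, counts, c=2.0, t=14)  # rarely tried action 1 wins
```

With c = 0 the bonus vanishes and the rule reduces to greedy selection on \(Q_t(a)\).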

set_parameter(name: str, value: Any) → None

Sets a named parameter in the decision function.

Parameters:

name (str) – name of the parameter
value (Any) – value to set

to_dict() → dict

Serialize the decision function to a dictionary. Child classes should override this if they have custom parameters.

Returns:: dictionary containing class type and parameter values
Return type:: dict
