decision_functions
Functions
- prt_rl.common.decision_functions.stochastic_selection(action_pmf: Tensor) → Tensor
Perform a stochastic selection of an action based on a given PMF.
Samples \(\pi(a \mid s) \rightarrow a\)
- Args:
- action_pmf (torch.Tensor): 1D tensor containing probabilities for each action.
Must sum to 1 and have non-negative values.
- Returns:
torch.Tensor: The index of the selected action.
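As a rough illustration of what sampling an action from a PMF involves, the following plain-Python sketch draws an index by walking the cumulative distribution. It is not the library's implementation: the helper name `sample_from_pmf` is hypothetical, and it uses Python lists and the standard `random` module rather than torch tensors.

```python
import random
from itertools import accumulate


def sample_from_pmf(pmf, rng=random):
    """Return index i with probability pmf[i] (hypothetical helper)."""
    if any(p < 0 for p in pmf):
        raise ValueError("probabilities must be non-negative")
    if abs(sum(pmf) - 1.0) > 1e-6:
        raise ValueError("probabilities must sum to 1")
    u = rng.random()  # uniform draw in [0, 1)
    # Walk the cumulative distribution until it exceeds the draw.
    for index, cumulative in enumerate(accumulate(pmf)):
        if u < cumulative:
            return index
    # Guard against floating-point round-off in the cumulative sum.
    return max(i for i, p in enumerate(pmf) if p > 0)
```

With torch tensors, `torch.multinomial(action_pmf, num_samples=1)` draws an index in a comparable way; whether `stochastic_selection` uses it is not stated here.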
Classes
- DecisionFunction: A decision function takes in the state-action values from a Q function and returns a selected action.
- EpsilonGreedy: Epsilon-greedy is a soft policy version of greedy action selection, where a random action is chosen with probability epsilon and the maximum value action otherwise.
- Greedy: Greedy policy chooses the action with the highest value.
- Softmax: Soft-max
- UpperConfidenceBound
- class prt_rl.common.decision_functions.DecisionFunction
A decision function takes in the state-action values from a Q function and returns a selected action.
Input: Tensor of action values with shape (# env, # action values)
Output: Tensor of selected actions with shape (# env, 1)
- classmethod from_dict(data: dict) → DecisionFunction
Reconstruct the decision function from a dictionary. Child classes should override this if they have custom parameters.
- Parameters:
data (dict) – dictionary containing parameter values
- Returns:
Decision function object
- Return type:
DecisionFunction
- abstractmethod select_action(action_values: Tensor) → Tensor
Selects an action from a vector of q values.
- Parameters:
action_values (torch.Tensor) – tensor of q values with shape (# environments, # actions)
- Returns:
tensor of selected actions with shape (# environments, 1)
- Return type:
torch.Tensor
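The interface described above (an abstract `select_action` plus a `from_dict` constructor that subclasses override when they carry parameters) can be sketched roughly as follows. This is a plain-Python stand-in, not the library's code: the class names `SketchDecisionFunction`, `FirstMax`, and `ConstantAction`, and the dictionary key `"action"`, are all hypothetical, and nested lists take the place of tensors of shape (# environments, # actions).

```python
from abc import ABC, abstractmethod


class SketchDecisionFunction(ABC):
    """Hypothetical stand-in for the DecisionFunction interface."""

    @classmethod
    def from_dict(cls, data: dict):
        # Default: no custom parameters, so the dictionary is not consulted.
        return cls()

    @abstractmethod
    def select_action(self, action_values):
        """Map each row of action values to a row holding one action index."""


class FirstMax(SketchDecisionFunction):
    """Toy subclass without parameters: first maximal action per row."""

    def select_action(self, action_values):
        return [[row.index(max(row))] for row in action_values]


class ConstantAction(SketchDecisionFunction):
    """Toy subclass with a parameter, so it overrides from_dict."""

    def __init__(self, action: int):
        self.action = action

    @classmethod
    def from_dict(cls, data: dict):
        # The key name "action" is an assumption made for this sketch.
        return cls(action=data["action"])

    def select_action(self, action_values):
        return [[self.action] for _ in action_values]
```

The output keeps one row per environment and a single column, matching the (# environments, 1) shape given above.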
- class prt_rl.common.decision_functions.EpsilonGreedy(epsilon: float)
Epsilon-greedy is a soft policy version of greedy action selection, where a random action is chosen with probability epsilon and the maximum value action otherwise.
- Parameters:
epsilon (float) – probability of selecting a random action
- classmethod from_dict(data: dict) → EpsilonGreedy
Reconstruct the decision function from a dictionary. Child classes should override this if they have custom parameters.
- Parameters:
data (dict) – dictionary containing parameter values
- Returns:
Decision function object
- Return type:
EpsilonGreedy
- select_action(action_values: Tensor) → Tensor
Epsilon-greedy policy chooses the action with the highest value with probability \(1 - \epsilon\) and samples randomly from all actions with probability \(\epsilon\).
Let \(b\) be a random draw from \([0, 1)\). If \(b > \epsilon\), use Greedy; otherwise choose randomly from among all actions.
- Parameters:
action_values (torch.Tensor) – Tensor of action values.
- Returns:
Selected action index.
- Return type:
torch.Tensor
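The rule above can be sketched in plain Python as follows. This is an illustration under assumptions rather than the library's implementation: the function name `epsilon_greedy` is hypothetical, it handles a single flat list of action values instead of a batched tensor, and exploration is assumed to be uniform over all actions.

```python
import random


def epsilon_greedy(action_values, epsilon, rng=random):
    """Pick one action index from a flat list of action values."""
    b = rng.random()  # uniform draw in [0, 1)
    if b > epsilon:
        # Exploit: the first action with the highest value.
        return max(range(len(action_values)), key=action_values.__getitem__)
    # Explore: any action, the greedy one included.
    return rng.randrange(len(action_values))
```

Because the exploratory draw covers all actions, the greedy action is still selected some of the time during exploration.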
- class prt_rl.common.decision_functions.Greedy
Greedy policy chooses the action with the highest value.
\[A_t \equiv \arg\max_a Q_t(a)\]
Notes
If multiple actions share the same maximum value, one of them is chosen at random.
- Parameters:
action_values (torch.Tensor) – 1D tensor of state-action values.
- Returns:
Selected action index.
- Return type:
torch.Tensor
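Greedy selection with random tie-breaking, as described in the note above, might look like the following plain-Python sketch. The function name `greedy_with_ties` is hypothetical, the input is a flat list rather than a tensor, and choosing uniformly among the tied actions is an assumption.

```python
import random


def greedy_with_ties(action_values, rng=random):
    """Return the index of a maximal value, breaking ties at random."""
    best = max(action_values)
    # Collect every index that attains the maximum.
    candidates = [i for i, v in enumerate(action_values) if v == best]
    return rng.choice(candidates)
```

Random tie-breaking avoids a systematic bias toward low-index actions, which a plain argmax would introduce when several values are equal.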
- classmethod from_dict(data: dict) → DecisionFunction
Reconstruct the decision function from a dictionary. Child classes should override this if they have custom parameters.
- Parameters:
data (dict) – dictionary containing parameter values
- Returns:
Decision function object
- Return type:
DecisionFunction
- select_action(action_values: Tensor) → Tensor
Selects an action from a vector of q values.
- Parameters:
action_values (torch.Tensor) – tensor of q values with shape (# environments, # actions)
- Returns:
tensor of selected actions with shape (# environments, 1)
- Return type:
torch.Tensor
- class prt_rl.common.decision_functions.Softmax(tau: float)
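Soft-max action selection: an action is sampled from a Boltzmann (or Gibbs) distribution over the action values, with temperature tau.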
- classmethod from_dict(data: dict) → Softmax
Reconstruct the decision function from a dictionary. Child classes should override this if they have custom parameters.
- Parameters:
data (dict) – dictionary containing parameter values
- Returns:
Decision function object
- Return type:
Softmax
- select_action(action_values: Tensor) → Tensor
Softmax policy samples an action from a Boltzmann (or Gibbs) distribution over the action values, so actions with higher values are selected with higher probability.
- Parameters:
action_values (torch.Tensor) – 1D tensor of action values.
tau (float) – Temperature parameter controlling exploration.
- Returns:
Selected action index.
- Return type:
torch.Tensor
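To make the role of the temperature concrete, here is a plain-Python sketch that assumes the standard Boltzmann form \(P(a) \propto \exp(Q(a) / \tau)\). The function names `softmax_probabilities` and `softmax_select` are hypothetical, and the library's own computation may differ in detail.

```python
import math
import random


def softmax_probabilities(action_values, tau):
    """Boltzmann probabilities for a flat list of action values."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(action_values)
    weights = [math.exp((v - top) / tau) for v in action_values]
    total = sum(weights)
    return [w / total for w in weights]


def softmax_select(action_values, tau, rng=random):
    """Sample one action index according to the Boltzmann probabilities."""
    probs = softmax_probabilities(action_values, tau)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

A small tau concentrates the probability on the highest-valued action (close to greedy), while a large tau flattens the distribution toward uniform random selection.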
- class prt_rl.common.decision_functions.UpperConfidenceBound(c: float, t: float)
- classmethod from_dict(data: dict) → UpperConfidenceBound
Reconstruct the decision function from a dictionary. Child classes should override this if they have custom parameters.
- Parameters:
data (dict) – dictionary containing parameter values
- Returns:
Decision function object
- Return type:
UpperConfidenceBound
- select_action(action_values: Tensor, action_selections: Tensor | None = None) → Tensor
Upper Confidence Bound selects among the non-greedy actions based on their potential for being optimal.
\[A_t \equiv \arg\max_a \left[ Q_t(a) + c \sqrt{\frac{\ln t}{N_t(a)}} \right]\]
- Args:
- action_values (torch.Tensor): 1D tensor of action values.
- action_selections (torch.Tensor): 1D tensor of the number of times each action has been selected.
- c (float): Constant controlling degree of exploration.
- t (int): Current time step.
- Returns:
torch.Tensor: Selected action index.
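The selection rule \(Q_t(a) + c\sqrt{\ln t / N_t(a)}\) can be sketched in plain Python as follows. The function name `ucb_select` is hypothetical and the inputs are flat lists rather than tensors. Returning an untried action (a count of zero) first is a common convention assumed here, not something the documentation above states.

```python
import math


def ucb_select(action_values, action_selections, c, t):
    """Return the index maximizing Q(a) + c * sqrt(ln(t) / N(a))."""
    if t < 1:
        raise ValueError("t must be at least 1 so that ln(t) is non-negative")
    # Assumption: an action that was never tried is selected first,
    # since its exploration bonus would otherwise divide by zero.
    for index, count in enumerate(action_selections):
        if count == 0:
            return index
    scores = [
        q + c * math.sqrt(math.log(t) / n)
        for q, n in zip(action_values, action_selections)
    ]
    return max(range(len(scores)), key=scores.__getitem__)
```

With `c = 0` the bonus vanishes and the rule reduces to greedy selection; larger values of `c` favor actions that have been tried less often.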