dqn#

Deep Q-Network (DQN) Agents

Classes#

DQNAgent

Deep Q-Network (DQN) agent for reinforcement learning.

DQNConfig

Hyperparameters for the DQN agent.

DQNPolicy

DQN Policy that outputs Q-values for each action given a state.

DoubleDQNAgent

Double DQN agent for reinforcement learning.

class prt_rl.model_free.dqn.DQNAgent(policy: DQNPolicy, config: DQNConfig = DQNConfig(buffer_size=1000000, min_buffer_size=10000, mini_batch_size=32, learning_rate=0.1, gamma=0.99, max_grad_norm=None, target_update_freq=1, polyak_tau=None, train_freq=1, gradient_steps=1), *, device: str = 'cpu')[source]#

Deep Q-Network (DQN) agent for reinforcement learning.

Parameters:

policy (DQNPolicy) – Policy that outputs Q-values for each action given a state.
config (DQNConfig, optional) – Hyperparameters for the agent, including the learning rate, discount factor, replay buffer sizes, and target network update settings. Defaults to DQNConfig() with the default values shown in the signature.
device (str, optional) – Device for computation (‘cpu’ or ‘cuda’). Defaults to ‘cpu’.

References:

[1] https://openai.com/index/openai-baselines-dqn/
[2] openai/baselines
[3] Mnih et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529-533.
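For orientation, the one-step target used by standard DQN (Mnih et al., 2015) can be sketched in plain Python. This is a minimal illustration of the textbook formula, not the code used by DQNAgent; the function name `dqn_targets` and its argument layout are hypothetical.

```python
from typing import List, Sequence


def dqn_targets(
    rewards: Sequence[float],
    dones: Sequence[bool],
    next_target_q: Sequence[Sequence[float]],
    gamma: float = 0.99,
) -> List[float]:
    """Compute y_i = r_i + gamma * max_a Q_target(s'_i, a) for each transition.

    Hypothetical helper: `next_target_q[i]` holds the target network's
    Q-values for every action in the next state of transition i.
    """
    targets = []
    for reward, done, q_next in zip(rewards, dones, next_target_q):
        # Terminal transitions do not bootstrap from the next state.
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(reward + bootstrap)
    return targets


# Two transitions: the second one ends the episode.
batch_targets = dqn_targets(
    rewards=[1.0, 0.0],
    dones=[False, True],
    next_target_q=[[0.5, 2.0], [3.0, 1.0]],
    gamma=0.5,
)
```

The online network is then regressed toward these targets on each mini-batch, typically with a squared or Huber loss.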

act(obs: Tensor, deterministic: bool = False) → Tensor[source]#

Perform an action based on the current state.

Parameters:

obs (torch.Tensor) – The current observation of the environment.
deterministic (bool) – If True, the agent will select actions deterministically.

Returns:

The action to be taken.

Return type:

torch.Tensor
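As a rough illustration of what the deterministic flag might control, the sketch below assumes an epsilon-greedy decision function over a single state's Q-values. The helper `epsilon_greedy` is hypothetical and works on plain Python sequences rather than tensors; the library's own decision function may behave differently.

```python
import random
from typing import Optional, Sequence


def epsilon_greedy(
    q_values: Sequence[float],
    epsilon: float = 0.1,
    deterministic: bool = False,
    rng: Optional[random.Random] = None,
) -> int:
    """Return an action index chosen epsilon-greedily from `q_values`.

    Assumption: deterministic=True disables exploration and always
    returns the greedy (highest Q-value) action.
    """
    source = rng if rng is not None else random
    greedy_action = max(range(len(q_values)), key=lambda a: q_values[a])
    if deterministic or source.random() >= epsilon:
        return greedy_action
    # Explore: pick uniformly among all actions.
    return source.randrange(len(q_values))


greedy = epsilon_greedy([0.1, 0.9, 0.3], deterministic=True)
explored = epsilon_greedy([0.1, 0.9, 0.3], epsilon=1.0, rng=random.Random(0))
```

Passing a seeded `random.Random` keeps the exploratory branch reproducible in tests.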

classmethod load(path: str | Path, map_location: str | device = 'cpu') → DQNAgent[source]#: Loads the checkpoint and returns a fully-constructed DQNAgent.

train(env: EnvironmentInterface, total_steps: int, schedulers: List[ParameterScheduler] | None = None, logger: Logger | None = None, evaluator: Evaluator = <prt_rl.common.evaluators.Evaluator object>, show_progress: bool = True) → None[source]#

Train the DQN agent.

Parameters:

env (EnvironmentInterface) – The environment to train on.
total_steps (int) – Total number of steps to train the agent.
schedulers (List[ParameterScheduler], optional) – List of schedulers to update during training. Defaults to None.
logger (Logger, optional) – Logger to log training metrics. Defaults to None.
evaluator (Evaluator) – Evaluator to evaluate the agent periodically.
show_progress (bool) – If True, show a progress bar during training.
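The sketch below shows one plausible way the min_buffer_size, train_freq, and gradient_steps hyperparameters could gate updates over total_steps environment steps. It is an assumption about how these settings interact, not a description of the actual training loop; `update_schedule` is a hypothetical helper, and it ignores the replay buffer's capacity limit.

```python
from typing import List, Tuple


def update_schedule(
    total_steps: int,
    min_buffer_size: int,
    train_freq: int = 1,
    gradient_steps: int = 1,
) -> List[Tuple[int, int]]:
    """Return (env_step, num_gradient_updates) for each step that trains.

    Assumed behavior: one transition is stored per environment step, no
    update happens until the buffer holds `min_buffer_size` transitions,
    and afterwards every `train_freq`-th step runs `gradient_steps` updates.
    """
    schedule = []
    buffer_len = 0
    for step in range(1, total_steps + 1):
        buffer_len += 1  # store the transition collected at this step
        if buffer_len < min_buffer_size:
            continue  # warm-up phase: collect experience only
        if step % train_freq == 0:
            schedule.append((step, gradient_steps))
    return schedule


plan = update_schedule(total_steps=10, min_buffer_size=4, train_freq=2, gradient_steps=3)
```

With these toy values, updates start at step 4 and recur every second step, three gradient updates each time.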

class prt_rl.model_free.dqn.DQNConfig(buffer_size: int = 1000000, min_buffer_size: int = 10000, mini_batch_size: int = 32, learning_rate: float = 0.1, gamma: float = 0.99, max_grad_norm: float | None = None, target_update_freq: int = 1, polyak_tau: float | None = None, train_freq: int = 1, gradient_steps: int = 1)[source]#

Hyperparameters for the DQN agent.

Parameters:

buffer_size (int) – Size of the replay buffer. Default is 1_000_000.
min_buffer_size (int) – Minimum size of the replay buffer before training. Default is 10_000.
mini_batch_size (int) – Size of the mini-batch for training. Default is 32.
learning_rate (float) – Learning rate for the optimizer. Default is 0.1.
gamma (float) – Discount factor for future rewards. Default is 0.99.
max_grad_norm (float) – Maximum gradient norm for gradient clipping. Default is None.
target_update_freq (int) – Frequency of target network updates. Default is 1.
polyak_tau (float) – Polyak averaging coefficient for target network updates. Default is None.
train_freq (int) – Frequency of training steps. Default is 1.
gradient_steps (int) – Number of gradient steps per training iteration. Default is 1.
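To make the two target network settings concrete, here is a minimal sketch of a hard copy versus a Polyak (soft) update on flat lists of weights. How DQNAgent actually combines target_update_freq and polyak_tau is not stated above, so the gating below is an assumption, and `update_target` is a hypothetical helper.

```python
from typing import List, Optional


def update_target(
    online: List[float],
    target: List[float],
    step: int,
    target_update_freq: int = 1,
    polyak_tau: Optional[float] = None,
) -> List[float]:
    """Return the new target weights after training step `step`.

    Assumed behavior: the target is refreshed every `target_update_freq`
    steps; with polyak_tau=None it is a hard copy of the online weights,
    otherwise target <- tau * online + (1 - tau) * target.
    """
    if step % target_update_freq != 0:
        return list(target)  # not an update step: keep the old target
    if polyak_tau is None:
        return list(online)  # hard update
    return [
        polyak_tau * w_online + (1.0 - polyak_tau) * w_target
        for w_online, w_target in zip(online, target)
    ]


hard = update_target([1.0, 2.0], [0.0, 0.0], step=1)
soft = update_target([1.0, 2.0], [0.0, 0.0], step=1, polyak_tau=0.5)
```

A small polyak_tau moves the target slowly toward the online network, while a hard copy at a low frequency holds it fixed between refreshes.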

class prt_rl.model_free.dqn.DQNPolicy(network: Module, decision_function: DecisionFunction)[source]#

DQN Policy that outputs Q-values for each action given a state.

Parameters:

network (nn.Module) – Neural network that processes the input state and outputs a latent representation.
decision_function (DecisionFunction) – Decision function used to select an action from the Q-values.

act(obs: Tensor, deterministic: bool = False) → Tuple[Tensor, Dict[str, Tensor]][source]#: Return an action tensor and auxiliary policy outputs.

get_q_values(obs: Tensor) → Tensor[source]#: Return the Q-values for the given observation.

get_target_q_values(obs: Tensor) → Tensor[source]#: Return the Q-values from the target network for the given observation.

metadata() → Dict[str, Any]#: Optionally save metadata alongside the policy. This is a no-op in the base class but can be overridden by subclasses.

class prt_rl.model_free.dqn.DoubleDQNAgent(policy: DQNPolicy, config: DQNConfig = DQNConfig(buffer_size=1000000, min_buffer_size=10000, mini_batch_size=32, learning_rate=0.1, gamma=0.99, max_grad_norm=None, target_update_freq=1, polyak_tau=None, train_freq=1, gradient_steps=1), *, device: str = 'cpu')[source]#

Double DQN agent for reinforcement learning.

Parameters:

policy (DQNPolicy) – Policy that outputs Q-values for each action given a state.
config (DQNConfig, optional) – Hyperparameters for the agent, including the learning rate, discount factor, replay buffer sizes, and target network update settings. Defaults to DQNConfig() with the default values shown in the signature.
device (str, optional) – Device for computation (‘cpu’ or ‘cuda’). Defaults to ‘cpu’.

References: [1] Curt-Park/rainbow-is-all-you-need
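The difference from standard DQN lies in how the bootstrap value is chosen: Double DQN selects the next action with the online network and evaluates it with the target network. The sketch below illustrates that textbook rule in plain Python; `double_dqn_targets` is a hypothetical helper, not the code used by DoubleDQNAgent.

```python
from typing import List, Sequence


def double_dqn_targets(
    rewards: Sequence[float],
    dones: Sequence[bool],
    next_online_q: Sequence[Sequence[float]],
    next_target_q: Sequence[Sequence[float]],
    gamma: float = 0.99,
) -> List[float]:
    """Compute y_i = r_i + gamma * Q_target(s'_i, argmax_a Q_online(s'_i, a))."""
    targets = []
    for reward, done, q_online, q_target in zip(
        rewards, dones, next_online_q, next_target_q
    ):
        if done:
            targets.append(reward)  # terminal: no bootstrap term
            continue
        # Action selection uses the online network ...
        best_action = max(range(len(q_online)), key=lambda a: q_online[a])
        # ... while its value is read from the target network.
        targets.append(reward + gamma * q_target[best_action])
    return targets


# The online network prefers action 1; the target network overrates action 0.
y = double_dqn_targets([1.0], [False], [[0.2, 0.8]], [[5.0, 1.0]], gamma=0.5)
```

In this toy case a standard DQN target would bootstrap from the overrated value 5.0, whereas the Double DQN target uses the target network's value for the action the online network picked.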

act(obs: Tensor, deterministic: bool = False) → Tensor#

Perform an action based on the current state.

Parameters:

obs (torch.Tensor) – The current observation of the environment.
deterministic (bool) – If True, the agent will select actions deterministically.

Returns:

The action to be taken.

Return type:

torch.Tensor

classmethod load(path: str | Path, map_location: str | device = 'cpu') → DQNAgent#: Loads the checkpoint and returns a fully-constructed DQNAgent.

train(env: EnvironmentInterface, total_steps: int, schedulers: List[ParameterScheduler] | None = None, logger: Logger | None = None, evaluator: Evaluator = <prt_rl.common.evaluators.Evaluator object>, show_progress: bool = True) → None#

Train the DQN agent.

Parameters:

env (EnvironmentInterface) – The environment to train on.
total_steps (int) – Total number of steps to train the agent.
schedulers (List[ParameterScheduler], optional) – List of schedulers to update during training. Defaults to None.
logger (Logger, optional) – Logger to log training metrics. Defaults to None.
evaluator (Evaluator) – Evaluator to evaluate the agent periodically.
show_progress (bool) – If True, show a progress bar during training.
