model_free

a2c

Implementation of the Advantage Actor-Critic (A2C) algorithm.

dqn

Deep Q-Network (DQN) agents

policy_gradient

Policy gradient algorithm

ppo

Proximal Policy Optimization (PPO)

sac

Soft Actor-Critic (SAC)

td3

Twin Delayed Deep Deterministic Policy Gradient (TD3)