model_free

a2c: Implementation of the Advantage Actor-Critic (A2C) algorithm.
dqn: Deep Q-Network (DQN) agents.
policy_gradient: Policy Gradient algorithm.
ppo: Proximal Policy Optimization (PPO).
sac: Soft Actor-Critic (SAC).
td3: Twin Delayed Deep Deterministic Policy Gradient (TD3).