Stable Baselines Algorithms

Intro

Stable Baselines (Docs) is a cleaned-up and easier-to-use version of OpenAI's Baselines reinforcement learning algorithms. It supports multiple RL algorithms (PPO, DQN, etc.), each of which supports some subset of features. The docs, however, don't include a single table where you can see what all the algorithms support in one place. The table below shows them all at a glance, making it easier to decide which algorithms you can or can't use based on recurrence, continuous actions, multi-processing, and so on.
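As a minimal sketch of how the library is used (assuming stable-baselines 2.x, Gym, and the CartPole-v1 environment, none of which are specified above), training and querying an agent looks roughly like this:

```python
import gym

from stable_baselines import PPO2

# Create the environment and train a PPO2 agent with the default MLP policy.
env = gym.make('CartPole-v1')
model = PPO2('MlpPolicy', env, verbose=1)
model.learn(total_timesteps=25000)

# Query the trained policy for an action.
obs = env.reset()
action, _states = model.predict(obs)
```

The same pattern applies to the other algorithms below; the differences are in which policies, spaces, and multiprocessing schemes each one accepts.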

Algorithms

| Algorithm | Recurrent | Multi-Processing | Replay Buffer | Action Spaces | Observation Spaces |
|-----------|-----------|------------------|---------------|---------------|--------------------|
| A2C   | ✔️ | ✔️       |    | Discrete, Box, MultiDiscrete, MultiBinary | Discrete, Box, MultiDiscrete, MultiBinary |
| ACER  | ✔️ | ✔️       | ✔️ | Discrete                                  | Discrete, Box, MultiDiscrete, MultiBinary |
| ACKTR | ✔️ | ✔️       |    | Discrete                                  | Discrete, Box, MultiDiscrete, MultiBinary |
| DDPG  |    |          | ✔️ | Box                                       | Discrete, Box, MultiDiscrete, MultiBinary |
| DQN   |    |          | ✔️ | Discrete                                  | Discrete, Box, MultiDiscrete, MultiBinary |
| GAIL  | ✔️ | ✔️ (MPI) |    | Discrete                                  | Discrete, Box, MultiDiscrete, MultiBinary |
| PPO1  | ✔️ | ✔️ (MPI) |    | Discrete, Box, MultiDiscrete, MultiBinary | Discrete, Box, MultiDiscrete, MultiBinary |
| PPO2  | ✔️ | ✔️       |    | Discrete, Box, MultiDiscrete, MultiBinary | Discrete, Box, MultiDiscrete, MultiBinary |
| SAC   |    |          | ✔️ | Box                                       | Discrete, Box, MultiDiscrete, MultiBinary |
| TRPO  | ✔️ | ✔️ (MPI) |    | Discrete, Box, MultiDiscrete, MultiBinary | Discrete, Box, MultiDiscrete, MultiBinary |
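To make the Multi-Processing column concrete: for PPO2 (and A2C), multiprocessing means stepping several copies of the environment in parallel through a vectorized wrapper rather than through MPI. A rough sketch, assuming stable-baselines 2.x, Gym, and four copies of CartPole-v1 (my choice of environment and worker count, not something from the table):

```python
import gym

from stable_baselines import PPO2
from stable_baselines.common.vec_env import SubprocVecEnv

def make_env():
    return gym.make('CartPole-v1')

if __name__ == '__main__':
    # PPO2/A2C parallelise by stepping several environment copies at once,
    # rather than via MPI as PPO1/TRPO/GAIL do.
    env = SubprocVecEnv([make_env for _ in range(4)])
    model = PPO2('MlpPolicy', env, verbose=1)
    model.learn(total_timesteps=25000)
```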

Notes

  1. DDPG does not support stable_baselines.common.policies because it uses Q-value estimation instead of value estimation (see the sketch after this list).
  2. DQN does not support stable_baselines.common.policies.
  3. PPO2 is the implementation OpenAI made to run on GPU. For multiprocessing it uses vectorized environments, whereas PPO1 uses MPI.
  4. SAC does not support stable_baselines.common.policies because it uses double Q-value estimation alongside value estimation.
  5. HER (Hindsight Experience Replay) has not been refactored yet.
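The practical upshot of notes 1, 2, and 4 is that DDPG, DQN, and SAC each ship their own policies module instead of the shared stable_baselines.common.policies. A small sketch for SAC, assuming stable-baselines 2.x and the Pendulum-v0 Gym environment (chosen here only because it has a Box action space):

```python
import gym

from stable_baselines import SAC
# SAC (like DDPG and DQN) uses its own policies module rather than
# stable_baselines.common.policies.
from stable_baselines.sac.policies import MlpPolicy

env = gym.make('Pendulum-v0')  # Box action space, which SAC requires
model = SAC(MlpPolicy, env, verbose=1)
model.learn(total_timesteps=10000)
```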
Edit 1: added the Replay Buffer column.