Stable Baselines Algorithms

Intro

Stable Baselines (Docs) is a cleaned-up and easier-to-use version of OpenAI's Baselines reinforcement learning algorithms. It supports multiple RL algorithms (PPO, DQN, etc.), each of which supports some subset of features. The docs, however, don't include a single table where you can see what all the algorithms support in one place. The table below shows them all at a glance, making it easier to decide which algorithms you can or can't use based on recurrence, continuous actions, multi-processing, and so on.
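As a minimal sketch of how the library is used (assuming stable-baselines 2.x, Gym, and the CartPole-v1 environment, none of which are specified above), training and querying an agent looks roughly like this:

```python
import gym

from stable_baselines import PPO2

# Create the environment and train a PPO2 agent with the default MLP policy.
env = gym.make('CartPole-v1')
model = PPO2('MlpPolicy', env, verbose=1)
model.learn(total_timesteps=25000)

# Query the trained policy for an action.
obs = env.reset()
action, _states = model.predict(obs)
```

The same pattern applies to the other algorithms below; the differences are in which policies, spaces, and multiprocessing schemes each one accepts.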

Algorithms

| Algorithm | Recurrent | Multi-Processing | Replay Buffer | Action Spaces | Observation Spaces |
|-----------|-----------|------------------|---------------|---------------|--------------------|
| A2C   | ✔️ | ✔️       |    | Discrete, Box, MultiDiscrete, MultiBinary | Discrete, Box, MultiDiscrete, MultiBinary |
| ACER  | ✔️ | ✔️       | ✔️ | Discrete                                  | Discrete, Box, MultiDiscrete, MultiBinary |
| ACKTR | ✔️ | ✔️       |    | Discrete                                  | Discrete, Box, MultiDiscrete, MultiBinary |
| DDPG  |    |          | ✔️ | Box                                       | Discrete, Box, MultiDiscrete, MultiBinary |
| DQN   |    |          | ✔️ | Discrete                                  | Discrete, Box, MultiDiscrete, MultiBinary |
| GAIL  | ✔️ | ✔️ (MPI) |    | Discrete                                  | Discrete, Box, MultiDiscrete, MultiBinary |
| PPO1  | ✔️ | ✔️ (MPI) |    | Discrete, Box, MultiDiscrete, MultiBinary | Discrete, Box, MultiDiscrete, MultiBinary |
| PPO2  | ✔️ | ✔️       |    | Discrete, Box, MultiDiscrete, MultiBinary | Discrete, Box, MultiDiscrete, MultiBinary |
| SAC   |    |          | ✔️ | Box                                       | Discrete, Box, MultiDiscrete, MultiBinary |
| TRPO  | ✔️ | ✔️ (MPI) |    | Discrete, Box, MultiDiscrete, MultiBinary | Discrete, Box, MultiDiscrete, MultiBinary |
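To make the Multi-Processing column concrete: for PPO2 (and A2C), multiprocessing means stepping several copies of the environment in parallel through a vectorized wrapper rather than through MPI. A rough sketch, assuming stable-baselines 2.x, Gym, and four copies of CartPole-v1 (my choice of environment and worker count, not something from the table):

```python
import gym

from stable_baselines import PPO2
from stable_baselines.common.vec_env import SubprocVecEnv

def make_env():
    return gym.make('CartPole-v1')

if __name__ == '__main__':
    # PPO2/A2C parallelise by stepping several environment copies at once,
    # rather than via MPI as PPO1/TRPO/GAIL do.
    env = SubprocVecEnv([make_env for _ in range(4)])
    model = PPO2('MlpPolicy', env, verbose=1)
    model.learn(total_timesteps=25000)
```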

Notes

  1. DDPG does not support stable_baselines.common.policies because it uses Q-value estimation instead of value estimation (see the sketch after this list).
  2. DQN does not support stable_baselines.common.policies.
  3. PPO2 is the implementation OpenAI made to run on GPU. For multiprocessing it uses vectorized environments, whereas PPO1 uses MPI.
  4. SAC does not support stable_baselines.common.policies because it uses double Q-value estimation alongside value estimation.
  5. HER (Hindsight Experience Replay) has not been refactored yet.
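The practical upshot of notes 1, 2, and 4 is that DDPG, DQN, and SAC each ship their own policies module instead of the shared stable_baselines.common.policies. A small sketch for SAC, assuming stable-baselines 2.x and the Pendulum-v0 Gym environment (chosen here only because it has a Box action space):

```python
import gym

from stable_baselines import SAC
# SAC (like DDPG and DQN) uses its own policies module rather than
# stable_baselines.common.policies.
from stable_baselines.sac.policies import MlpPolicy

env = gym.make('Pendulum-v0')  # Box action space, which SAC requires
model = SAC(MlpPolicy, env, verbose=1)
model.learn(total_timesteps=10000)
```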
Edit 1: added the Replay Buffer column.