A2C, or Advantage Actor-Critic, is a synchronous variant of the A3C policy gradient method. Instead of A3C's asynchronous updates, A2C is a synchronous, deterministic implementation that waits for every actor to finish its segment of experience before updating, averaging over all of the actors. The larger resulting batch sizes make more effective use of GPUs.
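The synchronous structure can be sketched as follows — a toy Python example (hand-picked rewards and value estimates; real implementations such as OpenAI Baselines use neural networks for the policy and value function). The point is that every actor's segment is collected first, then advantages are formed into a single batch:

```python
def discounted_returns(rewards, bootstrap_value, gamma=0.99):
    """n-step returns, computed backwards from a bootstrap value
    (the value estimate of the state after the segment)."""
    returns = []
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))

def a2c_batch(segments, gamma=0.99):
    """Each segment is (rewards, value_estimates, bootstrap_value)
    from one actor. A2C waits for ALL actors to finish their segment,
    then builds one batch of advantages A = R - V for a single update."""
    advantages = []
    for rewards, values, bootstrap in segments:
        rets = discounted_returns(rewards, bootstrap, gamma)
        advantages.extend(R - V for R, V in zip(rets, values))
    return advantages

# Two hypothetical actors, each with a 3-step segment of experience.
segments = [
    ([1.0, 0.0, 1.0], [0.5, 0.4, 0.6], 0.0),  # actor 0
    ([0.0, 1.0, 0.0], [0.3, 0.5, 0.2], 0.5),  # actor 1
]
batch = a2c_batch(segments)  # one synchronous batch over all actors
```

In an asynchronous setup (A3C), each actor would instead apply its own update as soon as its segment finished; the synchronous batch above is what allows the larger GPU-friendly batch sizes.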
Image Credit: OpenAI Baselines
Source: Asynchronous Methods for Deep Reinforcement Learning
| Task | Papers | Share |
|---|---|---|
| Reinforcement Learning (RL) | 47 | 35.61% |
| Atari Games | 10 | 7.58% |
| Decision Making | 10 | 7.58% |
| OpenAI Gym | 5 | 3.79% |
| Continuous Control | 5 | 3.79% |
| Management | 4 | 3.03% |
| Multi-agent Reinforcement Learning | 3 | 2.27% |
| Benchmarking | 3 | 2.27% |
| Myocardial infarction detection | 2 | 1.52% |