A2C, or Advantage Actor-Critic, is a synchronous variant of the A3C policy gradient method. Instead of A3C's asynchronous updates, A2C is a synchronous, deterministic implementation that waits for every actor to finish its segment of experience before updating, averaging over all of the actors. The larger resulting batch sizes make more effective use of GPUs.
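The synchronous structure can be sketched as follows — a toy Python example (hand-picked rewards and value estimates; real implementations such as OpenAI Baselines use neural networks for the policy and value function). The point is that every actor's segment is collected first, then advantages are formed into a single batch:

```python
def discounted_returns(rewards, bootstrap_value, gamma=0.99):
    """n-step returns, computed backwards from a bootstrap value
    (the value estimate of the state after the segment)."""
    returns = []
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))

def a2c_batch(segments, gamma=0.99):
    """Each segment is (rewards, value_estimates, bootstrap_value)
    from one actor. A2C waits for ALL actors to finish their segment,
    then builds one batch of advantages A = R - V for a single update."""
    advantages = []
    for rewards, values, bootstrap in segments:
        rets = discounted_returns(rewards, bootstrap, gamma)
        advantages.extend(R - V for R, V in zip(rets, values))
    return advantages

# Two hypothetical actors, each with a 3-step segment of experience.
segments = [
    ([1.0, 0.0, 1.0], [0.5, 0.4, 0.6], 0.0),  # actor 0
    ([0.0, 1.0, 0.0], [0.3, 0.5, 0.2], 0.5),  # actor 1
]
batch = a2c_batch(segments)  # one synchronous batch over all actors
```

In an asynchronous setup (A3C), each actor would instead apply its own update as soon as its segment finished; the synchronous batch above is what allows the larger GPU-friendly batch sizes.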
Image Credit: OpenAI Baselines
Source: Asynchronous Methods for Deep Reinforcement Learning
| Task | Papers | Share |
|---|---|---|
| Reinforcement Learning (RL) | 47 | 35.61% |
| Atari Games | 10 | 7.58% |
| Decision Making | 10 | 7.58% |
| OpenAI Gym | 5 | 3.79% |
| Continuous Control | 5 | 3.79% |
| Management | 4 | 3.03% |
| Multi-agent Reinforcement Learning | 3 | 2.27% |
| Benchmarking | 3 | 2.27% |
| Myocardial infarction detection | 2 | 1.52% |