TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
OpenAI Gym	Ant-v4	TD3	Average Return	5942.55	# 2
OpenAI Gym	HalfCheetah-v4	TD3	Average Return	12026.73	# 3
OpenAI Gym	Hopper-v4	TD3	Average Return	3319.98	# 2
OpenAI Gym	Humanoid-v4	TD3	Average Return	198.44	# 4
Continuous Control	Lunar Lander (OpenAI Gym)	TD3	Score	277.26±4.17	# 2
OpenAI Gym	Walker2d-v4	TD3	Average Return	2612.74	# 5

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/addressing-function-approximation-error-in/openai-gym-on-ant-v4)](https://paperswithcode.com/sota/openai-gym-on-ant-v4?p=addressing-function-approximation-error-in)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/addressing-function-approximation-error-in/openai-gym-on-hopper-v4)](https://paperswithcode.com/sota/openai-gym-on-hopper-v4?p=addressing-function-approximation-error-in)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/addressing-function-approximation-error-in/continuous-control-on-lunar-lander-openai-gym)](https://paperswithcode.com/sota/continuous-control-on-lunar-lander-openai-gym?p=addressing-function-approximation-error-in)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/addressing-function-approximation-error-in/openai-gym-on-halfcheetah-v4)](https://paperswithcode.com/sota/openai-gym-on-halfcheetah-v4?p=addressing-function-approximation-error-in)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/addressing-function-approximation-error-in/openai-gym-on-humanoid-v4)](https://paperswithcode.com/sota/openai-gym-on-humanoid-v4?p=addressing-function-approximation-error-in)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/addressing-function-approximation-error-in/openai-gym-on-walker2d-v4)](https://paperswithcode.com/sota/openai-gym-on-walker2d-v4?p=addressing-function-approximation-error-in)`

Addressing Function Approximation Error in Actor-Critic Methods

ICML 2018 · Scott Fujimoto, Herke van Hoof, David Meger

In value-based reinforcement learning methods such as deep Q-learning, function approximation errors are known to lead to overestimated value estimates and suboptimal policies. We show that this problem persists in an actor-critic setting and propose novel mechanisms to minimize its effects on both the actor and the critic. Our algorithm builds on Double Q-learning by taking the minimum value between a pair of critics to limit overestimation. We draw the connection between target networks and overestimation bias, and suggest delaying policy updates to reduce per-update error and further improve performance. We evaluate our method on the suite of OpenAI Gym tasks, outperforming the state of the art in every environment tested.
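The two mechanisms named in the abstract, taking the minimum over a pair of critics and delaying policy updates, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `clipped_double_q_target` and `should_update_policy`, the discount `gamma`, and the `policy_delay` value are assumptions chosen for illustration, and the critic outputs are plain numbers standing in for network predictions.

```python
def clipped_double_q_target(reward, done, next_q1, next_q2, gamma=0.99):
    """Return a TD target built from the smaller of two critic estimates.

    Using min(next_q1, next_q2) limits overestimation: the target can
    never exceed what either critic alone would have produced.
    All names and the default gamma are illustrative assumptions.
    """
    min_next_q = min(next_q1, next_q2)
    # No bootstrapping past a terminal transition (done == True).
    return reward + gamma * (0.0 if done else 1.0) * min_next_q


def should_update_policy(step, policy_delay=2):
    """Return True on critic steps where the actor is also updated.

    With policy_delay=2 (an assumed value), the critics are updated on
    every step and the policy on every second step.
    """
    return step % policy_delay == 0


# Non-terminal transition: the target bootstraps from the smaller estimate.
target_nonterminal = clipped_double_q_target(
    reward=1.0, done=False, next_q1=10.0, next_q2=8.0, gamma=0.5
)

# Terminal transition: the target is just the reward.
target_terminal = clipped_double_q_target(
    reward=0.0, done=True, next_q1=5.0, next_q2=7.0, gamma=0.5
)

# Steps (counted from 1) on which the actor would be updated.
actor_steps = [t for t in range(1, 7) if should_update_policy(t)]
```

In a full training loop both critics would regress toward the same target, and the target networks would typically be refreshed on the same delayed schedule as the policy.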


Code


Repository	Stars
sfujim/TD3 (official)	1,628
DLR-RM/stable-baselines3	8,142
hill-a/stable-baselines	4,068
facebookresearch/ReAgent	3,530
opendilab/DI-engine	2,653

67 implementations are listed in total.

Tasks


Continuous Control

OpenAI Gym

Q-Learning

reinforcement-learning

Reinforcement Learning (RL)

Datasets

MuJoCo

OpenAI Gym

Results from the Paper


Ranked #2 on OpenAI Gym on Ant-v4


Methods


Adam • Clipped Double Q-learning • Dense Connections • Double Q-learning • Experience Replay • ReLU • Target Policy Smoothing • TD3
