Continuous Control

415 papers with code • 73 benchmarks • 9 datasets

In games and simulations, continuous control refers to an AI or machine learning (ML) agent choosing actions from a continuous range, such as a steering angle or a throttle level, so that it can make smooth, ongoing adjustments. This contrasts with discrete control, where actions are limited to a fixed set of distinct choices. Continuous control is crucial in environments where precision, timing, and the magnitude of actions matter, such as driving a car in a racing game, controlling a character in a simulation, or managing the flight of an aircraft in a flight simulator.
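
The difference between the two kinds of action space can be sketched in a few lines of Python. This is a minimal illustration, not tied to any benchmark listed here: the action names, the (steering, throttle) layout, and the bounds are all assumptions made up for the example.

```python
import random

# Hypothetical action spaces for a driving game (illustrative only).
DISCRETE_ACTIONS = ("steer_left", "go_straight", "steer_right")
CONTINUOUS_LOW = (-1.0, 0.0)   # assumed (steering, throttle) lower bounds
CONTINUOUS_HIGH = (1.0, 1.0)   # assumed (steering, throttle) upper bounds


def sample_discrete_action(rng):
    """Discrete control: pick one of a fixed set of distinct choices."""
    return rng.choice(DISCRETE_ACTIONS)


def sample_continuous_action(rng):
    """Continuous control: pick a real-valued (steering, throttle) pair."""
    return tuple(
        rng.uniform(lo, hi) for lo, hi in zip(CONTINUOUS_LOW, CONTINUOUS_HIGH)
    )


def clip_action(action):
    """Clamp each component of a continuous action to its valid range."""
    return tuple(
        min(max(a, lo), hi)
        for a, lo, hi in zip(action, CONTINUOUS_LOW, CONTINUOUS_HIGH)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_discrete_action(rng))    # one of the three named actions
    print(sample_continuous_action(rng))  # two floats within the bounds
    print(clip_action((1.7, -0.3)))       # (1.0, 0.0)
```

In the continuous case the magnitude of each component carries meaning, which is why out-of-range outputs are usually clamped rather than rejected.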

Benchmarks

These leaderboards track progress in Continuous Control.

Dataset	Best Model
PyBullet HalfCheetah	SAC
PyBullet Walker2D	SAC gSDE
PyBullet Ant	SAC gSDE
PyBullet Hopper	SAC gSDE
Lunar Lander (OpenAI Gym)	MAC
DeepMind Cheetah Run (Images)	DreamerV1
DeepMind Cup Catch (Images)	DrQ
DeepMind Walker Walk (Images)	DrQ
cartpole.swingup	SMuZero
cheetah.run	SMuZero
finger.turn_hard	SMuZero
walker.stand	SMuZero
walker.walk	SMuZero
Cart-Pole Balancing	TRPO
Inverted Pendulum	TRPO
Mountain Car	TRPO
Acrobot	TRPO
Double Inverted Pendulum	TRPO
Swimmer	TRPO
Hopper	TRPO
2D Walker	TRPO
Half-Cheetah	TRPO
Ant	TRPO
Simple Humanoid	TRPO
Full Humanoid	TRPO
Cart-Pole Balancing (limited sensors)	TRPO
Inverted Pendulum (limited sensors)	TRPO
Mountain Car (limited sensors)	TRPO
Acrobot (limited sensors)	TRPO
Cart-Pole Balancing (noisy observations)	TRPO
Inverted Pendulum (noisy observations)	TRPO
Mountain Car (noisy observations)	TRPO
Acrobot (noisy observations)	TRPO
Cart-Pole Balancing (system identifications)	TRPO
Inverted Pendulum (system identifications)	TRPO
Mountain Car (system identifications)	TRPO
Acrobot (system identifications)	TRPO
Swimmer + Gathering	TRPO
Ant + Gathering	TRPO
Swimmer + Maze	TRPO
Ant + Maze	TRPO
Cart Pole (OpenAI Gym)	MAC
Finger, spin (DMControl500k)	CURL
Cartpole, swingup (DMControl500k)	CURL
Reacher, easy (DMControl500k)	CURL
Cheetah, run (DMControl500k)	CURL
Walker, walk (DMControl500k)	CURL
Ball in cup, catch (DMControl500k)	CURL
Finger, spin (DMControl100k)	CURL
Cartpole, swingup (DMControl100k)	CURL
Reacher, easy (DMControl100k)	CURL
Cheetah, run (DMControl100k)	CURL
Walker, walk (DMControl100k)	CURL
Ball in cup, catch (DMControl100k)	CURL
acrobot.swingup	SMuZero
cartpole.balance	SMuZero
cartpole.balance_sparse	SMuZero
cartpole.swingup_sparse	SMuZero
ball_in_cup.catch	SMuZero
finger.spin	SMuZero
finger.turn_easy	SMuZero
hopper.hop	SMuZero
hopper.stand	SMuZero
pendulum.swingup	SMuZero
quadruped.run	SMuZero
quadruped.walk	SMuZero
reacher.easy	SMuZero
reacher.hard	SMuZero
walker.run	SMuZero
fish.swim	MuZero Unplugged
manipulator.insert_ball	MuZero Unplugged
manipulator.insert_peg	MuZero Unplugged
humanoid.run	MuZero Unplugged

Libraries

Use these libraries to find Continuous Control models and implementations.

DLR-RM/stable-baselines3 • 8 papers • 7,976 stars
hill-a/stable-baselines • 7 papers • 4,047 stars
opendilab/DI-engine • 7 papers • 2,577 stars
Kaixhin/imitation-learning • 6 papers • 388 stars

See all 33 libraries.

Most implemented papers

Near-Optimal Representation Learning for Hierarchical Reinforcement Learning

tensorflow/models • ICLR 2019

We study the problem of representation learning in goal-conditioned hierarchical reinforcement learning.

CURL: Contrastive Unsupervised Representations for Reinforcement Learning

MishaLaskin/curl • 8 Apr 2020

On the DeepMind Control Suite, CURL is the first image-based algorithm to nearly match the sample-efficiency of methods that use state-based features.

Evolution-Guided Policy Gradient in Reinforcement Learning

ShawK91/erl_paper_nips18 • NeurIPS 2018

Deep reinforcement learning methods typically suffer from three core difficulties: temporal credit assignment with sparse rewards, lack of effective exploration, and brittle convergence properties that are extremely sensitive to hyperparameters.

MOPO: Model-based Offline Policy Optimization

tianheyu927/mopo • NeurIPS 2020

We also characterize the trade-off between the gain and risk of leaving the support of the batch data.

Action Branching Architectures for Deep Reinforcement Learning

atavakol/action-branching-agents • 24 Nov 2017

This approach achieves a linear increase of the number of network outputs with the number of degrees of freedom by allowing a level of independence for each individual action dimension.
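
The scaling claim can be checked with simple arithmetic. The sketch below assumes each action dimension is discretized into the same number of bins, and compares one output per joint action against one output per bin per dimension. The function names and the example sizes are hypothetical, chosen only for illustration.

```python
def joint_output_count(num_dims, bins_per_dim):
    """Outputs needed if every combination of per-dimension bins is one action."""
    return bins_per_dim ** num_dims


def branching_output_count(num_dims, bins_per_dim):
    """Outputs needed if each dimension gets its own independent branch."""
    return num_dims * bins_per_dim


if __name__ == "__main__":
    # Assumed example: 6 degrees of freedom, 11 bins each.
    print(joint_output_count(6, 11))      # 1771561
    print(branching_output_count(6, 11))  # 66
```

Adding one more degree of freedom multiplies the joint count by the number of bins, but adds only one more branch of outputs in the per-dimension layout.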

Distributed Distributional Deterministic Policy Gradients

opendilab/DI-engine • ICLR 2018

This work adopts the very successful distributional perspective on reinforcement learning and adapts it to the continuous control setting.

Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow

reinforcement-learning-kr/lets-do-irl • ICLR 2019

By enforcing a constraint on the mutual information between the observations and the discriminator's internal representation, we can effectively modulate the discriminator's accuracy and maintain useful and informative gradients.

Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning

google/trax • 1 Oct 2019

In this paper, we aim to develop a simple and scalable reinforcement learning algorithm that uses standard supervised learning methods as subroutines.
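
One way to turn policy improvement into a supervised subroutine is to weight each sampled action by an exponentiated advantage and then fit the policy with a weighted regression loss. The sketch below shows only the weighting step, and assumes that form. The function name, the temperature `beta`, and the cap `max_weight` are illustrative assumptions with arbitrary default values, not settings taken from the paper or its code.

```python
import math


def advantage_weights(returns, values, beta=1.0, max_weight=20.0):
    """Compute per-sample regression weights exp((return - value) / beta).

    Weights are capped at max_weight so that a few very large advantages
    do not dominate the weighted supervised loss.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    log_cap = math.log(max_weight)
    weights = []
    for ret, val in zip(returns, values):
        advantage = ret - val
        # Cap the exponent before exponentiating to avoid overflow.
        weights.append(math.exp(min(advantage / beta, log_cap)))
    return weights


if __name__ == "__main__":
    # Hypothetical returns and value estimates for three sampled actions.
    print(advantage_weights([1.0, 2.0, 0.5], [1.0, 1.0, 1.0]))
```

Actions that did better than the value estimate get weights above 1 and actions that did worse get weights below 1, so a standard weighted regression step pushes the policy toward the better actions.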

IQ-Learn: Inverse soft-Q Learning for Imitation

Div99/IQ-Learn • NeurIPS 2021

In many sequential decision-making problems (e.g., robotics control, game playing, sequential prediction), human or expert data is available containing useful information about the task.

Deep Reinforcement Learning that Matters

chainer/chainerrl • 19 Sep 2017

In recent years, significant progress has been made in solving challenging problems across various domains using deep reinforcement learning (RL).
