Offline RL

225 papers with code • 2 benchmarks • 6 datasets

Offline reinforcement learning (RL), also known as batch RL, studies policy optimization from large pre-recorded datasets without online environment interaction.

Libraries

Use these libraries to find Offline RL models and implementations
See all 10 libraries.

Most implemented papers

Critic Regularized Regression

ray-project/ray NeurIPS 2020

Offline reinforcement learning (RL), also known as batch RL, offers the prospect of policy optimization from large pre-recorded datasets without online environment interaction.

COMBO: Conservative Offline Model-Based Policy Optimization

yihaosun1124/OfflineRL-Kit NeurIPS 2021

We overcome this limitation by developing a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-action tuples generated via rollouts under the learned model.
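The regularization idea in the excerpt can be sketched as a penalty that pushes Q-values down on state-action tuples generated by model rollouts and up on tuples from the dataset. The names below (`conservative_penalty`, `beta`) are illustrative placeholders, not COMBO's actual code.

```python
# Hedged sketch of conservative value regularization: lower the value
# estimate on out-of-support rollout samples, raise it on dataset samples.
def conservative_penalty(q_on_rollouts, q_on_dataset, beta=1.0):
    """Penalty added to the Bellman loss: beta * (mean Q_rollout - mean Q_data).

    Minimizing this term makes the critic pessimistic on model-generated
    tuples relative to tuples actually observed in the dataset.
    """
    mean_rollout = sum(q_on_rollouts) / len(q_on_rollouts)
    mean_data = sum(q_on_dataset) / len(q_on_dataset)
    return beta * (mean_rollout - mean_data)
```

In practice this penalty is added to an ordinary Bellman error, with `beta` trading off conservatism against fitting the data.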

Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble

corl-team/CORL NeurIPS 2021

However, prior methods typically require accurate estimation of the behavior policy or sampling from OOD data points, which can themselves be non-trivial problems.

The In-Sample Softmax for Offline Reinforcement Learning

hwang-ua/inac_pytorch 28 Feb 2023

We highlight a simple fact: it is more straightforward to approximate an in-sample softmax using only actions in the dataset.
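The "in-sample" idea can be illustrated directly: instead of a softmax over all actions (which would require evaluating Q on out-of-distribution actions), restrict the softmax to actions that appear in the dataset. This is a minimal sketch of that restriction, not the paper's implementation.

```python
import numpy as np

def in_sample_softmax(q_values, in_dataset_mask, temperature=1.0):
    """Softmax over Q-values restricted to in-dataset actions.

    q_values:        Q(s, a) for every action a
    in_dataset_mask: True where action a occurs in the dataset
    """
    q = np.asarray(q_values, dtype=float) / temperature
    q = np.where(in_dataset_mask, q, -np.inf)  # exclude OOD actions
    q -= q.max()                               # numerical stability
    exp_q = np.exp(q)
    return exp_q / exp_q.sum()

# Example: 4 actions, but only actions 0 and 2 appear in the dataset;
# the resulting distribution puts zero mass on actions 1 and 3.
probs = in_sample_softmax([1.0, 5.0, 2.0, 9.0], [True, False, True, False])
```

Masking with `-inf` before exponentiating guarantees out-of-dataset actions receive exactly zero probability, so no Q-value estimated on an unseen action can influence the result.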

The CoSTAR Block Stacking Dataset: Learning with Workspace Constraints

jhu-lcsr/costar_plan 27 Oct 2018

We show that a mild relaxation of the task and workspace constraints implicit in existing object grasping datasets can cause neural network based grasping algorithms to fail on even a simple block stacking task when executed under more realistic circumstances.

NeoRL: A Near Real-World Benchmark for Offline Reinforcement Learning

polixir/NeoRL 1 Feb 2021

We evaluate existing offline RL algorithms on NeoRL and argue that the performance of a policy should also be compared with the deterministic version of the behavior policy, instead of the dataset reward.

Adversarially Trained Actor Critic for Offline Reinforcement Learning

microsoft/atac 5 Feb 2022

We propose Adversarially Trained Actor Critic (ATAC), a new model-free algorithm for offline reinforcement learning (RL) under insufficient data coverage, based on the concept of relative pessimism.

Supported Policy Optimization for Offline Reinforcement Learning

thuml/SPOT 13 Feb 2022

Policy constraint methods for offline reinforcement learning (RL) typically use parameterization or regularization to constrain the policy to actions within the support set of the behavior policy.
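One common way to express such a support constraint is as a regularizer that penalizes actions whose estimated behavior-policy density falls below a threshold. This is an assumed, simplified form for illustration, not SPOT's exact objective.

```python
# Illustrative support-constraint regularizer: zero penalty for actions
# inside the estimated support of the behavior policy, growing penalty
# outside it. `log_eps` is a hypothetical density threshold.
def support_penalty(log_behavior_density, log_eps=-4.0):
    """Penalty = max(0, log_eps - log pi_beta(a|s)).

    Actions with estimated log-density above the threshold are free;
    actions below it are penalized in proportion to how far outside
    the support they fall.
    """
    return max(0.0, log_eps - log_behavior_density)
```

The log-density itself would come from a separately trained density model of the behavior policy; that estimator is outside the scope of this sketch.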

cosFormer: Rethinking Softmax in Attention

OpenNLPLab/cosFormer ICLR 2022

As one of its core components, softmax attention helps capture long-range dependencies, yet it prevents scaling up because its space and time complexity are quadratic in the sequence length.

Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning

zhendong-wang/diffusion-policies-for-offline-rl 12 Aug 2022

In our approach, we learn an action-value function and add a term that maximizes action-values to the training loss of the conditional diffusion model, yielding a loss that seeks optimal actions near the behavior policy.
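The trade-off described above can be demonstrated with a toy 1-D example: a behavior-cloning term pulls the action toward the dataset action, while a Q-maximization term pulls it toward the high-value action, so gradient descent settles between the two. All quantities here are illustrative, not the paper's actual losses.

```python
# Toy version of "BC loss minus alpha * Q": the dataset action is at 0,
# the highest-value action is at 2, and alpha weights the Q term.
a_data, a_opt, alpha, lr = 0.0, 2.0, 0.5, 0.1

def loss_grad(a):
    # d/da [ (a - a_data)^2 - alpha * (-(a - a_opt)^2) ]
    return 2.0 * (a - a_data) + 2.0 * alpha * (a - a_opt)

a = a_data
for _ in range(500):
    a -= lr * loss_grad(a)

# With a_data = 0 the fixed point is alpha * a_opt / (1 + alpha),
# i.e. ~0.667 here: between the behavior action and the optimal one.
```

Increasing `alpha` moves the solution toward the high-value action; decreasing it keeps the policy closer to the behavior data, mirroring the conservatism knob in the combined objective.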