Offline RL

225 papers with code • 2 benchmarks • 6 datasets

Offline reinforcement learning (RL), also known as batch RL, studies policy optimization from large pre-recorded datasets without online environment interaction.

Libraries

Use these libraries to find Offline RL models and implementations
See all 10 libraries.

Most implemented papers

Critic Regularized Regression

ray-project/ray NeurIPS 2020

Offline reinforcement learning (RL), also known as batch RL, offers the prospect of policy optimization from large pre-recorded datasets without online environment interaction.

COMBO: Conservative Offline Model-Based Policy Optimization

yihaosun1124/OfflineRL-Kit NeurIPS 2021

We overcome this limitation by developing a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-action tuples generated via rollouts under the learned model.
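The regularization idea in the excerpt can be sketched as a penalty that pushes Q-values down on state-action tuples generated by model rollouts and up on tuples from the dataset. The names below (`conservative_penalty`, `beta`) are illustrative placeholders, not COMBO's actual code.

```python
# Hedged sketch of conservative value regularization: lower the value
# estimate on out-of-support rollout samples, raise it on dataset samples.
def conservative_penalty(q_on_rollouts, q_on_dataset, beta=1.0):
    """Penalty added to the Bellman loss: beta * (mean Q_rollout - mean Q_data).

    Minimizing this term makes the critic pessimistic on model-generated
    tuples relative to tuples actually observed in the dataset.
    """
    mean_rollout = sum(q_on_rollouts) / len(q_on_rollouts)
    mean_data = sum(q_on_dataset) / len(q_on_dataset)
    return beta * (mean_rollout - mean_data)
```

In practice this penalty is added to an ordinary Bellman error, with `beta` trading off conservatism against fitting the data.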

Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble

corl-team/CORL NeurIPS 2021

However, prior methods typically require accurate estimation of the behavior policy or sampling from OOD data points, which can themselves be non-trivial problems.

The In-Sample Softmax for Offline Reinforcement Learning

hwang-ua/inac_pytorch 28 Feb 2023

We highlight a simple fact: it is more straightforward to approximate an in-sample softmax using only actions in the dataset.
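The "in-sample" idea can be illustrated directly: instead of a softmax over all actions (which would require evaluating Q on out-of-distribution actions), restrict the softmax to actions that appear in the dataset. This is a minimal sketch of that restriction, not the paper's implementation.

```python
import numpy as np

def in_sample_softmax(q_values, in_dataset_mask, temperature=1.0):
    """Softmax over Q-values restricted to in-dataset actions.

    q_values:        Q(s, a) for every action a
    in_dataset_mask: True where action a occurs in the dataset
    """
    q = np.asarray(q_values, dtype=float) / temperature
    q = np.where(in_dataset_mask, q, -np.inf)  # exclude OOD actions
    q -= q.max()                               # numerical stability
    exp_q = np.exp(q)
    return exp_q / exp_q.sum()

# Example: 4 actions, but only actions 0 and 2 appear in the dataset;
# the resulting distribution puts zero mass on actions 1 and 3.
probs = in_sample_softmax([1.0, 5.0, 2.0, 9.0], [True, False, True, False])
```

Masking with `-inf` before exponentiating guarantees out-of-dataset actions receive exactly zero probability, so no Q-value estimated on an unseen action can influence the result.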

The CoSTAR Block Stacking Dataset: Learning with Workspace Constraints

jhu-lcsr/costar_plan 27 Oct 2018

We show that a mild relaxation of the task and workspace constraints implicit in existing object grasping datasets can cause neural network based grasping algorithms to fail on even a simple block stacking task when executed under more realistic circumstances.

NeoRL: A Near Real-World Benchmark for Offline Reinforcement Learning

polixir/NeoRL 1 Feb 2021

We evaluate existing offline RL algorithms on NeoRL and argue that the performance of a policy should also be compared with the deterministic version of the behavior policy, instead of the dataset reward.

Adversarially Trained Actor Critic for Offline Reinforcement Learning

microsoft/atac 5 Feb 2022

We propose Adversarially Trained Actor Critic (ATAC), a new model-free algorithm for offline reinforcement learning (RL) under insufficient data coverage, based on the concept of relative pessimism.

Supported Policy Optimization for Offline Reinforcement Learning

thuml/SPOT 13 Feb 2022

Policy constraint methods for offline reinforcement learning (RL) typically use parameterization or regularization to constrain the policy to actions within the support set of the behavior policy.
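One common way to express such a support constraint is as a regularizer that penalizes actions whose estimated behavior-policy density falls below a threshold. This is an assumed, simplified form for illustration, not SPOT's exact objective.

```python
# Illustrative support-constraint regularizer: zero penalty for actions
# inside the estimated support of the behavior policy, growing penalty
# outside it. `log_eps` is a hypothetical density threshold.
def support_penalty(log_behavior_density, log_eps=-4.0):
    """Penalty = max(0, log_eps - log pi_beta(a|s)).

    Actions with estimated log-density above the threshold are free;
    actions below it are penalized in proportion to how far outside
    the support they fall.
    """
    return max(0.0, log_eps - log_behavior_density)
```

The log-density itself would come from a separately trained density model of the behavior policy; that estimator is outside the scope of this sketch.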

cosFormer: Rethinking Softmax in Attention

OpenNLPLab/cosFormer ICLR 2022

As one of its core components, softmax attention helps capture long-range dependencies, yet it prevents scaling up because its space and time complexity are quadratic in the sequence length.

Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning

zhendong-wang/diffusion-policies-for-offline-rl 12 Aug 2022

In our approach, we learn an action-value function and add a term that maximizes action-values to the training loss of the conditional diffusion model, yielding a loss that seeks optimal actions near the behavior policy.
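The trade-off described above can be demonstrated with a toy 1-D example: a behavior-cloning term pulls the action toward the dataset action, while a Q-maximization term pulls it toward the high-value action, so gradient descent settles between the two. All quantities here are illustrative, not the paper's actual losses.

```python
# Toy version of "BC loss minus alpha * Q": the dataset action is at 0,
# the highest-value action is at 2, and alpha weights the Q term.
a_data, a_opt, alpha, lr = 0.0, 2.0, 0.5, 0.1

def loss_grad(a):
    # d/da [ (a - a_data)^2 - alpha * (-(a - a_opt)^2) ]
    return 2.0 * (a - a_data) + 2.0 * alpha * (a - a_opt)

a = a_data
for _ in range(500):
    a -= lr * loss_grad(a)

# With a_data = 0 the fixed point is alpha * a_opt / (1 + alpha),
# i.e. ~0.667 here: between the behavior action and the optimal one.
```

Increasing `alpha` moves the solution toward the high-value action; decreasing it keeps the policy closer to the behavior data, mirroring the conservatism knob in the combined objective.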