Offline RL

178 papers with code • 1 benchmark • 6 datasets

Most implemented papers

Conservative Q-Learning for Offline Reinforcement Learning

aviralkumar2907/CQL NeurIPS 2020

We theoretically show that CQL produces a lower bound on the value of the current policy and that it can be incorporated into a policy learning procedure with theoretical improvement guarantees.

Decision Transformer: Reinforcement Learning via Sequence Modeling

kzl/decision-transformer NeurIPS 2021

In particular, we present Decision Transformer, an architecture that casts the problem of RL as conditional sequence modeling.
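As a rough illustration of what "conditional sequence modeling" can mean here, the sketch below turns one logged trajectory into a token sequence conditioned on return-to-go (the sum of rewards from each step to the end of the episode). The excerpt above does not spell out this conditioning; the function names `returns_to_go` and `interleave` and the tuple-based token layout are assumptions made for illustration, not the repository's API.

```python
def returns_to_go(rewards):
    """Return-to-go at step t: sum of rewards from t to the end of the episode."""
    rtg = []
    running = 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    rtg.reverse()
    return rtg


def interleave(rtg, states, actions):
    """Lay out one trajectory as (return, state, action) triples, one per step.

    A sequence model trained on this layout can predict each action from the
    tokens that precede it, including the desired return.
    """
    tokens = []
    for g, s, a in zip(rtg, states, actions):
        tokens.extend([("return", g), ("state", s), ("action", a)])
    return tokens


# Hypothetical three-step trajectory.
rewards = [1.0, 2.0, 3.0]
states = ["s0", "s1", "s2"]
actions = ["a0", "a1", "a2"]
tokens = interleave(returns_to_go(rewards), states, actions)
```

At evaluation time, a model of this kind would be prompted with a target return in place of the first return token and asked to generate actions.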

Offline Reinforcement Learning with Implicit Q-Learning

rail-berkeley/rlkit 12 Oct 2021

The main insight in our work is that, instead of evaluating unseen actions from the latest policy, we can approximate the policy improvement step implicitly by treating the state value function as a random variable, with randomness determined by the action (while still integrating over the dynamics to avoid excessive optimism), and then taking a state conditional upper expectile of this random variable to estimate the value of the best actions in that state.
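The "upper expectile" mentioned above can be made concrete with a small sketch. An expectile with parameter tau minimizes an asymmetric squared loss; for tau above 0.5 the minimizer moves from the mean toward the largest sample. The code below is a plain-Python illustration on a handful of hypothetical Q-values for one state. The names `expectile_loss` and `fit_expectile` and the fixed-point solver are assumptions for this sketch; the paper trains a neural value function with gradient descent rather than solving per state.

```python
def expectile_loss(diff, tau=0.7):
    """Asymmetric squared loss |tau - 1(diff < 0)| * diff**2, with diff = q - v."""
    weight = tau if diff >= 0 else 1.0 - tau
    return weight * diff * diff


def fit_expectile(samples, tau=0.7, steps=200):
    """Find the scalar v that minimizes the mean expectile loss over samples.

    Uses the first-order condition sum_i w_i * (x_i - v) = 0, iterated as a
    reweighted average, where w_i is tau for samples at or above v and
    1 - tau for samples below it.
    """
    v = sum(samples) / len(samples)
    for _ in range(steps):
        weights = [tau if x >= v else 1.0 - tau for x in samples]
        v = sum(w * x for w, x in zip(weights, samples)) / sum(weights)
    return v


# Hypothetical Q-values of dataset actions observed in a single state.
q_values = [0.0, 1.0, 2.0, 3.0]
v_mean = fit_expectile(q_values, tau=0.5)   # equals the ordinary mean
v_upper = fit_expectile(q_values, tau=0.9)  # sits closer to the best action's value
```

With tau = 0.5 the result is the sample mean; raising tau pushes the estimate toward the maximum without ever evaluating an action outside the data.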

A Minimalist Approach to Offline Reinforcement Learning

sfujim/TD3_BC NeurIPS 2021

Offline reinforcement learning (RL) defines the task of learning from a fixed batch of data.

D4RL: Datasets for Deep Data-Driven Reinforcement Learning

rail-berkeley/offline_rl 15 Apr 2020

In this work, we introduce benchmarks specifically designed for the offline setting, guided by key properties of datasets relevant to real-world applications of offline RL.

MOPO: Model-based Offline Policy Optimization

tianheyu927/mopo NeurIPS 2020

We also characterize the trade-off between the gain and risk of leaving the support of the batch data.
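One way to picture the gain-versus-risk trade-off is a reward that is reduced wherever the learned model is unsure, so that a policy is only drawn outside the data when the predicted gain outweighs the penalty. The sketch below assumes a penalized reward of the form r - lam * u(s, a). The uncertainty proxy used here (standard deviation across a hypothetical ensemble of scalar predictions) is an illustrative stand-in, not the estimator used in the paper or the repository, and the names `penalized_reward` and `ensemble_disagreement` are made up for this example.

```python
import statistics


def ensemble_disagreement(predictions):
    """Hypothetical uncertainty proxy: population std. dev. of ensemble predictions."""
    return statistics.pstdev(predictions)


def penalized_reward(reward, uncertainty, lam=1.0):
    """Model reward minus an uncertainty penalty; lam sets how conservative to be."""
    return reward - lam * uncertainty


# Inside the data support the ensemble members agree, so the reward is kept as-is.
in_support = penalized_reward(1.0, ensemble_disagreement([0.5, 0.5, 0.5]), lam=2.0)

# Far from the data the members disagree, so the same reward is heavily discounted.
out_of_support = penalized_reward(1.0, ensemble_disagreement([0.0, 2.0]), lam=2.0)
```

A larger `lam` keeps the policy closer to the batch data; a smaller one lets it trust the model further from it.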

Critic Regularized Regression

ray-project/ray NeurIPS 2020

Offline reinforcement learning (RL), also known as batch RL, offers the prospect of policy optimization from large pre-recorded datasets without online environment interaction.

COMBO: Conservative Offline Model-Based Policy Optimization

yihaosun1124/OfflineRL-Kit NeurIPS 2021

We overcome this limitation by developing a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-action tuples generated via rollouts under the learned model.

Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble

corl-team/CORL NeurIPS 2021

However, prior methods typically require accurate estimation of the behavior policy or sampling from OOD data points, which themselves can be a non-trivial problem.

The CoSTAR Block Stacking Dataset: Learning with Workspace Constraints

jhu-lcsr/costar_plan 27 Oct 2018

We show that a mild relaxation of the task and workspace constraints implicit in existing object grasping datasets can cause neural network based grasping algorithms to fail on even a simple block stacking task when executed under more realistic circumstances.