Offline RL
221 papers with code • 2 benchmarks • 6 datasets
Most implemented papers
Conservative Q-Learning for Offline Reinforcement Learning
We theoretically show that CQL produces a lower bound on the value of the current policy and that it can be incorporated into a policy learning procedure with theoretical improvement guarantees.
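As a rough illustration of where the lower bound comes from, the sketch below computes a conservative penalty of the kind used in the discrete-action form of CQL: a log-sum-exp over all action values minus the value of the action actually found in the dataset. The function names `logsumexp` and `cql_penalty` are hypothetical, and the Bellman error term and weighting coefficient of the full objective are left out.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def cql_penalty(q_values, data_action):
    """Conservative penalty for one state (illustrative helper, not a library API).

    q_values:    Q(s, a) for every discrete action a in this state.
    data_action: index of the action recorded in the dataset for this state.

    Minimizing this term pushes down the Q-values of all actions (through the
    log-sum-exp) while pushing up the Q-value of the dataset action.
    """
    return logsumexp(q_values) - q_values[data_action]


if __name__ == "__main__":
    q = [5.0, -1.0, 2.0]
    # The penalty is small when the dataset action already has the largest
    # Q-value and large when some other action is valued more highly.
    print(cql_penalty(q, data_action=0))
    print(cql_penalty(q, data_action=1))
```

The penalty is never negative, because the log-sum-exp is at least as large as the largest Q-value and therefore at least as large as the dataset action's Q-value.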
Decision Transformer: Reinforcement Learning via Sequence Modeling
In particular, we present Decision Transformer, an architecture that casts the problem of RL as conditional sequence modeling.
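One way to picture the sequence-modeling view is to look at how a logged trajectory might be turned into a token sequence. The sketch below assumes the common setup in which each timestep contributes a return-to-go, a state, and an action, in that order. The helper names `returns_to_go` and `build_sequence` and the tagged-tuple token format are hypothetical choices for illustration only.

```python
def returns_to_go(rewards):
    """Sum of rewards from each timestep to the end of the trajectory."""
    rtg = []
    running = 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    rtg.reverse()
    return rtg


def build_sequence(states, actions, rewards):
    """Interleave (return-to-go, state, action) per timestep.

    Illustrative only: a real model would embed each element as a vector
    rather than keep tagged tuples.
    """
    sequence = []
    for g, s, a in zip(returns_to_go(rewards), states, actions):
        sequence.extend([("rtg", g), ("state", s), ("action", a)])
    return sequence


if __name__ == "__main__":
    seq = build_sequence(["s0", "s1", "s2"], ["a0", "a1", "a2"], [1.0, 0.0, 2.0])
    for token in seq:
        print(token)
```

At evaluation time, a model trained on such sequences would be prompted with a desired return-to-go and the current state, then asked to predict the next action.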
Reformer: The Efficient Transformer
Large Transformer models routinely achieve state-of-the-art results on a number of tasks but training these models can be prohibitively costly, especially on long sequences.
Offline Reinforcement Learning with Implicit Q-Learning
The main insight in our work is that, instead of evaluating unseen actions from the latest policy, we can approximate the policy improvement step implicitly by treating the state value function as a random variable, with randomness determined by the action (while still integrating over the dynamics to avoid excessive optimism), and then taking a state conditional upper expectile of this random variable to estimate the value of the best actions in that state.
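The "upper expectile" in this description can be made concrete with a small numerical sketch. An expectile is the minimizer of an asymmetric squared loss: with `tau` above 0.5, samples above the estimate are weighted more heavily than samples below it, so the estimate moves toward the larger values without ever exceeding the largest sample. The functions `expectile_loss` and `fit_expectile` below are hypothetical helpers, and plain gradient descent on a list of scalar Q-values stands in for training a value network.

```python
def expectile_loss(diff, tau):
    """Asymmetric squared loss |tau - 1(diff < 0)| * diff^2, with diff = q - v."""
    weight = tau if diff > 0 else 1.0 - tau
    return weight * diff * diff


def fit_expectile(samples, tau, steps=2000, lr=0.05):
    """Find the tau-expectile of `samples` by gradient descent (illustrative)."""
    v = sum(samples) / len(samples)  # start from the mean (the 0.5-expectile)
    for _ in range(steps):
        grad = 0.0
        for q in samples:
            diff = q - v
            weight = tau if diff > 0 else 1.0 - tau
            grad += -2.0 * weight * diff  # d/dv of weight * (q - v)^2
        v -= lr * grad / len(samples)
    return v


if __name__ == "__main__":
    # Stand-ins for Q(s, a) over the actions seen in the dataset for one state.
    q_samples = [0.0, 1.0, 2.0, 3.0, 4.0]
    print(fit_expectile(q_samples, tau=0.5))  # the mean of the samples
    print(fit_expectile(q_samples, tau=0.9))  # between the mean and the maximum
```

With `tau = 0.5` the loss is symmetric and the estimate is the sample mean. Raising `tau` toward 1 moves the estimate toward the largest Q-value among the dataset actions, which is how the value of the best in-data action can be approximated without querying unseen actions.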
Rethinking Attention with Performers
We introduce Performers, Transformer architectures which can estimate regular (softmax) full-rank-attention Transformers with provable accuracy, but using only linear (as opposed to quadratic) space and time complexity, without relying on any priors such as sparsity or low-rankness.
A Minimalist Approach to Offline Reinforcement Learning
Offline reinforcement learning (RL) defines the task of learning from a fixed batch of data.
D4RL: Datasets for Deep Data-Driven Reinforcement Learning
In this work, we introduce benchmarks specifically designed for the offline setting, guided by key properties of datasets relevant to real-world applications of offline RL.
MOPO: Model-based Offline Policy Optimization
We also characterize the trade-off between the gain and risk of leaving the support of the batch data.
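The gain-versus-risk trade-off can be sketched as a reward that is reduced in proportion to how uncertain a learned dynamics model is. The example below is a minimal illustration under stated assumptions: the uncertainty estimate is the standard deviation of predictions from a hypothetical model ensemble, and `penalized_reward` and `penalty_coef` are illustrative names rather than the paper's or any library's API.

```python
import statistics


def penalized_reward(reward, ensemble_predictions, penalty_coef=1.0):
    """Reward minus an uncertainty penalty (illustrative helper).

    ensemble_predictions: scalar predictions for the same (state, action)
        from several independently trained models. Their disagreement is
        used here as an assumed proxy for model uncertainty.
    penalty_coef: how strongly uncertainty is penalized.
    """
    uncertainty = statistics.pstdev(ensemble_predictions)
    return reward - penalty_coef * uncertainty


if __name__ == "__main__":
    # Models agree: the (state, action) is likely well covered by the data.
    print(penalized_reward(1.0, [2.0, 2.0, 2.0]))
    # Models disagree: the policy is likely leaving the data support.
    print(penalized_reward(1.0, [0.0, 2.0], penalty_coef=0.5))
```

A larger `penalty_coef` keeps the policy closer to well-covered regions at the cost of potential gains elsewhere, while a smaller one accepts more risk from model error.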
Acme: A Research Framework for Distributed Reinforcement Learning
These implementations serve both as a validation of our design decisions as well as an important contribution to reproducibility in RL research.
Critic Regularized Regression
Offline reinforcement learning (RL), also known as batch RL, offers the prospect of policy optimization from large pre-recorded datasets without online environment interaction.