D4RL
102 papers with code • 1 benchmark • 1 dataset
Most implemented papers
Decision Transformer: Reinforcement Learning via Sequence Modeling
In particular, we present Decision Transformer, an architecture that casts the problem of RL as conditional sequence modeling.
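As a rough illustration of what "RL as conditional sequence modeling" means in practice, here is a minimal sketch (not the authors' code; layer sizes, names, and the tiny encoder are illustrative) of feeding interleaved (return-to-go, state, action) tokens into a causal transformer that predicts actions:

```python
import torch
import torch.nn as nn

class TinyDecisionTransformer(nn.Module):
    def __init__(self, state_dim, act_dim, hidden=128, n_layers=2, n_heads=4, max_len=20):
        super().__init__()
        self.embed_rtg = nn.Linear(1, hidden)          # return-to-go token
        self.embed_state = nn.Linear(state_dim, hidden)
        self.embed_action = nn.Linear(act_dim, hidden)
        self.pos = nn.Embedding(3 * max_len, hidden)
        layer = nn.TransformerEncoderLayer(hidden, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.predict_action = nn.Linear(hidden, act_dim)

    def forward(self, rtg, states, actions):
        # rtg: (B, T, 1), states: (B, T, state_dim), actions: (B, T, act_dim)
        B, T, _ = states.shape
        tokens = torch.stack(
            [self.embed_rtg(rtg), self.embed_state(states), self.embed_action(actions)],
            dim=2,
        ).reshape(B, 3 * T, -1)                        # (R_1, s_1, a_1, R_2, ...)
        tokens = tokens + self.pos(torch.arange(3 * T))
        mask = nn.Transformer.generate_square_subsequent_mask(3 * T)
        h = self.encoder(tokens, mask=mask)
        # predict a_t from the hidden state at the s_t token position
        return self.predict_action(h[:, 1::3])
```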
Offline Reinforcement Learning with Implicit Q-Learning
The main insight in our work is that, instead of evaluating unseen actions from the latest policy, we can approximate the policy improvement step implicitly by treating the state value function as a random variable, with randomness determined by the action (while still integrating over the dynamics to avoid excessive optimism). Taking a state-conditional upper expectile of this random variable then estimates the value of the best actions in that state.
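The expectile step above reduces to a simple asymmetric regression loss. A minimal sketch, with illustrative names (not the official IQL code):

```python
import torch

def expectile_loss(diff, tau=0.7):
    # diff = Q(s, a) - V(s); tau > 0.5 upweights positive errors, pushing
    # V(s) toward an upper expectile of Q over actions seen in the dataset.
    weight = torch.abs(tau - (diff < 0).float())
    return (weight * diff ** 2).mean()

# v_loss = expectile_loss(q_target(s, a) - v(s), tau=0.7)
```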
Reformer: The Efficient Transformer
Large Transformer models routinely achieve state-of-the-art results on a number of tasks, but training these models can be prohibitively costly, especially on long sequences.
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
Transformers achieve remarkable performance on several tasks, but due to their quadratic complexity with respect to input length, they are prohibitively slow for very long sequences.
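The linear-attention trick the title refers to replaces softmax(QK^T)V with phi(Q)(phi(K)^T V) for a feature map phi (the paper uses elu(x) + 1), so cost grows linearly in sequence length. A minimal non-causal sketch (the autoregressive case uses cumulative sums instead of full sums):

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    # q, k: (B, T, D); v: (B, T, Dv)
    q, k = F.elu(q) + 1, F.elu(k) + 1
    kv = torch.einsum("btd,bte->bde", k, v)               # O(T), not O(T^2)
    z = 1 / (torch.einsum("btd,bd->bt", q, k.sum(dim=1)) + eps)
    return torch.einsum("btd,bde,bt->bte", q, kv, z)
```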
D4RL: Datasets for Deep Data-Driven Reinforcement Learning
In this work, we introduce benchmarks specifically designed for the offline setting, guided by key properties of datasets relevant to real-world applications of offline RL.
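For reference, this is the typical way to load one of these datasets with the d4rl package (the environment name is just one example of many):

```python
import gym
import d4rl  # registers the offline environments with gym on import

env = gym.make("halfcheetah-medium-v2")
dataset = env.get_dataset()             # dict of numpy arrays keyed by
print(dataset["observations"].shape)    # "observations", "actions", ...

# transition-level view convenient for Q-learning (adds next_observations):
qdata = d4rl.qlearning_dataset(env)
```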
Rethinking Attention with Performers
We introduce Performers, Transformer architectures which can estimate regular (softmax) full-rank-attention Transformers with provable accuracy, but using only linear (as opposed to quadratic) space and time complexity, without relying on any priors such as sparsity or low-rankness.
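The estimator rests on a positive random-feature map: exp(q . k) = E_w[phi(q) phi(k)] with phi(x) = exp(w^T x - |x|^2 / 2) for Gaussian w. A minimal sketch (non-causal; constants and names are illustrative):

```python
import torch

def positive_random_features(x, proj):
    # x: (B, T, D); proj: (m, D) with rows drawn from N(0, I)
    sq = (x ** 2).sum(dim=-1, keepdim=True) / 2
    return torch.exp(x @ proj.T - sq) / proj.shape[0] ** 0.5

def performer_attention(q, k, v, m=64):
    proj = torch.randn(m, q.shape[-1])
    qp, kp = positive_random_features(q, proj), positive_random_features(k, proj)
    kv = torch.einsum("btm,btd->bmd", kp, v)
    z = 1 / (torch.einsum("btm,bm->bt", qp, kp.sum(dim=1)) + 1e-6)
    return torch.einsum("btm,bmd,bt->btd", qp, kv, z)
```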
Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble
Prior methods typically require accurate estimation of the behavior policy or sampling from out-of-distribution (OOD) data points, both of which can be non-trivial problems.
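The paper's alternative is to train an ensemble of N Q-functions and use their minimum as a pessimistic target, so disagreement on OOD actions is penalized without modeling the behavior policy. A minimal sketch of that clipped target (illustrative, not the authors' code):

```python
import torch

def ensemble_target(critics, next_s, next_a, reward, done, gamma=0.99):
    # critics: list of N Q-networks; the min over the ensemble shrinks
    # as disagreement (i.e. uncertainty) over (s, a) grows.
    qs = torch.stack([q(next_s, next_a) for q in critics])  # (N, B)
    return reward + gamma * (1 - done) * qs.min(dim=0).values
```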
CORL: Research-oriented Deep Offline Reinforcement Learning Library
CORL is an open-source library that provides thoroughly benchmarked single-file implementations of both deep offline and offline-to-online reinforcement learning algorithms.
Implicit Behavioral Cloning
We find that across a wide range of robot policy learning scenarios, treating supervised policy learning with an implicit model generally performs better, on average, than commonly used explicit models.
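An implicit policy means training an energy model E(s, a) and acting via argmin_a E(s, a) rather than regressing a = f(s). A minimal sketch of inference by candidate sampling (the paper uses stronger derivative-free optimizers; energy_net and the action bounds are illustrative):

```python
import torch

def act(energy_net, state, act_low, act_high, n_candidates=1024):
    # sample candidate actions uniformly within bounds, pick lowest energy
    cand = act_low + (act_high - act_low) * torch.rand(n_candidates, act_low.shape[0])
    s = state.expand(n_candidates, -1)
    energies = energy_net(s, cand)        # (n_candidates,)
    return cand[energies.argmin()]
```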
Extreme Q-Learning: MaxEnt RL without Entropy
Using extreme value theory (EVT), we derive our Extreme Q-Learning framework and, consequently, online and (for the first time) offline MaxEnt Q-learning algorithms that do not explicitly require access to a policy or its entropy.
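Concretely, the value function is fit with a Gumbel ("LINEX") regression loss whose minimizer recovers the MaxEnt log-sum-exp value; beta is a temperature. A minimal sketch (form follows the paper; names and the clipping constant are illustrative):

```python
import torch

def gumbel_loss(q, v, beta=1.0, clip=7.0):
    # asymmetric exp-linear loss; clamping keeps exp() numerically stable
    z = ((q - v) / beta).clamp(max=clip)
    return (torch.exp(z) - z - 1).mean()
```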