69 papers with code • 1 benchmark • 1 dataset





Most implemented papers

Decision Transformer: Reinforcement Learning via Sequence Modeling

kzl/decision-transformer NeurIPS 2021

In particular, we present Decision Transformer, an architecture that casts the problem of RL as conditional sequence modeling.
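The input representation behind this framing can be sketched directly: each trajectory becomes a sequence of (return-to-go, state, action) triples that a causal transformer consumes autoregressively. A minimal sketch of that preprocessing (function names are illustrative, not taken from the kzl/decision-transformer codebase):

```python
import numpy as np

def returns_to_go(rewards):
    # R_t = sum of rewards from step t to the end of the trajectory
    return np.cumsum(rewards[::-1])[::-1]

def dt_sequence(rewards, states, actions):
    # Interleave (return-to-go, state, action) per timestep — the token
    # ordering Decision Transformer feeds to its causal transformer
    rtg = returns_to_go(np.asarray(rewards, dtype=float))
    return [tok for triple in zip(rtg, states, actions) for tok in triple]
```

At test time, conditioning on a desired return-to-go steers generation toward actions consistent with achieving that return.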

Reformer: The Efficient Transformer

google/trax ICLR 2020

Large Transformer models routinely achieve state-of-the-art results on a number of tasks but training these models can be prohibitively costly, especially on long sequences.

Offline Reinforcement Learning with Implicit Q-Learning

rail-berkeley/rlkit 12 Oct 2021

The main insight in our work is that, instead of evaluating unseen actions from the latest policy, we can approximate the policy improvement step implicitly. We treat the state value function as a random variable, with randomness determined by the action (while still integrating over the dynamics to avoid excessive optimism), and then take a state-conditional upper expectile of this random variable to estimate the value of the best actions in that state.
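The "upper expectile" is fit with an asymmetric squared loss. A minimal sketch of that loss (a simplified illustration, not taken from the rlkit implementation; `diff` would be the Bellman target minus the value estimate):

```python
import numpy as np

def expectile_loss(diff, tau=0.7):
    # Asymmetric L2: positive errors (diff > 0) are weighted by tau,
    # negative errors by (1 - tau). With tau > 0.5 the fit is pulled
    # toward an upper expectile; tau = 0.5 recovers ordinary mean regression.
    weight = np.where(diff > 0, tau, 1.0 - tau)
    return weight * diff ** 2
```

Because the expectile is estimated only from actions present in the dataset, the policy improvement step never queries out-of-distribution actions.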

Rethinking Attention with Performers

google-research/google-research ICLR 2021

We introduce Performers, Transformer architectures which can estimate regular (softmax) full-rank-attention Transformers with provable accuracy, but using only linear (as opposed to quadratic) space and time complexity, without relying on any priors such as sparsity or low-rankness.
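The estimation works by replacing the softmax kernel with a dot product of positive random features. A rough sketch of the core idea (a simplified FAVOR+-style feature map; the actual google-research code adds orthogonal random features and numerical stabilization):

```python
import numpy as np

def softmax_kernel_features(X, omega):
    # phi(x) = exp(omega @ x - |x|^2 / 2) / sqrt(m), with omega ~ N(0, I):
    # E[phi(q) . phi(k)] = exp(q . k), the unnormalized softmax kernel.
    m = omega.shape[0]
    return np.exp(X @ omega.T - 0.5 * (X ** 2).sum(-1, keepdims=True)) / np.sqrt(m)
```

Attention then factors as phi(Q) (phi(K)^T V) with row normalization, which costs linear rather than quadratic space and time in the sequence length.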

D4RL: Datasets for Deep Data-Driven Reinforcement Learning

rail-berkeley/offline_rl 15 Apr 2020

In this work, we introduce benchmarks specifically designed for the offline setting, guided by key properties of datasets relevant to real-world applications of offline RL.
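D4RL exposes each dataset as a flat dict of aligned NumPy arrays (keys such as 'observations', 'actions', 'rewards', 'terminals'). A sketch of slicing such a dict into episodes, using a mock dataset in place of an actual D4RL download:

```python
import numpy as np

def split_episodes(dataset):
    # Split flat transition arrays into per-episode chunks at terminal flags;
    # a trailing unterminated episode is kept as its own chunk
    n = len(dataset["terminals"])
    bounds = [0] + (np.flatnonzero(dataset["terminals"]) + 1).tolist() + [n]
    return [
        {k: v[s:e] for k, v in dataset.items()}
        for s, e in zip(bounds[:-1], bounds[1:]) if s < e
    ]
```

The flat layout makes it easy to feed transitions to off-policy algorithms directly, while episode slicing like this supports trajectory-based methods.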

Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention

idiap/fast-transformers ICML 2020

Transformers achieve remarkable performance in several tasks, but due to their quadratic complexity with respect to the input's length, they are prohibitively slow for very long sequences.
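Their fix is to replace softmax(QK^T)V with a kernel feature map phi so attention factors as phi(Q)(phi(K)^T V), computable in time linear in sequence length. A minimal non-causal sketch using the paper's elu(x) + 1 feature map (simplified from the idiap/fast-transformers implementation; causal masking omitted):

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    # phi(x) = elu(x) + 1 keeps features positive, so the normalizer is valid
    phi = lambda X: np.where(X > 0, X + 1.0, np.exp(X))
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                          # (d, d_v) summary: O(n * d * d_v)
    Z = Qp @ Kp.sum(axis=0)                # per-query normalizer, shape (n,)
    return (Qp @ KV) / (Z[:, None] + eps)  # linear, not quadratic, in n
```

In the causal setting the same factorization becomes a running sum over keys, which is what lets the paper view the transformer as an RNN at inference time.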

Implicit Behavioral Cloning

opendilab/DI-engine 1 Sep 2021

We find that across a wide range of robot policy learning scenarios, treating supervised policy learning with an implicit model generally performs better, on average, than commonly used explicit models.
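An "implicit model" here means the policy is the argmin of a learned energy E(s, a) rather than a direct mapping from state to action. A toy sketch of inference by candidate sampling (the paper uses derivative-free optimization over sampled actions; names here are illustrative, not from the DI-engine API):

```python
import numpy as np

def implicit_policy(energy_fn, state, candidate_actions):
    # Evaluate the learned energy of each candidate action and return the argmin
    energies = np.array([energy_fn(state, a) for a in candidate_actions])
    return candidate_actions[int(np.argmin(energies))]
```

Where explicit regression averages over multimodal action targets, the argmin form can commit to a single mode, which is one reason implicit models fare better in the paper's comparisons.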

Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble

corl-team/CORL NeurIPS 2021

However, prior methods typically require accurate estimation of the behavior policy or sampling from OOD data points, which are themselves non-trivial problems.

Adversarially Trained Actor Critic for Offline Reinforcement Learning

microsoft/atac 5 Feb 2022

We propose Adversarially Trained Actor Critic (ATAC), a new model-free algorithm for offline reinforcement learning (RL) under insufficient data coverage, based on the concept of relative pessimism.

cosFormer: Rethinking Softmax in Attention

OpenNLPLab/cosFormer ICLR 2022

As one of its core components, softmax attention helps capture long-range dependencies, yet it prohibits scaling up due to its quadratic space and time complexity with respect to sequence length.