
Decision Transformer: Reinforcement Learning via Sequence Modeling

opendilab/DI-engine NeurIPS 2021

In particular, we present Decision Transformer, an architecture that casts the problem of RL as conditional sequence modeling.

Atari Games Language Modelling +3
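The "conditional sequence modeling" framing can be sketched minimally: a trajectory is serialized into (return-to-go, state, action) triples, and the model is conditioned on the return it should achieve. The function below is an illustrative helper, not the paper's implementation; only the token layout follows the Decision Transformer idea.

```python
import numpy as np

def to_dt_tokens(rewards, states, actions):
    """Serialize a trajectory into (return-to-go, state, action) triples,
    the token layout a Decision Transformer-style model conditions on.
    Illustrative sketch, not the authors' code."""
    # Return-to-go at step t is the sum of rewards from t to the end.
    rtg = np.cumsum(rewards[::-1])[::-1]
    return [(float(g), s, a) for g, s, a in zip(rtg, states, actions)]

# A 3-step toy trajectory with rewards [1, 0, 2]:
tokens = to_dt_tokens([1.0, 0.0, 2.0], ["s0", "s1", "s2"], [0, 1, 0])
# Returns-to-go are [3.0, 2.0, 2.0]
```

At inference time, one would prompt such a model with a desired return and let it generate actions autoregressively.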

Conservative Q-Learning for Offline Reinforcement Learning

opendilab/DI-engine NeurIPS 2020

We theoretically show that CQL produces a lower bound on the value of the current policy and that it can be incorporated into a policy learning procedure with theoretical improvement guarantees.

Continuous Control DQN Replay Dataset +2
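The lower-bound property comes from CQL's conservative regularizer, which (in its discrete-action form) pushes Q-values down under a broad action distribution while pushing up the Q-value of the action actually present in the dataset. A minimal sketch of that penalty term, with illustrative names:

```python
import numpy as np

def cql_penalty(q_values, data_action):
    """CQL-style conservative penalty (sketch): log-sum-exp over all
    actions minus the Q-value of the dataset action. Minimizing this
    depresses out-of-distribution Q-values relative to in-data ones."""
    logsumexp = np.log(np.sum(np.exp(q_values)))  # soft maximum over actions
    return logsumexp - q_values[data_action]

# Q-values for 3 discrete actions; the dataset action is 0.
penalty = cql_penalty(np.array([1.0, 1.0, 1.0]), data_action=0)
# With equal Q-values the penalty reduces to log(num_actions)
```

In the full algorithm this term is added, scaled by a coefficient, to the usual Bellman error.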

Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization

opendilab/DI-engine 1 Mar 2016

We explore how inverse optimal control (IOC) can be used to learn behaviors from demonstrations, with applications to torque control of high-dimensional robotic systems.

Feature Engineering

Making Efficient Use of Demonstrations to Solve Hard Exploration Problems

opendilab/DI-engine ICLR 2020

This paper introduces R2D3, an agent that makes efficient use of demonstrations to solve hard exploration problems in partially observable environments with highly variable initial conditions.

Deep Q-learning from Demonstrations

opendilab/DI-engine 12 Apr 2017

We present Deep Q-learning from Demonstrations (DQfD), an algorithm that leverages small sets of demonstration data to massively accelerate learning, and that automatically assesses the necessary ratio of demonstration data during training thanks to a prioritized replay mechanism.

Imitation Learning · Q-Learning
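The prioritized-replay mechanism can be sketched as follows: each transition's sampling priority is its absolute TD error plus a small epsilon, and demonstration transitions receive an extra constant bonus so they keep being replayed throughout training. The function and bonus value below are illustrative assumptions, not the paper's exact constants.

```python
def replay_probs(td_errors, is_demo, demo_bonus=1.0, eps=1e-3):
    """DQfD-style prioritized replay (sketch): priority = |TD error| + eps,
    plus a constant bonus for demonstration transitions so they remain
    likely to be sampled even as their TD errors shrink."""
    priorities = [abs(e) + eps + (demo_bonus if d else 0.0)
                  for e, d in zip(td_errors, is_demo)]
    total = sum(priorities)
    return [p / total for p in priorities]

# Two transitions with equal TD error; the first is a demonstration.
probs = replay_probs([0.5, 0.5], [True, False])
# The demonstration transition gets the higher sampling probability.
```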

Weighted QMIX: Expanding Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

opendilab/DI-engine NeurIPS 2020

We show in particular that this projection can fail to recover the optimal policy even with access to $Q^*$, which primarily stems from the equal weighting placed on each joint action.

Multi-agent Reinforcement Learning · Q-Learning · +2
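The "equal weighting" critique suggests the fix: when projecting a joint-action value function onto the monotonic class, down-weight suboptimal joint actions so the optimal one dominates the fit. A toy sketch of such a weighted projection loss (the weighting scheme here is a simplified illustration of the Weighted QMIX idea, not the paper's exact operator):

```python
import numpy as np

def weighted_projection_loss(q_joint, q_tot, w=0.1):
    """Weighted projection (sketch): the optimal joint action keeps full
    weight 1.0, all other joint actions are down-weighted by w < 1, so
    fitting errors on suboptimal actions matter less."""
    best = np.argmax(q_joint)          # index of the optimal joint action
    weights = np.full(len(q_joint), w)
    weights[best] = 1.0
    return float(np.sum(weights * (q_tot - q_joint) ** 2))
```

With w = 1 this collapses back to the equally weighted projection the excerpt identifies as the failure mode.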

Learning Continuous Control Policies by Stochastic Value Gradients

opendilab/DI-engine NeurIPS 2015

One of these variants, SVG(1), shows the effectiveness of learning models, value functions, and policies simultaneously in continuous domains.

Continuous Control

Behavioral Cloning from Observation

opendilab/DI-engine 4 May 2018

In this work, we propose a two-phase, autonomous imitation-learning technique called behavioral cloning from observation (BCO) that aims to provide improved performance with respect to both of these aspects.

Imitation Learning
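The two phases can be sketched concretely: phase one learns an inverse-dynamics model from the agent's own experience; phase two uses it to infer the missing actions in state-only demonstrations, yielding (state, action) pairs for ordinary behavioral cloning. The helper below illustrates phase two under the assumption that a learned inverse-dynamics model is already available.

```python
def bco_phase_two(demo_states, inverse_dynamics):
    """BCO phase two (sketch): given state-only demonstrations, use a
    learned inverse-dynamics model to infer each missing action from
    consecutive state pairs, producing data for behavioral cloning."""
    pairs = []
    for s, s_next in zip(demo_states, demo_states[1:]):
        pairs.append((s, inverse_dynamics(s, s_next)))
    return pairs

# Toy 1-D world where the true action is the signed displacement:
dataset = bco_phase_two([0, 1, 3, 2], lambda s, s2: s2 - s)
# -> [(0, 1), (1, 2), (3, -1)]
```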

Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations

opendilab/DI-engine 12 Apr 2019

A critical flaw of existing inverse reinforcement learning (IRL) methods is their inability to significantly outperform the demonstrator.

Imitation Learning · Reinforcement Learning

Parametrized Deep Q-Networks Learning: Reinforcement Learning with Discrete-Continuous Hybrid Action Space

opendilab/DI-engine 10 Oct 2018

Most existing deep reinforcement learning (DRL) frameworks consider either discrete action space or continuous action space solely.
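P-DQN's hybrid-action idea can be sketched briefly: each discrete action k carries its own continuous parameter x_k produced by a parameter network, and the agent picks the pair maximizing Q(state, k, x_k). The functions below are illustrative stand-ins, assuming simple callables for the Q-function and parameter networks.

```python
import numpy as np

def select_hybrid_action(q_of, param_nets, state):
    """P-DQN-style action selection (sketch): compute each discrete
    action's continuous parameter, then take argmax_k Q(state, k, x_k)
    and return the hybrid pair (k, x_k)."""
    params = [net(state) for net in param_nets]              # one x_k per k
    q_values = [q_of(state, k, x) for k, x in enumerate(params)]
    k = int(np.argmax(q_values))
    return k, params[k]

# Toy setup with 2 discrete actions and linear-in-state parameters:
k, x = select_hybrid_action(
    q_of=lambda s, k, x: (k + 1) * x,              # hypothetical Q-function
    param_nets=[lambda s: 0.5 * s, lambda s: 0.2 * s],
    state=2.0)
# Q = [1.0, 0.8] -> picks discrete action 0 with parameter 1.0
```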