IQ-Learn: Inverse soft-Q Learning for Imitation

In many sequential decision-making problems (e.g., robotics control, game playing, sequential prediction), human or expert data is available containing useful information about the task. However, imitation learning (IL) from a small amount of expert data can be challenging in high-dimensional environments with complex dynamics. Behavioral cloning is widely used because of its simple implementation and stable convergence, but it does not exploit any information about the environment's dynamics. Many existing methods that do exploit dynamics information are difficult to train in practice, either because of an adversarial optimization process over reward and policy approximators or because of biased, high-variance gradient estimators. We introduce a method for dynamics-aware IL that avoids adversarial training by learning a single Q-function which implicitly represents both reward and policy. On standard benchmarks, the implicitly learned rewards show a high positive correlation with the ground-truth rewards, illustrating that our method can also be used for inverse reinforcement learning (IRL). Our method, Inverse soft-Q Learning (IQ-Learn), obtains state-of-the-art results in offline and online imitation learning settings, significantly outperforming existing methods both in the number of required environment interactions and in scalability to high-dimensional spaces, often by more than 3x.
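The key observation behind learning a single Q-function is that, in maximum-entropy RL, both the policy and the reward can be recovered from the soft Q-function: the policy is the softmax of Q, and the reward follows from the inverse soft Bellman relation r(s, a) = Q(s, a) - γ E[V(s')]. The sketch below illustrates this correspondence on a toy discrete MDP; it is a minimal illustration, not the authors' reference implementation, and the temperature `alpha`, the random Q-table, and the dynamics matrix `P` are made-up stand-ins.

```python
import numpy as np

# Hypothetical toy setup: a small discrete MDP with a stand-in "learned" Q-table.
alpha = 0.1   # entropy temperature (assumed hyperparameter)
gamma = 0.99  # discount factor

rng = np.random.default_rng(0)
n_states, n_actions = 5, 3
Q = rng.normal(size=(n_states, n_actions))                         # stand-in soft Q(s, a)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))   # dynamics P(s' | s, a)

# Soft value function: V(s) = alpha * log sum_a exp(Q(s, a) / alpha)
V = alpha * np.log(np.exp(Q / alpha).sum(axis=1))

# Policy implicitly defined by Q: pi(a | s) = exp((Q(s, a) - V(s)) / alpha)
pi = np.exp((Q - V[:, None]) / alpha)

# Implicit reward from the inverse soft Bellman relation:
# r(s, a) = Q(s, a) - gamma * E_{s' ~ P(.|s,a)}[V(s')]
r = Q - gamma * (P @ V)

print(pi.sum(axis=1))  # each row sums to 1, i.e. pi is a valid policy
print(r.shape)         # one implicit reward per (state, action) pair
```

Because both quantities are deterministic functions of Q, optimizing a single Q-network suffices; there is no separate reward network to train adversarially against the policy.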


Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
| --- | --- | --- | --- | --- | --- |
| MuJoCo Games | Ant | IQ-Learn | Average Return | 4362.9 | #1 |
| Atari Games | Atari 2600 Beam Rider | IQ-Learn | Return | 3025 | #1 |
| Atari Games | Atari 2600 Q*Bert | IQ-Learn | Return | 12940 | #1 |
| Atari Games | Atari 2600 Seaquest | IQ-Learn | Return | 2349 | #1 |
| Atari Games | Atari 2600 Space Invaders | IQ-Learn | Return | 507 | #1 |
| MuJoCo Games | Humanoid-v2 | IQ-Learn | Return | 5227.1 | #1 |
| MuJoCo Games | Walker2d | IQ-Learn | Mean | 5134 | #1 |
