TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Unsupervised Reinforcement Learning	URLB (pixels, 2*10^6 frames)	Mastering URLB	Walker (mean normalized return)	97.51	# 1
Unsupervised Reinforcement Learning	URLB (pixels, 2*10^6 frames)	Mastering URLB	Quadruped (mean normalized return)	89.96	# 1
Unsupervised Reinforcement Learning	URLB (pixels, 2*10^6 frames)	Mastering URLB	Jaco (mean normalized return)	98.07	# 1

Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels

24 Sep 2022 · Sai Rajeswar, Pietro Mazzaglia, Tim Verbelen, Alexandre Piché, Bart Dhoedt, Aaron Courville, Alexandre Lacoste

Controlling artificial agents from visual sensory data is an arduous task. Reinforcement learning (RL) algorithms can succeed but require large amounts of interaction between the agent and the environment. To alleviate this issue, unsupervised RL proposes to employ self-supervised interaction and learning to adapt faster to future tasks. Yet, as shown in the Unsupervised RL Benchmark (URLB; Laskin et al. 2021), whether current unsupervised strategies can improve generalization capabilities is still unclear, especially in visual control settings. In this work, we study the URLB and propose a new method to solve it, using unsupervised model-based RL to pre-train the agent and a task-aware fine-tuning strategy, combined with a newly proposed hybrid planner, Dyna-MPC, to adapt the agent to downstream tasks. On URLB, our method obtains 93.59% overall normalized performance, surpassing previous baselines by a staggering margin. The approach is evaluated through a large-scale empirical study, which we use to validate our design choices and analyze our models. We also show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation. Project website: https://masteringurlb.github.io/
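The results on this page are reported as mean normalized returns. As a rough illustration of what such a metric could look like, the sketch below divides each task's raw return by a reference score and averages the resulting percentages. This is a minimal sketch under stated assumptions, not the authors' evaluation code: the page does not say what the returns are normalized against, so `reference` is an assumption, and the task names and all numbers are hypothetical.

```python
from statistics import mean
from typing import Mapping


def normalized_return(raw_return: float, reference_return: float) -> float:
    """Express a raw episodic return as a percentage of a reference score."""
    if reference_return <= 0:
        raise ValueError("reference_return must be positive")
    return 100.0 * raw_return / reference_return


def mean_normalized_return(
    raw: Mapping[str, float], reference: Mapping[str, float]
) -> float:
    """Average the per-task normalized returns over all tasks in `raw`."""
    return mean(normalized_return(raw[task], reference[task]) for task in raw)


# Hypothetical task names and scores, chosen only for illustration.
raw = {"stand": 950.0, "walk": 900.0}
reference = {"stand": 1000.0, "walk": 1000.0}  # assumed reference scores

score = mean_normalized_return(raw, reference)
print(f"mean normalized return: {score:.2f}")
```

Averaging percentages rather than raw returns keeps tasks with different reward scales comparable, which is presumably why per-domain numbers such as those in the results table can be reported on a common scale.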

Code

mazpie/mastering-urlb official

Tasks

reinforcement-learning

Reinforcement Learning (RL)

Unsupervised Reinforcement Learning

Datasets

URLB

Results from the Paper

Ranked #1 on Unsupervised Reinforcement Learning on URLB (pixels, 2*10^6 frames)

