The Atari 2600 Games task (and dataset) involves training an agent to achieve high game scores.

(Image credit: Playing Atari with Deep Reinforcement Learning)


Revisiting Prioritized Experience Replay: A Value Perspective

5 Feb 2021 • RLforlife/VER

Furthermore, we successfully extend our theoretical framework to maximum-entropy RL by deriving the lower and upper bounds of these value metrics for soft Q-learning, which turn out to be the product of $|\text{TD}|$ and "on-policyness" of the experiences.

1
05 Feb 2021
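As background for the $|\text{TD}|$ term mentioned in this entry, the following is a minimal sketch of standard proportional prioritization, in which a transition's sampling probability grows with the magnitude of its TD error. It is not the paper's value metrics or its "on-policyness" weighting; the function names and the defaults for `alpha` and `eps` are illustrative assumptions.

```python
import numpy as np

def sampling_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Turn TD errors into replay sampling probabilities.

    Priority is (|TD| + eps) ** alpha; alpha = 0 recovers uniform sampling.
    The alpha and eps defaults are illustrative assumptions.
    """
    magnitudes = np.abs(np.asarray(td_errors, dtype=np.float64))
    priorities = (magnitudes + eps) ** alpha
    return priorities / priorities.sum()

def sample_indices(td_errors, batch_size, rng, alpha=0.6):
    """Draw replay-buffer indices in proportion to their priorities."""
    probs = sampling_probabilities(td_errors, alpha=alpha)
    return rng.choice(len(probs), size=batch_size, p=probs)

# Usage: transitions with larger |TD| are replayed more often.
rng = np.random.default_rng(0)
example_td_errors = [0.1, -2.0, 0.5, 0.0]
example_probs = sampling_probabilities(example_td_errors)
example_batch = sample_indices(example_td_errors, batch_size=8, rng=rng)
```

Setting `alpha` closer to 0 flattens the distribution toward uniform replay, while larger values concentrate sampling on high-error transitions.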

Shielding Atari Games with Bounded Prescience

20 Jan 2021 • HjalmarWijk/bounded-prescience

We present the first exact method for analysing and ensuring the safety of DRL agents for Atari games.

0
20 Jan 2021

Benchmarking Perturbation-based Saliency Maps for Explaining Deep Reinforcement Learning Agents

18 Jan 2021 • belimmer/PerturbationSaliencyEvaluation

All four approaches work by perturbing parts of the input and measuring how much this affects the agent's output.

2
18 Jan 2021
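The perturb-and-measure idea described in this entry can be sketched roughly as follows. This is a generic patch-occlusion example, not any of the four approaches benchmarked in the paper; `toy_agent`, the patch size, and the fill value are hypothetical stand-ins chosen for illustration.

```python
import numpy as np

def occlusion_saliency(agent_fn, obs, patch=4, fill=0.0):
    """Score each patch by how much occluding it changes the agent's output."""
    obs = np.asarray(obs, dtype=np.float64)
    baseline = np.asarray(agent_fn(obs), dtype=np.float64)
    height, width = obs.shape
    saliency = np.zeros_like(obs)
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            perturbed = obs.copy()
            perturbed[top:top + patch, left:left + patch] = fill
            output = np.asarray(agent_fn(perturbed), dtype=np.float64)
            # Saliency of the patch = size of the change in the agent's output.
            change = np.linalg.norm(output - baseline)
            saliency[top:top + patch, left:left + patch] = change
    return saliency

def toy_agent(obs):
    """Hypothetical agent whose two outputs depend only on the top-left 4x4 block."""
    block_sum = obs[:4, :4].sum()
    return np.array([block_sum, -block_sum])

# Usage: only the region the toy agent actually reads receives non-zero saliency.
frame = np.ones((8, 8))
saliency_map = occlusion_saliency(toy_agent, frame, patch=4)
```

With a real agent, `agent_fn` would return Q-values or action probabilities for a preprocessed frame, and the choice of perturbation (blur, noise, constant fill) is itself a design decision that affects the resulting map.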

Developing an OpenAI Gym-compatible framework and simulation environment for testing Deep Reinforcement Learning agents solving the Ambulance Location Problem

12 Jan 2021 • MichaelAllen1966/qambo

Results: A range of Deep RL agents based on Deep Q networks were tested in this custom environment.

0
12 Jan 2021

Evolving Reinforcement Learning Algorithms

Learning from scratch on simple classical control and gridworld tasks, our method rediscovers the temporal-difference (TD) algorithm.

2
08 Jan 2021
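For readers unfamiliar with the temporal-difference (TD) algorithm named in this entry, here is a minimal sketch of the textbook tabular TD(0) value update. It illustrates the standard rule only, not the paper's search procedure or any evolved algorithm; the two-state chain and the `alpha` and `gamma` values are illustrative assumptions.

```python
def td0_update(values, state, reward, next_state, done, alpha=0.1, gamma=0.99):
    """Apply one tabular TD(0) update in place and return the TD error."""
    # Bootstrap from the next state's estimate unless the episode has ended.
    target = reward if done else reward + gamma * values[next_state]
    td_error = target - values[state]
    values[state] += alpha * td_error
    return td_error

# Usage on a hypothetical two-state chain: state 0 -> state 1 -> terminal.
values = [0.0, 0.0]
# Leaving state 1 ends the episode with reward 1.
first_error = td0_update(values, state=1, reward=1.0, next_state=None, done=True)
# Moving from state 0 to state 1 gives reward 0 and bootstraps from values[1].
second_error = td0_update(values, state=0, reward=0.0, next_state=1, done=False)
```

Repeating such updates over many episodes propagates the terminal reward backward through the chain, which is the behavior a TD-style rule is characterized by.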

Reinforcement Learning with Latent Flow

6 Jan 2021 • WendyShang/flare

Temporal information is essential to learning effective policies with Reinforcement Learning (RL).

16
06 Jan 2021

Multi-Agent Trust Region Learning

1 Jan 2021 • matrl-project/matrl

We derive the lower bound of agents' payoff improvements for MATRL methods, and also prove the convergence of our method on the meta-game fixed points.

2
01 Jan 2021

Augmenting Policy Learning with Routines Discovered from a Demonstration

23 Dec 2020 • sjtuytc/-AAAI21-RoutineAugmentedPolicyLearning-RAPL-

Humans can abstract prior knowledge from very little data and use it to boost skill learning.

13
23 Dec 2020

Evaluating Agents without Rewards

21 Dec 2020 • bfmat/agenteval

Moreover, input entropy and information gain correlate more strongly with human similarity than task reward does, suggesting the use of intrinsic objectives for designing agents that behave similarly to human players.

1
21 Dec 2020

High-Throughput Synchronous Deep RL

In contrast, asynchronous methods achieve high throughput but suffer from stability issues and lower sample efficiency due to "stale policies."

9
17 Dec 2020