The Atari 2600 Games task (and dataset) involves training an agent to achieve high game scores.

(Image credit: Playing Atari with Deep Reinforcement Learning)

Revisiting Prioritized Experience Replay: A Value Perspective

5 Feb 2021 • RLforlife/VER

Furthermore, we successfully extend our theoretical framework to maximum-entropy RL by deriving the lower and upper bounds of these value metrics for soft Q-learning, which turn out to be the product of $|\text{TD}|$ and "on-policyness" of the experiences.

1
Shielding Atari Games with Bounded Prescience

20 Jan 2021 • HjalmarWijk/bounded-prescience

We present the first exact method for analysing and ensuring the safety of DRL agents for Atari games.

0
Benchmarking Perturbation-based Saliency Maps for Explaining Deep Reinforcement Learning Agents

18 Jan 2021 • belimmer/PerturbationSaliencyEvaluation

All four approaches work by perturbing parts of the input and measuring how much this affects the agent's output.

2
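
To make the perturb-and-measure idea in the entry above concrete, here is a minimal occlusion-style sketch in pure Python. It is not taken from the paper and is not claimed to be one of the four benchmarked approaches; `occlusion_saliency`, `toy_policy`, the patch size, and the fill value are all hypothetical names and choices made for illustration.

```python
import math


def occlusion_saliency(policy, frame, patch=2, fill=0.0):
    """Score each patch by how much occluding it changes the policy output.

    `policy` is any callable mapping a 2D frame (list of lists) to a list
    of action scores. This is an illustrative assumption, not a real API.
    """
    base = policy(frame)
    rows, cols = len(frame), len(frame[0])
    saliency = []
    for top in range(0, rows - patch + 1, patch):
        row_scores = []
        for left in range(0, cols - patch + 1, patch):
            # Copy the frame and overwrite one patch with the fill value.
            perturbed = [list(r) for r in frame]
            for y in range(top, top + patch):
                for x in range(left, left + patch):
                    perturbed[y][x] = fill
            out = policy(perturbed)
            # Euclidean distance between perturbed and unperturbed outputs.
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(out, base)))
            row_scores.append(dist)
        saliency.append(row_scores)
    return saliency


def toy_policy(frame):
    # Hypothetical stand-in agent: only the top-left 2x2 pixels matter.
    s = sum(frame[y][x] for y in range(2) for x in range(2))
    return [s, -s]


frame = [[1.0] * 4 for _ in range(4)]
saliency = occlusion_saliency(toy_policy, frame)
```

With this toy agent, only the top-left patch receives a non-zero score, since it is the only region the stand-in policy reads. A real agent would replace `toy_policy` with a forward pass of its network.
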
Developing an OpenAI Gym-compatible framework and simulation environment for testing Deep Reinforcement Learning agents solving the Ambulance Location Problem

12 Jan 2021 • MichaelAllen1966/qambo

Results: A range of Deep RL agents based on Deep Q networks were tested in this custom environment.

0
Evolving Reinforcement Learning Algorithms

Learning from scratch on simple classical control and gridworld tasks, our method rediscovers the temporal-difference (TD) algorithm.

2
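
For readers unfamiliar with the temporal-difference (TD) algorithm mentioned in the entry above, the following is a textbook tabular TD(0) value update, sketched under simple assumptions. It is not the paper's evolved algorithm or its search procedure; `td0_update`, the state names, and the hyperparameter values are hypothetical.

```python
def td0_update(values, state, reward, next_state,
               alpha=0.1, gamma=0.99, done=False):
    """Apply one tabular TD(0) update in place and return the TD error.

    values: dict mapping state -> estimated value (illustrative structure).
    """
    # Bootstrapped target: reward plus discounted value of the next state,
    # with no bootstrap term when the episode has ended.
    target = reward + (0.0 if done else gamma * values[next_state])
    td_error = target - values[state]
    # Move the current estimate a fraction alpha toward the target.
    values[state] += alpha * td_error
    return td_error


values = {"s0": 0.0, "s1": 1.0}
err = td0_update(values, "s0", reward=0.5, next_state="s1",
                 alpha=0.5, gamma=0.9)
```

The returned TD error is the quantity that drives the update; deep variants such as DQN minimise a loss on the same error using a network instead of a table.
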
Reinforcement Learning with Latent Flow

6 Jan 2021 • WendyShang/flare

Temporal information is essential to learning effective policies with Reinforcement Learning (RL).

16
Multi-Agent Trust Region Learning

1 Jan 2021 • matrl-project/matrl

We derive the lower bound of agents' payoff improvements for MATRL methods, and also prove the convergence of our method on the meta-game fixed points.

2
Augmenting Policy Learning with Routines Discovered from a Demonstration

23 Dec 2020 • sjtuytc/-AAAI21-RoutineAugmentedPolicyLearning-RAPL-

Humans can abstract prior knowledge from very little data and use it to boost skill learning.

13
Evaluating Agents without Rewards

21 Dec 2020 • bfmat/agenteval

Moreover, input entropy and information gain correlate more strongly with human similarity than task reward does, suggesting the use of intrinsic objectives for designing agents that behave similarly to human players.

1
High-Throughput Synchronous Deep RL

In contrast, asynchronous methods achieve high throughput but suffer from stability issues and lower sample efficiency due to "stale policies."

9
