In this work, we build on recent advances in distributional reinforcement learning to give a generally applicable, flexible, and state-of-the-art distributional variant of DQN.
Ranked #1 on Atari Games (Atari 2600 Freeway)
In this paper, we build on recent work advocating a distributional approach to reinforcement learning in which the distribution over returns is modeled explicitly instead of only estimating the mean.
Ranked #1 on Atari Games (Atari 2600 Freeway)
In this paper, we propose the Quantile Option Architecture (QUOTA) for exploration based on recent advances in distributional reinforcement learning (RL).
Efficient exploration remains a major challenge for reinforcement learning.
ATARI GAMES, DISTRIBUTIONAL REINFORCEMENT LEARNING, EFFICIENT EXPLORATION, Q-LEARNING
The key challenge in practical distributional RL algorithms lies in how to parameterize estimated distributions so as to better approximate the true continuous distribution.
Ranked #2 on Atari Games (Atari 2600 James Bond)
Distributional reinforcement learning (distributional RL) has seen empirical success in complex Markov Decision Processes (MDPs) in the setting of nonlinear function approximation.
Reinforcement learning agents are faced with two types of uncertainty.
BAYESIAN INFERENCE, DISTRIBUTIONAL REINFORCEMENT LEARNING, EFFICIENT EXPLORATION
We consider the problem of learning a set of probability distributions from the empirical Bellman dynamics in distributional reinforcement learning (RL), a class of state-of-the-art methods that estimate the distribution, as opposed to only the expectation, of the total return.
Moreover, in a safety-critical domain it is essential to know what our agent does and does not know. To this end, we also quantify the model uncertainty associated with each patient state and action, and propose a general framework for uncertainty-aware, interpretable treatment policies.
DECISION MAKING UNDER UNCERTAINTY, DISTRIBUTIONAL REINFORCEMENT LEARNING, SAFE REINFORCEMENT LEARNING
To improve the sample efficiency of policy-gradient-based reinforcement learning algorithms, we propose the implicit distributional actor-critic (IDAC), which consists of a distributional critic, built on two deep generator networks (DGNs), and a semi-implicit actor (SIA), powered by a flexible policy distribution.