Distributional Reinforcement Learning
31 papers with code • 0 benchmarks • 0 datasets
Value distribution is the distribution of the random return received by a reinforcement learning agent. It has been used for specific purposes such as implementing risk-aware behaviour.
We have a random return Z whose expectation is the value Q. This random return also satisfies a recursive equation, but one of a distributional nature
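Concretely, the recursion is the distributional Bellman equation, where equality holds in distribution (this is the standard formulation from the distributional RL literature; s, a denote the current state-action pair and S', A' the random successor pair):

```latex
Z(s, a) \overset{D}{=} R(s, a) + \gamma\, Z(S', A'),
\qquad Q(s, a) = \mathbb{E}\!\left[ Z(s, a) \right],
```

where $R(s,a)$ is the (possibly random) immediate reward, $S' \sim P(\cdot \mid s, a)$, and $A' \sim \pi(\cdot \mid S')$. Taking expectations on both sides recovers the familiar Bellman equation for $Q$.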
Benchmarks
These leaderboards are used to track progress in Distributional Reinforcement Learning
Latest papers with no code
Beyond Average Return in Markov Decision Processes
What are the functionals of the reward that can be computed and optimized exactly in Markov Decision Processes? In the finite-horizon, undiscounted setting, Dynamic Programming (DP) can only handle these operations efficiently for certain classes of statistics.
Pitfall of Optimism: Distributional Reinforcement Learning by Randomizing Risk Criterion
Distributional reinforcement learning algorithms have attempted to utilize estimated uncertainty for exploration, such as optimism in the face of uncertainty.
Distributional Reinforcement Learning with Online Risk-awareness Adaption
The use of reinforcement learning (RL) in practical applications requires considering sub-optimal outcomes, which depend on the agent's familiarity with the uncertain environment.
Learning Risk-Aware Quadrupedal Locomotion using Distributional Reinforcement Learning
Instead of relying on a value expectation, we estimate the complete value distribution to account for uncertainty in the robot's interaction with the environment.
Deep Reinforcement Learning for Artificial Upwelling Energy Management
The potential of artificial upwelling (AU) as a means of lifting nutrient-rich bottom water to the surface, stimulating seaweed growth, and consequently enhancing ocean carbon sequestration, has been gaining increasing attention in recent years.
Value-Distributional Model-Based Reinforcement Learning
We study the problem from a model-based Bayesian reinforcement learning perspective, where the goal is to learn the posterior distribution over value functions induced by parameter (epistemic) uncertainty of the Markov decision process.
Cramer Type Distances for Learning Gaussian Mixture Models by Gradient Descent
Even fewer algorithms are compatible with gradient descent, the common learning process for neural networks.
Is Risk-Sensitive Reinforcement Learning Properly Resolved?
Due to the importance of risk management when learning deployable policies, risk-sensitive reinforcement learning (RSRL) has been recognized as an important direction.
Diverse Projection Ensembles for Distributional Reinforcement Learning
In contrast to classical reinforcement learning, distributional reinforcement learning algorithms aim to learn the distribution of returns rather than their expected value.
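One common way to learn such a return distribution is to represent it by a set of quantiles and fit them with a quantile-regression Huber loss, as in QR-DQN-style methods. The sketch below is a minimal NumPy illustration, not any paper's reference implementation; the function name and `kappa` default are assumptions for the example:

```python
import numpy as np

def quantile_huber_loss(pred_quantiles, target_samples, kappa=1.0):
    """Quantile-regression Huber loss for fitting a return distribution.

    pred_quantiles: shape (N,), predicted quantile values for the
        quantile midpoints tau_i = (i + 0.5) / N.
    target_samples: shape (M,), samples of the distributional Bellman
        target R + gamma * Z(S', A').
    """
    n = len(pred_quantiles)
    taus = (np.arange(n) + 0.5) / n
    # Pairwise TD errors between every target sample and every quantile.
    u = target_samples[None, :] - pred_quantiles[:, None]  # shape (N, M)
    # Huber penalty: quadratic near zero, linear beyond kappa.
    huber = np.where(np.abs(u) <= kappa,
                     0.5 * u ** 2,
                     kappa * (np.abs(u) - 0.5 * kappa))
    # Asymmetric weight pushes quantile i toward the tau_i-th quantile
    # of the target distribution rather than toward its mean.
    weight = np.abs(taus[:, None] - (u < 0).astype(float))
    return float((weight * huber / kappa).mean())
```

Minimizing the expected value of this loss (e.g. by gradient descent on the network producing `pred_quantiles`) drives each predicted quantile toward the corresponding quantile of the target return distribution, so the full distribution, not just its mean, is learned.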
PACER: A Fully Push-forward-based Distributional Reinforcement Learning Algorithm
In this paper, we propose the first fully push-forward-based Distributional Reinforcement Learning algorithm, called Push-forward-based Actor-Critic EncourageR (PACER).