Distributionally Robust Reinforcement Learning

23 Feb 2019  ·  Elena Smirnova, Elvis Dohmatob, Jérémie Mary ·

Real-world applications require RL algorithms to act safely. During learning process, it is likely that the agent executes sub-optimal actions that may lead to unsafe/poor states of the system. Exploration is particularly brittle in high-dimensional state/action space due to increased number of low-performing actions. In this work, we consider risk-averse exploration in approximate RL setting. To ensure safety during learning, we propose the distributionally robust policy iteration scheme that provides lower bound guarantee on state-values. Our approach induces a dynamic level of risk to prevent poor decisions and yet preserves the convergence to the optimal policy. Our formulation results in a efficient algorithm that accounts for a simple re-weighting of policy actions in the standard policy iteration scheme. We extend our approach to continuous state/action space and present a practical algorithm, distributionally robust soft actor-critic, that implements a different exploration strategy: it acts conservatively at short-term and it explores optimistically in a long-run. We provide promising experimental results on continuous control tasks.

PDF Abstract

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here