6 papers with code • 17 benchmarks • 1 datasets
A Unified Approach to Reinforcement Learning, Quantal Response Equilibria, and Two-Player Zero-Sum Games
This work studies an algorithm, which we call magnetic mirror descent, that is inspired by mirror descent and the non-Euclidean proximal gradient algorithm.
We hope that our suite of benchmarks will increase the reproducibility of experiments and make it possible to study challenging tasks with a limited computational budget, thus making RL research both more systematic and more accessible across the community.
The current reward learning from human preferences could be used to resolve complex reinforcement learning (RL) tasks without access to a reward function by defining a single fixed preference between pairs of trajectory segments.
With the rapid development of deep reinforcement learning (DRL) techniques, there is an increasing need to understand and interpret DRL policies.
Recent methods for imitation learning directly learn a $Q$-function using an implicit reward formulation rather than an explicit reward function.