Search Results for author: Michael Dennis

Found 8 papers, 5 papers with code

Evolving Curricula with Regret-Based Environment Design

1 code implementation 2 Mar 2022 Jack Parker-Holder, Minqi Jiang, Michael Dennis, Mikayel Samvelyan, Jakob Foerster, Edward Grefenstette, Tim Rocktäschel

Our approach, which we call Adversarially Compounding Complexity by Editing Levels (ACCEL), seeks to constantly produce levels at the frontier of an agent's capabilities, resulting in curricula that start simple but become increasingly complex.

Replay-Guided Adversarial Environment Design

no code implementations NeurIPS 2021 Minqi Jiang, Michael Dennis, Jack Parker-Holder, Jakob Foerster, Edward Grefenstette, Tim Rocktäschel

Furthermore, our theory suggests a highly counterintuitive improvement to PLR: by stopping the agent from updating its policy on uncurated levels (training on less data), we can improve the convergence to Nash equilibria.

A New Formalism, Method and Open Issues for Zero-Shot Coordination

1 code implementation 11 Jun 2021 Johannes Treutlein, Michael Dennis, Caspar Oesterheld, Jakob Foerster

We introduce an extension of the algorithm, other-play with tie-breaking, and prove that it is optimal in the LFC problem and an equilibrium in the LFC game.

Multi-agent Reinforcement Learning

Improving Social Welfare While Preserving Autonomy via a Pareto Mediator

no code implementations 7 Jun 2021 Stephen McAleer, John Lanier, Michael Dennis, Pierre Baldi, Roy Fox

Machine learning algorithms often make decisions on behalf of agents with varied and sometimes conflicting interests.

Accumulating Risk Capital Through Investing in Cooperation

no code implementations 25 Jan 2021 Charlotte Roman, Michael Dennis, Andrew Critch, Stuart Russell

Recent work on promoting cooperation in multi-agent learning has resulted in many methods which successfully promote cooperation at the cost of becoming more vulnerable to exploitation by malicious actors.

Quantifying Differences in Reward Functions

1 code implementation ICLR 2021 Adam Gleave, Michael Dennis, Shane Legg, Stuart Russell, Jan Leike

However, this method cannot distinguish between the learned reward function failing to reflect user preferences and the policy optimization process failing to optimize the learned reward.

Adversarial Policies: Attacking Deep Reinforcement Learning

1 code implementation ICLR 2020 Adam Gleave, Michael Dennis, Cody Wild, Neel Kant, Sergey Levine, Stuart Russell

Deep reinforcement learning (RL) policies are known to be vulnerable to adversarial perturbations to their observations, similar to adversarial examples for classifiers.

Reinforcement Learning
