Search Results for author: Daniel Guo

Found 5 papers, 3 papers with code

Human Alignment of Large Language Models through Online Preference Optimisation

no code implementations • 13 Mar 2024 • Daniele Calandriello, Daniel Guo, Rémi Munos, Mark Rowland, Yunhao Tang, Bernardo Avila Pires, Pierre Harvey Richemond, Charline Le Lan, Michal Valko, Tianqi Liu, Rishabh Joshi, Zeyu Zheng, Bilal Piot

Building on this equivalence, we introduce the IPO-MD algorithm that generates data with a mixture policy (between the online and reference policy) similarly as the general Nash-MD algorithm.

A General Theoretical Paradigm to Understand Learning from Human Preferences

1 code implementation • 18 Oct 2023 • Mohammad Gheshlaghi Azar, Mark Rowland, Bilal Piot, Daniel Guo, Daniele Calandriello, Michal Valko, Rémi Munos

In particular we derive a new general objective called $\Psi$PO for learning from human preferences that is expressed in terms of pairwise preferences and therefore bypasses both approximations.

1,631

Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning

no code implementations • ICML 2020 • Daniel Guo, Bernardo Avila Pires, Bilal Piot, Jean-Bastien Grill, Florent Altché, Rémi Munos, Mohammad Gheshlaghi Azar

These latent embeddings are themselves trained to be predictive of the aforementioned representations.

reinforcement-learning Reinforcement Learning (RL) +1

Agent57: Outperforming the Atari Human Benchmark

5 code implementations • ICML 2020 • Adrià Puigdomènech Badia, Bilal Piot, Steven Kapturowski, Pablo Sprechmann, Alex Vitvitskyi, Daniel Guo, Charles Blundell

Atari games have been a long-standing benchmark in the reinforcement learning (RL) community for the past decade.

Ranked #1 on Atari Games on Atari 2600 HERO

Atari Games Reinforcement Learning (RL)

Never Give Up: Learning Directed Exploration Strategies

6 code implementations • ICLR 2020 • Adrià Puigdomènech Badia, Pablo Sprechmann, Alex Vitvitskyi, Daniel Guo, Bilal Piot, Steven Kapturowski, Olivier Tieleman, Martín Arjovsky, Alexander Pritzel, Andrew Bolt, Charles Blundell

Our method doubles the performance of the base agent in all hard exploration games in the Atari-57 suite while maintaining a very high score across the remaining games, obtaining a median human normalised score of 1344.0%.

Ranked #7 on Atari Games on atari game

Atari Games

2,548
