no code implementations • 22 Jan 2025 • Bernardo Ávila Pires, Mark Rowland, Diana Borsa, Zhaohan Daniel Guo, Khimya Khetarpal, André Barreto, David Abel, Rémi Munos, Will Dabney
To go beyond expected utilities, we combine distributional DP with stock augmentation, a technique previously introduced for classic DP in the context of risk-sensitive RL, where the MDP state is augmented with a statistic of the rewards obtained so far (since the first time step).
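As a rough illustration of stock augmentation (not the paper's implementation), the sketch below pairs a gym-style environment's observation with a running discounted reward sum, so a policy or value function can condition on the reward statistic accumulated since the first time step. The wrapper interface and the particular choice of statistic are assumptions.

```python
# Minimal sketch of stock augmentation: the environment state is paired with a
# running statistic of past rewards (here, a discounted sum). Names and the
# gym-style env interface are illustrative assumptions.
class StockAugmentedEnv:
    def __init__(self, env, gamma=0.99):
        self.env = env
        self.gamma = gamma
        self.stock = 0.0      # statistic of rewards obtained so far
        self.discount = 1.0

    def reset(self):
        self.stock, self.discount = 0.0, 1.0
        obs = self.env.reset()
        return (obs, self.stock)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.stock += self.discount * reward   # accumulate the reward statistic
        self.discount *= self.gamma
        return (obs, self.stock), reward, done, info
```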
no code implementations • 4 Jun 2024 • Khimya Khetarpal, Zhaohan Daniel Guo, Bernardo Avila Pires, Yunhao Tang, Clare Lyle, Mark Rowland, Nicolas Heess, Diana Borsa, Arthur Guez, Will Dabney
In this work, we take a step towards bridging the gap between theory and practice by analyzing an action-conditional self-predictive objective (BYOL-AC) using the ODE framework, characterizing its convergence properties and highlighting important distinctions between the limiting solutions of the BYOL-$\Pi$ and BYOL-AC dynamics.
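The excerpt only names the objective, so the following is a hypothetical sketch of an action-conditional self-predictive loss in the BYOL-AC spirit: one predictor per action maps the online representation of the current state to a stop-gradient target representation of the next state. The per-action linear predictors and all module names are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical action-conditional self-predictive loss: the predictor for the
# taken action regresses onto the (detached) target representation of s_{t+1}.
class ActionConditionalPredictor(nn.Module):
    def __init__(self, num_actions, dim):
        super().__init__()
        self.predictors = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_actions))

    def loss(self, online_repr_t, target_repr_tp1, actions):
        preds = torch.stack([self.predictors[a](z)
                             for a, z in zip(actions.tolist(), online_repr_t)])
        # target network output is treated as a fixed target (semi-gradient)
        return F.mse_loss(preds, target_repr_tp1.detach())
```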
no code implementations • 8 Feb 2024 • Yunhao Tang, Zhaohan Daniel Guo, Zeyu Zheng, Daniele Calandriello, Rémi Munos, Mark Rowland, Pierre Harvey Richemond, Michal Valko, Bernardo Ávila Pires, Bilal Piot
The GPO framework also sheds light on how offline algorithms enforce regularization, through the design of the convex function that defines the loss.
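To make the role of the convex function concrete, here is a hedged sketch of a GPO-style pairwise loss: a convex function f is applied to the scaled difference of policy-to-reference log-likelihood ratios between the chosen and rejected responses. The default logistic-style f (which yields a DPO-like loss) and all variable names are assumptions for illustration.

```python
import torch.nn.functional as F

# Hedged sketch of a generalized preference-optimization loss: the convex
# function f applied to the margin determines how regularization is enforced.
def gpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected,
             beta=0.1, f=lambda x: F.softplus(-x)):   # softplus(-x) = -log sigmoid(x)
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return f(beta * margin).mean()
```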
no code implementations • 1 Dec 2023 • Rémi Munos, Michal Valko, Daniele Calandriello, Mohammad Gheshlaghi Azar, Mark Rowland, Zhaohan Daniel Guo, Yunhao Tang, Matthieu Geist, Thomas Mesnard, Andrea Michi, Marco Selvi, Sertan Girgin, Nikola Momchev, Olivier Bachem, Daniel J. Mankowitz, Doina Precup, Bilal Piot
We term this approach Nash learning from human feedback (NLHF).
no code implementations • 1 May 2023 • Yash Chandak, Shantanu Thakoor, Zhaohan Daniel Guo, Yunhao Tang, Remi Munos, Will Dabney, Diana L Borsa
Representation learning and exploration are among the key challenges for any deep reinforcement learning agent.
no code implementations • 6 Dec 2022 • Yunhao Tang, Zhaohan Daniel Guo, Pierre Harvey Richemond, Bernardo Ávila Pires, Yash Chandak, Rémi Munos, Mark Rowland, Mohammad Gheshlaghi Azar, Charline Le Lan, Clare Lyle, András György, Shantanu Thakoor, Will Dabney, Bilal Piot, Daniele Calandriello, Michal Valko
We identify that faster-paced optimization of the predictor and semi-gradient updates on the representation are crucial to preventing representation collapse.
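A minimal sketch of the two ingredients named above, under assumed architectures and hyperparameters: the predictor gets a larger learning rate than the encoder, and the prediction target is detached so the representation receives only semi-gradient updates.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative only: faster-paced predictor optimization via a larger learning
# rate, and a semi-gradient update via a stop-gradient on the target branch.
encoder = nn.Linear(32, 16)
predictor = nn.Linear(16, 16)
optimizer = torch.optim.Adam([
    {"params": encoder.parameters(), "lr": 1e-4},
    {"params": predictor.parameters(), "lr": 1e-3},  # predictor learns faster
])

def self_predictive_loss(obs_t, obs_tp1):
    z_t, z_tp1 = encoder(obs_t), encoder(obs_tp1)
    # semi-gradient: no gradient flows through the target representation
    return F.mse_loss(predictor(z_t), z_tp1.detach())
```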
no code implementations • 16 Jun 2022 • Zhaohan Daniel Guo, Shantanu Thakoor, Miruna Pîslar, Bernardo Avila Pires, Florent Altché, Corentin Tallec, Alaa Saade, Daniele Calandriello, Jean-bastien Grill, Yunhao Tang, Michal Valko, Rémi Munos, Mohammad Gheshlaghi Azar, Bilal Piot
We present BYOL-Explore, a conceptually simple yet general approach for curiosity-driven exploration in visually-complex environments.
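The excerpt does not spell out the mechanism, so the following is an assumed sketch of curiosity-driven exploration in this spirit: the intrinsic reward is the error of a latent world model predicting a stop-gradient target embedding of the next observation. The encoder, world model, and target encoder are hypothetical arguments, not the paper's architecture.

```python
import torch.nn.functional as F

# Assumed sketch: world-model prediction error in latent space used as an
# intrinsic (curiosity) reward signal for exploration.
def intrinsic_reward(encoder, world_model, target_encoder, obs_t, action_t, obs_tp1):
    pred = world_model(encoder(obs_t), action_t)
    target = target_encoder(obs_tp1).detach()        # fixed target, no gradient
    pred, target = F.normalize(pred, dim=-1), F.normalize(target, dim=-1)
    return (pred - target).pow(2).sum(-1)            # per-step prediction error
```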
no code implementations • 6 Jan 2021 • Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Alaa Saade, Shantanu Thakoor, Bilal Piot, Bernardo Avila Pires, Michal Valko, Thomas Mesnard, Tor Lattimore, Rémi Munos
Exploration is essential for solving complex Reinforcement Learning (RL) tasks.
31 code implementations • 13 Jun 2020 • Jean-bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre H. Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Rémi Munos, Michal Valko
From an augmented view of an image, we train the online network to predict the target network representation of the same image under a different augmented view.
Ranked #2 on Self-Supervised Person Re-Identification on SYSU-30k
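A minimal sketch of the objective described in the BYOL excerpt above, with placeholder architectures: the online network plus a predictor regresses onto the target network's representation of the other augmented view, and the target weights track the online weights as an exponential moving average.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder networks; the real BYOL setup uses a ResNet backbone and MLP heads.
online_encoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32))
predictor = nn.Linear(32, 32)
target_encoder = copy.deepcopy(online_encoder)  # target starts as a copy of the online network

def byol_loss(view_a, view_b):
    p = F.normalize(predictor(online_encoder(view_a)), dim=-1)
    z = F.normalize(target_encoder(view_b), dim=-1).detach()  # stop-gradient target
    return 2 - 2 * (p * z).sum(-1).mean()                     # normalized MSE

@torch.no_grad()
def update_target(tau=0.996):
    # exponential moving average of the online weights
    for p_t, p_o in zip(target_encoder.parameters(), online_encoder.parameters()):
        p_t.mul_(tau).add_((1 - tau) * p_o)
```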
no code implementations • 18 Jun 2019 • Zhaohan Daniel Guo, Emma Brunskill
Efficient exploration is necessary to achieve good sample efficiency for reinforcement learning in general.
no code implementations • 15 Nov 2018 • Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Bernardo A. Pires, Rémi Munos
In partially observable domains it is important for the representation to encode a belief state, a sufficient statistic of the observations seen so far.
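As one possible reading of this requirement, the sketch below uses a GRU to summarize the observation-action history into a recurrent state that can act as a belief state; the recurrent architecture and dimensions are assumptions for illustration only.

```python
import torch
import torch.nn as nn

# Assumed sketch: a recurrent network compresses the history seen so far into
# a hidden state intended to serve as a belief state (a sufficient statistic).
class BeliefEncoder(nn.Module):
    def __init__(self, obs_dim=16, act_dim=4, belief_dim=64):
        super().__init__()
        self.rnn = nn.GRU(obs_dim + act_dim, belief_dim, batch_first=True)

    def forward(self, observations, actions):
        # observations: [batch, time, obs_dim], actions: [batch, time, act_dim]
        history = torch.cat([observations, actions], dim=-1)
        beliefs, _ = self.rnn(history)   # belief state at every time step
        return beliefs
```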
no code implementations • NeurIPS 2017 • Zhaohan Daniel Guo, Philip S. Thomas, Emma Brunskill
In addition, we can take advantage of special cases that arise due to options-based policies to further improve the performance of importance sampling.
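One such special case, as a hedged illustration rather than the paper's estimator: if the evaluation and behavior policies share their intra-option policies and differ only in how they select options, the per-step importance ratios inside an option cancel, and the weight reduces to a product over option-selection steps. The trajectory format and policy interfaces below are assumptions.

```python
# Illustrative options-based importance weight: only the decision points where
# a new option is chosen contribute a ratio, shortening the effective horizon.
def options_importance_weight(trajectory, pi_option, mu_option):
    weight = 1.0
    for step in trajectory:
        if step["option_started"]:   # option-selection steps only
            s, o = step["state"], step["option"]
            weight *= pi_option(o, s) / mu_option(o, s)
    return weight
```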
no code implementations • 9 Mar 2017 • Zhaohan Daniel Guo, Emma Brunskill
This can result in a much better sample complexity when the in-degree of the necessary features is smaller than the in-degree of all features.
no code implementations • 25 May 2016 • Zhaohan Daniel Guo, Shayan Doroudi, Emma Brunskill
Many interesting real world domains involve reinforcement learning (RL) in partially observable environments.