Search Results for author: Zhaohan Daniel Guo

Found 12 papers, 1 paper with code

Generalized Preference Optimization: A Unified Approach to Offline Alignment

no code implementations • 8 Feb 2024 • Yunhao Tang, Zhaohan Daniel Guo, Zeyu Zheng, Daniele Calandriello, Rémi Munos, Mark Rowland, Pierre Harvey Richemond, Michal Valko, Bernardo Ávila Pires, Bilal Piot

Offline preference optimization allows fine-tuning large models directly from offline data, and has proved effective in recent alignment practices.
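
As a rough illustration of this kind of offline loss, the sketch below assumes a common setup. A convex function `f` is applied to `beta` times a margin. The margin is the preferred response's policy-to-reference log-ratio minus the dispreferred response's log-ratio. The three choices of `f` (logistic, squared, hinge) give DPO-, IPO- and SLiC-style losses. The function names, the `beta` default and the plain-float inputs are hypothetical choices for this sketch, not code from the paper.

```python
import math

def log_ratio_margin(logp_w, logp_l, ref_logp_w, ref_logp_l):
    """Policy/reference log-ratio of the preferred response minus that of the dispreferred one."""
    return (logp_w - ref_logp_w) - (logp_l - ref_logp_l)

def _logistic(x):
    # Numerically stable log(1 + exp(-x)).
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))

# Hypothetical registry of convex losses applied to the scaled margin beta * margin.
LOSSES = {
    "logistic": _logistic,                  # DPO-style
    "squared": lambda x: (x - 1.0) ** 2,    # IPO-style
    "hinge": lambda x: max(0.0, 1.0 - x),   # SLiC-style
}

def preference_loss(batch, beta=0.1, kind="logistic"):
    """Average loss over (logp_w, logp_l, ref_logp_w, ref_logp_l) tuples."""
    f = LOSSES[kind]
    return sum(f(beta * log_ratio_margin(*row)) for row in batch) / len(batch)
```

When the policy equals the reference, the margin is 0. The logistic loss is then log 2, and the squared and hinge losses are both 1. Raising the log-probability of the preferred response relative to the reference lowers each loss.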

Nash Learning from Human Feedback

no code implementations • 1 Dec 2023 • Rémi Munos, Michal Valko, Daniele Calandriello, Mohammad Gheshlaghi Azar, Mark Rowland, Zhaohan Daniel Guo, Yunhao Tang, Matthieu Geist, Thomas Mesnard, Andrea Michi, Marco Selvi, Sertan Girgin, Nikola Momchev, Olivier Bachem, Daniel J. Mankowitz, Doina Precup, Bilal Piot

We term this approach Nash learning from human feedback (NLHF).

Text Summarization

Representations and Exploration for Deep Reinforcement Learning using Singular Value Decomposition

no code implementations • 1 May 2023 • Yash Chandak, Shantanu Thakoor, Zhaohan Daniel Guo, Yunhao Tang, Remi Munos, Will Dabney, Diana L Borsa

Representation learning and exploration are among the key challenges for any deep reinforcement learning agent.

reinforcement-learning Representation Learning

Understanding Self-Predictive Learning for Reinforcement Learning

no code implementations • 6 Dec 2022 • Yunhao Tang, Zhaohan Daniel Guo, Pierre Harvey Richemond, Bernardo Ávila Pires, Yash Chandak, Rémi Munos, Mark Rowland, Mohammad Gheshlaghi Azar, Charline Le Lan, Clare Lyle, András György, Shantanu Thakoor, Will Dabney, Bilal Piot, Daniele Calandriello, Michal Valko

We identify that faster-paced optimization of the predictor and semi-gradient updates on the representation are crucial to preventing representation collapse.
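
The mechanism in this snippet can be seen in a small linear sketch. It assumes a tabular Markov chain with a hypothetical random transition matrix `T`, a linear representation `phi` and a linear predictor `P`. The predictor is made "fast" by solving its least-squares problem exactly at every step. `phi` then takes a semi-gradient step that treats the target `T @ phi` as a constant. In this idealized setting the cross terms of the update vanish, so the Gram matrix `phi.T @ phi` can only grow and the smallest singular value of `phi` does not shrink. This illustrates the idea under those assumptions and is not the paper's experimental setup.

```python
import numpy as np

def self_predictive_training(num_steps=500, lr=0.05, d=6, k=2, seed=0):
    rng = np.random.default_rng(seed)
    # Hypothetical random row-stochastic transition matrix over d states.
    T = rng.random((d, d))
    T /= T.sum(axis=1, keepdims=True)
    phi_init = rng.normal(size=(d, k))  # row s is the k-dim representation of state s
    phi = phi_init.copy()
    for _ in range(num_steps):
        target = T @ phi  # expected next-state representation, treated as a constant
        # Fast predictor: exact least-squares fit of phi @ P to the target.
        P = np.linalg.lstsq(phi, target, rcond=None)[0]
        # Semi-gradient of (1/d) * ||phi @ P - target||_F^2, ignoring the target's dependence on phi.
        grad = (2.0 / d) * (phi @ P - target) @ P.T
        phi = phi - lr * grad
    return phi_init, phi
```

Comparing `np.linalg.svd(phi, compute_uv=False)` before and after training shows that the representation keeps its rank instead of collapsing to zero.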

reinforcement-learning Reinforcement Learning (RL) +1

BYOL-Explore: Exploration by Bootstrapped Prediction

no code implementations • 16 Jun 2022 • Zhaohan Daniel Guo, Shantanu Thakoor, Miruna Pîslar, Bernardo Avila Pires, Florent Altché, Corentin Tallec, Alaa Saade, Daniele Calandriello, Jean-bastien Grill, Yunhao Tang, Michal Valko, Rémi Munos, Mohammad Gheshlaghi Azar, Bilal Piot

We present BYOL-Explore, a conceptually simple yet general approach for curiosity-driven exploration in visually-complex environments.

Geometric Entropic Exploration

no code implementations • 6 Jan 2021 • Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Alaa Saade, Shantanu Thakoor, Bilal Piot, Bernardo Avila Pires, Michal Valko, Thomas Mesnard, Tor Lattimore, Rémi Munos

Exploration is essential for solving complex Reinforcement Learning (RL) tasks.

Reinforcement Learning (RL)

Bootstrap your own latent: A new approach to self-supervised Learning

31 code implementations • 13 Jun 2020 • Jean-bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre H. Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Rémi Munos, Michal Valko

From an augmented view of an image, we train the online network to predict the target network representation of the same image under a different augmented view.
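
This snippet names two ingredients: a prediction of the target network's output and a separate target network. Both can be sketched in a few lines, assuming the normalized mean-squared-error loss and the exponential-moving-average target update used by BYOL. The networks themselves are omitted. `normalized_mse`, `ema_update`, the dict-of-arrays parameter format and the `tau` default are hypothetical choices for this sketch.

```python
import numpy as np

def normalized_mse(prediction, target):
    """Squared L2 distance between L2-normalized vectors, i.e. 2 - 2 * cosine similarity."""
    p = np.asarray(prediction, dtype=float)
    z = np.asarray(target, dtype=float)
    p = p / np.linalg.norm(p, axis=-1, keepdims=True)
    z = z / np.linalg.norm(z, axis=-1, keepdims=True)
    return np.sum((p - z) ** 2, axis=-1)

def ema_update(target_params, online_params, tau=0.996):
    """Move each target parameter a small step toward the corresponding online parameter."""
    return {name: tau * target_params[name] + (1.0 - tau) * online_params[name]
            for name in target_params}
```

Only the online network receives gradients from this loss. The target changes solely through `ema_update`. BYOL also symmetrizes the loss by swapping which augmented view goes to the online and target networks.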

Ranked #2 on Self-Supervised Person Re-Identification on SYSU-30k

Representation Learning Self-Supervised Image Classification +3

Directed Exploration for Reinforcement Learning

no code implementations • 18 Jun 2019 • Zhaohan Daniel Guo, Emma Brunskill

Efficient exploration is necessary to achieve good sample efficiency for reinforcement learning in general.

Efficient Exploration reinforcement-learning +1

Neural Predictive Belief Representations

no code implementations • 15 Nov 2018 • Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Bernardo A. Pires, Rémi Munos

In partially observable domains it is important for the representation to encode a belief state, a sufficient statistic of the observations seen so far.
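
For a discrete POMDP with a known model, the belief state can be computed exactly with a Bayes filter. The paper instead learns representations that encode it, so the sketch below only illustrates the quantity itself. The array layout (`T[a, s, s_next]`, `O[s_next, o]`) and the function name are assumptions for illustration.

```python
import numpy as np

def belief_update(belief, action, observation, T, O):
    """Exact Bayes-filter belief update for a discrete POMDP.

    belief: shape (S,), probabilities over states
    T[a, s, s_next] = P(s_next | s, a);  O[s_next, o] = P(o | s_next)
    """
    predicted = belief @ T[action]                # sum_s b(s) * P(s_next | s, a)
    unnormalized = predicted * O[:, observation]  # weight by observation likelihood
    return unnormalized / unnormalized.sum()      # renormalize to a distribution
```

Starting from a uniform belief over two states with an identity transition, observing an outcome that is more likely in state 0 shifts the belief toward state 0.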

Decision Making Representation Learning

Using Options and Covariance Testing for Long Horizon Off-Policy Policy Evaluation

no code implementations • NeurIPS 2017 • Zhaohan Daniel Guo, Philip S. Thomas, Emma Brunskill

In addition, we can take advantage of special cases that arise due to options-based policies to further improve the performance of importance sampling.
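
For context, the plain per-trajectory importance sampling estimator that this line of work builds on reweights each observed return. The weight is the product of the evaluation-to-behavior action-probability ratios along the trajectory. The sketch below is that standard estimator, not the options-based variant from the paper. The `(state, action, reward)` trajectory format and the `pi_e(action, state)` / `pi_b(action, state)` callables are hypothetical. Because the weight is a product over every step, its variance tends to grow with the horizon, which is the kind of long-horizon difficulty the title points to.

```python
def importance_sampling_estimate(trajectories, pi_e, pi_b, gamma=1.0):
    """Ordinary (trajectory-wise) importance sampling estimate of the evaluation policy's value."""
    estimates = []
    for trajectory in trajectories:
        weight, ret, discount = 1.0, 0.0, 1.0
        for state, action, reward in trajectory:
            # Likelihood ratio of the action under the evaluation vs. behavior policy.
            weight *= pi_e(action, state) / pi_b(action, state)
            ret += discount * reward
            discount *= gamma
        estimates.append(weight * ret)
    return sum(estimates) / len(estimates)
```

With a uniform behavior policy over two actions and an evaluation policy that always picks the rewarded action, two one-step trajectories (one per action) give an estimate of 1.0. That matches the evaluation policy's true value.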

Sample Efficient Feature Selection for Factored MDPs

no code implementations • 9 Mar 2017 • Zhaohan Daniel Guo, Emma Brunskill

This can result in a much better sample complexity when the in-degree of the necessary features is smaller than the in-degree of all features.

feature selection reinforcement-learning +1

A PAC RL Algorithm for Episodic POMDPs

no code implementations • 25 May 2016 • Zhaohan Daniel Guo, Shayan Doroudi, Emma Brunskill

Many interesting real-world domains involve reinforcement learning (RL) in partially observable environments.

reinforcement-learning Reinforcement Learning (RL)
