Search Results for author: Zhaohan Daniel Guo

Found 14 papers, 1 paper with code

Optimizing Return Distributions with Distributional Dynamic Programming

no code implementations · 22 Jan 2025 · Bernardo Ávila Pires, Mark Rowland, Diana Borsa, Zhaohan Daniel Guo, Khimya Khetarpal, André Barreto, David Abel, Rémi Munos, Will Dabney

To go beyond expected utilities, we combine distributional DP with stock augmentation, a technique previously introduced for classic DP in the context of risk-sensitive RL, where the MDP state is augmented with a statistic of the rewards obtained so far (since the first time step).
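As a rough sketch of the idea (names and the gym-style environment interface here are illustrative, not taken from the paper), stock augmentation carries a reward statistic alongside the environment state, so the policy conditions on the pair:

```python
# Illustrative sketch of stock augmentation: the agent's state is the
# pair (obs, stock), where the stock accumulates a statistic of the
# rewards obtained since the first time step (here, the discounted sum).

def stock_augmented_rollout(env, policy, gamma=0.99, max_steps=100):
    """Roll out a policy on a stock-augmented MDP.

    `env` follows a gym-style reset()/step(action) interface returning
    (obs, reward, done); `policy` maps an augmented state (obs, stock)
    to an action.
    """
    obs = env.reset()
    stock = 0.0        # statistic of the rewards obtained so far
    discount = 1.0
    trajectory = []
    for _ in range(max_steps):
        action = policy((obs, stock))     # policy sees the augmented state
        obs, reward, done = env.step(action)
        stock += discount * reward        # update the reward statistic
        discount *= gamma
        trajectory.append(((obs, stock), reward))
        if done:
            break
    return trajectory
```

Because the stock is part of the state, objectives that depend on the whole return distribution (not just its expectation) become amenable to dynamic programming on the augmented MDP.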

A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning

no code implementations · 4 Jun 2024 · Khimya Khetarpal, Zhaohan Daniel Guo, Bernardo Avila Pires, Yunhao Tang, Clare Lyle, Mark Rowland, Nicolas Heess, Diana Borsa, Arthur Guez, Will Dabney

In this work, we take a step towards bridging the gap between theory and practice by analyzing an action-conditional self-predictive objective (BYOL-AC) using the ODE framework, characterizing its convergence properties and highlighting important distinctions between the limiting solutions of the BYOL-$\Pi$ and BYOL-AC dynamics.

Tasks: Reinforcement Learning +2

Generalized Preference Optimization: A Unified Approach to Offline Alignment

no code implementations · 8 Feb 2024 · Yunhao Tang, Zhaohan Daniel Guo, Zeyu Zheng, Daniele Calandriello, Rémi Munos, Mark Rowland, Pierre Harvey Richemond, Michal Valko, Bernardo Ávila Pires, Bilal Piot

The GPO framework also sheds light on how offline algorithms enforce regularization, through the design of the convex function that defines the loss.
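As a minimal sketch of this framing (variable names are illustrative): GPO writes each offline alignment loss as a convex function f applied to the scaled difference of policy/reference log-ratios between the preferred and dispreferred responses, so the choice of f is where the regularization lives:

```python
import math

# Sketch of the GPO family: each offline alignment loss has the form
# f(beta * delta), where delta is the difference of policy-vs-reference
# log-ratios between the preferred (w) and dispreferred (l) responses,
# and f is a convex function chosen by the algorithm designer.

def gpo_loss(f, beta, logp_w, logp_l, logp_ref_w, logp_ref_l):
    delta = (logp_w - logp_ref_w) - (logp_l - logp_ref_l)
    return f(beta * delta)

# Different convex f recover known objectives, e.g. the logistic loss
# gives a DPO-style objective and the hinge loss a SLiC-style one
# (see the paper for the exact correspondences).
logistic = lambda x: math.log(1.0 + math.exp(-x))  # DPO-style
hinge = lambda x: max(0.0, 1.0 - x)                # SLiC-style
```

Varying f while keeping the rest fixed makes the trade-off between fitting preferences and staying close to the reference policy explicit.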

Directed Exploration for Reinforcement Learning

no code implementations · 18 Jun 2019 · Zhaohan Daniel Guo, Emma Brunskill

Efficient exploration is necessary to achieve good sample efficiency for reinforcement learning in general.

Tasks: Efficient Exploration, Reinforcement Learning +2

Neural Predictive Belief Representations

no code implementations · 15 Nov 2018 · Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Bernardo A. Pires, Rémi Munos

In partially observable domains it is important for the representation to encode a belief state, a sufficient statistic of the observations seen so far.

Tasks: Decision Making, Representation Learning

Using Options and Covariance Testing for Long Horizon Off-Policy Policy Evaluation

no code implementations · NeurIPS 2017 · Zhaohan Daniel Guo, Philip S. Thomas, Emma Brunskill

In addition, we can take advantage of special cases that arise due to options-based policies to further improve the performance of importance sampling.
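A brief sketch of the underlying estimator (trajectory format and names here are illustrative, not from the paper): ordinary importance sampling reweights each trajectory's return by the product of per-step probability ratios, and options-based policies help because steps where the behavior and evaluation policies follow the same option contribute a ratio of 1:

```python
# Illustrative sketch of importance sampling for off-policy evaluation:
# reweight each trajectory's return by the product of per-step ratios
# pi_e(a|s) / pi_b(a|s). With options-based policies, steps on which
# both policies execute the same option have ratio 1, shortening the
# effective product and reducing the estimator's variance.

def importance_sampling_estimate(trajectories, pi_e, pi_b, gamma=1.0):
    """trajectories: list of [(state, action, reward), ...] lists;
    pi_e / pi_b: functions (state, action) -> probability."""
    estimates = []
    for traj in trajectories:
        ratio, ret, discount = 1.0, 0.0, 1.0
        for s, a, r in traj:
            ratio *= pi_e(s, a) / pi_b(s, a)  # 1 on shared-option steps
            ret += discount * r
            discount *= gamma
        estimates.append(ratio * ret)
    return sum(estimates) / len(estimates)
```

The product of ratios is exactly what blows up over long horizons, which is why collapsing shared-option segments to a factor of 1 matters.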

Sample Efficient Feature Selection for Factored MDPs

no code implementations · 9 Mar 2017 · Zhaohan Daniel Guo, Emma Brunskill

This can result in a much better sample complexity when the in-degree of the necessary features is smaller than the in-degree of all features.

Tasks: Feature Selection +3

A PAC RL Algorithm for Episodic POMDPs

no code implementations · 25 May 2016 · Zhaohan Daniel Guo, Shayan Doroudi, Emma Brunskill

Many interesting real world domains involve reinforcement learning (RL) in partially observable environments.

Tasks: Reinforcement Learning +1
