Search Results for author: Bernardo Ávila Pires

Found 7 papers, 0 papers with code

Generalized Preference Optimization: A Unified Approach to Offline Alignment

No code implementations · 8 Feb 2024 · Yunhao Tang, Zhaohan Daniel Guo, Zeyu Zheng, Daniele Calandriello, Rémi Munos, Mark Rowland, Pierre Harvey Richemond, Michal Valko, Bernardo Ávila Pires, Bilal Piot

Offline preference optimization allows fine-tuning large models directly from offline data, and has proved effective in recent alignment practices.
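
For readers new to the idea, here is a minimal sketch of one common offline preference loss. It assumes a dataset of (prompt, chosen, rejected) triples whose full-response log-probabilities have already been computed under the model being tuned and under a frozen reference model. The names `preference_loss`, `beta` and `f` are hypothetical and used only for illustration. The default logistic loss is the DPO-style choice, not necessarily the formulation used in this paper.

```python
import math
from typing import Callable

def logistic_loss(margin: float) -> float:
    # -log(sigmoid(margin)), computed stably for large |margin|.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

def hinge_loss(margin: float) -> float:
    # Alternative convex loss: zero once the margin exceeds 1.
    return max(0.0, 1.0 - margin)

def preference_loss(
    logp_chosen: float,
    logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,
    f: Callable[[float], float] = logistic_loss,
) -> float:
    """Loss on one offline preference pair (hypothetical helper, for illustration).

    All log-probabilities are of full responses given the same prompt.
    """
    # How much more the tuned model favors each response than the reference does.
    chosen_log_ratio = logp_chosen - ref_logp_chosen
    rejected_log_ratio = logp_rejected - ref_logp_rejected
    # Scaled margin: positive when the chosen response gains relative probability.
    margin = beta * (chosen_log_ratio - rejected_log_ratio)
    return f(margin)

print(preference_loss(-1.0, -1.0, -1.0, -1.0))             # log 2, about 0.693
print(preference_loss(-1.0, -3.0, -2.0, -2.0, beta=1.0))   # smaller, about 0.127
```

Equal log-ratios give a margin of zero and a loss of log 2 under the logistic default. The loss shrinks as the chosen response gains probability relative to the reference. Passing `f=hinge_loss` swaps in a different convex function of the same margin.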

Off-policy Distributional Q(λ): Distributional RL without Importance Sampling

No code implementations · 8 Feb 2024 · Yunhao Tang, Mark Rowland, Rémi Munos, Bernardo Ávila Pires, Will Dabney

We introduce off-policy distributional Q(λ), a new addition to the family of off-policy distributional evaluation algorithms.

DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm

No code implementations · 29 May 2023 · Yunhao Tang, Tadashi Kozuno, Mark Rowland, Anna Harutyunyan, Rémi Munos, Bernardo Ávila Pires, Michal Valko

Multi-step learning applies lookahead over multiple time steps and has proved valuable in policy evaluation settings.
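
To make "lookahead over multiple time steps" concrete, the sketch below computes a basic n-step return: n observed rewards followed by a discounted bootstrap value estimate. It is the plain on-policy target with no off-policy correction. It therefore illustrates multi-step learning in general, not DoMo-AC itself. The function name `n_step_return` and its arguments are assumptions for illustration.

```python
from typing import Sequence

def n_step_return(rewards: Sequence[float], bootstrap_value: float, gamma: float) -> float:
    """Return r_0 + gamma*r_1 + ... + gamma**(n-1)*r_{n-1} + gamma**n * bootstrap_value."""
    g = bootstrap_value
    # Fold the rewards in from the last step back to the first,
    # discounting the running total by gamma at each step.
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Three steps of reward 1 with gamma = 0.5 and a bootstrap estimate of 10:
# 1 + 0.5 + 0.25 + 0.125 * 10 = 3.0
print(n_step_return([1.0, 1.0, 1.0], bootstrap_value=10.0, gamma=0.5))  # 3.0
```

With a single reward this reduces to the one-step target r + γV(s'). Longer windows rely less on the bootstrap estimate, at the cost of summing more sampled rewards.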

Multiclass Classification Calibration Functions

No code implementations · 20 Sep 2016 · Bernardo Ávila Pires, Csaba Szepesvári

We devise a streamlined analysis that simplifies the process of deriving calibration functions for a large number of surrogate losses that have been proposed in the literature.

Tasks: Classification, General Classification
