Search Results for author: Daniele Calandriello

Found 31 papers, 8 papers with code

Human Alignment of Large Language Models through Online Preference Optimisation

no code implementations13 Mar 2024 Daniele Calandriello, Daniel Guo, Remi Munos, Mark Rowland, Yunhao Tang, Bernardo Avila Pires, Pierre Harvey Richemond, Charline Le Lan, Michal Valko, Tianqi Liu, Rishabh Joshi, Zeyu Zheng, Bilal Piot

Building on this equivalence, we introduce the IPO-MD algorithm, which generates data with a mixture policy (between the online and reference policies), similarly to the general Nash-MD algorithm.
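
The mixture policy here can be read as a geometric interpolation between the online and reference policies. A minimal sketch of that sampling step, assuming discrete distributions; the names and the value of beta are illustrative, not the paper's implementation:

```python
import numpy as np

def mixture_policy_probs(online_probs, ref_probs, beta):
    """Geometric mixture of an online and a reference policy.

    beta in [0, 1] interpolates between the online policy (beta = 0)
    and the reference policy (beta = 1), as in Nash-MD-style sampling.
    """
    log_mix = (1.0 - beta) * np.log(online_probs) + beta * np.log(ref_probs)
    unnorm = np.exp(log_mix - log_mix.max())  # stabilize before normalizing
    return unnorm / unnorm.sum()

# Generate one action/token from the mixture used to collect training data.
rng = np.random.default_rng(0)
probs = mixture_policy_probs(np.array([0.7, 0.2, 0.1]),
                             np.array([0.3, 0.4, 0.3]), beta=0.5)
action = rng.choice(len(probs), p=probs)
```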

Generalized Preference Optimization: A Unified Approach to Offline Alignment

no code implementations8 Feb 2024 Yunhao Tang, Zhaohan Daniel Guo, Zeyu Zheng, Daniele Calandriello, Rémi Munos, Mark Rowland, Pierre Harvey Richemond, Michal Valko, Bernardo Ávila Pires, Bilal Piot

Offline preference optimization allows fine-tuning large models directly from offline data, and has proved effective in recent alignment practices.
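
A minimal sketch of the unified view, assuming (as the abstract suggests) that offline losses can be written as a convex function of the scaled log-likelihood-ratio margin; the function names and signature below are illustrative:

```python
import torch
import torch.nn.functional as F

def gpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta, f):
    """Generalized preference loss: f applied to the scaled margin.

    logp_* are summed log-probabilities of the chosen (w) and rejected (l)
    responses under the policy; ref_logp_* under the reference model.
    """
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return f(beta * margin).mean()

# Known offline methods fall out as special choices of f:
dpo_f = lambda x: -F.logsigmoid(x)   # logistic loss -> DPO
ipo_f = lambda x: (x - 0.5) ** 2     # squared loss  -> IPO (up to constants)
```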

Demonstration-Regularized RL

no code implementations26 Oct 2023 Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Alexey Naumov, Pierre Perrault, Michal Valko, Pierre Menard

In particular, we study demonstration-regularized reinforcement learning, which leverages expert demonstrations through KL-regularization toward a policy learned by behavior cloning.

Reinforcement Learning (RL)
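
As a one-screen illustration of the objective, here is a toy single-decision rendering of reward maximization with a KL penalty toward the behavior-cloned policy; the paper itself works in finite-horizon MDPs and analyzes sample complexity:

```python
import numpy as np

def kl_regularized_value(pi, reward, pi_bc, lam):
    """Expected reward minus a KL penalty toward the behavior-cloned policy.

    pi, pi_bc: probability vectors over actions; lam: regularization strength.
    """
    kl = np.sum(pi * (np.log(pi) - np.log(pi_bc)))
    return float(pi @ reward) - lam * kl
```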

A General Theoretical Paradigm to Understand Learning from Human Preferences

1 code implementation18 Oct 2023 Mohammad Gheshlaghi Azar, Mark Rowland, Bilal Piot, Daniel Guo, Daniele Calandriello, Michal Valko, Rémi Munos

In particular, we derive a new general objective called $\Psi$PO for learning from human preferences that is expressed in terms of pairwise preferences and therefore bypasses both approximations.
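
The paper instantiates $\Psi$PO with the identity mapping to obtain IPO, whose empirical loss regresses the log-likelihood-ratio margin onto a constant. A hedged sketch of that special case (tensor shapes and argument names are assumptions):

```python
import torch

def ipo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, tau):
    """IPO: push the log-ratio margin toward 1/(2*tau), where tau plays
    the role of the KL-regularization strength."""
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return ((margin - 1.0 / (2.0 * tau)) ** 2).mean()
```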

Unlocking the Power of Representations in Long-term Novelty-based Exploration

no code implementations2 May 2023 Alaa Saade, Steven Kapturowski, Daniele Calandriello, Charles Blundell, Pablo Sprechmann, Leopoldo Sarra, Oliver Groth, Michal Valko, Bilal Piot

We introduce Robust Exploration via Clustering-based Online Density Estimation (RECODE), a non-parametric method for novelty-based exploration that estimates visitation counts for clusters of states based on their similarity in a chosen embedding space.

Atari Games Clustering +1
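
A toy version of the counting idea only; RECODE itself adapts cluster centers online and operates in a learned embedding space, so the fixed radius rule and bonus form below are simplifications:

```python
import numpy as np

class ClusterCounts:
    """Estimate visitation counts for clusters of embedded states and
    return a count-based novelty bonus."""

    def __init__(self, radius):
        self.centers, self.counts, self.radius = [], [], radius

    def update_and_bonus(self, emb):
        if self.centers:
            dists = np.linalg.norm(np.stack(self.centers) - emb, axis=1)
            i = int(dists.argmin())
            if dists[i] < self.radius:      # falls into an existing cluster
                self.counts[i] += 1
                return 1.0 / np.sqrt(self.counts[i])
        self.centers.append(emb)            # novel region: open a new cluster
        self.counts.append(1)
        return 1.0
```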

Fast Rates for Maximum Entropy Exploration

1 code implementation14 Mar 2023 Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Remi Munos, Alexey Naumov, Pierre Perrault, Yunhao Tang, Michal Valko, Pierre Menard

Finally, we apply developed regularization techniques to reduce sample complexity of visitation entropy maximization to $\widetilde{\mathcal{O}}(H^2SA/\varepsilon^2)$, yielding a statistical separation between maximum entropy exploration and reward-free exploration.

Reinforcement Learning (RL)

Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees

1 code implementation28 Sep 2022 Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Remi Munos, Alexey Naumov, Mark Rowland, Michal Valko, Pierre Menard

We consider reinforcement learning in an environment modeled by an episodic, finite, stage-dependent Markov decision process of horizon $H$ with $S$ states, and $A$ actions.

Reinforcement Learning (RL)

Information-theoretic Online Memory Selection for Continual Learning

no code implementations ICLR 2022 Shengyang Sun, Daniele Calandriello, Huiyi Hu, Ang Li, Michalis Titsias

A challenging problem in task-free continual learning is the online selection of a representative replay memory from data streams.

Continual Learning
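
To make the selection problem concrete, here is a toy surprise-gated reservoir buffer. The paper's actual criteria are information-theoretic, so treat this purely as an illustration of online gating plus bounded memory, with all names hypothetical:

```python
import random

class SurpriseGatedMemory:
    """Keep a bounded replay memory of points the current model finds
    surprising, with reservoir-style replacement."""

    def __init__(self, capacity, loss_fn, threshold):
        self.capacity, self.loss_fn, self.threshold = capacity, loss_fn, threshold
        self.buffer, self.seen = [], 0

    def observe(self, x, y):
        self.seen += 1
        if self.loss_fn(x, y) < self.threshold:   # unsurprising: don't store
            return
        if len(self.buffer) < self.capacity:
            self.buffer.append((x, y))
        else:
            j = random.randrange(self.seen)        # reservoir replacement
            if j < self.capacity:
                self.buffer[j] = (x, y)
```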

One Pass ImageNet

no code implementations NeurIPS Workshop ImageNet_PPF 2021 Huiyi Hu, Ang Li, Daniele Calandriello, Dilan Gorur

We present the One Pass ImageNet (OPIN) problem, which aims to study the effectiveness of deep learning in a streaming setting.

Continual Learning

Sampling from a k-DPP without looking at all items

1 code implementation NeurIPS 2020 Daniele Calandriello, Michal Derezinski, Michal Valko

Determinantal point processes (DPPs) are a useful probabilistic model for selecting a small diverse subset out of a large collection of items, with applications in summarization, recommendation, stochastic optimization, experimental design and more.

Experimental Design Point Processes +1
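
For contrast with the paper's contribution, the brute-force sampler below looks at every $k$-subset, computing $P(S) \propto \det(L_S)$; it is exact but exponential in cost, which is precisely what the paper avoids:

```python
import numpy as np
from itertools import combinations

def sample_kdpp_naive(L, k, rng):
    """Exact k-DPP sampling by enumeration over all k-subsets (tiny n only)."""
    n = L.shape[0]
    subsets = list(combinations(range(n), k))
    weights = np.array([np.linalg.det(L[np.ix_(S, S)]) for S in subsets])
    probs = weights / weights.sum()
    return subsets[rng.choice(len(subsets), p=probs)]
```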

Statistical and Computational Trade-Offs in Kernel K-Means

no code implementations NeurIPS 2018 Daniele Calandriello, Lorenzo Rosasco

We investigate the efficiency of k-means in terms of both statistical and computational requirements.
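
The computational side of the trade-off can be sketched with a Nystrom approximation: embed the data using m landmark points and cluster the explicit features. The uniform landmark choice, RBF kernel, and value of m are assumptions of this sketch; the paper's analysis concerns how small m can be while preserving statistical accuracy:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import rbf_kernel

def nystrom_kernel_kmeans(X, k, m, rng):
    """Approximate kernel k-means via an explicit Nystrom embedding."""
    idx = rng.choice(len(X), size=m, replace=False)   # uniform landmarks
    K_nm = rbf_kernel(X, X[idx])
    K_mm = rbf_kernel(X[idx], X[idx])
    w, V = np.linalg.eigh(K_mm)
    inv_sqrt = V @ np.diag(1.0 / np.sqrt(np.clip(w, 1e-12, None))) @ V.T
    features = K_nm @ inv_sqrt          # n x m feature matrix
    return KMeans(n_clusters=k, n_init=10).fit_predict(features)
```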

Exact sampling of determinantal point processes with sublinear time preprocessing

2 code implementations NeurIPS 2019 Michał Dereziński, Daniele Calandriello, Michal Valko

For this purpose, we propose a new algorithm which, given access to $\mathbf{L}$, samples exactly from a determinantal point process while satisfying the following two properties: (1) its preprocessing cost is $n \cdot \text{poly}(k)$, i.e., sublinear in the size of $\mathbf{L}$, and (2) its sampling cost is $\text{poly}(k)$, i.e., independent of the size of $\mathbf{L}$.

Point Processes

Gaussian Process Optimization with Adaptive Sketching: Scalable and No Regret

1 code implementation13 Mar 2019 Daniele Calandriello, Luigi Carratino, Alessandro Lazaric, Michal Valko, Lorenzo Rosasco

Moreover, we show that our procedure selects at most $\tilde{O}(d_{eff})$ points, where $d_{eff}$ is the effective dimension of the explored space, which is typically much smaller than both $d$ and $t$.

Gaussian Processes

On Fast Leverage Score Sampling and Optimal Learning

1 code implementation NeurIPS 2018 Alessandro Rudi, Daniele Calandriello, Luigi Carratino, Lorenzo Rosasco

Leverage score sampling provides an appealing way to perform approximate computations for large matrices.

regression
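
For reference, the exact ridge leverage scores that fast methods approximate can be written in a few lines; the cubic cost of this direct computation is the bottleneck such methods address:

```python
import numpy as np

def ridge_leverage_scores(K, lam):
    """Exact ridge leverage scores: diag(K (K + lam * n * I)^-1). O(n^3)."""
    n = K.shape[0]
    return np.diag(K @ np.linalg.inv(K + lam * n * np.eye(n)))

def sample_by_leverage(K, lam, m, rng):
    """Sample m columns with probability proportional to their scores."""
    scores = ridge_leverage_scores(K, lam)
    return rng.choice(K.shape[0], size=m, replace=True, p=scores / scores.sum())
```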

Improved large-scale graph learning through ridge spectral sparsification

no code implementations ICML 2018 Daniele Calandriello, Alessandro Lazaric, Ioannis Koutis, Michal Valko

By constructing a spectrally-similar graph, we are able to bound the error induced by the sparsification for a variety of downstream tasks (e.g., SSL).

Graph Learning
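
A plain effective-resistance sparsifier conveys the mechanism; note the paper works with ridge-regularized effective resistances so that regularized downstream objectives like SSL are also preserved, so this dense-pseudoinverse version is only illustrative:

```python
import numpy as np

def spectral_sparsify(edges, weights, n, m, rng):
    """Sample m edges with prob. proportional to weight x effective resistance,
    reweighting kept edges to stay unbiased for the Laplacian quadratic form."""
    w = np.asarray(weights, dtype=float)
    L = np.zeros((n, n))
    for (u, v), we in zip(edges, w):    # build the graph Laplacian
        L[u, u] += we; L[v, v] += we
        L[u, v] -= we; L[v, u] -= we
    Lp = np.linalg.pinv(L)
    r = np.array([Lp[u, u] + Lp[v, v] - 2.0 * Lp[u, v] for u, v in edges])
    p = w * r / np.sum(w * r)
    counts = {}
    for i in rng.choice(len(edges), size=m, p=p):
        counts[i] = counts.get(i, 0) + 1
    return [(edges[i], c * w[i] / (m * p[i])) for i, c in counts.items()]
```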

Distributed Adaptive Sampling for Kernel Matrix Approximation

no code implementations27 Mar 2018 Daniele Calandriello, Alessandro Lazaric, Michal Valko

In this paper, we introduce SQUEAK, a new algorithm for kernel approximation based on RLS sampling that processes the dataset sequentially, storing a dictionary that yields accurate kernel matrix approximations with a number of points that depends only on the effective dimension $d_{eff}(\gamma)$ of the dataset.

Clustering

Efficient Second-Order Online Kernel Learning with Adaptive Embedding

no code implementations NeurIPS 2017 Daniele Calandriello, Alessandro Lazaric, Michal Valko

The embedded space is continuously updated to guarantee that the embedding remains accurate, and we show that the per-step cost only grows with the effective dimension of the problem and not with $T$.

Second-order methods

Second-Order Kernel Online Convex Optimization with Adaptive Sketching

no code implementations ICML 2017 Daniele Calandriello, Alessandro Lazaric, Michal Valko

First-order KOCO methods such as functional gradient descent require only $\mathcal{O}(t)$ time and space per iteration, and, when the only information on the losses is their convexity, achieve a minimax optimal $\mathcal{O}(\sqrt{T})$ regret.

Second-order methods
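
The first-order baseline referred to here is easy to write down: a kernel expansion that grows by one coefficient per round, so prediction at step t touches all t past points. A minimal sketch, assuming squared loss (the paper treats general convex losses):

```python
import numpy as np

class FunctionalGD:
    """First-order online kernel learning with squared loss:
    f_t(x) = sum_i alpha_i k(x_i, x), O(t) time/space per step."""

    def __init__(self, kernel, lr):
        self.kernel, self.lr = kernel, lr
        self.points, self.alphas = [], []

    def predict(self, x):
        return sum(a * self.kernel(p, x) for p, a in zip(self.points, self.alphas))

    def step(self, x, y):
        residual = self.predict(x) - y   # gradient of (f(x) - y)^2 / 2 w.r.t. f(x)
        self.points.append(x)
        self.alphas.append(-self.lr * residual)

# Example kernel: rbf = lambda a, b: np.exp(-np.sum((a - b) ** 2))
```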

Analysis of Kelner and Levin graph sparsification algorithm for a streaming setting

no code implementations13 Sep 2016 Daniele Calandriello, Alessandro Lazaric, Michal Valko

We give a new proof that the incremental resparsification algorithm proposed by Kelner and Levin (2013) produces a spectral sparsifier with high probability.

Incremental Spectral Sparsification for Large-Scale Graph-Based Semi-Supervised Learning

no code implementations21 Jan 2016 Daniele Calandriello, Alessandro Lazaric, Michal Valko, Ioannis Koutis

While the harmonic function solution performs well in many semi-supervised learning (SSL) tasks, it is known to scale poorly with the number of samples.

Quantization
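
The harmonic function solution itself is a single linear solve on the unlabeled block of the graph Laplacian, which is why its cost is tied to the graph's size and why sparsifying the graph helps. A standard rendering (argument names assumed):

```python
import numpy as np

def harmonic_solution(L, labeled_idx, f_l, unlabeled_idx):
    """Harmonic SSL labels: solve L_uu f_u = -L_ul f_l for the unlabeled nodes,
    given Laplacian L and labeled values f_l."""
    L_uu = L[np.ix_(unlabeled_idx, unlabeled_idx)]
    L_ul = L[np.ix_(unlabeled_idx, labeled_idx)]
    return np.linalg.solve(L_uu, -L_ul @ f_l)
```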

Sparse Multi-Task Reinforcement Learning

no code implementations NeurIPS 2014 Daniele Calandriello, Alessandro Lazaric, Marcello Restelli

This is equivalent to assuming that the weight vectors of the task value functions are \textit{jointly sparse}, i.e., the set of their non-zero components is small and shared across tasks.

Reinforcement Learning (RL)
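
Joint sparsity of this kind is typically encouraged with a group penalty over the task-weight matrix, e.g. an $\ell_{2,1}$ norm; this is a generic illustration, not the paper's exact formulation:

```python
import numpy as np

def l21_penalty(W):
    """W has one row per feature and one column per task; penalizing the sum
    of row norms drives whole features to zero jointly across tasks."""
    return float(np.sum(np.linalg.norm(W, axis=1)))
```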

Semi-Supervised Information-Maximization Clustering

no code implementations30 Apr 2013 Daniele Calandriello, Gang Niu, Masashi Sugiyama

Semi-supervised clustering aims to introduce prior knowledge in the decision process of a clustering algorithm.

Clustering
