Search Results for author: Daniele Calandriello

Found 31 papers, 8 papers with code

Human Alignment of Large Language Models through Online Preference Optimisation

no code implementations13 Mar 2024 Daniele Calandriello, Daniel Guo, Remi Munos, Mark Rowland, Yunhao Tang, Bernardo Avila Pires, Pierre Harvey Richemond, Charline Le Lan, Michal Valko, Tianqi Liu, Rishabh Joshi, Zeyu Zheng, Bilal Piot

Building on this equivalence, we introduce the IPO-MD algorithm, which generates data with a mixture policy (between the online and reference policies), similarly to the general Nash-MD algorithm.
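
The mixture policy here can be read as a geometric interpolation between the online and reference policies. A minimal sketch of that sampling step, assuming discrete distributions; the names and the value of beta are illustrative, not the paper's implementation:

```python
import numpy as np

def mixture_policy_probs(online_probs, ref_probs, beta):
    """Geometric mixture of an online and a reference policy.

    beta in [0, 1] interpolates between the online policy (beta = 0)
    and the reference policy (beta = 1), as in Nash-MD-style sampling.
    """
    log_mix = (1.0 - beta) * np.log(online_probs) + beta * np.log(ref_probs)
    unnorm = np.exp(log_mix - log_mix.max())  # stabilize before normalizing
    return unnorm / unnorm.sum()

# Generate one action/token from the mixture used to collect training data.
rng = np.random.default_rng(0)
probs = mixture_policy_probs(np.array([0.7, 0.2, 0.1]),
                             np.array([0.3, 0.4, 0.3]), beta=0.5)
action = rng.choice(len(probs), p=probs)
```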

Generalized Preference Optimization: A Unified Approach to Offline Alignment

no code implementations8 Feb 2024 Yunhao Tang, Zhaohan Daniel Guo, Zeyu Zheng, Daniele Calandriello, Rémi Munos, Mark Rowland, Pierre Harvey Richemond, Michal Valko, Bernardo Ávila Pires, Bilal Piot

Offline preference optimization allows fine-tuning large models directly from offline data, and has proved effective in recent alignment practices.
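
A minimal sketch of the unified view, assuming (as the abstract suggests) that offline losses can be written as a convex function of the scaled log-likelihood-ratio margin; the function names and signature below are illustrative:

```python
import torch
import torch.nn.functional as F

def gpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta, f):
    """Generalized preference loss: f applied to the scaled margin.

    logp_* are summed log-probabilities of the chosen (w) and rejected (l)
    responses under the policy; ref_logp_* under the reference model.
    """
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return f(beta * margin).mean()

# Known offline methods fall out as special choices of f:
dpo_f = lambda x: -F.logsigmoid(x)   # logistic loss -> DPO
ipo_f = lambda x: (x - 0.5) ** 2     # squared loss  -> IPO (up to constants)
```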

Demonstration-Regularized RL

no code implementations26 Oct 2023 Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Alexey Naumov, Pierre Perrault, Michal Valko, Pierre Menard

In particular, we study demonstration-regularized reinforcement learning, which leverages expert demonstrations through KL-regularization toward a policy learned by behavior cloning.

Reinforcement Learning (RL)
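
As a one-screen illustration of the objective, here is a toy single-decision rendering of reward maximization with a KL penalty toward the behavior-cloned policy; the paper itself works in finite-horizon MDPs and analyzes sample complexity:

```python
import numpy as np

def kl_regularized_value(pi, reward, pi_bc, lam):
    """Expected reward minus a KL penalty toward the behavior-cloned policy.

    pi, pi_bc: probability vectors over actions; lam: regularization strength.
    """
    kl = np.sum(pi * (np.log(pi) - np.log(pi_bc)))
    return float(pi @ reward) - lam * kl
```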

A General Theoretical Paradigm to Understand Learning from Human Preferences

1 code implementation18 Oct 2023 Mohammad Gheshlaghi Azar, Mark Rowland, Bilal Piot, Daniel Guo, Daniele Calandriello, Michal Valko, Rémi Munos

In particular, we derive a new general objective called $\Psi$PO for learning from human preferences that is expressed in terms of pairwise preferences and therefore bypasses both approximations.
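
The paper instantiates $\Psi$PO with the identity mapping to obtain IPO, whose empirical loss regresses the log-likelihood-ratio margin onto a constant. A hedged sketch of that special case (tensor shapes and argument names are assumptions):

```python
import torch

def ipo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, tau):
    """IPO: push the log-ratio margin toward 1/(2*tau), where tau plays
    the role of the KL-regularization strength."""
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return ((margin - 1.0 / (2.0 * tau)) ** 2).mean()
```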

Unlocking the Power of Representations in Long-term Novelty-based Exploration

no code implementations2 May 2023 Alaa Saade, Steven Kapturowski, Daniele Calandriello, Charles Blundell, Pablo Sprechmann, Leopoldo Sarra, Oliver Groth, Michal Valko, Bilal Piot

We introduce Robust Exploration via Clustering-based Online Density Estimation (RECODE), a non-parametric method for novelty-based exploration that estimates visitation counts for clusters of states based on their similarity in a chosen embedding space.

Atari Games Clustering +1
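
A toy version of the counting idea only; RECODE itself adapts cluster centers online and operates in a learned embedding space, so the fixed radius rule and bonus form below are simplifications:

```python
import numpy as np

class ClusterCounts:
    """Estimate visitation counts for clusters of embedded states and
    return a count-based novelty bonus."""

    def __init__(self, radius):
        self.centers, self.counts, self.radius = [], [], radius

    def update_and_bonus(self, emb):
        if self.centers:
            dists = np.linalg.norm(np.stack(self.centers) - emb, axis=1)
            i = int(dists.argmin())
            if dists[i] < self.radius:      # falls into an existing cluster
                self.counts[i] += 1
                return 1.0 / np.sqrt(self.counts[i])
        self.centers.append(emb)            # novel region: open a new cluster
        self.counts.append(1)
        return 1.0
```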

Fast Rates for Maximum Entropy Exploration

1 code implementation14 Mar 2023 Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Remi Munos, Alexey Naumov, Pierre Perrault, Yunhao Tang, Michal Valko, Pierre Menard

Finally, we apply developed regularization techniques to reduce sample complexity of visitation entropy maximization to $\widetilde{\mathcal{O}}(H^2SA/\varepsilon^2)$, yielding a statistical separation between maximum entropy exploration and reward-free exploration.

Reinforcement Learning (RL)

Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees

1 code implementation28 Sep 2022 Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Remi Munos, Alexey Naumov, Mark Rowland, Michal Valko, Pierre Menard

We consider reinforcement learning in an environment modeled by an episodic, finite, stage-dependent Markov decision process of horizon $H$ with $S$ states, and $A$ actions.

Reinforcement Learning (RL)

Information-theoretic Online Memory Selection for Continual Learning

no code implementations ICLR 2022 Shengyang Sun, Daniele Calandriello, Huiyi Hu, Ang Li, Michalis Titsias

A challenging problem in task-free continual learning is the online selection of a representative replay memory from data streams.

Continual Learning
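
To make the selection problem concrete, here is a toy surprise-gated reservoir buffer. The paper's actual criteria are information-theoretic, so treat this purely as an illustration of online gating plus bounded memory, with all names hypothetical:

```python
import random

class SurpriseGatedMemory:
    """Keep a bounded replay memory of points the current model finds
    surprising, with reservoir-style replacement."""

    def __init__(self, capacity, loss_fn, threshold):
        self.capacity, self.loss_fn, self.threshold = capacity, loss_fn, threshold
        self.buffer, self.seen = [], 0

    def observe(self, x, y):
        self.seen += 1
        if self.loss_fn(x, y) < self.threshold:   # unsurprising: don't store
            return
        if len(self.buffer) < self.capacity:
            self.buffer.append((x, y))
        else:
            j = random.randrange(self.seen)        # reservoir replacement
            if j < self.capacity:
                self.buffer[j] = (x, y)
```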

One Pass ImageNet

no code implementations NeurIPS Workshop ImageNet_PPF 2021 Huiyi Hu, Ang Li, Daniele Calandriello, Dilan Gorur

We present the One Pass ImageNet (OPIN) problem, which aims to study the effectiveness of deep learning in a streaming setting.

Continual Learning

Sampling from a k-DPP without looking at all items

1 code implementation NeurIPS 2020 Daniele Calandriello, Michal Derezinski, Michal Valko

Determinantal point processes (DPPs) are a useful probabilistic model for selecting a small diverse subset out of a large collection of items, with applications in summarization, recommendation, stochastic optimization, experimental design and more.

Experimental Design Point Processes +1
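
For contrast with the paper's contribution, the brute-force sampler below looks at every $k$-subset, computing $P(S) \propto \det(L_S)$; it is exact but exponential in cost, which is precisely what the paper avoids:

```python
import numpy as np
from itertools import combinations

def sample_kdpp_naive(L, k, rng):
    """Exact k-DPP sampling by enumeration over all k-subsets (tiny n only)."""
    n = L.shape[0]
    subsets = list(combinations(range(n), k))
    weights = np.array([np.linalg.det(L[np.ix_(S, S)]) for S in subsets])
    probs = weights / weights.sum()
    return subsets[rng.choice(len(subsets), p=probs)]
```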

Statistical and Computational Trade-Offs in Kernel K-Means

no code implementations NeurIPS 2018 Daniele Calandriello, Lorenzo Rosasco

We investigate the efficiency of k-means in terms of both statistical and computational requirements.
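
The computational side of the trade-off can be sketched with a Nystrom approximation: embed the data using m landmark points and cluster the explicit features. The uniform landmark choice, RBF kernel, and value of m are assumptions of this sketch; the paper's analysis concerns how small m can be while preserving statistical accuracy:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import rbf_kernel

def nystrom_kernel_kmeans(X, k, m, rng):
    """Approximate kernel k-means via an explicit Nystrom embedding."""
    idx = rng.choice(len(X), size=m, replace=False)   # uniform landmarks
    K_nm = rbf_kernel(X, X[idx])
    K_mm = rbf_kernel(X[idx], X[idx])
    w, V = np.linalg.eigh(K_mm)
    inv_sqrt = V @ np.diag(1.0 / np.sqrt(np.clip(w, 1e-12, None))) @ V.T
    features = K_nm @ inv_sqrt          # n x m feature matrix
    return KMeans(n_clusters=k, n_init=10).fit_predict(features)
```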

Exact sampling of determinantal point processes with sublinear time preprocessing

2 code implementations NeurIPS 2019 Michał Dereziński, Daniele Calandriello, Michal Valko

For this purpose, we propose a new algorithm which, given access to $\mathbf{L}$, samples exactly from a determinantal point process while satisfying the following two properties: (1) its preprocessing cost is $n \cdot \text{poly}(k)$, i.e., sublinear in the size of $\mathbf{L}$, and (2) its sampling cost is $\text{poly}(k)$, i.e., independent of the size of $\mathbf{L}$.

Point Processes

Gaussian Process Optimization with Adaptive Sketching: Scalable and No Regret

1 code implementation13 Mar 2019 Daniele Calandriello, Luigi Carratino, Alessandro Lazaric, Michal Valko, Lorenzo Rosasco

Moreover, we show that our procedure selects at most $\tilde{O}(d_{eff})$ points, where $d_{eff}$ is the effective dimension of the explored space, which is typically much smaller than both $d$ and $t$.

Gaussian Processes

On Fast Leverage Score Sampling and Optimal Learning

1 code implementation NeurIPS 2018 Alessandro Rudi, Daniele Calandriello, Luigi Carratino, Lorenzo Rosasco

Leverage score sampling provides an appealing way to perform approximate computations for large matrices.

regression
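
For reference, the exact ridge leverage scores that fast methods approximate can be written in a few lines; the cubic cost of this direct computation is the bottleneck such methods address:

```python
import numpy as np

def ridge_leverage_scores(K, lam):
    """Exact ridge leverage scores: diag(K (K + lam * n * I)^-1). O(n^3)."""
    n = K.shape[0]
    return np.diag(K @ np.linalg.inv(K + lam * n * np.eye(n)))

def sample_by_leverage(K, lam, m, rng):
    """Sample m columns with probability proportional to their scores."""
    scores = ridge_leverage_scores(K, lam)
    return rng.choice(K.shape[0], size=m, replace=True, p=scores / scores.sum())
```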

Improved large-scale graph learning through ridge spectral sparsification

no code implementations ICML 2018 Daniele Calandriello, Alessandro Lazaric, Ioannis Koutis, Michal Valko

By constructing a spectrally-similar graph, we are able to bound the error induced by the sparsification for a variety of downstream tasks (e.g., SSL).

Graph Learning
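
A plain effective-resistance sparsifier conveys the mechanism; note the paper works with ridge-regularized effective resistances so that regularized downstream objectives like SSL are also preserved, so this dense-pseudoinverse version is only illustrative:

```python
import numpy as np

def spectral_sparsify(edges, weights, n, m, rng):
    """Sample m edges with prob. proportional to weight x effective resistance,
    reweighting kept edges to stay unbiased for the Laplacian quadratic form."""
    w = np.asarray(weights, dtype=float)
    L = np.zeros((n, n))
    for (u, v), we in zip(edges, w):    # build the graph Laplacian
        L[u, u] += we; L[v, v] += we
        L[u, v] -= we; L[v, u] -= we
    Lp = np.linalg.pinv(L)
    r = np.array([Lp[u, u] + Lp[v, v] - 2.0 * Lp[u, v] for u, v in edges])
    p = w * r / np.sum(w * r)
    counts = {}
    for i in rng.choice(len(edges), size=m, p=p):
        counts[i] = counts.get(i, 0) + 1
    return [(edges[i], c * w[i] / (m * p[i])) for i, c in counts.items()]
```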

Distributed Adaptive Sampling for Kernel Matrix Approximation

no code implementations27 Mar 2018 Daniele Calandriello, Alessandro Lazaric, Michal Valko

In this paper, we introduce SQUEAK, a new algorithm for kernel approximation based on RLS sampling that processes the dataset sequentially, storing a dictionary that yields accurate kernel matrix approximations with a number of points that depends only on the effective dimension $d_{eff}(\gamma)$ of the dataset.

Clustering

Efficient Second-Order Online Kernel Learning with Adaptive Embedding

no code implementations NeurIPS 2017 Daniele Calandriello, Alessandro Lazaric, Michal Valko

The embedded space is continuously updated to guarantee that the embedding remains accurate, and we show that the per-step cost only grows with the effective dimension of the problem and not with $T$.

Second-order methods

Second-Order Kernel Online Convex Optimization with Adaptive Sketching

no code implementations ICML 2017 Daniele Calandriello, Alessandro Lazaric, Michal Valko

First-order KOCO methods such as functional gradient descent require only $\mathcal{O}(t)$ time and space per iteration, and, when the only information on the losses is their convexity, achieve a minimax optimal $\mathcal{O}(\sqrt{T})$ regret.

Second-order methods
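
The first-order baseline referred to here is easy to write down: a kernel expansion that grows by one coefficient per round, so prediction at step t touches all t past points. A minimal sketch, assuming squared loss (the paper treats general convex losses):

```python
import numpy as np

class FunctionalGD:
    """First-order online kernel learning with squared loss:
    f_t(x) = sum_i alpha_i k(x_i, x), O(t) time/space per step."""

    def __init__(self, kernel, lr):
        self.kernel, self.lr = kernel, lr
        self.points, self.alphas = [], []

    def predict(self, x):
        return sum(a * self.kernel(p, x) for p, a in zip(self.points, self.alphas))

    def step(self, x, y):
        residual = self.predict(x) - y   # gradient of (f(x) - y)^2 / 2 w.r.t. f(x)
        self.points.append(x)
        self.alphas.append(-self.lr * residual)

# Example kernel: rbf = lambda a, b: np.exp(-np.sum((a - b) ** 2))
```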

Analysis of Kelner and Levin graph sparsification algorithm for a streaming setting

no code implementations13 Sep 2016 Daniele Calandriello, Alessandro Lazaric, Michal Valko

We give a new proof that the incremental resparsification algorithm proposed by Kelner and Levin (2013) produces a spectral sparsifier with high probability.

Incremental Spectral Sparsification for Large-Scale Graph-Based Semi-Supervised Learning

no code implementations21 Jan 2016 Daniele Calandriello, Alessandro Lazaric, Michal Valko, Ioannis Koutis

While the harmonic function solution performs well in many semi-supervised learning (SSL) tasks, it is known to scale poorly with the number of samples.

Quantization
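
The harmonic function solution itself is a single linear solve on the unlabeled block of the graph Laplacian, which is why its cost is tied to the graph's size and why sparsifying the graph helps. A standard rendering (argument names assumed):

```python
import numpy as np

def harmonic_solution(L, labeled_idx, f_l, unlabeled_idx):
    """Harmonic SSL labels: solve L_uu f_u = -L_ul f_l for the unlabeled nodes,
    given Laplacian L and labeled values f_l."""
    L_uu = L[np.ix_(unlabeled_idx, unlabeled_idx)]
    L_ul = L[np.ix_(unlabeled_idx, labeled_idx)]
    return np.linalg.solve(L_uu, -L_ul @ f_l)
```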

Sparse Multi-Task Reinforcement Learning

no code implementations NeurIPS 2014 Daniele Calandriello, Alessandro Lazaric, Marcello Restelli

This is equivalent to assuming that the weight vectors of the task value functions are \textit{jointly sparse}, i.e., the set of their non-zero components is small and shared across tasks.

Reinforcement Learning (RL)
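
Joint sparsity of this kind is typically encouraged with a group penalty over the task-weight matrix, e.g. an $\ell_{2,1}$ norm; this is a generic illustration, not the paper's exact formulation:

```python
import numpy as np

def l21_penalty(W):
    """W has one row per feature and one column per task; penalizing the sum
    of row norms drives whole features to zero jointly across tasks."""
    return float(np.sum(np.linalg.norm(W, axis=1)))
```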

Semi-Supervised Information-Maximization Clustering

no code implementations30 Apr 2013 Daniele Calandriello, Gang Niu, Masashi Sugiyama

Semi-supervised clustering aims to introduce prior knowledge in the decision process of a clustering algorithm.

Clustering
