no code implementations • 2 Sep 2024 • Georgy Sokolov, Maximilian Thiessen, Margarita Akhmejanova, Fabio Vitale, Francesco Orabona
We study the problem of learning the clusters of a given graph in the self-directed learning setup.
no code implementations • 3 Jun 2024 • Andrew Jacobsen, Francesco Orabona
We study the problem of dynamic regret minimization in online convex optimization, in which the objective is to minimize the difference between the cumulative loss of an algorithm and that of an arbitrary sequence of comparators.
no code implementations • 14 Feb 2024 • Ilja Kuzborskij, Kwang-Sung Jun, Yulian Wu, Kyoungseok Jang, Francesco Orabona
In this paper, we consider the problem of proving concentration inequalities to estimate the mean of the sequence.
1 code implementation • 3 Oct 2023 • Alexandru Meterez, Amir Joudaki, Francesco Orabona, Alexander Immer, Gunnar Rätsch, Hadi Daneshmand
We answer this question in the affirmative by giving a particular construction of a Multi-Layer Perceptron (MLP) with linear activations and batch-normalization that provably has bounded gradients at any depth.
no code implementations • 10 Aug 2023 • Francesco Orabona
In this short note, I show how to adapt to Hölder smoothness using normalized gradients in a black-box way.
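A minimal sketch of plain normalized gradient descent, the primitive referred to above; the stepsize eta, the quartic test function in the usage line, and the stopping rule are illustrative assumptions, and the black-box adaptation argument of the note is not reproduced here.

    import numpy as np

    def normalized_gd(grad, x0, eta, n_steps):
        """Gradient descent with normalized gradients: x_{t+1} = x_t - eta * g_t / ||g_t||."""
        x = np.asarray(x0, dtype=float)
        for _ in range(n_steps):
            g = grad(x)
            norm = np.linalg.norm(g)
            if norm == 0.0:               # already at a stationary point
                break
            x = x - eta * g / norm        # only the direction of g is used, not its magnitude
        return x

    # illustrative usage: minimize f(x) = ||x||^4, whose gradient is 4 * ||x||^2 * x
    x_hat = normalized_gd(lambda x: 4.0 * np.dot(x, x) * x, x0=[2.0, -1.0], eta=0.05, n_steps=500)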
no code implementations • 22 Jul 2023 • Keyi Chen, Francesco Orabona
Due to its speed and simplicity, subgradient descent is one of the most widely used optimization algorithms in convex machine learning.
no code implementations • 31 May 2023 • Keyi Chen, Francesco Orabona
We propose a new class of online learning algorithms, generalized implicit Follow-The-Regularized-Leader (FTRL), that expands the scope of the FTRL framework.
no code implementations • 12 Feb 2023 • Kyoungseok Jang, Kwang-Sung Jun, Ilja Kuzborskij, Francesco Orabona
We consider the problem of estimating the mean of a sequence of random elements $f(X_1, \theta), \ldots, f(X_n, \theta)$, where $f$ is a fixed scalar function, $S=(X_1, \ldots, X_n)$ are independent random variables, and $\theta$ is a possibly $S$-dependent parameter.
no code implementations • 7 Feb 2023 • Ashok Cutkosky, Harsh Mehta, Francesco Orabona
Our primary technique is a reduction from non-smooth non-convex optimization to online learning, after which our results follow from standard regret bounds in online learning.
no code implementations • 23 Aug 2022 • Michael Crawshaw, Mingrui Liu, Francesco Orabona, Wei Zhang, Zhenxun Zhuang
We also compare these algorithms with popular optimizers on a set of deep learning tasks, observing that we can match the performance of Adam while beating the others.
no code implementations • 19 Mar 2022 • Keyi Chen, Ashok Cutkosky, Francesco Orabona
Parameter-free algorithms are online learning algorithms that do not require setting learning rates.
1 code implementation • 31 Jan 2022 • Zhenxun Zhuang, Mingrui Liu, Ashok Cutkosky, Francesco Orabona
First, we show how to re-interpret AdamW as an approximation of a proximal gradient method, which takes advantage of the closed-form proximal mapping of the regularizer instead of only utilizing its gradient information as in Adam-$\ell_2$.
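A minimal sketch contrasting where the weight decay $\lambda$ enters in the two updates mentioned above, with the moment estimation factored into a small helper; the hyperparameter defaults are illustrative, and the proximal analysis of the paper is not reproduced here.

    import numpy as np

    def adam_direction(g, m, v, t, beta1=0.9, beta2=0.999, eps=1e-8):
        """One Adam moment update (t starts at 1); returns the update direction and the new moments."""
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        return m_hat / (np.sqrt(v_hat) + eps), m, v

    def step_adam_l2(w, g, m, v, t, lr, lam):
        # Adam-l2: the weight decay term lam * w is folded into the gradient before the moments
        d, m, v = adam_direction(g + lam * w, m, v, t)
        return w - lr * d, m, v

    def step_adamw(w, g, m, v, t, lr, lam):
        # AdamW: decoupled weight decay, applied directly to the parameters
        d, m, v = adam_direction(g, m, v, t)
        return w - lr * d - lr * lam * w, m, v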
1 code implementation • NeurIPS 2021 • Jeffrey Negrea, Blair Bilodeau, Nicolò Campolongo, Francesco Orabona, Daniel M. Roy
Quantile (and, more generally, KL) regret bounds, such as those achieved by NormalHedge (Chaudhuri, Freund, and Hsu 2009) and its variants, relax the goal of competing against the best individual expert to only competing against a majority of experts on adversarial data.
2 code implementations • 27 Oct 2021 • Francesco Orabona, Kwang-Sung Jun
A classic problem in statistics is the estimation of the expectation of random variables from samples.
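For context, a minimal sketch of the textbook baseline for this estimation problem: the empirical mean with a two-sided Hoeffding confidence interval for samples assumed to lie in $[0, 1]$. This is only the classical point of comparison, not the method developed in the paper; the function name and the confidence level delta are illustrative.

    import numpy as np

    def hoeffding_interval(samples, delta=0.05):
        """Empirical mean of [0,1]-valued samples with a two-sided Hoeffding confidence interval."""
        x = np.asarray(samples, dtype=float)
        n = len(x)
        mean = x.mean()
        half_width = np.sqrt(np.log(2.0 / delta) / (2.0 * n))
        return mean - half_width, mean + half_width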
1 code implementation • 13 Jun 2021 • Genevieve Flaspohler, Francesco Orabona, Judah Cohen, Soukayna Mouatadid, Miruna Oprescu, Paulo Orenstein, Lester Mackey
Inspired by the demands of real-time climate and weather forecasting, we develop optimistic online learning algorithms that require no parameter tuning and have optimal regret guarantees under delayed feedback.
no code implementations • 27 Feb 2021 • Mingrui Liu, Francesco Orabona
This means that the convergence speed does not improve even if the algorithm starts from the optimal solution, and hence it is oblivious to the initialization.
no code implementations • 15 Feb 2021 • Nicolò Campolongo, Francesco Orabona
Our proposed algorithm is adaptive not only to the temporal variability of the loss functions, but also to the path length of the sequence of comparators when an upper bound is known.
no code implementations • 13 Feb 2021 • Xiaoyu Li, Mingrui Liu, Francesco Orabona
In this paper, we focus on the convergence rate of the last iterate of SGDM.
1 code implementation • 30 Jan 2021 • Francesco Orabona, Dávid Pál
We design and analyze an algorithm for first-order stochastic optimization of a large class of functions on $\mathbb{R}^d$.
no code implementations • 24 Nov 2020 • Mingrui Liu, Wei Zhang, Francesco Orabona, Tianbao Yang
As a result, Adam$^+$ requires little parameter tuning, like Adam, yet it enjoys a provable convergence guarantee.
no code implementations • 28 Jul 2020 • Xiaoyu Li, Francesco Orabona
We use it to prove for the first time the convergence of the gradients to zero in high probability in the smooth nonconvex setting for Delayed AdaGrad with momentum.
no code implementations • 12 Jun 2020 • Keyi Chen, John Langford, Francesco Orabona
Parameter-free stochastic gradient descent (PFSGD) algorithms do not require setting learning rates while achieving optimal theoretical performance.
no code implementations • NeurIPS 2020 • Nicolò Campolongo, Francesco Orabona
We prove a novel static regret bound that depends on the temporal variability of the sequence of loss functions, a quantity which is often encountered when considering dynamic competitors.
2 code implementations • 12 Feb 2020 • Xiaoyu Li, Zhenxun Zhuang, Francesco Orabona
Moreover, we show the surprising property that these two strategies are \emph{adaptive} to the noise level in the stochastic gradients of PL functions.
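A minimal sketch of the two stepsize schedules named above, written as standalone functions; eta0, alpha, and T are hypothetical hyperparameters, and the exact parameterization used in the paper may differ.

    import math

    def exponential_stepsize(t, eta0, alpha):
        """Exponentially decaying stepsize: eta_t = eta0 * alpha**t, with 0 < alpha < 1."""
        return eta0 * alpha ** t

    def cosine_stepsize(t, eta0, T):
        """Cosine-annealed stepsize: decays from eta0 at t = 0 to 0 at t = T."""
        return 0.5 * eta0 * (1.0 + math.cos(math.pi * t / T))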
2 code implementations • 31 Dec 2019 • Francesco Orabona
I present first-order and second-order algorithms for online learning with convex losses, in Euclidean and non-Euclidean settings.
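As a concrete instance of the first-order algorithms covered there, a minimal sketch of projected online subgradient descent with the standard $\eta_t \propto 1/\sqrt{t}$ stepsize; the projection onto the unit Euclidean ball and the constant eta0 are illustrative choices, not prescriptions from the text.

    import numpy as np

    def online_subgradient_descent(loss_grads, dim, eta0=1.0):
        """loss_grads: sequence of callables; loss_grads[t](x) returns a subgradient of the t-th loss at x."""
        x = np.zeros(dim)
        iterates = []
        for t, grad_t in enumerate(loss_grads, start=1):
            iterates.append(x.copy())
            g = grad_t(x)                        # subgradient of the revealed loss at the played point
            x = x - (eta0 / np.sqrt(t)) * g      # standard O(1/sqrt(t)) stepsize
            norm = np.linalg.norm(x)
            if norm > 1.0:                       # project back onto the unit Euclidean ball
                x = x / norm
        return iterates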
no code implementations • 21 Nov 2019 • Kwang-Sung Jun, Francesco Orabona
We consider the problem of minimizing a convex risk with stochastic subgradients while guaranteeing $\epsilon$-local differential privacy ($\epsilon$-LDP).
no code implementations • NeurIPS 2019 • Kwang-Sung Jun, Ashok Cutkosky, Francesco Orabona
In this paper, we consider nonparametric least squares regression in a Reproducing Kernel Hilbert Space (RKHS).
2 code implementations • NeurIPS 2019 • Ashok Cutkosky, Francesco Orabona
Variance reduction has emerged in recent years as a strong competitor to stochastic gradient descent in non-convex problems, providing the first algorithms to improve upon the convergence rate of stochastic gradient descent for finding first-order critical points.
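A minimal sketch of a momentum-based variance-reduced gradient estimator in the spirit of this line of work; the fixed momentum parameter a and stepsize lr are simplifications for illustration, whereas the algorithm in the paper sets these quantities adaptively.

    import numpy as np

    def momentum_variance_reduced_sgd(stoch_grad, x0, lr=0.01, a=0.1, n_steps=1000):
        """stoch_grad(x, seed) returns a stochastic gradient at x computed on the sample indexed by seed."""
        x = np.asarray(x0, dtype=float)
        d = stoch_grad(x, 0)                      # initial estimator: a plain stochastic gradient
        for t in range(1, n_steps + 1):
            x_new = x - lr * d
            g_new = stoch_grad(x_new, t)          # same sample evaluated at the new point ...
            g_old = stoch_grad(x, t)              # ... and at the old point (shared randomness)
            d = g_new + (1.0 - a) * (d - g_old)   # recursive variance-reduced estimator
            x = x_new
        return x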
no code implementations • 5 Feb 2019 • Kwang-Sung Jun, Francesco Orabona
We show that BANCO achieves the optimal regret rate in our problem.
1 code implementation • 25 Jan 2019 • Zhenxun Zhuang, Ashok Cutkosky, Francesco Orabona
Stochastic Gradient Descent (SGD) has played a central role in machine learning.
no code implementations • 21 May 2018 • Xiaoyu Li, Francesco Orabona
In this paper, we start closing this gap: we theoretically analyze in the convex and non-convex settings a generalized version of the AdaGrad stepsizes.
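A minimal sketch of SGD with a global AdaGrad-style stepsize of the kind studied in this line of work, $\eta_t = \alpha / (\beta + \sum_i \|g_i\|^2)^{1/2+\epsilon}$; the exact variant analyzed in the paper (for example, whether the sum is delayed by one step or taken per coordinate) may differ from this form, and the defaults are illustrative.

    import numpy as np

    def adagrad_norm_sgd(stoch_grad, x0, alpha=1.0, beta=1.0, eps=0.0, n_steps=1000):
        """SGD with the global stepsize alpha / (beta + sum_i ||g_i||^2) ** (0.5 + eps)."""
        x = np.asarray(x0, dtype=float)
        sq_sum = 0.0
        for t in range(n_steps):
            g = stoch_grad(x, t)
            sq_sum += float(np.dot(g, g))
            eta = alpha / (beta + sq_sum) ** (0.5 + eps)
            x = x - eta * g
        return x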
no code implementations • 17 Feb 2018 • Ashok Cutkosky, Francesco Orabona
We introduce several new black-box reductions that significantly improve the design of adaptive and parameter-free online learning algorithms by simplifying analysis, improving regret guarantees, and sometimes even improving runtime.
no code implementations • 6 Nov 2017 • Kwang-Sung Jun, Francesco Orabona, Stephen Wright, Rebecca Willett
A key challenge in online learning is that classical algorithms can be slow to adapt to changing environments.
no code implementations • ICML 2017 • Alina Beygelzimer, Francesco Orabona, Chicheng Zhang
An efficient bandit algorithm for $\sqrt{T}$-regret in online multiclass prediction?
6 code implementations • NeurIPS 2017 • Francesco Orabona, Tatiana Tommasi
Instead, we reduce the optimization process to a game of betting on a coin and propose a learning-rate-free optimal algorithm for this scenario.
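A minimal one-dimensional sketch of the coin-betting reduction with a Krichevsky-Trofimov bettor, under the assumption that the gradients are bounded by 1 in absolute value; the algorithm proposed in the paper uses a different, per-coordinate betting scheme, so this is only the basic idea, not the paper's method.

    def coin_betting_1d(grad_fn, n_rounds, initial_wealth=1.0):
        """One-dimensional coin-betting optimizer with a Krichevsky-Trofimov betting fraction.
        Assumes |grad_fn(x)| <= 1 for every x."""
        wealth = initial_wealth
        coin_sum = 0.0                       # running sum of the "coin" outcomes -g_t
        x = 0.0
        for t in range(1, n_rounds + 1):
            x = (coin_sum / t) * wealth      # bet a KT fraction of the current wealth
            g = grad_fn(x)                   # the (sub)gradient at the bet point is revealed
            wealth -= g * x                  # wealth changes by the outcome of the bet
            coin_sum += -g
        return x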
no code implementations • 25 Feb 2017 • Alina Beygelzimer, Francesco Orabona, Chicheng Zhang
The regret bound holds simultaneously with respect to a family of loss functions parameterized by $\eta$, for a range of $\eta$ restricted by the norm of the competitor.
no code implementations • 14 Oct 2016 • Kwang-Sung Jun, Francesco Orabona, Rebecca Willett, Stephen Wright
This paper describes a new parameter-free online learning algorithm for changing environments.
1 code implementation • NeurIPS 2016 • Francesco Orabona, Dávid Pál
We present a new intuitive framework to design parameter-free algorithms for \emph{both} online linear optimization over Hilbert spaces and for learning with expert advice, based on reductions to betting on outcomes of adversarial coins.
no code implementations • 10 Feb 2016 • Tamir Hazan, Francesco Orabona, Anand D. Sarwate, Subhransu Maji, Tommi Jaakkola
This paper shows that the expected value of perturb-max inference with low dimensional perturbations can be used sequentially to generate unbiased samples from the Gibbs distribution.
no code implementations • 7 Feb 2016 • Alon Gonen, Francesco Orabona, Shai Shalev-Shwartz
We develop a novel preconditioning method for ridge regression, based on recent linear sketching methods.
no code implementations • 8 Jan 2016 • Francesco Orabona, Dávid Pál
We design and analyze algorithms for online linear optimization that have optimal regret and at the same time do not need to know any upper or lower bounds on the norm of the loss vectors.
no code implementations • 6 Nov 2015 • Francesco Orabona, Dávid Pál
We prove non-asymptotic lower bounds on the expectation of the maximum of $d$ independent Gaussian variables and the expectation of the maximum of $d$ independent symmetric random walks.
no code implementations • 20 Aug 2015 • Rocco De Rosa, Francesco Orabona, Nicolò Cesa-Bianchi
Stream mining poses unique challenges to machine learning: predictive models are required to be scalable, incrementally trainable, must remain bounded in size (even when the data stream is arbitrarily long), and be nonparametric in order to achieve high accuracy even in complex and dynamic environments.
no code implementations • 19 Feb 2015 • Francesco Orabona, Dávid Pál
We design algorithms for online linear optimization that have optimal regret and at the same time do not need to know any upper or lower bounds on the norm of the loss vectors.
no code implementations • 5 Feb 2015 • Francesco Orabona
I give a simple expression for the Mill's ratio of the Student's t-distribution.
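For reference, the Mill's ratio of a distribution with density $f$ and cumulative distribution function $F$ is the standard quantity below; the specific closed-form expression derived in the note is not reproduced here.

$$R(x) = \frac{1 - F(x)}{f(x)}.$$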
no code implementations • 4 Dec 2014 • Ilja Kuzborskij, Francesco Orabona
In this work we consider the learning setting where, in addition to the training set, the learner receives a collection of auxiliary hypotheses originating from other tasks.
no code implementations • 6 Aug 2014 • Ilja Kuzborskij, Francesco Orabona, Barbara Caputo
In this paper we consider the binary transfer learning problem, focusing on how to select and combine sources from a large pool to yield a good performance on a target task.
no code implementations • NeurIPS 2014 • Francesco Orabona
Stochastic gradient descent algorithms for training linear and kernel predictors are gaining more and more importance, thanks to their scalability.
no code implementations • 3 Mar 2014 • H. Brendan McMahan, Francesco Orabona
When $T$ is known, we derive an algorithm with an optimal regret bound (up to constant factors).
no code implementations • NeurIPS 2013 • Samory Kpotufe, Francesco Orabona
We consider the problem of maintaining the data-structures of a partition-based regression procedure in a setting where the training data arrives sequentially over time.
no code implementations • NeurIPS 2013 • Francesco Orabona
We present a new online learning algorithm that extends the exponentiated gradient to infinite dimensional spaces.
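In the finite-dimensional case, the exponentiated gradient update on the probability simplex takes the familiar multiplicative form below, which the paper extends to infinite dimensional spaces; here $\eta$ is the learning rate and $g_{t,i}$ the $i$-th coordinate of the loss gradient.

$$w_{t+1,i} = \frac{w_{t,i}\, e^{-\eta g_{t,i}}}{\sum_{j} w_{t,j}\, e^{-\eta g_{t,j}}}.$$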
no code implementations • 15 Oct 2013 • Francesco Orabona, Tamir Hazan, Anand D. Sarwate, Tommi Jaakkola
Applying the general result to MAP perturbations can yield a more efficient algorithm to approximate sampling from the Gibbs distribution.
no code implementations • CVPR 2013 • Ilja Kuzborskij, Francesco Orabona, Barbara Caputo
Since the seminal work of Thrun [17], the learning to learn paradigm has been defined as the ability of an agent to improve its performance at each task with experience and with the number of tasks.
no code implementations • 10 Apr 2013 • Francesco Orabona, Koby Crammer, Nicolò Cesa-Bianchi
A unifying perspective on the design and the analysis of online algorithms is provided by online mirror descent, a general prediction strategy from which most first-order algorithms can be obtained as special cases.
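As a reminder of the template referred to above, the online mirror descent update with learning rate $\eta_t$ and Bregman divergence $B_\psi$ induced by a mirror map $\psi$ can be written as below; this is the textbook form, not necessarily the exact parameterization used in the paper.

$$x_{t+1} = \operatorname*{argmin}_{x \in V} \; \eta_t \langle g_t, x \rangle + B_\psi(x, x_t).$$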
no code implementations • NeurIPS 2012 • Claudio Gentile, Francesco Orabona
We present a novel multilabel/ranking algorithm working in partial information settings.
no code implementations • NeurIPS 2010 • Francesco Orabona, Koby Crammer
We propose a general framework for online learning in classification problems with time-varying potential functions in the adversarial setting.
no code implementations • NeurIPS 2010 • Jie Luo, Francesco Orabona
In this paper, we propose a semi-supervised framework to model this kind of problems.