no code implementations • 15 Feb 2024 • Khaled Eldowa, Nicolò Cesa-Bianchi, Alberto Maria Metelli, Marcello Restelli
For a selection of policy set families, we prove nearly-matching lower bounds, scaling similarly with the capacity.
no code implementations • 24 Dec 2023 • Yuko Kuroki, Alberto Rumi, Taira Tsuchiya, Fabio Vitale, Nicolò Cesa-Bianchi
We study best-of-both-worlds algorithms for $K$-armed linear contextual bandits.
no code implementations • 10 Nov 2023 • Stephen Pasteris, Alberto Rumi, Fabio Vitale, Nicolò Cesa-Bianchi
Many online decision-making problems correspond to maximizing a sequence of submodular functions.
no code implementations • 26 Oct 2023 • Juliette Achddou, Nicolò Cesa-Bianchi, Pierre Laforgue
We study multitask online learning in a setting where agents can only exchange information with their neighbors on an arbitrary communication network.
no code implementations • 15 Aug 2023 • Dirk van der Hoeven, Nikita Zhivotovskiy, Nicolò Cesa-Bianchi
Online learning methods yield sequential regret bounds under minimal assumptions and provide in-expectation risk bounds for statistical learning.
no code implementations • 14 Jul 2023 • Nicolò Cesa-Bianchi, Tommaso Cesari, Roberto Colomboni, Federico Fusco, Stefano Leonardi
We study the problem of regret minimization for a single bidder in a sequence of first-price auctions where the bidder discovers the item's value only if the auction is won.
no code implementations • 30 May 2023 • Emmanuel Esposito, Saeed Masoudian, Hao Qiu, Dirk van der Hoeven, Nicolò Cesa-Bianchi, Yevgeny Seldin
However, if the mapping of states to losses is stochastic, we show that the regret grows at a rate of $\sqrt{\big(K+\min\{|\mathcal{S}|, d\}\big)T}$ (within log factors), implying that if the number $|\mathcal{S}|$ of states is smaller than the delay, then intermediate observations help.
no code implementations • 14 Mar 2023 • Khaled Eldowa, Nicolò Cesa-Bianchi, Alberto Maria Metelli, Marcello Restelli
We investigate the problem of bandits with expert advice when the experts are fixed and known distributions over the actions.
no code implementations • 21 Feb 2023 • Nicolò Cesa-Bianchi, Tommaso Cesari, Roberto Colomboni, Federico Fusco, Stefano Leonardi
We provide a complete characterization of the regret regimes for fixed-price mechanisms under different feedback models in the two cases where the learner can post either the same or different prices to buyers and sellers.
no code implementations • 16 Feb 2023 • Giulia Clerici, Pierre Laforgue, Nicolò Cesa-Bianchi
By choosing the cycle length so as to trade off approximation and estimation errors, we then prove a bound of order $\sqrt{d}\,(m+1)^{\frac{1}{2}+\max\{\gamma, 0\}}\, T^{3/4}$ (ignoring log factors) on the regret against the optimal sequence of actions, where $T$ is the horizon and $d$ is the dimension of the linear action space.
no code implementations • 9 Oct 2022 • Emmanuel Esposito, Federico Fusco, Dirk van der Hoeven, Nicolò Cesa-Bianchi
The framework of feedback graphs is a generalization of sequential decision-making with bandit or full information feedback.
no code implementations • 8 Sep 2022 • Marco Bressan, Nicolò Cesa-Bianchi, Silvio Lattanzi, Andrea Paudice, Maximilian Thiessen
In this work we show that, by carefully combining the two types of queries, a binary classifier can be learned in time $\operatorname{poly}(n+m)$ using only $O(m^2 \log n)$ label queries and $O\big(m \log \frac{m}{\gamma}\big)$ seed queries; the result extends to $k$-class classifiers at the price of a $k! k^2$ multiplicative overhead.
no code implementations • 8 Jul 2022 • Nicolò Cesa-Bianchi, Tommaso Cesari, Takayuki Osogami, Marco Scarsini, Segev Wasserkrug
We study a repeated game between a supplier and a retailer who want to maximize their respective profits without full knowledge of the problem parameters.
no code implementations • 6 Jun 2022 • Dirk van der Hoeven, Nikita Zhivotovskiy, Nicolò Cesa-Bianchi
We prove that a variant of EWA either achieves a negative regret (i.e., the algorithm outperforms the best expert), or guarantees an $O(\log K)$ bound on both variance and regret.
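For orientation, here is a minimal sketch of the standard exponentially weighted average (EWA, also known as Hedge) forecaster with a fixed learning rate. It is not the variant analyzed in the paper above, whose details are not given in this listing; the function names `ewa_weights` and `run_ewa`, the fixed `eta`, and the assumption that losses lie in $[0, 1]$ are illustrative choices.

```python
import math


def ewa_weights(cumulative_losses, eta):
    """Return the EWA distribution over experts given their cumulative losses."""
    # Shift by the smallest cumulative loss so the exponentials cannot underflow
    # for the best expert; the normalized weights are unchanged by this shift.
    best = min(cumulative_losses)
    raw = [math.exp(-eta * (loss - best)) for loss in cumulative_losses]
    total = sum(raw)
    return [w / total for w in raw]


def run_ewa(loss_rows, eta):
    """Run standard EWA on a sequence of loss vectors (assumed to lie in [0, 1]).

    Returns the learner's cumulative expected loss and the list of
    cumulative losses of the individual experts.
    """
    num_experts = len(loss_rows[0])
    cumulative = [0.0] * num_experts
    learner_loss = 0.0
    for losses in loss_rows:
        # Weights depend only on losses observed in previous rounds.
        p = ewa_weights(cumulative, eta)
        learner_loss += sum(pi * li for pi, li in zip(p, losses))
        cumulative = [c + l for c, l in zip(cumulative, losses)]
    return learner_loss, cumulative
```

With $K$ experts, $T$ rounds, and losses in $[0, 1]$, this fixed-rate forecaster satisfies the classical regret bound $\frac{\ln K}{\eta} + \frac{\eta T}{8}$, so calling `run_ewa(rows, eta)` and subtracting `min(cumulative)` from the learner's loss gives a quantity that stays below that value.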
no code implementations • 1 Jun 2022 • Chloé Rouyer, Dirk van der Hoeven, Nicolò Cesa-Bianchi, Yevgeny Seldin
The algorithm combines ideas from the EXP3++ algorithm for stochastic and adversarial bandits and the EXP3.G algorithm for feedback graphs with a novel exploration scheme.
no code implementations • 31 May 2022 • Pierre Laforgue, Andrea Della Vecchia, Nicolò Cesa-Bianchi, Lorenzo Rosasco
We introduce and analyze AdaTask, a multitask online learning algorithm that adapts to the unknown structure of the tasks.
no code implementations • 6 Dec 2021 • Nicolò Cesa-Bianchi, Tommaso Cesari, Roberto Colomboni, Claudio Gentile, Yishay Mansour
We investigate a nonstochastic bandit setting in which the loss of an action is not immediately charged to the player, but rather spread over the subsequent rounds in an adversarial way.
no code implementations • 2 Nov 2021 • Dirk van der Hoeven, Nicolò Cesa-Bianchi
We study nonstochastic bandits and experts in a delayed setting where delays depend on both time and arms.
1 code implementation • 22 Oct 2021 • Pierre Laforgue, Giulia Clerici, Nicolò Cesa-Bianchi, Ran Gilad-Bachrach
Motivated by the fact that humans like some level of unpredictability or novelty, and might therefore get quickly bored when interacting with a stationary policy, we introduce a novel non-stationary bandit problem, where the expected reward of an arm is fully determined by the time elapsed since the arm last took part in a switch of actions.
no code implementations • 8 Sep 2021 • Nicolò Cesa-Bianchi, Tommaso Cesari, Roberto Colomboni, Federico Fusco, Stefano Leonardi
In this paper, we cast the bilateral trade problem in a regret minimization framework over $T$ rounds of seller/buyer interactions, with no prior knowledge on their private valuations.
no code implementations • 9 Jun 2021 • Nicolò Cesa-Bianchi, Tommaso R. Cesari, Riccardo Della Vecchia
We study the interplay between feedback and communication in a cooperative online learning setting where a network of agents solves a task in which the learners' feedback is determined by an arbitrary graph.
no code implementations • NeurIPS 2021 • Marco Bressan, Nicolò Cesa-Bianchi, Silvio Lattanzi, Andrea Paudice
We study an active cluster recovery problem where, given a set of $n$ points and an oracle answering queries like "are these two points in the same cluster?
no code implementations • NeurIPS 2021 • Dirk van der Hoeven, Federico Fusco, Nicolò Cesa-Bianchi
We study the problem of online multiclass classification in a setting where the learner's feedback is determined by an arbitrary directed graph.
no code implementations • NeurIPS 2021 • Nicolò Cesa-Bianchi, Pierre Laforgue, Andrea Paudice, Massimiliano Pontil
We introduce and analyze MT-OMD, a multitask generalization of Online Mirror Descent (OMD) which operates by sharing updates between tasks.
no code implementations • 23 Feb 2021 • Maximilian Mordig, Riccardo Della Vecchia, Nicolò Cesa-Bianchi, Bernhard Schölkopf
Our setting is motivated by a PhD market of students, advisors, and co-advisors, and can be generalized to supply chain networks viewed as $n$-sided markets.
Computer Science and Game Theory • Theoretical Economics • Combinatorics
no code implementations • 19 Feb 2021 • Chloé Rouyer, Yevgeny Seldin, Nicolò Cesa-Bianchi
In the stochastically constrained adversarial regime, which includes the stochastic regime as a special case, it achieves a regret bound of $O\left(\big((\lambda K)^{2/3} T^{1/3} + \ln T\big)\sum_{i \neq i^*} \Delta_i^{-1}\right)$, where $\Delta_i$ are the suboptimality gaps and $i^*$ is a unique optimal arm.
no code implementations • 16 Feb 2021 • Nicolò Cesa-Bianchi, Tommaso Cesari, Roberto Colomboni, Federico Fusco, Stefano Leonardi
Despite the simplicity of this problem, a classical result by Myerson and Satterthwaite (1983) affirms the impossibility of designing a mechanism which is simultaneously efficient, incentive compatible, individually rational, and budget balanced.
no code implementations • 31 Jan 2021 • Marco Bressan, Nicolò Cesa-Bianchi, Silvio Lattanzi, Andrea Paudice
Previous results show that clusters in Euclidean spaces that are convex and separated with a margin can be reconstructed exactly using only $O(\log n)$ same-cluster queries, where $n$ is the number of input points.
no code implementations • NeurIPS 2020 • Marco Bressan, Nicolò Cesa-Bianchi, Silvio Lattanzi, Andrea Paudice
Given a finite set of input points, and an oracle revealing whether any two points lie in the same cluster, our goal is to recover all clusters exactly using as few queries as possible.
no code implementations • NeurIPS 2020 • Ilja Kuzborskij, Nicolò Cesa-Bianchi
When competing against "simple" locality profiles, our technique delivers regret bounds that are significantly better than those proven using the previous approach.
no code implementations • 7 Oct 2019 • Leonardo Cella, Nicolò Cesa-Bianchi
Motivated by recommendation problems in music streaming platforms, we propose a nonstationary stochastic bandit model in which the expected reward of an arm depends on the number of rounds that have passed since the arm was last pulled.
no code implementations • NeurIPS 2019 • Tobias Sommer Thune, Nicolò Cesa-Bianchi, Yevgeny Seldin
We then introduce a new algorithm that lifts the requirement of bounded delays by using a wrapper that skips rounds with excessively large delays.
no code implementations • NeurIPS 2021 • Nicolò Cesa-Bianchi, Tommaso Cesari, Yishay Mansour, Vianney Perchet
We introduce a novel theoretical framework for Return On Investment (ROI) maximization in repeated decision-making.
1 code implementation • NeurIPS 2019 • Marco Bressan, Nicolò Cesa-Bianchi, Andrea Paudice, Fabio Vitale
In this work we investigate correlation clustering as an active learning problem: each similarity score can be learned by making a query, and the goal is to minimise both the disagreements and the total number of queries.
no code implementations • 5 Feb 2019 • Ilja Kuzborskij, Nicolò Cesa-Bianchi, Csaba Szepesvári
This is a well-established notion of effective dimension appearing in several previous works, including the analyses of SGD and ridge regression, but ours is the first work that brings this dimension to the analysis of learning using Gibbs densities.
no code implementations • 23 Jan 2019 • Nicolò Cesa-Bianchi, Tommaso R. Cesari, Claire Monteleoni
However, when agents can choose to ignore some of their neighbors based on the knowledge of the network structure, we prove an $O(\sqrt{\overline{\chi} T})$ sublinear regret bound, where $\overline{\chi} \ge \alpha$ is the clique-covering number of the network.
no code implementations • 28 Sep 2018 • Ilja Kuzborskij, Leonardo Cella, Nicolò Cesa-Bianchi
More precisely, we show that a sketch of size $m$ allows an $\mathcal{O}(md)$ update time for both algorithms, as opposed to $\Omega(d^2)$ required by their non-sketched versions in general (where $d$ is the dimension of context vectors).
no code implementations • 9 Jul 2018 • Nicolò Cesa-Bianchi, Tommaso Cesari, Vianney Perchet
When $K=2$ in the distribution-dependent case, the hardness of our setting reduces to that of a stochastic $2$-armed bandit: we prove that an upper bound of order $(\log T)/\Delta$ (up to $\log\log$ factors) on the regret can be achieved with no information on the demand curve.
no code implementations • 18 May 2018 • Marco Frasca, Nicolò Cesa-Bianchi
Motivated by applications in protein function prediction, we consider a challenging supervised classification setting in which positive labels are scarce and there are no explicit negative labels.
no code implementations • NeurIPS 2017 • Nicolò Cesa-Bianchi, Claudio Gentile, Gábor Lugosi, Gergely Neu
Boltzmann exploration is a classic strategy for sequential decision-making under uncertainty, and is one of the most standard tools in Reinforcement Learning (RL).
no code implementations • NeurIPS 2017 • Ilja Kuzborskij, Nicolò Cesa-Bianchi
We study algorithms for online nonparametric regression that learn the directions along which the regression function is smoother.
no code implementations • 15 May 2017 • Nicolò Cesa-Bianchi, Ohad Shamir
We study how the regret guarantees of nonstochastic multi-armed bandits can be improved if the effective range of the losses in each round is small (e.g., the maximal difference between two losses in a given round).
no code implementations • 27 Feb 2017 • Nicolò Cesa-Bianchi, Pierre Gaillard, Claudio Gentile, Sébastien Gerchinovitz
We investigate contextual online learning with nonparametric (Lipschitz) comparison classes under different assumptions on losses and feedback information.
no code implementations • 1 Jun 2016 • Géraud Le Falher, Nicolò Cesa-Bianchi, Claudio Gentile, Fabio Vitale
In the problem of edge sign prediction, we are given a directed graph (representing a social network), and our task is to predict the binary labels of the edges (i.e., the positive or negative nature of the social relationships).
no code implementations • 11 Apr 2016 • Rocco De Rosa, Ilaria Gori, Fabio Cuzzolin, Barbara Caputo, Nicolò Cesa-Bianchi
Recognising human activities from streaming videos poses unique challenges to learning algorithms: predictive models need to be scalable and incrementally trainable, and must remain bounded in size even when the data stream is arbitrarily long.
no code implementations • 20 Aug 2015 • Rocco De Rosa, Francesco Orabona, Nicolò Cesa-Bianchi
Stream mining poses unique challenges to machine learning: predictive models are required to be scalable and incrementally trainable, to remain bounded in size (even when the data stream is arbitrarily long), and to be nonparametric in order to achieve high accuracy even in complex and dynamic environments.
no code implementations • 26 Feb 2015 • Noga Alon, Nicolò Cesa-Bianchi, Ofer Dekel, Tomer Koren
We study a general class of online learning problems where the feedback is specified by a graph.
no code implementations • 5 Nov 2014 • Nicolò Cesa-Bianchi, Yishay Mansour, Ohad Shamir
In this paper, we study lower bounds on the error attainable by such methods as a function of the number of entries observed in the kernel matrix or the rank of an approximate kernel matrix.
no code implementations • 30 Sep 2014 • Noga Alon, Nicolò Cesa-Bianchi, Claudio Gentile, Shie Mannor, Yishay Mansour, Ohad Shamir
This naturally models several situations where the losses of different actions are related, and knowing the loss of one action provides information on the loss of other actions.
no code implementations • NeurIPS 2013 • Nicolò Cesa-Bianchi, Ofer Dekel, Ohad Shamir
In particular, we show that with switching costs, the attainable rate with bandit feedback is $T^{2/3}$.
no code implementations • NeurIPS 2013 • Noga Alon, Nicolò Cesa-Bianchi, Claudio Gentile, Yishay Mansour
We consider the partial observability model for multi-armed bandits, introduced by Mannor and Shamir.
no code implementations • NeurIPS 2013 • Nicolò Cesa-Bianchi, Claudio Gentile, Giovanni Zappella
Multi-armed bandit problems are receiving a great deal of attention because they adequately formalize the exploration-exploitation trade-offs arising in several industrially relevant applications, such as online advertisement and, more generally, recommendation systems.
no code implementations • 10 Apr 2013 • Francesco Orabona, Koby Crammer, Nicolò Cesa-Bianchi
A unifying perspective on the design and the analysis of online algorithms is provided by online mirror descent, a general prediction strategy from which most first-order algorithms can be obtained as special cases.
no code implementations • NeurIPS 2012 • Nicolò Cesa-Bianchi, Pierre Gaillard, Gábor Lugosi, Gilles Stoltz
Mirror descent with an entropic regularizer is known to achieve shifting regret bounds that are logarithmic in the dimension.
no code implementations • NeurIPS 2012 • Nicolò Cesa-Bianchi, Claudio Gentile, Fabio Vitale, Giovanni Zappella
We provide a theoretical analysis within this model, showing that we can achieve an optimal (to within a constant factor) number of mistakes on any graph $G = (V, E)$ such that $|E|$ is at least of order $|V|^{3/2}$ by querying at most order $|V|^{3/2}$ edge labels.
no code implementations • 25 Apr 2012 • Sébastien Bubeck, Nicolò Cesa-Bianchi
Multi-armed bandit problems are the most basic examples of sequential decision problems with an exploration-exploitation trade-off.
no code implementations • NeurIPS 2011 • Nicolò Cesa-Bianchi, Ohad Shamir
Most online algorithms used in machine learning today are based on variants of mirror descent or follow-the-leader.
no code implementations • NeurIPS 2011 • Fabio Vitale, Nicolò Cesa-Bianchi, Claudio Gentile, Giovanni Zappella
Although it is known how to predict the nodes of an unweighted tree in a nearly optimal way, in the weighted case a fully satisfactory algorithm is not available yet.
no code implementations • 13 Jun 2011 • Nicolò Cesa-Bianchi, Ohad Shamir
Most traditional online learning algorithms are based on variants of mirror descent or follow-the-leader.
no code implementations • NeurIPS 2008 • Giovanni Cavallanti, Nicolò Cesa-Bianchi, Claudio Gentile
Using the so-called Tsybakov low noise condition to parametrize the instance distribution, we show bounds on the convergence rate to the Bayes risk of both the fully supervised and the selective sampling versions of the basic algorithm.