no code implementations • 14 Feb 2024 • Ilja Kuzborskij, Kwang-Sung Jun, Yulian Wu, Kyoungseok Jang, Francesco Orabona
In this paper, we consider the problem of proving concentration inequalities for estimating the mean of a sequence of random elements $f(X_1, \theta), \ldots, f(X_n, \theta)$, where $\theta$ is a possibly data-dependent parameter.
no code implementations • 12 Feb 2023 • Kyoungseok Jang, Kwang-Sung Jun, Ilja Kuzborskij, Francesco Orabona
We consider the problem of estimating the mean of a sequence of random elements $f(X_1, \theta), \ldots, f(X_n, \theta)$ where $f$ is a fixed scalar function, $S=(X_1, \ldots, X_n)$ are independent random variables, and $\theta$ is a possibly $S$-dependent parameter.
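As a toy illustration of this setting (all concrete choices here are hypothetical: $f(x, \theta) = (x - \theta)^2$ and $\theta$ taken to be the sample mean of $S$), the plug-in estimate below averages summands that are dependent through $\theta$, which is exactly what rules out a direct application of classical Hoeffding- or Bernstein-type bounds:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical instance: X_1, ..., X_n i.i.d., f(x, theta) = (x - theta)^2,
# and theta chosen in an S-dependent way (here, the sample mean of S).
n = 1000
S = rng.normal(loc=1.0, scale=2.0, size=n)

theta = S.mean()              # theta depends on the whole sample S
values = (S - theta) ** 2     # the sequence f(X_1, theta), ..., f(X_n, theta)

# Naive plug-in estimate of the mean; the summands are dependent through
# theta, so classical concentration bounds do not directly apply.
print("plug-in mean:", values.mean())
```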
no code implementations • 28 Dec 2022 • Ilja Kuzborskij, Csaba Szepesvári
We explore the ability of overparameterized shallow ReLU neural networks to learn Lipschitz, nondifferentiable, bounded functions with additive noise when trained by Gradient Descent (GD).
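A minimal sketch of the kind of experiment this entry describes, with assumed choices throughout (target $f(x) = |x|$, which is Lipschitz and nondifferentiable; additive Gaussian label noise; NTK-style $1/\sqrt{m}$ output scaling; untuned step size); it is illustrative, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical instance: fit f(x) = |x| from noisy labels with a width-m
# shallow ReLU network trained by full-batch gradient descent (GD) on the
# squared loss, using NTK-style 1/sqrt(m) output scaling.
n, m, lr, steps = 200, 512, 0.5, 1000
X = rng.uniform(-1.0, 1.0, size=n)
y = np.abs(X) + 0.1 * rng.normal(size=n)      # additive label noise

W = rng.normal(size=m)                        # input weights (1-D inputs)
b = np.zeros(m)
a = rng.normal(size=m)                        # output weights
c = 1.0 / np.sqrt(m)

for _ in range(steps):
    Z = X[:, None] * W[None, :] + b                # pre-activations, (n, m)
    H = np.maximum(Z, 0.0)                         # ReLU features
    e = (H @ a) * c - y                            # residuals
    G = e[:, None] * (Z > 0) * (a[None, :] * c)    # backprop through ReLU
    a -= lr * (H * e[:, None]).mean(axis=0) * c
    W -= lr * (G * X[:, None]).mean(axis=0)
    b -= lr * G.mean(axis=0)

print("training MSE:", float((e ** 2).mean()))
```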
no code implementations • NeurIPS 2021 • Dominic Richards, Ilja Kuzborskij
We revisit on-average algorithmic stability of GD for training overparameterised shallow neural networks and prove new generalisation and excess risk bounds without the NTK or PL assumptions.
no code implementations • NeurIPS 2021 • Ilja Kuzborskij, Csaba Szepesvári, Omar Rivasplata, Amal Rannen-Triki, Razvan Pascanu
Empirically it has been observed that the performance of deep neural networks steadily improves as we increase model size, contradicting the classical view on overfitting and generalization.
no code implementations • 12 Jul 2021 • Ilja Kuzborskij, Csaba Szepesvári
We explore the ability of overparameterized shallow neural networks to learn Lipschitz regression functions with and without label noise when trained by Gradient Descent (GD).
no code implementations • 31 Oct 2020 • Mikhail Konobeev, Ilja Kuzborskij, Csaba Szepesvári
A key problem in the theory of meta-learning is to understand how the task distributions influence transfer risk, the expected error of a meta-learner on a new task drawn from the unknown task distribution.
no code implementations • NeurIPS 2020 • Omar Rivasplata, Ilja Kuzborskij, Csaba Szepesvári, John Shawe-Taylor
Specifically, we present a basic PAC-Bayes inequality for stochastic kernels, from which one may derive extensions of various known PAC-Bayes bounds as well as novel bounds.
1 code implementation • 18 Jun 2020 • Ilja Kuzborskij, Claire Vernade, András György, Csaba Szepesvári
We consider off-policy evaluation in the contextual bandit setting, with the aim of obtaining a robust off-policy selection strategy: the selection strategy is evaluated by the value of the policy it chooses from a set of proposal (target) policies.
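For concreteness, here is the standard self-normalized importance-weighting (SNIPS) value estimate on logged bandit data; it is a textbook estimator in this setting, not necessarily the paper's exact procedure, and all data below is synthetic:

```python
import numpy as np

def snips_value(rewards, logged_probs, target_probs):
    """Self-normalized importance-weighted estimate of a target policy's
    value from logged bandit data (a standard estimator in this setting,
    not necessarily the paper's exact procedure)."""
    w = target_probs / logged_probs          # weights pi(a|x) / mu(a|x)
    return float(np.sum(w * rewards) / np.sum(w))

# Synthetic logged data: propensities mu(a_i | x_i) recorded at logging
# time, and the target policy's probabilities pi(a_i | x_i) for the same
# logged context-action pairs.
rng = np.random.default_rng(0)
n = 10_000
logged_probs = rng.uniform(0.2, 0.8, size=n)
target_probs = rng.uniform(0.0, 1.0, size=n)
rewards = rng.binomial(1, 0.5, size=n).astype(float)

print("estimated value:", snips_value(rewards, logged_probs, target_probs))
```

A robust selection strategy in this spirit would then pick, among the proposal policies, the one maximizing a lower confidence bound on such an estimate.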
no code implementations • NeurIPS 2020 • Ilja Kuzborskij, Nicolò Cesa-Bianchi
When competing against "simple" locality profiles, our technique delivers regret bounds that are significantly better than those proven using the previous approach.
no code implementations • 4 Sep 2019 • Ilja Kuzborskij, Csaba Szepesvári
We prove semi-empirical concentration inequalities for random variables which are given as possibly nonlinear functions of independent random variables.
no code implementations • 5 Feb 2019 • Ilja Kuzborskij, Nicolò Cesa-Bianchi, Csaba Szepesvári
This is a well-established notion of effective dimension appearing in several previous works, including the analyses of SGD and ridge regression, but ours is the first work that brings this dimension to the analysis of learning using Gibbs densities.
no code implementations • 28 Sep 2018 • Ilja Kuzborskij, Leonardo Cella, Nicolò Cesa-Bianchi
More precisely, we show that a sketch of size $m$ allows an $\mathcal{O}(md)$ update time for both algorithms, as opposed to the $\Omega(d^2)$ generally required by their non-sketched versions (where $d$ is the dimension of the context vectors).
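A frequent-directions sketch is one standard way to realise a size-$m$ sketch with this kind of amortised cost; the sketch below is illustrative, and its parameters ($m = 32$, $d = 512$) are arbitrary:

```python
import numpy as np

class FrequentDirections:
    """Frequent-directions sketch of a row stream. The buffer B is
    (2m x d); an SVD costing O(m^2 d) is paid once every m insertions,
    so each update costs O(md) amortised, versus the O(d^2) per-step
    cost of maintaining a full d x d correlation matrix."""

    def __init__(self, m, d):
        self.m, self.d = m, d
        self.B = np.zeros((2 * m, d))
        self.next_row = 0

    def update(self, x):
        if self.next_row == 2 * self.m:
            self._shrink()
        self.B[self.next_row] = x
        self.next_row += 1

    def _shrink(self):
        # Shrink the spectrum so that at most m rows remain nonzero.
        _, s, Vt = np.linalg.svd(self.B, full_matrices=False)
        s2 = np.maximum(s ** 2 - s[self.m - 1] ** 2, 0.0)
        self.B = np.zeros((2 * self.m, self.d))
        self.B[: len(s)] = np.sqrt(s2)[:, None] * Vt
        self.next_row = self.m

# Sketching 10^4 hypothetical context vectors with d = 512, m = 32.
rng = np.random.default_rng(0)
fd = FrequentDirections(m=32, d=512)
for _ in range(10_000):
    fd.update(rng.normal(size=512))
```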
no code implementations • NeurIPS 2017 • Ilja Kuzborskij, Nicolò Cesa-Bianchi
We study algorithms for online nonparametric regression that learn the directions along which the regression function is smoother.
no code implementations • ICML 2018 • Ilja Kuzborskij, Christoph H. Lampert
We establish a data-dependent notion of algorithmic stability for Stochastic Gradient Descent (SGD), and employ it to develop novel generalization bounds.
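To make the notion concrete, the hypothetical probe below trains SGD (on a linear model with squared loss, purely as a stand-in) on a dataset $S$ and on a neighbouring dataset $S'$ differing in a single example, then measures how far the two learned predictors disagree; averaging this gap over draws of the data is the spirit of algorithmic stability:

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_linear(X, y, lr=0.1, epochs=5, seed=0):
    """Plain SGD on the squared loss for a linear model (an illustrative
    stand-in; the entry concerns SGD in general, not this model)."""
    order_rng = np.random.default_rng(seed)   # shared sampling randomness
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in order_rng.permutation(len(X)):
            w -= lr * (X[i] @ w - y[i]) * X[i]
    return w

# Hypothetical stability probe: S and a neighbouring S' differ in one point.
n, d = 200, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

X2, y2 = X.copy(), y.copy()
X2[0], y2[0] = rng.normal(size=d), rng.normal()   # replace one example

x_test = rng.normal(size=d)
w1, w2 = sgd_linear(X, y), sgd_linear(X2, y2)
print("prediction gap on a fresh point:", abs(x_test @ w1 - x_test @ w2))
```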
no code implementations • CVPR 2016 • Ilja Kuzborskij, Fabio Maria Carlucci, Barbara Caputo
Since Convolutional Neural Networks (CNNs) have become the leading learning paradigm in visual recognition, Naive Bayes Nearest Neighbor (NBNN)-based classifiers have lost momentum in the community.
no code implementations • 4 Dec 2014 • Ilja Kuzborskij, Francesco Orabona
In this work we consider the learning setting where, in addition to the training set, the learner receives a collection of auxiliary hypotheses originating from other tasks.
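One classical way to use such auxiliary hypotheses, shown below as a hedged sketch rather than the paper's algorithm, is biased regularization: regularized least squares pulled towards a (possibly weighted) combination of the source hypotheses:

```python
import numpy as np

def biased_ridge(X, y, w_src, lam=1.0):
    """Least squares regularised towards a source hypothesis:
    argmin_w ||Xw - y||^2 + lam * ||w - w_src||^2 (closed form)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y + lam * w_src)

# Hypothetical use: average auxiliary hypotheses from other tasks into a
# single source and transfer it to a small target training set.
rng = np.random.default_rng(0)
d = 10
aux_hypotheses = [rng.normal(size=d) for _ in range(3)]   # from other tasks
w_src = np.mean(aux_hypotheses, axis=0)

X = rng.normal(size=(20, d))                              # small target set
y = X @ (w_src + 0.1 * rng.normal(size=d))                # related target task
w = biased_ridge(X, y, w_src, lam=1.0)
```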
no code implementations • 6 Aug 2014 • Ilja Kuzborskij, Francesco Orabona, Barbara Caputo
In this paper we consider the binary transfer learning problem, focusing on how to select and combine sources from a large pool to yield a good performance on a target task.
no code implementations • CVPR 2013 • Ilja Kuzborskij, Francesco Orabona, Barbara Caputo
Since the seminal work of Thrun [17], the learning to learn paradigm has been defined as the ability of an agent to improve its performance at each task with experience, i.e., with the number of tasks learned so far.