Search Results for author: Alessandro Rudi

Found 54 papers, 16 papers with code

Non-Convex Optimization with Certificates and Fast Rates Through Kernel Sums of Squares

no code implementations 11 Apr 2022 Blake Woodworth, Francis Bach, Alessandro Rudi

We consider potentially non-convex optimization problems, for which optimal rates of approximation depend on the dimension of the parameter space and the smoothness of the function to be optimized.

On the Benefits of Large Learning Rates for Kernel Methods

no code implementations 28 Feb 2022 Gaspard Beugnot, Julien Mairal, Alessandro Rudi

This paper studies an intriguing phenomenon related to the good generalization performance of estimators obtained by using large learning rates within gradient descent algorithms.

Measuring dissimilarity with diffeomorphism invariance

1 code implementation 11 Feb 2022 Théophile Cantelobre, Carlo Ciliberto, Benjamin Guedj, Alessandro Rudi

Measures of similarity (or dissimilarity) are a key ingredient to many machine learning algorithms.

Nyström Kernel Mean Embeddings

no code implementations 31 Jan 2022 Antoine Chatalic, Nicolas Schreuder, Alessandro Rudi, Lorenzo Rosasco

Our main result is an upper bound on the approximation error of this procedure.

Near-optimal estimation of smooth transport maps with kernel sums-of-squares

no code implementations 3 Dec 2021 Boris Muzellec, Adrien Vacher, Francis Bach, François-Xavier Vialard, Alessandro Rudi

It was recently shown that under smoothness conditions, the squared Wasserstein distance between two distributions could be efficiently computed with appealing statistical error upper bounds.

Learning PSD-valued functions using kernel sums-of-squares

1 code implementation 22 Nov 2021 Boris Muzellec, Francis Bach, Alessandro Rudi

Shape constraints such as positive semi-definiteness (PSD) for matrices or convexity for functions play a central role in many applications in machine learning and sciences, including metric learning, optimal transport, and economics.

Metric Learning

Sampling from Arbitrary Functions via PSD Models

no code implementations 20 Oct 2021 Ulysse Marteau-Ferey, Francis Bach, Alessandro Rudi

In many areas of applied statistics and machine learning, generating an arbitrary number of independent and identically distributed (i.i.d.)

A Note on Optimizing Distributions using Kernel Mean Embeddings

1 code implementation 18 Jun 2021 Boris Muzellec, Francis Bach, Alessandro Rudi

Kernel mean embeddings are a popular tool that consists in representing probability measures by their infinite-dimensional mean embeddings in a reproducing kernel Hilbert space.

Beyond Tikhonov: Faster Learning with Self-Concordant Losses via Iterative Regularization

no code implementations NeurIPS 2021 Gaspard Beugnot, Julien Mairal, Alessandro Rudi

The theory of spectral filtering is a remarkable tool to understand the statistical properties of learning with kernels.

On the Consistency of Max-Margin Losses

no code implementations 31 May 2021 Alex Nowak-Vila, Alessandro Rudi, Francis Bach

The resulting loss is also a generalization of the binary support vector machine and it is consistent under milder conditions on the discrete loss.

Structured Prediction

Beyond Tikhonov: faster learning with self-concordant losses, via iterative regularization

no code implementations NeurIPS 2021 Gaspard Beugnot, Julien Mairal, Alessandro Rudi

The theory of spectral filtering is a remarkable tool to understand the statistical properties of learning with kernels.

Online nonparametric regression with Sobolev kernels

no code implementations 6 Feb 2021 Oleksandr Zadorozhnyi, Pierre Gaillard, Sebastien Gerschinovitz, Alessandro Rudi

In this work we investigate the variation of the online kernelized ridge regression algorithm in the setting of $d$-dimensional adversarial nonparametric regression.

Disambiguation of weak supervision with exponential convergence rates

1 code implementation 4 Feb 2021 Vivien Cabannes, Francis Bach, Alessandro Rudi

Machine learning approached through supervised learning requires expensive annotation of data.

Fast rates in structured prediction

no code implementations 1 Feb 2021 Vivien Cabannes, Alessandro Rudi, Francis Bach

Discrete supervised learning problems such as classification are often tackled by introducing a continuous surrogate problem akin to regression.

Structured Prediction

A Dimension-free Computational Upper-bound for Smooth Optimal Transport Estimation

no code implementations 13 Jan 2021 Adrien Vacher, Boris Muzellec, Alessandro Rudi, Francis Bach, Francois-Xavier Vialard

It is well-known that plug-in statistical estimation of optimal transport suffers from the curse of dimensionality.

Statistics Theory, Optimization and Control, 62G05

Finding Global Minima via Kernel Approximations

no code implementations 22 Dec 2020 Alessandro Rudi, Ulysse Marteau-Ferey, Francis Bach

We consider the global minimization of smooth functions based solely on function evaluations.

Overcoming the curse of dimensionality with Laplacian regularization in semi-supervised learning

1 code implementation NeurIPS 2021 Vivien Cabannes, Loucas Pillaud-Vivien, Francis Bach, Alessandro Rudi

As annotations of data can be scarce in large-scale practical problems, leveraging unlabelled examples is one of the most important aspects of machine learning.

Learning Output Embeddings in Structured Prediction

no code implementations 29 Jul 2020 Luc Brogat-Motte, Alessandro Rudi, Céline Brouard, Juho Rousu, Florence d'Alché-Buc

A powerful and flexible approach to structured prediction consists in embedding the structured objects to be predicted into a feature space of possibly infinite dimension by means of output kernels, and then solving a regression problem in this output space.

Structured Prediction

Non-parametric Models for Non-negative Functions

no code implementations NeurIPS 2020 Ulysse Marteau-Ferey, Francis Bach, Alessandro Rudi

The paper is complemented by an experimental evaluation of the model showing its effectiveness in terms of formulation, algorithmic derivation and practical results on the problems of density estimation, regression with heteroscedastic errors, and multiple quantile regression.

Density Estimation

Consistent Structured Prediction with Max-Min Margin Markov Networks

1 code implementation ICML 2020 Alex Nowak-Vila, Francis Bach, Alessandro Rudi

Max-margin methods for binary classification such as the support vector machine (SVM) have been extended to the structured prediction setting under the name of max-margin Markov networks ($M^3N$), or more generally structural SVMs.

Generalization Bounds, Multi-class Classification, +1

Kernel methods through the roof: handling billions of points efficiently

1 code implementation NeurIPS 2020 Giacomo Meanti, Luigi Carratino, Lorenzo Rosasco, Alessandro Rudi

Kernel methods provide an elegant and principled approach to nonparametric learning, but so far could hardly be used in large scale problems, since naïve implementations scale poorly with data size.

Interpolation and Learning with Scale Dependent Kernels

no code implementations 17 Jun 2020 Nicolò Pagliana, Alessandro Rudi, Ernesto De Vito, Lorenzo Rosasco

We study the learning properties of nonparametric ridge-less least squares.

Structured and Localized Image Restoration

no code implementations 16 Jun 2020 Thomas Eboli, Alex Nowak-Vila, Jian Sun, Francis Bach, Jean Ponce, Alessandro Rudi

We present a novel approach to image restoration that leverages ideas from localized structured prediction and non-linear multi-task learning.

Image Restoration, Multi-Task Learning, +1

Efficient improper learning for online logistic regression

no code implementations 18 Mar 2020 Rémi Jézéquel, Pierre Gaillard, Alessandro Rudi

We consider the setting of online logistic regression and consider the regret with respect to the $\ell_2$-ball of radius $B$.

Statistical Limits of Supervised Quantum Learning

no code implementations 28 Jan 2020 Carlo Ciliberto, Andrea Rocchetto, Alessandro Rudi, Leonard Wossnig

Within the framework of statistical learning theory it is possible to bound the minimum number of samples required by a learner to reach a target accuracy.

Learning Theory

Gain with no Pain: Efficient Kernel-PCA by Nyström Sampling

no code implementations 11 Jul 2019 Nicholas Sterge, Bharath Sriperumbudur, Lorenzo Rosasco, Alessandro Rudi

In this paper, we propose and study a Nyström-based approach to efficient large scale kernel principal component analysis (PCA).

Globally Convergent Newton Methods for Ill-conditioned Generalized Self-concordant Losses

1 code implementation NeurIPS 2019 Ulysse Marteau-Ferey, Francis Bach, Alessandro Rudi

In this paper, we study large-scale convex optimization algorithms based on the Newton method applied to regularized generalized self-concordant losses, which include logistic regression and softmax regression.

Generalization Bounds

Efficient online learning with kernels for adversarial large scale problems

1 code implementation NeurIPS 2019 Rémi Jézéquel, Pierre Gaillard, Alessandro Rudi

For $d$-dimensional inputs, we provide a (close to) optimal regret of order $O((\log n)^{d+1})$ with per-round time complexity and space complexity $O((\log n)^{2d})$.

online learning

Affine Invariant Covariance Estimation for Heavy-Tailed Distributions

no code implementations 8 Feb 2019 Dmitrii Ostrovskii, Alessandro Rudi

Denoting by $\text{cond}(\mathbf{S})$ the condition number of $\mathbf{S}$, the computational cost of the novel estimator is $O(d^2 n + d^3\log(\text{cond}(\mathbf{S})))$, which is comparable to the cost of the sample covariance estimator in the statistically interesting regime $n \ge d$.

Beyond Least-Squares: Fast Rates for Regularized Empirical Risk Minimization through Self-Concordance

no code implementations 8 Feb 2019 Ulysse Marteau-Ferey, Dmitrii Ostrovskii, Francis Bach, Alessandro Rudi

We consider learning methods based on the regularization of a convex empirical risk by a squared Hilbertian norm, a setting that includes linear predictors and non-linear predictors through positive-definite kernels.

A General Theory for Structured Prediction with Smooth Convex Surrogates

no code implementations 5 Feb 2019 Alex Nowak-Vila, Francis Bach, Alessandro Rudi

In this work we provide a theoretical framework for structured prediction that generalizes the existing theory of surrogate methods for binary and multiclass classification based on estimating conditional probabilities with smooth convex surrogates (e.g., logistic regression).

General Classification, Graph Matching, +1

Massively scalable Sinkhorn distances via the Nyström method

no code implementations NeurIPS 2019 Jason Altschuler, Francis Bach, Alessandro Rudi, Jonathan Niles-Weed

The Sinkhorn "distance", a variant of the Wasserstein distance with entropic regularization, is an increasingly popular tool in machine learning and statistical inference.

On Fast Leverage Score Sampling and Optimal Learning

1 code implementation NeurIPS 2018 Alessandro Rudi, Daniele Calandriello, Luigi Carratino, Lorenzo Rosasco

Leverage score sampling provides an appealing way to perform approximate computations for large matrices.

Sharp Analysis of Learning with Discrete Losses

no code implementations 16 Oct 2018 Alex Nowak-Vila, Francis Bach, Alessandro Rudi

The problem of devising learning strategies for discrete losses (e.g., multilabeling, ranking) is currently addressed with methods and theoretical analyses ad-hoc for each loss.

Learning with SGD and Random Features

no code implementations NeurIPS 2018 Luigi Carratino, Alessandro Rudi, Lorenzo Rosasco

Sketching and stochastic gradient methods are arguably the most common techniques to derive efficient large scale learning algorithms.

Manifold Structured Prediction

no code implementations NeurIPS 2018 Alessandro Rudi, Carlo Ciliberto, Gian Maria Marconi, Lorenzo Rosasco

Structured prediction provides a general framework to deal with supervised problems where the outputs have semantically rich structure.

Structured Prediction

Localized Structured Prediction

no code implementations NeurIPS 2019 Carlo Ciliberto, Francis Bach, Alessandro Rudi

Key to structured prediction is exploiting the problem structure to simplify the learning process.

Learning Theory, Structured Prediction

Differential Properties of Sinkhorn Approximation for Learning with Wasserstein Distance

2 code implementations NeurIPS 2018 Giulia Luise, Alessandro Rudi, Massimiliano Pontil, Carlo Ciliberto

Applications of optimal transport have recently gained remarkable attention thanks to the computational advantages of entropic regularization.

Approximating Hamiltonian dynamics with the Nyström method

no code implementations 6 Apr 2018 Alessandro Rudi, Leonard Wossnig, Carlo Ciliberto, Andrea Rocchetto, Massimiliano Pontil, Simone Severini

Simulating the time-evolution of quantum mechanical systems is BQP-hard and expected to be one of the foremost applications of quantum computers.

Optimal Rates for Spectral Algorithms with Least-Squares Regression over Hilbert Spaces

no code implementations 20 Jan 2018 Junhong Lin, Alessandro Rudi, Lorenzo Rosasco, Volkan Cevher

In this paper, we study regression problems over a separable Hilbert space with the square loss, covering non-parametric regression over a reproducing kernel Hilbert space.

Exponential convergence of testing error for stochastic gradient methods

no code implementations 13 Dec 2017 Loucas Pillaud-Vivien, Alessandro Rudi, Francis Bach

We consider binary classification problems with positive definite kernels and square loss, and study the convergence rates of stochastic gradient methods.

Classification, General Classification

FALKON: An Optimal Large Scale Kernel Method

4 code implementations NeurIPS 2017 Alessandro Rudi, Luigi Carratino, Lorenzo Rosasco

In this paper, we take a substantial step in scaling up kernel methods, proposing FALKON, a novel algorithm that can efficiently process millions of points.

Consistent Multitask Learning with Nonlinear Output Relations

no code implementations NeurIPS 2017 Carlo Ciliberto, Alessandro Rudi, Lorenzo Rosasco, Massimiliano Pontil

However, in practice assuming the tasks to be linearly related might be restrictive, and allowing for nonlinear structures is a challenge.

Structured Prediction

Generalization Properties of Learning with Random Features

1 code implementation NeurIPS 2017 Alessandro Rudi, Lorenzo Rosasco

We study the generalization properties of ridge regression with random features in the statistical learning framework.

NYTRO: When Subsampling Meets Early Stopping

1 code implementation 19 Oct 2015 Tomas Angles, Raffaello Camoriano, Alessandro Rudi, Lorenzo Rosasco

Early stopping is a well-known approach to reduce the time complexity for performing training and model selection of large scale learning machines.

Model Selection

Less is More: Nyström Computational Regularization

1 code implementation NeurIPS 2015 Alessandro Rudi, Raffaello Camoriano, Lorenzo Rosasco

We study Nyström-type subsampling approaches to large scale kernel methods, and prove learning bounds in the statistical learning setting, where random sampling and high probability estimates are considered.

On the Sample Complexity of Subspace Learning

no code implementations NeurIPS 2013 Alessandro Rudi, Guille D. Canas, Lorenzo Rosasco

A large number of algorithms in machine learning, from principal component analysis (PCA), and its non-linear (kernel) extensions, to more recent spectral embedding and support estimation methods, rely on estimating a linear subspace from samples.
