Search Results for author: Francis Bach

Found 160 papers, 47 papers with code

Chain of Log-Concave Markov Chains

no code implementations 31 May 2023 Saeed Saremi, Ji Won Park, Francis Bach

Markov chain Monte Carlo (MCMC) is a class of general-purpose algorithms for sampling from unnormalized densities.
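General-purpose sampling from an unnormalized density can be illustrated with a minimal random-walk Metropolis sampler (a generic textbook sketch, not the chain construction proposed in this paper; the target `log_density`, step size, and chain length are illustrative):

```python
import math, random

def metropolis_hastings(log_density, x0, n_steps, step=0.5):
    """Random-walk Metropolis sampler targeting the unnormalized density
    exp(log_density): propose a Gaussian perturbation and accept it with
    probability min(1, p(proposal)/p(current)), computed in log space."""
    x, samples = x0, []
    for _ in range(n_steps):
        proposal = x + random.gauss(0.0, step)
        if math.log(random.random()) < log_density(proposal) - log_density(x):
            x = proposal
        samples.append(x)
    return samples

random.seed(0)
# Sample from the unnormalized density exp(-x^2/2), i.e. a standard Gaussian.
samples = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0, n_steps=50_000)
mean = sum(samples) / len(samples)
```

Note that only the ratio of densities enters the acceptance test, so the (unknown) normalization constant never needs to be computed.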

On the impact of activation and normalization in obtaining isometric embeddings at initialization

no code implementations 28 May 2023 Amir Joudaki, Hadi Daneshmand, Francis Bach

To bridge this gap, we provide a proof that layer normalization, in conjunction with activation layers, biases the Gram matrix of a multilayer perceptron towards isometry at an exponential rate with depth at initialization.

Universal Smoothed Score Functions for Generative Modeling

no code implementations 21 Mar 2023 Saeed Saremi, Rupesh Kumar Srivastava, Francis Bach

We consider the problem of generative modeling based on smoothing an unknown density of interest in $\mathbb{R}^d$ using the factorial kernels with $M$ independent Gaussian channels of equal noise level introduced by Saremi and Srivastava (2022).

Variational Principles for Mirror Descent and Mirror Langevin Dynamics

no code implementations 16 Mar 2023 Belinda Tzen, Anant Raj, Maxim Raginsky, Francis Bach

Mirror descent, introduced by Nemirovski and Yudin in the 1970s, is a primal-dual convex optimization method that can be tailored to the geometry of the optimization problem at hand through the choice of a strongly convex potential function.
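As a concrete instance of tailoring the method to the problem geometry via a strongly convex potential, the negative-entropy potential on the probability simplex turns mirror descent into the exponentiated-gradient update (a standard sketch of the classical method, not this paper's variational formulation; the linear objective and step size are illustrative):

```python
import math

def mirror_descent_simplex(grad, x0, steps, eta=0.1):
    """Mirror descent with the negative-entropy potential on the simplex
    (exponentiated gradient): the mirror map turns additive gradient steps
    into multiplicative updates followed by renormalization."""
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        x = [xi * math.exp(-eta * gi) for xi, gi in zip(x, g)]
        z = sum(x)
        x = [xi / z for xi in x]
    return x

# Minimize the linear objective <c, x> over the simplex;
# the optimum puts all mass on the coordinate with the smallest cost.
c = [3.0, 1.0, 2.0]
x = mirror_descent_simplex(lambda x: c, [1/3, 1/3, 1/3], steps=200, eta=0.5)
```

The iterates stay on the simplex by construction, which is exactly the point of matching the potential to the constraint geometry.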

Convergence Rates for Non-Log-Concave Sampling and Log-Partition Estimation

1 code implementation 6 Mar 2023 David Holzmüller, Francis Bach

Specifically, for $m$-times differentiable functions in $d$ dimensions, the optimal rate for algorithms with $n$ function evaluations is known to be $O(n^{-m/d})$, where the constant can potentially depend on $m, d$ and the function to be optimized.

High-dimensional analysis of double descent for linear regression with random projections

no code implementations 2 Mar 2023 Francis Bach

We consider linear regression problems with a varying number of random projections, where we provably exhibit a double descent curve for a fixed prediction problem, with a high-dimensional analysis based on random matrix theory.


Kernelized Diffusion maps

1 code implementation 13 Feb 2023 Loucas Pillaud-Vivien, Francis Bach

Spectral clustering and diffusion maps are celebrated dimensionality reduction algorithms built on eigen-elements related to the diffusive structure of the data.

Dimensionality Reduction

Two Losses Are Better Than One: Faster Optimization Using a Cheaper Proxy

no code implementations 7 Feb 2023 Blake Woodworth, Konstantin Mishchenko, Francis Bach

We present an algorithm for minimizing an objective with hard-to-compute gradients by using a related, easier-to-access function as a proxy.

On the relationship between multivariate splines and infinitely-wide neural networks

no code implementations 7 Feb 2023 Francis Bach

We consider multivariate splines and show that they have a random feature expansion as infinitely wide neural networks with one-hidden layer and a homogeneous activation function which is the power of the rectified linear unit.
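The random feature expansion mentioned above amounts to sampling hidden-layer weights and applying the homogeneous activation $\max(0, \langle w, x \rangle)^k$, the $k$-th power of the rectified linear unit (an illustrative sketch of the general construction; the dimensions, weight distribution, and exponent are placeholders, not values from the paper):

```python
import math, random

def relu_power_features(x, weights, k=1):
    """Random-feature map of a one-hidden-layer network with the homogeneous
    activation max(0, <w, x>)^k applied to each sampled hidden weight w."""
    return [max(0.0, sum(wi * xi for wi, xi in zip(w, x))) ** k
            for w in weights]

random.seed(0)
d, m = 2, 500                       # input dimension, number of random features
weights = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
feats = relu_power_features([0.3, -0.7], weights, k=2)
```

As the number of features $m$ grows, kernel methods built on such features approximate the infinitely wide network limit.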

Regression as Classification: Influence of Task Formulation on Neural Network Features

1 code implementation 10 Nov 2022 Lawrence Stewart, Francis Bach, Quentin Berthet, Jean-Philippe Vert

Neural networks can be trained to solve regression problems by using gradient-based methods to minimize the square loss.


Sum-of-Squares Relaxations for Information Theory and Variational Inference

no code implementations 27 Jun 2022 Francis Bach

In order to achieve this, we derive a sequence of convex relaxations for computing these divergences from non-centered covariance matrices associated with a given feature vector: starting from the typically intractable optimal lower bound, we consider an additional relaxation based on "sums-of-squares", which is now computable in polynomial time as a semidefinite program, as well as further, computationally more efficient relaxations based on spectral information divergences from quantum information theory.

Variational Inference

Asynchronous SGD Beats Minibatch SGD Under Arbitrary Delays

1 code implementation 15 Jun 2022 Konstantin Mishchenko, Francis Bach, Mathieu Even, Blake Woodworth

The existing analysis of asynchronous stochastic gradient descent (SGD) degrades dramatically when any delay is large, giving the impression that performance depends primarily on the delay.

Explicit Regularization in Overparametrized Models via Noise Injection

1 code implementation 9 Jun 2022 Antonio Orvieto, Anant Raj, Hans Kersting, Francis Bach

Injecting noise within gradient descent has several desirable features, such as smoothing and regularizing properties.

Variational inference via Wasserstein gradient flows

1 code implementation 31 May 2022 Marc Lambert, Sinho Chewi, Francis Bach, Silvère Bonnabel, Philippe Rigollet

Along with Markov chain Monte Carlo (MCMC) methods, variational inference (VI) has emerged as a central computational approach to large-scale Bayesian inference.

Bayesian Inference Variational Inference

Polynomial-time Sparse Measure Recovery: From Mean Field Theory to Algorithm Design

1 code implementation 16 Apr 2022 Hadi Daneshmand, Francis Bach

Mean field theory has provided theoretical insights into various algorithms by letting the problem size tend to infinity.

Super-Resolution Tensor Decomposition

Non-Convex Optimization with Certificates and Fast Rates Through Kernel Sums of Squares

no code implementations 11 Apr 2022 Blake Woodworth, Francis Bach, Alessandro Rudi

We consider potentially non-convex optimization problems, for which optimal rates of approximation depend on the dimension of the parameter space and the smoothness of the function to be optimized.

Information Theory with Kernel Methods

no code implementations 17 Feb 2022 Francis Bach

We consider the analysis of probability distributions through their associated covariance operators from reproducing kernel Hilbert spaces.

Variational Inference

On a Variance-Reduction Correction for the Temporal-Difference Learning in the Stochastic Continuous Setting

no code implementations 16 Feb 2022 Ziad Kobeissi, Francis Bach

We consider the problem of policy evaluation for continuous-time processes using the temporal-difference learning algorithm.

Stochastic Optimization

Anticorrelated Noise Injection for Improved Generalization

no code implementations 6 Feb 2022 Antonio Orvieto, Hans Kersting, Frank Proske, Francis Bach, Aurelien Lucchi

Injecting artificial noise into gradient descent (GD) is commonly employed to improve the performance of machine learning models.

BIG-bench Machine Learning

Differential Privacy Guarantees for Stochastic Gradient Langevin Dynamics

no code implementations 28 Jan 2022 Théo Ryffel, Francis Bach, David Pointcheval

We analyse the privacy leakage of noisy stochastic gradient descent by modeling Rényi divergence dynamics with Langevin diffusions.
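The noisy gradient dynamics being analyzed can be sketched as stochastic gradient Langevin dynamics, where each gradient step is perturbed by Gaussian noise of variance $2\eta$ so the chain approximately samples from $\exp(-f(x))$ (a generic sketch of the discretized Langevin diffusion, not the paper's privacy accounting; the quadratic target and step size are illustrative):

```python
import math, random

def sgld_step(x, g, step):
    """One step of (stochastic) gradient Langevin dynamics: a gradient step
    on f plus Gaussian noise with variance 2*step, so the iterates
    approximately follow the density proportional to exp(-f(x))."""
    return x - step * g + random.gauss(0.0, math.sqrt(2.0 * step))

random.seed(0)
x, samples = 5.0, []
for t in range(30_000):
    x = sgld_step(x, x, step=0.01)   # f(x) = x^2/2: standard Gaussian target
    if t >= 10_000:                  # discard burn-in
        samples.append(x)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

The long-run samples have mean near 0 and variance near 1, up to a small discretization bias controlled by the step size.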

Near-optimal estimation of smooth transport maps with kernel sums-of-squares

no code implementations 3 Dec 2021 Boris Muzellec, Adrien Vacher, Francis Bach, François-Xavier Vialard, Alessandro Rudi

It was recently shown that under smoothness conditions, the squared Wasserstein distance between two distributions could be efficiently computed with appealing statistical error upper bounds.

Learning PSD-valued functions using kernel sums-of-squares

1 code implementation 22 Nov 2021 Boris Muzellec, Francis Bach, Alessandro Rudi

Shape constraints such as positive semi-definiteness (PSD) for matrices or convexity for functions play a central role in many applications in machine learning and sciences, including metric learning, optimal transport, and economics.

Metric Learning regression

Convergence of Uncertainty Sampling for Active Learning

no code implementations 29 Oct 2021 Anant Raj, Francis Bach

Uncertainty sampling in active learning is heavily used in practice to reduce the annotation cost.

Active Learning Binary Classification +2

Sampling from Arbitrary Functions via PSD Models

no code implementations 20 Oct 2021 Ulysse Marteau-Ferey, Francis Bach, Alessandro Rudi

In many areas of applied statistics and machine learning, generating an arbitrary number of independent and identically distributed (i.i.d.)

Gradient Descent on Infinitely Wide Neural Networks: Global Convergence and Generalization

1 code implementation 15 Oct 2021 Francis Bach, Lenaïc Chizat

Many supervised machine learning methods are naturally cast as optimization problems.

Screening for a Reweighted Penalized Conditional Gradient Method

no code implementations 2 Jul 2021 Yifan Sun, Francis Bach

We couple this with a screening rule which is safe in the convex case, converging to the true support at a rate $O(1/\delta^2)$ where $\delta \geq 0$ measures how close the problem is to degeneracy.

A Note on Optimizing Distributions using Kernel Mean Embeddings

1 code implementation 18 Jun 2021 Boris Muzellec, Francis Bach, Alessandro Rudi

Kernel mean embeddings are a popular tool that consists in representing probability measures by their infinite-dimensional mean embeddings in a reproducing kernel Hilbert space.

A Continuized View on Nesterov Acceleration for Stochastic Gradient Descent and Randomized Gossip

1 code implementation 10 Jun 2021 Mathieu Even, Raphaël Berthier, Francis Bach, Nicolas Flammarion, Pierre Gaillard, Hadrien Hendrikx, Laurent Massoulié, Adrien Taylor

We introduce the continuized Nesterov acceleration, a close variant of Nesterov acceleration whose variables are indexed by a continuous time parameter.

Batch Normalization Orthogonalizes Representations in Deep Random Networks

1 code implementation NeurIPS 2021 Hadi Daneshmand, Amir Joudaki, Francis Bach

This paper underlines a subtle property of batch-normalization (BN): Successive batch normalizations with random linear transformations make hidden representations increasingly orthogonal across layers of a deep neural network.

On the Consistency of Max-Margin Losses

no code implementations 31 May 2021 Alex Nowak-Vila, Alessandro Rudi, Francis Bach

The resulting loss is also a generalization of the binary support vector machine and it is consistent under milder conditions on the discrete loss.

Structured Prediction

A Continuized View on Nesterov Acceleration

no code implementations 11 Feb 2021 Raphaël Berthier, Francis Bach, Nicolas Flammarion, Pierre Gaillard, Adrien Taylor

We introduce the "continuized" Nesterov acceleration, a close variant of Nesterov acceleration whose variables are indexed by a continuous time parameter.

Distributed, Parallel, and Cluster Computing Optimization and Control

Fast rates in structured prediction

no code implementations 1 Feb 2021 Vivien Cabannes, Alessandro Rudi, Francis Bach

Discrete supervised learning problems such as classification are often tackled by introducing a continuous surrogate problem akin to regression.

Binary Classification regression +1

A Dimension-free Computational Upper-bound for Smooth Optimal Transport Estimation

no code implementations 13 Jan 2021 Adrien Vacher, Boris Muzellec, Alessandro Rudi, Francis Bach, Francois-Xavier Vialard

It is well-known that plug-in statistical estimation of optimal transport suffers from the curse of dimensionality.

Statistics Theory Optimization and Control 62G05

Finding Global Minima via Kernel Approximations

no code implementations 22 Dec 2020 Alessandro Rudi, Ulysse Marteau-Ferey, Francis Bach

We consider the global minimization of smooth functions based solely on function evaluations.

Learning with Differentiable Perturbed Optimizers

no code implementations NeurIPS 2020 Quentin Berthet, Mathieu Blondel, Olivier Teboul, Marco Cuturi, Jean-Philippe Vert, Francis Bach

Machine learning pipelines often rely on optimization procedures to make discrete decisions (e.g., sorting, picking closest neighbors, or shortest paths).

Structured Prediction

Batch normalization provably avoids ranks collapse for randomly initialised deep networks

no code implementations NeurIPS 2020 Hadi Daneshmand, Jonas Kohler, Francis Bach, Thomas Hofmann, Aurelien Lucchi

Randomly initialized neural networks are known to become harder to train with increasing depth, unless architectural enhancements like residual connections and batch normalization are used.

Variance-Reduced Methods for Machine Learning

no code implementations 2 Oct 2020 Robert M. Gower, Mark Schmidt, Francis Bach, Peter Richtarik

Stochastic optimization lies at the heart of machine learning, and its cornerstone is stochastic gradient descent (SGD), a method introduced over 60 years ago.

BIG-bench Machine Learning Stochastic Optimization

Deep Equals Shallow for ReLU Networks in Kernel Regimes

1 code implementation ICLR 2021 Alberto Bietti, Francis Bach

Deep networks are often considered to be more expressive than shallow ones in terms of approximation.

Overcoming the curse of dimensionality with Laplacian regularization in semi-supervised learning

2 code implementations NeurIPS 2021 Vivien Cabannes, Loucas Pillaud-Vivien, Francis Bach, Alessandro Rudi

As annotations of data can be scarce in large-scale practical problems, leveraging unlabelled examples is one of the most important aspects of machine learning.

Non-parametric Models for Non-negative Functions

no code implementations NeurIPS 2020 Ulysse Marteau-Ferey, Francis Bach, Alessandro Rudi

The paper is complemented by an experimental evaluation of the model showing its effectiveness in terms of formulation, algorithmic derivation and practical results on the problems of density estimation, regression with heteroscedastic errors, and multiple quantile regression.

Density Estimation regression

Consistent Structured Prediction with Max-Min Margin Markov Networks

1 code implementation ICML 2020 Alex Nowak-Vila, Francis Bach, Alessandro Rudi

Max-margin methods for binary classification such as the support vector machine (SVM) have been extended to the structured prediction setting under the name of max-margin Markov networks ($M^3N$), or more generally structural SVMs.

Binary Classification Generalization Bounds +2

Structured and Localized Image Restoration

no code implementations 16 Jun 2020 Thomas Eboli, Alex Nowak-Vila, Jian Sun, Francis Bach, Jean Ponce, Alessandro Rudi

We present a novel approach to image restoration that leverages ideas from localized structured prediction and non-linear multi-task learning.

Image Restoration Multi-Task Learning +1

Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model

no code implementations NeurIPS 2020 Raphaël Berthier, Francis Bach, Pierre Gaillard

In the context of statistical supervised learning, the noiseless linear model assumes that there exists a deterministic linear relation $Y = \langle \theta_*, \Phi(U) \rangle$ between the random output $Y$ and the random feature vector $\Phi(U)$, a potentially non-linear transformation of the inputs $U$.

Principled Analyses and Design of First-Order Methods with Inexact Proximal Operators

1 code implementation 10 Jun 2020 Mathieu Barré, Adrien Taylor, Francis Bach

In this work, we survey notions of inaccuracies that can be used when solving those intermediary optimization problems.

Optimization and Control Numerical Analysis

ARIANN: Low-Interaction Privacy-Preserving Deep Learning via Function Secret Sharing

1 code implementation 8 Jun 2020 Théo Ryffel, Pierre Tholoniat, David Pointcheval, Francis Bach

We evaluate our end-to-end system for private inference between distant servers on standard neural networks such as AlexNet, VGG16 or ResNet18, and for private training on smaller networks like LeNet.

Federated Learning Privacy Preserving +1

Explicit Regularization of Stochastic Gradient Methods through Duality

no code implementations 30 Mar 2020 Anant Raj, Francis Bach

For accelerated coordinate descent, we obtain a new algorithm that has better convergence properties than existing stochastic gradient methods in the interpolating regime.

A Simple Convergence Proof of Adam and Adagrad

no code implementations 5 Mar 2020 Alexandre Défossez, Léon Bottou, Francis Bach, Nicolas Usunier

We provide a simple proof of convergence covering both the Adam and Adagrad adaptive optimization algorithms when applied to smooth (possibly non-convex) objective functions with bounded gradients.
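For reference, the Adam update maintains exponential moving averages of the gradient and of its square, with bias correction (a standard sketch of the published algorithm, not this paper's proof machinery; the quadratic test objective and hyperparameters are illustrative):

```python
import math

def adam_step(x, g, state, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: moving averages m of the gradient and v of its
    square, bias-corrected by 1 - beta^t, then a scaled gradient step."""
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    x = x - lr * m_hat / (math.sqrt(v_hat) + eps)
    return x, (m, v, t)

# Minimize the smooth objective f(x) = x^2 (gradient 2x) starting from x = 1.
x, state = 1.0, (0.0, 0.0, 0)
for _ in range(5000):
    x, state = adam_step(x, 2 * x, state, lr=0.01)
```

Setting `b1 = 0` recovers Adagrad-style updates with exponential averaging of squared gradients, which is why the two algorithms admit a common analysis.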

Batch Normalization Provably Avoids Rank Collapse for Randomly Initialised Deep Networks

no code implementations 3 Mar 2020 Hadi Daneshmand, Jonas Kohler, Francis Bach, Thomas Hofmann, Aurelien Lucchi

Randomly initialized neural networks are known to become harder to train with increasing depth, unless architectural enhancements like residual connections and batch normalization are used.

Safe Screening for the Generalized Conditional Gradient Method

no code implementations 22 Feb 2020 Yifan Sun, Francis Bach

The conditional gradient method (CGM) has been widely used for fast sparse approximation, having a low per iteration computational cost for structured sparse regularizers.

feature selection
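The conditional gradient method's low per-iteration cost comes from replacing projections with a linear minimization oracle, which for the $\ell_1$ ball simply returns a signed, scaled coordinate vector (a generic Frank-Wolfe sketch, not the paper's screening rule; the problem data, radius, and step-size schedule are illustrative):

```python
def frank_wolfe_l1(grad, dim, radius, steps):
    """Conditional gradient (Frank-Wolfe) over the l1 ball: each iteration
    calls the linear minimization oracle (a signed scaled basis vector for
    the l1 ball) and moves to a convex combination of the current iterate
    and the oracle's answer."""
    x = [0.0] * dim
    for t in range(steps):
        g = grad(x)
        i = max(range(dim), key=lambda j: abs(g[j]))    # LMO coordinate
        s = [0.0] * dim
        s[i] = -radius if g[i] > 0 else radius
        gamma = 2.0 / (t + 2)                           # standard step size
        x = [(1 - gamma) * xj + gamma * sj for xj, sj in zip(x, s)]
    return x

# Minimize ||x - b||^2 over the l1 ball of radius 1; since ||b||_1 < 1,
# the optimum is b itself and the iterates stay sparse by construction.
b = [0.8, 0.0, 0.0]
grad = lambda x: [2 * (xi - bi) for xi, bi in zip(x, b)]
x = frank_wolfe_l1(grad, dim=3, radius=1.0, steps=500)
```

Every iterate is a convex combination of at most `steps` oracle atoms, which is the structured sparsity that screening rules can then exploit.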

Learning with Differentiable Perturbed Optimizers

2 code implementations 20 Feb 2020 Quentin Berthet, Mathieu Blondel, Olivier Teboul, Marco Cuturi, Jean-Philippe Vert, Francis Bach

Machine learning pipelines often rely on optimization procedures to make discrete decisions (e.g., sorting, picking closest neighbors, or shortest paths).

Structured Prediction

Stochastic Optimization for Regularized Wasserstein Estimators

no code implementations ICML 2020 Marin Ballu, Quentin Berthet, Francis Bach

We show that this algorithm can be extended to other tasks, including estimation of Wasserstein barycenters.

Stochastic Optimization

On the Effectiveness of Richardson Extrapolation in Machine Learning

no code implementations 7 Feb 2020 Francis Bach

Richardson extrapolation is a classical technique from numerical analysis that can improve the approximation error of an estimation method by linearly combining several estimates obtained from different values of one of its hyperparameters, without needing to know the inner structure of the original estimation method in detail.

BIG-bench Machine Learning
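The technique can be illustrated on the simplest hyperparameter, the step size of a finite-difference derivative: combining estimates at $h$ and $h/2$ cancels the leading $O(h)$ error term (a classical numerical-analysis sketch; the function and step size are illustrative, not from the paper):

```python
import math

def richardson(f, h, order=1):
    """Richardson extrapolation: combine the estimates f(h) and f(h/2) so
    that the leading O(h^order) error term cancels, leaving a higher-order
    approximation without inspecting f's internals."""
    c = 2 ** order
    return (c * f(h / 2) - f(h)) / (c - 1)

# Forward-difference estimate of exp'(0) = 1; its error is O(h).
fd = lambda h: (math.exp(h) - 1.0) / h
crude = fd(0.1)               # ≈ 1.0517, error ≈ 5e-2
better = richardson(fd, 0.1)  # ≈ 0.9991, error ≈ 9e-4
```

The same recipe applies whenever the hyperparameter dependence of the error admits an expansion in powers of $h$, which is the setting studied in the paper.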

Partially Encrypted Deep Learning using Functional Encryption

1 code implementation NeurIPS 2019 Théo Ryffel, David Pointcheval, Francis Bach, Edouard Dufour-Sans, Romain Gay

Machine learning on encrypted data has received a lot of attention thanks to recent breakthroughs in homomorphic encryption and secure multi-party computation.

BIG-bench Machine Learning Privacy Preserving

Music Source Separation in the Waveform Domain

1 code implementation 27 Nov 2019 Alexandre Défossez, Nicolas Usunier, Léon Bottou, Francis Bach

Source separation for music is the task of isolating contributions, or stems, from different instruments recorded individually and arranged together to form a song.

Audio Generation Data Augmentation +3

UniXGrad: A Universal, Adaptive Algorithm with Optimal Guarantees for Constrained Optimization

no code implementations NeurIPS 2019 Ali Kavis, Kfir Y. Levy, Francis Bach, Volkan Cevher

To the best of our knowledge, this is the first adaptive, unified algorithm that achieves the optimal rates in the constrained setting.

Demucs: Deep Extractor for Music Sources with extra unlabeled data remixed

1 code implementation 3 Sep 2019 Alexandre Défossez, Nicolas Usunier, Léon Bottou, Francis Bach

We study the problem of source separation for music using deep learning with four known sources: drums, bass, vocals and other accompaniments.

Music Source Separation

Towards closing the gap between the theory and practice of SVRG

1 code implementation NeurIPS 2019 Othmane Sebbouh, Nidham Gazagnadou, Samy Jelassi, Francis Bach, Robert M. Gower

Among the very first variance reduced stochastic methods for solving the empirical risk minimization problem was the SVRG method (Johnson & Zhang 2013).

Globally Convergent Newton Methods for Ill-conditioned Generalized Self-concordant Losses

1 code implementation NeurIPS 2019 Ulysse Marteau-Ferey, Francis Bach, Alessandro Rudi

In this paper, we study large-scale convex optimization algorithms based on the Newton method applied to regularized generalized self-concordant losses, which include logistic regression and softmax regression.

Generalization Bounds regression

Max-Plus Matching Pursuit for Deterministic Markov Decision Processes

no code implementations 20 Jun 2019 Francis Bach

We consider deterministic Markov decision processes (MDPs) and apply max-plus algebra tools to approximate the value iteration algorithm by a smaller-dimensional iteration based on a representation on dictionaries of value functions.

Continuous Control

Fast Decomposable Submodular Function Minimization using Constrained Total Variation

1 code implementation NeurIPS 2019 K. S. Sesh Kumar, Francis Bach, Thomas Pock

We consider the problem of minimizing the sum of submodular set functions assuming minimization oracles of each summand function.

Partially Encrypted Machine Learning using Functional Encryption

3 code implementations 24 May 2019 Theo Ryffel, Edouard Dufour-Sans, Romain Gay, Francis Bach, David Pointcheval

Machine learning on encrypted data has received a lot of attention thanks to recent breakthroughs in homomorphic encryption and secure multi-party computation.

BIG-bench Machine Learning Privacy Preserving

Implicit Regularization of Discrete Gradient Dynamics in Linear Neural Networks

1 code implementation NeurIPS 2019 Gauthier Gidel, Francis Bach, Simon Lacoste-Julien

When optimizing over-parameterized models, such as deep neural networks, a large set of parameters can achieve zero training error.

Efficient Primal-Dual Algorithms for Large-Scale Multiclass Classification

1 code implementation 11 Feb 2019 Dmitry Babichev, Dmitrii Ostrovskii, Francis Bach

We develop efficient algorithms to train $\ell_1$-regularized linear classifiers with large dimensionality $d$ of the feature space, number of classes $k$, and sample size $n$.

Classification General Classification

Beyond Least-Squares: Fast Rates for Regularized Empirical Risk Minimization through Self-Concordance

no code implementations 8 Feb 2019 Ulysse Marteau-Ferey, Dmitrii Ostrovskii, Francis Bach, Alessandro Rudi

We consider learning methods based on the regularization of a convex empirical risk by a squared Hilbertian norm, a setting that includes linear predictors and non-linear predictors through positive-definite kernels.


A General Theory for Structured Prediction with Smooth Convex Surrogates

no code implementations 5 Feb 2019 Alex Nowak-Vila, Francis Bach, Alessandro Rudi

In this work we provide a theoretical framework for structured prediction that generalizes the existing theory of surrogate methods for binary and multiclass classification based on estimating conditional probabilities with smooth convex surrogates (e.g., logistic regression).

General Classification Graph Matching +2

A Universal Algorithm for Variational Inequalities Adaptive to Smoothness and Noise

no code implementations 5 Feb 2019 Francis Bach, Kfir Y. Levy

We consider variational inequalities coming from monotone operators, a setting that includes convex minimization and convex-concave saddle-point problems.

Stochastic first-order methods: non-asymptotic and computer-aided analyses via potential functions

1 code implementation 3 Feb 2019 Adrien Taylor, Francis Bach

We use the approach for analyzing deterministic and stochastic first-order methods under different assumptions on the nature of the stochastic noise.

Asynchronous Accelerated Proximal Stochastic Gradient for Strongly Convex Distributed Finite Sums

no code implementations 28 Jan 2019 Hadrien Hendrikx, Francis Bach, Laurent Massoulié

In this work, we study the problem of minimizing the sum of strongly convex functions split over a network of $n$ nodes.

Optimization and Control Distributed, Parallel, and Cluster Computing

Overcomplete Independent Component Analysis via SDP

no code implementations 24 Jan 2019 Anastasia Podosinnikova, Amelia Perry, Alexander Wein, Francis Bach, Alexandre d'Aspremont, David Sontag

Moreover, we conjecture that the proposed program recovers a mixing component at the rate k < p^2/4 and prove that a mixing component can be recovered with high probability when k < (2 - epsilon) p log p, provided the original components are sampled uniformly at random on the hypersphere.

On Lazy Training in Differentiable Programming

1 code implementation NeurIPS 2019 Lenaic Chizat, Edouard Oyallon, Francis Bach

In a series of recent theoretical works, it was shown that strongly over-parameterized neural networks trained with gradient-based methods could converge exponentially fast to zero training loss, with their parameters hardly varying.

Massively scalable Sinkhorn distances via the Nyström method

no code implementations NeurIPS 2019 Jason Altschuler, Francis Bach, Alessandro Rudi, Jonathan Niles-Weed

The Sinkhorn "distance", a variant of the Wasserstein distance with entropic regularization, is an increasingly popular tool in machine learning and statistical inference.

Marginal Weighted Maximum Log-likelihood for Efficient Learning of Perturb-and-Map models

no code implementations 21 Nov 2018 Tatiana Shpakova, Francis Bach, Anton Osokin

We consider the structured-output prediction problem through probabilistic approaches and generalize the "perturb-and-MAP" framework to more challenging weighted Hamming losses, which are crucial in applications.

Image Segmentation Semantic Segmentation

SING: Symbol-to-Instrument Neural Generator

1 code implementation NeurIPS 2018 Alexandre Défossez, Neil Zeghidour, Nicolas Usunier, Léon Bottou, Francis Bach

On the generalization task of synthesizing notes for pairs of pitch and instrument not seen during training, SING produces audio with significantly improved perceptual quality compared to a state-of-the-art autoencoder based on WaveNet as measured by a Mean Opinion Score (MOS), and is about 32 times faster for training and 2,500 times faster for inference.

Music Generation

Fast and Faster Convergence of SGD for Over-Parameterized Models and an Accelerated Perceptron

no code implementations 16 Oct 2018 Sharan Vaswani, Francis Bach, Mark Schmidt

Under this condition, we prove that constant step-size stochastic gradient descent (SGD) with Nesterov acceleration matches the convergence rate of the deterministic accelerated method for both convex and strongly-convex functions.

Sharp Analysis of Learning with Discrete Losses

no code implementations 16 Oct 2018 Alex Nowak-Vila, Francis Bach, Alessandro Rudi

The problem of devising learning strategies for discrete losses (e.g., multilabeling, ranking) is currently addressed with methods and theoretical analyses ad hoc for each loss.

Finite-sample analysis of M-estimators using self-concordance

1 code implementation 16 Oct 2018 Dmitrii Ostrovskii, Francis Bach

We demonstrate how self-concordance of the loss allows to characterize the critical sample size sufficient to guarantee a chi-square type in-probability bound for the excess risk.

Accelerated Decentralized Optimization with Local Updates for Smooth and Strongly Convex Objectives

no code implementations 5 Oct 2018 Hadrien Hendrikx, Francis Bach, Laurent Massoulié

Applying ESDACD to quadratic local functions leads to an accelerated randomized gossip algorithm of rate $O( \sqrt{\theta_{\rm gossip}/n})$ where $\theta_{\rm gossip}$ is the rate of the standard randomized gossip.

Localized Structured Prediction

no code implementations NeurIPS 2019 Carlo Ciliberto, Francis Bach, Alessandro Rudi

Key to structured prediction is exploiting the problem structure to simplify the learning process.

Learning Theory Structured Prediction

Nonlinear Acceleration of CNNs

1 code implementation 1 Jun 2018 Damien Scieur, Edouard Oyallon, Alexandre d'Aspremont, Francis Bach

The Regularized Nonlinear Acceleration (RNA) algorithm is an acceleration method capable of improving the rate of convergence of many optimization schemes such as gradient descent, SAGA or SVRG.

Optimal Algorithms for Non-Smooth Distributed Optimization in Networks

no code implementations NeurIPS 2018 Kevin Scaman, Francis Bach, Sébastien Bubeck, Yin Tat Lee, Laurent Massoulié

Under the global regularity assumption, we provide a simple yet efficient algorithm called distributed randomized smoothing (DRS) based on a local smoothing of the objective function, and show that DRS is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension.

Optimization and Control

Stochastic algorithms with descent guarantees for ICA

1 code implementation 25 May 2018 Pierre Ablin, Alexandre Gramfort, Jean-François Cardoso, Francis Bach

We derive an online algorithm for the streaming setting, and an incremental algorithm for the finite sum setting, with the following benefits.

Online Regularized Nonlinear Acceleration

no code implementations 24 May 2018 Damien Scieur, Edouard Oyallon, Alexandre d'Aspremont, Francis Bach

Regularized nonlinear acceleration (RNA) estimates the minimum of a function by post-processing iterates from an algorithm such as the gradient method.

General Classification

On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport

no code implementations NeurIPS 2018 Lenaic Chizat, Francis Bach

Many tasks in machine learning and signal processing can be solved by minimizing a convex function of a measure.

Accelerated Gossip in Networks of Given Dimension using Jacobi Polynomial Iterations

1 code implementation 22 May 2018 Raphaël Berthier, Francis Bach, Pierre Gaillard

We develop a method solving the gossip problem that depends only on the spectral dimension of the network, that is, in the communication network set-up, the dimension of the space in which the agents live.


Relating Leverage Scores and Density using Regularized Christoffel Functions

no code implementations NeurIPS 2018 Edouard Pauwels, Francis Bach, Jean-Philippe Vert

Statistical leverage scores emerged as a fundamental tool for matrix sketching and column sampling with applications to low rank approximation, regression, random feature learning and quadrature.


Constant Step Size Stochastic Gradient Descent for Probabilistic Modeling

no code implementations 16 Apr 2018 Dmitry Babichev, Francis Bach

We propose averaging moment parameters instead of natural parameters for constant-step-size stochastic gradient descent.
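While the paper's specific proposal is to average moment rather than natural parameters in probabilistic models, the underlying phenomenon is that of constant step-size SGD with averaging: the last iterate fluctuates at a noise floor set by the step size, while the running (Polyak-Ruppert) average settles much closer to the optimum (a generic sketch with an illustrative noisy quadratic objective, not the paper's exponential-family setting):

```python
import random

def sgd_with_averaging(grad, x0, steps, lr=0.1):
    """Constant step-size SGD with a running average of the iterates,
    updated incrementally as avg += (x - avg) / t."""
    x, avg = x0, 0.0
    for t in range(1, steps + 1):
        x -= lr * grad(x)
        avg += (x - avg) / t
    return x, avg

random.seed(0)
# Noisy gradient of f(x) = (x - 3)^2 / 2; the optimum is x = 3.
noisy_grad = lambda x: (x - 3.0) + random.gauss(0.0, 1.0)
last, averaged = sgd_with_averaging(noisy_grad, 0.0, steps=20_000)
```

The last iterate keeps bouncing in a neighborhood of size roughly $\sqrt{\mathrm{lr}}$ around 3, whereas the average shrinks this error as more iterates accumulate.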

Averaging Stochastic Gradient Descent on Riemannian Manifolds

no code implementations 26 Feb 2018 Nilesh Tripuraneni, Nicolas Flammarion, Francis Bach, Michael I. Jordan

We consider the minimization of a function defined on a Riemannian manifold $\mathcal{M}$ accessible only through unbiased estimates of its gradients.

Riemannian optimization

Exponential convergence of testing error for stochastic gradient methods

no code implementations 13 Dec 2017 Loucas Pillaud-Vivien, Alessandro Rudi, Francis Bach

We consider binary classification problems with positive definite kernels and square loss, and study the convergence rates of stochastic gradient methods.

Binary Classification Classification +1

Nonlinear Acceleration of Stochastic Algorithms

no code implementations NeurIPS 2017 Damien Scieur, Francis Bach, Alexandre d'Aspremont

Here, we study extrapolation methods in a stochastic setting, where the iterates are produced by either a simple or an accelerated stochastic gradient algorithm.

Integration Methods and Optimization Algorithms

no code implementations NeurIPS 2017 Damien Scieur, Vincent Roulet, Francis Bach, Alexandre d'Aspremont

We show that accelerated optimization methods can be seen as particular instances of multi-step integration schemes from numerical analysis, applied to the gradient flow equation.

AdaBatch: Efficient Gradient Aggregation Rules for Sequential and Parallel Stochastic Gradient Methods

no code implementations6 Nov 2017 Alexandre Défossez, Francis Bach

We study a new aggregation operator for gradients coming from a mini-batch for stochastic gradient (SG) methods that allows a significant speed-up in the case of sparse optimization problems.

Tracking the gradients using the Hessian: A new look at variance reducing stochastic methods

1 code implementation20 Oct 2017 Robert M. Gower, Nicolas Le Roux, Francis Bach

Our goal is to improve variance reducing stochastic methods through better control variates.

Combinatorial Penalties: Which structures are preserved by convex relaxations?

no code implementations17 Oct 2017 Marwa El Halabi, Francis Bach, Volkan Cevher

We consider the homogeneous and the non-homogeneous convex relaxations for combinatorial penalty functions defined on support sets.

A Generic Approach for Escaping Saddle points

no code implementations5 Sep 2017 Sashank J. Reddi, Manzil Zaheer, Suvrit Sra, Barnabas Poczos, Francis Bach, Ruslan Salakhutdinov, Alexander J. Smola

A central challenge to using first-order methods for optimizing nonconvex problems is the presence of saddle points.

Second-order methods

Bridging the Gap between Constant Step Size Stochastic Gradient Descent and Markov Chains

no code implementations20 Jul 2017 Aymeric Dieuleveut, Alain Durmus, Francis Bach

We consider the minimization of an objective function given access to unbiased estimates of its gradient through stochastic gradient descent (SGD) with constant step-size.

Kernel Square-Loss Exemplar Machines for Image Retrieval

no code implementations CVPR 2017 Rafael S. Rezende, Joaquin Zepeda, Jean Ponce, Francis Bach, Patrick Perez

Zepeda and Perez have recently demonstrated the promise of the exemplar SVM (ESVM) as a feature encoder for image retrieval.

Image Retrieval Retrieval

On Structured Prediction Theory with Calibrated Convex Surrogate Losses

1 code implementation NeurIPS 2017 Anton Osokin, Francis Bach, Simon Lacoste-Julien

We provide novel theoretical insights on structured prediction in the context of efficient convex surrogate loss minimization with consistency guarantees.

Structured Prediction

Optimal algorithms for smooth and strongly convex distributed optimization in networks

1 code implementation ICML 2017 Kevin Scaman, Francis Bach, Sébastien Bubeck, Yin Tat Lee, Laurent Massoulié

For centralized (i.e., master/slave) algorithms, we show that distributing Nesterov's accelerated gradient descent is optimal and achieves a precision $\varepsilon > 0$ in time $O(\sqrt{\kappa_g}(1+\Delta\tau)\ln(1/\varepsilon))$, where $\kappa_g$ is the condition number of the (global) function to optimize, $\Delta$ is the diameter of the network, and $\tau$ (resp.

Distributed Optimization regression

Stochastic Composite Least-Squares Regression with convergence rate O(1/n)

no code implementations21 Feb 2017 Nicolas Flammarion, Francis Bach

We consider the minimization of composite objective functions composed of the expectation of quadratic functions and an arbitrary convex function.


Learning Determinantal Point Processes in Sublinear Time

no code implementations19 Oct 2016 Christophe Dupuy, Francis Bach

We propose a new class of determinantal point processes (DPPs) which can be manipulated for inference and parameter learning in potentially sublinear time in the number of items.

Document Summarization Point Processes

Robust Discriminative Clustering with Sparse Regularizers

no code implementations29 Aug 2016 Nicolas Flammarion, Balamurugan Palaniappan, Francis Bach

Clustering high-dimensional data often requires some form of dimensionality reduction, where clustered variables are separated from "noise-looking" variables.

Dimensionality Reduction

Parameter Learning for Log-supermodular Distributions

no code implementations NeurIPS 2016 Tatiana Shpakova, Francis Bach

Then, to learn parameters, given that our approximation of the log-partition function is an expectation (over our own randomization), we use a stochastic subgradient technique to maximize a lower-bound on the log-likelihood.

Combinatorial Optimization Image Denoising +2

Stochastic Optimization for Large-scale Optimal Transport

no code implementations NeurIPS 2016 Aude Genevay, Marco Cuturi, Gabriel Peyré, Francis Bach

We instantiate these ideas in three different setups: (i) when comparing a discrete distribution to another, we show that incremental stochastic optimization schemes can beat Sinkhorn's algorithm, the current state-of-the-art finite dimensional OT solver; (ii) when comparing a discrete distribution to a continuous density, a semi-discrete reformulation of the dual program is amenable to averaged stochastic gradient descent, leading to better performance than approximately solving the problem by discretization; (iii) when dealing with two continuous densities, we propose a stochastic gradient descent over a reproducing kernel Hilbert space (RKHS).

Stochastic Optimization

PAC-Bayesian Theory Meets Bayesian Inference

no code implementations NeurIPS 2016 Pascal Germain, Francis Bach, Alexandre Lacoste, Simon Lacoste-Julien

That is, for the negative log-likelihood loss function, we show that the minimization of PAC-Bayesian generalization risk bounds maximizes the Bayesian marginal likelihood.

Bayesian Inference regression

Highly-Smooth Zero-th Order Online Optimization

no code implementations26 May 2016 Francis Bach, Vianney Perchet

The minimization of convex functions which are only available through partial and noisy information is a key methodological problem in many disciplines.

Stochastic Variance Reduction Methods for Saddle-Point Problems

no code implementations NeurIPS 2016 P. Balamurugan, Francis Bach

We consider convex-concave saddle-point problems where the objective functions may be split in many components, and extend recent stochastic variance reduction methods (such as SVRG or SAGA) to provide the first large-scale linearly convergent algorithms for this class of problems which is common in machine learning.

Online but Accurate Inference for Latent Variable Models with Local Gibbs Sampling

no code implementations8 Mar 2016 Christophe Dupuy, Francis Bach

We first propose a unified treatment of online inference for latent variable models from a non-canonical exponential family, and draw explicit links between several previously proposed frequentist or Bayesian methods.

Bayesian Inference Variational Inference

Beyond CCA: Moment Matching for Multi-View Models

no code implementations29 Feb 2016 Anastasia Podosinnikova, Francis Bach, Simon Lacoste-Julien

We introduce three novel semi-parametric extensions of probabilistic canonical correlation analysis with identifiability guarantees.

Harder, Better, Faster, Stronger Convergence Rates for Least-Squares Regression

no code implementations17 Feb 2016 Aymeric Dieuleveut, Nicolas Flammarion, Francis Bach

We consider the optimization of a quadratic objective function whose gradients are only accessible through a stochastic oracle that returns the gradient at any given point plus a zero-mean finite variance random error.


Spectral Norm Regularization of Orthonormal Representations for Graph Transduction

no code implementations NeurIPS 2015 Rakesh Shivanna, Bibaswan K. Chatterjee, Raman Sankaran, Chiranjib Bhattacharyya, Francis Bach

We propose an alternative PAC-based bound, which does not depend on the VC dimension of the underlying function class but is related to the famous Lovász $\vartheta$ function.

Submodular Functions: from Discrete to Continuous Domains

no code implementations2 Nov 2015 Francis Bach

A key element in many of the algorithms and analyses is the possibility of extending the submodular set-function to a convex function, which opens up tools from convex optimization.

Combinatorial Optimization

Rethinking LDA: moment matching for discrete ICA

no code implementations NeurIPS 2015 Anastasia Podosinnikova, Francis Bach, Simon Lacoste-Julien

We consider moment matching techniques for estimation in Latent Dirichlet Allocation (LDA).

Learning with Clustering Structure

no code implementations16 Jun 2015 Vincent Roulet, Fajwel Fogel, Alexandre d'Aspremont, Francis Bach

We study supervised learning problems using clustering constraints to impose structure on either features or samples, seeking to help both prediction and interpretation.

text-classification Text Classification

Weakly-Supervised Alignment of Video With Text

no code implementations ICCV 2015 Piotr Bojanowski, Rémi Lajugie, Edouard Grave, Francis Bach, Ivan Laptev, Jean Ponce, Cordelia Schmid

Given vectorial features for both video and text, we propose to cast this task as a temporal assignment problem, with an implicit linear mapping between the two feature modalities.

From Averaging to Acceleration, There is Only a Step-size

no code implementations7 Apr 2015 Nicolas Flammarion, Francis Bach

We show that accelerated gradient descent, averaged gradient descent and the heavy-ball method for non-strongly-convex problems may be reformulated as constant parameter second-order difference equation algorithms, where stability of the system is equivalent to convergence at rate $O(1/n^2)$, where n is the number of iterations.
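The second-order difference-equation view can be illustrated on a simple quadratic: plain gradient descent and the heavy-ball method are both instances of the constant-coefficient two-step recursion x_{k+1} = x_k - gamma * grad f(x_k) + delta * (x_k - x_{k-1}). The step and momentum values below are illustrative choices, not the tuned constants from the paper.

```python
import numpy as np

# Quadratic f(x) = 0.5 * x^T A x with minimizer x = 0.
rng = np.random.default_rng(1)
eigs = np.linspace(0.01, 1.0, 20)   # ill-conditioned spectrum
A = np.diag(eigs)
x0 = rng.standard_normal(20)

def run(gamma, delta, n_iter=500):
    """Two-step recursion x_{k+1} = x_k - gamma*A x_k + delta*(x_k - x_{k-1})."""
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(n_iter):
        x, x_prev = x - gamma * (A @ x) + delta * (x - x_prev), x
    return np.linalg.norm(x)

plain = run(gamma=1.0, delta=0.0)       # delta = 0 recovers gradient descent
heavy_ball = run(gamma=1.0, delta=0.9)  # adding the momentum term accelerates convergence
```

On this ill-conditioned quadratic the momentum variant leaves a far smaller residual after the same number of iterations, which is the phenomenon the difference-equation framework analyzes.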

Learning the Structure for Structured Sparsity

no code implementations10 Mar 2015 Nino Shervashidze, Francis Bach

Structured sparsity has recently emerged in statistics, machine learning and signal processing as a promising paradigm for learning in high-dimensional settings.


Convex Optimization for Parallel Energy Minimization

no code implementations5 Mar 2015 K. S. Sesh Kumar, Alvaro Barbero, Stefanie Jegelka, Suvrit Sra, Francis Bach

By exploiting results from convex and submodular theory, we reformulate the quadratic energy minimization problem as a total variation denoising problem, which, when viewed geometrically, enables the use of projection and reflection based convex methods.


On the Equivalence between Kernel Quadrature Rules and Random Feature Expansions

no code implementations24 Feb 2015 Francis Bach

We show that kernel-based quadrature rules for computing integrals can be seen as a special case of random feature expansions for positive definite kernels, for a particular decomposition that always exists for such kernels.

Sequential Kernel Herding: Frank-Wolfe Optimization for Particle Filtering

no code implementations9 Jan 2015 Simon Lacoste-Julien, Fredrik Lindsten, Francis Bach

Recently, the Frank-Wolfe optimization algorithm was suggested as a procedure to obtain adaptive quadrature rules for integrals of functions in a reproducing kernel Hilbert space (RKHS) with a potentially faster rate of convergence than Monte Carlo integration (and "kernel herding" was shown to be a special case of this procedure).

Breaking the Curse of Dimensionality with Convex Neural Networks

no code implementations30 Dec 2014 Francis Bach

Moreover, when using sparsity-inducing norms on the input weights, we show that high-dimensional non-linear variable selection may be achieved, without any strong assumption regarding the data and with a total number of variables potentially exponential in the number of observations.

Variable Selection

Primal-Dual Algorithms for Non-negative Matrix Factorization with the Kullback-Leibler Divergence

2 code implementations4 Dec 2014 Felipe Yanez, Francis Bach

Non-negative matrix factorization (NMF) approximates a given matrix as a product of two non-negative matrices.

Face Recognition Music Source Separation
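For reference, the classical baseline for this problem is the Lee-Seung multiplicative update scheme for the KL objective; the sketch below implements that standard baseline, not the paper's primal-dual algorithms.

```python
import numpy as np

def nmf_kl(X, r, n_iter=200, seed=0):
    """Classical multiplicative updates for KL-divergence NMF (Lee & Seung):
    approximate a non-negative matrix X as W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, r)) + 1e-3
    H = rng.random((r, n)) + 1e-3
    eps = 1e-10  # guards against division by zero
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (X / WH)) / (W.sum(axis=0)[:, None] + eps)
        WH = W @ H + eps
        W *= ((X / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H
```

Each update multiplies the current factor entrywise by a non-negative ratio, so non-negativity is preserved automatically and the KL divergence decreases monotonically.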

Constant Step Size Least-Mean-Square: Bias-Variance Trade-offs and Optimal Sampling Distributions

no code implementations29 Nov 2014 Alexandre Défossez, Francis Bach

Our analysis leads to new insights into stochastic approximation algorithms: (a) it gives a tighter bound on the allowed step-size; (b) the generalization error may be divided into a variance term which is decaying as O(1/n), independently of the step-size $\gamma$, and a bias term that decays as $O(1/(\gamma^2 n^2))$; (c) when allowing non-uniform sampling, the choice of a good sampling density depends on whether the variance or bias terms dominate.

Sparse Modeling for Image and Vision Processing

no code implementations12 Nov 2014 Julien Mairal, Francis Bach, Jean Ponce

In recent years, a large amount of multi-disciplinary research has been conducted on sparse models and their applications.

Model Selection

On the Consistency of Ordinal Regression Methods

no code implementations11 Aug 2014 Fabian Pedregosa, Francis Bach, Alexandre Gramfort

We will see that, for a family of surrogate loss functions that subsumes support vector ordinal regression and ORBoosting, consistency can be fully characterized by the derivative of a real-valued function at zero, as happens for convex margin-based surrogates in binary classification.

Binary Classification General Classification +1

Sparse and spurious: dictionary learning with noise and outliers

no code implementations19 Jul 2014 Rémi Gribonval, Rodolphe Jenatton, Francis Bach

A popular approach within the signal processing and machine learning communities consists in modelling signals as sparse linear combinations of atoms selected from a learned dictionary.

Dictionary Learning

Weakly Supervised Action Labeling in Videos Under Ordering Constraints

no code implementations4 Jul 2014 Piotr Bojanowski, Rémi Lajugie, Francis Bach, Ivan Laptev, Jean Ponce, Cordelia Schmid, Josef Sivic

We are given a set of video clips, each one annotated with an {\em ordered} list of actions, such as "walk" then "sit" then "answer phone" extracted from, for example, the associated text script.

SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives

5 code implementations NeurIPS 2014 Aaron Defazio, Francis Bach, Simon Lacoste-Julien

In this work we introduce a new optimisation method called SAGA in the spirit of SAG, SDCA, MISO and SVRG, a set of recently proposed incremental gradient algorithms with fast linear convergence rates.
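The core of SAGA is to keep a table of the most recently evaluated gradient for each sample and to correct each stochastic gradient by the table entry and the table average. A minimal sketch for a least-squares objective follows; step size and problem setup are illustrative, not the paper's.

```python
import numpy as np

def saga_least_squares(X, y, step, n_epochs=20, seed=0):
    """Minimal SAGA sketch for f(w) = (1/2n) * sum_i (x_i^T w - y_i)^2."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    grads = np.zeros((n, d))              # stored per-sample gradients
    for i in range(n):                    # initialize the gradient table at w = 0
        grads[i] = (X[i] @ w - y[i]) * X[i]
    grad_avg = grads.mean(axis=0)
    for _ in range(n_epochs * n):
        i = rng.integers(n)
        g_new = (X[i] @ w - y[i]) * X[i]
        # Variance-reduced update direction: new gradient minus stored one,
        # plus the running average of all stored gradients.
        w -= step * (g_new - grads[i] + grad_avg)
        grad_avg += (g_new - grads[i]) / n  # keep the table average in sync
        grads[i] = g_new
    return w
```

Unlike plain SGD, the variance of the update direction vanishes at the optimum, which is what yields the fast linear convergence rate on strongly convex problems.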

On The Sample Complexity of Sparse Dictionary Learning

no code implementations20 Mar 2014 Matthias Seibert, Martin Kleinsteuber, Rémi Gribonval, Rodolphe Jenatton, Francis Bach

The main goal of this paper is to provide a sample complexity estimate that controls to what extent the empirical average deviates from the cost function.

Dictionary Learning

Sample Complexity of Dictionary Learning and other Matrix Factorizations

no code implementations13 Dec 2013 Rémi Gribonval, Rodolphe Jenatton, Francis Bach, Martin Kleinsteuber, Matthias Seibert

Many modern tools in machine learning and signal processing, such as sparse dictionary learning, principal component analysis (PCA), non-negative matrix factorization (NMF), $K$-means clustering, etc., rely on the factorization of a matrix obtained by concatenating high-dimensional vectors from a training collection.

Dictionary Learning Generalization Bounds

Convex Relaxations for Permutation Problems

no code implementations NeurIPS 2013 Fajwel Fogel, Rodolphe Jenatton, Francis Bach, Alexandre d'Aspremont

Seriation seeks to reconstruct a linear order between variables using unsorted similarity information.

Reflection methods for user-friendly submodular optimization

no code implementations NeurIPS 2013 Stefanie Jegelka, Francis Bach, Suvrit Sra

A key component of our method is a formulation of the discrete submodular minimization problem as a continuous best approximation problem that is solved through a sequence of reflections, and its solution can be easily thresholded to obtain an optimal discrete solution.

Image Segmentation Semantic Segmentation

Convex relaxations of structured matrix factorizations

no code implementations12 Sep 2013 Francis Bach

We consider the factorization of a rectangular matrix $X$ into a positive linear combination of rank-one factors of the form $u v^\top$, where $u$ and $v$ belong to certain sets $\mathcal{U}$ and $\mathcal{V}$ that may encode specific structures regarding the factors, such as positivity or sparsity.

Minimizing Finite Sums with the Stochastic Average Gradient

2 code implementations10 Sep 2013 Mark Schmidt, Nicolas Le Roux, Francis Bach

Further, in many cases the convergence rate of the new method is also faster than black-box deterministic gradient methods, in terms of the number of gradient evaluations.

Maximizing submodular functions using probabilistic graphical models

no code implementations10 Sep 2013 K. S. Sesh Kumar, Francis Bach

In a graphical model, the entropy of the joint distribution decomposes as a sum of marginal entropies of subsets of variables; moreover, for any distribution, the entropy of the closest distribution factorizing in the graphical model provides a bound on the entropy.

Variational Inference

Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n)

no code implementations NeurIPS 2013 Francis Bach, Eric Moulines

We consider the stochastic approximation problem where a convex function has to be minimized, given only the knowledge of unbiased estimates of its gradients at certain points, a framework which includes machine learning methods based on the minimization of the empirical risk.

BIG-bench Machine Learning regression

Adaptivity of averaged stochastic gradient descent to local strong convexity for logistic regression

no code implementations25 Mar 2013 Francis Bach

In this paper, we consider supervised learning problems such as logistic regression and study the stochastic gradient method with averaging, in the usual stochastic approximation setting where observations are used only once.


Duality between subgradient and conditional gradient methods

no code implementations27 Nov 2012 Francis Bach

Given a convex optimization problem and its dual, there are many possible first-order algorithms.

BIG-bench Machine Learning

Sharp analysis of low-rank kernel matrix approximations

no code implementations9 Aug 2012 Francis Bach

We consider supervised learning problems within the positive-definite kernel framework, such as kernel ridge regression, kernel logistic regression or the support vector machine.


Learning with Submodular Functions: A Convex Optimization Perspective

no code implementations28 Nov 2011 Francis Bach

Submodular functions are relevant to machine learning for at least two reasons: (1) some problems may be expressed directly as the optimization of submodular functions and (2) the Lovász extension of submodular functions provides a useful set of regularization functions for supervised and unsupervised learning.

BIG-bench Machine Learning Combinatorial Optimization +1

Task-Driven Dictionary Learning

no code implementations27 Sep 2010 Julien Mairal, Francis Bach, Jean Ponce

Modeling data with linear combinations of a few elements from a learned dictionary has been the focus of much recent research in machine learning, neuroscience and signal processing.

Classification Dictionary Learning +2

Structured Sparse Principal Component Analysis

no code implementations8 Sep 2009 Rodolphe Jenatton, Guillaume Obozinski, Francis Bach

We present an extension of sparse PCA, or sparse dictionary learning, where the sparsity patterns of all dictionary elements are structured and constrained to belong to a prespecified set of shapes.

Dictionary Learning Face Recognition

Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning

no code implementations9 Sep 2008 Francis Bach

For supervised and unsupervised learning, positive definite kernels allow the use of large and potentially infinite dimensional feature spaces with a computational cost that only depends on the number of observations.

Variable Selection

Bolasso: model consistent Lasso estimation through the bootstrap

2 code implementations8 Apr 2008 Francis Bach

For various decays of the regularization parameter, we compute asymptotic equivalents of the probability of correct model selection (i.e., variable selection).

Model Selection regression +1
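As a rough illustration of the Bolasso idea, the sketch below fits the Lasso on bootstrap resamples and intersects the estimated supports. The solver (plain ISTA), the zero threshold, and the regularization level are illustrative choices, not the paper's.

```python
import numpy as np

def lasso_ista(X, y, lam, n_iter=500):
    """Lasso via ISTA: minimize (1/2n)||Xw - y||^2 + lam * ||w||_1."""
    n, d = X.shape
    w = np.zeros(d)
    L = np.linalg.norm(X, 2) ** 2 / n  # Lipschitz constant of the smooth part's gradient
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        z = w - grad / L
        w = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-thresholding
    return w

def bolasso(X, y, lam, n_boot=32, seed=0):
    """Bootstrap the data, run the Lasso on each resample, intersect supports."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    support = np.ones(d, dtype=bool)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # bootstrap resample with replacement
        w = lasso_ista(X[idx], y[idx], lam)
        support &= np.abs(w) > 1e-8       # keep only variables selected every time
    return support
```

Variables that the Lasso selects only spuriously tend to drop out of at least one bootstrap run, so the intersection concentrates on the true support.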
