no code implementations • 6 Nov 2024 • Eliot Beyler, Francis Bach
In this paper, we derive variational inference upper-bounds on the log-partition function of pairwise Markov random fields on the Boolean hypercube, based on quantum relaxations of the Kullback-Leibler divergence.
no code implementations • 16 Oct 2024 • Antônio H. Ribeiro, Thomas B. Schön, Dave Zachariah, Francis Bach
For linear models, it can be formulated as a convex optimization problem.
no code implementations • 9 Oct 2024 • Sebastian G. Gruber, Francis Bach
In this work, we propose a mean-squared error-based risk that enables the comparison and optimization of estimators of squared calibration errors in practical settings.
1 code implementation • 20 Sep 2024 • Nathan Doumèche, Francis Bach, Gérard Biau, Claire Boyer
Building on the formulation of the problem as a kernel regression task, we use Fourier methods to approximate the associated kernel, and propose a tractable estimator that minimizes the physics-informed risk function.
1 code implementation • 29 Aug 2024 • Clémentine Chazal, Anna Korba, Francis Bach
In this paper, we study the statistical and geometrical properties of the Kullback-Leibler divergence with kernel covariance operators (KKL) introduced by Bach [2022].
1 code implementation • 24 Jul 2024 • Bertille Follain, Francis Bach
We propose a new method for feature learning and function estimation in supervised learning via regularised empirical risk minimisation.
1 code implementation • 12 Feb 2024 • Nathan Doumèche, Francis Bach, Gérard Biau, Claire Boyer
In this context, we consider a general regression problem where the empirical risk is regularized by a partial differential equation that quantifies the physical inconsistency.
no code implementations • 21 Nov 2023 • Eugene Berta, Francis Bach, Michael Jordan
IR acts as an adaptive binning procedure that can achieve a calibration error of zero, but it leaves open the question of its effect on predictive performance.
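As a concrete illustration of the isotonic-regression (IR) recalibration step mentioned above, here is a minimal scikit-learn sketch on synthetic data; the classifier, dataset, and split are placeholder assumptions, not the paper's experimental setup.

```python
from sklearn.datasets import make_classification
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data and a simple classifier: placeholder assumptions for illustration.
X, y = make_classification(n_samples=2000, random_state=0)
X_tr, X_cal, y_tr, y_cal = train_test_split(X, y, test_size=0.5, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
p_cal = clf.predict_proba(X_cal)[:, 1]               # raw predicted probabilities

# Isotonic regression fits a monotone, piecewise-constant map from raw scores
# to recalibrated probabilities (an adaptive binning of the scores).
ir = IsotonicRegression(out_of_bounds="clip").fit(p_cal, y_cal)
p_recal = ir.predict(p_cal)
```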
no code implementations • 7 Nov 2023 • Simon Martin, Francis Bach, Giulio Biroli
We study the training dynamics of a shallow neural network with quadratic activation functions and quadratic cost in a teacher-student setup.
1 code implementation • NeurIPS 2023 • Antônio H. Ribeiro, Dave Zachariah, Francis Bach, Thomas B. Schön
And, conversely, the minimum-norm interpolator is the solution to adversarial training with a given radius.
no code implementations • 3 Oct 2023 • Marc Lambert, Silvère Bonnabel, Francis Bach
The solution to this last proximal loss is given by implicit updates on the mean and covariance that we proposed earlier.
1 code implementation • 24 Jul 2023 • Bertille Follain, Francis Bach
Representation learning plays a crucial role in automated feature selection, particularly in the context of high-dimensional data, where non-parametric methods often struggle.
1 code implementation • 1 Jun 2023 • Vivien Cabannes, Francis Bach
Historically, the machine learning community has derived spectral decompositions from graph-based approaches.
no code implementations • 31 May 2023 • Saeed Saremi, Ji Won Park, Francis Bach
We introduce a theoretical framework for sampling from unnormalized densities based on a smoothing scheme that uses an isotropic Gaussian kernel with a single fixed noise scale.
1 code implementation • NeurIPS 2023 • Amir Joudaki, Hadi Daneshmand, Francis Bach
In this paper, we explore the structure of the penultimate Gram matrix in deep neural networks, which contains the pairwise inner products of outputs corresponding to a batch of inputs.
no code implementations • 21 Mar 2023 • Saeed Saremi, Rupesh Kumar Srivastava, Francis Bach
We consider the problem of generative modeling based on smoothing an unknown density of interest in $\mathbb{R}^d$ using factorial kernels with $M$ independent Gaussian channels with equal noise levels, as introduced by Saremi and Srivastava (2022).
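A toy numerical illustration of the $M$-channel smoothing model only (not the paper's sampling algorithm): each channel observes the same point corrupted by independent Gaussian noise of level $\sigma$, and averaging the channels reduces the effective noise to roughly $\sigma/\sqrt{M}$. Dimensions and noise level below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
d, M, sigma = 10, 16, 1.0
x = rng.normal(size=d)                     # a clean point in R^d
y = x + sigma * rng.normal(size=(M, d))    # M smoothed observations (independent channels)

print(np.linalg.norm(y[0] - x))            # noise of a single channel, about sigma * sqrt(d)
print(np.linalg.norm(y.mean(axis=0) - x))  # averaging the channels: about sigma * sqrt(d / M)
```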
no code implementations • 16 Mar 2023 • Belinda Tzen, Anant Raj, Maxim Raginsky, Francis Bach
Mirror descent, introduced by Nemirovski and Yudin in the 1970s, is a primal-dual convex optimization method that can be tailored to the geometry of the optimization problem at hand through the choice of a strongly convex potential function.
1 code implementation • 6 Mar 2023 • David Holzmüller, Francis Bach
Specifically, for $m$-times differentiable functions in $d$ dimensions, the optimal rate for algorithms with $n$ function evaluations is known to be $O(n^{-m/d})$, where the constant can potentially depend on $m, d$ and the function to be optimized.
no code implementations • 2 Mar 2023 • Francis Bach
We consider linear regression problems with a varying number of random projections, where we provably exhibit a double descent curve for a fixed prediction problem, with a high-dimensional analysis based on random matrix theory.
1 code implementation • 13 Feb 2023 • Loucas Pillaud-Vivien, Francis Bach
Spectral clustering and diffusion maps are celebrated dimensionality reduction algorithms built on eigen-elements related to the diffusive structure of the data.
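For context, a minimal NumPy sketch of the common recipe underlying spectral clustering and diffusion maps: Gaussian affinities, symmetric normalization, and the leading non-trivial eigenvectors as an embedding. The bandwidth and data are illustrative choices, not the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                             # toy data
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)       # pairwise squared distances
W = np.exp(-sq / (2 * 0.5 ** 2))                          # Gaussian kernel affinities
deg = W.sum(axis=1)
S = W / np.sqrt(deg)[:, None] / np.sqrt(deg)[None, :]     # D^{-1/2} W D^{-1/2}

vals, vecs = np.linalg.eigh(S)                            # eigenvalues in ascending order
embedding = vecs[:, [-2, -3]]                             # leading non-trivial eigenvectors
```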
no code implementations • 7 Feb 2023 • Blake Woodworth, Konstantin Mishchenko, Francis Bach
We present an algorithm for minimizing an objective with hard-to-compute gradients by using a related, easier-to-access function as a proxy.
no code implementations • 7 Feb 2023 • Francis Bach
We consider multivariate splines and show that they have a random feature expansion as infinitely wide one-hidden-layer neural networks with a homogeneous activation function that is a power of the rectified linear unit.
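A rough sketch of a random-feature regression with a homogeneous activation $\max(0, w^\top x)^\alpha$, in the spirit of the expansion mentioned above; the feature distribution, the exponent $\alpha$, and the ridge parameter are placeholder assumptions rather than the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m, alpha, lam = 300, 3, 500, 2, 1e-3
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)       # toy target

W = rng.normal(size=(d, m))                          # random input weights (assumed Gaussian)
Phi = np.maximum(X @ W, 0.0) ** alpha / np.sqrt(m)   # homogeneous ReLU^alpha features

# Ridge regression on the random features.
theta = np.linalg.solve(Phi.T @ Phi + lam * np.eye(m), Phi.T @ y)
y_hat = Phi @ theta
print(np.mean((y_hat - y) ** 2))                     # training error of the sketch
```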
1 code implementation • 10 Nov 2022 • Lawrence Stewart, Francis Bach, Quentin Berthet, Jean-Philippe Vert
Neural networks can be trained to solve regression problems by using gradient-based methods to minimize the square loss.
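As a minimal baseline for the setting above, a one-hidden-layer ReLU network trained on the square loss with full-batch gradient descent and hand-written backpropagation; the data and hyper-parameters are illustrative only, not the paper's architecture or experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, h, lr = 256, 2, 64, 0.05
X = rng.normal(size=(n, d))
y = np.sin(X[:, :1])                                  # toy regression target, shape (n, 1)

W1 = rng.normal(size=(d, h)) / np.sqrt(d)
b1 = np.zeros(h)
w2 = rng.normal(size=(h, 1)) / np.sqrt(h)
b2 = np.zeros(1)

for _ in range(2000):
    Z = X @ W1 + b1
    A = np.maximum(Z, 0.0)                            # ReLU activations
    pred = A @ w2 + b2
    grad_pred = 2.0 * (pred - y) / n                  # gradient of the mean square loss
    # backpropagation through the two layers
    grad_w2 = A.T @ grad_pred
    grad_b2 = grad_pred.sum(axis=0)
    grad_Z = (grad_pred @ w2.T) * (Z > 0)
    grad_W1 = X.T @ grad_Z
    grad_b1 = grad_Z.sum(axis=0)
    W1 -= lr * grad_W1; b1 -= lr * grad_b1
    w2 -= lr * grad_w2; b2 -= lr * grad_b2

print(np.mean((pred - y) ** 2))                       # training loss after gradient descent
```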
no code implementations • 19 Sep 2022 • Aurelien Lucchi, Frank Proske, Antonio Orvieto, Francis Bach, Hans Kersting
This generalizes processes based on Brownian motion, such as the Ornstein-Uhlenbeck process.
no code implementations • 27 Jun 2022 • Francis Bach
We consider extensions of the Shannon relative entropy, referred to as $f$-divergences. Three classical related computational problems are typically associated with these divergences: (a) estimation from moments, (b) computing normalizing integrals, and (c) variational inference in probabilistic models.
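For reference, the standard definition of an $f$-divergence used in this context, stated here for background (conventions for $f$ vary slightly across references): for $f$ convex with $f(1) = 0$,

$$ D_f(p \,\|\, q) \;=\; \int f\!\left(\frac{\mathrm{d}p}{\mathrm{d}q}\right) \mathrm{d}q, $$

and the choice $f(t) = t \log t$ recovers the Shannon relative entropy (Kullback-Leibler divergence).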
1 code implementation • 15 Jun 2022 • Konstantin Mishchenko, Francis Bach, Mathieu Even, Blake Woodworth
The existing analysis of asynchronous stochastic gradient descent (SGD) degrades dramatically when any delay is large, giving the impression that performance depends primarily on the delay.
1 code implementation • 9 Jun 2022 • Antonio Orvieto, Anant Raj, Hans Kersting, Francis Bach
Injecting noise within gradient descent has several desirable features, such as smoothing and regularizing properties.
1 code implementation • 31 May 2022 • Marc Lambert, Sinho Chewi, Francis Bach, Silvère Bonnabel, Philippe Rigollet
Along with Markov chain Monte Carlo (MCMC) methods, variational inference (VI) has emerged as a central computational approach to large-scale Bayesian inference.
1 code implementation • 26 May 2022 • Vivien Cabannes, Francis Bach, Vianney Perchet, Alessandro Rudi
The workhorse of machine learning is stochastic gradient descent.
no code implementations • 25 May 2022 • Amir Joudaki, Hadi Daneshmand, Francis Bach
Mean field theory is widely used in the theoretical studies of neural networks.
1 code implementation • 25 May 2022 • Benjamin Dubois-Taine, Francis Bach, Quentin Berthet, Adrien Taylor
We consider the problem of minimizing the sum of two convex functions.
1 code implementation • 16 Apr 2022 • Hadi Daneshmand, Francis Bach
Mean field theory has provided theoretical insights into various algorithms by letting the problem size tend to infinity.
no code implementations • 11 Apr 2022 • Blake Woodworth, Francis Bach, Alessandro Rudi
We consider potentially non-convex optimization problems, for which optimal rates of approximation depend on the dimension of the parameter space and the smoothness of the function to be optimized.
no code implementations • 17 Feb 2022 • Francis Bach
We consider the analysis of probability distributions through their associated covariance operators from reproducing kernel Hilbert spaces.
no code implementations • 16 Feb 2022 • Ziad Kobeissi, Francis Bach
We consider the problem of continuous-time policy evaluation.
no code implementations • 6 Feb 2022 • Antonio Orvieto, Hans Kersting, Frank Proske, Francis Bach, Aurelien Lucchi
Injecting artificial noise into gradient descent (GD) is commonly employed to improve the performance of machine learning models.
no code implementations • 28 Jan 2022 • Théo Ryffel, Francis Bach, David Pointcheval
We analyse the privacy leakage of noisy stochastic gradient descent by modeling Rényi divergence dynamics with Langevin diffusions.
no code implementations • 3 Dec 2021 • Boris Muzellec, Adrien Vacher, Francis Bach, François-Xavier Vialard, Alessandro Rudi
It was recently shown that under smoothness conditions, the squared Wasserstein distance between two distributions could be efficiently computed with appealing statistical error upper bounds.
no code implementations • NeurIPS 2021 • Mathieu Even, Raphaël Berthier, Francis Bach, Nicolas Flammarion, Hadrien Hendrikx, Pierre Gaillard, Laurent Massoulié, Adrien Taylor
We introduce the "continuized" Nesterov acceleration, a close variant of Nesterov acceleration whose variables are indexed by a continuous time parameter.
1 code implementation • 22 Nov 2021 • Boris Muzellec, Francis Bach, Alessandro Rudi
Shape constraints such as positive semi-definiteness (PSD) for matrices or convexity for functions play a central role in many applications in machine learning and sciences, including metric learning, optimal transport, and economics.
no code implementations • 29 Oct 2021 • Anant Raj, Francis Bach
Uncertainty sampling in active learning is heavily used in practice to reduce the annotation cost.
no code implementations • 20 Oct 2021 • Ulysse Marteau-Ferey, Francis Bach, Alessandro Rudi
In many areas of applied statistics and machine learning, generating an arbitrary number of independent and identically distributed (i.i.d.) samples from a given distribution is a key task.
1 code implementation • 15 Oct 2021 • Francis Bach, Lenaïc Chizat
Many supervised machine learning methods are naturally cast as optimization problems.
no code implementations • 2 Jul 2021 • Yifan Sun, Francis Bach
We couple this with a screening rule which is safe in the convex case, converging to the true support at a rate $O(1/(\delta^2))$ where $\delta \geq 0$ measures how close the problem is to degeneracy.
1 code implementation • 18 Jun 2021 • Boris Muzellec, Francis Bach, Alessandro Rudi
Kernel mean embeddings are a popular tool that consists in representing probability measures by their infinite-dimensional mean embeddings in a reproducing kernel Hilbert space.
1 code implementation • 10 Jun 2021 • Mathieu Even, Raphaël Berthier, Francis Bach, Nicolas Flammarion, Pierre Gaillard, Hadrien Hendrikx, Laurent Massoulié, Adrien Taylor
We introduce the continuized Nesterov acceleration, a close variant of Nesterov acceleration whose variables are indexed by a continuous time parameter.
1 code implementation • NeurIPS 2021 • Hadi Daneshmand, Amir Joudaki, Francis Bach
This paper underlines a subtle property of batch-normalization (BN): Successive batch normalizations with random linear transformations make hidden representations increasingly orthogonal across layers of a deep neural network.
no code implementations • 31 May 2021 • Alex Nowak-Vila, Alessandro Rudi, Francis Bach
The resulting loss is also a generalization of the binary support vector machine and it is consistent under milder conditions on the discrete loss.
no code implementations • 11 Feb 2021 • Raphaël Berthier, Francis Bach, Nicolas Flammarion, Pierre Gaillard, Adrien Taylor
We introduce the "continuized" Nesterov acceleration, a close variant of Nesterov acceleration whose variables are indexed by a continuous time parameter.
Distributed, Parallel, and Cluster Computing • Optimization and Control
1 code implementation • 4 Feb 2021 • Vivien Cabannes, Francis Bach, Alessandro Rudi
Machine learning approached through supervised learning requires expensive annotation of data.
no code implementations • 1 Feb 2021 • Vivien Cabannes, Alessandro Rudi, Francis Bach
Discrete supervised learning problems such as classification are often tackled by introducing a continuous surrogate problem akin to regression.
no code implementations • 13 Jan 2021 • Adrien Vacher, Boris Muzellec, Alessandro Rudi, Francis Bach, Francois-Xavier Vialard
It is well-known that plug-in statistical estimation of optimal transport suffers from the curse of dimensionality.
Statistics Theory • Optimization and Control • 62G05
no code implementations • 22 Dec 2020 • Alessandro Rudi, Ulysse Marteau-Ferey, Francis Bach
We consider the global minimization of smooth functions based solely on function evaluations.
no code implementations • NeurIPS 2020 • Hadi Daneshmand, Jonas Kohler, Francis Bach, Thomas Hofmann, Aurelien Lucchi
Randomly initialized neural networks are known to become harder to train with increasing depth, unless architectural enhancements like residual connections and batch normalization are used.
no code implementations • NeurIPS 2020 • Quentin Berthet, Mathieu Blondel, Olivier Teboul, Marco Cuturi, Jean-Philippe Vert, Francis Bach
Machine learning pipelines often rely on optimization procedures to make discrete decisions (e.g., sorting, picking closest neighbors, or shortest paths).
no code implementations • 2 Oct 2020 • Robert M. Gower, Mark Schmidt, Francis Bach, Peter Richtarik
Stochastic optimization lies at the heart of machine learning, and its cornerstone is stochastic gradient descent (SGD), a method introduced over 60 years ago.
1 code implementation • ICLR 2021 • Alberto Bietti, Francis Bach
Deep networks are often considered to be more expressive than shallow ones in terms of approximation.
2 code implementations • NeurIPS 2021 • Vivien Cabannes, Loucas Pillaud-Vivien, Francis Bach, Alessandro Rudi
As annotations of data can be scarce in large-scale practical problems, leveraging unlabelled examples is one of the most important aspects of machine learning.
1 code implementation • NeurIPS 2020 • Ulysse Marteau-Ferey, Francis Bach, Alessandro Rudi
The paper is complemented by an experimental evaluation of the model showing its effectiveness in terms of formulation, algorithmic derivation and practical results on the problems of density estimation, regression with heteroscedastic errors, and multiple quantile regression.
1 code implementation • ICML 2020 • Alex Nowak-Vila, Francis Bach, Alessandro Rudi
Max-margin methods for binary classification such as the support vector machine (SVM) have been extended to the structured prediction setting under the name of max-margin Markov networks ($M^3N$), or more generally structural SVMs.
no code implementations • 16 Jun 2020 • Thomas Eboli, Alex Nowak-Vila, Jian Sun, Francis Bach, Jean Ponce, Alessandro Rudi
We present a novel approach to image restoration that leverages ideas from localized structured prediction and non-linear multi-task learning.
no code implementations • NeurIPS 2020 • Raphaël Berthier, Francis Bach, Pierre Gaillard
In the context of statistical supervised learning, the noiseless linear model assumes that there exists a deterministic linear relation $Y = \langle \theta_*, \Phi(U) \rangle$ between the random output $Y$ and the random feature vector $\Phi(U)$, a potentially non-linear transformation of the inputs $U$.
1 code implementation • 10 Jun 2020 • Mathieu Barré, Adrien Taylor, Francis Bach
In this work, we survey notions of inaccuracies that can be used when solving those intermediary optimization problems.
Optimization and Control • Numerical Analysis
2 code implementations • 8 Jun 2020 • Théo Ryffel, Pierre Tholoniat, David Pointcheval, Francis Bach
We evaluate our end-to-end system for private inference between distant servers on standard neural networks such as AlexNet, VGG16 or ResNet18, and for private training on smaller networks like LeNet.
no code implementations • 30 Mar 2020 • Anant Raj, Francis Bach
For accelerated coordinate descent, we obtain a new algorithm that has better convergence properties than existing stochastic gradient methods in the interpolating regime.
no code implementations • 5 Mar 2020 • Alexandre Défossez, Léon Bottou, Francis Bach, Nicolas Usunier
We provide a simple proof of convergence covering both the Adam and Adagrad adaptive optimization algorithms when applied to smooth (possibly non-convex) objective functions with bounded gradients.
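For reference, the standard Adam update analyzed in this line of work, written as a short NumPy sketch on a smooth toy objective with the usual default hyper-parameters; this illustrates the update rule only, not the paper's assumptions or constants.

```python
import numpy as np

grad = lambda x: x                       # gradient of the toy objective f(x) = 0.5 * ||x||^2
x = np.ones(5)
m = np.zeros_like(x)                     # first-moment estimate
v = np.zeros_like(x)                     # second-moment estimate
alpha, beta1, beta2, eps = 1e-2, 0.9, 0.999, 1e-8

for t in range(1, 1001):
    g = grad(x)
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g ** 2
    m_hat = m / (1 - beta1 ** t)         # bias correction
    v_hat = v / (1 - beta2 ** t)
    x = x - alpha * m_hat / (np.sqrt(v_hat) + eps)

print(np.linalg.norm(x))                 # close to the minimizer x* = 0
```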
no code implementations • 3 Mar 2020 • Hadi Daneshmand, Jonas Kohler, Francis Bach, Thomas Hofmann, Aurelien Lucchi
Randomly initialized neural networks are known to become harder to train with increasing depth, unless architectural enhancements like residual connections and batch normalization are used.
2 code implementations • ICML 2020 • Vivien Cabannes, Alessandro Rudi, Francis Bach
Annotating datasets is one of the main costs of supervised learning today.
no code implementations • 22 Feb 2020 • Yifan Sun, Francis Bach
The conditional gradient method (CGM) has been widely used for fast sparse approximation, having a low per iteration computational cost for structured sparse regularizers.
no code implementations • ICML 2020 • Marin Ballu, Quentin Berthet, Francis Bach
We show that this algorithm can be extended to other tasks, including estimation of Wasserstein barycenters.
3 code implementations • 20 Feb 2020 • Quentin Berthet, Mathieu Blondel, Olivier Teboul, Marco Cuturi, Jean-Philippe Vert, Francis Bach
Machine learning pipelines often rely on optimization procedures to make discrete decisions (e.g., sorting, picking closest neighbors, or shortest paths).
1 code implementation • 11 Feb 2020 • Lenaic Chizat, Francis Bach
Neural networks are commonly trained to minimize the logistic (a.k.a. cross-entropy) loss with gradient-based methods.
no code implementations • 7 Feb 2020 • Francis Bach
Richardson extrapolation is a classical technique from numerical analysis that can improve the approximation error of an estimation method by combining linearly several estimates obtained from different values of one of its hyperparameters, without the need to know in details the inner structure of the original estimation method.
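A small worked example of the Richardson extrapolation idea, applied here to a central finite-difference estimate of a derivative: combining the estimates at step sizes $h$ and $h/2$ cancels the leading error term. The function and step size are arbitrary illustrations, not the estimation methods treated in the paper.

```python
import math

f, x, h = math.sin, 1.0, 0.1
D = lambda h: (f(x + h) - f(x - h)) / (2 * h)   # central difference, error O(h^2)

plain = D(h)
extrapolated = (4 * D(h / 2) - D(h)) / 3        # Richardson combination cancels the h^2 term
exact = math.cos(x)

print(abs(plain - exact))                        # error of the plain estimate
print(abs(extrapolated - exact))                 # noticeably smaller error
```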
1 code implementation • NeurIPS 2019 • Théo Ryffel, David Pointcheval, Francis Bach, Edouard Dufour-Sans, Romain Gay
Machine learning on encrypted data has received a lot of attention thanks to recent breakthroughs in homomorphic encryption and secure multi-party computation.
1 code implementation • 27 Nov 2019 • Alexandre Défossez, Nicolas Usunier, Léon Bottou, Francis Bach
Source separation for music is the task of isolating contributions, or stems, from different instruments recorded individually and arranged together to form a song.
Ranked #3 on Multi-task Audio Source Separation on MTASS
no code implementations • NeurIPS 2019 • Ali Kavis, Kfir Y. Levy, Francis Bach, Volkan Cevher
To the best of our knowledge, this is the first adaptive, unified algorithm that achieves the optimal rates in the constrained setting.
1 code implementation • 3 Sep 2019 • Alexandre Défossez, Nicolas Usunier, Léon Bottou, Francis Bach
We study the problem of source separation for music using deep learning with four known sources: drums, bass, vocals and other accompaniments.
1 code implementation • NeurIPS 2019 • Othmane Sebbouh, Nidham Gazagnadou, Samy Jelassi, Francis Bach, Robert M. Gower
Among the very first variance reduced stochastic methods for solving the empirical risk minimization problem was the SVRG method (Johnson & Zhang 2013).
1 code implementation • NeurIPS 2019 • Ulysse Marteau-Ferey, Francis Bach, Alessandro Rudi
In this paper, we study large-scale convex optimization algorithms based on the Newton method applied to regularized generalized self-concordant losses, which include logistic regression and softmax regression.
no code implementations • 20 Jun 2019 • Francis Bach
We consider deterministic Markov decision processes (MDPs) and apply max-plus algebra tools to approximate the value iteration algorithm by a smaller-dimensional iteration based on a representation on dictionaries of value functions.
1 code implementation • NeurIPS 2019 • K. S. Sesh Kumar, Francis Bach, Thomas Pock
We consider the problem of minimizing the sum of submodular set functions assuming minimization oracles of each summand function.
3 code implementations • 24 May 2019 • Theo Ryffel, Edouard Dufour-Sans, Romain Gay, Francis Bach, David Pointcheval
Machine learning on encrypted data has received a lot of attention thanks to recent breakthroughs in homomorphic encryption and secure multi-party computation.
1 code implementation • NeurIPS 2019 • Gauthier Gidel, Francis Bach, Simon Lacoste-Julien
When optimizing over-parameterized models, such as deep neural networks, a large set of parameters can achieve zero training error.
1 code implementation • CVPR 2019 • Huy V. Vo, Francis Bach, Minsu Cho, Kai Han, Yann Lecun, Patrick Perez, Jean Ponce
Learning with complete or partial supervision is powerful but relies on ever-growing human annotation efforts.
Ranked #2 on Single-object colocalization on Object Discovery
1 code implementation • 11 Feb 2019 • Dmitry Babichev, Dmitrii Ostrovskii, Francis Bach
We develop efficient algorithms to train $\ell_1$-regularized linear classifiers with large dimensionality $d$ of the feature space, number of classes $k$, and sample size $n$.
no code implementations • 8 Feb 2019 • Ulysse Marteau-Ferey, Dmitrii Ostrovskii, Francis Bach, Alessandro Rudi
We consider learning methods based on the regularization of a convex empirical risk by a squared Hilbertian norm, a setting that includes linear predictors and non-linear predictors through positive-definite kernels.
no code implementations • 5 Feb 2019 • Alex Nowak-Vila, Francis Bach, Alessandro Rudi
In this work we provide a theoretical framework for structured prediction that generalizes the existing theory of surrogate methods for binary and multiclass classification based on estimating conditional probabilities with smooth convex surrogates (e.g., logistic regression).
no code implementations • 5 Feb 2019 • Francis Bach, Kfir Y. Levy
We consider variational inequalities coming from monotone operators, a setting that includes convex minimization and convex-concave saddle-point problems.
1 code implementation • 3 Feb 2019 • Adrien Taylor, Francis Bach
We use the approach for analyzing deterministic and stochastic first-order methods under different assumptions on the nature of the stochastic noise.
no code implementations • 28 Jan 2019 • Hadrien Hendrikx, Francis Bach, Laurent Massoulié
In this work, we study the problem of minimizing the sum of strongly convex functions split over a network of $n$ nodes.
Optimization and Control • Distributed, Parallel, and Cluster Computing
no code implementations • 24 Jan 2019 • Anastasia Podosinnikova, Amelia Perry, Alexander Wein, Francis Bach, Alexandre d'Aspremont, David Sontag
Moreover, we conjecture that the proposed program recovers a mixing component at the rate $k < p^2/4$ and prove that a mixing component can be recovered with high probability when $k < (2 - \epsilon) p \log p$, provided the original components are sampled uniformly at random on the hypersphere.
1 code implementation • NeurIPS 2019 • Lenaic Chizat, Edouard Oyallon, Francis Bach
In a series of recent theoretical works, it was shown that strongly over-parameterized neural networks trained with gradient-based methods could converge exponentially fast to zero training loss, with their parameters hardly varying.
no code implementations • NeurIPS 2019 • Jason Altschuler, Francis Bach, Alessandro Rudi, Jonathan Niles-Weed
The Sinkhorn "distance", a variant of the Wasserstein distance with entropic regularization, is an increasingly popular tool in machine learning and statistical inference.
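For background, a compact NumPy sketch of the standard Sinkhorn iterations for entropically regularized optimal transport between two discrete distributions; the cost matrix, regularization strength, and iteration count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, eps = 50, 60, 0.1
a = np.full(n, 1.0 / n)                 # source marginal
b = np.full(m, 1.0 / m)                 # target marginal
C = rng.random((n, m))                  # ground cost matrix
K = np.exp(-C / eps)                    # Gibbs kernel

u, v = np.ones(n), np.ones(m)
for _ in range(500):                    # alternate the two scaling updates
    u = a / (K @ v)
    v = b / (K.T @ u)

P = u[:, None] * K * v[None, :]         # transport plan with marginals close to a and b
print((P * C).sum())                    # regularized transport cost
```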
no code implementations • 21 Nov 2018 • Tatiana Shpakova, Francis Bach, Anton Osokin
We consider the structured-output prediction problem through probabilistic approaches and generalize the "perturb-and-MAP" framework to more challenging weighted Hamming losses, which are crucial in applications.
1 code implementation • NeurIPS 2018 • Alexandre Défossez, Neil Zeghidour, Nicolas Usunier, Léon Bottou, Francis Bach
On the generalization task of synthesizing notes for pairs of pitch and instrument not seen during training, SING produces audio with significantly improved perceptual quality compared to a state-of-the-art autoencoder based on WaveNet as measured by a Mean Opinion Score (MOS), and is about 32 times faster for training and 2,500 times faster for inference.
1 code implementation • 16 Oct 2018 • Dmitrii Ostrovskii, Francis Bach
We demonstrate how self-concordance of the loss allows to characterize the critical sample size sufficient to guarantee a chi-square type in-probability bound for the excess risk.
no code implementations • 16 Oct 2018 • Alex Nowak-Vila, Francis Bach, Alessandro Rudi
The problem of devising learning strategies for discrete losses (e.g., multilabeling, ranking) is currently addressed with methods and theoretical analyses ad-hoc for each loss.
no code implementations • 16 Oct 2018 • Sharan Vaswani, Francis Bach, Mark Schmidt
Under this condition, we prove that constant step-size stochastic gradient descent (SGD) with Nesterov acceleration matches the convergence rate of the deterministic accelerated method for both convex and strongly-convex functions.
no code implementations • 5 Oct 2018 • Hadrien Hendrikx, Francis Bach, Laurent Massoulié
Applying ESDACD to quadratic local functions leads to an accelerated randomized gossip algorithm of rate $O( \sqrt{\theta_{\rm gossip}/n})$ where $\theta_{\rm gossip}$ is the rate of the standard randomized gossip.
no code implementations • NeurIPS 2019 • Carlo Ciliberto, Francis Bach, Alessandro Rudi
Key to structured prediction is exploiting the problem structure to simplify the learning process.
1 code implementation • 1 Jun 2018 • Damien Scieur, Edouard Oyallon, Alexandre d'Aspremont, Francis Bach
The Regularized Nonlinear Acceleration (RNA) algorithm is an acceleration method capable of improving the rate of convergence of many optimization schemes such as gradient descend, SAGA or SVRG.
no code implementations • NeurIPS 2018 • Kevin Scaman, Francis Bach, Sébastien Bubeck, Yin Tat Lee, Laurent Massoulié
Under the global regularity assumption, we provide a simple yet efficient algorithm called distributed randomized smoothing (DRS) based on a local smoothing of the objective function, and show that DRS is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension.
Optimization and Control
1 code implementation • 25 May 2018 • Pierre Ablin, Alexandre Gramfort, Jean-François Cardoso, Francis Bach
We derive an online algorithm for the streaming setting, and an incremental algorithm for the finite sum setting, with the following benefits.
no code implementations • NeurIPS 2018 • Loucas Pillaud-Vivien, Alessandro Rudi, Francis Bach
We consider stochastic gradient descent (SGD) for least-squares regression with potentially several passes over the data.
no code implementations • 24 May 2018 • Damien Scieur, Edouard Oyallon, Alexandre d'Aspremont, Francis Bach
Regularized nonlinear acceleration (RNA) estimates the minimum of a function by post-processing iterates from an algorithm such as the gradient method.
no code implementations • NeurIPS 2018 • Lenaic Chizat, Francis Bach
Many tasks in machine learning and signal processing can be solved by minimizing a convex function of a measure.
1 code implementation • 22 May 2018 • Raphaël Berthier, Francis Bach, Pierre Gaillard
We develop a method solving the gossip problem that depends only on the spectral dimension of the network, that is, in the communication network set-up, the dimension of the space in which the agents live.
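For context, a sketch of the classical randomized pairwise-gossip baseline on a ring of agents (the paper's method is different and exploits the spectral dimension of the network); the graph, schedule, and iteration count here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
x = rng.normal(size=n)                 # initial local values of the agents
target = x.mean()                      # the consensus value to reach

for _ in range(20000):
    i = rng.integers(n)
    j = (i + 1) % n                    # a random edge of the ring
    x[i] = x[j] = 0.5 * (x[i] + x[j])  # the two endpoints average their values

print(np.abs(x - target).max())        # all values are now close to the mean
```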
no code implementations • NeurIPS 2018 • Edouard Pauwels, Francis Bach, Jean-Philippe Vert
Statistical leverage scores emerged as a fundamental tool for matrix sketching and column sampling with applications to low rank approximation, regression, random feature learning and quadrature.
no code implementations • 16 Apr 2018 • Dmitry Babichev, Francis Bach
We propose averaging moment parameters instead of natural parameters for constant-step-size stochastic gradient descent.
no code implementations • 26 Feb 2018 • Nilesh Tripuraneni, Nicolas Flammarion, Francis Bach, Michael I. Jordan
We consider the minimization of a function defined on a Riemannian manifold $\mathcal{M}$ accessible only through unbiased estimates of its gradients.
no code implementations • 13 Dec 2017 • Loucas Pillaud-Vivien, Alessandro Rudi, Francis Bach
We consider binary classification problems with positive definite kernels and square loss, and study the convergence rates of stochastic gradient methods.
no code implementations • NeurIPS 2017 • Damien Scieur, Vincent Roulet, Francis Bach, Alexandre d'Aspremont
We show that accelerated optimization methods can be seen as particular instances of multi-step integration schemes from numerical analysis, applied to the gradient flow equation.
no code implementations • NeurIPS 2017 • Damien Scieur, Francis Bach, Alexandre d'Aspremont
Here, we study extrapolation methods in a stochastic setting, where the iterates are produced by either a simple or an accelerated stochastic gradient algorithm.
no code implementations • 6 Nov 2017 • Alexandre Défossez, Francis Bach
We study a new aggregation operator for gradients coming from a mini-batch for stochastic gradient (SG) methods that allows a significant speed-up in the case of sparse optimization problems.
1 code implementation • 20 Oct 2017 • Robert M. Gower, Nicolas Le Roux, Francis Bach
Our goal is to improve variance reducing stochastic methods through better control variates.
no code implementations • 17 Oct 2017 • Marwa El Halabi, Francis Bach, Volkan Cevher
We consider the homogeneous and the non-homogeneous convex relaxations for combinatorial penalty functions defined on support sets.
no code implementations • 5 Sep 2017 • Sashank J. Reddi, Manzil Zaheer, Suvrit Sra, Barnabas Poczos, Francis Bach, Ruslan Salakhutdinov, Alexander J. Smola
A central challenge to using first-order methods for optimizing nonconvex problems is the presence of saddle points.
no code implementations • NeurIPS 2018 • Francis Bach
We consider the minimization of submodular functions subject to ordering constraints.
no code implementations • 20 Jul 2017 • Aymeric Dieuleveut, Alain Durmus, Francis Bach
We consider the minimization of an objective function given access to unbiased estimates of its gradient through stochastic gradient descent (SGD) with constant step-size.
no code implementations • CVPR 2017 • Rafael S. Rezende, Joaquin Zepeda, Jean Ponce, Francis Bach, Patrick Perez
Zepeda and Perez have recently demonstrated the promise of the exemplar SVM (ESVM) as a feature encoder for image retrieval.
1 code implementation • NeurIPS 2017 • Anton Osokin, Francis Bach, Simon Lacoste-Julien
We provide novel theoretical insights on structured prediction in the context of efficient convex surrogate loss minimization with consistency guarantees.
1 code implementation • ICML 2017 • Kevin Scaman, Francis Bach, Sébastien Bubeck, Yin Tat Lee, Laurent Massoulié
For centralized (i.e., master/slave) algorithms, we show that distributing Nesterov's accelerated gradient descent is optimal and achieves a precision $\varepsilon > 0$ in time $O(\sqrt{\kappa_g}(1+\Delta\tau)\ln(1/\varepsilon))$, where $\kappa_g$ is the condition number of the (global) function to optimize, $\Delta$ is the diameter of the network, and $\tau$ (resp.
no code implementations • 21 Feb 2017 • Nicolas Flammarion, Francis Bach
We consider the minimization of composite objective functions composed of the expectation of quadratic functions and an arbitrary convex function.
no code implementations • 19 Oct 2016 • Christophe Dupuy, Francis Bach
We propose a new class of determinantal point processes (DPPs) which can be manipulated for inference and parameter learning in potentially sublinear time in the number of items.
no code implementations • 29 Aug 2016 • Nicolas Flammarion, Balamurugan Palaniappan, Francis Bach
Clustering high-dimensional data often requires some form of dimensionality reduction, where clustered variables are separated from "noise-looking" variables.
no code implementations • NeurIPS 2016 • Tatiana Shpakova, Francis Bach
Then, to learn parameters, given that our approximation of the log-partition function is an expectation (over our own randomization), we use a stochastic subgradient technique to maximize a lower-bound on the log-likelihood.
no code implementations • NeurIPS 2016 • Pascal Germain, Francis Bach, Alexandre Lacoste, Simon Lacoste-Julien
That is, for the negative log-likelihood loss function, we show that the minimization of PAC-Bayesian generalization risk bounds maximizes the Bayesian marginal likelihood.
no code implementations • NeurIPS 2016 • Aude Genevay, Marco Cuturi, Gabriel Peyré, Francis Bach
We instantiate these ideas in three different setups: (i) when comparing a discrete distribution to another, we show that incremental stochastic optimization schemes can beat Sinkhorn's algorithm, the current state-of-the-art finite dimensional OT solver; (ii) when comparing a discrete distribution to a continuous density, a semi-discrete reformulation of the dual program is amenable to averaged stochastic gradient descent, leading to better performance than approximately solving the problem by discretization; (iii) when dealing with two continuous densities, we propose a stochastic gradient descent over a reproducing kernel Hilbert space (RKHS).
no code implementations • 26 May 2016 • Francis Bach, Vianney Perchet
The minimization of convex functions which are only available through partial and noisy information is a key methodological problem in many disciplines.
no code implementations • NeurIPS 2016 • P. Balamurugan, Francis Bach
We consider convex-concave saddle-point problems where the objective functions may be split in many components, and extend recent stochastic variance reduction methods (such as SVRG or SAGA) to provide the first large-scale linearly convergent algorithms for this class of problems which is common in machine learning.
no code implementations • 8 Mar 2016 • Christophe Dupuy, Francis Bach
We first propose a unified treatment of online inference for latent variable models from a non-canonical exponential family, and draw explicit links between several previously proposed frequentist or Bayesian methods.
no code implementations • 29 Feb 2016 • Anastasia Podosinnikova, Francis Bach, Simon Lacoste-Julien
We introduce three novel semi-parametric extensions of probabilistic canonical correlation analysis with identifiability guarantees.
no code implementations • 17 Feb 2016 • Aymeric Dieuleveut, Nicolas Flammarion, Francis Bach
We consider the optimization of a quadratic objective function whose gradients are only accessible through a stochastic oracle that returns the gradient at any given point plus a zero-mean finite variance random error.
no code implementations • NeurIPS 2015 • Rakesh Shivanna, Bibaswan K. Chatterjee, Raman Sankaran, Chiranjib Bhattacharyya, Francis Bach
We propose an alternative PAC-based bound, which does not depend on the VC dimension of the underlying function class, but is related to the famous Lovász $\vartheta$ function.
no code implementations • 2 Nov 2015 • Francis Bach
A key element in many of the algorithms and analyses is the possibility of extending the submodular set-function to a convex function, which opens up tools from convex optimization.
no code implementations • NeurIPS 2015 • Anastasia Podosinnikova, Francis Bach, Simon Lacoste-Julien
We consider moment matching techniques for estimation in Latent Dirichlet Allocation (LDA).
no code implementations • 16 Jun 2015 • Vincent Roulet, Fajwel Fogel, Alexandre d'Aspremont, Francis Bach
We study supervised learning problems using clustering constraints to impose structure on either features or samples, seeking to help both prediction and interpretation.
no code implementations • 5 Jun 2015 • Rémi Lajugie, Piotr Bojanowski, Sylvain Arlot, Francis Bach
In this paper, we address the problem of multi-label classification.
no code implementations • ICCV 2015 • Piotr Bojanowski, Rémi Lajugie, Edouard Grave, Francis Bach, Ivan Laptev, Jean Ponce, Cordelia Schmid
Given vectorial features for both video and text, we propose to cast this task as a temporal assignment problem, with an implicit linear mapping between the two feature modalities.
no code implementations • 7 Apr 2015 • Nicolas Flammarion, Francis Bach
We show that accelerated gradient descent, averaged gradient descent and the heavy-ball method for non-strongly-convex problems may be reformulated as constant parameter second-order difference equation algorithms, where stability of the system is equivalent to convergence at rate $O(1/n^2)$, where $n$ is the number of iterations.
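The heavy-ball recursion mentioned above, written explicitly as a constant-coefficient second-order difference equation and run on a toy quadratic; the step size and momentum are illustrative choices, not the tuned constants from the analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(5, 5))
A = M.T @ M + 0.1 * np.eye(5)              # positive-definite quadratic f(x) = 0.5 x'Ax
grad = lambda x: A @ x

gamma = 1.0 / np.linalg.eigvalsh(A).max()  # illustrative step size
beta = 0.9                                 # illustrative momentum
x_prev = x = np.ones(5)

for _ in range(500):
    # x_{k+1} = x_k - gamma * grad f(x_k) + beta * (x_k - x_{k-1})
    x, x_prev = x - gamma * grad(x) + beta * (x - x_prev), x

print(np.linalg.norm(x))                   # close to the minimizer x* = 0
```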
no code implementations • 10 Mar 2015 • Nino Shervashidze, Francis Bach
Structured sparsity has recently emerged in statistics, machine learning and signal processing as a promising paradigm for learning in high-dimensional settings.
no code implementations • 5 Mar 2015 • K. S. Sesh Kumar, Alvaro Barbero, Stefanie Jegelka, Suvrit Sra, Francis Bach
By exploiting results from convex and submodular theory, we reformulate the quadratic energy minimization problem as a total variation denoising problem, which, when viewed geometrically, enables the use of projection and reflection based convex methods.
no code implementations • 24 Feb 2015 • Francis Bach
We show that kernel-based quadrature rules for computing integrals can be seen as a special case of random feature expansions for positive definite kernels, for a particular decomposition that always exists for such kernels.
no code implementations • 9 Jan 2015 • Simon Lacoste-Julien, Fredrik Lindsten, Francis Bach
Recently, the Frank-Wolfe optimization algorithm was suggested as a procedure to obtain adaptive quadrature rules for integrals of functions in a reproducing kernel Hilbert space (RKHS) with a potentially faster rate of convergence than Monte Carlo integration (and "kernel herding" was shown to be a special case of this procedure).
no code implementations • 30 Dec 2014 • Francis Bach
Moreover, when using sparsity-inducing norms on the input weights, we show that high-dimensional non-linear variable selection may be achieved, without any strong assumption regarding the data and with a total number of variables potentially exponential in the number of observations.
2 code implementations • 4 Dec 2014 • Felipe Yanez, Francis Bach
Non-negative matrix factorization (NMF) approximates a given matrix as a product of two non-negative matrices.
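As background on NMF itself, a sketch of the classical multiplicative updates for the Frobenius loss (Lee and Seung); the paper studies a different algorithm, and the dimensions, rank, and iteration count below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 60, 40, 5
X = rng.random((n, r)) @ rng.random((r, m))       # a non-negative data matrix of rank r
W = rng.random((n, r))
H = rng.random((r, m))
eps = 1e-12                                        # guards against division by zero

for _ in range(500):
    H *= (W.T @ X) / (W.T @ W @ H + eps)
    W *= (X @ H.T) / (W @ H @ H.T + eps)

print(np.linalg.norm(X - W @ H) / np.linalg.norm(X))   # small relative reconstruction error
```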
no code implementations • 29 Nov 2014 • Alexandre Défossez, Francis Bach
Our analysis leads to new insights into stochastic approximation algorithms: (a) it gives a tighter bound on the allowed step-size; (b) the generalization error may be divided into a variance term which decays as $O(1/n)$, independently of the step-size $\gamma$, and a bias term that decays as $O(1/(\gamma^2 n^2))$; (c) when allowing non-uniform sampling, the choice of a good sampling density depends on whether the variance or bias terms dominate.
no code implementations • 12 Nov 2014 • Julien Mairal, Francis Bach, Jean Ponce
In recent years, a large amount of multi-disciplinary research has been conducted on sparse models and their applications.
no code implementations • NeurIPS 2014 • Damien Garreau, Rémi Lajugie, Sylvain Arlot, Francis Bach
The learning examples for this task are time series for which the true alignment is known.
no code implementations • 11 Aug 2014 • Fabian Pedregosa, Francis Bach, Alexandre Gramfort
We will see that, for a family of surrogate loss functions that subsumes support vector ordinal regression and ORBoosting, consistency can be fully characterized by the derivative of a real-valued function at zero, as happens for convex margin-based surrogates in binary classification.
no code implementations • 19 Jul 2014 • Rémi Gribonval, Rodolphe Jenatton, Francis Bach
A popular approach within the signal processing and machine learning communities consists in modelling signals as sparse linear combinations of atoms selected from a learned dictionary.
no code implementations • 4 Jul 2014 • Piotr Bojanowski, Rémi Lajugie, Francis Bach, Ivan Laptev, Jean Ponce, Cordelia Schmid, Josef Sivic
We are given a set of video clips, each one annotated with an ordered list of actions, such as "walk" then "sit" then "answer phone" extracted from, for example, the associated text script.
5 code implementations • NeurIPS 2014 • Aaron Defazio, Francis Bach, Simon Lacoste-Julien
In this work we introduce a new optimisation method called SAGA in the spirit of SAG, SDCA, MISO and SVRG, a set of recently proposed incremental gradient algorithms with fast linear convergence rates.
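A minimal sketch of the SAGA update on a least-squares objective: keep a table of the most recent gradient seen for each example and correct every stochastic gradient by the table average. The step size follows the usual $1/(3L)$ heuristic and the data are synthetic illustrations, not the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d)
y = X @ w_star                                     # noiseless targets

grad_i = lambda w, i: (X[i] @ w - y[i]) * X[i]     # gradient of the i-th loss term

gamma = 1.0 / (3 * (X ** 2).sum(axis=1).max())     # usual 1/(3L) step-size heuristic
w = np.zeros(d)
table = np.array([grad_i(w, i) for i in range(n)]) # table of stored per-example gradients
table_mean = table.mean(axis=0)

for _ in range(20000):
    j = rng.integers(n)
    g_new = grad_i(w, j)
    w -= gamma * (g_new - table[j] + table_mean)   # SAGA direction
    table_mean += (g_new - table[j]) / n           # keep the running average in sync
    table[j] = g_new

print(np.linalg.norm(w - w_star))                  # close to the least-squares solution
```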
no code implementations • 20 Mar 2014 • Matthias Seibert, Martin Kleinsteuber, Rémi Gribonval, Rodolphe Jenatton, Francis Bach
The main goal of this paper is to provide a sample complexity estimate that controls to what extent the empirical average deviates from the cost function.
no code implementations • 14 Dec 2013 • Edouard Grave, Guillaume Obozinski, Francis Bach
Most natural language processing systems based on machine learning are not robust to domain shift.
no code implementations • 13 Dec 2013 • Rémi Gribonval, Rodolphe Jenatton, Francis Bach, Martin Kleinsteuber, Matthias Seibert
Many modern tools in machine learning and signal processing, such as sparse dictionary learning, principal component analysis (PCA), non-negative matrix factorization (NMF), $K$-means clustering, etc., rely on the factorization of a matrix obtained by concatenating high-dimensional vectors from a training collection.
no code implementations • NeurIPS 2013 • Fajwel Fogel, Rodolphe Jenatton, Francis Bach, Alexandre d'Aspremont
Seriation seeks to reconstruct a linear order between variables using unsorted similarity information.
no code implementations • NeurIPS 2013 • Stefanie Jegelka, Francis Bach, Suvrit Sra
A key component of our method is a formulation of the discrete submodular minimization problem as a continuous best approximation problem that is solved through a sequence of reflections, and its solution can be easily thresholded to obtain an optimal discrete solution.
no code implementations • 12 Sep 2013 • Francis Bach
We consider the factorization of a rectangular matrix $X$ into a positive linear combination of rank-one factors of the form $u v^\top$, where $u$ and $v$ belong to certain sets $\mathcal{U}$ and $\mathcal{V}$, that may encode specific structures regarding the factors, such as positivity or sparsity.
no code implementations • 10 Sep 2013 • K. S. Sesh Kumar, Francis Bach
In a graphical model, the entropy of the joint distribution decomposes as a sum of marginal entropies of subsets of variables; moreover, for any distribution, the entropy of the closest distribution factorizing in the graphical model provides an upper bound on the entropy.
2 code implementations • 10 Sep 2013 • Mark Schmidt, Nicolas Le Roux, Francis Bach
Further, in many cases the convergence rate of the new method is also faster than black-box deterministic gradient methods, in terms of the number of gradient evaluations.
no code implementations • NeurIPS 2013 • Francis Bach, Eric Moulines
We consider the stochastic approximation problem where a convex function has to be minimized, given only the knowledge of unbiased estimates of its gradients at certain points, a framework which includes machine learning methods based on the minimization of the empirical risk.
no code implementations • 25 Mar 2013 • Francis Bach
In this paper, we consider supervised learning problems such as logistic regression and study the stochastic gradient method with averaging, in the usual stochastic approximation setting where observations are used only once.
no code implementations • 27 Nov 2012 • Francis Bach
Given a convex optimization problem and its dual, there are many possible first-order algorithms.
no code implementations • 9 Aug 2012 • Francis Bach
We consider supervised learning problems within the positive-definite kernel framework, such as kernel ridge regression, kernel logistic regression or the support vector machine.
no code implementations • 28 Nov 2011 • Francis Bach
Submodular functions are relevant to machine learning for at least two reasons: (1) some problems may be expressed directly as the optimization of submodular functions and (2) the Lovász extension of submodular functions provides a useful set of regularization functions for supervised and unsupervised learning.
no code implementations • 27 Sep 2010 • Julien Mairal, Francis Bach, Jean Ponce
Modeling data with linear combinations of a few elements from a learned dictionary has been the focus of much recent research in machine learning, neuroscience and signal processing.
no code implementations • 8 Sep 2009 • Rodolphe Jenatton, Guillaume Obozinski, Francis Bach
We present an extension of sparse PCA, or sparse dictionary learning, where the sparsity patterns of all dictionary elements are structured and constrained to belong to a prespecified set of shapes.
no code implementations • 9 Sep 2008 • Francis Bach
For supervised and unsupervised learning, positive definite kernels allow to use large and potentially infinite dimensional feature spaces with a computational cost that only depends on the number of observations.
2 code implementations • 8 Apr 2008 • Francis Bach
For various decays of the regularization parameter, we compute asymptotic equivalents of the probability of correct model selection (i.e., variable selection).