no code implementations • 31 May 2023 • Saeed Saremi, Ji Won Park, Francis Bach

Markov chain Monte Carlo (MCMC) is a class of general-purpose algorithms for sampling from unnormalized densities.
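
As background for this entry (not the paper's algorithm): sampling from an unnormalized density is exactly what random-walk Metropolis, the simplest MCMC method, does. A minimal stdlib-only sketch; the target, step size, and tolerances are illustrative choices, not from the paper:

```python
import math
import random

def metropolis_sample(log_unnorm, x0, n_steps, step=0.5, seed=0):
    """Random-walk Metropolis: sample from exp(log_unnorm), a density
    known only up to its normalizing constant."""
    rng = random.Random(seed)
    x, logp = x0, log_unnorm(x0)
    samples = []
    for _ in range(n_steps):
        y = x + rng.gauss(0.0, step)              # symmetric Gaussian proposal
        logq = log_unnorm(y)
        if math.log(rng.random()) < logq - logp:  # accept w.p. min(1, p(y)/p(x))
            x, logp = y, logq
        samples.append(x)
    return samples

# Target: standard normal, unnormalized (no 1/sqrt(2*pi) factor needed).
draws = metropolis_sample(lambda x: -0.5 * x * x, 0.0, 20000)
mean = sum(draws) / len(draws)
```

Because only differences of `log_unnorm` enter the accept/reject step, the normalizing constant never needs to be computed.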

no code implementations • 28 May 2023 • Amir Joudaki, Hadi Daneshmand, Francis Bach

To bridge this gap, we provide a proof that layer normalization, in conjunction with activation layers, biases the Gram matrix of a multilayer perceptron towards isometry at an exponential rate with depth at initialization.

no code implementations • 21 Mar 2023 • Saeed Saremi, Rupesh Kumar Srivastava, Francis Bach

We consider the problem of generative modeling based on smoothing an unknown density of interest in $\mathbb{R}^d$ using factorial kernels with $M$ independent Gaussian channels with equal noise levels introduced by Saremi and Srivastava (2022).

no code implementations • 16 Mar 2023 • Belinda Tzen, Anant Raj, Maxim Raginsky, Francis Bach

Mirror descent, introduced by Nemirovski and Yudin in the 1970s, is a primal-dual convex optimization method that can be tailored to the geometry of the optimization problem at hand through the choice of a strongly convex potential function.
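
The tailoring to geometry through a strongly convex potential can be made concrete: with the negative-entropy potential on the probability simplex, mirror descent becomes the multiplicative "exponentiated gradient" update. A stdlib-only sketch for illustration (the objective and step size are arbitrary, not from the paper):

```python
import math

def mirror_descent_simplex(grad, x0, steps, eta):
    """Mirror descent with the negative-entropy potential: on the simplex
    the mirror step is a multiplicative update followed by renormalization."""
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        x = [xi * math.exp(-eta * gi) for xi, gi in zip(x, g)]
        s = sum(x)
        x = [xi / s for xi in x]
    return x

# Minimize f(x) = <c, x> over the simplex; the optimum puts all mass on argmin c.
c = [3.0, 1.0, 2.0]
x = mirror_descent_simplex(lambda x: c, [1/3, 1/3, 1/3], steps=200, eta=0.5)
```

Swapping the potential (e.g., for the squared Euclidean norm) recovers projected gradient descent, which is the sense in which the method adapts to the problem's geometry.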

1 code implementation • 6 Mar 2023 • David Holzmüller, Francis Bach

Specifically, for $m$-times differentiable functions in $d$ dimensions, the optimal rate for algorithms with $n$ function evaluations is known to be $O(n^{-m/d})$, where the constant can potentially depend on $m, d$ and the function to be optimized.

no code implementations • 2 Mar 2023 • Francis Bach

We consider linear regression problems with a varying number of random projections, where we provably exhibit a double descent curve for a fixed prediction problem, with a high-dimensional analysis based on random matrix theory.

1 code implementation • 13 Feb 2023 • Loucas Pillaud-Vivien, Francis Bach

Spectral clustering and diffusion maps are celebrated dimensionality reduction algorithms built on eigen-elements related to the diffusive structure of the data.

no code implementations • 7 Feb 2023 • Blake Woodworth, Konstantin Mishchenko, Francis Bach

We present an algorithm for minimizing an objective with hard-to-compute gradients by using a related, easier-to-access function as a proxy.

no code implementations • 7 Feb 2023 • Francis Bach

We consider multivariate splines and show that they have a random feature expansion as infinitely wide neural networks with one hidden layer and a homogeneous activation function which is the power of the rectified linear unit.

1 code implementation • 10 Nov 2022 • Lawrence Stewart, Francis Bach, Quentin Berthet, Jean-Philippe Vert

Neural networks can be trained to solve regression problems by using gradient-based methods to minimize the square loss.

no code implementations • 19 Sep 2022 • Aurelien Lucchi, Frank Proske, Antonio Orvieto, Francis Bach, Hans Kersting

This generalizes processes based on Brownian motion, such as the Ornstein-Uhlenbeck process.
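
For reference, the Ornstein-Uhlenbeck process mentioned here is easy to simulate with an Euler-Maruyama discretization; a short illustrative sketch (parameter values are arbitrary, not from the paper):

```python
import random

def simulate_ou(theta, mu, sigma, x0, dt, n_steps, seed=0):
    """Euler-Maruyama discretization of the Ornstein-Uhlenbeck SDE
    dX_t = theta * (mu - X_t) dt + sigma dW_t."""
    rng = random.Random(seed)
    x, path = x0, [x0]
    for _ in range(n_steps):
        # Drift pulls toward mu; the diffusion term scales with sqrt(dt).
        x = x + theta * (mu - x) * dt + sigma * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path

# Mean reversion: starting at 3, the path settles around mu = 0.
path = simulate_ou(theta=2.0, mu=0.0, sigma=0.5, x0=3.0, dt=0.01, n_steps=2000)
```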

no code implementations • 27 Jun 2022 • Francis Bach

In order to achieve this, we derive a sequence of convex relaxations for computing these divergences from non-centered covariance matrices associated with a given feature vector: starting from the typically non-tractable optimal lower-bound, we consider an additional relaxation based on "sums-of-squares", which is now computable in polynomial time as a semidefinite program, as well as further computationally more efficient relaxations based on spectral information divergences from quantum information theory.

1 code implementation • 15 Jun 2022 • Konstantin Mishchenko, Francis Bach, Mathieu Even, Blake Woodworth

The existing analysis of asynchronous stochastic gradient descent (SGD) degrades dramatically when any delay is large, giving the impression that performance depends primarily on the delay.

1 code implementation • 9 Jun 2022 • Antonio Orvieto, Anant Raj, Hans Kersting, Francis Bach

Injecting noise within gradient descent has several desirable features, such as smoothing and regularizing properties.

1 code implementation • 31 May 2022 • Marc Lambert, Sinho Chewi, Francis Bach, Silvère Bonnabel, Philippe Rigollet

Along with Markov chain Monte Carlo (MCMC) methods, variational inference (VI) has emerged as a central computational approach to large-scale Bayesian inference.

1 code implementation • 26 May 2022 • Vivien Cabannes, Francis Bach, Vianney Perchet, Alessandro Rudi

The workhorse of machine learning is stochastic gradient descent.
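
Since this entry leads with stochastic gradient descent, a minimal stdlib-only sketch of the plain method on a one-dimensional least-squares problem (data and step size are illustrative):

```python
import random

def sgd_least_squares(data, w0, eta, epochs, seed=0):
    """Plain SGD on f(w) = (1/n) * sum_i (w * x_i - y_i)^2 / 2,
    using the gradient of one randomly drawn sample per step."""
    rng = random.Random(seed)
    w = w0
    for _ in range(epochs * len(data)):
        x, y = data[rng.randrange(len(data))]
        w -= eta * (w * x - y) * x   # stochastic gradient for one sample
    return w

# Noiseless data from y = 2x: SGD recovers w close to 2.
data = [(x / 10, 2 * x / 10) for x in range(1, 11)]
w = sgd_least_squares(data, w0=0.0, eta=0.1, epochs=50)
```

In this interpolating (noiseless) regime a constant step size converges to the exact solution; with noisy labels one would decay `eta` or average the iterates.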

1 code implementation • 25 May 2022 • Benjamin Dubois-Taine, Francis Bach, Quentin Berthet, Adrien Taylor

We consider the problem of minimizing the sum of two convex functions.

no code implementations • 25 May 2022 • Amir Joudaki, Hadi Daneshmand, Francis Bach

Mean field theory is widely used in the theoretical studies of neural networks.

1 code implementation • 16 Apr 2022 • Hadi Daneshmand, Francis Bach

Mean field theory has provided theoretical insights into various algorithms by letting the problem size tend to infinity.

no code implementations • 11 Apr 2022 • Blake Woodworth, Francis Bach, Alessandro Rudi

We consider potentially non-convex optimization problems, for which optimal rates of approximation depend on the dimension of the parameter space and the smoothness of the function to be optimized.

no code implementations • 17 Feb 2022 • Francis Bach

We consider the analysis of probability distributions through their associated covariance operators from reproducing kernel Hilbert spaces.

no code implementations • 16 Feb 2022 • Ziad Kobeissi, Francis Bach

We consider the problem of policy evaluation for continuous-time processes using the temporal-difference learning algorithm.

no code implementations • 6 Feb 2022 • Antonio Orvieto, Hans Kersting, Frank Proske, Francis Bach, Aurelien Lucchi

Injecting artificial noise into gradient descent (GD) is commonly employed to improve the performance of machine learning models.

no code implementations • 28 Jan 2022 • Théo Ryffel, Francis Bach, David Pointcheval

We analyse the privacy leakage of noisy stochastic gradient descent by modeling Rényi divergence dynamics with Langevin diffusions.

no code implementations • 3 Dec 2021 • Boris Muzellec, Adrien Vacher, Francis Bach, François-Xavier Vialard, Alessandro Rudi

It was recently shown that under smoothness conditions, the squared Wasserstein distance between two distributions could be efficiently computed with appealing statistical error upper bounds.

no code implementations • NeurIPS 2021 • Mathieu Even, Raphaël Berthier, Francis Bach, Nicolas Flammarion, Hadrien Hendrikx, Pierre Gaillard, Laurent Massoulié, Adrien Taylor

We introduce the "continuized" Nesterov acceleration, a close variant of Nesterov acceleration whose variables are indexed by a continuous time parameter.

1 code implementation • 22 Nov 2021 • Boris Muzellec, Francis Bach, Alessandro Rudi

Shape constraints such as positive semi-definiteness (PSD) for matrices or convexity for functions play a central role in many applications in machine learning and sciences, including metric learning, optimal transport, and economics.

no code implementations • 29 Oct 2021 • Anant Raj, Francis Bach

Uncertainty sampling in active learning is heavily used in practice to reduce the annotation cost.

no code implementations • 20 Oct 2021 • Ulysse Marteau-Ferey, Francis Bach, Alessandro Rudi

In many areas of applied statistics and machine learning, generating an arbitrary number of independent and identically distributed (i.i.d.)

1 code implementation • 15 Oct 2021 • Francis Bach, Lenaïc Chizat

Many supervised machine learning methods are naturally cast as optimization problems.

no code implementations • 2 Jul 2021 • Yifan Sun, Francis Bach

We couple this with a screening rule which is safe in the convex case, converging to the true support at a rate $O(1/(\delta^2))$ where $\delta \geq 0$ measures how close the problem is to degeneracy.

1 code implementation • 18 Jun 2021 • Boris Muzellec, Francis Bach, Alessandro Rudi

Kernel mean embeddings are a popular tool that consists in representing probability measures by their infinite-dimensional mean embeddings in a reproducing kernel Hilbert space.

1 code implementation • 10 Jun 2021 • Mathieu Even, Raphaël Berthier, Francis Bach, Nicolas Flammarion, Pierre Gaillard, Hadrien Hendrikx, Laurent Massoulié, Adrien Taylor

We introduce the continuized Nesterov acceleration, a close variant of Nesterov acceleration whose variables are indexed by a continuous time parameter.

1 code implementation • NeurIPS 2021 • Hadi Daneshmand, Amir Joudaki, Francis Bach

This paper underlines a subtle property of batch-normalization (BN): Successive batch normalizations with random linear transformations make hidden representations increasingly orthogonal across layers of a deep neural network.

no code implementations • 31 May 2021 • Alex Nowak-Vila, Alessandro Rudi, Francis Bach

The resulting loss is also a generalization of the binary support vector machine and it is consistent under milder conditions on the discrete loss.

no code implementations • 11 Feb 2021 • Raphaël Berthier, Francis Bach, Nicolas Flammarion, Pierre Gaillard, Adrien Taylor

We introduce the "continuized" Nesterov acceleration, a close variant of Nesterov acceleration whose variables are indexed by a continuous time parameter.

Distributed, Parallel, and Cluster Computing • Optimization and Control

1 code implementation • 4 Feb 2021 • Vivien Cabannes, Francis Bach, Alessandro Rudi

Machine learning approached through supervised learning requires expensive annotation of data.

no code implementations • 1 Feb 2021 • Vivien Cabannes, Alessandro Rudi, Francis Bach

Discrete supervised learning problems such as classification are often tackled by introducing a continuous surrogate problem akin to regression.

no code implementations • 13 Jan 2021 • Adrien Vacher, Boris Muzellec, Alessandro Rudi, Francis Bach, Francois-Xavier Vialard

It is well-known that plug-in statistical estimation of optimal transport suffers from the curse of dimensionality.

Statistics Theory • Optimization and Control • 62G05

no code implementations • 22 Dec 2020 • Alessandro Rudi, Ulysse Marteau-Ferey, Francis Bach

We consider the global minimization of smooth functions based solely on function evaluations.

no code implementations • NeurIPS 2020 • Quentin Berthet, Mathieu Blondel, Olivier Teboul, Marco Cuturi, Jean-Philippe Vert, Francis Bach

Machine learning pipelines often rely on optimization procedures to make discrete decisions (e.g., sorting, picking closest neighbors, or shortest paths).

no code implementations • NeurIPS 2020 • Hadi Daneshmand, Jonas Kohler, Francis Bach, Thomas Hofmann, Aurelien Lucchi

Randomly initialized neural networks are known to become harder to train with increasing depth, unless architectural enhancements like residual connections and batch normalization are used.

no code implementations • 2 Oct 2020 • Robert M. Gower, Mark Schmidt, Francis Bach, Peter Richtarik

Stochastic optimization lies at the heart of machine learning, and its cornerstone is stochastic gradient descent (SGD), a method introduced over 60 years ago.

1 code implementation • ICLR 2021 • Alberto Bietti, Francis Bach

Deep networks are often considered to be more expressive than shallow ones in terms of approximation.

2 code implementations • NeurIPS 2021 • Vivien Cabannes, Loucas Pillaud-Vivien, Francis Bach, Alessandro Rudi

As annotations of data can be scarce in large-scale practical problems, leveraging unlabelled examples is one of the most important aspects of machine learning.

no code implementations • NeurIPS 2020 • Ulysse Marteau-Ferey, Francis Bach, Alessandro Rudi

The paper is complemented by an experimental evaluation of the model showing its effectiveness in terms of formulation, algorithmic derivation and practical results on the problems of density estimation, regression with heteroscedastic errors, and multiple quantile regression.

1 code implementation • ICML 2020 • Alex Nowak-Vila, Francis Bach, Alessandro Rudi

Max-margin methods for binary classification such as the support vector machine (SVM) have been extended to the structured prediction setting under the name of max-margin Markov networks ($M^3N$), or more generally structural SVMs.

no code implementations • 16 Jun 2020 • Thomas Eboli, Alex Nowak-Vila, Jian Sun, Francis Bach, Jean Ponce, Alessandro Rudi

We present a novel approach to image restoration that leverages ideas from localized structured prediction and non-linear multi-task learning.

no code implementations • NeurIPS 2020 • Raphaël Berthier, Francis Bach, Pierre Gaillard

In the context of statistical supervised learning, the noiseless linear model assumes that there exists a deterministic linear relation $Y = \langle \theta_*, \Phi(U) \rangle$ between the random output $Y$ and the random feature vector $\Phi(U)$, a potentially non-linear transformation of the inputs $U$.

1 code implementation • 10 Jun 2020 • Mathieu Barré, Adrien Taylor, Francis Bach

In this work, we survey notions of inaccuracies that can be used when solving those intermediary optimization problems.

Optimization and Control • Numerical Analysis

1 code implementation • 8 Jun 2020 • Théo Ryffel, Pierre Tholoniat, David Pointcheval, Francis Bach

We evaluate our end-to-end system for private inference between distant servers on standard neural networks such as AlexNet, VGG16 or ResNet18, and for private training on smaller networks like LeNet.

no code implementations • 30 Mar 2020 • Anant Raj, Francis Bach

For accelerated coordinate descent, we obtain a new algorithm that has better convergence properties than existing stochastic gradient methods in the interpolating regime.

no code implementations • 5 Mar 2020 • Alexandre Défossez, Léon Bottou, Francis Bach, Nicolas Usunier

We provide a simple proof of convergence covering both the Adam and Adagrad adaptive optimization algorithms when applied to smooth (possibly non-convex) objective functions with bounded gradients.
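
The Adam update whose convergence is analyzed here is standard; for reference, a scalar-objective sketch of the textbook update with bias correction (the test function and step size are toy choices, not from the paper):

```python
import math

def adam(grad, x0, eta=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Adam on a scalar objective: exponential moving averages of the
    gradient (m) and squared gradient (v), with bias correction."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g          # first-moment average
        v = beta2 * v + (1 - beta2) * g * g      # second-moment average
        m_hat = m / (1 - beta1 ** t)             # bias corrections
        v_hat = v / (1 - beta2 ** t)
        x -= eta * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Smooth objective f(x) = (x - 3)^2, with gradient 2(x - 3).
x = adam(lambda x: 2 * (x - 3), x0=0.0, steps=500)
```

Setting `beta1 = 0` turns this into Adagrad-style per-step normalization, which is why the two algorithms admit a unified analysis.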

no code implementations • 3 Mar 2020 • Hadi Daneshmand, Jonas Kohler, Francis Bach, Thomas Hofmann, Aurelien Lucchi

Randomly initialized neural networks are known to become harder to train with increasing depth, unless architectural enhancements like residual connections and batch normalization are used.

2 code implementations • ICML 2020 • Vivien Cabannes, Alessandro Rudi, Francis Bach

Annotating datasets is one of the main costs in today's supervised learning.

no code implementations • 22 Feb 2020 • Yifan Sun, Francis Bach

The conditional gradient method (CGM) has been widely used for fast sparse approximation, having a low per iteration computational cost for structured sparse regularizers.

2 code implementations • 20 Feb 2020 • Quentin Berthet, Mathieu Blondel, Olivier Teboul, Marco Cuturi, Jean-Philippe Vert, Francis Bach

Machine learning pipelines often rely on optimization procedures to make discrete decisions (e.g., sorting, picking closest neighbors, or shortest paths).

no code implementations • ICML 2020 • Marin Ballu, Quentin Berthet, Francis Bach

We show that this algorithm can be extended to other tasks, including estimation of Wasserstein barycenters.

1 code implementation • 11 Feb 2020 • Lenaic Chizat, Francis Bach

Neural networks trained to minimize the logistic (a.k.a. cross-entropy) loss.

no code implementations • 7 Feb 2020 • Francis Bach

Richardson extrapolation is a classical technique from numerical analysis that can improve the approximation error of an estimation method by combining linearly several estimates obtained from different values of one of its hyperparameters, without the need to know in details the inner structure of the original estimation method.
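
The classical technique described here has a two-line core: if an estimator has error expanding as $c\,h^p + o(h^p)$, combining evaluations at $h$ and $h/2$ cancels the leading term. A small stdlib-only sketch (the forward-difference example is illustrative, not from the paper):

```python
import math

def richardson(f_of_h, h, order):
    """One Richardson step: combine estimates at h and h/2 to cancel the
    leading O(h^order) error term, R = (2^p * f(h/2) - f(h)) / (2^p - 1)."""
    a, b = f_of_h(h), f_of_h(h / 2)
    k = 2 ** order
    return (k * b - a) / (k - 1)

# Forward-difference estimate of exp'(0) = 1 has O(h) error;
# one Richardson step with order=1 upgrades it to O(h^2).
fd = lambda h: (math.exp(h) - 1.0) / h
plain = fd(1e-2)
extrapolated = richardson(fd, 1e-2, order=1)
```

Note that only evaluations of the estimator at two hyperparameter values are needed, matching the "without knowing the inner structure" point in the snippet above.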

1 code implementation • NeurIPS 2019 • Théo Ryffel, David Pointcheval, Francis Bach, Edouard Dufour-Sans, Romain Gay

Machine learning on encrypted data has received a lot of attention thanks to recent breakthroughs in homomorphic encryption and secure multi-party computation.

1 code implementation • 27 Nov 2019 • Alexandre Défossez, Nicolas Usunier, Léon Bottou, Francis Bach

Source separation for music is the task of isolating contributions, or stems, from different instruments recorded individually and arranged together to form a song.

Ranked #3 on Multi-task Audio Source Separation on MTASS

no code implementations • NeurIPS 2019 • Ali Kavis, Kfir. Y. Levy, Francis Bach, Volkan Cevher

To the best of our knowledge, this is the first adaptive, unified algorithm that achieves the optimal rates in the constrained setting.

1 code implementation • 3 Sep 2019 • Alexandre Défossez, Nicolas Usunier, Léon Bottou, Francis Bach

We study the problem of source separation for music using deep learning with four known sources: drums, bass, vocals and other accompaniments.

1 code implementation • NeurIPS 2019 • Othmane Sebbouh, Nidham Gazagnadou, Samy Jelassi, Francis Bach, Robert M. Gower

Among the very first variance-reduced stochastic methods for solving the empirical risk minimization problem was the SVRG method (Johnson & Zhang, 2013).

1 code implementation • NeurIPS 2019 • Ulysse Marteau-Ferey, Francis Bach, Alessandro Rudi

In this paper, we study large-scale convex optimization algorithms based on the Newton method applied to regularized generalized self-concordant losses, which include logistic regression and softmax regression.

no code implementations • 20 Jun 2019 • Francis Bach

We consider deterministic Markov decision processes (MDPs) and apply max-plus algebra tools to approximate the value iteration algorithm by a smaller-dimensional iteration based on a representation on dictionaries of value functions.

1 code implementation • NeurIPS 2019 • K. S. Sesh Kumar, Francis Bach, Thomas Pock

We consider the problem of minimizing the sum of submodular set functions assuming minimization oracles of each summand function.

3 code implementations • 24 May 2019 • Theo Ryffel, Edouard Dufour-Sans, Romain Gay, Francis Bach, David Pointcheval

Machine learning on encrypted data has received a lot of attention thanks to recent breakthroughs in homomorphic encryption and secure multi-party computation.

1 code implementation • NeurIPS 2019 • Gauthier Gidel, Francis Bach, Simon Lacoste-Julien

When optimizing over-parameterized models, such as deep neural networks, a large set of parameters can achieve zero training error.

1 code implementation • CVPR 2019 • Huy V. Vo, Francis Bach, Minsu Cho, Kai Han, Yann Lecun, Patrick Perez, Jean Ponce

Learning with complete or partial supervision is powerful but relies on ever-growing human annotation efforts.

Ranked #2 on Single-object colocalization on Object Discovery

1 code implementation • 11 Feb 2019 • Dmitry Babichev, Dmitrii Ostrovskii, Francis Bach

We develop efficient algorithms to train $\ell_1$-regularized linear classifiers with large dimensionality $d$ of the feature space, number of classes $k$, and sample size $n$.

no code implementations • 8 Feb 2019 • Ulysse Marteau-Ferey, Dmitrii Ostrovskii, Francis Bach, Alessandro Rudi

We consider learning methods based on the regularization of a convex empirical risk by a squared Hilbertian norm, a setting that includes linear predictors and non-linear predictors through positive-definite kernels.

no code implementations • 5 Feb 2019 • Alex Nowak-Vila, Francis Bach, Alessandro Rudi

In this work we provide a theoretical framework for structured prediction that generalizes the existing theory of surrogate methods for binary and multiclass classification based on estimating conditional probabilities with smooth convex surrogates (e.g., logistic regression).

no code implementations • 5 Feb 2019 • Francis Bach, Kfir. Y. Levy

We consider variational inequalities coming from monotone operators, a setting that includes convex minimization and convex-concave saddle-point problems.

1 code implementation • 3 Feb 2019 • Adrien Taylor, Francis Bach

We use the approach for analyzing deterministic and stochastic first-order methods under different assumptions on the nature of the stochastic noise.

no code implementations • 28 Jan 2019 • Hadrien Hendrikx, Francis Bach, Laurent Massoulié

In this work, we study the problem of minimizing the sum of strongly convex functions split over a network of $n$ nodes.

Optimization and Control • Distributed, Parallel, and Cluster Computing

no code implementations • 24 Jan 2019 • Anastasia Podosinnikova, Amelia Perry, Alexander Wein, Francis Bach, Alexandre d'Aspremont, David Sontag

Moreover, we conjecture that the proposed program recovers a mixing component at the rate k < p^2/4 and prove that a mixing component can be recovered with high probability when k < (2 - epsilon) p log p when the original components are sampled uniformly at random on the hyper sphere.

1 code implementation • NeurIPS 2019 • Lenaic Chizat, Edouard Oyallon, Francis Bach

In a series of recent theoretical works, it was shown that strongly over-parameterized neural networks trained with gradient-based methods could converge exponentially fast to zero training loss, with their parameters hardly varying.

no code implementations • NeurIPS 2019 • Jason Altschuler, Francis Bach, Alessandro Rudi, Jonathan Niles-Weed

The Sinkhorn "distance", a variant of the Wasserstein distance with entropic regularization, is an increasingly popular tool in machine learning and statistical inference.

no code implementations • 21 Nov 2018 • Tatiana Shpakova, Francis Bach, Anton Osokin

We consider the structured-output prediction problem through probabilistic approaches and generalize the "perturb-and-MAP" framework to more challenging weighted Hamming losses, which are crucial in applications.

1 code implementation • NeurIPS 2018 • Alexandre Défossez, Neil Zeghidour, Nicolas Usunier, Léon Bottou, Francis Bach

On the generalization task of synthesizing notes for pairs of pitch and instrument not seen during training, SING produces audio with significantly improved perceptual quality compared to a state-of-the-art autoencoder based on WaveNet as measured by a Mean Opinion Score (MOS), and is about 32 times faster for training and 2,500 times faster for inference.

no code implementations • 16 Oct 2018 • Sharan Vaswani, Francis Bach, Mark Schmidt

Under this condition, we prove that constant step-size stochastic gradient descent (SGD) with Nesterov acceleration matches the convergence rate of the deterministic accelerated method for both convex and strongly-convex functions.

no code implementations • 16 Oct 2018 • Alex Nowak-Vila, Francis Bach, Alessandro Rudi

The problem of devising learning strategies for discrete losses (e. g., multilabeling, ranking) is currently addressed with methods and theoretical analyses ad-hoc for each loss.

1 code implementation • 16 Oct 2018 • Dmitrii Ostrovskii, Francis Bach

We demonstrate how self-concordance of the loss allows to characterize the critical sample size sufficient to guarantee a chi-square type in-probability bound for the excess risk.

no code implementations • 5 Oct 2018 • Hadrien Hendrikx, Francis Bach, Laurent Massoulié

Applying $ESDACD$ to quadratic local functions leads to an accelerated randomized gossip algorithm of rate $O( \sqrt{\theta_{\rm gossip}/n})$ where $\theta_{\rm gossip}$ is the rate of the standard randomized gossip.

no code implementations • NeurIPS 2019 • Carlo Ciliberto, Francis Bach, Alessandro Rudi

Key to structured prediction is exploiting the problem structure to simplify the learning process.

1 code implementation • 1 Jun 2018 • Damien Scieur, Edouard Oyallon, Alexandre d'Aspremont, Francis Bach

The Regularized Nonlinear Acceleration (RNA) algorithm is an acceleration method capable of improving the rate of convergence of many optimization schemes such as gradient descent, SAGA or SVRG.

no code implementations • NeurIPS 2018 • Kevin Scaman, Francis Bach, Sébastien Bubeck, Yin Tat Lee, Laurent Massoulié

Under the global regularity assumption, we provide a simple yet efficient algorithm called distributed randomized smoothing (DRS) based on a local smoothing of the objective function, and show that DRS is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension.

Optimization and Control

no code implementations • NeurIPS 2018 • Loucas Pillaud-Vivien, Alessandro Rudi, Francis Bach

We consider stochastic gradient descent (SGD) for least-squares regression with potentially several passes over the data.

1 code implementation • 25 May 2018 • Pierre Ablin, Alexandre Gramfort, Jean-François Cardoso, Francis Bach

We derive an online algorithm for the streaming setting, and an incremental algorithm for the finite sum setting, with the following benefits.

no code implementations • 24 May 2018 • Damien Scieur, Edouard Oyallon, Alexandre d'Aspremont, Francis Bach

Regularized nonlinear acceleration (RNA) estimates the minimum of a function by post-processing iterates from an algorithm such as the gradient method.

no code implementations • NeurIPS 2018 • Lenaic Chizat, Francis Bach

Many tasks in machine learning and signal processing can be solved by minimizing a convex function of a measure.

1 code implementation • 22 May 2018 • Raphaël Berthier, Francis Bach, Pierre Gaillard

We develop a method solving the gossip problem that depends only on the spectral dimension of the network, that is, in the communication network set-up, the dimension of the space in which the agents live.
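
The gossip problem itself is easy to state in code: neighboring agents repeatedly average their values until all hold the global mean. A stdlib-only sketch using a deterministic sweep over edges (this is the baseline problem, not the paper's spectral-dimension method):

```python
def gossip_average(values, edges, n_rounds):
    """Pairwise gossip: in each round, every listed pair of neighbors
    replaces both values by their average. The global mean is preserved
    at every step, and all nodes converge to it."""
    x = list(values)
    for _ in range(n_rounds):
        for i, j in edges:
            avg = (x[i] + x[j]) / 2.0
            x[i] = x[j] = avg
    return x

# Path graph on 4 nodes; everyone converges to mean(1, 2, 3, 4) = 2.5.
x = gossip_average([1.0, 2.0, 3.0, 4.0],
                   edges=[(0, 1), (1, 2), (2, 3)], n_rounds=50)
```

The convergence speed of such schemes depends on the network's connectivity, which is what the spectral-dimension viewpoint in the snippet above refines.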

no code implementations • NeurIPS 2018 • Edouard Pauwels, Francis Bach, Jean-Philippe Vert

Statistical leverage scores emerged as a fundamental tool for matrix sketching and column sampling with applications to low rank approximation, regression, random feature learning and quadrature.

no code implementations • 16 Apr 2018 • Dmitry Babichev, Francis Bach

We propose averaging moment parameters instead of natural parameters for constant-step-size stochastic gradient descent.

no code implementations • 26 Feb 2018 • Nilesh Tripuraneni, Nicolas Flammarion, Francis Bach, Michael. I. Jordan

We consider the minimization of a function defined on a Riemannian manifold $\mathcal{M}$ accessible only through unbiased estimates of its gradients.

no code implementations • 13 Dec 2017 • Loucas Pillaud-Vivien, Alessandro Rudi, Francis Bach

We consider binary classification problems with positive definite kernels and square loss, and study the convergence rates of stochastic gradient methods.

no code implementations • NeurIPS 2017 • Damien Scieur, Francis Bach, Alexandre d'Aspremont

Here, we study extrapolation methods in a stochastic setting, where the iterates are produced by either a simple or an accelerated stochastic gradient algorithm.

no code implementations • NeurIPS 2017 • Damien Scieur, Vincent Roulet, Francis Bach, Alexandre d'Aspremont

We show that accelerated optimization methods can be seen as particular instances of multi-step integration schemes from numerical analysis, applied to the gradient flow equation.

no code implementations • 6 Nov 2017 • Alexandre Défossez, Francis Bach

We study a new aggregation operator for gradients coming from a mini-batch for stochastic gradient (SG) methods that allows a significant speed-up in the case of sparse optimization problems.

1 code implementation • 20 Oct 2017 • Robert M. Gower, Nicolas Le Roux, Francis Bach

Our goal is to improve variance reducing stochastic methods through better control variates.

no code implementations • 17 Oct 2017 • Marwa El Halabi, Francis Bach, Volkan Cevher

We consider the homogeneous and the non-homogeneous convex relaxations for combinatorial penalty functions defined on support sets.

no code implementations • 5 Sep 2017 • Sashank J. Reddi, Manzil Zaheer, Suvrit Sra, Barnabas Poczos, Francis Bach, Ruslan Salakhutdinov, Alexander J. Smola

A central challenge to using first-order methods for optimizing nonconvex problems is the presence of saddle points.

no code implementations • NeurIPS 2018 • Francis Bach

We consider the minimization of submodular functions subject to ordering constraints.

no code implementations • 20 Jul 2017 • Aymeric Dieuleveut, Alain Durmus, Francis Bach

We consider the minimization of an objective function given access to unbiased estimates of its gradient through stochastic gradient descent (SGD) with constant step-size.

no code implementations • CVPR 2017 • Rafael S. Rezende, Joaquin Zepeda, Jean Ponce, Francis Bach, Patrick Perez

Zepeda and Perez have recently demonstrated the promise of the exemplar SVM (ESVM) as a feature encoder for image retrieval.

1 code implementation • NeurIPS 2017 • Anton Osokin, Francis Bach, Simon Lacoste-Julien

We provide novel theoretical insights on structured prediction in the context of efficient convex surrogate loss minimization with consistency guarantees.

1 code implementation • ICML 2017 • Kevin Scaman, Francis Bach, Sébastien Bubeck, Yin Tat Lee, Laurent Massoulié

For centralized (i.e., master/slave) algorithms, we show that distributing Nesterov's accelerated gradient descent is optimal and achieves a precision $\varepsilon > 0$ in time $O(\sqrt{\kappa_g}(1+\Delta\tau)\ln(1/\varepsilon))$, where $\kappa_g$ is the condition number of the (global) function to optimize, $\Delta$ is the diameter of the network, and $\tau$ (resp.

no code implementations • 21 Feb 2017 • Nicolas Flammarion, Francis Bach

We consider the minimization of composite objective functions composed of the expectation of quadratic functions and an arbitrary convex function.

no code implementations • 19 Oct 2016 • Christophe Dupuy, Francis Bach

We propose a new class of determinantal point processes (DPPs) which can be manipulated for inference and parameter learning in potentially sublinear time in the number of items.

no code implementations • 29 Aug 2016 • Nicolas Flammarion, Balamurugan Palaniappan, Francis Bach

Clustering high-dimensional data often requires some form of dimensionality reduction, where clustered variables are separated from "noise-looking" variables.

no code implementations • NeurIPS 2016 • Tatiana Shpakova, Francis Bach

Then, to learn parameters, given that our approximation of the log-partition function is an expectation (over our own randomization), we use a stochastic subgradient technique to maximize a lower-bound on the log-likelihood.

no code implementations • NeurIPS 2016 • Genevay Aude, Marco Cuturi, Gabriel Peyré, Francis Bach

We instantiate these ideas in three different setups: (i) when comparing a discrete distribution to another, we show that incremental stochastic optimization schemes can beat Sinkhorn's algorithm, the current state-of-the-art finite dimensional OT solver; (ii) when comparing a discrete distribution to a continuous density, a semi-discrete reformulation of the dual program is amenable to averaged stochastic gradient descent, leading to better performance than approximately solving the problem by discretization ; (iii) when dealing with two continuous densities, we propose a stochastic gradient descent over a reproducing kernel Hilbert space (RKHS).

no code implementations • NeurIPS 2016 • Pascal Germain, Francis Bach, Alexandre Lacoste, Simon Lacoste-Julien

That is, for the negative log-likelihood loss function, we show that the minimization of PAC-Bayesian generalization risk bounds maximizes the Bayesian marginal likelihood.

no code implementations • 26 May 2016 • Francis Bach, Vianney Perchet

The minimization of convex functions which are only available through partial and noisy information is a key methodological problem in many disciplines.

no code implementations • NeurIPS 2016 • P. Balamurugan, Francis Bach

We consider convex-concave saddle-point problems where the objective functions may be split in many components, and extend recent stochastic variance reduction methods (such as SVRG or SAGA) to provide the first large-scale linearly convergent algorithms for this class of problems which is common in machine learning.

no code implementations • 8 Mar 2016 • Christophe Dupuy, Francis Bach

We first propose a unified treatment of online inference for latent variable models from a non-canonical exponential family, and draw explicit links between several previously proposed frequentist or Bayesian methods.

no code implementations • 29 Feb 2016 • Anastasia Podosinnikova, Francis Bach, Simon Lacoste-Julien

We introduce three novel semi-parametric extensions of probabilistic canonical correlation analysis with identifiability guarantees.

no code implementations • 17 Feb 2016 • Aymeric Dieuleveut, Nicolas Flammarion, Francis Bach

We consider the optimization of a quadratic objective function whose gradients are only accessible through a stochastic oracle that returns the gradient at any given point plus a zero-mean finite variance random error.

no code implementations • NeurIPS 2015 • Rakesh Shivanna, Bibaswan K. Chatterjee, Raman Sankaran, Chiranjib Bhattacharyya, Francis Bach

We propose an alternative PAC-based bound, which does not depend on the VC dimension of the underlying function class, but is related to the famous Lovász $\vartheta$ function.

no code implementations • 2 Nov 2015 • Francis Bach

A key element in many of the algorithms and analyses is the possibility of extending the submodular set-function to a convex function, which opens up tools from convex optimization.

no code implementations • NeurIPS 2015 • Anastasia Podosinnikova, Francis Bach, Simon Lacoste-Julien

We consider moment matching techniques for estimation in Latent Dirichlet Allocation (LDA).

no code implementations • 16 Jun 2015 • Vincent Roulet, Fajwel Fogel, Alexandre d'Aspremont, Francis Bach

We study supervised learning problems using clustering constraints to impose structure on either features or samples, seeking to help both prediction and interpretation.

no code implementations • 5 Jun 2015 • Rémi Lajugie, Piotr Bojanowski, Sylvain Arlot, Francis Bach

In this paper, we address the problem of multi-label classification.

no code implementations • ICCV 2015 • Piotr Bojanowski, Rémi Lajugie, Edouard Grave, Francis Bach, Ivan Laptev, Jean Ponce, Cordelia Schmid

Given vectorial features for both video and text, we propose to cast this task as a temporal assignment problem, with an implicit linear mapping between the two feature modalities.

no code implementations • 7 Apr 2015 • Nicolas Flammarion, Francis Bach

We show that accelerated gradient descent, averaged gradient descent and the heavy-ball method for non-strongly-convex problems may be reformulated as constant parameter second-order difference equation algorithms, where stability of the system is equivalent to convergence at rate $O(1/n^2)$, where $n$ is the number of iterations.
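
The heavy-ball method mentioned above is exactly such a constant-parameter second-order difference equation. A minimal sketch on a quadratic objective (the step and momentum values are illustrative, not the tuned parameters of the paper):

```python
import numpy as np

def heavy_ball(grad, x0, step, momentum, n_iters):
    """Heavy-ball as a second-order difference equation:
    x_{n+1} = x_n - step * grad(x_n) + momentum * (x_n - x_{n-1})."""
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(n_iters):
        x_next = x - step * grad(x) + momentum * (x - x_prev)
        x_prev, x = x, x_next
    return x

# Quadratic f(x) = 0.5 x^T A x - b^T x, whose minimizer is A^{-1} b.
A = np.array([[3.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, 1.0])
x_star = heavy_ball(lambda x: A @ x - b, np.zeros(2), step=0.2,
                    momentum=0.5, n_iters=300)
```

On each eigen-direction the recursion is a scalar linear difference equation, and the iterates converge exactly when its characteristic roots lie inside the unit circle, which is the stability-convergence equivalence the abstract refers to.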

no code implementations • 10 Mar 2015 • Nino Shervashidze, Francis Bach

Structured sparsity has recently emerged in statistics, machine learning and signal processing as a promising paradigm for learning in high-dimensional settings.

no code implementations • 5 Mar 2015 • K. S. Sesh Kumar, Alvaro Barbero, Stefanie Jegelka, Suvrit Sra, Francis Bach

By exploiting results from convex and submodular theory, we reformulate the quadratic energy minimization problem as a total variation denoising problem, which, when viewed geometrically, enables the use of projection and reflection based convex methods.

no code implementations • 24 Feb 2015 • Francis Bach

We show that kernel-based quadrature rules for computing integrals can be seen as a special case of random feature expansions for positive definite kernels, for a particular decomposition that always exists for such kernels.
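
The random feature expansions in question can be illustrated with the classical random Fourier features for the Gaussian kernel (a standard construction given here as an example of such a decomposition, not the paper's specific one):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_fourier_features(X, n_features=5000, sigma=1.0):
    """Random Fourier features approximating the Gaussian kernel
    k(x, y) = exp(-||x - y||^2 / (2 sigma^2)): the inner product of
    feature maps is an unbiased Monte Carlo estimate of the kernel."""
    d = X.shape[1]
    W = rng.normal(scale=1.0 / sigma, size=(d, n_features))
    bias = rng.uniform(0.0, 2.0 * np.pi, n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + bias)

X = rng.normal(size=(5, 3))
Phi = random_fourier_features(X)
K_approx = Phi @ Phi.T
K_exact = np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1) / 2.0)
```

Averaging features drawn from the spectral measure plays the same role as placing quadrature nodes, which is the connection to kernel quadrature made above.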

no code implementations • 9 Jan 2015 • Simon Lacoste-Julien, Fredrik Lindsten, Francis Bach

Recently, the Frank-Wolfe optimization algorithm was suggested as a procedure to obtain adaptive quadrature rules for integrals of functions in a reproducing kernel Hilbert space (RKHS) with a potentially faster rate of convergence than Monte Carlo integration (and "kernel herding" was shown to be a special case of this procedure).

no code implementations • 30 Dec 2014 • Francis Bach

Moreover, when using sparsity-inducing norms on the input weights, we show that high-dimensional non-linear variable selection may be achieved, without any strong assumption regarding the data and with a total number of variables potentially exponential in the number of observations.

2 code implementations • 4 Dec 2014 • Felipe Yanez, Francis Bach

Non-negative matrix factorization (NMF) approximates a given matrix as a product of two non-negative matrices.
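
As a reference point, the classical Lee-Seung multiplicative updates are one standard way to compute such a factorization (shown here for the Frobenius loss; the algorithm studied in the paper may differ):

```python
import numpy as np

rng = np.random.default_rng(0)

def nmf(X, rank, n_iters=500, eps=1e-9):
    """Lee-Seung multiplicative updates for X ≈ W @ H under the
    Frobenius loss; both factors stay non-negative throughout."""
    m, n = X.shape
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    for _ in range(n_iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # update H, W fixed
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # update W, H fixed
    return W, H

X = rng.random((6, 5))
W, H = nmf(X, rank=3)
```

Each update multiplies the current factor entrywise by a ratio of non-negative terms, so non-negativity is preserved without any projection step.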

no code implementations • 29 Nov 2014 • Alexandre Défossez, Francis Bach

Our analysis leads to new insights into stochastic approximation algorithms: (a) it gives a tighter bound on the allowed step-size; (b) the generalization error may be divided into a variance term which is decaying as $O(1/n)$, independently of the step-size $\gamma$, and a bias term that decays as $O(1/(\gamma^2 n^2))$; (c) when allowing non-uniform sampling, the choice of a good sampling density depends on whether the variance or bias terms dominate.
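
The constant-step-size averaged algorithm analyzed above can be sketched for least squares as follows (the synthetic data, step-size $\gamma$, and pass count are illustrative choices, not those of the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def averaged_sgd(X, y, gamma, n_passes=1):
    """Constant-step-size SGD for least squares with Polyak-Ruppert
    averaging: the average of the iterates is returned, not the last one."""
    n, d = X.shape
    w = np.zeros(d)
    w_bar = np.zeros(d)
    t = 0
    for _ in range(n_passes):
        for i in rng.permutation(n):
            w -= gamma * (X[i] @ w - y[i]) * X[i]   # one stochastic gradient step
            t += 1
            w_bar += (w - w_bar) / t                # running average of iterates
    return w_bar

n, d = 2000, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=n)
w_hat = averaged_sgd(X, y, gamma=0.05, n_passes=5)
```

Averaging is what lets the step-size stay constant: the fluctuating last iterate would carry an $O(\gamma)$ error, while the average smooths it out.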

no code implementations • 12 Nov 2014 • Julien Mairal, Francis Bach, Jean Ponce

In recent years, a large amount of multi-disciplinary research has been conducted on sparse models and their applications.

no code implementations • NeurIPS 2014 • Damien Garreau, Rémi Lajugie, Sylvain Arlot, Francis Bach

The learning examples for this task are time series for which the true alignment is known.

no code implementations • 11 Aug 2014 • Fabian Pedregosa, Francis Bach, Alexandre Gramfort

We will see that, for a family of surrogate loss functions that subsumes support vector ordinal regression and ORBoosting, consistency can be fully characterized by the derivative of a real-valued function at zero, as happens for convex margin-based surrogates in binary classification.

no code implementations • 19 Jul 2014 • Rémi Gribonval, Rodolphe Jenatton, Francis Bach

A popular approach within the signal processing and machine learning communities consists in modelling signals as sparse linear combinations of atoms selected from a learned dictionary.

no code implementations • 4 Jul 2014 • Piotr Bojanowski, Rémi Lajugie, Francis Bach, Ivan Laptev, Jean Ponce, Cordelia Schmid, Josef Sivic

We are given a set of video clips, each one annotated with an {\em ordered} list of actions, such as "walk" then "sit" then "answer phone" extracted from, for example, the associated text script.

5 code implementations • NeurIPS 2014 • Aaron Defazio, Francis Bach, Simon Lacoste-Julien

In this work we introduce a new optimisation method called SAGA in the spirit of SAG, SDCA, MISO and SVRG, a set of recently proposed incremental gradient algorithms with fast linear convergence rates.
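
A minimal sketch of the SAGA update for least squares, using the standard step-size of order $1/(3L)$ (the zero-initialized gradient table and the toy problem are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def saga(X, y, step, n_iters):
    """SAGA for least squares: a table of the most recent per-sample
    gradients is kept and used as a variance-reduction control variate."""
    n, d = X.shape
    w = np.zeros(d)
    grads = np.zeros((n, d))                 # stored per-sample gradients
    g_avg = grads.mean(axis=0)
    for _ in range(n_iters):
        i = rng.integers(n)
        g_new = (X[i] @ w - y[i]) * X[i]
        w -= step * (g_new - grads[i] + g_avg)   # SAGA update
        g_avg += (g_new - grads[i]) / n          # keep table average in sync
        grads[i] = g_new
    return w

n, d = 200, 3
X = rng.normal(size=(n, d))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true                               # noiseless: the minimizer is w_true
L = np.max(np.sum(X**2, axis=1))             # per-sample smoothness constant
w_hat = saga(X, y, step=1.0 / (3.0 * L), n_iters=20000)
```

Unlike plain SGD, the control variate drives the update's variance to zero at the optimum, which is what yields the fast linear convergence rates mentioned above.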

no code implementations • 20 Mar 2014 • Matthias Seibert, Martin Kleinsteuber, Rémi Gribonval, Rodolphe Jenatton, Francis Bach

The main goal of this paper is to provide a sample complexity estimate that controls to what extent the empirical average deviates from the cost function.

no code implementations • 14 Dec 2013 • Edouard Grave, Guillaume Obozinski, Francis Bach

Most natural language processing systems based on machine learning are not robust to domain shift.

no code implementations • 13 Dec 2013 • Rémi Gribonval, Rodolphe Jenatton, Francis Bach, Martin Kleinsteuber, Matthias Seibert

Many modern tools in machine learning and signal processing, such as sparse dictionary learning, principal component analysis (PCA), non-negative matrix factorization (NMF), $K$-means clustering, etc., rely on the factorization of a matrix obtained by concatenating high-dimensional vectors from a training collection.

no code implementations • NeurIPS 2013 • Fajwel Fogel, Rodolphe Jenatton, Francis Bach, Alexandre d'Aspremont

Seriation seeks to reconstruct a linear order between variables using unsorted similarity information.

no code implementations • NeurIPS 2013 • Stefanie Jegelka, Francis Bach, Suvrit Sra

A key component of our method is a formulation of the discrete submodular minimization problem as a continuous best approximation problem that is solved through a sequence of reflections, and its solution can be easily thresholded to obtain an optimal discrete solution.

no code implementations • 12 Sep 2013 • Francis Bach

We consider the factorization of a rectangular matrix $X$ into a positive linear combination of rank-one factors of the form $u v^\top$, where $u$ and $v$ belong to certain sets $\mathcal{U}$ and $\mathcal{V}$, which may encode specific structures regarding the factors, such as positivity or sparsity.

2 code implementations • 10 Sep 2013 • Mark Schmidt, Nicolas Le Roux, Francis Bach

Further, in many cases the convergence rate of the new method is also faster than black-box deterministic gradient methods, in terms of the number of gradient evaluations.

no code implementations • 10 Sep 2013 • K. S. Sesh Kumar, Francis Bach

In a graphical model, the entropy of the joint distribution decomposes as a sum of marginal entropies of subsets of variables; moreover, for any distribution, the entropy of the closest distribution factorizing in the graphical model provides an upper bound on the entropy.

no code implementations • NeurIPS 2013 • Francis Bach, Eric Moulines

We consider the stochastic approximation problem where a convex function has to be minimized, given only the knowledge of unbiased estimates of its gradients at certain points, a framework which includes machine learning methods based on the minimization of the empirical risk.

no code implementations • 25 Mar 2013 • Francis Bach

In this paper, we consider supervised learning problems such as logistic regression and study the stochastic gradient method with averaging, in the usual stochastic approximation setting where observations are used only once.

no code implementations • 27 Nov 2012 • Francis Bach

Given a convex optimization problem and its dual, there are many possible first-order algorithms.

no code implementations • 9 Aug 2012 • Francis Bach

We consider supervised learning problems within the positive-definite kernel framework, such as kernel ridge regression, kernel logistic regression or the support vector machine.
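
Of the methods listed, kernel ridge regression has the simplest closed form, solving $(K + n\lambda I)\alpha = y$. A minimal NumPy sketch (the Gaussian kernel, its bandwidth, and the regularization level are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_kernel(A, B, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    sq = ((A[:, None] - B[None, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma**2))

def kernel_ridge_fit(X, y, lam, sigma=1.0):
    """Kernel ridge regression: solve (K + n*lam*I) alpha = y;
    predictions are then K(x, X) @ alpha."""
    n = X.shape[0]
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + n * lam * np.eye(n), y)

X = rng.normal(size=(50, 2))
y = np.sin(X[:, 0]) + 0.01 * rng.normal(size=50)
alpha = kernel_ridge_fit(X, y, lam=1e-6)
y_fit = gaussian_kernel(X, X) @ alpha
```

The cost of forming and solving with $K$ scales with the number of observations, not the (possibly infinite) feature-space dimension, which is the computational point made above.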

no code implementations • 28 Nov 2011 • Francis Bach

Submodular functions are relevant to machine learning for at least two reasons: (1) some problems may be expressed directly as the optimization of submodular functions and (2) the Lovász extension of submodular functions provides a useful set of regularization functions for supervised and unsupervised learning.

no code implementations • 27 Sep 2010 • Julien Mairal, Francis Bach, Jean Ponce

Modeling data with linear combinations of a few elements from a learned dictionary has been the focus of much recent research in machine learning, neuroscience and signal processing.

no code implementations • 8 Sep 2009 • Rodolphe Jenatton, Guillaume Obozinski, Francis Bach

We present an extension of sparse PCA, or sparse dictionary learning, where the sparsity patterns of all dictionary elements are structured and constrained to belong to a prespecified set of shapes.

no code implementations • 9 Sep 2008 • Francis Bach

For supervised and unsupervised learning, positive definite kernels allow the use of large and potentially infinite-dimensional feature spaces with a computational cost that only depends on the number of observations.

2 code implementations • 8 Apr 2008 • Francis Bach

For various decays of the regularization parameter, we compute asymptotic equivalents of the probability of correct model selection (i.e., variable selection).

Papers With Code is a free resource with all data licensed under CC-BY-SA.