Search Results for author: Olivier Bousquet

Found 35 papers, 10 papers with code

The Tradeoffs of Large Scale Learning

no code implementations NeurIPS 2007 Léon Bottou, Olivier Bousquet

This contribution develops a theoretical framework that takes into account the effect of approximate optimization on learning algorithms.
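
The framework of this paper is usually summarized by a three-way decomposition of the excess error; the display below restates that decomposition in standard notation as a reference point (the grouping into approximation, estimation, and optimization terms follows the paper, but the exact symbols are chosen here for illustration):

    $\mathcal{E}(\tilde f_n) - \mathcal{E}(f^*) \;=\; \underbrace{\mathcal{E}(f^*_{\mathcal{F}}) - \mathcal{E}(f^*)}_{\text{approximation}} \;+\; \underbrace{\mathcal{E}(f_n) - \mathcal{E}(f^*_{\mathcal{F}})}_{\text{estimation}} \;+\; \underbrace{\mathcal{E}(\tilde f_n) - \mathcal{E}(f_n)}_{\text{optimization}}$

Here $f^*$ is the Bayes-optimal predictor, $f^*_{\mathcal{F}}$ the best predictor in the chosen class $\mathcal{F}$, $f_n$ the empirical risk minimizer, and $\tilde f_n$ the approximate solution actually returned by the optimizer; the tradeoff studied in the paper is how to budget data and computation across the last two terms.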

AdaGAN: Boosting Generative Models

1 code implementation NeurIPS 2017 Ilya Tolstikhin, Sylvain Gelly, Olivier Bousquet, Carl-Johann Simon-Gabriel, Bernhard Schölkopf

Generative Adversarial Networks (GAN) (Goodfellow et al., 2014) are an effective method for training generative models of complex data such as natural images.
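
As a rough, runnable illustration of the boosting idea in the title, the toy sketch below greedily grows a mixture of simple components on reweighted data. The Gaussian components, the fixed mixture weight beta, and the density-based reweighting rule are illustrative stand-ins, not the GAN training step or the weight updates derived in the paper.

    # Toy sketch of an AdaGAN-style loop: greedily add components to a mixture,
    # reweighting the data toward regions the current mixture covers poorly.
    import numpy as np

    def fit_component(x, w):
        """Weighted 1-D Gaussian fit: a stand-in for training a GAN on reweighted data."""
        mu = np.average(x, weights=w)
        sigma = np.sqrt(np.average((x - mu) ** 2, weights=w)) + 1e-6
        return mu, sigma

    def mixture_density(mixture, x):
        dens = np.zeros_like(x)
        for weight, (mu, sigma) in mixture:
            dens += weight * np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        return dens

    def adagan_like(x, num_steps=5, beta=0.3):
        w = np.full(len(x), 1.0 / len(x))
        mixture = []                      # list of (mixture weight, component) pairs
        for _ in range(num_steps):
            comp = fit_component(x, w)                       # new component on reweighted data
            mixture = [(mw * (1 - beta), c) for mw, c in mixture]
            mixture.append((beta if mixture else 1.0, comp))
            # Up-weight points to which the current mixture assigns low density.
            w = 1.0 / (mixture_density(mixture, x) + 1e-12)
            w /= w.sum()
        return mixture

    # Example: a bimodal sample that a single Gaussian component cannot cover.
    rng = np.random.default_rng(0)
    data = np.concatenate([rng.normal(-3, 0.5, 500), rng.normal(3, 0.5, 500)])
    for weight, (mu, sigma) in adagan_like(data):
        print(f"weight={weight:.2f}  mu={mu:+.2f}  sigma={sigma:.2f}")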

From optimal transport to generative modeling: the VEGAN cookbook

1 code implementation 22 May 2017 Olivier Bousquet, Sylvain Gelly, Ilya Tolstikhin, Carl-Johann Simon-Gabriel, Bernhard Schoelkopf

We study unsupervised generative modeling in terms of the optimal transport (OT) problem between true (but unknown) data distribution $P_X$ and the latent variable model distribution $P_G$.
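
For reference, the optimal transport cost the abstract refers to is the Kantorovich formulation (standard notation, not quoted from the paper):

    $W_c(P_X, P_G) \;=\; \inf_{\Gamma \in \mathcal{P}(X \sim P_X,\, Y \sim P_G)} \mathbb{E}_{(X, Y) \sim \Gamma}\big[c(X, Y)\big]$

where the infimum runs over all couplings $\Gamma$ with marginals $P_X$ and $P_G$ and $c$ is a cost function; taking $c$ to be the squared Euclidean distance yields the squared 2-Wasserstein distance.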

Approximation and Convergence Properties of Generative Adversarial Learning

no code implementations NeurIPS 2017 Shuang Liu, Olivier Bousquet, Kamalika Chaudhuri

In this paper, we address these questions in a broad and unified setting by defining a notion of adversarial divergences that includes a number of recently proposed objective functions.
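
As one familiar instance of the adversarial-divergence family considered in the paper, the original GAN objective of Goodfellow et al. (2014) can be read as a divergence between the data distribution $P$ and the model $Q$, obtained by maximizing over a discriminator class $\mathcal{D}$ (standard form, shown here only for orientation):

    $\tau(P \,\|\, Q) \;=\; \sup_{D \in \mathcal{D}} \; \mathbb{E}_{x \sim P}\big[\log D(x)\big] + \mathbb{E}_{x \sim Q}\big[\log\big(1 - D(x)\big)\big]$

When $\mathcal{D}$ contains all measurable functions into $(0, 1)$, this is, up to constants, the Jensen-Shannon divergence between $P$ and $Q$.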

Toward Optimal Run Racing: Application to Deep Learning Calibration

no code implementations 10 Jun 2017 Olivier Bousquet, Sylvain Gelly, Karol Kurach, Marc Schoenauer, Michele Sebag, Olivier Teytaud, Damien Vincent

This paper aims at one-shot learning of deep neural nets, where a highly parallel setting is considered to address the algorithm calibration problem: selecting the best neural architecture and learning hyper-parameter values depending on the dataset at hand.

One-Shot Learning, Two-sample testing

Wasserstein Auto-Encoders

13 code implementations ICLR 2018 Ilya Tolstikhin, Olivier Bousquet, Sylvain Gelly, Bernhard Schoelkopf

We propose the Wasserstein Auto-Encoder (WAE), a new algorithm for building a generative model of the data distribution.
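
The WAE objective trades a reconstruction cost against a penalty matching the aggregated posterior $Q_Z$ to the prior $P_Z$; up to notation it is commonly written as below, where the divergence $\mathcal{D}_Z$ is instantiated in the paper with either a GAN-based or an MMD-based penalty and $\lambda > 0$ is a regularization coefficient:

    $D_{\mathrm{WAE}}(P_X, P_G) \;=\; \inf_{Q(Z \mid X)} \; \mathbb{E}_{P_X}\, \mathbb{E}_{Q(Z \mid X)}\big[c\big(X, G(Z)\big)\big] \;+\; \lambda \cdot \mathcal{D}_Z(Q_Z, P_Z)$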

Online Hyper-Parameter Optimization

no code implementations ICLR 2018 Damien Vincent, Sylvain Gelly, Nicolas Le Roux, Olivier Bousquet

We propose an efficient online hyperparameter optimization method which uses a joint dynamical system to evaluate the gradient with respect to the hyperparameters.

Hyperparameter Optimization
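
The method itself evaluates hypergradients through a joint dynamical system; the toy snippet below only illustrates the broader idea of adapting a hyperparameter online from gradient information, using a simple hypergradient-descent-style rule on the learning rate for a quadratic objective. It is not the authors' algorithm, and every name in it is illustrative.

    # Generic hypergradient-style adaptation of a learning rate, shown on a toy
    # quadratic objective. This sketches the broad idea of differentiating through
    # the update rule; it is NOT the joint dynamical system proposed in the paper.
    import numpy as np

    def grad(w):                       # gradient of f(w) = 0.5 * ||w||^2
        return w

    w = np.array([5.0, -3.0])
    lr, hyper_lr = 0.01, 0.001
    prev_grad = np.zeros_like(w)

    for step in range(200):
        g = grad(w)
        # For plain SGD, d f(w_t) / d lr is approximately -g_t . g_{t-1},
        # so nudging lr along g_t . g_{t-1} adapts it online.
        lr += hyper_lr * float(g @ prev_grad)
        w -= lr * g
        prev_grad = g

    print("final w:", w, "adapted lr:", lr)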

Gradient Descent Quantizes ReLU Network Features

no code implementations 22 Mar 2018 Hartmut Maennel, Olivier Bousquet, Sylvain Gelly

Deep neural networks are often trained in the over-parametrized regime (i.e., with far more parameters than training examples), and understanding why the training converges to solutions that generalize remains an open problem.

Quantization

Assessing Generative Models via Precision and Recall

4 code implementations NeurIPS 2018 Mehdi S. M. Sajjadi, Olivier Bachem, Mario Lucic, Olivier Bousquet, Sylvain Gelly

Recent advances in generative modeling have led to an increased interest in the study of statistical divergences as means of model comparison.
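
Restated compactly, the paper's notion of precision and recall for distributions declares a pair $(\alpha, \beta)$ achievable for a model $Q$ with respect to a reference $P$ whenever both can be written over a shared component $\mu$; readers should check the exact statement in the paper:

    $P = \beta\,\mu + (1 - \beta)\,\nu_P, \qquad Q = \alpha\,\mu + (1 - \alpha)\,\nu_Q$

for some probability distributions $\mu, \nu_P, \nu_Q$; here $\alpha$ plays the role of precision (how much of $Q$ is covered by $P$) and $\beta$ the role of recall (how much of $P$ the model reproduces).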

Synthetic Data Generators: Sequential and Private

no code implementations 9 Feb 2019 Olivier Bousquet, Roi Livni, Shay Moran

We study the sample complexity of private synthetic data generation over an unbounded-size class of statistical queries, and show that any class that is privately proper PAC learnable admits a private synthetic data generator (perhaps non-efficient).

Synthetic Data Generation

The Optimal Approximation Factor in Density Estimation

no code implementations 10 Feb 2019 Olivier Bousquet, Daniel Kane, Shay Moran

We complement and extend this result by showing that: (i) the factor 3 cannot be improved if one restricts the algorithm to output a density from $\mathcal{Q}$, and (ii) if one allows the algorithm to output arbitrary densities (e.g., a mixture of densities from $\mathcal{Q}$), then the approximation factor can be reduced to 2, which is optimal.

Density Estimation

Precision-Recall Curves Using Information Divergence Frontiers

no code implementations 26 May 2019 Josip Djolonga, Mario Lucic, Marco Cuturi, Olivier Bachem, Olivier Bousquet, Sylvain Gelly

Despite the tremendous progress in the estimation of generative models, the development of tools for diagnosing their failures and assessing their performance has advanced at a much slower pace.

Image Generation, Information Retrieval +1

Practical and Consistent Estimation of f-Divergences

1 code implementation NeurIPS 2019 Paul K. Rubenstein, Olivier Bousquet, Josip Djolonga, Carlos Riquelme, Ilya Tolstikhin

The estimation of an f-divergence between two probability distributions based on samples is a fundamental problem in statistics and machine learning.

BIG-bench Machine Learning, Mutual Information Estimation +1
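
For reference, the quantity being estimated is the standard f-divergence: for a convex function $f$ with $f(1) = 0$,

    $D_f(P \,\|\, Q) \;=\; \int f\!\left(\frac{dP}{dQ}\right) dQ \;=\; \mathbb{E}_{x \sim Q}\!\left[f\!\left(\frac{p(x)}{q(x)}\right)\right]$

which recovers the KL divergence for $f(t) = t \log t$ and the total variation distance for $f(t) = \tfrac{1}{2}\lvert t - 1 \rvert$. (This is the textbook definition, not a formula quoted from the paper.)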

When can unlabeled data improve the learning rate?

no code implementations 28 May 2019 Christina Göpfert, Shai Ben-David, Olivier Bousquet, Sylvain Gelly, Ilya Tolstikhin, Ruth Urner

In semi-supervised classification, one is given access both to labeled and unlabeled data.

Google Research Football: A Novel Reinforcement Learning Environment

1 code implementation 25 Jul 2019 Karol Kurach, Anton Raichuk, Piotr Stańczyk, Michał Zając, Olivier Bachem, Lasse Espeholt, Carlos Riquelme, Damien Vincent, Marcin Michalski, Olivier Bousquet, Sylvain Gelly

Recent progress in the field of reinforcement learning has been accelerated by virtual learning environments such as video games, where novel algorithms and ideas can be quickly tested in a safe and reproducible manner.

Game of Football, reinforcement-learning +1
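
The environment ships as the open-source gfootball package; the snippet below sketches a typical random-agent loop on one of the small "academy" scenarios. Treat the exact keyword arguments (scenario and representation names) as assumptions to be checked against the repository's README, since they have changed across package versions.

    # Minimal random-agent loop for Google Research Football (gfootball).
    # Scenario and representation names are assumptions based on the project docs
    # and may differ between versions; check the repository before relying on them.
    import gfootball.env as football_env

    env = football_env.create_environment(
        env_name="academy_empty_goal_close",   # a small "academy" drill scenario
        representation="simple115",            # compact float observation vector
        render=False,
    )

    obs = env.reset()
    done, total_reward = False, 0.0
    while not done:
        action = env.action_space.sample()     # random policy, for illustration only
        obs, reward, done, info = env.step(action)
        total_reward += reward
    print("episode return:", total_reward)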

Sharper bounds for uniformly stable algorithms

no code implementations 17 Oct 2019 Olivier Bousquet, Yegor Klochkov, Nikita Zhivotovskiy

In a series of recent breakthrough papers by Feldman and Vondrak (2018, 2019), it was shown that the best known high-probability upper bounds for uniformly stable learning algorithms due to Bousquet and Elisseeff (2002) are sub-optimal in some natural regimes.

Generalization Bounds, Learning Theory

Fast classification rates without standard margin assumptions

no code implementations 28 Oct 2019 Olivier Bousquet, Nikita Zhivotovskiy

First, we consider classification with a reject option, namely Chow's reject option model, and show that by slightly lowering the impact of hard instances, a learning rate of order $O\left(\frac{d}{n}\log \frac{n}{d}\right)$ is always achievable in the agnostic setting by a specific learning algorithm.

Classification, General Classification +1

Measuring Compositional Generalization: A Comprehensive Method on Realistic Data

3 code implementations ICLR 2020 Daniel Keysers, Nathanael Schärli, Nathan Scales, Hylke Buisman, Daniel Furrer, Sergii Kashubin, Nikola Momchev, Danila Sinopalnikov, Lukasz Stafiniak, Tibor Tihon, Dmitry Tsarkov, Xiao Wang, Marc van Zee, Olivier Bousquet

We present a large and realistic natural language question answering dataset that is constructed according to this method, and we use it to analyze the compositional generalization ability of three machine learning architectures.

BIG-bench Machine Learning, Question Answering +1

Predicting Neural Network Accuracy from Weights

1 code implementation 26 Feb 2020 Thomas Unterthiner, Daniel Keysers, Sylvain Gelly, Olivier Bousquet, Ilya Tolstikhin

Furthermore, the predictors are able to rank networks trained on different, unobserved datasets and with different architectures.
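
A minimal sketch of the general recipe (summary statistics of the flattened weights fed to an off-the-shelf regressor) is given below; the statistics, the regressor, and the placeholder data are illustrative choices, not the exact feature set or model used in the paper.

    # Sketch: predict a network's test accuracy from simple statistics of its weights.
    # `weight_vectors` would hold one flattened weight vector per trained network and
    # `accuracies` the corresponding measured test accuracies (random placeholders here).
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import train_test_split

    def weight_features(w):
        """A few distributional statistics of a flattened weight vector."""
        q = np.percentile(w, [0, 25, 50, 75, 100])
        return np.concatenate([[w.mean(), w.std(), np.abs(w).mean()], q])

    rng = np.random.default_rng(0)
    weight_vectors = [rng.normal(0, 0.1, size=10_000) for _ in range(200)]  # placeholder data
    accuracies = rng.uniform(0.1, 0.9, size=200)                            # placeholder targets

    X = np.stack([weight_features(w) for w in weight_vectors])
    X_tr, X_te, y_tr, y_te = train_test_split(X, accuracies, random_state=0)

    model = GradientBoostingRegressor().fit(X_tr, y_tr)
    print("R^2 on held-out networks:", model.score(X_te, y_te))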

Proper Learning, Helly Number, and an Optimal SVM Bound

no code implementations 24 May 2020 Olivier Bousquet, Steve Hanneke, Shay Moran, Nikita Zhivotovskiy

It has recently been shown by Hanneke (2016) that the optimal sample complexity of PAC learning for any VC class C is achieved by a particular improper learning algorithm, which outputs a specific majority-vote of hypotheses in C. This leaves open the question of when this bound can be achieved by proper learning algorithms, which are restricted to always output a hypothesis from C. In this paper we aim to characterize the classes for which the optimal sample complexity can be achieved by a proper learning algorithm.

PAC learning

What Do Neural Networks Learn When Trained With Random Labels?

no code implementations NeurIPS 2020 Hartmut Maennel, Ibrahim Alabdulmohsin, Ilya Tolstikhin, Robert J. N. Baldock, Olivier Bousquet, Sylvain Gelly, Daniel Keysers

We show how this alignment produces a positive transfer: networks pre-trained with random labels train faster downstream compared to training from scratch even after accounting for simple effects, such as weight scaling.

Memorization

Synthetic Data Generators -- Sequential and Private

no code implementations NeurIPS 2020 Olivier Bousquet, Roi Livni, Shay Moran

We study the sample complexity of private synthetic data generation over an unbounded-size class of statistical queries, and show that any class that is privately proper PAC learnable admits a private synthetic data generator (perhaps non-efficient).

Synthetic Data Generation

Statistically Near-Optimal Hypothesis Selection

no code implementations 17 Aug 2021 Olivier Bousquet, Mark Braverman, Klim Efremenko, Gillat Kol, Shay Moran

We derive an optimal $2$-approximation learning strategy for the Hypothesis Selection problem, outputting $q$ such that $\mathsf{TV}(p, q) \leq 2 \cdot \mathrm{opt} + \epsilon$, with a (nearly) optimal sample complexity of $\tilde O(\log n/\epsilon^2)$.

PAC learning

Monotone Learning

no code implementations 10 Feb 2022 Olivier Bousquet, Amit Daniely, Haim Kaplan, Yishay Mansour, Shay Moran, Uri Stemmer

Our transformation readily implies monotone learners in a variety of contexts: for example it extends Pestov's result to classification tasks with an arbitrary number of labels.

Binary Classification, Classification +1

Fine-Grained Distribution-Dependent Learning Curves

no code implementations 31 Aug 2022 Olivier Bousquet, Steve Hanneke, Shay Moran, Jonathan Shafer, Ilya Tolstikhin

We solve this problem in a principled manner, by introducing a combinatorial dimension called VCL that characterizes the best $d'$ for which $d'/n$ is a strong minimax lower bound.

Learning Theory, PAC learning

The Dynamics of Sharpness-Aware Minimization: Bouncing Across Ravines and Drifting Towards Wide Minima

no code implementations 4 Oct 2022 Peter L. Bartlett, Philip M. Long, Olivier Bousquet

We consider Sharpness-Aware Minimization (SAM), a gradient-based optimization method for deep networks that has exhibited performance improvements on image and language prediction problems.
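
For context, the SAM update analyzed in this line of work (introduced by Foret et al.) first perturbs the weights along the normalized gradient and then descends using the gradient at the perturbed point; up to variants it reads:

    $\hat\varepsilon(w_t) = \rho\,\frac{\nabla L(w_t)}{\lVert \nabla L(w_t) \rVert}, \qquad w_{t+1} = w_t - \eta\,\nabla L\big(w_t + \hat\varepsilon(w_t)\big)$

where $\rho$ is the perturbation radius and $\eta$ the step size.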
