Search Results for author: Pavel Izmailov

Found 25 papers, 22 papers with code

Faster variational inducing input Gaussian process classification

no code implementations 18 Nov 2016 Pavel Izmailov, Dmitry Kropotov

However, the new lower bound depends on $O(m^2)$ variational parameters, which makes optimization challenging for large $m$. In this work we develop a new approach for training inducing input GP models for classification problems.

Classification, Dimensionality Reduction +4
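
For context on the $O(m^2)$ count above: in standard sparse variational GP models, the approximate posterior over the $m$ inducing outputs is a full Gaussian, so its covariance alone contributes on the order of $m^2$ free parameters. A sketch of that variational family (general background, not this paper's specific construction):

```latex
q(u) = \mathcal{N}\!\left(u \mid \mu, \Sigma\right), \qquad
u \in \mathbb{R}^{m}, \quad \mu \in \mathbb{R}^{m}, \quad \Sigma \in \mathbb{R}^{m \times m}
```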

Tensor Train decomposition on TensorFlow (T3F)

2 code implementations 5 Jan 2018 Alexander Novikov, Pavel Izmailov, Valentin Khrulkov, Michael Figurnov, Ivan Oseledets

Tensor Train decomposition is used across many branches of machine learning.

Mathematical Software, Numerical Analysis

Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs

10 code implementations NeurIPS 2018 Timur Garipov, Pavel Izmailov, Dmitrii Podoprikhin, Dmitry Vetrov, Andrew Gordon Wilson

The loss functions of deep neural networks are complex and their geometric properties are not well understood.

Averaging Weights Leads to Wider Optima and Better Generalization

15 code implementations 14 Mar 2018 Pavel Izmailov, Dmitrii Podoprikhin, Timur Garipov, Dmitry Vetrov, Andrew Gordon Wilson

Deep neural networks are typically trained by optimizing a loss function with an SGD variant, in conjunction with a decaying learning rate, until convergence.

Ranked #78 on Image Classification on CIFAR-100 (using extra training data)

Image Classification, Stochastic Optimization
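
A minimal sketch of the weight-averaging recipe (SWA) behind the entry above, assuming a generic PyTorch training loop; `model`, `loader`, and `loss_fn` are hypothetical placeholders rather than the authors' released code.

```python
import copy
import torch

def train_with_swa(model, loader, loss_fn, epochs=10, swa_start=5, lr=0.05):
    """Run SGD, then keep a running average of the weights over the last epochs."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    swa_model = copy.deepcopy(model)   # holds the running average of the weights
    n_averaged = 0
    for epoch in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
        if epoch >= swa_start:
            n_averaged += 1
            # running mean: w_swa <- ((n - 1) * w_swa + w) / n
            for p_swa, p in zip(swa_model.parameters(), model.parameters()):
                p_swa.data.mul_(n_averaged - 1).add_(p.data).div_(n_averaged)
    return swa_model  # batch-norm statistics still need to be recomputed for swa_model
```

Recent PyTorch releases also ship `torch.optim.swa_utils` (e.g. `AveragedModel` and `update_bn`), which packages the same idea.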

There Are Many Consistent Explanations of Unlabeled Data: Why You Should Average

2 code implementations ICLR 2019 Ben Athiwaratkun, Marc Finzi, Pavel Izmailov, Andrew Gordon Wilson

Presently the most successful approaches to semi-supervised learning are based on consistency regularization, whereby a model is trained to be robust to small perturbations of its inputs and parameters.

Domain Adaptation, Semi-Supervised Image Classification
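
A minimal sketch of the consistency-regularization objective the entry above builds on, in assumed PyTorch style; `augment` and the batch tensors are hypothetical placeholders, and this illustrates the general idea rather than the paper's exact training procedure.

```python
import torch
import torch.nn.functional as F

def semi_supervised_loss(model, x_labeled, y_labeled, x_unlabeled, augment, w=1.0):
    """Supervised cross-entropy plus a consistency penalty on unlabeled data."""
    sup = F.cross_entropy(model(x_labeled), y_labeled)
    # Two stochastic perturbations of the same unlabeled batch; the model is
    # penalized when its predictions change between them.
    p1 = F.softmax(model(augment(x_unlabeled)), dim=1)
    with torch.no_grad():
        p2 = F.softmax(model(augment(x_unlabeled)), dim=1)
    cons = F.mse_loss(p1, p2)
    return sup + w * cons
```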

A Simple Baseline for Bayesian Uncertainty in Deep Learning

8 code implementations NeurIPS 2019 Wesley Maddox, Timur Garipov, Pavel Izmailov, Dmitry Vetrov, Andrew Gordon Wilson

We propose SWA-Gaussian (SWAG), a simple, scalable, and general purpose approach for uncertainty representation and calibration in deep learning.

Bayesian Inference, Transfer Learning
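
A rough sketch of the diagonal component of SWAG as summarized above: first and second moments of the weights are tracked along the SGD trajectory, and a Gaussian fit to them is sampled at test time. Class and method names are illustrative, and the full method additionally uses a low-rank covariance term.

```python
import torch
from torch.nn.utils import parameters_to_vector, vector_to_parameters

class DiagSWAG:
    """Track running moments of the flattened weights along an SGD trajectory."""
    def __init__(self, model):
        flat = parameters_to_vector(model.parameters()).detach()
        self.mean = torch.zeros_like(flat)
        self.sq_mean = torch.zeros_like(flat)
        self.n = 0

    def collect(self, model):
        w = parameters_to_vector(model.parameters()).detach()
        self.n += 1
        self.mean += (w - self.mean) / self.n
        self.sq_mean += (w * w - self.sq_mean) / self.n

    def sample_into(self, model):
        var = (self.sq_mean - self.mean ** 2).clamp(min=1e-30)
        w = self.mean + var.sqrt() * torch.randn_like(self.mean)
        vector_to_parameters(w, model.parameters())   # write the sample into the network
```

Predictions would then be averaged over several sampled weight vectors to obtain the uncertainty estimates.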

Subspace Inference for Bayesian Deep Learning

1 code implementation 17 Jul 2019 Pavel Izmailov, Wesley J. Maddox, Polina Kirichenko, Timur Garipov, Dmitry Vetrov, Andrew Gordon Wilson

Bayesian inference was once a gold standard for learning with neural networks, providing accurate full predictive distributions and well calibrated uncertainty.

Bayesian Inference, Image Classification +2

Semi-Supervised Learning with Normalizing Flows

2 code implementations ICML 2020 Pavel Izmailov, Polina Kirichenko, Marc Finzi, Andrew Gordon Wilson

Normalizing flows transform a latent distribution through an invertible neural network for a flexible and pleasingly simple approach to generative modelling, while preserving an exact likelihood.

Semi-Supervised Image Classification, Semi-Supervised Text Classification
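
The "exact likelihood" noted above comes from the change of variables formula: for an invertible map $z = f(x)$, $\log p(x) = \log p_z(f(x)) + \log\lvert\det \partial f/\partial x\rvert$. A toy, assumed-PyTorch illustration with a single elementwise affine flow (not the paper's model):

```python
import math
import torch

def affine_flow_log_prob(x, scale, shift):
    """Exact log-density of x under z = scale * x + shift with z ~ N(0, I)."""
    z = scale * x + shift                               # forward map f(x)
    log_det = torch.log(torch.abs(scale))               # log |det df/dx|, elementwise
    log_pz = -0.5 * (z ** 2 + math.log(2 * math.pi))    # standard normal log-density
    return (log_pz + log_det).sum(dim=-1)
```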

Bayesian Deep Learning and a Probabilistic Perspective of Generalization

1 code implementation NeurIPS 2020 Andrew Gordon Wilson, Pavel Izmailov

The key distinguishing property of a Bayesian approach is marginalization, rather than using a single setting of weights.

Gaussian Processes
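
A minimal sketch of the marginalization contrasted above with using a single weight setting: the predictive distribution is a Monte Carlo average over posterior weight samples. `load_weights` and `weight_samples` are hypothetical placeholders.

```python
import torch

@torch.no_grad()
def bayesian_model_average(model, weight_samples, x, load_weights):
    """Estimate p(y | x, D) = E_{w ~ p(w | D)} [ p(y | x, w) ] by Monte Carlo."""
    probs = []
    for w in weight_samples:            # samples from (an approximation to) p(w | D)
        load_weights(model, w)          # write the sampled weights into the network
        probs.append(torch.softmax(model(x), dim=-1))
    return torch.stack(probs).mean(dim=0)   # averaged predictive distribution
```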

Learning Invariances in Neural Networks

1 code implementation 22 Oct 2020 Gregory Benton, Marc Finzi, Pavel Izmailov, Andrew Gordon Wilson

Invariances to translations have imbued convolutional neural networks with powerful generalization properties.

Image Classification, Molecular Property Prediction +2

What Are Bayesian Neural Network Posteriors Really Like?

3 code implementations 29 Apr 2021 Pavel Izmailov, Sharad Vikram, Matthew D. Hoffman, Andrew Gordon Wilson

The posterior over Bayesian neural network (BNN) parameters is extremely high-dimensional and non-convex.

Data Augmentation, Variational Inference

Does Knowledge Distillation Really Work?

2 code implementations NeurIPS 2021 Samuel Stanton, Pavel Izmailov, Polina Kirichenko, Alexander A. Alemi, Andrew Gordon Wilson

Knowledge distillation is a popular technique for training a small student network to emulate a larger teacher model, such as an ensemble of networks.

Knowledge Distillation
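
A minimal sketch of the standard distillation objective referred to above (a softened teacher-matching term mixed with the usual hard-label loss); the temperature, mixing weight, and tensor names are illustrative defaults, not the paper's settings.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Blend KL to softened teacher predictions with ordinary cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                              # rescale so gradients stay comparable
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```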

Dangers of Bayesian Model Averaging under Covariate Shift

1 code implementation NeurIPS 2021 Pavel Izmailov, Patrick Nicholson, Sanae Lotfi, Andrew Gordon Wilson

Approximate Bayesian inference for neural networks is considered a robust alternative to standard training, often providing good performance on out-of-distribution data.

Bayesian Inference

Bayesian Model Selection, the Marginal Likelihood, and Generalization

1 code implementation 23 Feb 2022 Sanae Lotfi, Pavel Izmailov, Gregory Benton, Micah Goldblum, Andrew Gordon Wilson

We provide a partial remedy through a conditional marginal likelihood, which we show is more aligned with generalization, and practically valuable for large-scale hyperparameter learning, such as in deep kernel learning.

Model Selection, Neural Architecture Search
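
For the conditional marginal likelihood mentioned above, the data are split so that the model is scored on held-out points after conditioning on the rest; a sketch of the quantity (notation assumed here), which follows from the chain rule of probability:

```latex
\log p(\mathcal{D}_{m+1:n} \mid \mathcal{D}_{1:m}, \mathcal{M})
  \;=\; \log p(\mathcal{D}_{1:n} \mid \mathcal{M}) \;-\; \log p(\mathcal{D}_{1:m} \mid \mathcal{M})
```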

On Uncertainty, Tempering, and Data Augmentation in Bayesian Classification

1 code implementation 30 Mar 2022 Sanyam Kapoor, Wesley J. Maddox, Pavel Izmailov, Andrew Gordon Wilson

In Bayesian regression, we often use a Gaussian observation model, where we control the level of aleatoric uncertainty with a noise variance parameter.

Classification, Data Augmentation
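
The Gaussian observation model referred to above, alongside the tempered posterior it is contrasted with in the classification setting; both are standard forms rather than quantities specific to this paper:

```latex
y = f(x; w) + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \sigma^2),
\qquad\qquad
p_T(w \mid \mathcal{D}) \;\propto\; \big(p(\mathcal{D} \mid w)\, p(w)\big)^{1/T}
```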

On Feature Learning in the Presence of Spurious Correlations

1 code implementation 20 Oct 2022 Pavel Izmailov, Polina Kirichenko, Nate Gruver, Andrew Gordon Wilson

Deep classifiers are known to rely on spurious features -- patterns which are correlated with the target on the training data but not inherently relevant to the learning problem, such as the image backgrounds when classifying the foregrounds.

Simple and Fast Group Robustness by Automatic Feature Reweighting

1 code implementation 19 Jun 2023 Shikai Qiu, Andres Potapczynski, Pavel Izmailov, Andrew Gordon Wilson

A major challenge to out-of-distribution generalization is reliance on spurious features -- patterns that are predictive of the class label in the training data distribution, but not causally related to the target.

Out-of-Distribution Generalization

Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision

no code implementations 14 Dec 2023 Collin Burns, Pavel Izmailov, Jan Hendrik Kirchner, Bowen Baker, Leo Gao, Leopold Aschenbrenner, Yining Chen, Adrien Ecoffet, Manas Joglekar, Jan Leike, Ilya Sutskever, Jeff Wu

Widely used alignment techniques, such as reinforcement learning from human feedback (RLHF), rely on the ability of humans to supervise model behavior - for example, to evaluate whether a model faithfully followed instructions or generated safe outputs.

Can a Confident Prior Replace a Cold Posterior?

1 code implementation 2 Mar 2024 Martin Marek, Brooks Paige, Pavel Izmailov

First, we introduce a "DirClip" prior that is practical to sample and nearly matches the performance of a cold posterior.

Image Classification
