Search Results for author: Andrew Gordon Wilson

Found 113 papers, 86 papers with code

What Are Bayesian Neural Network Posteriors Really Like?

3 code implementations • 29 Apr 2021 • Pavel Izmailov, Sharad Vikram, Matthew D. Hoffman, Andrew Gordon Wilson

The posterior over Bayesian neural network (BNN) parameters is extremely high-dimensional and non-convex.

32,793

Paper
Code

Product Kernel Interpolation for Scalable Gaussian Processes

1 code implementation • 24 Feb 2018 • Jacob R. Gardner, Geoff Pleiss, Ruihan Wu, Kilian Q. Weinberger, Andrew Gordon Wilson

Recent work shows that inference for Gaussian processes can be performed efficiently using iterative methods that rely only on matrix-vector multiplications (MVMs).

Gaussian Processes

3,406

Paper
Code

GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration

4 code implementations • NeurIPS 2018 • Jacob R. Gardner, Geoff Pleiss, David Bindel, Kilian Q. Weinberger, Andrew Gordon Wilson

Despite advances in scalable models, the inference tools used for Gaussian processes (GPs) have yet to fully capitalize on developments in computing hardware.

Gaussian Processes

3,406

Paper
Code

Exact Gaussian Processes on a Million Data Points

3 code implementations • NeurIPS 2019 • Ke Alexander Wang, Geoff Pleiss, Jacob R. Gardner, Stephen Tyree, Kilian Q. Weinberger, Andrew Gordon Wilson

Gaussian processes (GPs) are flexible non-parametric models, with a capacity that grows with the available data.

Gaussian Processes

3,406

Paper
Code

Constant-Time Predictive Distributions for Gaussian Processes

1 code implementation • ICML 2018 • Geoff Pleiss, Jacob R. Gardner, Kilian Q. Weinberger, Andrew Gordon Wilson

One of the most compelling features of Gaussian process (GP) regression is its ability to provide well-calibrated posterior distributions.

Gaussian Processes regression

3,405

Paper
Code

BoTorch: A Framework for Efficient Monte-Carlo Bayesian Optimization

2 code implementations • NeurIPS 2020 • Maximilian Balandat, Brian Karrer, Daniel R. Jiang, Samuel Daulton, Benjamin Letham, Andrew Gordon Wilson, Eytan Bakshy

Bayesian optimization provides sample-efficient global optimization for a broad range of applications, including automatic machine learning, engineering, physics, and experimental design.

Experimental Design

2,949

Paper
Code

Chronos: Learning the Language of Time Series

2 code implementations • 12 Mar 2024 • Abdul Fatir Ansari, Lorenzo Stella, Caner Turkmen, Xiyuan Zhang, Pedro Mercado, Huibin Shen, Oleksandr Shchur, Syama Sundar Rangapuram, Sebastian Pineda Arango, Shubham Kapoor, Jasper Zschiegner, Danielle C. Maddix, Michael W. Mahoney, Kari Torkkola, Andrew Gordon Wilson, Michael Bohlke-Schneider, Yuyang Wang

We introduce Chronos, a simple yet effective framework for pretrained probabilistic time series models.

Gaussian Processes Language Modelling +2

1,614

Paper
Code

Bayesian GAN

4 code implementations • NeurIPS 2017 • Yunus Saatchi, Andrew Gordon Wilson

Generative adversarial networks (GANs) can implicitly learn rich distributions over images, audio, and data which are hard to model with an explicit likelihood.

1,019

Paper
Code

Averaging Weights Leads to Wider Optima and Better Generalization

15 code implementations • 14 Mar 2018 • Pavel Izmailov, Dmitrii Podoprikhin, Timur Garipov, Dmitry Vetrov, Andrew Gordon Wilson

Deep neural networks are typically trained by optimizing a loss function with an SGD variant, in conjunction with a decaying learning rate, until convergence.

Ranked #78 on Image Classification on CIFAR-100 (using extra training data)

Image Classification Stochastic Optimization

949

Paper
Code

Fortuna: A Library for Uncertainty Quantification in Deep Learning

1 code implementation • 8 Feb 2023 • Gianluca Detommaso, Alberto Gasparin, Michele Donini, Matthias Seeger, Andrew Gordon Wilson, Cedric Archambeau

We present Fortuna, an open-source library for uncertainty quantification in deep learning.

Bayesian Inference Benchmarking +2

851

Paper
Code

Large Language Models Are Zero-Shot Time Series Forecasters

1 code implementation • NeurIPS 2023 • Nate Gruver, Marc Finzi, Shikai Qiu, Andrew Gordon Wilson

By encoding time series as a string of numerical digits, we can frame time series forecasting as next-token prediction in text.

Imputation Time Series +1

546

Paper
Code

A Simple Baseline for Bayesian Uncertainty in Deep Learning

8 code implementations • NeurIPS 2019 • Wesley Maddox, Timur Garipov, Pavel Izmailov, Dmitry Vetrov, Andrew Gordon Wilson

We propose SWA-Gaussian (SWAG), a simple, scalable, and general purpose approach for uncertainty representation and calibration in deep learning.

Bayesian Inference Transfer Learning

425

Paper
Code

CoLA: Exploiting Compositional Structure for Automatic and Efficient Numerical Linear Algebra

1 code implementation • NeurIPS 2023 • Andres Potapczynski, Marc Finzi, Geoff Pleiss, Andrew Gordon Wilson

In this paper, we propose a simple but general framework for large-scale linear algebra problems in machine learning, named CoLA (Compositional Linear Algebra).

CoLA Gaussian Processes

296

Paper
Code

Multimodal Word Distributions

2 code implementations • ACL 2017 • Ben Athiwaratkun, Andrew Gordon Wilson

Word embeddings provide point representations of words containing useful semantic information.

Word Embeddings Word Similarity

279

Paper
Code

Bayesian Optimization with Gradients

1 code implementation • NeurIPS 2017 • Jian Wu, Matthias Poloczek, Andrew Gordon Wilson, Peter I. Frazier

Bayesian optimization has been successful at global optimization of expensive-to-evaluate multimodal objective functions.

Bayesian Optimization

254

Paper
Code

Generalizing Convolutional Neural Networks for Equivariance to Lie Groups on Arbitrary Continuous Data

2 code implementations • ICML 2020 • Marc Finzi, Samuel Stanton, Pavel Izmailov, Andrew Gordon Wilson

The translation equivariance of convolutional layers enables convolutional neural networks to generalize well on image problems.

Inductive Bias Translation

253

Paper
Code

Fast Adaptation with Linearized Neural Networks

1 code implementation • 2 Mar 2021 • Wesley J. Maddox, Shuai Tang, Pablo Garcia Moreno, Andrew Gordon Wilson, Andreas Damianou

The inductive biases of trained neural networks are difficult to understand and, consequently, to adapt to new settings.

Domain Adaptation Gaussian Processes +2

250

Paper
Code

Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs

10 code implementations • NeurIPS 2018 • Timur Garipov, Pavel Izmailov, Dmitrii Podoprikhin, Dmitry Vetrov, Andrew Gordon Wilson

The loss functions of deep neural networks are complex and their geometric properties are not well understood.

247

Paper
Code

Learning Scalable Deep Kernels with Recurrent Structure

2 code implementations • 27 Oct 2016 • Maruan Al-Shedivat, Andrew Gordon Wilson, Yunus Saatchi, Zhiting Hu, Eric P. Xing

To model such structure, we propose expressive closed-form kernel functions for Gaussian processes.

Autonomous Driving Gaussian Processes +1

244

Paper
Code

Thoughts on Massively Scalable Gaussian Processes

3 code implementations • 5 Nov 2015 • Andrew Gordon Wilson, Christoph Dann, Hannes Nickisch

This multi-level circulant approximation allows one to unify the orthogonal computational benefits of fast Kronecker and Toeplitz approaches, and is significantly faster than either approach in isolation; 2) local kernel interpolation and inducing points to allow for arbitrarily located data inputs, and $O(1)$ test time predictions; 3) exploiting block-Toeplitz Toeplitz-block structure (BTTB), which enables fast inference and learning when multidimensional Kronecker structure is not present; and 4) projections of the input space to flexibly model correlated inputs and high dimensional data.

Gaussian Processes

244

Paper
Code

A Practical Method for Constructing Equivariant Multilayer Perceptrons for Arbitrary Matrix Groups

4 code implementations • 19 Apr 2021 • Marc Finzi, Max Welling, Andrew Gordon Wilson

Symmetries and equivariance are fundamental to the generalization of neural networks on domains such as images, graphs, and point clouds.

Rubik's Cube Translation

244

Paper
Code

Cyclical Stochastic Gradient MCMC for Bayesian Deep Learning

4 code implementations • ICLR 2020 • Ruqi Zhang, Chunyuan Li, Jianyi Zhang, Changyou Chen, Andrew Gordon Wilson

The posteriors over neural network weights are high dimensional and multimodal.

Bayesian Inference Stochastic Optimization

233

Paper
Code

Bayesian Deep Learning and a Probabilistic Perspective of Generalization

1 code implementation • NeurIPS 2020 • Andrew Gordon Wilson, Pavel Izmailov

The key distinguishing property of a Bayesian approach is marginalization, rather than using a single setting of weights.

Gaussian Processes

226

Paper
Code

Simple Black-box Adversarial Attacks

4 code implementations • ICLR 2019 • Chuan Guo, Jacob R. Gardner, Yurong You, Andrew Gordon Wilson, Kilian Q. Weinberger

We propose an intriguingly simple method for the construction of adversarial images in the black-box setting.

187

Paper
Code

There Are Many Consistent Explanations of Unlabeled Data: Why You Should Average

2 code implementations • ICLR 2019 • Ben Athiwaratkun, Marc Finzi, Pavel Izmailov, Andrew Gordon Wilson

Presently the most successful approaches to semi-supervised learning are based on consistency regularization, whereby a model is trained to be robust to small perturbations of its inputs and parameters.

Ranked #19 on Semi-Supervised Image Classification on CIFAR-10, 4000 Labels

Domain Adaptation Semi-Supervised Image Classification

185

Paper
Code

Deep Kernel Learning

5 code implementations • 6 Nov 2015 • Andrew Gordon Wilson, Zhiting Hu, Ruslan Salakhutdinov, Eric P. Xing

We introduce scalable deep kernels, which combine the structural properties of deep learning architectures with the non-parametric flexibility of kernel methods.

Gaussian Processes

180

Paper
Code

Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks

2 code implementations • NeurIPS 2023 • Micah Goldblum, Hossein Souri, Renkun Ni, Manli Shu, Viraj Prabhu, Gowthami Somepalli, Prithvijit Chattopadhyay, Mark Ibrahim, Adrien Bardes, Judy Hoffman, Rama Chellappa, Andrew Gordon Wilson, Tom Goldstein

Battle of the Backbones (BoB) makes this choice easier by benchmarking a diverse suite of pretrained models, including vision-language models, those trained via self-supervised learning, and the Stable Diffusion backbone, across a diverse set of computer vision tasks ranging from classification to object detection to OOD generalization and more.

Benchmarking object-detection +2

179

Paper
Code

What do Vision Transformers Learn? A Visual Exploration

1 code implementation • 13 Dec 2022 • Amin Ghiasi, Hamid Kazemi, Eitan Borgnia, Steven Reich, Manli Shu, Micah Goldblum, Andrew Gordon Wilson, Tom Goldstein

In addition, we show that ViTs maintain spatial information in all layers except the final layer.

Language Modelling

169

Paper
Code

Probabilistic FastText for Multi-Sense Word Embeddings

1 code implementation • ACL 2018 • Ben Athiwaratkun, Andrew Gordon Wilson, Anima Anandkumar

We introduce Probabilistic FastText, a new model for word embeddings that can capture multiple word senses, sub-word structure, and uncertainty information.

Word Embeddings Word Similarity

148

Paper
Code

Semi-Supervised Learning with Normalizing Flows

2 code implementations • ICML 2020 • Pavel Izmailov, Polina Kirichenko, Marc Finzi, Andrew Gordon Wilson

Normalizing flows transform a latent distribution through an invertible neural network for a flexible and pleasingly simple approach to generative modelling, while preserving an exact likelihood.

Ranked #1 on Semi-Supervised Text Classification on AG News (200 Labels)

Semi-Supervised Image Classification Semi-Supervised Text Classification

144

Paper
Code

SWALP : Stochastic Weight Averaging in Low-Precision Training

3 code implementations • 26 Apr 2019 • Guandao Yang, Tianyi Zhang, Polina Kirichenko, Junwen Bai, Andrew Gordon Wilson, Christopher De Sa

Low precision operations can provide scalability, memory savings, portability, and energy efficiency.

136

Paper
Code

Pre-Train Your Loss: Easy Bayesian Transfer Learning with Informative Priors

1 code implementation • 20 May 2022 • Ravid Shwartz-Ziv, Micah Goldblum, Hossein Souri, Sanyam Kapoor, Chen Zhu, Yann Lecun, Andrew Gordon Wilson

Deep learning is increasingly moving towards a transfer learning paradigm whereby large foundation models are fine-tuned on downstream tasks, starting from an initialization learned on the source task.

Transfer Learning

107

Paper
Code

Last Layer Re-Training is Sufficient for Robustness to Spurious Correlations

4 code implementations • 6 Apr 2022 • Polina Kirichenko, Pavel Izmailov, Andrew Gordon Wilson

Neural network classifiers can largely rely on simple spurious features, such as backgrounds, to make predictions.

Ranked #1 on Out-of-Distribution Generalization on UrbanCars

Out-of-Distribution Generalization

Paper
Code

Simplifying Hamiltonian and Lagrangian Neural Networks via Explicit Constraints

1 code implementation • NeurIPS 2020 • Marc Finzi, Ke Alexander Wang, Andrew Gordon Wilson

Reasoning about the physical world requires models that are endowed with the right inductive biases to learn the underlying dynamics.

Paper
Code

Loss Surface Simplexes for Mode Connecting Volumes and Fast Ensembling

1 code implementation • 25 Feb 2021 • Gregory W. Benton, Wesley J. Maddox, Sanae Lotfi, Andrew Gordon Wilson

In this paper, we show that there are mode-connecting simplicial complexes that form multi-dimensional manifolds of low loss, connecting many independently trained models.

Paper
Code

Transfer Learning with Deep Tabular Models

1 code implementation • 30 Jun 2022 • Roman Levin, Valeriia Cherepanova, Avi Schwarzschild, Arpit Bansal, C. Bayan Bruss, Tom Goldstein, Andrew Gordon Wilson, Micah Goldblum

In this work, we demonstrate that upstream data gives tabular neural networks a decisive advantage over widely used GBDT models.

Medical Diagnosis Transfer Learning

Paper
Code

Learning Invariances in Neural Networks

1 code implementation • 22 Oct 2020 • Gregory Benton, Marc Finzi, Pavel Izmailov, Andrew Gordon Wilson

Invariances to translations have imbued convolutional neural networks with powerful generalization properties.

Image Classification Molecular Property Prediction +2

Paper
Code

Why Normalizing Flows Fail to Detect Out-of-Distribution Data

1 code implementation • NeurIPS 2020 • Polina Kirichenko, Pavel Izmailov, Andrew Gordon Wilson

Detecting out-of-distribution (OOD) data is crucial for robust machine learning systems.

Out of Distribution (OOD) Detection

Paper
Code

Protein Design with Guided Discrete Diffusion

1 code implementation • NeurIPS 2023 • Nate Gruver, Samuel Stanton, Nathan C. Frey, Tim G. J. Rudner, Isidro Hotzel, Julien Lafrance-Vanasse, Arvind Rajpal, Kyunghyun Cho, Andrew Gordon Wilson

A popular approach to protein design is to combine a generative model with a discriminative model for conditional sampling.

Bayesian Optimization Denoising +1

Paper
Code

Subspace Inference for Bayesian Deep Learning

1 code implementation • 17 Jul 2019 • Pavel Izmailov, Wesley J. Maddox, Polina Kirichenko, Timur Garipov, Dmitry Vetrov, Andrew Gordon Wilson

Bayesian inference was once a gold standard for learning with neural networks, providing accurate full predictive distributions and well calibrated uncertainty.

Bayesian Inference Image Classification +2

Paper
Code

Accelerating Bayesian Optimization for Biological Sequence Design with Denoising Autoencoders

1 code implementation • 23 Mar 2022 • Samuel Stanton, Wesley Maddox, Nate Gruver, Phillip Maffettone, Emily Delaney, Peyton Greenside, Andrew Gordon Wilson

Bayesian optimization (BayesOpt) is a gold standard for query-efficient continuous optimization.

Bayesian Optimization Denoising

Paper
Code

On the model-based stochastic value gradient for continuous reinforcement learning

1 code implementation • 28 Aug 2020 • Brandon Amos, Samuel Stanton, Denis Yarats, Andrew Gordon Wilson

For over a decade, model-based reinforcement learning has been seen as a way to leverage control-based domain knowledge to improve the sample-efficiency of reinforcement learning agents.

Continuous Control Humanoid Control +4

Paper
Code

Fine-Tuned Language Models Generate Stable Inorganic Materials as Text

1 code implementation • 6 Feb 2024 • Nate Gruver, Anuroop Sriram, Andrea Madotto, Andrew Gordon Wilson, C. Lawrence Zitnick, Zachary Ulissi

We propose fine-tuning large language models for generation of stable materials.

Paper
Code

Kernel Interpolation for Scalable Structured Gaussian Processes (KISS-GP)

3 code implementations • 3 Mar 2015 • Andrew Gordon Wilson, Hannes Nickisch

We introduce a new structured kernel interpolation (SKI) framework, which generalises and unifies inducing point methods for scalable Gaussian processes (GPs).

Gaussian Processes

Paper
Code

Kernel Interpolation for Scalable Online Gaussian Processes

2 code implementations • 2 Mar 2021 • Samuel Stanton, Wesley J. Maddox, Ian Delbridge, Andrew Gordon Wilson

Gaussian processes (GPs) provide a gold standard for performance in online settings, such as sample-efficient control and black box optimization, where we need to update a posterior distribution as we acquire data in a sequential fashion.

Bayesian Optimization Gaussian Processes

Paper
Code

Volatility Based Kernels and Moving Average Means for Accurate Forecasting with Gaussian Processes

1 code implementation • 13 Jul 2022 • Gregory Benton, Wesley J. Maddox, Andrew Gordon Wilson

A broad class of stochastic volatility models are defined by systems of stochastic differential equations.

Gaussian Processes

Paper
Code

A Study of Bayesian Neural Network Surrogates for Bayesian Optimization

2 code implementations • 31 May 2023 • Yucen Lily Li, Tim G. J. Rudner, Andrew Gordon Wilson

Bayesian optimization is a highly efficient approach to optimizing objective functions which are expensive to query.

Bayesian Optimization

Paper
Code

On Feature Learning in the Presence of Spurious Correlations

1 code implementation • 20 Oct 2022 • Pavel Izmailov, Polina Kirichenko, Nate Gruver, Andrew Gordon Wilson

Deep classifiers are known to rely on spurious features $\unicode{x2013}$ patterns which are correlated with the target on the training data but not inherently relevant to the learning problem, such as the image backgrounds when classifying the foregrounds.

Paper
Code

Scalable Log Determinants for Gaussian Process Kernel Learning

3 code implementations • NeurIPS 2017 • Kun Dong, David Eriksson, Hannes Nickisch, David Bindel, Andrew Gordon Wilson

For applications as varied as Bayesian neural networks, determinantal point processes, elliptical graphical models, and kernel learning for Gaussian processes (GPs), one must compute a log determinant of an $n \times n$ positive definite matrix, and its derivatives - leading to prohibitive $\mathcal{O}(n^3)$ computations.

Gaussian Processes Point Processes

Paper
Code

Scaling Gaussian Process Regression with Derivatives

1 code implementation • NeurIPS 2018 • David Eriksson, Kun Dong, Eric Hans Lee, David Bindel, Andrew Gordon Wilson

Gaussian processes (GPs) with derivatives are useful in many applications, including Bayesian optimization, implicit surface reconstruction, and terrain reconstruction.

Bayesian Optimization Dimensionality Reduction +3

Paper
Code

The Lie Derivative for Measuring Learned Equivariance

1 code implementation • 6 Oct 2022 • Nate Gruver, Marc Finzi, Micah Goldblum, Andrew Gordon Wilson

In order to better understand the role of equivariance in recent vision models, we introduce the Lie derivative, a method for measuring equivariance with strong mathematical foundations and minimal hyperparameters.

Paper
Code

Learning Multimodal Data Augmentation in Feature Space

1 code implementation • 29 Dec 2022 • Zichang Liu, Zhiqiang Tang, Xingjian Shi, Aston Zhang, Mu Li, Anshumali Shrivastava, Andrew Gordon Wilson

The ability to jointly learn from multiple modalities, such as text, audio, and visual data, is a defining feature of intelligent systems.

Data Augmentation Image Classification +1

Paper
Code

Rethinking Parameter Counting in Deep Models: Effective Dimensionality Revisited

1 code implementation • 4 Mar 2020 • Wesley J. Maddox, Gregory Benton, Andrew Gordon Wilson

Neural networks appear to have mysterious generalization properties when using parameter counting as a proxy for complexity.

Model Selection

Paper
Code

Hierarchical Density Order Embeddings

2 code implementations • ICLR 2018 • Ben Athiwaratkun, Andrew Gordon Wilson

By representing words with probability densities rather than point vectors, probabilistic word embeddings can capture rich and interpretable semantic information and uncertainty.

Ranked #2 on Lexical Entailment on HyperLex

Lexical Entailment Word Embeddings

Paper
Code

Bayesian Model Selection, the Marginal Likelihood, and Generalization

1 code implementation • 23 Feb 2022 • Sanae Lotfi, Pavel Izmailov, Gregory Benton, Micah Goldblum, Andrew Gordon Wilson

We provide a partial remedy through a conditional marginal likelihood, which we show is more aligned with generalization, and practically valuable for large-scale hyperparameter learning, such as in deep kernel learning.

Model Selection Neural Architecture Search

Paper
Code

Dangers of Bayesian Model Averaging under Covariate Shift

1 code implementation • NeurIPS 2021 • Pavel Izmailov, Patrick Nicholson, Sanae Lotfi, Andrew Gordon Wilson

Approximate Bayesian inference for neural networks is considered a robust alternative to standard training, often providing good performance on out-of-distribution data.

Bayesian Inference

Paper
Code

Function-Space Distributions over Kernels

1 code implementation • NeurIPS 2019 • Gregory W. Benton, Wesley J. Maddox, Jayson P. Salkey, Julio Albinati, Andrew Gordon Wilson

The resulting approach enables learning of rich representations, with support for any stationary kernel, uncertainty over the values of the kernel, and an interpretable specification of a prior directly over kernels, without requiring sophisticated initialization or manual intervention.

Gaussian Processes Representation Learning

Paper
Code

Does Knowledge Distillation Really Work?

2 code implementations • NeurIPS 2021 • Samuel Stanton, Pavel Izmailov, Polina Kirichenko, Alexander A. Alemi, Andrew Gordon Wilson

Knowledge distillation is a popular technique for training a small student network to emulate a larger teacher model, such as an ensemble of networks.

Knowledge Distillation

Paper
Code

Randomly Projected Additive Gaussian Processes for Regression

1 code implementation • ICML 2020 • Ian A. Delbridge, David S. Bindel, Andrew Gordon Wilson

Surprisingly, we find that as the number of random projections increases, the predictive performance of this approach quickly converges to the performance of a kernel operating on the original full dimensional inputs, over a wide range of data sets, even if we are projecting into a single dimension.

Gaussian Processes regression +1

Paper
Code

Conditioning Sparse Variational Gaussian Processes for Online Decision-making

1 code implementation • NeurIPS 2021 • Wesley J. Maddox, Samuel Stanton, Andrew Gordon Wilson

With a principled representation of uncertainty and closed form posterior updates, Gaussian processes (GPs) are a natural choice for online decision making.

Active Learning Decision Making +1

Paper
Code

Bayesian Optimization with Conformal Prediction Sets

1 code implementation • 22 Oct 2022 • Samuel Stanton, Wesley Maddox, Andrew Gordon Wilson

Bayesian optimization is a coherent, ubiquitous approach to decision-making under uncertainty, with applications including multi-arm bandits, active learning, and black-box optimization.

Active Learning Bayesian Optimization +5

Paper
Code

Improving GAN Training with Probability Ratio Clipping and Sample Reweighting

1 code implementation • NeurIPS 2020 • Yue Wu, Pan Zhou, Andrew Gordon Wilson, Eric P. Xing, Zhiting Hu

Despite success on a wide range of problems related to vision, generative adversarial networks (GANs) often suffer from inferior performance due to unstable training, especially for text generation.

Ranked #2 on Text Generation on EMNLP2017 WMT

Image Generation Style Transfer +1

Paper
Code

Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition

2 code implementations • 10 Jun 2021 • Shengyang Sun, Jiaxin Shi, Andrew Gordon Wilson, Roger Grosse

We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability.

Gaussian Processes regression

Paper
Code

On Uncertainty, Tempering, and Data Augmentation in Bayesian Classification

1 code implementation • 30 Mar 2022 • Sanyam Kapoor, Wesley J. Maddox, Pavel Izmailov, Andrew Gordon Wilson

In Bayesian regression, we often use a Gaussian observation model, where we control the level of aleatoric uncertainty with a noise variance parameter.

Classification Data Augmentation

Paper
Code

Visual Explanations of Image-Text Representations via Multi-Modal Information Bottleneck Attribution

1 code implementation • NeurIPS 2023 • Ying Wang, Tim G. J. Rudner, Andrew Gordon Wilson

Vision-language pretrained models have seen remarkable success, but their application to safety-critical settings is limited by their lack of interpretability.

Paper
Code

Residual Pathway Priors for Soft Equivariance Constraints

1 code implementation • NeurIPS 2021 • Marc Finzi, Gregory Benton, Andrew Gordon Wilson

There is often a trade-off between building deep learning systems that are expressive enough to capture the nuances of the reality, and having the right inductive biases for efficient learning.

Paper
Code

How Much Data Are Augmentations Worth? An Investigation into Scaling Laws, Invariance, and Implicit Regularization

1 code implementation • 12 Oct 2022 • Jonas Geiping, Micah Goldblum, Gowthami Somepalli, Ravid Shwartz-Ziv, Tom Goldstein, Andrew Gordon Wilson

Despite the clear performance benefits of data augmentations, little is known about why they are so effective.

Paper
Code

The No Free Lunch Theorem, Kolmogorov Complexity, and the Role of Inductive Biases in Machine Learning

1 code implementation • 11 Apr 2023 • Micah Goldblum, Marc Finzi, Keefer Rowan, Andrew Gordon Wilson

No free lunch theorems for supervised learning state that no learner can solve all problems or that all learners achieve exactly the same accuracy on average over a uniform distribution on learning problems.

Paper
Code

Bayesian Optimization with High-Dimensional Outputs

2 code implementations • NeurIPS 2021 • Wesley J. Maddox, Maximilian Balandat, Andrew Gordon Wilson, Eytan Bakshy

However, the Gaussian Process (GP) models typically used as probabilistic surrogates for multi-task Bayesian Optimization scale poorly with the number of outcomes, greatly limiting applicability.

Bayesian Optimization Vocal Bursts Intensity Prediction

Paper
Code

PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization

1 code implementation • 24 Nov 2022 • Sanae Lotfi, Marc Finzi, Sanyam Kapoor, Andres Potapczynski, Micah Goldblum, Andrew Gordon Wilson

While there has been progress in developing non-vacuous generalization bounds for deep neural networks, these bounds tend to be uninformative about why deep learning works.

Generalization Bounds Transfer Learning

Paper
Code

Low-Precision Arithmetic for Fast Gaussian Processes

1 code implementation • 14 Jul 2022 • Wesley J. Maddox, Andres Potapczynski, Andrew Gordon Wilson

Low-precision arithmetic has had a transformative effect on the training of neural networks, reducing computation, memory and energy requirements.

Gaussian Processes

Paper
Code

Gaussian Process Kernels for Pattern Discovery and Extrapolation

2 code implementations • 18 Feb 2013 • Andrew Gordon Wilson, Ryan Prescott Adams

Gaussian processes are rich distributions over functions, which provide a Bayesian nonparametric approach to smoothing and interpolation.

Gaussian Processes

Paper
Code

Scalable Lévy Process Priors for Spectral Kernel Learning

1 code implementation • 2 Feb 2018 • Phillip A. Jang, Andrew E. Loeb, Matthew B. Davidow, Andrew Gordon Wilson

We propose a distribution over kernels formed by modelling a spectral mixture density with a L\'evy process.

Gaussian Processes

Paper
Code

Deconstructing the Inductive Biases of Hamiltonian Neural Networks

1 code implementation • ICLR 2022 • Nate Gruver, Marc Finzi, Samuel Stanton, Andrew Gordon Wilson

Physics-inspired neural networks (NNs), such as Hamiltonian or Lagrangian NNs, dramatically outperform other learned dynamics models by leveraging strong inductive biases.

Paper
Code

SKIing on Simplices: Kernel Interpolation on the Permutohedral Lattice for Scalable Gaussian Processes

1 code implementation • 12 Jun 2021 • Sanyam Kapoor, Marc Finzi, Ke Alexander Wang, Andrew Gordon Wilson

State-of-the-art methods for scalable Gaussian processes use iterative algorithms, requiring fast matrix vector multiplies (MVMs) with the covariance kernel.

Gaussian Processes

Paper
Code

A Stable and Scalable Method for Solving Initial Value PDEs with Neural Networks

1 code implementation • 28 Apr 2023 • Marc Finzi, Andres Potapczynski, Matthew Choptuik, Andrew Gordon Wilson

Unlike conventional grid and mesh based methods for solving partial differential equations (PDEs), neural networks have the potential to break the curse of dimensionality, providing approximate solutions to problems where using classical solvers is difficult or impossible.

Paper
Code

Gaussian Process Regression Networks

1 code implementation • 19 Oct 2011 • Andrew Gordon Wilson, David A. Knowles, Zoubin Ghahramani

We introduce a new regression framework, Gaussian process regression networks (GPRN), which combines the structural properties of Bayesian neural networks with the non-parametric flexibility of Gaussian processes.

Gaussian Processes regression

Paper
Code

Generating Potent Poisons and Backdoors from Scratch with Guided Diffusion

1 code implementation • 25 Mar 2024 • Hossein Souri, Arpit Bansal, Hamid Kazemi, Liam Fowl, Aniruddha Saha, Jonas Geiping, Andrew Gordon Wilson, Rama Chellappa, Tom Goldstein, Micah Goldblum

As a result, we may be able to craft more potent poisons by carefully choosing the base samples.

Backdoor Attack

Paper
Code

Automated Few-shot Classification with Instruction-Finetuned Language Models

1 code implementation • 21 May 2023 • Rami Aly, Xingjian Shi, Kaixiang Lin, Aston Zhang, Andrew Gordon Wilson

We observe, in the context of classification tasks, that instruction finetuned language models exhibit remarkable prompt robustness, and we subsequently propose a simple method to eliminate the need for handcrafted prompts, named AuT-Few.

Classification Few-Shot Learning +1

Paper
Code

Simple and Fast Group Robustness by Automatic Feature Reweighting

1 code implementation • 19 Jun 2023 • Shikai Qiu, Andres Potapczynski, Pavel Izmailov, Andrew Gordon Wilson

A major challenge to out-of-distribution generalization is reliance on spurious features -- patterns that are predictive of the class label in the training data distribution, but not causally related to the target.

Out-of-Distribution Generalization

Paper
Code

Low-Precision Stochastic Gradient Langevin Dynamics

1 code implementation • 20 Jun 2022 • Ruqi Zhang, Andrew Gordon Wilson, Christopher De Sa

While low-precision optimization has been widely used to accelerate deep learning, low-precision sampling remains largely unexplored.

Quantization

Paper
Code

Function-Space Regularization in Neural Networks: A Probabilistic Perspective

1 code implementation • 28 Dec 2023 • Tim G. J. Rudner, Sanyam Kapoor, Shikai Qiu, Andrew Gordon Wilson

In this work, we approach regularization in neural networks from a probabilistic perspective and show that by viewing parameter-space regularization as specifying an empirical prior distribution over the model parameters, we can derive a probabilistically well-motivated regularization technique that allows explicitly encoding information about desired predictive functions into neural network training.

Paper
Code

Should We Learn Most Likely Functions or Parameters?

1 code implementation • NeurIPS 2023 • Shikai Qiu, Tim G. J. Rudner, Sanyam Kapoor, Andrew Gordon Wilson

Moreover, the most likely parameters under the parameter posterior do not generally correspond to the most likely function induced by the parameter posterior.

Paper
Code

When are Iterative Gaussian Processes Reliably Accurate?

1 code implementation • 31 Dec 2021 • Wesley J. Maddox, Sanyam Kapoor, Andrew Gordon Wilson

While recent work on conjugate gradient methods and Lanczos decompositions have achieved scalable Gaussian process inference with highly accurate point predictions, in several implementations these iterative methods appear to struggle with numerical instabilities in learning kernel hyperparameters, and poor test likelihoods.

Gaussian Processes

Paper
Code

Simplifying Neural Network Training Under Class Imbalance

1 code implementation • NeurIPS 2023 • Ravid Shwartz-Ziv, Micah Goldblum, Yucen Lily Li, C. Bayan Bruss, Andrew Gordon Wilson

Real-world datasets are often highly class-imbalanced, which can adversely impact the performance of deep learning models.

Data Augmentation

Paper
Code

Mind the GAP: Improving Robustness to Subpopulation Shifts with Group-Aware Priors

1 code implementation • 14 Mar 2024 • Tim G. J. Rudner, Ya Shi Zhang, Andrew Gordon Wilson, Julia Kempe

Machine learning models often perform poorly under subpopulation shifts in the data distribution.

Attribute Bayesian Inference

Paper
Code

Gaussian Process Subset Scanning for Anomalous Pattern Detection in Non-iid Data

no code implementations • 4 Apr 2018 • William Herlands, Edward McFowland III, Andrew Gordon Wilson, Daniel B. Neill

We introduce methods for identifying anomalous patterns in non-iid data by combining Gaussian processes with novel log-likelihood ratio statistic and subset scanning techniques.

Gaussian Processes

Paper
Add Code

Proceedings of NIPS 2017 Symposium on Interpretable Machine Learning

no code implementations • 27 Nov 2017 • Andrew Gordon Wilson, Jason Yosinski, Patrice Simard, Rich Caruana, William Herlands

This is the Proceedings of NIPS 2017 Symposium on Interpretable Machine Learning, held in Long Beach, California, USA on December 7, 2017

BIG-bench Machine Learning Interpretable Machine Learning

Paper
Add Code

Proceedings of NIPS 2016 Workshop on Interpretable Machine Learning for Complex Systems

no code implementations • 28 Nov 2016 • Andrew Gordon Wilson, Been Kim, William Herlands

This is the Proceedings of NIPS 2016 Workshop on Interpretable Machine Learning for Complex Systems, held in Barcelona, Spain on December 9, 2016

BIG-bench Machine Learning Interpretable Machine Learning

Paper
Add Code

Stochastic Variational Deep Kernel Learning

no code implementations • NeurIPS 2016 • Andrew Gordon Wilson, Zhiting Hu, Ruslan Salakhutdinov, Eric P. Xing

We propose a novel deep kernel learning model and stochastic variational inference procedure which generalizes deep kernel learning approaches to enable classification, multi-task learning, additive covariance structures, and stochastic gradient training.

Gaussian Processes General Classification +2

Paper
Add Code

The Human Kernel

no code implementations • NeurIPS 2015 • Andrew Gordon Wilson, Christoph Dann, Christopher G. Lucas, Eric P. Xing

Bayesian nonparametric models, such as Gaussian processes, provide a compelling framework for automatic statistical modelling: these models have a high degree of flexibility, and automatically calibrated complexity.

Gaussian Processes

Paper
Add Code

A la Carte - Learning Fast Kernels

no code implementations • 19 Dec 2014 • Zichao Yang, Alexander J. Smola, Le Song, Andrew Gordon Wilson

Kernel methods have great promise for learning rich statistical representations of large modern datasets.

Paper
Add Code

Bayesian Inference for NMR Spectroscopy with Applications to Chemical Quantification

no code implementations • 14 Feb 2014 • Andrew Gordon Wilson, Yuting Wu, Daniel J. Holland, Sebastian Nowozin, Mick D. Mantle, Lynn F. Gladden, Andrew Blake

Nuclear magnetic resonance (NMR) spectroscopy exploits the magnetic properties of atomic nuclei to discover the structure, reaction state and chemical environment of molecules.

Bayesian Inference

Paper
Add Code

Student-t Processes as Alternatives to Gaussian Processes

no code implementations • 18 Feb 2014 • Amar Shah, Andrew Gordon Wilson, Zoubin Ghahramani

We investigate the Student-t process as an alternative to the Gaussian process as a nonparametric prior over functions.

Bayesian Optimization Gaussian Processes +1

Paper
Add Code

GPatt: Fast Multidimensional Pattern Extrapolation with Gaussian Processes

no code implementations • 20 Oct 2013 • Andrew Gordon Wilson, Elad Gilboa, Arye Nehorai, John P. Cunningham

We introduce a new Bayesian nonparametric framework -- GPatt -- enabling automatic pattern extrapolation with Gaussian processes on large multidimensional datasets.

Gaussian Processes

Paper
Add Code

Change Surfaces for Expressive Multidimensional Changepoints and Counterfactual Prediction

no code implementations • 28 Oct 2018 • William Herlands, Daniel B. Neill, Hannes Nickisch, Andrew Gordon Wilson

We provide a model-agnostic formalization of change surfaces, illustrating how they can provide variable, heterogeneous, and non-monotonic rates of change across multiple dimensions.

counterfactual

Paper
Add Code

Practical Multi-fidelity Bayesian Optimization for Hyperparameter Tuning

no code implementations • 12 Mar 2019 • Jian Wu, Saul Toscano-Palmerin, Peter I. Frazier, Andrew Gordon Wilson

Nonetheless, for hyperparameter tuning in deep neural networks, the time required to evaluate the validation error for even a few hyperparameter settings remains a bottleneck.

Bayesian Optimization

Paper
Add Code

MLSys: The New Frontier of Machine Learning Systems

no code implementations • 29 Mar 2019 • Alexander Ratner, Dan Alistarh, Gustavo Alonso, David G. Andersen, Peter Bailis, Sarah Bird, Nicholas Carlini, Bryan Catanzaro, Jennifer Chayes, Eric Chung, Bill Dally, Jeff Dean, Inderjit S. Dhillon, Alexandros Dimakis, Pradeep Dubey, Charles Elkan, Grigori Fursin, Gregory R. Ganger, Lise Getoor, Phillip B. Gibbons, Garth A. Gibson, Joseph E. Gonzalez, Justin Gottschlich, Song Han, Kim Hazelwood, Furong Huang, Martin Jaggi, Kevin Jamieson, Michael. I. Jordan, Gauri Joshi, Rania Khalaf, Jason Knight, Jakub Konečný, Tim Kraska, Arun Kumar, Anastasios Kyrillidis, Aparna Lakshmiratan, Jing Li, Samuel Madden, H. Brendan McMahan, Erik Meijer, Ioannis Mitliagkas, Rajat Monga, Derek Murray, Kunle Olukotun, Dimitris Papailiopoulos, Gennady Pekhimenko, Theodoros Rekatsinas, Afshin Rostamizadeh, Christopher Ré, Christopher De Sa, Hanie Sedghi, Siddhartha Sen, Virginia Smith, Alex Smola, Dawn Song, Evan Sparks, Ion Stoica, Vivienne Sze, Madeleine Udell, Joaquin Vanschoren, Shivaram Venkataraman, Rashmi Vinayak, Markus Weimer, Andrew Gordon Wilson, Eric Xing, Matei Zaharia, Ce Zhang, Ameet Talwalkar

Machine learning (ML) techniques are enjoying rapidly increasing adoption.

BIG-bench Machine Learning

Paper
Add Code

Towards understanding the true loss surface of deep neural networks using random matrix theory and iterative spectral methods

no code implementations • ICLR 2020 • Diego Granziol, Timur Garipov, Dmitry Vetrov, Stefan Zohren, Stephen Roberts, Andrew Gordon Wilson

This approach is an order of magnitude faster than state-of-the-art methods for spectral visualization, and can be generically used to investigate the spectral properties of matrices in deep learning.

Paper
Add Code

The Case for Bayesian Deep Learning

no code implementations • 29 Jan 2020 • Andrew Gordon Wilson

(3) The structure of neural networks gives rise to a structured prior in function space, which reflects the inductive biases of neural networks that help them generalize.

Bayesian Inference

Paper
Add Code

Generalised Wishart Processes

no code implementations • 31 Dec 2010 • Andrew Gordon Wilson, Zoubin Ghahramani

We introduce a stochastic process with Wishart marginals: the generalised Wishart process (GWP).

Paper
Add Code

Rethinking Parameter Counting: Effective Dimensionality Revisited

no code implementations • 1 Jan 2021 • Gregory Benton, Wesley Maddox, Andrew Gordon Wilson

Neural networks appear to have mysterious generalization properties when using parameter counting as a proxy for complexity.

Model Selection

Paper
Add Code

Learning Invariances in Neural Networks from Training Data

no code implementations • NeurIPS 2020 • Gregory Benton, Marc Finzi, Pavel Izmailov, Andrew Gordon Wilson

Invariances to translations have imbued convolutional neural networks with powerful generalization properties.

Image Classification Molecular Property Prediction +1

Paper
Add Code

Task-agnostic Continual Learning with Hybrid Probabilistic Models

no code implementations • ICML Workshop INNF 2021 • Polina Kirichenko, Mehrdad Farajtabar, Dushyant Rao, Balaji Lakshminarayanan, Nir Levine, Ang Li, Huiyi Hu, Andrew Gordon Wilson, Razvan Pascanu

Learning new tasks continuously without forgetting on a constantly changing data distribution is essential for real-world problems but extremely challenging for modern deep learning.

Anomaly Detection Continual Learning +1

Paper
Add Code

K-SAM: Sharpness-Aware Minimization at the Speed of SGD

no code implementations • 23 Oct 2022 • Renkun Ni, Ping-Yeh Chiang, Jonas Geiping, Micah Goldblum, Andrew Gordon Wilson, Tom Goldstein

Sharpness-Aware Minimization (SAM) has recently emerged as a robust technique for improving the accuracy of deep neural networks.

Paper
Add Code

Chroma-VAE: Mitigating Shortcut Learning with Generative Classifiers

no code implementations • 28 Nov 2022 • Wanqian Yang, Polina Kirichenko, Micah Goldblum, Andrew Gordon Wilson

Deep neural networks are susceptible to shortcut learning, using simple features to achieve low training loss without discovering essential semantic structure.

Paper
Add Code

A Cookbook of Self-Supervised Learning

no code implementations • 24 Apr 2023 • Randall Balestriero, Mark Ibrahim, Vlad Sobal, Ari Morcos, Shashank Shekhar, Tom Goldstein, Florian Bordes, Adrien Bardes, Gregoire Mialon, Yuandong Tian, Avi Schwarzschild, Andrew Gordon Wilson, Jonas Geiping, Quentin Garrido, Pierre Fernandez, Amir Bar, Hamed Pirsiavash, Yann Lecun, Micah Goldblum

Self-supervised learning, dubbed the dark matter of intelligence, is a promising path to advance machine learning.

Navigate Self-Supervised Learning

Paper
Add Code

User-defined Event Sampling and Uncertainty Quantification in Diffusion Models for Physical Dynamical Systems

no code implementations • 13 Jun 2023 • Marc Finzi, Anudhyan Boral, Andrew Gordon Wilson, Fei Sha, Leonardo Zepeda-Núñez

In this work, we develop a probabilistic approximation scheme for the conditional score function which provably converges to the true distribution as the noise level decreases.

Uncertainty Quantification

Paper
Add Code

Materials Expert-Artificial Intelligence for Materials Discovery

no code implementations • 5 Dec 2023 • Yanjun Liu, Milena Jovanovic, Krishnanand Mallayya, Wesley J. Maddox, Andrew Gordon Wilson, Sebastian Klemenz, Leslie M. Schoop, Eun-Ah Kim

The advent of material databases provides an unprecedented opportunity to uncover predictive descriptors for emergent material properties from vast data space.

Paper
Add Code

Perspectives on the State and Future of Deep Learning -- 2023

no code implementations • 7 Dec 2023 • Micah Goldblum, Anima Anandkumar, Richard Baraniuk, Tom Goldstein, Kyunghyun Cho, Zachary C Lipton, Melanie Mitchell, Preetum Nakkiran, Max Welling, Andrew Gordon Wilson

The goal of this series is to chronicle opinions and issues in the field of machine learning as they stand today and as they change over time.

Benchmarking

Paper
Add Code

Non-Vacuous Generalization Bounds for Large Language Models

no code implementations • 28 Dec 2023 • Sanae Lotfi, Marc Finzi, Yilun Kuang, Tim G. J. Rudner, Micah Goldblum, Andrew Gordon Wilson

Modern language models can contain billions of parameters, raising the question of whether they can generalize beyond the training data or simply regurgitate their training corpora.

Generalization Bounds valid

Paper
Add Code

Position Paper: Bayesian Deep Learning in the Age of Large-Scale AI

no code implementations • 1 Feb 2024 • Theodore Papamarkou, Maria Skoularidou, Konstantina Palla, Laurence Aitchison, Julyan Arbel, David Dunson, Maurizio Filippone, Vincent Fortuin, Philipp Hennig, Jose Miguel Hernandez Lobato, Aliaksandr Hubin, Alexander Immer, Theofanis Karaletsos, Mohammad Emtiyaz Khan, Agustinus Kristiadi, Yingzhen Li, Stephan Mandt, Christopher Nemeth, Michael A. Osborne, Tim G. J. Rudner, David Rügamer, Yee Whye Teh, Max Welling, Andrew Gordon Wilson, Ruqi Zhang

In the current landscape of deep learning research, there is a predominant emphasis on achieving high predictive accuracy in supervised tasks involving large image and language datasets.

Continual Learning Position

Paper
Add Code

Controllable Prompt Tuning For Balancing Group Distributional Robustness

no code implementations • 5 Mar 2024 • Hoang Phan, Andrew Gordon Wilson, Qi Lei

Models trained on data composed of different groups or domains can suffer from severe performance degradation under distribution shifts.

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.