1 code implementation • 6 Apr 2022 • Polina Kirichenko, Pavel Izmailov, Andrew Gordon Wilson
Neural network classifiers can largely rely on simple spurious features, such as backgrounds, to make predictions.
1 code implementation • 30 Mar 2022 • Sanyam Kapoor, Wesley J. Maddox, Pavel Izmailov, Andrew Gordon Wilson
In Bayesian regression, we often use a Gaussian observation model, where we control the level of aleatoric uncertainty with a noise variance parameter.
1 code implementation • 23 Mar 2022 • Samuel Stanton, Wesley Maddox, Nate Gruver, Phillip Maffettone, Emily Delaney, Peyton Greenside, Andrew Gordon Wilson
Bayesian optimization is a gold standard for query-efficient continuous optimization.
1 code implementation • 23 Feb 2022 • Sanae Lotfi, Pavel Izmailov, Gregory Benton, Micah Goldblum, Andrew Gordon Wilson
We provide a partial remedy through a conditional marginal likelihood, which we show is more aligned with generalization, and practically valuable for large-scale hyperparameter learning, such as in deep kernel learning.
1 code implementation • ICLR 2022 • Nate Gruver, Marc Finzi, Samuel Stanton, Andrew Gordon Wilson
Physics-inspired neural networks (NNs), such as Hamiltonian or Lagrangian NNs, dramatically outperform other learned dynamics models by leveraging strong inductive biases.
1 code implementation • 31 Dec 2021 • Wesley J. Maddox, Sanyam Kapoor, Andrew Gordon Wilson
While recent work on conjugate gradient methods and Lanczos decompositions have achieved scalable Gaussian process inference with highly accurate point predictions, in several implementations these iterative methods appear to struggle with numerical instabilities in learning kernel hyperparameters, and poor test likelihoods.
1 code implementation • NeurIPS 2021 • Marc Finzi, Gregory Benton, Andrew Gordon Wilson
There is often a trade-off between building deep learning systems that are expressive enough to capture the nuances of the reality, and having the right inductive biases for efficient learning.
1 code implementation • NeurIPS 2021 • Wesley J. Maddox, Samuel Stanton, Andrew Gordon Wilson
With a principled representation of uncertainty and closed form posterior updates, Gaussian processes (GPs) are a natural choice for online decision making.
no code implementations • 29 Sep 2021 • Ruqi Zhang, Andrew Gordon Wilson, Christopher De Sa
Low-precision optimization is widely used to accelerate large-scale deep learning.
no code implementations • ICML Workshop INNF 2021 • Polina Kirichenko, Mehrdad Farajtabar, Dushyant Rao, Balaji Lakshminarayanan, Nir Levine, Ang Li, Huiyi Hu, Andrew Gordon Wilson, Razvan Pascanu
Learning new tasks continuously without forgetting on a constantly changing data distribution is essential for real-world problems but extremely challenging for modern deep learning.
2 code implementations • NeurIPS 2021 • Wesley J. Maddox, Maximilian Balandat, Andrew Gordon Wilson, Eytan Bakshy
However, the Gaussian Process (GP) models typically used as probabilistic surrogates for multi-task Bayesian Optimization scale poorly with the number of outcomes, greatly limiting applicability.
1 code implementation • NeurIPS 2021 • Pavel Izmailov, Patrick Nicholson, Sanae Lotfi, Andrew Gordon Wilson
Approximate Bayesian inference for neural networks is considered a robust alternative to standard training, often providing good performance on out-of-distribution data.
1 code implementation • 12 Jun 2021 • Sanyam Kapoor, Marc Finzi, Ke Alexander Wang, Andrew Gordon Wilson
State-of-the-art methods for scalable Gaussian processes use iterative algorithms, requiring fast matrix vector multiplies (MVMs) with the covariance kernel.
1 code implementation • NeurIPS 2021 • Samuel Stanton, Pavel Izmailov, Polina Kirichenko, Alexander A. Alemi, Andrew Gordon Wilson
Knowledge distillation is a popular technique for training a small student network to emulate a larger teacher model, such as an ensemble of networks.
2 code implementations • 10 Jun 2021 • Shengyang Sun, Jiaxin Shi, Andrew Gordon Wilson, Roger Grosse
We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability.
3 code implementations • 29 Apr 2021 • Pavel Izmailov, Sharad Vikram, Matthew D. Hoffman, Andrew Gordon Wilson
The posterior over Bayesian neural network (BNN) parameters is extremely high-dimensional and non-convex.
4 code implementations • 19 Apr 2021 • Marc Finzi, Max Welling, Andrew Gordon Wilson
Symmetries and equivariance are fundamental to the generalization of neural networks on domains such as images, graphs, and point clouds.
1 code implementation • 2 Mar 2021 • Wesley J. Maddox, Shuai Tang, Pablo Garcia Moreno, Andrew Gordon Wilson, Andreas Damianou
The inductive biases of trained neural networks are difficult to understand and, consequently, to adapt to new settings.
1 code implementation • 2 Mar 2021 • Samuel Stanton, Wesley J. Maddox, Ian Delbridge, Andrew Gordon Wilson
Gaussian processes (GPs) provide a gold standard for performance in online settings, such as sample-efficient control and black box optimization, where we need to update a posterior distribution as we acquire data in a sequential fashion.
1 code implementation • 25 Feb 2021 • Gregory W. Benton, Wesley J. Maddox, Sanae Lotfi, Andrew Gordon Wilson
In this paper, we show that there are mode-connecting simplicial complexes that form multi-dimensional manifolds of low loss, connecting many independently trained models.
no code implementations • 1 Jan 2021 • Gregory Benton, Wesley Maddox, Andrew Gordon Wilson
Neural networks appear to have mysterious generalization properties when using parameter counting as a proxy for complexity.
no code implementations • NeurIPS 2020 • Gregory Benton, Marc Finzi, Pavel Izmailov, Andrew Gordon Wilson
Invariances to translations have imbued convolutional neural networks with powerful generalization properties.
1 code implementation • NeurIPS 2020 • Marc Finzi, Ke Alexander Wang, Andrew Gordon Wilson
Reasoning about the physical world requires models that are endowed with the right inductive biases to learn the underlying dynamics.
1 code implementation • 22 Oct 2020 • Gregory Benton, Marc Finzi, Pavel Izmailov, Andrew Gordon Wilson
Invariances to translations have imbued convolutional neural networks with powerful generalization properties.
1 code implementation • 28 Aug 2020 • Brandon Amos, Samuel Stanton, Denis Yarats, Andrew Gordon Wilson
For over a decade, model-based reinforcement learning has been seen as a way to leverage control-based domain knowledge to improve the sample-efficiency of reinforcement learning agents.
1 code implementation • NeurIPS 2020 • Polina Kirichenko, Pavel Izmailov, Andrew Gordon Wilson
Detecting out-of-distribution (OOD) data is crucial for robust machine learning systems.
1 code implementation • NeurIPS 2020 • Yue Wu, Pan Zhou, Andrew Gordon Wilson, Eric P. Xing, Zhiting Hu
Despite success on a wide range of problems related to vision, generative adversarial networks (GANs) often suffer from inferior performance due to unstable training, especially for text generation.
Ranked #2 on
Text Generation
on EMNLP2017 WMT
1 code implementation • 4 Mar 2020 • Wesley J. Maddox, Gregory Benton, Andrew Gordon Wilson
Neural networks appear to have mysterious generalization properties when using parameter counting as a proxy for complexity.
1 code implementation • ICML 2020 • Marc Finzi, Samuel Stanton, Pavel Izmailov, Andrew Gordon Wilson
The translation equivariance of convolutional layers enables convolutional neural networks to generalize well on image problems.
1 code implementation • NeurIPS 2020 • Andrew Gordon Wilson, Pavel Izmailov
The key distinguishing property of a Bayesian approach is marginalization, rather than using a single setting of weights.
no code implementations • 29 Jan 2020 • Andrew Gordon Wilson
(3) The structure of neural networks gives rise to a structured prior in function space, which reflects the inductive biases of neural networks that help them generalize.
no code implementations • ICLR 2020 • Diego Granziol, Timur Garipov, Dmitry Vetrov, Stefan Zohren, Stephen Roberts, Andrew Gordon Wilson
This approach is an order of magnitude faster than state-of-the-art methods for spectral visualization, and can be generically used to investigate the spectral properties of matrices in deep learning.
1 code implementation • ICML 2020 • Ian A. Delbridge, David S. Bindel, Andrew Gordon Wilson
Surprisingly, we find that as the number of random projections increases, the predictive performance of this approach quickly converges to the performance of a kernel operating on the original full dimensional inputs, over a wide range of data sets, even if we are projecting into a single dimension.
2 code implementations • ICML 2020 • Pavel Izmailov, Polina Kirichenko, Marc Finzi, Andrew Gordon Wilson
Normalizing flows transform a latent distribution through an invertible neural network for a flexible and pleasingly simple approach to generative modelling, while preserving an exact likelihood.
Semi-Supervised Image Classification
Semi Supervised Text Classification
+1
1 code implementation • NeurIPS 2019 • Gregory W. Benton, Wesley J. Maddox, Jayson P. Salkey, Julio Albinati, Andrew Gordon Wilson
The resulting approach enables learning of rich representations, with support for any stationary kernel, uncertainty over the values of the kernel, and an interpretable specification of a prior directly over kernels, without requiring sophisticated initialization or manual intervention.
1 code implementation • NeurIPS 2020 • Maximilian Balandat, Brian Karrer, Daniel R. Jiang, Samuel Daulton, Benjamin Letham, Andrew Gordon Wilson, Eytan Bakshy
Bayesian optimization provides sample-efficient global optimization for a broad range of applications, including automatic machine learning, engineering, physics, and experimental design.
1 code implementation • 17 Jul 2019 • Pavel Izmailov, Wesley J. Maddox, Polina Kirichenko, Timur Garipov, Dmitry Vetrov, Andrew Gordon Wilson
Bayesian inference was once a gold standard for learning with neural networks, providing accurate full predictive distributions and well calibrated uncertainty.
3 code implementations • ICLR 2019 • Chuan Guo, Jacob R. Gardner, Yurong You, Andrew Gordon Wilson, Kilian Q. Weinberger
We propose an intriguingly simple method for the construction of adversarial images in the black-box setting.
2 code implementations • 26 Apr 2019 • Guandao Yang, Tianyi Zhang, Polina Kirichenko, Junwen Bai, Andrew Gordon Wilson, Christopher De Sa
Low precision operations can provide scalability, memory savings, portability, and energy efficiency.
no code implementations • 29 Mar 2019 • Alexander Ratner, Dan Alistarh, Gustavo Alonso, David G. Andersen, Peter Bailis, Sarah Bird, Nicholas Carlini, Bryan Catanzaro, Jennifer Chayes, Eric Chung, Bill Dally, Jeff Dean, Inderjit S. Dhillon, Alexandros Dimakis, Pradeep Dubey, Charles Elkan, Grigori Fursin, Gregory R. Ganger, Lise Getoor, Phillip B. Gibbons, Garth A. Gibson, Joseph E. Gonzalez, Justin Gottschlich, Song Han, Kim Hazelwood, Furong Huang, Martin Jaggi, Kevin Jamieson, Michael. I. Jordan, Gauri Joshi, Rania Khalaf, Jason Knight, Jakub Konečný, Tim Kraska, Arun Kumar, Anastasios Kyrillidis, Aparna Lakshmiratan, Jing Li, Samuel Madden, H. Brendan McMahan, Erik Meijer, Ioannis Mitliagkas, Rajat Monga, Derek Murray, Kunle Olukotun, Dimitris Papailiopoulos, Gennady Pekhimenko, Theodoros Rekatsinas, Afshin Rostamizadeh, Christopher Ré, Christopher De Sa, Hanie Sedghi, Siddhartha Sen, Virginia Smith, Alex Smola, Dawn Song, Evan Sparks, Ion Stoica, Vivienne Sze, Madeleine Udell, Joaquin Vanschoren, Shivaram Venkataraman, Rashmi Vinayak, Markus Weimer, Andrew Gordon Wilson, Eric Xing, Matei Zaharia, Ce Zhang, Ameet Talwalkar
Machine learning (ML) techniques are enjoying rapidly increasing adoption.
1 code implementation • NeurIPS 2019 • Ke Alexander Wang, Geoff Pleiss, Jacob R. Gardner, Stephen Tyree, Kilian Q. Weinberger, Andrew Gordon Wilson
Gaussian processes (GPs) are flexible non-parametric models, with a capacity that grows with the available data.
no code implementations • 12 Mar 2019 • Jian Wu, Saul Toscano-Palmerin, Peter I. Frazier, Andrew Gordon Wilson
Nonetheless, for hyperparameter tuning in deep neural networks, the time required to evaluate the validation error for even a few hyperparameter settings remains a bottleneck.
3 code implementations • ICLR 2020 • Ruqi Zhang, Chunyuan Li, Jianyi Zhang, Changyou Chen, Andrew Gordon Wilson
The posteriors over neural network weights are high dimensional and multimodal.
7 code implementations • NeurIPS 2019 • Wesley Maddox, Timur Garipov, Pavel Izmailov, Dmitry Vetrov, Andrew Gordon Wilson
We propose SWA-Gaussian (SWAG), a simple, scalable, and general purpose approach for uncertainty representation and calibration in deep learning.
1 code implementation • NeurIPS 2018 • David Eriksson, Kun Dong, Eric Hans Lee, David Bindel, Andrew Gordon Wilson
Gaussian processes (GPs) with derivatives are useful in many applications, including Bayesian optimization, implicit surface reconstruction, and terrain reconstruction.
no code implementations • 28 Oct 2018 • William Herlands, Daniel B. Neill, Hannes Nickisch, Andrew Gordon Wilson
We provide a model-agnostic formalization of change surfaces, illustrating how they can provide variable, heterogeneous, and non-monotonic rates of change across multiple dimensions.
2 code implementations • NeurIPS 2018 • Jacob R. Gardner, Geoff Pleiss, David Bindel, Kilian Q. Weinberger, Andrew Gordon Wilson
Despite advances in scalable models, the inference tools used for Gaussian processes (GPs) have yet to fully capitalize on developments in computing hardware.
2 code implementations • ICLR 2019 • Ben Athiwaratkun, Marc Finzi, Pavel Izmailov, Andrew Gordon Wilson
Presently the most successful approaches to semi-supervised learning are based on consistency regularization, whereby a model is trained to be robust to small perturbations of its inputs and parameters.
1 code implementation • ACL 2018 • Ben Athiwaratkun, Andrew Gordon Wilson, Anima Anandkumar
We introduce Probabilistic FastText, a new model for word embeddings that can capture multiple word senses, sub-word structure, and uncertainty information.
2 code implementations • ICLR 2018 • Ben Athiwaratkun, Andrew Gordon Wilson
By representing words with probability densities rather than point vectors, probabilistic word embeddings can capture rich and interpretable semantic information and uncertainty.
no code implementations • 4 Apr 2018 • William Herlands, Edward McFowland III, Andrew Gordon Wilson, Daniel B. Neill
We introduce methods for identifying anomalous patterns in non-iid data by combining Gaussian processes with novel log-likelihood ratio statistic and subset scanning techniques.
1 code implementation • ICML 2018 • Geoff Pleiss, Jacob R. Gardner, Kilian Q. Weinberger, Andrew Gordon Wilson
One of the most compelling features of Gaussian process (GP) regression is its ability to provide well-calibrated posterior distributions.
14 code implementations • 14 Mar 2018 • Pavel Izmailov, Dmitrii Podoprikhin, Timur Garipov, Dmitry Vetrov, Andrew Gordon Wilson
Deep neural networks are typically trained by optimizing a loss function with an SGD variant, in conjunction with a decaying learning rate, until convergence.
Ranked #66 on
Image Classification
on CIFAR-100
10 code implementations • NeurIPS 2018 • Timur Garipov, Pavel Izmailov, Dmitrii Podoprikhin, Dmitry Vetrov, Andrew Gordon Wilson
The loss functions of deep neural networks are complex and their geometric properties are not well understood.
1 code implementation • 24 Feb 2018 • Jacob R. Gardner, Geoff Pleiss, Ruihan Wu, Kilian Q. Weinberger, Andrew Gordon Wilson
Recent work shows that inference for Gaussian processes can be performed efficiently using iterative methods that rely only on matrix-vector multiplications (MVMs).
1 code implementation • 2 Feb 2018 • Phillip A. Jang, Andrew E. Loeb, Matthew B. Davidow, Andrew Gordon Wilson
We propose a distribution over kernels formed by modelling a spectral mixture density with a L\'evy process.
no code implementations • 27 Nov 2017 • Andrew Gordon Wilson, Jason Yosinski, Patrice Simard, Rich Caruana, William Herlands
This is the Proceedings of NIPS 2017 Symposium on Interpretable Machine Learning, held in Long Beach, California, USA on December 7, 2017
3 code implementations • NeurIPS 2017 • Kun Dong, David Eriksson, Hannes Nickisch, David Bindel, Andrew Gordon Wilson
For applications as varied as Bayesian neural networks, determinantal point processes, elliptical graphical models, and kernel learning for Gaussian processes (GPs), one must compute a log determinant of an $n \times n$ positive definite matrix, and its derivatives - leading to prohibitive $\mathcal{O}(n^3)$ computations.
4 code implementations • NeurIPS 2017 • Yunus Saatchi, Andrew Gordon Wilson
Generative adversarial networks (GANs) can implicitly learn rich distributions over images, audio, and data which are hard to model with an explicit likelihood.
2 code implementations • ACL 2017 • Ben Athiwaratkun, Andrew Gordon Wilson
Word embeddings provide point representations of words containing useful semantic information.
1 code implementation • NeurIPS 2017 • Jian Wu, Matthias Poloczek, Andrew Gordon Wilson, Peter I. Frazier
Bayesian optimization has been successful at global optimization of expensive-to-evaluate multimodal objective functions.
no code implementations • 28 Nov 2016 • Andrew Gordon Wilson, Been Kim, William Herlands
This is the Proceedings of NIPS 2016 Workshop on Interpretable Machine Learning for Complex Systems, held in Barcelona, Spain on December 9, 2016
no code implementations • NeurIPS 2016 • Andrew Gordon Wilson, Zhiting Hu, Ruslan Salakhutdinov, Eric P. Xing
We propose a novel deep kernel learning model and stochastic variational inference procedure which generalizes deep kernel learning approaches to enable classification, multi-task learning, additive covariance structures, and stochastic gradient training.
2 code implementations • 27 Oct 2016 • Maruan Al-Shedivat, Andrew Gordon Wilson, Yunus Saatchi, Zhiting Hu, Eric P. Xing
To model such structure, we propose expressive closed-form kernel functions for Gaussian processes.
4 code implementations • 6 Nov 2015 • Andrew Gordon Wilson, Zhiting Hu, Ruslan Salakhutdinov, Eric P. Xing
We introduce scalable deep kernels, which combine the structural properties of deep learning architectures with the non-parametric flexibility of kernel methods.
3 code implementations • 5 Nov 2015 • Andrew Gordon Wilson, Christoph Dann, Hannes Nickisch
This multi-level circulant approximation allows one to unify the orthogonal computational benefits of fast Kronecker and Toeplitz approaches, and is significantly faster than either approach in isolation; 2) local kernel interpolation and inducing points to allow for arbitrarily located data inputs, and $O(1)$ test time predictions; 3) exploiting block-Toeplitz Toeplitz-block structure (BTTB), which enables fast inference and learning when multidimensional Kronecker structure is not present; and 4) projections of the input space to flexibly model correlated inputs and high dimensional data.
no code implementations • NeurIPS 2015 • Andrew Gordon Wilson, Christoph Dann, Christopher G. Lucas, Eric P. Xing
Bayesian nonparametric models, such as Gaussian processes, provide a compelling framework for automatic statistical modelling: these models have a high degree of flexibility, and automatically calibrated complexity.
1 code implementation • 3 Mar 2015 • Andrew Gordon Wilson, Hannes Nickisch
We introduce a new structured kernel interpolation (SKI) framework, which generalises and unifies inducing point methods for scalable Gaussian processes (GPs).
no code implementations • 19 Dec 2014 • Zichao Yang, Alexander J. Smola, Le Song, Andrew Gordon Wilson
Kernel methods have great promise for learning rich statistical representations of large modern datasets.
no code implementations • 18 Feb 2014 • Amar Shah, Andrew Gordon Wilson, Zoubin Ghahramani
We investigate the Student-t process as an alternative to the Gaussian process as a nonparametric prior over functions.
no code implementations • 14 Feb 2014 • Andrew Gordon Wilson, Yuting Wu, Daniel J. Holland, Sebastian Nowozin, Mick D. Mantle, Lynn F. Gladden, Andrew Blake
Nuclear magnetic resonance (NMR) spectroscopy exploits the magnetic properties of atomic nuclei to discover the structure, reaction state and chemical environment of molecules.
no code implementations • 20 Oct 2013 • Andrew Gordon Wilson, Elad Gilboa, Arye Nehorai, John P. Cunningham
We introduce a new Bayesian nonparametric framework -- GPatt -- enabling automatic pattern extrapolation with Gaussian processes on large multidimensional datasets.
1 code implementation • 18 Feb 2013 • Andrew Gordon Wilson, Ryan Prescott Adams
Gaussian processes are rich distributions over functions, which provide a Bayesian nonparametric approach to smoothing and interpolation.
1 code implementation • 19 Oct 2011 • Andrew Gordon Wilson, David A. Knowles, Zoubin Ghahramani
We introduce a new regression framework, Gaussian process regression networks (GPRN), which combines the structural properties of Bayesian neural networks with the non-parametric flexibility of Gaussian processes.
no code implementations • 31 Dec 2010 • Andrew Gordon Wilson, Zoubin Ghahramani
We introduce a stochastic process with Wishart marginals: the generalised Wishart process (GWP).