Search Results for author: Roger Grosse

Found 49 papers, 34 papers with code

Amortized Proximal Optimization

no code implementations 28 Feb 2022 Juhan Bae, Paul Vicol, Jeff Z. HaoChen, Roger Grosse

Using APO to adapt a structured preconditioning matrix generally results in optimization performance competitive with second-order methods.

Image Classification, Image Reconstruction, +1

Learning to Give Checkable Answers with Prover-Verifier Games

no code implementations 27 Aug 2021 Cem Anil, Guodong Zhang, Yuhuai Wu, Roger Grosse

We develop instantiations of the PVG for two algorithmic tasks, and show that in practice, the verifier learns a robust decision rule that is able to receive useful and reliable information from an untrusted prover.

Differentiable Annealed Importance Sampling and the Perils of Gradient Noise

no code implementations NeurIPS 2021 Guodong Zhang, Kyle Hsu, Jianing Li, Chelsea Finn, Roger Grosse

To this end, we propose Differentiable AIS (DAIS), a variant of AIS which ensures differentiability by abandoning the Metropolis-Hastings corrections.

Stochastic Optimization

Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition

2 code implementations 10 Jun 2021 Shengyang Sun, Jiaxin Shi, Andrew Gordon Wilson, Roger Grosse

We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability.

Gaussian Processes

Analyzing Monotonic Linear Interpolation in Neural Network Loss Landscapes

1 code implementation 22 Apr 2021 James Lucas, Juhan Bae, Michael R. Zhang, Stanislav Fort, Richard Zemel, Roger Grosse

Linear interpolation between initial neural network parameters and converged parameters after training with stochastic gradient descent (SGD) typically leads to a monotonic decrease in the training objective.

LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning

1 code implementation 15 Jan 2021 Yuhuai Wu, Markus Rabe, Wenda Li, Jimmy Ba, Roger Grosse, Christian Szegedy

While designing inductive bias in neural architectures has been widely studied, we hypothesize that transformer networks are flexible enough to learn inductive bias from suitable generic tasks.

Mathematical Reasoning

Beyond Marginal Uncertainty: How Accurately can Bayesian Regression Models Estimate Posterior Predictive Correlations?

1 code implementation 6 Nov 2020 Chaoqi Wang, Shengyang Sun, Roger Grosse

While uncertainty estimation is a well-studied topic in deep learning, most such work focuses on marginal uncertainty estimates, i.e., the predictive mean and variance at individual input locations.

Active Learning

A Unified Analysis of First-Order Methods for Smooth Games via Integral Quadratic Constraints

1 code implementation 23 Sep 2020 Guodong Zhang, Xuchan Bao, Laurent Lessard, Roger Grosse

The theory of integral quadratic constraints (IQCs) allows the certification of exponential convergence of interconnected systems containing nonlinear or uncertain elements.

Evaluating Lossy Compression Rates of Deep Generative Models

2 code implementations ICML 2020 Sicong Huang, Alireza Makhzani, Yanshuai Cao, Roger Grosse

The field of deep generative modeling has succeeded in producing astonishingly realistic-seeming images and audio, but quantitative evaluation remains a challenge.

Regularized linear autoencoders recover the principal components, eventually

1 code implementation NeurIPS 2020 Xuchan Bao, James Lucas, Sushant Sachdeva, Roger Grosse

Our understanding of learning input-output relationships with neural nets has improved rapidly in recent years, but little is known about the convergence of the underlying representations, even in the simple case of linear autoencoders (LAEs).

The Scattering Compositional Learner: Discovering Objects, Attributes, Relationships in Analogical Reasoning

3 code implementations 8 Jul 2020 Yuhuai Wu, Honghua Dong, Roger Grosse, Jimmy Ba

In this work, we focus on an analogical reasoning task that contains rich compositional structures, Raven's Progressive Matrices (RPM).

Learning Branching Heuristics for Propositional Model Counting

no code implementations 7 Jul 2020 Pashootan Vaezipoor, Gil Lederman, Yuhuai Wu, Chris J. Maddison, Roger Grosse, Edward Lee, Sanjit A. Seshia, Fahiem Bacchus

Propositional model counting, or #SAT, is the problem of computing the number of satisfying assignments of a Boolean formula. Many discrete probabilistic inference problems can be translated into model counting problems to be solved by #SAT solvers.

INT: An Inequality Benchmark for Evaluating Generalization in Theorem Proving

1 code implementation ICLR 2021 Yuhuai Wu, Albert Qiaochu Jiang, Jimmy Ba, Roger Grosse

In learning-assisted theorem proving, one of the most critical challenges is to generalize to theorems unlike those seen at training time.

Automated Theorem Proving

When Does Preconditioning Help or Hurt Generalization?

no code implementations ICLR 2021 Shun-ichi Amari, Jimmy Ba, Roger Grosse, Xuechen Li, Atsushi Nitanda, Taiji Suzuki, Denny Wu, Ji Xu

While second order optimizers such as natural gradient descent (NGD) often speed up optimization, their effect on generalization has been called into question.

Understanding and Mitigating Exploding Inverses in Invertible Neural Networks

1 code implementation 16 Jun 2020 Jens Behrmann, Paul Vicol, Kuan-Chieh Wang, Roger Grosse, Jörn-Henrik Jacobsen

For problems where global invertibility is necessary, such as applying normalizing flows on OOD data, we show the importance of designing stable INN building blocks.

Picking Winning Tickets Before Training by Preserving Gradient Flow

2 code implementations ICLR 2020 Chaoqi Wang, Guodong Zhang, Roger Grosse

Overparameterization has been shown to benefit both the optimization and generalization of neural networks, but large networks are resource hungry at both training and test time.

Network Pruning

Don't Blame the ELBO! A Linear VAE Perspective on Posterior Collapse

no code implementations NeurIPS 2019 James Lucas, George Tucker, Roger Grosse, Mohammad Norouzi

Posterior collapse in Variational Autoencoders (VAEs) arises when the variational posterior distribution closely matches the prior for a subset of latent variables.

Variational Inference

Which Algorithmic Choices Matter at Which Batch Sizes? Insights From a Noisy Quadratic Model

1 code implementation NeurIPS 2019 Guodong Zhang, Lala Li, Zachary Nado, James Martens, Sushant Sachdeva, George E. Dahl, Christopher J. Shallue, Roger Grosse

Increasing the batch size is a popular way to speed up neural network training, but beyond some critical batch size, larger batch sizes yield diminishing returns.

Fast Convergence of Natural Gradient Descent for Overparameterized Neural Networks

no code implementations 27 May 2019 Guodong Zhang, James Martens, Roger Grosse

In this work, we analyze for the first time the speed of convergence of natural gradient descent on nonlinear neural networks with squared-error loss.

EigenDamage: Structured Pruning in the Kronecker-Factored Eigenbasis

1 code implementation 15 May 2019 Chaoqi Wang, Roger Grosse, Sanja Fidler, Guodong Zhang

Reducing the test time resource requirements of a neural network while preserving test accuracy is crucial for running inference on resource-constrained devices.

Network Pruning

Online Hyperparameter Adaptation via Amortized Proximal Optimization

no code implementations ICLR 2019 Paul Vicol, Jeffery Z. HaoChen, Roger Grosse

Effective performance of neural networks depends critically on effective tuning of optimization hyperparameters, especially learning rates (and schedules thereof).

Understanding Posterior Collapse in Generative Latent Variable Models

no code implementations ICLR Workshop DeepGenStruct 2019 James Lucas, George Tucker, Roger Grosse, Mohammad Norouzi

Posterior collapse in Variational Autoencoders (VAEs) arises when the variational distribution closely matches the uninformative prior for a subset of latent variables.

Variational Inference

Functional Variational Bayesian Neural Networks

2 code implementations ICLR 2019 Shengyang Sun, Guodong Zhang, Jiaxin Shi, Roger Grosse

We introduce functional variational Bayesian neural networks (fBNNs), which maximize an Evidence Lower BOund (ELBO) defined directly on stochastic processes, i.e., distributions over functions.

Bayesian Inference, Gaussian Processes, +1

Eigenvalue Corrected Noisy Natural Gradient

3 code implementations 30 Nov 2018 Juhan Bae, Guodong Zhang, Roger Grosse

A recently proposed method, noisy natural gradient, is a surprisingly simple method to fit expressive posteriors by adding weight noise to regular natural gradient updates.

Sorting out Lipschitz function approximation

1 code implementation 13 Nov 2018 Cem Anil, James Lucas, Roger Grosse

We identify a necessary property for such an architecture: each of the layers must preserve the gradient norm during backpropagation.

Adversarial Robustness, Generalization Bounds

Three Mechanisms of Weight Decay Regularization

no code implementations ICLR 2019 Guodong Zhang, Chaoqi Wang, Bowen Xu, Roger Grosse

Weight decay is one of the standard tricks in the neural network toolbox, but the reasons for its regularization effect are poorly understood, and recent results have cast doubt on the traditional interpretation in terms of $L_2$ regularization.

Reversible Recurrent Neural Networks

1 code implementation NeurIPS 2018 Matthew MacKay, Paul Vicol, Jimmy Ba, Roger Grosse

Reversible RNNs (RNNs for which the hidden-to-hidden transition can be reversed) offer a path to reduce the memory requirements of training, as hidden states need not be stored and instead can be recomputed during backpropagation.

A Coordinate-Free Construction of Scalable Natural Gradient

no code implementations 30 Aug 2018 Kevin Luk, Roger Grosse

Most neural networks are trained using first-order optimization methods, which are sensitive to the parameterization of the model.

Distilling the Posterior in Bayesian Neural Networks

no code implementations ICML 2018 Kuan-Chieh Wang, Paul Vicol, James Lucas, Li Gu, Roger Grosse, Richard Zemel

We propose a framework, Adversarial Posterior Distillation, to distill the SGLD samples using a Generative Adversarial Network (GAN).

Active Learning, Anomaly Detection

Adversarial Distillation of Bayesian Neural Network Posteriors

1 code implementation 27 Jun 2018 Kuan-Chieh Wang, Paul Vicol, James Lucas, Li Gu, Roger Grosse, Richard Zemel

We propose a framework, Adversarial Posterior Distillation, to distill the SGLD samples using a Generative Adversarial Network (GAN).

Active Learning, Anomaly Detection

Differentiable Compositional Kernel Learning for Gaussian Processes

3 code implementations ICML 2018 Shengyang Sun, Guodong Zhang, Chaoqi Wang, Wenyuan Zeng, Jiaman Li, Roger Grosse

The NKN architecture is based on the composition rules for kernels, so that each unit of the network corresponds to a valid kernel.

Gaussian Processes, Time Series

Aggregated Momentum: Stability Through Passive Damping

2 code implementations ICLR 2019 James Lucas, Shengyang Sun, Richard Zemel, Roger Grosse

Momentum is a simple and widely used trick which allows gradient-based optimizers to pick up speed along low curvature directions.

Flipout: Efficient Pseudo-Independent Weight Perturbations on Mini-Batches

3 code implementations ICLR 2018 Yeming Wen, Paul Vicol, Jimmy Ba, Dustin Tran, Roger Grosse

Stochastic neural net weights are used in a variety of contexts, including regularization, Bayesian neural nets, exploration in reinforcement learning, and evolution strategies.

Understanding Short-Horizon Bias in Stochastic Meta-Optimization

1 code implementation ICLR 2018 Yuhuai Wu, Mengye Ren, Renjie Liao, Roger Grosse

Careful tuning of the learning rate, or even schedules thereof, can be crucial to effective neural net training.

Isolating Sources of Disentanglement in Variational Autoencoders

8 code implementations NeurIPS 2018 Ricky T. Q. Chen, Xuechen Li, Roger Grosse, David Duvenaud

We decompose the evidence lower bound to show the existence of a term measuring the total correlation between latent variables.

Disentanglement

Noisy Natural Gradient as Variational Inference

2 code implementations ICML 2018 Guodong Zhang, Shengyang Sun, David Duvenaud, Roger Grosse

Variational Bayesian neural nets combine the flexibility of deep learning with Bayesian uncertainty estimation.

Active Learning, Efficient Exploration, +2

Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation

8 code implementations NeurIPS 2017 Yuhuai Wu, Elman Mansimov, Shun Liao, Roger Grosse, Jimmy Ba

In this work, we propose to apply trust region optimization to deep reinforcement learning using a recently proposed Kronecker-factored approximation to the curvature.

Atari Games, Continuous Control, +1

On the Quantitative Analysis of Decoder-Based Generative Models

2 code implementations 14 Nov 2016 Yuhuai Wu, Yuri Burda, Ruslan Salakhutdinov, Roger Grosse

The past several years have seen remarkable progress in generative models which produce convincing samples of images and other modalities.

A Kronecker-factored approximate Fisher matrix for convolution layers

1 code implementation 3 Feb 2016 Roger Grosse, James Martens

Second-order optimization methods such as natural gradient descent have the potential to speed up training of neural networks by correcting for the curvature of the loss function.

Learning Wake-Sleep Recurrent Attention Models

no code implementations NeurIPS 2015 Jimmy Ba, Roger Grosse, Ruslan Salakhutdinov, Brendan Frey

Despite their success, convolutional neural networks are computationally expensive because they must examine all image locations.

General Classification, Image Classification

Statistical Inference, Learning and Models in Big Data

no code implementations 9 Sep 2015 Beate Franke, Jean-François Plante, Ribana Roscher, Annie Lee, Cathal Smyth, Armin Hatefi, Fuqi Chen, Einat Gil, Alexander Schwing, Alessandro Selvitella, Michael M. Hoffman, Roger Grosse, Dieter Hendricks, Nancy Reid

The need for new methods to deal with big data is a common theme in most scientific fields, although its definition tends to vary with the context.

Importance Weighted Autoencoders

20 code implementations 1 Sep 2015 Yuri Burda, Roger Grosse, Ruslan Salakhutdinov

The variational autoencoder (VAE; Kingma and Welling, 2014) is a recently proposed generative model pairing a top-down generative network with a bottom-up recognition network which approximates posterior inference.

Density Estimation

Optimizing Neural Networks with Kronecker-factored Approximate Curvature

8 code implementations 19 Mar 2015 James Martens, Roger Grosse

This is because the cost of storing and inverting K-FAC's approximation to the curvature matrix does not depend on the amount of data used to estimate it, which is a feature typically associated only with diagonal or low-rank approximations to the curvature matrix.

Stochastic Optimization

Structure Discovery in Nonparametric Regression through Compositional Kernel Search

4 code implementations 20 Feb 2013 David Duvenaud, James Robert Lloyd, Roger Grosse, Joshua B. Tenenbaum, Zoubin Ghahramani

Despite its importance, choosing the structural form of the kernel in nonparametric regression remains a black art.

Time Series
