You need to log in to edit.

You can create a new account if you don't have one.

Or, discuss a change on Slack.

You can create a new account if you don't have one.

Or, discuss a change on Slack.

no code implementations • 28 Feb 2022 • Juhan Bae, Paul Vicol, Jeff Z. HaoChen, Roger Grosse

Using APO to adapt a structured preconditioning matrix generally results in optimization performance competitive with second-order methods.

no code implementations • 27 Aug 2021 • Cem Anil, Guodong Zhang, Yuhuai Wu, Roger Grosse

We develop instantiations of the PVG for two algorithmic tasks, and show that in practice, the verifier learns a robust decision rule that is able to receive useful and reliable information from an untrusted prover.

no code implementations • NeurIPS 2021 • Guodong Zhang, Kyle Hsu, Jianing Li, Chelsea Finn, Roger Grosse

To this end, we propose Differentiable AIS (DAIS), a variant of AIS which ensures differentiability by abandoning the Metropolis-Hastings corrections.

2 code implementations • 10 Jun 2021 • Shengyang Sun, Jiaxin Shi, Andrew Gordon Wilson, Roger Grosse

We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability.

1 code implementation • 22 Apr 2021 • James Lucas, Juhan Bae, Michael R. Zhang, Stanislav Fort, Richard Zemel, Roger Grosse

Linear interpolation between initial neural network parameters and converged parameters after training with stochastic gradient descent (SGD) typically leads to a monotonic decrease in the training objective.

no code implementations • 18 Feb 2021 • Guodong Zhang, Yuanhao Wang, Laurent Lessard, Roger Grosse

Smooth minimax games often proceed by simultaneous or alternating gradient updates.

1 code implementation • 15 Jan 2021 • Yuhuai Wu, Markus Rabe, Wenda Li, Jimmy Ba, Roger Grosse, Christian Szegedy

While designing inductive bias in neural architectures has been widely studied, we hypothesize that transformer networks are flexible enough to learn inductive bias from suitable generic tasks.

1 code implementation • 6 Nov 2020 • Chaoqi Wang, Shengyang Sun, Roger Grosse

While uncertainty estimation is a well-studied topic in deep learning, most such work focuses on marginal uncertainty estimates, i. e. the predictive mean and variance at individual input locations.

1 code implementation • NeurIPS 2020 • Juhan Bae, Roger Grosse

Hyperparameter optimization of neural networks can be elegantly formulated as a bilevel optimization problem.

1 code implementation • 23 Sep 2020 • Guodong Zhang, Xuchan Bao, Laurent Lessard, Roger Grosse

The theory of integral quadratic constraints (IQCs) allows the certification of exponential convergence of interconnected systems containing nonlinear or uncertain elements.

2 code implementations • ICML 2020 • Sicong Huang, Alireza Makhzani, Yanshuai Cao, Roger Grosse

The field of deep generative modeling has succeeded in producing astonishingly realistic-seeming images and audio, but quantitative evaluation remains a challenge.

1 code implementation • NeurIPS 2020 • Xuchan Bao, James Lucas, Sushant Sachdeva, Roger Grosse

Our understanding of learning input-output relationships with neural nets has improved rapidly in recent years, but little is known about the convergence of the underlying representations, even in the simple case of linear autoencoders (LAEs).

3 code implementations • 8 Jul 2020 • Yuhuai Wu, Honghua Dong, Roger Grosse, Jimmy Ba

In this work, we focus on an analogical reasoning task that contains rich compositional structures, Raven's Progressive Matrices (RPM).

no code implementations • 7 Jul 2020 • Pashootan Vaezipoor, Gil Lederman, Yuhuai Wu, Chris J. Maddison, Roger Grosse, Edward Lee, Sanjit A. Seshia, Fahiem Bacchus

Propositional model counting or #SAT is the problem of computing the number of satisfying assignments of a Boolean formula and many discrete probabilistic inference problems can be translated into a model counting problem to be solved by #SAT solvers.

1 code implementation • ICLR 2021 • Yuhuai Wu, Albert Qiaochu Jiang, Jimmy Ba, Roger Grosse

In learning-assisted theorem proving, one of the most critical challenges is to generalize to theorems unlike those seen at training time.

no code implementations • ICLR 2021 • Shun-ichi Amari, Jimmy Ba, Roger Grosse, Xuechen Li, Atsushi Nitanda, Taiji Suzuki, Denny Wu, Ji Xu

While second order optimizers such as natural gradient descent (NGD) often speed up optimization, their effect on generalization has been called into question.

1 code implementation • 16 Jun 2020 • Jens Behrmann, Paul Vicol, Kuan-Chieh Wang, Roger Grosse, Jörn-Henrik Jacobsen

For problems where global invertibility is necessary, such as applying normalizing flows on OOD data, we show the importance of designing stable INN building blocks.

2 code implementations • ICLR 2020 • Chaoqi Wang, Guodong Zhang, Roger Grosse

Overparameterization has been shown to benefit both the optimization and generalization of neural networks, but large networks are resource hungry at both training and test time.

no code implementations • NeurIPS 2019 • James Lucas, George Tucker, Roger Grosse, Mohammad Norouzi

Posterior collapse in Variational Autoencoders (VAEs) arises when the variational posterior distribution closely matches the prior for a subset of latent variables.

1 code implementation • NeurIPS 2019 • Qiyang Li, Saminul Haque, Cem Anil, James Lucas, Roger Grosse, Jörn-Henrik Jacobsen

Our BCOP parameterization allows us to train large convolutional networks with provable Lipschitz bounds.

1 code implementation • NeurIPS 2019 • Guodong Zhang, Lala Li, Zachary Nado, James Martens, Sushant Sachdeva, George E. Dahl, Christopher J. Shallue, Roger Grosse

Increasing the batch size is a popular way to speed up neural network training, but beyond some critical batch size, larger batch sizes yield diminishing returns.

no code implementations • 27 May 2019 • Guodong Zhang, James Martens, Roger Grosse

In this work, we analyze for the first time the speed of convergence of natural gradient descent on nonlinear neural networks with squared-error loss.

1 code implementation • 15 May 2019 • Chaoqi Wang, Roger Grosse, Sanja Fidler, Guodong Zhang

Reducing the test time resource requirements of a neural network while preserving test accuracy is crucial for running inference on resource-constrained devices.

no code implementations • ICLR 2019 • Paul Vicol, Jeffery Z. HaoChen, Roger Grosse

Effective performance of neural networks depends critically on effective tuning of optimization hyperparameters, especially learning rates (and schedules thereof).

no code implementations • ICLR Workshop DeepGenStruct 2019 • James Lucas, George Tucker, Roger Grosse, Mohammad Norouzi

Posterior collapse in Variational Autoencoders (VAEs) arises when the variational distribution closely matches the uninformative prior for a subset of latent variables.

2 code implementations • ICLR 2019 • Shengyang Sun, Guodong Zhang, Jiaxin Shi, Roger Grosse

We introduce functional variational Bayesian neural networks (fBNNs), which maximize an Evidence Lower BOund (ELBO) defined directly on stochastic processes, i. e. distributions over functions.

3 code implementations • ICLR 2019 • Matthew MacKay, Paul Vicol, Jon Lorraine, David Duvenaud, Roger Grosse

Empirically, our approach outperforms competing hyperparameter optimization methods on large-scale deep learning problems.

3 code implementations • 30 Nov 2018 • Juhan Bae, Guodong Zhang, Roger Grosse

A recently proposed method, noisy natural gradient, is a surprisingly simple method to fit expressive posteriors by adding weight noise to regular natural gradient updates.

1 code implementation • 13 Nov 2018 • Cem Anil, James Lucas, Roger Grosse

We identify a necessary property for such an architecture: each of the layers must preserve the gradient norm during backpropagation.

no code implementations • ICLR 2019 • Guodong Zhang, Chaoqi Wang, Bowen Xu, Roger Grosse

Weight decay is one of the standard tricks in the neural network toolbox, but the reasons for its regularization effect are poorly understood, and recent results have cast doubt on the traditional interpretation in terms of $L_2$ regularization.

1 code implementation • NeurIPS 2018 • Matthew MacKay, Paul Vicol, Jimmy Ba, Roger Grosse

Reversible RNNs---RNNs for which the hidden-to-hidden transition can be reversed---offer a path to reduce the memory requirements of training, as hidden states need not be stored and instead can be recomputed during backpropagation.

no code implementations • 30 Aug 2018 • Kevin Luk, Roger Grosse

Most neural networks are trained using first-order optimization methods, which are sensitive to the parameterization of the model.

no code implementations • ICML 2018 • Kuan-Chieh Wang, Paul Vicol, James Lucas, Li Gu, Roger Grosse, Richard Zemel

We propose a framework, Adversarial Posterior Distillation, to distill the SGLD samples using a Generative Adversarial Network (GAN).

1 code implementation • 27 Jun 2018 • Kuan-Chieh Wang, Paul Vicol, James Lucas, Li Gu, Roger Grosse, Richard Zemel

We propose a framework, Adversarial Posterior Distillation, to distill the SGLD samples using a Generative Adversarial Network (GAN).

3 code implementations • ICML 2018 • Shengyang Sun, Guodong Zhang, Chaoqi Wang, Wenyuan Zeng, Jiaman Li, Roger Grosse

The NKN architecture is based on the composition rules for kernels, so that each unit of the network corresponds to a valid kernel.

2 code implementations • ICLR 2019 • James Lucas, Shengyang Sun, Richard Zemel, Roger Grosse

Momentum is a simple and widely used trick which allows gradient-based optimizers to pick up speed along low curvature directions.

3 code implementations • ICLR 2018 • Yeming Wen, Paul Vicol, Jimmy Ba, Dustin Tran, Roger Grosse

Stochastic neural net weights are used in a variety of contexts, including regularization, Bayesian neural nets, exploration in reinforcement learning, and evolution strategies.

1 code implementation • ICLR 2018 • Yuhuai Wu, Mengye Ren, Renjie Liao, Roger Grosse

Careful tuning of the learning rate, or even schedules thereof, can be crucial to effective neural net training.

8 code implementations • NeurIPS 2018 • Ricky T. Q. Chen, Xuechen Li, Roger Grosse, David Duvenaud

We decompose the evidence lower bound to show the existence of a term measuring the total correlation between latent variables.

2 code implementations • ICML 2018 • Guodong Zhang, Shengyang Sun, David Duvenaud, Roger Grosse

Variational Bayesian neural nets combine the flexibility of deep learning with Bayesian uncertainty estimation.

8 code implementations • NeurIPS 2017 • Yuhuai Wu, Elman Mansimov, Shun Liao, Roger Grosse, Jimmy Ba

In this work, we propose to apply trust region optimization to deep reinforcement learning using a recently proposed Kronecker-factored approximation to the curvature.

2 code implementations • 14 Nov 2016 • Yuhuai Wu, Yuri Burda, Ruslan Salakhutdinov, Roger Grosse

The past several years have seen remarkable progress in generative models which produce convincing samples of images and other modalities.

1 code implementation • 3 Feb 2016 • Roger Grosse, James Martens

Second-order optimization methods such as natural gradient descent have the potential to speed up training of neural networks by correcting for the curvature of the loss function.

no code implementations • NeurIPS 2015 • Jimmy Ba, Roger Grosse, Ruslan Salakhutdinov, Brendan Frey

Despite their success, convolutional neural networks are computationally expensive because they must examine all image locations.

no code implementations • 9 Sep 2015 • Beate Franke, Jean-François Plante, Ribana Roscher, Annie Lee, Cathal Smyth, Armin Hatefi, Fuqi Chen, Einat Gil, Alexander Schwing, Alessandro Selvitella, Michael M. Hoffman, Roger Grosse, Dieter Hendricks, Nancy Reid

The need for new methods to deal with big data is a common theme in most scientific fields, although its definition tends to vary with the context.

20 code implementations • 1 Sep 2015 • Yuri Burda, Roger Grosse, Ruslan Salakhutdinov

The variational autoencoder (VAE; Kingma, Welling (2014)) is a recently proposed generative model pairing a top-down generative network with a bottom-up recognition network which approximates posterior inference.

8 code implementations • 19 Mar 2015 • James Martens, Roger Grosse

This is because the cost of storing and inverting K-FAC's approximation to the curvature matrix does not depend on the amount of data used to estimate it, which is a feature typically associated only with diagonal or low-rank approximations to the curvature matrix.

2 code implementations • 18 Feb 2014 • James Robert Lloyd, David Duvenaud, Roger Grosse, Joshua B. Tenenbaum, Zoubin Ghahramani

This paper presents the beginnings of an automatic statistician, focusing on regression problems.

4 code implementations • 20 Feb 2013 • David Duvenaud, James Robert Lloyd, Roger Grosse, Joshua B. Tenenbaum, Zoubin Ghahramani

Despite its importance, choosing the structural form of the kernel in nonparametric regression remains a black art.

Cannot find the paper you are looking for? You can
Submit a new open access paper.

Contact us on:
hello@paperswithcode.com
.
Papers With Code is a free resource with all data licensed under CC-BY-SA.