1 code implementation • 22 May 2024 • Sang Keun Choe, Hwijeen Ahn, Juhan Bae, Kewen Zhao, Minsoo Kang, Youngseog Chung, Adithya Pratapa, Willie Neiswanger, Emma Strubell, Teruko Mitamura, Jeff Schneider, Eduard Hovy, Roger Grosse, Eric Xing

Large language models (LLMs) are trained on a vast amount of human-written data, but data providers often remain uncredited.

1 code implementation • 20 May 2024 • Juhan Bae, Wu Lin, Jonathan Lorraine, Roger Grosse

While computationally more efficient than unrolling-based approaches, Source is also suitable in cases where implicit-differentiation-based approaches struggle, such as non-converged models and multi-stage training pipelines.

1 code implementation • 26 Apr 2024 • Stephen Zhao, Rob Brekelmans, Alireza Makhzani, Roger Grosse

Numerous capability and safety techniques of Large Language Models (LLMs), including RLHF, automated red-teaming, prompt engineering, and infilling, can be cast as sampling from an unnormalized target distribution defined by a given reward or potential function over the full sequence.

1 code implementation • 26 Feb 2024 • Jin Peng Zhou, Yuhuai Wu, Qiyang Li, Roger Grosse

With newly extracted theorems, we show that the existing proofs in the MetaMath database can be refactored.

1 code implementation • 10 Jan 2024 • Evan Hubinger, Carson Denison, Jesse Mu, Mike Lambert, Meg Tong, Monte MacDiarmid, Tamera Lanham, Daniel M. Ziegler, Tim Maxwell, Newton Cheng, Adam Jermyn, Amanda Askell, Ansh Radhakrishnan, Cem Anil, David Duvenaud, Deep Ganguli, Fazl Barez, Jack Clark, Kamal Ndousse, Kshitij Sachan, Michael Sellitto, Mrinank Sharma, Nova DasSarma, Roger Grosse, Shauna Kravec, Yuntao Bai, Zachary Witten, Marina Favaro, Jan Brauner, Holden Karnofsky, Paul Christiano, Samuel R. Bowman, Logan Graham, Jared Kaplan, Sören Mindermann, Ryan Greenblatt, Buck Shlegeris, Nicholas Schiefer, Ethan Perez

We find that such backdoor behavior can be made persistent, so that it is not removed by standard safety training techniques, including supervised fine-tuning, reinforcement learning, and adversarial training (eliciting unsafe behavior and then training to remove it).

2 code implementations • 7 Aug 2023 • Roger Grosse, Juhan Bae, Cem Anil, Nelson Elhage, Alex Tamkin, Amirhossein Tajdini, Benoit Steiner, Dustin Li, Esin Durmus, Ethan Perez, Evan Hubinger, Kamilė Lukošiūtė, Karina Nguyen, Nicholas Joseph, Sam McCandlish, Jared Kaplan, Samuel R. Bowman

When trying to gain better visibility into a machine learning model in order to understand and mitigate the associated risks, a potentially valuable source of evidence is: which training examples most contribute to a given behavior?

1 code implementation • ICLR 2022 • Rob Brekelmans, Sicong Huang, Marzyeh Ghassemi, Greg Ver Steeg, Roger Grosse, Alireza Makhzani

Since accurate estimation of MI without density information requires a sample size exponential in the true MI, we assume either a single marginal or the full joint density information is known.

no code implementations • 7 Feb 2023 • Nikita Dhawan, Sicong Huang, Juhan Bae, Roger Grosse

It is often useful to compactly summarize important properties of model parameters and training data so that they can be used later without storing and/or iterating over the entire dataset.

no code implementations • 28 Dec 2022 • Paul Vicol, Jonathan Lorraine, Fabian Pedregosa, David Duvenaud, Roger Grosse

Bilevel problems consist of two nested sub-problems, called the outer and inner problems, respectively.
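The nesting described here can be made concrete with a tiny hyperparameter-selection problem. This is an illustrative sketch, not code from the paper: the inner problem is ridge regression (solved in closed form) and the outer problem tunes the penalty `lam` on held-out data; all names and data are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy bilevel problem:
#   inner:  w*(lam) = argmin_w ||X_tr w - y_tr||^2 + lam * ||w||^2
#   outer:  minimize validation loss ||X_val w*(lam) - y_val||^2 over lam
X_tr, y_tr = rng.normal(size=(50, 5)), rng.normal(size=50)
X_val, y_val = rng.normal(size=(30, 5)), rng.normal(size=30)

def inner_solution(lam):
    # Ridge regression admits a closed-form inner solution.
    d = X_tr.shape[1]
    return np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(d), X_tr.T @ y_tr)

def outer_loss(lam):
    w = inner_solution(lam)   # the inner problem is solved inside the outer one
    return np.mean((X_val @ w - y_val) ** 2)

# Crude outer optimization: grid search over the single hyperparameter.
lams = np.logspace(-3, 3, 25)
best_lam = min(lams, key=outer_loss)
```

Gradient-based outer methods (implicit differentiation, unrolling) replace the grid search when there are many hyperparameters.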

1 code implementation • 19 Dec 2022 • Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, Andy Jones, Anna Chen, Ben Mann, Brian Israel, Bryan Seethor, Cameron McKinnon, Christopher Olah, Da Yan, Daniela Amodei, Dario Amodei, Dawn Drain, Dustin Li, Eli Tran-Johnson, Guro Khundadze, Jackson Kernion, James Landis, Jamie Kerr, Jared Mueller, Jeeyoon Hyun, Joshua Landau, Kamal Ndousse, Landon Goldberg, Liane Lovitt, Martin Lucas, Michael Sellitto, Miranda Zhang, Neerav Kingsland, Nelson Elhage, Nicholas Joseph, Noemí Mercado, Nova DasSarma, Oliver Rausch, Robin Larson, Sam McCandlish, Scott Johnston, Shauna Kravec, Sheer El Showk, Tamera Lanham, Timothy Telleen-Lawton, Tom Brown, Tom Henighan, Tristan Hume, Yuntao Bai, Zac Hatfield-Dodds, Jack Clark, Samuel R. Bowman, Amanda Askell, Roger Grosse, Danny Hernandez, Deep Ganguli, Evan Hubinger, Nicholas Schiefer, Jared Kaplan

We also find some of the first examples of inverse scaling in RL from Human Feedback (RLHF), where more RLHF makes LMs worse.

no code implementations • 7 Dec 2022 • Juhan Bae, Michael R. Zhang, Michael Ruan, Eric Wang, So Hasegawa, Jimmy Ba, Roger Grosse

Variational autoencoders (VAEs) are powerful tools for learning latent representations of data used in a wide range of applications.

no code implementations • 18 Nov 2022 • Cem Anil, Ashwini Pokle, Kaiqu Liang, Johannes Treutlein, Yuhuai Wu, Shaojie Bai, Zico Kolter, Roger Grosse

Designing networks capable of attaining better performance with an increased inference budget is important to facilitate generalization to harder problem instances.

1 code implementation • 21 Sep 2022 • Nelson Elhage, Tristan Hume, Catherine Olsson, Nicholas Schiefer, Tom Henighan, Shauna Kravec, Zac Hatfield-Dodds, Robert Lasenby, Dawn Drain, Carol Chen, Roger Grosse, Sam McCandlish, Jared Kaplan, Dario Amodei, Martin Wattenberg, Christopher Olah

Neural networks often pack many unrelated concepts into a single neuron, a puzzling phenomenon known as 'polysemanticity' that makes interpretability much more challenging.

2 code implementations • 12 Sep 2022 • Juhan Bae, Nathan Ng, Alston Lo, Marzyeh Ghassemi, Roger Grosse

Influence functions efficiently estimate the effect of removing a single training data point on a model's learned parameters.
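For linear models the claim is easy to check numerically. A minimal sketch (my own illustration of the standard influence-function estimate w_{-i} - w ≈ H⁻¹∇ℓᵢ / n, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear regression with mean squared-error loss.
X, y = rng.normal(size=(100, 3)), rng.normal(size=100)
w_full = np.linalg.lstsq(X, y, rcond=None)[0]    # parameters trained on all data

i = 0
grad_i = X[i] * (X[i] @ w_full - y[i])           # gradient of point i's loss at w_full
H = X.T @ X / len(X)                             # Hessian of the mean loss
est_delta = np.linalg.solve(H, grad_i) / len(X)  # predicted change from dropping point i

# Compare against actually retraining without point i.
mask = np.ones(len(X), dtype=bool)
mask[i] = False
w_loo = np.linalg.lstsq(X[mask], y[mask], rcond=None)[0]
actual_delta = w_loo - w_full
```

For ordinary least squares the estimate matches the exact leave-one-out change up to a leverage factor 1/(1 - h_ii), so the predicted and actual parameter changes point in the same direction.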

no code implementations • 28 Feb 2022 • Juhan Bae, Paul Vicol, Jeff Z. HaoChen, Roger Grosse

Using APO to adapt a structured preconditioning matrix generally results in optimization performance competitive with second-order methods.

no code implementations • 27 Aug 2021 • Cem Anil, Guodong Zhang, Yuhuai Wu, Roger Grosse

We develop instantiations of the PVG for two algorithmic tasks, and show that in practice, the verifier learns a robust decision rule that is able to receive useful and reliable information from an untrusted prover.

no code implementations • NeurIPS 2021 • Guodong Zhang, Kyle Hsu, Jianing Li, Chelsea Finn, Roger Grosse

To this end, we propose Differentiable AIS (DAIS), a variant of AIS which ensures differentiability by abandoning the Metropolis-Hastings corrections.

2 code implementations • 10 Jun 2021 • Shengyang Sun, Jiaxin Shi, Andrew Gordon Wilson, Roger Grosse

We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability.

1 code implementation • 22 Apr 2021 • James Lucas, Juhan Bae, Michael R. Zhang, Stanislav Fort, Richard Zemel, Roger Grosse

Linear interpolation between initial neural network parameters and converged parameters after training with stochastic gradient descent (SGD) typically leads to a monotonic decrease in the training objective.
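In the convex quadratic case the phenomenon holds exactly, which makes for a quick sanity check. A sketch of the interpolation experiment on linear regression (illustrative setup, not the paper's deep-network experiments):

```python
import numpy as np

rng = np.random.default_rng(1)

# Interpolate between a random init and the converged least-squares solution.
X, y = rng.normal(size=(80, 4)), rng.normal(size=80)
theta0 = rng.normal(size=4)                      # "initial" parameters
theta1 = np.linalg.lstsq(X, y, rcond=None)[0]    # "converged" parameters

def loss(theta):
    return np.mean((X @ theta - y) ** 2)

# Evaluate the loss along the straight line theta0 -> theta1.
alphas = np.linspace(0.0, 1.0, 21)
path = [loss((1 - a) * theta0 + a * theta1) for a in alphas]
```

For a quadratic loss the path value is loss(theta1) plus a (1 - alpha)^2 term, so the decrease is exactly monotonic; the paper investigates when this persists (or fails) for non-convex deep networks.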

no code implementations • 18 Feb 2021 • Guodong Zhang, Yuanhao Wang, Laurent Lessard, Roger Grosse

Smooth minimax games often proceed by simultaneous or alternating gradient updates.

1 code implementation • 15 Jan 2021 • Yuhuai Wu, Markus Rabe, Wenda Li, Jimmy Ba, Roger Grosse, Christian Szegedy

While designing inductive bias in neural architectures has been widely studied, we hypothesize that transformer networks are flexible enough to learn inductive bias from suitable generic tasks.

1 code implementation • 6 Nov 2020 • Chaoqi Wang, Shengyang Sun, Roger Grosse

While uncertainty estimation is a well-studied topic in deep learning, most such work focuses on marginal uncertainty estimates, i.e. the predictive mean and variance at individual input locations.

1 code implementation • NeurIPS 2020 • Juhan Bae, Roger Grosse

Hyperparameter optimization of neural networks can be elegantly formulated as a bilevel optimization problem.

1 code implementation • 23 Sep 2020 • Guodong Zhang, Xuchan Bao, Laurent Lessard, Roger Grosse

The theory of integral quadratic constraints (IQCs) allows the certification of exponential convergence of interconnected systems containing nonlinear or uncertain elements.

2 code implementations • ICML 2020 • Sicong Huang, Alireza Makhzani, Yanshuai Cao, Roger Grosse

The field of deep generative modeling has succeeded in producing astonishingly realistic-seeming images and audio, but quantitative evaluation remains a challenge.

1 code implementation • NeurIPS 2020 • Xuchan Bao, James Lucas, Sushant Sachdeva, Roger Grosse

Our understanding of learning input-output relationships with neural nets has improved rapidly in recent years, but little is known about the convergence of the underlying representations, even in the simple case of linear autoencoders (LAEs).

3 code implementations • 8 Jul 2020 • Yuhuai Wu, Honghua Dong, Roger Grosse, Jimmy Ba

In this work, we focus on an analogical reasoning task that contains rich compositional structures, Raven's Progressive Matrices (RPM).

no code implementations • 7 Jul 2020 • Pashootan Vaezipoor, Gil Lederman, Yuhuai Wu, Chris J. Maddison, Roger Grosse, Sanjit A. Seshia, Fahiem Bacchus

In addition to step count improvements, Neuro# can also achieve orders of magnitude wall-clock speedups over the vanilla solver on larger instances in some problem families, despite the runtime overhead of querying the model.

1 code implementation • ICLR 2021 • Yuhuai Wu, Albert Qiaochu Jiang, Jimmy Ba, Roger Grosse

In learning-assisted theorem proving, one of the most critical challenges is to generalize to theorems unlike those seen at training time.

no code implementations • ICLR 2021 • Shun-ichi Amari, Jimmy Ba, Roger Grosse, Xuechen Li, Atsushi Nitanda, Taiji Suzuki, Denny Wu, Ji Xu

While second order optimizers such as natural gradient descent (NGD) often speed up optimization, their effect on generalization has been called into question.

1 code implementation • 16 Jun 2020 • Jens Behrmann, Paul Vicol, Kuan-Chieh Wang, Roger Grosse, Jörn-Henrik Jacobsen

For problems where global invertibility is necessary, such as applying normalizing flows on OOD data, we show the importance of designing stable INN building blocks.

3 code implementations • ICLR 2020 • Chaoqi Wang, Guodong Zhang, Roger Grosse

Overparameterization has been shown to benefit both the optimization and generalization of neural networks, but large networks are resource hungry at both training and test time.

no code implementations • NeurIPS 2019 • James Lucas, George Tucker, Roger Grosse, Mohammad Norouzi

Posterior collapse in Variational Autoencoders (VAEs) arises when the variational posterior distribution closely matches the prior for a subset of latent variables.

1 code implementation • NeurIPS 2019 • Qiyang Li, Saminul Haque, Cem Anil, James Lucas, Roger Grosse, Jörn-Henrik Jacobsen

Our BCOP parameterization allows us to train large convolutional networks with provable Lipschitz bounds.

1 code implementation • NeurIPS 2019 • Guodong Zhang, Lala Li, Zachary Nado, James Martens, Sushant Sachdeva, George E. Dahl, Christopher J. Shallue, Roger Grosse

Increasing the batch size is a popular way to speed up neural network training, but beyond some critical batch size, larger batch sizes yield diminishing returns.

no code implementations • 27 May 2019 • Guodong Zhang, James Martens, Roger Grosse

In this work, we analyze for the first time the speed of convergence of natural gradient descent on nonlinear neural networks with squared-error loss.

1 code implementation • 15 May 2019 • Chaoqi Wang, Roger Grosse, Sanja Fidler, Guodong Zhang

Reducing the test time resource requirements of a neural network while preserving test accuracy is crucial for running inference on resource-constrained devices.

no code implementations • ICLR 2019 • Paul Vicol, Jeffery Z. HaoChen, Roger Grosse

Effective performance of neural networks depends critically on tuning of optimization hyperparameters, especially learning rates (and schedules thereof).

no code implementations • ICLR Workshop DeepGenStruct 2019 • James Lucas, George Tucker, Roger Grosse, Mohammad Norouzi

Posterior collapse in Variational Autoencoders (VAEs) arises when the variational distribution closely matches the uninformative prior for a subset of latent variables.

3 code implementations • ICLR 2019 • Shengyang Sun, Guodong Zhang, Jiaxin Shi, Roger Grosse

We introduce functional variational Bayesian neural networks (fBNNs), which maximize an Evidence Lower BOund (ELBO) defined directly on stochastic processes, i.e. distributions over functions.

3 code implementations • ICLR 2019 • Matthew MacKay, Paul Vicol, Jon Lorraine, David Duvenaud, Roger Grosse

Empirically, our approach outperforms competing hyperparameter optimization methods on large-scale deep learning problems.

3 code implementations • 30 Nov 2018 • Juhan Bae, Guodong Zhang, Roger Grosse

A recently proposed method, noisy natural gradient, is a surprisingly simple method to fit expressive posteriors by adding weight noise to regular natural gradient updates.

1 code implementation • 13 Nov 2018 • Cem Anil, James Lucas, Roger Grosse

We identify a necessary property for such an architecture: each of the layers must preserve the gradient norm during backpropagation.
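The property is simple to verify for a single linear layer: gradients backpropagate through y = Wx as Wᵀg, so an orthogonal W leaves the gradient norm unchanged. A small numerical check (my illustration; the paper pairs such layers with norm-preserving activations like GroupSort):

```python
import numpy as np

rng = np.random.default_rng(0)

# Random orthogonal weight matrix via QR decomposition.
W, _ = np.linalg.qr(rng.normal(size=(64, 64)))

# Backprop through a linear layer maps the output gradient g_out to W^T g_out.
g_out = rng.normal(size=64)
g_in = W.T @ g_out

# Orthogonality means ||g_in|| == ||g_out||: the gradient norm is preserved.
```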

no code implementations • ICLR 2019 • Guodong Zhang, Chaoqi Wang, Bowen Xu, Roger Grosse

Weight decay is one of the standard tricks in the neural network toolbox, but the reasons for its regularization effect are poorly understood, and recent results have cast doubt on the traditional interpretation in terms of $L_2$ regularization.
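One concrete way the L2 interpretation breaks down, shown here as my own illustration rather than the paper's analysis: for SGD, folding λw into the gradient is algebraically identical to decoupled weight decay, but for an adaptive method like Adam the two updates differ, because Adam rescales whatever appears in the gradient.

```python
import numpy as np

w = np.array([1.0, -2.0, 3.0])
g = np.array([0.1, 0.1, 0.1])
lr, lam = 0.001, 0.1

# SGD: L2-in-the-gradient and decoupled weight decay coincide exactly.
sgd_l2 = w - lr * (g + lam * w)
sgd_decoupled = (w - lr * g) - lr * lam * w

def adam_step(w, g, lr=0.001, eps=1e-8):
    # One Adam step from fresh (zero) optimizer state; after bias correction
    # the first-step moment estimates reduce to g and g**2.
    return w - lr * g / (np.sqrt(g ** 2) + eps)

# Adam: the adaptive rescaling absorbs the L2 term, so the updates differ.
adam_l2 = adam_step(w, g + lam * w)
adam_decoupled = adam_step(w, g) - lr * lam * w
```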

1 code implementation • NeurIPS 2018 • Matthew MacKay, Paul Vicol, Jimmy Ba, Roger Grosse

Reversible RNNs, i.e. RNNs for which the hidden-to-hidden transition can be reversed, offer a path to reduce the memory requirements of training, as hidden states need not be stored and can instead be recomputed during backpropagation.
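The reversibility trick can be sketched with an additive coupling transition (a simplified illustration in the spirit of RevNets; `f` and `g` here are arbitrary stand-ins for the learned sub-networks):

```python
import numpy as np

rng = np.random.default_rng(0)

# Additive coupling: split the hidden state and update each half using the other.
# The transition is exactly invertible, so past states can be recomputed during
# backprop instead of being stored.
def f(h): return np.tanh(h)   # illustrative sub-network
def g(h): return np.sin(h)    # illustrative sub-network

def forward(h1, h2):
    y1 = h1 + f(h2)
    y2 = h2 + g(y1)
    return y1, y2

def inverse(y1, y2):
    h2 = y2 - g(y1)
    h1 = y1 - f(h2)
    return h1, h2

h1, h2 = rng.normal(size=8), rng.normal(size=8)
r1, r2 = inverse(*forward(h1, h2))   # recovers the original state exactly
```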

no code implementations • 30 Aug 2018 • Kevin Luk, Roger Grosse

Most neural networks are trained using first-order optimization methods, which are sensitive to the parameterization of the model.

no code implementations • ICML 2018 • Kuan-Chieh Wang, Paul Vicol, James Lucas, Li Gu, Roger Grosse, Richard Zemel

We propose a framework, Adversarial Posterior Distillation, to distill the SGLD samples using a Generative Adversarial Network (GAN).

4 code implementations • ICML 2018 • Shengyang Sun, Guodong Zhang, Chaoqi Wang, Wenyuan Zeng, Jiaman Li, Roger Grosse

The NKN architecture is based on the composition rules for kernels, so that each unit of the network corresponds to a valid kernel.

1 code implementation • ICLR 2019 • James Lucas, Shengyang Sun, Richard Zemel, Roger Grosse

Momentum is a simple and widely used trick which allows gradient-based optimizers to pick up speed along low curvature directions.
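A quick numerical illustration of that claim (my own toy setup, not from the paper): on an ill-conditioned quadratic, heavy-ball momentum accumulates velocity along the flat direction and beats plain gradient descent by orders of magnitude.

```python
import numpy as np

# Ill-conditioned quadratic: f(x) = 0.5 * (x1^2 + 100 * x2^2).
# Plain GD crawls along the low-curvature x1 direction; momentum speeds it up.
curv = np.array([1.0, 100.0])
def loss(x): return 0.5 * np.sum(curv * x ** 2)
def grad(x): return curv * x

x_gd = np.array([10.0, 1.0])
x_mom, v = np.array([10.0, 1.0]), np.zeros(2)
lr, beta = 0.015, 0.9
for _ in range(300):
    x_gd = x_gd - lr * grad(x_gd)            # plain gradient descent
    v = beta * v - lr * grad(x_mom)          # heavy-ball velocity update
    x_mom = x_mom + v
```

With these (hand-picked) settings both runs are stable, but the momentum iterate reaches a far lower loss in the same number of steps.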

3 code implementations • ICLR 2018 • Yeming Wen, Paul Vicol, Jimmy Ba, Dustin Tran, Roger Grosse

Stochastic neural net weights are used in a variety of contexts, including regularization, Bayesian neural nets, exploration in reinforcement learning, and evolution strategies.

1 code implementation • ICLR 2018 • Yuhuai Wu, Mengye Ren, Renjie Liao, Roger Grosse

Careful tuning of the learning rate, or even schedules thereof, can be crucial to effective neural net training.

10 code implementations • NeurIPS 2018 • Ricky T. Q. Chen, Xuechen Li, Roger Grosse, David Duvenaud

We decompose the evidence lower bound to show the existence of a term measuring the total correlation between latent variables.
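Total correlation is the KL divergence between a joint distribution and the product of its marginals; for a Gaussian it has a closed form, which makes the quantity concrete. An illustrative computation on an assumed 2-D Gaussian (not from the paper):

```python
import numpy as np

# For a Gaussian, TC = sum_i H(z_i) - H(z) = -0.5 * log det(correlation matrix).
rho = 0.8
cov = np.array([[1.0, rho], [rho, 1.0]])

# Entropies of the marginals and of the joint, each in closed form.
h_marginals = sum(0.5 * np.log(2 * np.pi * np.e * cov[i, i]) for i in range(2))
h_joint = 0.5 * np.log((2 * np.pi * np.e) ** 2 * np.linalg.det(cov))
tc = h_marginals - h_joint   # zero iff the dimensions are independent
```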

2 code implementations • ICML 2018 • Guodong Zhang, Shengyang Sun, David Duvenaud, Roger Grosse

Variational Bayesian neural nets combine the flexibility of deep learning with Bayesian uncertainty estimation.

8 code implementations • NeurIPS 2017 • Yuhuai Wu, Elman Mansimov, Shun Liao, Roger Grosse, Jimmy Ba

In this work, we propose to apply trust region optimization to deep reinforcement learning using a recently proposed Kronecker-factored approximation to the curvature.

2 code implementations • 14 Nov 2016 • Yuhuai Wu, Yuri Burda, Ruslan Salakhutdinov, Roger Grosse

The past several years have seen remarkable progress in generative models which produce convincing samples of images and other modalities.

2 code implementations • 3 Feb 2016 • Roger Grosse, James Martens

Second-order optimization methods such as natural gradient descent have the potential to speed up training of neural networks by correcting for the curvature of the loss function.

no code implementations • NeurIPS 2015 • Jimmy Ba, Roger Grosse, Ruslan Salakhutdinov, Brendan Frey

Despite their success, convolutional neural networks are computationally expensive because they must examine all image locations.

no code implementations • 9 Sep 2015 • Beate Franke, Jean-François Plante, Ribana Roscher, Annie Lee, Cathal Smyth, Armin Hatefi, Fuqi Chen, Einat Gil, Alexander Schwing, Alessandro Selvitella, Michael M. Hoffman, Roger Grosse, Dieter Hendricks, Nancy Reid

The need for new methods to deal with big data is a common theme in most scientific fields, although its definition tends to vary with the context.

23 code implementations • 1 Sep 2015 • Yuri Burda, Roger Grosse, Ruslan Salakhutdinov

The variational autoencoder (VAE; Kingma & Welling, 2014) is a recently proposed generative model pairing a top-down generative network with a bottom-up recognition network which approximates posterior inference.
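The importance-weighted bound at the heart of this paper is easy to see in a toy model where everything is Gaussian and the true log-likelihood is known in closed form. A sketch under those assumptions, using the prior as the proposal (my illustration, not the paper's neural recognition network):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: z ~ N(0,1), x | z ~ N(z,1), so marginally p(x) = N(0,2) exactly.
x = 1.0
true_logp = -0.5 * np.log(2 * np.pi * 2.0) - x ** 2 / (2 * 2.0)

def log_lik(z):
    # log p(x | z) for the Gaussian likelihood above.
    return -0.5 * np.log(2 * np.pi) - (x - z) ** 2 / 2

def iwae_bound(k, reps=2000):
    # log(1/k * sum_i w_i) with w_i = p(x|z_i), z_i from the prior,
    # averaged over many repetitions (log-sum-exp for stability).
    z = rng.normal(size=(reps, k))
    lw = log_lik(z)
    m = lw.max(axis=1, keepdims=True)
    return np.mean(m.squeeze() + np.log(np.mean(np.exp(lw - m), axis=1)))

b1, b100 = iwae_bound(1), iwae_bound(100)   # k=1 is the ordinary ELBO
```

Both are lower bounds on log p(x), and the k=100 bound sits visibly closer to the truth.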

17 code implementations • 19 Mar 2015 • James Martens, Roger Grosse

This is because the cost of storing and inverting K-FAC's approximation to the curvature matrix does not depend on the amount of data used to estimate it, which is a feature typically associated only with diagonal or low-rank approximations to the curvature matrix.
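The cost structure follows from a basic Kronecker identity, which is easy to verify numerically (illustrative factor sizes; K-FAC's actual factors are second-moment matrices of activations and backpropagated gradients):

```python
import numpy as np

rng = np.random.default_rng(0)

# K-FAC approximates a layer's curvature block as a Kronecker product A ⊗ G.
# The key saving: (A ⊗ G)^{-1} = A^{-1} ⊗ G^{-1}, so two small inverses
# replace one large one, at a cost independent of the data used to estimate them.
def spd(n):
    m = rng.normal(size=(n, n))
    return m @ m.T + n * np.eye(n)   # well-conditioned symmetric positive definite

A, G = spd(4), spd(3)
big_inv = np.linalg.inv(np.kron(A, G))                      # direct 12x12 inverse
kfac_inv = np.kron(np.linalg.inv(A), np.linalg.inv(G))      # two small inverses
```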

2 code implementations • 18 Feb 2014 • James Robert Lloyd, David Duvenaud, Roger Grosse, Joshua B. Tenenbaum, Zoubin Ghahramani

This paper presents the beginnings of an automatic statistician, focusing on regression problems.

4 code implementations • 20 Feb 2013 • David Duvenaud, James Robert Lloyd, Roger Grosse, Joshua B. Tenenbaum, Zoubin Ghahramani

Despite its importance, choosing the structural form of the kernel in nonparametric regression remains a black art.

Papers With Code is a free resource with all data licensed under CC-BY-SA.