1 code implementation • 22 May 2024 • Sang Keun Choe, Hwijeen Ahn, Juhan Bae, Kewen Zhao, Minsoo Kang, Youngseog Chung, Adithya Pratapa, Willie Neiswanger, Emma Strubell, Teruko Mitamura, Jeff Schneider, Eduard Hovy, Roger Grosse, Eric Xing

Large language models (LLMs) are trained on a vast amount of human-written data, but data providers often remain uncredited.

1 code implementation • 20 May 2024 • Juhan Bae, Wu Lin, Jonathan Lorraine, Roger Grosse

While computationally more efficient than unrolling-based approaches, Source is also suitable in cases where implicit-differentiation-based approaches struggle, such as non-converged models and multi-stage training pipelines.

1 code implementation • 26 Apr 2024 • Stephen Zhao, Rob Brekelmans, Alireza Makhzani, Roger Grosse

Numerous capability and safety techniques of Large Language Models (LLMs), including RLHF, automated red-teaming, prompt engineering, and infilling, can be cast as sampling from an unnormalized target distribution defined by a given reward or potential function over the full sequence.

1 code implementation • 26 Feb 2024 • Jin Peng Zhou, Yuhuai Wu, Qiyang Li, Roger Grosse

With newly extracted theorems, we show that the existing proofs in the MetaMath database can be refactored.

1 code implementation • 10 Jan 2024 • Evan Hubinger, Carson Denison, Jesse Mu, Mike Lambert, Meg Tong, Monte MacDiarmid, Tamera Lanham, Daniel M. Ziegler, Tim Maxwell, Newton Cheng, Adam Jermyn, Amanda Askell, Ansh Radhakrishnan, Cem Anil, David Duvenaud, Deep Ganguli, Fazl Barez, Jack Clark, Kamal Ndousse, Kshitij Sachan, Michael Sellitto, Mrinank Sharma, Nova DasSarma, Roger Grosse, Shauna Kravec, Yuntao Bai, Zachary Witten, Marina Favaro, Jan Brauner, Holden Karnofsky, Paul Christiano, Samuel R. Bowman, Logan Graham, Jared Kaplan, Sören Mindermann, Ryan Greenblatt, Buck Shlegeris, Nicholas Schiefer, Ethan Perez

We find that such backdoor behavior can be made persistent, so that it is not removed by standard safety training techniques, including supervised fine-tuning, reinforcement learning, and adversarial training (eliciting unsafe behavior and then training to remove it).

2 code implementations • 7 Aug 2023 • Roger Grosse, Juhan Bae, Cem Anil, Nelson Elhage, Alex Tamkin, Amirhossein Tajdini, Benoit Steiner, Dustin Li, Esin Durmus, Ethan Perez, Evan Hubinger, Kamilė Lukošiūtė, Karina Nguyen, Nicholas Joseph, Sam McCandlish, Jared Kaplan, Samuel R. Bowman

When trying to gain better visibility into a machine learning model in order to understand and mitigate the associated risks, a potentially valuable source of evidence is: which training examples most contribute to a given behavior?

1 code implementation • ICLR 2022 • Rob Brekelmans, Sicong Huang, Marzyeh Ghassemi, Greg Ver Steeg, Roger Grosse, Alireza Makhzani

Since accurate estimation of MI without density information requires a sample size exponential in the true MI, we assume either a single marginal or the full joint density information is known.

no code implementations • 7 Feb 2023 • Nikita Dhawan, Sicong Huang, Juhan Bae, Roger Grosse

It is often useful to compactly summarize important properties of model parameters and training data so that they can be used later without storing and/or iterating over the entire dataset.

no code implementations • 28 Dec 2022 • Paul Vicol, Jonathan Lorraine, Fabian Pedregosa, David Duvenaud, Roger Grosse

Bilevel problems consist of two nested sub-problems, called the outer and inner problems, respectively.
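The nesting described here can be made concrete with a tiny hyperparameter-selection problem. This is an illustrative sketch, not code from the paper: the inner problem is ridge regression (solved in closed form) and the outer problem tunes the penalty `lam` on held-out data; all names and data are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy bilevel problem:
#   inner:  w*(lam) = argmin_w ||X_tr w - y_tr||^2 + lam * ||w||^2
#   outer:  minimize validation loss ||X_val w*(lam) - y_val||^2 over lam
X_tr, y_tr = rng.normal(size=(50, 5)), rng.normal(size=50)
X_val, y_val = rng.normal(size=(30, 5)), rng.normal(size=30)

def inner_solution(lam):
    # Ridge regression admits a closed-form inner solution.
    d = X_tr.shape[1]
    return np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(d), X_tr.T @ y_tr)

def outer_loss(lam):
    w = inner_solution(lam)   # the inner problem is solved inside the outer one
    return np.mean((X_val @ w - y_val) ** 2)

# Crude outer optimization: grid search over the single hyperparameter.
lams = np.logspace(-3, 3, 25)
best_lam = min(lams, key=outer_loss)
```

Gradient-based outer methods (implicit differentiation, unrolling) replace the grid search when there are many hyperparameters.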

1 code implementation • 19 Dec 2022 • Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, Andy Jones, Anna Chen, Ben Mann, Brian Israel, Bryan Seethor, Cameron McKinnon, Christopher Olah, Da Yan, Daniela Amodei, Dario Amodei, Dawn Drain, Dustin Li, Eli Tran-Johnson, Guro Khundadze, Jackson Kernion, James Landis, Jamie Kerr, Jared Mueller, Jeeyoon Hyun, Joshua Landau, Kamal Ndousse, Landon Goldberg, Liane Lovitt, Martin Lucas, Michael Sellitto, Miranda Zhang, Neerav Kingsland, Nelson Elhage, Nicholas Joseph, Noemí Mercado, Nova DasSarma, Oliver Rausch, Robin Larson, Sam McCandlish, Scott Johnston, Shauna Kravec, Sheer El Showk, Tamera Lanham, Timothy Telleen-Lawton, Tom Brown, Tom Henighan, Tristan Hume, Yuntao Bai, Zac Hatfield-Dodds, Jack Clark, Samuel R. Bowman, Amanda Askell, Roger Grosse, Danny Hernandez, Deep Ganguli, Evan Hubinger, Nicholas Schiefer, Jared Kaplan

We also find some of the first examples of inverse scaling in RL from Human Feedback (RLHF), where more RLHF makes LMs worse.

no code implementations • 7 Dec 2022 • Juhan Bae, Michael R. Zhang, Michael Ruan, Eric Wang, So Hasegawa, Jimmy Ba, Roger Grosse

Variational autoencoders (VAEs) are powerful tools for learning latent representations of data used in a wide range of applications.

no code implementations • 18 Nov 2022 • Cem Anil, Ashwini Pokle, Kaiqu Liang, Johannes Treutlein, Yuhuai Wu, Shaojie Bai, Zico Kolter, Roger Grosse

Designing networks capable of attaining better performance with an increased inference budget is important to facilitate generalization to harder problem instances.

1 code implementation • 21 Sep 2022 • Nelson Elhage, Tristan Hume, Catherine Olsson, Nicholas Schiefer, Tom Henighan, Shauna Kravec, Zac Hatfield-Dodds, Robert Lasenby, Dawn Drain, Carol Chen, Roger Grosse, Sam McCandlish, Jared Kaplan, Dario Amodei, Martin Wattenberg, Christopher Olah

Neural networks often pack many unrelated concepts into a single neuron, a puzzling phenomenon known as 'polysemanticity' that makes interpretability much more challenging.

2 code implementations • 12 Sep 2022 • Juhan Bae, Nathan Ng, Alston Lo, Marzyeh Ghassemi, Roger Grosse

Influence functions efficiently estimate the effect of removing a single training data point on a model's learned parameters.
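For linear models the claim is easy to check numerically. A minimal sketch (my own illustration of the standard influence-function estimate w_{-i} - w ≈ H⁻¹∇ℓᵢ / n, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear regression with mean squared-error loss.
X, y = rng.normal(size=(100, 3)), rng.normal(size=100)
w_full = np.linalg.lstsq(X, y, rcond=None)[0]    # parameters trained on all data

i = 0
grad_i = X[i] * (X[i] @ w_full - y[i])           # gradient of point i's loss at w_full
H = X.T @ X / len(X)                             # Hessian of the mean loss
est_delta = np.linalg.solve(H, grad_i) / len(X)  # predicted change from dropping point i

# Compare against actually retraining without point i.
mask = np.ones(len(X), dtype=bool)
mask[i] = False
w_loo = np.linalg.lstsq(X[mask], y[mask], rcond=None)[0]
actual_delta = w_loo - w_full
```

For ordinary least squares the estimate matches the exact leave-one-out change up to a leverage factor 1/(1 - h_ii), so the predicted and actual parameter changes point in the same direction.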

no code implementations • 28 Feb 2022 • Juhan Bae, Paul Vicol, Jeff Z. HaoChen, Roger Grosse

Using APO to adapt a structured preconditioning matrix generally results in optimization performance competitive with second-order methods.

no code implementations • 27 Aug 2021 • Cem Anil, Guodong Zhang, Yuhuai Wu, Roger Grosse

We develop instantiations of the PVG for two algorithmic tasks, and show that in practice, the verifier learns a robust decision rule that is able to receive useful and reliable information from an untrusted prover.

no code implementations • NeurIPS 2021 • Guodong Zhang, Kyle Hsu, Jianing Li, Chelsea Finn, Roger Grosse

To this end, we propose Differentiable AIS (DAIS), a variant of AIS which ensures differentiability by abandoning the Metropolis-Hastings corrections.

2 code implementations • 10 Jun 2021 • Shengyang Sun, Jiaxin Shi, Andrew Gordon Wilson, Roger Grosse

We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability.

1 code implementation • 22 Apr 2021 • James Lucas, Juhan Bae, Michael R. Zhang, Stanislav Fort, Richard Zemel, Roger Grosse

Linear interpolation between initial neural network parameters and converged parameters after training with stochastic gradient descent (SGD) typically leads to a monotonic decrease in the training objective.
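In the convex quadratic case the phenomenon holds exactly, which makes for a quick sanity check. A sketch of the interpolation experiment on linear regression (illustrative setup, not the paper's deep-network experiments):

```python
import numpy as np

rng = np.random.default_rng(1)

# Interpolate between a random init and the converged least-squares solution.
X, y = rng.normal(size=(80, 4)), rng.normal(size=80)
theta0 = rng.normal(size=4)                      # "initial" parameters
theta1 = np.linalg.lstsq(X, y, rcond=None)[0]    # "converged" parameters

def loss(theta):
    return np.mean((X @ theta - y) ** 2)

# Evaluate the loss along the straight line theta0 -> theta1.
alphas = np.linspace(0.0, 1.0, 21)
path = [loss((1 - a) * theta0 + a * theta1) for a in alphas]
```

For a quadratic loss the path value is loss(theta1) plus a (1 - alpha)^2 term, so the decrease is exactly monotonic; the paper investigates when this persists (or fails) for non-convex deep networks.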

no code implementations • 18 Feb 2021 • Guodong Zhang, Yuanhao Wang, Laurent Lessard, Roger Grosse

Smooth minimax games often proceed by simultaneous or alternating gradient updates.

1 code implementation • 15 Jan 2021 • Yuhuai Wu, Markus Rabe, Wenda Li, Jimmy Ba, Roger Grosse, Christian Szegedy

While designing inductive bias in neural architectures has been widely studied, we hypothesize that transformer networks are flexible enough to learn inductive bias from suitable generic tasks.

1 code implementation • 6 Nov 2020 • Chaoqi Wang, Shengyang Sun, Roger Grosse

While uncertainty estimation is a well-studied topic in deep learning, most such work focuses on marginal uncertainty estimates, i.e. the predictive mean and variance at individual input locations.

1 code implementation • NeurIPS 2020 • Juhan Bae, Roger Grosse

Hyperparameter optimization of neural networks can be elegantly formulated as a bilevel optimization problem.

1 code implementation • 23 Sep 2020 • Guodong Zhang, Xuchan Bao, Laurent Lessard, Roger Grosse

The theory of integral quadratic constraints (IQCs) allows the certification of exponential convergence of interconnected systems containing nonlinear or uncertain elements.

2 code implementations • ICML 2020 • Sicong Huang, Alireza Makhzani, Yanshuai Cao, Roger Grosse

The field of deep generative modeling has succeeded in producing astonishingly realistic-seeming images and audio, but quantitative evaluation remains a challenge.

1 code implementation • NeurIPS 2020 • Xuchan Bao, James Lucas, Sushant Sachdeva, Roger Grosse

Our understanding of learning input-output relationships with neural nets has improved rapidly in recent years, but little is known about the convergence of the underlying representations, even in the simple case of linear autoencoders (LAEs).

3 code implementations • 8 Jul 2020 • Yuhuai Wu, Honghua Dong, Roger Grosse, Jimmy Ba

In this work, we focus on an analogical reasoning task that contains rich compositional structures, Raven's Progressive Matrices (RPM).

no code implementations • 7 Jul 2020 • Pashootan Vaezipoor, Gil Lederman, Yuhuai Wu, Chris J. Maddison, Roger Grosse, Sanjit A. Seshia, Fahiem Bacchus

In addition to step count improvements, Neuro# can also achieve orders of magnitude wall-clock speedups over the vanilla solver on larger instances in some problem families, despite the runtime overhead of querying the model.

1 code implementation • ICLR 2021 • Yuhuai Wu, Albert Qiaochu Jiang, Jimmy Ba, Roger Grosse

In learning-assisted theorem proving, one of the most critical challenges is to generalize to theorems unlike those seen at training time.

no code implementations • ICLR 2021 • Shun-ichi Amari, Jimmy Ba, Roger Grosse, Xuechen Li, Atsushi Nitanda, Taiji Suzuki, Denny Wu, Ji Xu

While second order optimizers such as natural gradient descent (NGD) often speed up optimization, their effect on generalization has been called into question.

1 code implementation • 16 Jun 2020 • Jens Behrmann, Paul Vicol, Kuan-Chieh Wang, Roger Grosse, Jörn-Henrik Jacobsen

For problems where global invertibility is necessary, such as applying normalizing flows on OOD data, we show the importance of designing stable INN building blocks.

3 code implementations • ICLR 2020 • Chaoqi Wang, Guodong Zhang, Roger Grosse

Overparameterization has been shown to benefit both the optimization and generalization of neural networks, but large networks are resource hungry at both training and test time.

no code implementations • NeurIPS 2019 • James Lucas, George Tucker, Roger Grosse, Mohammad Norouzi

Posterior collapse in Variational Autoencoders (VAEs) arises when the variational posterior distribution closely matches the prior for a subset of latent variables.

1 code implementation • NeurIPS 2019 • Qiyang Li, Saminul Haque, Cem Anil, James Lucas, Roger Grosse, Jörn-Henrik Jacobsen

Our BCOP parameterization allows us to train large convolutional networks with provable Lipschitz bounds.

1 code implementation • NeurIPS 2019 • Guodong Zhang, Lala Li, Zachary Nado, James Martens, Sushant Sachdeva, George E. Dahl, Christopher J. Shallue, Roger Grosse

Increasing the batch size is a popular way to speed up neural network training, but beyond some critical batch size, larger batch sizes yield diminishing returns.

no code implementations • 27 May 2019 • Guodong Zhang, James Martens, Roger Grosse

In this work, we analyze for the first time the speed of convergence of natural gradient descent on nonlinear neural networks with squared-error loss.

1 code implementation • 15 May 2019 • Chaoqi Wang, Roger Grosse, Sanja Fidler, Guodong Zhang

Reducing the test time resource requirements of a neural network while preserving test accuracy is crucial for running inference on resource-constrained devices.

no code implementations • ICLR 2019 • Paul Vicol, Jeffery Z. HaoChen, Roger Grosse

Effective performance of neural networks depends critically on tuning of optimization hyperparameters, especially learning rates (and schedules thereof).

no code implementations • ICLR Workshop DeepGenStruct 2019 • James Lucas, George Tucker, Roger Grosse, Mohammad Norouzi

Posterior collapse in Variational Autoencoders (VAEs) arises when the variational distribution closely matches the uninformative prior for a subset of latent variables.

3 code implementations • ICLR 2019 • Shengyang Sun, Guodong Zhang, Jiaxin Shi, Roger Grosse

We introduce functional variational Bayesian neural networks (fBNNs), which maximize an Evidence Lower BOund (ELBO) defined directly on stochastic processes, i.e. distributions over functions.

3 code implementations • ICLR 2019 • Matthew MacKay, Paul Vicol, Jon Lorraine, David Duvenaud, Roger Grosse

Empirically, our approach outperforms competing hyperparameter optimization methods on large-scale deep learning problems.

3 code implementations • 30 Nov 2018 • Juhan Bae, Guodong Zhang, Roger Grosse

A recently proposed method, noisy natural gradient, is a surprisingly simple method to fit expressive posteriors by adding weight noise to regular natural gradient updates.

1 code implementation • 13 Nov 2018 • Cem Anil, James Lucas, Roger Grosse

We identify a necessary property for such an architecture: each of the layers must preserve the gradient norm during backpropagation.
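The property is simple to verify for a single linear layer: gradients backpropagate through y = Wx as Wᵀg, so an orthogonal W leaves the gradient norm unchanged. A small numerical check (my illustration; the paper pairs such layers with norm-preserving activations like GroupSort):

```python
import numpy as np

rng = np.random.default_rng(0)

# Random orthogonal weight matrix via QR decomposition.
W, _ = np.linalg.qr(rng.normal(size=(64, 64)))

# Backprop through a linear layer maps the output gradient g_out to W^T g_out.
g_out = rng.normal(size=64)
g_in = W.T @ g_out

# Orthogonality means ||g_in|| == ||g_out||: the gradient norm is preserved.
```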

no code implementations • ICLR 2019 • Guodong Zhang, Chaoqi Wang, Bowen Xu, Roger Grosse

Weight decay is one of the standard tricks in the neural network toolbox, but the reasons for its regularization effect are poorly understood, and recent results have cast doubt on the traditional interpretation in terms of $L_2$ regularization.
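One concrete way the L2 interpretation breaks down, shown here as my own illustration rather than the paper's analysis: for SGD, folding λw into the gradient is algebraically identical to decoupled weight decay, but for an adaptive method like Adam the two updates differ, because Adam rescales whatever appears in the gradient.

```python
import numpy as np

w = np.array([1.0, -2.0, 3.0])
g = np.array([0.1, 0.1, 0.1])
lr, lam = 0.001, 0.1

# SGD: L2-in-the-gradient and decoupled weight decay coincide exactly.
sgd_l2 = w - lr * (g + lam * w)
sgd_decoupled = (w - lr * g) - lr * lam * w

def adam_step(w, g, lr=0.001, eps=1e-8):
    # One Adam step from fresh (zero) optimizer state; after bias correction
    # the first-step moment estimates reduce to g and g**2.
    return w - lr * g / (np.sqrt(g ** 2) + eps)

# Adam: the adaptive rescaling absorbs the L2 term, so the updates differ.
adam_l2 = adam_step(w, g + lam * w)
adam_decoupled = adam_step(w, g) - lr * lam * w
```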

1 code implementation • NeurIPS 2018 • Matthew MacKay, Paul Vicol, Jimmy Ba, Roger Grosse

Reversible RNNs, i.e. RNNs for which the hidden-to-hidden transition can be reversed, offer a path to reduce the memory requirements of training, as hidden states need not be stored and can instead be recomputed during backpropagation.
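The reversibility trick can be sketched with an additive coupling transition (a simplified illustration in the spirit of RevNets; `f` and `g` here are arbitrary stand-ins for the learned sub-networks):

```python
import numpy as np

rng = np.random.default_rng(0)

# Additive coupling: split the hidden state and update each half using the other.
# The transition is exactly invertible, so past states can be recomputed during
# backprop instead of being stored.
def f(h): return np.tanh(h)   # illustrative sub-network
def g(h): return np.sin(h)    # illustrative sub-network

def forward(h1, h2):
    y1 = h1 + f(h2)
    y2 = h2 + g(y1)
    return y1, y2

def inverse(y1, y2):
    h2 = y2 - g(y1)
    h1 = y1 - f(h2)
    return h1, h2

h1, h2 = rng.normal(size=8), rng.normal(size=8)
r1, r2 = inverse(*forward(h1, h2))   # recovers the original state exactly
```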

no code implementations • 30 Aug 2018 • Kevin Luk, Roger Grosse

Most neural networks are trained using first-order optimization methods, which are sensitive to the parameterization of the model.

no code implementations • ICML 2018 • Kuan-Chieh Wang, Paul Vicol, James Lucas, Li Gu, Roger Grosse, Richard Zemel

We propose a framework, Adversarial Posterior Distillation, to distill the SGLD samples using a Generative Adversarial Network (GAN).

4 code implementations • ICML 2018 • Shengyang Sun, Guodong Zhang, Chaoqi Wang, Wenyuan Zeng, Jiaman Li, Roger Grosse

The NKN architecture is based on the composition rules for kernels, so that each unit of the network corresponds to a valid kernel.

1 code implementation • ICLR 2019 • James Lucas, Shengyang Sun, Richard Zemel, Roger Grosse

Momentum is a simple and widely used trick which allows gradient-based optimizers to pick up speed along low curvature directions.
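A quick numerical illustration of that claim (my own toy setup, not from the paper): on an ill-conditioned quadratic, heavy-ball momentum accumulates velocity along the flat direction and beats plain gradient descent by orders of magnitude.

```python
import numpy as np

# Ill-conditioned quadratic: f(x) = 0.5 * (x1^2 + 100 * x2^2).
# Plain GD crawls along the low-curvature x1 direction; momentum speeds it up.
curv = np.array([1.0, 100.0])
def loss(x): return 0.5 * np.sum(curv * x ** 2)
def grad(x): return curv * x

x_gd = np.array([10.0, 1.0])
x_mom, v = np.array([10.0, 1.0]), np.zeros(2)
lr, beta = 0.015, 0.9
for _ in range(300):
    x_gd = x_gd - lr * grad(x_gd)            # plain gradient descent
    v = beta * v - lr * grad(x_mom)          # heavy-ball velocity update
    x_mom = x_mom + v
```

With these (hand-picked) settings both runs are stable, but the momentum iterate reaches a far lower loss in the same number of steps.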

3 code implementations • ICLR 2018 • Yeming Wen, Paul Vicol, Jimmy Ba, Dustin Tran, Roger Grosse

Stochastic neural net weights are used in a variety of contexts, including regularization, Bayesian neural nets, exploration in reinforcement learning, and evolution strategies.

1 code implementation • ICLR 2018 • Yuhuai Wu, Mengye Ren, Renjie Liao, Roger Grosse

Careful tuning of the learning rate, or even schedules thereof, can be crucial to effective neural net training.

10 code implementations • NeurIPS 2018 • Ricky T. Q. Chen, Xuechen Li, Roger Grosse, David Duvenaud

We decompose the evidence lower bound to show the existence of a term measuring the total correlation between latent variables.
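Total correlation is the KL divergence between a joint distribution and the product of its marginals; for a Gaussian it has a closed form, which makes the quantity concrete. An illustrative computation on an assumed 2-D Gaussian (not from the paper):

```python
import numpy as np

# For a Gaussian, TC = sum_i H(z_i) - H(z) = -0.5 * log det(correlation matrix).
rho = 0.8
cov = np.array([[1.0, rho], [rho, 1.0]])

# Entropies of the marginals and of the joint, each in closed form.
h_marginals = sum(0.5 * np.log(2 * np.pi * np.e * cov[i, i]) for i in range(2))
h_joint = 0.5 * np.log((2 * np.pi * np.e) ** 2 * np.linalg.det(cov))
tc = h_marginals - h_joint   # zero iff the dimensions are independent
```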

2 code implementations • ICML 2018 • Guodong Zhang, Shengyang Sun, David Duvenaud, Roger Grosse

Variational Bayesian neural nets combine the flexibility of deep learning with Bayesian uncertainty estimation.

8 code implementations • NeurIPS 2017 • Yuhuai Wu, Elman Mansimov, Shun Liao, Roger Grosse, Jimmy Ba

In this work, we propose to apply trust region optimization to deep reinforcement learning using a recently proposed Kronecker-factored approximation to the curvature.

2 code implementations • 14 Nov 2016 • Yuhuai Wu, Yuri Burda, Ruslan Salakhutdinov, Roger Grosse

The past several years have seen remarkable progress in generative models which produce convincing samples of images and other modalities.

2 code implementations • 3 Feb 2016 • Roger Grosse, James Martens

Second-order optimization methods such as natural gradient descent have the potential to speed up training of neural networks by correcting for the curvature of the loss function.

no code implementations • NeurIPS 2015 • Jimmy Ba, Roger Grosse, Ruslan Salakhutdinov, Brendan Frey

Despite their success, convolutional neural networks are computationally expensive because they must examine all image locations.

no code implementations • 9 Sep 2015 • Beate Franke, Jean-François Plante, Ribana Roscher, Annie Lee, Cathal Smyth, Armin Hatefi, Fuqi Chen, Einat Gil, Alexander Schwing, Alessandro Selvitella, Michael M. Hoffman, Roger Grosse, Dieter Hendricks, Nancy Reid

The need for new methods to deal with big data is a common theme in most scientific fields, although its definition tends to vary with the context.

23 code implementations • 1 Sep 2015 • Yuri Burda, Roger Grosse, Ruslan Salakhutdinov

The variational autoencoder (VAE; Kingma & Welling, 2014) is a recently proposed generative model pairing a top-down generative network with a bottom-up recognition network which approximates posterior inference.
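The importance-weighted bound at the heart of this paper is easy to see in a toy model where everything is Gaussian and the true log-likelihood is known in closed form. A sketch under those assumptions, using the prior as the proposal (my illustration, not the paper's neural recognition network):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: z ~ N(0,1), x | z ~ N(z,1), so marginally p(x) = N(0,2) exactly.
x = 1.0
true_logp = -0.5 * np.log(2 * np.pi * 2.0) - x ** 2 / (2 * 2.0)

def log_lik(z):
    # log p(x | z) for the Gaussian likelihood above.
    return -0.5 * np.log(2 * np.pi) - (x - z) ** 2 / 2

def iwae_bound(k, reps=2000):
    # log(1/k * sum_i w_i) with w_i = p(x|z_i), z_i from the prior,
    # averaged over many repetitions (log-sum-exp for stability).
    z = rng.normal(size=(reps, k))
    lw = log_lik(z)
    m = lw.max(axis=1, keepdims=True)
    return np.mean(m.squeeze() + np.log(np.mean(np.exp(lw - m), axis=1)))

b1, b100 = iwae_bound(1), iwae_bound(100)   # k=1 is the ordinary ELBO
```

Both are lower bounds on log p(x), and the k=100 bound sits visibly closer to the truth.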

17 code implementations • 19 Mar 2015 • James Martens, Roger Grosse

This is because the cost of storing and inverting K-FAC's approximation to the curvature matrix does not depend on the amount of data used to estimate it, which is a feature typically associated only with diagonal or low-rank approximations to the curvature matrix.
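The cost structure follows from a basic Kronecker identity, which is easy to verify numerically (illustrative factor sizes; K-FAC's actual factors are second-moment matrices of activations and backpropagated gradients):

```python
import numpy as np

rng = np.random.default_rng(0)

# K-FAC approximates a layer's curvature block as a Kronecker product A ⊗ G.
# The key saving: (A ⊗ G)^{-1} = A^{-1} ⊗ G^{-1}, so two small inverses
# replace one large one, at a cost independent of the data used to estimate them.
def spd(n):
    m = rng.normal(size=(n, n))
    return m @ m.T + n * np.eye(n)   # well-conditioned symmetric positive definite

A, G = spd(4), spd(3)
big_inv = np.linalg.inv(np.kron(A, G))                      # direct 12x12 inverse
kfac_inv = np.kron(np.linalg.inv(A), np.linalg.inv(G))      # two small inverses
```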

2 code implementations • 18 Feb 2014 • James Robert Lloyd, David Duvenaud, Roger Grosse, Joshua B. Tenenbaum, Zoubin Ghahramani

This paper presents the beginnings of an automatic statistician, focusing on regression problems.

4 code implementations • 20 Feb 2013 • David Duvenaud, James Robert Lloyd, Roger Grosse, Joshua B. Tenenbaum, Zoubin Ghahramani

Despite its importance, choosing the structural form of the kernel in nonparametric regression remains a black art.

Papers With Code is a free resource with all data licensed under CC-BY-SA.