Search Results for author: Jascha Sohl-Dickstein

Found 70 papers, 34 papers with code

Training Learned Optimizers with Randomly Initialized Learned Optimizers

no code implementations14 Jan 2021 Luke Metz, C. Daniel Freeman, Niru Maheswaranathan, Jascha Sohl-Dickstein

We show that a population of randomly initialized learned optimizers can be used to train themselves from scratch in an online fashion, without resorting to a hand designed optimizer in any part of the process.

Overcoming barriers to the training of effective learned optimizers

no code implementations1 Jan 2021 Luke Metz, Niru Maheswaranathan, C. Daniel Freeman, Ben Poole, Jascha Sohl-Dickstein

In this work we focus on general-purpose learned optimizers capable of training a wide variety of problems with no user-specified hyperparameters.

The large learning rate phase of deep learning

1 code implementation1 Jan 2021 Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Sohl-Dickstein, Guy Gur-Ari

In the small learning rate phase, training can be understood using the existing theory of infinitely wide neural networks.

Parallel Training of Deep Networks with Local Updates

no code implementations7 Dec 2020 Michael Laskin, Luke Metz, Seth Nabarro, Mark Saroufim, Badreddine Noune, Carlo Luschi, Jascha Sohl-Dickstein, Pieter Abbeel

Deep learning models trained on large data sets have been widely successful in both vision and language domains.

Score-Based Generative Modeling through Stochastic Differential Equations

5 code implementations ICLR 2021 Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, Ben Poole

Combined with multiple architectural improvements, we achieve record-breaking performance for unconditional image generation on CIFAR-10 with an Inception score of 9.89 and FID of 2.20, a competitive likelihood of 2.99 bits/dim, and demonstrate high fidelity generation of 1024 x 1024 images for the first time from a score-based generative model.

Colorization Image Inpainting
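
As a rough illustration of the sampling procedure behind score-based generative models, the sketch below runs a discretized reverse-time diffusion with a variance-exploding noise schedule. It is a minimal toy, assuming a synthetic "data" distribution of N(0, I) so the score has a closed form; in the actual method the score comes from a trained neural network, and the schedule and step count here are illustrative choices rather than the paper's settings.

```python
import numpy as np

# Toy reverse-diffusion sampler with a variance-exploding noise schedule.
# The "data" distribution here is N(0, I), so the perturbed marginal at noise
# level sigma is N(0, (1 + sigma^2) I) and its score has a closed form. In the
# actual method the score would come from a trained neural network.

def score(x, sigma):
    """Score of the toy marginal p_sigma(x) = N(0, (1 + sigma^2) I)."""
    return -x / (1.0 + sigma ** 2)

def sample(dim=2, n_steps=500, sigma_min=0.01, sigma_max=10.0, seed=0):
    rng = np.random.default_rng(seed)
    sigmas = np.geomspace(sigma_max, sigma_min, n_steps)  # illustrative schedule
    x = sigma_max * rng.standard_normal(dim)              # start from the prior
    for i in range(n_steps - 1):
        step = sigmas[i] ** 2 - sigmas[i + 1] ** 2        # discretized step size
        x = x + step * score(x, sigmas[i])                # drift toward the data
        x = x + np.sqrt(step) * rng.standard_normal(dim)  # reverse-time noise
    return x

samples = np.stack([sample(seed=s) for s in range(2000)])
print("per-dimension std of samples:", samples.std(axis=0))  # should be near 1
```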

Towards NNGP-guided Neural Architecture Search

1 code implementation11 Nov 2020 Daniel S. Park, Jaehoon Lee, Daiyi Peng, Yuan Cao, Jascha Sohl-Dickstein

Since NNGP inference provides a cheap measure of performance of a network architecture, we investigate its potential as a signal for neural architecture search (NAS).

Neural Architecture Search

Reverse engineering learned optimizers reveals known and novel mechanisms

no code implementations4 Nov 2020 Niru Maheswaranathan, David Sussillo, Luke Metz, Ruoxi Sun, Jascha Sohl-Dickstein

Learned optimizers are algorithms that can themselves be trained to solve optimization problems.

Tasks, stability, architecture, and compute: Training more effective learned optimizers, and using them to train themselves

no code implementations23 Sep 2020 Luke Metz, Niru Maheswaranathan, C. Daniel Freeman, Ben Poole, Jascha Sohl-Dickstein

In this work we focus on general-purpose learned optimizers capable of training a wide variety of problems with no user-specified hyperparameters.

Finite Versus Infinite Neural Networks: an Empirical Study

1 code implementation NeurIPS 2020 Jaehoon Lee, Samuel S. Schoenholz, Jeffrey Pennington, Ben Adlam, Lechao Xiao, Roman Novak, Jascha Sohl-Dickstein

We perform a careful, thorough, and large scale empirical study of the correspondence between wide neural networks and kernel methods.

A new method for parameter estimation in probabilistic models: Minimum probability flow

1 code implementation17 Jul 2020 Jascha Sohl-Dickstein, Peter Battaglino, Michael R. DeWeese

Fitting probabilistic models to data is often difficult, due to the general intractability of the partition function.

Exact posterior distributions of wide Bayesian neural networks

1 code implementation18 Jun 2020 Jiri Hron, Yasaman Bahri, Roman Novak, Jeffrey Pennington, Jascha Sohl-Dickstein

Recent work has shown that the prior over functions induced by a deep Bayesian neural network (BNN) behaves as a Gaussian process (GP) as the width of all layers becomes large.

Infinite attention: NNGP and NTK for deep attention networks

1 code implementation ICML 2020 Jiri Hron, Yasaman Bahri, Jascha Sohl-Dickstein, Roman Novak

There is a growing amount of literature on the relationship between wide neural networks (NNs) and Gaussian processes (GPs), identifying an equivalence between the two for a variety of NN architectures.

Deep Attention Gaussian Processes

Your GAN is Secretly an Energy-based Model and You Should use Discriminator Driven Latent Sampling

3 code implementations NeurIPS 2020 Tong Che, Ruixiang Zhang, Jascha Sohl-Dickstein, Hugo Larochelle, Liam Paull, Yuan Cao, Yoshua Bengio

To make that practical, we show that sampling from this modified density can be achieved by sampling in latent space according to an energy-based model induced by the sum of the latent prior log-density and the discriminator output score.

Image Generation
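
The excerpt above describes sampling in latent space under an energy given by the latent prior log-density plus the discriminator score. The sketch below is a hedged illustration of that idea using Langevin dynamics, with hypothetical linear stand-ins (W_g, w_d) for the generator and discriminator so the energy gradient is available in closed form; in practice both are trained networks and the gradient comes from autodiff.

```python
import numpy as np

# Langevin dynamics in latent space under the energy
#   E(z) = -log p(z) - d(G(z)),
# with a standard normal prior p(z), and hypothetical linear stand-ins for the
# generator G and the discriminator logit d so the example runs end to end.

rng = np.random.default_rng(0)
latent_dim, data_dim = 8, 16
W_g = 0.3 * rng.standard_normal((data_dim, latent_dim))  # toy "generator"
w_d = 0.1 * rng.standard_normal(data_dim)                # toy "discriminator"

def energy_grad(z):
    # grad of 0.5*||z||^2 (prior term) minus grad of w_d . (W_g z) (logit term)
    return z - W_g.T @ w_d

def latent_langevin(n_steps=200, step_size=0.01):
    z = rng.standard_normal(latent_dim)                  # initialize from prior
    for _ in range(n_steps):
        noise = rng.standard_normal(latent_dim)
        z = z - step_size * energy_grad(z) + np.sqrt(2 * step_size) * noise
    return W_g @ z                                       # decode the final latent

print(latent_langevin()[:4])
```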

The large learning rate phase of deep learning: the catapult mechanism

no code implementations4 Mar 2020 Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Sohl-Dickstein, Guy Gur-Ari

In the small learning rate phase, training can be understood using the existing theory of infinitely wide neural networks.

On the infinite width limit of neural networks with a standard parameterization

1 code implementation21 Jan 2020 Jascha Sohl-Dickstein, Roman Novak, Samuel S. Schoenholz, Jaehoon Lee

However, the extrapolation of both of these parameterizations (the NTK parameterization and the standard parameterization) to infinite width is problematic.

Invertible Convolutional Flow

1 code implementation NeurIPS 2019 Mahdi Karami, Dale Schuurmans, Jascha Sohl-Dickstein, Laurent Dinh, Daniel Duckworth

We show that these transforms allow more effective normalizing flow models to be developed for generative image models.

Neural reparameterization improves structural optimization

1 code implementation10 Sep 2019 Stephan Hoyer, Jascha Sohl-Dickstein, Sam Greydanus

Structural optimization is a popular method for designing objects such as bridge trusses, airplane wings, and optical devices.

Using learned optimizers to make models robust to input noise

no code implementations8 Jun 2019 Luke Metz, Niru Maheswaranathan, Jonathon Shlens, Jascha Sohl-Dickstein, Ekin D. Cubuk

State-of-the-art vision models can achieve superhuman performance on image classification tasks when testing and training data come from the same distribution.

General Classification Image Classification +1

The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study

no code implementations9 May 2019 Daniel S. Park, Jascha Sohl-Dickstein, Quoc V. Le, Samuel L. Smith

We find that the optimal SGD hyper-parameters are determined by a "normalized noise scale," which is a function of the batch size, learning rate, and initialization conditions.

Learning Unsupervised Learning Rules

no code implementations ICLR 2019 Luke Metz, Niru Maheswaranathan, Brian Cheung, Jascha Sohl-Dickstein

Here, our desired task (meta-objective) is the performance of the representation on semi-supervised classification, and we meta-learn an algorithm -- an unsupervised weight update rule -- that produces representations that perform well under this meta-objective.

Meta-Learning

Guided Evolutionary Strategies: Escaping the curse of dimensionality in random search

no code implementations ICLR 2019 Niru Maheswaranathan, Luke Metz, George Tucker, Dami Choi, Jascha Sohl-Dickstein

This arises when an approximate gradient is easier to compute than the full gradient (e.g. in meta-learning or unrolled optimization), or when a true gradient is intractable and is replaced with a surrogate (e.g. in certain reinforcement learning applications or training networks with discrete variables).

Meta-Learning

Learned optimizers that outperform on wall-clock and validation loss

no code implementations ICLR 2019 Luke Metz, Niru Maheswaranathan, Jeremy Nixon, Daniel Freeman, Jascha Sohl-Dickstein

We demonstrate these results on problems where our learned optimizer trains convolutional networks in a fifth of the wall-clock time compared to tuned first-order methods, and with an improvement in validation loss.

A RAD approach to deep mixture models

no code implementations18 Mar 2019 Laurent Dinh, Jascha Sohl-Dickstein, Hugo Larochelle, Razvan Pascanu

Flow based models such as Real NVP are an extremely powerful approach to density estimation.

Density Estimation

A Mean Field Theory of Batch Normalization

no code implementations ICLR 2019 Greg Yang, Jeffrey Pennington, Vinay Rao, Jascha Sohl-Dickstein, Samuel S. Schoenholz

We develop a mean field theory for batch normalization in fully-connected feedforward neural networks.

Eliminating all bad Local Minima from Loss Landscapes without even adding an Extra Unit

no code implementations12 Jan 2019 Jascha Sohl-Dickstein, Kenji Kawaguchi

Recent work has noted that all bad local minima can be removed from neural network loss landscapes, by adding a single unit with a particular parameterization.

Measuring the Effects of Data Parallelism on Neural Network Training

no code implementations8 Nov 2018 Christopher J. Shallue, Jaehoon Lee, Joseph Antognini, Jascha Sohl-Dickstein, Roy Frostig, George E. Dahl

Along the way, we show that disagreements in the literature on how batch size affects model quality can largely be explained by differences in metaparameter tuning and compute budgets at different batch sizes.

Understanding and correcting pathologies in the training of learned optimizers

1 code implementation24 Oct 2018 Luke Metz, Niru Maheswaranathan, Jeremy Nixon, C. Daniel Freeman, Jascha Sohl-Dickstein

Deep learning has shown that learned functions can dramatically outperform hand-designed functions on perceptual tasks.

Adversarial Reprogramming of Neural Networks

5 code implementations ICLR 2019 Gamaleldin F. Elsayed, Ian Goodfellow, Jascha Sohl-Dickstein

Previous adversarial attacks have been designed to degrade performance of models or cause machine learning models to produce specific outputs chosen ahead of time by the attacker.

Classification General Classification

Guided evolutionary strategies: Augmenting random search with surrogate gradients

1 code implementation ICLR 2019 Niru Maheswaranathan, Luke Metz, George Tucker, Dami Choi, Jascha Sohl-Dickstein

We propose Guided Evolutionary Strategies, a method for optimally using surrogate gradient directions along with random search.

Meta-Learning
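
A minimal sketch of the idea in the excerpt above, assuming the commonly described guided search distribution: antithetic perturbations are drawn from a Gaussian whose covariance mixes the full parameter space with the subspace spanned by surrogate gradient directions. The quadratic test function, mixing weight alpha, and scale sigma are illustrative choices, not values from the paper.

```python
import numpy as np

# One guided-ES gradient estimate: antithetic perturbations drawn from a
# Gaussian whose covariance mixes the full space with the subspace spanned by
# surrogate gradient directions, evaluated on a toy quadratic objective.

rng = np.random.default_rng(0)
n = 100                                   # parameter dimension
f = lambda x: 0.5 * np.sum(x ** 2)        # toy objective; true gradient is x

def guided_es_grad(x, surrogate_dirs, n_pairs=32, sigma=0.1, alpha=0.5):
    U, _ = np.linalg.qr(surrogate_dirs)   # orthonormal basis of guiding subspace
    k = U.shape[1]
    grad_est = np.zeros_like(x)
    for _ in range(n_pairs):
        # eps ~ N(0, sigma^2 * (alpha/n * I + (1 - alpha)/k * U U^T))
        eps = sigma * (np.sqrt(alpha / n) * rng.standard_normal(n)
                       + np.sqrt((1 - alpha) / k) * U @ rng.standard_normal(k))
        grad_est += (f(x + eps) - f(x - eps)) * eps      # antithetic pair
    return grad_est / (2 * sigma ** 2 * n_pairs)

x = rng.standard_normal(n)
noisy_surrogate = (x + 0.5 * rng.standard_normal(n)).reshape(-1, 1)  # biased guess
g = guided_es_grad(x, noisy_surrogate)
print("cosine(true gradient, estimate):",
      g @ x / (np.linalg.norm(g) * np.linalg.norm(x)))
```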

Stochastic natural gradient descent draws posterior samples in function space

no code implementations25 Jun 2018 Samuel L. Smith, Daniel Duckworth, Semon Rezchikov, Quoc V. Le, Jascha Sohl-Dickstein

Recent work has argued that stochastic gradient descent can approximate the Bayesian uncertainty in model parameters near local minima.

PCA of high dimensional random walks with comparison to neural network training

no code implementations NeurIPS 2018 Joseph M. Antognini, Jascha Sohl-Dickstein

One technique to visualize the training of neural networks is to perform PCA on the parameters over the course of training and to project to the subspace spanned by the first few PCA components.
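
A minimal sketch of the visualization technique described above: stack flattened parameter snapshots taken over training, center them, and project the trajectory onto its first few principal components. A high-dimensional random walk stands in for a real training run, and the dimensions are arbitrary.

```python
import numpy as np

# Collect flattened parameter vectors over "training" (here: a random walk),
# then project the trajectory onto its first two principal components.

rng = np.random.default_rng(0)
n_steps, n_params = 1000, 5000
trajectory = np.cumsum(rng.standard_normal((n_steps, n_params)), axis=0)

centered = trajectory - trajectory.mean(axis=0)
# Right singular vectors of the centered trajectory are the PCA components.
_, singular_values, components = np.linalg.svd(centered, full_matrices=False)

projection = centered @ components[:2].T                # 2-D trajectory to plot
explained = singular_values ** 2 / np.sum(singular_values ** 2)
print("variance explained by first two components:", explained[:2].sum())
```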

Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks

3 code implementations ICML 2018 Lechao Xiao, Yasaman Bahri, Jascha Sohl-Dickstein, Samuel S. Schoenholz, Jeffrey Pennington

In this work, we demonstrate that it is possible to train vanilla CNNs with ten thousand layers or more simply by using an appropriate initialization scheme.
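
The "appropriate initialization scheme" referenced above is, in this line of work, a delta-orthogonal style convolutional initializer. The sketch below is a hedged construction of such a kernel: every spatial tap is zero except the center, which holds an orthogonal matrix so the layer is norm-preserving at initialization. Shapes and the gain factor are illustrative.

```python
import numpy as np

# Build a conv kernel whose only nonzero spatial tap is the center, filled with
# a (partial) orthogonal matrix, so the layer preserves norms at initialization.

def delta_orthogonal(kernel_size, c_in, c_out, gain=1.0, seed=0):
    assert c_out >= c_in, "this illustrative version assumes non-shrinking width"
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((c_out, c_in)))
    q *= np.sign(np.diag(r))              # sign fix for a well-defined orthogonal Q
    kernel = np.zeros((kernel_size, kernel_size, c_in, c_out))
    center = kernel_size // 2
    kernel[center, center] = gain * q.T   # only the center tap is nonzero
    return kernel

k = delta_orthogonal(3, 16, 32)
# The center tap maps 16 -> 32 channels and satisfies W W^T = I (norm-preserving).
print(k.shape, np.allclose(k[1, 1] @ k[1, 1].T, np.eye(16)))
```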

Meta-Learning Update Rules for Unsupervised Representation Learning

2 code implementations ICLR 2019 Luke Metz, Niru Maheswaranathan, Brian Cheung, Jascha Sohl-Dickstein

Specifically, we target semi-supervised classification performance, and we meta-learn an algorithm -- an unsupervised weight update rule -- that produces representations useful for this task.

Meta-Learning Unsupervised Representation Learning

Sensitivity and Generalization in Neural Networks: an Empirical Study

no code implementations ICLR 2018 Roman Novak, Yasaman Bahri, Daniel A. Abolafia, Jeffrey Pennington, Jascha Sohl-Dickstein

In practice it is often found that large over-parameterized neural networks generalize better than their smaller counterparts, an observation that appears to conflict with classical notions of function complexity, which typically favor smaller models.

Data Augmentation Image Classification

Adversarial Examples that Fool both Computer Vision and Time-Limited Humans

no code implementations NeurIPS 2018 Gamaleldin F. Elsayed, Shreya Shankar, Brian Cheung, Nicolas Papernot, Alex Kurakin, Ian Goodfellow, Jascha Sohl-Dickstein

Machine learning models are vulnerable to adversarial examples: small changes to images can cause computer vision models to make mistakes such as identifying a school bus as an ostrich.

Generalizing Hamiltonian Monte Carlo with Neural Networks

2 code implementations ICLR 2018 Daniel Levy, Matthew D. Hoffman, Jascha Sohl-Dickstein

We present a general-purpose method to train Markov chain Monte Carlo kernels, parameterized by deep neural networks, that converge and mix quickly to their target distribution.

Deep Neural Networks as Gaussian Processes

5 code implementations ICLR 2018 Jaehoon Lee, Yasaman Bahri, Roman Novak, Samuel S. Schoenholz, Jeffrey Pennington, Jascha Sohl-Dickstein

As such, previous work has not identified that these kernels can be used as covariance functions for GPs and allow fully Bayesian prediction with a deep neural network.

Bayesian Inference Gaussian Processes
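
As a hedged illustration of fully Bayesian prediction with a network-induced kernel, the sketch below builds a deep ReLU (arc-cosine) kernel by the recursion commonly used in the NNGP literature and then performs standard GP regression with it. Depth, weight/bias variances, the noise level, and the 1-D toy data are all illustrative assumptions rather than the paper's settings.

```python
import numpy as np

# Deep ReLU (arc-cosine) kernel recursion followed by standard GP regression.

def nngp_kernel(x1, x2, depth=3, sigma_w2=2.0, sigma_b2=0.1):
    # Base case: kernel of the raw inputs.
    k12 = sigma_b2 + sigma_w2 * (x1 @ x2.T) / x1.shape[1]
    k11 = sigma_b2 + sigma_w2 * np.sum(x1 ** 2, axis=1) / x1.shape[1]
    k22 = sigma_b2 + sigma_w2 * np.sum(x2 ** 2, axis=1) / x2.shape[1]
    for _ in range(depth):
        norm = np.sqrt(np.outer(k11, k22))
        theta = np.arccos(np.clip(k12 / norm, -1.0, 1.0))
        # E[relu(u) relu(v)] for correlated Gaussians (arc-cosine form).
        k12 = sigma_b2 + sigma_w2 / (2 * np.pi) * norm * (
            np.sin(theta) + (np.pi - theta) * np.cos(theta))
        k11 = sigma_b2 + sigma_w2 * k11 / 2
        k22 = sigma_b2 + sigma_w2 * k22 / 2
    return k12

rng = np.random.default_rng(0)
x_train = np.linspace(-3, 3, 20).reshape(-1, 1)
y_train = np.sin(x_train[:, 0]) + 0.1 * rng.standard_normal(20)
x_test = np.linspace(-3, 3, 5).reshape(-1, 1)

noise_var = 0.1 ** 2
k_xx = nngp_kernel(x_train, x_train)
k_sx = nngp_kernel(x_test, x_train)
posterior_mean = k_sx @ np.linalg.solve(k_xx + noise_var * np.eye(20), y_train)
print(np.round(posterior_mean, 2))
```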

A Correspondence Between Random Neural Networks and Statistical Field Theory

no code implementations18 Oct 2017 Samuel S. Schoenholz, Jeffrey Pennington, Jascha Sohl-Dickstein

In this work, we show that the distribution of pre-activations in random neural networks can be exactly mapped onto lattice models in statistical physics.

SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability

3 code implementations NeurIPS 2017 Maithra Raghu, Justin Gilmer, Jason Yosinski, Jascha Sohl-Dickstein

We propose a new technique, Singular Vector Canonical Correlation Analysis (SVCCA), a tool for quickly comparing two representations in a way that is both invariant to affine transform (allowing comparison between different layers and networks) and fast to compute (allowing more comparisons to be calculated than with previous methods).
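
A minimal sketch of the SVCCA procedure described above: reduce each activation matrix with an SVD that keeps the directions explaining most of the variance, then compute canonical correlations between the two reduced representations. The variance threshold and the synthetic activations sharing a common latent signal are illustrative choices.

```python
import numpy as np

# SVD-reduce each activation matrix, then compute canonical correlations
# between the two reduced representations via a QR-based CCA.

def svcca(acts1, acts2, var_kept=0.99):
    """acts*: (n_datapoints, n_neurons) activation matrices."""
    def svd_reduce(acts):
        acts = acts - acts.mean(axis=0)
        u, s, _ = np.linalg.svd(acts, full_matrices=False)
        keep = np.searchsorted(np.cumsum(s ** 2) / np.sum(s ** 2), var_kept) + 1
        return u[:, :keep] * s[:keep]      # keep top singular directions

    x, y = svd_reduce(acts1), svd_reduce(acts2)
    qx, _ = np.linalg.qr(x)
    qy, _ = np.linalg.qr(y)
    return np.linalg.svd(qx.T @ qy, compute_uv=False)   # canonical correlations

rng = np.random.default_rng(0)
shared = rng.standard_normal((500, 10))                 # a shared latent signal
layer_a = shared @ rng.standard_normal((10, 64)) + 0.1 * rng.standard_normal((500, 64))
layer_b = shared @ rng.standard_normal((10, 32)) + 0.1 * rng.standard_normal((500, 32))
print("mean SVCCA correlation:", svcca(layer_a, layer_b).mean())
```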

Learned Optimizers that Scale and Generalize

1 code implementation ICML 2017 Olga Wichrowska, Niru Maheswaranathan, Matthew W. Hoffman, Sergio Gomez Colmenarejo, Misha Denil, Nando de Freitas, Jascha Sohl-Dickstein

Two of the primary barriers to the adoption of learned optimizers are an inability to scale to larger problems and a limited ability to generalize to new tasks.

Improved generator objectives for GANs

no code implementations8 Dec 2016 Ben Poole, Alexander A. Alemi, Jascha Sohl-Dickstein, Anelia Angelova

We present a framework to understand GAN training as alternating density ratio estimation and approximate divergence minimization.

Density Ratio Estimation

Capacity and Trainability in Recurrent Neural Networks

1 code implementation29 Nov 2016 Jasmine Collins, Jascha Sohl-Dickstein, David Sussillo

Recurrent neural networks can store an amount of task information that is linear in the number of parameters, approximately 5 bits per parameter.

Survey of Expressivity in Deep Neural Networks

no code implementations24 Nov 2016 Maithra Raghu, Ben Poole, Jon Kleinberg, Surya Ganguli, Jascha Sohl-Dickstein

This quantity, the length of input trajectories as they are propagated through the network, grows exponentially in the depth of the network and is responsible for the observed depth sensitivity.

Unrolled Generative Adversarial Networks

8 code implementations7 Nov 2016 Luke Metz, Ben Poole, David Pfau, Jascha Sohl-Dickstein

We introduce a method to stabilize Generative Adversarial Networks (GANs) by defining the generator objective with respect to an unrolled optimization of the discriminator.

Deep Information Propagation

1 code implementation4 Nov 2016 Samuel S. Schoenholz, Justin Gilmer, Surya Ganguli, Jascha Sohl-Dickstein

We show the existence of depth scales that naturally limit the maximum depth of signal propagation through these random networks.
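
The depth scales mentioned above come from a mean-field picture of signal propagation. The sketch below is a rough numerical illustration, assuming a tanh network: it iterates the layer-to-layer maps for the pre-activation variance q and the correlation c between two inputs, estimating the Gaussian expectations by Monte Carlo. The weight/bias variances are illustrative, and the referenced work treats these maps analytically.

```python
import numpy as np

# Iterate the mean-field maps for the pre-activation variance q and the
# correlation c between two inputs in a random tanh network, with Gaussian
# expectations estimated by Monte Carlo.

rng = np.random.default_rng(0)
z1, z2 = rng.standard_normal(200_000), rng.standard_normal(200_000)
phi = np.tanh

def variance_map(q, sigma_w2, sigma_b2):
    return sigma_w2 * np.mean(phi(np.sqrt(q) * z1) ** 2) + sigma_b2

def correlation_map(c, q, sigma_w2, sigma_b2):
    c = np.clip(c, -1.0, 1.0)
    u1 = np.sqrt(q) * z1
    u2 = np.sqrt(q) * (c * z1 + np.sqrt(1 - c ** 2) * z2)  # correlated Gaussians
    cov = sigma_w2 * np.mean(phi(u1) * phi(u2)) + sigma_b2
    return cov / variance_map(q, sigma_w2, sigma_b2)

sigma_w2, sigma_b2 = 1.5, 0.05
q = 1.0
for _ in range(50):                      # variance converges to a fixed point q*
    q = variance_map(q, sigma_w2, sigma_b2)

c = 0.4
for _ in range(50):                      # correlation flows toward its fixed point
    c = correlation_map(c, q, sigma_w2, sigma_b2)
print("fixed-point variance q* = %.3f, correlation after 50 layers = %.3f" % (q, c))
```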

Exponential expressivity in deep neural networks through transient chaos

1 code implementation NeurIPS 2016 Ben Poole, Subhaneil Lahiri, Maithra Raghu, Jascha Sohl-Dickstein, Surya Ganguli

We combine Riemannian geometry with the mean field theory of high dimensional chaos to study the nature of signal propagation in generic, deep neural networks with random weights.

On the Expressive Power of Deep Neural Networks

no code implementations ICML 2017 Maithra Raghu, Ben Poole, Jon Kleinberg, Surya Ganguli, Jascha Sohl-Dickstein

We propose a new approach to the problem of neural network expressivity, which seeks to characterize how structural properties of a neural network family affect the functions it is able to compute.

Density estimation using Real NVP

27 code implementations27 May 2016 Laurent Dinh, Jascha Sohl-Dickstein, Samy Bengio

Unsupervised learning of probabilistic models is a central yet challenging problem in machine learning.

Density Estimation Image Generation

A universal tradeoff between power, precision and speed in physical communication

no code implementations24 Mar 2016 Subhaneil Lahiri, Jascha Sohl-Dickstein, Surya Ganguli

Maximizing the speed and precision of communication while minimizing power dissipation is a fundamental engineering design goal.

A Markov Jump Process for More Efficient Hamiltonian Monte Carlo

no code implementations13 Sep 2015 Andrew B. Berger, Mayur Mudigonda, Michael R. DeWeese, Jascha Sohl-Dickstein

In most sampling algorithms, including Hamiltonian Monte Carlo, transition rates between states correspond to the probability of making a transition in a single time step, and are constrained to be less than or equal to 1.

Deep Knowledge Tracing

5 code implementations NeurIPS 2015 Chris Piech, Jonathan Spencer, Jonathan Huang, Surya Ganguli, Mehran Sahami, Leonidas Guibas, Jascha Sohl-Dickstein

Knowledge tracing---where a machine models the knowledge of a student as they interact with coursework---is a well-established problem in computer-supported education.

Knowledge Tracing

Note on Equivalence Between Recurrent Neural Network Time Series Models and Variational Bayesian Models

no code implementations29 Apr 2015 Jascha Sohl-Dickstein, Diederik P. Kingma

We observe that the standard log likelihood training objective for a Recurrent Neural Network (RNN) model of time series data is equivalent to a variational Bayesian training objective, given the proper choice of generative and inference models.

Time Series

Deep Unsupervised Learning using Nonequilibrium Thermodynamics

3 code implementations12 Mar 2015 Jascha Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, Surya Ganguli

A central problem in machine learning involves modeling complex data-sets using highly flexible families of probability distributions in which learning, sampling, inference, and evaluation are still analytically or computationally tractable.

Hamiltonian Monte Carlo Without Detailed Balance

2 code implementations18 Sep 2014 Jascha Sohl-Dickstein, Mayur Mudigonda, Michael R. DeWeese

We present a method for performing Hamiltonian Monte Carlo that largely eliminates sample rejection for typical hyperparameters.

Analyzing noise in autoencoders and deep networks

no code implementations6 Jun 2014 Ben Poole, Jascha Sohl-Dickstein, Surya Ganguli

Autoencoders have emerged as a useful framework for unsupervised learning of internal representations, and a wide variety of apparently conceptually disparate regularization techniques have been proposed to generate useful features.

Denoising

Fast large-scale optimization by unifying stochastic gradient and quasi-Newton methods

1 code implementation9 Nov 2013 Jascha Sohl-Dickstein, Ben Poole, Surya Ganguli

This algorithm contrasts with earlier stochastic second order techniques that treat the Hessian of each contributing function as a noisy approximation to the full Hessian, rather than as a target for direct estimation.

Training sparse natural image models with a fast Gibbs sampler of an extended state space

no code implementations NeurIPS 2012 Lucas Theis, Jascha Sohl-Dickstein, Matthias Bethge

We present a new learning strategy based on an efficient blocked Gibbs sampler for sparse overcomplete linear models.

Efficient Methods for Unsupervised Learning of Probabilistic Models

no code implementations19 May 2012 Jascha Sohl-Dickstein

In this thesis I develop a variety of techniques to train, evaluate, and sample from intractable and high dimensional probabilistic models.

An Unsupervised Algorithm For Learning Lie Group Transformations

no code implementations7 Jan 2010 Jascha Sohl-Dickstein, Ching Ming Wang, Bruno A. Olshausen

Transformation operators are represented in their eigen-basis, reducing the computational complexity of parameter estimation to that of training a linear transformation model.

Minimum Probability Flow Learning

1 code implementation25 Jun 2009 Jascha Sohl-Dickstein, Peter Battaglino, Michael R. DeWeese

Fitting probabilistic models to data is often difficult, due to the general intractability of the partition function and its derivatives.
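
As a hedged illustration of the approach, the sketch below evaluates a minimum-probability-flow style objective for a small fully visible binary (Ising-type) model, using the common choice of single-bit-flip neighbors as connectivity. The energy function, parameters, and synthetic data are illustrative; in practice the objective would be minimized with autodiff or a standard optimizer.

```python
import numpy as np

# Minimum-probability-flow style objective for a fully visible binary model
# with single-bit-flip connectivity: each term penalizes probability flow out
# of a data state into one of its (presumed non-data) neighbors.

rng = np.random.default_rng(0)
n_units, n_data = 6, 200
data = (rng.random((n_data, n_units)) < 0.5).astype(float)   # toy binary data

def energy(x, J, b):
    # E(x) = -0.5 x^T J x - b^T x for each row of x (J symmetric, zero diagonal)
    return -0.5 * np.einsum("ni,ij,nj->n", x, J, x) - x @ b

def mpf_objective(J, b, data):
    e_data = energy(data, J, b)
    total = 0.0
    for k in range(data.shape[1]):                # single-bit-flip neighbors
        flipped = data.copy()
        flipped[:, k] = 1.0 - flipped[:, k]
        total += np.sum(np.exp(0.5 * (e_data - energy(flipped, J, b))))
    return total / data.shape[0]

J = 0.1 * rng.standard_normal((n_units, n_units))
J = (J + J.T) / 2
np.fill_diagonal(J, 0.0)
b = np.zeros(n_units)
print("MPF objective:", mpf_objective(J, b, data))
```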
