Search Results for author: Jimmy Ba

Found 49 papers, 29 papers with code

Improving Transformer Optimization Through Better Initialization

1 code implementation ICML 2020 Xiao Shi Huang, Felipe Perez, Jimmy Ba, Maksims Volkovs

As Transformer models are becoming larger and more expensive to train, recent research has focused on understanding and improving optimization in these models.

Language Modelling Machine Translation +1

High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation

no code implementations 3 May 2022 Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Zhichao Wang, Denny Wu, Greg Yang

We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a two-layer neural network: $f(\boldsymbol{x}) = \frac{1}{\sqrt{N}}\boldsymbol{a}^\top\sigma(\boldsymbol{W}^\top\boldsymbol{x})$, where $\boldsymbol{W}\in\mathbb{R}^{d\times N}, \boldsymbol{a}\in\mathbb{R}^{N}$ are randomly initialized, and the training objective is the empirical MSE loss: $\frac{1}{n}\sum_{i=1}^n (f(\boldsymbol{x}_i)-y_i)^2$.

Learning Domain Invariant Representations in Goal-conditioned Block MDPs

1 code implementation NeurIPS 2021 Beining Han, Chongyi Zheng, Harris Chan, Keiran Paster, Michael R. Zhang, Jimmy Ba

These changes are often spurious and unrelated to the underlying problem, such as background shifts for visual input agents.

Domain Generalization

Understanding the Variance Collapse of SVGD in High Dimensions

no code implementations ICLR 2022 Jimmy Ba, Murat A Erdogdu, Marzyeh Ghassemi, Shengyang Sun, Taiji Suzuki, Denny Wu, Tianzong Zhang

Stein variational gradient descent (SVGD) is a deterministic inference algorithm that evolves a set of particles to fit a target distribution.

Clockwork Variational Autoencoders

2 code implementations NeurIPS 2021 Vaibhav Saxena, Jimmy Ba, Danijar Hafner

We introduce the Clockwork VAE (CW-VAE), a video prediction model that leverages a hierarchy of latent sequences, where higher levels tick at slower intervals.

Video Prediction

LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning

1 code implementation 15 Jan 2021 Yuhuai Wu, Markus Rabe, Wenda Li, Jimmy Ba, Roger Grosse, Christian Szegedy

While designing inductive bias in neural architectures has been widely studied, we hypothesize that transformer networks are flexible enough to learn inductive bias from suitable generic tasks.

Mathematical Reasoning

Video Prediction with Variational Temporal Hierarchies

no code implementations 1 Jan 2021 Vaibhav Saxena, Jimmy Ba, Danijar Hafner

Deep learning has shown promise for accurately predicting high-dimensional video sequences.

Video Prediction

How Does a Neural Network's Architecture Impact Its Robustness to Noisy Labels?

no code implementations NeurIPS 2021 Jingling Li, Mozhi Zhang, Keyulu Xu, John P. Dickerson, Jimmy Ba

Our framework measures a network's robustness via the predictive power in its representations -- the test performance of a linear model trained on the learned representations using a small set of clean labels.

Learning with noisy labels

Evaluating Agents without Rewards

1 code implementation 21 Dec 2020 Brendon Matusch, Jimmy Ba, Danijar Hafner

Moreover, input entropy and information gain correlate more strongly with human similarity than task reward does, suggesting the use of intrinsic objectives for designing agents that behave similarly to human players.

Atari Games

Planning from Pixels using Inverse Dynamics Models

no code implementations ICLR 2021 Keiran Paster, Sheila A. McIlraith, Jimmy Ba

Learning task-agnostic dynamics models in high-dimensional observation spaces can be challenging for model-based RL agents.

Mastering Atari with Discrete World Models

6 code implementations ICLR 2021 Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba

The world model uses discrete representations and is trained separately from the policy.

Ranked #3 on Atari Games on Atari 2600 Skiing (using extra training data)

Atari Games

Action and Perception as Divergence Minimization

1 code implementation 3 Sep 2020 Danijar Hafner, Pedro A. Ortega, Jimmy Ba, Thomas Parr, Karl Friston, Nicolas Heess

While the narrow objectives correspond to domain-specific rewards as typical in reinforcement learning, the general objectives maximize information with the environment through latent variable models of input sequences.

Decision Making Representation Learning

A Study of Gradient Variance in Deep Learning

1 code implementation 9 Jul 2020 Fartash Faghri, David Duvenaud, David J. Fleet, Jimmy Ba

We introduce a method, Gradient Clustering, to minimize the variance of average mini-batch gradient with stratified sampling.

The Scattering Compositional Learner: Discovering Objects, Attributes, Relationships in Analogical Reasoning

3 code implementations 8 Jul 2020 Yuhuai Wu, Honghua Dong, Roger Grosse, Jimmy Ba

In this work, we focus on an analogical reasoning task that contains rich compositional structures, Raven's Progressive Matrices (RPM).

INT: An Inequality Benchmark for Evaluating Generalization in Theorem Proving

1 code implementation ICLR 2021 Yuhuai Wu, Albert Qiaochu Jiang, Jimmy Ba, Roger Grosse

In learning-assisted theorem proving, one of the most critical challenges is to generalize to theorems unlike those seen at training time.

Automated Theorem Proving

When Does Preconditioning Help or Hurt Generalization?

no code implementations ICLR 2021 Shun-ichi Amari, Jimmy Ba, Roger Grosse, Xuechen Li, Atsushi Nitanda, Taiji Suzuki, Denny Wu, Ji Xu

While second order optimizers such as natural gradient descent (NGD) often speed up optimization, their effect on generalization has been called into question.

Second-order methods

Generalization of Two-layer Neural Networks: An Asymptotic Viewpoint

no code implementations ICLR 2020 Jimmy Ba, Murat Erdogdu, Taiji Suzuki, Denny Wu, Tianzong Zhang

This paper investigates the generalization properties of two-layer neural networks in high dimensions, i.e., when the number of samples $n$, features $d$, and neurons $h$ tend to infinity at the same rate.

BatchEnsemble: An Alternative Approach to Efficient Ensemble and Lifelong Learning

4 code implementations ICLR 2020 Yeming Wen, Dustin Tran, Jimmy Ba

We also apply BatchEnsemble to lifelong learning, where on Split-CIFAR-100, BatchEnsemble yields comparable performance to progressive neural networks while having much lower computational and memory costs.

An Inductive Bias for Distances: Neural Nets that Respect the Triangle Inequality

1 code implementation ICLR 2020 Silviu Pitis, Harris Chan, Kiarash Jamali, Jimmy Ba

When defining distances, the triangle inequality has proven to be a useful constraint, both theoretically--to prove convergence and optimality guarantees--and empirically--as an inductive bias.

Metric Learning Multi-Goal Reinforcement Learning +1

Towards Characterizing the High-dimensional Bias of Kernel-based Particle Inference Algorithms

no code implementations AABI Symposium 2019 Jimmy Ba, Murat A. Erdogdu, Marzyeh Ghassemi, Taiji Suzuki, Shengyang Sun, Denny Wu, Tianzong Zhang

Particle-based inference algorithms are a promising method to efficiently generate samples for an intractable target distribution by iteratively updating a set of particles.

On Solving Minimax Optimization Locally: A Follow-the-Ridge Approach

no code implementations ICLR 2020 Yuanhao Wang, Guodong Zhang, Jimmy Ba

Many tasks in modern machine learning can be formulated as finding equilibria in sequential games.

A Non-asymptotic comparison of SVRG and SGD: tradeoffs between compute and speed

no code implementations 25 Sep 2019 Qingru Zhang, Yuhuai Wu, Fartash Faghri, Tianzong Zhang, Jimmy Ba

In this paper, we present a non-asymptotic analysis of SVRG under a noisy least squares regression problem.

Stochastic Optimization

Benchmarking Model-Based Reinforcement Learning

2 code implementations 3 Jul 2019 Tingwu Wang, Xuchan Bao, Ignasi Clavera, Jerrick Hoang, Yeming Wen, Eric Langlois, Shunshi Zhang, Guodong Zhang, Pieter Abbeel, Jimmy Ba

Model-based reinforcement learning (MBRL) is widely seen as having the potential to be significantly more sample efficient than model-free RL.

Model-based Reinforcement Learning reinforcement-learning

Exploring Model-based Planning with Policy Networks

1 code implementation ICLR 2020 Tingwu Wang, Jimmy Ba

Model-based reinforcement learning (MBRL) with model-predictive control or online planning has shown great potential for locomotion control tasks in terms of both sample efficiency and asymptotic performance.

Model-based Reinforcement Learning

Neural Graph Evolution: Towards Efficient Automatic Robot Design

1 code implementation 12 Jun 2019 Tingwu Wang, Yuhao Zhou, Sanja Fidler, Jimmy Ba

To address the two challenges, we formulate automatic robot design as a graph search problem and perform evolution search in graph space.

Graph Normalizing Flows

1 code implementation NeurIPS 2019 Jenny Liu, Aviral Kumar, Jimmy Ba, Jamie Kiros, Kevin Swersky

We introduce graph normalizing flows: a new, reversible graph neural network model for prediction and generation.

Neural Graph Evolution: Automatic Robot Design

no code implementations ICLR 2019 Tingwu Wang, Yuhao Zhou, Sanja Fidler, Jimmy Ba

To address the two challenges, we formulate automatic robot design as a graph search problem and perform evolution search in graph space.

An Empirical Study of Large-Batch Stochastic Gradient Descent with Structured Covariance Noise

no code implementations 21 Feb 2019 Yeming Wen, Kevin Luk, Maxime Gazeau, Guodong Zhang, Harris Chan, Jimmy Ba

We demonstrate that the learning performance of our method is more accurately captured by the structure of the covariance matrix of the noise rather than by the variance of gradients.

Stochastic Optimization

DOM-Q-NET: Grounded RL on Structured Language

1 code implementation ICLR 2019 Sheng Jia, Jamie Kiros, Jimmy Ba

Building agents to interact with the web would allow for significant improvements in knowledge understanding and representation learning.

Representation Learning

ACTRCE: Augmenting Experience via Teacher's Advice For Multi-Goal Reinforcement Learning

no code implementations 12 Feb 2019 Harris Chan, Yuhuai Wu, Jamie Kiros, Sanja Fidler, Jimmy Ba

We first analyze the differences among goal representations, and show that ACTRCE can efficiently solve difficult reinforcement learning problems in challenging 3D navigation tasks, whereas HER with non-language goal representation failed to learn.

Multi-Goal Reinforcement Learning reinforcement-learning

Reversible Recurrent Neural Networks

1 code implementation NeurIPS 2018 Matthew MacKay, Paul Vicol, Jimmy Ba, Roger Grosse

Reversible RNNs---RNNs for which the hidden-to-hidden transition can be reversed---offer a path to reduce the memory requirements of training, as hidden states need not be stored and instead can be recomputed during backpropagation.

Exploring Curvature Noise in Large-Batch Stochastic Optimization

no code implementations 27 Sep 2018 Yeming Wen, Kevin Luk, Maxime Gazeau, Guodong Zhang, Harris Chan, Jimmy Ba

Unfortunately, a major drawback is the so-called generalization gap: large-batch training typically leads to a degradation in generalization performance of the model as compared to small-batch training.

Stochastic Optimization

Flipout: Efficient Pseudo-Independent Weight Perturbations on Mini-Batches

3 code implementations ICLR 2018 Yeming Wen, Paul Vicol, Jimmy Ba, Dustin Tran, Roger Grosse

Stochastic neural net weights are used in a variety of contexts, including regularization, Bayesian neural nets, exploration in reinforcement learning, and evolution strategies.

On the Convergence and Robustness of Training GANs with Regularized Optimal Transport

no code implementations NeurIPS 2018 Maziar Sanjabi, Jimmy Ba, Meisam Razaviyayn, Jason D. Lee

A popular GAN formulation is based on the use of Wasserstein distance as a metric between probability distributions.

Kronecker-factored Curvature Approximations for Recurrent Neural Networks

no code implementations ICLR 2018 James Martens, Jimmy Ba, Matt Johnson

Kronecker-factor Approximate Curvature (Martens & Grosse, 2015) (K-FAC) is a 2nd-order optimization method which has been shown to give state-of-the-art performance on large-scale neural network optimization tasks (Ba et al., 2017).

Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation

8 code implementations NeurIPS 2017 Yuhuai Wu, Elman Mansimov, Shun Liao, Roger Grosse, Jimmy Ba

In this work, we propose to apply trust region optimization to deep reinforcement learning using a recently proposed Kronecker-factored approximation to the curvature.

Atari Games Continuous Control +1

Using Fast Weights to Attend to the Recent Past

4 code implementations NeurIPS 2016 Jimmy Ba, Geoffrey Hinton, Volodymyr Mnih, Joel Z. Leibo, Catalin Ionescu

Until recently, research on artificial neural networks was largely restricted to systems with only two types of variable: Neural activities that represent the current or recent input and weights that learn to capture regularities among inputs, outputs and payoffs.

Learning Wake-Sleep Recurrent Attention Models

no code implementations NeurIPS 2015 Jimmy Ba, Roger Grosse, Ruslan Salakhutdinov, Brendan Frey

Despite their success, convolutional neural networks are computationally expensive because they must examine all image locations.

General Classification Image Classification

Predicting Deep Zero-Shot Convolutional Neural Networks using Textual Descriptions

no code implementations ICCV 2015 Jimmy Ba, Kevin Swersky, Sanja Fidler, Ruslan Salakhutdinov

One of the main challenges in Zero-Shot Learning of visual categories is gathering semantic attributes to accompany images.

Zero-Shot Learning

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

76 code implementations 10 Feb 2015 Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio

Inspired by recent work in machine translation and object detection, we introduce an attention based model that automatically learns to describe the content of images.

Image Captioning Translation

Adam: A Method for Stochastic Optimization

71 code implementations 22 Dec 2014 Diederik P. Kingma, Jimmy Ba

We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments.

Stochastic Optimization

Adaptive dropout for training deep neural networks

no code implementations NeurIPS 2013 Jimmy Ba, Brendan Frey

For example, our model achieves 5.8% error on the NORB test set, which is better than state-of-the-art results obtained using convolutional architectures.

Denoising
