Search Results for author: Jimmy Ba

Found 68 papers, 43 papers with code

Improving Transformer Optimization Through Better Initialization

1 code implementation ICML 2020 Xiao Shi Huang, Felipe Perez, Jimmy Ba, Maksims Volkovs

As Transformer models are becoming larger and more expensive to train, recent research has focused on understanding and improving optimization in these models.

Decoder Language Modelling +2

Report Cards: Qualitative Evaluation of Language Models Using Natural Language Summaries

no code implementations 1 Sep 2024 Blair Yang, Fuyang Cui, Keiran Paster, Jimmy Ba, Pashootan Vaezipoor, Silviu Pitis, Michael R. Zhang

The rapid development and dynamic nature of large language models (LLMs) make it difficult for conventional quantitative benchmarks to accurately assess their capabilities.

Specificity

Decomposed Prompting to Answer Questions on a Course Discussion Board

1 code implementation 30 Jul 2024 Brandon Jaipersaud, Paul Zhang, Jimmy Ba, Andrew Petersen, Lisa Zhang, Michael R. Zhang

We propose and evaluate a question-answering system that uses decomposed prompting to classify and answer student questions on a course discussion board.

Language Modelling Large Language Model +1

Using Large Language Models for Hyperparameter Optimization

no code implementations 7 Dec 2023 Michael R. Zhang, Nishkrit Desai, Juhan Bae, Jonathan Lorraine, Jimmy Ba

This paper studies using foundational large language models (LLMs) to make decisions during hyperparameter optimization (HPO).

Bayesian Optimization Decision Making +1
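
To make the loop concrete, here is a hypothetical sketch of LLM-driven hyperparameter search: the model is asked for the next configuration given the trial history, and each result is fed back into the prompt. `query_llm` and `evaluate` are placeholder stand-ins (mocked below), not the paper's actual prompts, protocol, or any particular API.

```python
import json, math, random

def query_llm(prompt: str) -> str:
    """Stand-in for a real LLM call (mocked as random search here);
    replace with your client of choice."""
    return json.dumps({"learning_rate": 10 ** random.uniform(-5, -1),
                       "weight_decay": 10 ** random.uniform(-6, -2)})

def evaluate(config: dict) -> float:
    """Toy objective standing in for a full training run (best near 1e-3)."""
    return -abs(math.log10(config["learning_rate"]) + 3)

history = []
for trial in range(10):
    prompt = ("Past trials: " + json.dumps(history) +
              "\nPropose the next hyperparameter config as JSON.")
    config = json.loads(query_llm(prompt))
    history.append({"config": config, "score": evaluate(config)})
print(max(history, key=lambda t: t["score"]))
```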

OpenWebMath: An Open Dataset of High-Quality Mathematical Web Text

2 code implementations 10 Oct 2023 Keiran Paster, Marco Dos Santos, Zhangir Azerbayev, Jimmy Ba

We hope that our dataset, openly released on the Hugging Face Hub, will help spur advances in the reasoning abilities of large language models.

Identifying the Risks of LM Agents with an LM-Emulated Sandbox

1 code implementation 25 Sep 2023 Yangjun Ruan, Honghua Dong, Andrew Wang, Silviu Pitis, Yongchao Zhou, Jimmy Ba, Yann Dubois, Chris J. Maddison, Tatsunori Hashimoto

Alongside the emulator, we develop an LM-based automatic safety evaluator that examines agent failures and quantifies associated risks.

Language Modelling

Training on Thin Air: Improve Image Classification with Generated Data

1 code implementation 24 May 2023 Yongchao Zhou, Hshmat Sahak, Jimmy Ba

In this paper, we present Diffusion Inversion, a simple yet effective method that leverages the pre-trained generative model, Stable Diffusion, to generate diverse, high-quality training data for image classification.

Data Augmentation Few-Shot Learning +2

AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback

2 code implementations NeurIPS 2023 Yann Dubois, Xuechen Li, Rohan Taori, Tianyi Zhang, Ishaan Gulrajani, Jimmy Ba, Carlos Guestrin, Percy Liang, Tatsunori B. Hashimoto

As a demonstration of the research possible in AlpacaFarm, we find that methods that use a reward model can substantially improve over supervised fine-tuning and that our reference PPO implementation leads to a +10% improvement in win-rate against Davinci003.

Instruction Following

Residual Prompt Tuning: Improving Prompt Tuning with Residual Reparameterization

1 code implementation 6 May 2023 Anastasia Razdaibiedina, Yuning Mao, Rui Hou, Madian Khabsa, Mike Lewis, Jimmy Ba, Amjad Almahairi

In this work, we introduce Residual Prompt Tuning - a simple and efficient method that significantly improves the performance and stability of prompt tuning.

TR0N: Translator Networks for 0-Shot Plug-and-Play Conditional Generation

2 code implementations 26 Apr 2023 Zhaoyan Liu, Noel Vouitsis, Satya Krishna Gorti, Jimmy Ba, Gabriel Loaiza-Ganem

We propose TR0N, a highly general framework to turn pre-trained unconditional generative models, such as GANs and VAEs, into conditional models.

Text-to-Image Generation

Boosted Prompt Ensembles for Large Language Models

1 code implementation 12 Apr 2023 Silviu Pitis, Michael R. Zhang, Andrew Wang, Jimmy Ba

Methods such as chain-of-thought prompting and self-consistency have pushed the frontier of language model reasoning performance with no additional training.

GSM8K Language Modelling

Mastering Diverse Domains through World Models

7 code implementations 10 Jan 2023 Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap

Developing a general algorithm that learns to solve tasks across a wide range of applications has been a fundamental challenge in artificial intelligence.

Atari Games 100k Decision Making +4

Multi-Rate VAE: Train Once, Get the Full Rate-Distortion Curve

no code implementations 7 Dec 2022 Juhan Bae, Michael R. Zhang, Michael Ruan, Eric Wang, So Hasegawa, Jimmy Ba, Roger Grosse

Variational autoencoders (VAEs) are powerful tools for learning latent representations of data used in a wide range of applications.

Large Language Models Are Human-Level Prompt Engineers

4 code implementations 3 Nov 2022 Yongchao Zhou, Andrei Ioan Muresanu, Ziwen Han, Keiran Paster, Silviu Pitis, Harris Chan, Jimmy Ba

By conditioning on natural language instructions, large language models (LLMs) have displayed impressive capabilities as general-purpose computers.

Few-Shot Learning In-Context Learning +3

Exploring Low Rank Training of Deep Neural Networks

no code implementations 27 Sep 2022 Siddhartha Rao Kamalakara, Acyr Locatelli, Bharat Venkitesh, Jimmy Ba, Yarin Gal, Aidan N. Gomez

Training deep neural networks in low rank, i.e. with factorised layers, is of particular interest to the community: it offers efficiency over unfactorised training in terms of both memory consumption and training time.

Dataset Distillation using Neural Feature Regression

2 code implementations 1 Jun 2022 Yongchao Zhou, Ehsan Nezhadarya, Jimmy Ba

Dataset distillation can be formulated as a bi-level meta-learning problem where the outer loop optimizes the meta-dataset and the inner loop trains a model on the distilled data.

Continual Learning Dataset Distillation +4
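
The bi-level structure can be illustrated in a few lines. The sketch below is a toy stand-in that uses a closed-form ridge-regression inner loop rather than the paper's neural feature regression; sizes and labels are assumptions.

```python
import torch

torch.manual_seed(0)
d, n_real, n_syn, lam = 20, 500, 10, 1e-3

X_real = torch.randn(n_real, d)
y_real = (X_real[:, 0] > 0).float().unsqueeze(1)    # hypothetical labels

X_syn = torch.randn(n_syn, d, requires_grad=True)   # the distilled dataset
y_syn = (torch.rand(n_syn, 1) > 0.5).float()

opt = torch.optim.Adam([X_syn], lr=0.01)
for step in range(200):
    # Inner loop: closed-form ridge regression fit on the distilled data.
    A = X_syn.T @ X_syn + lam * torch.eye(d)
    w = torch.linalg.solve(A, X_syn.T @ y_syn)
    # Outer loop: that model's loss on real data, backpropagated into X_syn.
    loss = ((X_real @ w - y_real) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
print(loss.item())
```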

You Can't Count on Luck: Why Decision Transformers and RvS Fail in Stochastic Environments

no code implementations 31 May 2022 Keiran Paster, Sheila McIlraith, Jimmy Ba

In all tested domains, ESPER achieves significantly better alignment between the target return and achieved return than simply conditioning on returns.

Offline RL Playing the Game of 2048

High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation

no code implementations 3 May 2022 Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Zhichao Wang, Denny Wu, Greg Yang

We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a two-layer neural network: $f(\boldsymbol{x}) = \frac{1}{\sqrt{N}}\boldsymbol{a}^\top\sigma(\boldsymbol{W}^\top\boldsymbol{x})$, where $\boldsymbol{W}\in\mathbb{R}^{d\times N}, \boldsymbol{a}\in\mathbb{R}^{N}$ are randomly initialized, and the training objective is the empirical MSE loss: $\frac{1}{n}\sum_{i=1}^n (f(\boldsymbol{x}_i)-y_i)^2$.
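
A small NumPy sketch of this setup: evaluate $f$ at the random initialization, take a single gradient step on $\boldsymbol{W}$ under the empirical MSE loss, and compare losses. The teacher generating the labels is a hypothetical stand-in.

```python
import numpy as np

rng = np.random.default_rng(0)
d, N, n = 32, 64, 256                        # features, width, samples
W = rng.normal(size=(d, N)) / np.sqrt(d)     # random first-layer weights
a = rng.normal(size=N)                       # random second-layer weights
X = rng.normal(size=(n, d))
y = np.tanh(X @ rng.normal(size=d))          # hypothetical teacher labels

sigma, dsigma = np.tanh, lambda z: 1.0 - np.tanh(z) ** 2

def f(X, W):
    return sigma(X @ W) @ a / np.sqrt(N)

# Gradient of the empirical MSE loss w.r.t. W, then a single step of size eta.
res = f(X, W) - y
grad_W = X.T @ (res[:, None] * dsigma(X @ W) * a) * (2.0 / (n * np.sqrt(N)))
eta = 1.0
W1 = W - eta * grad_W
print(np.mean((f(X, W) - y) ** 2), np.mean((f(X, W1) - y) ** 2))
```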

Clockwork Variational Autoencoders

2 code implementations NeurIPS 2021 Vaibhav Saxena, Jimmy Ba, Danijar Hafner

We introduce the Clockwork VAE (CW-VAE), a video prediction model that leverages a hierarchy of latent sequences, where higher levels tick at slower intervals.

Minecraft Video Prediction
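
The tick schedule itself is simple to sketch; the slowdown factor of 2 and the three levels below are illustrative assumptions, not the paper's configuration.

```python
# Level l updates every k**l steps; higher levels tick more slowly.
k, levels, steps = 2, 3, 16
for t in range(steps):
    active = [l for l in range(levels) if t % (k ** l) == 0]
    print(f"t={t:2d} -> update levels {active}")
# Level 0 ticks every step, level 1 every 2 steps, level 2 every 4 steps.
```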

LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning

1 code implementation 15 Jan 2021 Yuhuai Wu, Markus Rabe, Wenda Li, Jimmy Ba, Roger Grosse, Christian Szegedy

While designing inductive bias in neural architectures has been widely studied, we hypothesize that transformer networks are flexible enough to learn inductive bias from suitable generic tasks.

Inductive Bias Mathematical Reasoning

Video Prediction with Variational Temporal Hierarchies

no code implementations1 Jan 2021 Vaibhav Saxena, Jimmy Ba, Danijar Hafner

Deep learning has shown promise for accurately predicting high-dimensional video sequences.

Video Prediction

How Does a Neural Network's Architecture Impact Its Robustness to Noisy Labels?

no code implementations NeurIPS 2021 Jingling Li, Mozhi Zhang, Keyulu Xu, John P. Dickerson, Jimmy Ba

Our framework measures a network's robustness via the predictive power in its representations -- the test performance of a linear model trained on the learned representations using a small set of clean labels.

Learning with noisy labels
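
The measure described above is essentially a linear probe. A minimal sketch with scikit-learn, using random features as a stand-in for a trained network's representations:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def predictive_power(feats_clean, y_clean, feats_test, y_test):
    """Fit a linear probe on a small clean-label subset of the learned
    representations and report its test accuracy."""
    probe = LogisticRegression(max_iter=1000).fit(feats_clean, y_clean)
    return probe.score(feats_test, y_test)

# Toy usage: random features standing in for a trained network's
# penultimate-layer representations.
rng = np.random.default_rng(0)
feats = rng.normal(size=(1200, 64))
labels = (feats[:, 0] + 0.1 * rng.normal(size=1200) > 0).astype(int)
print(predictive_power(feats[:100], labels[:100],
                       feats[1000:], labels[1000:]))
```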

Evaluating Agents without Rewards

1 code implementation 21 Dec 2020 Brendon Matusch, Jimmy Ba, Danijar Hafner

Moreover, input entropy and information gain correlate more strongly with human similarity than task reward does, suggesting the use of intrinsic objectives for designing agents that behave similarly to human players.

Atari Games Minecraft

Planning from Pixels using Inverse Dynamics Models

no code implementations ICLR 2021 Keiran Paster, Sheila A. McIlraith, Jimmy Ba

Learning task-agnostic dynamics models in high-dimensional observation spaces can be challenging for model-based RL agents.

Mastering Atari with Discrete World Models

9 code implementations ICLR 2021 Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba

The world model uses discrete representations and is trained separately from the policy.

Ranked #3 on Atari Games on Atari 2600 Skiing (using extra training data)

Atari Games
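
The discrete representations are vectors of categorical latents trained with straight-through gradients; the sketch below isolates that sampling trick (a mechanism sketch, not the full world model):

```python
import torch
import torch.nn.functional as F

def straight_through_categorical(logits):
    """Sample one-hot categorical latents; the forward pass uses the sample,
    the backward pass uses gradients of the probabilities."""
    probs = F.softmax(logits, dim=-1)
    idx = torch.multinomial(probs, 1).squeeze(-1)
    one_hot = F.one_hot(idx, logits.shape[-1]).float()
    return one_hot + probs - probs.detach()

logits = torch.randn(4, 32, requires_grad=True)   # 4 latents, 32 classes each
z = straight_through_categorical(logits)
z.sum().backward()                                # gradients reach the logits
print(z.shape, logits.grad.shape)
```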

Action and Perception as Divergence Minimization

1 code implementation 3 Sep 2020 Danijar Hafner, Pedro A. Ortega, Jimmy Ba, Thomas Parr, Karl Friston, Nicolas Heess

While the narrow objectives correspond to domain-specific rewards as typical in reinforcement learning, the general objectives maximize information with the environment through latent variable models of input sequences.

Decision Making Representation Learning

A Study of Gradient Variance in Deep Learning

1 code implementation 9 Jul 2020 Fartash Faghri, David Duvenaud, David J. Fleet, Jimmy Ba

We introduce a method, Gradient Clustering, to minimize the variance of the average mini-batch gradient with stratified sampling.

Clustering Deep Learning
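
The variance reduction rests on stratified sampling of per-example gradients. A toy sketch, assuming a bimodal gradient distribution and two hand-picked strata in place of learned clusters:

```python
import numpy as np

rng = np.random.default_rng(0)
# A hypothetical bimodal per-example gradient distribution.
grads = np.concatenate([rng.normal(-5, 0.3, 500), rng.normal(5, 0.3, 500)])

def batch_mean_variance(sample_fn, trials=2000, b=10):
    """Variance of the mini-batch mean gradient under a sampling scheme."""
    return np.var([sample_fn(b).mean() for _ in range(trials)])

def uniform(b):
    return rng.choice(grads, b)

def stratified(b):
    # Two hand-picked strata standing in for learned gradient clusters.
    lo, hi = grads[grads < 0], grads[grads >= 0]
    return np.concatenate([rng.choice(lo, b // 2), rng.choice(hi, b // 2)])

print("uniform:   ", batch_mean_variance(uniform))
print("stratified:", batch_mean_variance(stratified))  # far smaller
```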

The Scattering Compositional Learner: Discovering Objects, Attributes, Relationships in Analogical Reasoning

3 code implementations 8 Jul 2020 Yuhuai Wu, Honghua Dong, Roger Grosse, Jimmy Ba

In this work, we focus on an analogical reasoning task that contains rich compositional structures, Raven's Progressive Matrices (RPM).

Zero-shot Generalization

INT: An Inequality Benchmark for Evaluating Generalization in Theorem Proving

1 code implementation ICLR 2021 Yuhuai Wu, Albert Qiaochu Jiang, Jimmy Ba, Roger Grosse

In learning-assisted theorem proving, one of the most critical challenges is to generalize to theorems unlike those seen at training time.

Automated Theorem Proving

When Does Preconditioning Help or Hurt Generalization?

no code implementations ICLR 2021 Shun-ichi Amari, Jimmy Ba, Roger Grosse, Xuechen Li, Atsushi Nitanda, Taiji Suzuki, Denny Wu, Ji Xu

While second order optimizers such as natural gradient descent (NGD) often speed up optimization, their effect on generalization has been called into question.

regression Second-order methods

Generalization of Two-layer Neural Networks: An Asymptotic Viewpoint

no code implementations ICLR 2020 Jimmy Ba, Murat Erdogdu, Taiji Suzuki, Denny Wu, Tianzong Zhang

This paper investigates the generalization properties of two-layer neural networks in high dimensions, i.e. when the number of samples $n$, features $d$, and neurons $h$ tend to infinity at the same rate.

Inductive Bias

BatchEnsemble: An Alternative Approach to Efficient Ensemble and Lifelong Learning

5 code implementations ICLR 2020 Yeming Wen, Dustin Tran, Jimmy Ba

We also apply BatchEnsemble to lifelong learning, where on Split-CIFAR-100, BatchEnsemble yields comparable performance to progressive neural networks while having much lower computational and memory costs.

de-en Uncertainty Quantification
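
BatchEnsemble's efficiency comes from a rank-1 factorization: every member shares one slow weight matrix and owns only two fast vectors. A minimal NumPy sketch of that forward pass, with toy dimensions as assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, members, batch = 8, 4, 3, 5

W = rng.normal(size=(d_in, d_out))     # slow weights shared by all members
S = rng.normal(size=(members, d_in))   # per-member fast vectors (input side)
R = rng.normal(size=(members, d_out))  # per-member fast vectors (output side)

def forward(x, m):
    """y = ((x * s_m) @ W) * r_m, i.e. x @ (W * outer(s_m, r_m))."""
    return ((x * S[m]) @ W) * R[m]

x = rng.normal(size=(batch, d_in))
# Equivalence with the explicit per-member weight matrix:
print(np.allclose(forward(x, 0), x @ (W * np.outer(S[0], R[0]))))  # True
```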

An Inductive Bias for Distances: Neural Nets that Respect the Triangle Inequality

2 code implementations ICLR 2020 Silviu Pitis, Harris Chan, Kiarash Jamali, Jimmy Ba

When defining distances, the triangle inequality has proven to be a useful constraint, both theoretically--to prove convergence and optimality guarantees--and empirically--as an inductive bias.

Inductive Bias Metric Learning +4

Towards Characterizing the High-dimensional Bias of Kernel-based Particle Inference Algorithms

no code implementations AABI Symposium 2019 Jimmy Ba, Murat A. Erdogdu, Marzyeh Ghassemi, Taiji Suzuki, Shengyang Sun, Denny Wu, Tianzong Zhang

Particle-based inference algorithms are a promising method for efficiently generating samples from an intractable target distribution by iteratively updating a set of particles.

On Solving Minimax Optimization Locally: A Follow-the-Ridge Approach

no code implementations ICLR 2020 Yuanhao Wang, Guodong Zhang, Jimmy Ba

Many tasks in modern machine learning can be formulated as finding equilibria in \emph{sequential} games.

Benchmarking Model-Based Reinforcement Learning

2 code implementations 3 Jul 2019 Tingwu Wang, Xuchan Bao, Ignasi Clavera, Jerrick Hoang, Yeming Wen, Eric Langlois, Shunshi Zhang, Guodong Zhang, Pieter Abbeel, Jimmy Ba

Model-based reinforcement learning (MBRL) is widely seen as having the potential to be significantly more sample efficient than model-free RL.

Benchmarking Model-based Reinforcement Learning +4

Exploring Model-based Planning with Policy Networks

1 code implementation ICLR 2020 Tingwu Wang, Jimmy Ba

Model-based reinforcement learning (MBRL) with model-predictive control or online planning has shown great potential for locomotion control tasks in terms of both sample efficiency and asymptotic performance.

Benchmarking Model-based Reinforcement Learning +2

Neural Graph Evolution: Towards Efficient Automatic Robot Design

1 code implementation 12 Jun 2019 Tingwu Wang, Yuhao Zhou, Sanja Fidler, Jimmy Ba

To address the two challenges, we formulate automatic robot design as a graph search problem and perform evolution search in graph space.

Graph Normalizing Flows

1 code implementation NeurIPS 2019 Jenny Liu, Aviral Kumar, Jimmy Ba, Jamie Kiros, Kevin Swersky

We introduce graph normalizing flows: a new, reversible graph neural network model for prediction and generation.

Graph Neural Network

Neural Graph Evolution: Automatic Robot Design

no code implementations ICLR 2019 Tingwu Wang, Yuhao Zhou, Sanja Fidler, Jimmy Ba

To address the two challenges, we formulate automatic robot design as a graph search problem and perform evolution search in graph space.

An Empirical Study of Large-Batch Stochastic Gradient Descent with Structured Covariance Noise

no code implementations 21 Feb 2019 Yeming Wen, Kevin Luk, Maxime Gazeau, Guodong Zhang, Harris Chan, Jimmy Ba

We demonstrate that the learning performance of our method is more accurately captured by the structure of the covariance matrix of the noise rather than by the variance of gradients.

Stochastic Optimization

DOM-Q-NET: Grounded RL on Structured Language

1 code implementation ICLR 2019 Sheng Jia, Jamie Kiros, Jimmy Ba

Building agents to interact with the web would allow for significant improvements in knowledge understanding and representation learning.

Graph Neural Network Reinforcement Learning +2

ACTRCE: Augmenting Experience via Teacher's Advice For Multi-Goal Reinforcement Learning

no code implementations 12 Feb 2019 Harris Chan, Yuhuai Wu, Jamie Kiros, Sanja Fidler, Jimmy Ba

We first analyze the differences among goal representations, and show that ACTRCE can efficiently solve difficult reinforcement learning problems in challenging 3D navigation tasks, whereas HER with non-language goal representations failed to learn.

Multi-Goal Reinforcement Learning reinforcement-learning +2

Reversible Recurrent Neural Networks

1 code implementation NeurIPS 2018 Matthew MacKay, Paul Vicol, Jimmy Ba, Roger Grosse

Reversible RNNs---RNNs for which the hidden-to-hidden transition can be reversed---offer a path to reduce the memory requirements of training, as hidden states need not be stored and instead can be recomputed during backpropagation.

Decoder
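
The memory saving hinges on the hidden-to-hidden transition being exactly invertible. The sketch below shows the generic additive-coupling recipe for such a step, not the paper's specific reversible GRU/LSTM variants:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
W1, W2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))

def step(h1, h2, x):
    h1 = h1 + np.tanh(W1 @ h2 + x)   # update half 1 from half 2 and input
    h2 = h2 + np.tanh(W2 @ h1)       # update half 2 from the new half 1
    return h1, h2

def reverse_step(h1, h2, x):
    h2 = h2 - np.tanh(W2 @ h1)       # invert the updates in reverse order
    h1 = h1 - np.tanh(W1 @ h2 + x)
    return h1, h2

h1, h2 = rng.normal(size=d), rng.normal(size=d)
x = rng.normal(size=d)
a1, a2 = step(h1, h2, x)
b1, b2 = reverse_step(a1, a2, x)
print(np.allclose(b1, h1), np.allclose(b2, h2))   # True True
```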

Exploring Curvature Noise in Large-Batch Stochastic Optimization

no code implementations 27 Sep 2018 Yeming Wen, Kevin Luk, Maxime Gazeau, Guodong Zhang, Harris Chan, Jimmy Ba

Unfortunately, a major drawback is the so-called generalization gap: large-batch training typically leads to a degradation in generalization performance of the model as compared to small-batch training.

Stochastic Optimization

Flipout: Efficient Pseudo-Independent Weight Perturbations on Mini-Batches

3 code implementations ICLR 2018 Yeming Wen, Paul Vicol, Jimmy Ba, Dustin Tran, Roger Grosse

Stochastic neural net weights are used in a variety of contexts, including regularization, Bayesian neural nets, exploration in reinforcement learning, and evolution strategies.

Reinforcement Learning
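
Flipout shares one sampled weight perturbation per mini-batch and decorrelates it across examples with random sign vectors. A minimal NumPy sketch, with toy shapes as assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
batch, d_in, d_out = 6, 8, 4

W = rng.normal(size=(d_in, d_out))             # mean weights
dW = 0.1 * rng.normal(size=(d_in, d_out))      # shared sampled perturbation
X = rng.normal(size=(batch, d_in))

s = rng.choice([-1.0, 1.0], size=(batch, d_in))    # per-example signs (input)
r = rng.choice([-1.0, 1.0], size=(batch, d_out))   # per-example signs (output)

# Each example n effectively sees W + dW * outer(s_n, r_n), computed with two
# matmuls instead of sampling one weight matrix per example.
Y = X @ W + ((X * s) @ dW) * r
print(Y.shape)
```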

On the Convergence and Robustness of Training GANs with Regularized Optimal Transport

no code implementations NeurIPS 2018 Maziar Sanjabi, Jimmy Ba, Meisam Razaviyayn, Jason D. Lee

A popular GAN formulation is based on the use of Wasserstein distance as a metric between probability distributions.

Kronecker-factored Curvature Approximations for Recurrent Neural Networks

no code implementations ICLR 2018 James Martens, Jimmy Ba, Matt Johnson

Kronecker-factored Approximate Curvature (Martens & Grosse, 2015) (K-FAC) is a 2nd-order optimization method which has been shown to give state-of-the-art performance on large-scale neural network optimization tasks (Ba et al., 2017).

Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation

8 code implementations NeurIPS 2017 Yuhuai Wu, Elman Mansimov, Shun Liao, Roger Grosse, Jimmy Ba

In this work, we propose to apply trust region optimization to deep reinforcement learning using a recently proposed Kronecker-factored approximation to the curvature.

Atari Games continuous-control +4

Using Fast Weights to Attend to the Recent Past

3 code implementations NeurIPS 2016 Jimmy Ba, Geoffrey Hinton, Volodymyr Mnih, Joel Z. Leibo, Catalin Ionescu

Until recently, research on artificial neural networks was largely restricted to systems with only two types of variable: Neural activities that represent the current or recent input and weights that learn to capture regularities among inputs, outputs and payoffs.
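
The third type of variable introduced by the paper is a fast weight matrix updated by a decaying outer-product rule, so that a query retrieves recent hidden states weighted by recency and similarity. A minimal sketch (sizes, decay, and the query are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
d, lam, eta = 64, 0.95, 0.5

# Fast weight matrix: decaying sum of outer products of past hidden states,
# A <- lam * A + eta * h h^T.
A = np.zeros((d, d))
history = [rng.normal(size=d) for _ in range(20)]
for h in history:
    A = lam * A + eta * np.outer(h, h)

# Querying with something close to the latest state retrieves mostly that
# state: A q = eta * sum_t lam**(T-1-t) * (h_t . q) * h_t.
q = history[-1] + 0.01 * rng.normal(size=d)
out = A @ q
cos = out @ history[-1] / (np.linalg.norm(out) * np.linalg.norm(history[-1]))
print(cos)  # close to 1: the readout is dominated by the most recent state
```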

Learning Wake-Sleep Recurrent Attention Models

no code implementations NeurIPS 2015 Jimmy Ba, Roger Grosse, Ruslan Salakhutdinov, Brendan Frey

Despite their success, convolutional neural networks are computationally expensive because they must examine all image locations.

Caption Generation Computational Efficiency +2

Predicting Deep Zero-Shot Convolutional Neural Networks using Textual Descriptions

no code implementations ICCV 2015 Jimmy Ba, Kevin Swersky, Sanja Fidler, Ruslan Salakhutdinov

One of the main challenges in Zero-Shot Learning of visual categories is gathering semantic attributes to accompany images.

Zero-Shot Learning

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

90 code implementations 10 Feb 2015 Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio

Inspired by recent work in machine translation and object detection, we introduce an attention based model that automatically learns to describe the content of images.

Caption Generation Image Captioning +1
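
The soft-attention variant scores each spatial annotation vector against the decoder state, normalizes the scores with a softmax, and feeds the weighted sum back to the decoder. A minimal NumPy sketch; the scoring MLP and dimensions are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
regions, d_a, d_h = 196, 512, 256              # e.g. 14x14 conv features

a = rng.normal(size=(regions, d_a))            # annotation vectors
h = rng.normal(size=d_h)                       # decoder hidden state
Wa = rng.normal(size=(d_a, 64)) * 0.05         # scoring MLP parameters
Wh = rng.normal(size=(d_h, 64)) * 0.05
v = rng.normal(size=64) * 0.05

e = np.tanh(a @ Wa + h @ Wh) @ v               # score per image region
alpha = np.exp(e - e.max())
alpha /= alpha.sum()                           # attention weights (softmax)
z = alpha @ a                                  # expected context vector
print(alpha.shape, z.shape)                    # (196,), (512,)
```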

Multiple Object Recognition with Visual Attention

5 code implementations 24 Dec 2014 Jimmy Ba, Volodymyr Mnih, Koray Kavukcuoglu

We present an attention-based model for recognizing multiple objects in images.

Object Object Recognition +3

Adam: A Method for Stochastic Optimization

85 code implementations 22 Dec 2014 Diederik P. Kingma, Jimmy Ba

We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments.

Stochastic Optimization
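
The update rule is short enough to inline; a minimal NumPy rendering of Algorithm 1, with the paper's default hyperparameters:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update with bias-corrected first- and second-moment estimates."""
    m = beta1 * m + (1 - beta1) * grad        # biased first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # biased second-moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias corrections (t starts at 1)
    v_hat = v / (1 - beta2 ** t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Toy usage: minimize f(x) = x^2 starting from x = 5.
theta, m, v = np.array([5.0]), np.zeros(1), np.zeros(1)
for t in range(1, 2001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.05)
print(theta)  # near 0
```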

Adaptive dropout for training deep neural networks

no code implementations NeurIPS 2013 Jimmy Ba, Brendan Frey

For example, our model achieves 5.8% error on the NORB test set, which is better than state-of-the-art results obtained using convolutional architectures.

Denoising
