Search Results for author: Vaishnavh Nagarajan

Found 18 papers, 5 papers with code

The pitfalls of next-token prediction

1 code implementation · 11 Mar 2024 · Gregor Bachmann, Vaishnavh Nagarajan

As a starting point, we argue that the two often-conflated phases of next-token prediction -- autoregressive inference and teacher-forced training -- must be treated distinctly.
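The distinction above is easy to see in code. Below is a minimal sketch (not taken from the paper's released implementation) assuming a causal language model `lm` that maps a token prefix of shape (batch, T) to next-token logits of shape (batch, T, vocab); `lm` and `tokens` are hypothetical placeholders.

    import torch
    import torch.nn.functional as F

    def teacher_forced_loss(lm, tokens):
        # Training: the ground-truth prefix tokens[:, :t] is fed at every step,
        # regardless of what the model would have predicted on its own.
        logits = lm(tokens[:, :-1])              # (batch, T-1, vocab)
        targets = tokens[:, 1:]                  # next-token targets
        return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               targets.reshape(-1))

    @torch.no_grad()
    def autoregressive_generate(lm, prefix, num_new_tokens):
        # Inference: each new token is conditioned on the model's *own* previous
        # outputs, so early mistakes can compound.
        tokens = prefix.clone()
        for _ in range(num_new_tokens):
            next_logits = lm(tokens)[:, -1, :]
            next_token = next_logits.argmax(dim=-1, keepdim=True)  # greedy decoding
            tokens = torch.cat([tokens, next_token], dim=1)
        return tokens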

What do larger image classifiers memorise?

no code implementations · 9 Oct 2023 · Michal Lukasik, Vaishnavh Nagarajan, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar

The success of modern neural networks has prompted study of the connection between memorisation and generalisation: overparameterised models generalise well, despite being able to perfectly fit (memorise) completely random labels.

Image Classification · Knowledge Distillation · +2
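For concreteness, the random-label memorisation phenomenon the snippet refers to can be reproduced with a sketch like the following; the architecture, data sizes and optimiser settings are illustrative assumptions, not the paper's setup.

    import torch
    import torch.nn as nn

    n, d, num_classes = 1000, 256, 10
    x = torch.randn(n, d)
    y = torch.randint(0, num_classes, (n,))      # completely random labels

    # Overparameterised MLP: far more parameters than training examples.
    model = nn.Sequential(nn.Linear(d, 2048), nn.ReLU(), nn.Linear(2048, num_classes))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    for step in range(2000):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()

    train_acc = (model(x).argmax(dim=1) == y).float().mean()
    print(f"train accuracy on random labels: {train_acc:.3f}")  # typically close to 1.0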

The Cost of Down-Scaling Language Models: Fact Recall Deteriorates before In-Context Learning

no code implementations · 7 Oct 2023 · Tian Jin, Nolan Clement, Xin Dong, Vaishnavh Nagarajan, Michael Carbin, Jonathan Ragan-Kelley, Gintare Karolina Dziugaite

We study two natural scaling techniques -- weight pruning and simply training a smaller or larger model, which we refer to as dense scaling -- and their effects on two core capabilities of LLMs: (a) recalling facts presented during pre-training and (b) processing information presented in-context during inference.

In-Context Learning
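A minimal sketch of the weight-pruning side of the comparison above, using global magnitude pruning on a single linear layer; the sparsity level and layer shape are illustrative assumptions, not the paper's configuration.

    import torch
    import torch.nn as nn

    def magnitude_prune_(module: nn.Linear, sparsity: float) -> None:
        """Zero out the `sparsity` fraction of weights with smallest magnitude."""
        with torch.no_grad():
            flat = module.weight.abs().flatten()
            k = int(sparsity * flat.numel())
            if k == 0:
                return
            threshold = flat.kthvalue(k).values
            mask = (module.weight.abs() > threshold)
            module.weight.mul_(mask.to(module.weight.dtype))

    layer = nn.Linear(4096, 4096)
    magnitude_prune_(layer, sparsity=0.7)
    print((layer.weight == 0).float().mean())    # roughly 0.7 of the weights removed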

Think before you speak: Training Language Models With Pause Tokens

no code implementations · 3 Oct 2023 · Sachin Goyal, Ziwei Ji, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar, Vaishnavh Nagarajan

Language models generate responses by producing a series of tokens in immediate succession: the $(K+1)^{th}$ token is an outcome of manipulating $K$ hidden vectors per layer, one vector per preceding token.

GSM8K · Question Answering
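A rough sketch of the pause-token idea behind the paper: appending M extra pause tokens gives the model K + M hidden vectors per layer to manipulate before it must commit to the next answer token. The `lm` interface and `pause_token_id` below are hypothetical placeholders, not the paper's code.

    import torch

    @torch.no_grad()
    def generate_with_pauses(lm, prefix, pause_token_id, num_pauses, num_new_tokens):
        pauses = torch.full((prefix.size(0), num_pauses), pause_token_id,
                            dtype=prefix.dtype, device=prefix.device)
        tokens = torch.cat([prefix, pauses], dim=1)   # extra positions = extra compute
        for _ in range(num_new_tokens):
            next_token = lm(tokens)[:, -1, :].argmax(dim=-1, keepdim=True)
            tokens = torch.cat([tokens, next_token], dim=1)
        # Outputs at pause positions are ignored; only tokens generated after the
        # pause prefix are returned as the answer.
        return tokens[:, prefix.size(1) + num_pauses:]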

On student-teacher deviations in distillation: does it pay to disobey?

no code implementations · NeurIPS 2023 · Vaishnavh Nagarajan, Aditya Krishna Menon, Srinadh Bhojanapalli, Hossein Mobahi, Sanjiv Kumar

Knowledge distillation (KD) has been widely used to improve the test accuracy of a "student" network, by training it to mimic the soft probabilities of a trained "teacher" network.

Knowledge Distillation
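The training signal described above is the standard soft-label distillation objective, sketched below; the temperature and mixing weight are illustrative assumptions, not values from the paper.

    import torch.nn.functional as F

    def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
        # Student is trained to match the teacher's temperature-scaled soft
        # probabilities, mixed with the usual hard-label cross-entropy.
        soft_targets = F.softmax(teacher_logits / T, dim=-1)
        log_student = F.log_softmax(student_logits / T, dim=-1)
        distill = F.kl_div(log_student, soft_targets, reduction="batchmean") * T * T
        hard = F.cross_entropy(student_logits, labels)
        return alpha * distill + (1 - alpha) * hard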

Explaining generalization in deep learning: progress and fundamental limits

no code implementations · 17 Oct 2021 · Vaishnavh Nagarajan

This dissertation studies a fundamental open challenge in deep learning theory: why do deep networks generalize well even while being overparameterized, unregularized and fitting the training data to zero error?

Generalization Bounds · Learning Theory

Assessing Generalization of SGD via Disagreement

no code implementations · ICLR 2022 · Yiding Jiang, Vaishnavh Nagarajan, Christina Baek, J. Zico Kolter

We empirically show that the test error of deep networks can be estimated by simply training the same architecture on the same training set but with a different run of Stochastic Gradient Descent (SGD), and measuring the disagreement rate between the two networks on unlabeled test data.
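The estimator described above reduces to a simple disagreement count. A minimal sketch, assuming two classifiers of the same architecture trained with different SGD runs and an unlabeled data loader (all placeholder names):

    import torch

    @torch.no_grad()
    def disagreement_rate(model_a, model_b, unlabeled_loader):
        disagree, total = 0, 0
        for x in unlabeled_loader:
            pred_a = model_a(x).argmax(dim=-1)
            pred_b = model_b(x).argmax(dim=-1)
            disagree += (pred_a != pred_b).sum().item()
            total += x.size(0)
        return disagree / total   # used as an estimate of test error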

A Learning Theoretic Perspective on Local Explainability

no code implementations · ICLR 2021 · Jeffrey Li, Vaishnavh Nagarajan, Gregory Plumb, Ameet Talwalkar

In this paper, we explore connections between interpretable machine learning and learning theory through the lens of local approximation explanations.

BIG-bench Machine Learning · Interpretable Machine Learning · +1

Understanding the Failure Modes of Out-of-Distribution Generalization

1 code implementation · ICLR 2021 · Vaishnavh Nagarajan, Anders Andreassen, Behnam Neyshabur

Empirical studies suggest that machine learning models often rely on features, such as the background, that may be spuriously correlated with the label only during training time, resulting in poor accuracy during test-time.

Image Classification · Out-of-Distribution Generalization
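A toy construction of the setting described above, where a "background" feature agrees with the label most of the time during training but is uninformative at test time; the dimensions and correlation levels are illustrative assumptions, not the paper's benchmarks.

    import torch

    def make_split(n, spurious_corr):
        y = torch.randint(0, 2, (n,)).float() * 2 - 1          # labels in {-1, +1}
        core = y.unsqueeze(1) + 0.5 * torch.randn(n, 1)         # truly predictive feature
        flip = (torch.rand(n) < spurious_corr).float() * 2 - 1
        background = (y * flip).unsqueeze(1)                    # spuriously correlated feature
        return torch.cat([core, background], dim=1), y

    train_x, train_y = make_split(10_000, spurious_corr=0.95)   # background agrees with label 95% of the time
    test_x, test_y = make_split(10_000, spurious_corr=0.5)      # correlation vanishes at test time
    # A classifier fit on the training split tends to place weight on the
    # background feature and therefore loses accuracy on the test split.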

Provably Safe PAC-MDP Exploration Using Analogies

1 code implementation · 7 Jul 2020 · Melrose Roderick, Vaishnavh Nagarajan, J. Zico Kolter

A key challenge in applying reinforcement learning to safety-critical domains is understanding how to balance exploration (needed to attain good performance on the task) with safety (needed to avoid catastrophic failure).

reinforcement-learning · Reinforcement Learning (RL) · +1

Deterministic PAC-Bayesian generalization bounds for deep networks via generalizing noise-resilience

no code implementations · ICLR 2019 · Vaishnavh Nagarajan, J. Zico Kolter

The ability of overparameterized deep networks to generalize well has been linked to the fact that stochastic gradient descent (SGD) finds solutions that lie in flat, wide minima in the training loss -- minima where the output of the network is resilient to small random noise added to its parameters.

Generalization Bounds
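The noise resilience alluded to above can be probed directly: perturb the parameters with small Gaussian noise and measure how far the outputs move. A minimal sketch, where the noise scale and output norm are illustrative assumptions, not the paper's exact quantities.

    import copy
    import torch

    @torch.no_grad()
    def output_shift_under_noise(model, x, sigma=0.01):
        noisy = copy.deepcopy(model)
        for p in noisy.parameters():
            p.add_(sigma * torch.randn_like(p))   # small random parameter perturbation
        return (model(x) - noisy(x)).norm(dim=-1).mean()   # small shift => "flat" region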

Uniform convergence may be unable to explain generalization in deep learning

1 code implementation · NeurIPS 2019 · Vaishnavh Nagarajan, J. Zico Kolter

Aimed at explaining the surprisingly good generalization behavior of overparameterized deep networks, recent works have developed a variety of generalization bounds for deep learning, all based on the fundamental learning-theoretic technique of uniform convergence.

Generalization Bounds

Generalization in Deep Networks: The Role of Distance from Initialization

no code implementations · 7 Jan 2019 · Vaishnavh Nagarajan, J. Zico Kolter

Why does training deep neural networks using stochastic gradient descent (SGD) result in a generalization error that does not worsen with the number of parameters in the network?

Revisiting Adversarial Risk

no code implementations · 7 Jun 2018 · Arun Sai Suggala, Adarsh Prasad, Vaishnavh Nagarajan, Pradeep Ravikumar

Based on the modified definition, we show that there is no trade-off between adversarial and standard accuracies; there exist classifiers that are robust and achieve high standard accuracy.

Image Classification

Lifelong Learning in Costly Feature Spaces

no code implementations · 30 Jun 2017 · Maria-Florina Balcan, Avrim Blum, Vaishnavh Nagarajan

An important long-term goal in machine learning systems is to build learning agents that, like humans, can learn many tasks over their lifetime, and moreover use information from these tasks to improve their ability to do so efficiently.

Gradient descent GAN optimization is locally stable

1 code implementation · NeurIPS 2017 · Vaishnavh Nagarajan, J. Zico Kolter

Despite the growing prominence of generative adversarial networks (GANs), optimization in GANs is still a poorly understood topic.

Learning-Theoretic Foundations of Algorithm Configuration for Combinatorial Partitioning Problems

no code implementations · 14 Nov 2016 · Maria-Florina Balcan, Vaishnavh Nagarajan, Ellen Vitercik, Colin White

We address this problem for clustering, max-cut, and other partitioning problems, such as integer quadratic programming, by designing computationally efficient and sample efficient learning algorithms which receive samples from an application-specific distribution over problem instances and learn a partitioning algorithm with high expected performance.

Clustering · Learning Theory
