no code implementations • 30 May 2024 • Jacob Mitchell Springer, Vaishnavh Nagarajan, Aditi Raghunathan
Sharpness-Aware Minimization (SAM) has emerged as a promising alternative optimizer to stochastic gradient descent (SGD).
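For context, SAM's characteristic two-pass update is sketched below. This is the standard recipe (perturb the weights toward the locally worst direction, then step using the gradient at the perturbed point), not this paper's code; names such as `sam_step` and the choice `rho=0.05` are illustrative.

```python
import torch

def sam_step(model, loss_fn, x, y, base_opt, rho=0.05):
    """One Sharpness-Aware Minimization step (minimal sketch of the standard recipe)."""
    # First pass: gradient at the current weights.
    loss_fn(model(x), y).backward()

    # Perturb each weight toward the (approximate) worst point in an L2 ball of radius rho.
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([g.norm() for g in grads]))
    eps = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                eps.append(None)
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)
            eps.append(e)
    model.zero_grad()

    # Second pass: gradient at the perturbed weights, then undo the perturbation
    # and take the actual optimizer step.
    loss_fn(model(x), y).backward()
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e)
    base_opt.step()
    base_opt.zero_grad()
```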
1 code implementation • 11 Mar 2024 • Gregor Bachmann, Vaishnavh Nagarajan
The popular criticism that errors can compound during autoregressive inference crucially assumes that teacher-forcing has learned an accurate next-token predictor.
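The teacher-forcing objective referred to here is the usual next-token loss computed on ground-truth prefixes; a generic sketch (not the paper's code, interface assumed) is:

```python
import torch.nn.functional as F

def teacher_forcing_loss(model, token_ids):
    """Standard teacher-forcing objective: predict token t+1 from the ground-truth
    prefix up to t, so training never conditions on the model's own generations.
    `model` is assumed to map (batch, seq) token ids to (batch, seq, vocab) logits."""
    logits = model(token_ids[:, :-1])   # condition on the true prefix
    targets = token_ids[:, 1:]          # next-token targets
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
```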
no code implementations • 9 Oct 2023 • Michal Lukasik, Vaishnavh Nagarajan, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar
The success of modern neural networks has prompted study of the connection between memorisation and generalisation: overparameterised models generalise well, despite being able to perfectly fit (memorise) completely random labels.
no code implementations • 7 Oct 2023 • Tian Jin, Nolan Clement, Xin Dong, Vaishnavh Nagarajan, Michael Carbin, Jonathan Ragan-Kelley, Gintare Karolina Dziugaite
We study two natural scaling techniques -- weight pruning and simply training a smaller or larger model, which we refer to as dense scaling -- and their effects on two core capabilities of LLMs: (a) recalling facts presented during pre-training and (b) processing information presented in-context during inference.
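As a reference for the first of the two techniques, global magnitude pruning is sketched below; this is one generic instance of weight pruning, and the paper's exact pruning recipe may differ.

```python
import torch

@torch.no_grad()
def magnitude_prune(model, sparsity=0.5):
    """Zero out the `sparsity` fraction of weights with the smallest magnitudes."""
    all_weights = torch.cat([p.abs().flatten() for p in model.parameters()])
    k = int(sparsity * all_weights.numel())
    if k == 0:
        return model
    threshold = all_weights.kthvalue(k).values
    for p in model.parameters():
        p.mul_((p.abs() > threshold).to(p.dtype))  # keep only the largest weights
    return model
```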
1 code implementation • 3 Oct 2023 • Sachin Goyal, Ziwei Ji, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar, Vaishnavh Nagarajan
Language models generate responses by producing a series of tokens in immediate succession: the $(K+1)^{th}$ token is an outcome of manipulating $K$ hidden vectors per layer, one vector per preceding token.
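The decoding loop being described is the standard one: each new token is a function of the hidden vectors of the tokens already in the sequence. A minimal sketch, with an assumed interface (`lm` maps token ids to logits), follows.

```python
import torch

@torch.no_grad()
def greedy_decode(lm, prompt_ids, max_new_tokens=32):
    """Token-by-token decoding: the (K+1)-th token is computed from the hidden
    vectors of the K preceding tokens."""
    ids = prompt_ids
    for _ in range(max_new_tokens):
        logits = lm(ids)                                     # (batch, seq_len, vocab)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=-1)              # append the (K+1)-th token
    return ids
```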
no code implementations • NeurIPS 2023 • Vaishnavh Nagarajan, Aditya Krishna Menon, Srinadh Bhojanapalli, Hossein Mobahi, Sanjiv Kumar
Knowledge distillation (KD) has been widely used to improve the test accuracy of a "student" network, by training it to mimic the soft probabilities of a trained "teacher" network.
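The distillation objective in question is the standard one below, mixing cross-entropy on the labels with a KL term toward the teacher's softened probabilities; this is a generic sketch, not the paper's exact setup, and the temperature and mixing weight are illustrative.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Standard KD loss: cross-entropy plus temperature-scaled KL to the teacher."""
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradient magnitudes stay comparable across temperatures
    return alpha * ce + (1 - alpha) * kd
```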
no code implementations • 17 Oct 2021 • Vaishnavh Nagarajan
This dissertation studies a fundamental open challenge in deep learning theory: why do deep networks generalize well even while being overparameterized, unregularized and fitting the training data to zero error?
no code implementations • ICLR 2022 • Yiding Jiang, Vaishnavh Nagarajan, Christina Baek, J. Zico Kolter
We empirically show that the test error of deep networks can be estimated by simply training the same architecture on the same training set but with a different run of Stochastic Gradient Descent (SGD), and measuring the disagreement rate between the two networks on unlabeled test data.
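The estimator itself is simple to state: train two copies of the same architecture with different SGD randomness and measure how often their predictions differ on unlabeled data. A minimal sketch, with assumed array inputs:

```python
import numpy as np

def disagreement_rate(preds_a, preds_b):
    """Fraction of unlabeled test points on which two independently trained runs
    of the same architecture disagree; `preds_a`/`preds_b` are predicted labels."""
    preds_a, preds_b = np.asarray(preds_a), np.asarray(preds_b)
    return float(np.mean(preds_a != preds_b))
```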
no code implementations • ICLR 2021 • Jeffrey Li, Vaishnavh Nagarajan, Gregory Plumb, Ameet Talwalkar
In this paper, we explore connections between interpretable machine learning and learning theory through the lens of local approximation explanations.
1 code implementation • ICLR 2021 • Vaishnavh Nagarajan, Anders Andreassen, Behnam Neyshabur
Empirical studies suggest that machine learning models often rely on features, such as the background, that may be spuriously correlated with the label only at training time, resulting in poor accuracy at test time.
1 code implementation • 7 Jul 2020 • Melrose Roderick, Vaishnavh Nagarajan, J. Zico Kolter
A key challenge in applying reinforcement learning to safety-critical domains is understanding how to balance exploration (needed to attain good performance on the task) with safety (needed to avoid catastrophic failure).
no code implementations • ICLR 2019 • Vaishnavh Nagarajan, J. Zico Kolter
The ability of overparameterized deep networks to generalize well has been linked to the fact that stochastic gradient descent (SGD) finds solutions that lie in flat, wide minima in the training loss -- minima where the output of the network is resilient to small random noise added to its parameters.
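The notion of resilience to parameter noise can be probed directly, for example by measuring how the training loss changes under small Gaussian perturbations of the weights. The sketch below is one such probe; the noise scale and averaging scheme are illustrative choices, not the paper's protocol.

```python
import copy
import torch

@torch.no_grad()
def loss_under_parameter_noise(model, loss_fn, data_loader, sigma=0.01, n_trials=5):
    """Average training loss after adding Gaussian noise of scale sigma to the weights."""
    losses = []
    for _ in range(n_trials):
        noisy = copy.deepcopy(model)
        for p in noisy.parameters():
            p.add_(sigma * torch.randn_like(p))   # perturb a copy of the weights
        total, count = 0.0, 0
        for x, y in data_loader:
            total += loss_fn(noisy(x), y).item() * len(y)
            count += len(y)
        losses.append(total / count)
    return sum(losses) / len(losses)
```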
1 code implementation • NeurIPS 2019 • Vaishnavh Nagarajan, J. Zico Kolter
Aimed at explaining the surprisingly good generalization behavior of overparameterized deep networks, recent works have developed a variety of generalization bounds for deep learning, all based on the fundamental learning-theoretic technique of uniform convergence.
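For reference, the generic uniform-convergence template that such bounds instantiate is the following, written in standard notation rather than the paper's exact statement:

```latex
% With probability at least 1 - \delta over an i.i.d. training set S of size m,
% the gap between population risk and empirical risk is controlled simultaneously
% for every hypothesis in the class \mathcal{H}:
\Pr_{S \sim \mathcal{D}^m}\!\left[\; \sup_{h \in \mathcal{H}}
    \big|\, L_{\mathcal{D}}(h) - \hat{L}_S(h) \,\big|
    \;\le\; \epsilon_{\mathrm{unif}}(m, \mathcal{H}, \delta) \;\right] \;\ge\; 1 - \delta .
```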
no code implementations • 7 Jan 2019 • Vaishnavh Nagarajan, J. Zico Kolter
Why does training deep neural networks using stochastic gradient descent (SGD) result in a generalization error that does not worsen with the number of parameters in the network?
no code implementations • 7 Jun 2018 • Arun Sai Suggala, Adarsh Prasad, Vaishnavh Nagarajan, Pradeep Ravikumar
Based on a modified definition of adversarial robustness, we show that there is no trade-off between adversarial and standard accuracies: there exist classifiers that are robust and achieve high standard accuracy.
no code implementations • 30 Jun 2017 • Maria-Florina Balcan, Avrim Blum, Vaishnavh Nagarajan
An important long-term goal in machine learning systems is to build learning agents that, like humans, can learn many tasks over their lifetime, and moreover use information from these tasks to improve their ability to do so efficiently.
1 code implementation • NeurIPS 2017 • Vaishnavh Nagarajan, J. Zico Kolter
Despite the growing prominence of generative adversarial networks (GANs), optimization in GANs is still a poorly understood topic.
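The optimization dynamics in question are simultaneous gradient steps on the standard two-player GAN objective, written here in its usual form (not the paper's notation):

```latex
% Standard GAN minimax objective over generator G and discriminator D:
\min_{G} \max_{D} \; V(D, G) \;=\;
    \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big]
    + \mathbb{E}_{z \sim p_{z}}\big[\log\big(1 - D(G(z))\big)\big] .
```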
no code implementations • 14 Nov 2016 • Maria-Florina Balcan, Vaishnavh Nagarajan, Ellen Vitercik, Colin White
We address this problem for clustering, max-cut, and other partitioning problems, such as integer quadratic programming, by designing computationally efficient and sample-efficient learning algorithms that receive samples from an application-specific distribution over problem instances and learn a partitioning algorithm with high expected performance.
no code implementations • 24 Jul 2015 • Abhinav Garlapati, Aditi Raghunathan, Vaishnavh Nagarajan, Balaraman Ravindran
Online decision tree learning algorithms typically examine all features of a new data point to update model parameters.