Search Results for author: Sadhika Malladi

Found 8 papers, 6 papers with code

LESS: Selecting Influential Data for Targeted Instruction Tuning

1 code implementation • 6 Feb 2024 • Mengzhou Xia, Sadhika Malladi, Suchin Gururangan, Sanjeev Arora, Danqi Chen

Instruction tuning has unlocked powerful capabilities in large language models (LLMs), effectively using combined datasets to develop general-purpose chatbots.

The Marginal Value of Momentum for Small Learning Rate SGD

no code implementations • 27 Jul 2023 • Runzhe Wang, Sadhika Malladi, Tianhao Wang, Kaifeng Lyu, Zhiyuan Li

Momentum is known to accelerate the convergence of gradient descent in strongly convex settings without stochastic gradient noise.

Stochastic Optimization
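
For reference, the heavy-ball momentum update studied in this line of work is commonly written as (a sketch of the standard formulation; the paper's exact parameterization may differ):

$$v_{t+1} = \beta v_t + g(x_t), \qquad x_{t+1} = x_t - \eta\, v_{t+1},$$

where $g(x_t)$ is a stochastic gradient, $\beta$ is the momentum coefficient, and $\eta$ is the learning rate; the title's claim concerns the regime where $\eta$ is small.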

Trainable Transformer in Transformer

1 code implementation • 3 Jul 2023 • Abhishek Panigrahi, Sadhika Malladi, Mengzhou Xia, Sanjeev Arora

In this work, we propose an efficient construction, Transformer in Transformer (in short, TinT), that allows a transformer to simulate and fine-tune complex models internally during inference (e.g., pre-trained language models).

Attribute In-Context Learning +1

Fine-Tuning Language Models with Just Forward Passes

2 code implementations • NeurIPS 2023 • Sadhika Malladi, Tianyu Gao, Eshaan Nichani, Alex Damian, Jason D. Lee, Danqi Chen, Sanjeev Arora

Fine-tuning language models (LMs) has yielded success on diverse downstream tasks, but as LMs grow in size, backpropagation requires a prohibitively large amount of memory.

In-Context Learning Multiple-choice
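
The forward-pass-only fine-tuning described above rests on a zeroth-order gradient estimate. Below is a minimal PyTorch sketch of one SPSA-style step in the spirit of the paper's MeZO optimizer, with the random perturbation regenerated from a seed so memory stays at inference level; `params` and `loss_fn` are hypothetical stand-ins for the model parameters and a forward-pass loss closure, not the authors' code.

```python
import torch

@torch.no_grad()
def mezo_step(params, loss_fn, lr=1e-6, eps=1e-3, seed=0):
    # Add scale * eps * z to every parameter, regenerating z from the seed
    # instead of storing it -- the memory-saving trick behind this approach.
    def perturb(scale):
        torch.manual_seed(seed)
        for p in params:
            p.add_(scale * eps * torch.randn_like(p))

    perturb(+1)
    loss_plus = loss_fn()   # L(theta + eps * z), forward pass only
    perturb(-2)
    loss_minus = loss_fn()  # L(theta - eps * z)
    perturb(+1)             # walk back to the original theta

    # Scalar SPSA estimate of the directional derivative along z.
    grad_proj = (loss_plus - loss_minus) / (2 * eps)

    # SGD-style update along the same perturbation direction z.
    torch.manual_seed(seed)
    for p in params:
        p.sub_(lr * grad_proj * torch.randn_like(p))
```

Each step costs two forward passes and no backward pass, which is what keeps memory at inference level.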

A Kernel-Based View of Language Model Fine-Tuning

1 code implementation • 11 Oct 2022 • Sadhika Malladi, Alexander Wettig, Dingli Yu, Danqi Chen, Sanjeev Arora

It has become standard to solve NLP tasks by fine-tuning pre-trained language models (LMs), especially in low-data settings.

Language Modelling
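
In the kernel view named in the title, fine-tuning in a suitably "lazy" regime behaves like regression with the empirical neural tangent kernel of the pre-trained model (a standard formulation from the NTK literature; the paper's precise conditions may differ):

$$K(x, x') = \langle \nabla_\theta f(\theta_0; x),\, \nabla_\theta f(\theta_0; x') \rangle,$$

where $f(\theta_0; \cdot)$ is the model output at the pre-trained initialization $\theta_0$.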

On the SDEs and Scaling Rules for Adaptive Gradient Algorithms

1 code implementation • 20 May 2022 • Sadhika Malladi, Kaifeng Lyu, Abhishek Panigrahi, Sanjeev Arora

Approximating Stochastic Gradient Descent (SGD) as a Stochastic Differential Equation (SDE) has allowed researchers to enjoy the benefits of studying a continuous optimization trajectory while carefully preserving the stochasticity of SGD.
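
Concretely, the canonical SDE approximation in this line of work replaces the discrete SGD iterates with a continuous diffusion (notation may differ from the paper's):

$$dX_t = -\nabla L(X_t)\, dt + (\eta\, \Sigma(X_t))^{1/2}\, dW_t,$$

where $\eta$ is the learning rate, $\Sigma(X_t)$ is the covariance of the minibatch gradient noise, and $W_t$ is a Wiener process; a scaling rule then prescribes how to adjust $\eta$ as the batch size (and hence $\Sigma$) changes so that the limiting SDE, and thus the training behavior it predicts, is preserved.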

On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)

1 code implementation • NeurIPS 2021 • Zhiyuan Li, Sadhika Malladi, Sanjeev Arora

It is generally recognized that finite learning rate (LR), in contrast to infinitesimal LR, is important for good generalization in real-life deep nets.

A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks

no code implementations • ICLR 2021 • Nikunj Saunshi, Sadhika Malladi, Sanjeev Arora

This paper initiates a mathematical study of this phenomenon for the downstream task of text classification by considering the following questions: (1) What is the intuitive connection between the pretraining task of next word prediction and text classification?

General Classification Language Modelling +4
