Search Results for author: Sadhika Malladi

Found 17 papers, 11 papers with code

Overtrained Language Models Are Harder to Fine-Tune

no code implementations • 24 Mar 2025 • Jacob Mitchell Springer, Sachin Goyal, Kaiyue Wen, Tanishq Kumar, Xiang Yue, Sadhika Malladi, Graham Neubig, Aditi Raghunathan

Large language models are pre-trained on ever-growing token budgets under the assumption that better pre-training performance translates to improved downstream models.

Metadata Conditioning Accelerates Language Model Pre-training

1 code implementation • 3 Jan 2025 • Tianyu Gao, Alexander Wettig, Luxi He, Yihe Dong, Sadhika Malladi, Danqi Chen

The vast diversity of styles, domains, and quality levels present in language model pre-training corpora is essential in developing general model capabilities, but efficiently learning and deploying the correct behaviors exemplified in each of these heterogeneous data sources is challenging.

Language Modeling • Language Modelling • +1

Provable unlearning in topic modeling and downstream tasks

no code implementations • 19 Nov 2024 • Stanley Wei, Sadhika Malladi, Sanjeev Arora, Amartya Sanyal

Machine unlearning algorithms are increasingly important as legal concerns arise around the provenance of training data, but verifying the success of unlearning is often difficult.

Machine Unlearning • Topic Models

Adaptive Data Optimization: Dynamic Sample Selection with Scaling Laws

2 code implementations • 15 Oct 2024 • Yiding Jiang, Allan Zhou, Zhili Feng, Sadhika Malladi, J. Zico Kolter

The composition of pretraining data is a key determinant of foundation models' performance, but there is no standard guideline for allocating a limited computational budget across different data sources.

Computational Efficiency

Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization

1 code implementation • 11 Oct 2024 • Noam Razin, Sadhika Malladi, Adithya Bhaskar, Danqi Chen, Sanjeev Arora, Boris Hanin

Direct Preference Optimization (DPO) and its variants are increasingly used for aligning language models with human preferences.
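For context, the standard DPO objective that this work and its variants build on is (a textbook formulation, not this paper's specific notation):

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
    \left[\log\sigma\!\left(
      \beta\log\frac{\pi_\theta(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)}
      -\beta\log\frac{\pi_\theta(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}
    \right)\right]
```

Here $y_w$ and $y_l$ are the preferred and dispreferred responses, $\pi_{\mathrm{ref}}$ is the frozen reference policy, and $\beta$ controls the strength of the KL-style regularization toward it.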

Progressive distillation induces an implicit curriculum

no code implementations • 7 Oct 2024 • Abhishek Panigrahi, Bingbin Liu, Sadhika Malladi, Andrej Risteski, Surbhi Goel

Our theoretical and empirical findings on sparse parity, complemented by empirical observations on more complex tasks, highlight the benefit of progressive distillation via implicit curriculum across setups.

Knowledge Distillation

CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs

1 code implementation • 26 Jun 2024 • ZiRui Wang, Mengzhou Xia, Luxi He, Howard Chen, Yitao Liu, Richard Zhu, Kaiqu Liang, Xindi Wu, Haotian Liu, Sadhika Malladi, Alexis Chevalier, Sanjeev Arora, Danqi Chen

All models lag far behind human performance of 80.5%, underscoring weaknesses in the chart understanding capabilities of existing MLLMs.

Chart Understanding

Preference Learning Algorithms Do Not Learn Preference Rankings

no code implementations • 29 May 2024 • Angelica Chen, Sadhika Malladi, Lily H. Zhang, Xinyi Chen, Qiuyi Zhang, Rajesh Ranganath, Kyunghyun Cho

Preference learning algorithms (e.g., RLHF and DPO) are frequently used to steer LLMs to produce generations that are more preferred by humans, but our understanding of their inner workings is still limited.

Attribute

LESS: Selecting Influential Data for Targeted Instruction Tuning

3 code implementations • 6 Feb 2024 • Mengzhou Xia, Sadhika Malladi, Suchin Gururangan, Sanjeev Arora, Danqi Chen

Instruction tuning has unlocked powerful capabilities in large language models (LLMs), effectively using combined datasets to develop general-purpose chatbots.

The Marginal Value of Momentum for Small Learning Rate SGD

no code implementations • 27 Jul 2023 • Runzhe Wang, Sadhika Malladi, Tianhao Wang, Kaifeng Lyu, Zhiyuan Li

Momentum is known to accelerate the convergence of gradient descent in strongly convex settings without stochastic gradient noise.

Stochastic Optimization
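As a quick illustration of the update rule this paper analyzes, here is a minimal heavy-ball momentum sketch on a strongly convex quadratic. All names (`sgd_momentum`, the test problem) are illustrative, not from the paper:

```python
import numpy as np

def sgd_momentum(grad, theta0, lr=0.01, mu=0.9, steps=500):
    """Heavy-ball momentum: v <- mu * v + g, theta <- theta - lr * v."""
    theta = np.asarray(theta0, dtype=float)
    v = np.zeros_like(theta)
    for _ in range(steps):
        v = mu * v + grad(theta)
        theta = theta - lr * v
    return theta

# Strongly convex quadratic f(x) = 0.5 * x^T A x, so grad f(x) = A x.
A = np.diag([1.0, 10.0])
x_star = sgd_momentum(lambda x: A @ x, [5.0, 5.0])
```

With momentum coefficient `mu`, the effective learning rate over long horizons is roughly `lr / (1 - mu)`, which is the small-LR regime the paper's title refers to.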

Trainable Transformer in Transformer

1 code implementation • 3 Jul 2023 • Abhishek Panigrahi, Sadhika Malladi, Mengzhou Xia, Sanjeev Arora

In this work, we propose an efficient construction, Transformer in Transformer (in short, TinT), that allows a transformer to simulate and fine-tune complex models internally during inference (e.g., pre-trained language models).

Attribute • In-Context Learning • +2

Fine-Tuning Language Models with Just Forward Passes

3 code implementations • NeurIPS 2023 • Sadhika Malladi, Tianyu Gao, Eshaan Nichani, Alex Damian, Jason D. Lee, Danqi Chen, Sanjeev Arora

Fine-tuning language models (LMs) has yielded success on diverse downstream tasks, but as LMs grow in size, backpropagation requires a prohibitively large amount of memory.

In-Context Learning • Multiple-choice
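The forward-only idea behind this line of work can be sketched with a two-point zeroth-order (SPSA-style) gradient estimate: perturb the parameters along a random direction, compare two loss evaluations, and step without any backpropagation. This is a toy sketch on a quadratic, not the paper's actual implementation; all names are illustrative:

```python
import numpy as np

def spsa_grad(loss, theta, eps=1e-3, rng=None):
    """Two-point zeroth-order gradient estimate: only forward passes needed."""
    rng = rng or np.random.default_rng(0)
    z = rng.standard_normal(theta.shape)  # random perturbation direction
    g = (loss(theta + eps * z) - loss(theta - eps * z)) / (2 * eps)
    return g * z  # unbiased estimate of the gradient (up to O(eps^2))

def zo_sgd(loss, theta0, lr=0.05, steps=2000, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    for _ in range(steps):
        theta = theta - lr * spsa_grad(loss, theta, rng=rng)
    return theta

# Toy objective with minimum at the all-ones vector.
theta = zo_sgd(lambda x: np.sum((x - 1.0) ** 2), np.zeros(3))
```

Because each step needs only two loss evaluations, memory usage stays at inference level, which is the motivation the abstract describes.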

A Kernel-Based View of Language Model Fine-Tuning

1 code implementation • 11 Oct 2022 • Sadhika Malladi, Alexander Wettig, Dingli Yu, Danqi Chen, Sanjeev Arora

It has become standard to solve NLP tasks by fine-tuning pre-trained language models (LMs), especially in low-data settings.

Language Modeling • Language Modelling

On the SDEs and Scaling Rules for Adaptive Gradient Algorithms

1 code implementation • 20 May 2022 • Sadhika Malladi, Kaifeng Lyu, Abhishek Panigrahi, Sanjeev Arora

Approximating Stochastic Gradient Descent (SGD) as a Stochastic Differential Equation (SDE) has allowed researchers to enjoy the benefits of studying a continuous optimization trajectory while carefully preserving the stochasticity of SGD.
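The SDE approximation referred to here is commonly written as (a standard formulation from this literature, not necessarily this paper's exact notation):

```latex
dX_t = -\nabla L(X_t)\, dt + \bigl(\eta\, \Sigma(X_t)\bigr)^{1/2}\, dW_t
```

where $\eta$ is the learning rate, $\Sigma(X_t)$ is the covariance of the stochastic gradient noise, and $W_t$ is Brownian motion; the diffusion term is what preserves the stochasticity of SGD in the continuous-time limit.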

On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)

1 code implementation • NeurIPS 2021 • Zhiyuan Li, Sadhika Malladi, Sanjeev Arora

It is generally recognized that finite learning rate (LR), in contrast to infinitesimal LR, is important for good generalization in real-life deep nets.

A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks

no code implementations • ICLR 2021 • Nikunj Saunshi, Sadhika Malladi, Sanjeev Arora

This paper initiates a mathematical study of this phenomenon for the downstream task of text classification by considering the following questions: (1) What is the intuitive connection between the pretraining task of next word prediction and text classification?

General Classification • Language Modeling • +5
