Search Results for author: Abhishek Panigrahi

Found 17 papers, 6 papers with code

AdaptMI: Adaptive Skill-based In-context Math Instruction for Small Language Models

no code implementations · 30 Apr 2025 · Yinghui He, Abhishek Panigrahi, Yong Lin, Sanjeev Arora

In-context learning (ICL) allows a language model to improve its problem-solving capability when provided with suitable information in context.

In-Context Learning · Math

Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs?

1 code implementation · 5 Jan 2025 · Simon Park, Abhishek Panigrahi, Yun Cheng, Dingli Yu, Anirudh Goyal, Sanjeev Arora

We seek strategies for training on the SIMPLE version of the tasks that improve performance on the corresponding HARD task, i.e., S2H generalization.

Image Captioning · Image to text · +3

Progressive distillation induces an implicit curriculum

no code implementations · 7 Oct 2024 · Abhishek Panigrahi, Bingbin Liu, Sadhika Malladi, Andrej Risteski, Surbhi Goel

Our theoretical and empirical findings on sparse parity, complemented by empirical observations on more complex tasks, highlight the benefit of progressive distillation via implicit curriculum across setups.
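A minimal sketch of progressive distillation as described above, assuming a PyTorch-style setup; `student`, `teacher_checkpoints` (teacher models saved at increasing points of teacher training), `loader`, and `optimizer` are hypothetical objects, and the loss is the standard tempered soft-label KL, not code from the paper.

```python
import torch
import torch.nn.functional as F

def progressive_distill(student, teacher_checkpoints, loader, optimizer,
                        steps_per_stage=1000, temperature=2.0):
    """Distill `student` from a sequence of intermediate teacher checkpoints
    (earlier checkpoints first), rather than only from the final teacher."""
    for teacher in teacher_checkpoints:
        teacher.eval()
        data_iter = iter(loader)
        for _ in range(steps_per_stage):
            try:
                x, _ = next(data_iter)
            except StopIteration:
                data_iter = iter(loader)
                x, _ = next(data_iter)
            with torch.no_grad():
                teacher_logits = teacher(x)
            student_logits = student(x)
            # Standard soft-label distillation loss: KL between tempered softmaxes.
            loss = F.kl_div(
                F.log_softmax(student_logits / temperature, dim=-1),
                F.softmax(teacher_logits / temperature, dim=-1),
                reduction="batchmean",
            ) * temperature ** 2
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student
```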

Knowledge Distillation

Representing Rule-based Chatbots with Transformers

1 code implementation · 15 Jul 2024 · Dan Friedman, Abhishek Panigrahi, Danqi Chen

Next, we train Transformers on a dataset of synthetically generated ELIZA conversations and investigate the mechanisms the models learn.

Chatbot · dialog state tracking

Efficient Stagewise Pretraining via Progressive Subnetworks

no code implementations · 8 Feb 2024 · Abhishek Panigrahi, Nikunj Saunshi, Kaifeng Lyu, Sobhan Miryoosefi, Sashank Reddi, Satyen Kale, Sanjiv Kumar

We propose an instantiation of this framework, Random Part Training (RAPTR), that selects and trains only a random subnetwork (e.g., depth-wise or width-wise) of the network at each step, progressively increasing the size in stages.
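A rough sketch of the depth-wise variant of this idea, assuming the network is exposed as a PyTorch `nn.ModuleList` of residual blocks (`blocks` is a hypothetical handle, not the paper's code): each forward pass runs only a random subset of blocks, and the kept fraction is raised stage by stage.

```python
import random
import torch.nn as nn

class RandomDepthSubnetwork(nn.Module):
    """Depth-wise random subnetwork: each forward pass runs a random subset of
    residual blocks; skipped blocks act as identity via the residual bypass."""

    def __init__(self, blocks: nn.ModuleList, keep_fraction: float = 0.5):
        super().__init__()
        self.blocks = blocks
        self.keep_fraction = keep_fraction  # raised over stages, e.g. 0.5 -> 0.75 -> 1.0

    def set_stage(self, keep_fraction: float):
        self.keep_fraction = keep_fraction

    def forward(self, x):
        n_keep = max(1, int(round(self.keep_fraction * len(self.blocks))))
        active = set(random.sample(range(len(self.blocks)), n_keep))
        for i, block in enumerate(self.blocks):
            if i in active:
                x = block(x)   # only the sampled blocks are run (and trained) this step
        return x
```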

Inductive Bias

Trainable Transformer in Transformer

1 code implementation · 3 Jul 2023 · Abhishek Panigrahi, Sadhika Malladi, Mengzhou Xia, Sanjeev Arora

In this work, we propose an efficient construction, Transformer in Transformer (in short, TinT), that allows a transformer to simulate and fine-tune complex models internally during inference (e.g., pre-trained language models).

Attribute · In-Context Learning · +2

Do Transformers Parse while Predicting the Masked Word?

no code implementations · 14 Mar 2023 · Haoyu Zhao, Abhishek Panigrahi, Rong Ge, Sanjeev Arora

We also show that the Inside-Outside algorithm is optimal for masked language modeling loss on the PCFG-generated data.
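For reference, the inside pass of the Inside-Outside algorithm on a PCFG in Chomsky normal form computes span probabilities $\beta(A, i, j)$ for a sentence $w_1 \dots w_n$ via the standard textbook recursion (generic notation, not taken from the paper):

```latex
\beta(A, i, i) = P(A \to w_i), \qquad
\beta(A, i, j) = \sum_{A \to B\,C} \sum_{k=i}^{j-1} P(A \to B\,C)\,\beta(B, i, k)\,\beta(C, k+1, j).
```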

Constituency Parsing · Language Modeling · +2

Task-Specific Skill Localization in Fine-tuned Language Models

1 code implementation · 13 Feb 2023 · Abhishek Panigrahi, Nikunj Saunshi, Haoyu Zhao, Sanjeev Arora

Given a downstream task and a model fine-tuned on that task, a simple optimization identifies a very small subset of parameters ($\sim 0.01\%$ of model parameters) responsible for $>95\%$ of the model's performance, in the sense that grafting the fine-tuned values for just this tiny subset onto the pre-trained model yields performance almost as good as that of the fine-tuned model.
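A minimal sketch of the grafting operation described above, assuming PyTorch state dicts `pretrained` and `finetuned` with matching keys and a boolean `mask` over parameters selecting the identified subset (names are illustrative, not the paper's code):

```python
import torch

def graft(pretrained: dict, finetuned: dict, mask: dict) -> dict:
    """Copy fine-tuned values only for the selected (grafted) parameters;
    every other parameter stays at its pre-trained value."""
    grafted = {}
    for name, theta_pre in pretrained.items():
        theta_ft = finetuned[name]
        m = mask[name].to(theta_pre.dtype)   # 1.0 where the parameter is grafted, else 0.0
        grafted[name] = theta_pre * (1.0 - m) + theta_ft * m
    return grafted

# Usage (hypothetical): model.load_state_dict(graft(pre_sd, ft_sd, mask_sd))
```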

Continual Learning · parameter-efficient fine-tuning

On the SDEs and Scaling Rules for Adaptive Gradient Algorithms

1 code implementation · 20 May 2022 · Sadhika Malladi, Kaifeng Lyu, Abhishek Panigrahi, Sanjeev Arora

Approximating Stochastic Gradient Descent (SGD) as a Stochastic Differential Equation (SDE) has allowed researchers to enjoy the benefits of studying a continuous optimization trajectory while carefully preserving the stochasticity of SGD.
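For context, the classical SDE approximation of SGD with learning rate $\eta$, loss $L$, and minibatch-gradient covariance $\Sigma$ (a well-known prior result that this line of work builds on, written here in generic notation) reads:

```latex
dX_t = -\nabla L(X_t)\,dt + \sqrt{\eta\,\Sigma(X_t)}\,dW_t .
```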

Understanding Gradient Descent on Edge of Stability in Deep Learning

no code implementations · 19 May 2022 · Sanjeev Arora, Zhiyuan Li, Abhishek Panigrahi

The current paper mathematically analyzes a new mechanism of implicit regularization in the EoS phase, whereby GD updates arising from the non-smooth loss landscape turn out to evolve along a deterministic flow on the manifold of minimum loss.

Deep Learning

Learning and Generalization in RNNs

no code implementations · NeurIPS 2021 · Abhishek Panigrahi, Navin Goyal

In contrast to the previous work that could only deal with functions of sequences that are sums of functions of individual tokens in the sequence, we allow general functions.

Non-Gaussianity of Stochastic Gradient Noise

no code implementations · 21 Oct 2019 · Abhishek Panigrahi, Raghav Somani, Navin Goyal, Praneeth Netrapalli

What enables Stochastic Gradient Descent (SGD) to achieve better generalization than Gradient Descent (GD) in Neural Network training?

Effect of Activation Functions on the Training of Overparametrized Neural Nets

no code implementations · ICLR 2020 · Abhishek Panigrahi, Abhishek Shetty, Navin Goyal

In the present paper, we provide theoretical results about the effect of activation function on the training of highly overparametrized 2-layer neural networks.

Small Data Image Classification

Word2Sense: Sparse Interpretable Word Embeddings

no code implementations · ACL 2019 · Abhishek Panigrahi, Harsha Vardhan Simhadri, Chiranjib Bhattacharyya

We present an unsupervised method to generate Word2Sense word embeddings that are interpretable: each dimension of the embedding space corresponds to a fine-grained sense, and the non-negative value of the embedding along the j-th dimension represents the relevance of the j-th sense to the word.
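As a toy illustration of how such embeddings are read (purely hypothetical names, not the paper's code): given a non-negative matrix `emb` of shape (vocab_size, n_senses), the most relevant senses for a word are simply its largest coordinates.

```python
import numpy as np

def top_senses(word, vocab, sense_names, emb, k=5):
    """Return the k senses with the largest non-negative weight for `word`."""
    row = emb[vocab.index(word)]              # shape: (n_senses,)
    idx = np.argsort(row)[::-1][:k]           # indices of the k largest entries
    return [(sense_names[i], float(row[i])) for i in idx if row[i] > 0]
```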

Word Embeddings · Word Similarity

DeepTagRec: A Content-cum-User based Tag Recommendation Framework for Stack Overflow

1 code implementation · 10 Mar 2019 · Suman Kalyan Maity, Abhishek Panigrahi, Sayan Ghosh, Arundhati Banerjee, Pawan Goyal, Animesh Mukherjee

In this paper, we develop a content-cum-user based deep learning framework DeepTagRec to recommend appropriate question tags on Stack Overflow.

TAG

Analysis on Gradient Propagation in Batch Normalized Residual Networks

no code implementations · ICLR 2018 · Abhishek Panigrahi, Yueru Chen, C.-C. Jay Kuo

In this work, we conduct a mathematical analysis of the effect of batch normalization (BN) on gradient backpropagation in residual network training, which is believed to play a critical role in addressing the gradient vanishing/explosion problem.
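For reference, the textbook backward pass of a batch-normalization layer $y_i = \gamma\hat{x}_i + \beta$ with $\hat{x}_i = (x_i - \mu)/\sigma$ over a batch of size $m$ (a standard result, not a formula quoted from the paper) shows how BN recenters and rescales incoming gradients:

```latex
\frac{\partial L}{\partial x_i}
= \frac{\gamma}{m\,\sigma}\left(
    m\,\frac{\partial L}{\partial y_i}
    - \sum_{j=1}^{m}\frac{\partial L}{\partial y_j}
    - \hat{x}_i\sum_{j=1}^{m}\frac{\partial L}{\partial y_j}\,\hat{x}_j
  \right).
```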
