Search Results for author: Nikunj Saunshi

Found 17 papers, 7 papers with code

Efficient Stagewise Pretraining via Progressive Subnetworks

no code implementations 8 Feb 2024 Abhishek Panigrahi, Nikunj Saunshi, Kaifeng Lyu, Sobhan Miryoosefi, Sashank Reddi, Satyen Kale, Sanjiv Kumar

RaPTr achieves better pre-training loss for BERT and UL2 language models while requiring 20-33% fewer FLOPs than standard training, and is competitive with or better than other efficient training methods.
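
The excerpt describes pretraining with progressively growing subnetworks. Below is a minimal sketch of that general idea, assuming a toy residual stack in PyTorch where each step activates a random subset of layers and the subset grows across stages; the stage schedule, the placeholder objective, and the layer choice are illustrative assumptions, not the actual RaPTr procedure.

```python
import random
import torch
import torch.nn as nn

class ResidualStack(nn.Module):
    """Toy stack of residual MLP blocks; inactive blocks act as identity."""
    def __init__(self, dim=64, n_layers=8):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
            for _ in range(n_layers)
        )

    def forward(self, x, active):
        # Only layers listed in `active` contribute; the rest are skipped via the residual path.
        for i, layer in enumerate(self.layers):
            if i in active:
                x = x + layer(x)
        return x

model = ResidualStack()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Hypothetical stage schedule: train with 2, then 4, then all 8 active layers.
for n_active in (2, 4, 8):
    for step in range(100):
        active = set(random.sample(range(len(model.layers)), n_active))
        x = torch.randn(32, 64)
        loss = model(x, active).pow(2).mean()   # placeholder objective, not a real LM loss
        opt.zero_grad(); loss.backward(); opt.step()
```

Early stages touch only a fraction of the layers per step, which is where the FLOP savings in this kind of stagewise scheme come from.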

Reasoning in Large Language Models Through Symbolic Math Word Problems

1 code implementation 3 Aug 2023 Vedant Gaur, Nikunj Saunshi

Large language models (LLMs) have revolutionized NLP by solving downstream tasks with little to no labeled data.

Math

Task-Specific Skill Localization in Fine-tuned Language Models

1 code implementation 13 Feb 2023 Abhishek Panigrahi, Nikunj Saunshi, Haoyu Zhao, Sanjeev Arora

Given a downstream task and a model fine-tuned on that task, a simple optimization is used to identify a very small subset of parameters ($\sim 0.01$% of model parameters) responsible for ($>95$%) of the model's performance, in the sense that grafting the fine-tuned values for just this tiny subset onto the pre-trained model performs almost as well as the fine-tuned model.

Continual Learning
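
The grafting step described in the excerpt above, copying a tiny subset of fine-tuned parameter values onto the pre-trained model, can be sketched as follows. This is a minimal illustration over PyTorch state dicts that uses the largest per-parameter changes as the selection rule; the paper's actual selection is a learned optimization, which this sketch does not reproduce.

```python
import torch

def graft(pretrained_sd, finetuned_sd, frac=1e-4):
    """Return a state dict equal to `pretrained_sd` except on the `frac` fraction of
    scalar parameters that changed most during fine-tuning, where the fine-tuned
    values are grafted in. Non-floating-point buffers are passed through unchanged."""
    float_keys = {k for k, v in pretrained_sd.items() if v.is_floating_point()}
    deltas = torch.cat([(finetuned_sd[k] - pretrained_sd[k]).abs().flatten()
                        for k in float_keys])
    k = max(1, int(frac * deltas.numel()))
    threshold = deltas.topk(k).values.min()      # magnitude of the k-th largest change

    grafted = {}
    for name, value in pretrained_sd.items():
        if name not in float_keys:
            grafted[name] = value
            continue
        mask = (finetuned_sd[name] - pretrained_sd[name]).abs() >= threshold
        grafted[name] = torch.where(mask, finetuned_sd[name], pretrained_sd[name])
    return grafted

# Usage (hypothetical models):
# model.load_state_dict(graft(pretrained_model.state_dict(), finetuned_model.state_dict()))
```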

New Definitions and Evaluations for Saliency Methods: Staying Intrinsic, Complete and Sound

1 code implementation 5 Nov 2022 Arushi Gupta, Nikunj Saunshi, Dingli Yu, Kaifeng Lyu, Sanjeev Arora

Saliency methods compute heat maps that highlight portions of an input that were most important for the label assigned to it by a deep net.
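
As a concrete reference point for what such a heat map is, here is a minimal vanilla-gradient saliency sketch in PyTorch, one of the standard attribution methods this kind of evaluation applies to; the paper's own intrinsic, completeness, and soundness criteria are not implemented here.

```python
import torch

def gradient_saliency(model, image, label):
    """Heat map: magnitude of the gradient of the class score w.r.t. each pixel."""
    image = image.clone().requires_grad_(True)          # image: (C, H, W) tensor
    score = model(image.unsqueeze(0))[0, label]         # scalar logit for the assigned label
    score.backward()
    return image.grad.abs().max(dim=0).values           # collapse channels -> (H, W) map
```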

Understanding Influence Functions and Datamodels via Harmonic Analysis

no code implementations 3 Oct 2022 Nikunj Saunshi, Arushi Gupta, Mark Braverman, Sanjeev Arora

Influence functions estimate the effect of individual data points on a model's predictions on test data, and were adapted to deep learning by Koh and Liang [2017].

Data Poisoning
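
The influence-function estimate referenced above has a standard closed form for a differentiable model: the influence of a training point $z$ on the test loss at $z_{test}$ is approximately $-\nabla L(z_{test})^\top H^{-1} \nabla L(z)$, where $H$ is the Hessian of the training loss at the fitted parameters. A small NumPy sketch for L2-regularized logistic regression follows; it is illustrative only and does not implement the paper's harmonic-analysis view.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def influence(X, y, theta, x_test, y_test, reg=1e-3):
    """Approximate effect of up-weighting each training point on the test loss
    for L2-regularized logistic regression with fitted parameters `theta`."""
    p = sigmoid(X @ theta)                          # n training probabilities
    grads = (p - y)[:, None] * X                    # per-example loss gradients, n x d
    # Hessian of the mean training loss plus the ridge term.
    H = (X.T * (p * (1 - p))) @ X / len(y) + reg * np.eye(X.shape[1])
    p_test = sigmoid(x_test @ theta)
    grad_test = (p_test - y_test) * x_test          # d-vector
    return -grads @ np.linalg.solve(H, grad_test)   # one influence score per training point
```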

Understanding Contrastive Learning Requires Incorporating Inductive Biases

no code implementations 28 Feb 2022 Nikunj Saunshi, Jordan Ash, Surbhi Goel, Dipendra Misra, Cyril Zhang, Sanjeev Arora, Sham Kakade, Akshay Krishnamurthy

Contrastive learning is a popular form of self-supervised learning that encourages augmentations (views) of the same input to have more similar representations compared to augmentations of different inputs.

Contrastive Learning · Self-Supervised Learning
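
For reference, the objective the excerpt above describes, pulling together two augmentations ("views") of the same input and pushing apart views of different inputs, is commonly instantiated as an InfoNCE/SimCLR-style loss. A minimal PyTorch sketch, assuming the two views of each batch element have already been encoded into `z1` and `z2`:

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """z1, z2: (batch, dim) representations of two views of the same inputs.
    Each row of z1 is attracted to the matching row of z2 and repelled
    from all other rows (the negatives)."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.T / temperature            # (batch, batch) similarity matrix
    targets = torch.arange(z1.size(0))          # positives sit on the diagonal
    return F.cross_entropy(logits, targets)
```

The paper's point is that such an objective alone does not explain downstream performance without also accounting for the inductive biases of the augmentations and the architecture.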

On Predicting Generalization using GANs

no code implementations ICLR 2022 Yi Zhang, Arushi Gupta, Nikunj Saunshi, Sanjeev Arora

Research on generalization bounds for deep networks seeks to give ways to predict test error using just the training dataset and the network parameters.

Generalization Bounds · Generative Adversarial Network

New Definitions and Evaluations for Saliency Methods: Staying Intrinsic and Sound

no code implementations29 Sep 2021 Arushi Gupta, Nikunj Saunshi, Dingli Yu, Kaifeng Lyu, Sanjeev Arora

Saliency methods seek to provide human-interpretable explanations for the output of a machine learning model on a given input.

A Representation Learning Perspective on the Importance of Train-Validation Splitting in Meta-Learning

1 code implementation29 Jun 2021 Nikunj Saunshi, Arushi Gupta, Wei Hu

An effective approach in meta-learning is to utilize multiple "train tasks" to learn a good initialization for model parameters that can help solve unseen "test tasks" with very few samples by fine-tuning from this initialization.

Meta-Learning · Representation Learning
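
A minimal sketch of the setup in the excerpt above, including the train-validation split within each task: adapt a copy of the shared initialization on each task's "train" split, evaluate the adapted copy on that task's "validation" split, and update the initialization from the validation loss. This is a first-order simplification in PyTorch; the linear model and the random tasks are illustrative placeholders.

```python
import copy
import torch
import torch.nn as nn

def meta_step(init_model, tasks, inner_lr=0.1, outer_lr=0.01, inner_steps=5):
    """tasks: list of (x_train, y_train, x_val, y_val) tensors, one tuple per task."""
    meta_grads = [torch.zeros_like(p) for p in init_model.parameters()]
    for x_tr, y_tr, x_val, y_val in tasks:
        model = copy.deepcopy(init_model)               # start from the shared initialization
        opt = torch.optim.SGD(model.parameters(), lr=inner_lr)
        for _ in range(inner_steps):                    # adapt on the train split
            opt.zero_grad()
            nn.functional.mse_loss(model(x_tr), y_tr).backward()
            opt.step()
        opt.zero_grad()
        nn.functional.mse_loss(model(x_val), y_val).backward()   # outer signal: val split only
        for g, p in zip(meta_grads, model.parameters()):
            g += p.grad / len(tasks)
    with torch.no_grad():                               # first-order update of the initialization
        for p, g in zip(init_model.parameters(), meta_grads):
            p -= outer_lr * g

init = nn.Linear(10, 1)
tasks = [(torch.randn(20, 10), torch.randn(20, 1),
          torch.randn(20, 10), torch.randn(20, 1)) for _ in range(4)]
meta_step(init, tasks)
```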

A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks

no code implementations ICLR 2021 Nikunj Saunshi, Sadhika Malladi, Sanjeev Arora

This paper initiates a mathematical study of this phenomenon for the downstream task of text classification by considering the following questions: (1) What is the intuitive connection between the pretraining task of next word prediction and text classification?

General Classification · Language Modelling +4

Predicting What You Already Know Helps: Provable Self-Supervised Learning

no code implementations NeurIPS 2021 Jason D. Lee, Qi Lei, Nikunj Saunshi, Jiacheng Zhuo

Self-supervised representation learning solves auxiliary prediction tasks (known as pretext tasks) without requiring labeled data to learn useful semantic representations.

Representation Learning · Self-Supervised Learning
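
A toy instance of the pretext-task idea in the excerpt above: predict one observed part of an input from another part, then reuse the learned encoder with a linear head on the labeled downstream task. The feature split, the random placeholder data, and the reconstruction objective below are assumptions for illustration; the paper's theoretical conditions are not encoded here.

```python
import torch
import torch.nn as nn

dim_a, dim_b, rep = 20, 10, 16
encoder = nn.Sequential(nn.Linear(dim_a, 64), nn.ReLU(), nn.Linear(64, rep))
pretext_head = nn.Linear(rep, dim_b)       # predicts the held-out part x_b from x_a
opt = torch.optim.Adam([*encoder.parameters(), *pretext_head.parameters()], lr=1e-3)

# Pretext phase: no labels, just (x_a, x_b) pairs split from unlabeled inputs.
for _ in range(200):
    x = torch.randn(64, dim_a + dim_b)     # placeholder unlabeled data
    x_a, x_b = x[:, :dim_a], x[:, dim_a:]
    loss = nn.functional.mse_loss(pretext_head(encoder(x_a)), x_b)
    opt.zero_grad(); loss.backward(); opt.step()

# Downstream phase: freeze the encoder and fit only a linear probe on labeled data.
probe = nn.Linear(rep, 2)
```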

A Sample Complexity Separation between Non-Convex and Convex Meta-Learning

no code implementations ICML 2020 Nikunj Saunshi, Yi Zhang, Mikhail Khodak, Sanjeev Arora

In contrast, for the non-convex formulation of a two layer linear network on the same instance, we show that both Reptile and multi-task representation learning can have new task sample complexity of $\mathcal{O}(1)$, demonstrating a separation from convex meta-learning.

Meta-Learning · Representation Learning

Provable Representation Learning for Imitation Learning via Bi-level Optimization

no code implementations ICML 2020 Sanjeev Arora, Simon S. Du, Sham Kakade, Yuping Luo, Nikunj Saunshi

We formulate representation learning as a bi-level optimization problem where the "outer" optimization tries to learn the joint representation and the "inner" optimization encodes the imitation learning setup and tries to learn task-specific parameters.

Imitation Learning · Representation Learning
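
A rough sketch of the bi-level structure described above, assuming behavior cloning as the imitation setup: the inner loop fits task-specific heads on top of a shared representation, and the outer loop updates the shared representation through those fitted heads. This is an alternating, first-order simplification in PyTorch with placeholder demonstration data, not the paper's exact algorithm.

```python
import torch
import torch.nn as nn

rep = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 16))   # shared ("outer") representation
heads = [nn.Linear(16, 4) for _ in range(3)]                          # one head per task ("inner")
outer_opt = torch.optim.Adam(rep.parameters(), lr=1e-3)

def bc_loss(head, states, expert_actions):
    """Behavior cloning: predict expert actions from the shared representation."""
    return nn.functional.cross_entropy(head(rep(states)), expert_actions)

for _ in range(100):
    outer_loss = 0.0
    for head in heads:
        states = torch.randn(32, 8)                   # placeholder expert demonstrations
        actions = torch.randint(0, 4, (32,))
        inner_opt = torch.optim.SGD(head.parameters(), lr=0.1)
        for _ in range(5):                            # inner: fit the task-specific head only
            inner_opt.zero_grad()
            bc_loss(head, states, actions).backward()
            inner_opt.step()
        outer_loss = outer_loss + bc_loss(head, states, actions)
    outer_opt.zero_grad()                             # outer: update the shared representation
    outer_loss.backward()
    outer_opt.step()
```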

A Theoretical Analysis of Contrastive Unsupervised Representation Learning

no code implementations 25 Feb 2019 Sanjeev Arora, Hrishikesh Khandeparkar, Mikhail Khodak, Orestis Plevrakis, Nikunj Saunshi

This framework allows us to show provable guarantees on the performance of the learned representations on the average classification task that is composed of a subset of the same set of latent classes.

Contrastive Learning · General Classification +1

A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors

1 code implementation ACL 2018 Mikhail Khodak, Nikunj Saunshi, Yingyu Liang, Tengyu Ma, Brandon Stewart, Sanjeev Arora

Motivations like domain adaptation, transfer learning, and feature learning have fueled interest in inducing embeddings for rare or unseen words, n-grams, synsets, and other textual features.

Document Classification · Domain Adaptation +2
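
The induction idea behind this line of work can be sketched quickly: average the embeddings of the words surrounding a rare feature, then map that average through a linear transform learned by regressing existing word vectors onto their own average context vectors. A NumPy sketch under those assumptions; the corpus handling and the feature types beyond single words are omitted.

```python
import numpy as np

def learn_induction_matrix(word_vectors, context_averages):
    """Least-squares fit of A such that A @ context_average ≈ word_vector,
    using words whose embeddings are already known.
    word_vectors, context_averages: (n_words, dim) arrays, row-aligned."""
    X, *_ = np.linalg.lstsq(context_averages, word_vectors, rcond=None)
    return X.T                      # so that A @ avg_context yields an embedding

def induce_embedding(A, context_embeddings):
    """Embed a rare word or n-gram from the embeddings of words in its contexts."""
    return A @ np.mean(context_embeddings, axis=0)
```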

A Compressed Sensing View of Unsupervised Text Embeddings, Bag-of-n-Grams, and LSTMs

2 code implementations ICLR 2018 Sanjeev Arora, Mikhail Khodak, Nikunj Saunshi, Kiran Vodrahalli

We also show a surprising new property of embeddings such as GloVe and word2vec: they form a good sensing matrix for text that is more efficient than random matrices, the standard sparse recovery tool, which may explain why they lead to better representations in practice.
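
The "sensing matrix" claim above can be made concrete with a small recovery experiment: represent a document as the sum of its word embeddings, then try to recover the sparse bag-of-words vector from that low-dimensional sum by sparse regression. A sketch with scikit-learn's Lasso standing in for the sparse recovery step; the embedding matrix and vocabulary below are random placeholders, not GloVe or word2vec vectors.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
vocab_size, dim = 5000, 300
E = rng.standard_normal((dim, vocab_size)) / np.sqrt(dim)   # columns = word embeddings

# A short "document": sparse bag-of-words counts over the vocabulary.
x_true = np.zeros(vocab_size)
x_true[rng.choice(vocab_size, size=20, replace=False)] = 1.0

v = E @ x_true                          # document embedding = sum of its word vectors

# Sparse recovery of the bag-of-words vector from the 300-dim embedding.
lasso = Lasso(alpha=1e-3, fit_intercept=False, max_iter=10000, positive=True)
lasso.fit(E, v)
print("recovered support:", np.flatnonzero(lasso.coef_ > 0.5))
print("true support:     ", np.flatnonzero(x_true))
```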

A Large Self-Annotated Corpus for Sarcasm

6 code implementations LREC 2018 Mikhail Khodak, Nikunj Saunshi, Kiran Vodrahalli

We introduce the Self-Annotated Reddit Corpus (SARC), a large corpus for sarcasm research and for training and evaluating systems for sarcasm detection.

Sarcasm Detection
