Search Results for author: Sanjeev Arora

Found 73 papers, 25 papers with code

Instance-hiding Schemes for Private Distributed Learning

no code implementations ICML 2020 Yangsibo Huang, Zhao Song, Sanjeev Arora, Kai Li

The new ideas in the current paper are: (a) new variants of mixup with negative as well as positive coefficients, and an extension of sample-wise mixup to pixel-wise mixup.

Federated Learning

Unlearning via Sparse Representations

no code implementations 26 Nov 2023 Vedant Shah, Frederik Träuble, Ashish Malik, Hugo Larochelle, Michael Mozer, Sanjeev Arora, Yoshua Bengio, Anirudh Goyal

Machine unlearning, which involves erasing knowledge about a forget set from a trained model, can prove to be costly and infeasible by existing techniques.

Knowledge Distillation

Skill-Mix: a Flexible and Expandable Family of Evaluations for AI models

no code implementations 26 Oct 2023 Dingli Yu, Simran Kaur, Arushi Gupta, Jonah Brown-Cohen, Anirudh Goyal, Sanjeev Arora

The paper develops a methodology for (a) designing and administering such an evaluation, and (b) automatic grading (plus spot-checking by humans) of the results using GPT-4 as well as the open LLaMA-2 70B model.

A Quadratic Synchronization Rule for Distributed Deep Learning

1 code implementation 22 Oct 2023 Xinran Gu, Kaifeng Lyu, Sanjeev Arora, Jingzhao Zhang, Longbo Huang

In distributed deep learning with data parallelism, synchronizing gradients at each training step can cause a huge communication overhead, especially when many nodes work together to train large models.

A Theory for Emergence of Complex Skills in Language Models

no code implementations 29 Jul 2023 Sanjeev Arora, Anirudh Goyal

Contributions include: (a) A statistical framework that relates cross-entropy loss of LLMs to competence on the basic skills that underlie language tasks.

Inductive Bias

Trainable Transformer in Transformer

1 code implementation 3 Jul 2023 Abhishek Panigrahi, Sadhika Malladi, Mengzhou Xia, Sanjeev Arora

In this work, we propose an efficient construction, Transformer in Transformer (in short, TinT), that allows a transformer to simulate and fine-tune complex models internally during inference (e.g., pre-trained language models).

Language Modelling

Do Transformers Parse while Predicting the Masked Word?

no code implementations 14 Mar 2023 Haoyu Zhao, Abhishek Panigrahi, Rong Ge, Sanjeev Arora

We also show that the Inside-Outside algorithm is optimal for masked language modeling loss on the PCFG-generated data.

Constituency Parsing Language Modelling +1

Why (and When) does Local SGD Generalize Better than SGD?

1 code implementation 2 Mar 2023 Xinran Gu, Kaifeng Lyu, Longbo Huang, Sanjeev Arora

Local SGD is a communication-efficient variant of SGD for large-scale training, where multiple GPUs perform SGD independently and average the model parameters periodically.

Task-Specific Skill Localization in Fine-tuned Language Models

1 code implementation 13 Feb 2023 Abhishek Panigrahi, Nikunj Saunshi, Haoyu Zhao, Sanjeev Arora

Given the downstream task and a model fine-tuned on that task, a simple optimization is used to identify a very small subset of parameters ($\sim0.01$% of model parameters) responsible for $>95$% of the model's performance, in the sense that grafting the fine-tuned values for just this tiny subset onto the pre-trained model gives performance almost as good as the fine-tuned model.

Continual Learning

New Definitions and Evaluations for Saliency Methods: Staying Intrinsic, Complete and Sound

1 code implementation 5 Nov 2022 Arushi Gupta, Nikunj Saunshi, Dingli Yu, Kaifeng Lyu, Sanjeev Arora

Saliency methods compute heat maps that highlight portions of an input that were most important for the label assigned to it by a deep net.

A Kernel-Based View of Language Model Fine-Tuning

1 code implementation 11 Oct 2022 Sadhika Malladi, Alexander Wettig, Dingli Yu, Danqi Chen, Sanjeev Arora

It has become standard to solve NLP tasks by fine-tuning pre-trained language models (LMs), especially in low-data settings.

Language Modelling

Understanding Influence Functions and Datamodels via Harmonic Analysis

no code implementations 3 Oct 2022 Nikunj Saunshi, Arushi Gupta, Mark Braverman, Sanjeev Arora

Influence functions estimate the effect of individual data points on the predictions of the model on test data and were adapted to deep learning in Koh and Liang [2017].

Data Poisoning Test

Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent

no code implementations 8 Jul 2022 Zhiyuan Li, Tianhao Wang, Jason D. Lee, Sanjeev Arora

Conversely, continuous mirror descent with any Legendre function can be viewed as gradient flow with a related commuting parametrization.

Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction

no code implementations 14 Jun 2022 Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora

Normalization layers (e.g., Batch Normalization, Layer Normalization) were introduced to help with optimization difficulties in very deep nets, but they clearly also help generalization, even in not-so-deep nets.

On the SDEs and Scaling Rules for Adaptive Gradient Algorithms

1 code implementation 20 May 2022 Sadhika Malladi, Kaifeng Lyu, Abhishek Panigrahi, Sanjeev Arora

Approximating Stochastic Gradient Descent (SGD) as a Stochastic Differential Equation (SDE) has allowed researchers to enjoy the benefits of studying a continuous optimization trajectory while carefully preserving the stochasticity of SGD.

Understanding Gradient Descent on Edge of Stability in Deep Learning

no code implementations 19 May 2022 Sanjeev Arora, Zhiyuan Li, Abhishek Panigrahi

The current paper mathematically analyzes a new mechanism of implicit regularization in the EoS phase, whereby GD updates due to non-smooth loss landscape turn out to evolve along some deterministic flow on the manifold of minimum loss.

Adaptive Gradient Methods with Local Guarantees

no code implementations 2 Mar 2022 Zhou Lu, Wenhan Xia, Sanjeev Arora, Elad Hazan

Adaptive gradient methods are the method of choice for optimization in machine learning and are used to train the largest deep models.


Understanding Contrastive Learning Requires Incorporating Inductive Biases

no code implementations 28 Feb 2022 Nikunj Saunshi, Jordan Ash, Surbhi Goel, Dipendra Misra, Cyril Zhang, Sanjeev Arora, Sham Kakade, Akshay Krishnamurthy

Contrastive learning is a popular form of self-supervised learning that encourages augmentations (views) of the same input to have more similar representations compared to augmentations of different inputs.

Contrastive Learning Self-Supervised Learning

Evaluating Gradient Inversion Attacks and Defenses in Federated Learning

1 code implementation NeurIPS 2021 Yangsibo Huang, Samyak Gupta, Zhao Song, Kai Li, Sanjeev Arora

Gradient inversion attack (or input recovery from gradient) is an emerging threat to the security and privacy preservation of Federated learning, whereby malicious eavesdroppers or participants in the protocol can recover (partially) the clients' private data.

Federated Learning

On Predicting Generalization using GANs

no code implementations ICLR 2022 Yi Zhang, Arushi Gupta, Nikunj Saunshi, Sanjeev Arora

Research on generalization bounds for deep networks seeks to give ways to predict test error using just the training dataset and the network parameters.

Generalization Bounds Test

Gradient Descent on Two-layer Nets: Margin Maximization and Simplicity Bias

no code implementations NeurIPS 2021 Kaifeng Lyu, Zhiyuan Li, Runzhe Wang, Sanjeev Arora

The current paper is able to establish this global optimality for two-layer Leaky ReLU nets trained with gradient flow on linearly separable and symmetric data, regardless of the width.

Vocal Bursts Valence Prediction

What Happens after SGD Reaches Zero Loss? --A Mathematical Framework

no code implementations ICLR 2022 Zhiyuan Li, Tianhao Wang, Sanjeev Arora

Understanding the implicit bias of Stochastic Gradient Descent (SGD) is one of the key challenges in deep learning, especially for overparametrized models, where the local minimizers of the loss function $L$ can form a manifold.


New Definitions and Evaluations for Saliency Methods: Staying Intrinsic and Sound

no code implementations 29 Sep 2021 Arushi Gupta, Nikunj Saunshi, Dingli Yu, Kaifeng Lyu, Sanjeev Arora

Saliency methods seek to provide human-interpretable explanations for the output of a machine learning model on a given input.

On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)

1 code implementation NeurIPS 2021 Zhiyuan Li, Sadhika Malladi, Sanjeev Arora

It is generally recognized that finite learning rate (LR), in contrast to infinitesimal LR, is important for good generalization in real-life deep nets.

Why Are Convolutional Nets More Sample-Efficient than Fully-Connected Nets?

no code implementations ICLR 2021 Zhiyuan Li, Yi Zhang, Sanjeev Arora

However, this has not been made mathematically rigorous, and the hurdle is that the fully connected net can always simulate the convolutional net (for a fixed task).

Image Classification Inductive Bias

A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks

no code implementations ICLR 2021 Nikunj Saunshi, Sadhika Malladi, Sanjeev Arora

This paper initiates a mathematical study of this phenomenon for the downstream task of text classification by considering the following questions: (1) What is the intuitive connection between the pretraining task of next word prediction and text classification?

General Classification Language Modelling +3

Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate

no code implementations NeurIPS 2020 Zhiyuan Li, Kaifeng Lyu, Sanjeev Arora

Recent works (e.g., (Li and Arora, 2020)) suggest that the use of popular normalization schemes (including Batch Normalization) in today's deep learning can move it far from a traditional optimization viewpoint, e.g., use of exponentially increasing learning rates.

InstaHide: Instance-hiding Schemes for Private Distributed Learning

3 code implementations 6 Oct 2020 Yangsibo Huang, Zhao Song, Kai Li, Sanjeev Arora

This paper introduces InstaHide, a simple encryption of training images, which can be plugged into existing distributed deep learning pipelines.

Privacy-preserving Learning via Deep Net Pruning

no code implementations 4 Mar 2020 Yangsibo Huang, Yushan Su, Sachin Ravi, Zhao Song, Sanjeev Arora, Kai Li

This paper attempts to answer the question whether neural network pruning can be used as a tool to achieve differential privacy without losing much data utility.

Network Pruning Privacy Preserving

A Sample Complexity Separation between Non-Convex and Convex Meta-Learning

no code implementations ICML 2020 Nikunj Saunshi, Yi Zhang, Mikhail Khodak, Sanjeev Arora

In contrast, for the non-convex formulation of a two layer linear network on the same instance, we show that both Reptile and multi-task representation learning can have new task sample complexity of $\mathcal{O}(1)$, demonstrating a separation from convex meta-learning.

Meta-Learning Representation Learning

Provable Representation Learning for Imitation Learning via Bi-level Optimization

no code implementations ICML 2020 Sanjeev Arora, Simon S. Du, Sham Kakade, Yuping Luo, Nikunj Saunshi

We formulate representation learning as a bi-level optimization problem where the "outer" optimization tries to learn the joint representation and the "inner" optimization encodes the imitation learning setup and tries to learn task-specific parameters.

Imitation Learning Representation Learning

Over-parameterized Adversarial Training: An Analysis Overcoming the Curse of Dimensionality

no code implementations NeurIPS 2020 Yi Zhang, Orestis Plevrakis, Simon S. Du, Xingguo Li, Zhao Song, Sanjeev Arora

Our work proves convergence to low robust training loss for polynomial width instead of exponential, under natural assumptions and with the ReLU activation.

Enhanced Convolutional Neural Tangent Kernels

no code implementations 3 Nov 2019 Zhiyuan Li, Ruosong Wang, Dingli Yu, Simon S. Du, Wei Hu, Ruslan Salakhutdinov, Sanjeev Arora

An exact algorithm to compute CNTK (Arora et al., 2019) yielded the finding that classification accuracy of CNTK on CIFAR-10 is within 6-7% of that of the corresponding CNN architecture (best figure being around 78%), which is interesting performance for a fixed kernel.

Data Augmentation regression

An Exponential Learning Rate Schedule for Deep Learning

no code implementations ICLR 2020 Zhiyuan Li, Sanjeev Arora

This paper suggests that the phenomenon may be due to Batch Normalization or BN, which is ubiquitous and provides benefits in optimization and generalization across all standard architectures.

Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks

4 code implementations ICLR 2020 Sanjeev Arora, Simon S. Du, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang, Dingli Yu

On VOC07 testbed for few-shot image classification tasks on ImageNet with transfer learning (Goyal et al., 2019), replacing the linear SVM currently used with a Convolutional NTK SVM consistently improves performance.

Few-Shot Image Classification General Classification +3

A Simple Technique to Enable Saliency Methods to Pass the Sanity Checks

no code implementations 25 Sep 2019 Arushi Gupta, Sanjeev Arora

This involves computing saliency maps for all possible labels in the classification task, and using a simple competition among them to identify and remove less relevant pixels from the map.

Implicit Regularization in Deep Matrix Factorization

1 code implementation NeurIPS 2019 Sanjeev Arora, Nadav Cohen, Wei Hu, Yuping Luo

Efforts to understand the generalization mystery in deep learning have led to the belief that gradient-based optimization induces a form of implicit regularization, a bias towards models of low "complexity."

Matrix Completion

A Simple Saliency Method That Passes the Sanity Checks

no code implementations 27 May 2019 Arushi Gupta, Sanjeev Arora

There is great interest in "saliency methods" (also called "attribution methods"), which give "explanations" for a deep net's decision, by assigning a "score" to each feature/pixel in the input.


On Exact Computation with an Infinitely Wide Neural Net

2 code implementations NeurIPS 2019 Sanjeev Arora, Simon S. Du, Wei Hu, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang

An attraction of such ideas is that a pure kernel-based method is used to capture the power of a fully-trained deep net of infinite width.

Gaussian Processes

A Theoretical Analysis of Contrastive Unsupervised Representation Learning

no code implementations 25 Feb 2019 Sanjeev Arora, Hrishikesh Khandeparkar, Mikhail Khodak, Orestis Plevrakis, Nikunj Saunshi

This framework allows us to show provable guarantees on the performance of the learned representations on the average classification task that is comprised of a subset of the same set of latent classes.

Contrastive Learning General Classification +1

Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks

no code implementations 24 Jan 2019 Sanjeev Arora, Simon S. Du, Wei Hu, Zhiyuan Li, Ruosong Wang

This paper analyzes training and generalization for a simple 2-layer ReLU net with random initialization, and provides the following improvements over recent works: (i) Using a tighter characterization of training speed than recent papers, an explanation for why training a neural net with random labels leads to slower training, as originally observed in [Zhang et al. ICLR'17].

Theoretical Analysis of Auto Rate-Tuning by Batch Normalization

no code implementations ICLR 2019 Sanjeev Arora, Zhiyuan Li, Kaifeng Lyu

Batch Normalization (BN) has become a cornerstone of deep learning across diverse architectures, appearing to help optimization as well as generalization.

A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks

no code implementations ICLR 2019 Sanjeev Arora, Nadav Cohen, Noah Golowich, Wei Hu

We analyze speed of convergence to global optimum for gradient descent training a deep linear neural network (parameterized as $x \mapsto W_N W_{N-1} \cdots W_1 x$) by minimizing the $\ell_2$ loss over whitened data.

A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors

1 code implementation ACL 2018 Mikhail Khodak, Nikunj Saunshi, Yingyu Liang, Tengyu Ma, Brandon Stewart, Sanjeev Arora

Motivations like domain adaptation, transfer learning, and feature learning have fueled interest in inducing embeddings for rare or unseen words, n-grams, synsets, and other textual features.

Document Classification Domain Adaptation +2

An Analysis of the t-SNE Algorithm for Data Visualization

no code implementations 5 Mar 2018 Sanjeev Arora, Wei Hu, Pravesh K. Kothari

A first line of attack in exploratory data analysis is data visualization, i.e., generating a 2-dimensional representation of data that makes clusters of similar points visually identifiable.

Clustering Data Visualization +1

On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization

1 code implementation ICML 2018 Sanjeev Arora, Nadav Cohen, Elad Hazan

The effect of depth on optimization is decoupled from expressiveness by focusing on settings where additional layers amount to overparameterization - linear neural networks, a well-studied model.


Stronger generalization bounds for deep nets via a compression approach

no code implementations ICML 2018 Sanjeev Arora, Rong Ge, Behnam Neyshabur, Yi Zhang

Analysis of correctness of our compression relies upon some newly identified "noise stability" properties of trained deep nets, which are also experimentally verified.

Generalization Bounds

Towards Provable Control for Unknown Linear Dynamical Systems

no code implementations ICLR 2018 Sanjeev Arora, Elad Hazan, Holden Lee, Karan Singh, Cyril Zhang, Yi Zhang

We study the control of symmetric linear dynamical systems with unknown dynamics and a hidden state.

A Compressed Sensing View of Unsupervised Text Embeddings, Bag-of-n-Grams, and LSTMs

2 code implementations ICLR 2018 Sanjeev Arora, Mikhail Khodak, Nikunj Saunshi, Kiran Vodrahalli

We also show a surprising new property of embeddings such as GloVe and word2vec: they form a good sensing matrix for text that is more efficient than random matrices, the standard sparse recovery tool, which may explain why they lead to better representations in practice.

Do GANs learn the distribution? Some Theory and Empirics

no code implementations ICLR 2018 Sanjeev Arora, Andrej Risteski, Yi Zhang

Using this, evidence is presented that well-known GANs approaches do learn distributions of fairly low support.


Theoretical limitations of Encoder-Decoder GAN architectures

no code implementations 7 Nov 2017 Sanjeev Arora, Andrej Risteski, Yi Zhang

Encoder-decoder GANs architectures (e.g., BiGAN and ALI) seek to add an inference mechanism to the GANs setup, consisting of a small encoder deep net that maps data-points to their succinct encodings.

Do GANs actually learn the distribution? An empirical study

no code implementations 26 Jun 2017 Sanjeev Arora, Yi Zhang

Do GANs (Generative Adversarial Nets) actually learn the target distribution?


Provable benefits of representation learning

no code implementations 14 Jun 2017 Sanjeev Arora, Andrej Risteski

There is general consensus that learning representations is useful for a variety of reasons, e.g., efficient use of labeled data (semi-supervised learning), transfer learning and understanding hidden structure of data.

Clustering Representation Learning +1

Extending and Improving Wordnet via Unsupervised Word Embeddings

no code implementations 29 Apr 2017 Mikhail Khodak, Andrej Risteski, Christiane Fellbaum, Sanjeev Arora

Our methods require very few linguistic resources, thus being applicable for Wordnet construction in low-resource languages, and may further be applied to sense clustering and other Wordnet improvements.

Clustering Test +1

Automated WordNet Construction Using Word Embeddings

1 code implementation WS 2017 Mikhail Khodak, Andrej Risteski, Christiane Fellbaum, Sanjeev Arora

To evaluate our method we construct two 600-word testsets for word-to-synset matching in French and Russian using native speakers and evaluate the performance of our method along with several other recent approaches.

Information Retrieval Machine Translation +3

Generalization and Equilibrium in Generative Adversarial Nets (GANs)

1 code implementation ICML 2017 Sanjeev Arora, Rong Ge, Yingyu Liang, Tengyu Ma, Yi Zhang

We show that training of a generative adversarial network (GAN) may not have good generalization properties; e.g., training may appear successful but the trained distribution may be far from the target distribution in standard metrics.

On the ability of neural nets to express distributions

no code implementations 22 Feb 2017 Holden Lee, Rong Ge, Tengyu Ma, Andrej Risteski, Sanjeev Arora

We take a first cut at explaining the expressivity of multilayer nets by giving a sufficient criterion for a function to be approximable by a neural network with $n$ hidden layers.

Provable learning of Noisy-or Networks

no code implementations 28 Dec 2016 Sanjeev Arora, Rong Ge, Tengyu Ma, Andrej Risteski

Many machine learning applications use latent variable models to explain structure in data, whereby visible variables (= coordinates of the given datapoint) are explained as a probabilistic function of some hidden variables.

Tensor Decomposition Topic Models

Provable Algorithms for Inference in Topic Models

no code implementations 27 May 2016 Sanjeev Arora, Rong Ge, Frederic Koehler, Tengyu Ma, Ankur Moitra

But designing provable algorithms for inference has proven to be more challenging.

Topic Models

Linear Algebraic Structure of Word Senses, with Applications to Polysemy

1 code implementation TACL 2018 Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, Andrej Risteski

A novel aspect of our technique is that each extracted word sense is accompanied by one of about 2000 "discourse atoms" that gives a succinct description of which other words co-occur with that word sense.

Information Retrieval Retrieval +1

Why are deep nets reversible: A simple theory, with implications for training

no code implementations 18 Nov 2015 Sanjeev Arora, Yingyu Liang, Tengyu Ma

Under this assumption (which is experimentally tested on real-life nets like AlexNet), it is formally proved that a feed-forward net is a correct inference method for recovering the hidden layer.


Simple, Efficient, and Neural Algorithms for Sparse Coding

no code implementations 2 Mar 2015 Sanjeev Arora, Rong Ge, Tengyu Ma, Ankur Moitra

Its standard formulation is as a non-convex optimization problem which is solved in practice by heuristics based on alternating minimization.

A Latent Variable Model Approach to PMI-based Word Embeddings

4 code implementations TACL 2016 Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, Andrej Risteski

Semantic word embeddings represent the meaning of a word via a vector, and are created by diverse methods.

Word Embeddings

More Algorithms for Provable Dictionary Learning

no code implementations 3 Jan 2014 Sanjeev Arora, Aditya Bhaskara, Rong Ge, Tengyu Ma

In dictionary learning, also known as sparse coding, the algorithm is given samples of the form $y = Ax$ where $x\in \mathbb{R}^m$ is an unknown random sparse vector and $A$ is an unknown dictionary matrix in $\mathbb{R}^{n\times m}$ (usually $m > n$, which is the overcomplete case).

Dictionary Learning

Provable Bounds for Learning Some Deep Representations

no code implementations 23 Oct 2013 Sanjeev Arora, Aditya Bhaskara, Rong Ge, Tengyu Ma

The analysis of the algorithm reveals interesting structure of neural networks with random edge weights.

New Algorithms for Learning Incoherent and Overcomplete Dictionaries

no code implementations 28 Aug 2013 Sanjeev Arora, Rong Ge, Ankur Moitra

In sparse recovery we are given a matrix $A$ (the dictionary) and a vector of the form $A X$ where $X$ is sparse, and the goal is to recover $X$.

Dictionary Learning Edge Detection +1

A Practical Algorithm for Topic Modeling with Provable Guarantees

2 code implementations 19 Dec 2012 Sanjeev Arora, Rong Ge, Yoni Halpern, David Mimno, Ankur Moitra, David Sontag, Yichen Wu, Michael Zhu

Topic models provide a useful method for dimensionality reduction and exploratory data analysis in large text corpora.

Dimensionality Reduction Topic Models

Learning Topic Models - Going beyond SVD

2 code implementations 9 Apr 2012 Sanjeev Arora, Rong Ge, Ankur Moitra

Topic Modeling is an approach used for automatic comprehension and classification of data in a variety of settings, and perhaps the canonical application is in uncovering thematic structure in a corpus of documents.

Topic Models
