You need to log in to edit.

You can create a new account if you don't have one.

Or, discuss a change on Slack.

You can create a new account if you don't have one.

Or, discuss a change on Slack.

no code implementations • ICML 2020 • Yangsibo Huang, Zhao Song, Sanjeev Arora, Kai Li

The new ideas in the current paper are: (a) new variants of mixup with negative as well as positive coefficients, and extend the sample-wise mixup to be pixel-wise.

no code implementations • 26 Nov 2023 • Vedant Shah, Frederik Träuble, Ashish Malik, Hugo Larochelle, Michael Mozer, Sanjeev Arora, Yoshua Bengio, Anirudh Goyal

Machine \emph{unlearning}, which involves erasing knowledge about a \emph{forget set} from a trained model, can prove to be costly and infeasible by existing techniques.

no code implementations • 26 Oct 2023 • Dingli Yu, Simran Kaur, Arushi Gupta, Jonah Brown-Cohen, Anirudh Goyal, Sanjeev Arora

The paper develops a methodology for (a) designing and administering such an evaluation, and (b) automatic grading (plus spot-checking by humans) of the results using GPT-4 as well as the open LLaMA-2 70B model.

1 code implementation • 22 Oct 2023 • Xinran Gu, Kaifeng Lyu, Sanjeev Arora, Jingzhao Zhang, Longbo Huang

In distributed deep learning with data parallelism, synchronizing gradients at each training step can cause a huge communication overhead, especially when many nodes work together to train large models.

no code implementations • 29 Jul 2023 • Sanjeev Arora, Anirudh Goyal

Contributions include: (a) A statistical framework that relates cross-entropy loss of LLMs to competence on the basic skills that underlie language tasks.

1 code implementation • 3 Jul 2023 • Abhishek Panigrahi, Sadhika Malladi, Mengzhou Xia, Sanjeev Arora

In this work, we propose an efficient construction, Transformer in Transformer (in short, TinT), that allows a transformer to simulate and fine-tune complex models internally during inference (e. g., pre-trained language models).

no code implementations • 14 Mar 2023 • Haoyu Zhao, Abhishek Panigrahi, Rong Ge, Sanjeev Arora

We also show that the Inside-Outside algorithm is optimal for masked language modeling loss on the PCFG-generated data.

1 code implementation • 2 Mar 2023 • Xinran Gu, Kaifeng Lyu, Longbo Huang, Sanjeev Arora

Local SGD is a communication-efficient variant of SGD for large-scale training, where multiple GPUs perform SGD independently and average the model parameters periodically.

1 code implementation • 13 Feb 2023 • Abhishek Panigrahi, Nikunj Saunshi, Haoyu Zhao, Sanjeev Arora

Given the downstream task and a model fine-tuned on that task, a simple optimization is used to identify a very small subset of parameters ($\sim0. 01$% of model parameters) responsible for ($>95$%) of the model's performance, in the sense that grafting the fine-tuned values for just this tiny subset onto the pre-trained model gives performance almost as well as the fine-tuned model.

1 code implementation • 5 Nov 2022 • Arushi Gupta, Nikunj Saunshi, Dingli Yu, Kaifeng Lyu, Sanjeev Arora

Saliency methods compute heat maps that highlight portions of an input that were most {\em important} for the label assigned to it by a deep net.

1 code implementation • 11 Oct 2022 • Sadhika Malladi, Alexander Wettig, Dingli Yu, Danqi Chen, Sanjeev Arora

It has become standard to solve NLP tasks by fine-tuning pre-trained language models (LMs), especially in low-data settings.

no code implementations • 3 Oct 2022 • Nikunj Saunshi, Arushi Gupta, Mark Braverman, Sanjeev Arora

Influence functions estimate effect of individual data points on predictions of the model on test data and were adapted to deep learning in Koh and Liang [2017].

no code implementations • 8 Jul 2022 • Zhiyuan Li, Tianhao Wang, JasonD. Lee, Sanjeev Arora

Conversely, continuous mirror descent with any Legendre function can be viewed as gradient flow with a related commuting parametrization.

no code implementations • 14 Jun 2022 • Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora

Normalization layers (e. g., Batch Normalization, Layer Normalization) were introduced to help with optimization difficulties in very deep nets, but they clearly also help generalization, even in not-so-deep nets.

1 code implementation • 20 May 2022 • Sadhika Malladi, Kaifeng Lyu, Abhishek Panigrahi, Sanjeev Arora

Approximating Stochastic Gradient Descent (SGD) as a Stochastic Differential Equation (SDE) has allowed researchers to enjoy the benefits of studying a continuous optimization trajectory while carefully preserving the stochasticity of SGD.

no code implementations • 19 May 2022 • Sanjeev Arora, Zhiyuan Li, Abhishek Panigrahi

The current paper mathematically analyzes a new mechanism of implicit regularization in the EoS phase, whereby GD updates due to non-smooth loss landscape turn out to evolve along some deterministic flow on the manifold of minimum loss.

no code implementations • 2 Mar 2022 • Zhou Lu, Wenhan Xia, Sanjeev Arora, Elad Hazan

Adaptive gradient methods are the method of choice for optimization in machine learning and used to train the largest deep models.

no code implementations • 28 Feb 2022 • Nikunj Saunshi, Jordan Ash, Surbhi Goel, Dipendra Misra, Cyril Zhang, Sanjeev Arora, Sham Kakade, Akshay Krishnamurthy

Contrastive learning is a popular form of self-supervised learning that encourages augmentations (views) of the same input to have more similar representations compared to augmentations of different inputs.

1 code implementation • NeurIPS 2021 • Yangsibo Huang, Samyak Gupta, Zhao Song, Kai Li, Sanjeev Arora

Gradient inversion attack (or input recovery from gradient) is an emerging threat to the security and privacy preservation of Federated learning, whereby malicious eavesdroppers or participants in the protocol can recover (partially) the clients' private data.

no code implementations • ICLR 2022 • Yi Zhang, Arushi Gupta, Nikunj Saunshi, Sanjeev Arora

Research on generalization bounds for deep networks seeks to give ways to predict test error using just the training dataset and the network parameters.

no code implementations • NeurIPS 2021 • Kaifeng Lyu, Zhiyuan Li, Runzhe Wang, Sanjeev Arora

The current paper is able to establish this global optimality for two-layer Leaky ReLU nets trained with gradient flow on linearly separable and symmetric data, regardless of the width.

no code implementations • ICLR 2022 • Zhiyuan Li, Tianhao Wang, Sanjeev Arora

Understanding the implicit bias of Stochastic Gradient Descent (SGD) is one of the key challenges in deep learning, especially for overparametrized models, where the local minimizers of the loss function $L$ can form a manifold.

no code implementations • 29 Sep 2021 • Arushi Gupta, Nikunj Saunshi, Dingli Yu, Kaifeng Lyu, Sanjeev Arora

Saliency methods seek to provide human-interpretable explanations for the output of machine learning model on a given input.

no code implementations • 25 Feb 2021 • Sanjeev Arora, Yi Zhang

Traditional statistics forbids use of test data (a. k. a.

1 code implementation • NeurIPS 2021 • Zhiyuan Li, Sadhika Malladi, Sanjeev Arora

It is generally recognized that finite learning rate (LR), in contrast to infinitesimal LR, is important for good generalization in real-life deep nets.

no code implementations • ICLR 2021 • Zhiyuan Li, Yi Zhang, Sanjeev Arora

However, this has not been made mathematically rigorous, and the hurdle is that the fully connected net can always simulate the convolutional net (for a fixed task).

1 code implementation • Findings of the Association for Computational Linguistics 2020 • Yangsibo Huang, Zhao Song, Danqi Chen, Kai Li, Sanjeev Arora

In addition, TextHide fits well with the popular framework of fine-tuning pre-trained language models (e. g., BERT) for any sentence or sentence-pair task.

no code implementations • ICLR 2021 • Nikunj Saunshi, Sadhika Malladi, Sanjeev Arora

This paper initiates a mathematical study of this phenomenon for the downstream task of text classification by considering the following questions: (1) What is the intuitive connection between the pretraining task of next word prediction and text classification?

no code implementations • NeurIPS 2020 • Zhiyuan Li, Kaifeng Lyu, Sanjeev Arora

Recent works (e. g., (Li and Arora, 2020)) suggest that the use of popular normalization schemes (including Batch Normalization) in today's deep learning can move it far from a traditional optimization viewpoint, e. g., use of exponentially increasing learning rates.

3 code implementations • 6 Oct 2020 • Yangsibo Huang, Zhao Song, Kai Li, Sanjeev Arora

This paper introduces InstaHide, a simple encryption of training images, which can be plugged into existing distributed deep learning pipelines.

no code implementations • 4 Mar 2020 • Yangsibo Huang, Yushan Su, Sachin Ravi, Zhao Song, Sanjeev Arora, Kai Li

This paper attempts to answer the question whether neural network pruning can be used as a tool to achieve differential privacy without losing much data utility.

no code implementations • ICML 2020 • Nikunj Saunshi, Yi Zhang, Mikhail Khodak, Sanjeev Arora

In contrast, for the non-convex formulation of a two layer linear network on the same instance, we show that both Reptile and multi-task representation learning can have new task sample complexity of $\mathcal{O}(1)$, demonstrating a separation from convex meta-learning.

no code implementations • ICML 2020 • Sanjeev Arora, Simon S. Du, Sham Kakade, Yuping Luo, Nikunj Saunshi

We formulate representation learning as a bi-level optimization problem where the "outer" optimization tries to learn the joint representation and the "inner" optimization encodes the imitation learning setup and tries to learn task-specific parameters.

no code implementations • NeurIPS 2020 • Yi Zhang, Orestis Plevrakis, Simon S. Du, Xingguo Li, Zhao Song, Sanjeev Arora

Our work proves convergence to low robust training loss for \emph{polynomial} width instead of exponential, under natural assumptions and with the ReLU activation.

no code implementations • 3 Nov 2019 • Zhiyuan Li, Ruosong Wang, Dingli Yu, Simon S. Du, Wei Hu, Ruslan Salakhutdinov, Sanjeev Arora

An exact algorithm to compute CNTK (Arora et al., 2019) yielded the finding that classification accuracy of CNTK on CIFAR-10 is within 6-7% of that of that of the corresponding CNN architecture (best figure being around 78%) which is interesting performance for a fixed kernel.

no code implementations • ICLR 2020 • Zhiyuan Li, Sanjeev Arora

This paper suggests that the phenomenon may be due to Batch Normalization or BN, which is ubiquitous and provides benefits in optimization and generalization across all standard architectures.

4 code implementations • ICLR 2020 • Sanjeev Arora, Simon S. Du, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang, Dingli Yu

On VOC07 testbed for few-shot image classification tasks on ImageNet with transfer learning (Goyal et al., 2019), replacing the linear SVM currently used with a Convolutional NTK SVM consistently improves performance.

no code implementations • 25 Sep 2019 • Arushi Gupta, Sanjeev Arora

This involves computing saliency maps for all possible labels in the classification task, and using a simple competition among them to identify and remove less relevant pixels from the map.

1 code implementation • NeurIPS 2019 • Rohith Kuditipudi, Xiang Wang, Holden Lee, Yi Zhang, Zhiyuan Li, Wei Hu, Sanjeev Arora, Rong Ge

Mode connectivity is a surprising phenomenon in the loss landscape of deep nets.

1 code implementation • NeurIPS 2019 • Sanjeev Arora, Nadav Cohen, Wei Hu, Yuping Luo

Efforts to understand the generalization mystery in deep learning have led to the belief that gradient-based optimization induces a form of implicit regularization, a bias towards models of low "complexity."

no code implementations • 27 May 2019 • Arushi Gupta, Sanjeev Arora

There is great interest in "saliency methods" (also called "attribution methods"), which give "explanations" for a deep net's decision, by assigning a "score" to each feature/pixel in the input.

2 code implementations • NeurIPS 2019 • Sanjeev Arora, Simon S. Du, Wei Hu, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang

An attraction of such ideas is that a pure kernel-based method is used to capture the power of a fully-trained deep net of infinite width.

no code implementations • 25 Feb 2019 • Sanjeev Arora, Hrishikesh Khandeparkar, Mikhail Khodak, Orestis Plevrakis, Nikunj Saunshi

This framework allows us to show provable guarantees on the performance of the learned representations on the average classification task that is comprised of a subset of the same set of latent classes.

no code implementations • 24 Jan 2019 • Sanjeev Arora, Simon S. Du, Wei Hu, Zhiyuan Li, Ruosong Wang

This paper analyzes training and generalization for a simple 2-layer ReLU net with random initialization, and provides the following improvements over recent works: (i) Using a tighter characterization of training speed than recent papers, an explanation for why training a neural net with random labels leads to slower training, as originally observed in [Zhang et al. ICLR'17].

no code implementations • ICLR 2019 • Sanjeev Arora, Zhiyuan Li, Kaifeng Lyu

Batch Normalization (BN) has become a cornerstone of deep learning across diverse architectures, appearing to help optimization as well as generalization.

no code implementations • ICLR 2019 • Sanjeev Arora, Nadav Cohen, Noah Golowich, Wei Hu

We analyze speed of convergence to global optimum for gradient descent training a deep linear neural network (parameterized as $x \mapsto W_N W_{N-1} \cdots W_1 x$) by minimizing the $\ell_2$ loss over whitened data.

1 code implementation • ACL 2018 • Mikhail Khodak, Nikunj Saunshi, YIngyu Liang, Tengyu Ma, Brandon Stewart, Sanjeev Arora

Motivations like domain adaptation, transfer learning, and feature learning have fueled interest in inducing embeddings for rare or unseen words, n-grams, synsets, and other textual features.

Ranked #3 on Sentiment Analysis on MPQA

no code implementations • 5 Mar 2018 • Sanjeev Arora, Wei Hu, Pravesh K. Kothari

A first line of attack in exploratory data analysis is data visualization, i. e., generating a 2-dimensional representation of data that makes clusters of similar points visually identifiable.

1 code implementation • ICML 2018 • Sanjeev Arora, Nadav Cohen, Elad Hazan

The effect of depth on optimization is decoupled from expressiveness by focusing on settings where additional layers amount to overparameterization - linear neural networks, a well-studied model.

no code implementations • ICML 2018 • Sanjeev Arora, Rong Ge, Behnam Neyshabur, Yi Zhang

Analysis of correctness of our compression relies upon some newly identified \textquotedblleft noise stability\textquotedblright properties of trained deep nets, which are also experimentally verified.

no code implementations • ICLR 2018 • Sanjeev Arora, Elad Hazan, Holden Lee, Karan Singh, Cyril Zhang, Yi Zhang

We study the control of symmetric linear dynamical systems with unknown dynamics and a hidden state.

2 code implementations • ICLR 2018 • Sanjeev Arora, Mikhail Khodak, Nikunj Saunshi, Kiran Vodrahalli

We also show a surprising new property of embeddings such as GloVe and word2vec: they form a good sensing matrix for text that is more efficient than random matrices, the standard sparse recovery tool, which may explain why they lead to better representations in practice.

no code implementations • ICLR 2018 • Sanjeev Arora, Andrej Risteski, Yi Zhang

Using this evidence is presented that well-known GANs approaches do learn distributions of fairly low support.

no code implementations • 7 Nov 2017 • Sanjeev Arora, Andrej Risteski, Yi Zhang

Encoder-decoder GANs architectures (e. g., BiGAN and ALI) seek to add an inference mechanism to the GANs setup, consisting of a small encoder deep net that maps data-points to their succinct encodings.

no code implementations • 26 Jun 2017 • Sanjeev Arora, Yi Zhang

Do GANS (Generative Adversarial Nets) actually learn the target distribution?

no code implementations • 14 Jun 2017 • Sanjeev Arora, Andrej Risteski

There is general consensus that learning representations is useful for a variety of reasons, e. g. efficient use of labeled data (semi-supervised learning), transfer learning and understanding hidden structure of data.

no code implementations • 29 Apr 2017 • Mikhail Khodak, Andrej Risteski, Christiane Fellbaum, Sanjeev Arora

Our methods require very few linguistic resources, thus being applicable for Wordnet construction in low-resources languages, and may further be applied to sense clustering and other Wordnet improvements.

1 code implementation • WS 2017 • Mikhail Khodak, Andrej Risteski, Christiane Fellbaum, Sanjeev Arora

To evaluate our method we construct two 600-word testsets for word-to-synset matching in French and Russian using native speakers and evaluate the performance of our method along with several other recent approaches.

1 code implementation • ICML 2017 • Sanjeev Arora, Rong Ge, YIngyu Liang, Tengyu Ma, Yi Zhang

We show that training of generative adversarial network (GAN) may not have good generalization properties; e. g., training may appear successful but the trained distribution may be far from target distribution in standard metrics.

no code implementations • 22 Feb 2017 • Holden Lee, Rong Ge, Tengyu Ma, Andrej Risteski, Sanjeev Arora

We take a first cut at explaining the expressivity of multilayer nets by giving a sufficient criterion for a function to be approximable by a neural network with $n$ hidden layers.

no code implementations • 28 Dec 2016 • Sanjeev Arora, Rong Ge, Tengyu Ma, Andrej Risteski

Many machine learning applications use latent variable models to explain structure in data, whereby visible variables (= coordinates of the given datapoint) are explained as a probabilistic function of some hidden variables.

1 code implementation • 13 Oct 2016 • Kiran Vodrahalli, Po-Hsuan Chen, YIngyu Liang, Christopher Baldassano, Janice Chen, Esther Yong, Christopher Honey, Uri Hasson, Peter Ramadge, Ken Norman, Sanjeev Arora

Several research groups have shown how to correlate fMRI responses to the meanings of presented stimuli.

no code implementations • 27 May 2016 • Sanjeev Arora, Rong Ge, Frederic Koehler, Tengyu Ma, Ankur Moitra

But designing provable algorithms for inference has proven to be more challenging.

1 code implementation • TACL 2018 • Sanjeev Arora, Yuanzhi Li, YIngyu Liang, Tengyu Ma, Andrej Risteski

A novel aspect of our technique is that each extracted word sense is accompanied by one of about 2000 "discourse atoms" that gives a succinct description of which other words co-occur with that word sense.

no code implementations • 18 Nov 2015 • Sanjeev Arora, YIngyu Liang, Tengyu Ma

Under this assumption ---which is experimentally tested on real-life nets like AlexNet--- it is formally proved that feed forward net is a correct inference method for recovering the hidden layer.

no code implementations • 2 Mar 2015 • Sanjeev Arora, Rong Ge, Tengyu Ma, Ankur Moitra

Its standard formulation is as a non-convex optimization problem which is solved in practice by heuristics based on alternating minimization.

4 code implementations • TACL 2016 • Sanjeev Arora, Yuanzhi Li, YIngyu Liang, Tengyu Ma, Andrej Risteski

Semantic word embeddings represent the meaning of a word via a vector, and are created by diverse methods.

no code implementations • 3 Jan 2014 • Sanjeev Arora, Aditya Bhaskara, Rong Ge, Tengyu Ma

In dictionary learning, also known as sparse coding, the algorithm is given samples of the form $y = Ax$ where $x\in \mathbb{R}^m$ is an unknown random sparse vector and $A$ is an unknown dictionary matrix in $\mathbb{R}^{n\times m}$ (usually $m > n$, which is the overcomplete case).

no code implementations • 23 Oct 2013 • Sanjeev Arora, Aditya Bhaskara, Rong Ge, Tengyu Ma

The analysis of the algorithm reveals interesting structure of neural networks with random edge weights.

no code implementations • 28 Aug 2013 • Sanjeev Arora, Rong Ge, Ankur Moitra

In sparse recovery we are given a matrix $A$ (the dictionary) and a vector of the form $A X$ where $X$ is sparse, and the goal is to recover $X$.

2 code implementations • 19 Dec 2012 • Sanjeev Arora, Rong Ge, Yoni Halpern, David Mimno, Ankur Moitra, David Sontag, Yichen Wu, Michael Zhu

Topic models provide a useful method for dimensionality reduction and exploratory data analysis in large text corpora.

no code implementations • NeurIPS 2012 • Sanjeev Arora, Rong Ge, Ankur Moitra, Sushant Sachdeva

We present a new algorithm for Independent Component Analysis (ICA) which has provable performance guarantees.

2 code implementations • 9 Apr 2012 • Sanjeev Arora, Rong Ge, Ankur Moitra

Topic Modeling is an approach used for automatic comprehension and classification of data in a variety of settings, and perhaps the canonical application is in uncovering thematic structure in a corpus of documents.

Cannot find the paper you are looking for? You can
Submit a new open access paper.

Contact us on:
hello@paperswithcode.com
.
Papers With Code is a free resource with all data licensed under CC-BY-SA.