Search Results for author: Sanjeev Arora

Found 77 papers, 29 papers with code

Instance-hiding Schemes for Private Distributed Learning

no code implementations ICML 2020 Yangsibo Huang, Zhao Song, Sanjeev Arora, Kai Li

The new ideas in the current paper are: (a) new variants of mixup with negative as well as positive coefficients, and (b) an extension of sample-wise mixup to pixel-wise mixup.

Federated Learning
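
As a rough illustration of the mixup variant summarized above (coefficients that may be negative as well as positive, applied pixel-wise rather than sample-wise), here is a minimal NumPy sketch; the function name, the Dirichlet weights, and the sign pattern are my own illustrative choices, not the authors' implementation:

```python
import numpy as np

def signed_pixelwise_mixup(images, rng=None):
    """Blend a batch of images with random coefficients whose signs may be
    negative, chosen independently per pixel (a pixel-wise variant of mixup).

    `images` has shape (k, H, W, C); the k inputs are mixed into one output.
    """
    rng = np.random.default_rng(rng)
    k, h, w, c = images.shape
    lam = rng.dirichlet(np.ones(k), size=(h, w, c))   # per-pixel convex weights, shape (H, W, C, k)
    lam = np.moveaxis(lam, -1, 0)                     # reorder to (k, H, W, C)
    signs = rng.choice([-1.0, 1.0], size=(k, h, w, c))
    return np.sum(signs * lam * images, axis=0)

# Toy usage: mix one "private" image with three random "public" images.
batch = np.random.rand(4, 32, 32, 3)
mixed = signed_pixelwise_mixup(batch, rng=0)
print(mixed.shape)  # (32, 32, 3)
```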

LESS: Selecting Influential Data for Targeted Instruction Tuning

1 code implementation 6 Feb 2024 Mengzhou Xia, Sadhika Malladi, Suchin Gururangan, Sanjeev Arora, Danqi Chen

Instruction tuning has unlocked powerful capabilities in large language models (LLMs), effectively using combined datasets to develop general-purpose chatbots.

Unlearning via Sparse Representations

no code implementations 26 Nov 2023 Vedant Shah, Frederik Träuble, Ashish Malik, Hugo Larochelle, Michael Mozer, Sanjeev Arora, Yoshua Bengio, Anirudh Goyal

Machine unlearning, which involves erasing knowledge about a "forget set" from a trained model, can prove to be costly and infeasible with existing techniques.

Knowledge Distillation

Skill-Mix: a Flexible and Expandable Family of Evaluations for AI models

no code implementations 26 Oct 2023 Dingli Yu, Simran Kaur, Arushi Gupta, Jonah Brown-Cohen, Anirudh Goyal, Sanjeev Arora

The paper develops a methodology for (a) designing and administering such an evaluation, and (b) automatic grading (plus spot-checking by humans) of the results using GPT-4 as well as the open LLaMA-2 70B model.

A Quadratic Synchronization Rule for Distributed Deep Learning

1 code implementation 22 Oct 2023 Xinran Gu, Kaifeng Lyu, Sanjeev Arora, Jingzhao Zhang, Longbo Huang

In distributed deep learning with data parallelism, synchronizing gradients at each training step can cause a huge communication overhead, especially when many nodes work together to train large models.

A Theory for Emergence of Complex Skills in Language Models

no code implementations 29 Jul 2023 Sanjeev Arora, Anirudh Goyal

Contributions include: (a) A statistical framework that relates cross-entropy loss of LLMs to competence on the basic skills that underlie language tasks.

Inductive Bias

Trainable Transformer in Transformer

1 code implementation 3 Jul 2023 Abhishek Panigrahi, Sadhika Malladi, Mengzhou Xia, Sanjeev Arora

In this work, we propose an efficient construction, Transformer in Transformer (in short, TinT), that allows a transformer to simulate and fine-tune complex models internally during inference (e.g., pre-trained language models).

Attribute In-Context Learning +1

Fine-Tuning Language Models with Just Forward Passes

2 code implementations NeurIPS 2023 Sadhika Malladi, Tianyu Gao, Eshaan Nichani, Alex Damian, Jason D. Lee, Danqi Chen, Sanjeev Arora

Fine-tuning language models (LMs) has yielded success on diverse downstream tasks, but as LMs grow in size, backpropagation requires a prohibitively large amount of memory.

In-Context Learning Multiple-choice
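
To make "fine-tuning with just forward passes" concrete, below is a toy zeroth-order sketch: perturb the parameters along a random direction, evaluate the loss twice, and form a finite-difference gradient estimate, so no backpropagation is needed. The linear model, step sizes, and function names are hypothetical stand-ins, not the paper's method or code:

```python
import numpy as np

def loss(theta, X, y):
    """Squared loss of a linear model; stands in for an arbitrary forward pass."""
    return float(np.mean((X @ theta - y) ** 2))

def two_point_gradient_estimate(theta, X, y, eps=1e-3, rng=None):
    """Zeroth-order estimate: perturb parameters along a random direction z,
    evaluate the loss twice, and use the finite difference along z."""
    rng = np.random.default_rng(rng)
    z = rng.standard_normal(theta.shape)
    g = (loss(theta + eps * z, X, y) - loss(theta - eps * z, X, y)) / (2 * eps)
    return g * z  # projected-gradient estimate; only forward passes were used

rng = np.random.default_rng(0)
X, y = rng.standard_normal((64, 10)), rng.standard_normal(64)
theta = np.zeros(10)
for step in range(200):
    theta -= 0.05 * two_point_gradient_estimate(theta, X, y, rng=step)
print(round(loss(theta, X, y), 3))
```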

Do Transformers Parse while Predicting the Masked Word?

no code implementations 14 Mar 2023 Haoyu Zhao, Abhishek Panigrahi, Rong Ge, Sanjeev Arora

We also show that the Inside-Outside algorithm is optimal for masked language modeling loss on the PCFG-generated data.

Constituency Parsing Language Modelling +1

Why (and When) does Local SGD Generalize Better than SGD?

1 code implementation 2 Mar 2023 Xinran Gu, Kaifeng Lyu, Longbo Huang, Sanjeev Arora

Local SGD is a communication-efficient variant of SGD for large-scale training, where multiple GPUs perform SGD independently and average the model parameters periodically.
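
A minimal NumPy simulation of the scheme described above: each worker runs SGD independently for a fixed number of local steps, after which the model parameters are averaged. The quadratic objective, worker count, and averaging period are arbitrary toy choices:

```python
import numpy as np

rng = np.random.default_rng(0)
num_workers, local_steps, rounds, lr = 4, 8, 25, 0.1
X, y = rng.standard_normal((256, 5)), rng.standard_normal(256)
workers = [np.zeros(5) for _ in range(num_workers)]

for _ in range(rounds):
    for w in range(num_workers):
        for _ in range(local_steps):              # independent local SGD steps
            i = rng.integers(0, len(X), size=32)
            grad = 2 * X[i].T @ (X[i] @ workers[w] - y[i]) / len(i)
            workers[w] = workers[w] - lr * grad
    avg = np.mean(workers, axis=0)                # periodic parameter averaging
    workers = [avg.copy() for _ in range(num_workers)]

print(np.round(avg, 3))
```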

Task-Specific Skill Localization in Fine-tuned Language Models

1 code implementation 13 Feb 2023 Abhishek Panigrahi, Nikunj Saunshi, Haoyu Zhao, Sanjeev Arora

Given the downstream task and a model fine-tuned on that task, a simple optimization is used to identify a very small subset of parameters ($\sim0.01$% of model parameters) responsible for ($>95$%) of the model's performance, in the sense that grafting the fine-tuned values for just this tiny subset onto the pre-trained model gives performance almost as good as the fine-tuned model.

Continual Learning
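
The grafting step mentioned above can be pictured as the small sketch below: copy the fine-tuned values onto the pre-trained parameters only where a sparse mask is set. The random mask here merely stands in for the subset the paper identifies via optimization:

```python
import numpy as np

def graft(pretrained, finetuned, mask):
    """Return parameters equal to `pretrained`, except on masked entries,
    which take their fine-tuned values."""
    return {name: np.where(mask[name], finetuned[name], pretrained[name])
            for name in pretrained}

# Toy example with a single weight matrix and a random ~0.01% mask.
rng = np.random.default_rng(0)
pre = {"w": rng.standard_normal((1000, 1000))}
ft = {"w": pre["w"] + 0.01 * rng.standard_normal((1000, 1000))}
mask = {"w": rng.random((1000, 1000)) < 1e-4}   # stand-in for the learned subset
grafted = graft(pre, ft, mask)
print(int(mask["w"].sum()), "parameters grafted")
```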

New Definitions and Evaluations for Saliency Methods: Staying Intrinsic, Complete and Sound

1 code implementation 5 Nov 2022 Arushi Gupta, Nikunj Saunshi, Dingli Yu, Kaifeng Lyu, Sanjeev Arora

Saliency methods compute heat maps that highlight portions of an input that were most important for the label assigned to it by a deep net.

A Kernel-Based View of Language Model Fine-Tuning

1 code implementation 11 Oct 2022 Sadhika Malladi, Alexander Wettig, Dingli Yu, Danqi Chen, Sanjeev Arora

It has become standard to solve NLP tasks by fine-tuning pre-trained language models (LMs), especially in low-data settings.

Language Modelling

Understanding Influence Functions and Datamodels via Harmonic Analysis

no code implementations 3 Oct 2022 Nikunj Saunshi, Arushi Gupta, Mark Braverman, Sanjeev Arora

Influence functions estimate the effect of individual data points on the model's predictions on test data, and were adapted to deep learning by Koh and Liang [2017].

Data Poisoning

Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent

no code implementations 8 Jul 2022 Zhiyuan Li, Tianhao Wang, Jason D. Lee, Sanjeev Arora

Conversely, continuous mirror descent with any Legendre function can be viewed as gradient flow with a related commuting parametrization.

Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction

no code implementations 14 Jun 2022 Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora

Normalization layers (e.g., Batch Normalization, Layer Normalization) were introduced to help with optimization difficulties in very deep nets, but they clearly also help generalization, even in not-so-deep nets.

On the SDEs and Scaling Rules for Adaptive Gradient Algorithms

1 code implementation 20 May 2022 Sadhika Malladi, Kaifeng Lyu, Abhishek Panigrahi, Sanjeev Arora

Approximating Stochastic Gradient Descent (SGD) as a Stochastic Differential Equation (SDE) has allowed researchers to enjoy the benefits of studying a continuous optimization trajectory while carefully preserving the stochasticity of SGD.
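
As I understand this line of work, the practical corollary is a square-root scaling rule for adaptive optimizers (scale the learning rate by the square root of the batch-size ratio, versus the familiar linear rule for SGD); the helper below only illustrates that arithmetic and should be checked against the paper before use:

```python
def scaled_learning_rate(base_lr, base_batch, new_batch, optimizer="adam"):
    """Heuristic hyperparameter transfer when the batch size changes by kappa.

    Assumption (to verify against the paper): SGD uses linear scaling
    (lr * kappa), while adaptive methods such as Adam use square-root
    scaling (lr * sqrt(kappa)).
    """
    kappa = new_batch / base_batch
    return base_lr * (kappa if optimizer == "sgd" else kappa ** 0.5)

print(scaled_learning_rate(1e-4, 32, 512))          # Adam-style: 1e-4 * 4
print(scaled_learning_rate(0.1, 32, 512, "sgd"))    # SGD: 0.1 * 16
```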

Understanding Gradient Descent on Edge of Stability in Deep Learning

no code implementations 19 May 2022 Sanjeev Arora, Zhiyuan Li, Abhishek Panigrahi

The current paper mathematically analyzes a new mechanism of implicit regularization in the EoS phase, whereby GD updates due to non-smooth loss landscape turn out to evolve along some deterministic flow on the manifold of minimum loss.

Adaptive Gradient Methods with Local Guarantees

no code implementations 2 Mar 2022 Zhou Lu, Wenhan Xia, Sanjeev Arora, Elad Hazan

Adaptive gradient methods are the method of choice for optimization in machine learning and are used to train the largest deep models.

Benchmarking

Understanding Contrastive Learning Requires Incorporating Inductive Biases

no code implementations 28 Feb 2022 Nikunj Saunshi, Jordan Ash, Surbhi Goel, Dipendra Misra, Cyril Zhang, Sanjeev Arora, Sham Kakade, Akshay Krishnamurthy

Contrastive learning is a popular form of self-supervised learning that encourages augmentations (views) of the same input to have more similar representations compared to augmentations of different inputs.

Contrastive Learning Self-Supervised Learning

Evaluating Gradient Inversion Attacks and Defenses in Federated Learning

1 code implementation NeurIPS 2021 Yangsibo Huang, Samyak Gupta, Zhao Song, Kai Li, Sanjeev Arora

Gradient inversion attack (or input recovery from gradient) is an emerging threat to the security and privacy preservation of Federated learning, whereby malicious eavesdroppers or participants in the protocol can recover (partially) the clients' private data.

Federated Learning
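
A compact sketch of the gradient-matching idea behind such attacks: starting from a random dummy input, optimize it so that the gradient it induces matches the gradient shared by a client. The linear model, the assumption that the label is known, and the optimizer settings are toy simplifications, not the benchmark's configuration:

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(20, 2)
loss_fn = torch.nn.CrossEntropyLoss()

# The "client" computes a gradient on its private example and shares it.
x_true = torch.randn(1, 20)
y_true = torch.tensor([1])
true_grads = torch.autograd.grad(loss_fn(model(x_true), y_true), model.parameters())

# The eavesdropper optimizes a dummy input so its gradient matches the shared one.
x_dummy = torch.randn(1, 20, requires_grad=True)
opt = torch.optim.Adam([x_dummy], lr=0.1)
for _ in range(300):
    opt.zero_grad()
    dummy_grads = torch.autograd.grad(loss_fn(model(x_dummy), y_true),
                                      model.parameters(), create_graph=True)
    match = sum(((dg - tg) ** 2).sum() for dg, tg in zip(dummy_grads, true_grads))
    match.backward()
    opt.step()

# Reconstruction error; small values indicate (partial) recovery of the input.
print(torch.norm(x_dummy.detach() - x_true).item())
```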

On Predicting Generalization using GANs

no code implementations ICLR 2022 Yi Zhang, Arushi Gupta, Nikunj Saunshi, Sanjeev Arora

Research on generalization bounds for deep networks seeks to give ways to predict test error using just the training dataset and the network parameters.

Generalization Bounds Generative Adversarial Network

Gradient Descent on Two-layer Nets: Margin Maximization and Simplicity Bias

no code implementations NeurIPS 2021 Kaifeng Lyu, Zhiyuan Li, Runzhe Wang, Sanjeev Arora

The current paper is able to establish this global optimality for two-layer Leaky ReLU nets trained with gradient flow on linearly separable and symmetric data, regardless of the width.

Vocal Bursts Valence Prediction

What Happens after SGD Reaches Zero Loss? --A Mathematical Framework

no code implementations ICLR 2022 Zhiyuan Li, Tianhao Wang, Sanjeev Arora

Understanding the implicit bias of Stochastic Gradient Descent (SGD) is one of the key challenges in deep learning, especially for overparametrized models, where the local minimizers of the loss function $L$ can form a manifold.

valid

New Definitions and Evaluations for Saliency Methods: Staying Intrinsic and Sound

no code implementations 29 Sep 2021 Arushi Gupta, Nikunj Saunshi, Dingli Yu, Kaifeng Lyu, Sanjeev Arora

Saliency methods seek to provide human-interpretable explanations for the output of a machine learning model on a given input.

On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)

1 code implementation NeurIPS 2021 Zhiyuan Li, Sadhika Malladi, Sanjeev Arora

It is generally recognized that finite learning rate (LR), in contrast to infinitesimal LR, is important for good generalization in real-life deep nets.

Why Are Convolutional Nets More Sample-Efficient than Fully-Connected Nets?

no code implementations ICLR 2021 Zhiyuan Li, Yi Zhang, Sanjeev Arora

However, this has not been made mathematically rigorous, and the hurdle is that the fully connected net can always simulate the convolutional net (for a fixed task).

Image Classification Inductive Bias

A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks

no code implementations ICLR 2021 Nikunj Saunshi, Sadhika Malladi, Sanjeev Arora

This paper initiates a mathematical study of this phenomenon for the downstream task of text classification by considering the following questions: (1) What is the intuitive connection between the pretraining task of next word prediction and text classification?

General Classification Language Modelling +4

Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate

no code implementations NeurIPS 2020 Zhiyuan Li, Kaifeng Lyu, Sanjeev Arora

Recent works (e.g., Li and Arora, 2020) suggest that the use of popular normalization schemes (including Batch Normalization) in today's deep learning can move it far from a traditional optimization viewpoint, e.g., use of exponentially increasing learning rates.

InstaHide: Instance-hiding Schemes for Private Distributed Learning

3 code implementations 6 Oct 2020 Yangsibo Huang, Zhao Song, Kai Li, Sanjeev Arora

This paper introduces InstaHide, a simple encryption of training images, which can be plugged into existing distributed deep learning pipelines.

Privacy-preserving Learning via Deep Net Pruning

no code implementations 4 Mar 2020 Yangsibo Huang, Yushan Su, Sachin Ravi, Zhao Song, Sanjeev Arora, Kai Li

This paper attempts to answer the question whether neural network pruning can be used as a tool to achieve differential privacy without losing much data utility.

Network Pruning Privacy Preserving

A Sample Complexity Separation between Non-Convex and Convex Meta-Learning

no code implementations ICML 2020 Nikunj Saunshi, Yi Zhang, Mikhail Khodak, Sanjeev Arora

In contrast, for the non-convex formulation of a two layer linear network on the same instance, we show that both Reptile and multi-task representation learning can have new task sample complexity of $\mathcal{O}(1)$, demonstrating a separation from convex meta-learning.

Meta-Learning Representation Learning

Provable Representation Learning for Imitation Learning via Bi-level Optimization

no code implementations ICML 2020 Sanjeev Arora, Simon S. Du, Sham Kakade, Yuping Luo, Nikunj Saunshi

We formulate representation learning as a bi-level optimization problem where the "outer" optimization tries to learn the joint representation and the "inner" optimization encodes the imitation learning setup and tries to learn task-specific parameters.

Imitation Learning Representation Learning
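
The bi-level structure described above can be sketched as two nested loops: the inner problem fits task-specific parameters on top of a shared representation, and the outer update moves the representation. Everything below (least-squares heads, a first-order outer gradient that ignores the inner solution's dependence on the representation, toy data) is an illustrative simplification, not the paper's imitation-learning setup:

```python
import numpy as np

rng = np.random.default_rng(0)
tasks = [(rng.standard_normal((50, 8)), rng.standard_normal(50)) for _ in range(3)]
phi = 0.1 * rng.standard_normal((8, 4))          # shared representation (outer variable)

for outer_step in range(100):
    grad_phi = np.zeros_like(phi)
    for X, y in tasks:
        # Inner problem: least-squares head on top of the current representation.
        Z = X @ phi
        w = np.linalg.lstsq(Z, y, rcond=None)[0]  # task-specific parameters
        # Outer gradient w.r.t. phi, holding w fixed (first-order approximation).
        resid = Z @ w - y
        grad_phi += X.T @ np.outer(resid, w) / len(y)
    phi -= 0.05 * grad_phi / len(tasks)

losses = [float(np.mean((X @ phi @ np.linalg.lstsq(X @ phi, y, rcond=None)[0] - y) ** 2))
          for X, y in tasks]
print([round(l, 3) for l in losses])
```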

Over-parameterized Adversarial Training: An Analysis Overcoming the Curse of Dimensionality

no code implementations NeurIPS 2020 Yi Zhang, Orestis Plevrakis, Simon S. Du, Xingguo Li, Zhao Song, Sanjeev Arora

Our work proves convergence to low robust training loss for polynomial width instead of exponential, under natural assumptions and with the ReLU activation.

Enhanced Convolutional Neural Tangent Kernels

no code implementations 3 Nov 2019 Zhiyuan Li, Ruosong Wang, Dingli Yu, Simon S. Du, Wei Hu, Ruslan Salakhutdinov, Sanjeev Arora

An exact algorithm to compute CNTK (Arora et al., 2019) yielded the finding that classification accuracy of CNTK on CIFAR-10 is within 6-7% of that of the corresponding CNN architecture (best figure being around 78%), which is interesting performance for a fixed kernel.

Data Augmentation regression

An Exponential Learning Rate Schedule for Deep Learning

no code implementations ICLR 2020 Zhiyuan Li, Sanjeev Arora

This paper suggests that the phenomenon may be due to Batch Normalization or BN, which is ubiquitous and provides benefits in optimization and generalization across all standard architectures.

Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks

4 code implementations ICLR 2020 Sanjeev Arora, Simon S. Du, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang, Dingli Yu

On VOC07 testbed for few-shot image classification tasks on ImageNet with transfer learning (Goyal et al., 2019), replacing the linear SVM currently used with a Convolutional NTK SVM consistently improves performance.

Few-Shot Image Classification General Classification +3
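
The "replace the linear SVM with a kernel SVM on a precomputed NTK Gram matrix" step plumbs together roughly as in the scikit-learn snippet below; the RBF kernel is only a placeholder for CNTK values, which this sketch does not compute:

```python
import numpy as np
from sklearn.svm import SVC

def kernel_matrix(A, B):
    """Placeholder Gram matrix (RBF); a CNTK would supply these values instead."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq)

rng = np.random.default_rng(0)
X_train, y_train = rng.standard_normal((40, 16)), rng.integers(0, 2, 40)
X_test = rng.standard_normal((10, 16))

clf = SVC(kernel="precomputed", C=1.0)
clf.fit(kernel_matrix(X_train, X_train), y_train)       # train-vs-train Gram matrix
preds = clf.predict(kernel_matrix(X_test, X_train))     # test-vs-train Gram matrix
print(preds)
```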

A Simple Technique to Enable Saliency Methods to Pass the Sanity Checks

no code implementations 25 Sep 2019 Arushi Gupta, Sanjeev Arora

This involves computing saliency maps for all possible labels in the classification task, and using a simple competition among them to identify and remove less relevant pixels from the map.
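
A rough reading of the procedure above: compute one saliency map per possible label and keep, for the predicted label, only the pixels where that label wins the per-pixel competition. The gradient-times-input maps and the tiny model are stand-ins, not the paper's exact method:

```python
import torch

torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(64, 5))
x = torch.randn(1, 8, 8, requires_grad=True)

# One gradient-times-input saliency map per possible label.
maps = []
for label in range(5):
    score = model(x)[0, label]
    grad = torch.autograd.grad(score, x)[0]
    maps.append((grad * x).detach()[0])
maps = torch.stack(maps)                    # (num_labels, 8, 8)

pred = model(x).argmax().item()
winners = maps.argmax(dim=0) == pred        # pixels where the predicted label wins
competitive_map = torch.where(winners, maps[pred], torch.zeros(()))
print(int(winners.sum()), "pixels survive the competition for label", pred)
print(competitive_map.shape)
```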

Implicit Regularization in Deep Matrix Factorization

1 code implementation NeurIPS 2019 Sanjeev Arora, Nadav Cohen, Wei Hu, Yuping Luo

Efforts to understand the generalization mystery in deep learning have led to the belief that gradient-based optimization induces a form of implicit regularization, a bias towards models of low "complexity."

Matrix Completion

A Simple Saliency Method That Passes the Sanity Checks

no code implementations 27 May 2019 Arushi Gupta, Sanjeev Arora

There is great interest in "saliency methods" (also called "attribution methods"), which give "explanations" for a deep net's decision, by assigning a "score" to each feature/pixel in the input.

On Exact Computation with an Infinitely Wide Neural Net

2 code implementations NeurIPS 2019 Sanjeev Arora, Simon S. Du, Wei Hu, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang

An attraction of such ideas is that a pure kernel-based method is used to capture the power of a fully-trained deep net of infinite width.

Gaussian Processes

A Theoretical Analysis of Contrastive Unsupervised Representation Learning

no code implementations 25 Feb 2019 Sanjeev Arora, Hrishikesh Khandeparkar, Mikhail Khodak, Orestis Plevrakis, Nikunj Saunshi

This framework allows us to show provable guarantees on the performance of the learned representations on the average classification task that is comprised of a subset of the same set of latent classes.

Contrastive Learning General Classification +1

Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks

no code implementations 24 Jan 2019 Sanjeev Arora, Simon S. Du, Wei Hu, Zhiyuan Li, Ruosong Wang

This paper analyzes training and generalization for a simple 2-layer ReLU net with random initialization, and provides the following improvements over recent works: (i) Using a tighter characterization of training speed than recent papers, an explanation for why training a neural net with random labels leads to slower training, as originally observed in [Zhang et al. ICLR'17].

Theoretical Analysis of Auto Rate-Tuning by Batch Normalization

no code implementations ICLR 2019 Sanjeev Arora, Zhiyuan Li, Kaifeng Lyu

Batch Normalization (BN) has become a cornerstone of deep learning across diverse architectures, appearing to help optimization as well as generalization.

A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks

no code implementations ICLR 2019 Sanjeev Arora, Nadav Cohen, Noah Golowich, Wei Hu

We analyze speed of convergence to global optimum for gradient descent training a deep linear neural network (parameterized as $x \mapsto W_N W_{N-1} \cdots W_1 x$) by minimizing the $\ell_2$ loss over whitened data.
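
For concreteness, the parameterization and objective in the entry above look like the following toy computation (depth, dimensions, and the i.i.d. Gaussian stand-in for whitened data are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
depth, d = 3, 6
Ws = [np.eye(d) + 0.01 * rng.standard_normal((d, d)) for _ in range(depth)]

def end_to_end(Ws):
    """The product W_N ... W_1 implemented by the deep linear network."""
    W = np.eye(d)
    for Wi in Ws:
        W = Wi @ W
    return W

X = rng.standard_normal((200, d))            # stand-in for whitened inputs
target = rng.standard_normal((d, d))
Y = X @ target.T

W = end_to_end(Ws)
l2_loss = np.mean(np.sum((X @ W.T - Y) ** 2, axis=1))   # the l2 loss being minimized
print(round(float(l2_loss), 3))
```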

A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors

1 code implementation ACL 2018 Mikhail Khodak, Nikunj Saunshi, Yingyu Liang, Tengyu Ma, Brandon Stewart, Sanjeev Arora

Motivations like domain adaptation, transfer learning, and feature learning have fueled interest in inducing embeddings for rare or unseen words, n-grams, synsets, and other textual features.

Document Classification Domain Adaptation +2

An Analysis of the t-SNE Algorithm for Data Visualization

no code implementations 5 Mar 2018 Sanjeev Arora, Wei Hu, Pravesh K. Kothari

A first line of attack in exploratory data analysis is data visualization, i.e., generating a 2-dimensional representation of data that makes clusters of similar points visually identifiable.

Clustering Data Visualization +1

On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization

1 code implementation ICML 2018 Sanjeev Arora, Nadav Cohen, Elad Hazan

The effect of depth on optimization is decoupled from expressiveness by focusing on settings where additional layers amount to overparameterization - linear neural networks, a well-studied model.

regression
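
A tiny instance of the setting above: a linear regression whose weight vector is overparameterized as a product W1 w2, so the extra layer adds no expressive power but changes the gradient dynamics. The data, initialization scale, and step size are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.standard_normal((200, 10)), rng.standard_normal(200)

# Overparameterized model: prediction = X @ (W1 @ w2), same function class as X @ w.
W1 = 0.1 * rng.standard_normal((10, 10))
w2 = 0.1 * rng.standard_normal(10)

lr = 0.1
for _ in range(1000):
    w = W1 @ w2
    grad_w = X.T @ (X @ w - y) / len(y)    # d/dw of 0.5 * mean squared error
    grad_W1 = np.outer(grad_w, w2)         # chain rule: dL/dW1 = grad_w w2^T
    grad_w2 = W1.T @ grad_w                # chain rule: dL/dw2 = W1^T grad_w
    W1 -= lr * grad_W1
    w2 -= lr * grad_w2

print(round(float(np.mean((X @ (W1 @ w2) - y) ** 2)), 3))
```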

Stronger generalization bounds for deep nets via a compression approach

no code implementations ICML 2018 Sanjeev Arora, Rong Ge, Behnam Neyshabur, Yi Zhang

Analysis of correctness of our compression relies upon some newly identified "noise stability" properties of trained deep nets, which are also experimentally verified.

Generalization Bounds

Do GANs learn the distribution? Some Theory and Empirics

no code implementations ICLR 2018 Sanjeev Arora, Andrej Risteski, Yi Zhang

Using this, evidence is presented that well-known GAN approaches do learn distributions of fairly low support.

A Compressed Sensing View of Unsupervised Text Embeddings, Bag-of-n-Grams, and LSTMs

2 code implementations ICLR 2018 Sanjeev Arora, Mikhail Khodak, Nikunj Saunshi, Kiran Vodrahalli

We also show a surprising new property of embeddings such as GloVe and word2vec: they form a good sensing matrix for text that is more efficient than random matrices, the standard sparse recovery tool, which may explain why they lead to better representations in practice.

Towards Provable Control for Unknown Linear Dynamical Systems

no code implementations ICLR 2018 Sanjeev Arora, Elad Hazan, Holden Lee, Karan Singh, Cyril Zhang, Yi Zhang

We study the control of symmetric linear dynamical systems with unknown dynamics and a hidden state.

Theoretical limitations of Encoder-Decoder GAN architectures

no code implementations 7 Nov 2017 Sanjeev Arora, Andrej Risteski, Yi Zhang

Encoder-decoder GAN architectures (e.g., BiGAN and ALI) seek to add an inference mechanism to the GAN setup, consisting of a small encoder deep net that maps data points to their succinct encodings.

Do GANs actually learn the distribution? An empirical study

no code implementations 26 Jun 2017 Sanjeev Arora, Yi Zhang

Do GANs (Generative Adversarial Nets) actually learn the target distribution?

Provable benefits of representation learning

no code implementations 14 Jun 2017 Sanjeev Arora, Andrej Risteski

There is general consensus that learning representations is useful for a variety of reasons, e.g., efficient use of labeled data (semi-supervised learning), transfer learning, and understanding the hidden structure of data.

Clustering Representation Learning +1

Extending and Improving Wordnet via Unsupervised Word Embeddings

no code implementations 29 Apr 2017 Mikhail Khodak, Andrej Risteski, Christiane Fellbaum, Sanjeev Arora

Our methods require very few linguistic resources, thus being applicable for Wordnet construction in low-resource languages, and may further be applied to sense clustering and other Wordnet improvements.

Clustering Word Embeddings

Automated WordNet Construction Using Word Embeddings

1 code implementation WS 2017 Mikhail Khodak, Andrej Risteski, Christiane Fellbaum, Sanjeev Arora

To evaluate our method we construct two 600-word testsets for word-to-synset matching in French and Russian using native speakers and evaluate the performance of our method along with several other recent approaches.

Information Retrieval Machine Translation +3

Generalization and Equilibrium in Generative Adversarial Nets (GANs)

1 code implementation ICML 2017 Sanjeev Arora, Rong Ge, Yingyu Liang, Tengyu Ma, Yi Zhang

We show that training of a generative adversarial network (GAN) may not have good generalization properties; e.g., training may appear successful but the trained distribution may be far from the target distribution in standard metrics.

Generative Adversarial Network

On the ability of neural nets to express distributions

no code implementations 22 Feb 2017 Holden Lee, Rong Ge, Tengyu Ma, Andrej Risteski, Sanjeev Arora

We take a first cut at explaining the expressivity of multilayer nets by giving a sufficient criterion for a function to be approximable by a neural network with $n$ hidden layers.

Provable learning of Noisy-or Networks

no code implementations 28 Dec 2016 Sanjeev Arora, Rong Ge, Tengyu Ma, Andrej Risteski

Many machine learning applications use latent variable models to explain structure in data, whereby visible variables (= coordinates of the given datapoint) are explained as a probabilistic function of some hidden variables.

Tensor Decomposition Topic Models

Provable Algorithms for Inference in Topic Models

no code implementations 27 May 2016 Sanjeev Arora, Rong Ge, Frederic Koehler, Tengyu Ma, Ankur Moitra

But designing provable algorithms for inference has proven to be more challenging.

Topic Models

Linear Algebraic Structure of Word Senses, with Applications to Polysemy

1 code implementation TACL 2018 Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, Andrej Risteski

A novel aspect of our technique is that each extracted word sense is accompanied by one of about 2000 "discourse atoms" that gives a succinct description of which other words co-occur with that word sense.

Information Retrieval Retrieval +1

Why are deep nets reversible: A simple theory, with implications for training

no code implementations 18 Nov 2015 Sanjeev Arora, Yingyu Liang, Tengyu Ma

Under this assumption, which is experimentally tested on real-life nets like AlexNet, it is formally proved that the feedforward net is a correct inference method for recovering the hidden layer.

Denoising

Simple, Efficient, and Neural Algorithms for Sparse Coding

no code implementations 2 Mar 2015 Sanjeev Arora, Rong Ge, Tengyu Ma, Ankur Moitra

Its standard formulation is as a non-convex optimization problem which is solved in practice by heuristics based on alternating minimization.

A Latent Variable Model Approach to PMI-based Word Embeddings

4 code implementations TACL 2016 Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, Andrej Risteski

Semantic word embeddings represent the meaning of a word via a vector, and are created by diverse methods.

Word Embeddings

More Algorithms for Provable Dictionary Learning

no code implementations 3 Jan 2014 Sanjeev Arora, Aditya Bhaskara, Rong Ge, Tengyu Ma

In dictionary learning, also known as sparse coding, the algorithm is given samples of the form $y = Ax$ where $x\in \mathbb{R}^m$ is an unknown random sparse vector and $A$ is an unknown dictionary matrix in $\mathbb{R}^{n\times m}$ (usually $m > n$, which is the overcomplete case).

Dictionary Learning
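
A small generator for the data model stated above, y = Ax with an overcomplete dictionary A and a random k-sparse x (the dimensions and sparsity level are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k, num_samples = 32, 64, 4, 1000      # m > n: the overcomplete case

A = rng.standard_normal((n, m))
A /= np.linalg.norm(A, axis=0)              # unit-norm dictionary columns

def sparse_sample():
    """Draw x with k random nonzero Gaussian entries and return (y, x) with y = A x."""
    x = np.zeros(m)
    support = rng.choice(m, size=k, replace=False)
    x[support] = rng.standard_normal(k)
    return A @ x, x

Y, X = zip(*(sparse_sample() for _ in range(num_samples)))
Y, X = np.array(Y), np.array(X)
print(Y.shape, X.shape)                      # (1000, 32) (1000, 64)
```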

Provable Bounds for Learning Some Deep Representations

no code implementations 23 Oct 2013 Sanjeev Arora, Aditya Bhaskara, Rong Ge, Tengyu Ma

The analysis of the algorithm reveals interesting structure of neural networks with random edge weights.

New Algorithms for Learning Incoherent and Overcomplete Dictionaries

no code implementations 28 Aug 2013 Sanjeev Arora, Rong Ge, Ankur Moitra

In sparse recovery we are given a matrix $A$ (the dictionary) and a vector of the form $A X$ where $X$ is sparse, and the goal is to recover $X$.

Dictionary Learning Edge Detection +1

A Practical Algorithm for Topic Modeling with Provable Guarantees

2 code implementations 19 Dec 2012 Sanjeev Arora, Rong Ge, Yoni Halpern, David Mimno, Ankur Moitra, David Sontag, Yichen Wu, Michael Zhu

Topic models provide a useful method for dimensionality reduction and exploratory data analysis in large text corpora.

Dimensionality Reduction Topic Models

Learning Topic Models - Going beyond SVD

2 code implementations 9 Apr 2012 Sanjeev Arora, Rong Ge, Ankur Moitra

Topic Modeling is an approach used for automatic comprehension and classification of data in a variety of settings, and perhaps the canonical application is in uncovering thematic structure in a corpus of documents.

Topic Models
