no code implementations • 12 Mar 2024 • Jae Hun Ro, Srinadh Bhojanapalli, Zheng Xu, Yanxiang Zhang, Ananda Theertha Suresh
Cross-device federated learning (FL) is a technique that trains a model on data distributed across edge devices, typically millions of them, without the data ever leaving the devices.
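For readers unfamiliar with the setting, a minimal FedAvg-style sketch is shown below. FedAvg is the canonical cross-device FL baseline, not necessarily the training recipe used in this paper, and the toy linear model and client data are made up for illustration.

```python
# Minimal FedAvg-style sketch of cross-device federated learning (illustrative only).
# Clients train locally on their own data; only model updates, never raw data,
# are sent back to the server for averaging.
import numpy as np

def local_update(weights, client_data, lr=0.1, steps=5):
    """Run a few SGD steps on one device's private data (toy linear least squares)."""
    w = weights.copy()
    X, y = client_data
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)  # least-squares gradient
        w -= lr * grad
    return w

def federated_round(weights, clients):
    """One communication round: average the locally updated models."""
    updates = [local_update(weights, data) for data in clients]
    return np.mean(updates, axis=0)

# toy usage: 3 "devices", each with its own private data
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(20, 4)), rng.normal(size=20)) for _ in range(3)]
w = np.zeros(4)
for _ in range(10):
    w = federated_round(w, clients)
```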
no code implementations • 14 Feb 2024 • Yashas Samaga B L, Varun Yerram, Chong You, Srinadh Bhojanapalli, Sanjiv Kumar, Prateek Jain, Praneeth Netrapalli
Autoregressive decoding with generative Large Language Models (LLMs) on accelerators (GPUs/TPUs) is often memory-bound where most of the time is spent on transferring model parameters from high bandwidth memory (HBM) to cache.
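A quick back-of-the-envelope calculation illustrates the memory-bound claim; the model size and bandwidth below are hypothetical round numbers, not figures from the paper.

```python
# Why autoregressive decoding is memory-bound: every decoded token re-reads all
# model weights from HBM, so weight-streaming time lower-bounds the per-token latency.
# Hypothetical numbers for illustration only.
params = 7e9            # hypothetical 7B-parameter model
bytes_per_param = 2     # bfloat16
hbm_bandwidth = 1.0e12  # ~1 TB/s of HBM bandwidth (rough accelerator figure)

weight_bytes = params * bytes_per_param
time_per_token = weight_bytes / hbm_bandwidth
print(f"~{time_per_token * 1e3:.1f} ms/token just to stream weights from HBM")  # ~14 ms
```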
1 code implementation • 16 Oct 2023 • Nilesh Gupta, Devvrit Khatri, Ankit S Rawat, Srinadh Bhojanapalli, Prateek Jain, Inderjit Dhillon
We propose the decoupled softmax loss, a simple modification to the InfoNCE loss that overcomes the limitations of existing contrastive losses.
no code implementations • 6 Oct 2023 • Shanda Li, Chong You, Guru Guruganesh, Joshua Ainslie, Santiago Ontanon, Manzil Zaheer, Sumit Sanghai, Yiming Yang, Sanjiv Kumar, Srinadh Bhojanapalli
Preventing the performance decay of Transformers on inputs longer than those used for training has been an important challenge in extending the context length of these models.
no code implementations • 13 May 2023 • Samy Jelassi, Boris Hanin, Ziwei Ji, Sashank J. Reddi, Srinadh Bhojanapalli, Sanjiv Kumar
In this short note we consider random fully connected ReLU networks of width $n$ and depth $L$ equipped with a mean-field weight initialization.
no code implementations • NeurIPS 2023 • Vaishnavh Nagarajan, Aditya Krishna Menon, Srinadh Bhojanapalli, Hossein Mobahi, Sanjiv Kumar
Knowledge distillation (KD) has been widely used to improve the test accuracy of a "student" network, by training it to mimic the soft probabilities of a trained "teacher" network.
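As background, the standard distillation objective (in the style of Hinton et al.) that trains a student to match temperature-softened teacher probabilities looks roughly like the sketch below; this paper analyzes KD rather than introducing this loss, and the hyperparameters are illustrative.

```python
# Standard knowledge-distillation loss: a mix of hard-label cross-entropy and a
# KL term that pulls the student's softened probabilities toward the teacher's.
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """(1 - alpha) * hard-label cross-entropy + alpha * T^2 * KL(teacher || student)."""
    p_student_T = softmax(student_logits, T)
    p_teacher_T = softmax(teacher_logits, T)
    ce = -np.log(softmax(student_logits)[np.arange(len(labels)), labels]).mean()
    kl = (p_teacher_T * (np.log(p_teacher_T) - np.log(p_student_T))).sum(axis=-1).mean()
    return (1 - alpha) * ce + alpha * (T ** 2) * kl
```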
no code implementations • 19 Oct 2022 • Joan Puigcerver, Rodolphe Jenatton, Carlos Riquelme, Pranjal Awasthi, Srinadh Bhojanapalli
We next empirically evaluate the robustness of MoEs on ImageNet using adversarial attacks and show they are indeed more robust than dense models with the same computational cost.
no code implementations • 12 Oct 2022 • Zonglin Li, Chong You, Srinadh Bhojanapalli, Daliang Li, Ankit Singh Rawat, Sashank J. Reddi, Ke Ye, Felix Chern, Felix Yu, Ruiqi Guo, Sanjiv Kumar
This paper studies a curious phenomenon in machine learning models with Transformer architectures: their activation maps are sparse.
no code implementations • 18 Aug 2022 • Lovish Madaan, Srinadh Bhojanapalli, Himanshu Jain, Prateek Jain
Based on such hierarchical navigation, we design Treeformer, which can use one of two efficient attention layers: TF-Attention and TC-Attention.
no code implementations • 2 Feb 2022 • Zhiyuan Li, Srinadh Bhojanapalli, Manzil Zaheer, Sashank J. Reddi, Sanjiv Kumar
In contrast to SGD, adaptive gradient methods like Adam allow robust training of modern deep networks, especially large language models.
1 code implementation • 13 Oct 2021 • Srinadh Bhojanapalli, Ayan Chakrabarti, Andreas Veit, Michal Lukasik, Himanshu Jain, Frederick Liu, Yin-Wen Chang, Sanjiv Kumar
Pairwise dot product-based attention allows Transformers to exchange information between tokens in an input-dependent way, and is key to their success across diverse applications in language and vision.
no code implementations • 19 Jun 2021 • Michal Lukasik, Srinadh Bhojanapalli, Aditya Krishna Menon, Sanjiv Kumar
Knowledge distillation is widely used as a means of improving the performance of a relatively simple student model using the predictions from a complex teacher model.
no code implementations • 16 Jun 2021 • Srinadh Bhojanapalli, Ayan Chakrabarti, Himanshu Jain, Sanjiv Kumar, Michal Lukasik, Andreas Veit
State-of-the-art transformer models use pairwise dot-product based self-attention, which comes at a computational cost quadratic in the input sequence length.
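For context, vanilla scaled dot-product self-attention is sketched below; the n x n score matrix is the source of the quadratic cost. This is the generic formulation, not the efficient variant proposed in the paper.

```python
# Vanilla scaled dot-product self-attention. The (n, n) score matrix makes the
# computation quadratic in sequence length n.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (n, d) token embeddings; returns (n, d_v) attended outputs."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])              # (n, n): quadratic in n
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
    return weights @ V
```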
no code implementations • EMNLP 2021 • Pu-Chin Chen, Henry Tsai, Srinadh Bhojanapalli, Hyung Won Chung, Yin-Wen Chang, Chun-Sung Ferng
Our analysis shows that the gain actually comes from moving positional information from the input to the attention layer.
no code implementations • ICCV 2021 • Srinadh Bhojanapalli, Ayan Chakrabarti, Daniel Glasner, Daliang Li, Thomas Unterthiner, Andreas Veit
We find that when pre-trained with a sufficient amount of data, ViT models are at least as robust as the ResNet counterparts on a broad range of perturbations.
no code implementations • 5 Feb 2021 • Srinadh Bhojanapalli, Kimberly Wilber, Andreas Veit, Ankit Singh Rawat, Seungyeon Kim, Aditya Menon, Sanjiv Kumar
By analyzing the relationship between churn and prediction confidences, we pursue an approach with two components for churn reduction.
no code implementations • 1 Dec 2020 • Chen Zhu, Ankit Singh Rawat, Manzil Zaheer, Srinadh Bhojanapalli, Daliang Li, Felix Yu, Sanjiv Kumar
In this paper, we propose a new task of explicitly modifying specific factual knowledge in Transformer models while ensuring the model performance does not degrade on the unmodified facts.
no code implementations • NeurIPS 2020 • Chulhee Yun, Yin-Wen Chang, Srinadh Bhojanapalli, Ankit Singh Rawat, Sashank Reddi, Sanjiv Kumar
We propose sufficient conditions under which we prove that a sparse attention model can universally approximate any sequence-to-sequence function.
no code implementations • NeurIPS 2020 • Rudy Bunel, Oliver Hinder, Srinadh Bhojanapalli, Krishnamurthy Dvijotham
We establish theoretical properties of the nonconvex formulation, showing that it is (almost) free of spurious local minima and has the same global optimum as the convex problem.
1 code implementation • ICLR 2021 • Jingzhao Zhang, Aditya Menon, Andreas Veit, Srinadh Bhojanapalli, Sanjiv Kumar, Suvrit Sra
The label shift problem refers to the supervised learning setting where the train and test label distributions do not match.
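The classical correction for label shift reweights the training loss by the ratio of test to train class priors; the toy sketch below illustrates that baseline idea only and is not the method proposed in the paper above.

```python
# Classical importance-weighting correction for label shift: if only the class prior
# p(y) changes between train and test, each class is reweighted by p_test(y) / p_train(y).
import numpy as np

def label_shift_weights(train_label_counts, test_label_prior):
    p_train = train_label_counts / train_label_counts.sum()
    return np.asarray(test_label_prior) / p_train          # one weight per class

def reweighted_loss(per_example_losses, labels, class_weights):
    return float((class_weights[labels] * per_example_losses).mean())

# toy usage: 3 classes, imbalanced train counts vs. a uniform test prior
w = label_shift_weights(np.array([700.0, 200.0, 100.0]), np.array([1/3, 1/3, 1/3]))
```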
no code implementations • EMNLP 2020 • Michal Lukasik, Himanshu Jain, Aditya Krishna Menon, Seungyeon Kim, Srinadh Bhojanapalli, Felix Yu, Sanjiv Kumar
Label smoothing has been shown to be an effective regularization strategy in classification that prevents overfitting and helps with label de-noising.
no code implementations • ICML 2020 • Michal Lukasik, Srinadh Bhojanapalli, Aditya Krishna Menon, Sanjiv Kumar
Label smoothing is commonly used in training deep learning models, wherein one-hot training labels are mixed with uniform label vectors.
Ranked #12 on Learning with noisy labels on CIFAR-10N-Random3
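Concretely, both label-smoothing entries above refer to the standard construction: with $K$ classes and smoothing weight $\alpha$, the one-hot target $y$ is replaced by

$$ \tilde{y} \;=\; (1-\alpha)\, y \;+\; \frac{\alpha}{K}\,\mathbf{1}, $$

so the correct class keeps probability $1-\alpha+\alpha/K$ and every other class receives $\alpha/K$. This is the standard formulation; the analyses in the two papers go beyond it.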
no code implementations • ICML 2020 • Srinadh Bhojanapalli, Chulhee Yun, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar
The attention-based Transformer architecture has enabled significant advances in natural language processing.
no code implementations • ICLR 2020 • Chulhee Yun, Srinadh Bhojanapalli, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar
In this paper, we establish that Transformer models are universal approximators of continuous permutation-equivariant sequence-to-sequence functions with compact support, which is quite surprising given the extent of parameter sharing in these models.
no code implementations • 25 Sep 2019 • Srinadh Bhojanapalli, Chulhee Yun, Ankit Singh Rawat, Sashank Reddi, Sanjiv Kumar
The attention-based Transformer architecture has enabled significant advances in natural language processing.
1 code implementation • ICLR 2019 • Behnam Neyshabur, Zhiyuan Li, Srinadh Bhojanapalli, Yann LeCun, Nathan Srebro
Despite existing work on ensuring generalization of neural networks in terms of scale-sensitive complexity measures, such as norms, margin, and sharpness, these complexity measures do not offer an explanation of why neural networks generalize better with over-parametrization.
24 code implementations • ICLR 2020 • Yang You, Jing Li, Sashank Reddi, Jonathan Hseu, Sanjiv Kumar, Srinadh Bhojanapalli, Xiaodan Song, James Demmel, Kurt Keutzer, Cho-Jui Hsieh
In this paper, we first study a principled layerwise adaptation strategy to accelerate training of deep neural networks using large mini-batches.
Ranked #11 on Question Answering on SQuAD1.1 dev (F1 metric)
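A simplified sketch of the layerwise trust-ratio idea behind the adaptation strategy described above (LAMB-style) is shown below; it omits bias correction, norm-ratio clipping, and other details of the published algorithm, so treat it as an approximation rather than a faithful implementation.

```python
# LAMB-style layerwise update sketch: the per-layer step is scaled by
# ||w|| / ||update||, so layers with large weights but small updates are not starved.
import numpy as np

def lamb_like_step(w, m, v, grad, lr=1e-3, b1=0.9, b2=0.999, eps=1e-6, wd=0.01):
    """One layer's update; w, m, v, grad are arrays of that layer's parameters."""
    m = b1 * m + (1 - b1) * grad                 # first moment
    v = b2 * v + (1 - b2) * grad ** 2            # second moment
    update = m / (np.sqrt(v) + eps) + wd * w     # Adam-style direction + weight decay
    trust = np.linalg.norm(w) / (np.linalg.norm(update) + eps)
    w = w - lr * trust * update                  # layerwise-scaled step
    return w, m, v
```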
2 code implementations • 30 May 2018 • Behnam Neyshabur, Zhiyuan Li, Srinadh Bhojanapalli, Yann LeCun, Nathan Srebro
Despite existing work on ensuring generalization of neural networks in terms of scale-sensitive complexity measures, such as norms, margin, and sharpness, these complexity measures do not offer an explanation of why neural networks generalize better with over-parametrization.
no code implementations • 1 Mar 2018 • Srinadh Bhojanapalli, Nicolas Boumal, Prateek Jain, Praneeth Netrapalli
Semidefinite programs (SDPs) are important in learning and combinatorial optimization, with numerous applications.
no code implementations • ICLR 2018 • Behnam Neyshabur, Srinadh Bhojanapalli, Nathan Srebro
We present a generalization bound for feedforward neural networks in terms of the product of the spectral norm of the layers and the Frobenius norm of the weights.
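Schematically, for a depth-$d$ network with weights $W_1,\dots,W_d$, margin $\gamma$, and $m$ training examples, a bound of this type scales as

$$ \tilde{O}\!\left(\sqrt{\frac{\Big(\prod_{i=1}^{d}\|W_i\|_2^2\Big)\sum_{i=1}^{d}\frac{\|W_i\|_F^2}{\|W_i\|_2^2}}{\gamma^2\, m}}\right), $$

suppressing constants and polynomial width/depth factors. This is the qualitative shape only; the paper should be consulted for the exact statement.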
2 code implementations • NeurIPS 2017 • Behnam Neyshabur, Srinadh Bhojanapalli, David McAllester, Nathan Srebro
With a goal of understanding what drives generalization in deep networks, we consider several recently suggested explanations, including norm-based control, sharpness and robustness.
no code implementations • NeurIPS 2017 • Suriya Gunasekar, Blake Woodworth, Srinadh Bhojanapalli, Behnam Neyshabur, Nathan Srebro
We study implicit regularization when optimizing an underdetermined quadratic objective over a matrix $X$ with gradient descent on a factorization of $X$.
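In symbols, the setup is gradient descent on the factor $U$ of

$$ \min_{U}\; \big\|\mathcal{A}(UU^\top) - y\big\|_2^2, $$

where $\mathcal{A}$ is a linear measurement operator. The phenomenon studied is that, with small initialization and step size, the iterates $UU^\top$ tend toward the minimum nuclear norm solution among all $X \succeq 0$ with $\mathcal{A}(X) = y$. This is a schematic summary; see the paper for the precise conditions.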
2 code implementations • ICLR 2018 • Behnam Neyshabur, Srinadh Bhojanapalli, Ayan Chakrabarti
Training generative adversarial networks is unstable in high-dimensions as the true data distribution tends to be concentrated in a small fraction of the ambient space.
1 code implementation • NeurIPS 2016 • Shanshan Wu, Srinadh Bhojanapalli, Sujay Sanghavi, Alexandros G. Dimakis
In this paper we present a new algorithm for computing a low-rank approximation of the product $A^TB$ by taking only a single pass over the two matrices $A$ and $B$.
no code implementations • 4 Jun 2016 • Dohyung Park, Anastasios Kyrillidis, Srinadh Bhojanapalli, Constantine Caramanis, Sujay Sanghavi
We study the projected gradient descent method on low-rank matrix problems with a strongly convex objective.
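As a generic illustration of projected gradient descent on a rank-constrained problem, the sketch below takes a gradient step and projects back onto the rank-$r$ set via a truncated SVD; the paper's specific setting and analysis (strong convexity, factored updates) are richer than this toy version.

```python
# Rank-r projected gradient descent (SVP-style) sketch for
# min f(X) subject to rank(X) <= r.
import numpy as np

def project_rank_r(X, r):
    """Best rank-r approximation of X via truncated SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def projected_gd(grad_f, X0, r, lr=0.1, iters=100):
    X = project_rank_r(X0, r)
    for _ in range(iters):
        X = project_rank_r(X - lr * grad_f(X), r)  # gradient step, then projection
    return X
```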
no code implementations • NeurIPS 2016 • Srinadh Bhojanapalli, Behnam Neyshabur, Nathan Srebro
We show that there are no spurious local minima in the non-convex factorized parametrization of low-rank matrix recovery from incoherent linear measurements.
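Schematically, the nonconvex factorized objective in question has the form

$$ \min_{U \in \mathbb{R}^{n \times r}} \; \big\|\mathcal{A}(UU^\top) - b\big\|_2^2, $$

with $\mathcal{A}$ a linear measurement operator satisfying suitable incoherence/restricted-isometry conditions, and the claim is that this objective has no spurious local minima. This is a schematic restatement; the paper gives the exact assumptions.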
no code implementations • 14 Sep 2015 • Srinadh Bhojanapalli, Anastasios Kyrillidis, Sujay Sanghavi
To the best of our knowledge, this is the first paper to provide precise convergence rate guarantees for general convex functions under standard convex assumptions.
no code implementations • 17 Feb 2015 • Srinadh Bhojanapalli, Sujay Sanghavi
In this paper we propose new techniques to sample arbitrary third-order tensors, with an objective of speeding up tensor algorithms that have recently gained popularity in machine learning.
1 code implementation • 14 Oct 2014 • Srinadh Bhojanapalli, Prateek Jain, Sujay Sanghavi
The first is a new method to directly compute a low-rank approximation (in efficient factored form) to the product of two given matrices; it computes a small random set of entries of the product, and then executes weighted alternating minimization (as before) on these.
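A rough sketch of the "sample entries of the product, then run alternating minimization" structure is below. It never forms $A^TB$ explicitly, only a random subset of its entries, and fits rank-$k$ factors to those observations; the paper's method uses a specific weighted alternating minimization with tailored sampling probabilities and guarantees that this unweighted toy version does not reproduce.

```python
# Toy version: sample entries of C = A^T B without forming C, then run
# (unweighted) alternating least squares on the observed entries.
import numpy as np

def sampled_product_lowrank(A, B, k, n_samples=2000, iters=20, reg=1e-6):
    d1, d2 = A.shape[1], B.shape[1]
    rng = np.random.default_rng(0)
    rows = rng.integers(0, d1, n_samples)
    cols = rng.integers(0, d2, n_samples)
    # sampled entries of A^T B: each is a dot product of a column of A and a column of B
    vals = np.einsum('ni,ni->n', A[:, rows].T, B[:, cols].T)
    U = rng.normal(size=(d1, k))
    V = rng.normal(size=(d2, k))
    for _ in range(iters):
        # fix V, solve each row of U by least squares on its observed entries
        for i in range(d1):
            mask = rows == i
            if mask.any():
                Vi = V[cols[mask]]
                U[i] = np.linalg.solve(Vi.T @ Vi + reg * np.eye(k), Vi.T @ vals[mask])
        # fix U, solve each row of V symmetrically
        for j in range(d2):
            mask = cols == j
            if mask.any():
                Uj = U[rows[mask]]
                V[j] = np.linalg.solve(Uj.T @ Uj + reg * np.eye(k), Uj.T @ vals[mask])
    return U, V   # A^T B is approximated by U @ V.T
```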
no code implementations • 10 Feb 2014 • Srinadh Bhojanapalli, Prateek Jain
The problem of low-rank matrix completion has recently generated a lot of interest leading to several results that offer exact solutions to the problem.
no code implementations • 12 Jun 2013 • Yudong Chen, Srinadh Bhojanapalli, Sujay Sanghavi, Rachel Ward
Matrix completion, i.e., the exact and provable recovery of a low-rank matrix from a small subset of its elements, is currently only known to be possible if the matrix satisfies a restrictive structural constraint, known as incoherence, on its row and column spaces.
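For reference, the standard incoherence condition (not quoted verbatim from the paper) for a rank-$r$ matrix $M \in \mathbb{R}^{n_1 \times n_2}$ with SVD $M = U\Sigma V^\top$ requires

$$ \max_{i} \|e_i^\top U\|_2^2 \;\le\; \frac{\mu r}{n_1}, \qquad \max_{j} \|e_j^\top V\|_2^2 \;\le\; \frac{\mu r}{n_2}, $$

i.e., no single row or column carries too much of the mass of the singular subspaces.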