Search Results for author: Shashank Rajput

Found 16 papers, 8 papers with code

Does Data Augmentation Lead to Positive Margin?

no code implementations8 May 2019 Shashank Rajput, Zhili Feng, Zachary Charles, Po-Ling Loh, Dimitris Papailiopoulos

Data augmentation (DA) is commonly used during model training, as it significantly reduces test error and improves model robustness.
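
A minimal sketch (not from the paper) of the kind of augmentation studied here, using a random horizontal flip and a random padded crop on a toy image array; the transforms and parameters are illustrative assumptions, not the paper's exact setup.

    import numpy as np

    def augment(image, rng):
        """Apply a random horizontal flip and a random padded crop (toy example)."""
        # Random horizontal flip with probability 0.5.
        if rng.random() < 0.5:
            image = image[:, ::-1]
        # Pad by 4 pixels on each side, then take a random 32x32 crop.
        padded = np.pad(image, ((4, 4), (4, 4)), mode="constant")
        top, left = rng.integers(0, 9, size=2)
        return padded[top:top + 32, left:left + 32]

    rng = np.random.default_rng(0)
    img = rng.random((32, 32))   # toy grayscale image, illustrative only
    aug = augment(img, rng)
    print(aug.shape)             # (32, 32)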

Data Augmentation

Convergence and Margin of Adversarial Training on Separable Data

no code implementations22 May 2019 Zachary Charles, Shashank Rajput, Stephen Wright, Dimitris Papailiopoulos

Our results are derived by showing that adversarial training with gradient updates minimizes a robust version of the empirical risk at a $\mathcal{O}(\ln(t)^2/t)$ rate, despite non-smoothness.
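
As a concrete illustration (not the paper's exact algorithm), here is a sketch of adversarial training with gradient updates for a linear classifier under l_inf-bounded perturbations, where the inner maximization has a closed form; the logistic loss, radius, and step size are illustrative assumptions.

    import numpy as np

    def adversarial_train(X, y, eps=0.1, lr=0.1, steps=200):
        """Gradient descent on the robust logistic loss of a linear classifier.

        For a linear model, the worst-case l_inf perturbation of radius eps
        is x - eps * y * sign(w), so the inner maximization is exact.
        """
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(steps):
            X_adv = X - eps * y[:, None] * np.sign(w)   # inner max (closed form)
            margins = y * (X_adv @ w)
            grad = -(y[:, None] * X_adv * (1 / (1 + np.exp(margins)))[:, None]).mean(0)
            w -= lr * grad                              # outer gradient step
        return w

    # Toy separable data, illustrative only.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(2, 1, (50, 2)), rng.normal(-2, 1, (50, 2))])
    y = np.array([1] * 50 + [-1] * 50)
    w = adversarial_train(X, y)
    print("normalized margin:", (y * (X @ w)).min() / np.linalg.norm(w))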

DETOX: A Redundancy-based Framework for Faster and More Robust Gradient Aggregation

1 code implementation NeurIPS 2019 Shashank Rajput, Hongyi Wang, Zachary Charles, Dimitris Papailiopoulos

In this work, we present DETOX, a Byzantine-resilient distributed training framework that combines algorithmic redundancy with robust aggregation.
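
A minimal sketch of the redundancy-plus-robust-aggregation idea: workers are grouped so that each group computes redundant gradients, a vote inside each group filters Byzantine members, and the group votes are then combined with a robust aggregator. The group size, the use of a coordinate-wise median as the vote, and the attack model are illustrative assumptions, not DETOX's exact hierarchy.

    import numpy as np

    def group_vote(group_grads):
        """Within a redundant group, take the coordinate-wise median as the vote."""
        return np.median(np.stack(group_grads), axis=0)

    def redundancy_robust_aggregate(worker_grads, group_size=3):
        """Group redundant workers, vote within groups, then robustly aggregate."""
        groups = [worker_grads[i:i + group_size]
                  for i in range(0, len(worker_grads), group_size)]
        votes = [group_vote(g) for g in groups]
        return np.median(np.stack(votes), axis=0)   # robust aggregation across groups

    # Toy setup, illustrative only: 9 workers, one Byzantine.
    rng = np.random.default_rng(0)
    true_grad = rng.normal(size=10)
    grads = [true_grad + 0.01 * rng.normal(size=10) for _ in range(9)]
    grads[0] = 100 * rng.normal(size=10)            # Byzantine worker
    print(np.linalg.norm(redundancy_robust_aggregate(grads) - true_grad))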

Closing the convergence gap of SGD without replacement

no code implementations ICML 2020 Shashank Rajput, Anant Gupta, Dimitris Papailiopoulos

A recent line of breakthrough works on SGD without replacement (SGDo) established an $\mathcal{O}\left(\frac{n}{T^2}\right)$ convergence rate when the function minimized is strongly convex and is a sum of $n$ smooth functions, and an $\mathcal{O}\left(\frac{1}{T^2}+\frac{n^3}{T^3}\right)$ rate for sums of quadratics.
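
A minimal sketch of the method being analyzed: without-replacement SGD visits every component function exactly once per epoch, in a fresh random order. The toy least-squares objective and step size below are illustrative assumptions.

    import numpy as np

    def sgd_without_replacement(A, b, epochs=50, lr=0.05, seed=0):
        """Each epoch processes all n component functions once, in a random order."""
        rng = np.random.default_rng(seed)
        n, d = A.shape
        x = np.zeros(d)
        for _ in range(epochs):
            for i in rng.permutation(n):          # indices sampled without replacement
                x -= lr * (A[i] @ x - b[i]) * A[i]
        return x

    # Toy least-squares problem, illustrative only.
    rng = np.random.default_rng(1)
    A = rng.normal(size=(20, 5))
    b = rng.normal(size=20)
    x_star, *_ = np.linalg.lstsq(A, b, rcond=None)
    print(np.linalg.norm(sgd_without_replacement(A, b) - x_star))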

Optimal Lottery Tickets via SubsetSum: Logarithmic Over-Parameterization is Sufficient

1 code implementation14 Jun 2020 Ankit Pensia, Shashank Rajput, Alliot Nagle, Harit Vishwakarma, Dimitris Papailiopoulos

We show that any target network of width $d$ and depth $l$ can be approximated by pruning a random network that is a factor $O(\log(dl))$ wider and twice as deep.
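
The core step can be illustrated in miniature: approximate a single target weight by selecting a subset of random weights whose sum matches it (keeping a weight corresponds to not pruning that edge). The brute-force search and the number of random candidates below are illustrative assumptions, not the paper's construction.

    import numpy as np
    from itertools import combinations

    def best_subset_sum(candidates, target):
        """Brute-force the subset of random weights whose sum best matches the target."""
        best, best_err = (), abs(target)
        for r in range(1, len(candidates) + 1):
            for subset in combinations(range(len(candidates)), r):
                err = abs(sum(candidates[i] for i in subset) - target)
                if err < best_err:
                    best, best_err = subset, err
        return best, best_err

    rng = np.random.default_rng(0)
    candidates = rng.uniform(-1, 1, size=12)   # random weights; roughly log(1/eps) suffice per target
    target = 0.37                              # hypothetical target weight
    subset, err = best_subset_sum(candidates, target)
    print(subset, err)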

Optimal Lottery Tickets via Subset Sum: Logarithmic Over-Parameterization is Sufficient

1 code implementation NeurIPS 2020 Ankit Pensia, Shashank Rajput, Alliot Nagle, Harit Vishwakarma, Dimitris Papailiopoulos

We show that any target network of width $d$ and depth $l$ can be approximated by pruning a random network that is a factor $O(\log(dl))$ wider and twice as deep.

Permutation-Based SGD: Is Random Optimal?

1 code implementation ICLR 2022 Shashank Rajput, Kangwook Lee, Dimitris Papailiopoulos

However, for general strongly convex functions, random permutations are optimal.
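
To make the question concrete, here is a sketch (not from the paper) comparing random reshuffling against a single fixed permutation on a toy least-squares sum; the problem instance and hyperparameters are illustrative assumptions.

    import numpy as np

    def permutation_sgd(A, b, order_fn, epochs=50, lr=0.05):
        """Run epoch-based SGD where order_fn supplies each epoch's permutation."""
        n, d = A.shape
        x = np.zeros(d)
        for epoch in range(epochs):
            for i in order_fn(epoch, n):
                x -= lr * (A[i] @ x - b[i]) * A[i]
        return x

    def random_reshuffling(epoch, n):
        return np.random.default_rng(epoch).permutation(n)   # fresh permutation per epoch

    def fixed_order(epoch, n):
        return np.arange(n)                                   # same permutation every epoch

    rng = np.random.default_rng(0)
    A = rng.normal(size=(20, 5))
    b = rng.normal(size=20)
    x_star, *_ = np.linalg.lstsq(A, b, rcond=None)
    for name, fn in [("random reshuffling", random_reshuffling), ("fixed order", fixed_order)]:
        print(name, np.linalg.norm(permutation_sgd(A, b, fn) - x_star))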

An Exponential Improvement on the Memorization Capacity of Deep Threshold Networks

no code implementations NeurIPS 2021 Shashank Rajput, Kartik Sreenivasan, Dimitris Papailiopoulos, Amin Karbasi

Recently, Vershynin (2020) settled a long-standing question by Baum (1988), proving that deep threshold networks can memorize $n$ points in $d$ dimensions using $\widetilde{\mathcal{O}}(e^{1/\delta^2}+\sqrt{n})$ neurons and $\widetilde{\mathcal{O}}(e^{1/\delta^2}(d+\sqrt{n})+n)$ weights, where $\delta$ is the minimum distance between the points.

Memorization

Finding Everything within Random Binary Networks

no code implementations18 Oct 2021 Kartik Sreenivasan, Shashank Rajput, Jy-yong Sohn, Dimitris Papailiopoulos

A recent work by Ramanujan et al. (2020) provides significant empirical evidence that sufficiently overparameterized, random neural networks contain untrained subnetworks that achieve state-of-the-art accuracy on several predictive tasks.

Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond

no code implementations ICLR 2022 Chulhee Yun, Shashank Rajput, Suvrit Sra

In distributed learning, local SGD (also known as federated averaging) and its simple baseline minibatch SGD are widely studied optimization methods.
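
A minimal sketch (not from the paper) of local SGD with shuffling: each worker takes several local steps over its own shuffled data, then the models are averaged; minibatch SGD would instead average gradients at every step. The quadratic objective, worker count, and step counts are illustrative assumptions.

    import numpy as np

    def local_sgd(datasets, rounds=20, local_steps=5, lr=0.05):
        """Each worker runs a few without-replacement SGD steps locally, then models are averaged."""
        d = datasets[0][0].shape[1]
        x = np.zeros(d)
        for r in range(rounds):
            local_models = []
            for A, b in datasets:
                xi, rng = x.copy(), np.random.default_rng(r)
                for i in rng.permutation(len(b))[:local_steps]:   # shuffled local data
                    xi -= lr * (A[i] @ xi - b[i]) * A[i]
                local_models.append(xi)
            x = np.mean(local_models, axis=0)                     # communication round
        return x

    # Toy federated least-squares setup, illustrative only.
    rng = np.random.default_rng(0)
    datasets = [(rng.normal(size=(30, 4)), rng.normal(size=30)) for _ in range(4)]
    A_all = np.vstack([A for A, _ in datasets])
    b_all = np.concatenate([b for _, b in datasets])
    x_star, *_ = np.linalg.lstsq(A_all, b_all, rcond=None)
    print(np.linalg.norm(local_sgd(datasets) - x_star))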

LIFT: Language-Interfaced Fine-Tuning for Non-Language Machine Learning Tasks

1 code implementation14 Jun 2022 Tuan Dinh, Yuchen Zeng, Ruisu Zhang, Ziqian Lin, Michael Gira, Shashank Rajput, Jy-yong Sohn, Dimitris Papailiopoulos, Kangwook Lee

LIFT does not make any changes to the model architecture or loss function, and it solely relies on the natural language interface, enabling "no-code machine learning with LMs."
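
The language interface can be illustrated by serializing a tabular example into a prompt/completion pair, which is then handed to an off-the-shelf LM fine-tuning pipeline; the feature names, template wording, and the to_prompt helper below are illustrative assumptions, not LIFT's exact templates.

    def to_prompt(features, label=None):
        """Serialize one tabular row into a natural-language prompt (and completion)."""
        # Hypothetical feature names for a toy classification task.
        parts = [f"{name} is {value}" for name, value in features.items()]
        prompt = "Given that " + ", ".join(parts) + ", what is the class?"
        completion = None if label is None else f" {label}"
        return prompt, completion

    row = {"sepal length": 5.1, "sepal width": 3.5, "petal length": 1.4, "petal width": 0.2}
    prompt, completion = to_prompt(row, label="setosa")
    print(prompt)
    print(completion)
    # The resulting (prompt, completion) pairs are fed to standard LM fine-tuning,
    # with no change to the model architecture or loss, as described above.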

BIG-bench Machine Learning, General Classification, +2

Looped Transformers as Programmable Computers

1 code implementation30 Jan 2023 Angeliki Giannou, Shashank Rajput, Jy-yong Sohn, Kangwook Lee, Jason D. Lee, Dimitris Papailiopoulos

We present a framework for using transformer networks as universal computers by programming them with specific weights and placing them in a loop.
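
A minimal sketch of the "placing them in a loop" part: the same transformer block is applied repeatedly to a scratchpad of token embeddings, so fixed weights can emulate iterative computation. The block configuration, random weights, and loop count are illustrative assumptions; the paper's constructions hard-code specific weights rather than using a randomly initialized block.

    import torch
    import torch.nn as nn

    # One fixed transformer block, reused as the "processor" of the loop.
    block = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
    block.eval()

    def looped_transformer(state, iterations=8):
        """Apply the same fixed-weight block repeatedly; the token sequence acts as scratchpad memory."""
        with torch.no_grad():
            for _ in range(iterations):
                state = block(state)
        return state

    scratchpad = torch.randn(1, 16, 32)   # batch of 1, 16 "memory" tokens, width 32
    out = looped_transformer(scratchpad)
    print(out.shape)                      # torch.Size([1, 16, 32])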

In-Context Learning

The Expressive Power of Tuning Only the Normalization Layers

no code implementations15 Feb 2023 Angeliki Giannou, Shashank Rajput, Dimitris Papailiopoulos

Feature normalization transforms such as Batch and Layer-Normalization have become indispensable ingredients of state-of-the-art deep neural networks.
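
A minimal sketch of the setting studied: freeze every parameter of a network except the affine scale/shift parameters of its normalization layers, then fine-tune only those. The toy architecture below is an illustrative assumption.

    import torch
    import torch.nn as nn

    # Toy network standing in for a pretrained model.
    model = nn.Sequential(
        nn.Linear(16, 64), nn.LayerNorm(64), nn.ReLU(),
        nn.Linear(64, 64), nn.LayerNorm(64), nn.ReLU(),
        nn.Linear(64, 2),
    )

    # Freeze everything, then unfreeze only the normalization layers' parameters.
    for p in model.parameters():
        p.requires_grad = False
    for m in model.modules():
        if isinstance(m, (nn.LayerNorm, nn.BatchNorm1d)):
            for p in m.parameters():
                p.requires_grad = True

    trainable = [n for n, p in model.named_parameters() if p.requires_grad]
    print(trainable)   # only the LayerNorm weight/bias parameters remain trainable
    optimizer = torch.optim.SGD((p for p in model.parameters() if p.requires_grad), lr=1e-2)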

Maestro: Uncovering Low-Rank Structures via Trainable Decomposition

no code implementations28 Aug 2023 Samuel Horvath, Stefanos Laskaridis, Shashank Rajput, Hongyi Wang

We further apply our technique to DNNs and empirically illustrate that Maestro enables the extraction of lower-footprint models that preserve model performance while allowing a graceful accuracy-latency tradeoff for deployment to devices of different capabilities.
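
A minimal sketch of extracting a lower-footprint model via low-rank decomposition of a linear layer: factor the weight matrix with a truncated SVD and replace the layer with two thinner ones. The truncation rank and the post-hoc SVD factorization are illustrative assumptions; Maestro learns the decomposition during training rather than applying it afterwards.

    import torch
    import torch.nn as nn

    def low_rank_replace(linear, rank):
        """Replace a Linear layer with a rank-`rank` factorization (two smaller layers)."""
        U, S, Vh = torch.linalg.svd(linear.weight.data, full_matrices=False)
        A = nn.Linear(linear.in_features, rank, bias=False)
        B = nn.Linear(rank, linear.out_features, bias=True)
        A.weight.data = torch.diag(S[:rank].sqrt()) @ Vh[:rank]    # rank x in
        B.weight.data = U[:, :rank] @ torch.diag(S[:rank].sqrt())  # out x rank
        B.bias.data = linear.bias.data.clone()
        return nn.Sequential(A, B)

    layer = nn.Linear(256, 256)
    compressed = low_rank_replace(layer, rank=32)
    x = torch.randn(8, 256)
    print(torch.dist(layer(x), compressed(x)))                     # approximation error
    orig = sum(p.numel() for p in layer.parameters())
    comp = sum(p.numel() for p in compressed.parameters())
    print(orig, comp)                                              # parameter counts before/after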

Low-rank compression, Quantization
