Search Results for author: Dimitris Papailiopoulos

Found 36 papers, 14 papers with code

Rare Gems: Finding Lottery Tickets at Initialization

no code implementations24 Feb 2022 Kartik Sreenivasan, Jy-yong Sohn, Liu Yang, Matthew Grinde, Alliot Nagle, Hongyi Wang, Kangwook Lee, Dimitris Papailiopoulos

It has been widely observed that large neural networks can be pruned to a small fraction of their original size, with little loss in accuracy, by typically following a time-consuming "train, prune, re-train" approach.

Finding Everything within Random Binary Networks

no code implementations18 Oct 2021 Kartik Sreenivasan, Shashank Rajput, Jy-yong Sohn, Dimitris Papailiopoulos

A recent work by Ramanujan et al. (2020) provides significant empirical evidence that sufficiently overparameterized, random neural networks contain untrained subnetworks that achieve state-of-the-art accuracy on several predictive tasks.

An Exponential Improvement on the Memorization Capacity of Deep Threshold Networks

no code implementations NeurIPS 2021 Shashank Rajput, Kartik Sreenivasan, Dimitris Papailiopoulos, Amin Karbasi

Recently, Vershynin (2020) settled a long standing question by Baum (1988), proving that \emph{deep threshold} networks can memorize $n$ points in $d$ dimensions using $\widetilde{\mathcal{O}}(e^{1/\delta^2}+\sqrt{n})$ neurons and $\widetilde{\mathcal{O}}(e^{1/\delta^2}(d+\sqrt{n})+n)$ weights, where $\delta$ is the minimum distance between the points.

Pufferfish: Communication-efficient Models At No Extra Cost

1 code implementation5 Mar 2021 Hongyi Wang, Saurabh Agarwal, Dimitris Papailiopoulos

In this work, we present Pufferfish, a communication and computation efficient distributed training framework that incorporates the gradient compression into the model training process via training low-rank, pre-factorized deep networks.


On the Utility of Gradient Compression in Distributed Training Systems

1 code implementation28 Feb 2021 Saurabh Agarwal, Hongyi Wang, Shivaram Venkataraman, Dimitris Papailiopoulos

A rich body of prior work has highlighted the existence of communication bottlenecks in synchronous data-parallel training.

Model Compression

Permutation-Based SGD: Is Random Optimal?

1 code implementation ICLR 2022 Shashank Rajput, Kangwook Lee, Dimitris Papailiopoulos

However, for general strongly convex functions, random permutations are optimal.

Optimal Lottery Tickets via Subset Sum: Logarithmic Over-Parameterization is Sufficient

1 code implementation NeurIPS 2020 Ankit Pensia, Shashank Rajput, Alliot Nagle, Harit Vishwakarma, Dimitris Papailiopoulos

We show that any target network of width $d$ and depth $l$ can be approximated by pruning a random network that is a factor $O(log(dl))$ wider and twice as deep.

Accordion: Adaptive Gradient Communication via Critical Learning Regime Identification

1 code implementation29 Oct 2020 Saurabh Agarwal, Hongyi Wang, Kangwook Lee, Shivaram Venkataraman, Dimitris Papailiopoulos

The techniques usually require choosing a static compression ratio, often requiring users to balance the trade-off between model accuracy and per-iteration speedup.


Optimal Lottery Tickets via SubsetSum: Logarithmic Over-Parameterization is Sufficient

1 code implementation14 Jun 2020 Ankit Pensia, Shashank Rajput, Alliot Nagle, Harit Vishwakarma, Dimitris Papailiopoulos

We show that any target network of width $d$ and depth $l$ can be approximated by pruning a random network that is a factor $O(\log(dl))$ wider and twice as deep.

Closing the convergence gap of SGD without replacement

no code implementations ICML 2020 Shashank Rajput, Anant Gupta, Dimitris Papailiopoulos

A recent line of breakthrough works on SGD without replacement (SGDo) established an $\mathcal{O}\left(\frac{n}{T^2}\right)$ convergence rate when the function minimized is strongly convex and is a sum of $n$ smooth functions, and an $\mathcal{O}\left(\frac{1}{T^2}+\frac{n^3}{T^3}\right)$ rate for sums of quadratics.

Federated Learning with Matched Averaging

1 code implementation ICLR 2020 Hongyi Wang, Mikhail Yurochkin, Yuekai Sun, Dimitris Papailiopoulos, Yasaman Khazaeni

Federated learning allows edge devices to collaboratively learn a shared model while keeping the training data on device, decoupling the ability to do model training from the need to store the data in the cloud.

Federated Learning

DETOX: A Redundancy-based Framework for Faster and More Robust Gradient Aggregation

1 code implementation NeurIPS 2019 Shashank Rajput, Hongyi Wang, Zachary Charles, Dimitris Papailiopoulos

In this work, we present DETOX, a Byzantine-resilient distributed training framework that combines algorithmic redundancy with robust aggregation.

Bad Global Minima Exist and SGD Can Reach Them

1 code implementation NeurIPS 2020 Shengchao Liu, Dimitris Papailiopoulos, Dimitris Achlioptas

Several works have aimed to explain why overparameterized neural networks generalize well when trained by Stochastic Gradient Descent (SGD).

Data Augmentation Image Classification

Convergence and Margin of Adversarial Training on Separable Data

no code implementations22 May 2019 Zachary Charles, Shashank Rajput, Stephen Wright, Dimitris Papailiopoulos

Our results are derived by showing that adversarial training with gradient updates minimizes a robust version of the empirical risk at a $\mathcal{O}(\ln(t)^2/t)$ rate, despite non-smoothness.

Does Data Augmentation Lead to Positive Margin?

no code implementations8 May 2019 Shashank Rajput, Zhili Feng, Zachary Charles, Po-Ling Loh, Dimitris Papailiopoulos

Data augmentation (DA) is commonly used during model training, as it significantly improves test error and model robustness.

Data Augmentation

ErasureHead: Distributed Gradient Descent without Delays Using Approximate Gradient Coding

1 code implementation28 Jan 2019 Hongyi Wang, Zachary Charles, Dimitris Papailiopoulos

We present ErasureHead, a new approach for distributed gradient descent (GD) that mitigates system delays by employing approximate gradient coding.

A Geometric Perspective on the Transferability of Adversarial Directions

no code implementations8 Nov 2018 Zachary Charles, Harrison Rosenberg, Dimitris Papailiopoulos

We show that these "transferable adversarial directions" are guaranteed to exist for linear separators of a given set, and will exist with high probability for linear classifiers trained on independent sets drawn from the same distribution.

The Effect of Network Width on the Performance of Large-batch Training

no code implementations NeurIPS 2018 Lingjiao Chen, Hongyi Wang, Jinman Zhao, Dimitris Papailiopoulos, Paraschos Koutris

Distributed implementations of mini-batch stochastic gradient descent (SGD) suffer from communication overheads, attributed to the high frequency of gradient updates inherent in small-batch training.

Gradient Coding via the Stochastic Block Model

no code implementations25 May 2018 Zachary Charles, Dimitris Papailiopoulos

Gradient descent and its many variants, including mini-batch stochastic gradient descent, form the algorithmic foundation of modern large-scale machine learning.

Stochastic Block Model

DRACO: Byzantine-resilient Distributed Training via Redundant Gradients

1 code implementation ICML 2018 Lingjiao Chen, Hongyi Wang, Zachary Charles, Dimitris Papailiopoulos

Distributed model training is vulnerable to byzantine system failures and adversarial compute nodes, i. e., nodes that use malicious updates to corrupt the global model stored at a parameter server (PS).

Approximate Gradient Coding via Sparse Random Graphs

no code implementations17 Nov 2017 Zachary Charles, Dimitris Papailiopoulos, Jordan Ellenberg

Distributed algorithms are often beset by the straggler effect, where the slowest compute nodes in the system dictate the overall running time.

Stability and Generalization of Learning Algorithms that Converge to Global Optima

no code implementations ICML 2018 Zachary Charles, Dimitris Papailiopoulos

Finally, we show that although our results imply comparable stability for SGD and GD in the PL setting, there exist simple neural networks with multiple local minima where SGD is stable but GD is not.

Generalization Bounds

Gradient Diversity: a Key Ingredient for Scalable Distributed Learning

no code implementations18 Jun 2017 Dong Yin, Ashwin Pananjady, Max Lam, Dimitris Papailiopoulos, Kannan Ramchandran, Peter Bartlett

It has been experimentally observed that distributed implementations of mini-batch stochastic gradient descent (SGD) algorithms exhibit speedup saturation and decaying generalization ability beyond a particular batch-size.


Bipartite Correlation Clustering -- Maximizing Agreements

no code implementations9 Mar 2016 Megasthenis Asteris, Anastasios Kyrillidis, Dimitris Papailiopoulos, Alexandros G. Dimakis

We present a novel approximation algorithm for $k$-BCC, a variant of BCC with an upper bound $k$ on the number of clusters.

Speeding Up Distributed Machine Learning Using Codes

no code implementations8 Dec 2015 Kangwook Lee, Maximilian Lam, Ramtin Pedarsani, Dimitris Papailiopoulos, Kannan Ramchandran

We focus on two of the most basic building blocks of distributed learning algorithms: matrix multiplication and data shuffling.

Orthogonal NMF through Subspace Exploration

no code implementations NeurIPS 2015 Megasthenis Asteris, Dimitris Papailiopoulos, Alexandros G. Dimakis

Our algorithm relies on a novel approximation to the related Nonnegative Principal Component Analysis (NNPCA) problem; given an arbitrary data matrix, NNPCA seeks $k$ nonnegative components that jointly capture most of the variance.

Sparse PCA via Bipartite Matchings

no code implementations NeurIPS 2015 Megasthenis Asteris, Dimitris Papailiopoulos, Anastasios Kyrillidis, Alexandros G. Dimakis

We consider the following multi-component sparse PCA problem: given a set of data points, we seek to extract a small number of sparse components with disjoint supports that jointly capture the maximum possible variance.

Perturbed Iterate Analysis for Asynchronous Stochastic Optimization

no code implementations24 Jul 2015 Horia Mania, Xinghao Pan, Dimitris Papailiopoulos, Benjamin Recht, Kannan Ramchandran, Michael. I. Jordan

We demonstrate experimentally on a 16-core machine that the sparse and parallel version of SVRG is in some cases more than four orders of magnitude faster than the standard SVRG algorithm.

Stochastic Optimization

On the Worst-Case Approximability of Sparse PCA

no code implementations21 Jul 2015 Siu On Chan, Dimitris Papailiopoulos, Aviad Rubinstein

It is well known that Sparse PCA (Sparse Principal Component Analysis) is NP-hard to solve exactly on worst-case instances.

Parallel Correlation Clustering on Big Graphs

no code implementations NeurIPS 2015 Xinghao Pan, Dimitris Papailiopoulos, Samet Oymak, Benjamin Recht, Kannan Ramchandran, Michael. I. Jordan

We present C4 and ClusterWild!, two algorithms for parallel correlation clustering that run in a polylogarithmic number of rounds and achieve nearly linear speedups, provably.

Provable Deterministic Leverage Score Sampling

no code implementations6 Apr 2014 Dimitris Papailiopoulos, Anastasios Kyrillidis, Christos Boutsidis

We explain theoretically a curious empirical phenomenon: "Approximating a matrix by deterministically selecting a subset of its columns with the corresponding largest leverage scores results in a good low-rank matrix surrogate".

Cannot find the paper you are looking for? You can Submit a new open access paper.