no code implementations • 9 Nov 2022 • Junhyung Lyle Kim, Gauthier Gidel, Anastasios Kyrillidis, Fabian Pedregosa
The extragradient method has recently gained increasing attention, due to its convergence behavior on smooth games.
no code implementations • 9 Nov 2022 • Cameron R. Wolfe, Anastasios Kyrillidis
To mitigate these shortcomings, we propose Cold Start Streaming Learning (CSSL), a simple, end-to-end approach for streaming learning with deep networks that uses a combination of replay and data augmentation to avoid catastrophic forgetting.
no code implementations • 29 Oct 2022 • Zheyang Xiong, Fangshuo Liao, Anastasios Kyrillidis
The strong Lottery Ticket Hypothesis (LTH) claims the existence of a subnetwork in a sufficiently large, randomly initialized neural network that approximates some target neural network without the need for training.
no code implementations • 28 Oct 2022 • Chen Dun, Mirian Hipolito, Chris Jermaine, Dimitrios Dimitriadis, Anastasios Kyrillidis
Asynchronous learning protocols have regained attention lately, especially in the Federated Learning (FL) setup, where slower clients can severely impede the learning process.
no code implementations • 28 Oct 2022 • Qihan Wang, Chen Dun, Fangshuo Liao, Chris Jermaine, Anastasios Kyrillidis
LoFT is a model-parallel pretraining algorithm that partitions convolutional layers by filters to train them independently in a distributed setting, resulting in reduced memory and communication costs during pretraining.
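A rough sketch of the filter-partitioning idea in PyTorch, using hypothetical helper names; the actual LoFT training and synchronization protocol is not shown here.

```python
# Illustrative sketch: split the output filters of a convolutional layer into
# disjoint per-worker sub-layers (hypothetical helper, not the LoFT protocol).
import torch
import torch.nn as nn

def partition_conv_by_filters(conv: nn.Conv2d, num_workers: int):
    """Split the output filters of `conv` into disjoint per-worker sub-layers."""
    filter_groups = torch.arange(conv.out_channels).chunk(num_workers)
    sub_layers = []
    for idx in filter_groups:
        sub = nn.Conv2d(conv.in_channels, len(idx), conv.kernel_size,
                        stride=conv.stride, padding=conv.padding,
                        bias=conv.bias is not None)
        with torch.no_grad():
            sub.weight.copy_(conv.weight[idx])
            if conv.bias is not None:
                sub.bias.copy_(conv.bias[idx])
        sub_layers.append(sub)
    return sub_layers  # each worker would train one sub-layer independently

workers = partition_conv_by_filters(nn.Conv2d(3, 64, 3, padding=1), num_workers=4)
```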
no code implementations • 8 May 2022 • Anastasios Kyrillidis, Moshe Y. Vardi, Zhiwei Zhang
With the power of ADDs and the (graded) project-join-tree builder, our versatile framework can handle many generalizations of MaxSAT, such as MaxSAT with non-CNF constraints, Min-MaxSAT and MinSAT.
no code implementations • 22 Mar 2022 • Junhyung Lyle Kim, Mohammad Taha Toghani, César A. Uribe, Anastasios Kyrillidis
We propose a distributed Quantum State Tomography (QST) protocol, named Local Stochastic Factored Gradient Descent (Local SFGD), to learn the low-rank factor of a density matrix over a set of local machines.
1 code implementation • ICLR 2022 • Cheng Wan, Youjie Li, Cameron R. Wolfe, Anastasios Kyrillidis, Nam Sung Kim, Yingyan Lin
Notably, little is known regarding the convergence rate of GCN training with both stale features and stale feature gradients.
1 code implementation • 4 Mar 2022 • Ahmed Imtiaz Humayun, Randall Balestriero, Anastasios Kyrillidis, Richard Baraniuk
We propose to remedy such a scenario by introducing a maximal radius constraint $r$ on the clusters formed by the centroids, i.e., samples from the same cluster should not be more than $2r$ apart in terms of $\ell_2$ distance.
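A minimal sketch of checking this constraint, with hypothetical helper names: if every sample lies within $r$ of its assigned centroid, then any two samples in the same cluster are at most $2r$ apart by the triangle inequality.

```python
# Check a maximal-radius constraint r on clusters: samples within r of their
# centroid are pairwise within 2r of each other. Illustrative sketch only.
import numpy as np

def satisfies_radius_constraint(X, centroids, labels, r):
    dists = np.linalg.norm(X - centroids[labels], axis=1)
    return bool(np.all(dists <= r))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
centroids = np.array([[0.0, 0.0], [3.0, 3.0]])
labels = np.argmin(((X[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
print(satisfies_radius_constraint(X, centroids, labels, r=2.5))
```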
1 code implementation • 7 Dec 2021 • Cameron R. Wolfe, Anastasios Kyrillidis
We propose a novel structured pruning algorithm for neural networks: the iterative, Sparse Structured Pruning algorithm, dubbed i-SpaSP.
no code implementations • 5 Dec 2021 • Fangshuo Liao, Anastasios Kyrillidis
Motivated by the goal of training all the parameters of a neural network, we study why and when this can be achieved by iteratively creating, training, and combining randomly selected subnetworks.
no code implementations • 11 Nov 2021 • Junhyung Lyle Kim, Panos Toulis, Anastasios Kyrillidis
Stochastic gradient descent with momentum (SGDM) is the dominant algorithm in many optimization scenarios, including convex optimization instances and non-convex neural network training.
no code implementations • 23 Oct 2021 • Zhenwei Dai, Chen Dun, Yuxin Tang, Anastasios Kyrillidis, Anshumali Shrivastava
Federated learning enables many local devices to train a deep learning model jointly without sharing the local data.
no code implementations • 31 Jul 2021 • Cameron R. Wolfe, Qihan Wang, Junhyung Lyle Kim, Anastasios Kyrillidis
Neural network pruning is useful for discovering efficient, high-performing subnetworks within pre-trained, dense network architectures.
no code implementations • 9 Jul 2021 • John Chen, Cameron Wolfe, Anastasios Kyrillidis
Deep learning practitioners often operate on a computational and monetary budget.
no code implementations • 2 Jul 2021 • Chen Dun, Cameron R. Wolfe, Christopher M. Jermaine, Anastasios Kyrillidis
Thus, ResIST reduces the per-iteration communication, memory, and time requirements of ResNet training to only a fraction of the requirements of full-model training.
no code implementations • 2 Jul 2021 • John Chen, Qihan Wang, Anastasios Kyrillidis
In this work, we explore the connection between the double descent phenomenon and the number of samples in the deep neural network setting.
no code implementations • 16 Jun 2021 • Junhyung Lyle Kim, Jose Antonio Lara Benitez, Mohammad Taha Toghani, Cameron Wolfe, Zhiwei Zhang, Anastasios Kyrillidis
We present a novel, practical, and provable approach for solving diagonally constrained semi-definite programming (SDP) problems at scale using accelerated non-convex programming.
1 code implementation • 14 Apr 2021 • Junhyung Lyle Kim, George Kollias, Amir Kalev, Ken X. Wei, Anastasios Kyrillidis
Despite being a non-convex method, MiFGD converges provably close to the true density matrix at an accelerated linear rate, in the absence of experimental and statistical noise, and under common assumptions.
1 code implementation • 20 Feb 2021 • Cameron R. Wolfe, Jingkang Yang, Arindam Chowdhury, Chen Dun, Artun Bayer, Santiago Segarra, Anastasios Kyrillidis
The graph convolutional network (GCN) is a go-to solution for machine learning on graphs, but its training is notoriously difficult to scale both in terms of graph size and the number of model parameters.
no code implementations • 17 Dec 2020 • T. Mitchell Roddenberry, Santiago Segarra, Anastasios Kyrillidis
We study the role of the constraint set in determining the solution to low-rank, positive semidefinite (PSD) matrix sensing problems.
1 code implementation • 14 Dec 2020 • Anastasios Kyrillidis, Moshe Y. Vardi, Zhiwei Zhang
We explore the potential of continuous local search (CLS) in SAT solving by proposing a novel approach for finding a solution of a hybrid system of Boolean constraints.
no code implementations • 28 Nov 2020 • Vatsal Shah, Soumya Basu, Anastasios Kyrillidis, Sujay Sanghavi
In this paper, we aim to characterize the performance of adaptive methods in the over-parameterized linear regression setting.
no code implementations • 25 Nov 2020 • John Chen, Samarth Sinha, Anastasios Kyrillidis
On its own, improvements with StackMix hold across different numbers of labeled samples on CIFAR-100, maintaining approximately a 2% gap in test accuracy even when using only 5% of the whole dataset, and StackMix is effective in the semi-supervised setting, giving a 2% improvement with the standard benchmark $\Pi$-model.
1 code implementation • 1 Jul 2020 • Jacky Y. Zhang, Rajiv Khanna, Anastasios Kyrillidis, Oluwasanmi Koyejo
Bayesian coresets have emerged as a promising approach for implementing scalable Bayesian inference.
2 code implementations • 2 Dec 2019 • Anastasios Kyrillidis, Anshumali Shrivastava, Moshe Y. Vardi, Zhiwei Zhang
By such a reduction to continuous optimization, we propose an algebraic framework for solving systems consisting of different types of constraints.
no code implementations • 15 Nov 2019 • Michael P. Perrone, Haidar Khan, Changhoan Kim, Anastasios Kyrillidis, Jerry Quinn, Valentina Salapura
This paper presents a methodology for selecting the mini-batch size that minimizes Stochastic Gradient Descent (SGD) learning time for single and multiple learner problems.
1 code implementation • ICML 2020 • John Chen, Vatsal Shah, Anastasios Kyrillidis
We introduce Negative Sampling in Semi-Supervised Learning (NS3L), a simple, fast, easy-to-tune algorithm for semi-supervised learning (SSL).
no code implementations • NeurIPS 2019 • Jacky Y. Zhang, Rajiv Khanna, Anastasios Kyrillidis, Oluwasanmi Koyejo
Iterative hard thresholding (IHT) is a projected gradient descent algorithm, known to achieve state of the art performance for a wide range of structured estimation problems, such as sparse inference.
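For concreteness, a minimal sketch of plain IHT for sparse linear regression; the step size and iteration budget are illustrative choices, and this is the baseline method rather than the variant studied in the paper.

```python
# Iterative hard thresholding: gradient step followed by projection onto the
# set of k-sparse vectors (keep the k largest-magnitude entries).
import numpy as np

def hard_threshold(x, k):
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    out[idx] = x[idx]
    return out

def iht(A, y, k, step=None, iters=200):
    n = A.shape[1]
    x = np.zeros(n)
    if step is None:
        step = 1.0 / np.linalg.norm(A, 2) ** 2   # conservative step size
    for _ in range(iters):
        x = hard_threshold(x - step * A.T @ (A @ x - y), k)
    return x
```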
2 code implementations • 11 Oct 2019 • John Chen, Cameron Wolfe, Zhao Li, Anastasios Kyrillidis
Momentum is a widely used technique for gradient-based optimizers in deep learning.
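As background, the standard heavy-ball momentum update written out as a minimal sketch; the hyperparameters are illustrative and this is not the specific variant proposed in the paper.

```python
# Classic momentum update used by gradient-based deep learning optimizers.
import numpy as np

def momentum_step(w, v, grad, lr=0.01, beta=0.9):
    v = beta * v + grad      # velocity: exponentially weighted sum of gradients
    w = w - lr * v           # step along the velocity, not the raw gradient
    return w, v

# Toy usage: minimize f(w) = 0.5 * ||w||^2, whose gradient is w.
w, v = np.ones(3), np.zeros(3)
for _ in range(100):
    w, v = momentum_step(w, v, grad=w)
```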
2 code implementations • 4 Oct 2019 • Binhang Yuan, Cameron R. Wolfe, Chen Dun, Yuxin Tang, Anastasios Kyrillidis, Christopher M. Jermaine
These properties allow IST to cope with issues due to distributed data, slow interconnects, or limited device memory, making it a suitable approach for cases of mandatory distribution.
no code implementations • 25 Sep 2019 • John Chen, Anastasios Kyrillidis
Momentum is a simple and popular technique in deep learning for gradient-based optimizers.
no code implementations • 29 Mar 2019 • Alexander Ratner, Dan Alistarh, Gustavo Alonso, David G. Andersen, Peter Bailis, Sarah Bird, Nicholas Carlini, Bryan Catanzaro, Jennifer Chayes, Eric Chung, Bill Dally, Jeff Dean, Inderjit S. Dhillon, Alexandros Dimakis, Pradeep Dubey, Charles Elkan, Grigori Fursin, Gregory R. Ganger, Lise Getoor, Phillip B. Gibbons, Garth A. Gibson, Joseph E. Gonzalez, Justin Gottschlich, Song Han, Kim Hazelwood, Furong Huang, Martin Jaggi, Kevin Jamieson, Michael I. Jordan, Gauri Joshi, Rania Khalaf, Jason Knight, Jakub Konečný, Tim Kraska, Arun Kumar, Anastasios Kyrillidis, Aparna Lakshmiratan, Jing Li, Samuel Madden, H. Brendan McMahan, Erik Meijer, Ioannis Mitliagkas, Rajat Monga, Derek Murray, Kunle Olukotun, Dimitris Papailiopoulos, Gennady Pekhimenko, Theodoros Rekatsinas, Afshin Rostamizadeh, Christopher Ré, Christopher De Sa, Hanie Sedghi, Siddhartha Sen, Virginia Smith, Alex Smola, Dawn Song, Evan Sparks, Ion Stoica, Vivienne Sze, Madeleine Udell, Joaquin Vanschoren, Shivaram Venkataraman, Rashmi Vinayak, Markus Weimer, Andrew Gordon Wilson, Eric Xing, Matei Zaharia, Ce Zhang, Ameet Talwalkar
Machine learning (ML) techniques are enjoying rapidly increasing adoption.
1 code implementation • 1 Feb 2019 • Ryan Spring, Anastasios Kyrillidis, Vijai Mohan, Anshumali Shrivastava
The problem is becoming more severe as deep learning models continue to grow larger in order to learn from complex, large-scale datasets.
no code implementations • 16 Nov 2018 • Vatsal Shah, Anastasios Kyrillidis, Sujay Sanghavi
We empirically show that the minimum weight norm is not necessarily the proper gauge of good generalization in simplified scenarios, and that different models found by adaptive methods could outperform plain gradient methods.
no code implementations • 6 Jun 2018 • Kelly Geyer, Anastasios Kyrillidis, Amir Kalev
Surprisingly, recent work argues that the choice of $r \leq n$ is not pivotal: even setting $U \in \mathbb{R}^{n \times n}$ is sufficient for factored gradient descent to find the rank-$r$ solution, which suggests that operating over the factors leads to an implicit regularization.
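To make the factored parameterization concrete, a minimal sketch of gradient descent over a factor $U$ with $X = UU^\top$ for PSD matrix sensing; the over-parameterized choice $U \in \mathbb{R}^{n \times n}$ mirrors the setting above, while the loss, step size, and initialization are illustrative assumptions.

```python
# Factored gradient descent for PSD matrix sensing: descend on U directly,
# where X = U U^T and measurements are y_i = <A_i, X>. Illustrative sketch.
import numpy as np

def factored_gd(A_list, y, n, step=1e-2, iters=500):
    U = np.random.default_rng(0).normal(scale=0.1, size=(n, n))  # over-parameterized
    for _ in range(iters):
        X = U @ U.T
        residuals = np.array([np.sum(A * X) for A in A_list]) - y
        grad_X = sum(r * A for r, A in zip(residuals, A_list))
        U -= step * (grad_X + grad_X.T) @ U     # chain rule through X = U U^T
    return U @ U.T
```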
no code implementations • 1 Jun 2018 • Tayo Ajayi, David Mildebrath, Anastasios Kyrillidis, Shashanka Ubaru, Georgios Kollias, Kristofer Bouchard
We present theoretical results on the convergence of non-convex accelerated gradient descent in matrix factorization models with $\ell_2$-norm loss.
no code implementations • 24 May 2018 • Anastasios Kyrillidis
We propose practical algorithms for entrywise $\ell_p$-norm low-rank approximation, for $p = 1$ or $p = \infty$.
no code implementations • 23 May 2018 • Tianyang Li, Anastasios Kyrillidis, Liu Liu, Constantine Caramanis
We present a novel statistical inference framework for convex empirical risk minimization, using approximate stochastic Newton steps.
no code implementations • 26 Dec 2017 • Rajiv Khanna, Anastasios Kyrillidis
We study, both in theory and in practice, the use of momentum motions in classic iterative hard thresholding (IHT) methods.
no code implementations • 21 May 2017 • Tianyang Li, Liu Liu, Anastasios Kyrillidis, Constantine Caramanis
We present a novel method for frequentist statistical inference in $M$-estimation problems, based on stochastic gradient descent (SGD) with a fixed step size: we demonstrate that the average of such SGD sequences can be used for statistical inference, after proper scaling.
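A minimal sketch of the basic ingredient described above, constant-step-size SGD with iterate averaging for least squares; the proper scaling and confidence-interval construction from the paper are omitted, and all hyperparameters are illustrative.

```python
# Fixed-step SGD with a running average of the iterates; the averaged iterate
# is the quantity one would feed into the inference procedure.
import numpy as np

def averaged_sgd(X, y, step=0.05, epochs=5, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    avg = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            grad = (X[i] @ w - y[i]) * X[i]   # squared-loss gradient at sample i
            w -= step * grad
            t += 1
            avg += (w - avg) / t              # running average of the SGD path
    return avg
```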
no code implementations • 12 Sep 2016 • Dohyung Park, Anastasios Kyrillidis, Constantine Caramanis, Sujay Sanghavi
We consider the non-square matrix sensing problem, under restricted isometry property (RIP) assumptions.
no code implementations • 10 Jun 2016 • Dohyung Park, Anastasios Kyrillidis, Constantine Caramanis, Sujay Sanghavi
We study such parameterization for optimization of generic convex objectives $f$, and focus on first-order, gradient descent algorithmic solutions.
no code implementations • 4 Jun 2016 • Dohyung Park, Anastasios Kyrillidis, Srinadh Bhojanapalli, Constantine Caramanis, Sujay Sanghavi
We study the projected gradient descent method on low-rank matrix problems with a strongly convex objective.
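For reference, a generic sketch of projected gradient descent onto the rank-$r$ set via truncated SVD; the objective, step size, and iteration budget are placeholders, and this is not necessarily the exact scheme analyzed in the paper.

```python
# Projected gradient descent over rank-r matrices: gradient step, then
# projection via truncated SVD.
import numpy as np

def project_rank_r(X, r):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def projected_gd(grad_f, X0, r, step=0.1, iters=100):
    X = project_rank_r(X0, r)
    for _ in range(iters):
        X = project_rank_r(X - step * grad_f(X), r)
    return X
```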
no code implementations • 29 May 2016 • Megasthenis Asteris, Anastasios Kyrillidis, Oluwasanmi Koyejo, Russell Poldrack
Given two sets of variables, derived from a common set of samples, sparse Canonical Correlation Analysis (CCA) seeks linear combinations of a small number of variables in each set, such that the induced canonical variables are maximally correlated.
no code implementations • 2 May 2016 • Hemant Tyagi, Anastasios Kyrillidis, Bernd Gärtner, Andreas Krause
A function $f: \mathbb{R}^d \rightarrow \mathbb{R}$ is a Sparse Additive Model (SPAM), if it is of the form $f(\mathbf{x}) = \sum_{l \in \mathcal{S}}\phi_{l}(x_l)$ where $\mathcal{S} \subset [d]$, $|\mathcal{S}| \ll d$.
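A toy instantiation of the definition above, with arbitrary example choices for $\mathcal{S}$ and the univariate functions $\phi_l$.

```python
# A sparse additive model f(x) = sum_{l in S} phi_l(x_l) with |S| << d.
import numpy as np

d = 10
S = [1, 4, 7]                               # active coordinates
phi = {1: np.sin, 4: np.square, 7: np.tanh}  # example univariate components

def f(x):
    return sum(phi[l](x[l]) for l in S)

x = np.random.default_rng(0).normal(size=d)
print(f(x))
```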
no code implementations • 18 Apr 2016 • Hemant Tyagi, Anastasios Kyrillidis, Bernd Gärtner, Andreas Krause
For some $\mathcal{S}_1 \subset [d], \mathcal{S}_2 \subset {[d] \choose 2}$, the function $f$ is assumed to be of the form: $$f(\mathbf{x}) = \sum_{p \in \mathcal{S}_1}\phi_{p} (x_p) + \sum_{(l, l^{\prime}) \in \mathcal{S}_2}\phi_{(l, l^{\prime})} (x_{l}, x_{l^{\prime}}).$$ Assuming $\phi_{p},\phi_{(l, l^{\prime})}$, $\mathcal{S}_1$, and $\mathcal{S}_2$ to be unknown, we provide a randomized algorithm that queries $f$ and exactly recovers $\mathcal{S}_1,\mathcal{S}_2$.
no code implementations • 22 Mar 2016 • Vatsal Shah, Megasthenis Asteris, Anastasios Kyrillidis, Sujay Sanghavi
Stochastic gradient descent is the method of choice for large-scale machine learning problems, by virtue of its light complexity per iteration.
no code implementations • 21 Mar 2016 • Anastasios Kyrillidis, Bubacarr Bah, Rouzbeh Hasheminezhad, Quoc Tran-Dinh, Luca Baldassarre, Volkan Cevher
Our experimental findings on synthetic and real applications support our claims of faster recovery in the convex setting, as opposed to using dense sensing matrices, while showing competitive recovery performance.
no code implementations • 9 Mar 2016 • Megasthenis Asteris, Anastasios Kyrillidis, Dimitris Papailiopoulos, Alexandros G. Dimakis
We present a novel approximation algorithm for $k$-BCC, a variant of BCC with an upper bound $k$ on the number of clusters.
no code implementations • 5 Mar 2016 • Quoc Tran-Dinh, Anastasios Kyrillidis, Volkan Cevher
First, it allows handling non-smooth objectives via proximal operators; this avoids lifting the problem dimension in order to accommodate non-smooth components in optimization.
no code implementations • 14 Sep 2015 • Srinadh Bhojanapalli, Anastasios Kyrillidis, Sujay Sanghavi
To the best of our knowledge, this is the first paper to provide precise convergence rate guarantees for general convex functions under standard convex assumptions.
no code implementations • NeurIPS 2015 • Megasthenis Asteris, Dimitris Papailiopoulos, Anastasios Kyrillidis, Alexandros G. Dimakis
We consider the following multi-component sparse PCA problem: given a set of data points, we seek to extract a small number of sparse components with disjoint supports that jointly capture the maximum possible variance.
no code implementations • 20 Jul 2015 • Volkan Cevher, Sina Jafarpour, Anastasios Kyrillidis
We describe two nonconventional algorithms for linear regression, called GAME and CLASH.
no code implementations • 20 Jul 2015 • Anastasios Kyrillidis, Luca Baldassarre, Marwa El-Halabi, Quoc Tran-Dinh, Volkan Cevher
For each, we present the models in their discrete form, discuss how to solve the ensuing discrete problems, and then describe convex relaxations.
no code implementations • 8 Jun 2015 • Megasthenis Asteris, Anastasios Kyrillidis, Alexandros G. Dimakis, Han-Gyol Yi, Bharath Chandrasekaran
We introduce a variant of (sparse) PCA in which the set of feasible support sets is determined by a graph.
no code implementations • 22 May 2014 • Michail Vlachos, Nikolaos Freris, Anastasios Kyrillidis
However, distance estimation when the data are represented using different sets of coefficients is still a largely unexplored area.
no code implementations • 13 May 2014 • Anastasios Kyrillidis, Rabeeh Karimi Mahabadi, Quoc Tran-Dinh, Volkan Cevher
We consider the class of convex minimization problems composed of a self-concordant function, such as the $\log\det$ metric, a convex data fidelity term $h(\cdot)$, and a regularizing, possibly non-smooth, function $g(\cdot)$.
no code implementations • 6 Apr 2014 • Dimitris Papailiopoulos, Anastasios Kyrillidis, Christos Boutsidis
We explain theoretically a curious empirical phenomenon: "Approximating a matrix by deterministically selecting a subset of its columns with the corresponding largest leverage scores results in a good low-rank matrix surrogate".
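A minimal sketch of the column-selection rule described above: compute rank-$k$ leverage scores from the top right singular vectors and deterministically keep the columns with the largest scores (matrix sizes and parameters are illustrative).

```python
# Deterministic column selection by rank-k leverage scores.
import numpy as np

def top_leverage_columns(A, k, c):
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    scores = np.sum(Vt[:k] ** 2, axis=0)       # rank-k leverage score per column
    return np.argsort(scores)[-c:]             # indices of the c selected columns

A = np.random.default_rng(0).normal(size=(50, 20))
cols = top_leverage_columns(A, k=5, c=8)
A_surrogate_basis = A[:, cols]                 # columns used for the low-rank surrogate
```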
no code implementations • 30 Mar 2014 • Anastasios Kyrillidis, Michail Vlachos, Anastasios Zouzias
In this paper, we study the problem of approximately computing the product of two real matrices.
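For context, a sketch of one standard randomized approach to approximate matrix multiplication, norm-proportional column/row sampling; this illustrates the problem setting and is not necessarily the algorithm proposed in the paper.

```python
# Approximate A @ B by sampling s column/row pairs with probabilities
# proportional to ||A[:, i]|| * ||B[i, :]|| and rescaling for unbiasedness.
import numpy as np

def approx_matmul(A, B, s, seed=0):
    rng = np.random.default_rng(seed)
    p = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = p / p.sum()
    idx = rng.choice(A.shape[1], size=s, p=p)
    scale = 1.0 / (s * p[idx])
    return (A[:, idx] * scale) @ B[idx]

A = np.random.default_rng(1).normal(size=(100, 300))
B = np.random.default_rng(2).normal(size=(300, 80))
rel_err = np.linalg.norm(approx_matmul(A, B, s=150) - A @ B) / np.linalg.norm(A @ B)
```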
no code implementations • 24 Mar 2014 • Anastasios Kyrillidis, Anastasios Zouzias
We study the effectiveness of non-uniform randomized feature selection in decision tree classification.
no code implementations • 7 Nov 2013 • Quoc Tran Dinh, Anastasios Kyrillidis, Volkan Cevher
Many scientific and engineering applications feature nonsmooth convex minimization problems over convex sets.
no code implementations • 13 Aug 2013 • Quoc Tran-Dinh, Anastasios Kyrillidis, Volkan Cevher
We propose a variable metric framework for minimizing the sum of a self-concordant function and a possibly non-smooth convex function, endowed with an easily computable proximal operator.
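To illustrate the composite setting, a generic proximal gradient step with the $\ell_1$ prox (soft-thresholding) as an example of an easily computable proximal operator; this is not the variable-metric scheme of the paper.

```python
# One proximal gradient step for minimizing f(x) + lam * ||x||_1: gradient
# step on f, then the l1 proximal operator (soft-thresholding).
import numpy as np

def soft_threshold(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def prox_grad_step(x, grad_f, step, lam):
    return soft_threshold(x - step * grad_f(x), step * lam)
```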
no code implementations • 13 Mar 2013 • Luca Baldassarre, Nirav Bhan, Volkan Cevher, Anastasios Kyrillidis, Siddhartha Satpathi
Group-based sparsity models have proven instrumental in linear regression problems for recovering signals from much fewer measurements than standard compressive sensing.
no code implementations • 7 Jun 2012 • Anastasios Kyrillidis, Stephen Becker, Volkan Cevher, Christoph Koch
Most learning methods with rank or sparsity constraints use convex relaxations, which lead to optimization with the nuclear norm or the $\ell_1$-norm.