no code implementations • 4 Mar 2024 • Cameron R. Wolfe, Anastasios Kyrillidis

From these experiments, we discover alternative CPT schedules that offer further improvements in training efficiency and model performance, as well as derive a set of best practices for choosing CPT schedules.

no code implementations • 6 Oct 2023 • Fangshuo Liao, Junhyung Lyle Kim, Cruz Barnum, Anastasios Kyrillidis

Principal Component Analysis (PCA) is a popular tool in data analysis, especially when the data is high-dimensional.

no code implementations • 5 Oct 2023 • Chen Dun, Qiutai Pan, Shikai Jin, Ria Stevens, Mitchell D. Miller, George N. Phillips, Jr., Anastasios Kyrillidis

Determining the structure of a protein has been a decades-long open question.

no code implementations • 4 Oct 2023 • Chen Dun, Mirian Hipolito Garcia, Guoqing Zheng, Ahmed Hassan Awadallah, Anastasios Kyrillidis, Robert Sim

Large Language Models (LLMs) have the ability to solve a variety of tasks, such as text summarization and mathematical questions, just out of the box, but they are often trained with a single task in mind.

no code implementations • 7 Sep 2023 • John Chen, Chen Dun, Anastasios Kyrillidis

Advances in Semi-Supervised Learning (SSL) have almost entirely closed the gap between SSL and Supervised Learning at a fraction of the number of labels.

no code implementations • ICCV 2023 • Erdong Hu, Yuxin Tang, Anastasios Kyrillidis, Chris Jermaine

We carefully evaluate a number of algorithms for learning in a federated environment, and test their utility for a variety of image classification tasks.

1 code implementation • 19 Jun 2023 • Junhyung Lyle Kim, Mohammad Taha Toghani, César A. Uribe, Anastasios Kyrillidis

Federated learning (FL) is a distributed machine learning framework where the global model of a central server is trained via multiple collaborative steps by participating clients without sharing their data.

no code implementations • 14 Jun 2023 • Chen Dun, Mirian Hipolito Garcia, Guoqing Zheng, Ahmed Hassan Awadallah, Robert Sim, Anastasios Kyrillidis, Dimitrios Dimitriadis

Our gating function harnesses the knowledge of a pretrained model common expert to enhance its routing decisions on-the-fly.

no code implementations • 13 Jun 2023 • Fangshuo Liao, Anastasios Kyrillidis

Current state-of-the-art analyses on the convergence of gradient descent for training neural networks focus on characterizing properties of the loss landscape, such as the Polyak-Lojaciewicz (PL) condition and the restricted strong convexity.

no code implementations • 9 Nov 2022 • Cameron R. Wolfe, Anastasios Kyrillidis

To mitigate these shortcomings, we propose Cold Start Streaming Learning (CSSL), a simple, end-to-end approach for streaming learning with deep networks that uses a combination of replay and data augmentation to avoid catastrophic forgetting.

no code implementations • 9 Nov 2022 • Junhyung Lyle Kim, Gauthier Gidel, Anastasios Kyrillidis, Fabian Pedregosa

The extragradient method has gained popularity due to its robust convergence properties for differentiable games.

no code implementations • 29 Oct 2022 • Zheyang Xiong, Fangshuo Liao, Anastasios Kyrillidis

The strong Lottery Ticket Hypothesis (LTH) claims the existence of a subnetwork in a sufficiently large, randomly initialized neural network that approximates some target neural network without the need of training.

no code implementations • 28 Oct 2022 • Chen Dun, Mirian Hipolito, Chris Jermaine, Dimitrios Dimitriadis, Anastasios Kyrillidis

Asynchronous learning protocols have regained attention lately, especially in the Federated Learning (FL) setup, where slower clients can severely impede the learning process.

no code implementations • 28 Oct 2022 • Qihan Wang, Chen Dun, Fangshuo Liao, Chris Jermaine, Anastasios Kyrillidis

\textsc{LoFT} is a model-parallel pretraining algorithm that partitions convolutional layers by filters to train them independently in a distributed setting, resulting in reduced memory and communication costs during pretraining.

no code implementations • 8 May 2022 • Anastasios Kyrillidis, Moshe Y. Vardi, Zhiwei Zhang

They lack, however, the ability to handle 1) (non-CNF) hybrid constraints, such as XORs and 2) generalized MaxSAT problems natively.

no code implementations • 22 Mar 2022 • Junhyung Lyle Kim, Mohammad Taha Toghani, César A. Uribe, Anastasios Kyrillidis

We propose a distributed Quantum State Tomography (QST) protocol, named Local Stochastic Factored Gradient Descent (Local SFGD), to learn the low-rank factor of a density matrix over a set of local machines.

1 code implementation • ICLR 2022 • Cheng Wan, Youjie Li, Cameron R. Wolfe, Anastasios Kyrillidis, Nam Sung Kim, Yingyan Lin

Notably, little is known regarding the convergence rate of GCN training with both stale features and stale feature gradients.

1 code implementation • 4 Mar 2022 • Ahmed Imtiaz Humayun, Randall Balestriero, Anastasios Kyrillidis, Richard Baraniuk

We propose to remedy such a scenario by introducing a maximal radius constraint $r$ on the clusters formed by the centroids, i. e., samples from the same cluster should not be more than $2r$ apart in terms of $\ell_2$ distance.

1 code implementation • 7 Dec 2021 • Cameron R. Wolfe, Anastasios Kyrillidis

We propose a novel, structured pruning algorithm for neural networks -- the iterative, Sparse Structured Pruning algorithm, dubbed as i-SpaSP.

no code implementations • 5 Dec 2021 • Fangshuo Liao, Anastasios Kyrillidis

With the motive of training all the parameters of a neural network, we study why and when one can achieve this by iteratively creating, training, and combining randomly selected subnetworks.

no code implementations • 11 Nov 2021 • Junhyung Lyle Kim, Panos Toulis, Anastasios Kyrillidis

Stochastic gradient descent with momentum (SGDM) is the dominant algorithm in many optimization scenarios, including convex optimization instances and non-convex neural network training.

no code implementations • 23 Oct 2021 • Zhenwei Dai, Chen Dun, Yuxin Tang, Anastasios Kyrillidis, Anshumali Shrivastava

Federated learning enables many local devices to train a deep learning model jointly without sharing the local data.

no code implementations • 31 Jul 2021 • Cameron R. Wolfe, Fangshuo Liao, Qihan Wang, Junhyung Lyle Kim, Anastasios Kyrillidis

Aiming to mathematically analyze the amount of dense network pre-training needed for a pruned network to perform well, we discover a simple theoretical bound in the number of gradient descent pre-training iterations on a two-layer, fully-connected network, beyond which pruning via greedy forward selection [61] yields a subnetwork that achieves good training error.

1 code implementation • 9 Jul 2021 • John Chen, Cameron Wolfe, Anastasios Kyrillidis

Deep learning practitioners often operate on a computational and monetary budget.

no code implementations • 2 Jul 2021 • John Chen, Qihan Wang, Anastasios Kyrillidis

In this work, we explore the connection between the double descent phenomena and the number of samples in the deep neural network setting.

no code implementations • 2 Jul 2021 • Chen Dun, Cameron R. Wolfe, Christopher M. Jermaine, Anastasios Kyrillidis

Thus, ResIST reduces the per-iteration communication, memory, and time requirements of ResNet training to only a fraction of the requirements of full-model training.

no code implementations • 16 Jun 2021 • Junhyung Lyle Kim, Jose Antonio Lara Benitez, Mohammad Taha Toghani, Cameron Wolfe, Zhiwei Zhang, Anastasios Kyrillidis

We present a novel, practical, and provable approach for solving diagonally constrained semi-definite programming (SDP) problems at scale using accelerated non-convex programming.

1 code implementation • 14 Apr 2021 • Junhyung Lyle Kim, George Kollias, Amir Kalev, Ken X. Wei, Anastasios Kyrillidis

Despite being a non-convex method, \texttt{MiFGD} converges \emph{provably} close to the true density matrix at an accelerated linear rate, in the absence of experimental and statistical noise, and under common assumptions.

1 code implementation • 20 Feb 2021 • Cameron R. Wolfe, Jingkang Yang, Arindam Chowdhury, Chen Dun, Artun Bayer, Santiago Segarra, Anastasios Kyrillidis

The graph convolutional network (GCN) is a go-to solution for machine learning on graphs, but its training is notoriously difficult to scale both in terms of graph size and the number of model parameters.

no code implementations • 17 Dec 2020 • T. Mitchell Roddenberry, Santiago Segarra, Anastasios Kyrillidis

We study the role of the constraint set in determining the solution to low-rank, positive semidefinite (PSD) matrix sensing problems.

1 code implementation • 14 Dec 2020 • Anastasios Kyrillidis, Moshe Y. Vardi, Zhiwei Zhang

We explore the potential of continuous local search (CLS) in SAT solving by proposing a novel approach for finding a solution of a hybrid system of Boolean constraints.

no code implementations • 28 Nov 2020 • Vatsal Shah, Soumya Basu, Anastasios Kyrillidis, Sujay Sanghavi

In this paper, we aim to characterize the performance of adaptive methods in the over-parameterized linear regression setting.

no code implementations • 25 Nov 2020 • John Chen, Samarth Sinha, Anastasios Kyrillidis

On its own, improvements with StackMix hold across different number of labeled samples on CIFAR-100, maintaining approximately a 2\% gap in test accuracy -- down to using only 5\% of the whole dataset -- and is effective in the semi-supervised setting with a 2\% improvement with the standard benchmark $\Pi$-model.

1 code implementation • 1 Jul 2020 • Jacky Y. Zhang, Rajiv Khanna, Anastasios Kyrillidis, Oluwasanmi Koyejo

Bayesian coresets have emerged as a promising approach for implementing scalable Bayesian inference.

2 code implementations • 2 Dec 2019 • Anastasios Kyrillidis, Anshumali Shrivastava, Moshe Y. Vardi, Zhiwei Zhang

By such a reduction to continuous optimization, we propose an algebraic framework for solving systems consisting of different types of constraints.

no code implementations • 15 Nov 2019 • Michael P. Perrone, Haidar Khan, Changhoan Kim, Anastasios Kyrillidis, Jerry Quinn, Valentina Salapura

This paper presents a methodology for selecting the mini-batch size that minimizes Stochastic Gradient Descent (SGD) learning time for single and multiple learner problems.

1 code implementation • ICML 2020 • John Chen, Vatsal Shah, Anastasios Kyrillidis

We introduce Negative Sampling in Semi-Supervised Learning (NS3L), a simple, fast, easy to tune algorithm for semi-supervised learning (SSL).

no code implementations • NeurIPS 2019 • Jacky Y. Zhang, Rajiv Khanna, Anastasios Kyrillidis, Oluwasanmi Koyejo

Iterative hard thresholding (IHT) is a projected gradient descent algorithm, known to achieve state of the art performance for a wide range of structured estimation problems, such as sparse inference.

2 code implementations • 11 Oct 2019 • John Chen, Cameron Wolfe, Zhao Li, Anastasios Kyrillidis

Momentum is a widely used technique for gradient-based optimizers in deep learning.

2 code implementations • 4 Oct 2019 • Binhang Yuan, Cameron R. Wolfe, Chen Dun, Yuxin Tang, Anastasios Kyrillidis, Christopher M. Jermaine

These properties of IST can cope with issues due to distributed data, slow interconnects, or limited device memory, making IST a suitable approach for cases of mandatory distribution.

no code implementations • 25 Sep 2019 • John Chen, Anastasios Kyrillidis

Momentum is a simple and popular technique in deep learning for gradient-based optimizers.

no code implementations • 29 Mar 2019 • Alexander Ratner, Dan Alistarh, Gustavo Alonso, David G. Andersen, Peter Bailis, Sarah Bird, Nicholas Carlini, Bryan Catanzaro, Jennifer Chayes, Eric Chung, Bill Dally, Jeff Dean, Inderjit S. Dhillon, Alexandros Dimakis, Pradeep Dubey, Charles Elkan, Grigori Fursin, Gregory R. Ganger, Lise Getoor, Phillip B. Gibbons, Garth A. Gibson, Joseph E. Gonzalez, Justin Gottschlich, Song Han, Kim Hazelwood, Furong Huang, Martin Jaggi, Kevin Jamieson, Michael. I. Jordan, Gauri Joshi, Rania Khalaf, Jason Knight, Jakub Konečný, Tim Kraska, Arun Kumar, Anastasios Kyrillidis, Aparna Lakshmiratan, Jing Li, Samuel Madden, H. Brendan McMahan, Erik Meijer, Ioannis Mitliagkas, Rajat Monga, Derek Murray, Kunle Olukotun, Dimitris Papailiopoulos, Gennady Pekhimenko, Theodoros Rekatsinas, Afshin Rostamizadeh, Christopher Ré, Christopher De Sa, Hanie Sedghi, Siddhartha Sen, Virginia Smith, Alex Smola, Dawn Song, Evan Sparks, Ion Stoica, Vivienne Sze, Madeleine Udell, Joaquin Vanschoren, Shivaram Venkataraman, Rashmi Vinayak, Markus Weimer, Andrew Gordon Wilson, Eric Xing, Matei Zaharia, Ce Zhang, Ameet Talwalkar

Machine learning (ML) techniques are enjoying rapidly increasing adoption.

1 code implementation • 1 Feb 2019 • Ryan Spring, Anastasios Kyrillidis, Vijai Mohan, Anshumali Shrivastava

The problem is becoming more severe as deep learning models continue to grow larger in order to learn from complex, large-scale datasets.

no code implementations • 16 Nov 2018 • Vatsal Shah, Anastasios Kyrillidis, Sujay Sanghavi

We empirically show that the minimum weight norm is not necessarily the proper gauge of good generalization in simplified scenaria, and different models found by adaptive methods could outperform plain gradient methods.

no code implementations • 6 Jun 2018 • Kelly Geyer, Anastasios Kyrillidis, Amir Kalev

Surprisingly, recent work argues that the choice of $r \leq n$ is not pivotal: even setting $U \in \mathbb{R}^{n \times n}$ is sufficient for factored gradient descent to find the rank-$r$ solution, which suggests that operating over the factors leads to an implicit regularization.

no code implementations • 1 Jun 2018 • Tayo Ajayi, David Mildebrath, Anastasios Kyrillidis, Shashanka Ubaru, Georgios Kollias, Kristofer Bouchard

We present theoretical results on the convergence of \emph{non-convex} accelerated gradient descent in matrix factorization models with $\ell_2$-norm loss.

no code implementations • 24 May 2018 • Anastasios Kyrillidis

We propose practical algorithms for entrywise $\ell_p$-norm low-rank approximation, for $p = 1$ or $p = \infty$.

no code implementations • 23 May 2018 • Tianyang Li, Anastasios Kyrillidis, Liu Liu, Constantine Caramanis

We present a novel statistical inference framework for convex empirical risk minimization, using approximate stochastic Newton steps.

no code implementations • 26 Dec 2017 • Rajiv Khanna, Anastasios Kyrillidis

We study --both in theory and practice-- the use of momentum motions in classic iterative hard thresholding (IHT) methods.

no code implementations • 21 May 2017 • Tianyang Li, Liu Liu, Anastasios Kyrillidis, Constantine Caramanis

We present a novel method for frequentist statistical inference in $M$-estimation problems, based on stochastic gradient descent (SGD) with a fixed step size: we demonstrate that the average of such SGD sequences can be used for statistical inference, after proper scaling.

no code implementations • 12 Sep 2016 • Dohyung Park, Anastasios Kyrillidis, Constantine Caramanis, Sujay Sanghavi

We consider the non-square matrix sensing problem, under restricted isometry property (RIP) assumptions.

no code implementations • 10 Jun 2016 • Dohyung Park, Anastasios Kyrillidis, Constantine Caramanis, Sujay Sanghavi

We study such parameterization for optimization of generic convex objectives $f$, and focus on first-order, gradient descent algorithmic solutions.

no code implementations • 4 Jun 2016 • Dohyung Park, Anastasios Kyrillidis, Srinadh Bhojanapalli, Constantine Caramanis, Sujay Sanghavi

We study the projected gradient descent method on low-rank matrix problems with a strongly convex objective.

no code implementations • 29 May 2016 • Megasthenis Asteris, Anastasios Kyrillidis, Oluwasanmi Koyejo, Russell Poldrack

Given two sets of variables, derived from a common set of samples, sparse Canonical Correlation Analysis (CCA) seeks linear combinations of a small number of variables in each set, such that the induced canonical variables are maximally correlated.

no code implementations • 2 May 2016 • Hemant Tyagi, Anastasios Kyrillidis, Bernd Gärtner, Andreas Krause

A function $f: \mathbb{R}^d \rightarrow \mathbb{R}$ is a Sparse Additive Model (SPAM), if it is of the form $f(\mathbf{x}) = \sum_{l \in \mathcal{S}}\phi_{l}(x_l)$ where $\mathcal{S} \subset [d]$, $|\mathcal{S}| \ll d$.

no code implementations • 18 Apr 2016 • Hemant Tyagi, Anastasios Kyrillidis, Bernd Gärtner, Andreas Krause

For some $\mathcal{S}_1 \subset [d], \mathcal{S}_2 \subset {[d] \choose 2}$, the function $f$ is assumed to be of the form: $$f(\mathbf{x}) = \sum_{p \in \mathcal{S}_1}\phi_{p} (x_p) + \sum_{(l, l^{\prime}) \in \mathcal{S}_2}\phi_{(l, l^{\prime})} (x_{l}, x_{l^{\prime}}).$$ Assuming $\phi_{p},\phi_{(l, l^{\prime})}$, $\mathcal{S}_1$ and, $\mathcal{S}_2$ to be unknown, we provide a randomized algorithm that queries $f$ and exactly recovers $\mathcal{S}_1,\mathcal{S}_2$.

no code implementations • 22 Mar 2016 • Vatsal Shah, Megasthenis Asteris, Anastasios Kyrillidis, Sujay Sanghavi

Stochastic gradient descent is the method of choice for large-scale machine learning problems, by virtue of its light complexity per iteration.

no code implementations • 21 Mar 2016 • Anastasios Kyrillidis, Bubacarr Bah, Rouzbeh Hasheminezhad, Quoc Tran-Dinh, Luca Baldassarre, Volkan Cevher

Our experimental findings on synthetic and real applications support our claims for faster recovery in the convex setting -- as opposed to using dense sensing matrices, while showing a competitive recovery performance.

no code implementations • 9 Mar 2016 • Megasthenis Asteris, Anastasios Kyrillidis, Dimitris Papailiopoulos, Alexandros G. Dimakis

We present a novel approximation algorithm for $k$-BCC, a variant of BCC with an upper bound $k$ on the number of clusters.

no code implementations • 5 Mar 2016 • Quoc Tran-Dinh, Anastasios Kyrillidis, Volkan Cevher

First, it allows handling non-smooth objectives via proximal operators; this avoids lifting the problem dimension in order to accommodate non-smooth components in optimization.

no code implementations • 14 Sep 2015 • Srinadh Bhojanapalli, Anastasios Kyrillidis, Sujay Sanghavi

To the best of our knowledge, this is the first paper to provide precise convergence rate guarantees for general convex functions under standard convex assumptions.

no code implementations • NeurIPS 2015 • Megasthenis Asteris, Dimitris Papailiopoulos, Anastasios Kyrillidis, Alexandros G. Dimakis

We consider the following multi-component sparse PCA problem: given a set of data points, we seek to extract a small number of sparse components with disjoint supports that jointly capture the maximum possible variance.

no code implementations • 20 Jul 2015 • Anastasios Kyrillidis, Luca Baldassarre, Marwa El-Halabi, Quoc Tran-Dinh, Volkan Cevher

For each, we present the models in their discrete nature, discuss how to solve the ensuing discrete problems and then describe convex relaxations.

no code implementations • 20 Jul 2015 • Volkan Cevher, Sina Jafarpour, Anastasios Kyrillidis

We describe two nonconventional algorithms for linear regression, called GAME and CLASH.

no code implementations • 8 Jun 2015 • Megasthenis Asteris, Anastasios Kyrillidis, Alexandros G. Dimakis, Han-Gyol Yi and, Bharath Chandrasekaran

We introduce a variant of (sparse) PCA in which the set of feasible support sets is determined by a graph.

no code implementations • 22 May 2014 • Michail Vlachos, Nikolaos Freris, Anastasios Kyrillidis

However, distance estimation when the data are represented using different sets of coefficients is still a largely unexplored area.

no code implementations • 13 May 2014 • Anastasios Kyrillidis, Rabeeh Karimi Mahabadi, Quoc Tran-Dinh, Volkan Cevher

We consider the class of convex minimization problems, composed of a self-concordant function, such as the $\log\det$ metric, a convex data fidelity term $h(\cdot)$ and, a regularizing -- possibly non-smooth -- function $g(\cdot)$.

no code implementations • 6 Apr 2014 • Dimitris Papailiopoulos, Anastasios Kyrillidis, Christos Boutsidis

We explain theoretically a curious empirical phenomenon: "Approximating a matrix by deterministically selecting a subset of its columns with the corresponding largest leverage scores results in a good low-rank matrix surrogate".

no code implementations • 30 Mar 2014 • Anastasios Kyrillidis, Michail Vlachos, Anastasios Zouzias

In this paper, we study the problem of approximately computing the product of two real matrices.

no code implementations • 24 Mar 2014 • Anastasios Kyrillidis, Anastasios Zouzias

We study the effectiveness of non-uniform randomized feature selection in decision tree classification.

no code implementations • 7 Nov 2013 • Quoc Tran Dinh, Anastasios Kyrillidis, Volkan Cevher

Many scientific and engineering applications feature nonsmooth convex minimization problems over convex sets.

no code implementations • 13 Aug 2013 • Quoc Tran-Dinh, Anastasios Kyrillidis, Volkan Cevher

We propose a variable metric framework for minimizing the sum of a self-concordant function and a possibly non-smooth convex function, endowed with an easily computable proximal operator.

no code implementations • 13 Mar 2013 • Luca Baldassarre, Nirav Bhan, Volkan Cevher, Anastasios Kyrillidis, Siddhartha Satpathi

Group-based sparsity models are proven instrumental in linear regression problems for recovering signals from much fewer measurements than standard compressive sensing.

no code implementations • 7 Jun 2012 • Anastasios Kyrillidis, Stephen Becker, Volkan Cevher and, Christoph Koch

Most learning methods with rank or sparsity constraints use convex relaxations, which lead to optimization with the nuclear norm or the $\ell_1$-norm.

Cannot find the paper you are looking for? You can
Submit a new open access paper.

Contact us on:
hello@paperswithcode.com
.
Papers With Code is a free resource with all data licensed under CC-BY-SA.