Search Results for author: Anastasios Kyrillidis

Found 74 papers, 13 papers with code

Better Schedules for Low Precision Training of Deep Neural Networks

no code implementations 4 Mar 2024 Cameron R. Wolfe, Anastasios Kyrillidis

From these experiments, we discover alternative CPT schedules that offer further improvements in training efficiency and model performance, as well as derive a set of best practices for choosing CPT schedules.

Node Classification Quantization +1

On the Error-Propagation of Inexact Deflation for Principal Component Analysis

no code implementations 6 Oct 2023 Fangshuo Liao, Junhyung Lyle Kim, Cruz Barnum, Anastasios Kyrillidis

Principal Component Analysis (PCA) is a popular tool in data analysis, especially when the data is high-dimensional.

Sweeping Heterogeneity with Smart MoPs: Mixture of Prompts for LLM Task Adaptation

no code implementations 4 Oct 2023 Chen Dun, Mirian Hipolito Garcia, Guoqing Zheng, Ahmed Hassan Awadallah, Anastasios Kyrillidis, Robert Sim

Large Language Models (LLMs) have the ability to solve a variety of tasks, such as text summarization and mathematical questions, just out of the box, but they are often trained with a single task in mind.

Model Compression Text Summarization

Fast FixMatch: Faster Semi-Supervised Learning with Curriculum Batch Size

no code implementations 7 Sep 2023 John Chen, Chen Dun, Anastasios Kyrillidis

Advances in Semi-Supervised Learning (SSL) have almost entirely closed the gap between SSL and Supervised Learning at a fraction of the number of labels.

Federated Learning Over Images: Vertical Decompositions and Pre-Trained Backbones Are Difficult to Beat

no code implementations ICCV 2023 Erdong Hu, Yuxin Tang, Anastasios Kyrillidis, Chris Jermaine

We carefully evaluate a number of algorithms for learning in a federated environment, and test their utility for a variety of image classification tasks.

Classification Federated Learning +1

Adaptive Federated Learning with Auto-Tuned Clients

no code implementations 19 Jun 2023 Junhyung Lyle Kim, Mohammad Taha Toghani, César A. Uribe, Anastasios Kyrillidis

Federated learning (FL) is a distributed machine learning framework where the global model of a central server is trained via multiple collaborative steps by participating clients without sharing their data.

Federated Learning
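
To make the setup in the excerpt above concrete, here is a minimal FedAvg-style sketch of federated training (illustrative only: the toy least-squares clients, local step counts, and learning rates are assumptions, and the paper's auto-tuned client step sizes are not shown):

```python
import numpy as np

rng = np.random.default_rng(0)

def local_sgd(w, X, y, lr=0.1, steps=5):
    """A few local least-squares SGD steps on one client's private data."""
    w = w.copy()
    for _ in range(steps):
        i = rng.integers(len(y))
        w -= lr * (X[i] @ w - y[i]) * X[i]
    return w

def fedavg_round(w_global, clients):
    """One communication round: each client trains locally, the server averages."""
    return np.mean([local_sgd(w_global, X, y) for X, y in clients], axis=0)

# toy setup: 4 clients, each holding a private slice of one regression problem
w_true = rng.normal(size=5)
clients = []
for _ in range(4):
    X = rng.normal(size=(50, 5))
    clients.append((X, X @ w_true + 0.01 * rng.normal(size=50)))

w = np.zeros(5)
for _ in range(20):
    w = fedavg_round(w, clients)
print(np.linalg.norm(w - w_true))   # should shrink over rounds
```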

Provable Accelerated Convergence of Nesterov's Momentum for Deep ReLU Neural Networks

no code implementations 13 Jun 2023 Fangshuo Liao, Anastasios Kyrillidis

Current state-of-the-art analyses of the convergence of gradient descent for training neural networks focus on characterizing properties of the loss landscape, such as the Polyak-Łojasiewicz (PL) condition and restricted strong convexity.

Open-Ended Question Answering

When is Momentum Extragradient Optimal? A Polynomial-Based Analysis

no code implementations 9 Nov 2022 Junhyung Lyle Kim, Gauthier Gidel, Anastasios Kyrillidis, Fabian Pedregosa

The extragradient method has gained popularity due to its robust convergence properties for differentiable games.

Cold Start Streaming Learning for Deep Networks

no code implementations 9 Nov 2022 Cameron R. Wolfe, Anastasios Kyrillidis

To mitigate these shortcomings, we propose Cold Start Streaming Learning (CSSL), a simple, end-to-end approach for streaming learning with deep networks that uses a combination of replay and data augmentation to avoid catastrophic forgetting.

Data Augmentation

Strong Lottery Ticket Hypothesis with $\varepsilon$--perturbation

no code implementations 29 Oct 2022 Zheyang Xiong, Fangshuo Liao, Anastasios Kyrillidis

The strong Lottery Ticket Hypothesis (LTH) claims the existence of a subnetwork in a sufficiently large, randomly initialized neural network that approximates some target neural network without the need for training.

LOFT: Finding Lottery Tickets through Filter-wise Training

no code implementations 28 Oct 2022 Qihan Wang, Chen Dun, Fangshuo Liao, Chris Jermaine, Anastasios Kyrillidis

\textsc{LoFT} is a model-parallel pretraining algorithm that partitions convolutional layers by filters to train them independently in a distributed setting, resulting in reduced memory and communication costs during pretraining.
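
As a rough illustration of filter-wise partitioning (an illustrative sketch, not the \textsc{LoFT} algorithm itself), a convolutional layer's weight tensor of shape (out_channels, in_channels, k, k) can be split along its output-filter axis, handed to workers for independent training, and re-assembled:

```python
import numpy as np

def partition_filters(weight, num_workers, rng):
    """Randomly split a conv weight tensor (out_ch, in_ch, k, k) by output filter."""
    perm = rng.permutation(weight.shape[0])
    groups = np.array_split(perm, num_workers)
    return [(idx, weight[idx].copy()) for idx in groups]

def merge_filters(parts, out_ch):
    """Re-assemble the full weight tensor from the independently trained parts."""
    merged = np.zeros((out_ch,) + parts[0][1].shape[1:])
    for idx, w in parts:
        merged[idx] = w
    return merged

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32, 3, 3))          # one conv layer's weights
parts = partition_filters(W, num_workers=4, rng=rng)
# ... each worker would train its (idx, sub_weight) shard locally here ...
W_merged = merge_filters(parts, out_ch=64)
assert np.allclose(W, W_merged)              # lossless round-trip when untrained
```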

Efficient and Light-Weight Federated Learning via Asynchronous Distributed Dropout

no code implementations 28 Oct 2022 Chen Dun, Mirian Hipolito, Chris Jermaine, Dimitrios Dimitriadis, Anastasios Kyrillidis

Asynchronous learning protocols have regained attention lately, especially in the Federated Learning (FL) setup, where slower clients can severely impede the learning process.

Federated Learning

DPMS: An ADD-Based Symbolic Approach for Generalized MaxSAT Solving

no code implementations 8 May 2022 Anastasios Kyrillidis, Moshe Y. Vardi, Zhiwei Zhang

They lack, however, the ability to natively handle 1) (non-CNF) hybrid constraints, such as XORs, and 2) generalized MaxSAT problems.

Local Stochastic Factored Gradient Descent for Distributed Quantum State Tomography

no code implementations 22 Mar 2022 Junhyung Lyle Kim, Mohammad Taha Toghani, César A. Uribe, Anastasios Kyrillidis

We propose a distributed Quantum State Tomography (QST) protocol, named Local Stochastic Factored Gradient Descent (Local SFGD), to learn the low-rank factor of a density matrix over a set of local machines.

Quantum State Tomography

No More Than 6ft Apart: Robust K-Means via Radius Upper Bounds

1 code implementation 4 Mar 2022 Ahmed Imtiaz Humayun, Randall Balestriero, Anastasios Kyrillidis, Richard Baraniuk

We propose to remedy such a scenario by introducing a maximal radius constraint $r$ on the clusters formed by the centroids, i.e., samples from the same cluster should not be more than $2r$ apart in terms of $\ell_2$ distance.

Clustering
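
A minimal sketch of the constraint described above (illustrative; not the paper's algorithm): if every sample lies within $\ell_2$ distance $r$ of its assigned centroid, the triangle inequality guarantees that any two samples in the same cluster are at most $2r$ apart.

```python
import numpy as np

def satisfies_radius_constraint(X, labels, centroids, r):
    """Check that each sample is within l2 distance r of its cluster centroid."""
    dists = np.linalg.norm(X - centroids[labels], axis=1)
    return bool(np.all(dists <= r))

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.2, 4.9]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([[0.05, 0.0], [5.1, 4.95]])
print(satisfies_radius_constraint(X, labels, centroids, r=0.5))  # True
```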

i-SpaSP: Structured Neural Pruning via Sparse Signal Recovery

1 code implementation 7 Dec 2021 Cameron R. Wolfe, Anastasios Kyrillidis

We propose a novel, structured pruning algorithm for neural networks -- the iterative, Sparse Structured Pruning algorithm, dubbed i-SpaSP.

On the Convergence of Shallow Neural Network Training with Randomly Masked Neurons

no code implementations 5 Dec 2021 Fangshuo Liao, Anastasios Kyrillidis

With the goal of training all the parameters of a neural network, we study why and when one can achieve this by iteratively creating, training, and combining randomly selected subnetworks.

Convergence and Stability of the Stochastic Proximal Point Algorithm with Momentum

no code implementations 11 Nov 2021 Junhyung Lyle Kim, Panos Toulis, Anastasios Kyrillidis

Stochastic gradient descent with momentum (SGDM) is the dominant algorithm in many optimization scenarios, including convex optimization instances and non-convex neural network training.

How much pre-training is enough to discover a good subnetwork?

no code implementations 31 Jul 2021 Cameron R. Wolfe, Fangshuo Liao, Qihan Wang, Junhyung Lyle Kim, Anastasios Kyrillidis

Aiming to mathematically analyze the amount of dense network pre-training needed for a pruned network to perform well, we discover a simple theoretical bound on the number of gradient descent pre-training iterations on a two-layer, fully-connected network, beyond which pruning via greedy forward selection [61] yields a subnetwork that achieves good training error.

Network Pruning

REX: Revisiting Budgeted Training with an Improved Schedule

1 code implementation 9 Jul 2021 John Chen, Cameron Wolfe, Anastasios Kyrillidis

Deep learning practitioners often operate on a computational and monetary budget.

Mitigating deep double descent by concatenating inputs

no code implementations 2 Jul 2021 John Chen, Qihan Wang, Anastasios Kyrillidis

In this work, we explore the connection between the double descent phenomenon and the number of samples in the deep neural network setting.

ResIST: Layer-Wise Decomposition of ResNets for Distributed Training

no code implementations 2 Jul 2021 Chen Dun, Cameron R. Wolfe, Christopher M. Jermaine, Anastasios Kyrillidis

Thus, ResIST reduces the per-iteration communication, memory, and time requirements of ResNet training to only a fraction of the requirements of full-model training.

Momentum-inspired Low-Rank Coordinate Descent for Diagonally Constrained SDPs

no code implementations 16 Jun 2021 Junhyung Lyle Kim, Jose Antonio Lara Benitez, Mohammad Taha Toghani, Cameron Wolfe, Zhiwei Zhang, Anastasios Kyrillidis

We present a novel, practical, and provable approach for solving diagonally constrained semi-definite programming (SDP) problems at scale using accelerated non-convex programming.

Fast quantum state reconstruction via accelerated non-convex programming

1 code implementation 14 Apr 2021 Junhyung Lyle Kim, George Kollias, Amir Kalev, Ken X. Wei, Anastasios Kyrillidis

Despite being a non-convex method, \texttt{MiFGD} converges \emph{provably} close to the true density matrix at an accelerated linear rate, in the absence of experimental and statistical noise, and under common assumptions.

GIST: Distributed Training for Large-Scale Graph Convolutional Networks

1 code implementation 20 Feb 2021 Cameron R. Wolfe, Jingkang Yang, Arindam Chowdhury, Chen Dun, Artun Bayer, Santiago Segarra, Anastasios Kyrillidis

The graph convolutional network (GCN) is a go-to solution for machine learning on graphs, but its training is notoriously difficult to scale both in terms of graph size and the number of model parameters.

BIG-bench Machine Learning Graph Sampling

Rank-One Measurements of Low-Rank PSD Matrices Have Small Feasible Sets

no code implementations 17 Dec 2020 T. Mitchell Roddenberry, Santiago Segarra, Anastasios Kyrillidis

We study the role of the constraint set in determining the solution to low-rank, positive semidefinite (PSD) matrix sensing problems.

On Continuous Local BDD-Based Search for Hybrid SAT Solving

1 code implementation 14 Dec 2020 Anastasios Kyrillidis, Moshe Y. Vardi, Zhiwei Zhang

We explore the potential of continuous local search (CLS) in SAT solving by proposing a novel approach for finding a solution of a hybrid system of Boolean constraints.

On Generalization of Adaptive Methods for Over-parameterized Linear Regression

no code implementations 28 Nov 2020 Vatsal Shah, Soumya Basu, Anastasios Kyrillidis, Sujay Sanghavi

In this paper, we aim to characterize the performance of adaptive methods in the over-parameterized linear regression setting.

regression

StackMix: A complementary Mix algorithm

no code implementations 25 Nov 2020 John Chen, Samarth Sinha, Anastasios Kyrillidis

On its own, StackMix yields improvements across different numbers of labeled samples on CIFAR-100, maintaining approximately a 2\% gap in test accuracy -- down to using only 5\% of the whole dataset -- and is effective in the semi-supervised setting, with a 2\% improvement over the standard benchmark $\Pi$-model.

Contrastive Learning Data Augmentation +1

Bayesian Coresets: Revisiting the Nonconvex Optimization Perspective

1 code implementation 1 Jul 2020 Jacky Y. Zhang, Rajiv Khanna, Anastasios Kyrillidis, Oluwasanmi Koyejo

Bayesian coresets have emerged as a promising approach for implementing scalable Bayesian inference.

Bayesian Inference

FourierSAT: A Fourier Expansion-Based Algebraic Framework for Solving Hybrid Boolean Constraints

2 code implementations 2 Dec 2019 Anastasios Kyrillidis, Anshumali Shrivastava, Moshe Y. Vardi, Zhiwei Zhang

By such a reduction to continuous optimization, we propose an algebraic framework for solving systems consisting of different types of constraints.
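
As a toy illustration of this reduction (a sketch in the same spirit; the paper's exact Walsh-Fourier construction may differ), Boolean variables can be embedded as $x_i \in \{-1, +1\}$ with $-1$ meaning True, and each constraint replaced by a multilinear penalty that vanishes exactly on satisfying assignments:

```python
# Boolean variables are embedded as x_i in {-1, +1}, with -1 meaning True
# (the usual Fourier-analysis convention). Relaxing x to [-1, 1]^n gives a
# smooth objective whose zeros are exactly the satisfying assignments.

def or_penalty(lits):
    """Penalty of an OR clause: 0 iff at least one literal is True (-1)."""
    p = 1.0
    for x in lits:
        p *= (1.0 + x) / 2.0      # factor is 0 when the literal is True
    return p

def xor_penalty(lits):
    """Penalty of an XOR constraint: 0 iff an odd number of literals are True."""
    prod = 1.0
    for x in lits:
        prod *= x                 # equals (-1)**(number of True literals)
    return (1.0 + prod) / 2.0

# system: (x1 OR x2) AND (x1 XOR x3), with x = (x1, x2, x3)
def objective(x):
    return or_penalty([x[0], x[1]]) + xor_penalty([x[0], x[2]])

print(objective([-1.0, +1.0, +1.0]))  # 0.0: x1 = True satisfies both constraints
```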

Optimal Mini-Batch Size Selection for Fast Gradient Descent

no code implementations 15 Nov 2019 Michael P. Perrone, Haidar Khan, Changhoan Kim, Anastasios Kyrillidis, Jerry Quinn, Valentina Salapura

This paper presents a methodology for selecting the mini-batch size that minimizes Stochastic Gradient Descent (SGD) learning time for single and multiple learner problems.

Machine Translation Translation

Negative sampling in semi-supervised learning

1 code implementation ICML 2020 John Chen, Vatsal Shah, Anastasios Kyrillidis

We introduce Negative Sampling in Semi-Supervised Learning (NS3L), a simple, fast, easy to tune algorithm for semi-supervised learning (SSL).

Learning Sparse Distributions using Iterative Hard Thresholding

no code implementations NeurIPS 2019 Jacky Y. Zhang, Rajiv Khanna, Anastasios Kyrillidis, Oluwasanmi Koyejo

Iterative hard thresholding (IHT) is a projected gradient descent algorithm, known to achieve state-of-the-art performance for a wide range of structured estimation problems, such as sparse inference.
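
For concreteness, a minimal IHT sketch for sparse least squares (the standard textbook instance; the paper itself studies the more general problem of learning sparse distributions):

```python
import numpy as np

def hard_threshold(x, k):
    """Keep the k largest-magnitude entries of x; zero out the rest."""
    out = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-k:]
    out[keep] = x[keep]
    return out

def iht(A, y, k, step=None, iters=200):
    """Projected gradient descent onto the set of k-sparse vectors."""
    if step is None:
        step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / ||A||_2^2
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = hard_threshold(x + step * A.T @ (y - A @ x), k)
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(80, 200)) / np.sqrt(80)
x_true = np.zeros(200); x_true[:5] = rng.normal(size=5)
x_hat = iht(A, A @ x_true, k=5)
print(np.linalg.norm(x_hat - x_true))   # small under RIP-type conditions
```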

Demon: Improved Neural Network Training with Momentum Decay

2 code implementations 11 Oct 2019 John Chen, Cameron Wolfe, Zhao Li, Anastasios Kyrillidis

Momentum is a widely used technique for gradient-based optimizers in deep learning.

Image Classification

Distributed Learning of Deep Neural Networks using Independent Subnet Training

2 code implementations 4 Oct 2019 Binhang Yuan, Cameron R. Wolfe, Chen Dun, Yuxin Tang, Anastasios Kyrillidis, Christopher M. Jermaine

These properties allow IST to cope with issues caused by distributed data, slow interconnects, or limited device memory, making it a suitable approach when distribution is mandatory.

BIG-bench Machine Learning Image Classification +2

Decaying momentum helps neural network training

no code implementations 25 Sep 2019 John Chen, Anastasios Kyrillidis

Momentum is a simple and popular technique in deep learning for gradient-based optimizers.

Compressing Gradient Optimizers via Count-Sketches

1 code implementation 1 Feb 2019 Ryan Spring, Anastasios Kyrillidis, Vijai Mohan, Anshumali Shrivastava

The problem is becoming more severe as deep learning models continue to grow larger in order to learn from complex, large-scale datasets.

Minimum weight norm models do not always generalize well for over-parameterized problems

no code implementations 16 Nov 2018 Vatsal Shah, Anastasios Kyrillidis, Sujay Sanghavi

We empirically show that the minimum weight norm is not necessarily the proper gauge of good generalization in simplified scenarios, and different models found by adaptive methods could outperform plain gradient methods.

Implicit regularization and solution uniqueness in over-parameterized matrix sensing

no code implementations 6 Jun 2018 Kelly Geyer, Anastasios Kyrillidis, Amir Kalev

Surprisingly, recent work argues that the choice of $r \leq n$ is not pivotal: even setting $U \in \mathbb{R}^{n \times n}$ is sufficient for factored gradient descent to find the rank-$r$ solution, which suggests that operating over the factors leads to an implicit regularization.
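
A minimal sketch of this over-parameterized setting on a synthetic PSD matrix sensing instance (illustrative assumptions: Gaussian measurements, small random initialization, and a heuristic step size): the optimization variable is a full $n \times n$ factor $U$ even though the ground truth has rank $r \ll n$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, m = 20, 2, 400

# rank-r PSD ground truth and m symmetric Gaussian linear measurements
Ustar = rng.normal(size=(n, r))
Xstar = Ustar @ Ustar.T
As = [(M + M.T) / 2 for M in rng.normal(size=(m, n, n))]
y = np.array([np.sum(A * Xstar) for A in As])

# gradient descent on f(U) = 1/(2m) * sum_i (<A_i, U U^T> - y_i)^2,
# with U deliberately over-parameterized as an n x n factor
U = 0.01 * rng.normal(size=(n, n))
step = 0.25 / np.linalg.norm(sum(yi * A for yi, A in zip(y, As)) / m, 2)
for _ in range(500):
    res = np.array([np.sum(A * (U @ U.T)) for A in As]) - y
    grad = (2.0 / m) * sum(c * A for c, A in zip(res, As)) @ U
    U -= step * grad

print(np.linalg.norm(U @ U.T - Xstar) / np.linalg.norm(Xstar))  # small relative error
```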

Provably convergent acceleration in factored gradient descent with applications in matrix sensing

no code implementations 1 Jun 2018 Tayo Ajayi, David Mildebrath, Anastasios Kyrillidis, Shashanka Ubaru, Georgios Kollias, Kristofer Bouchard

We present theoretical results on the convergence of \emph{non-convex} accelerated gradient descent in matrix factorization models with $\ell_2$-norm loss.

Quantum State Tomography

Simple and practical algorithms for $\ell_p$-norm low-rank approximation

no code implementations 24 May 2018 Anastasios Kyrillidis

We propose practical algorithms for entrywise $\ell_p$-norm low-rank approximation, for $p = 1$ or $p = \infty$.

Approximate Newton-based statistical inference using only stochastic gradients

no code implementations 23 May 2018 Tianyang Li, Anastasios Kyrillidis, Liu Liu, Constantine Caramanis

We present a novel statistical inference framework for convex empirical risk minimization, using approximate stochastic Newton steps.

Time Series Time Series Analysis

IHT dies hard: Provable accelerated Iterative Hard Thresholding

no code implementations 26 Dec 2017 Rajiv Khanna, Anastasios Kyrillidis

We study -- both in theory and practice -- the use of momentum motions in classic iterative hard thresholding (IHT) methods.
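
One standard way to add momentum to the IHT iteration, sketched below for a least-squares loss (the momentum schedule and guarantees in the paper differ):

```python
import numpy as np

def hard_threshold(x, k):
    """Keep the k largest-magnitude entries of x; zero out the rest."""
    out = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-k:]
    out[keep] = x[keep]
    return out

def accelerated_iht(A, y, k, step, beta=0.8, iters=200):
    """IHT with a heavy-ball-style extrapolation between consecutive iterates."""
    x_prev = x = np.zeros(A.shape[1])
    for _ in range(iters):
        z = x + beta * (x - x_prev)                     # momentum extrapolation
        x_prev, x = x, hard_threshold(z + step * A.T @ (y - A @ z), k)
    return x

# e.g. accelerated_iht(A, y, k=5, step=1.0 / np.linalg.norm(A, 2) ** 2)
```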

Statistical inference using SGD

no code implementations 21 May 2017 Tianyang Li, Liu Liu, Anastasios Kyrillidis, Constantine Caramanis

We present a novel method for frequentist statistical inference in $M$-estimation problems, based on stochastic gradient descent (SGD) with a fixed step size: we demonstrate that the average of such SGD sequences can be used for statistical inference, after proper scaling.
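
A heavily simplified sketch of the idea (assumed details: a least-squares model, segment lengths, and plain normal-approximation intervals; the paper's scaling and guarantees are more careful): run fixed-step SGD, average disjoint segments of the trajectory, and treat the segment averages as approximate replicates of the estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5000, 3
theta_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(n, d))
y = X @ theta_true + rng.normal(size=n)

def sgd_segment(theta, length, lr=0.05):
    """Run `length` fixed-step SGD iterations and return the iterate average."""
    avg = np.zeros_like(theta)
    for _ in range(length):
        i = rng.integers(n)
        theta = theta - lr * (X[i] @ theta - y[i]) * X[i]
        avg += theta / length
    return theta, avg

theta = np.zeros(d)
replicates = []
for _ in range(30):                       # 30 segments -> 30 approximate replicates
    theta, seg_avg = sgd_segment(theta, length=500)
    replicates.append(seg_avg)
R = np.array(replicates)

mean, se = R.mean(axis=0), R.std(axis=0, ddof=1) / np.sqrt(len(R))
print(np.c_[mean - 1.96 * se, mean + 1.96 * se])   # crude 95% intervals per coordinate
```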

Non-square matrix sensing without spurious local minima via the Burer-Monteiro approach

no code implementations 12 Sep 2016 Dohyung Park, Anastasios Kyrillidis, Constantine Caramanis, Sujay Sanghavi

We consider the non-square matrix sensing problem, under restricted isometry property (RIP) assumptions.

Finding Low-Rank Solutions via Non-Convex Matrix Factorization, Efficiently and Provably

no code implementations 10 Jun 2016 Dohyung Park, Anastasios Kyrillidis, Constantine Caramanis, Sujay Sanghavi

We study such parameterization for optimization of generic convex objectives $f$, and focus on first-order, gradient descent algorithmic solutions.

A simple and provable algorithm for sparse diagonal CCA

no code implementations 29 May 2016 Megasthenis Asteris, Anastasios Kyrillidis, Oluwasanmi Koyejo, Russell Poldrack

Given two sets of variables, derived from a common set of samples, sparse Canonical Correlation Analysis (CCA) seeks linear combinations of a small number of variables in each set, such that the induced canonical variables are maximally correlated.

Algorithms for Learning Sparse Additive Models with Interactions in High Dimensions

no code implementations 2 May 2016 Hemant Tyagi, Anastasios Kyrillidis, Bernd Gärtner, Andreas Krause

A function $f: \mathbb{R}^d \rightarrow \mathbb{R}$ is a Sparse Additive Model (SPAM), if it is of the form $f(\mathbf{x}) = \sum_{l \in \mathcal{S}}\phi_{l}(x_l)$ where $\mathcal{S} \subset [d]$, $|\mathcal{S}| \ll d$.

Additive models Vocal Bursts Intensity Prediction

Learning Sparse Additive Models with Interactions in High Dimensions

no code implementations 18 Apr 2016 Hemant Tyagi, Anastasios Kyrillidis, Bernd Gärtner, Andreas Krause

For some $\mathcal{S}_1 \subset [d], \mathcal{S}_2 \subset {[d] \choose 2}$, the function $f$ is assumed to be of the form: $$f(\mathbf{x}) = \sum_{p \in \mathcal{S}_1}\phi_{p} (x_p) + \sum_{(l, l^{\prime}) \in \mathcal{S}_2}\phi_{(l, l^{\prime})} (x_{l}, x_{l^{\prime}}).$$ Assuming $\phi_{p},\phi_{(l, l^{\prime})}$, $\mathcal{S}_1$ and $\mathcal{S}_2$ to be unknown, we provide a randomized algorithm that queries $f$ and exactly recovers $\mathcal{S}_1,\mathcal{S}_2$.

Additive models Vocal Bursts Intensity Prediction

Trading-off variance and complexity in stochastic gradient descent

no code implementations 22 Mar 2016 Vatsal Shah, Megasthenis Asteris, Anastasios Kyrillidis, Sujay Sanghavi

Stochastic gradient descent is the method of choice for large-scale machine learning problems, by virtue of its light complexity per iteration.

Convex block-sparse linear regression with expanders -- provably

no code implementations 21 Mar 2016 Anastasios Kyrillidis, Bubacarr Bah, Rouzbeh Hasheminezhad, Quoc Tran-Dinh, Luca Baldassarre, Volkan Cevher

Our experimental findings on synthetic and real applications support our claims of faster recovery in the convex setting -- as opposed to using dense sensing matrices -- while showing competitive recovery performance.

regression

Bipartite Correlation Clustering -- Maximizing Agreements

no code implementations 9 Mar 2016 Megasthenis Asteris, Anastasios Kyrillidis, Dimitris Papailiopoulos, Alexandros G. Dimakis

We present a novel approximation algorithm for $k$-BCC, a variant of BCC with an upper bound $k$ on the number of clusters.

Clustering

A single-phase, proximal path-following framework

no code implementations 5 Mar 2016 Quoc Tran-Dinh, Anastasios Kyrillidis, Volkan Cevher

First, it allows handling non-smooth objectives via proximal operators; this avoids lifting the problem dimension in order to accommodate non-smooth components in optimization.

Dropping Convexity for Faster Semi-definite Optimization

no code implementations 14 Sep 2015 Srinadh Bhojanapalli, Anastasios Kyrillidis, Sujay Sanghavi

To the best of our knowledge, this is the first paper to provide precise convergence rate guarantees for general convex functions under standard convex assumptions.

Sparse PCA via Bipartite Matchings

no code implementations NeurIPS 2015 Megasthenis Asteris, Dimitris Papailiopoulos, Anastasios Kyrillidis, Alexandros G. Dimakis

We consider the following multi-component sparse PCA problem: given a set of data points, we seek to extract a small number of sparse components with disjoint supports that jointly capture the maximum possible variance.

Structured Sparsity: Discrete and Convex approaches

no code implementations 20 Jul 2015 Anastasios Kyrillidis, Luca Baldassarre, Marwa El-Halabi, Quoc Tran-Dinh, Volkan Cevher

For each, we present the models in their discrete nature, discuss how to solve the ensuing discrete problems and then describe convex relaxations.

Compressive Sensing

Stay on path: PCA along graph paths

no code implementations 8 Jun 2015 Megasthenis Asteris, Anastasios Kyrillidis, Alexandros G. Dimakis, Han-Gyol Yi, Bharath Chandrasekaran

We introduce a variant of (sparse) PCA in which the set of feasible support sets is determined by a graph.

Compressive Mining: Fast and Optimal Data Mining in the Compressed Domain

no code implementations 22 May 2014 Michail Vlachos, Nikolaos Freris, Anastasios Kyrillidis

However, distance estimation when the data are represented using different sets of coefficients is still a largely unexplored area.

Clustering Data Compression

Scalable sparse covariance estimation via self-concordance

no code implementations 13 May 2014 Anastasios Kyrillidis, Rabeeh Karimi Mahabadi, Quoc Tran-Dinh, Volkan Cevher

We consider the class of convex minimization problems, composed of a self-concordant function, such as the $\log\det$ metric, a convex data fidelity term $h(\cdot)$ and, a regularizing -- possibly non-smooth -- function $g(\cdot)$.
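
A canonical instance of this composite class (stated as a generic illustration, not necessarily the paper's exact formulation) is $\ell_1$-regularized inverse covariance estimation, where the self-concordant part is the log-determinant term, the data fidelity is linear in the sample covariance $S$, and the regularizer is non-smooth:

```latex
\min_{\Theta \succ 0} \;
  \underbrace{-\log\det \Theta}_{\text{self-concordant } f(\Theta)}
  \;+\; \underbrace{\langle S, \Theta \rangle}_{\text{data fidelity } h(\Theta)}
  \;+\; \underbrace{\rho \, \|\Theta\|_1}_{\text{non-smooth } g(\Theta)}
```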

Provable Deterministic Leverage Score Sampling

no code implementations 6 Apr 2014 Dimitris Papailiopoulos, Anastasios Kyrillidis, Christos Boutsidis

We explain theoretically a curious empirical phenomenon: "Approximating a matrix by deterministically selecting a subset of its columns with the corresponding largest leverage scores results in a good low-rank matrix surrogate".
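
A minimal sketch of the deterministic selection rule described above (the rank parameter and test matrix are illustrative): compute rank-$k$ column leverage scores from the top right singular vectors and keep the columns with the largest scores.

```python
import numpy as np

def top_leverage_columns(A, k, c):
    """Return indices of the c columns with the largest rank-k leverage scores."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    scores = np.sum(Vt[:k] ** 2, axis=0)   # leverage score of column j: ||V_k[j, :]||_2^2
    return np.argsort(scores)[-c:]

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 30)) @ np.diag(np.linspace(1, 10, 30))
cols = top_leverage_columns(A, k=5, c=10)
C = A[:, cols]                              # column-subset surrogate for A
```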

Non-uniform Feature Sampling for Decision Tree Ensembles

no code implementations 24 Mar 2014 Anastasios Kyrillidis, Anastasios Zouzias

We study the effectiveness of non-uniform randomized feature selection in decision tree classification.

feature selection General Classification

An Inexact Proximal Path-Following Algorithm for Constrained Convex Minimization

no code implementations 7 Nov 2013 Quoc Tran Dinh, Anastasios Kyrillidis, Volkan Cevher

Many scientific and engineering applications feature nonsmooth convex minimization problems over convex sets.

Composite Self-Concordant Minimization

no code implementations 13 Aug 2013 Quoc Tran-Dinh, Anastasios Kyrillidis, Volkan Cevher

We propose a variable metric framework for minimizing the sum of a self-concordant function and a possibly non-smooth convex function, endowed with an easily computable proximal operator.

Group-Sparse Model Selection: Hardness and Relaxations

no code implementations 13 Mar 2013 Luca Baldassarre, Nirav Bhan, Volkan Cevher, Anastasios Kyrillidis, Siddhartha Satpathi

Group-based sparsity models have proven instrumental in linear regression problems for recovering signals from much fewer measurements than standard compressive sensing.

Compressive Sensing Model Selection

Sparse projections onto the simplex

no code implementations 7 Jun 2012 Anastasios Kyrillidis, Stephen Becker, Volkan Cevher, Christoph Koch

Most learning methods with rank or sparsity constraints use convex relaxations, which lead to optimization with the nuclear norm or the $\ell_1$-norm.

Density Estimation
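
As a sketch of the non-convex alternative the title alludes to (illustrative; see the paper for the exact projection operator and its guarantees): keep the $k$ largest entries and project them onto the probability simplex.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1))[0][-1]
    theta = (css[rho] - 1) / (rho + 1.0)
    return np.maximum(v - theta, 0)

def sparse_simplex_projection(v, k):
    """Keep the k largest entries of v and project them onto the simplex."""
    out = np.zeros_like(v)
    idx = np.argsort(v)[-k:]
    out[idx] = project_simplex(v[idx])
    return out

x = sparse_simplex_projection(np.array([0.9, 0.05, 0.4, -0.2, 0.3]), k=2)
print(x, x.sum())   # 2-sparse, non-negative, sums to 1
```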
