Search Results for author: Anastasios Kyrillidis

Found 54 papers, 10 papers with code

Convergence and Stability of the Stochastic Proximal Point Algorithm with Momentum

no code implementations11 Nov 2021 Junhyung Lyle Kim, Panos Toulis, Anastasios Kyrillidis

Stochastic gradient descent with momentum (SGDM) is the dominant algorithm in many optimization scenarios, including convex optimization instances and non-convex neural network training.
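
As a point of reference for the update this line refers to, here is a minimal numpy sketch of the standard heavy-ball SGDM iteration; the step size, momentum value, and the toy quadratic objective are illustrative choices, not taken from the paper.

```python
import numpy as np

def sgdm_step(w, v, grad, lr=0.1, beta=0.9):
    """Heavy-ball SGDM update: v <- beta * v + grad, w <- w - lr * v."""
    v = beta * v + grad
    w = w - lr * v
    return w, v

# Toy usage: minimize f(w) = 0.5 * ||w||^2 with noisy gradients.
rng = np.random.default_rng(0)
w, v = np.ones(5), np.zeros(5)
for _ in range(200):
    g = w + 0.01 * rng.standard_normal(5)  # stochastic gradient of f at w
    w, v = sgdm_step(w, v, g)
print(np.linalg.norm(w))  # near zero after convergence
```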

Provably Efficient Lottery Ticket Discovery

no code implementations31 Jul 2021 Cameron R. Wolfe, Qihan Wang, Junhyung Lyle Kim, Anastasios Kyrillidis

We derive an analytical bound for the number of pre-training iterations that must be performed for a winning ticket to be discovered, thus providing a theoretical understanding of when and why such early-bird tickets exist.

REX: Revisiting Budgeted Training with an Improved Schedule

no code implementations9 Jul 2021 John Chen, Cameron Wolfe, Anastasios Kyrillidis

Deep learning practitioners often operate on a computational and monetary budget.

ResIST: Layer-Wise Decomposition of ResNets for Distributed Training

no code implementations2 Jul 2021 Chen Dun, Cameron R. Wolfe, Christopher M. Jermaine, Anastasios Kyrillidis

We propose \texttt{ResIST}, a novel distributed training protocol for Residual Networks (ResNets).

Mitigating deep double descent by concatenating inputs

no code implementations2 Jul 2021 John Chen, Qihan Wang, Anastasios Kyrillidis

In this work, we explore the connection between the double descent phenomenon and the number of samples in the deep neural network setting.

Momentum-inspired Low-Rank Coordinate Descent for Diagonally Constrained SDPs

no code implementations16 Jun 2021 Junhyung Lyle Kim, Jose Antonio Lara Benitez, Mohammad Taha Toghani, Cameron Wolfe, Zhiwei Zhang, Anastasios Kyrillidis

We present a novel, practical, and provable approach for solving diagonally constrained semi-definite programming (SDP) problems at scale using accelerated non-convex programming.
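
For intuition about what a low-rank coordinate-descent method for diagonally constrained SDPs looks like, below is a sketch of a standard mixing-method-style baseline: the SDP variable is factored as $X = VV^\top$ with unit-norm rows, so that $\mathrm{diag}(X) = 1$ holds by construction, and each row is updated in closed form. This is a generic baseline, not the momentum-accelerated algorithm proposed in the paper; the cost matrix and dimensions are arbitrary.

```python
import numpy as np

def mixing_style_cd(C, r=8, sweeps=50, seed=0):
    """Coordinate descent for min <C, V V^T> s.t. every row of V has unit norm
    (so diag(V V^T) = 1). Each row update has a closed-form minimizer."""
    n = C.shape[0]
    rng = np.random.default_rng(seed)
    V = rng.standard_normal((n, r))
    V /= np.linalg.norm(V, axis=1, keepdims=True)
    for _ in range(sweeps):
        for i in range(n):
            g = C[i] @ V - C[i, i] * V[i]   # sum_{j != i} C_ij v_j
            norm = np.linalg.norm(g)
            if norm > 1e-12:
                V[i] = -g / norm            # row-wise minimizer on the unit sphere
    return V                                # X = V @ V.T is PSD with unit diagonal

# Small random symmetric cost matrix as a test instance.
B = np.random.default_rng(1).standard_normal((20, 20))
V = mixing_style_cd((B + B.T) / 2)
print(np.allclose(np.diag(V @ V.T), 1.0))   # feasibility check: True
```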

Fast quantum state reconstruction via accelerated non-convex programming

1 code implementation14 Apr 2021 Junhyung Lyle Kim, George Kollias, Amir Kalev, Ken X. Wei, Anastasios Kyrillidis

Despite being a non-convex method, \texttt{MiFGD} converges \emph{provably} to the true density matrix at a linear rate, in the absence of experimental and statistical noise, and under common assumptions.
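
To illustrate the kind of iteration involved, here is a minimal real-valued numpy sketch of momentum-accelerated factored gradient descent for recovering a low-rank PSD matrix $\rho = UU^\top$ from linear measurements. It captures the general shape of such methods but is not the authors' \texttt{MiFGD} implementation; the step size, momentum weight, and synthetic instance are arbitrary assumptions.

```python
import numpy as np

def momentum_factored_gd(A_ops, y, n, r, eta=0.1, mu=0.5, iters=400, seed=0):
    """Momentum-accelerated factored gradient descent for rho = U U^T (PSD, rank <= r)
    from linear measurements y_i = <A_i, rho>; loss is (1/2m) * ||A(U U^T) - y||^2."""
    m = len(A_ops)
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((n, r))
    Z_prev = U.copy()
    for _ in range(iters):
        rho = U @ U.T
        residual = np.array([np.sum(A * rho) for A in A_ops]) - y
        grad = sum(res * (A @ U) for res, A in zip(residual, A_ops)) * (2.0 / m)
        Z = U - eta * grad                  # plain gradient step on the factor
        U = Z + mu * (Z - Z_prev)           # momentum extrapolation
        Z_prev = Z
    return U

# Synthetic test: rank-1 PSD ground truth, 40 random symmetric measurement matrices.
n, r, m = 6, 1, 40
rng = np.random.default_rng(1)
U_true = rng.standard_normal((n, r)) / np.sqrt(n)
rho_true = U_true @ U_true.T
A_ops = []
for _ in range(m):
    B = rng.standard_normal((n, n))
    A_ops.append((B + B.T) / 2)
y = np.array([np.sum(A * rho_true) for A in A_ops])
U_hat = momentum_factored_gd(A_ops, y, n, r)
print(np.linalg.norm(U_hat @ U_hat.T - rho_true))   # recovery error; typically small here
```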

GIST: Distributed Training for Large-Scale Graph Convolutional Networks

1 code implementation20 Feb 2021 Cameron R. Wolfe, Jingkang Yang, Arindam Chowdhury, Chen Dun, Artun Bayer, Santiago Segarra, Anastasios Kyrillidis

The graph convolutional network (GCN) is a go-to solution for machine learning on graphs, but its training is notoriously difficult to scale both in terms of graph size and the number of model parameters.

Graph Sampling

Rank-One Measurements of Low-Rank PSD Matrices Have Small Feasible Sets

no code implementations17 Dec 2020 T. Mitchell Roddenberry, Santiago Segarra, Anastasios Kyrillidis

We study the role of the constraint set in determining the solution to low-rank, positive semidefinite (PSD) matrix sensing problems.

On Continuous Local BDD-Based Search for Hybrid SAT Solving

1 code implementation14 Dec 2020 Anastasios Kyrillidis, Moshe Y. Vardi, Zhiwei Zhang

We explore the potential of continuous local search (CLS) in SAT solving by proposing a novel approach for finding a solution of a hybrid system of Boolean constraints.

On Generalization of Adaptive Methods for Over-parameterized Linear Regression

no code implementations28 Nov 2020 Vatsal Shah, Soumya Basu, Anastasios Kyrillidis, Sujay Sanghavi

In this paper, we aim to characterize the performance of adaptive methods in the over-parameterized linear regression setting.

StackMix: A complementary Mix algorithm

no code implementations25 Nov 2020 John Chen, Samarth Sinha, Anastasios Kyrillidis

On its own, improvements with StackMix hold across different numbers of labeled samples on CIFAR-100, maintaining approximately a 2\% gap in test accuracy -- down to using only 5\% of the whole dataset -- and StackMix is effective in the semi-supervised setting, with a 2\% improvement with the standard benchmark $\Pi$-model.

Contrastive Learning, Data Augmentation, +1

Bayesian Coresets: Revisiting the Nonconvex Optimization Perspective

1 code implementation1 Jul 2020 Jacky Y. Zhang, Rajiv Khanna, Anastasios Kyrillidis, Oluwasanmi Koyejo

Bayesian coresets have emerged as a promising approach for implementing scalable Bayesian inference.

Bayesian Inference

FourierSAT: A Fourier Expansion-Based Algebraic Framework for Solving Hybrid Boolean Constraints

1 code implementation2 Dec 2019 Anastasios Kyrillidis, Anshumali Shrivastava, Moshe Y. Vardi, Zhiwei Zhang

By such a reduction to continuous optimization, we propose an algebraic framework for solving systems consisting of different types of constraints.
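
As a toy illustration of reducing Boolean constraints to continuous optimization, the sketch below encodes CNF clauses through the multilinear extension of their violation indicators and minimizes the resulting objective over $[-1, 1]^n$ by (numerical) gradient descent. This is only a schematic of the general idea; FourierSAT itself works with Walsh-Fourier expansions of richer constraint types and a more careful optimization scheme.

```python
import numpy as np

def clause_violation(x, clause):
    """Multilinear extension of 'this CNF clause is falsified'. x lives in [-1, 1]^n with
    +1 meaning True; a clause is a list of ints, +i for variable i and -i for its negation."""
    val = 1.0
    for lit in clause:
        xi = x[abs(lit) - 1]
        falseness = (1 - xi) / 2 if lit > 0 else (1 + xi) / 2
        val *= falseness
    return val

def objective(x, clauses):
    return sum(clause_violation(x, c) for c in clauses)

def solve(clauses, n, lr=0.2, iters=500, seed=0):
    """Gradient descent on the continuous objective (numerical gradient for brevity),
    followed by rounding to a Boolean assignment."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-0.5, 0.5, n)
    eps = 1e-5
    for _ in range(iters):
        grad = np.array([(objective(x + eps * e, clauses) -
                          objective(x - eps * e, clauses)) / (2 * eps) for e in np.eye(n)])
        x = np.clip(x - lr * grad, -1.0, 1.0)
    return x > 0

# (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
clauses = [[1, 2], [-1, 3], [-2, -3]]
print(solve(clauses, n=3))
```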

Optimal Mini-Batch Size Selection for Fast Gradient Descent

no code implementations15 Nov 2019 Michael P. Perrone, Haidar Khan, Changhoan Kim, Anastasios Kyrillidis, Jerry Quinn, Valentina Salapura

This paper presents a methodology for selecting the mini-batch size that minimizes Stochastic Gradient Descent (SGD) learning time for single and multiple learner problems.

Machine Translation, Translation

Negative sampling in semi-supervised learning

1 code implementation ICML 2020 John Chen, Vatsal Shah, Anastasios Kyrillidis

We introduce Negative Sampling in Semi-Supervised Learning (NS3L), a simple, fast, easy-to-tune algorithm for semi-supervised learning (SSL).
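
One plausible form of a negative-sampling term on unlabeled data is sketched below: classes the model already assigns low probability are treated as negatives and pushed further toward zero via a binary cross-entropy penalty. The threshold, the selection rule, and the loss form here are illustrative assumptions; the exact NS3L objective is defined in the paper.

```python
import numpy as np

def negative_sampling_loss(probs, neg_classes):
    """Binary cross-entropy term pushing the probabilities of sampled 'negative' classes
    toward zero for a single unlabeled example."""
    p = np.clip(probs[neg_classes], 1e-8, 1 - 1e-8)
    return -np.sum(np.log(1.0 - p))

probs = np.array([0.70, 0.05, 0.10, 0.15])   # softmax output on an unlabeled example
neg = np.where(probs < 0.2)[0]               # classes the model already rules out (illustrative rule)
print(negative_sampling_loss(probs, neg))
```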

Learning Sparse Distributions using Iterative Hard Thresholding

no code implementations NeurIPS 2019 Jacky Y. Zhang, Rajiv Khanna, Anastasios Kyrillidis, Oluwasanmi Koyejo

Iterative hard thresholding (IHT) is a projected gradient descent algorithm, known to achieve state-of-the-art performance for a wide range of structured estimation problems, such as sparse inference.
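
Since the line above describes IHT as a projected gradient method, a minimal numpy sketch may help: a gradient step on the least-squares loss, followed by projection onto the set of $k$-sparse vectors. The step-size rule and the toy compressed-sensing instance are illustrative choices, not prescriptions from the paper.

```python
import numpy as np

def hard_threshold(x, k):
    """Keep the k largest-magnitude entries of x and zero out the rest."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

def iht(A, y, k, iters=300):
    """IHT for sparse recovery: gradient step on 0.5 * ||Ax - y||^2, then project onto
    the set of k-sparse vectors by hard thresholding."""
    lr = 1.0 / np.linalg.norm(A, 2) ** 2     # conservative step size from the spectral norm
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = hard_threshold(x - lr * A.T @ (A @ x - y), k)
    return x

# Toy compressed-sensing instance: 3-sparse signal in R^50 from 30 Gaussian measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 50)) / np.sqrt(30)
x_true = np.zeros(50)
x_true[[3, 17, 42]] = [1.5, -2.0, 1.0]
y = A @ x_true
print(np.linalg.norm(iht(A, y, 3) - x_true))   # typically small
```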

Demon: Improved Neural Network Training with Momentum Decay

2 code implementations11 Oct 2019 John Chen, Cameron Wolfe, Zhao Li, Anastasios Kyrillidis

Momentum is a widely used technique for gradient-based optimizers in deep learning.

Image Classification

Distributed Learning of Deep Neural Networks using Independent Subnet Training

1 code implementation4 Oct 2019 Binhang Yuan, Cameron R. Wolfe, Chen Dun, Yuxin Tang, Anastasios Kyrillidis, Christopher M. Jermaine

We show experimentally that IST results in training times that are much lower than those of data-parallel approaches to distributed learning, and that it scales to large models that cannot be learned using standard approaches.

Image Classification, Product Recommendation, +1

Decaying momentum helps neural network training

no code implementations25 Sep 2019 John Chen, Anastasios Kyrillidis

Momentum is a simple and popular technique in deep learning for gradient-based optimizers.

Compressing Gradient Optimizers via Count-Sketches

1 code implementation1 Feb 2019 Ryan Spring, Anastasios Kyrillidis, Vijai Mohan, Anshumali Shrivastava

The problem is becoming more severe as deep learning models continue to grow larger in order to learn from complex, large-scale datasets.

Minimum weight norm models do not always generalize well for over-parameterized problems

no code implementations16 Nov 2018 Vatsal Shah, Anastasios Kyrillidis, Sujay Sanghavi

We empirically show that the minimum weight norm is not necessarily the proper gauge of good generalization in simplified scenarios, and that different models found by adaptive methods could outperform plain gradient methods.

Fine-tuning

Implicit regularization and solution uniqueness in over-parameterized matrix sensing

no code implementations6 Jun 2018 Kelly Geyer, Anastasios Kyrillidis, Amir Kalev

Surprisingly, recent work argues that the choice of $r \leq n$ is not pivotal: even setting $U \in \mathbb{R}^{n \times n}$ is sufficient for factored gradient descent to find the rank-$r$ solution, which suggests that operating over the factors leads to an implicit regularization.

Provably convergent acceleration in factored gradient descent with applications in matrix sensing

no code implementations1 Jun 2018 Tayo Ajayi, David Mildebrath, Anastasios Kyrillidis, Shashanka Ubaru, Georgios Kollias, Kristofer Bouchard

We present theoretical results on the convergence of \emph{non-convex} accelerated gradient descent in matrix factorization models with $\ell_2$-norm loss.

Quantum State Tomography

Simple and practical algorithms for $\ell_p$-norm low-rank approximation

no code implementations24 May 2018 Anastasios Kyrillidis

We propose practical algorithms for entrywise $\ell_p$-norm low-rank approximation, for $p = 1$ or $p = \infty$.

Approximate Newton-based statistical inference using only stochastic gradients

no code implementations23 May 2018 Tianyang Li, Anastasios Kyrillidis, Liu Liu, Constantine Caramanis

We present a novel statistical inference framework for convex empirical risk minimization, using approximate stochastic Newton steps.

Time Series, Time Series Analysis

IHT dies hard: Provable accelerated Iterative Hard Thresholding

no code implementations26 Dec 2017 Rajiv Khanna, Anastasios Kyrillidis

We study -- both in theory and in practice -- the use of momentum motions in classic iterative hard thresholding (IHT) methods.

Statistical inference using SGD

no code implementations21 May 2017 Tianyang Li, Liu Liu, Anastasios Kyrillidis, Constantine Caramanis

We present a novel method for frequentist statistical inference in $M$-estimation problems, based on stochastic gradient descent (SGD) with a fixed step size: we demonstrate that the average of such SGD sequences can be used for statistical inference, after proper scaling.
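
The averaging idea can be sketched as follows: run several independent fixed-step SGD passes on the same problem, average the iterates of each pass, and use the across-pass spread as a rough uncertainty proxy. This schematic omits the paper's precise scaling and confidence-interval construction; the burn-in length, step size, and toy regression problem are assumptions.

```python
import numpy as np

def sgd_average(X, y, theta0, lr=0.05, steps=2000, burn=200, seed=0):
    """Fixed-step SGD on least squares; return the average of the post-burn-in iterates."""
    rng = np.random.default_rng(seed)
    theta = theta0.copy()
    avg = np.zeros_like(theta)
    count = 0
    for t in range(steps):
        i = rng.integers(len(y))
        theta -= lr * (X[i] @ theta - y[i]) * X[i]
        if t >= burn:
            count += 1
            avg += (theta - avg) / count
    return avg

# Toy linear model; independent SGD averages give a point estimate plus a spread.
rng = np.random.default_rng(1)
n, d = 2000, 3
X = rng.standard_normal((n, d))
theta_star = np.array([1.0, -0.5, 2.0])
y = X @ theta_star + 0.1 * rng.standard_normal(n)
estimates = np.array([sgd_average(X, y, np.zeros(d), seed=s) for s in range(20)])
print(estimates.mean(axis=0))   # close to theta_star
print(estimates.std(axis=0))    # across-run spread, used here as a rough uncertainty proxy
```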

Non-square matrix sensing without spurious local minima via the Burer-Monteiro approach

no code implementations12 Sep 2016 Dohyung Park, Anastasios Kyrillidis, Constantine Caramanis, Sujay Sanghavi

We consider the non-square matrix sensing problem, under restricted isometry property (RIP) assumptions.

Finding Low-Rank Solutions via Non-Convex Matrix Factorization, Efficiently and Provably

no code implementations10 Jun 2016 Dohyung Park, Anastasios Kyrillidis, Constantine Caramanis, Sujay Sanghavi

We study such parameterization for optimization of generic convex objectives $f$, and focus on first-order, gradient descent algorithmic solutions.

A simple and provable algorithm for sparse diagonal CCA

no code implementations29 May 2016 Megasthenis Asteris, Anastasios Kyrillidis, Oluwasanmi Koyejo, Russell Poldrack

Given two sets of variables, derived from a common set of samples, sparse Canonical Correlation Analysis (CCA) seeks linear combinations of a small number of variables in each set, such that the induced canonical variables are maximally correlated.

Algorithms for Learning Sparse Additive Models with Interactions in High Dimensions

no code implementations2 May 2016 Hemant Tyagi, Anastasios Kyrillidis, Bernd Gärtner, Andreas Krause

A function $f: \mathbb{R}^d \rightarrow \mathbb{R}$ is a Sparse Additive Model (SPAM), if it is of the form $f(\mathbf{x}) = \sum_{l \in \mathcal{S}}\phi_{l}(x_l)$ where $\mathcal{S} \subset [d]$, $|\mathcal{S}| \ll d$.
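
A toy instantiation of this definition, with an arbitrary active set $\mathcal{S}$ and arbitrary component functions $\phi_l$, for concreteness:

```python
import numpy as np

# Toy SPAM on R^10: only the coordinates in S matter, f(x) = sum_{l in S} phi_l(x_l).
S = [2, 5, 7]
phi = {2: np.sin, 5: np.square, 7: np.tanh}

def f(x):
    return sum(phi[l](x[l]) for l in S)

x = np.random.default_rng(0).standard_normal(10)
print(f(x))
```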

Additive models

Learning Sparse Additive Models with Interactions in High Dimensions

no code implementations18 Apr 2016 Hemant Tyagi, Anastasios Kyrillidis, Bernd Gärtner, Andreas Krause

For some $\mathcal{S}_1 \subset [d], \mathcal{S}_2 \subset {[d] \choose 2}$, the function $f$ is assumed to be of the form: $$f(\mathbf{x}) = \sum_{p \in \mathcal{S}_1}\phi_{p} (x_p) + \sum_{(l, l^{\prime}) \in \mathcal{S}_2}\phi_{(l, l^{\prime})} (x_{l}, x_{l^{\prime}}).$$ Assuming $\phi_{p}$, $\phi_{(l, l^{\prime})}$, $\mathcal{S}_1$, and $\mathcal{S}_2$ to be unknown, we provide a randomized algorithm that queries $f$ and exactly recovers $\mathcal{S}_1,\mathcal{S}_2$.

Additive models

Trading-off variance and complexity in stochastic gradient descent

no code implementations22 Mar 2016 Vatsal Shah, Megasthenis Asteris, Anastasios Kyrillidis, Sujay Sanghavi

Stochastic gradient descent is the method of choice for large-scale machine learning problems, by virtue of its light complexity per iteration.

Convex block-sparse linear regression with expanders -- provably

no code implementations21 Mar 2016 Anastasios Kyrillidis, Bubacarr Bah, Rouzbeh Hasheminezhad, Quoc Tran-Dinh, Luca Baldassarre, Volkan Cevher

Our experimental findings on synthetic and real applications support our claims of faster recovery in the convex setting -- as opposed to using dense sensing matrices -- while showing competitive recovery performance.

Bipartite Correlation Clustering -- Maximizing Agreements

no code implementations9 Mar 2016 Megasthenis Asteris, Anastasios Kyrillidis, Dimitris Papailiopoulos, Alexandros G. Dimakis

We present a novel approximation algorithm for $k$-BCC, a variant of BCC with an upper bound $k$ on the number of clusters.

A single-phase, proximal path-following framework

no code implementations5 Mar 2016 Quoc Tran-Dinh, Anastasios Kyrillidis, Volkan Cevher

First, it allows handling non-smooth objectives via proximal operators; this avoids lifting the problem dimension in order to accommodate non-smooth components in optimization.

Dropping Convexity for Faster Semi-definite Optimization

no code implementations14 Sep 2015 Srinadh Bhojanapalli, Anastasios Kyrillidis, Sujay Sanghavi

To the best of our knowledge, this is the first paper to provide precise convergence rate guarantees for general convex functions under standard convex assumptions.

Sparse PCA via Bipartite Matchings

no code implementations NeurIPS 2015 Megasthenis Asteris, Dimitris Papailiopoulos, Anastasios Kyrillidis, Alexandros G. Dimakis

We consider the following multi-component sparse PCA problem: given a set of data points, we seek to extract a small number of sparse components with disjoint supports that jointly capture the maximum possible variance.

Structured Sparsity: Discrete and Convex approaches

no code implementations20 Jul 2015 Anastasios Kyrillidis, Luca Baldassarre, Marwa El-Halabi, Quoc Tran-Dinh, Volkan Cevher

For each, we present the models in their discrete nature, discuss how to solve the ensuing discrete problems and then describe convex relaxations.

Compressive Sensing

Stay on path: PCA along graph paths

no code implementations8 Jun 2015 Megasthenis Asteris, Anastasios Kyrillidis, Alexandros G. Dimakis, Han-Gyol Yi, Bharath Chandrasekaran

We introduce a variant of (sparse) PCA in which the set of feasible support sets is determined by a graph.

Compressive Mining: Fast and Optimal Data Mining in the Compressed Domain

no code implementations22 May 2014 Michail Vlachos, Nikolaos Freris, Anastasios Kyrillidis

However, distance estimation when the data are represented using different sets of coefficients is still a largely unexplored area.

Data Compression

Scalable sparse covariance estimation via self-concordance

no code implementations13 May 2014 Anastasios Kyrillidis, Rabeeh Karimi Mahabadi, Quoc Tran-Dinh, Volkan Cevher

We consider the class of convex minimization problems, composed of a self-concordant function, such as the $\log\det$ metric, a convex data fidelity term $h(\cdot)$, and a regularizing -- possibly non-smooth -- function $g(\cdot)$.

Provable Deterministic Leverage Score Sampling

no code implementations6 Apr 2014 Dimitris Papailiopoulos, Anastasios Kyrillidis, Christos Boutsidis

We explain theoretically a curious empirical phenomenon: "Approximating a matrix by deterministically selecting a subset of its columns with the corresponding largest leverage scores results in a good low-rank matrix surrogate".
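
The column-selection rule described above can be sketched directly: compute the rank-$k$ leverage scores as squared row norms of the top-$k$ right singular vectors, then deterministically keep the columns with the largest scores. The matrix sizes, the target rank, and the number of kept columns below are arbitrary.

```python
import numpy as np

def top_leverage_columns(A, k, c):
    """Rank-k column leverage scores are the squared row norms of the top-k right singular
    vectors; deterministically keep the c columns with the largest scores."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    scores = np.sum(Vt[:k, :] ** 2, axis=0)
    cols = np.argsort(scores)[::-1][:c]
    return cols, scores

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 8)) @ rng.standard_normal((8, 60)) \
    + 0.01 * rng.standard_normal((100, 60))        # approximately rank-8 matrix
cols, _ = top_leverage_columns(A, k=8, c=12)
C = A[:, cols]
A_hat = C @ np.linalg.pinv(C) @ A                  # project A onto the span of the kept columns
print(np.linalg.norm(A - A_hat) / np.linalg.norm(A))   # small when the kept columns capture A
```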

Non-uniform Feature Sampling for Decision Tree Ensembles

no code implementations24 Mar 2014 Anastasios Kyrillidis, Anastasios Zouzias

We study the effectiveness of non-uniform randomized feature selection in decision tree classification.

Feature Selection, General Classification

An Inexact Proximal Path-Following Algorithm for Constrained Convex Minimization

no code implementations7 Nov 2013 Quoc Tran-Dinh, Anastasios Kyrillidis, Volkan Cevher

Many scientific and engineering applications feature nonsmooth convex minimization problems over convex sets.

Composite Self-Concordant Minimization

no code implementations13 Aug 2013 Quoc Tran-Dinh, Anastasios Kyrillidis, Volkan Cevher

We propose a variable metric framework for minimizing the sum of a self-concordant function and a possibly non-smooth convex function, endowed with an easily computable proximal operator.
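
To make "easily computable proximal operator" concrete, the sketch below runs a plain proximal-gradient loop with the soft-thresholding prox of the $\ell_1$ norm on a toy lasso instance. This is a generic composite-minimization baseline, not the paper's variable-metric (Newton-type) scheme; the problem sizes and regularization weight are arbitrary.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1: shrink each entry toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def proximal_gradient(grad_f, prox_g, x0, lr, iters=500):
    """Composite minimization of f(x) + g(x): gradient step on f, then the prox of g."""
    x = x0.copy()
    for _ in range(iters):
        x = prox_g(x - lr * grad_f(x), lr)
    return x

# Toy lasso instance: min_x 0.5 * ||Ax - y||^2 + lam * ||x||_1.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 20)) / np.sqrt(40)
x_true = np.zeros(20)
x_true[[1, 7]] = [2.0, -1.5]
y = A @ x_true
lam = 0.05
x_hat = proximal_gradient(lambda x: A.T @ (A @ x - y),
                          lambda z, t: soft_threshold(z, lam * t),
                          np.zeros(20), lr=1.0 / np.linalg.norm(A, 2) ** 2)
print(np.round(x_hat, 2))   # approximately 2-sparse, close to x_true up to shrinkage
```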

Group-Sparse Model Selection: Hardness and Relaxations

no code implementations13 Mar 2013 Luca Baldassarre, Nirav Bhan, Volkan Cevher, Anastasios Kyrillidis, Siddhartha Satpathi

Group-based sparsity models have proven instrumental in linear regression problems for recovering signals from much fewer measurements than standard compressive sensing.

Compressive Sensing, Model Selection

Sparse projections onto the simplex

no code implementations7 Jun 2012 Anastasios Kyrillidis, Stephen Becker, Volkan Cevher, Christoph Koch

Most learning methods with rank or sparsity constraints use convex relaxations, which lead to optimization with the nuclear norm or the $\ell_1$-norm.
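
As a building block related to this line of work, the Euclidean projection onto the probability simplex has a simple sort-and-threshold closed form, sketched below; this is the plain (non-sparse) projection, not the sparse projection operator studied in the paper.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1} via sort-and-threshold."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u - (css - 1) / np.arange(1, len(v) + 1) > 0)[0][-1]
    theta = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - theta, 0.0)

x = project_simplex(np.array([0.8, 1.2, -0.3, 0.1]))
print(x, x.sum())   # non-negative entries summing to 1, here [0.3, 0.7, 0.0, 0.0]
```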

Density Estimation
