Search Results for author: Jelena Diakonikolas

Found 27 papers, 2 papers with code

Optimization on a Finer Scale: Bounded Local Subgradient Variation Perspective

no code implementations • 24 Mar 2024 • Jelena Diakonikolas, Cristóbal Guzmán

The resulting class of objective functions encapsulates the classes traditionally studied in optimization, which are defined based on either Lipschitz continuity of the objective or Hölder/Lipschitz continuity of its gradient.

A Primal-Dual Algorithm for Faster Distributionally Robust Optimization

no code implementations • 16 Mar 2024 • Ronak Mehta, Jelena Diakonikolas, Zaid Harchaoui

We consider the penalized distributionally robust optimization (DRO) problem with a closed, convex uncertainty set, a setting that encompasses the $f$-DRO, Wasserstein-DRO, and spectral/$L$-risk formulations used in practice.

Robust Second-Order Nonconvex Optimization and Its Application to Low Rank Matrix Sensing

no code implementations • NeurIPS 2023 • Shuyao Li, Yu Cheng, Ilias Diakonikolas, Jelena Diakonikolas, Rong Ge, Stephen J. Wright

We introduce a general framework for efficiently finding an approximate SOSP with dimension-independent accuracy guarantees, using $\widetilde{O}({D^2}/{\epsilon})$ samples where $D$ is the ambient dimension and $\epsilon$ is the fraction of corrupted datapoints.

Last Iterate Convergence of Incremental Methods and Applications in Continual Learning

no code implementations • 11 Mar 2024 • Xufeng Cai, Jelena Diakonikolas

We further obtain generalizations of our results to weighted averaging of the iterates with increasing weights, which can be seen as interpolating between the last iterate and the average iterate guarantees.

Continual Learning
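
A minimal, illustrative sketch of the setting studied in the paper above: the classical incremental gradient method with a deterministic cyclic order, returning the last iterate rather than an average. The toy objective, function names, and step size below are illustrative choices, not taken from the paper.

    import numpy as np

    def incremental_gradient(grads, x0, step_size, n_epochs):
        """Classical incremental gradient method: each epoch cycles through the
        n component gradients in a fixed order; the last iterate is returned
        without any averaging."""
        x = np.array(x0, dtype=float)
        for _ in range(n_epochs):
            for grad_i in grads:  # deterministic cyclic order over components
                x = x - step_size * grad_i(x)
        return x

    # Toy example: f_i(x) = 0.5 * ||x - a_i||^2, whose sum is minimized at the mean of the a_i.
    rng = np.random.default_rng(0)
    anchors = rng.normal(size=(10, 3))
    grads = [lambda x, a=a: x - a for a in anchors]
    x_last = incremental_gradient(grads, np.zeros(3), step_size=0.05, n_epochs=200)
    print(np.linalg.norm(x_last - anchors.mean(axis=0)))  # small, up to an O(step_size) bias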

Robustly Learning Single-Index Models via Alignment Sharpness

no code implementations • 27 Feb 2024 • Nikos Zarifis, Puqian Wang, Ilias Diakonikolas, Jelena Diakonikolas

We give an efficient learning algorithm, achieving a constant factor approximation to the optimal loss, that succeeds under a range of distributions (including log-concave distributions) and a broad class of monotone and Lipschitz link functions.

Variance Reduced Halpern Iteration for Finite-Sum Monotone Inclusions

no code implementations • 4 Oct 2023 • Xufeng Cai, Ahmet Alacaoglu, Jelena Diakonikolas

Our main contributions are variants of the classical Halpern iteration that employ variance reduction to obtain improved complexity guarantees for settings in which the $n$ component operators in the finite sum are "on average" either cocoercive or Lipschitz continuous and monotone, with parameter $L$.

Adversarial Robustness

Information-Computation Tradeoffs for Learning Margin Halfspaces with Random Classification Noise

no code implementations • 28 Jun 2023 • Ilias Diakonikolas, Jelena Diakonikolas, Daniel M. Kane, Puqian Wang, Nikos Zarifis

Our main result is a lower bound for Statistical Query (SQ) algorithms and low-degree polynomial tests suggesting that the quadratic dependence on $1/\epsilon$ in the sample complexity is inherent for computationally efficient algorithms.

PAC learning

Empirical Risk Minimization with Shuffled SGD: A Primal-Dual Perspective and Improved Bounds

no code implementations • 21 Jun 2023 • Xufeng Cai, Cheuk Yin Lin, Jelena Diakonikolas

In contrast to the empirical practice of sampling from the dataset without replacement, with (possible) reshuffling at each epoch, theoretical analyses of SGD usually rely on the assumption of sampling with replacement.
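
To make the contrast concrete, here is a minimal sketch (a generic illustration, not the paper's primal-dual method) of SGD with random reshuffling, i.e., without-replacement sampling with a fresh permutation each epoch, next to the with-replacement variant usually assumed in theory; the toy problem and step size are illustrative choices.

    import numpy as np

    def sgd_reshuffled(grad_i, n, x0, step, epochs, rng):
        """Shuffled SGD: each epoch visits every index exactly once,
        in a freshly drawn random order (sampling without replacement)."""
        x = np.array(x0, dtype=float)
        for _ in range(epochs):
            for i in rng.permutation(n):
                x = x - step * grad_i(x, i)
        return x

    def sgd_with_replacement(grad_i, n, x0, step, epochs, rng):
        """Textbook SGD: each step samples an index uniformly at random,
        independently of previous steps (sampling with replacement)."""
        x = np.array(x0, dtype=float)
        for _ in range(epochs * n):
            x = x - step * grad_i(x, rng.integers(n))
        return x

    # Toy least squares: f_i(x) = 0.5 * (a_i @ x - b_i)^2.
    rng = np.random.default_rng(1)
    A, b = rng.normal(size=(100, 5)), rng.normal(size=100)
    grad_i = lambda x, i: (A[i] @ x - b[i]) * A[i]
    x_star = np.linalg.lstsq(A, b, rcond=None)[0]
    print(np.linalg.norm(sgd_reshuffled(grad_i, 100, np.zeros(5), 0.01, 50, rng) - x_star))
    print(np.linalg.norm(sgd_with_replacement(grad_i, 100, np.zeros(5), 0.01, 50, rng) - x_star))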

Robustly Learning a Single Neuron via Sharpness

no code implementations • 13 Jun 2023 • Puqian Wang, Nikos Zarifis, Ilias Diakonikolas, Jelena Diakonikolas

We study the problem of learning a single neuron with respect to the $L_2^2$-loss in the presence of adversarial label noise.
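
For context, the objective is $\min_w \mathbb{E}[(\sigma(w^\top x) - y)^2]$ for a known activation $\sigma$. The sketch below is only a plain, non-robust gradient descent baseline on the empirical loss with a ReLU link; the activation choice, initialization, and step size are illustrative assumptions, and this is not the paper's robust algorithm.

    import numpy as np

    def fit_single_neuron(X, y, step=0.1, iters=500):
        """Plain gradient descent on the empirical squared loss
        (1/n) * sum_i (relu(w @ x_i) - y_i)^2; offers no robustness
        to adversarial label noise."""
        n, d = X.shape
        w = 0.01 * np.ones(d)  # small nonzero init so the ReLU mask is not all zeros
        for _ in range(iters):
            z = X @ w
            residual = np.maximum(z, 0.0) - y
            grad = (2.0 / n) * X.T @ (residual * (z > 0))  # chain rule through the ReLU
            w = w - step * grad
        return w

    rng = np.random.default_rng(2)
    X = rng.normal(size=(1000, 10))
    y = np.maximum(X @ rng.normal(size=10), 0.0)  # clean labels for this toy run
    w_hat = fit_single_neuron(X, y)
    print(np.mean((np.maximum(X @ w_hat, 0.0) - y) ** 2))  # final training loss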

Accelerated Cyclic Coordinate Dual Averaging with Extrapolation for Composite Convex Optimization

no code implementations • 28 Mar 2023 • Cheuk Yin Lin, Chaobing Song, Jelena Diakonikolas

Exploiting partial first-order information in a cyclic way is arguably the most natural strategy to obtain scalable first-order methods.

Cyclic Block Coordinate Descent With Variance Reduction for Composite Nonconvex Optimization

no code implementations • 9 Dec 2022 • Xufeng Cai, Chaobing Song, Stephen J. Wright, Jelena Diakonikolas

Our convergence analysis is based on a gradient Lipschitz condition with respect to a Mahalanobis norm, inspired by recent progress on cyclic block coordinate methods.
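
As a point of reference for the cyclic setting (a generic textbook sketch, not the variance-reduced method from the paper), cyclic block coordinate gradient descent sweeps over a fixed partition of the coordinates and updates one block at a time; the block partition, toy quadratic, and step size are illustrative choices.

    import numpy as np

    def cyclic_block_cd(grad, x0, blocks, step, n_sweeps):
        """Cyclic block coordinate gradient descent: blocks are visited in a
        fixed order, and only the coordinates of the current block are updated
        using the corresponding entries of the gradient."""
        x = np.array(x0, dtype=float)
        for _ in range(n_sweeps):
            for block in blocks:  # deterministic cyclic order
                x[block] = x[block] - step * grad(x)[block]
        return x

    # Toy quadratic: f(x) = 0.5 * x @ Q @ x - c @ x with Q positive definite.
    rng = np.random.default_rng(3)
    M = rng.normal(size=(8, 8))
    Q, c = M @ M.T + np.eye(8), rng.normal(size=8)
    step = 1.0 / np.linalg.eigvalsh(Q).max()  # conservative step from the global smoothness constant
    blocks = [np.arange(0, 4), np.arange(4, 8)]
    x_hat = cyclic_block_cd(lambda x: Q @ x - c, np.zeros(8), blocks, step, n_sweeps=2000)
    print(np.linalg.norm(x_hat - np.linalg.solve(Q, c)))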

Stochastic Halpern Iteration with Variance Reduction for Stochastic Monotone Inclusions

1 code implementation • 17 Mar 2022 • Xufeng Cai, Chaobing Song, Cristóbal Guzmán, Jelena Diakonikolas

We study stochastic monotone inclusion problems, which widely appear in machine learning applications, including robust regression and adversarial learning.

A Fast Scale-Invariant Algorithm for Non-negative Least Squares with Non-negative Data

no code implementations • 8 Mar 2022 • Jelena Diakonikolas, Chenghui Li, Swati Padmanabhan, Chaobing Song

In particular, while the oracle complexity of unconstrained least squares problems necessarily scales with one of the data matrix constants (typically the spectral norm) and these problems are solved to additive error, we show that nonnegative least squares problems with nonnegative data are solvable to multiplicative error and with complexity that is independent of any matrix constants.
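
For intuition about the nonnegative-data regime, a classical baseline (not the paper's scale-invariant algorithm) is the multiplicative update for nonnegative least squares, which keeps the iterate nonnegative by construction and only touches the nonnegative quantities $A^\top b$ and $A^\top A x$; the toy data below is an illustrative assumption.

    import numpy as np

    def nnls_multiplicative(A, b, iters=2000, eps=1e-12):
        """Classical multiplicative update for min_{x >= 0} ||A x - b||^2 with
        entrywise nonnegative A and b: x <- x * (A^T b) / (A^T A x).
        The objective is non-increasing along these updates."""
        x = np.ones(A.shape[1])
        Atb = A.T @ b
        for _ in range(iters):
            x = x * Atb / (A.T @ (A @ x) + eps)  # eps guards against division by zero
        return x

    rng = np.random.default_rng(4)
    A = rng.uniform(size=(200, 20))   # nonnegative data matrix
    b = A @ rng.uniform(size=20)      # consistent nonnegative right-hand side
    print(np.linalg.norm(A @ nnls_multiplicative(A, b) - b))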

Coordinate Linear Variance Reduction for Generalized Linear Programming

1 code implementation • 2 Nov 2021 • Chaobing Song, Cheuk Yin Lin, Stephen J. Wright, Jelena Diakonikolas

CLVR yields improved complexity results for (GLP) that depend on the max row norm of the linear constraint matrix in (GLP) rather than the spectral norm.

Variance Reduction via Primal-Dual Accelerated Dual Averaging for Nonsmooth Convex Finite-Sums

no code implementations • 26 Feb 2021 • Chaobing Song, Stephen J. Wright, Jelena Diakonikolas

We study structured nonsmooth convex finite-sum optimization that appears widely in machine learning applications, including support vector machines and least absolute deviation.

Cyclic Coordinate Dual Averaging with Extrapolation

no code implementations • 26 Feb 2021 • Chaobing Song, Jelena Diakonikolas

This class includes composite convex optimization problems and convex-concave min-max optimization problems as special cases and has not been addressed by existing work.

Parameter-free Locally Accelerated Conditional Gradients

no code implementations • 12 Feb 2021 • Alejandro Carderera, Jelena Diakonikolas, Cheuk Yin Lin, Sebastian Pokutta

Projection-free conditional gradient (CG) methods are the algorithms of choice for constrained optimization setups in which projections are often computationally prohibitive but linear optimization over the constraint set remains computationally feasible.

Potential Function-based Framework for Making the Gradients Small in Convex and Min-Max Optimization

no code implementations • 28 Jan 2021 • Jelena Diakonikolas, Puqian Wang

We introduce a novel potential function-based framework to study the convergence of standard methods for making the gradients small in smooth convex optimization and convex-concave min-max optimization.

Complementary Composite Minimization, Small Gradients in General Norms, and Applications

no code implementations • 26 Jan 2021 • Jelena Diakonikolas, Cristóbal Guzmán

We introduce a new algorithmic framework for complementary composite minimization, where the objective function decouples into a (weakly) smooth and a uniformly convex term.

regression

Efficient Methods for Structured Nonconvex-Nonconcave Min-Max Optimization

no code implementations • 31 Oct 2020 • Jelena Diakonikolas, Constantinos Daskalakis, Michael I. Jordan

The use of min-max optimization in adversarial training of deep neural network classifiers and training of generative adversarial networks has motivated the study of nonconvex-nonconcave optimization objectives, which frequently arise in these applications.

Halpern Iteration for Near-Optimal and Parameter-Free Monotone Inclusion and Strong Solutions to Variational Inequalities

no code implementations • 20 Feb 2020 • Jelena Diakonikolas

We leverage the connections between nonexpansive maps, monotone Lipschitz operators, and proximal mappings to obtain near-optimal (i.e., optimal up to poly-log factors in terms of iteration complexity) and parameter-free methods for solving monotone inclusion problems.
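
The classical Halpern iteration underlying this line of work anchors each step back toward the initial point: $x_{k+1} = \lambda_{k+1} x_0 + (1 - \lambda_{k+1}) T(x_k)$ for a nonexpansive map $T$, with a vanishing anchoring weight such as $\lambda_k = 1/(k+2)$. Below is a minimal sketch of that base scheme on a toy problem (not the paper's near-optimal, parameter-free method); the toy operator and iteration count are illustrative choices.

    import numpy as np

    def halpern(T, x0, n_iters):
        """Halpern iteration for a nonexpansive map T: each step mixes the
        operator output with the fixed anchor x0, with weight 1 / (k + 2)."""
        x0 = np.array(x0, dtype=float)
        x = x0.copy()
        for k in range(n_iters):
            lam = 1.0 / (k + 2)
            x = lam * x0 + (1.0 - lam) * T(x)
        return x

    # Toy nonexpansive map: a gradient step for a convex quadratic,
    # T(x) = x - eta * (Q x - c) with eta <= 1 / ||Q||, whose fixed point solves Q x = c.
    rng = np.random.default_rng(5)
    M = rng.normal(size=(6, 6))
    Q, c = M @ M.T + np.eye(6), rng.normal(size=6)
    eta = 1.0 / np.linalg.eigvalsh(Q).max()
    x_hat = halpern(lambda x: x - eta * (Q @ x - c), np.zeros(6), 5000)
    print(np.linalg.norm(Q @ x_hat - c))  # fixed-point residual, shrinking at a ~1/k rate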

Conjugate Gradients and Accelerated Methods Unified: The Approximate Duality Gap View

no code implementations • 29 Jun 2019 • Jelena Diakonikolas, Lorenzo Orecchia

This note provides a novel, simple analysis of the method of conjugate gradients for the minimization of convex quadratic functions.
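
For reference, the method analyzed in the note is the textbook conjugate gradient iteration for minimizing $\frac{1}{2} x^\top A x - b^\top x$ with symmetric positive definite $A$; a minimal sketch follows (the toy matrix and tolerance are illustrative choices).

    import numpy as np

    def conjugate_gradients(A, b, tol=1e-10, max_iters=None):
        """Textbook conjugate gradient method for A x = b with A symmetric
        positive definite, i.e., for minimizing 0.5 * x @ A @ x - b @ x."""
        n = len(b)
        x = np.zeros(n)
        r = b - A @ x          # residual = negative gradient
        p = r.copy()           # initial search direction
        rs_old = r @ r
        for _ in range(max_iters or n):
            Ap = A @ p
            alpha = rs_old / (p @ Ap)       # exact line search along p
            x = x + alpha * p
            r = r - alpha * Ap
            rs_new = r @ r
            if np.sqrt(rs_new) < tol:
                break
            p = r + (rs_new / rs_old) * p   # next A-conjugate direction
            rs_old = rs_new
        return x

    rng = np.random.default_rng(6)
    M = rng.normal(size=(30, 30))
    A, b = M @ M.T + np.eye(30), rng.normal(size=30)
    print(np.linalg.norm(A @ conjugate_gradients(A, b) - b))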

Locally Accelerated Conditional Gradients

no code implementations • 19 Jun 2019 • Jelena Diakonikolas, Alejandro Carderera, Sebastian Pokutta

As such, conditional gradient methods are frequently used in solving smooth convex optimization problems over polytopes, for which the computational cost of orthogonal projections would be prohibitive.

Generalized Momentum-Based Methods: A Hamiltonian Perspective

no code implementations • 2 Jun 2019 • Jelena Diakonikolas, Michael I. Jordan

We take a Hamiltonian-based perspective to generalize Nesterov's accelerated gradient descent and Polyak's heavy ball method to a broad class of momentum methods in the setting of (possibly) constrained minimization in Euclidean and non-Euclidean normed vector spaces.
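
Concretely, the two methods being generalized are Polyak's heavy-ball update, $x_{k+1} = x_k - \eta \nabla f(x_k) + \beta (x_k - x_{k-1})$, and Nesterov's accelerated gradient method, which evaluates the gradient at an extrapolated point. A minimal Euclidean sketch of both classical schemes is below (not the Hamiltonian framework itself); the toy quadratic, momentum parameter, and step sizes are illustrative choices.

    import numpy as np

    def heavy_ball(grad, x0, eta, beta, iters):
        """Polyak's heavy ball: gradient step plus momentum from the previous move."""
        x_prev = np.array(x0, dtype=float)
        x = x_prev.copy()
        for _ in range(iters):
            x_next = x - eta * grad(x) + beta * (x - x_prev)
            x_prev, x = x, x_next
        return x

    def nesterov(grad, x0, eta, iters):
        """Nesterov's accelerated gradient descent for smooth convex f:
        the gradient is evaluated at an extrapolated point y."""
        x = np.array(x0, dtype=float)
        y, t = x.copy(), 1.0
        for _ in range(iters):
            x_next = y - eta * grad(y)
            t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
            y = x_next + ((t - 1.0) / t_next) * (x_next - x)
            x, t = x_next, t_next
        return x

    # Toy strongly convex quadratic to exercise both methods.
    rng = np.random.default_rng(7)
    M = rng.normal(size=(10, 10))
    Q, c = M @ M.T + np.eye(10), rng.normal(size=10)
    grad = lambda x: Q @ x - c
    eta = 1.0 / np.linalg.eigvalsh(Q).max()
    x_star = np.linalg.solve(Q, c)
    print(np.linalg.norm(heavy_ball(grad, np.zeros(10), eta, 0.9, 3000) - x_star))
    print(np.linalg.norm(nesterov(grad, np.zeros(10), eta, 3000) - x_star))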

Langevin Monte Carlo without smoothness

no code implementations • 30 May 2019 • Niladri S. Chatterji, Jelena Diakonikolas, Michael I. Jordan, Peter L. Bartlett

Langevin Monte Carlo (LMC) is an iterative algorithm used to generate samples from a distribution that is known only up to a normalizing constant.
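
Concretely, for a target density proportional to $e^{-U(x)}$, the (unadjusted) LMC update takes a gradient step on the potential plus Gaussian noise: $x_{k+1} = x_k - \eta \nabla U(x_k) + \sqrt{2\eta}\,\xi_k$ with $\xi_k \sim \mathcal{N}(0, I)$. The sketch below covers the standard smooth-potential case (the paper's point is precisely to dispense with smoothness); the toy Gaussian target and step size are illustrative choices.

    import numpy as np

    def langevin_monte_carlo(grad_U, x0, step, n_iters, rng):
        """Unadjusted Langevin algorithm: gradient step on the potential U
        plus Gaussian noise scaled by sqrt(2 * step). Returns all iterates."""
        x = np.array(x0, dtype=float)
        samples = []
        for _ in range(n_iters):
            x = x - step * grad_U(x) + np.sqrt(2.0 * step) * rng.normal(size=x.shape)
            samples.append(x.copy())
        return np.array(samples)

    # Toy target: standard Gaussian, U(x) = 0.5 * ||x||^2, so grad U(x) = x.
    rng = np.random.default_rng(8)
    samples = langevin_monte_carlo(lambda x: x, np.zeros(2), step=0.01, n_iters=20000, rng=rng)
    print(samples[5000:].mean(axis=0))  # roughly 0
    print(samples[5000:].var(axis=0))   # roughly 1, up to O(step) discretization bias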

Lower Bounds for Parallel and Randomized Convex Optimization

no code implementations • 5 Nov 2018 • Jelena Diakonikolas, Cristóbal Guzmán

We study the question of whether parallelization in the exploration of the feasible set can be used to speed up convex optimization, in the local oracle model of computation.

Alternating Randomized Block Coordinate Descent

no code implementations • ICML 2018 • Jelena Diakonikolas, Lorenzo Orecchia

While various block-coordinate-descent-type methods have been studied extensively, only alternating minimization – which applies to the setting of only two blocks – is known to have convergence time that scales independently of the least smooth block.
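
For contrast, here is a minimal sketch of the classical two-block alternating minimization baseline referenced above, where each block subproblem is solved exactly (here by a least-squares solve); this is only the baseline, not the randomized block coordinate method proposed in the paper, and the toy problem is an illustrative choice.

    import numpy as np

    def alternating_minimization(A, B, c, iters=200):
        """Two-block alternating minimization for min_{x, y} ||A x + B y - c||^2:
        each step exactly minimizes over one block with the other held fixed."""
        x = np.zeros(A.shape[1])
        y = np.zeros(B.shape[1])
        for _ in range(iters):
            x = np.linalg.lstsq(A, c - B @ y, rcond=None)[0]  # exact minimization over x
            y = np.linalg.lstsq(B, c - A @ x, rcond=None)[0]  # exact minimization over y
        return x, y

    rng = np.random.default_rng(9)
    A, B = rng.normal(size=(50, 5)), rng.normal(size=(50, 5))
    c = rng.normal(size=50)
    x, y = alternating_minimization(A, B, c)
    print(np.linalg.norm(A @ x + B @ y - c))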
