no code implementations • 24 Mar 2024 • Jelena Diakonikolas, Cristóbal Guzmán
The resulting class of objective functions encapsulates the classes of objective functions traditionally studied in optimization, which are defined based on either Lipschitz continuity of the objective or Hölder/Lipschitz continuity of its gradient.
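For reference, these standard conditions read as follows (the generic constants $M$, $L$ and exponent $\nu$ are my notation, not necessarily the paper's):

    \[ |f(x) - f(y)| \le M \|x - y\| \quad \text{(Lipschitz objective)}, \]
    \[ \|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|^{\nu}, \quad \nu \in (0, 1] \quad \text{(Hölder gradient; Lipschitz when } \nu = 1\text{)}. \]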
no code implementations • 16 Mar 2024 • Ronak Mehta, Jelena Diakonikolas, Zaid Harchaoui
We consider the penalized distributionally robust optimization (DRO) problem with a closed, convex uncertainty set, a setting that encompasses the $f$-DRO, Wasserstein-DRO, and spectral/$L$-risk formulations used in practice.
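Schematically, penalized DRO replaces a hard constraint on the ambiguity set with a divergence penalty; a generic form of the problem (the loss $\ell$, divergence $D$, penalty weight $\nu$, and reference distribution $P$ are placeholder notation, not the paper's):

    \[ \min_{\theta} \; \sup_{Q} \; \Big\{ \mathbb{E}_{Z \sim Q}\big[\ell(\theta; Z)\big] - \nu \, D(Q \,\|\, P) \Big\}. \]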
no code implementations • NeurIPS 2023 • Shuyao Li, Yu Cheng, Ilias Diakonikolas, Jelena Diakonikolas, Rong Ge, Stephen J. Wright
We introduce a general framework for efficiently finding an approximate SOSP with dimension-independent accuracy guarantees, using $\widetilde{O}(D^2/\epsilon)$ samples where $D$ is the ambient dimension and $\epsilon$ is the fraction of corrupted datapoints.
no code implementations • 11 Mar 2024 • Xufeng Cai, Jelena Diakonikolas
We further obtain generalizations of our results to weighted averaging of the iterates with increasing weights, which can be seen as interpolating between the last iterate and the average iterate guarantees.
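A minimal sketch of the averaging scheme in question, with polynomially increasing weights on plain gradient steps (the exponent p and the gradient update are illustrative, not the paper's exact method): p = 0 recovers the uniform average iterate, while large p concentrates weight on late iterates, approaching the last iterate.

    import numpy as np

    def weighted_average_sgd(grad, x0, steps, lr, p=2):
        """Run plain gradient steps and return the p-weighted average of iterates.

        p = 0 gives the uniform average; larger p puts more weight on later
        iterates, interpolating toward the last iterate.
        """
        x = np.asarray(x0, dtype=float)
        avg, wsum = np.zeros_like(x), 0.0
        for t in range(1, steps + 1):
            x = x - lr * grad(x)
            w = t ** p                      # increasing weights w_t = t^p
            wsum += w
            avg += (w / wsum) * (x - avg)   # incremental weighted mean
        return avg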
no code implementations • 27 Feb 2024 • Nikos Zarifis, Puqian Wang, Ilias Diakonikolas, Jelena Diakonikolas
We give an efficient learning algorithm, achieving a constant factor approximation to the optimal loss, that succeeds under a range of distributions (including log-concave distributions) and a broad class of monotone and Lipschitz link functions.
no code implementations • 4 Oct 2023 • Xufeng Cai, Ahmet Alacaoglu, Jelena Diakonikolas
Our main contributions are variants of the classical Halpern iteration that employ variance reduction to obtain improved complexity guarantees in settings where the $n$ component operators in the finite sum are "on average" either cocoercive or Lipschitz continuous and monotone, with parameter $L$.
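For context, the classical Halpern iteration anchors every step to the starting point; a minimal sketch for a nonexpansive map T, assuming the standard step sizes $\lambda_k = 1/(k+2)$ (the variance-reduced variants are the paper's contribution and are not reproduced here):

    import numpy as np

    def halpern(T, x0, iters):
        """Classical Halpern iteration: x_{k+1} = lam_k * x0 + (1 - lam_k) * T(x_k)."""
        x0 = np.asarray(x0, dtype=float)
        x = x0.copy()
        for k in range(iters):
            lam = 1.0 / (k + 2)              # anchoring step size
            x = lam * x0 + (1.0 - lam) * T(x)
        return x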
no code implementations • 28 Jun 2023 • Ilias Diakonikolas, Jelena Diakonikolas, Daniel M. Kane, Puqian Wang, Nikos Zarifis
Our main result is a lower bound for Statistical Query (SQ) algorithms and low-degree polynomial tests suggesting that the quadratic dependence on $1/\epsilon$ in the sample complexity is inherent for computationally efficient algorithms.
no code implementations • 21 Jun 2023 • Xufeng Cai, Cheuk Yin Lin, Jelena Diakonikolas
Contrary to the empirical practice of sampling from the dataset without replacement and with (possible) reshuffling at each epoch, the theoretical counterpart of SGD usually relies on the assumption of sampling with replacement.
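The two sampling schemes, side by side, in a minimal sketch (grad_i(x, i) is a hypothetical per-example gradient oracle):

    import numpy as np

    rng = np.random.default_rng(0)

    def sgd_with_replacement(grad_i, x, n, epochs, lr):
        for _ in range(epochs * n):
            i = rng.integers(n)              # i.i.d. index, may repeat
            x = x - lr * grad_i(x, i)
        return x

    def sgd_random_reshuffling(grad_i, x, n, epochs, lr):
        for _ in range(epochs):
            for i in rng.permutation(n):     # each example exactly once per epoch
                x = x - lr * grad_i(x, i)
        return x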
no code implementations • 13 Jun 2023 • Puqian Wang, Nikos Zarifis, Ilias Diakonikolas, Jelena Diakonikolas
We study the problem of learning a single neuron with respect to the $L_2^2$-loss in the presence of adversarial label noise.
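Concretely, the objective has the form below, where $\sigma$ is the activation (link function) and the labels $y$ may be adversarially corrupted (generic notation, not necessarily the paper's):

    \[ \min_{\mathbf{w}} \; \mathbb{E}_{(\mathbf{x}, y) \sim \mathcal{D}} \Big[ \big( \sigma(\langle \mathbf{w}, \mathbf{x} \rangle) - y \big)^2 \Big]. \]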
no code implementations • 28 Mar 2023 • Cheuk Yin Lin, Chaobing Song, Jelena Diakonikolas
Exploiting partial first-order information in a cyclic way is arguably the most natural strategy to obtain scalable first-order methods.
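A minimal sketch of the cyclic strategy in its simplest form, cyclic coordinate gradient descent (the fixed step size lr is a placeholder; the paper's methods are more refined):

    import numpy as np

    def cyclic_cd(partial_grad, x, sweeps, lr):
        """Cyclic coordinate descent: update one coordinate at a time, in order.

        partial_grad(x, j) returns the j-th partial derivative of the objective.
        """
        x = np.asarray(x, dtype=float).copy()
        for _ in range(sweeps):
            for j in range(x.size):          # one full cyclic sweep
                x[j] -= lr * partial_grad(x, j)
        return x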
no code implementations • 9 Dec 2022 • Xufeng Cai, Chaobing Song, Stephen J. Wright, Jelena Diakonikolas
Our convergence analysis is based on a gradient Lipschitz condition with respect to a Mahalanobis norm, inspired by recent progress on cyclic block coordinate methods.
1 code implementation • 17 Mar 2022 • Xufeng Cai, Chaobing Song, Cristóbal Guzmán, Jelena Diakonikolas
We study stochastic monotone inclusion problems, which widely appear in machine learning applications, including robust regression and adversarial learning.
no code implementations • 8 Mar 2022 • Jelena Diakonikolas, Chenghui Li, Swati Padmanabhan, Chaobing Song
In particular, while the oracle complexity of unconstrained least squares problems necessarily scales with one of the data matrix constants (typically the spectral norm) and these problems are solved to additive error, we show that nonnegative least squares problems with nonnegative data are solvable to multiplicative error and with complexity that is independent of any matrix constants.
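For clarity, the two error models, with $f^\star = \min f$ (generic notation): the additive guarantee is meaningful for any objective, while the multiplicative one exploits $f^\star \ge 0$ in nonnegative least squares.

    \[ f(x) - f^\star \le \epsilon \quad \text{(additive)}, \qquad f(x) \le (1 + \epsilon)\, f^\star \quad \text{(multiplicative)}. \]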
1 code implementation • 2 Nov 2021 • Chaobing Song, Cheuk Yin Lin, Stephen J. Wright, Jelena Diakonikolas
CLVR yields improved complexity results for (GLP) that depend on the max row norm of the linear constraint matrix in (GLP) rather than the spectral norm.
no code implementations • 26 Feb 2021 • Chaobing Song, Stephen J. Wright, Jelena Diakonikolas
We study structured nonsmooth convex finite-sum optimization that appears widely in machine learning applications, including support vector machines and least absolute deviation.
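Two representative instances of this problem class, in their standard formulations (generic notation):

    \[ \text{least absolute deviation:} \quad \min_{x} \frac{1}{n} \sum_{i=1}^{n} \big| a_i^\top x - b_i \big|, \]
    \[ \text{SVM (hinge loss):} \quad \min_{x} \frac{1}{n} \sum_{i=1}^{n} \max\big(0,\, 1 - b_i\, a_i^\top x\big) + \frac{\lambda}{2} \|x\|^2. \]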
no code implementations • 26 Feb 2021 • Chaobing Song, Jelena Diakonikolas
This class includes composite convex optimization problems and convex-concave min-max optimization problems as special cases and has not been addressed by the existing work.
no code implementations • 12 Feb 2021 • Alejandro Carderera, Jelena Diakonikolas, Cheuk Yin Lin, Sebastian Pokutta
Projection-free conditional gradient (CG) methods are the algorithms of choice for constrained optimization setups in which projections are often computationally prohibitive but linear optimization over the constraint set remains computationally feasible.
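A minimal sketch of the basic conditional gradient (Frank-Wolfe) step, which calls a linear minimization oracle instead of projecting (lmo is a hypothetical oracle returning $\arg\min_{s \in C} \langle g, s \rangle$ over the constraint set $C$):

    import numpy as np

    def frank_wolfe(grad, lmo, x0, iters):
        """Basic conditional gradient: move toward the linear minimizer of the
        gradient over the constraint set, with the standard 2/(k+2) step size."""
        x = np.asarray(x0, dtype=float)
        for k in range(iters):
            s = lmo(grad(x))                 # linear minimization oracle call
            gamma = 2.0 / (k + 2)            # standard open-loop step size
            x = (1 - gamma) * x + gamma * s  # convex combination stays feasible
        return x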
no code implementations • 28 Jan 2021 • Jelena Diakonikolas, Puqian Wang
We introduce a novel potential function-based framework to study the convergence of standard methods for making the gradients small in smooth convex optimization and convex-concave min-max optimization.
no code implementations • 26 Jan 2021 • Jelena Diakonikolas, Cristóbal Guzmán
We introduce a new algorithmic framework for complementary composite minimization, where the objective function decouples into a (weakly) smooth and a uniformly convex term.
no code implementations • 31 Oct 2020 • Jelena Diakonikolas, Constantinos Daskalakis, Michael I. Jordan
The use of min-max optimization in adversarial training of deep neural network classifiers and training of generative adversarial networks has motivated the study of nonconvex-nonconcave optimization objectives, which frequently arise in these applications.
no code implementations • 20 Feb 2020 • Jelena Diakonikolas
We leverage the connections between nonexpansive maps, monotone Lipschitz operators, and proximal mappings to obtain near-optimal (i.e., optimal up to poly-log factors in terms of iteration complexity) and parameter-free methods for solving monotone inclusion problems.
no code implementations • 29 Jun 2019 • Jelena Diakonikolas, Lorenzo Orecchia
This note provides a novel, simple analysis of the method of conjugate gradients for the minimization of convex quadratic functions.
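The method itself, for $f(x) = \frac{1}{2} x^\top A x - b^\top x$ with $A$ symmetric positive definite; a standard textbook implementation, not the note's analysis:

    import numpy as np

    def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=None):
        n = b.shape[0]
        x = np.zeros(n) if x0 is None else x0.copy()
        r = b - A @ x          # residual = negative gradient of f at x
        p = r.copy()           # initial search direction
        rs_old = r @ r
        for _ in range(max_iter or n):
            Ap = A @ p
            alpha = rs_old / (p @ Ap)        # exact line search along p
            x += alpha * p
            r -= alpha * Ap
            rs_new = r @ r
            if np.sqrt(rs_new) < tol:
                break
            p = r + (rs_new / rs_old) * p    # new A-conjugate direction
            rs_old = rs_new
        return x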
no code implementations • 19 Jun 2019 • Jelena Diakonikolas, Alejandro Carderera, Sebastian Pokutta
As such, they are frequently used in solving smooth convex optimization problems over polytopes, for which the computational cost of orthogonal projections would be prohibitive.
no code implementations • 2 Jun 2019 • Jelena Diakonikolas, Michael I. Jordan
We take a Hamiltonian-based perspective to generalize Nesterov's accelerated gradient descent and Polyak's heavy ball method to a broad class of momentum methods in the setting of (possibly) constrained minimization in Euclidean and non-Euclidean normed vector spaces.
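The two methods being generalized, in their basic unconstrained Euclidean form (a sketch; the fixed step size lr and momentum beta are illustrative):

    import numpy as np

    def heavy_ball(grad, x0, iters, lr=0.01, beta=0.9):
        """Polyak: x_{k+1} = x_k - lr * grad(x_k) + beta * (x_k - x_{k-1})."""
        x_prev = x = np.asarray(x0, dtype=float)
        for _ in range(iters):
            x, x_prev = x - lr * grad(x) + beta * (x - x_prev), x
        return x

    def nesterov(grad, x0, iters, lr=0.01, beta=0.9):
        """Nesterov: the gradient is evaluated at the extrapolated point y_k."""
        x_prev = x = np.asarray(x0, dtype=float)
        for _ in range(iters):
            y = x + beta * (x - x_prev)      # extrapolation (look-ahead) step
            x, x_prev = y - lr * grad(y), x
        return x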
no code implementations • 30 May 2019 • Niladri S. Chatterji, Jelena Diakonikolas, Michael I. Jordan, Peter L. Bartlett
Langevin Monte Carlo (LMC) is an iterative algorithm used to generate samples from a distribution that is known only up to a normalizing constant.
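The basic (unadjusted) LMC update for a target density $\pi(x) \propto e^{-U(x)}$; a standard sketch in which grad_U and the fixed step size are placeholders:

    import numpy as np

    def langevin_monte_carlo(grad_U, x0, iters, step, rng=None):
        """Unadjusted Langevin algorithm:
        x_{k+1} = x_k - step * grad_U(x_k) + sqrt(2 * step) * N(0, I)."""
        rng = rng or np.random.default_rng()
        x = np.asarray(x0, dtype=float).copy()
        samples = []
        for _ in range(iters):
            noise = rng.standard_normal(x.shape)
            x = x - step * grad_U(x) + np.sqrt(2.0 * step) * noise
            samples.append(x.copy())
        return samples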
no code implementations • 5 Nov 2018 • Jelena Diakonikolas, Cristóbal Guzmán
We study the question of whether parallelization in the exploration of the feasible set can be used to speed up convex optimization, in the local oracle model of computation.
no code implementations • ICML 2018 • Jelena Diakonikolas, Lorenzo Orecchia
While various block-coordinate-descent-type methods have been studied extensively, only alternating minimization, which applies to the setting of only two blocks, is known to have convergence time that scales independently of the least smooth block.
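A minimal sketch of two-block alternating minimization on the toy problem $\min_{x, y} \|Ax + By - c\|^2$, where each block subproblem is an exact least-squares solve (an illustrative instance, not the paper's general setting):

    import numpy as np

    def alternating_minimization(A, B, c, iters):
        """Minimize ||A x + B y - c||^2 by exactly minimizing over one block
        while holding the other fixed, alternating between the two."""
        x = np.zeros(A.shape[1])
        y = np.zeros(B.shape[1])
        for _ in range(iters):
            x, *_ = np.linalg.lstsq(A, c - B @ y, rcond=None)  # minimize over x
            y, *_ = np.linalg.lstsq(B, c - A @ x, rcond=None)  # minimize over y
        return x, y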