no code implementations • NeurIPS 2023 • Shuyao Li, Yu Cheng, Ilias Diakonikolas, Jelena Diakonikolas, Rong Ge, Stephen J. Wright
We introduce a general framework for efficiently finding an approximate SOSP with \emph{dimension-independent} accuracy guarantees, using $\widetilde{O}({D^2}/{\epsilon})$ samples where $D$ is the ambient dimension and $\epsilon$ is the fraction of corrupted datapoints.
no code implementations • 17 Feb 2024 • Andrew Lowy, Jonathan Ullman, Stephen J. Wright
We use this framework to obtain improved, and sometimes optimal, rates for several classes of non-convex loss functions.
no code implementations • 7 Feb 2024 • Ahmet Alacaoglu, Donghwan Kim, Stephen J. Wright
With a simple argument, we obtain optimal or best-known complexity guarantees with cohypomonotonicity or weak MVI conditions for $\rho < \frac{1}{L}$.
no code implementations • 1 Nov 2023 • Ahmet Alacaoglu, Stephen J. Wright
To find a point that satisfies $\varepsilon$-approximate first-order conditions, we require $\widetilde{O}(\varepsilon^{-3})$ complexity in the first case, $\widetilde{O}(\varepsilon^{-4})$ in the second case, and $\widetilde{O}(\varepsilon^{-5})$ in the third case.
no code implementations • 28 Oct 2023 • Shuyao Li, Stephen J. Wright
We consider minimization of a smooth nonconvex function with inexact oracle access to gradient and Hessian (without assuming access to the function value) to achieve approximate second-order optimality.
no code implementations • 6 Oct 2023 • Shi Chen, Qin Li, Oliver Tse, Stephen J. Wright
Most research has focused on optimization over Euclidean spaces, but since many machine learning problems require optimizing over spaces of probability measures, it is of interest to investigate accelerated gradient methods in that setting as well.
no code implementations • 3 Jun 2023 • Yewei Xu, Shi Chen, Qin Li, Stephen J. Wright
Does the use of auto-differentiation yield reasonable updates to deep neural networks that represent neural ODEs?
no code implementations • 9 Feb 2023 • Changyu Gao, Stephen J. Wright
We develop simple differentially private optimization algorithms that move along directions of (expected) descent to find an approximate second-order solution for nonconvex ERM.
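The "directions of (expected) descent" idea can be illustrated with a generic gradient-perturbation routine: clip each gradient and privatize it with Gaussian noise before stepping. This is a minimal sketch of that mechanism only; the clipping bound, noise scale, and toy objective below are illustrative assumptions, not the paper's exact algorithm or privacy accounting.

```python
import numpy as np

rng = np.random.default_rng(1)

def dp_gd(grad, x0, lr, sigma, clip, steps):
    """Gradient descent with clipped, Gaussian-perturbed gradients (sketch)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = grad(x)
        g = g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))  # clip to norm <= clip
        g = g + sigma * rng.standard_normal(g.shape)          # privatize via Gaussian noise
        x = x - lr * g
    return x

# Toy strongly convex objective f(x) = ||x - 1||^2 / 2, with grad f(x) = x - 1.
x = dp_gd(lambda x: x - 1.0, np.zeros(3), lr=0.1, sigma=0.05, clip=5.0, steps=500)
```

With modest noise the iterates settle in a small neighborhood of the minimizer rather than converging exactly, which is the usual privacy/accuracy trade-off.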
no code implementations • 9 Dec 2022 • Xufeng Cai, Chaobing Song, Stephen J. Wright, Jelena Diakonikolas
Our convergence analysis is based on a gradient Lipschitz condition with respect to a Mahalanobis norm, inspired by recent progress on cyclic block coordinate methods.
no code implementations • 19 Jan 2022 • Ahmet Alacaoglu, Volkan Cevher, Stephen J. Wright
We prove complexity bounds for the primal-dual algorithm with random extrapolation and coordinate descent (PURE-CD), which has been shown to obtain good practical performance for solving convex-concave min-max problems with bilinear coupling.
1 code implementation • 2 Nov 2021 • Chaobing Song, Cheuk Yin Lin, Stephen J. Wright, Jelena Diakonikolas
\textsc{clvr} yields improved complexity results for (GLP) that depend on the max row norm of the linear constraint matrix in (GLP) rather than the spectral norm.
no code implementations • 19 Apr 2021 • Aydin Buluc, Tamara G. Kolda, Stefan M. Wild, Mihai Anitescu, Anthony DeGennaro, John Jakeman, Chandrika Kamath, Ramakrishnan Kannan, Miles E. Lopes, Per-Gunnar Martinsson, Kary Myers, Jelani Nelson, Juan M. Restrepo, C. Seshadhri, Draguna Vrabie, Brendt Wohlberg, Stephen J. Wright, Chao Yang, Peter Zwart
Randomized algorithms have propelled advances in artificial intelligence and represent a foundational research area in advancing AI for Science.
no code implementations • 26 Feb 2021 • Chaobing Song, Stephen J. Wright, Jelena Diakonikolas
We study structured nonsmooth convex finite-sum optimization that appears widely in machine learning applications, including support vector machines and least absolute deviation.
no code implementations • 22 Oct 2020 • Zhiyan Ding, Qin Li, Jianfeng Lu, Stephen J. Wright
We investigate the computational complexity of RC-ULMC and compare it with the classical ULMC for strongly log-concave probability distributions.
no code implementations • 3 Oct 2020 • Zhiyan Ding, Qin Li, Jianfeng Lu, Stephen J. Wright
We investigate the total complexity of RC-LMC and compare it with the classical LMC for log-concave probability distributions.
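For context, the classical LMC iteration that the randomized-coordinate variant modifies is $x_{k+1} = x_k - h \nabla f(x_k) + \sqrt{2h}\,\xi_k$ with $\xi_k \sim \mathcal{N}(0, I)$, targeting $\pi \propto e^{-f}$; RC-LMC replaces the full gradient with a single randomly chosen partial derivative. Below is a minimal sketch of the classical update targeting a standard Gaussian; the step size and target are illustrative choices, not the paper's setup.

```python
import numpy as np

# Unadjusted Langevin Monte Carlo targeting pi ∝ exp(-f) with
# f(x) = ||x||^2 / 2, so grad f(x) = x and pi is the standard Gaussian.
rng = np.random.default_rng(2)
h, d, steps, burn = 0.05, 2, 20000, 1000

x = np.zeros(d)
samples = []
for k in range(steps):
    x = x - h * x + np.sqrt(2 * h) * rng.standard_normal(d)
    if k >= burn:
        samples.append(x.copy())

S = np.array(samples)
mean, var = S.mean(axis=0), S.var(axis=0)  # should be near 0 and near 1
```

Note the discretization introduces a small bias (here the stationary variance is $1/(1 - h/2)$, slightly above 1), which is why complexity analyses of LMC-type methods track the step size carefully.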
no code implementations • 28 May 2020 • Nam Ho-Nguyen, Stephen J. Wright
Inspired by this observation, we show that, for a certain class of distributions, the only stationary point of the regularized ramp loss minimization problem is the global minimizer.
no code implementations • 18 Dec 2019 • Soroosh Khoram, Stephen J. Wright, Jing Li
A method often used to reduce this computational cost is quantization of the vector space and location-based encoding of the dataset vectors.
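The general quantize-and-encode idea can be sketched generically: build a small codebook over the vector space (here via a few Lloyd/k-means iterations) and store each dataset vector as the id of its nearest centroid, so that approximate distance computations reduce to cheap code lookups. This is a generic illustration of quantization-based encoding, not the paper's specific scheme.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((500, 8))   # toy dataset of 500 vectors in R^8
k = 16                               # codebook size (fits in one byte per vector)

# Initialize centroids from data, then run a few Lloyd iterations.
C = X[rng.choice(len(X), k, replace=False)].copy()
for _ in range(10):
    codes = np.argmin(((X[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
    for j in range(k):
        if np.any(codes == j):
            C[j] = X[codes == j].mean(axis=0)

# Final encoding: each vector is stored as the id of its nearest centroid.
codes = np.argmin(((X[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
```

A query then needs only the `k` query-to-centroid distances to score the whole encoded dataset, at the cost of the quantization error `X - C[codes]`.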
1 code implementation • 12 Dec 2019 • Ching-pei Lee, Cong Han Lim, Stephen J. Wright
When applied to the distributed dual ERM problem, unlike state-of-the-art methods, which use only the block-diagonal part of the Hessian, our approach is able to utilize global curvature information and is thus orders of magnitude faster.
1 code implementation • 4 Mar 2018 • Ching-pei Lee, Cong Han Lim, Stephen J. Wright
Initial computational results on convex problems demonstrate that our method significantly improves on communication cost and running time over the current state-of-the-art methods.
no code implementations • 24 Jan 2018 • Xuezhou Zhang, Xiaojin Zhu, Stephen J. Wright
The set of trusted items may not by itself be adequate for learning, so we propose an algorithm that uses these items to identify bugs in the training set and thus improve learning.
no code implementations • 26 Sep 2013 • Ryan Kennedy, Laura Balzano, Stephen J. Wright, Camillo J. Taylor
We present a family of online algorithms for real-time factorization-based structure from motion, leveraging a relationship between incremental singular value decomposition and recently proposed methods for online matrix completion.
no code implementations • 21 Jul 2013 • Laura Balzano, Stephen J. Wright
GROUSE (Grassmannian Rank-One Update Subspace Estimation) is an incremental algorithm for identifying a subspace of $\mathbb{R}^n$ from a sequence of vectors in this subspace, where only a subset of components of each vector is revealed at each iteration.
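The rank-one update can be sketched as follows: fit weights on the observed rows of the current basis, form the residual on the observed entries, and rotate the basis along the resulting geodesic direction. The geodesic form below follows the standard GROUSE update; the fixed step size and the toy recovery run are illustrative assumptions, not the paper's step-size schedule.

```python
import numpy as np

def grouse_step(U, omega, v_omega, t):
    """One GROUSE-style rank-one update of an orthonormal basis U (sketch)."""
    w, *_ = np.linalg.lstsq(U[omega], v_omega, rcond=None)  # weights on observed rows
    p = U @ w                                  # predicted full vector (in span U)
    r = np.zeros(U.shape[0])
    r[omega] = v_omega - U[omega] @ w          # residual, orthogonal to span U
    pn, rn, wn = np.linalg.norm(p), np.linalg.norm(r), np.linalg.norm(w)
    if rn < 1e-12 or wn < 1e-12:
        return U
    theta = t * rn * pn                        # rotation angle for step size t
    return U + np.outer((np.cos(theta) - 1.0) * p / pn
                        + np.sin(theta) * r / rn, w / wn)

# Toy run: recover a 3-dim subspace of R^20 from vectors with 15 of 20
# entries revealed per iteration.
rng = np.random.default_rng(3)
n, d, m = 20, 3, 15
U_true, _ = np.linalg.qr(rng.standard_normal((n, d)))
U, _ = np.linalg.qr(rng.standard_normal((n, d)))
err0 = np.linalg.norm(U_true - U @ (U.T @ U_true))
for _ in range(2000):
    v = U_true @ rng.standard_normal(d)
    omega = rng.choice(n, size=m, replace=False)
    U = grouse_step(U, omega, v[omega], t=0.1)
err1 = np.linalg.norm(U_true - U @ (U.T @ U_true))
```

Because `p` lies in the current subspace and `r` is orthogonal to it, the update is an exact geodesic rotation and preserves the orthonormality of `U` at every step.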
no code implementations • 3 Jul 2012 • Ji Liu, Stephen J. Wright
We consider the reconstruction problem in compressed sensing in which the observations are recorded in a finite number of bits.
5 code implementations • 28 Jun 2011 • Feng Niu, Benjamin Recht, Christopher Re, Stephen J. Wright
Stochastic Gradient Descent (SGD) is a popular algorithm that can achieve state-of-the-art performance on a variety of machine learning tasks.
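The update at the heart of the method can be sketched with a minimal serial SGD loop on a synthetic least-squares finite sum (the paper's contribution, Hogwild!, runs such updates lock-free in parallel; the problem and step size below are illustrative assumptions):

```python
import numpy as np

# Synthetic least-squares finite sum f(x) = (1/n) * sum_i (a_i^T x - b_i)^2,
# constructed so that b = A @ x_true exactly (noiseless).
rng = np.random.default_rng(0)
n, d = 200, 5
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true

# Plain serial SGD: sample one summand, step against its gradient.
x = np.zeros(d)
lr = 0.005
for _ in range(20000):
    i = rng.integers(n)
    g = 2.0 * (A[i] @ x - b[i]) * A[i]  # gradient of the i-th summand
    x -= lr * g
```

In this noiseless (interpolation) regime, a small constant step size suffices for `x` to approach `x_true`; each update touches only one row of `A`, which is the sparsity Hogwild! exploits to drop locking entirely.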