no code implementations • ICML 2020 • Ashok Cutkosky
We provide a new online learning algorithm that for the first time combines several disparate notions of adaptivity.
no code implementations • 1 Jul 2024 • Hoang Tran, Qinzi Zhang, Ashok Cutkosky
There is a significant gap between our theoretical understanding of optimization algorithms used in deep learning and their practical performance.
no code implementations • 27 Jun 2024 • Qinzi Zhang, Hoang Tran, Ashok Cutkosky
We introduce a new zeroth-order algorithm for private stochastic optimization on nonconvex and nonsmooth objectives.
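The abstract does not describe the mechanism, but zeroth-order methods typically replace gradients with finite-difference estimates built from function values. A minimal, generic two-point estimator sketch (the names are mine, and the privacy noise the paper adds is omitted):

```python
import numpy as np

def two_point_grad_estimate(f, x, delta=1e-3, rng=None):
    # Generic two-point zeroth-order estimator: probe f along a random unit
    # direction u and form a finite-difference estimate of the gradient.
    # Note: an unbiased full-gradient estimate needs a dimension-dependent
    # rescaling; the paper's private mechanism is NOT included here.
    rng = np.random.default_rng() if rng is None else rng
    u = rng.standard_normal(x.shape)
    u /= np.linalg.norm(u)
    return (f(x + delta * u) - f(x - delta * u)) / (2 * delta) * u

# Usage: estimate the gradient of 0.5*||x||^2 (true gradient is x).
f = lambda x: 0.5 * float(np.sum(x ** 2))
print(two_point_grad_estimate(f, np.array([1.0, -2.0, 3.0])))
```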
no code implementations • 30 May 2024 • Ashok Cutkosky, Zakaria Mhammedi
We provide an online learning algorithm that obtains regret $G\|w_\star\|\sqrt{T\log(\|w_\star\|G\sqrt{T})} + \|w_\star\|^2 + G^2$ on $G$-Lipschitz convex losses for any comparison point $w_\star$ without knowing either $G$ or $\|w_\star\|$.
no code implementations • 29 May 2024 • Andrew Jacobsen, Ashok Cutkosky
We develop algorithms for online linear regression which achieve optimal static and dynamic regret guarantees \emph{even in the complete absence of prior knowledge}.
no code implementations • 28 May 2024 • Kwangjun Ahn, Ashok Cutkosky
In this work, we offer a theoretical analysis of two modern optimization techniques for training large and complex models: (i) adaptive optimization algorithms, such as Adam, and (ii) the model exponential moving average (EMA).
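For reference, the model EMA in (ii) is simply a running average of the weights; a minimal sketch (the decay value and the toy loop are illustrative, not from the paper):

```python
import numpy as np

def ema_update(ema, params, beta=0.999):
    # One EMA step: blend the running average toward the current weights.
    return [beta * e + (1.0 - beta) * p for e, p in zip(ema, params)]

# Usage: track an EMA of one weight vector over a few mock training steps.
w = np.zeros(3)
ema = [w.copy()]
for step in range(1, 4):
    w = w + 0.1 * step          # stand-in for an optimizer (e.g. Adam) update
    ema = ema_update(ema, [w])
print(w, ema[0])
```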
1 code implementation • 24 May 2024 • Aaron Defazio, Xingyu Alice Yang, Harsh Mehta, Konstantin Mishchenko, Ahmed Khaled, Ashok Cutkosky
Existing learning rate schedules that do not require specifying the optimization stopping step T are greatly outperformed by schedules that depend on T. We propose an approach that removes the need for this stopping time by eschewing schedules entirely, while exhibiting state-of-the-art performance relative to schedules across a wide family of problems, from convex optimization to large-scale deep learning.
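A rough sketch of the schedule-free idea as described above: replace the decay schedule with an interpolation and averaging scheme so that no stopping time T is needed. The constants and details below are illustrative assumptions; the authors' released implementation is the reference:

```python
import numpy as np

def schedule_free_sgd(grad, w0, lr=0.1, beta=0.9, steps=100):
    # Sketch: take gradient steps on an interpolation y of the fast iterate z
    # and the running average x, and return the average x.  No decay schedule
    # or stopping time T appears anywhere in the update.
    z = w0.copy()
    x = w0.copy()
    for t in range(1, steps + 1):
        y = (1.0 - beta) * z + beta * x   # point where the gradient is taken
        z = z - lr * grad(y)              # SGD step on the fast sequence
        x = x + (z - x) / t               # running average of the z iterates
    return x

# Usage: minimize the toy quadratic 0.5*||w||^2.
print(schedule_free_sgd(lambda w: w, np.array([5.0, -3.0])))
```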
no code implementations • 16 May 2024 • Qinzi Zhang, Ashok Cutkosky
Training neural networks requires optimizing a loss function that may be highly irregular, and in particular neither convex nor smooth.
1 code implementation • 11 Oct 2023 • Aaron Defazio, Ashok Cutkosky, Harsh Mehta, Konstantin Mishchenko
To go beyond this worst-case analysis, we use the observed gradient norms to derive schedules refined for any particular task.
no code implementations • 27 Sep 2023 • ZhiYu Zhang, Heng Yang, Ashok Cutkosky, Ioannis Ch. Paschalidis
Motivated by the pursuit of instance optimality, we propose a new algorithm that simultaneously achieves ($i$) AdaGrad-style second-order gradient adaptivity and ($ii$) comparator-norm adaptivity, also known as "parameter-freeness" in the literature.
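As a reminder of what ($i$) means, AdaGrad-style adaptivity scales steps by the accumulated squared gradient norms; a tiny illustrative sketch (not the paper's algorithm, which also achieves ($ii$)):

```python
import numpy as np

def adagrad_norm_stepsize(sq_grad_norms, scale=1.0, eps=1e-12):
    # Second-order gradient adaptivity in the AdaGrad sense: a step size
    # that shrinks with the accumulated squared gradient norms,
    # eta_t = scale / sqrt(sum_s ||g_s||^2).
    return scale / np.sqrt(np.sum(sq_grad_norms) + eps)

# Usage: step size after observing three gradients with squared norms 1, 4, 4.
print(adagrad_norm_stepsize([1.0, 4.0, 4.0]))   # -> 1/3
```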
no code implementations • 8 Jun 2023 • Andrew Jacobsen, Ashok Cutkosky
Algorithms for online learning typically require one or more boundedness assumptions: that the domain is bounded, that the losses are Lipschitz, or both.
no code implementations • 7 Feb 2023 • Ashok Cutkosky, Harsh Mehta, Francesco Orabona
Our primary technique is a reduction from non-smooth non-convex optimization to online learning, after which our results follow from standard regret bounds in online learning.
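A heavily simplified sketch of what such a reduction can look like: an online learner proposes updates, and it is charged linear losses built from stochastic gradients evaluated near the resulting iterates. The evaluation point, the toy learner, and the constants below are illustrative assumptions, not the paper's exact conversion:

```python
import numpy as np

def online_to_nonconvex(stoch_grad, w0, online_step, rounds=200, rng=None):
    # Simplified shape of the reduction: an online learner proposes an update
    # delta each round, the iterate moves by delta, and the learner is charged
    # the linear loss <g, delta> for a stochastic gradient g taken on the
    # segment it just traversed.
    rng = np.random.default_rng() if rng is None else rng
    w, state = w0.copy(), None
    g = stoch_grad(w)
    for _ in range(rounds):
        delta, state = online_step(g, state)       # learner's proposed update
        g = stoch_grad(w + rng.uniform() * delta)  # gradient on the segment
        w = w + delta
    return w

# Usage: plug in a trivial online learner (sign descent with a fixed step).
step = lambda g, s: (-0.02 * np.sign(g), s)
print(online_to_nonconvex(lambda w: w, np.array([2.0, -1.0]), step))
```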
1 code implementation • 24 Nov 2022 • Harsh Mehta, Walid Krichene, Abhradeep Thakurta, Alexey Kurakin, Ashok Cutkosky
We find that linear regression is much more effective than logistic regression in terms of both privacy and computation, especially at stricter epsilon values ($\epsilon < 1$).
no code implementations • 25 Oct 2022 • Jiujia Zhang, Ashok Cutkosky
We present new algorithms for online convex optimization over unbounded domains that obtain parameter-free regret in high-probability given access only to potentially heavy-tailed subgradient estimates.
no code implementations • 12 Oct 2022 • Qinzi Zhang, Hoang Tran, Ashok Cutkosky
We develop a new reduction that converts any online convex optimization algorithm suffering $O(\sqrt{T})$ regret into an $\epsilon$-differentially private stochastic convex optimization algorithm with the optimal convergence rate $\tilde O(1/\sqrt{T} + \sqrt{d}/(\epsilon T))$ on smooth losses in linear time, forming a direct analogy to the classical non-private "online-to-batch" conversion.
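For context, the classical non-private online-to-batch conversion referenced above runs an online learner on stochastic gradients and averages its iterates; a minimal sketch (the private mechanism, i.e. the added noise, is not shown):

```python
import numpy as np

def online_to_batch(oco_step, stoch_grad, w0, T):
    # Classical non-private online-to-batch conversion: feed stochastic
    # gradients to any online convex optimization algorithm and return the
    # average of its iterates.
    w, avg, state = w0.copy(), np.zeros_like(w0), None
    for t in range(1, T + 1):
        g = stoch_grad(w)                  # stochastic subgradient at w_t
        w, state = oco_step(w, g, t, state)
        avg += (w - avg) / t               # running average of iterates
    return avg

# Usage: online gradient descent with step size 1/sqrt(t) on a noisy quadratic.
rng = np.random.default_rng(0)
ogd = lambda w, g, t, s: (w - g / np.sqrt(t), s)
noisy_grad = lambda w: w + 0.1 * rng.standard_normal(w.shape)
print(online_to_batch(ogd, noisy_grad, np.array([3.0, -2.0]), 1000))
```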
no code implementations • 12 Oct 2022 • Hoang Tran, Ashok Cutkosky
We introduce new algorithms and convergence guarantees for privacy-preserving non-convex Empirical Risk Minimization (ERM) on smooth $d$-dimensional objectives.
1 code implementation • 27 Jun 2022 • Harsh Mehta, Ankit Gupta, Ashok Cutkosky, Behnam Neyshabur
State space models have been shown to be effective at modeling long-range dependencies, especially on sequence classification tasks.
1 code implementation • 13 May 2022 • ZhiYu Zhang, Ashok Cutkosky, Ioannis Ch. Paschalidis
Practical online learning tasks are often naturally defined on unconstrained domains, where optimal algorithms for general convex losses are characterized by the notion of comparator adaptivity.
no code implementations • 6 May 2022 • Harsh Mehta, Abhradeep Thakurta, Alexey Kurakin, Ashok Cutkosky
Moreover, by systematically comparing private and non-private models across a range of large batch sizes, we find that, similar to the non-private setting, the choice of optimizer can further improve performance substantially with DP.
no code implementations • 19 Mar 2022 • Keyi Chen, Ashok Cutkosky, Francesco Orabona
Parameter-free algorithms are online learning algorithms that do not require setting learning rates.
no code implementations • 8 Mar 2022 • Ashok Cutkosky, Chris Dann, Abhimanyu Das, Qiuyi Zhang
We study the setting of optimizing with bandit feedback with additional prior knowledge provided to the learner in the form of an initial hint of the optimal action.
no code implementations • 26 Feb 2022 • Andrew Jacobsen, Ashok Cutkosky
We develop a modified online mirror descent framework that is suitable for building adaptive and parameter-free algorithms in unbounded domains.
1 code implementation • 31 Jan 2022 • Zhenxun Zhuang, Mingrui Liu, Ashok Cutkosky, Francesco Orabona
First, we show how to re-interpret AdamW as an approximation of a proximal gradient method, which takes advantage of the closed-form proximal mapping of the regularizer instead of only utilizing its gradient information as in Adam-$\ell_2$.
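To make the distinction concrete, here is a minimal sketch of the two updates being contrasted: Adam-$\ell_2$ folds the decay into the gradient, while AdamW applies it directly to the weights. Hyperparameter values are illustrative:

```python
import numpy as np

def adam_l2_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, wd=1e-2):
    # Adam-l2: weight decay folded into the gradient, so it is rescaled by
    # the adaptive preconditioner like any other gradient component.
    g = g + wd * w
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g ** 2
    update = (m / (1 - b1 ** t)) / (np.sqrt(v / (1 - b2 ** t)) + eps)
    return w - lr * update, m, v

def adamw_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, wd=1e-2):
    # AdamW: decoupled decay applied directly to the weights, which the
    # abstract reinterprets as an approximate proximal step on the l2 term.
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g ** 2
    update = (m / (1 - b1 ** t)) / (np.sqrt(v / (1 - b2 ** t)) + eps)
    return w - lr * update - lr * wd * w, m, v

# Usage: one step of each from the same state.
w, g = np.array([1.0, -2.0]), np.array([0.5, 0.3])
print(adam_l2_step(w, g, np.zeros(2), np.zeros(2), 1)[0])
print(adamw_step(w, g, np.zeros(2), np.zeros(2), 1)[0])
```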
1 code implementation • 19 Jan 2022 • ZhiYu Zhang, Ashok Cutkosky, Ioannis Paschalidis
Unconstrained Online Linear Optimization (OLO) is a practical problem setting for studying the training of machine learning models.
no code implementations • NeurIPS 2021 • Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, Manish Purohit
We consider the online linear optimization problem, where at every step the algorithm plays a point $x_t$ in the unit ball, and suffers loss $\langle c_t, x_t\rangle$ for some cost vector $c_t$ that is then revealed to the algorithm.
1 code implementation • NeurIPS 2021 • Aditya Gangrade, Anil Kag, Ashok Cutkosky, Venkatesh Saligrama
For example, this may model an adaptive decision to invoke more resources on this instance.
no code implementations • NeurIPS 2021 • Ashok Cutkosky, Harsh Mehta
We consider non-convex stochastic optimization using first-order algorithms for which the gradient estimates may have heavy tails.
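The standard primitive for heavy-tailed gradient noise is clipping; a minimal sketch of that operation (the threshold and the way the paper combines it with the update rule are not reproduced here):

```python
import numpy as np

def clip(g, tau):
    # Clip a stochastic gradient to norm at most tau, taming heavy-tailed
    # noise at the cost of a controlled bias.
    n = np.linalg.norm(g)
    return g if n <= tau else g * (tau / n)

print(clip(np.array([30.0, 40.0]), tau=5.0))   # -> [3. 4.]
```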
1 code implementation • 4 Mar 2021 • Hoang Tran, Ashok Cutkosky
We develop a new algorithm for non-convex stochastic optimization that finds an $\epsilon$-critical point in the optimal $O(\epsilon^{-3})$ stochastic gradient and Hessian-vector product computations.
no code implementations • 2 Feb 2021 • ZhiYu Zhang, Ashok Cutkosky, Ioannis Ch. Paschalidis
Next, considering a related problem called online learning with memory, we construct a novel strongly adaptive algorithm that uses our first contribution as a building block.
no code implementations • 24 Dec 2020 • Ashok Cutkosky, Abhimanyu Das, Manish Purohit
We provide a simple method to combine stochastic bandit algorithms.
no code implementations • NeurIPS 2020 • Ashok Cutkosky
We provide online convex optimization algorithms that guarantee improved full-matrix regret bounds.
no code implementations • NeurIPS 2020 • Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, Manish Purohit
We study an online linear optimization (OLO) problem in which the learner is provided access to $K$ "hint" vectors in each round prior to making a decision.
1 code implementation • ICLR 2021 • Harsh Mehta, Ashok Cutkosky, Behnam Neyshabur
In the case of the homogeneous ReLU activation, we show that this behavior can be attributed to the loss function.
no code implementations • NeurIPS 2020 • Dirk van der Hoeven, Ashok Cutkosky, Haipeng Luo
We study bandit convex optimization methods that adapt to the norm of the comparator, a topic that has only been studied before for its full-information counterpart.
no code implementations • ICML 2020 • Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, Manish Purohit
We consider a variant of the classical online linear optimization problem in which at every step, the online player receives a "hint" vector before choosing the action for that round.
no code implementations • 10 Feb 2020 • Ashok Cutkosky
Given any increasing sequence of norms $\|\cdot\|_0,\dots,\|\cdot\|_{T-1}$, we provide an online convex optimization algorithm that outputs points $w_t$ in some domain $W$ in response to convex losses $\ell_t:W\to \mathbb{R}$ that guarantees regret $R_T(u)=\sum_{t=1}^T \ell_t(w_t)-\ell_t(u)\le \tilde O\left(\|u\|_{T-1}\sqrt{\sum_{t=1}^T \|g_t\|_{t-1,\star}^2}\right)$ where $g_t$ is a subgradient of $\ell_t$ at $w_t$.
no code implementations • ICML 2020 • Ashok Cutkosky, Harsh Mehta
We provide an improved analysis of normalized SGD showing that adding momentum provably removes the need for large batch sizes on non-convex objectives.
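A minimal sketch of normalized SGD with momentum in the sense described above (step size, momentum constant, and the toy objective are illustrative):

```python
import numpy as np

def normalized_momentum_sgd(stoch_grad, w0, lr=0.05, beta=0.9, steps=500):
    # Sketch: average stochastic gradients into m, then step a fixed distance
    # lr in the direction of m.  The momentum average plays the noise-control
    # role that a large batch would otherwise play.
    w, m = w0.copy(), np.zeros_like(w0)
    for _ in range(steps):
        m = beta * m + (1 - beta) * stoch_grad(w)
        w = w - lr * m / (np.linalg.norm(m) + 1e-12)
    return w

# Usage: a noisy quadratic with "batch size" 1.
rng = np.random.default_rng(0)
noisy = lambda w: w + rng.standard_normal(w.shape)
print(normalized_momentum_sgd(noisy, np.array([4.0, -4.0])))
```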
no code implementations • 29 May 2019 • Ashok Cutkosky, Tamas Sarlos
We provide an online convex optimization algorithm with regret that interpolates between the regret of an algorithm using an optimal preconditioning matrix and one using a diagonal preconditioning matrix.
no code implementations • NeurIPS 2019 • Kwang-Sung Jun, Ashok Cutkosky, Francesco Orabona
In this paper, we consider the nonparametric least square regression in a Reproducing Kernel Hilbert Space (RKHS).
2 code implementations • NeurIPS 2019 • Ashok Cutkosky, Francesco Orabona
Variance reduction has emerged in recent years as a strong competitor to stochastic gradient descent in non-convex problems, providing the first algorithms to improve upon the convergence rate of stochastic gradient descent for finding first-order critical points.
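A sketch of the recursive, momentum-style variance-reduced estimator used in this line of work; the correction term re-evaluates the same sample at the previous iterate. The constants below are illustrative, and the paper's algorithm chooses them adaptively:

```python
import numpy as np

def storm_like(stoch_grad, w0, lr=0.05, a=0.1, steps=500):
    # Recursive variance-reduced estimator:
    #   d_t = g(w_t, xi_t) + (1 - a) * (d_{t-1} - g(w_{t-1}, xi_t)),
    # i.e. momentum with a correction evaluated on the SAME sample xi_t.
    w_prev = w0.copy()
    d = stoch_grad(w_prev, seed=0)
    w = w_prev - lr * d
    for t in range(1, steps):
        g_new = stoch_grad(w, seed=t)       # fresh sample at the new point
        g_old = stoch_grad(w_prev, seed=t)  # same sample at the old point
        d = g_new + (1 - a) * (d - g_old)
        w_prev, w = w, w - lr * d
    return w

# Usage: noisy quadratic whose sample is fixed by a seed, so the same noise
# can be re-evaluated at two points (needed for the correction term).
def noisy_grad(w, seed):
    rng = np.random.default_rng(seed)
    return w + 0.5 * rng.standard_normal(w.shape)

print(storm_like(noisy_grad, np.array([3.0, -2.0])))
```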
no code implementations • 3 Mar 2019 • Ashok Cutkosky
A standard way to obtain convergence guarantees in stochastic convex optimization is to run an online learning algorithm and then output the average of its iterates: the actual iterates of the online learning algorithm do not come with individual guarantees.
no code implementations • 24 Feb 2019 • Ashok Cutkosky
We provide algorithms that guarantee regret $R_T(u)\le \tilde O(G\|u\|^3 + G(\|u\|+1)\sqrt{T})$ or $R_T(u)\le \tilde O(G\|u\|^3T^{1/3} + GT^{1/3}+ G\|u\|\sqrt{T})$ for online convex optimization with $G$-Lipschitz losses for any comparison point $u$ without prior knowledge of either $G$ or $\|u\|$.
no code implementations • 24 Feb 2019 • Ashok Cutkosky
We show how to take any two parameter-free online learning algorithms with different regret guarantees and obtain a single algorithm whose regret is the minimum of the two base algorithms.
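One way such a combination can work is to play the sum of the two algorithms' predictions and feed both the same gradients; a toy sketch of that idea (the sub-learners below and the exact construction in the paper differ):

```python
import numpy as np

class OGD:
    # Toy online learner (online gradient descent with a fixed step size),
    # standing in for a real parameter-free algorithm.
    def __init__(self, dim, lr):
        self.w, self.lr = np.zeros(dim), lr
    def predict(self):
        return self.w
    def update(self, g):
        self.w = self.w - self.lr * g

def combined_play(a, b):
    # Sketch of one combining strategy: play the sum of the two learners'
    # predictions and give both the same gradient.  The combined regret
    # against u decomposes as (regret of a against u) + (regret of b against
    # 0), which is small when both base algorithms are parameter-free.
    return a.predict() + b.predict()

# Usage: one round of the combined algorithm on a linear loss <g, w>.
a, b = OGD(2, lr=0.1), OGD(2, lr=0.01)
w_play = combined_play(a, b)
g = np.array([1.0, -1.0])          # gradient of this round's loss
a.update(g); b.update(g)
print(w_play, combined_play(a, b))
```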
1 code implementation • 25 Jan 2019 • Zhenxun Zhuang, Ashok Cutkosky, Francesco Orabona
Stochastic Gradient Descent (SGD) has played a central role in machine learning.
no code implementations • 17 Feb 2018 • Ashok Cutkosky, Francesco Orabona
We introduce several new black-box reductions that significantly improve the design of adaptive and parameter-free online learning algorithms by simplifying analysis, improving regret guarantees, and sometimes even improving runtime.
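One reduction in this spirit splits the problem into a 1-D magnitude learner and a unit-ball direction learner; a toy sketch with stand-in sub-learners (the paper's reductions and guarantees are more refined):

```python
import numpy as np

class Scalar1D:
    # Toy stand-in for a 1-D parameter-free learner (here just 1-D OGD).
    def __init__(self, lr=0.1):
        self.z, self.lr = 0.0, lr
    def predict(self):
        return self.z
    def update(self, g):
        self.z -= self.lr * g

class UnitBallOGD:
    # Toy direction learner: projected OGD on the unit ball.
    def __init__(self, dim, lr=0.1):
        self.u, self.lr = np.zeros(dim), lr
    def predict(self):
        return self.u
    def update(self, g):
        self.u -= self.lr * g
        n = np.linalg.norm(self.u)
        if n > 1.0:
            self.u /= n

def reduction_round(scalar_alg, direction_alg, g):
    # One round of a magnitude/direction reduction: play w = z * u, feed the
    # 1-D learner the scalar loss <g, u> and the direction learner g itself.
    z, u = scalar_alg.predict(), direction_alg.predict()
    w = z * u
    scalar_alg.update(float(np.dot(g, u)))
    direction_alg.update(g)
    return w

# Usage: a few rounds against a fixed linear loss.
s, d = Scalar1D(), UnitBallOGD(2)
for _ in range(3):
    print(reduction_round(s, d, np.array([1.0, -2.0])))
```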
no code implementations • NeurIPS 2018 • Ashok Cutkosky, Robert Busa-Fekete
Scaling up the training process of these models is crucial, but the most popular algorithm, Stochastic Gradient Descent (SGD), is a serial method that is surprisingly hard to parallelize.
no code implementations • NeurIPS 2017 • Ashok Cutkosky, Kwabena A. Boahen
Most online optimization algorithms focus on one of two things: performing well in adversarial settings by adapting to unknown data parameters (such as Lipschitz constants), typically achieving $O(\sqrt{T})$ regret, or performing well in stochastic settings where they can leverage some structure in the losses (such as strong convexity), typically achieving $O(\log(T))$ regret.
no code implementations • NeurIPS 2016 • Ashok Cutkosky, Kwabena Boahen
We propose an online convex optimization algorithm (RescaledExp) that achieves optimal regret in the unconstrained setting without prior knowledge of any bounds on the loss functions.
no code implementations • 7 Mar 2017 • Ashok Cutkosky, Kwabena Boahen
The vast majority of optimization and online learning algorithms today require some prior information about the data (often in the form of bounds on gradients or on the optimal parameter value).