Search Results for author: Ashok Cutkosky

Found 49 papers, 12 papers with code

Parameter-free, Dynamic, and Strongly-Adaptive Online Learning

no code implementations ICML 2020 Ashok Cutkosky

We provide a new online learning algorithm that for the first time combines several disparate notions of adaptivity.

Empirical Tests of Optimization Assumptions in Deep Learning

no code implementations 1 Jul 2024 Hoang Tran, Qinzi Zhang, Ashok Cutkosky

There is a significant gap between our theoretical understanding of optimization algorithms used in deep learning and their practical performance.

Private Zeroth-Order Nonsmooth Nonconvex Optimization

no code implementations 27 Jun 2024 Qinzi Zhang, Hoang Tran, Ashok Cutkosky

We introduce a new zeroth-order algorithm for private stochastic optimization on nonconvex and nonsmooth objectives.

Stochastic Optimization

Fully Unconstrained Online Learning

no code implementations 30 May 2024 Ashok Cutkosky, Zakaria Mhammedi

We provide an online learning algorithm that obtains regret $G\|w_\star\|\sqrt{T\log(\|w_\star\|G\sqrt{T})} + \|w_\star\|^2 + G^2$ on $G$-Lipschitz convex losses for any comparison point $w_\star$ without knowing either $G$ or $\|w_\star\|$.

Online Linear Regression in Dynamic Environments via Discounting

no code implementations 29 May 2024 Andrew Jacobsen, Ashok Cutkosky

We develop algorithms for online linear regression which achieve optimal static and dynamic regret guarantees even in the complete absence of prior knowledge.

regression

Adam with model exponential moving average is effective for nonconvex optimization

no code implementations 28 May 2024 Kwangjun Ahn, Ashok Cutkosky

In this work, we offer a theoretical analysis of two modern optimization techniques for training large and complex models: (i) adaptive optimization algorithms, such as Adam, and (ii) the model exponential moving average (EMA).
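
As a rough illustration of the two techniques named in this abstract, the sketch below runs a scalar Adam update and keeps an exponential moving average (EMA) of the iterates alongside it. The EMA copy is the model one would evaluate. The helper name `adam_with_ema`, the quadratic test objective and all hyperparameter values are hypothetical choices for illustration. They are not the paper's algorithm or settings.

```python
import math


def adam_with_ema(grad, w0, steps=2000, lr=0.05, beta1=0.9, beta2=0.999,
                  eps=1e-8, ema_decay=0.99):
    """Run scalar Adam on `grad` and track an EMA of the iterates.

    Illustrative sketch with hypothetical defaults, not the paper's setup.
    Returns (last_iterate, ema_iterate).
    """
    w = w0
    m = 0.0    # first-moment estimate
    v = 0.0    # second-moment estimate
    ema = w0   # exponential moving average of the iterates
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction
        v_hat = v / (1 - beta2 ** t)
        w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
        # The EMA is a separate copy of the weights; it never feeds back into Adam.
        ema = ema_decay * ema + (1 - ema_decay) * w
    return w, ema


# Example: f(w) = (w - 3)^2, so grad f(w) = 2 (w - 3).
last, averaged = adam_with_ema(lambda w: 2.0 * (w - 3.0), w0=0.0)
```

Because the EMA sits outside the optimizer state, it can be added to any update rule without changing the trajectory of the raw iterates.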

The Road Less Scheduled

1 code implementation 24 May 2024 Aaron Defazio, Xingyu Alice Yang, Harsh Mehta, Konstantin Mishchenko, Ahmed Khaled, Ashok Cutkosky

Existing learning rate schedules that do not require specification of the optimization stopping step T are greatly outperformed by learning rate schedules that depend on T. We propose an approach that avoids the need for this stopping time by eschewing the use of schedules entirely, while exhibiting state-of-the-art performance compared to schedules across a wide family of problems ranging from convex problems to large-scale deep learning problems.

Scheduling

Random Scaling and Momentum for Non-smooth Non-convex Optimization

no code implementations 16 May 2024 Qinzi Zhang, Ashok Cutkosky

Training neural networks requires optimizing a loss function that may be highly irregular, and in particular neither convex nor smooth.

When, Why and How Much? Adaptive Learning Rate Scheduling by Refinement

1 code implementation 11 Oct 2023 Aaron Defazio, Ashok Cutkosky, Harsh Mehta, Konstantin Mishchenko

To go beyond this worst-case analysis, we use the observed gradient norms to derive schedules refined for any particular task.

Scheduling

Improving Adaptive Online Learning Using Refined Discretization

no code implementations 27 Sep 2023 ZhiYu Zhang, Heng Yang, Ashok Cutkosky, Ioannis Ch. Paschalidis

Motivated by the pursuit of instance optimality, we propose a new algorithm that simultaneously achieves ($i$) the AdaGrad-style second-order gradient adaptivity; and ($ii$) the comparator norm adaptivity, also known as "parameter freeness" in the literature.

Unconstrained Online Learning with Unbounded Losses

no code implementations 8 Jun 2023 Andrew Jacobsen, Ashok Cutkosky

Algorithms for online learning typically require one or more boundedness assumptions: that the domain is bounded, that the losses are Lipschitz, or both.

Optimal Stochastic Non-smooth Non-convex Optimization through Online-to-Non-convex Conversion

no code implementations 7 Feb 2023 Ashok Cutkosky, Harsh Mehta, Francesco Orabona

Our primary technique is a reduction from non-smooth non-convex optimization to online learning, after which our results follow from standard regret bounds in online learning.

Differentially Private Image Classification from Features

1 code implementation 24 Nov 2022 Harsh Mehta, Walid Krichene, Abhradeep Thakurta, Alexey Kurakin, Ashok Cutkosky

We find that linear regression is much more effective than logistic regression from both privacy and computational aspects, especially at stricter epsilon values ($\epsilon < 1$).

Classification Image Classification +3

Parameter-free Regret in High Probability with Heavy Tails

no code implementations 25 Oct 2022 Jiujia Zhang, Ashok Cutkosky

We present new algorithms for online convex optimization over unbounded domains that obtain parameter-free regret with high probability, given access only to potentially heavy-tailed subgradient estimates.

Vocal Bursts Intensity Prediction

Differentially Private Online-to-Batch for Smooth Losses

no code implementations 12 Oct 2022 Qinzi Zhang, Hoang Tran, Ashok Cutkosky

We develop a new reduction that converts any online convex optimization algorithm suffering $O(\sqrt{T})$ regret into an $\epsilon$-differentially private stochastic convex optimization algorithm with the optimal convergence rate $\tilde O(1/\sqrt{T} + \sqrt{d}/(\epsilon T))$ on smooth losses in linear time, forming a direct analogy to the classical non-private "online-to-batch" conversion.

Momentum Aggregation for Private Non-convex ERM

no code implementations 12 Oct 2022 Hoang Tran, Ashok Cutkosky

We introduce new algorithms and convergence guarantees for privacy-preserving non-convex Empirical Risk Minimization (ERM) on smooth $d$-dimensional objectives.

Privacy Preserving

Long Range Language Modeling via Gated State Spaces

1 code implementation 27 Jun 2022 Harsh Mehta, Ankit Gupta, Ashok Cutkosky, Behnam Neyshabur

State space models have been shown to be effective at modeling long-range dependencies, especially on sequence classification tasks.

Language Modelling Zero-shot Generalization

Optimal Comparator Adaptive Online Learning with Switching Cost

1 code implementation 13 May 2022 ZhiYu Zhang, Ashok Cutkosky, Ioannis Ch. Paschalidis

Practical online learning tasks are often naturally defined on unconstrained domains, where optimal algorithms for general convex losses are characterized by the notion of comparator adaptivity.

Large Scale Transfer Learning for Differentially Private Image Classification

no code implementations 6 May 2022 Harsh Mehta, Abhradeep Thakurta, Alexey Kurakin, Ashok Cutkosky

Moreover, by systematically comparing private and non-private models across a range of large batch sizes, we find that, similar to the non-private setting, the choice of optimizer can further improve performance substantially with DP.

Classification Image Classification +1

Implicit Parameter-free Online Learning with Truncated Linear Models

no code implementations 19 Mar 2022 Keyi Chen, Ashok Cutkosky, Francesco Orabona

Parameter-free algorithms are online learning algorithms that do not require setting learning rates.

Stochastic Optimization
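
To make "do not require setting learning rates" concrete, here is a one-dimensional coin-betting learner. It follows the Krichevsky-Trofimov (KT) bettor from the parameter-free literature. It is not the truncated-linear-model method of this paper. The function name `kt_bettor`, the `initial_wealth` parameter and the assumption $|g_t| \le 1$ are illustrative choices.

```python
def kt_bettor(gradients, initial_wealth=1.0):
    """One-dimensional KT coin-betting learner for linear losses g_t * w.

    Illustrative sketch (hypothetical name); assumes every |g_t| <= 1.
    There is no learning rate: each play is a fraction of the current wealth.
    Returns the list of points played and the final wealth.
    """
    wealth = initial_wealth
    coin_sum = 0.0  # running sum of "coin outcomes" c_i = -g_i
    plays = []
    for t, g in enumerate(gradients, start=1):
        if abs(g) > 1:
            raise ValueError("KT betting assumes |g_t| <= 1")
        # KT betting fraction; |coin_sum| <= t - 1, so |fraction| < 1 and wealth stays positive.
        fraction = coin_sum / t
        w = fraction * wealth
        plays.append(w)
        wealth -= g * w  # gain -g_t * w_t on this round
        coin_sum += -g
    return plays, wealth


plays, wealth = kt_bettor([-1.0] * 4)
```

Fed a constant gradient of -1, the bettor plays 0, 0.5, 1.0 and 1.875 over four rounds while its wealth grows from 1 to 4.375. The size of its bets scales up without any tuned step size.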

Leveraging Initial Hints for Free in Stochastic Linear Bandits

no code implementations 8 Mar 2022 Ashok Cutkosky, Chris Dann, Abhimanyu Das, Qiuyi Zhang

We study the setting of optimizing with bandit feedback with additional prior knowledge provided to the learner in the form of an initial hint of the optimal action.

Parameter-free Mirror Descent

no code implementations 26 Feb 2022 Andrew Jacobsen, Ashok Cutkosky

We develop a modified online mirror descent framework that is suitable for building adaptive and parameter-free algorithms in unbounded domains.

Understanding AdamW through Proximal Methods and Scale-Freeness

1 code implementation 31 Jan 2022 Zhenxun Zhuang, Mingrui Liu, Ashok Cutkosky, Francesco Orabona

First, we show how to re-interpret AdamW as an approximation of a proximal gradient method, which takes advantage of the closed-form proximal mapping of the regularizer instead of only utilizing its gradient information as in Adam-$\ell_2$.
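
The regularizer $r(w) = \frac{\lambda}{2} w^2$ on its own shows the contrast between using a regularizer's closed-form proximal mapping and using only its gradient. With step size $\eta$, the proximal step is $w/(1+\eta\lambda)$. Decoupled decay multiplies by $(1-\eta\lambda)$, which agrees with the proximal step to first order in $\eta\lambda$. Adam-$\ell_2$ instead adds $\lambda w$ to the gradient, so the shrinkage is divided by the adaptive denominator. The scalar sketch below uses hypothetical function names and ignores momentum. It is not the paper's code.

```python
def prox_l2(w, step, lam):
    """Closed-form proximal map of r(x) = (lam / 2) * x**2 with step size `step`.

    argmin_x r(x) + (x - w)**2 / (2 * step) is x = w / (1 + step * lam).
    """
    return w / (1.0 + step * lam)


def decoupled_decay(w, step, lam):
    """AdamW-style decoupled decay: multiply w by (1 - step * lam).

    Agrees with prox_l2 up to O((step * lam)**2) terms.
    """
    return (1.0 - step * lam) * w


def adam_l2_decay_term(w, step, lam, denom):
    """Shrinkage from lam * w when it is added to the gradient (Adam-l2).

    `denom` is a hypothetical stand-in for Adam's sqrt(v_hat) + eps (momentum
    ignored), so coordinates with large past gradients are regularized less.
    """
    return step * lam * w / denom


w, step, lam = 1.0, 0.1, 0.1
print(prox_l2(w, step, lam))                          # about 0.990099
print(decoupled_decay(w, step, lam))                  # about 0.99
print(adam_l2_decay_term(w, step, lam, denom=10.0))   # about 0.001
```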

PDE-Based Optimal Strategy for Unconstrained Online Learning

1 code implementation 19 Jan 2022 ZhiYu Zhang, Ashok Cutkosky, Ioannis Paschalidis

Unconstrained Online Linear Optimization (OLO) is a practical problem setting to study the training of machine learning models.

Logarithmic Regret from Sublinear Hints

no code implementations NeurIPS 2021 Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, Manish Purohit

We consider the online linear optimization problem, where at every step the algorithm plays a point $x_t$ in the unit ball, and suffers loss $\langle c_t, x_t\rangle$ for some cost vector $c_t$ that is then revealed to the algorithm.
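
A baseline that ignores hints altogether makes this protocol concrete. It runs projected online gradient descent on the unit ball and measures regret against the best fixed point in hindsight, $u = -C/\|C\|$ with $C = \sum_t c_t$. The step size `eta`, the cost sequence and the function names are illustrative assumptions. This is not the hint-based algorithm the paper proposes.

```python
import math


def project_to_unit_ball(x):
    """Euclidean projection onto {x : ||x|| <= 1}."""
    norm = math.sqrt(sum(xi * xi for xi in x))
    return x if norm <= 1.0 else [xi / norm for xi in x]


def ogd_unit_ball(costs, eta):
    """Projected online gradient descent for losses <c_t, x_t> over the unit ball.

    Hypothetical helper. Returns (total_loss, regret), with regret measured
    against u = -C / ||C||, where C is the sum of all cost vectors.
    """
    dim = len(costs[0])
    x = [0.0] * dim
    total_loss = 0.0
    cost_sum = [0.0] * dim
    for c in costs:
        # Play x_t, then c_t is revealed and the loss <c_t, x_t> is suffered.
        total_loss += sum(ci * xi for ci, xi in zip(c, x))
        cost_sum = [s + ci for s, ci in zip(cost_sum, c)]
        x = project_to_unit_ball([xi - eta * ci for xi, ci in zip(x, c)])
    best_fixed_loss = -math.sqrt(sum(s * s for s in cost_sum))  # loss of u
    return total_loss, total_loss - best_fixed_loss


total, regret = ogd_unit_ball([[1.0, 0.0]] * 100, eta=0.1)
```

With these costs the iterate reaches the boundary point (-1, 0) after ten rounds, and the regret comes out to 5.5.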

Better SGD using Second-order Momentum

1 code implementation 4 Mar 2021 Hoang Tran, Ashok Cutkosky

We develop a new algorithm for non-convex stochastic optimization that finds an $\epsilon$-critical point in the optimal $O(\epsilon^{-3})$ stochastic gradient and Hessian-vector product computations.

Stochastic Optimization

Adversarial Tracking Control via Strongly Adaptive Online Learning with Memory

no code implementations 2 Feb 2021 ZhiYu Zhang, Ashok Cutkosky, Ioannis Ch. Paschalidis

Next, considering a related problem called online learning with memory, we construct a novel strongly adaptive algorithm that uses our first contribution as a building block.

Better Full-Matrix Regret via Parameter-Free Online Learning

no code implementations NeurIPS 2020 Ashok Cutkosky

We provide online convex optimization algorithms that guarantee improved full-matrix regret bounds.

Online Linear Optimization with Many Hints

no code implementations NeurIPS 2020 Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, Manish Purohit

We study an online linear optimization (OLO) problem in which the learner is provided access to $K$ "hint" vectors in each round prior to making a decision.

Extreme Memorization via Scale of Initialization

1 code implementation ICLR 2021 Harsh Mehta, Ashok Cutkosky, Behnam Neyshabur

In the case of the homogeneous ReLU activation, we show that this behavior can be attributed to the loss function.

Image Classification Memorization

Comparator-adaptive Convex Bandits

no code implementations NeurIPS 2020 Dirk van der Hoeven, Ashok Cutkosky, Haipeng Luo

We study bandit convex optimization methods that adapt to the norm of the comparator, a topic that has only been studied before for its full-information counterpart.

Online Learning with Imperfect Hints

no code implementations ICML 2020 Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, Manish Purohit

We consider a variant of the classical online linear optimization problem in which at every step, the online player receives a "hint" vector before choosing the action for that round.

Adaptive Online Learning with Varying Norms

no code implementations 10 Feb 2020 Ashok Cutkosky

Given any increasing sequence of norms $\|\cdot\|_0,\dots,\|\cdot\|_{T-1}$, we provide an online convex optimization algorithm that outputs points $w_t$ in some domain $W$ in response to convex losses $\ell_t:W\to \mathbb{R}$ that guarantees regret $R_T(u)=\sum_{t=1}^T \ell_t(w_t)-\ell_t(u)\le \tilde O\left(\|u\|_{T-1}\sqrt{\sum_{t=1}^T \|g_t\|_{t-1,\star}^2}\right)$ where $g_t$ is a subgradient of $\ell_t$ at $w_t$.

Momentum Improves Normalized SGD

no code implementations ICML 2020 Ashok Cutkosky, Harsh Mehta

We provide an improved analysis of normalized SGD showing that adding momentum provably removes the need for large batch sizes on non-convex objectives.
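
The update analyzed here combines a momentum average of stochastic gradients with a normalized step of fixed length. The sketch below assumes the common form $m_t = \beta m_{t-1} + (1-\beta) g_t$, $w_{t+1} = w_t - \eta\, m_t/\|m_t\|$ with batch size 1. The helper `normalized_sgd_momentum`, the constants and the noisy quadratic are illustrative assumptions rather than the paper's experiments.

```python
import math
import random


def normalized_sgd_momentum(stoch_grad, w0, steps, lr, beta=0.9):
    """Normalized SGD with momentum on a list-valued parameter vector (sketch)."""
    w = list(w0)
    m = [0.0] * len(w)
    for _ in range(steps):
        g = stoch_grad(w)  # one stochastic gradient (batch size 1)
        m = [beta * mi + (1 - beta) * gi for mi, gi in zip(m, g)]
        norm = math.sqrt(sum(mi * mi for mi in m))
        if norm > 0:
            # Every step has length exactly lr, whatever the gradient scale.
            w = [wi - lr * mi / norm for wi, mi in zip(w, m)]
    return w


rng = random.Random(0)
target = [1.0, -2.0]


def noisy_grad(w):
    # Gradient of 0.5 * ||w - target||^2 plus Gaussian noise (illustrative).
    return [wi - ti + rng.gauss(0.0, 0.1) for wi, ti in zip(w, target)]


w_final = normalized_sgd_momentum(noisy_grad, [0.0, 0.0], steps=2000, lr=0.01)
```

The step length is fixed, so noise affects only the direction. Here the momentum average, rather than a large batch, is what smooths that direction.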

Matrix-Free Preconditioning in Online Learning

no code implementations 29 May 2019 Ashok Cutkosky, Tamas Sarlos

We provide an online convex optimization algorithm with regret that interpolates between the regret of an algorithm using an optimal preconditioning matrix and one using a diagonal preconditioning matrix.

Benchmarking

Momentum-Based Variance Reduction in Non-Convex SGD

2 code implementations NeurIPS 2019 Ashok Cutkosky, Francesco Orabona

Variance reduction has emerged in recent years as a strong competitor to stochastic gradient descent in non-convex problems, providing the first algorithms to improve upon the convergence rate of stochastic gradient descent for finding first-order critical points.
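
A recursive, momentum-style estimator illustrates the momentum-based variance reduction in the title. It corrects the previous estimate with a gradient difference computed on the same fresh sample: $d_t = \nabla f(x_t;\xi_t) + (1-a)\,(d_{t-1} - \nabla f(x_{t-1};\xi_t))$. This is a sketch of that estimator under stated assumptions. The constant `a`, the constant step size `lr` (rather than any adaptive schedule), the one-dimensional toy objective and the function names are all illustrative.

```python
import random


def storm(grad_at, sample, x0, steps, lr, a):
    """Scalar sketch of a momentum-based variance-reduced estimator.

    grad_at(x, xi) returns the stochastic gradient at x for sample xi; the
    same fresh sample is used at x_t and x_{t-1} inside the correction term.
    """
    x_prev = x0
    d = grad_at(x0, sample())  # first estimate: a plain stochastic gradient
    x = x0 - lr * d
    for _ in range(steps):
        xi = sample()
        # d_t = g(x_t; xi) + (1 - a) * (d_{t-1} - g(x_{t-1}; xi))
        d = grad_at(x, xi) + (1 - a) * (d - grad_at(x_prev, xi))
        x_prev, x = x, x - lr * d
    return x


rng = random.Random(0)


def sample():
    return rng.gauss(3.0, 1.0)  # data point xi ~ N(3, 1)


def grad_at(x, xi):
    return x - xi  # gradient of 0.5 * (x - xi)^2


x_final = storm(grad_at, sample, x0=0.0, steps=500, lr=0.1, a=0.1)
```

On this toy problem the gradient difference $\nabla f(x_t;\xi) - \nabla f(x_{t-1};\xi) = x_t - x_{t-1}$ is noise-free, so fresh noise enters $d_t$ only with weight `a`. For general objectives the difference is small, rather than noise-free, when consecutive iterates are close.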

Anytime Online-to-Batch Conversions, Optimism, and Acceleration

no code implementations 3 Mar 2019 Ashok Cutkosky

A standard way to obtain convergence guarantees in stochastic convex optimization is to run an online learning algorithm and then output the average of its iterates: the actual iterates of the online learning algorithm do not come with individual guarantees.
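
The standard conversion works as follows. An online learner plays $x_t$, a fresh i.i.d. sample defines the round's loss, and the output is the average of the iterates. In this sketch the online learner is online gradient descent with step size `lr`$/\sqrt{t}$. The toy objective $\frac12(x-\xi)^2$ with $\xi \sim N(2, 1)$ and the name `online_to_batch` are illustrative assumptions. This is the classic averaging scheme the abstract starts from, not the paper's anytime variant.

```python
import math
import random


def online_to_batch(stoch_grad, sample, x0, steps, lr):
    """Classic online-to-batch conversion with OGD as the online learner (sketch).

    Each round the learner plays x_t, a fresh sample defines the loss, and OGD
    updates with step lr / sqrt(t). The returned point is the average of the
    iterates x_1, ..., x_T, which is what the convergence guarantee covers.
    """
    x = x0
    running_sum = 0.0
    for t in range(1, steps + 1):
        running_sum += x  # accumulate x_t before updating
        xi = sample()
        x = x - (lr / math.sqrt(t)) * stoch_grad(x, xi)
    return running_sum / steps


rng = random.Random(0)
x_bar = online_to_batch(
    stoch_grad=lambda x, xi: x - xi,     # gradient of 0.5 * (x - xi)^2
    sample=lambda: rng.gauss(2.0, 1.0),  # xi ~ N(2, 1), so the minimizer is 2
    x0=0.0, steps=2000, lr=1.0,
)
```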

Artificial Constraints and Lipschitz Hints for Unconstrained Online Learning

no code implementations 24 Feb 2019 Ashok Cutkosky

We provide algorithms that guarantee regret $R_T(u)\le \tilde O(G\|u\|^3 + G(\|u\|+1)\sqrt{T})$ or $R_T(u)\le \tilde O(G\|u\|^3T^{1/3} + GT^{1/3}+ G\|u\|\sqrt{T})$ for online convex optimization with $G$-Lipschitz losses for any comparison point $u$ without prior knowledge of either $G$ or $\|u\|$.

Combining Online Learning Guarantees

no code implementations 24 Feb 2019 Ashok Cutkosky

We show how to take any two parameter-free online learning algorithms with different regret guarantees and obtain a single algorithm whose regret is the minimum of the two base algorithms.
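
For linear (or linearized) losses, one standard way to combine two learners is to play the sum of their outputs, $w_t = w_t^A + w_t^B$, and feed both learners the same gradient $g_t$. The derivation below shows why this behaves like a minimum. It assumes each base learner is parameter-free in the sense of guaranteeing constant regret $R_T(0) \le \epsilon$ against the origin. Whether this matches the paper's exact construction is an assumption here.

```latex
\begin{aligned}
R_T(u) &= \sum_{t=1}^T \langle g_t, w_t^A + w_t^B - u\rangle \\
       &= \underbrace{\sum_{t=1}^T \langle g_t, w_t^A - u\rangle}_{R_T^A(u)}
        + \underbrace{\sum_{t=1}^T \langle g_t, w_t^B - 0\rangle}_{R_T^B(0)}
        = R_T^A(0) + R_T^B(u), \\
R_T(u) &\le \min\{R_T^A(u) + \epsilon,\; R_T^B(u) + \epsilon\}
        = \min\{R_T^A(u),\, R_T^B(u)\} + \epsilon .
\end{aligned}
```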

Black-Box Reductions for Parameter-free Online Learning in Banach Spaces

no code implementations 17 Feb 2018 Ashok Cutkosky, Francesco Orabona

We introduce several new black-box reductions that significantly improve the design of adaptive and parameter-free online learning algorithms by simplifying analysis, improving regret guarantees, and sometimes even improving runtime.

Distributed Stochastic Optimization via Adaptive SGD

no code implementations NeurIPS 2018 Ashok Cutkosky, Robert Busa-Fekete

Scaling up the training process of these models is crucial, but the most popular algorithm, Stochastic Gradient Descent (SGD), is a serial method that is surprisingly hard to parallelize.

Stochastic Optimization

Stochastic and Adversarial Online Learning without Hyperparameters

no code implementations NeurIPS 2017 Ashok Cutkosky, Kwabena A. Boahen

Most online optimization algorithms focus on one of two things: performing well in adversarial settings by adapting to unknown data parameters (such as Lipschitz constants), typically achieving $O(\sqrt{T})$ regret, or performing well in stochastic settings where they can leverage some structure in the losses (such as strong convexity), typically achieving $O(\log(T))$ regret.

Online Convex Optimization with Unconstrained Domains and Losses

no code implementations NeurIPS 2016 Ashok Cutkosky, Kwabena Boahen

We propose an online convex optimization algorithm (RescaledExp) that achieves optimal regret in the unconstrained setting without prior knowledge of any bounds on the loss functions.

Hyperparameter Optimization

Online Learning Without Prior Information

no code implementations 7 Mar 2017 Ashok Cutkosky, Kwabena Boahen

The vast majority of optimization and online learning algorithms today require some prior information about the data (often in the form of bounds on gradients or on the optimal parameter value).
