Search Results for author: Ashok Cutkosky

Found 42 papers, 11 papers with code

Parameter-free, Dynamic, and Strongly-Adaptive Online Learning

no code implementations ICML 2020 Ashok Cutkosky

We provide a new online learning algorithm that for the first time combines several disparate notions of adaptivity.

When, Why and How Much? Adaptive Learning Rate Scheduling by Refinement

1 code implementation 11 Oct 2023 Aaron Defazio, Ashok Cutkosky, Harsh Mehta, Konstantin Mishchenko

To go beyond this worst-case analysis, we use the observed gradient norms to derive schedules refined for any particular task.

Scheduling

Improving Adaptive Online Learning Using Refined Discretization

no code implementations 27 Sep 2023 ZhiYu Zhang, Heng Yang, Ashok Cutkosky, Ioannis Ch. Paschalidis

Motivated by the pursuit of instance optimality, we propose a new algorithm that simultaneously achieves ($i$) the AdaGrad-style second order gradient adaptivity; and ($ii$) the comparator norm adaptivity also known as "parameter freeness" in the literature.

Unconstrained Online Learning with Unbounded Losses

no code implementations 8 Jun 2023 Andrew Jacobsen, Ashok Cutkosky

Algorithms for online learning typically require one or more boundedness assumptions: that the domain is bounded, that the losses are Lipschitz, or both.

Optimal Stochastic Non-smooth Non-convex Optimization through Online-to-Non-convex Conversion

no code implementations 7 Feb 2023 Ashok Cutkosky, Harsh Mehta, Francesco Orabona

Our primary technique is a reduction from non-smooth non-convex optimization to online learning, after which our results follow from standard regret bounds in online learning.

Differentially Private Image Classification from Features

1 code implementation 24 Nov 2022 Harsh Mehta, Walid Krichene, Abhradeep Thakurta, Alexey Kurakin, Ashok Cutkosky

We find that linear regression is much more effective than logistic regression from both privacy and computational aspects, especially at stricter epsilon values ($\epsilon < 1$).

Classification, Image Classification, +3

Parameter-free Regret in High Probability with Heavy Tails

no code implementations 25 Oct 2022 Jiujia Zhang, Ashok Cutkosky

We present new algorithms for online convex optimization over unbounded domains that obtain parameter-free regret in high probability, given access only to potentially heavy-tailed subgradient estimates.

Vocal Bursts Intensity Prediction

Differentially Private Online-to-Batch for Smooth Losses

no code implementations 12 Oct 2022 Qinzi Zhang, Hoang Tran, Ashok Cutkosky

We develop a new reduction that converts any online convex optimization algorithm suffering $O(\sqrt{T})$ regret into an $\epsilon$-differentially private stochastic convex optimization algorithm with the optimal convergence rate $\tilde O(1/\sqrt{T} + \sqrt{d}/(\epsilon T))$ on smooth losses in linear time, forming a direct analogy to the classical non-private "online-to-batch" conversion.

Momentum Aggregation for Private Non-convex ERM

no code implementations 12 Oct 2022 Hoang Tran, Ashok Cutkosky

We introduce new algorithms and convergence guarantees for privacy-preserving non-convex Empirical Risk Minimization (ERM) on smooth $d$-dimensional objectives.

Privacy Preserving

Long Range Language Modeling via Gated State Spaces

1 code implementation 27 Jun 2022 Harsh Mehta, Ankit Gupta, Ashok Cutkosky, Behnam Neyshabur

State space models have been shown to be effective at modeling long-range dependencies, especially on sequence classification tasks.

Language Modelling, Zero-shot Generalization

Optimal Comparator Adaptive Online Learning with Switching Cost

1 code implementation 13 May 2022 ZhiYu Zhang, Ashok Cutkosky, Ioannis Ch. Paschalidis

Practical online learning tasks are often naturally defined on unconstrained domains, where optimal algorithms for general convex losses are characterized by the notion of comparator adaptivity.

Large Scale Transfer Learning for Differentially Private Image Classification

no code implementations 6 May 2022 Harsh Mehta, Abhradeep Thakurta, Alexey Kurakin, Ashok Cutkosky

Moreover, by systematically comparing private and non-private models across a range of large batch sizes, we find that, similar to the non-private setting, the choice of optimizer can further improve performance substantially with DP.

Classification, Image Classification, +1

Implicit Parameter-free Online Learning with Truncated Linear Models

no code implementations 19 Mar 2022 Keyi Chen, Ashok Cutkosky, Francesco Orabona

Parameter-free algorithms are online learning algorithms that do not require setting learning rates.

Stochastic Optimization
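
The "parameter-free" idea above (no learning rate to tune) is often illustrated with coin betting. Below is a minimal sketch of the classic Krichevsky-Trofimov coin-betting learner for one-dimensional online linear losses with subgradients in $[-1, 1]$; it illustrates the general concept only and is not the truncated-linear-model algorithm of this paper. The comparator `u` and noise scale are arbitrary toy choices.

```python
# Minimal sketch of a 1-D coin-betting (KT) parameter-free online learner.
# Illustrates the "no learning rate" idea; NOT the algorithm from the paper above.
# Assumes linear losses g_t * w with |g_t| <= 1.
import numpy as np

rng = np.random.default_rng(0)
T = 1000
u = 2.5                      # unknown comparator the learner should track (toy choice)
wealth, grad_sum = 1.0, 0.0  # initial "wealth" and running sum of negative gradients
regret = 0.0

for t in range(1, T + 1):
    beta = grad_sum / t                                # KT betting fraction
    w = beta * wealth                                  # current prediction (the bet)
    g = np.clip(w - u + rng.normal(scale=0.1), -1, 1)  # toy subgradient, clipped to [-1, 1]
    wealth -= g * w                                    # settle the bet
    grad_sum -= g
    regret += g * (w - u)                              # linearized regret against u

print(f"final prediction {w:.3f}, regret/T = {regret / T:.4f}")
```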

Leveraging Initial Hints for Free in Stochastic Linear Bandits

no code implementations 8 Mar 2022 Ashok Cutkosky, Chris Dann, Abhimanyu Das, Qiuyi Zhang

We study the setting of optimizing with bandit feedback with additional prior knowledge provided to the learner in the form of an initial hint of the optimal action.

Parameter-free Mirror Descent

no code implementations 26 Feb 2022 Andrew Jacobsen, Ashok Cutkosky

We develop a modified online mirror descent framework that is suitable for building adaptive and parameter-free algorithms in unbounded domains.

Understanding AdamW through Proximal Methods and Scale-Freeness

1 code implementation 31 Jan 2022 Zhenxun Zhuang, Mingrui Liu, Ashok Cutkosky, Francesco Orabona

First, we show how to re-interpret AdamW as an approximation of a proximal gradient method, which takes advantage of the closed-form proximal mapping of the regularizer instead of only utilizing its gradient information as in Adam-$\ell_2$.
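
The contrast drawn here, decoupled weight decay (AdamW) versus folding an $\ell_2$ penalty into the gradient (Adam-$\ell_2$), comes down to a single line of the update. A minimal sketch of both variants on one parameter vector, using standard Adam constants as placeholders (an illustration of the two update rules, not code from the paper):

```python
# Sketch of Adam-l2 vs. AdamW on one parameter vector.
# The only difference is where the weight-decay term enters the update.
import numpy as np

def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8,
              wd=1e-2, decoupled=True):
    if not decoupled:
        g = g + wd * w                      # Adam-l2: decay folded into the gradient
    m = b1 * m + (1 - b1) * g               # first moment (momentum)
    v = b2 * v + (1 - b2) * g * g           # second moment
    m_hat = m / (1 - b1 ** t)               # bias corrections
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    if decoupled:
        w = w - lr * wd * w                 # AdamW: decay applied directly to the weights
    return w, m, v
```

With `decoupled=False` the decay is rescaled by the adaptive denominator like any other gradient component; the decoupled branch applies it to the weights directly, which is what the proximal reading of AdamW described in the abstract captures.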

PDE-Based Optimal Strategy for Unconstrained Online Learning

1 code implementation 19 Jan 2022 ZhiYu Zhang, Ashok Cutkosky, Ioannis Paschalidis

Unconstrained Online Linear Optimization (OLO) is a practical problem setting to study the training of machine learning models.

Logarithmic Regret from Sublinear Hints

no code implementations NeurIPS 2021 Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, Manish Purohit

We consider the online linear optimization problem, where at every step the algorithm plays a point $x_t$ in the unit ball, and suffers loss $\langle c_t, x_t\rangle$ for some cost vector $c_t$ that is then revealed to the algorithm.
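
To make the setting concrete, here is that protocol with a plain projected online-gradient-descent player and no hints, which attains the baseline $O(\sqrt{T})$ regret that hint-based methods are meant to beat. A toy sketch with an arbitrary synthetic cost sequence, not the paper's algorithm:

```python
# Online linear optimization protocol from the abstract: play x_t in the unit
# ball, then observe the cost vector c_t and suffer <c_t, x_t>.
# Baseline player: projected online gradient descent (no hints).
import numpy as np

rng = np.random.default_rng(0)
d, T = 5, 2000
x = np.zeros(d)
loss = 0.0
costs = []

for t in range(1, T + 1):
    c = rng.normal(size=d) + 0.3          # synthetic cost vector with a mild bias
    loss += c @ x                          # suffer the linear loss
    costs.append(c)
    x = x - c / np.sqrt(t)                 # OGD step with eta_t = 1/sqrt(t)
    norm = np.linalg.norm(x)
    if norm > 1.0:                         # project back onto the unit ball
        x = x / norm

C = np.sum(costs, axis=0)
best = -np.linalg.norm(C)                  # loss of the best fixed unit-norm comparator
print(f"regret = {loss - best:.2f} over T = {T} rounds")
```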

Better SGD using Second-order Momentum

1 code implementation 4 Mar 2021 Hoang Tran, Ashok Cutkosky

We develop a new algorithm for non-convex stochastic optimization that finds an $\epsilon$-critical point in the optimal $O(\epsilon^{-3})$ stochastic gradient and Hessian-vector product computations.

Stochastic Optimization

Adversarial Tracking Control via Strongly Adaptive Online Learning with Memory

no code implementations 2 Feb 2021 ZhiYu Zhang, Ashok Cutkosky, Ioannis Ch. Paschalidis

Next, considering a related problem called online learning with memory, we construct a novel strongly adaptive algorithm that uses our first contribution as a building block.

Better Full-Matrix Regret via Parameter-Free Online Learning

no code implementations NeurIPS 2020 Ashok Cutkosky

We provide online convex optimization algorithms that guarantee improved full-matrix regret bounds.

Online Linear Optimization with Many Hints

no code implementations NeurIPS 2020 Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, Manish Purohit

We study an online linear optimization (OLO) problem in which the learner is provided access to $K$ "hint" vectors in each round prior to making a decision.

Extreme Memorization via Scale of Initialization

1 code implementation ICLR 2021 Harsh Mehta, Ashok Cutkosky, Behnam Neyshabur

In the case of the homogeneous ReLU activation, we show that this behavior can be attributed to the loss function.

Image Classification, Memorization

Comparator-adaptive Convex Bandits

no code implementations NeurIPS 2020 Dirk van der Hoeven, Ashok Cutkosky, Haipeng Luo

We study bandit convex optimization methods that adapt to the norm of the comparator, a topic that has only been studied before for its full-information counterpart.

Online Learning with Imperfect Hints

no code implementations ICML 2020 Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, Manish Purohit

We consider a variant of the classical online linear optimization problem in which at every step, the online player receives a "hint" vector before choosing the action for that round.

Adaptive Online Learning with Varying Norms

no code implementations 10 Feb 2020 Ashok Cutkosky

Given any increasing sequence of norms $\|\cdot\|_0,\dots,\|\cdot\|_{T-1}$, we provide an online convex optimization algorithm that outputs points $w_t$ in some domain $W$ in response to convex losses $\ell_t:W\to \mathbb{R}$ that guarantees regret $R_T(u)=\sum_{t=1}^T \ell_t(w_t)-\ell_t(u)\le \tilde O\left(\|u\|_{T-1}\sqrt{\sum_{t=1}^T \|g_t\|_{t-1,\star}^2}\right)$ where $g_t$ is a subgradient of $\ell_t$ at $w_t$.

Momentum Improves Normalized SGD

no code implementations ICML 2020 Ashok Cutkosky, Harsh Mehta

We provide an improved analysis of normalized SGD showing that adding momentum provably removes the need for large batch sizes on non-convex objectives.
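
A minimal sketch of normalized SGD with a momentum buffer, in the spirit of this abstract; the step size, momentum constant, and toy objective below are placeholders rather than the paper's choices:

```python
# Sketch of normalized SGD with momentum: average gradients into a momentum
# buffer, then step in the direction of the buffer with a fixed step length.
import numpy as np

def toy_stoch_grad(x, rng):
    # noisy gradient of the toy non-convex function f(x) = sum(x^2 - cos(3x))
    return 2 * x + 3 * np.sin(3 * x) + rng.normal(scale=0.5, size=x.shape)

rng = np.random.default_rng(0)
x = rng.normal(size=10)
m = np.zeros_like(x)
eta, beta = 0.01, 0.9          # placeholder step size and momentum constant

for t in range(2000):
    g = toy_stoch_grad(x, rng)
    m = beta * m + (1 - beta) * g                      # momentum: exponential gradient average
    x = x - eta * m / (np.linalg.norm(m) + 1e-12)      # normalized step of length eta

print("final stochastic gradient norm:", np.linalg.norm(toy_stoch_grad(x, rng)))
```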

Matrix-Free Preconditioning in Online Learning

no code implementations 29 May 2019 Ashok Cutkosky, Tamas Sarlos

We provide an online convex optimization algorithm with regret that interpolates between the regret of an algorithm using an optimal preconditioning matrix and one using a diagonal preconditioning matrix.

Benchmarking
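
The two ends of the spectrum this abstract interpolates between are the familiar diagonal and full-matrix AdaGrad-style preconditioners. A minimal sketch of both update rules on a synthetic gradient stream; unlike the paper's matrix-free method, this sketch naively forms the full $d \times d$ matrix:

```python
# Diagonal vs. full-matrix AdaGrad-style preconditioning for online gradient steps.
# Full-matrix is the "optimal preconditioner" end of the spectrum but needs O(d^2)
# memory and an O(d^3) decomposition per step; the diagonal version is O(d).
import numpy as np

def diag_adagrad_step(w, g, state, lr=0.1, eps=1e-8):
    state = state + g * g                       # accumulate squared gradients per coordinate
    return w - lr * g / (np.sqrt(state) + eps), state

def full_adagrad_step(w, g, state, lr=0.1, eps=1e-8):
    state = state + np.outer(g, g)              # accumulate the gradient outer-product matrix
    vals, vecs = np.linalg.eigh(state)          # eigendecomposition to form state^{-1/2}
    inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T
    return w - lr * inv_sqrt @ g, state

rng = np.random.default_rng(0)
d = 4
w_diag, s_diag = np.zeros(d), np.zeros(d)
w_full, s_full = np.zeros(d), np.zeros((d, d))
for _ in range(100):
    g = rng.normal(size=d)                      # synthetic gradient stream
    w_diag, s_diag = diag_adagrad_step(w_diag, g, s_diag)
    w_full, s_full = full_adagrad_step(w_full, g, s_full)
print("diag iterate:", np.round(w_diag, 3))
print("full iterate:", np.round(w_full, 3))
```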

Momentum-Based Variance Reduction in Non-Convex SGD

2 code implementations NeurIPS 2019 Ashok Cutkosky, Francesco Orabona

Variance reduction has emerged in recent years as a strong competitor to stochastic gradient descent in non-convex problems, providing the first algorithms to improve upon the convergence rate of stochastic gradient descent for finding first-order critical points.
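
The recursive momentum estimator behind this work (STORM) corrects a running gradient average using the gradient at the previous iterate evaluated on the same fresh sample. A minimal sketch on a toy objective, with fixed placeholder constants instead of the paper's adaptive ones:

```python
# Sketch of a STORM-style recursive momentum / variance-reduced estimator:
#   d_t = grad f(x_t; xi_t) + (1 - a) * (d_{t-1} - grad f(x_{t-1}; xi_t))
# The correction term reuses the *same* sample xi_t at the previous iterate.
import numpy as np

def stoch_grad(x, xi):
    # toy non-convex objective f(x; xi) = sum((x - xi)^2 - cos(3x)), with xi the noise sample
    return 2 * (x - xi) + 3 * np.sin(3 * x)

rng = np.random.default_rng(0)
x = rng.normal(size=10)
xi = rng.normal(scale=0.3, size=x.shape)
d = stoch_grad(x, xi)                      # initialize the estimator with one gradient
eta, a = 0.05, 0.1                         # placeholder step size and momentum parameter

for t in range(2000):
    x_prev = x.copy()
    x = x - eta * d                                      # step with the current estimator
    xi = rng.normal(scale=0.3, size=x.shape)             # fresh sample
    d = stoch_grad(x, xi) + (1 - a) * (d - stoch_grad(x_prev, xi))  # STORM-style update

print("estimator norm at the end:", np.linalg.norm(d))
```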

Anytime Online-to-Batch Conversions, Optimism, and Acceleration

no code implementations 3 Mar 2019 Ashok Cutkosky

A standard way to obtain convergence guarantees in stochastic convex optimization is to run an online learning algorithm and then output the average of its iterates: the actual iterates of the online learning algorithm do not come with individual guarantees.
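
The classical conversion this abstract refers to is easy to state in code: run any online learner on fresh stochastic subgradients and output the average of its iterates. A minimal sketch with plain online gradient descent on a toy quadratic (the paper's anytime variant, which gives guarantees for every iterate, is not shown):

```python
# Classical online-to-batch conversion: run an online learner (here plain OGD)
# on stochastic subgradients and output the *average* iterate.
import numpy as np

rng = np.random.default_rng(0)
d, T = 5, 5000
x_star = np.ones(d)                        # minimizer of the toy objective E[(x - x*)^2 / 2]
x = np.zeros(d)
avg = np.zeros(d)

for t in range(1, T + 1):
    g = (x - x_star) + rng.normal(size=d)  # stochastic gradient at the current iterate
    x = x - g / np.sqrt(t)                 # online (sub)gradient step
    avg += (x - avg) / t                   # running average of the iterates

print("last iterate error     :", np.linalg.norm(x - x_star))
print("averaged iterate error :", np.linalg.norm(avg - x_star))
```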

Artificial Constraints and Lipschitz Hints for Unconstrained Online Learning

no code implementations 24 Feb 2019 Ashok Cutkosky

We provide algorithms that guarantee regret $R_T(u)\le \tilde O(G\|u\|^3 + G(\|u\|+1)\sqrt{T})$ or $R_T(u)\le \tilde O(G\|u\|^3T^{1/3} + GT^{1/3}+ G\|u\|\sqrt{T})$ for online convex optimization with $G$-Lipschitz losses for any comparison point $u$ without prior knowledge of either $G$ or $\|u\|$.

Combining Online Learning Guarantees

no code implementations 24 Feb 2019 Ashok Cutkosky

We show how to take any two parameter-free online learning algorithms with different regret guarantees and obtain a single algorithm whose regret is the minimum of the two base algorithms.
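
One standard way to combine two unconstrained online learners, in the spirit of this abstract, is simply to add their predictions: the regret of the sum against any comparator $u$ decomposes as one algorithm's regret against $u$ plus the other's regret against the origin, which is small for parameter-free methods. A minimal sketch with two toy one-dimensional learners (the `GD1D` wrapper below is a hypothetical stand-in, not the paper's construction):

```python
# Combining two online learners by adding their predictions.
# Regret of the sum against u splits as R_A(u) + R_B(0) (or symmetrically),
# so when regret at the origin is small you pay roughly the minimum of the two.
import numpy as np

class GD1D:
    """Toy 1-D online gradient descent learner (stand-in for a base algorithm)."""
    def __init__(self, lr):
        self.w, self.lr, self.t = 0.0, lr, 0
    def predict(self):
        return self.w
    def update(self, g):
        self.t += 1
        self.w -= self.lr * g / np.sqrt(self.t)

rng = np.random.default_rng(0)
A, B = GD1D(lr=0.1), GD1D(lr=10.0)         # two base learners with different tunings
u, regret, T = 3.0, 0.0, 2000              # comparator and combined regret

for t in range(T):
    w = A.predict() + B.predict()          # play the sum of the two predictions
    g = np.clip(w - u + rng.normal(scale=0.1), -1, 1)   # toy subgradient
    regret += g * (w - u)                  # linearized regret of the combination
    A.update(g)                            # feed the same gradient to both learners
    B.update(g)

print(f"combined prediction {w:.3f}, regret/T = {regret / T:.4f}")
```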

Black-Box Reductions for Parameter-free Online Learning in Banach Spaces

no code implementations 17 Feb 2018 Ashok Cutkosky, Francesco Orabona

We introduce several new black-box reductions that significantly improve the design of adaptive and parameter-free online learning algorithms by simplifying analysis, improving regret guarantees, and sometimes even improving runtime.

Distributed Stochastic Optimization via Adaptive SGD

no code implementations NeurIPS 2018 Ashok Cutkosky, Robert Busa-Fekete

Scaling up the training of large machine learning models is crucial, but the most popular algorithm, Stochastic Gradient Descent (SGD), is a serial method that is surprisingly hard to parallelize.

Stochastic Optimization

Stochastic and Adversarial Online Learning without Hyperparameters

no code implementations NeurIPS 2017 Ashok Cutkosky, Kwabena A. Boahen

Most online optimization algorithms focus on one of two things: performing well in adversarial settings by adapting to unknown data parameters (such as Lipschitz constants), typically achieving $O(\sqrt{T})$ regret, or performing well in stochastic settings where they can leverage some structure in the losses (such as strong convexity), typically achieving $O(\log(T))$ regret.

Online Learning Without Prior Information

no code implementations 7 Mar 2017 Ashok Cutkosky, Kwabena Boahen

The vast majority of optimization and online learning algorithms today require some prior information about the data (often in the form of bounds on gradients or on the optimal parameter value).

Online Convex Optimization with Unconstrained Domains and Losses

no code implementations NeurIPS 2016 Ashok Cutkosky, Kwabena Boahen

We propose an online convex optimization algorithm (RescaledExp) that achieves optimal regret in the unconstrained setting without prior knowledge of any bounds on the loss functions.

Hyperparameter Optimization
