Search Results for author: Yair Carmon

Found 28 papers, 8 papers with code

Accelerated Parameter-Free Stochastic Optimization

no code implementations • 31 Mar 2024 • Itai Kreisler, Maor Ivgi, Oliver Hinder, Yair Carmon

We propose a method that achieves near-optimal rates for smooth stochastic convex optimization and requires essentially no prior knowledge of problem parameters.

Stochastic Optimization

The Price of Adaptivity in Stochastic Convex Optimization

no code implementations • 16 Feb 2024 • Yair Carmon, Oliver Hinder

We prove impossibility results for adaptivity in non-smooth stochastic convex optimization.

DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule

1 code implementation • 8 Feb 2023 • Maor Ivgi, Oliver Hinder, Yair Carmon

Empirically, we consider a broad range of vision and language transfer learning tasks, and show that DoG's performance is close to that of SGD with tuned learning rate.

Transfer Learning
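
For the DoG entry above, a minimal numpy sketch of a distance-over-gradients style step size, assuming the schedule $\eta_t = \bar r_t / \sqrt{\sum_{i\le t} \|g_i\|^2}$ with $\bar r_t$ the largest distance travelled from the initial point; the `r_eps` seed and variable names are illustrative, and this is not the authors' reference implementation:

```python
import numpy as np

def dog(x0, grad_fn, steps, r_eps=1e-6):
    # DoG-style step size: eta_t = rbar_t / sqrt(sum_{i<=t} ||g_i||^2),
    # where rbar_t is the largest distance travelled from x0 (seeded by r_eps).
    x, rbar, grad_sq_sum = x0.copy(), r_eps, 0.0
    for _ in range(steps):
        g = grad_fn(x)                                   # stochastic gradient at x
        grad_sq_sum += float(np.dot(g, g))
        rbar = max(rbar, float(np.linalg.norm(x - x0)))
        x = x - (rbar / np.sqrt(grad_sq_sum + 1e-16)) * g
    return x

# Toy usage: least squares with noisy gradients.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(100, 10)), rng.normal(size=100)
noisy_grad = lambda x: A.T @ (A @ x - b) / 100 + 0.01 * rng.normal(size=10)
x_hat = dog(np.zeros(10), noisy_grad, steps=500)
```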

ReSQueing Parallel and Private Stochastic Convex Optimization

no code implementations • 1 Jan 2023 • Yair Carmon, Arun Jambulapati, Yujia Jin, Yin Tat Lee, Daogao Liu, Aaron Sidford, Kevin Tian

We give a parallel algorithm obtaining optimization error $\epsilon_{\text{opt}}$ with $d^{1/3}\epsilon_{\text{opt}}^{-2/3}$ gradient oracle query depth and $d^{1/3}\epsilon_{\text{opt}}^{-2/3} + \epsilon_{\text{opt}}^{-2}$ gradient queries in total, assuming access to a bounded-variance stochastic gradient estimator.

Malign Overfitting: Interpolation Can Provably Preclude Invariance

no code implementations • 28 Nov 2022 • Yoav Wald, Gal Yona, Uri Shalit, Yair Carmon

This suggests that the phenomenon of "benign overfitting," in which models generalize well despite interpolating, might not favorably extend to settings in which robustness or fairness are desirable.

Fairness · Out-of-Distribution Generalization

RECAPP: Crafting a More Efficient Catalyst for Convex Optimization

1 code implementation • 17 Jun 2022 • Yair Carmon, Arun Jambulapati, Yujia Jin, Aaron Sidford

The accelerated proximal point algorithm (APPA), also known as "Catalyst", is a well-established reduction from convex optimization to approximate proximal point computation (i.e., regularized minimization).
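
To make "approximate proximal point computation" concrete: each outer step regularizes the objective around the current iterate and minimizes that subproblem approximately. Below is a minimal sketch of a plain (non-accelerated) inexact proximal point loop with a gradient-descent inner solver; the regularization weight, inner solver, and iteration counts are placeholder choices, and RECAPP/Catalyst add acceleration and careful warm-starting on top of this pattern.

```python
import numpy as np

def inexact_proximal_point(f_grad, x0, lam=1.0, outer_iters=20,
                           inner_iters=100, inner_lr=0.01):
    # Each outer step approximately minimizes the regularized subproblem
    #   f(x) + (lam / 2) * ||x - x_k||^2,
    # here with plain gradient steps standing in for the inner solver.
    x = x0.copy()
    for _ in range(outer_iters):
        center, y = x.copy(), x.copy()
        for _ in range(inner_iters):
            g = f_grad(y) + lam * (y - center)     # gradient of the prox subproblem
            y = y - inner_lr * g
        x = y                                      # accept the approximate prox point
    return x

# Toy usage on f(x) = 0.25 * ||x||^4 (convex, smooth, not strongly convex).
x_hat = inexact_proximal_point(lambda x: np.linalg.norm(x) ** 2 * x, np.ones(5))
```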

Making SGD Parameter-Free

no code implementations • 4 May 2022 • Yair Carmon, Oliver Hinder

We develop an algorithm for parameter-free stochastic convex optimization (SCO) whose rate of convergence is only a double-logarithmic factor larger than the optimal rate for the corresponding known-parameter setting.

Distributionally Robust Optimization via Ball Oracle Acceleration

no code implementations • 24 Mar 2022 • Yair Carmon, Danielle Hausler

We develop and analyze algorithms for distributionally robust optimization (DRO) of convex losses.

Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time

5 code implementations • 10 Mar 2022 • Mitchell Wortsman, Gabriel Ilharco, Samir Yitzhak Gadre, Rebecca Roelofs, Raphael Gontijo-Lopes, Ari S. Morcos, Hongseok Namkoong, Ali Farhadi, Yair Carmon, Simon Kornblith, Ludwig Schmidt

The conventional recipe for maximizing model accuracy is to (1) train multiple models with various hyperparameters and (2) pick the individual model which performs best on a held-out validation set, discarding the remainder.

Ranked #1 on Image Classification on ImageNet V2 (using extra training data)

Domain Generalization · Image Classification +2
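
The "uniform soup" in the paper amounts to averaging the weights of the fine-tuned models rather than selecting one; a minimal PyTorch-flavored sketch (checkpoint paths and the surrounding model are placeholders, and floating-point parameters are assumed). The paper's greedy variant instead adds models one at a time, keeping each only if held-out accuracy improves.

```python
import torch

def uniform_soup(state_dicts):
    # "Uniform soup": average the parameters of several fine-tuned models
    # (assumes all entries are floating-point tensors of matching shapes).
    soup = {k: torch.zeros_like(v) for k, v in state_dicts[0].items()}
    for sd in state_dicts:
        for k, v in sd.items():
            soup[k] += v / len(state_dicts)
    return soup

# Usage sketch: average checkpoints from fine-tuning runs with different
# hyperparameters, then evaluate the single resulting model.
# soup = uniform_soup([torch.load(p, map_location="cpu") for p in checkpoint_paths])
# model.load_state_dict(soup)
```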

Scaling Laws Under the Microscope: Predicting Transformer Performance from Small Scale Experiments

no code implementations • 13 Feb 2022 • Maor Ivgi, Yair Carmon, Jonathan Berant

Neural scaling laws define a predictable relationship between a model's parameter count and its performance after training in the form of a power law.

Model Selection
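
A power-law fit of the kind described above can be sketched in a few lines; the functional form $L(N) = a N^{-b} + c$ and the toy numbers below are illustrative assumptions, not the paper's fitted values:

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    # Loss as a function of parameter count: L(N) = a * N^(-b) + c
    return a * n ** (-b) + c

# Toy measurements at small model sizes (purely illustrative numbers).
n_params = np.array([1e6, 3e6, 1e7, 3e7, 1e8])
losses = np.array([4.1, 3.6, 3.2, 2.9, 2.7])

(a, b, c), _ = curve_fit(power_law, n_params, losses,
                         p0=(10.0, 0.1, 1.0), bounds=(0, np.inf))
print(f"extrapolated loss at 1e9 params: {power_law(1e9, a, b, c):.2f}")
```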

Never Go Full Batch (in Stochastic Convex Optimization)

no code implementations • NeurIPS 2021 • Idan Amir, Yair Carmon, Tomer Koren, Roi Livni

We study the generalization performance of full-batch optimization algorithms for stochastic convex optimization: these are first-order methods that only access the exact gradient of the empirical risk (rather than gradients with respect to individual data points), and include a wide range of algorithms such as gradient descent, mirror descent, and their regularized and/or accelerated variants.
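
To make the definition concrete, a small sketch contrasting a full-batch oracle (one exact gradient of the empirical risk per step) with a single-sample stochastic oracle; the least-squares objective and toy data are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(1000, 5)), rng.normal(size=1000)   # toy dataset
w = np.zeros(5)

def full_batch_grad(w):
    # Exact gradient of the empirical risk (mean squared error over all points).
    return X.T @ (X @ w - y) / len(y)

def single_sample_grad(w):
    # Stochastic gradient at one uniformly sampled data point.
    i = rng.integers(len(y))
    return X[i] * (X[i] @ w - y[i])

# A full-batch method touches every data point at each of its (few) iterations;
# SGD uses one point per step. The paper compares their generalization, not
# merely their optimization behavior.
for _ in range(100):
    w = w - 0.1 * full_batch_grad(w)
```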

Stochastic Bias-Reduced Gradient Methods

no code implementations • NeurIPS 2021 • Hilal Asi, Yair Carmon, Arun Jambulapati, Yujia Jin, Aaron Sidford

We develop a new primitive for stochastic optimization: a low-bias, low-cost estimator of the minimizer $x_\star$ of any Lipschitz strongly-convex function.

Stochastic Optimization
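
One standard way to turn a sequence of increasingly accurate but biased estimates (e.g., SGD run for more and more steps) into a low-bias estimator is randomized multilevel Monte Carlo; the sketch below shows that generic trick, which is in the spirit of, but not necessarily identical to, the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlmc_debias(estimate_at_level, max_level=10, p=0.5):
    # Randomized multilevel Monte Carlo debiasing: estimate_at_level(j) returns
    # an estimate using ~2^j units of work (e.g. the output of 2^j SGD steps).
    # Drawing a random level J and reweighting the telescoping difference makes
    # the expectation equal to that of the most accurate level, at much lower
    # expected cost.
    j = rng.geometric(p) - 1                  # P(J = j) = p * (1 - p)^j
    if j >= max_level:                        # lump the tail into the top level
        j, prob_j = max_level, (1 - p) ** max_level
    else:
        prob_j = p * (1 - p) ** j
    base = estimate_at_level(0)
    if j == 0:
        return base
    correction = estimate_at_level(j) - estimate_at_level(j - 1)
    return base + correction / prob_j
```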

Thinking Inside the Ball: Near-Optimal Minimization of the Maximal Loss

no code implementations • 4 May 2021 • Yair Carmon, Arun Jambulapati, Yujia Jin, Aaron Sidford

We characterize the complexity of minimizing $\max_{i\in[N]} f_i(x)$ for convex, Lipschitz functions $f_1,\ldots, f_N$.
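
For orientation, the simplest baseline for $\min_x \max_{i\in[N]} f_i(x)$ is the subgradient method, which steps along the gradient of whichever $f_i$ currently attains the maximum; a toy sketch with affine $f_i$ (data and step sizes are placeholders, and the paper's ball-oracle methods are substantially more refined):

```python
import numpy as np

rng = np.random.default_rng(0)
A, b = rng.normal(size=(50, 10)), rng.normal(size=50)   # f_i(x) = A[i] @ x + b[i]

def max_loss(x):
    return np.max(A @ x + b)

x = np.zeros(10)
for t in range(1, 501):
    i = int(np.argmax(A @ x + b))              # index attaining the maximum
    x = x - A[i] / np.sqrt(t)                  # subgradient step on max_i f_i(x)
print(f"max loss after 500 steps: {max_loss(x):.3f}")
```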

Large-Scale Methods for Distributionally Robust Optimization

1 code implementation • NeurIPS 2020 • Daniel Levy, Yair Carmon, John C. Duchi, Aaron Sidford

We propose and analyze algorithms for distributionally robust optimization of convex losses with conditional value at risk (CVaR) and $\chi^2$ divergence uncertainty sets.
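
The CVaR-at-level-$\alpha$ objective appearing here is the average of the worst $\alpha$-fraction of losses; a small sketch of that quantity on placeholder losses (the paper's algorithms optimize it, which this snippet does not do):

```python
import numpy as np

def cvar(losses, alpha=0.1):
    # Conditional value at risk: the mean of the worst alpha-fraction of losses.
    k = max(1, int(np.ceil(alpha * len(losses))))
    return np.sort(losses)[-k:].mean()

losses = np.random.default_rng(0).exponential(size=1000)   # toy per-example losses
print(f"mean loss: {losses.mean():.3f}   CVaR_0.1: {cvar(losses):.3f}")
```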

Coordinate Methods for Matrix Games

no code implementations • 17 Sep 2020 • Yair Carmon, Yujia Jin, Aaron Sidford, Kevin Tian

For linear regression with an elementwise nonnegative matrix, our guarantees improve on exact gradient methods by a factor of $\sqrt{\mathrm{nnz}(A)/(m+n)}$.

regression

Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations

no code implementations • 24 Jun 2020 • Yossi Arjevani, Yair Carmon, John C. Duchi, Dylan J. Foster, Ayush Sekhari, Karthik Sridharan

We design an algorithm which finds an $\epsilon$-approximate stationary point (with $\|\nabla F(x)\|\le \epsilon$) using $O(\epsilon^{-3})$ stochastic gradient and Hessian-vector products, matching guarantees that were previously available only under a stronger assumption of access to multiple queries with the same random seed.

Second-order methods · Stochastic Optimization
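
Hessian-vector products of the kind such algorithms assume can be computed without materializing the Hessian, e.g. by double backpropagation; a small autograd sketch with a placeholder quadratic test function:

```python
import torch

def hessian_vector_product(f, x, v):
    # Compute H(x) @ v by double backpropagation, without materializing H.
    x = x.detach().requires_grad_(True)
    (grad,) = torch.autograd.grad(f(x), x, create_graph=True)
    (hvp,) = torch.autograd.grad(grad, x, grad_outputs=v)
    return hvp

# Toy check on f(x) = 0.5 * x^T A x with symmetric A, whose Hessian is A.
A = torch.tensor([[2.0, 0.5], [0.5, 1.0]])
f = lambda x: 0.5 * x @ A @ x
print(hessian_vector_product(f, torch.zeros(2), torch.tensor([1.0, 0.0])))  # A[:, 0]
```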

Lower Bounds for Non-Convex Stochastic Optimization

no code implementations • 5 Dec 2019 • Yossi Arjevani, Yair Carmon, John C. Duchi, Dylan J. Foster, Nathan Srebro, Blake Woodworth

We lower bound the complexity of finding $\epsilon$-stationary points (with gradient norm at most $\epsilon$) using stochastic first-order methods.

Stochastic Optimization

Variance Reduction for Matrix Games

no code implementations • NeurIPS 2019 • Yair Carmon, Yujia Jin, Aaron Sidford, Kevin Tian

We present a randomized primal-dual algorithm that solves the problem $\min_{x} \max_{y} y^\top A x$ to additive error $\epsilon$ in time $\mathrm{nnz}(A) + \sqrt{\mathrm{nnz}(A)n}/\epsilon$, for matrix $A$ with larger dimension $n$ and $\mathrm{nnz}(A)$ nonzero entries.
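
For context, assuming the standard setup where $x$ and $y$ range over probability simplices, the classical baseline is simultaneous multiplicative-weights updates with iterate averaging; the sketch below implements that baseline (full matrix-vector products per step), not the paper's sampling-based variance-reduced method:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 100
A = rng.uniform(size=(n, m))      # payoff y^T A x, x in simplex(m), y in simplex(n)

x, y = np.full(m, 1 / m), np.full(n, 1 / n)
x_avg, y_avg = np.zeros(m), np.zeros(n)
eta, T = 0.1, 2000

for _ in range(T):
    # Multiplicative-weights (entropic mirror descent): x minimizes, y maximizes.
    x = x * np.exp(-eta * (A.T @ y))
    x /= x.sum()
    y = y * np.exp(eta * (A @ x))
    y /= y.sum()
    x_avg += x / T
    y_avg += y / T

gap = np.max(A @ x_avg) - np.min(A.T @ y_avg)   # duality gap of the averaged iterates
print(f"duality gap: {gap:.4f}")
```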

Unlabeled Data Improves Adversarial Robustness

4 code implementations • NeurIPS 2019 • Yair Carmon, Aditi Raghunathan, Ludwig Schmidt, Percy Liang, John C. Duchi

We demonstrate, theoretically and empirically, that adversarial robustness can significantly benefit from semisupervised learning.

Adversarial Robustness · Robust classification
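
The semi-supervised recipe here is self-training: fit an intermediate model on the labeled data, pseudo-label the unlabeled pool, then train the final model, with adversarially robust training, on the union. A minimal sketch of the pseudo-labeling steps with a placeholder classifier and synthetic data (the robust-training stage is omitted):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Placeholder data standing in for labeled / unlabeled pools.
X_lab, y_lab = rng.normal(size=(500, 20)), rng.integers(0, 2, size=500)
X_unlab = rng.normal(size=(5000, 20))

# 1) Fit an intermediate model on the labeled data only.
base = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)

# 2) Pseudo-label the unlabeled pool with the intermediate model.
y_pseudo = base.predict(X_unlab)

# 3) Train the final model on labeled + pseudo-labeled data (in the paper this
#    stage uses adversarially robust training rather than a plain fit).
X_all = np.concatenate([X_lab, X_unlab])
y_all = np.concatenate([y_lab, y_pseudo])
final = LogisticRegression(max_iter=1000).fit(X_all, y_all)
```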

A Rank-1 Sketch for Matrix Multiplicative Weights

no code implementations • 7 Mar 2019 • Yair Carmon, John C. Duchi, Aaron Sidford, Kevin Tian

We show that a simple randomized sketch of the matrix multiplicative weight (MMW) update enjoys (in expectation) the same regret bounds as MMW, up to a small constant factor.
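
For context, the exact matrix multiplicative weights iterate is a density matrix proportional to the matrix exponential of the accumulated (negated, scaled) losses; the dense sketch below computes that exact iterate and does not implement the paper's rank-1 randomized sketch:

```python
import numpy as np
from scipy.linalg import expm

def mmw_density(loss_matrices, eta=0.1):
    # Exact (dense) matrix multiplicative weights iterate: given symmetric loss
    # matrices M_1, ..., M_t, play W = exp(-eta * sum_s M_s), normalized to trace 1.
    W = expm(-eta * sum(loss_matrices))
    return W / np.trace(W)

rng = np.random.default_rng(0)
losses = [(M + M.T) / 2 for M in rng.normal(size=(3, 5, 5))]   # toy symmetric losses
W = mmw_density(losses)
print(np.trace(W))   # -> 1.0 (a density matrix)
```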

Analysis of Krylov Subspace Solutions of Regularized Non-Convex Quadratic Problems

no code implementations • NeurIPS 2018 • Yair Carmon, John C. Duchi

We provide convergence rates for Krylov subspace solutions to the trust-region and cubic-regularized (nonconvex) quadratic problems.
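
The trust-region subproblem in question is $\min_{\|x\|\le r} \frac{1}{2} x^\top A x + b^\top x$ with $A$ possibly indefinite. The sketch below is a dense reference solver via bisection on the Lagrange multiplier, just to make the problem concrete; the paper analyzes Krylov subspace (Lanczos / conjugate-gradient style) approximations, and the degenerate "hard case" is ignored here:

```python
import numpy as np

def trust_region_subproblem(A, b, radius, tol=1e-8):
    # Solve  min_{||x|| <= radius}  0.5 * x @ A @ x + b @ x  for symmetric A by
    # bisection on the multiplier lam >= max(0, -lambda_min(A)), using that
    # x(lam) = -(A + lam*I)^{-1} b has norm decreasing in lam.
    # (The degenerate "hard case" is ignored in this illustrative sketch.)
    I = np.eye(len(b))
    lo = max(0.0, -np.linalg.eigvalsh(A)[0]) + 1e-12
    x = np.linalg.solve(A + lo * I, -b)
    if np.linalg.norm(x) <= radius:
        return x                                   # interior solution
    hi = lo + 1.0
    while np.linalg.norm(np.linalg.solve(A + hi * I, -b)) > radius:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(np.linalg.solve(A + mid * I, -b)) > radius:
            lo = mid
        else:
            hi = mid
    return np.linalg.solve(A + hi * I, -b)

A = np.array([[1.0, 0.0], [0.0, -2.0]])            # indefinite toy instance
print(trust_region_subproblem(A, np.array([1.0, 1.0]), radius=1.0))
```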

“Convex Until Proven Guilty”: Dimension-Free Acceleration of Gradient Descent on Non-Convex Functions

no code implementations • ICML 2017 • Yair Carmon, John C. Duchi, Oliver Hinder, Aaron Sidford

We develop and analyze a variant of Nesterov’s accelerated gradient descent (AGD) for minimization of smooth non-convex functions.
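
For reference, the classical AGD iteration that the paper modifies, shown in its standard convex constant-step form (not the paper's "convex until proven guilty" variant), with a placeholder quadratic objective:

```python
import numpy as np

def nesterov_agd(grad, x0, L, iters=200):
    # Classical accelerated gradient descent for an L-smooth convex f:
    #   y_k     = x_k + (k - 1) / (k + 2) * (x_k - x_{k-1})   (extrapolation)
    #   x_{k+1} = y_k - grad(y_k) / L                          (gradient step)
    x_prev, x = x0.copy(), x0.copy()
    for k in range(1, iters + 1):
        y = x + (k - 1) / (k + 2) * (x - x_prev)
        x_prev, x = x, y - grad(y) / L
    return x

# Toy usage on the quadratic f(x) = 0.5 * ||diag(d) x||^2, smoothness L = max(d)^2.
d = np.array([10.0, 1.0, 0.1])
x_hat = nesterov_agd(lambda x: d ** 2 * x, np.ones(3), L=100.0)
```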

No bad local minima: Data independent training error guarantees for multilayer neural networks

no code implementations • 26 May 2016 • Daniel Soudry, Yair Carmon

We use smoothed analysis techniques to provide guarantees on the training loss of Multilayer Neural Networks (MNNs) at differentiable local minima.
