Search Results for author: Yair Carmon

Found 18 papers, 4 papers with code

Making SGD Parameter-Free

no code implementations • 4 May 2022 • Yair Carmon, Oliver Hinder

We develop an algorithm for parameter-free stochastic convex optimization (SCO) whose rate of convergence is only a double-logarithmic factor larger than the optimal rate for the corresponding known-parameter setting.

Distributionally Robust Optimization via Ball Oracle Acceleration

no code implementations • 24 Mar 2022 • Yair Carmon, Danielle Hausler

We develop and analyze algorithms for distributionally robust optimization (DRO) of convex losses.

Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time

2 code implementations • 10 Mar 2022 • Mitchell Wortsman, Gabriel Ilharco, Samir Yitzhak Gadre, Rebecca Roelofs, Raphael Gontijo-Lopes, Ari S. Morcos, Hongseok Namkoong, Ali Farhadi, Yair Carmon, Simon Kornblith, Ludwig Schmidt

In this paper, we revisit the second step of this procedure in the context of fine-tuning large pre-trained models, where fine-tuned models often appear to lie in a single low error basin.

Ranked #1 on Image Classification on ImageNet V2 (using extra training data)

Domain Generalization, Image Classification, +1
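
The "soup" the title refers to is elementwise averaging of the weights of several fine-tuned models of the same architecture. A minimal NumPy sketch of a uniform soup (parameter names and values here are illustrative, not from the paper's code):

```python
import numpy as np

def uniform_soup(state_dicts):
    """Average the parameters of several fine-tuned models elementwise.

    state_dicts: list of dicts mapping parameter name -> np.ndarray,
    all produced by fine-tuning the same architecture.
    """
    keys = state_dicts[0].keys()
    return {k: np.mean([sd[k] for sd in state_dicts], axis=0) for k in keys}

# Three hypothetical fine-tuned weight sets for one tiny layer.
models = [{"w": np.array([1.0, 2.0])},
          {"w": np.array([3.0, 2.0])},
          {"w": np.array([2.0, 5.0])}]
soup = uniform_soup(models)
print(soup["w"])  # elementwise mean: [2. 3.]
```

Since the soup is a single set of weights, evaluating it costs one forward pass, which is why inference time does not increase.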

Scaling Laws Under the Microscope: Predicting Transformer Performance from Small Scale Experiments

no code implementations • 13 Feb 2022 • Maor Ivgi, Yair Carmon, Jonathan Berant

Neural scaling laws define a predictable relationship between a model's parameter count and its performance after training in the form of a power law.

Model Selection
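
A power-law relationship of this kind is typically fit in log-log space, where it becomes linear, and then extrapolated to larger models. A small sketch with synthetic numbers (the coefficients are illustrative, not from the paper):

```python
import numpy as np

# Synthetic small-scale losses following loss = a * params**(-b).
params = np.array([1e6, 2e6, 4e6, 8e6])
a_true, b_true = 50.0, 0.25
loss = a_true * params ** (-b_true)

# Fit the power law in log-log space: log(loss) = log(a) - b * log(params).
slope, intercept = np.polyfit(np.log(params), np.log(loss), 1)
a_fit, b_fit = np.exp(intercept), -slope

# Extrapolate the fitted law to a much larger model.
pred = a_fit * (64e6) ** (-b_fit)
```

The fitted exponent and prefactor recover the generating law exactly on clean data; in practice one fits to noisy small-scale runs and checks how far the extrapolation holds.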

Never Go Full Batch (in Stochastic Convex Optimization)

no code implementations • NeurIPS 2021 • Idan Amir, Yair Carmon, Tomer Koren, Roi Livni

We study the generalization performance of $\text{full-batch}$ optimization algorithms for stochastic convex optimization: these are first-order methods that only access the exact gradient of the empirical risk (rather than gradients with respect to individual data points), that include a wide range of algorithms such as gradient descent, mirror descent, and their regularized and/or accelerated variants.

Stochastic Bias-Reduced Gradient Methods

no code implementations • NeurIPS 2021 • Hilal Asi, Yair Carmon, Arun Jambulapati, Yujia Jin, Aaron Sidford

We develop a new primitive for stochastic optimization: a low-bias, low-cost estimator of the minimizer $x_\star$ of any Lipschitz strongly-convex function.

Stochastic Optimization

Thinking Inside the Ball: Near-Optimal Minimization of the Maximal Loss

no code implementations • 4 May 2021 • Yair Carmon, Arun Jambulapati, Yujia Jin, Aaron Sidford

We characterize the complexity of minimizing $\max_{i\in[N]} f_i(x)$ for convex, Lipschitz functions $f_1,\ldots, f_N$.
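As a baseline for this problem, the generic subgradient method steps along a subgradient of whichever $f_i$ attains the max. A one-dimensional sketch with $f_i(x) = |x - c_i|$ (illustrative only; this is the classical method, not the paper's near-optimal ball-oracle algorithm):

```python
import numpy as np

# Minimize max_i |x - c_i| (each f_i is convex and 1-Lipschitz) with the
# subgradient method: step along a subgradient of the active (max) term.
c = np.array([0.0, 1.0, 4.0])  # the max is minimized at the midrange, x = 2
x = 10.0
for t in range(1, 5001):
    i = np.argmax(np.abs(x - c))   # index achieving the max
    g = np.sign(x - c[i])          # subgradient of the active |x - c_i|
    x -= g / np.sqrt(t)            # diminishing step size
# x oscillates ever more tightly around the minimizer 2.0
```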

Large-Scale Methods for Distributionally Robust Optimization

1 code implementation • NeurIPS 2020 • Daniel Levy, Yair Carmon, John C. Duchi, Aaron Sidford

We propose and analyze algorithms for distributionally robust optimization of convex losses with conditional value at risk (CVaR) and $\chi^2$ divergence uncertainty sets.
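The CVaR objective at level $\alpha$ averages the worst $\alpha$-fraction of the losses, rather than all of them. A small sketch of evaluating that objective (just the objective, not the paper's large-scale algorithms):

```python
import numpy as np

def cvar(losses, alpha):
    """Conditional value at risk at level alpha: the mean of the worst
    alpha-fraction of losses, i.e. the CVaR DRO objective on a sample."""
    ordered = np.sort(losses)[::-1]               # largest losses first
    k = max(1, int(np.ceil(alpha * len(ordered))))
    return ordered[:k].mean()

losses = np.array([1.0, 3.0, 2.0, 10.0])
print(cvar(losses, 0.5))   # mean of the two largest losses: 6.5
```

Minimizing this quantity over model parameters upweights the hardest examples, which is what makes the optimization distributionally robust.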

Coordinate Methods for Matrix Games

no code implementations • 17 Sep 2020 • Yair Carmon, Yujia Jin, Aaron Sidford, Kevin Tian

For linear regression with an elementwise nonnegative matrix, our guarantees improve on exact gradient methods by a factor of $\sqrt{\mathrm{nnz}(A)/(m+n)}$.

Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations

no code implementations • 24 Jun 2020 • Yossi Arjevani, Yair Carmon, John C. Duchi, Dylan J. Foster, Ayush Sekhari, Karthik Sridharan

We design an algorithm which finds an $\epsilon$-approximate stationary point (with $\|\nabla F(x)\|\le \epsilon$) using $O(\epsilon^{-3})$ stochastic gradient and Hessian-vector products, matching guarantees that were previously available only under a stronger assumption of access to multiple queries with the same random seed.

Second-order methods, Stochastic Optimization
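
The Hessian-vector products such algorithms consume can be formed without ever materializing the Hessian, e.g. by differencing gradients. A sketch with a toy objective (the function below is an illustrative stand-in, not from the paper):

```python
import numpy as np

def f_grad(x):
    """Gradient of the toy objective F(x) = 0.5*||x||^2 + sum(cos(x))."""
    return x - np.sin(x)

def hvp(grad, x, v, eps=1e-5):
    """Approximate H(x) @ v via a symmetric difference of gradients."""
    return (grad(x + eps * v) - grad(x - eps * v)) / (2 * eps)

x = np.array([0.3, -1.2])
v = np.array([1.0, 0.5])
# The exact Hessian of F is diag(1 - cos(x)), so H @ v is easy to check.
exact = (1.0 - np.cos(x)) * v
approx = hvp(f_grad, x, v)
```

In practice autodiff frameworks compute exact Hessian-vector products at roughly the cost of one extra gradient evaluation, which is the access model the complexity bound counts.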

Lower Bounds for Non-Convex Stochastic Optimization

no code implementations • 5 Dec 2019 • Yossi Arjevani, Yair Carmon, John C. Duchi, Dylan J. Foster, Nathan Srebro, Blake Woodworth

We lower bound the complexity of finding $\epsilon$-stationary points (with gradient norm at most $\epsilon$) using stochastic first-order methods.

Stochastic Optimization

Variance Reduction for Matrix Games

no code implementations • NeurIPS 2019 • Yair Carmon, Yujia Jin, Aaron Sidford, Kevin Tian

We present a randomized primal-dual algorithm that solves the problem $\min_{x} \max_{y} y^\top A x$ to additive error $\epsilon$ in time $\mathrm{nnz}(A) + \sqrt{\mathrm{nnz}(A)n}/\epsilon$, for matrix $A$ with larger dimension $n$ and $\mathrm{nnz}(A)$ nonzero entries.
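For intuition about the saddle-point problem itself, here is a plain full-gradient baseline: simultaneous exponentiated gradient (mirror descent over the simplex), whose averaged iterates approach an equilibrium. This is not the paper's randomized sublinear method; the matrix (matching pennies) and step size are illustrative:

```python
import numpy as np

# min_x max_y y^T A x over probability simplices via simultaneous
# exponentiated gradient; the averaged iterates approach an equilibrium.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])   # matching pennies
x, y = np.array([0.8, 0.2]), np.array([0.3, 0.7])
x_avg, y_avg = np.zeros(2), np.zeros(2)
eta, T = 0.05, 5000
for _ in range(T):
    gx = A.T @ y                            # gradient of y^T A x in x
    gy = A @ x                              # gradient in y
    x = x * np.exp(-eta * gx); x /= x.sum() # x-player descends
    y = y * np.exp(eta * gy);  y /= y.sum() # y-player ascends
    x_avg += x / T
    y_avg += y / T
# Matching pennies has the unique equilibrium of uniform play, so both
# averaged strategies end up near [0.5, 0.5].
```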

Unlabeled Data Improves Adversarial Robustness

4 code implementations • NeurIPS 2019 • Yair Carmon, Aditi Raghunathan, Ludwig Schmidt, Percy Liang, John C. Duchi

We demonstrate, theoretically and empirically, that adversarial robustness can significantly benefit from semisupervised learning.

Adversarial Robustness, Robust classification

A Rank-1 Sketch for Matrix Multiplicative Weights

no code implementations • 7 Mar 2019 • Yair Carmon, John C. Duchi, Aaron Sidford, Kevin Tian

We show that a simple randomized sketch of the matrix multiplicative weight (MMW) update enjoys (in expectation) the same regret bounds as MMW, up to a small constant factor.

Analysis of Krylov Subspace Solutions of Regularized Non-Convex Quadratic Problems

no code implementations • NeurIPS 2018 • Yair Carmon, John C. Duchi

We provide convergence rates for Krylov subspace solutions to the trust-region and cubic-regularized (nonconvex) quadratic problems.

“Convex Until Proven Guilty”: Dimension-Free Acceleration of Gradient Descent on Non-Convex Functions

no code implementations • ICML 2017 • Yair Carmon, John C. Duchi, Oliver Hinder, Aaron Sidford

We develop and analyze a variant of Nesterov’s accelerated gradient descent (AGD) for minimization of smooth non-convex functions.

No bad local minima: Data independent training error guarantees for multilayer neural networks

no code implementations • 26 May 2016 • Daniel Soudry, Yair Carmon

We use smoothed analysis techniques to provide guarantees on the training loss of Multilayer Neural Networks (MNNs) at differentiable local minima.
