Search Results for author: Yair Carmon

Found 28 papers, 8 papers with code

Accelerated Parameter-Free Stochastic Optimization

no code implementations • 31 Mar 2024 • Itai Kreisler, Maor Ivgi, Oliver Hinder, Yair Carmon

We propose a method that achieves near-optimal rates for smooth stochastic convex optimization and requires essentially no prior knowledge of problem parameters.

Stochastic Optimization

The Price of Adaptivity in Stochastic Convex Optimization

no code implementations • 16 Feb 2024 • Yair Carmon, Oliver Hinder

We prove impossibility results for adaptivity in non-smooth stochastic convex optimization.

DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule

1 code implementation • 8 Feb 2023 • Maor Ivgi, Oliver Hinder, Yair Carmon

Empirically, we consider a broad range of vision and language transfer learning tasks, and show that DoG's performance is close to that of SGD with tuned learning rate.

Transfer Learning
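
For the DoG entry above, a minimal numpy sketch of a distance-over-gradients style step size, assuming the schedule $\eta_t = \bar r_t / \sqrt{\sum_{i\le t} \|g_i\|^2}$ with $\bar r_t$ the largest distance travelled from the initial point; the `r_eps` seed and variable names are illustrative, and this is not the authors' reference implementation:

```python
import numpy as np

def dog(x0, grad_fn, steps, r_eps=1e-6):
    # DoG-style step size: eta_t = rbar_t / sqrt(sum_{i<=t} ||g_i||^2),
    # where rbar_t is the largest distance travelled from x0 (seeded by r_eps).
    x, rbar, grad_sq_sum = x0.copy(), r_eps, 0.0
    for _ in range(steps):
        g = grad_fn(x)                                   # stochastic gradient at x
        grad_sq_sum += float(np.dot(g, g))
        rbar = max(rbar, float(np.linalg.norm(x - x0)))
        x = x - (rbar / np.sqrt(grad_sq_sum + 1e-16)) * g
    return x

# Toy usage: least squares with noisy gradients.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(100, 10)), rng.normal(size=100)
noisy_grad = lambda x: A.T @ (A @ x - b) / 100 + 0.01 * rng.normal(size=10)
x_hat = dog(np.zeros(10), noisy_grad, steps=500)
```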

ReSQueing Parallel and Private Stochastic Convex Optimization

no code implementations • 1 Jan 2023 • Yair Carmon, Arun Jambulapati, Yujia Jin, Yin Tat Lee, Daogao Liu, Aaron Sidford, Kevin Tian

We give a parallel algorithm obtaining optimization error $\epsilon_{\text{opt}}$ with $d^{1/3}\epsilon_{\text{opt}}^{-2/3}$ gradient oracle query depth and $d^{1/3}\epsilon_{\text{opt}}^{-2/3} + \epsilon_{\text{opt}}^{-2}$ gradient queries in total, assuming access to a bounded-variance stochastic gradient estimator.

Malign Overfitting: Interpolation Can Provably Preclude Invariance

no code implementations • 28 Nov 2022 • Yoav Wald, Gal Yona, Uri Shalit, Yair Carmon

This suggests that the phenomenon of "benign overfitting," in which models generalize well despite interpolating, might not favorably extend to settings in which robustness or fairness are desirable.

Fairness · Out-of-Distribution Generalization

RECAPP: Crafting a More Efficient Catalyst for Convex Optimization

1 code implementation • 17 Jun 2022 • Yair Carmon, Arun Jambulapati, Yujia Jin, Aaron Sidford

The accelerated proximal point algorithm (APPA), also known as "Catalyst", is a well-established reduction from convex optimization to approximate proximal point computation (i.e., regularized minimization).
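
To make "approximate proximal point computation" concrete: each outer step regularizes the objective around the current iterate and minimizes that subproblem approximately. Below is a minimal sketch of a plain (non-accelerated) inexact proximal point loop with a gradient-descent inner solver; the regularization weight, inner solver, and iteration counts are placeholder choices, and RECAPP/Catalyst add acceleration and careful warm-starting on top of this pattern.

```python
import numpy as np

def inexact_proximal_point(f_grad, x0, lam=1.0, outer_iters=20,
                           inner_iters=100, inner_lr=0.01):
    # Each outer step approximately minimizes the regularized subproblem
    #   f(x) + (lam / 2) * ||x - x_k||^2,
    # here with plain gradient steps standing in for the inner solver.
    x = x0.copy()
    for _ in range(outer_iters):
        center, y = x.copy(), x.copy()
        for _ in range(inner_iters):
            g = f_grad(y) + lam * (y - center)     # gradient of the prox subproblem
            y = y - inner_lr * g
        x = y                                      # accept the approximate prox point
    return x

# Toy usage on f(x) = 0.25 * ||x||^4 (convex, smooth, not strongly convex).
x_hat = inexact_proximal_point(lambda x: np.linalg.norm(x) ** 2 * x, np.ones(5))
```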

Making SGD Parameter-Free

no code implementations • 4 May 2022 • Yair Carmon, Oliver Hinder

We develop an algorithm for parameter-free stochastic convex optimization (SCO) whose rate of convergence is only a double-logarithmic factor larger than the optimal rate for the corresponding known-parameter setting.

Distributionally Robust Optimization via Ball Oracle Acceleration

no code implementations • 24 Mar 2022 • Yair Carmon, Danielle Hausler

We develop and analyze algorithms for distributionally robust optimization (DRO) of convex losses.

Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time

5 code implementations • 10 Mar 2022 • Mitchell Wortsman, Gabriel Ilharco, Samir Yitzhak Gadre, Rebecca Roelofs, Raphael Gontijo-Lopes, Ari S. Morcos, Hongseok Namkoong, Ali Farhadi, Yair Carmon, Simon Kornblith, Ludwig Schmidt

The conventional recipe for maximizing model accuracy is to (1) train multiple models with various hyperparameters and (2) pick the individual model which performs best on a held-out validation set, discarding the remainder.

Ranked #1 on Image Classification on ImageNet V2 (using extra training data)

Domain Generalization · Image Classification +2
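
The "uniform soup" in the paper amounts to averaging the weights of the fine-tuned models rather than selecting one; a minimal PyTorch-flavored sketch (checkpoint paths and the surrounding model are placeholders, and floating-point parameters are assumed). The paper's greedy variant instead adds models one at a time, keeping each only if held-out accuracy improves.

```python
import torch

def uniform_soup(state_dicts):
    # "Uniform soup": average the parameters of several fine-tuned models
    # (assumes all entries are floating-point tensors of matching shapes).
    soup = {k: torch.zeros_like(v) for k, v in state_dicts[0].items()}
    for sd in state_dicts:
        for k, v in sd.items():
            soup[k] += v / len(state_dicts)
    return soup

# Usage sketch: average checkpoints from fine-tuning runs with different
# hyperparameters, then evaluate the single resulting model.
# soup = uniform_soup([torch.load(p, map_location="cpu") for p in checkpoint_paths])
# model.load_state_dict(soup)
```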

Scaling Laws Under the Microscope: Predicting Transformer Performance from Small Scale Experiments

no code implementations • 13 Feb 2022 • Maor Ivgi, Yair Carmon, Jonathan Berant

Neural scaling laws define a predictable relationship between a model's parameter count and its performance after training in the form of a power law.

Model Selection
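
A power-law fit of the kind described above can be sketched in a few lines; the functional form $L(N) = a N^{-b} + c$ and the toy numbers below are illustrative assumptions, not the paper's fitted values:

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    # Loss as a function of parameter count: L(N) = a * N^(-b) + c
    return a * n ** (-b) + c

# Toy measurements at small model sizes (purely illustrative numbers).
n_params = np.array([1e6, 3e6, 1e7, 3e7, 1e8])
losses = np.array([4.1, 3.6, 3.2, 2.9, 2.7])

(a, b, c), _ = curve_fit(power_law, n_params, losses,
                         p0=(10.0, 0.1, 1.0), bounds=(0, np.inf))
print(f"extrapolated loss at 1e9 params: {power_law(1e9, a, b, c):.2f}")
```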

Never Go Full Batch (in Stochastic Convex Optimization)

no code implementations • NeurIPS 2021 • Idan Amir, Yair Carmon, Tomer Koren, Roi Livni

We study the generalization performance of full-batch optimization algorithms for stochastic convex optimization: these are first-order methods that only access the exact gradient of the empirical risk (rather than gradients with respect to individual data points), and include a wide range of algorithms such as gradient descent, mirror descent, and their regularized and/or accelerated variants.
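
To make the definition concrete, a small sketch contrasting a full-batch oracle (one exact gradient of the empirical risk per step) with a single-sample stochastic oracle; the least-squares objective and toy data are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(1000, 5)), rng.normal(size=1000)   # toy dataset
w = np.zeros(5)

def full_batch_grad(w):
    # Exact gradient of the empirical risk (mean squared error over all points).
    return X.T @ (X @ w - y) / len(y)

def single_sample_grad(w):
    # Stochastic gradient at one uniformly sampled data point.
    i = rng.integers(len(y))
    return X[i] * (X[i] @ w - y[i])

# A full-batch method touches every data point at each of its (few) iterations;
# SGD uses one point per step. The paper compares their generalization, not
# merely their optimization behavior.
for _ in range(100):
    w = w - 0.1 * full_batch_grad(w)
```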

Stochastic Bias-Reduced Gradient Methods

no code implementations • NeurIPS 2021 • Hilal Asi, Yair Carmon, Arun Jambulapati, Yujia Jin, Aaron Sidford

We develop a new primitive for stochastic optimization: a low-bias, low-cost estimator of the minimizer $x_\star$ of any Lipschitz strongly-convex function.

Stochastic Optimization
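
One standard way to turn a sequence of increasingly accurate but biased estimates (e.g., SGD run for more and more steps) into a low-bias estimator is randomized multilevel Monte Carlo; the sketch below shows that generic trick, which is in the spirit of, but not necessarily identical to, the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlmc_debias(estimate_at_level, max_level=10, p=0.5):
    # Randomized multilevel Monte Carlo debiasing: estimate_at_level(j) returns
    # an estimate using ~2^j units of work (e.g. the output of 2^j SGD steps).
    # Drawing a random level J and reweighting the telescoping difference makes
    # the expectation equal to that of the most accurate level, at much lower
    # expected cost.
    j = rng.geometric(p) - 1                  # P(J = j) = p * (1 - p)^j
    if j >= max_level:                        # lump the tail into the top level
        j, prob_j = max_level, (1 - p) ** max_level
    else:
        prob_j = p * (1 - p) ** j
    base = estimate_at_level(0)
    if j == 0:
        return base
    correction = estimate_at_level(j) - estimate_at_level(j - 1)
    return base + correction / prob_j
```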

Thinking Inside the Ball: Near-Optimal Minimization of the Maximal Loss

no code implementations • 4 May 2021 • Yair Carmon, Arun Jambulapati, Yujia Jin, Aaron Sidford

We characterize the complexity of minimizing $\max_{i\in[N]} f_i(x)$ for convex, Lipschitz functions $f_1,\ldots, f_N$.
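
For orientation, the simplest baseline for $\min_x \max_{i\in[N]} f_i(x)$ is the subgradient method, which steps along the gradient of whichever $f_i$ currently attains the maximum; a toy sketch with affine $f_i$ (data and step sizes are placeholders, and the paper's ball-oracle methods are substantially more refined):

```python
import numpy as np

rng = np.random.default_rng(0)
A, b = rng.normal(size=(50, 10)), rng.normal(size=50)   # f_i(x) = A[i] @ x + b[i]

def max_loss(x):
    return np.max(A @ x + b)

x = np.zeros(10)
for t in range(1, 501):
    i = int(np.argmax(A @ x + b))              # index attaining the maximum
    x = x - A[i] / np.sqrt(t)                  # subgradient step on max_i f_i(x)
print(f"max loss after 500 steps: {max_loss(x):.3f}")
```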

Large-Scale Methods for Distributionally Robust Optimization

1 code implementation • NeurIPS 2020 • Daniel Levy, Yair Carmon, John C. Duchi, Aaron Sidford

We propose and analyze algorithms for distributionally robust optimization of convex losses with conditional value at risk (CVaR) and $\chi^2$ divergence uncertainty sets.
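
The CVaR-at-level-$\alpha$ objective appearing here is the average of the worst $\alpha$-fraction of losses; a small sketch of that quantity on placeholder losses (the paper's algorithms optimize it, which this snippet does not do):

```python
import numpy as np

def cvar(losses, alpha=0.1):
    # Conditional value at risk: the mean of the worst alpha-fraction of losses.
    k = max(1, int(np.ceil(alpha * len(losses))))
    return np.sort(losses)[-k:].mean()

losses = np.random.default_rng(0).exponential(size=1000)   # toy per-example losses
print(f"mean loss: {losses.mean():.3f}   CVaR_0.1: {cvar(losses):.3f}")
```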

Coordinate Methods for Matrix Games

no code implementations • 17 Sep 2020 • Yair Carmon, Yujia Jin, Aaron Sidford, Kevin Tian

For linear regression with an elementwise nonnegative matrix, our guarantees improve on exact gradient methods by a factor of $\sqrt{\mathrm{nnz}(A)/(m+n)}$.

regression

Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations

no code implementations • 24 Jun 2020 • Yossi Arjevani, Yair Carmon, John C. Duchi, Dylan J. Foster, Ayush Sekhari, Karthik Sridharan

We design an algorithm which finds an $\epsilon$-approximate stationary point (with $\|\nabla F(x)\|\le \epsilon$) using $O(\epsilon^{-3})$ stochastic gradient and Hessian-vector products, matching guarantees that were previously available only under a stronger assumption of access to multiple queries with the same random seed.

Second-order methods · Stochastic Optimization
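
Hessian-vector products of the kind such algorithms assume can be computed without materializing the Hessian, e.g. by double backpropagation; a small autograd sketch with a placeholder quadratic test function:

```python
import torch

def hessian_vector_product(f, x, v):
    # Compute H(x) @ v by double backpropagation, without materializing H.
    x = x.detach().requires_grad_(True)
    (grad,) = torch.autograd.grad(f(x), x, create_graph=True)
    (hvp,) = torch.autograd.grad(grad, x, grad_outputs=v)
    return hvp

# Toy check on f(x) = 0.5 * x^T A x with symmetric A, whose Hessian is A.
A = torch.tensor([[2.0, 0.5], [0.5, 1.0]])
f = lambda x: 0.5 * x @ A @ x
print(hessian_vector_product(f, torch.zeros(2), torch.tensor([1.0, 0.0])))  # A[:, 0]
```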

Lower Bounds for Non-Convex Stochastic Optimization

no code implementations • 5 Dec 2019 • Yossi Arjevani, Yair Carmon, John C. Duchi, Dylan J. Foster, Nathan Srebro, Blake Woodworth

We lower bound the complexity of finding $\epsilon$-stationary points (with gradient norm at most $\epsilon$) using stochastic first-order methods.

Stochastic Optimization

Variance Reduction for Matrix Games

no code implementations • NeurIPS 2019 • Yair Carmon, Yujia Jin, Aaron Sidford, Kevin Tian

We present a randomized primal-dual algorithm that solves the problem $\min_{x} \max_{y} y^\top A x$ to additive error $\epsilon$ in time $\mathrm{nnz}(A) + \sqrt{\mathrm{nnz}(A)n}/\epsilon$, for matrix $A$ with larger dimension $n$ and $\mathrm{nnz}(A)$ nonzero entries.
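
For context, assuming the standard setup where $x$ and $y$ range over probability simplices, the classical baseline is simultaneous multiplicative-weights updates with iterate averaging; the sketch below implements that baseline (full matrix-vector products per step), not the paper's sampling-based variance-reduced method:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 100
A = rng.uniform(size=(n, m))      # payoff y^T A x, x in simplex(m), y in simplex(n)

x, y = np.full(m, 1 / m), np.full(n, 1 / n)
x_avg, y_avg = np.zeros(m), np.zeros(n)
eta, T = 0.1, 2000

for _ in range(T):
    # Multiplicative-weights (entropic mirror descent): x minimizes, y maximizes.
    x = x * np.exp(-eta * (A.T @ y))
    x /= x.sum()
    y = y * np.exp(eta * (A @ x))
    y /= y.sum()
    x_avg += x / T
    y_avg += y / T

gap = np.max(A @ x_avg) - np.min(A.T @ y_avg)   # duality gap of the averaged iterates
print(f"duality gap: {gap:.4f}")
```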

Unlabeled Data Improves Adversarial Robustness

4 code implementations • NeurIPS 2019 • Yair Carmon, Aditi Raghunathan, Ludwig Schmidt, Percy Liang, John C. Duchi

We demonstrate, theoretically and empirically, that adversarial robustness can significantly benefit from semisupervised learning.

Adversarial Robustness · Robust classification
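
The semi-supervised recipe here is self-training: fit an intermediate model on the labeled data, pseudo-label the unlabeled pool, then train the final model, with adversarially robust training, on the union. A minimal sketch of the pseudo-labeling steps with a placeholder classifier and synthetic data (the robust-training stage is omitted):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Placeholder data standing in for labeled / unlabeled pools.
X_lab, y_lab = rng.normal(size=(500, 20)), rng.integers(0, 2, size=500)
X_unlab = rng.normal(size=(5000, 20))

# 1) Fit an intermediate model on the labeled data only.
base = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)

# 2) Pseudo-label the unlabeled pool with the intermediate model.
y_pseudo = base.predict(X_unlab)

# 3) Train the final model on labeled + pseudo-labeled data (in the paper this
#    stage uses adversarially robust training rather than a plain fit).
X_all = np.concatenate([X_lab, X_unlab])
y_all = np.concatenate([y_lab, y_pseudo])
final = LogisticRegression(max_iter=1000).fit(X_all, y_all)
```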

A Rank-1 Sketch for Matrix Multiplicative Weights

no code implementations • 7 Mar 2019 • Yair Carmon, John C. Duchi, Aaron Sidford, Kevin Tian

We show that a simple randomized sketch of the matrix multiplicative weight (MMW) update enjoys (in expectation) the same regret bounds as MMW, up to a small constant factor.
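
For context, the exact matrix multiplicative weights iterate is a density matrix proportional to the matrix exponential of the accumulated (negated, scaled) losses; the dense sketch below computes that exact iterate and does not implement the paper's rank-1 randomized sketch:

```python
import numpy as np
from scipy.linalg import expm

def mmw_density(loss_matrices, eta=0.1):
    # Exact (dense) matrix multiplicative weights iterate: given symmetric loss
    # matrices M_1, ..., M_t, play W = exp(-eta * sum_s M_s), normalized to trace 1.
    W = expm(-eta * sum(loss_matrices))
    return W / np.trace(W)

rng = np.random.default_rng(0)
losses = [(M + M.T) / 2 for M in rng.normal(size=(3, 5, 5))]   # toy symmetric losses
W = mmw_density(losses)
print(np.trace(W))   # -> 1.0 (a density matrix)
```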

Analysis of Krylov Subspace Solutions of Regularized Non-Convex Quadratic Problems

no code implementations • NeurIPS 2018 • Yair Carmon, John C. Duchi

We provide convergence rates for Krylov subspace solutions to the trust-region and cubic-regularized (nonconvex) quadratic problems.
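
The trust-region subproblem in question is $\min_{\|x\|\le r} \frac{1}{2} x^\top A x + b^\top x$ with $A$ possibly indefinite. The sketch below is a dense reference solver via bisection on the Lagrange multiplier, just to make the problem concrete; the paper analyzes Krylov subspace (Lanczos / conjugate-gradient style) approximations, and the degenerate "hard case" is ignored here:

```python
import numpy as np

def trust_region_subproblem(A, b, radius, tol=1e-8):
    # Solve  min_{||x|| <= radius}  0.5 * x @ A @ x + b @ x  for symmetric A by
    # bisection on the multiplier lam >= max(0, -lambda_min(A)), using that
    # x(lam) = -(A + lam*I)^{-1} b has norm decreasing in lam.
    # (The degenerate "hard case" is ignored in this illustrative sketch.)
    I = np.eye(len(b))
    lo = max(0.0, -np.linalg.eigvalsh(A)[0]) + 1e-12
    x = np.linalg.solve(A + lo * I, -b)
    if np.linalg.norm(x) <= radius:
        return x                                   # interior solution
    hi = lo + 1.0
    while np.linalg.norm(np.linalg.solve(A + hi * I, -b)) > radius:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(np.linalg.solve(A + mid * I, -b)) > radius:
            lo = mid
        else:
            hi = mid
    return np.linalg.solve(A + hi * I, -b)

A = np.array([[1.0, 0.0], [0.0, -2.0]])            # indefinite toy instance
print(trust_region_subproblem(A, np.array([1.0, 1.0]), radius=1.0))
```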

“Convex Until Proven Guilty”: Dimension-Free Acceleration of Gradient Descent on Non-Convex Functions

no code implementations • ICML 2017 • Yair Carmon, John C. Duchi, Oliver Hinder, Aaron Sidford

We develop and analyze a variant of Nesterov’s accelerated gradient descent (AGD) for minimization of smooth non-convex functions.
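
For reference, the classical AGD iteration that the paper modifies, shown in its standard convex constant-step form (not the paper's "convex until proven guilty" variant), with a placeholder quadratic objective:

```python
import numpy as np

def nesterov_agd(grad, x0, L, iters=200):
    # Classical accelerated gradient descent for an L-smooth convex f:
    #   y_k     = x_k + (k - 1) / (k + 2) * (x_k - x_{k-1})   (extrapolation)
    #   x_{k+1} = y_k - grad(y_k) / L                          (gradient step)
    x_prev, x = x0.copy(), x0.copy()
    for k in range(1, iters + 1):
        y = x + (k - 1) / (k + 2) * (x - x_prev)
        x_prev, x = x, y - grad(y) / L
    return x

# Toy usage on the quadratic f(x) = 0.5 * ||diag(d) x||^2, smoothness L = max(d)^2.
d = np.array([10.0, 1.0, 0.1])
x_hat = nesterov_agd(lambda x: d ** 2 * x, np.ones(3), L=100.0)
```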

No bad local minima: Data independent training error guarantees for multilayer neural networks

no code implementations • 26 May 2016 • Daniel Soudry, Yair Carmon

We use smoothed analysis techniques to provide guarantees on the training loss of Multilayer Neural Networks (MNNs) at differentiable local minima.
