2 code implementations • 10 Mar 2022 • Mitchell Wortsman, Gabriel Ilharco, Samir Yitzhak Gadre, Rebecca Roelofs, Raphael Gontijo-Lopes, Ari S. Morcos, Hongseok Namkoong, Ali Farhadi, Yair Carmon, Simon Kornblith, Ludwig Schmidt
In this paper, we revisit the second step of this procedure in the context of fine-tuning large pre-trained models, where fine-tuned models often appear to lie in a single low error basin.
Ranked #1 on Image Classification on ImageNet V2 (using extra training data)
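The weight-averaging idea behind this entry (averaging the parameters of several fine-tuned models rather than picking one) can be sketched in a few lines. The toy `models` list and array shapes below are invented for illustration; the actual work averages the parameters of large fine-tuned networks:

```python
import numpy as np

def uniform_soup(state_dicts):
    """Elementwise average of several models' parameters (a 'uniform soup')."""
    keys = state_dicts[0].keys()
    return {k: np.mean([sd[k] for sd in state_dicts], axis=0) for k in keys}

# Three hypothetical fine-tuned models, each a dict of weight arrays.
models = [
    {"w": np.array([1.0, 2.0]), "b": np.array([0.0])},
    {"w": np.array([3.0, 4.0]), "b": np.array([1.0])},
    {"w": np.array([5.0, 6.0]), "b": np.array([2.0])},
]
soup = uniform_soup(models)
print(soup["w"])  # [3. 4.]
print(soup["b"])  # [1.]
```

Because the fine-tuned solutions appear to lie in a single low-error basin, their average remains a valid (and often better) model.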
1 code implementation • 4 Sep 2021 • Mitchell Wortsman, Gabriel Ilharco, Jong Wook Kim, Mike Li, Simon Kornblith, Rebecca Roelofs, Raphael Gontijo Lopes, Hannaneh Hajishirzi, Ali Farhadi, Hongseok Namkoong, Ludwig Schmidt
Compared to standard fine-tuning, WiSE-FT provides large accuracy improvements under distribution shift, while preserving high accuracy on the target distribution.
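The weight-space ensembling in WiSE-FT is a convex combination of the zero-shot and fine-tuned parameters. A minimal sketch, with hypothetical two-parameter models standing in for the real large networks:

```python
import numpy as np

def wise_ft(theta_zeroshot, theta_finetuned, alpha):
    """Linearly interpolate zero-shot and fine-tuned weights in weight space:
    theta = (1 - alpha) * theta_zeroshot + alpha * theta_finetuned."""
    return {k: (1 - alpha) * theta_zeroshot[k] + alpha * theta_finetuned[k]
            for k in theta_zeroshot}

zero_shot = {"w": np.array([0.0, 0.0])}
fine_tuned = {"w": np.array([2.0, 4.0])}
mixed = wise_ft(zero_shot, fine_tuned, alpha=0.5)
print(mixed["w"])  # [1. 2.]
```

The mixing coefficient `alpha` trades off target-distribution accuracy (fine-tuned end) against robustness under distribution shift (zero-shot end).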
We propose a novel imitation-learning-based algorithm that distills a TS policy into an explicit policy representation by performing posterior inference and optimization offline.
While modern large-scale datasets often consist of heterogeneous subpopulations (for example, multiple demographic groups or multiple text corpora), the standard practice of minimizing average loss fails to guarantee uniformly low losses across all subpopulations.
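The gap between average loss and worst-subpopulation loss can be made concrete with a small numpy sketch (the per-example losses and group labels below are invented for illustration):

```python
import numpy as np

# Hypothetical per-example losses and subpopulation labels;
# the last two examples come from a minority group.
losses = np.array([0.2, 0.3, 0.25, 1.4, 1.6])
groups = np.array([0, 0, 0, 1, 1])

average_loss = losses.mean()
group_losses = [losses[groups == g].mean() for g in np.unique(groups)]
worst_group_loss = max(group_losses)

print(round(average_loss, 3))      # 0.75
print(round(worst_group_loss, 3))  # 1.5
```

A model selected for low average loss (0.75 here) can still perform poorly on the minority subpopulation (loss 1.5); minimizing the worst-group loss instead guarantees uniform performance.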
We assess robustness of OPE methods under unobserved confounding by developing worst-case bounds on the performance of an evaluation policy.
Modern treatments for Type 1 diabetes (T1D) use devices known as artificial pancreata (APs), which combine an insulin pump with a continuous glucose monitor (CGM) operating in a closed-loop manner to control blood glucose levels.
While recent developments in autonomous vehicle (AV) technology highlight substantial progress, we lack tools for rigorous and scalable testing.
A common goal in statistics and machine learning is to learn models that can perform well against distributional shifts, such as latent heterogeneous subpopulations, unknown covariate shifts, or unmodeled temporal effects.
Machine learning models (e.g., speech recognizers) are usually trained to minimize average loss, which results in representation disparity: minority groups (e.g., non-native speakers) contribute less to the training objective and thus tend to suffer higher loss.
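One simple distributionally robust surrogate for this setting, usable when group labels are unobserved, is to average the losses of the worst-off fraction of examples (a CVaR-style objective). This is an illustrative stand-in, not the paper's exact divergence-based formulation, and the `eta` fraction and loss values below are assumptions:

```python
import numpy as np

def cvar_loss(losses, eta=0.4):
    """Average loss over the worst eta-fraction of examples (CVaR-style),
    an upper bound on any subpopulation's loss of mass >= eta."""
    k = max(1, int(np.ceil(eta * len(losses))))
    worst = np.sort(losses)[-k:]
    return worst.mean()

losses = np.array([0.1, 0.2, 0.2, 1.0, 1.5])
print(losses.mean())           # 0.6
print(cvar_loss(losses, 0.4))  # 1.25
```

Minimizing the robust objective (1.25 here) rather than the average (0.6) upweights exactly the high-loss examples that a hidden minority group contributes.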
Only using training data from a single source distribution, we propose an iterative procedure that augments the dataset with examples from a fictitious target domain that is "hard" under the current model.
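The iterative augmentation loop can be sketched with a toy linear least-squares model, where the input gradient of the loss is available in closed form. This is a simplified illustration under assumed data and step sizes; the actual procedure perturbs inputs under a distance constraint rather than with a single unconstrained gradient step:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit(X, y):
    # Least-squares fit of a linear model (stand-in for model training).
    return np.linalg.lstsq(X, y, rcond=None)[0]

def perturb(x, y, w, step=0.5):
    # One gradient-ascent step on the squared loss w.r.t. the input:
    # grad_x 0.5 * (w.x - y)^2 = (w.x - y) * w
    return x + step * (w @ x - y) * w

# Hypothetical source-domain data.
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=20)

for _ in range(3):  # a few augmentation rounds
    w = fit(X, y)
    X_hard = np.array([perturb(x, t, w) for x, t in zip(X, y)])
    X, y = np.vstack([X, X_hard]), np.concatenate([y, y])

print(X.shape)  # (160, 3)
```

Each round doubles the dataset with examples that are "hard" for the current model, approximating training against a fictitious shifted target domain.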
Neural networks are vulnerable to adversarial examples and researchers have proposed many heuristic attack and defense mechanisms.
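One canonical heuristic attack in this literature is the fast gradient sign method (FGSM): perturb the input by `eps` in the sign direction of the loss gradient. A minimal sketch on a toy linear model with squared loss (the weights, input, and `eps` below are invented for illustration):

```python
import numpy as np

def fgsm(x, y, w, eps=0.1):
    """Fast gradient sign method for a linear model with squared loss.
    grad_x 0.5 * (w.x - y)^2 = (w.x - y) * w; step eps along its sign."""
    grad = (w @ x - y) * w
    return x + eps * np.sign(grad)

w = np.array([1.0, -1.0])
x = np.array([1.0, 0.5])
y = 0.0
x_adv = fgsm(x, y, w, eps=0.1)

loss = 0.5 * (w @ x - y) ** 2
loss_adv = 0.5 * (w @ x_adv - y) ** 2
print(round(loss, 3), round(loss_adv, 3))  # 0.125 0.245
```

A tiny perturbation nearly doubles the loss; for deep networks the same one-step construction (on the network's input gradient) produces misclassified examples, which is what certified defenses aim to rule out.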
Standard forms of coordinate and stochastic gradient methods do not adapt to structure in data; their good behavior under random sampling is predicated on uniformity in data.
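Adapting the sampling distribution to the data, rather than sampling uniformly, can be illustrated with importance-sampled stochastic gradients: draw example `i` with probability `p_i` and rescale by `1 / (n * p_i)` to stay unbiased. The per-example gradients below are hypothetical one-dimensional values chosen so that one example dominates:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-example gradients (1-D for simplicity); one dominates.
grads = np.array([0.1, 0.2, 5.0, 0.3])
n = len(grads)

# Sample proportionally to gradient magnitude, with the 1/(n * p_i)
# correction that keeps the stochastic gradient unbiased.
p = np.abs(grads) / np.abs(grads).sum()
idx = rng.choice(n, size=100_000, p=p)
estimates = grads[idx] / (n * p[idx])

print(grads.mean())                # exact full gradient: 1.4
print(round(estimates.mean(), 3))  # 1.4
```

With all-positive gradients and sampling exactly proportional to magnitude, every rescaled sample equals the full gradient, so the estimator's variance collapses; uniform sampling would instead fluctuate heavily depending on whether the dominant example is drawn.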
We develop efficient solution methods for a robust empirical risk minimization problem designed to give calibrated confidence intervals on performance and provide optimal tradeoffs between bias and variance.
We study statistical inference and distributionally robust solution methods for stochastic optimization problems, focusing on confidence intervals for optimal values and solutions that achieve exact coverage asymptotically.
We develop an approach to risk minimization and stochastic optimization that provides a convex surrogate for variance, allowing near-optimal and computationally efficient trading between approximation and estimation error.
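The quantity being traded off can be sketched as an empirical risk plus a variance penalty, `mean + C * sqrt(Var / n)`. Note this plain penalty is nonconvex in the model parameters in general; the paper's contribution is a convex surrogate for it. The loss vectors and constant `C` below are invented for illustration:

```python
import numpy as np

def variance_regularized_risk(losses, C=1.0):
    """Empirical risk plus a variance penalty, an upper-confidence
    surrogate for the population risk: mean + C * sqrt(Var / n)."""
    n = len(losses)
    return losses.mean() + C * np.sqrt(losses.var(ddof=1) / n)

losses_a = np.array([0.5, 0.5, 0.5, 0.5])  # low variance
losses_b = np.array([0.0, 0.0, 0.0, 2.0])  # same mean, high variance

risk_a = variance_regularized_risk(losses_a)
risk_b = variance_regularized_risk(losses_b)
print(risk_a)  # 0.5
print(risk_b)  # 1.0
```

Both loss vectors have the same empirical mean (0.5), but the penalized objective prefers the low-variance model, trading a little approximation error for much lower estimation error.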