Search Results for author: Mitchell Wortsman

Found 26 papers, 21 papers with code

Replacing softmax with ReLU in Vision Transformers

no code implementations • 15 Sep 2023 • Mitchell Wortsman, Jaehoon Lee, Justin Gilmer, Simon Kornblith

Previous research observed accuracy degradation when replacing the attention softmax with a point-wise activation such as ReLU.
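
As a minimal sketch of the idea studied in this paper (the main variant replaces the softmax with a point-wise ReLU scaled by the inverse sequence length; other activations and scalings are also explored), ReLU attention in PyTorch might look like the following. This is illustrative, not the authors' exact implementation:

```python
import torch
import torch.nn.functional as F

def relu_attention(q, k, v):
    """Attention where softmax is replaced by a point-wise ReLU scaled by 1/seq_len.

    q, k, v: tensors of shape (batch, heads, seq_len, head_dim). Illustrative sketch only.
    """
    seq_len, head_dim = q.shape[-2], q.shape[-1]
    scores = q @ k.transpose(-2, -1) / head_dim ** 0.5  # (batch, heads, seq_len, seq_len)
    weights = F.relu(scores) / seq_len                  # no normalization across keys
    return weights @ v
```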

The Role of Pre-training Data in Transfer Learning

2 code implementations • 27 Feb 2023 • Rahim Entezari, Mitchell Wortsman, Olga Saukh, M. Moein Shariatnia, Hanie Sedghi, Ludwig Schmidt

We investigate the impact of pre-training data distribution on the few-shot and full fine-tuning performance using 3 pre-training methods (supervised, contrastive language-image and image-image), 7 pre-training datasets, and 9 downstream datasets.

Transfer Learning

Reproducible scaling laws for contrastive language-image learning

3 code implementations • CVPR 2023 • Mehdi Cherti, Romain Beaumont, Ross Wightman, Mitchell Wortsman, Gabriel Ilharco, Cade Gordon, Christoph Schuhmann, Ludwig Schmidt, Jenia Jitsev

To address these limitations, we investigate scaling laws for contrastive language-image pre-training (CLIP) with the public LAION dataset and the open-source OpenCLIP repository.

Ranked #1 on Zero-Shot Image Classification on Country211 (using extra training data)

Image Classification • Open Vocabulary Attribute Detection • +4
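
For reference, a minimal zero-shot classification sketch using the open-source OpenCLIP repository with a LAION-trained checkpoint follows; the model name and pretrained tag are illustrative, so consult the OpenCLIP documentation for the checkpoints actually studied in the paper:

```python
import torch
import open_clip
from PIL import Image

# Load a LAION-trained CLIP model via OpenCLIP (pretrained tag is illustrative).
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

image = preprocess(Image.open("example.jpg")).unsqueeze(0)
text = tokenizer(["a photo of a cat", "a photo of a dog"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
```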

Editing Models with Task Arithmetic

3 code implementations • 8 Dec 2022 • Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, Suchin Gururangan, Ludwig Schmidt, Hannaneh Hajishirzi, Ali Farhadi

Changing how pre-trained models behave -- e.g., improving their performance on a downstream task or mitigating biases learned during pre-training -- is a common practice when developing machine learning systems.

Negation
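
The paper edits models with "task vectors": the element-wise difference between fine-tuned and pre-trained weights, which can be added to improve performance on a task or negated to suppress a behavior. A minimal sketch of the arithmetic over PyTorch state dicts (function names are illustrative):

```python
import torch

def task_vector(pretrained_sd, finetuned_sd):
    """Task vector = fine-tuned weights minus pre-trained weights."""
    return {k: finetuned_sd[k] - pretrained_sd[k] for k in pretrained_sd}

def apply_task_vectors(pretrained_sd, task_vectors, scaling=1.0):
    """Edit a model by adding (scaling > 0) or negating (scaling < 0) task vectors."""
    edited = {k: v.clone() for k, v in pretrained_sd.items()}
    for vec in task_vectors:
        for k in edited:
            edited[k] = edited[k] + scaling * vec[k]
    return edited
```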

lo-fi: distributed fine-tuning without communication

no code implementations • 19 Oct 2022 • Mitchell Wortsman, Suchin Gururangan, Shen Li, Ali Farhadi, Ludwig Schmidt, Michael Rabbat, Ari S. Morcos

When fine-tuning DeiT-base and DeiT-large on ImageNet, this procedure matches accuracy in-distribution and improves accuracy under distribution shift compared to the baseline, which observes the same amount of data but communicates gradients at each step.

Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP

1 code implementation • 10 Aug 2022 • Thao Nguyen, Gabriel Ilharco, Mitchell Wortsman, Sewoong Oh, Ludwig Schmidt

Web-crawled datasets have enabled remarkable generalization capabilities in recent image-text models such as CLIP (Contrastive Language-Image pre-training) or Flamingo, but little is known about the dataset creation processes.

Patching open-vocabulary models by interpolating weights

1 code implementation • 10 Aug 2022 • Gabriel Ilharco, Mitchell Wortsman, Samir Yitzhak Gadre, Shuran Song, Hannaneh Hajishirzi, Simon Kornblith, Ali Farhadi, Ludwig Schmidt

We study model patching, where the goal is to improve accuracy on specific tasks without degrading accuracy on tasks where performance is already adequate.

Image Classification

Data Determines Distributional Robustness in Contrastive Language Image Pre-training (CLIP)

2 code implementations • 3 May 2022 • Alex Fang, Gabriel Ilharco, Mitchell Wortsman, Yuhao Wan, Vaishaal Shankar, Achal Dave, Ludwig Schmidt

Contrastively trained language-image models such as CLIP, ALIGN, and BASIC have demonstrated unprecedented robustness to multiple challenging natural distribution shifts.

Ranked #94 on Image Classification on ObjectNet (using extra training data)

Image Classification

CoWs on Pasture: Baselines and Benchmarks for Language-Driven Zero-Shot Object Navigation

1 code implementation • CVPR 2023 • Samir Yitzhak Gadre, Mitchell Wortsman, Gabriel Ilharco, Ludwig Schmidt, Shuran Song

To better evaluate L-ZSON, we introduce the Pasture benchmark, which considers finding uncommon objects, objects described by spatial and appearance attributes, and hidden objects described relative to visible objects.

Image Classification • Object Localization • +1

Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time

5 code implementations • 10 Mar 2022 • Mitchell Wortsman, Gabriel Ilharco, Samir Yitzhak Gadre, Rebecca Roelofs, Raphael Gontijo-Lopes, Ari S. Morcos, Hongseok Namkoong, Ali Farhadi, Yair Carmon, Simon Kornblith, Ludwig Schmidt

The conventional recipe for maximizing model accuracy is to (1) train multiple models with various hyperparameters and (2) pick the individual model which performs best on a held-out validation set, discarding the remainder.

Ranked #1 on Image Classification on ImageNet V2 (using extra training data)

Domain Generalization • Image Classification • +2
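
The simplest variant, the uniform soup, averages the weights of all fine-tuned models; a minimal sketch over PyTorch state dicts (the paper's greedy soup, which adds a model only if held-out accuracy improves, is omitted):

```python
import torch

def uniform_soup(state_dicts):
    """Average the parameters of several fine-tuned models sharing one architecture."""
    n = len(state_dicts)
    soup = {k: torch.zeros_like(v, dtype=torch.float32) for k, v in state_dicts[0].items()}
    for sd in state_dicts:
        for k in soup:
            soup[k] += sd[k].float() / n
    # Note: non-float buffers (e.g. BatchNorm counters) may need separate handling.
    return soup

# Usage: model.load_state_dict(uniform_soup([torch.load(p) for p in checkpoint_paths]))
```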

Robust fine-tuning of zero-shot models

3 code implementations • CVPR 2022 • Mitchell Wortsman, Gabriel Ilharco, Jong Wook Kim, Mike Li, Simon Kornblith, Rebecca Roelofs, Raphael Gontijo-Lopes, Hannaneh Hajishirzi, Ali Farhadi, Hongseok Namkoong, Ludwig Schmidt

Compared to standard fine-tuning, WiSE-FT provides large accuracy improvements under distribution shift, while preserving high accuracy on the target distribution.

Ranked #12 on Image Classification on ObjectNet (using extra training data)

Image Classification • Transfer Learning
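
WiSE-FT ensembles the zero-shot and fine-tuned models in weight space rather than output space; a minimal sketch over state dicts, where alpha is a mixing coefficient typically swept over [0, 1]:

```python
def wise_ft(zero_shot_sd, finetuned_sd, alpha=0.5):
    """Weight-space ensemble: linearly interpolate zero-shot and fine-tuned weights."""
    return {
        k: (1.0 - alpha) * zero_shot_sd[k] + alpha * finetuned_sd[k]
        for k in zero_shot_sd
    }
    # alpha = 0 recovers the zero-shot model, alpha = 1 the fine-tuned model.
```

The same linear interpolation of weights underlies the model-patching entry above.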

Learning Neural Network Subspaces

1 code implementation • 20 Feb 2021 • Mitchell Wortsman, Maxwell Horton, Carlos Guestrin, Ali Farhadi, Mohammad Rastegari

Recent observations have advanced our understanding of the neural network optimization landscape, revealing the existence of (1) paths of high accuracy containing diverse solutions and (2) wider minima offering improved performance.
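
One instance of the idea is training a line of networks: two sets of weights are learned jointly, with a random point on the segment between them used at each training step. A minimal PyTorch sketch of such a layer (illustrative, not the authors' implementation; bias omitted for brevity):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LineLinear(nn.Module):
    """Linear layer whose weights lie on a line between two trainable endpoints;
    sampling alpha each step trains the whole segment, not a single point."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.w0 = nn.Parameter(torch.empty(out_features, in_features))
        self.w1 = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.kaiming_uniform_(self.w0)
        nn.init.kaiming_uniform_(self.w1)

    def forward(self, x, alpha=None):
        if alpha is None:
            # Random point on the line during training, midpoint at evaluation.
            alpha = torch.rand(()).item() if self.training else 0.5
        weight = (1.0 - alpha) * self.w0 + alpha * self.w1
        return F.linear(x, weight)
```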

Deconstructing the Structure of Sparse Neural Networks

no code implementations • 30 Nov 2020 • Maxwell Van Gelder, Mitchell Wortsman, Kiana Ehsani

Although sparse neural networks have been studied extensively, the focus has been primarily on accuracy.

Supermasks in Superposition

2 code implementations • NeurIPS 2020 • Mitchell Wortsman, Vivek Ramanujan, Rosanne Liu, Aniruddha Kembhavi, Mohammad Rastegari, Jason Yosinski, Ali Farhadi

We present the Supermasks in Superposition (SupSup) model, capable of sequentially learning thousands of tasks without catastrophic forgetting.
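
SupSup keeps one fixed, randomly weighted network and learns a binary "supermask" per task; when the task identity is known at inference, applying that task's mask recovers the corresponding sub-network. A heavily simplified sketch (in the paper the masks are learned via per-weight scores; random masks stand in here only to show the mechanism):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SupermaskLinear(nn.Module):
    """Frozen random weights shared across tasks, with one binary mask per task.

    Illustrative only: real supermasks are learned (e.g., via per-weight scores);
    random placeholder masks are used here to show how a mask selects a sub-network.
    """

    def __init__(self, in_features, out_features, num_tasks):
        super().__init__()
        weight = torch.empty(out_features, in_features)
        nn.init.kaiming_uniform_(weight)
        self.register_buffer("weight", weight)  # frozen, never updated
        self.register_buffer(
            "masks", (torch.rand(num_tasks, out_features, in_features) > 0.5).float()
        )

    def forward(self, x, task_id):
        return F.linear(x, self.weight * self.masks[task_id])
```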
