no code implementations • 16 May 2024 • Joseph Cho, Cyril Zakka, Dhamanpreet Kaur, Rohan Shad, Ross Wightman, Akshay Chaudhari, William Hiesinger
Diffusion models have recently gained significant traction due to their ability to generate high-fidelity and diverse images and videos conditioned on text prompts.
4 code implementations • CVPR 2023 • Mehdi Cherti, Romain Beaumont, Ross Wightman, Mitchell Wortsman, Gabriel Ilharco, Cade Gordon, Christoph Schuhmann, Ludwig Schmidt, Jenia Jitsev
To address these limitations, we investigate scaling laws for contrastive language-image pre-training (CLIP) with the public LAION dataset and the open-source OpenCLIP repository.
Ranked #1 on Zero-Shot Image Classification on Country211 (using extra training data)
4 code implementations • NeurIPS 2022 Datasets and Benchmarks 2022 • Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, Patrick Schramowski, Srivatsa Kundurthy, Katherine Crowson, Ludwig Schmidt, Robert Kaczmarczyk, Jenia Jitsev
We show successful replication and fine-tuning of foundational models like CLIP, GLIDE and Stable Diffusion using the dataset, and discuss further experiments enabled with an openly available dataset of this scale.
13 code implementations • NeurIPS Workshop ImageNet_PPF 2021 • Ross Wightman, Hugo Touvron, Hervé Jégou
We share competitive training settings and pre-trained models in the timm open-source library, with the hope that they will serve as better baselines for future work.
Ranked #2 on Medical Image Classification on NCT-CRC-HE-100K
16 code implementations • 18 Jun 2021 • Andreas Steiner, Alexander Kolesnikov, Xiaohua Zhai, Ross Wightman, Jakob Uszkoreit, Lucas Beyer
Vision Transformers (ViT) have been shown to attain highly competitive performance for a wide range of vision applications, such as image classification, object detection and semantic image segmentation.