no code implementations • CVPR 2024 • Jiasen Lu, Christopher Clark, Sangho Lee, Zichen Zhang, Savya Khosla, Ryan Marten, Derek Hoiem, Aniruddha Kembhavi
We present Unified-IO 2 a multimodal and multi-skill unified model capable of following novel instructions.
1 code implementation • 28 Dec 2023 • Jiasen Lu, Christopher Clark, Sangho Lee, Zichen Zhang, Savya Khosla, Ryan Marten, Derek Hoiem, Aniruddha Kembhavi
We present Unified-IO 2, the first autoregressive multimodal model that is capable of understanding and generating image, text, audio, and action.
3 code implementations • NeurIPS 2023 • Samir Yitzhak Gadre, Gabriel Ilharco, Alex Fang, Jonathan Hayase, Georgios Smyrnis, Thao Nguyen, Ryan Marten, Mitchell Wortsman, Dhruba Ghosh, Jieyu Zhang, Eyal Orgad, Rahim Entezari, Giannis Daras, Sarah Pratt, Vivek Ramanujan, Yonatan Bitton, Kalyani Marathe, Stephen Mussmann, Richard Vencu, Mehdi Cherti, Ranjay Krishna, Pang Wei Koh, Olga Saukh, Alexander Ratner, Shuran Song, Hannaneh Hajishirzi, Ali Farhadi, Romain Beaumont, Sewoong Oh, Alex Dimakis, Jenia Jitsev, Yair Carmon, Vaishaal Shankar, Ludwig Schmidt
Multimodal datasets are a critical component in recent breakthroughs such as Stable Diffusion and GPT-4, yet their design does not receive the same research attention as model architectures or training algorithms.
1 code implementation • 28 Apr 2022 • Tanmay Gupta, Ryan Marten, Aniruddha Kembhavi, Derek Hoiem
Computer vision models excel at making predictions when the test distribution closely resembles the training distribution.
1 code implementation • 22 Nov 2021 • Morteza Rezanejad, Sidharth Gupta, Chandra Gummaluru, Ryan Marten, John Wilder, Michael Gruninger, Dirk B. Walther
Humans are excellent at perceiving illusory outlines.