1 code implementation • 10 Dec 2024 • Arijit Ray, Jiafei Duan, Ellis Brown, Reuben Tan, Dina Bashkirova, Rose Hendrix, Kiana Ehsani, Aniruddha Kembhavi, Bryan A. Plummer, Ranjay Krishna, Kuo-Hao Zeng, Kate Saenko
While many studies highlight that large multimodal language models (MLMs) struggle to reason about space, they focus only on static spatial relationships, not on dynamic awareness of motion and space, i.e., reasoning about how egocentric and object motions affect spatial relationships.
1 code implementation • 24 Jun 2024 • Shengbang Tong, Ellis Brown, Penghao Wu, Sanghyun Woo, Manoj Middepogu, Sai Charitha Akula, Jihan Yang, Shusheng Yang, Adithya Iyer, Xichen Pan, Ziteng Wang, Rob Fergus, Yann Lecun, Saining Xie
We introduce Cambrian-1, a family of multimodal LLMs (MLLMs) designed with a vision-centric approach.
1 code implementation • 5 Feb 2024 • Jihan Yang, Runyu Ding, Ellis Brown, Xiaojuan Qi, Saining Xie
There is a sensory gulf between the Earth that humans inhabit and the digital realms in which modern AI agents are created.
4 code implementations • ICCV 2023 • Alexander C. Li, Mihir Prabhudesai, Shivam Duggal, Ellis Brown, Deepak Pathak
Our generative approach to classification, which we call Diffusion Classifier, attains strong results on a variety of benchmarks and outperforms alternative methods of extracting knowledge from diffusion models.
Ranked #1 on Image Classification on ObjectNet (ImageNet classes)
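The Diffusion Classifier idea scores each candidate class by how well a class-conditional diffusion model predicts the noise that was added to the input, then picks the class with the lowest expected error. Below is a minimal sketch of that scoring loop; the `eps_model` interface and the cosine noise schedule are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def diffusion_classifier(x, class_labels, eps_model, n_samples=32, seed=0):
    """Return argmin_c E[||eps - eps_model(x_t, t, c)||^2].

    `eps_model(x_t, t, c)` is a stand-in for a pretrained conditional
    diffusion model's noise-prediction network (hypothetical interface).
    """
    rng = np.random.default_rng(seed)
    errors = {}
    for c in class_labels:
        total = 0.0
        for _ in range(n_samples):
            t = rng.uniform(0.05, 0.95)            # random diffusion time
            eps = rng.standard_normal(x.shape)     # true injected noise
            alpha = np.cos(0.5 * np.pi * t)        # toy cosine schedule
            x_t = alpha * x + np.sqrt(1 - alpha**2) * eps
            total += np.mean((eps - eps_model(x_t, t, c)) ** 2)
        errors[c] = total / n_samples              # Monte Carlo estimate
    return min(errors, key=errors.get)
```

In practice the expectation is estimated with shared noise samples across classes to reduce variance, and the winning class is the one whose conditioning best "explains" the noisy input.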
1 code implementation • 27 Feb 2023 • Alexander C. Li, Ellis Brown, Alexei A. Efros, Deepak Pathak
Modern vision models typically rely on fine-tuning general-purpose models pre-trained on large, static datasets.