1 code implementation • 24 Jun 2024 • Shengbang Tong, Ellis Brown, Penghao Wu, Sanghyun Woo, Manoj Middepogu, Sai Charitha Akula, Jihan Yang, Shusheng Yang, Adithya Iyer, Xichen Pan, Austin Wang, Rob Fergus, Yann LeCun, Saining Xie
We introduce Cambrian-1, a family of multimodal LLMs (MLLMs) designed with a vision-centric approach.
no code implementations • CVPR 2024 • Jiraphon Yenphraphai, Xichen Pan, Sainan Liu, Daniele Panozzo, Saining Xie
We present Image Sculpting, a new framework for editing 2D images by incorporating tools from 3D geometry and graphics.
1 code implementation • 4 Oct 2023 • Xichen Pan, Li Dong, Shaohan Huang, Zhiliang Peng, Wenhu Chen, Furu Wei
These limitations keep current subject-driven image generation methods far from the ultimate goal of "image as a foreign language in image generation."
1 code implementation • 19 Apr 2023 • Guanfang Dong, Chenqiu Zhao, Xichen Pan, Anup Basu
In this paper, we propose a method called Learning Temporal Distribution and Spatial Correlation (LTS) that has the potential to be a general solution for universal moving object segmentation.
1 code implementation • 20 Nov 2022 • Xichen Pan, Pengda Qin, Yuhong Li, Hui Xue, Wenhu Chen
Conditioned diffusion models have demonstrated state-of-the-art text-to-image synthesis capacity.
Ranked #1 on Story Continuation on VIST
1 code implementation • ACL 2022 • Xichen Pan, Peiyu Chen, Yichen Gong, Helong Zhou, Xinbing Wang, Zhouhan Lin
In particular, the audio and visual front-ends are trained on large-scale unimodal datasets. Components of both front-ends are then integrated into a larger multimodal framework, which learns to transcribe parallel audio-visual data into characters through a combination of CTC and seq2seq decoding.
Ranked #2 on Automatic Speech Recognition (ASR) on LRS2
Audio-Visual Speech Recognition, Automatic Speech Recognition (ASR)