no code implementations • 2 Jan 2024 • Jiraphon Yenphraphai, Xichen Pan, Sainan Liu, Daniele Panozzo, Saining Xie
We present Image Sculpting, a new framework for editing 2D images by incorporating tools from 3D geometry and graphics.
1 code implementation • 4 Oct 2023 • Xichen Pan, Li Dong, Shaohan Huang, Zhiliang Peng, Wenhu Chen, Furu Wei
These limitations keep them far from the ultimate goal of "image as a foreign language in image generation."
1 code implementation • 19 Apr 2023 • Guanfang Dong, Chenqiu Zhao, Xichen Pan, Anup Basu
In this paper, we propose a method called Learning Temporal Distribution and Spatial Correlation (LTS) that has the potential to be a general solution for universal moving object segmentation.
1 code implementation • 20 Nov 2022 • Xichen Pan, Pengda Qin, Yuhong Li, Hui Xue, Wenhu Chen
Conditioned diffusion models have demonstrated state-of-the-art text-to-image synthesis capacity.
Ranked #1 on Story Visualization on Pororo
1 code implementation • ACL 2022 • Xichen Pan, Peiyu Chen, Yichen Gong, Helong Zhou, Xinbing Wang, Zhouhan Lin
In particular, audio and visual front-ends are trained on large-scale unimodal datasets; we then integrate components of both front-ends into a larger multimodal framework that learns to transcribe parallel audio-visual data into characters through a combination of CTC and seq2seq decoding.
Ranked #2 on Automatic Speech Recognition (ASR) on LRS2
Audio-Visual Speech Recognition, Automatic Speech Recognition (ASR)