Search Results for author: Xichen Pan

Found 5 papers, 4 papers with code

Image Sculpting: Precise Object Editing with 3D Geometry Control

no code implementations • 2 Jan 2024 • Jiraphon Yenphraphai, Xichen Pan, Sainan Liu, Daniele Panozzo, Saining Xie

We present Image Sculpting, a new framework for editing 2D images by incorporating tools from 3D geometry and graphics.

Object

Kosmos-G: Generating Images in Context with Multimodal Large Language Models

1 code implementation • 4 Oct 2023 • Xichen Pan, Li Dong, Shaohan Huang, Zhiliang Peng, Wenhu Chen, Furu Wei

These limitations keep current methods far from the ultimate goal of "image as a foreign language in image generation."

Image Generation

Code repository stars: 18,319

Learning Temporal Distribution and Spatial Correlation Towards Universal Moving Object Segmentation

1 code implementation • 19 Apr 2023 • Guanfang Dong, Chenqiu Zhao, Xichen Pan, Anup Basu

In this paper, we propose a method called Learning Temporal Distribution and Spatial Correlation (LTS) that has the potential to be a general solution for universal moving object segmentation.

Object Segmentation and 1 more task

Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models

1 code implementation • 20 Nov 2022 • Xichen Pan, Pengda Qin, Yuhong Li, Hui Xue, Wenhu Chen

Conditioned diffusion models have demonstrated state-of-the-art text-to-image synthesis capacity.

Ranked #1 on Story Visualization on Pororo

Story Continuation, Story Visualization

Code repository stars: 174

Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition

1 code implementation • ACL 2022 • Xichen Pan, Peiyu Chen, Yichen Gong, Helong Zhou, Xinbing Wang, Zhouhan Lin

In particular, audio and visual front-ends are trained on large-scale unimodal datasets; components of both front-ends are then integrated into a larger multimodal framework that learns to transcribe parallel audio-visual data into characters through a combination of CTC and seq2seq decoding.

Ranked #2 on Automatic Speech Recognition (ASR) on LRS2

Audio-Visual Speech Recognition, Automatic Speech Recognition (ASR), and 7 more tasks
