no code implementations • 25 Apr 2024 • Haomiao Ni, Bernhard Egger, Suhas Lohit, Anoop Cherian, Ye Wang, Toshiaki Koike-Akino, Sharon X. Huang, Tim K. Marks
To guide video generation with the additional image input, we propose a "repeat-and-slide" strategy that modulates the reverse denoising process, allowing the frozen diffusion model to synthesize a video frame-by-frame starting from the provided image.
1 code implementation • 15 Apr 2024 • Aashish Anantha Ramakrishnan, Sharon X. Huang, Dongwon Lee
With Large Language Models (LLMs) achieving success in language and commonsense reasoning tasks, we explore the ability of different LLMs to identify and understand key subjects from abstractive captions.
no code implementations • 5 Nov 2023 • Haomiao Ni, Jiachen Liu, Yuan Xue, Sharon X. Huang
In this paper, we propose a novel 3D-aware talking-head video motion transfer network, Head3D, which fully exploits the subject appearance information by generating a visually-interpretable 3D canonical head from the 2D subject frames with a recurrent network.
no code implementations • 23 Sep 2023 • Rui Yu, Jiachen Liu, Zihan Zhou, Sharon X. Huang
In various applications, such as robotic navigation and remote visual assistance, expanding the field of view (FOV) of the camera proves beneficial for enhancing environmental perception.
1 code implementation • 8 Aug 2023 • Jiarong Ye, Haomiao Ni, Peng Jin, Sharon X. Huang, Yuan Xue
To further reduce the dependency on annotated data, we propose a synthetic augmentation method called HistoDiffusion, which can be pre-trained on large-scale unlabeled datasets and later applied to a small-scale labeled dataset for augmented training.
1 code implementation • CVPR 2023 • Haomiao Ni, Changhao Shi, Kai Li, Sharon X. Huang, Martin Renqiang Min
In this paper, we propose an approach for conditional image-to-video (cI2V) generation using novel latent flow diffusion models (LFDM) that synthesize an optical flow sequence in the latent space based on the given condition to warp the given image.
1 code implementation • 5 Jan 2023 • Aashish Anantha Ramakrishnan, Sharon X. Huang, Dongwon Lee
Advancements in Text-to-Image synthesis over recent years have focused more on improving the quality of generated samples on datasets with descriptive captions.
1 code implementation • 2 Oct 2022 • Haomiao Ni, Yihao Liu, Sharon X. Huang, Yuan Xue
The novel design of dual branches combines the strengths of deformation-grid-based transformation and warp-free generation for better identity preservation and robustness to occlusion in the synthesized videos.