Search Results for author: Gaoxiang Cong

Found 3 papers, 2 papers with code

StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing

no code implementations • 20 Feb 2024 • Gaoxiang Cong, Yuankai Qi, Liang Li, Amin Beheshti, Zhedong Zhang, Anton Van Den Hengel, Ming-Hsuan Yang, Chenggang Yan, Qingming Huang

It contains three main components: (1) A multimodal style adaptor operating at the phoneme level to learn pronunciation style from the reference audio, and generate intermediate representations informed by the facial emotion presented in the video; (2) An utterance-level style learning module, which guides both the mel-spectrogram decoding and the refining processes from the intermediate embeddings to improve the overall style expression; And (3) a phoneme-guided lip aligner to maintain lip sync.

Voice Cloning

Paper
Add Code

Learning to Dub Movies via Hierarchical Prosody Models

1 code implementation • CVPR 2023 • Gaoxiang Cong, Liang Li, Yuankai Qi, ZhengJun Zha, Qi Wu, Wenyu Wang, Bin Jiang, Ming-Hsuan Yang, Qingming Huang

Given a piece of text, a video clip and a reference audio, the movie dubbing (also known as visual voice clone V2C) task aims to generate speeches that match the speaker's emotion presented in the video using the desired speaker voice as reference.

Paper
Code

LS-GAN: Iterative Language-based Image Manipulation via Long and Short Term Consistency Reasoning

2 code implementations • journal 2022 • Gaoxiang Cong, Liang Li, Zhenhuan Liu, Yunbin Tu, Weijun Qin, Shenyuan Zhang, Chengang Yan, Wenyu Wang, Bin Jiang

To address this issue, we propose a novel Long and Short term consistency reasoning Generative Adversarial Network (LS-GAN), which enhances the awareness of previous objects with current instruction and better maintains the consistency with the user's intent under the continuous iterations.

Generative Adversarial Network Image Manipulation

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.