Search Results for author: Haogeng Liu

Found 5 papers, 0 papers with code

Video-Teller: Enhancing Cross-Modal Generation with Fusion and Decoupling

no code implementations · 8 Oct 2023 · Haogeng Liu, Qihang Fan, Tingkai Liu, Linjie Yang, Yunzhe Tao, Huaibo Huang, Ran He, Hongxia Yang

This paper proposes Video-Teller, a video-language foundation model that leverages multi-modal fusion and fine-grained modality alignment to significantly enhance the video-to-text generation task.

Text Generation · Video Summarization

Video-CSR: Complex Video Digest Creation for Visual-Language Models

no code implementations · 8 Oct 2023 · Tingkai Liu, Yunzhe Tao, Haogeng Liu, Qihang Fan, Ding Zhou, Huaibo Huang, Ran He, Hongxia Yang

We present a novel task and human-annotated dataset for evaluating the ability of visual-language models to generate captions and summaries for real-world video clips, which we call Video-CSR (Captioning, Summarization and Retrieval).

Retrieval · Sentence · +1

Boosting Fast and High-Quality Speech Synthesis with Linear Diffusion

no code implementations · 9 Jun 2023 · Haogeng Liu, Tao Wang, Jie Cao, Ran He, JianHua Tao

When decreasing the number of sampling steps (i.e., the number of line segments used to fit the path), the ease of fitting straight lines compared to curves allows us to generate higher-quality samples from random noise with fewer iterations.
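The straight-path intuition in the abstract above can be illustrated with a toy ODE sampler (a minimal sketch, not the paper's implementation; the path parameterizations and function names here are assumed purely for illustration):

```python
import numpy as np

def euler_sample(x1, velocity_fn, n_steps):
    """Integrate dx/dt = velocity_fn(x, t) from t=1 (noise) down to t=0."""
    x = x1.copy()
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = 1.0 - i * dt
        x = x - dt * velocity_fn(x, t)
    return x

rng = np.random.default_rng(0)
x0 = rng.normal(size=4)  # toy "data" sample
x1 = rng.normal(size=4)  # Gaussian noise

# Straight path x(t) = (1 - t) * x0 + t * x1 has constant velocity
# dx/dt = x1 - x0, so a SINGLE Euler step recovers x0 exactly.
straight_v = lambda x, t: x1 - x0
assert np.allclose(euler_sample(x1, straight_v, 1), x0)

# A curved path, e.g. x(t) = x0 + t**2 * (x1 - x0) with velocity
# dx/dt = 2 * t * (x1 - x0), is poorly fit by one big line segment;
# many small steps shrink the integration error.
curved_v = lambda x, t: 2 * t * (x1 - x0)
err_1 = np.linalg.norm(euler_sample(x1, curved_v, 1) - x0)
err_100 = np.linalg.norm(euler_sample(x1, curved_v, 100) - x0)
assert err_100 < err_1
```

The sketch only shows why straight trajectories tolerate coarse step counts; the paper's contribution is making the learned diffusion path (approximately) linear so that fast synthesis retains quality.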

Denoising · Speech Synthesis

UnifySpeech: A Unified Framework for Zero-shot Text-to-Speech and Voice Conversion

no code implementations · 10 Jan 2023 · Haogeng Liu, Tao Wang, Ruibo Fu, Jiangyan Yi, Zhengqi Wen, JianHua Tao

Text-to-speech (TTS) and voice conversion (VC) are two different tasks that both aim to generate a high-quality speaking voice, but from different input modalities.

Quantization · Voice Conversion
