Search Results for author: Chenxu Hu

Found 5 papers, 3 papers with code

Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models

1 code implementation NeurIPS 2023 Simian Luo, Chuanhao Yan, Chenxu Hu, Hang Zhao

Video-to-Audio (V2A) models have recently gained attention for their practical application of generating audio directly from silent videos, particularly in video/film production.

Audio Synthesis

ViP3D: End-to-end Visual Trajectory Prediction via 3D Agent Queries

1 code implementation CVPR 2023 Junru Gu, Chenxu Hu, Tianyuan Zhang, Xuanyao Chen, Yilun Wang, Yue Wang, Hang Zhao

In this work, we propose ViP3D, a query-based visual trajectory prediction pipeline that exploits rich information from raw videos to directly predict future trajectories of agents in a scene.

Autonomous Driving, Trajectory Prediction
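The core idea of a query-based pipeline like this can be illustrated with a minimal sketch: a set of learnable agent queries cross-attends over flattened visual features, and each attended query is decoded into a future trajectory. This is a hypothetical NumPy toy (single attention head, random stand-in weights such as `W_traj`), not ViP3D's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, features):
    # Each agent query attends over all visual feature tokens (single head).
    scores = queries @ features.T / np.sqrt(queries.shape[1])
    return softmax(scores) @ features

# 5 agent queries (64-dim) and 100 visual feature tokens from raw video
queries = rng.normal(size=(5, 64))
features = rng.normal(size=(100, 64))
# Random stand-in for a learned trajectory head: 12 future steps, (x, y) each
W_traj = rng.normal(size=(64, 12 * 2))

agent_feats = cross_attend(queries, features)       # (5, 64)
trajectories = (agent_feats @ W_traj).reshape(5, 12, 2)
print(trajectories.shape)  # (5, 12, 2): one 12-step (x, y) track per agent
```

The point of the sketch is the end-to-end shape of the pipeline: no intermediate tracking output, just queries in, per-agent trajectories out.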

Neural Dubber: Dubbing for Videos According to Scripts

no code implementations NeurIPS 2021 Chenxu Hu, Qiao Tian, Tingle Li, Yuping Wang, Yuxuan Wang, Hang Zhao

Neural Dubber is a multi-modal text-to-speech (TTS) model that utilizes the lip movement in the video to control the prosody of the generated speech.

FastSpeech 2: Fast and High-Quality End-to-End Text to Speech

34 code implementations ICLR 2021 Yi Ren, Chenxu Hu, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu

In this paper, we propose FastSpeech 2, which addresses the issues in FastSpeech and better solves the one-to-many mapping problem in TTS by 1) directly training the model with the ground-truth target instead of the simplified output from the teacher, and 2) introducing more variation information of speech (e.g., pitch, energy and more accurate duration) as conditional inputs.

Ranked #6 on Text-To-Speech Synthesis on LJSpeech (using extra training data)

Knowledge Distillation, Speech Synthesis +1
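The second idea from the abstract, conditioning on variation information, can be sketched as a toy variance adaptor: phoneme hidden states are expanded to frame level by their durations, then quantized pitch and energy embeddings are added. This is a minimal illustration with random stand-in embedding tables (`pitch_table`, `energy_table` are hypothetical names), not the FastSpeech 2 implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def length_regulator(hidden, durations):
    # Expand each phoneme hidden state by its duration (in frames).
    return np.repeat(hidden, durations, axis=0)

def variance_adaptor(hidden, pitch, energy, n_bins=256):
    # Add quantized per-frame pitch/energy embeddings to the hidden sequence.
    # The embedding tables are random stand-ins for learned parameters.
    dim = hidden.shape[1]
    pitch_table = rng.normal(size=(n_bins, dim))
    energy_table = rng.normal(size=(n_bins, dim))
    p_idx = np.clip((pitch * n_bins).astype(int), 0, n_bins - 1)
    e_idx = np.clip((energy * n_bins).astype(int), 0, n_bins - 1)
    return hidden + pitch_table[p_idx] + energy_table[e_idx]

# 4 phonemes with 8-dim hidden states
hidden = rng.normal(size=(4, 8))
durations = np.array([2, 3, 1, 4])            # ground-truth durations at train time
frames = length_regulator(hidden, durations)  # (10, 8) frame-level sequence
pitch = rng.uniform(size=10)                  # per-frame pitch, normalized to [0, 1)
energy = rng.uniform(size=10)                 # per-frame energy, normalized to [0, 1)
out = variance_adaptor(frames, pitch, energy)
print(out.shape)  # (10, 8)
```

Training against ground-truth durations, pitch and energy (rather than a teacher model's simplified outputs) is what lets the model learn the one-to-many mapping directly.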
