Search Results for author: Chenxu Hu

Found 6 papers, 3 papers with code

FastSpeech 2: Fast and High-Quality End-to-End Text to Speech

32 code implementations • ICLR 2021 • Yi Ren, Chenxu Hu, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu

In this paper, we propose FastSpeech 2, which addresses the issues in FastSpeech and better solves the one-to-many mapping problem in TTS by 1) directly training the model with the ground-truth target instead of the simplified output from a teacher, and 2) introducing more variation information of speech (e.g., pitch, energy, and more accurate duration) as conditional inputs.

Ranked #6 on Text-To-Speech Synthesis on LJSpeech (using extra training data)

Knowledge Distillation, Speech Synthesis +1
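The variance-conditioning idea summarized in this entry can be illustrated with a minimal PyTorch sketch: pitch and energy predictors regress per-position values from the encoder hidden states, ground-truth values condition the model during training while predictions are used at inference, and the chosen values are embedded and added back to the hidden sequence. All module names, sizes, and the bucketing scheme below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class VariancePredictor(nn.Module):
    """Predicts one scalar (e.g. pitch or energy) per input position.
    A minimal stand-in for the conv-stack predictors described in the paper."""
    def __init__(self, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, time, hidden) -> (batch, time)
        return self.net(h).squeeze(-1)

class VarianceAdaptor(nn.Module):
    """Injects pitch/energy information into the hidden sequence.
    Ground-truth values condition the model during training (the key
    FastSpeech 2 change); predicted values are used at inference.
    Duration prediction / length regulation are omitted for brevity."""
    def __init__(self, hidden: int = 256, n_bins: int = 256):
        super().__init__()
        self.pitch_pred = VariancePredictor(hidden)
        self.energy_pred = VariancePredictor(hidden)
        self.pitch_emb = nn.Embedding(n_bins, hidden)
        self.energy_emb = nn.Embedding(n_bins, hidden)
        self.n_bins = n_bins

    def _bucketize(self, x: torch.Tensor) -> torch.Tensor:
        # Map continuous values to embedding bins; a real implementation
        # would derive bin boundaries from dataset statistics.
        return (x.sigmoid() * (self.n_bins - 1)).long()

    def forward(self, h, pitch_target=None, energy_target=None):
        pitch = self.pitch_pred(h)     # trained with MSE against targets
        energy = self.energy_pred(h)
        use_pitch = pitch_target if pitch_target is not None else pitch
        use_energy = energy_target if energy_target is not None else energy
        h = h + self.pitch_emb(self._bucketize(use_pitch))
        h = h + self.energy_emb(self._bucketize(use_energy))
        return h, pitch, energy
```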

ViP3D: End-to-end Visual Trajectory Prediction via 3D Agent Queries

1 code implementation • CVPR 2023 • Junru Gu, Chenxu Hu, Tianyuan Zhang, Xuanyao Chen, Yilun Wang, Yue Wang, Hang Zhao

In this work, we propose ViP3D, a query-based visual trajectory prediction pipeline that exploits rich information from raw videos to directly predict future trajectories of agents in a scene.

Autonomous Driving, Trajectory Prediction
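As a rough illustration of what "query-based trajectory prediction" can look like, here is a hedged PyTorch sketch: a fixed set of learnable agent queries cross-attends to visual features, and an MLP head regresses future waypoints per query. The dimensions, single attention layer, and head design are assumptions for illustration, not the ViP3D architecture.

```python
import torch
import torch.nn as nn

class QueryTrajectoryHead(nn.Module):
    """Toy query-based trajectory head: learnable agent queries read the
    scene features via cross-attention, then an MLP regresses future
    (x, y) waypoints for each query."""
    def __init__(self, n_queries: int = 32, dim: int = 256, horizon: int = 12):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_queries, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.traj_head = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(),
            nn.Linear(dim, horizon * 2),  # (x, y) per future step
        )
        self.horizon = horizon

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, n_tokens, dim) visual features from a video backbone
        B = feats.size(0)
        q = self.queries.unsqueeze(0).expand(B, -1, -1)
        agent_feats, _ = self.attn(q, feats, feats)  # queries attend to the scene
        traj = self.traj_head(agent_feats)           # (batch, n_queries, horizon*2)
        return traj.view(B, -1, self.horizon, 2)     # future waypoints per agent

# Usage: QueryTrajectoryHead()(torch.randn(2, 900, 256)).shape
# -> torch.Size([2, 32, 12, 2])
```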

Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models

1 code implementation • NeurIPS 2023 • Simian Luo, Chuanhao Yan, Chenxu Hu, Hang Zhao

Video-to-Audio (V2A) models have recently gained attention for their practical application in generating audio directly from silent videos, particularly in video/film production.

Audio Synthesis
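For readers unfamiliar with the technique named in the title, the sketch below shows one generic conditional latent-diffusion training step (epsilon prediction with an MSE loss), with video features as the conditioning signal. The function names, noise schedule, and denoiser interface are assumptions, not Diff-Foley's actual training code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def diffusion_training_step(denoiser: nn.Module,
                            audio_latent: torch.Tensor,
                            video_feats: torch.Tensor,
                            n_steps: int = 1000) -> torch.Tensor:
    """One conditional latent-diffusion training step: noise the audio
    latent, then train the denoiser to recover the noise given the
    timestep and video features. Schedule and interfaces are illustrative."""
    B = audio_latent.size(0)
    t = torch.randint(0, n_steps, (B,), device=audio_latent.device)
    # Toy linear alpha-bar schedule purely for illustration.
    alpha_bar = 1.0 - (t.float() + 1.0) / n_steps
    alpha_bar = alpha_bar.view(B, *([1] * (audio_latent.dim() - 1)))
    noise = torch.randn_like(audio_latent)
    noisy = alpha_bar.sqrt() * audio_latent + (1.0 - alpha_bar).sqrt() * noise
    pred = denoiser(noisy, t, video_feats)  # video features condition the model
    return F.mse_loss(pred, noise)          # standard epsilon-prediction loss

# Usage with a stand-in denoiser that ignores its conditioning:
# loss = diffusion_training_step(lambda x, t, c: x,
#                                torch.randn(4, 8, 128), torch.randn(4, 32, 512))
```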

Neural Dubber: Dubbing for Videos According to Scripts

no code implementations • NeurIPS 2021 • Chenxu Hu, Qiao Tian, Tingle Li, Yuping Wang, Yuxuan Wang, Hang Zhao

Neural Dubber is a multi-modal text-to-speech (TTS) model that utilizes the lip movement in the video to control the prosody of the generated speech.
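The lip-to-prosody control described above can be illustrated with a minimal cross-attention sketch in PyTorch, where phoneme hidden states query lip-motion features; the resulting attention weights also give a soft text-to-video alignment. Layer sizes and the single-layer design are illustrative assumptions, not the paper's model.

```python
import torch
import torch.nn as nn

class LipProsodyFusion(nn.Module):
    """Text (phoneme) hidden states cross-attend to lip-motion features so
    that the video can influence the prosody/timing of the generated speech."""
    def __init__(self, dim: int = 256, lip_dim: int = 512):
        super().__init__()
        self.lip_proj = nn.Linear(lip_dim, dim)  # project lip features to model dim
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, phoneme_h: torch.Tensor, lip_feats: torch.Tensor):
        # phoneme_h: (batch, t_text, dim); lip_feats: (batch, t_video, lip_dim)
        lips = self.lip_proj(lip_feats)
        ctx, attn_w = self.attn(phoneme_h, lips, lips)  # text queries video
        # attn_w is a soft text-to-video alignment usable for duration control.
        return self.norm(phoneme_h + ctx), attn_w
```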

DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models

no code implementations • 19 Feb 2024 • Xiaoyu Tian, Junru Gu, Bailin Li, Yicheng Liu, Chenxu Hu, Yang Wang, Kun Zhan, Peng Jia, Xianpeng Lang, Hang Zhao

We introduce DriveVLM, an autonomous driving system leveraging Vision-Language Models (VLMs) for enhanced scene understanding and planning capabilities.

Autonomous Driving, Scene Understanding
