Search Results for author: Yuping Wang

Found 19 papers, 2 papers with code

VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing

no code implementations • 10 Apr 2024 • Philip Anastassiou, Zhenyu Tang, Kainan Peng, Dongya Jia, Jiaxin Li, Ming Tu, Yuping Wang, Yuxuan Wang, Mingbo Ma

We present VoiceShop, a novel speech-to-speech framework that can modify multiple attributes of speech, such as age, gender, accent, and speech style, in a single forward pass while preserving the input speaker's timbre.

Attribute

CMP: Cooperative Motion Prediction with Multi-Agent Communication

no code implementations • 26 Mar 2024 • Zhuoyuan Wu, Yuping Wang, Hengbo Ma, Zhaowei Li, Hang Qiu, Jiachen Li

Building on top of cooperative perception, this paper explores the feasibility and effectiveness of cooperative motion prediction.

Autonomous Vehicles • motion prediction

StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion

no code implementations • 19 Jan 2024 • Zhichao Wang, Yuanzhe Chen, Xinsheng Wang, Zhuo Chen, Lei Xie, Yuping Wang, Yuxuan Wang

Specifically, to enable streaming capability, StreamVoice employs a fully causal context-aware LM with a temporal-independent acoustic predictor, while alternately processing semantic and acoustic features at each time step of autoregression, which eliminates the dependence on complete source speech.
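
For intuition, here is a minimal sketch of the alternating semantic/acoustic autoregression described above, using a strictly causal backbone. The class, GRU backbone, and vocabulary sizes (ToyCausalLM, n_sem, n_ac) are illustrative assumptions, not StreamVoice's actual architecture.

```python
# Toy sketch of streaming, alternating semantic/acoustic token processing.
# Names and dimensions are assumptions, not the paper's implementation.
import torch
import torch.nn as nn

class ToyCausalLM(nn.Module):
    def __init__(self, n_sem=100, n_ac=1024, d=256):
        super().__init__()
        self.sem_emb = nn.Embedding(n_sem, d)       # semantic tokens (content)
        self.ac_emb = nn.Embedding(n_ac, d)         # acoustic tokens (codec codes)
        self.rnn = nn.GRU(d, d, batch_first=True)   # strictly causal backbone
        self.head = nn.Linear(d, n_ac)              # predicts the next acoustic token

    def step(self, sem_t, ac_prev, state):
        # Interleave: condition on the current semantic token and the previous
        # acoustic token, emit the next acoustic token without seeing the future.
        x = (self.sem_emb(sem_t) + self.ac_emb(ac_prev)).unsqueeze(1)
        h, state = self.rnn(x, state)
        return self.head(h[:, -1]).argmax(-1), state

lm = ToyCausalLM()
state, ac = None, torch.zeros(1, dtype=torch.long)
for sem in torch.randint(0, 100, (16,)):            # streaming semantic tokens
    ac, state = lm.step(sem.view(1), ac, state)     # one acoustic token per step
    print(int(ac))
```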

Language Modelling • Voice Conversion

Towards the Unification of Generative and Discriminative Visual Foundation Model: A Survey

no code implementations • 15 Dec 2023 • Xu Liu, Tong Zhou, Yuanxin Wang, Yuping Wang, Qinjingwen Cao, Weizhi Du, Yonghuan Yang, Junjun He, Yu Qiao, Yiqing Shen

The advent of foundation models, which are pre-trained on vast datasets, has ushered in a new era of computer vision, characterized by their robustness and remarkable zero-shot generalization capabilities.

Image Generation • Image Segmentation • +2

Novel View Synthesis from a Single RGBD Image for Indoor Scenes

no code implementations • 2 Nov 2023 • Congrui Hetang, Yuping Wang

In this paper, we propose an approach for synthesizing novel view images from a single RGBD (Red Green Blue-Depth) input.
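
For context, below is a generic depth-based reprojection sketch of the kind commonly used when rendering novel views from a single RGBD input; it is not necessarily the method proposed in this paper, and the intrinsics, pose, and depth values are dummy placeholders.

```python
# Generic RGBD reprojection: unproject pixels with depth and intrinsics K,
# apply a relative camera pose, and project into the novel view.
import numpy as np

H, W = 4, 6
K = np.array([[100., 0., W / 2], [0., 100., H / 2], [0., 0., 1.]])
depth = np.full((H, W), 2.0)                       # dummy depth map (meters)
R = np.eye(3); t = np.array([0.1, 0.0, 0.0])       # dummy relative camera pose

v, u = np.mgrid[0:H, 0:W]
pix = np.stack([u, v, np.ones_like(u)], -1).reshape(-1, 3).T  # homogeneous pixels
pts = (np.linalg.inv(K) @ pix) * depth.reshape(-1)            # 3D points, source camera
pts_new = R @ pts + t[:, None]                                # 3D points, target camera
proj = K @ pts_new
uv_new = (proj[:2] / proj[2]).T.reshape(H, W, 2)              # target pixel coordinates
print(uv_new[0, 0])  # where the top-left source pixel lands in the novel view
```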

Novel View Synthesis • Style Transfer

EqDrive: Efficient Equivariant Motion Forecasting with Multi-Modality for Autonomous Driving

no code implementations • 26 Oct 2023 • Yuping Wang, Jier Chen

Forecasting vehicular motions in autonomous driving requires a deep understanding of agent interactions and the preservation of motion equivariance under Euclidean geometric transformations.
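
To make the equivariance requirement concrete, the toy check below verifies that a predictor satisfies f(R·x + t) = R·f(x) + t for a planar rotation R and translation t. A constant-velocity extrapolator stands in for the forecaster; it only illustrates the property and is not the paper's model.

```python
# Check E(2)-equivariance of a toy trajectory forecaster.
import numpy as np

def predict(past):                                 # past: (T, 2) positions
    vel = past[-1] - past[-2]
    return past[-1] + vel * np.arange(1, 4)[:, None]   # 3 future steps

theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
t = np.array([5.0, -2.0])

past = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 1.0]])
lhs = predict(past @ R.T + t)                      # transform input, then predict
rhs = predict(past) @ R.T + t                      # predict, then transform output
print(np.allclose(lhs, rhs))                       # True: prediction is equivariant
```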

Motion Forecasting

Equivariant Map and Agent Geometry for Autonomous Driving Motion Prediction

no code implementations • 21 Oct 2023 • Yuping Wang, Jier Chen

This research introduces a solution that employs EqMotion, a theoretically geometrically equivariant and interaction-invariant motion prediction model for particles and humans, and integrates agent-equivariant high-definition (HD) map features for context-aware motion prediction in autonomous driving.

Autonomous Driving • motion prediction

MSM-VC: High-fidelity Source Style Transfer for Non-Parallel Voice Conversion by Multi-scale Style Modeling

no code implementations • 3 Sep 2023 • Zhichao Wang, Xinsheng Wang, Qicong Xie, Tao Li, Lei Xie, Qiao Tian, Yuping Wang

In addition to conveying the linguistic content from the source speech to the converted speech, maintaining the speaking style of the source speech also plays an important role in the voice conversion (VC) task; this is essential in many scenarios with highly expressive source speech, such as dubbing and data augmentation.

Data Augmentation • Disentanglement • +3

LM-VC: Zero-shot Voice Conversion via Speech Generation based on Language Models

no code implementations • 18 Jun 2023 • Zhichao Wang, Yuanzhe Chen, Lei Xie, Qiao Tian, Yuping Wang

An intuitive approach is to follow AudioLM: tokenize speech into semantic and acoustic tokens with HuBERT and SoundStream, respectively, and convert source semantic tokens to target acoustic tokens conditioned on the acoustic tokens of the target speaker.
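
As a rough illustration of that recipe, the sketch below mirrors the data flow only: the tokenizers and the language model are dummy stand-ins, not real HuBERT, SoundStream, or LM-VC APIs.

```python
# Data-flow sketch of the AudioLM-style voice conversion recipe above.
import numpy as np

def semantic_tokenize(wav):             # stand-in for a HuBERT-style tokenizer
    return np.abs(wav[::320]).astype(int) % 100

def acoustic_tokenize(wav):             # stand-in for a SoundStream-style codec
    return np.abs(wav[::160]).astype(int) % 1024

def lm_convert(src_sem, tgt_ac_prompt):
    # Stand-in for the language model: predict target acoustic tokens from the
    # source semantic tokens, conditioned on a target-speaker acoustic prompt.
    rng = np.random.default_rng(int(tgt_ac_prompt.sum()))
    return rng.integers(0, 1024, size=2 * len(src_sem))

src_wav = np.random.randn(16000) * 100           # 1 s of dummy source speech
tgt_wav = np.random.randn(16000) * 100           # dummy target-speaker speech

src_sem = semantic_tokenize(src_wav)             # content from the source
tgt_ac = acoustic_tokenize(tgt_wav)              # acoustic prompt from the target
converted_ac = lm_convert(src_sem, tgt_ac)       # tokens for a codec decoder
print(len(src_sem), len(converted_ac))
```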

Audio Generation • Disentanglement • +2

Multi-level Temporal-channel Speaker Retrieval for Zero-shot Voice Conversion

no code implementations • 12 May 2023 • Zhichao Wang, Liumeng Xue, Qiuqiang Kong, Lei Xie, Yuanzhe Chen, Qiao Tian, Yuping Wang

Specifically, to flexibly adapt to the dynamically varying speaker characteristics along the temporal and channel axes of speech, we propose a novel fine-grained speaker modeling method, called temporal-channel retrieval (TCR), to find out when and where speaker information appears in speech.
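
A loose sketch of such temporal-then-channel attention over reference-speech features is given below; the module names, shapes, and pooling choices are assumptions for illustration and do not reproduce the paper's TCR design.

```python
# Toy attention-style retrieval over the temporal and channel axes of
# reference-speech features, loosely following the TCR idea above.
import torch
import torch.nn as nn

class ToyTemporalChannelRetrieval(nn.Module):
    def __init__(self, d=80):
        super().__init__()
        self.time_attn = nn.Linear(d, 1)    # scores "when" speaker info appears
        self.chan_attn = nn.Linear(d, d)    # scores "where" (which channels)

    def forward(self, ref):                 # ref: (B, T, d) reference features
        w_t = torch.softmax(self.time_attn(ref), dim=1)   # (B, T, 1)
        pooled = (w_t * ref).sum(dim=1)                    # temporal retrieval
        w_c = torch.sigmoid(self.chan_attn(pooled))        # channel-wise gates
        return w_c * pooled                                 # (B, d) speaker embedding

spk = ToyTemporalChannelRetrieval()(torch.randn(2, 120, 80))
print(spk.shape)   # torch.Size([2, 80])
```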

Disentanglement • Retrieval • +2

Zero-Shot Accent Conversion using Pseudo Siamese Disentanglement Network

no code implementations • 12 Dec 2022 • Dongya Jia, Qiao Tian, Kainan Peng, Jiaxin Li, Yuanzhe Chen, Mingbo Ma, Yuping Wang, Yuxuan Wang

The goal of accent conversion (AC) is to convert the accent of speech into the target accent while preserving the content and speaker identity.

Data Augmentation • Disentanglement

Delivering Speaking Style in Low-resource Voice Conversion with Multi-factor Constraints

no code implementations • 16 Nov 2022 • Zhichao Wang, Xinsheng Wang, Lei Xie, Yuanzhe Chen, Qiao Tian, Yuping Wang

Conveying the linguistic content and maintaining the source speech's speaking style, such as intonation and emotion, are essential in voice conversion (VC).

Voice Conversion

Neural Dubber: Dubbing for Videos According to Scripts

no code implementations • NeurIPS 2021 • Chenxu Hu, Qiao Tian, Tingle Li, Yuping Wang, Yuxuan Wang, Hang Zhao

Neural Dubber is a multi-modal text-to-speech (TTS) model that utilizes the lip movement in the video to control the prosody of the generated speech.

Joint framework with deep feature distillation and adaptive focal loss for weakly supervised audio tagging and acoustic event detection

no code implementations • 23 Mar 2021 • Yunhao Liang, Yanhua Long, Yijie Li, Jiaen Liang, Yuping Wang

A good joint training framework is very helpful for improving the performance of weakly supervised audio tagging (AT) and acoustic event detection (AED) simultaneously.
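
For reference, a minimal multi-label focal loss of the kind used in such audio tagging setups is sketched below; the paper's adaptive variant presumably modulates the focusing term further, which is not reproduced here.

```python
# Standard binary (multi-label) focal loss: down-weight easy examples
# by (1 - p_t)^gamma so training focuses on hard ones.
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()

logits = torch.randn(4, 10)                     # 4 clips, 10 audio-event classes
targets = torch.randint(0, 2, (4, 10)).float()
print(float(focal_loss(logits, targets)))
```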

Audio Tagging • Event Detection
