Search Results for author: Heng Lu

Found 21 papers, 6 papers with code

PP-MeT: a Real-world Personalized Prompt based Meeting Transcription System

no code implementations28 Sep 2023 Xiang Lyu, Yuhang Cao, Qing Wang, JingJing Yin, Yuguang Yang, Pengpeng Zou, Yanni Hu, Heng Lu

Speaker-attributed automatic speech recognition (SA-ASR) improves the accuracy and applicability of multi-speaker ASR systems in real-world scenarios by assigning speaker labels to transcribed texts.
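
The label-assignment step this snippet describes can be illustrated with a toy function that attaches a speaker to each timed word via maximal overlap with diarization turns. The function name and data shapes are hypothetical, not part of the PP-MeT system.

```python
def assign_speakers(words, turns):
    """Attach a speaker label to each timed word by picking the
    diarization turn with maximal time overlap (a simple illustration
    of the SA-ASR label-assignment idea, not the paper's system)."""
    out = []
    for w_start, w_end, text in words:
        best, best_ov = "unk", 0.0
        for t_start, t_end, spk in turns:
            ov = min(w_end, t_end) - max(w_start, t_start)
            if ov > best_ov:
                best, best_ov = spk, ov
        out.append((text, best))
    return out

turns = [(0.0, 2.0, "spk1"), (2.0, 4.0, "spk2")]
words = [(0.2, 0.6, "hello"), (2.5, 3.0, "world")]
print(assign_speakers(words, turns))  # [('hello', 'spk1'), ('world', 'spk2')]
```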

Action Detection Activity Detection +3

PromptVC: Flexible Stylistic Voice Conversion in Latent Space Driven by Natural Language Prompts

no code implementations17 Sep 2023 Jixun Yao, Yuguang Yang, Yi Lei, Ziqian Ning, Yanni Hu, Yu Pan, JingJing Yin, Hongbin Zhou, Heng Lu, Lei Xie

In this study, we propose PromptVC, a novel style voice conversion approach that employs a latent diffusion model to generate a style vector driven by natural language prompts.

Voice Conversion

DiaCorrect: Error Correction Back-end For Speaker Diarization

1 code implementation15 Sep 2023 Jiangyu Han, Federico Landini, Johan Rohdin, Mireia Diez, Lukas Burget, Yuhang Cao, Heng Lu, Jan Cernocky

In this work, we propose an error correction framework, named DiaCorrect, to refine the output of a diarization system in a simple yet effective way.
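
DiaCorrect itself is a learned error-correction back-end, but the general idea of refining a diarization system's raw output can be sketched with a much simpler stand-in: median-smoothing per-speaker activity probabilities to suppress single-frame speaker flips. The function, window size, and threshold below are illustrative assumptions, not the paper's model.

```python
import numpy as np

def smooth_activity(probs, win=3):
    """Median-filter per-speaker activity probabilities over time to
    remove spurious single-frame speaker flips (a crude stand-in for
    a learned correction back-end)."""
    probs = np.asarray(probs, dtype=float)
    pad = win // 2
    padded = np.pad(probs, ((pad, pad), (0, 0)), mode="edge")
    out = np.empty_like(probs)
    for t in range(probs.shape[0]):
        out[t] = np.median(padded[t:t + win], axis=0)
    return out

# frames x speakers; one spurious frame where speaker 1 flickers on
raw = np.array([[0.9, 0.1],
                [0.9, 0.8],   # likely a diarization error
                [0.9, 0.1],
                [0.9, 0.1]])
clean = smooth_activity(raw)
print((clean[:, 1] > 0.5).any())  # → False
```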

Automatic Speech Recognition speaker-diarization +3

MSAC: Multiple Speech Attribute Control Method for Reliable Speech Emotion Recognition

no code implementations8 Aug 2023 Yu Pan, Yuguang Yang, Yuheng Huang, Jixun Yao, JingJing Yin, Yanni Hu, Heng Lu, Lei Ma, Jianjun Zhao

Despite notable progress, speech emotion recognition (SER) remains challenging due to the intricate and ambiguous nature of speech emotion, particularly in in-the-wild conditions.

Attribute Cross-corpus +2

METTS: Multilingual Emotional Text-to-Speech by Cross-speaker and Cross-lingual Emotion Transfer

no code implementations29 Jul 2023 Xinfa Zhu, Yi Lei, Tao Li, Yongmao Zhang, Hongbin Zhou, Heng Lu, Lei Xie

However, such data-efficient approaches have largely ignored the emotional aspects of synthesized speech because cross-speaker, cross-lingual emotion transfer is difficult: the heavy entanglement of speaker timbre, emotion, and language factors in the speech signal causes a system to produce cross-lingual synthetic speech with an undesired foreign accent and weak emotional expressiveness.

Disentanglement Quantization +1

GEmo-CLAP: Gender-Attribute-Enhanced Contrastive Language-Audio Pretraining for Accurate Speech Emotion Recognition

no code implementations13 Jun 2023 Yu Pan, Yanni Hu, Yuguang Yang, Wen Fei, Jixun Yao, Heng Lu, Lei Ma, Jianjun Zhao

Contrastive cross-modality pretraining has recently exhibited impressive success in diverse fields, yet there has been limited research on its merits for speech emotion recognition (SER).
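
The contrastive cross-modality pretraining this entry refers to belongs to the CLIP/CLAP family of objectives. A generic symmetric InfoNCE-style loss between paired audio and text embeddings might look like the sketch below; it is not the paper's exact GEmo-CLAP loss, and the shapes and temperature are assumptions.

```python
import numpy as np

def contrastive_loss(audio_emb, text_emb, temp=0.1):
    """Symmetric contrastive (InfoNCE-style) loss between paired audio
    and text embeddings: matched pairs sit on the diagonal of the
    similarity matrix and should score highest (a generic sketch)."""
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = a @ t.T / temp
    labels = np.arange(len(a))

    def xent(l):
        # cross-entropy with the diagonal as the correct class
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    return 0.5 * (xent(logits) + xent(logits.T))

a = np.eye(4)  # toy embeddings: each audio matches the same-index text
print(contrastive_loss(a, a) < contrastive_loss(a, np.roll(a, 1, axis=0)))  # → True
```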

Attribute Contrastive Learning +3

Improving Cross-lingual Speech Synthesis with Triplet Training Scheme

no code implementations22 Feb 2022 Jianhao Ye, Hongbin Zhou, Zhiba Su, Wendi He, Kaimeng Ren, Lin Li, Heng Lu

Recent advances in cross-lingual text-to-speech (TTS) made it possible to synthesize speech in a language foreign to a monolingual speaker.

Speech Synthesis

The USTC-Ximalaya system for the ICASSP 2022 multi-channel multi-party meeting transcription (M2MeT) challenge

no code implementations10 Feb 2022 Maokui He, Xiang Lv, Weilin Zhou, JingJing Yin, Xiaoqi Zhang, Yuxuan Wang, Shutong Niu, Yuhang Cao, Heng Lu, Jun Du, Chin-Hui Lee

We propose two improvements to target-speaker voice activity detection (TS-VAD), the core component in our proposed speaker diarization system that was submitted to the 2022 Multi-Channel Multi-Party Meeting Transcription (M2MeT) challenge.
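
TS-VAD predicts, per frame, whether each target speaker is active. The real component is a neural network conditioned on enrolled speaker profiles; as a minimal illustration of the interface only, the toy rule below thresholds cosine similarity between frame embeddings and a target-speaker embedding. All names and values are hypothetical.

```python
import numpy as np

def ts_vad(frames, target, thresh=0.7):
    """Toy target-speaker VAD: mark a frame active when its embedding's
    cosine similarity with the target-speaker embedding exceeds a
    threshold (real TS-VAD uses a neural model, not this rule)."""
    frames = np.asarray(frames, dtype=float)
    target = np.asarray(target, dtype=float)
    sims = frames @ target / (
        np.linalg.norm(frames, axis=1) * np.linalg.norm(target) + 1e-8)
    return sims > thresh

target = np.array([1.0, 0.0])
frames = np.array([[0.9, 0.1],    # target speaking
                   [0.0, 1.0],    # other speaker
                   [0.8, 0.2]])   # target speaking
print(ts_vad(frames, target))     # active on frames 0 and 2
```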

Action Detection Activity Detection +2

Imaging vibrations of locally gated, electromechanical few layer graphene resonators with a moving vacuum enclosure

no code implementations4 Jan 2021 Heng Lu, Chen Yang, Ye Tian, Jun Lu, Fanqi Xu, FengNan Chen, Yan Ying, Kevin G. Schädler, Chinhua Wang, Frank H. L. Koppens, Antoine Reserbat-Plantey, Joel Moser

With this enclosure we characterize the lowest-frequency mode of a few-layer graphene (FLG) resonator by measuring its frequency response as a function of position on the membrane.

Mesoscale and Nanoscale Physics

Phonetic Posteriorgrams based Many-to-Many Singing Voice Conversion via Adversarial Training

1 code implementation3 Dec 2020 Haohan Guo, Heng Lu, Na Hu, Chunlei Zhang, Shan Yang, Lei Xie, Dan Su, Dong Yu

In order to make timbre conversion more stable and controllable, speaker embedding is further decomposed to the weighted sum of a group of trainable vectors representing different timbre clusters.
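
The decomposition described above, a speaker embedding expressed as a weighted sum over a bank of timbre-cluster vectors, can be sketched as follows. In the paper both the weights and the cluster vectors are trainable; here they are fixed toy values, and all names are illustrative.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def timbre_embedding(logits, cluster_vectors):
    """Compose a speaker embedding as a softmax-weighted sum of a bank
    of timbre-cluster vectors (trainable in the paper; toy values
    here)."""
    w = softmax(np.asarray(logits, dtype=float))
    return w @ np.asarray(cluster_vectors, dtype=float)

# 3 timbre clusters, 4-dim embeddings (toy sizes)
clusters = np.eye(3, 4)
emb = timbre_embedding([2.0, 0.0, 0.0], clusters)
print(emb.shape)  # (4,)
```

Because the softmax weights sum to one, the result stays a convex combination of the cluster vectors, which is what makes the conversion controllable by nudging the weights.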

Audio Generation Disentanglement +1

TFGAN: Time and Frequency Domain Based Generative Adversarial Network for High-fidelity Speech Synthesis

1 code implementation24 Nov 2020 Qiao Tian, Yi Chen, Zewang Zhang, Heng Lu, LingHui Chen, Lei Xie, Shan Liu

On the one hand, we propose to discriminate the ground-truth waveform from the synthetic one in the frequency domain, rather than only in the time domain, to provide stronger consistency guarantees.
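
A frequency-domain consistency term of the kind TFGAN adds alongside time-domain discrimination can be approximated with an L1 distance between magnitude spectrograms. The following is a hand-rolled numpy sketch, not the paper's exact objective; FFT size and hop length are arbitrary toy choices.

```python
import numpy as np

def stft_mag(x, n_fft=64, hop=16):
    """Magnitude spectrogram via framed, Hann-windowed FFT."""
    frames = [x[i:i + n_fft] * np.hanning(n_fft)
              for i in range(0, len(x) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1))

def spectral_l1(real, fake):
    """Frequency-domain consistency term: L1 distance between the
    magnitude spectrograms of real and synthetic waveforms (a stand-in
    for TFGAN's frequency-domain criterion)."""
    return np.mean(np.abs(stft_mag(real) - stft_mag(fake)))

t = np.linspace(0, 1, 512, endpoint=False)
real = np.sin(2 * np.pi * 8 * t)
close = np.sin(2 * np.pi * 8 * t + 0.1)   # small phase shift only
far = np.sin(2 * np.pi * 24 * t)          # wrong frequency
print(spectral_l1(real, close) < spectral_l1(real, far))  # → True
```

Note the phase-shifted signal scores well even though it differs sample-by-sample in the time domain, which is exactly the consistency the frequency-domain view buys.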

Generative Adversarial Network Speech Synthesis

Peking Opera Synthesis via Duration Informed Attention Network

no code implementations7 Aug 2020 Yusong Wu, Shengchen Li, Chengzhu Yu, Heng Lu, Chao Weng, Liqiang Zhang, Dong Yu

In this work, we propose to deal with this issue and synthesize expressive Peking Opera singing from the music score based on the Duration Informed Attention Network (DurIAN) framework.

Singing Voice Synthesis

AdaDurIAN: Few-shot Adaptation for Neural Text-to-Speech with DurIAN

no code implementations12 May 2020 Zewang Zhang, Qiao Tian, Heng Lu, Ling-Hui Chen, Shan Liu

This paper investigates how to leverage a DurIAN-based average model to enable a new speaker to have both accurate pronunciation and fluent cross-lingual speaking with very limited monolingual data.

Few-Shot Learning

Learning Singing From Speech

no code implementations20 Dec 2019 Liqiang Zhang, Chengzhu Yu, Heng Lu, Chao Weng, Yusong Wu, Xiang Xie, Zijin Li, Dong Yu

The proposed algorithm first integrates speech and singing synthesis into a unified framework, then learns universal speaker embeddings that are shareable between the speech and singing synthesis tasks.

Speech Synthesis Voice Conversion

PitchNet: Unsupervised Singing Voice Conversion with Pitch Adversarial Network

no code implementations4 Dec 2019 Chengqi Deng, Chengzhu Yu, Heng Lu, Chao Weng, Dong Yu

However, the converted singing voice can easily go out of key, showing that the existing approach cannot model pitch information precisely.

Music Generation Translation +1

DurIAN: Duration Informed Attention Network For Multimodal Synthesis

4 code implementations4 Sep 2019 Chengzhu Yu, Heng Lu, Na Hu, Meng Yu, Chao Weng, Kun Xu, Peng Liu, Deyi Tuo, Shiyin Kang, Guangzhi Lei, Dan Su, Dong Yu

In this paper, we present a generic and robust multimodal synthesis system that produces highly natural speech and facial expression simultaneously.

Speech Synthesis

Deep Feed-forward Sequential Memory Networks for Speech Synthesis

no code implementations26 Feb 2018 Mengxiao Bi, Heng Lu, Shiliang Zhang, Ming Lei, Zhijie Yan

The Bidirectional LSTM (BLSTM) RNN based speech synthesis system is among the best parametric Text-to-Speech (TTS) systems in terms of the naturalness of generated speech, especially in prosody.

Speech Recognition +1
