Search Results for author: Heng Lu

Found 21 papers, 6 papers with code

PP-MeT: a Real-world Personalized Prompt based Meeting Transcription System

no code implementations28 Sep 2023 Xiang Lyu, Yuhang Cao, Qing Wang, JingJing Yin, Yuguang Yang, Pengpeng Zou, Yanni Hu, Heng Lu

Speaker-attributed automatic speech recognition (SA-ASR) improves the accuracy and applicability of multi-speaker ASR systems in real-world scenarios by assigning speaker labels to transcribed texts.
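
The label-assignment step this snippet describes can be illustrated with a toy function that attaches a speaker to each timed word via maximal overlap with diarization turns. The function name and data shapes are hypothetical, not part of the PP-MeT system.

```python
def assign_speakers(words, turns):
    """Attach a speaker label to each timed word by picking the
    diarization turn with maximal time overlap (a simple illustration
    of the SA-ASR label-assignment idea, not the paper's system)."""
    out = []
    for w_start, w_end, text in words:
        best, best_ov = "unk", 0.0
        for t_start, t_end, spk in turns:
            ov = min(w_end, t_end) - max(w_start, t_start)
            if ov > best_ov:
                best, best_ov = spk, ov
        out.append((text, best))
    return out

turns = [(0.0, 2.0, "spk1"), (2.0, 4.0, "spk2")]
words = [(0.2, 0.6, "hello"), (2.5, 3.0, "world")]
print(assign_speakers(words, turns))  # [('hello', 'spk1'), ('world', 'spk2')]
```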

Action Detection Activity Detection +3

PromptVC: Flexible Stylistic Voice Conversion in Latent Space Driven by Natural Language Prompts

no code implementations17 Sep 2023 Jixun Yao, Yuguang Yang, Yi Lei, Ziqian Ning, Yanni Hu, Yu Pan, JingJing Yin, Hongbin Zhou, Heng Lu, Lei Xie

In this study, we propose PromptVC, a novel style voice conversion approach that employs a latent diffusion model to generate a style vector driven by natural language prompts.

Voice Conversion

DiaCorrect: Error Correction Back-end For Speaker Diarization

1 code implementation15 Sep 2023 Jiangyu Han, Federico Landini, Johan Rohdin, Mireia Diez, Lukas Burget, Yuhang Cao, Heng Lu, Jan Cernocky

In this work, we propose an error correction framework, named DiaCorrect, to refine the output of a diarization system in a simple yet effective way.
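
DiaCorrect itself is a learned error-correction back-end, but the general idea of refining a diarization system's raw output can be sketched with a much simpler stand-in: median-smoothing per-speaker activity probabilities to suppress single-frame speaker flips. The function, window size, and threshold below are illustrative assumptions, not the paper's model.

```python
import numpy as np

def smooth_activity(probs, win=3):
    """Median-filter per-speaker activity probabilities over time to
    remove spurious single-frame speaker flips (a crude stand-in for
    a learned correction back-end)."""
    probs = np.asarray(probs, dtype=float)
    pad = win // 2
    padded = np.pad(probs, ((pad, pad), (0, 0)), mode="edge")
    out = np.empty_like(probs)
    for t in range(probs.shape[0]):
        out[t] = np.median(padded[t:t + win], axis=0)
    return out

# frames x speakers; one spurious frame where speaker 1 flickers on
raw = np.array([[0.9, 0.1],
                [0.9, 0.8],   # likely a diarization error
                [0.9, 0.1],
                [0.9, 0.1]])
clean = smooth_activity(raw)
print((clean[:, 1] > 0.5).any())  # → False
```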

Automatic Speech Recognition speaker-diarization +3

MSAC: Multiple Speech Attribute Control Method for Reliable Speech Emotion Recognition

no code implementations8 Aug 2023 Yu Pan, Yuguang Yang, Yuheng Huang, Jixun Yao, JingJing Yin, Yanni Hu, Heng Lu, Lei Ma, Jianjun Zhao

Despite notable progress, speech emotion recognition (SER) remains challenging due to the intricate and ambiguous nature of speech emotion, particularly in in-the-wild conditions.

Attribute Cross-corpus +2

METTS: Multilingual Emotional Text-to-Speech by Cross-speaker and Cross-lingual Emotion Transfer

no code implementations29 Jul 2023 Xinfa Zhu, Yi Lei, Tao Li, Yongmao Zhang, Hongbin Zhou, Heng Lu, Lei Xie

However, such data-efficient approaches have largely ignored the emotional aspects of synthesized speech because cross-speaker, cross-lingual emotion transfer is difficult: the heavy entanglement of speaker timbre, emotion, and language factors in the speech signal causes a system to produce cross-lingual synthetic speech with an undesired foreign accent and weak emotional expressiveness.

Disentanglement Quantization +1

GEmo-CLAP: Gender-Attribute-Enhanced Contrastive Language-Audio Pretraining for Accurate Speech Emotion Recognition

no code implementations13 Jun 2023 Yu Pan, Yanni Hu, Yuguang Yang, Wen Fei, Jixun Yao, Heng Lu, Lei Ma, Jianjun Zhao

Contrastive cross-modality pretraining has recently exhibited impressive success in diverse fields, yet there has been limited research on its merits for speech emotion recognition (SER).
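
The contrastive cross-modality pretraining this entry refers to belongs to the CLIP/CLAP family of objectives. A generic symmetric InfoNCE-style loss between paired audio and text embeddings might look like the sketch below; it is not the paper's exact GEmo-CLAP loss, and the shapes and temperature are assumptions.

```python
import numpy as np

def contrastive_loss(audio_emb, text_emb, temp=0.1):
    """Symmetric contrastive (InfoNCE-style) loss between paired audio
    and text embeddings: matched pairs sit on the diagonal of the
    similarity matrix and should score highest (a generic sketch)."""
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = a @ t.T / temp
    labels = np.arange(len(a))

    def xent(l):
        # cross-entropy with the diagonal as the correct class
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    return 0.5 * (xent(logits) + xent(logits.T))

a = np.eye(4)  # toy embeddings: each audio matches the same-index text
print(contrastive_loss(a, a) < contrastive_loss(a, np.roll(a, 1, axis=0)))  # → True
```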

Attribute Contrastive Learning +3

Improving Cross-lingual Speech Synthesis with Triplet Training Scheme

no code implementations22 Feb 2022 Jianhao Ye, Hongbin Zhou, Zhiba Su, Wendi He, Kaimeng Ren, Lin Li, Heng Lu

Recent advances in cross-lingual text-to-speech (TTS) made it possible to synthesize speech in a language foreign to a monolingual speaker.

Speech Synthesis

The USTC-Ximalaya system for the ICASSP 2022 multi-channel multi-party meeting transcription (M2MeT) challenge

no code implementations10 Feb 2022 Maokui He, Xiang Lv, Weilin Zhou, JingJing Yin, Xiaoqi Zhang, Yuxuan Wang, Shutong Niu, Yuhang Cao, Heng Lu, Jun Du, Chin-Hui Lee

We propose two improvements to target-speaker voice activity detection (TS-VAD), the core component in our proposed speaker diarization system that was submitted to the 2022 Multi-Channel Multi-Party Meeting Transcription (M2MeT) challenge.
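
TS-VAD predicts, per frame, whether each target speaker is active. The real component is a neural network conditioned on enrolled speaker profiles; as a minimal illustration of the interface only, the toy rule below thresholds cosine similarity between frame embeddings and a target-speaker embedding. All names and values are hypothetical.

```python
import numpy as np

def ts_vad(frames, target, thresh=0.7):
    """Toy target-speaker VAD: mark a frame active when its embedding's
    cosine similarity with the target-speaker embedding exceeds a
    threshold (real TS-VAD uses a neural model, not this rule)."""
    frames = np.asarray(frames, dtype=float)
    target = np.asarray(target, dtype=float)
    sims = frames @ target / (
        np.linalg.norm(frames, axis=1) * np.linalg.norm(target) + 1e-8)
    return sims > thresh

target = np.array([1.0, 0.0])
frames = np.array([[0.9, 0.1],    # target speaking
                   [0.0, 1.0],    # other speaker
                   [0.8, 0.2]])   # target speaking
print(ts_vad(frames, target))     # active on frames 0 and 2
```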

Action Detection Activity Detection +2

Imaging vibrations of locally gated, electromechanical few layer graphene resonators with a moving vacuum enclosure

no code implementations4 Jan 2021 Heng Lu, Chen Yang, Ye Tian, Jun Lu, Fanqi Xu, FengNan Chen, Yan Ying, Kevin G. Schädler, Chinhua Wang, Frank H. L. Koppens, Antoine Reserbat-Plantey, Joel Moser

With this enclosure we characterize the lowest-frequency mode of a few-layer graphene (FLG) resonator by measuring its frequency response as a function of position on the membrane.

Mesoscale and Nanoscale Physics

Phonetic Posteriorgrams based Many-to-Many Singing Voice Conversion via Adversarial Training

1 code implementation3 Dec 2020 Haohan Guo, Heng Lu, Na Hu, Chunlei Zhang, Shan Yang, Lei Xie, Dan Su, Dong Yu

In order to make timbre conversion more stable and controllable, speaker embedding is further decomposed to the weighted sum of a group of trainable vectors representing different timbre clusters.
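
The decomposition described above, a speaker embedding expressed as a weighted sum over a bank of timbre-cluster vectors, can be sketched as follows. In the paper both the weights and the cluster vectors are trainable; here they are fixed toy values, and all names are illustrative.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def timbre_embedding(logits, cluster_vectors):
    """Compose a speaker embedding as a softmax-weighted sum of a bank
    of timbre-cluster vectors (trainable in the paper; toy values
    here)."""
    w = softmax(np.asarray(logits, dtype=float))
    return w @ np.asarray(cluster_vectors, dtype=float)

# 3 timbre clusters, 4-dim embeddings (toy sizes)
clusters = np.eye(3, 4)
emb = timbre_embedding([2.0, 0.0, 0.0], clusters)
print(emb.shape)  # (4,)
```

Because the softmax weights sum to one, the result stays a convex combination of the cluster vectors, which is what makes the conversion controllable by nudging the weights.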

Audio Generation Disentanglement +1

TFGAN: Time and Frequency Domain Based Generative Adversarial Network for High-fidelity Speech Synthesis

1 code implementation24 Nov 2020 Qiao Tian, Yi Chen, Zewang Zhang, Heng Lu, LingHui Chen, Lei Xie, Shan Liu

On the one hand, we propose to discriminate the ground-truth waveform from the synthetic one in the frequency domain, rather than only in the time domain, to provide stronger consistency guarantees.
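
A frequency-domain consistency term of the kind TFGAN adds alongside time-domain discrimination can be approximated with an L1 distance between magnitude spectrograms. The following is a hand-rolled numpy sketch, not the paper's exact objective; FFT size and hop length are arbitrary toy choices.

```python
import numpy as np

def stft_mag(x, n_fft=64, hop=16):
    """Magnitude spectrogram via framed, Hann-windowed FFT."""
    frames = [x[i:i + n_fft] * np.hanning(n_fft)
              for i in range(0, len(x) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1))

def spectral_l1(real, fake):
    """Frequency-domain consistency term: L1 distance between the
    magnitude spectrograms of real and synthetic waveforms (a stand-in
    for TFGAN's frequency-domain criterion)."""
    return np.mean(np.abs(stft_mag(real) - stft_mag(fake)))

t = np.linspace(0, 1, 512, endpoint=False)
real = np.sin(2 * np.pi * 8 * t)
close = np.sin(2 * np.pi * 8 * t + 0.1)   # small phase shift only
far = np.sin(2 * np.pi * 24 * t)          # wrong frequency
print(spectral_l1(real, close) < spectral_l1(real, far))  # → True
```

Note the phase-shifted signal scores well even though it differs sample-by-sample in the time domain, which is exactly the consistency the frequency-domain view buys.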

Generative Adversarial Network Speech Synthesis

Peking Opera Synthesis via Duration Informed Attention Network

no code implementations7 Aug 2020 Yusong Wu, Shengchen Li, Chengzhu Yu, Heng Lu, Chao Weng, Liqiang Zhang, Dong Yu

In this work, we propose to deal with this issue and synthesize expressive Peking Opera singing from the music score based on the Duration Informed Attention Network (DurIAN) framework.

Singing Voice Synthesis

AdaDurIAN: Few-shot Adaptation for Neural Text-to-Speech with DurIAN

no code implementations12 May 2020 Zewang Zhang, Qiao Tian, Heng Lu, Ling-Hui Chen, Shan Liu

This paper investigates how to leverage a DurIAN-based average model to enable a new speaker to have both accurate pronunciation and fluent cross-lingual speaking with very limited monolingual data.

Few-Shot Learning

Learning Singing From Speech

no code implementations20 Dec 2019 Liqiang Zhang, Chengzhu Yu, Heng Lu, Chao Weng, Yusong Wu, Xiang Xie, Zijin Li, Dong Yu

The proposed algorithm first integrates speech and singing synthesis into a unified framework, then learns universal speaker embeddings that are shareable between the speech and singing synthesis tasks.

Speech Synthesis Voice Conversion

PitchNet: Unsupervised Singing Voice Conversion with Pitch Adversarial Network

no code implementations4 Dec 2019 Chengqi Deng, Chengzhu Yu, Heng Lu, Chao Weng, Dong Yu

However, the converted singing voice can easily go out of key, showing that the existing approach cannot model pitch information precisely.

Music Generation Translation +1

DurIAN: Duration Informed Attention Network For Multimodal Synthesis

4 code implementations4 Sep 2019 Chengzhu Yu, Heng Lu, Na Hu, Meng Yu, Chao Weng, Kun Xu, Peng Liu, Deyi Tuo, Shiyin Kang, Guangzhi Lei, Dan Su, Dong Yu

In this paper, we present a generic and robust multimodal synthesis system that produces highly natural speech and facial expression simultaneously.

Speech Synthesis

Deep Feed-forward Sequential Memory Networks for Speech Synthesis

no code implementations26 Feb 2018 Mengxiao Bi, Heng Lu, Shiliang Zhang, Ming Lei, Zhijie Yan

The Bidirectional LSTM (BLSTM) RNN based speech synthesis system is among the best parametric Text-to-Speech (TTS) systems in terms of the naturalness of generated speech, especially in prosody.

Speech Recognition +1
