Search Results for author: LiRong Dai

Found 13 papers, 4 papers with code

The USTC-NELSLIP Offline Speech Translation Systems for IWSLT 2022

no code implementations · IWSLT (ACL) 2022 · Weitai Zhang, Zhongyi Ye, Haitao Tang, Xiaoxi Li, Xinyuan Zhou, Jing Yang, Jianwei Cui, Dan Liu, Junhua Liu, LiRong Dai

This paper describes USTC-NELSLIP’s submissions to the IWSLT 2022 Offline Speech Translation task, including speech translation of talks from English to German, English to Chinese and English to Japanese.

Translation

Adversarial speech for voice privacy protection from Personalized Speech generation

no code implementations · 22 Jan 2024 · Shihao Chen, Liping Chen, Jie Zhang, KongAik Lee, ZhenHua Ling, LiRong Dai

For validation, we employ the open-source pre-trained YourTTS model for speech generation and protect the target speaker's speech in the white-box scenario.

Speaker Verification · Voice Conversion

Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation

1 code implementation · 7 Jan 2024 · Qiushi Zhu, Jie Zhang, Yu Gu, Yuchen Hu, LiRong Dai

Considering that visual information helps to improve speech recognition performance in noisy scenes, in this work we propose a multichannel multi-modal speech self-supervised learning framework, AV-wav2vec2, which utilizes video and multichannel audio data as inputs.

Audio-Visual Speech Recognition · Automatic Speech Recognition · +7

Rep2wav: Noise Robust text-to-speech Using self-supervised representations

no code implementations · 28 Aug 2023 · Qiushi Zhu, Yu Gu, Rilin Chen, Chao Weng, Yuchen Hu, LiRong Dai, Jie Zhang

Noise-robust TTS models are often trained on enhanced speech, which thus suffers from speech distortion and residual background noise that degrade the quality of the synthesized speech.

Speech Enhancement

VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning

no code implementations · 21 Nov 2022 · Qiushi Zhu, Long Zhou, Ziqiang Zhang, Shujie Liu, Binxing Jiao, Jie Zhang, LiRong Dai, Daxin Jiang, Jinyu Li, Furu Wei

Although speech is a simple and effective way for humans to communicate with the outside world, a more realistic speech interaction contains multimodal information, e.g., vision and text.

Audio-Visual Speech Recognition · Language Modelling · +3

SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data

1 code implementation · 30 Sep 2022 · Ziqiang Zhang, Sanyuan Chen, Long Zhou, Yu Wu, Shuo Ren, Shujie Liu, Zhuoyuan Yao, Xun Gong, LiRong Dai, Jinyu Li, Furu Wei

In this paper, we propose a cross-modal Speech and Language Model (SpeechLM) to explicitly align speech and text pre-training with a pre-defined unified discrete representation.

Language Modelling · Speech Recognition · +1

Vision-Language Adaptive Mutual Decoder for OOV-STR

no code implementations · 2 Sep 2022 · Jinshui Hu, Chenyu Liu, Qiandong Yan, Xuyang Zhu, Jiajia Wu, Jun Du, LiRong Dai

However, in real-world scenarios, out-of-vocabulary (OOV) words are of great importance, and state-of-the-art (SOTA) recognition models usually perform poorly in OOV settings.

Language Modelling · Representation Learning · +1

Speech-MLP: a simple MLP architecture for speech processing

no code implementations · 29 Sep 2021 · Chao Xing, Dong Wang, LiRong Dai, Qun Liu, Anderson Avila

Overparameterized transformer-based architectures have shown remarkable performance in recent years, achieving state-of-the-art results in speech processing tasks such as speech recognition, speech synthesis, keyword spotting, and speech enhancement.

Keyword Spotting · Speech Enhancement · +3

Multi-Task Learning with High-Order Statistics for X-vector based Text-Independent Speaker Verification

no code implementations · 28 Mar 2019 · Lanhua You, Wu Guo, LiRong Dai, Jun Du

X-vector based deep neural network (DNN) embedding systems have demonstrated their effectiveness for text-independent speaker verification.

Multi-Task Learning · Text-Independent Speaker Verification
