Search Results for author: Linquan Liu

Found 8 papers, 2 papers with code

WavLLM: Towards Robust and Adaptive Speech Large Language Model

no code implementations • 31 Mar 2024 • Shujie Hu, Long Zhou, Shujie Liu, Sanyuan Chen, Hongkun Hao, Jing Pan, Xunying Liu, Jinyu Li, Sunit Sivasankaran, Linquan Liu, Furu Wei

In this work, we introduce WavLLM, a robust and adaptive speech large language model with dual encoders, and a prompt-aware LoRA weight adapter, optimized by a two-stage curriculum learning approach.

Language Modelling • Large Language Model

On decoder-only architecture for speech-to-text and large language model integration

no code implementations • 8 Jul 2023 • Jian Wu, Yashesh Gaur, Zhuo Chen, Long Zhou, Yimeng Zhu, Tianrui Wang, Jinyu Li, Shujie Liu, Bo Ren, Linquan Liu, Yu Wu

Large language models (LLMs) have achieved remarkable success in the field of natural language processing, enabling better human-computer interaction using natural language.

Language Modelling • Large Language Model • +1

Code-Switching Text Generation and Injection in Mandarin-English ASR

no code implementations • 20 Mar 2023 • Haibin Yu, Yuxuan Hu, Yao Qian, Ma Jin, Linquan Liu, Shujie Liu, Yu Shi, Yanmin Qian, Edward Lin, Michael Zeng

Code-switching speech refers to a means of expression by mixing two or more languages within a single utterance.

Automatic Speech Recognition (ASR) • +2

FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model

no code implementations • 6 Mar 2023 • Ruiqing Xue, Yanqing Liu, Lei He, Xu Tan, Linquan Liu, Edward Lin, Sheng Zhao

Neural text-to-speech (TTS) generally uses either a cascaded architecture with a separately optimized acoustic model and vocoder, or an end-to-end architecture with continuous mel-spectrograms or self-extracted speech frames as the intermediate representation bridging the acoustic model and vocoder. These designs suffer from two limitations: 1) continuous acoustic frames are hard to predict from phonemes alone, so acoustic information such as duration or pitch is also needed to solve the one-to-many problem, which does not scale easily to large, noisy datasets; 2) achieving diverse speech output from continuous speech features usually requires complex VAE or flow-based models.

Language Modelling • Large Language Model • +1

Building High-accuracy Multilingual ASR with Gated Language Experts and Curriculum Training

no code implementations • 1 Mar 2023 • Eric Sun, Jinyu Li, Yuxuan Hu, Yimeng Zhu, Long Zhou, Jian Xue, Peidong Wang, Linquan Liu, Shujie Liu, Edward Lin, Yifan Gong

We propose gated language experts and curriculum training to enhance multilingual transformer transducer models without requiring language identification (LID) input from users during inference.

Language Identification

Building a great multi-lingual teacher with sparsely-gated mixture of experts for speech recognition

no code implementations • 10 Dec 2021 • Kenichi Kumatani, Robert Gmyr, Felipe Cruz Salinas, Linquan Liu, Wei Zuo, Devang Patel, Eric Sun, Yu Shi

The sparsely-gated Mixture of Experts (MoE) can increase a network's capacity with little additional computational complexity.

Automatic Speech Recognition (ASR) • +1

FastCorrect 2: Fast Error Correction on Multiple Candidates for Automatic Speech Recognition

1 code implementation • Findings (EMNLP) 2021 • Yichong Leng, Xu Tan, Rui Wang, Linchen Zhu, Jin Xu, Wenjie Liu, Linquan Liu, Tao Qin, Xiang-Yang Li, Edward Lin, Tie-Yan Liu

Although multiple candidates are generated by an ASR system through beam search, current error correction approaches can only correct one sentence at a time, failing to leverage the voting effect from multiple candidates to better detect and correct error tokens.

Automatic Speech Recognition (ASR) • +2

FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition

1 code implementation • NeurIPS 2021 • Yichong Leng, Xu Tan, Linchen Zhu, Jin Xu, Renqian Luo, Linquan Liu, Tao Qin, Xiang-Yang Li, Ed Lin, Tie-Yan Liu

A straightforward solution to reduce latency, inspired by non-autoregressive (NAR) neural machine translation, is to use an NAR sequence generation model for ASR error correction, which, however, comes at the cost of significantly increased ASR error rate.

Automatic Speech Recognition (ASR) • +4
