Search Results for author: Sanyuan Chen

Found 23 papers, 12 papers with code

MPII: Multi-Level Mutual Promotion for Inference and Interpretation

1 code implementation ACL 2022 Yan Liu, Sanyuan Chen, Yazheng Yang, Qi Dai

In this paper, we propose a multi-level Mutual Promotion mechanism for self-evolved Inference and sentence-level Interpretation (MPII).

Sentence

VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment

no code implementations 12 Jun 2024 Bing Han, Long Zhou, Shujie Liu, Sanyuan Chen, Lingwei Meng, Yanming Qian, Yanqing Liu, Sheng Zhao, Jinyu Li, Furu Wei

With the help of discrete neural audio codecs, large language models (LLM) have increasingly been recognized as a promising methodology for zero-shot Text-to-Speech (TTS) synthesis.

Quantization Speech Synthesis +1

VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers

no code implementations 8 Jun 2024 Sanyuan Chen, Shujie Liu, Long Zhou, Yanqing Liu, Xu Tan, Jinyu Li, Sheng Zhao, Yao Qian, Furu Wei

This paper introduces VALL-E 2, the latest advancement in neural codec language models that marks a milestone in zero-shot text-to-speech synthesis (TTS), achieving human parity for the first time.

Speech Synthesis Text-To-Speech Synthesis

WavLLM: Towards Robust and Adaptive Speech Large Language Model

no code implementations 31 Mar 2024 Shujie Hu, Long Zhou, Shujie Liu, Sanyuan Chen, Lingwei Meng, Hongkun Hao, Jing Pan, Xunying Liu, Jinyu Li, Sunit Sivasankaran, Linquan Liu, Furu Wei

In this work, we introduce WavLLM, a robust and adaptive speech large language model with dual encoders, and a prompt-aware LoRA weight adapter, optimized by a two-stage curriculum learning approach.

Language Modelling Large Language Model

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer

no code implementations 14 Aug 2023 Xiaofei Wang, Manthan Thakker, Zhuo Chen, Naoyuki Kanda, Sefik Emre Eskimez, Sanyuan Chen, Min Tang, Shujie Liu, Jinyu Li, Takuya Yoshioka

Recent advancements in generative speech models based on audio-text prompts have enabled remarkable innovations like high-quality zero-shot text-to-speech.

Language Modelling Multi-Task Learning +2

BEATs: Audio Pre-Training with Acoustic Tokenizers

3 code implementations 18 Dec 2022 Sanyuan Chen, Yu Wu, Chengyi Wang, Shujie Liu, Daniel Tompkins, Zhuo Chen, Furu Wei

In the first iteration, we use random projection as the acoustic tokenizer to train an audio SSL model in a mask and label prediction manner.

Audio Classification Self-Supervised Learning

Exploring WavLM on Speech Enhancement

no code implementations 18 Nov 2022 Hyungchan Song, Sanyuan Chen, Zhuo Chen, Yu Wu, Takuya Yoshioka, Min Tang, Jong Won Shin, Shujie Liu

There is a surge in interest in self-supervised learning approaches for end-to-end speech encoding in recent years as they have achieved great success.

Self-Supervised Learning Speech Enhancement +2

SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data

1 code implementation 30 Sep 2022 Ziqiang Zhang, Sanyuan Chen, Long Zhou, Yu Wu, Shuo Ren, Shujie Liu, Zhuoyuan Yao, Xun Gong, LiRong Dai, Jinyu Li, Furu Wei

In this paper, we propose a cross-modal Speech and Language Model (SpeechLM) to explicitly align speech and text pre-training with a pre-defined unified discrete representation.

Language Modelling speech-recognition +1

Ultra Fast Speech Separation Model with Teacher Student Learning

no code implementations 27 Apr 2022 Sanyuan Chen, Yu Wu, Zhuo Chen, Jian Wu, Takuya Yoshioka, Shujie Liu, Jinyu Li, Xiangzhan Yu

In this paper, an ultra fast speech separation Transformer model is proposed to achieve both better performance and efficiency with teacher student learning (T-S learning).

Computational Efficiency Speech Separation

Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?

no code implementations 27 Apr 2022 Sanyuan Chen, Yu Wu, Chengyi Wang, Shujie Liu, Zhuo Chen, Peidong Wang, Gang Liu, Jinyu Li, Jian Wu, Xiangzhan Yu, Furu Wei

Recently, self-supervised learning (SSL) has demonstrated strong performance in speaker recognition, even if the pre-training objective is designed for speech recognition.

Self-Supervised Learning Speaker Recognition +3

Self-Supervised Learning for speech recognition with Intermediate layer supervision

1 code implementation 16 Dec 2021 Chengyi Wang, Yu Wu, Sanyuan Chen, Shujie Liu, Jinyu Li, Yao Qian, Zhenglu Yang

Recently, pioneer work finds that speech pre-trained models can solve full-stack speech processing tasks, because the model utilizes bottom layers to learn speaker-related information and top layers to encode content-related information.

Language Modelling Self-Supervised Learning +2

Investigation of Practical Aspects of Single Channel Speech Separation for ASR

no code implementations 5 Jul 2021 Jian Wu, Zhuo Chen, Sanyuan Chen, Yu Wu, Takuya Yoshioka, Naoyuki Kanda, Shujie Liu, Jinyu Li

Speech separation has been successfully applied as a frontend processing module of conversation transcription systems thanks to its ability to handle overlapped speech and its flexibility to combine with downstream tasks such as automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

C2C-GenDA: Cluster-to-Cluster Generation for Data Augmentation of Slot Filling

1 code implementation 13 Dec 2020 Yutai Hou, Sanyuan Chen, Wanxiang Che, Cheng Chen, Ting Liu

Slot filling, a fundamental module of spoken language understanding, often suffers from insufficient quantity and diversity of training data.

Data Augmentation Diversity +3

Don't shoot butterfly with rifles: Multi-channel Continuous Speech Separation with Early Exit Transformer

1 code implementation 23 Oct 2020 Sanyuan Chen, Yu Wu, Zhuo Chen, Takuya Yoshioka, Shujie Liu, Jinyu Li

With its strong modeling capacity that comes from a multi-head and multi-layer structure, Transformer is a very powerful model for learning a sequential representation and has been successfully applied to speech separation recently.

Speech Separation

Continuous Speech Separation with Conformer

1 code implementation 13 Aug 2020 Sanyuan Chen, Yu Wu, Zhuo Chen, Jian Wu, Jinyu Li, Takuya Yoshioka, Chengyi Wang, Shujie Liu, Ming Zhou

Continuous speech separation plays a vital role in complicated speech related tasks such as conversation transcription.

Ranked #1 on Speech Separation on LibriCSS (using extra training data)

Speech Separation

MotherNets: Rapid Deep Ensemble Learning

no code implementations 12 Sep 2018 Abdul Wasay, Brian Hentschel, Yuze Liao, Sanyuan Chen, Stratos Idreos

We propose MotherNets to enable higher accuracy and practical training cost for large and diverse neural network ensembles: A MotherNet captures the structural similarity across some or all members of a deep neural network ensemble which allows us to share data movement and computation costs across these networks.

Clustering Clustering Ensemble +2
