Search Results for author: Shujie Liu

Found 97 papers, 29 papers with code

SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing

3 code implementations • ACL 2022 • Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei

Motivated by the success of T5 (Text-To-Text Transfer Transformer) in pre-trained natural language processing models, we propose a unified-modal SpeechT5 framework that explores the encoder-decoder pre-training for self-supervised speech/text representation learning.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +7

124,889

Paper
Code

Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers

6 code implementations • 5 Jan 2023 • Chengyi Wang, Sanyuan Chen, Yu Wu, Ziqiang Zhang, Long Zhou, Shujie Liu, Zhuo Chen, Yanqing Liu, Huaming Wang, Jinyu Li, Lei He, Sheng Zhao, Furu Wei

In addition, we find Vall-E could preserve the speaker's emotion and acoustic environment of the acoustic prompt in synthesis.

In-Context Learning Language Modelling +2

32,485

Paper
Code

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

5 code implementations • 26 Oct 2021 • Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Xiangzhan Yu, Furu Wei

Self-supervised learning (SSL) achieves great success in speech recognition, while limited exploration has been attempted for other speech processing tasks.

Denoising Self-Supervised Learning +3

18,315

Paper
Code

BEATs: Audio Pre-Training with Acoustic Tokenizers

2 code implementations • 18 Dec 2022 • Sanyuan Chen, Yu Wu, Chengyi Wang, Shujie Liu, Daniel Tompkins, Zhuo Chen, Furu Wei

In the first iteration, we use random projection as the acoustic tokenizer to train an audio SSL model in a mask and label prediction manner.

Ranked #1 on Audio Classification on Balanced Audio Set

Audio Classification Self-Supervised Learning

18,311

Paper
Code

Neural Speech Synthesis with Transformer Network

6 code implementations • 19 Sep 2018 • Naihan Li, Shujie Liu, Yanqing Liu, Sheng Zhao, Ming Liu, Ming Zhou

Although end-to-end neural text-to-speech (TTS) methods (such as Tacotron2) are proposed and achieve state-of-the-art performance, they still suffer from two problems: 1) low efficiency during training and inference; 2) hard to model long dependency using current recurrent neural networks (RNNs).

Ranked #9 on Text-To-Speech Synthesis on LJSpeech (using extra training data)

Machine Translation NMT +2

10,126

Paper
Code

CodeBLEU: a Method for Automatic Evaluation of Code Synthesis

2 code implementations • 22 Sep 2020 • Shuo Ren, Daya Guo, Shuai Lu, Long Zhou, Shujie Liu, Duyu Tang, Neel Sundaresan, Ming Zhou, Ambrosio Blanco, Shuai Ma

Evaluation metrics play a vital role in the growth of an area as it defines the standard of distinguishing between good and bad models.

Code Translation Translation

7,760

Paper
Code

Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling

1 code implementation • 7 Mar 2023 • Ziqiang Zhang, Long Zhou, Chengyi Wang, Sanyuan Chen, Yu Wu, Shujie Liu, Zhuo Chen, Yanqing Liu, Huaming Wang, Jinyu Li, Lei He, Sheng Zhao, Furu Wei

We propose a cross-lingual neural codec language model, VALL-E X, for cross-lingual speech synthesis.

In-Context Learning Language Modelling +3

7,161

Paper
Code

GraphCodeBERT: Pre-training Code Representations with Data Flow

1 code implementation • ICLR 2021 • Daya Guo, Shuo Ren, Shuai Lu, Zhangyin Feng, Duyu Tang, Shujie Liu, Long Zhou, Nan Duan, Alexey Svyatkovskiy, Shengyu Fu, Michele Tufano, Shao Kun Deng, Colin Clement, Dawn Drain, Neel Sundaresan, Jian Yin, Daxin Jiang, Ming Zhou

Instead of taking syntactic-level structure of code like abstract syntax tree (AST), we use data flow in the pre-training stage, which is a semantic-level structure of code that encodes the relation of "where-the-value-comes-from" between variables.

Ranked #3 on Type prediction on ManyTypes4TypeScript

Clone Detection Code Completion +7

1,976

Paper
Code

CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation

4 code implementations • 9 Feb 2021 • Shuai Lu, Daya Guo, Shuo Ren, JunJie Huang, Alexey Svyatkovskiy, Ambrosio Blanco, Colin Clement, Dawn Drain, Daxin Jiang, Duyu Tang, Ge Li, Lidong Zhou, Linjun Shou, Long Zhou, Michele Tufano, Ming Gong, Ming Zhou, Nan Duan, Neel Sundaresan, Shao Kun Deng, Shengyu Fu, Shujie Liu

Benchmark datasets have a significant impact on accelerating research in programming language tasks.

Ranked #1 on Cloze Test on CodeXGLUE - CT-maxmin

BIG-bench Machine Learning Clone Detection +9

1,413

Paper
Code

Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data

1 code implementation • 31 Mar 2022 • Junyi Ao, Ziqiang Zhang, Long Zhou, Shujie Liu, Haizhou Li, Tom Ko, LiRong Dai, Jinyu Li, Yao Qian, Furu Wei

In this way, the decoder learns to reconstruct original speech information with codes before learning to generate correct text.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

1,014

Paper
Code

The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task

1 code implementation • 12 Jun 2022 • Ziqiang Zhang, Junyi Ao, Long Zhou, Shujie Liu, Furu Wei, Jinyu Li

The YiTrans system is built on large-scale pre-trained encoder-decoder models.

Data Augmentation Translation

1,014

Paper
Code

SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data

1 code implementation • 30 Sep 2022 • Ziqiang Zhang, Sanyuan Chen, Long Zhou, Yu Wu, Shuo Ren, Shujie Liu, Zhuoyuan Yao, Xun Gong, LiRong Dai, Jinyu Li, Furu Wei

In this paper, we propose a cross-modal Speech and Language Model (SpeechLM) to explicitly align speech and text pre-training with a pre-defined unified discrete representation.

Language Modelling speech-recognition +1

1,014

Paper
Code

SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training

1 code implementation • 7 Oct 2022 • Ziqiang Zhang, Long Zhou, Junyi Ao, Shujie Liu, LiRong Dai, Jinyu Li, Furu Wei

The rapid development of single-modal pre-training has prompted researchers to pay more attention to cross-modal pre-training methods.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

1,014

Paper
Code

Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation

1 code implementation • 31 Oct 2022 • Kun Wei, Long Zhou, Ziqiang Zhang, Liping Chen, Shujie Liu, Lei He, Jinyu Li, Furu Wei

However, direct S2ST suffers from the data scarcity problem because the corpora from speech of the source language to speech of the target language are very rare.

Speech-to-Speech Translation Translation

1,014

Paper
Code

UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data

3 code implementations • 19 Jan 2021 • Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang

In this paper, we propose a unified pre-training approach called UniSpeech to learn speech representations with both unlabeled and labeled data, in which supervised phonetic CTC learning and phonetically-aware contrastive self-supervised learning are conducted in a multi-task learning manner.

Multi-Task Learning Representation Learning +3

387

Paper
Code

UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training

3 code implementations • 12 Oct 2021 • Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu

We integrate the proposed methods into the HuBERT framework.

Data Augmentation Multi-Task Learning +5

387

Paper
Code

Self-Supervised Learning for speech recognition with Intermediate layer supervision

1 code implementation • 16 Dec 2021 • Chengyi Wang, Yu Wu, Sanyuan Chen, Shujie Liu, Jinyu Li, Yao Qian, Zhenglu Yang

Recently, pioneer work finds that speech pre-trained models can solve full-stack speech processing tasks, because the model utilizes bottom layers to learn speaker-related information and top layers to encode content-related information.

Language Modelling Self-Supervised Learning +2

387

Paper
Code

MuTual: A Dataset for Multi-Turn Dialogue Reasoning

1 code implementation • ACL 2020 • Leyang Cui, Yu Wu, Shujie Liu, Yue Zhang, Ming Zhou

Non-task oriented dialogue systems have achieved great success in recent years due to largely accessible conversation data and the development of deep learning techniques.

Task-Oriented Dialogue Systems

269

Paper
Code

On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition

1 code implementation • 28 May 2020 • Jinyu Li, Yu Wu, Yashesh Gaur, Chengyi Wang, Rui Zhao, Shujie Liu

Among all three E2E models, transformer-AED achieved the best accuracy in both streaming and non-streaming mode.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

267

Paper
Code

Two-Stream Network for Sign Language Recognition and Translation

1 code implementation • 2 Nov 2022 • Yutong Chen, Ronglai Zuo, Fangyun Wei, Yu Wu, Shujie Liu, Brian Mak

RGB videos, however, are raw signals with substantial visual redundancy, leading the encoder to overlook the key information for sign language understanding.

Ranked #1 on Sign Language Translation on RWTH-PHOENIX-Weather 2014 T

Sign Language Recognition Sign Language Translation +2

202

Paper
Code

Continuous Speech Separation with Conformer

1 code implementation • 13 Aug 2020 • Sanyuan Chen, Yu Wu, Zhuo Chen, Jian Wu, Jinyu Li, Takuya Yoshioka, Chengyi Wang, Shujie Liu, Ming Zhou

Continuous speech separation plays a vital role in complicated speech related tasks such as conversation transcription.

Ranked #1 on Speech Separation on LibriCSS (using extra training data)

Speech Separation

103

Paper
Code

Achieving Human Parity on Automatic Chinese to English News Translation

2 code implementations • 15 Mar 2018 • Hany Hassan, Anthony Aue, Chang Chen, Vishal Chowdhary, Jonathan Clark, Christian Federmann, Xuedong Huang, Marcin Junczys-Dowmunt, William Lewis, Mu Li, Shujie Liu, Tie-Yan Liu, Renqian Luo, Arul Menezes, Tao Qin, Frank Seide, Xu Tan, Fei Tian, Lijun Wu, Shuangzhi Wu, Yingce Xia, Dong-dong Zhang, Zhirui Zhang, Ming Zhou

Machine translation has made rapid advances in recent years.

Ranked #3 on Machine Translation on WMT 2017 English-Chinese

Machine Translation Translation

Paper
Code

Semantic Mask for Transformer based End-to-End Speech Recognition

1 code implementation • 6 Dec 2019 • Chengyi Wang, Yu Wu, Yujiao Du, Jinyu Li, Shujie Liu, Liang Lu, Shuo Ren, Guoli Ye, Sheng Zhao, Ming Zhou

Attention-based encoder-decoder model has achieved impressive results for both automatic speech recognition (ASR) and text-to-speech (TTS) tasks.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Code

Unsupervised Neural Machine Translation with SMT as Posterior Regularization

1 code implementation • 14 Jan 2019 • Shuo Ren, Zhirui Zhang, Shujie Liu, Ming Zhou, Shuai Ma

To address this issue, we introduce phrase based Statistic Machine Translation (SMT) models which are robust to noisy data, as posterior regularizations to guide the training of unsupervised NMT models in the iterative back-translation process.

Ranked #2 on Unsupervised Machine Translation on WMT2014 English-German

NMT Translation +1

Paper
Code

Knowledge Enhanced Fine-Tuning for Better Handling Unseen Entities in Dialogue Generation

1 code implementation • EMNLP 2021 • Leyang Cui, Yu Wu, Shujie Liu, Yue Zhang

To deal with this problem, instead of introducing knowledge base as the input, we force the model to learn a better semantic representation by predicting the information in the knowledge base, only based on the input context.

Dialogue Generation Retrieval

Paper
Code

Don't shoot butterfly with rifles: Multi-channel Continuous Speech Separation with Early Exit Transformer

1 code implementation • 23 Oct 2020 • Sanyuan Chen, Yu Wu, Zhuo Chen, Takuya Yoshioka, Shujie Liu, Jinyu Li

With its strong modeling capacity that comes from a multi-head and multi-layer structure, Transformer is a very powerful model for learning a sequential representation and has been successfully applied to speech separation recently.

Speech Separation

Paper
Code

Target Sound Extraction with Variable Cross-modality Clues

1 code implementation • 15 Mar 2023 • Chenda Li, Yao Qian, Zhuo Chen, Dongmei Wang, Takuya Yoshioka, Shujie Liu, Yanmin Qian, Michael Zeng

Automatic target sound extraction (TSE) is a machine learning approach to mimic the human auditory perception capability of attending to a sound source of interest from a mixture of sources.

AudioCaps Target Sound Extraction

Paper
Code

A Retrieve-and-Rewrite Initialization Method for Unsupervised Machine Translation

1 code implementation • ACL 2020 • Shuo Ren, Yu Wu, Shujie Liu, Ming Zhou, Shuai Ma

The commonly used framework for unsupervised machine translation builds initial translation models of both translation directions, and then performs iterative back-translation to jointly boost their translation performance.

NMT Retrieval +3

Paper
Code

ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation

1 code implementation • NeurIPS 2023 • Chenyang Le, Yao Qian, Long Zhou, Shujie Liu, Yanmin Qian, Michael Zeng, Xuedong Huang

Joint speech-language training is challenging due to the large demand for training data and GPU consumption, as well as the modality gap between speech and language.

Language Modelling Multi-Task Learning +2

Paper
Code

Triangular Architecture for Rare Language Translation

no code implementations • ACL 2018 • Shuo Ren, Wenhu Chen, Shujie Liu, Mu Li, Ming Zhou, Shuai Ma

Neural Machine Translation (NMT) performs poor on the low-resource language pair $(X, Z)$, especially when $Z$ is a rare language.

Machine Translation NMT +1

Paper
Add Code

Generative Bridging Network in Neural Sequence Prediction

no code implementations • 28 Jun 2017 • Wenhu Chen, Guanlin Li, Shuo Ren, Shujie Liu, Zhirui Zhang, Mu Li, Ming Zhou

In order to alleviate data sparsity and overfitting problems in maximum likelihood estimation (MLE) for sequence prediction tasks, we propose the Generative Bridging Network (GBN), in which a novel bridge module is introduced to assist the training of the sequence prediction model (the generator network).

Abstractive Text Summarization Language Modelling +2

Paper
Add Code

Joint Training for Neural Machine Translation Models with Monolingual Data

no code implementations • 1 Mar 2018 • Zhirui Zhang, Shujie Liu, Mu Li, Ming Zhou, Enhong Chen

Monolingual data have been demonstrated to be helpful in improving translation quality of both statistical machine translation (SMT) systems and neural machine translation (NMT) systems, especially in resource-poor or domain adaptation tasks where parallel data are not rich enough.

Domain Adaptation Machine Translation +2

Paper
Add Code

Assertion-based QA with Question-Aware Open Information Extraction

no code implementations • 23 Jan 2018 • Zhao Yan, Duyu Tang, Nan Duan, Shujie Liu, Wendi Wang, Daxin Jiang, Ming Zhou, Zhoujun Li

We present assertion based question answering (ABQA), an open domain question answering task that takes a question and a passage as inputs, and outputs a semi-structured assertion consisting of a subject, a predicate and a list of arguments.

Learning-To-Rank Open-Domain Question Answering +2

Paper
Add Code

Implicit Distortion and Fertility Models for Attention-based Encoder-Decoder NMT Model

no code implementations • 13 Jan 2016 • Shi Feng, Shujie Liu, Mu Li, Ming Zhou

Aiming to resolve these problems, we propose new variations of attention-based encoder-decoder and compare them with other models on machine translation.

Attribute Image Captioning +5

Paper
Add Code

A Statistical Parsing Framework for Sentiment Classification

no code implementations • CL 2015 • Li Dong, Furu Wei, Shujie Liu, Ming Zhou, Ke Xu

Unlike previous works that employ syntactic parsing results for sentiment analysis, we develop a statistical parser to directly analyze the sentiment structure of a sentence.

Classification General Classification +4

Paper
Add Code

Beyond Word-based Language Model in Statistical Machine Translation

no code implementations • 5 Feb 2015 • Jiajun Zhang, Shujie Liu, Mu Li, Ming Zhou, Cheng-qing Zong

Language model is one of the most important modules in statistical machine translation and currently the word-based language model dominants this community.

Language Modelling Machine Translation +1

Paper
Add Code

Regularizing Neural Machine Translation by Target-bidirectional Agreement

no code implementations • 13 Aug 2018 • Zhirui Zhang, Shuangzhi Wu, Shujie Liu, Mu Li, Ming Zhou, Tong Xu

Although Neural Machine Translation (NMT) has achieved remarkable progress in the past several years, most NMT systems still suffer from a fundamental shortcoming as in other sequence generation tasks: errors made early in generation process are fed as inputs to the model and can be quickly amplified, harming subsequent sequence generation.

Machine Translation NMT +1

Paper
Add Code

Approximate Distribution Matching for Sequence-to-Sequence Learning

no code implementations • 24 Aug 2018 • Wenhu Chen, Guanlin Li, Shujie Liu, Zhirui Zhang, Mu Li, Ming Zhou

Then, we interpret sequence-to-sequence learning as learning a transductive model to transform the source local latent distributions to match their corresponding target distributions.

Image Captioning Machine Translation +1

Paper
Add Code

Style Transfer as Unsupervised Machine Translation

no code implementations • 23 Aug 2018 • Zhirui Zhang, Shuo Ren, Shujie Liu, Jianyong Wang, Peng Chen, Mu Li, Ming Zhou, Enhong Chen

Language style transferring rephrases text with specific stylistic attributes while preserving the original attribute-independent content.

Ranked #3 on Unsupervised Text Style Transfer on GYAFC

Attribute NMT +4

Paper
Add Code

Bidirectional Generative Adversarial Networks for Neural Machine Translation

no code implementations • CONLL 2018 • Zhirui Zhang, Shujie Liu, Mu Li, Ming Zhou, Enhong Chen

To address this issue and stabilize the GAN training, in this paper, we propose a novel Bidirectional Generative Adversarial Network for Neural Machine Translation (BGAN-NMT), which aims to introduce a generator model to act as the discriminator, whereby the discriminator naturally considers the entire translation space so that the inadequate training problem can be alleviated.

Generative Adversarial Network Language Modelling +4

Paper
Add Code

Chunk-based Decoder for Neural Machine Translation

no code implementations • ACL 2017 • Shonosuke Ishiwatari, JingTao Yao, Shujie Liu, Mu Li, Ming Zhou, Naoki Yoshinaga, Masaru Kitsuregawa, Weijia Jia

The chunk-level decoder models global dependencies while the word-level decoder decides the local word order in a chunk.

Machine Translation NMT +2

Paper
Add Code

Knowledge-Based Semantic Embedding for Machine Translation

no code implementations • ACL 2016 • Chen Shi, Shujie Liu, Shuo Ren, Shi Feng, Mu Li, Ming Zhou, Xu sun, Houfeng Wang

Machine Translation Translation

Paper
Add Code

Learning to Collaborate for Question Answering and Asking

no code implementations • NAACL 2018 • Duyu Tang, Nan Duan, Zhao Yan, Zhirui Zhang, Yibo Sun, Shujie Liu, Yuanhua Lv, Ming Zhou

Secondly, directly applying GAN that regards all the generated questions as negative instances could not improve the accuracy of the QA model.

Answer Selection Generative Adversarial Network +2

Paper
Add Code

Generative Bridging Network for Neural Sequence Prediction

no code implementations • NAACL 2018 • Wenhu Chen, Guanlin Li, Shuo Ren, Shujie Liu, Zhirui Zhang, Mu Li, Ming Zhou

Abstractive Text Summarization Image Captioning +5

Paper
Add Code

Stack-based Multi-layer Attention for Transition-based Dependency Parsing

no code implementations • EMNLP 2017 • Zhirui Zhang, Shujie Liu, Mu Li, Ming Zhou, Enhong Chen

Although sequence-to-sequence (seq2seq) network has achieved significant success in many NLP tasks such as machine translation and text summarization, simply applying this approach to transition-based dependency parsing cannot yield a comparable performance gain as in other state-of-the-art methods, such as stack-LSTM and head selection.

Language Modelling Machine Translation +3

Paper
Add Code

Improving Attention Modeling with Implicit Distortion and Fertility for Machine Translation

no code implementations • COLING 2016 • Shi Feng, Shujie Liu, Nan Yang, Mu Li, Ming Zhou, Kenny Q. Zhu

In neural machine translation, the attention mechanism facilitates the translation process by producing a soft alignment between the source sentence and the target sentence.

Machine Translation Sentence +1

Paper
Add Code

Bilingually-constrained Phrase Embeddings for Machine Translation

no code implementations • ACL 2014 • Jiajun Zhang, Shujie Liu, Mu Li, Ming Zhou, Cheng-qing Zong

Language Modelling Machine Translation +4

Paper
Add Code

Learning Topic Representation for SMT with Neural Networks

no code implementations • ACL 2014 • Lei Cui, Dong-dong Zhang, Shujie Liu, Qiming Chen, Mu Li, Ming Zhou, Muyun Yang

Domain Adaptation Machine Translation

Paper
Add Code

A Recursive Recurrent Neural Network for Statistical Machine Translation

no code implementations • ACL 2014 • Shujie Liu, Nan Yang, Mu Li, Ming Zhou

Chunking Language Modelling +7

Paper
Add Code

Word Alignment Modeling with Context Dependent Deep Neural Network

no code implementations • ACL 2013 • Nan Yang, Shujie Liu, Mu Li, Ming Zhou, Nenghai Yu

Speech Recognition Word Alignment

Paper
Add Code

Learning Entity Representation for Entity Disambiguation

no code implementations • ACL 2013 • Zhengyan He, Shujie Liu, Mu Li, Ming Zhou, Longkai Zhang, Houfeng Wang

Denoising Entity Disambiguation +1

Paper
Add Code

Bilingual Data Cleaning for SMT using Graph-based Random Walk

no code implementations • ACL 2013 • Lei Cui, Dong-dong Zhang, Shujie Liu, Mu Li, Ming Zhou

Machine Translation Word Alignment

Paper
Add Code

Learning Translation Consensus with Structured Label Propagation

no code implementations • ACL 2012 • Shujie Liu, Chi-Ho Li, Mu Li, Ming Zhou

Machine Translation Translation

Paper
Add Code

Hierarchical Recurrent Neural Network for Document Modeling

no code implementations • EMNLP 2015 • Rui Lin, Shujie Liu, Muyun Yang, Mu Li, Ming Zhou, Sheng Li

Chunking Language Modelling +3

Paper
Add Code

Efficient Collective Entity Linking with Stacking

no code implementations • EMNLP 2013 • Zhengyan He, Shujie Liu, Yang song, Mu Li, Ming Zhou, Houfeng Wang

Entity Linking Learning-To-Rank +1

Paper
Add Code

Multi-Domain Adaptation for SMT Using Multi-Task Learning

no code implementations • EMNLP 2013 • Lei Cui, Xilun Chen, Dong-dong Zhang, Shujie Liu, Mu Li, Ming Zhou

Domain Adaptation Machine Translation +2

Paper
Add Code

Re-training Monolingual Parser Bilingually for Syntactic SMT

no code implementations • EMNLP 2012 • Shujie Liu, Chi-Ho Li, Mu Li, Ming Zhou

Machine Translation Word Alignment

Paper
Add Code

Explicit Cross-lingual Pre-training for Unsupervised Machine Translation

no code implementations • IJCNLP 2019 • Shuo Ren, Yu Wu, Shujie Liu, Ming Zhou, Shuai Ma

Pre-training has proven to be effective in unsupervised machine translation due to its ability to model deep context information in cross-lingual scenarios.

Language Modelling Translation +1

Paper
Add Code

Accelerating Transformer Decoding via a Hybrid of Self-attention and Recurrent Neural Network

no code implementations • 5 Sep 2019 • Chengyi Wang, Shuangzhi Wu, Shujie Liu

Due to the highly parallelizable architecture, Transformer is faster to train than RNN-based models and popularly used in machine translation tasks.

Knowledge Distillation Machine Translation +1

Paper
Add Code

Source Dependency-Aware Transformer with Supervised Self-Attention

no code implementations • 5 Sep 2019 • Chengyi Wang, Shuangzhi Wu, Shujie Liu

Recently, Transformer has achieved the state-of-the-art performance on many machine translation tasks.

Machine Translation Translation

Paper
Add Code

Bridging the Gap between Pre-Training and Fine-Tuning for End-to-End Speech Translation

no code implementations • 17 Sep 2019 • Chengyi Wang, Yu Wu, Shujie Liu, Zhenglu Yang, Ming Zhou

End-to-end speech translation, a hot topic in recent years, aims to translate a segment of audio into a specific language with an end-to-end model.

Multi-Task Learning Translation

Paper
Add Code

Unsupervised Context Rewriting for Open Domain Conversation

no code implementations • IJCNLP 2019 • Kun Zhou, Kai Zhang, Yu Wu, Shujie Liu, Jingsong Yu

Context modeling has a pivotal role in open domain conversation.

Reinforcement Learning (RL) Response Generation +1

Paper
Add Code

Curriculum Pre-training for End-to-End Speech Translation

no code implementations • ACL 2020 • Chengyi Wang, Yu Wu, Shujie Liu, Ming Zhou, Zhenglu Yang

End-to-end speech translation poses a heavy burden on the encoder, because it has to transcribe, understand, and learn cross-lingual semantics simultaneously.

speech-recognition Speech Recognition +1

Paper
Add Code

A Graph-based Coarse-to-fine Method for Unsupervised Bilingual Lexicon Induction

no code implementations • ACL 2020 • Shuo Ren, Shujie Liu, Ming Zhou, Shuai Ma

To deal with those issues, in this paper, we propose a novel graph-based paradigm to induce bilingual lexicons in a coarse-to-fine way.

Bilingual Lexicon Induction Cross-Lingual Word Embeddings +2

Paper
Add Code

Low Latency End-to-End Streaming Speech Recognition with a Scout Network

no code implementations • 23 Mar 2020 • Chengyi Wang, Yu Wu, Shujie Liu, Jinyu Li, Liang Lu, Guoli Ye, Ming Zhou

The attention-based Transformer model has achieved promising results for speech recognition (SR) in the offline mode.

Audio and Speech Processing

Paper
Add Code

Developing Real-time Streaming Transformer Transducer for Speech Recognition on Large-scale Dataset

no code implementations • 22 Oct 2020 • Xie Chen, Yu Wu, Zhenghao Wang, Shujie Liu, Jinyu Li

Recently, Transformer based end-to-end models have achieved great success in many areas including speech recognition.

speech-recognition Speech Recognition

Paper
Add Code

Investigation of Practical Aspects of Single Channel Speech Separation for ASR

no code implementations • 5 Jul 2021 • Jian Wu, Zhuo Chen, Sanyuan Chen, Yu Wu, Takuya Yoshioka, Naoyuki Kanda, Shujie Liu, Jinyu Li

Speech separation has been successfully applied as a frontend processing module of conversation transcription systems thanks to its ability to handle overlapped speech and its flexibility to combine with downstream tasks such as automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

UniSpeech at scale: An Empirical Study of Pre-training Method on Large-Scale Speech Recognition Dataset

no code implementations • 12 Jul 2021 • Chengyi Wang, Yu Wu, Shujie Liu, Jinyu Li, Yao Qian, Kenichi Kumatani, Furu Wei

Recently, there has been a vast interest in self-supervised learning (SSL) where the model is pre-trained on large scale unlabeled data and then fine-tuned on a small labeled dataset.

Self-Supervised Learning speech-recognition +1

Paper
Add Code

A Configurable Multilingual Model is All You Need to Recognize All Languages

no code implementations • 13 Jul 2021 • Long Zhou, Jinyu Li, Eric Sun, Shujie Liu

Particularly, a single CMM can be deployed to any user scenario where the users can pre-select any combination of languages.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

SemFace: Pre-training Encoder and Decoder with a Semantic Interface for Neural Machine Translation

no code implementations • ACL 2021 • Shuo Ren, Long Zhou, Shujie Liu, Furu Wei, Ming Zhou, Shuai Ma

While pre-training techniques are working very well in natural language processing, how to pre-train a decoder and effectively use it for neural machine translation (NMT) still remains a tricky issue.

Machine Translation NMT +1

Paper
Add Code

Jointly Learning to Repair Code and Generate Commit Message

no code implementations • EMNLP 2021 • Jiaqi Bai, Long Zhou, Ambrosio Blanco, Shujie Liu, Furu Wei, Ming Zhou, Zhoujun Li

We propose a novel task of jointly repairing program codes and generating commit messages.

Code Repair Translation

Paper
Add Code

Multi-View Self-Attention Based Transformer for Speaker Recognition

no code implementations • 11 Oct 2021 • Rui Wang, Junyi Ao, Long Zhou, Shujie Liu, Zhihua Wei, Tom Ko, Qing Li, Yu Zhang

In this work, we propose a novel multi-view self-attention mechanism and present an empirical study of different Transformer variants with or without the proposed attention mechanism for speaker recognition.

Speaker Recognition

Paper
Add Code

Separating Long-Form Speech with Group-Wise Permutation Invariant Training

no code implementations • 27 Oct 2021 • Wangyou Zhang, Zhuo Chen, Naoyuki Kanda, Shujie Liu, Jinyu Li, Sefik Emre Eskimez, Takuya Yoshioka, Xiong Xiao, Zhong Meng, Yanmin Qian, Furu Wei

Multi-talker conversational speech processing has drawn many interests for various applications such as meeting transcription.

Speech Separation

Paper
Add Code

Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction

no code implementations • 28 Oct 2021 • Heming Wang, Yao Qian, Xiaofei Wang, Yiming Wang, Chengyi Wang, Shujie Liu, Takuya Yoshioka, Jinyu Li, DeLiang Wang

The reconstruction module is used for auxiliary learning to improve the noise robustness of the learned representation and thus is not required during inference.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +8

Paper
Add Code

Grammar-Based Patches Generation for Automated Program Repair

no code implementations • Findings (ACL) 2021 • Yu Tang, Long Zhou, Ambrosio Blanco, Shujie Liu, Furu Wei, Ming Zhou, Muyun Yang

Program Repair

Paper
Add Code

Ultra Fast Speech Separation Model with Teacher Student Learning

no code implementations • 27 Apr 2022 • Sanyuan Chen, Yu Wu, Zhuo Chen, Jian Wu, Takuya Yoshioka, Shujie Liu, Jinyu Li, Xiangzhan Yu

In this paper, an ultra fast speech separation Transformer model is proposed to achieve both better performance and efficiency with teacher student learning (T-S learning).

Computational Efficiency Speech Separation

Paper
Add Code

Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?

no code implementations • 27 Apr 2022 • Sanyuan Chen, Yu Wu, Chengyi Wang, Shujie Liu, Zhuo Chen, Peidong Wang, Gang Liu, Jinyu Li, Jian Wu, Xiangzhan Yu, Furu Wei

Recently, self-supervised learning (SSL) has demonstrated strong performance in speaker recognition, even if the pre-training objective is designed for speech recognition.

Self-Supervised Learning Speaker Recognition +3

Paper
Add Code

Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training

no code implementations • 21 Jun 2022 • Chengyi Wang, Yiming Wang, Yu Wu, Sanyuan Chen, Jinyu Li, Shujie Liu, Furu Wei

Recently, masked prediction pre-training has seen remarkable progress in self-supervised learning (SSL) for speech recognition.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers

no code implementations • 5 Nov 2022 • Peidong Wang, Eric Sun, Jian Xue, Yu Wu, Long Zhou, Yashesh Gaur, Shujie Liu, Jinyu Li

In this paper, we propose LAMASSU, a streaming language-agnostic multilingual speech recognition and translation model using neural transducers.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Paper
Add Code

LongFNT: Long-form Speech Recognition with Factorized Neural Transducer

no code implementations • 17 Nov 2022 • Xun Gong, Yu Wu, Jinyu Li, Shujie Liu, Rui Zhao, Xie Chen, Yanmin Qian

This motivates us to leverage the factorized neural transducer structure, containing a real language model, the vocabulary predictor.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

Exploring WavLM on Speech Enhancement

no code implementations • 18 Nov 2022 • Hyungchan Song, Sanyuan Chen, Zhuo Chen, Yu Wu, Takuya Yoshioka, Min Tang, Jong Won Shin, Shujie Liu

There is a surge in interest in self-supervised learning approaches for end-to-end speech encoding in recent years as they have achieved great success.

Self-Supervised Learning Speech Enhancement +2

Paper
Add Code

VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning

no code implementations • 21 Nov 2022 • Qiushi Zhu, Long Zhou, Ziqiang Zhang, Shujie Liu, Binxing Jiao, Jie Zhang, LiRong Dai, Daxin Jiang, Jinyu Li, Furu Wei

Although speech is a simple and effective way for humans to communicate with the outside world, a more realistic speech interaction contains multimodal information, e. g., vision, text.

Audio-Visual Speech Recognition Language Modelling +3

Paper
Add Code

Building High-accuracy Multilingual ASR with Gated Language Experts and Curriculum Training

no code implementations • 1 Mar 2023 • Eric Sun, Jinyu Li, Yuxuan Hu, Yimeng Zhu, Long Zhou, Jian Xue, Peidong Wang, Linquan Liu, Shujie Liu, Edward Lin, Yifan Gong

We propose gated language experts and curriculum training to enhance multilingual transformer transducer models without requiring language identification (LID) input from users during inference.

Language Identification

Paper
Add Code

Code-Switching Text Generation and Injection in Mandarin-English ASR

no code implementations • 20 Mar 2023 • Haibin Yu, Yuxuan Hu, Yao Qian, Ma Jin, Linquan Liu, Shujie Liu, Yu Shi, Yanmin Qian, Edward Lin, Michael Zeng

Code-switching speech refers to a means of expression by mixing two or more languages within a single utterance.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation

no code implementations • 25 May 2023 • Tianrui Wang, Long Zhou, Ziqiang Zhang, Yu Wu, Shujie Liu, Yashesh Gaur, Zhuo Chen, Jinyu Li, Furu Wei

Recent research shows a big convergence in model architecture, training objectives, and inference methods across various tasks for different modalities.

Language Modelling Multi-Task Learning +3

Paper
Add Code

OpenNDD: Open Set Recognition for Neurodevelopmental Disorders Detection

no code implementations • 28 Jun 2023 • Jiaming Yu, Zihao Guan, Xinyue Chang, Shujie Liu, Zhenshan Shi, Xiumei Liu, Changcai Yang, Riqing Chen, Lanyan Xue, Lifang Wei

Since the strong comorbid similarity in NDDs, such as attention-deficit hyperactivity disorder, can interfere with the accurate diagnosis of autism spectrum disorder (ASD), identifying unknown classes is extremely crucial and challenging from NDDs.

open-set classification Open Set Learning

Paper
Add Code

Accelerating Transducers through Adjacent Token Merging

no code implementations • 28 Jun 2023 • Yuang Li, Yu Wu, Jinyu Li, Shujie Liu

Recent end-to-end automatic speech recognition (ASR) systems often utilize a Transformer-based acoustic encoder that generates embedding at a high frame rate.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

Prompting Large Language Models for Zero-Shot Domain Adaptation in Speech Recognition

no code implementations • 28 Jun 2023 • Yuang Li, Yu Wu, Jinyu Li, Shujie Liu

Different from these methods, in this work, with only a domain-specific text prompt, we propose two zero-shot ASR domain adaptation methods using LLaMA, a 7-billion-parameter large language model (LLM).

Domain Adaptation Language Modelling +3

Paper
Add Code

On decoder-only architecture for speech-to-text and large language model integration

no code implementations • 8 Jul 2023 • Jian Wu, Yashesh Gaur, Zhuo Chen, Long Zhou, Yimeng Zhu, Tianrui Wang, Jinyu Li, Shujie Liu, Bo Ren, Linquan Liu, Yu Wu

Large language models (LLMs) have achieved remarkable success in the field of natural language processing, enabling better human-computer interaction using natural language.

Language Modelling Large Language Model +1

Paper
Add Code

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer

no code implementations • 14 Aug 2023 • Xiaofei Wang, Manthan Thakker, Zhuo Chen, Naoyuki Kanda, Sefik Emre Eskimez, Sanyuan Chen, Min Tang, Shujie Liu, Jinyu Li, Takuya Yoshioka

Recent advancements in generative speech models based on audio-text prompts have enabled remarkable innovations like high-quality zero-shot text-to-speech.

Language Modelling Multi-Task Learning +2

Paper
Add Code

WavMark: Watermarking for Audio Generation

no code implementations • 24 Aug 2023 • Guangyu Chen, Yu Wu, Shujie Liu, Tao Liu, Xiaoyong Du, Furu Wei

Recent breakthroughs in zero-shot voice synthesis have enabled imitating a speaker's voice using just a few seconds of recording while maintaining a high level of realism.

Audio Generation

Paper
Add Code

Diffusion Conditional Expectation Model for Efficient and Robust Target Speech Extraction

no code implementations • 25 Sep 2023 • Leying Zhang, Yao Qian, Linfeng Yu, Heming Wang, Xinkai Wang, Hemin Yang, Long Zhou, Shujie Liu, Yanmin Qian, Michael Zeng

Additionally, we introduce Regenerate-DCEM (R-DCEM) that can regenerate and optimize speech quality based on pre-processed speech from a discriminative model.

Speech Extraction

Paper
Add Code

COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning

no code implementations • 3 Nov 2023 • Jing Pan, Jian Wu, Yashesh Gaur, Sunit Sivasankaran, Zhuo Chen, Shujie Liu, Jinyu Li

With fewer than 20M trainable parameters and as little as 450 hours of English speech data for SQA generation, COSMIC exhibits emergent instruction-following and in-context learning capabilities in speech-to-text tasks.

Domain Adaptation In-Context Learning +4

Paper
Add Code

Boosting Large Language Model for Speech Synthesis: An Empirical Study

no code implementations • 30 Dec 2023 • Hongkun Hao, Long Zhou, Shujie Liu, Jinyu Li, Shujie Hu, Rui Wang, Furu Wei

In this paper, we conduct a comprehensive empirical exploration of boosting LLMs with the ability to generate speech, by combining pre-trained LLM LLaMA/OPT and text-to-speech synthesis model VALL-E. We compare three integration methods between LLMs and speech synthesis models, including directly fine-tuned LLMs, superposed layers of LLMs and VALL-E, and coupled LLMs and VALL-E using LLMs as a powerful text encoder.

Language Modelling Large Language Model +2

Paper
Add Code

WavLLM: Towards Robust and Adaptive Speech Large Language Model

no code implementations • 31 Mar 2024 • Shujie Hu, Long Zhou, Shujie Liu, Sanyuan Chen, Hongkun Hao, Jing Pan, Xunying Liu, Jinyu Li, Sunit Sivasankaran, Linquan Liu, Furu Wei

In this work, we introduce WavLLM, a robust and adaptive speech large language model with dual encoders, and a prompt-aware LoRA weight adapter, optimized by a two-stage curriculum learning approach.

Language Modelling Large Language Model

Paper
Add Code

RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis

no code implementations • 4 Apr 2024 • Detai Xin, Xu Tan, Kai Shen, Zeqian Ju, Dongchao Yang, Yuancheng Wang, Shinnosuke Takamichi, Hiroshi Saruwatari, Shujie Liu, Jinyu Li, Sheng Zhao

Furthermore, we demonstrate that RALL-E correctly synthesizes sentences that are hard for VALL-E and reduces the error rate from $68\%$ to $4\%$.

Language Modelling Speech Synthesis +1

Paper
Add Code

CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations

no code implementations • 10 Apr 2024 • Leying Zhang, Yao Qian, Long Zhou, Shujie Liu, Dongmei Wang, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Lei He, Sheng Zhao, Michael Zeng

CoVoMix is capable of first converting dialogue text into multiple streams of discrete tokens, with each token stream representing semantic information for individual talkers.

Dialogue Generation

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.