Search Results for author: Liang Lu

Found 33 papers, 9 papers with code

Improved Neural Protoform Reconstruction via Reflex Prediction

1 code implementation, 27 Mar 2024, Liang Lu, Jingzhi Wang, David R. Mortensen

Protolanguage reconstruction is central to historical linguistics.

Endpoint Detection for Streaming End-to-End Multi-talker ASR

no code implementations, 24 Jan 2022, Liang Lu, Jinyu Li, Yifan Gong

Our experimental results based on the 2-speaker LibrispeechMix dataset show that the SURT model can achieve promising EP detection without significant degradation of the recognition accuracy.

Sentence speech-recognition +2

Continuous Streaming Multi-Talker ASR with Dual-path Transducers

no code implementations, 17 Sep 2021, Desh Raj, Liang Lu, Zhuo Chen, Yashesh Gaur, Jinyu Li

Streaming recognition of multi-talker conversations has so far been evaluated only for 2-speaker single-turn sessions.

Speech Separation

Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition

no code implementations, 4 Jun 2021, Zhong Meng, Yu Wu, Naoyuki Kanda, Liang Lu, Xie Chen, Guoli Ye, Eric Sun, Jinyu Li, Yifan Gong

In this work, we perform LM fusion in the minimum WER (MWER) training of an E2E model to obviate the need for LM weight tuning during inference.

Language Modelling speech-recognition +1

Streaming Multi-talker Speech Recognition with Joint Speaker Identification

no code implementations, 5 Apr 2021, Liang Lu, Naoyuki Kanda, Jinyu Li, Yifan Gong

In multi-talker scenarios such as meetings and conversations, speech processing systems are usually required to transcribe the audio as well as identify the speakers for downstream applications.

Speaker Identification speech-recognition +2

Internal Language Model Training for Domain-Adaptive End-to-End Speech Recognition

no code implementations, 2 Feb 2021, Zhong Meng, Naoyuki Kanda, Yashesh Gaur, Sarangarajan Parthasarathy, Eric Sun, Liang Lu, Xie Chen, Jinyu Li, Yifan Gong

The efficacy of external language model (LM) integration with existing end-to-end (E2E) automatic speech recognition (ASR) systems can be improved significantly using the internal language model estimation (ILME) method.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Streaming end-to-end multi-talker speech recognition

no code implementations, 26 Nov 2020, Liang Lu, Naoyuki Kanda, Jinyu Li, Yifan Gong

End-to-end multi-talker speech recognition is an emerging research trend in the speech community due to its vast potential in applications such as conversation and meeting transcriptions.

speech-recognition Speech Recognition

Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition

no code implementations, 3 Nov 2020, Zhong Meng, Sarangarajan Parthasarathy, Eric Sun, Yashesh Gaur, Naoyuki Kanda, Liang Lu, Xie Chen, Rui Zhao, Jinyu Li, Yifan Gong

External language model (LM) integration remains a challenging task for end-to-end (E2E) automatic speech recognition (ASR), which has no clear division between acoustic and language models.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR

1 code implementation, 3 Nov 2020, Naoyuki Kanda, Zhong Meng, Liang Lu, Yashesh Gaur, Xiaofei Wang, Zhuo Chen, Takuya Yoshioka

Recently, an end-to-end speaker-attributed automatic speech recognition (E2E SA-ASR) model was proposed as a joint model of speaker counting, speech recognition and speaker identification for monaural overlapped speech.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer

no code implementations, 23 Oct 2020, Liang Lu, Zhong Meng, Naoyuki Kanda, Jinyu Li, Yifan Gong

The Hybrid Autoregressive Transducer (HAT) is a recently proposed end-to-end acoustic model that extends the standard Recurrent Neural Network Transducer (RNN-T) for the purpose of external language model (LM) fusion.

Language Modelling speech-recognition +1

Exploring Transformers for Large-Scale Speech Recognition

no code implementations, 19 May 2020, Liang Lu, Changliang Liu, Jinyu Li, Yifan Gong

While recurrent neural networks still largely define state-of-the-art speech recognition systems, the Transformer network has been proven to be a competitive alternative, especially in the offline condition.

speech-recognition Speech Recognition

Exploring Pre-training with Alignments for RNN Transducer based End-to-End Speech Recognition

no code implementations, 1 May 2020, Hu Hu, Rui Zhao, Jinyu Li, Liang Lu, Yifan Gong

Recently, the recurrent neural network transducer (RNN-T) architecture has become an emerging trend in end-to-end automatic speech recognition research because it is capable of online streaming speech recognition.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Low Latency End-to-End Streaming Speech Recognition with a Scout Network

no code implementations, 23 Mar 2020, Chengyi Wang, Yu Wu, Shujie Liu, Jinyu Li, Liang Lu, Guoli Ye, Ming Zhou

The attention-based Transformer model has achieved promising results for speech recognition (SR) in the offline mode.

Audio and Speech Processing

Continuous speech separation: dataset and analysis

1 code implementation, 30 Jan 2020, Zhuo Chen, Takuya Yoshioka, Liang Lu, Tianyan Zhou, Zhong Meng, Yi Luo, Jian Wu, Xiong Xiao, Jinyu Li

In this paper, we define continuous speech separation (CSS) as a task of generating a set of non-overlapped speech signals from a continuous audio stream that contains multiple utterances that are partially overlapped by a varying degree.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Semantic Mask for Transformer based End-to-End Speech Recognition

1 code implementation, 6 Dec 2019, Chengyi Wang, Yu Wu, Yujiao Du, Jinyu Li, Shujie Liu, Liang Lu, Shuo Ren, Guoli Ye, Sheng Zhao, Ming Zhou

The attention-based encoder-decoder model has achieved impressive results for both automatic speech recognition (ASR) and text-to-speech (TTS) tasks.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

A Transformer with Interleaved Self-attention and Convolution for Hybrid Acoustic Models

1 code implementation, 23 Oct 2019, Liang Lu

The Transformer with self-attention has achieved great success in the area of natural language processing.

speech-recognition Speech Recognition

Self-Teaching Networks

no code implementations, 9 Sep 2019, Liang Lu, Eric Sun, Yifan Gong

Furthermore, the auxiliary loss also works as a regularizer, which improves the generalization capacity of the network.

speech-recognition Speech Recognition

PyKaldi2: Yet another speech toolkit based on Kaldi and PyTorch

1 code implementation, 12 Jul 2019, Liang Lu, Xiong Xiao, Zhuo Chen, Yifan Gong

While similar toolkits are available built on top of the two, a key feature of PyKaldi2 is sequence training with criteria such as MMI, sMBR and MPE.

speech-recognition Speech Recognition

End-to-End Neural Segmental Models for Speech Recognition

no code implementations, 1 Aug 2017, Hao Tang, Liang Lu, Lingpeng Kong, Kevin Gimpel, Karen Livescu, Chris Dyer, Noah A. Smith, Steve Renals

Segmental models are an alternative to frame-based models for sequence prediction, where hypothesized path weights are based on entire segment scores rather than a single frame at a time.

speech-recognition Speech Recognition

Toward Computation and Memory Efficient Neural Network Acoustic Models with Binary Weights and Activations

no code implementations, 28 Jun 2017, Liang Lu

This paper investigates the use of binary weights and activations for computation and memory efficient neural network acoustic models.

Efficient Neural Network speech-recognition +1

Multitask Learning with Low-Level Auxiliary Tasks for Encoder-Decoder Based Speech Recognition

no code implementations, 5 Apr 2017, Shubham Toshniwal, Hao Tang, Liang Lu, Karen Livescu

We hypothesize that using intermediate representations as auxiliary supervision at lower levels of deep networks may be a good way of combining the advantages of end-to-end training and more traditional pipeline approaches.

speech-recognition Speech Recognition

Multitask Learning with CTC and Segmental CRF for Speech Recognition

no code implementations, 21 Feb 2017, Liang Lu, Lingpeng Kong, Chris Dyer, Noah A. Smith

Segmental conditional random fields (SCRFs) and connectionist temporal classification (CTC) are two sequence labeling methods used for end-to-end training of speech recognition models.

speech-recognition Speech Recognition

Adaptive DCTNet for Audio Signal Classification

1 code implementation, 13 Dec 2016, Yin Xian, Yunchen Pu, Zhe Gan, Liang Lu, Andrew Thompson

Its output feature is related to Cohen's class of time-frequency distributions.

Sound

Small-footprint Highway Deep Neural Networks for Speech Recognition

no code implementations, 18 Oct 2016, Liang Lu, Steve Renals

Furthermore, HDNNs are more controllable than DNNs: the gate functions of an HDNN can control the behavior of the whole network using a very small number of model parameters.

speech-recognition Speech Recognition

Multiplicative LSTM for sequence modelling

1 code implementation, 26 Sep 2016, Ben Krause, Liang Lu, Iain Murray, Steve Renals

We introduce multiplicative LSTM (mLSTM), a recurrent neural network architecture for sequence modelling that combines the long short-term memory (LSTM) and multiplicative recurrent neural network architectures.

Density Estimation Language Modelling

Knowledge Distillation for Small-footprint Highway Networks

no code implementations, 2 Aug 2016, Liang Lu, Michelle Guo, Steve Renals

We have shown that HDNN-based acoustic models can achieve comparable recognition accuracy with a much smaller number of model parameters compared to plain deep neural network (DNN) acoustic models.

Acoustic Modelling Knowledge Distillation +2

Sequence Training and Adaptation of Highway Deep Neural Networks

no code implementations, 7 Jul 2016, Liang Lu

The highway deep neural network (HDNN) is a type of depth-gated feedforward neural network, which has been shown to be easier to train with more hidden layers and also to generalise better than conventional plain deep neural networks (DNNs).

speech-recognition Speech Recognition

Segmental Recurrent Neural Networks for End-to-end Speech Recognition

no code implementations, 1 Mar 2016, Liang Lu, Lingpeng Kong, Chris Dyer, Noah A. Smith, Steve Renals

This model connects the segmental conditional random field (CRF) with a recurrent neural network (RNN) used for feature extraction.

Acoustic Modelling Language Modelling +2

Small-footprint Deep Neural Networks with Highway Connections for Speech Recognition

no code implementations, 14 Dec 2015, Liang Lu, Steve Renals

For speech recognition, deep neural networks (DNNs) have significantly improved the recognition accuracy in most benchmark datasets and application domains.

speech-recognition Speech Recognition

Top-down Tree Long Short-Term Memory Networks

1 code implementation, NAACL 2016, Xingxing Zhang, Liang Lu, Mirella Lapata

Long Short-Term Memory (LSTM) networks, a type of recurrent neural network with a more complex computational unit, have been successfully applied to a variety of sequence modeling tasks.

Dependency Parsing Sentence +1

Tied Probabilistic Linear Discriminant Analysis for Speech Recognition

no code implementations, 4 Nov 2014, Liang Lu, Steve Renals

Acoustic models using probabilistic linear discriminant analysis (PLDA) capture the correlations within feature vectors using subspaces which do not vastly expand the model.

speech-recognition Speech Recognition