Search Results for author: Jinyu Li

Found 78 papers, 10 papers with code

Ultra Fast Speech Separation Model with Teacher Student Learning

no code implementations • 27 Apr 2022 Sanyuan Chen, Yu Wu, Zhuo Chen, Jian Wu, Takuya Yoshioka, Shujie Liu, Jinyu Li, Xiangzhan Yu

In this paper, an ultra-fast speech separation Transformer model is proposed to achieve both better performance and efficiency with teacher-student learning (T-S learning).

Speech Separation
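
To make the T-S recipe concrete, here is a minimal distillation sketch in PyTorch, assuming mask-based separation models; `teacher`, `student`, and the mixture tensor are hypothetical stand-ins, not the paper's exact setup.

```python
import torch
import torch.nn.functional as F

def ts_distillation_loss(student, teacher, mixture):
    """One T-S step: the small student regresses the large teacher's outputs."""
    with torch.no_grad():                   # the teacher is frozen
        target_masks = teacher(mixture)     # soft targets, usable on unlabeled audio
    student_masks = student(mixture)
    # Because the targets come from the teacher rather than oracle masks,
    # the student can also train on large amounts of unlabeled mixtures.
    return F.mse_loss(student_masks, target_masks)
```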

Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?

no code implementations • 27 Apr 2022 Sanyuan Chen, Yu Wu, Chengyi Wang, Shujie Liu, Zhuo Chen, Peidong Wang, Gang Liu, Jinyu Li, Jian Wu, Xiangzhan Yu, Furu Wei

Recently, self-supervised learning (SSL) has demonstrated strong performance in speaker recognition, even if the pre-training objective is designed for speech recognition.

Self-Supervised Learning Speaker Recognition +2

Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings

no code implementations • 30 Mar 2022 Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka

The proposed speaker embedding, named t-vector, is extracted synchronously with the t-SOT ASR model, enabling joint execution of speaker identification (SID) or speaker diarization (SD) with the multi-talker transcription with low latency.

Automatic Speech Recognition Speaker Diarization +1

Towards Contextual Spelling Correction for Customization of End-to-end Speech Recognition Systems

no code implementations • 2 Mar 2022 Xiaoqiang Wang, Yanqing Liu, Jinyu Li, Veljko Miljanic, Sheng Zhao, Hosam Khalil

In this work, we introduce a novel approach to do contextual biasing by adding a contextual spelling correction model on top of the end-to-end ASR system.

Automatic Speech Recognition Spelling Correction

Streaming Multi-Talker ASR with Token-Level Serialized Output Training

no code implementations • 2 Feb 2022 Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka

This paper proposes token-level serialized output training (t-SOT), a novel framework for streaming multi-talker automatic speech recognition (ASR).

Automatic Speech Recognition
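
The serialization itself can be illustrated with a short sketch: tokens from overlapping speakers are merged into a single stream in emission-time order, with a special channel-change token marking each switch. The `<cc>` symbol and the (time, channel, token) input format are assumptions for illustration, not the paper's exact implementation.

```python
CC = "<cc>"

def serialize_tsot(tokens):
    """tokens: list of (time, channel, token) tuples; channel in {0, 1}."""
    stream, prev_channel = [], None
    for _, channel, token in sorted(tokens):   # order by emission time
        if prev_channel is not None and channel != prev_channel:
            stream.append(CC)                  # virtual-channel switch marker
        stream.append(token)
        prev_channel = channel
    return stream

# Two overlapping utterances become one token sequence:
# serialize_tsot([(0.0, 0, "hello"), (0.4, 1, "hi"), (0.8, 0, "world")])
# -> ['hello', '<cc>', 'hi', '<cc>', 'world']
```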

Endpoint Detection for Streaming End-to-End Multi-talker ASR

no code implementations • 24 Jan 2022 Liang Lu, Jinyu Li, Yifan Gong

Our experimental results based on the 2-speaker LibrispeechMix dataset show that the SURT model can achieve promising EP detection without significant degradation of the recognition accuracy.

Speech Recognition Speech Separation

Self-Supervised Learning for Speech Recognition with Intermediate Layer Supervision

1 code implementation • 16 Dec 2021 Chengyi Wang, Yu Wu, Sanyuan Chen, Shujie Liu, Jinyu Li, Yao Qian, Zhenglu Yang

Recently, pioneering work has found that pre-trained speech models can solve full-stack speech processing tasks, because the model utilizes bottom layers to learn speaker-related information and top layers to encode content-related information.

Self-Supervised Learning Speech Recognition

Recent Advances in End-to-End Automatic Speech Recognition

no code implementations • 2 Nov 2021 Jinyu Li

Recently, the speech community has seen a significant trend of moving from deep neural network based hybrid modeling to end-to-end (E2E) modeling for automatic speech recognition (ASR).

Automatic Speech Recognition

Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction

no code implementations • 28 Oct 2021 Heming Wang, Yao Qian, Xiaofei Wang, Yiming Wang, Chengyi Wang, Shujie Liu, Takuya Yoshioka, Jinyu Li, DeLiang Wang

The reconstruction module is used for auxiliary learning to improve the noise robustness of the learned representation and thus is not required during inference.

Automatic Speech Recognition Auxiliary Learning +6
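
A hedged sketch of the auxiliary-learning setup described above: a reconstruction head is trained jointly with the contrastive objective and simply dropped at inference. `encoder`, `recon_head`, `contrastive_loss`, and the loss weight `alpha` are hypothetical names, not the paper's API.

```python
import torch.nn.functional as F

def pretrain_step(encoder, recon_head, contrastive_loss, noisy, clean, alpha=0.5):
    reps = encoder(noisy)                        # contextual representations
    loss_c = contrastive_loss(reps)              # main SSL objective
    loss_r = F.l1_loss(recon_head(reps), clean)  # reconstruct clean speech
    return loss_c + alpha * loss_r               # alpha weights the auxiliary task

def transcribe(encoder, asr_head, audio):
    return asr_head(encoder(audio))              # recon_head is not needed here
```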

Continuous Speech Separation with Recurrent Selective Attention Network

no code implementations • 28 Oct 2021 Yixuan Zhang, Zhuo Chen, Jian Wu, Takuya Yoshioka, Peidong Wang, Zhong Meng, Jinyu Li

In this paper, we propose to apply recurrent selective attention network (RSAN) to CSS, which generates a variable number of output channels based on active speaker counting.

Speech Recognition Speech Separation

SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing

no code implementations • ACL 2022 Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei

Motivated by the success of T5 (Text-To-Text Transfer Transformer) in pre-trained natural language processing models, we propose a unified-modal SpeechT5 framework that explores the encoder-decoder pre-training for self-supervised speech/text representation learning.

Automatic Speech Recognition Quantization +5

Wav2vec-Switch: Contrastive Learning from Original-noisy Speech Pairs for Robust Speech Recognition

no code implementations • 11 Oct 2021 Yiming Wang, Jinyu Li, Heming Wang, Yao Qian, Chengyi Wang, Yu Wu

In this paper we propose wav2vec-Switch, a method to encode noise robustness into contextualized representations of speech via contrastive learning.

Automatic Speech Recognition Contrastive Learning +4
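
The core idea, swapping the quantized targets of the original and noisy views, can be sketched as below. The simplified `info_nce` here treats all other frames as negatives; the real model samples negatives differently, so this is an illustration of the target-swapping mechanism, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def info_nce(context, targets, temp):
    # context, targets: (T, D); every other frame serves as a negative
    sim = F.cosine_similarity(context.unsqueeze(1), targets.unsqueeze(0), dim=-1)
    labels = torch.arange(context.size(0))
    return F.cross_entropy(sim / temp, labels)

def swapped_contrastive_loss(ctx_orig, ctx_noisy, q_orig, q_noisy, temp=0.1):
    # standard terms: each view predicts its own quantized targets ...
    loss = info_nce(ctx_orig, q_orig, temp) + info_nce(ctx_noisy, q_noisy, temp)
    # ... plus swapped terms: each view must also predict the other view's
    # targets, which encourages noise-invariant representations
    loss = loss + info_nce(ctx_orig, q_noisy, temp) + info_nce(ctx_noisy, q_orig, temp)
    return loss
```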

Have best of both worlds: two-pass hybrid and E2E cascading framework for speech recognition

no code implementations • 10 Oct 2021 Guoli Ye, Vadim Mazalov, Jinyu Li, Yifan Gong

Hybrid and end-to-end (E2E) systems have their individual advantages, with different error patterns in the speech recognition results.

Speech Recognition

Factorized Neural Transducer for Efficient Language Model Adaptation

no code implementations • 27 Sep 2021 Xie Chen, Zhong Meng, Sarangarajan Parthasarathy, Jinyu Li

In recent years, end-to-end (E2E) based automatic speech recognition (ASR) systems have achieved great success due to their simplicity and promising performance.

Automatic Speech Recognition

Continuous Streaming Multi-Talker ASR with Dual-path Transducers

no code implementations • 17 Sep 2021 Desh Raj, Liang Lu, Zhuo Chen, Yashesh Gaur, Jinyu Li

Streaming recognition of multi-talker conversations has so far been evaluated only for 2-speaker single-turn sessions.

Speech Separation

A Light-weight contextual spelling correction model for customizing transducer-based speech recognition systems

no code implementations • 17 Aug 2021 Xiaoqiang Wang, Yanqing Liu, Sheng Zhao, Jinyu Li

We incorporate the context information into the spelling correction model with a shared context encoder and use a filtering algorithm to handle large-size context lists.

Automatic Speech Recognition Spelling Correction

A Configurable Multilingual Model is All You Need to Recognize All Languages

no code implementations • 13 Jul 2021 Long Zhou, Jinyu Li, Eric Sun, Shujie Liu

Particularly, a single CMM can be deployed to any user scenario where the users can pre-select any combination of languages.

Automatic Speech Recognition

UniSpeech at scale: An Empirical Study of Pre-training Method on Large-Scale Speech Recognition Dataset

no code implementations • 12 Jul 2021 Chengyi Wang, Yu Wu, Shujie Liu, Jinyu Li, Yao Qian, Kenichi Kumatani, Furu Wei

Recently, there has been vast interest in self-supervised learning (SSL), where the model is pre-trained on large-scale unlabeled data and then fine-tuned on a small labeled dataset.

Self-Supervised Learning Speech Recognition

Investigation of Practical Aspects of Single Channel Speech Separation for ASR

no code implementations • 5 Jul 2021 Jian Wu, Zhuo Chen, Sanyuan Chen, Yu Wu, Takuya Yoshioka, Naoyuki Kanda, Shujie Liu, Jinyu Li

Speech separation has been successfully applied as a frontend processing module of conversation transcription systems thanks to its ability to handle overlapped speech and its flexibility to combine with downstream tasks such as automatic speech recognition (ASR).

Automatic Speech Recognition Model Compression +1

Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition

no code implementations • 4 Jun 2021 Zhong Meng, Yu Wu, Naoyuki Kanda, Liang Lu, Xie Chen, Guoli Ye, Eric Sun, Jinyu Li, Yifan Gong

In this work, we perform LM fusion in the minimum WER (MWER) training of an E2E model to obviate the need for LM weights tuning during inference.

Speech Recognition
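
A minimal sketch of MWER training over an n-best list with the LM fused into the hypothesis score, under the assumption that per-hypothesis log scores and word-error counts have been precomputed; all names and weights are illustrative, not the paper's exact recipe.

```python
import torch

def mwer_loss(am_scores, lm_scores, word_errors, lm_weight=0.3):
    # am_scores, lm_scores, word_errors: 1-D tensors over the n-best list
    fused = am_scores + lm_weight * lm_scores   # log-linear LM fusion in training
    probs = torch.softmax(fused, dim=-1)        # renormalize over the n-best
    errs = word_errors.float()
    # subtracting the mean error is a common variance-reduction baseline
    return torch.sum(probs * (errs - errs.mean()))
```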

On Addressing Practical Challenges for RNN-Transducer

no code implementations • 27 Apr 2021 Rui Zhao, Jian Xue, Jinyu Li, Wenning Wei, Lei He, Yifan Gong

The first challenge is solved with a data splicing method that concatenates speech segments extracted from the source domain data.

Speech Recognition

Streaming Multi-talker Speech Recognition with Joint Speaker Identification

no code implementations • 5 Apr 2021 Liang Lu, Naoyuki Kanda, Jinyu Li, Yifan Gong

In multi-talker scenarios such as meetings and conversations, speech processing systems are usually required to transcribe the audio as well as identify the speakers for downstream applications.

Speaker Identification Speech Recognition +1

Internal Language Model Training for Domain-Adaptive End-to-End Speech Recognition

no code implementations • 2 Feb 2021 Zhong Meng, Naoyuki Kanda, Yashesh Gaur, Sarangarajan Parthasarathy, Eric Sun, Liang Lu, Xie Chen, Jinyu Li, Yifan Gong

The efficacy of external language model (LM) integration with existing end-to-end (E2E) automatic speech recognition (ASR) systems can be improved significantly using the internal language model estimation (ILME) method.

Automatic Speech Recognition

Streaming end-to-end multi-talker speech recognition

no code implementations • 26 Nov 2020 Liang Lu, Naoyuki Kanda, Jinyu Li, Yifan Gong

End-to-end multi-talker speech recognition is an emerging research trend in the speech community due to its vast potential in applications such as conversation and meeting transcriptions.

Speech Recognition

Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition

no code implementations • 3 Nov 2020 Zhong Meng, Sarangarajan Parthasarathy, Eric Sun, Yashesh Gaur, Naoyuki Kanda, Liang Lu, Xie Chen, Rui Zhao, Jinyu Li, Yifan Gong

External language model (LM) integration remains a challenging task for end-to-end (E2E) automatic speech recognition (ASR), which has no clear division between acoustic and language models.

Automatic Speech Recognition
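
At inference time the ILME idea reduces to a log-linear rescoring rule: an estimate of the E2E model's internal LM (obtained, roughly, by running the decoder with the acoustic context zeroed out) is subtracted while the external LM is added. A sketch with illustrative weights:

```python
def ilme_score(log_p_e2e, log_p_ext_lm, log_p_internal_lm,
               ext_weight=0.6, ilm_weight=0.3):
    """Hypothesis score used to rank beam-search candidates (a sketch).

    log_p_internal_lm approximates the E2E model's implicit LM, e.g. by
    feeding zeros in place of the encoder output when scoring the labels.
    """
    return log_p_e2e + ext_weight * log_p_ext_lm - ilm_weight * log_p_internal_lm
```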

On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer

no code implementations • 23 Oct 2020 Liang Lu, Zhong Meng, Naoyuki Kanda, Jinyu Li, Yifan Gong

Hybrid Autoregressive Transducer (HAT) is a recently proposed end-to-end acoustic model that extends the standard Recurrent Neural Network Transducer (RNN-T) for the purpose of the external language model (LM) fusion.

Speech Recognition

Don't shoot butterfly with rifles: Multi-channel Continuous Speech Separation with Early Exit Transformer

1 code implementation • 23 Oct 2020 Sanyuan Chen, Yu Wu, Zhuo Chen, Takuya Yoshioka, Shujie Liu, Jinyu Li

With its strong modeling capacity that comes from a multi-head and multi-layer structure, Transformer is a very powerful model for learning a sequential representation and has been successfully applied to speech separation recently.

Speech Separation

Developing Real-time Streaming Transformer Transducer for Speech Recognition on Large-scale Dataset

no code implementations • 22 Oct 2020 Xie Chen, Yu Wu, Zhenghao Wang, Shujie Liu, Jinyu Li

Recently, Transformer based end-to-end models have achieved great success in many areas including speech recognition.

Speech Recognition

Speaker Separation Using Speaker Inventories and Estimated Speech

no code implementations • 20 Oct 2020 Peidong Wang, Zhuo Chen, DeLiang Wang, Jinyu Li, Yifan Gong

We propose speaker separation using speaker inventories and estimated speech (SSUSIES), a framework leveraging speaker profiles and estimated speech for speaker separation.

Speaker Separation Speech Extraction +1

An End-to-end Architecture of Online Multi-channel Speech Separation

no code implementations • 7 Sep 2020 Jian Wu, Zhuo Chen, Jinyu Li, Takuya Yoshioka, Zhili Tan, Ed Lin, Yi Luo, Lei Xie

Previously, we introduced a system called unmixing, fixed-beamformer and extraction (UFE), which was shown to be effective in addressing the speech overlap problem in conversation transcription.

Speech Recognition Speech Separation

Adaptation Algorithms for Neural Network-Based Speech Recognition: An Overview

1 code implementation • 14 Aug 2020 Peter Bell, Joachim Fainberg, Ondrej Klejch, Jinyu Li, Steve Renals, Pawel Swietojanski

We present a structured overview of adaptation algorithms for neural network-based speech recognition, considering both hybrid hidden Markov model / neural network systems and end-to-end neural network systems, with a focus on speaker adaptation, domain adaptation, and accent adaptation.

Data Augmentation Domain Adaptation +1

Continuous Speech Separation with Conformer

1 code implementation • 13 Aug 2020 Sanyuan Chen, Yu Wu, Zhuo Chen, Jian Wu, Jinyu Li, Takuya Yoshioka, Chengyi Wang, Shujie Liu, Ming Zhou

Continuous speech separation plays a vital role in complicated speech related tasks such as conversation transcription.

Speech Separation

Transfer Learning Approaches for Streaming End-to-End Speech Recognition System

no code implementations • 12 Aug 2020 Vikas Joshi, Rui Zhao, Rupesh R. Mehta, Kshitiz Kumar, Jinyu Li

Transfer learning (TL) is widely used in conventional hybrid automatic speech recognition (ASR) systems to transfer knowledge from a source to a target language.

Automatic Speech Recognition Transfer Learning

Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability

no code implementations • 30 Jul 2020 Jinyu Li, Rui Zhao, Zhong Meng, Yanqing Liu, Wenning Wei, Sarangarajan Parthasarathy, Vadim Mazalov, Zhenghao Wang, Lei He, Sheng Zhao, Yifan Gong

Because of its streaming nature, recurrent neural network transducer (RNN-T) is a very promising end-to-end (E2E) model that may replace the popular hybrid model for automatic speech recognition.

Automatic Speech Recognition

On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition

1 code implementation • 28 May 2020 Jinyu Li, Yu Wu, Yashesh Gaur, Chengyi Wang, Rui Zhao, Shujie Liu

Among all three E2E models, transformer-AED achieved the best accuracy in both streaming and non-streaming modes.

Automatic Speech Recognition

Exploring Transformers for Large-Scale Speech Recognition

no code implementations • 19 May 2020 Liang Lu, Changliang Liu, Jinyu Li, Yifan Gong

While recurrent neural networks still largely define state-of-the-art speech recognition systems, the Transformer network has been proven to be a competitive alternative, especially in the offline condition.

Speech Recognition

Exploring Pre-training with Alignments for RNN Transducer based End-to-End Speech Recognition

no code implementations • 1 May 2020 Hu Hu, Rui Zhao, Jinyu Li, Liang Lu, Yifan Gong

Recently, the recurrent neural network transducer (RNN-T) architecture has become an emerging trend in end-to-end automatic speech recognition research due to its advantage of supporting online streaming speech recognition.

Automatic Speech Recognition

L-Vector: Neural Label Embedding for Domain Adaptation

no code implementations • 25 Apr 2020 Zhong Meng, Hu Hu, Jinyu Li, Changliang Liu, Yan Huang, Yifan Gong, Chin-Hui Lee

We propose a novel neural label embedding (NLE) scheme for the domain adaptation of a deep neural network (DNN) acoustic model with unpaired data samples from source and target domains.

Domain Adaptation

Low Latency End-to-End Streaming Speech Recognition with a Scout Network

no code implementations • 23 Mar 2020 Chengyi Wang, Yu Wu, Shujie Liu, Jinyu Li, Liang Lu, Guoli Ye, Ming Zhou

The attention-based Transformer model has achieved promising results for speech recognition (SR) in the offline mode.

Audio and Speech Processing

High-Accuracy and Low-Latency Speech Recognition with Two-Head Contextual Layer Trajectory LSTM Model

no code implementations • 17 Mar 2020 Jinyu Li, Rui Zhao, Eric Sun, Jeremy H. M. Wong, Amit Das, Zhong Meng, Yifan Gong

While the community keeps promoting end-to-end models over conventional hybrid models, which usually are long short-term memory (LSTM) models trained with a cross entropy criterion followed by a sequence discriminative training criterion, we argue that such conventional hybrid models can still be significantly improved.

Automatic Speech Recognition

Continuous speech separation: dataset and analysis

1 code implementation • 30 Jan 2020 Zhuo Chen, Takuya Yoshioka, Liang Lu, Tianyan Zhou, Zhong Meng, Yi Luo, Jian Wu, Xiong Xiao, Jinyu Li

In this paper, we define continuous speech separation (CSS) as a task of generating a set of non-overlapped speech signals from a continuous audio stream that contains multiple utterances that are partially overlapped by a varying degree.

Automatic Speech Recognition Speech Separation

Character-Aware Attention-Based End-to-End Speech Recognition

no code implementations • 6 Jan 2020 Zhong Meng, Yashesh Gaur, Jinyu Li, Yifan Gong

However, as one input to the decoder recurrent neural network (RNN), each WSU embedding is learned independently through context and acoustic information in a purely data-driven fashion.

Speech Recognition

Domain Adaptation via Teacher-Student Learning for End-to-End Speech Recognition

no code implementations • 6 Jan 2020 Zhong Meng, Jinyu Li, Yashesh Gaur, Yifan Gong

In this work, we extend the T/S learning to large-scale unsupervised domain adaptation of an attention-based end-to-end (E2E) model through two levels of knowledge transfer: teacher's token posteriors as soft labels and one-best predictions as decoder guidance.

Speech Recognition Transfer Learning +1

Semantic Mask for Transformer based End-to-End Speech Recognition

1 code implementation • 6 Dec 2019 Chengyi Wang, Yu Wu, Yujiao Du, Jinyu Li, Shujie Liu, Liang Lu, Shuo Ren, Guoli Ye, Sheng Zhao, Ming Zhou

Attention-based encoder-decoder model has achieved impressive results for both automatic speech recognition (ASR) and text-to-speech (TTS) tasks.

Automatic Speech Recognition

Speaker Adaptation for Attention-Based End-to-End Speech Recognition

no code implementations • 9 Nov 2019 Zhong Meng, Yashesh Gaur, Jinyu Li, Yifan Gong

We propose three regularization-based speaker adaptation approaches to adapt the attention-based encoder-decoder (AED) model with very limited adaptation data from target speakers for end-to-end automatic speech recognition.

Automatic Speech Recognition Multi-Task Learning

Adversarial Speaker Verification

no code implementations • 29 Apr 2019 Zhong Meng, Yong Zhao, Jinyu Li, Yifan Gong

The use of deep networks to extract embeddings for speaker recognition has proven successful.

General Classification Speaker Recognition +1

Adversarial Speaker Adaptation

no code implementations • 29 Apr 2019 Zhong Meng, Jinyu Li, Yifan Gong

We propose a novel adversarial speaker adaptation (ASA) scheme, in which adversarial learning is applied to regularize the distribution of deep hidden features in a speaker-dependent (SD) deep neural network (DNN) acoustic model to be close to that of a fixed speaker-independent (SI) DNN acoustic model during adaptation.

Automatic Speech Recognition
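
Adversarial regularization of this kind is typically built on a gradient reversal layer (GRL): a discriminator on top tries to tell speaker-dependent features from speaker-independent ones, while the reversed gradient pushes the adapted network to make them indistinguishable. A minimal PyTorch sketch of the GRL building block, without claiming it is the paper's exact recipe:

```python
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)            # identity in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None   # flip the gradient sign

def grad_reverse(x, lam=1.0):
    # insert between hidden features and the discriminator;
    # lam trades off the adversarial term against the main loss
    return GradReverse.apply(x, lam)
```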

Conditional Teacher-Student Learning

no code implementations • 28 Apr 2019 Zhong Meng, Jinyu Li, Yong Zhao, Yifan Gong

To overcome this problem, we propose a conditional T/S learning scheme, in which a "smart" student model selectively chooses to learn from either the teacher model or the ground truth labels conditioned on whether the teacher can correctly predict the ground truth.

Domain Adaptation Model Compression
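
The selection rule is easy to sketch: wherever the teacher's prediction matches the ground truth, the student learns from the teacher's soft posteriors, otherwise from the one-hot label. Shapes and names below are illustrative, assuming per-frame posteriors of shape (T, C) and integer labels of shape (T,).

```python
import torch
import torch.nn.functional as F

def conditional_ts_loss(student_logp, teacher_probs, labels):
    """student_logp: log-softmax outputs (T, C); teacher_probs: (T, C)."""
    teacher_correct = teacher_probs.argmax(dim=-1).eq(labels)   # (T,) bool mask
    hard = F.one_hot(labels, teacher_probs.size(-1)).float()
    # frame-wise switch between the teacher's soft labels and the ground truth
    targets = torch.where(teacher_correct.unsqueeze(-1), teacher_probs, hard)
    return -(targets * student_logp).sum(dim=-1).mean()         # cross entropy
```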

Attentive Adversarial Learning for Domain-Invariant Training

no code implementations • 28 Apr 2019 Zhong Meng, Jinyu Li, Yifan Gong

Adversarial domain-invariant training (ADIT) proves to be effective in suppressing the effects of domain variability in acoustic modeling and has led to improved performance in automatic speech recognition (ASR).

Automatic Speech Recognition

Speaker Adaptation for End-to-End CTC Models

no code implementations • 4 Jan 2019 Ke Li, Jinyu Li, Yong Zhao, Kshitiz Kumar, Yifan Gong

We propose two approaches for speaker adaptation in end-to-end (E2E) automatic speech recognition systems.

Automatic Speech Recognition Multi-Task Learning

Advancing Acoustic-to-Word CTC Model with Attention and Mixed-Units

no code implementations • 31 Dec 2018 Amit Das, Jinyu Li, Guoli Ye, Rui Zhao, Yifan Gong

In particular, we introduce Attention CTC, Self-Attention CTC, Hybrid CTC, and Mixed-unit CTC.

Language Modelling voice assistant

Adversarial Feature-Mapping for Speech Enhancement

no code implementations • 6 Sep 2018 Zhong Meng, Jinyu Li, Yifan Gong, Biing-Hwang Juang

To achieve better performance on ASR task, senone-aware (SA) AFM is further proposed in which an acoustic model network is jointly trained with the feature-mapping and discriminator networks to optimize the senone classification loss in addition to the AFM losses.

Speech Enhancement

Cycle-Consistent Speech Enhancement

no code implementations • 6 Sep 2018 Zhong Meng, Jinyu Li, Yifan Gong, Biing-Hwang Juang

In this paper, we propose a cycle-consistent speech enhancement (CSE) in which an additional inverse mapping network is introduced to reconstruct the noisy features from the enhanced ones.

Multi-Task Learning Speech Enhancement
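
A hedged sketch of the cycle term: an inverse network maps the enhanced features back to the noisy ones, so information discarded by the enhancement network is penalized. `enhance_net`, `inverse_net`, the MSE losses, and the weight `beta` are hypothetical choices for illustration.

```python
import torch.nn.functional as F

def cse_losses(enhance_net, inverse_net, noisy, clean, beta=1.0):
    enhanced = enhance_net(noisy)
    loss_enh = F.mse_loss(enhanced, clean)                 # enhancement loss
    loss_cycle = F.mse_loss(inverse_net(enhanced), noisy)  # reconstruct noisy
    return loss_enh + beta * loss_cycle
```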

Layer Trajectory LSTM

no code implementations • 28 Aug 2018 Jinyu Li, Changliang Liu, Yifan Gong

In this paper, we propose a layer trajectory LSTM (ltLSTM) which builds a layer-LSTM using all the layer outputs from a standard multi-layer time-LSTM.
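
A minimal sketch of the structure, assuming a stack of single-layer time-LSTMs whose per-layer outputs are then scanned across the depth axis by a layer-LSTM at every time step; this illustrates the idea rather than reproducing the paper's exact model.

```python
import torch

class LayerTrajectoryLSTM(torch.nn.Module):
    def __init__(self, feat_dim, hidden_dim, num_layers):
        super().__init__()
        dims = [feat_dim] + [hidden_dim] * num_layers
        self.time_lstms = torch.nn.ModuleList(
            [torch.nn.LSTM(dims[i], hidden_dim) for i in range(num_layers)])
        self.layer_lstm = torch.nn.LSTM(hidden_dim, hidden_dim)

    def forward(self, x):                      # x: (T, B, feat_dim)
        layer_outs = []
        for lstm in self.time_lstms:           # standard time recurrence
            x, _ = lstm(x)
            layer_outs.append(x)
        depth = torch.stack(layer_outs)        # (L, T, B, H): depth axis first
        L, T, B, H = depth.shape
        # the layer-LSTM scans across depth for every (time, batch) position
        g, _ = self.layer_lstm(depth.reshape(L, T * B, H))
        return g[-1].reshape(T, B, H)          # top of the layer trajectory
```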

Recent Progresses in Deep Learning based Acoustic Models (Updated)

no code implementations • 25 Apr 2018 Dong Yu, Jinyu Li

In this paper, we summarize the recent progress made in deep learning based acoustic models and the motivation and insights behind the surveyed techniques.

General Classification Speech Enhancement +1

Developing Far-Field Speaker System Via Teacher-Student Learning

no code implementations • 14 Apr 2018 Jinyu Li, Rui Zhao, Zhuo Chen, Changliang Liu, Xiong Xiao, Guoli Ye, Yifan Gong

In this study, we develop the keyword spotting (KWS) and acoustic model (AM) components in a far-field speaker system.

Keyword Spotting Model Compression

Adversarial Teacher-Student Learning for Unsupervised Domain Adaptation

no code implementations • 2 Apr 2018 Zhong Meng, Jinyu Li, Yifan Gong, Biing-Hwang Juang

In this method, a student acoustic model and a condition classifier are jointly optimized to minimize the Kullback-Leibler divergence between the output distributions of the teacher and student models, and simultaneously, to min-maximize the condition classification loss.

Transfer Learning Unsupervised Domain Adaptation

Speaker-Invariant Training via Adversarial Learning

no code implementations • 2 Apr 2018 Zhong Meng, Jinyu Li, Zhuo Chen, Yong Zhao, Vadim Mazalov, Yifan Gong, Biing-Hwang Juang

We propose a novel adversarial multi-task learning scheme, aiming at actively curtailing the inter-talker feature variability while maximizing its senone discriminability so as to enhance the performance of a deep neural network (DNN) based ASR system.

General Classification Multi-Task Learning

Advancing Connectionist Temporal Classification With Attention Modeling

no code implementations • 15 Mar 2018 Amit Das, Jinyu Li, Rui Zhao, Yifan Gong

In this study, we propose advancing all-neural speech recognition by directly incorporating attention modeling within the Connectionist Temporal Classification (CTC) framework.

Classification General Classification +2

Advancing Acoustic-to-Word CTC Model

no code implementations • 15 Mar 2018 Jinyu Li, Guoli Ye, Amit Das, Rui Zhao, Yifan Gong

However, the word-based CTC model suffers from the out-of-vocabulary (OOV) issue as it can only model a limited number of words in the output layer and maps all remaining words to an OOV output node.

Language Modelling voice assistant

Acoustic-To-Word Model Without OOV

no code implementations • 28 Nov 2017 Jinyu Li, Guoli Ye, Rui Zhao, Jasha Droppo, Yifan Gong

However, this type of word-based CTC model suffers from the out-of-vocabulary (OOV) issue as it can only model a limited number of words in the output layer and maps all remaining words to an OOV output node.

voice assistant

Unsupervised Adaptation with Domain Separation Networks for Robust Speech Recognition

no code implementations • 21 Nov 2017 Zhong Meng, Zhuo Chen, Vadim Mazalov, Jinyu Li, Yifan Gong

Unsupervised domain adaptation of speech signal aims at adapting a well-trained source-domain acoustic model to the unlabeled data from target domain.

Automatic Speech Recognition General Classification +2

Improved training for online end-to-end speech recognition systems

1 code implementation • 6 Nov 2017 Suyoun Kim, Michael L. Seltzer, Jinyu Li, Rui Zhao

Achieving high accuracy with end-to-end speech recognizers requires careful parameter initialization prior to training.

Speech Recognition

Large-Scale Domain Adaptation via Teacher-Student Learning

no code implementations • 17 Aug 2017 Jinyu Li, Michael L. Seltzer, Xi Wang, Rui Zhao, Yifan Gong

High accuracy speech recognition requires a large amount of transcribed data for supervised training.

Domain Adaptation Speech Recognition

Progressive Joint Modeling in Unsupervised Single-channel Overlapped Speech Recognition

no code implementations • 21 Jul 2017 Zhehuai Chen, Jasha Droppo, Jinyu Li, Wayne Xiong

We propose to advance the current state of the art by imposing a modular structure on the neural network, applying a progressive pretraining regimen, and improving the objective function with transfer learning and a discriminative training criterion.

Automatic Speech Recognition Frame +2

Learning Hidden Unit Contributions for Unsupervised Acoustic Model Adaptation

no code implementations • 12 Jan 2016 Pawel Swietojanski, Jinyu Li, Steve Renals

This work presents a broad study on the adaptation of neural network acoustic models by means of learning hidden unit contributions (LHUC) -- a method that linearly re-combines hidden units in a speaker- or environment-dependent manner using small amounts of unsupervised adaptation data.

Speech Recognition
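
LHUC itself is compact enough to sketch directly: each hidden unit is re-scaled by a speaker-dependent amplitude a = 2*sigmoid(alpha), and only alpha is learned from the small amount of adaptation data while the rest of the network stays frozen. A sketch under those assumptions, not the authors' exact implementation:

```python
import torch

class LHUC(torch.nn.Module):
    def __init__(self, num_units):
        super().__init__()
        # one learnable re-scaling parameter per hidden unit, per speaker
        self.alpha = torch.nn.Parameter(torch.zeros(num_units))

    def forward(self, hidden):
        # amplitudes lie in (0, 2) and start at 1 (alpha = 0), so the
        # unadapted network is recovered before any adaptation step
        return hidden * 2.0 * torch.sigmoid(self.alpha)
```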
