Search Results for author: Guoli Ye

Found 14 papers, 2 papers with code

Efficient Long-Form Speech Recognition for General Speech In-Context Learning

no code implementations • 29 Sep 2024 • Hao Yen, Shaoshi Ling, Guoli Ye

We propose a novel approach to end-to-end automatic speech recognition (ASR) to achieve efficient speech in-context learning (SICL) for (i) long-form speech decoding, (ii) test-time speaker adaptation, and (iii) test-time contextual biasing.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +3

Hybrid Attention-based Encoder-decoder Model for Efficient Language Model Adaptation

no code implementations • 14 Sep 2023 • Shaoshi Ling, Guoli Ye, Rui Zhao, Yifan Gong

The attention-based encoder-decoder (AED) speech recognition model has been widely successful in recent years.

Automatic Speech Recognition, Decoder, +3

Adapting Large Language Model with Speech for Fully Formatted End-to-End Speech Recognition

1 code implementation • 17 Jul 2023 • Shaoshi Ling, Yuxuan Hu, Shuangbei Qian, Guoli Ye, Yao Qian, Yifan Gong, Ed Lin, Michael Zeng

Most end-to-end (E2E) speech recognition models are composed of encoder and decoder blocks that perform acoustic and language modeling functions.

Decoder, Language Modelling, +3

Acoustic-aware Non-autoregressive Spell Correction with Mask Sample Decoding

no code implementations • 16 Oct 2022 • Ruchao Fan, Guoli Ye, Yashesh Gaur, Jinyu Li

As a result, we reduce the WER of a streaming TT from 7.6% to 6.5% on the Librispeech test-other data and the CER from 7.3% to 6.1% on the Aishell test data, respectively.

Language Modelling, speech-recognition, +1

Have best of both worlds: two-pass hybrid and E2E cascading framework for speech recognition

no code implementations • 10 Oct 2021 • Guoli Ye, Vadim Mazalov, Jinyu Li, Yifan Gong

Hybrid and end-to-end (E2E) systems have their individual advantages, with different error patterns in the speech recognition results.

speech-recognition, Speech Recognition

Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition

no code implementations • 4 Jun 2021 • Zhong Meng, Yu Wu, Naoyuki Kanda, Liang Lu, Xie Chen, Guoli Ye, Eric Sun, Jinyu Li, Yifan Gong

In this work, we perform LM fusion in the minimum WER (MWER) training of an E2E model to obviate the need for LM weights tuning during inference.

Language Modelling, speech-recognition, +1

End-to-End Speaker-Attributed ASR with Transformer

no code implementations • 5 Apr 2021 • Naoyuki Kanda, Guoli Ye, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka

This paper presents our recent effort on end-to-end speaker-attributed automatic speech recognition, which jointly performs speaker counting, speech recognition and speaker identification for monaural multi-talker audio.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +3

Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone

no code implementations • 31 Mar 2021 • Naoyuki Kanda, Guoli Ye, Yu Wu, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka

Transcribing meetings containing overlapped speech with only a single distant microphone (SDM) has been one of the most challenging problems for automatic speech recognition (ASR).

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1

Low Latency End-to-End Streaming Speech Recognition with a Scout Network

no code implementations • 23 Mar 2020 • Chengyi Wang, Yu Wu, Shujie Liu, Jinyu Li, Liang Lu, Guoli Ye, Ming Zhou

The attention-based Transformer model has achieved promising results for speech recognition (SR) in the offline mode.

Audio and Speech Processing

Semantic Mask for Transformer based End-to-End Speech Recognition

1 code implementation • 6 Dec 2019 • Chengyi Wang, Yu Wu, Yujiao Du, Jinyu Li, Shujie Liu, Liang Lu, Shuo Ren, Guoli Ye, Sheng Zhao, Ming Zhou

The attention-based encoder-decoder model has achieved impressive results for both automatic speech recognition (ASR) and text-to-speech (TTS) tasks.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +4

Advancing Acoustic-to-Word CTC Model with Attention and Mixed-Units

no code implementations • 31 Dec 2018 • Amit Das, Jinyu Li, Guoli Ye, Rui Zhao, Yifan Gong

In particular, we introduce Attention CTC, Self-Attention CTC, Hybrid CTC, and Mixed-unit CTC.

Decoder, Language Modelling

Developing Far-Field Speaker System Via Teacher-Student Learning

no code implementations • 14 Apr 2018 • Jinyu Li, Rui Zhao, Zhuo Chen, Changliang Liu, Xiong Xiao, Guoli Ye, Yifan Gong

In this study, we develop the keyword spotting (KWS) and acoustic model (AM) components in a far-field speaker system.

Keyword Spotting, Model Compression

Advancing Acoustic-to-Word CTC Model

no code implementations • 15 Mar 2018 • Jinyu Li, Guoli Ye, Amit Das, Rui Zhao, Yifan Gong

However, the word-based CTC model suffers from the out-of-vocabulary (OOV) issue, as it can model only a limited number of words in the output layer and maps all the remaining words to an OOV output node.

Decoder, Language Modelling

Acoustic-To-Word Model Without OOV

no code implementations • 28 Nov 2017 • Jinyu Li, Guoli Ye, Rui Zhao, Jasha Droppo, Yifan Gong

However, this type of word-based CTC model suffers from the out-of-vocabulary (OOV) issue, as it can model only a limited number of words in the output layer and maps all the remaining words to an OOV output node.
