Search Results for author: Liyong Guo

Found 14 papers, 10 papers with code

CR-CTC: Consistency regularization on CTC for improved speech recognition

1 code implementation • 7 Oct 2024 • Zengwei Yao, Wei Kang, Xiaoyu Yang, Fangjun Kuang, Liyong Guo, Han Zhu, Zengrui Jin, Zhaoqing Li, Long Lin, Daniel Povey

Connectionist Temporal Classification (CTC) is a widely used method for automatic speech recognition (ASR), renowned for its simplicity and computational efficiency.

Tasks: Automatic Speech Recognition (ASR) +3
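The snippet above notes that CTC is valued for its simplicity and efficiency. As a minimal illustration of how a CTC loss is typically computed in practice, here is a sketch using PyTorch's built-in `nn.CTCLoss`; the shapes, the blank index (0), and the random inputs are illustrative assumptions, not details from the paper:

```python
# Minimal CTC loss sketch with PyTorch's nn.CTCLoss.
# Shapes, blank index, and inputs are illustrative assumptions.
import torch
import torch.nn as nn

T, N, C, S = 50, 4, 30, 12   # frames, batch, vocab size (incl. blank), target length
# CTC expects per-frame log-probabilities of shape (T, N, C)
log_probs = torch.randn(T, N, C).log_softmax(dim=-1)
targets = torch.randint(1, C, (N, S))            # label ids exclude blank (0)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), S, dtype=torch.long)

ctc = nn.CTCLoss(blank=0, zero_infinity=True)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
```

In training, `log_probs` would come from the acoustic model's output after `log_softmax`, and the loss would be backpropagated as usual.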

Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context

2 code implementations • 15 Sep 2023 • Wei Kang, Xiaoyu Yang, Zengwei Yao, Fangjun Kuang, Yifan Yang, Liyong Guo, Long Lin, Daniel Povey

In this paper, we introduce Libriheavy, a large-scale ASR corpus consisting of 50,000 hours of read English speech derived from LibriVox.

PromptASR for contextualized ASR with controllable style

2 code implementations • 14 Sep 2023 • Xiaoyu Yang, Wei Kang, Zengwei Yao, Yifan Yang, Liyong Guo, Fangjun Kuang, Long Lin, Daniel Povey

An additional style prompt can be given to the text encoder to guide the ASR system to output different styles of transcriptions.

Tasks: Automatic Speech Recognition, Speech Recognition +1

Blank-regularized CTC for Frame Skipping in Neural Transducer

1 code implementation • 19 May 2023 • Yifan Yang, Xiaoyu Yang, Liyong Guo, Zengwei Yao, Wei Kang, Fangjun Kuang, Long Lin, Xie Chen, Daniel Povey

Neural Transducer and connectionist temporal classification (CTC) are popular end-to-end automatic speech recognition systems.

Tasks: Automatic Speech Recognition, Speech Recognition +1

Exploring Representation Learning for Small-Footprint Keyword Spotting

no code implementations • 20 Mar 2023 • Fan Cui, Liyong Guo, Quandong Wang, Peng Gao, Yujun Wang

To address these challenges, we explore representation learning for KWS using self-supervised contrastive learning and self-training with a pretrained model.

Tasks: Contrastive Learning, Representation Learning +1

Fast and parallel decoding for transducer

1 code implementation • 31 Oct 2022 • Wei Kang, Liyong Guo, Fangjun Kuang, Long Lin, Mingshuang Luo, Zengwei Yao, Xiaoyu Yang, Piotr Żelasko, Daniel Povey

In this work, we introduce a constrained version of the transducer loss to learn strictly monotonic alignments between the sequences. We also improve the standard greedy search and beam search algorithms by limiting the number of symbols that can be emitted per time step in transducer decoding, making it more efficient to decode in parallel over batches.

Tasks: Speech Recognition
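The abstract above describes capping the number of symbols emitted per time step during transducer greedy search. A hypothetical sketch of that idea is shown below; `joiner` and `predictor` are stand-in callables (one utterance, no batching), not the paper's actual API:

```python
# Hypothetical sketch of transducer greedy search with a cap on the
# number of non-blank symbols emitted per encoder frame.
# `joiner` and `predictor` are stand-in callables, not the paper's API.
import torch

def greedy_search(encoder_out, joiner, predictor, blank=0, max_sym_per_frame=1):
    """encoder_out: (T, D) acoustic frames for one utterance."""
    hyp = [blank]                              # start with a blank context
    for t in range(encoder_out.size(0)):
        for _ in range(max_sym_per_frame):     # cap emissions at this frame
            pred = predictor(hyp)              # label-context representation
            logits = joiner(encoder_out[t], pred)
            y = int(logits.argmax())
            if y == blank:                     # blank: advance to next frame
                break
            hyp.append(y)
    return hyp[1:]                             # drop the initial blank
```

Bounding the inner loop gives each frame a fixed worst-case amount of work, which is what makes batched, parallel decoding straightforward; the unconstrained algorithm can in principle emit arbitrarily many symbols at one frame.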

Delay-penalized transducer for low-latency streaming ASR

1 code implementation • 31 Oct 2022 • Wei Kang, Zengwei Yao, Fangjun Kuang, Liyong Guo, Xiaoyu Yang, Long Lin, Piotr Żelasko, Daniel Povey

In streaming automatic speech recognition (ASR), it is desirable to reduce latency as much as possible while having minimum impact on recognition accuracy.

Tasks: Automatic Speech Recognition (ASR) +1

Pruned RNN-T for fast, memory-efficient ASR training

no code implementations • 23 Jun 2022 • Fangjun Kuang, Liyong Guo, Wei Kang, Long Lin, Mingshuang Luo, Zengwei Yao, Daniel Povey

The RNN-Transducer (RNN-T) framework for speech recognition has been growing in popularity, particularly for deployed real-time ASR systems, because it combines high accuracy with naturally streaming recognition.

Tasks: Decoder, Speech Recognition +1

Unified Streaming and Non-streaming Two-pass End-to-end Model for Speech Recognition

5 code implementations • 10 Dec 2020 • BinBin Zhang, Di Wu, Zhuoyuan Yao, Xiong Wang, Fan Yu, Chao Yang, Liyong Guo, Yaguang Hu, Lei Xie, Xin Lei

In this paper, we present a novel two-pass approach to unify streaming and non-streaming end-to-end (E2E) speech recognition in a single model.

Tasks: Decoder, Sentence +2
