Search Results for author: Yangyang Shi

Found 18 papers, 2 papers with code

Transferring Voice Knowledge for Acoustic Event Detection: An Empirical Study

no code implementations · 7 Oct 2021 · Dawei Liang, Yangyang Shi, Yun Wang, Nayan Singhal, Alex Xiao, Jonathan Shaw, Edison Thomaz, Ozlem Kalinli, Mike Seltzer

Detection of common events and scenes from audio is useful for extracting and understanding human contexts in daily life.

Event Detection

Collaborative Training of Acoustic Encoders for Speech Recognition

no code implementations · 16 Jun 2021 · Varun Nagaraja, Yangyang Shi, Ganesh Venkatesh, Ozlem Kalinli, Michael L. Seltzer, Vikas Chandra

On-device speech recognition requires training models of different sizes for deploying on devices with various computational budgets.

Speech Recognition

Dissecting User-Perceived Latency of On-Device E2E Speech Recognition

no code implementations · 6 Apr 2021 · Yuan Shangguan, Rohit Prabhavalkar, Hang Su, Jay Mahadeokar, Yangyang Shi, Jiatong Zhou, Chunyang Wu, Duc Le, Ozlem Kalinli, Christian Fuegen, Michael L. Seltzer

As speech-enabled devices such as smartphones and smart speakers become increasingly ubiquitous, there is growing interest in building automatic speech recognition (ASR) systems that can run directly on-device; end-to-end (E2E) speech recognition models such as recurrent neural network transducers and their variants have recently emerged as prime candidates for this task.

Automatic Speech Recognition · Speech Recognition

Dynamic Encoder Transducer: A Flexible Solution For Trading Off Accuracy For Latency

no code implementations · 5 Apr 2021 · Yangyang Shi, Varun Nagaraja, Chunyang Wu, Jay Mahadeokar, Duc Le, Rohit Prabhavalkar, Alex Xiao, Ching-Feng Yeh, Julian Chan, Christian Fuegen, Ozlem Kalinli, Michael L. Seltzer

By assigning a lightweight encoder to the beginning of an utterance and a full-size encoder to the rest, DET achieves accuracy similar to a baseline model with better latency on a large in-house data set.

Speech Recognition
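The routing idea described in the abstract can be sketched in a few lines: run an initial fraction of the utterance's frames through a cheap encoder and the remainder through a heavier one. The encoder stand-ins, function names, and the split fraction below are illustrative assumptions, not the paper's actual architecture.

```python
def dynamic_encoder_transduce(frames, split_frac=0.3):
    """Hypothetical sketch of a Dynamic Encoder Transducer (DET):
    the first portion of the utterance goes through a small encoder,
    the rest through a full-size one. Internals are placeholders."""
    split = int(len(frames) * split_frac)

    def small_encoder(chunk):
        # Stand-in for a lightweight encoder (cheap transform).
        return [x * 0.5 for x in chunk]

    def full_encoder(chunk):
        # Stand-in for the full-size encoder (identity here).
        return [x * 1.0 for x in chunk]

    return small_encoder(frames[:split]) + full_encoder(frames[split:])
```

The appeal of this design is that the expensive encoder only runs once enough acoustic context has accumulated, trading a little early accuracy for lower overall compute.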

Streaming Attention-Based Models with Augmented Memory for End-to-End Speech Recognition

no code implementations · 3 Nov 2020 · Ching-Feng Yeh, Yongqiang Wang, Yangyang Shi, Chunyang Wu, Frank Zhang, Julian Chan, Michael L. Seltzer

Attention-based models have been gaining popularity recently for their strong performance demonstrated in fields such as machine translation and automatic speech recognition.

Automatic Speech Recognition · End-To-End Speech Recognition +3

Emformer: Efficient Memory Transformer Based Acoustic Model For Low Latency Streaming Speech Recognition

no code implementations · 21 Oct 2020 · Yangyang Shi, Yongqiang Wang, Chunyang Wu, Ching-Feng Yeh, Julian Chan, Frank Zhang, Duc Le, Mike Seltzer

For a low-latency scenario with an average latency of 80 ms, Emformer achieves a WER of 3.01% on test-clean and 7.09% on test-other.

Speech Recognition
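Streaming transformers like Emformer keep latency bounded by processing audio in fixed-size blocks, where each block attends to a small amount of cached context from the previous block rather than the whole history. The sketch below illustrates that block-wise segmentation with left-context carryover; the block and context sizes are assumptions for illustration, not Emformer's actual configuration.

```python
def segment_with_left_context(frames, block_size=4, left_context=2):
    """Illustrative sketch of block-wise streaming segmentation:
    each block is paired with a few cached frames from the end of
    the previous block, which bounds both latency and attention cost.
    Parameter values here are placeholders."""
    segments = []
    for start in range(0, len(frames), block_size):
        ctx = frames[max(0, start - left_context):start]  # cached history
        block = frames[start:start + block_size]          # current block
        segments.append((ctx, block))
    return segments
```

Because each block only waits for `block_size` frames before it can be encoded, the algorithmic latency is bounded by the block length instead of the utterance length.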

Weak-Attention Suppression For Transformer Based Speech Recognition

no code implementations · 18 May 2020 · Yangyang Shi, Yongqiang Wang, Chunyang Wu, Christian Fuegen, Frank Zhang, Duc Le, Ching-Feng Yeh, Michael L. Seltzer

Transformers, originally proposed for natural language processing (NLP) tasks, have recently achieved great success in automatic speech recognition (ASR).

Automatic Speech Recognition · Speech Recognition

Knowledge Distillation For Recurrent Neural Network Language Modeling With Trust Regularization

no code implementations · 8 Apr 2019 · Yangyang Shi, Mei-Yuh Hwang, Xin Lei, Haoyu Sheng

Using knowledge distillation with trust regularization, we reduce the parameter size to a third of that of the previously published best model while maintaining the state-of-the-art perplexity result on Penn Treebank data.

Knowledge Distillation · Language Modelling +1
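The core of knowledge distillation is training the small student against the teacher's soft output distribution; a trust-style regularizer can pull the target back toward the ground-truth label so the student is not misled when the teacher errs. The sketch below shows one simple way to form such an interpolated target. The interpolation weight and function names are illustrative assumptions, not the paper's exact formulation.

```python
import math

def kd_target(teacher_probs, one_hot, trust=0.5):
    """Hypothetical sketch of a distillation target with a trust-style
    regularizer: mix the teacher's soft distribution with the
    ground-truth one-hot label. `trust` is an illustrative weight."""
    return [trust * t + (1.0 - trust) * y
            for t, y in zip(teacher_probs, one_hot)]

def cross_entropy(target, student_probs):
    """Cross-entropy between a target distribution and the student's
    predicted distribution (the student's training loss)."""
    return -sum(t * math.log(s)
                for t, s in zip(target, student_probs) if t > 0)
```

Training the student to minimize `cross_entropy(kd_target(...), student_probs)` gives it access to the teacher's full distribution over next words while still anchoring it to the true label.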
