no code implementations • 18 Nov 2021 • Chunxi Liu, Michael Picheny, Leda Sari, Pooja Chitkara, Alex Xiao, Xiaohui Zhang, Mark Chou, Andres Alvarado, Caner Hazirbas, Yatharth Saraf
This paper presents initial Speech Recognition results on "Casual Conversations" -- a publicly released 846 hour corpus designed to help researchers evaluate their computer vision and audio models for accuracy across a diverse set of metadata, including age, gender, and skin tone.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+2
no code implementations • 10 Nov 2021 • Alex Xiao, Weiyi Zheng, Gil Keren, Duc Le, Frank Zhang, Christian Fuegen, Ozlem Kalinli, Yatharth Saraf, Abdelrahman Mohamed
With 4. 5 million hours of English speech from 10 different sources across 120 countries and models of up to 10 billion parameters, we explore the frontiers of scale for automatic speech recognition.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+2
no code implementations • 7 Oct 2021 • Yangyang Shi, Chunyang Wu, Dilin Wang, Alex Xiao, Jay Mahadeokar, Xiaohui Zhang, Chunxi Liu, Ke Li, Yuan Shangguan, Varun Nagaraja, Ozlem Kalinli, Mike Seltzer
This paper improves the streaming transformer transducer for speech recognition by using non-causal convolution.
no code implementations • 7 Oct 2021 • Dawei Liang, Yangyang Shi, Yun Wang, Nayan Singhal, Alex Xiao, Jonathan Shaw, Edison Thomaz, Ozlem Kalinli, Mike Seltzer
Detection of common events and scenes from audio is useful for extracting and understanding human contexts in daily life.
no code implementations • 6 Apr 2021 • Jay Mahadeokar, Yangyang Shi, Yuan Shangguan, Chunyang Wu, Alex Xiao, Hang Su, Duc Le, Ozlem Kalinli, Christian Fuegen, Michael L. Seltzer
In order to achieve flexible and better accuracy and latency trade-offs, the following techniques are used.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+1
no code implementations • 5 Apr 2021 • Yangyang Shi, Varun Nagaraja, Chunyang Wu, Jay Mahadeokar, Duc Le, Rohit Prabhavalkar, Alex Xiao, Ching-Feng Yeh, Julian Chan, Christian Fuegen, Ozlem Kalinli, Michael L. Seltzer
DET gets similar accuracy as a baseline model with better latency on a large in-house data set by assigning a lightweight encoder for the beginning part of one utterance and a full-size encoder for the rest.
no code implementations • 9 Mar 2021 • Alex Xiao, Christian Fuegen, Abdelrahman Mohamed
Pseudo-labeling is the most adopted method for pre-training automatic speech recognition (ASR) models.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+2
no code implementations • 27 Oct 2020 • Yongqiang Wang, Yangyang Shi, Frank Zhang, Chunyang Wu, Julian Chan, Ching-Feng Yeh, Alex Xiao
We compare the transformer based acoustic models with their LSTM counterparts on industrial scale tasks.
no code implementations • 16 May 2020 • Kritika Singh, Vimal Manohar, Alex Xiao, Sergey Edunov, Ross Girshick, Vitaliy Liptchinsky, Christian Fuegen, Yatharth Saraf, Geoffrey Zweig, Abdel-rahman Mohamed
Many semi- and weakly-supervised approaches have been investigated for overcoming the labeling cost of building high quality speech recognition systems.
no code implementations • 22 Oct 2019 • Yongqiang Wang, Abdel-rahman Mohamed, Duc Le, Chunxi Liu, Alex Xiao, Jay Mahadeokar, Hongzhao Huang, Andros Tjandra, Xiaohui Zhang, Frank Zhang, Christian Fuegen, Geoffrey Zweig, Michael L. Seltzer
We propose and evaluate transformer-based acoustic models (AMs) for hybrid speech recognition.
Ranked #28 on
Speech Recognition
on LibriSpeech test-other
(using extra training data)