no code implementations • 9 Jul 2021 • Dilin Wang, Yuan Shangguan, Haichuan Yang, Pierce Chuang, Jiatong Zhou, Meng Li, Ganesh Venkatesh, Ozlem Kalinli, Vikas Chandra
We apply noisy training to improve both dense and sparse state-of-the-art Emformer models and observe consistent WER reduction.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
no code implementations • 6 Apr 2021 • Yuan Shangguan, Rohit Prabhavalkar, Hang Su, Jay Mahadeokar, Yangyang Shi, Jiatong Zhou, Chunyang Wu, Duc Le, Ozlem Kalinli, Christian Fuegen, Michael L. Seltzer
As speech-enabled devices such as smartphones and smart speakers become increasingly ubiquitous, there is growing interest in building automatic speech recognition (ASR) systems that can run directly on-device; end-to-end (E2E) speech recognition models such as recurrent neural network transducers and their variants have recently emerged as prime candidates for this task.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
no code implementations • 11 Feb 2021 • Leda Sari, Kritika Singh, Jiatong Zhou, Lorenzo Torresani, Nayan Singhal, Yatharth Saraf
Although speaker verification has conventionally been an audio-only task, some practical applications provide both audio and visual streams of input.
no code implementations • CVPR 2018 • Haoqi Fan, Jiatong Zhou
Attention has been shown to be a pivotal development in deep learning and has been used for a multitude of multimodal learning tasks such as visual question answering and image captioning.