1 code implementation • ECCV 2018 • Chao-yuan Wu, Nayan Singhal, Philipp Krähenbühl
An ever-increasing amount of our digital communication, media consumption, and content creation revolves around videos.
no code implementations • 11 Feb 2021 • Leda Sari, Kritika Singh, Jiatong Zhou, Lorenzo Torresani, Nayan Singhal, Yatharth Saraf
Although speaker verification has conventionally been an audio-only task, some practical applications provide both audio and visual streams of input.
no code implementations • 9 Jul 2021 • Xiaohui Zhang, Vimal Manohar, David Zhang, Frank Zhang, Yangyang Shi, Nayan Singhal, Julian Chan, Fuchun Peng, Yatharth Saraf, Mike Seltzer
Hybrid automatic speech recognition (ASR) models are typically sequentially trained with CTC or LF-MMI criteria.
no code implementations • 7 Oct 2021 • Dawei Liang, Yangyang Shi, Yun Wang, Nayan Singhal, Alex Xiao, Jonathan Shaw, Edison Thomaz, Ozlem Kalinli, Mike Seltzer
Detection of common events and scenes from audio is useful for extracting and understanding human contexts in daily life.
no code implementations • 10 Nov 2022 • Andros Tjandra, Nayan Singhal, David Zhang, Ozlem Kalinli, Abdelrahman Mohamed, Duc Le, Michael L. Seltzer
Later, we use our optimal tokenization strategy to train multiple embedding and output models to further improve our results.