1 code implementation • 5 Nov 2023 • Jingru Yi, Burak Uzkent, Oana Ignat, Zili Li, Amanmeet Garg, Xiang Yu, Linda Liu
While we demonstrate our data augmentation method with the MDETR framework, the proposed approach is applicable to common grounding-based vision-and-language tasks with other frameworks.
no code implementations • CVPR 2023 • Jue Wang, Wentao Zhu, Pichao Wang, Xiang Yu, Linda Liu, Mohamed Omar, Raffay Hamid
To address this limitation, we present a novel Selective S4 (i.e., S5) model that employs a lightweight mask generator to adaptively select informative image tokens, resulting in more efficient and accurate modeling of long-term spatiotemporal dependencies in videos.
Ranked #2 on Video Classification on Breakfast
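The adaptive token-selection idea behind S5 can be illustrated with a minimal top-k gate. This is a hypothetical sketch: the paper's mask generator is learned end-to-end, whereas here the per-token "informativeness" scores are simply given as input.

```python
def select_tokens(tokens, scores, keep_ratio=0.5):
    """Keep the highest-scoring fraction of tokens, preserving their order.

    Illustrative stand-in for a learned mask generator: `scores` would
    normally come from a small scoring network, not be supplied directly.
    """
    k = max(1, int(len(tokens) * keep_ratio))
    # indices of the k best-scoring tokens, re-sorted to temporal order
    keep_idx = sorted(sorted(range(len(tokens)), key=lambda i: scores[i])[-k:])
    return [tokens[i] for i in keep_idx], keep_idx

# toy example: 4 "tokens" with hand-picked informativeness scores
tokens = ["t0", "t1", "t2", "t3"]
scores = [0.1, 0.9, 0.3, 0.8]
kept, idx = select_tokens(tokens, scores, keep_ratio=0.5)
print(kept, idx)  # ['t1', 't3'] [1, 3]
```

Downstream layers would then attend only over the kept tokens, which is where the efficiency gain on long videos comes from.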
no code implementations • Submitted to ICLR 2022 • Wentao Zhu, Jingru Yi, Kevin Hsu, Xiaohang Sun, Xiang Hao, Linda Liu, Mohamed Omar
AVT uses a combination of video and audio signals to improve action recognition accuracy, leveraging the effective spatio-temporal representation by the video Transformer.
Ranked #4 on Multi-modal Classification on VGG-Sound
no code implementations • Submitted to ICLR 2022 • Wentao Zhu, Jingru Yi, Xiaohang Sun, Xiang Hao, Linda Liu, Mohamed Omar
In this work, we develop a multiscale multimodal Transformer (MMT) that employs hierarchical representation learning.
Ranked #1 on Multi-modal Classification on VGG-Sound
no code implementations • 17 Feb 2022 • Chao-Han Huck Yang, Zeeshan Ahmed, Yile Gu, Joseph Szurley, Roger Ren, Linda Liu, Andreas Stolcke, Ivan Bulyko
In this work, we aim to enhance the system robustness of end-to-end automatic speech recognition (ASR) against adversarially-noisy speech examples.
no code implementations • 15 Feb 2021 • Aditya Gourav, Linda Liu, Ankur Gandhe, Yile Gu, Guitang Lan, Xiangyang Huang, Shashank Kalmane, Gautam Tiwari, Denis Filimonov, Ariya Rastrow, Andreas Stolcke, Ivan Bulyko
We also describe a novel second-pass de-biasing approach: used in conjunction with a first-pass shallow fusion that optimizes on oracle WER, we can achieve an additional 14% improvement on personalized content recognition, and even improve accuracy for the general use case by up to 2.5%.
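Generic shallow fusion of the kind referenced above re-ranks n-best ASR hypotheses by interpolating each hypothesis's ASR log-score with an external (e.g., personalized) language-model log-score. A minimal sketch, with an illustrative interpolation weight and a toy LM that is not from the paper:

```python
def shallow_fusion_rescore(hypotheses, lm_score, weight=0.3):
    """Re-rank (text, asr_log_score) pairs by asr + weight * LM log-score.

    Generic shallow fusion; `weight` and `lm_score` are illustrative.
    """
    scored = [(asr + weight * lm_score(text), text) for text, asr in hypotheses]
    scored.sort(reverse=True)  # highest combined score first
    return [text for _, text in scored]

# toy personalized LM that prefers the contact name "anna"
def toy_lm(text):
    return 0.0 if "anna" in text else -2.0

nbest = [("call ana", -1.0), ("call anna", -1.2)]
print(shallow_fusion_rescore(nbest, toy_lm, weight=0.5))
# ['call anna', 'call ana']
```

The second-pass de-biasing described in the abstract would then adjust these fused scores again during rescoring; the sketch covers only the first-pass fusion step.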
no code implementations • 5 Jan 2021 • Linda Liu, Yile Gu, Aditya Gourav, Ankur Gandhe, Shashank Kalmane, Denis Filimonov, Ariya Rastrow, Ivan Bulyko
As voice assistants become more ubiquitous, they are increasingly expected to support and perform well on a wide variety of use-cases across different domains.
no code implementations • 30 Nov 2020 • Vijay Ravi, Yile Gu, Ankur Gandhe, Ariya Rastrow, Linda Liu, Denis Filimonov, Scott Novotney, Ivan Bulyko
We show that this simple method can improve performance on rare words by 3.7% WER relative without degradation on the general test set, and the improvement from USF is additive to any additional language-model-based rescoring.
Automatic Speech Recognition (ASR) +2
no code implementations • 23 Nov 2020 • Chao-Han Huck Yang, Linda Liu, Ankur Gandhe, Yile Gu, Anirudh Raju, Denis Filimonov, Ivan Bulyko
We show that our rescoring model trained with these additional tasks outperforms the baseline rescoring model, trained with only the language modeling task, by 1.4% on a general test set and by 2.6% on a rare word test set in terms of word-error-rate relative (WERR).
Automatic Speech Recognition (ASR) +3
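The WERR numbers quoted in these abstracts are relative reductions in word error rate. A small sketch of how WER (edit distance over reference words) and relative reduction are computed; the example sentences are illustrative:

```python
def wer(ref, hyp):
    """Word error rate: word-level Levenshtein distance / reference length."""
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i  # deletions
    for j in range(len(h) + 1):
        d[0][j] = j  # insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)

def werr(baseline_wer, new_wer):
    """Relative WER reduction, the metric reported as WERR."""
    return (baseline_wer - new_wer) / baseline_wer

print(wer("turn on the lights", "turn on lights"))  # 0.25 (one deletion)
print(werr(0.10, 0.09))                             # ~0.10, i.e., 10% relative
```

So a "1.4% WERR" means the absolute WER dropped by 1.4% of its baseline value, not by 1.4 percentage points.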
no code implementations • 26 Jun 2018 • Anirudh Raju, Behnam Hedayatnia, Linda Liu, Ankur Gandhe, Chandra Khatri, Angeliki Metallinou, Anu Venkatesh, Ariya Rastrow
Statistical language models (LM) play a key role in Automatic Speech Recognition (ASR) systems used by conversational agents.
Automatic Speech Recognition (ASR) +2