no code implementations • 14 Jun 2024 • Siddhant Arora, Ankita Pasad, Chung-Ming Chien, Jionghao Han, Roshan Sharma, Jee-weon Jung, Hira Dhamyal, William Chen, Suwon Shon, Hung-Yi Lee, Karen Livescu, Shinji Watanabe
To answer this, we perform an extensive evaluation of multiple supervised and self-supervised SFMs using several evaluation protocols: (i) frozen SFMs with a lightweight prediction head, (ii) frozen SFMs with a complex prediction head, and (iii) fine-tuned SFMs with a lightweight prediction head.
1 code implementation • 12 Jun 2024 • Kwanghee Choi, Ankita Pasad, Tomohiko Nakamura, Satoru Fukayama, Karen Livescu, Shinji Watanabe
Self-supervised speech models (S3Ms) have become an effective backbone for speech applications.
1 code implementation • 30 Jun 2023 • Ankita Pasad, Chung-Ming Chien, Shane Settle, Karen Livescu
Many self-supervised speech models (S3Ms) have been introduced over the last few years, improving performance and data efficiency on various speech tasks.
no code implementations • 20 Dec 2022 • Suwon Shon, Siddhant Arora, Chyi-Jiunn Lin, Ankita Pasad, Felix Wu, Roshan Sharma, Wei-Lun Wu, Hung-Yi Lee, Karen Livescu, Shinji Watanabe
In this work, we introduce several new annotated SLU benchmark tasks based on freely available speech data, which complement existing benchmarks and address gaps in the SLU evaluation landscape.
1 code implementation • 8 Nov 2022 • Ankita Pasad, Bowen Shi, Karen Livescu
We further investigate the utility of our analyses for downstream tasks by comparing the property trends with performance on speech recognition and spoken language understanding tasks.
no code implementations • 18 Oct 2022 • Shuo Xie, Jiahao Qiu, Ankita Pasad, Li Du, Qing Qu, Hongyuan Mei
We propose to select layers based on the variability of their hidden states given a task-specific corpus.
1 code implementation • NAACL 2022 • Ankita Pasad, Felix Wu, Suwon Shon, Karen Livescu, Kyu J. Han
In this work we focus on low-resource spoken named entity recognition (NER) and address the question: Beyond self-supervised pre-training, how can we use external speech and/or text data that are not annotated for the task?
1 code implementation • 19 Nov 2021 • Suwon Shon, Ankita Pasad, Felix Wu, Pablo Brusco, Yoav Artzi, Karen Livescu, Kyu J. Han
Historically these have focused on automatic speech recognition (ASR), speaker identification, or other lower-level tasks.
Ranked #1 on Named Entity Recognition (NER) on SLUE
1 code implementation • 10 Jul 2021 • Ankita Pasad, Ju-chieh Chou, Karen Livescu
Recently proposed self-supervised learning approaches have been successful for pre-training speech representation models.
no code implementations • CVPR 2021 • Yao Lu, Sören Pirk, Jan Dlabal, Anthony Brohan, Ankita Pasad, Zhao Chen, Vincent Casser, Anelia Angelova, Ariel Gordon
Many computer vision tasks address the problem of scene understanding and are naturally interrelated, e.g., object classification, detection, scene segmentation, and depth estimation.
no code implementations • 11 Apr 2020 • Ankita Pasad, Ariel Gordon, Tsung-Yi Lin, Anelia Angelova
We leverage unsupervised learning of depth, egomotion, and camera intrinsics to improve the performance of single-image semantic segmentation by enforcing 3D-geometric and temporal consistency of segmentation masks across video frames.
no code implementations • 24 Apr 2019 • Ankita Pasad, Bowen Shi, Herman Kamper, Karen Livescu
Recent work has shown that speech paired with images can be used to learn semantically meaningful speech representations even without any textual supervision.