1 code implementation • EMNLP (NLP-COVID19) 2020 • Adam Poliak, Max Fleming, Cash Costello, Kenton Murray, Mahsa Yarmohammadi, Shivani Pandya, Darius Irani, Milind Agarwal, Udit Sharma, Shuo Sun, Nicola Ivanov, Lingxi Shang, Kaushik Srinivasan, Seolhwa Lee, Xu Han, Smisha Agarwal, João Sedoc
We release a dataset of over 2,100 COVID-19-related frequently asked question-answer pairs scraped from over 40 trusted websites.
no code implementations • 14 Jul 2024 • Ruizhe Huang, Mahsa Yarmohammadi, Sanjeev Khudanpur, Daniel Povey
Existing research suggests that automatic speech recognition (ASR) models can benefit from additional contexts (e.g., contact lists, user-specified vocabulary).
1 code implementation • 29 Jan 2024 • William Gantt, Shabnam Behzad, Hannah Youngeun An, Yunmo Chen, Aaron Steven White, Benjamin Van Durme, Mahsa Yarmohammadi
We introduce MultiMUC, the first multilingual parallel corpus for template filling, comprising translations of the classic MUC-4 template filling benchmark into five languages: Arabic, Chinese, Farsi, Korean, and Russian.
no code implementations • 13 Jul 2023 • Samuel Barham, Orion Weller, Michelle Yuan, Kenton Murray, Mahsa Yarmohammadi, Zhengping Jiang, Siddharth Vashishtha, Alexander Martin, Anqi Liu, Aaron Steven White, Jordan Boyd-Graber, Benjamin Van Durme
To foster the development of new models for collaborative AI-assisted report generation, we introduce MegaWika, consisting of 13 million Wikipedia articles in 50 diverse languages, along with their 71 million referenced source materials.
1 code implementation • 2 Aug 2022 • Boyuan Zheng, Patrick Xia, Mahsa Yarmohammadi, Benjamin Van Durme
Existing multiparty dialogue datasets for entity coreference resolution are nascent, and many challenges are still unaddressed.
2 code implementations • EMNLP 2021 • Mahsa Yarmohammadi, Shijie Wu, Marc Marone, Haoran Xu, Seth Ebner, Guanghui Qin, Yunmo Chen, Jialiang Guo, Craig Harman, Kenton Murray, Aaron Steven White, Mark Dredze, Benjamin Van Durme
Zero-shot cross-lingual information extraction (IE) describes the construction of an IE model for some target language, given existing annotations exclusively in some other language, typically English.
2 code implementations • EACL (AdaptNLP) 2021 • Haoran Xu, Seth Ebner, Mahsa Yarmohammadi, Aaron Steven White, Benjamin Van Durme, Kenton Murray
Fine-tuning is known to improve NLP models by adapting an initial model trained on more plentiful but less domain-salient examples to data in a target domain.
no code implementations • EMNLP (spnlp) 2020 • Abhinav Singh, Patrick Xia, Guanghui Qin, Mahsa Yarmohammadi, Benjamin Van Durme
Copy mechanisms are employed in sequence-to-sequence (seq2seq) models to reproduce words from the input in the output.
1 code implementation • Interspeech 2018 • Daniel Povey, Gaofeng Cheng, Yiming Wang, Ke Li, Hainan Xu, Mahsa Yarmohammadi, Sanjeev Khudanpur
Time Delay Neural Networks (TDNNs), also known as one-dimensional Convolutional Neural Networks (1-d CNNs), are an efficient and well-performing neural network architecture for speech recognition.