no code implementations • 22 May 2023 • Elizabeth Clark, Shruti Rijhwani, Sebastian Gehrmann, Joshua Maynez, Roee Aharoni, Vitaly Nikolaev, Thibault Sellam, Aditya Siddhant, Dipanjan Das, Ankur P. Parikh
In this work, we introduce SEAHORSE, a dataset for multilingual, multifaceted summarization evaluation.
1 code implementation • 19 May 2023 • Sebastian Ruder, Jonathan H. Clark, Alexander Gutkin, Mihir Kale, Min Ma, Massimo Nicosia, Shruti Rijhwani, Parker Riley, Jean-Michel A. Sarr, Xinyi Wang, John Wieting, Nitish Gupta, Anna Katanova, Christo Kirov, Dana L. Dickinson, Brian Roark, Bidisha Samanta, Connie Tao, David I. Adelani, Vera Axelrod, Isaac Caswell, Colin Cherry, Dan Garrette, Reeve Ingle, Melvin Johnson, Dmitry Panteleev, Partha Talukdar
We evaluate commonly used models on the benchmark.
no code implementations • 26 Feb 2023 • Shruti Rijhwani, Daisy Rosenblum, Michayla King, Antonios Anastasopoulos, Graham Neubig
There has been recent interest in improving optical character recognition (OCR) for endangered languages, particularly because a large number of documents and books in these languages are not in machine-readable formats.
Optical Character Recognition
Optical Character Recognition (OCR)
1 code implementation • 22 Oct 2022 • David Ifeoluwa Adelani, Graham Neubig, Sebastian Ruder, Shruti Rijhwani, Michael Beukman, Chester Palen-Michel, Constantine Lignos, Jesujoba O. Alabi, Shamsuddeen H. Muhammad, Peter Nabende, Cheikh M. Bamba Dione, Andiswa Bukula, Rooweither Mabuya, Bonaventure F. P. Dossou, Blessing Sibanda, Happy Buzaaba, Jonathan Mukiibi, Godson Kalipe, Derguene Mbaye, Amelia Taylor, Fatoumata Kabore, Chris Chinenye Emezue, Anuoluwapo Aremu, Perez Ogayo, Catherine Gitau, Edwin Munkoh-Buabeng, Victoire M. Koagne, Allahsera Auguste Tapo, Tebogo Macucwa, Vukosi Marivate, Elvis Mboning, Tajuddeen Gwadabe, Tosin Adewumi, Orevaoghene Ahia, Joyce Nakatumba-Nabende, Neo L. Mokono, Ignatius Ezeani, Chiamaka Chukwuneke, Mofetoluwa Adeyemi, Gilles Q. Hacheme, Idris Abdulmumin, Odunayo Ogundepo, Oreen Yousuf, Tatiana Moteu Ngoli, Dietrich Klakow
African languages are spoken by over a billion people, but are underrepresented in NLP research and development.
1 code implementation • 4 Nov 2021 • Shruti Rijhwani, Daisy Rosenblum, Antonios Anastasopoulos, Graham Neubig
In addition, to enforce consistency in the recognized vocabulary, we introduce a lexically-aware decoding method that augments the neural post-correction model with a count-based language model constructed from the recognized texts, implemented using weighted finite-state automata (WFSA) for efficient and effective decoding.
1 code implementation • CoNLL (EMNLP) 2021 • Ruisi Su, Shruti Rijhwani, Hao Zhu, Junxian He, Xinyu Wang, Yonatan Bisk, Graham Neubig
Our experiments find that concreteness is a strong indicator for learning dependency grammars, improving the direct attachment score (DAS) by over 50\% as compared to state-of-the-art models trained on pure text.
1 code implementation • EMNLP 2021 • Adithya Pratapa, Antonios Anastasopoulos, Shruti Rijhwani, Aditi Chaudhary, David R. Mortensen, Graham Neubig, Yulia Tsvetkov
Text generation systems are ubiquitous in natural language processing applications.
2 code implementations • 22 Mar 2021 • David Ifeoluwa Adelani, Jade Abbott, Graham Neubig, Daniel D'souza, Julia Kreutzer, Constantine Lignos, Chester Palen-Michel, Happy Buzaaba, Shruti Rijhwani, Sebastian Ruder, Stephen Mayhew, Israel Abebe Azime, Shamsuddeen Muhammad, Chris Chinenye Emezue, Joyce Nakatumba-Nabende, Perez Ogayo, Anuoluwapo Aremu, Catherine Gitau, Derguene Mbaye, Jesujoba Alabi, Seid Muhie Yimam, Tajuddeen Gwadabe, Ignatius Ezeani, Rubungo Andre Niyongabo, Jonathan Mukiibi, Verrah Otiende, Iroro Orife, Davis David, Samba Ngom, Tosin Adewumi, Paul Rayson, Mofetoluwa Adeyemi, Gerald Muriuki, Emmanuel Anebi, Chiamaka Chukwuneke, Nkiruka Odu, Eric Peter Wairagala, Samuel Oyerinde, Clemencia Siro, Tobius Saul Bateesa, Temilola Oloyede, Yvonne Wambui, Victor Akinode, Deborah Nabagereka, Maurice Katusiime, Ayodele Awokoya, Mouhamadane MBOUP, Dibora Gebreyohannes, Henok Tilaye, Kelechi Nwaike, Degaga Wolde, Abdoulaye Faye, Blessing Sibanda, Orevaoghene Ahia, Bonaventure F. P. Dossou, Kelechi Ogueji, Thierno Ibrahima DIOP, Abdoulaye Diallo, Adewale Akinfaderin, Tendai Marengereke, Salomey Osei
We take a step towards addressing the under-representation of the African continent in NLP research by creating the first large publicly available high-quality dataset for named entity recognition (NER) in ten African languages, bringing together a variety of stakeholders.
1 code implementation • EMNLP 2020 • Shruti Rijhwani, Antonios Anastasopoulos, Graham Neubig
There is little to no data available to build natural language processing models for most endangered languages.
no code implementations • ACL 2020 • Shruti Rijhwani, Daniel Preotiuc-Pietro
Natural language processing models often have to make predictions on text data that evolves over time as a result of changes in language use or the information described in the text.
1 code implementation • ACL 2020 • Shruti Rijhwani, Shuyan Zhou, Graham Neubig, Jaime Carbonell
However, designing such features for low-resource languages is challenging, because exhaustive entity gazetteers do not exist in these languages.
no code implementations • LREC 2020 • Graham Neubig, Shruti Rijhwani, Alexis Palmer, Jordan MacKenzie, Hilaria Cruz, Xinjian Li, Matthew Lee, Aditi Chaudhary, Luke Gessler, Steven Abney, Shirley Anugrah Hayati, Antonios Anastasopoulos, Olga Zamaraeva, Emily Prud'hommeaux, Jennette Child, Sara Child, Rebecca Knowles, Sarah Moeller, Jeffrey Micher, Yiyuan Li, Sydney Zink, Mengzhou Xia, Roshan S Sharma, Patrick Littell
Despite recent advances in natural language processing and other language technology, the application of such technology to language documentation and conservation has been limited.
1 code implementation • 24 Apr 2020 • Aman Madaan, Shruti Rijhwani, Antonios Anastasopoulos, Yiming Yang, Graham Neubig
We propose a method of curating high-quality comparable training data for low-resource languages with monolingual annotators.
no code implementations • LREC 2020 • David R. Mortensen, Xinjian Li, Patrick Littell, Alexis Michaud, Shruti Rijhwani, Antonios Anastasopoulos, Alan W. black, Florian Metze, Graham Neubig
While phonemic representations are language specific, phonetic representations (stated in terms of (allo)phones) are much closer to a universal (language-independent) transcription.
1 code implementation • WS 2019 • Shuyan Zhou, Shruti Rijhwani, Graham Neubig
Cross-lingual entity linking (XEL) grounds named entities in a source language to an English Knowledge Base (KB), such as Wikipedia.
1 code implementation • ACL 2019 • Yu-Hsiang Lin, Chian-Yu Chen, Jean Lee, Zirui Li, Yuyan Zhang, Mengzhou Xia, Shruti Rijhwani, Junxian He, Zhisong Zhang, Xuezhe Ma, Antonios Anastasopoulos, Patrick Littell, Graham Neubig
Cross-lingual transfer, where a high-resource transfer language is used to improve the accuracy of a low-resource task language, is now an invaluable tool for improving performance of natural language processing (NLP) on low-resource languages.
no code implementations • 24 Feb 2019 • Aditi Chaudhary, Siddharth Dalmia, Junjie Hu, Xinjian Li, Austin Matthews, Aldrian Obaja Muis, Naoki Otani, Shruti Rijhwani, Zaid Sheikh, Nidhi Vyas, Xinyi Wang, Jiateng Xie, Ruochen Xu, Chunting Zhou, Peter J. Jansen, Yiming Yang, Lori Levin, Florian Metze, Teruko Mitamura, David R. Mortensen, Graham Neubig, Eduard Hovy, Alan W. black, Jaime Carbonell, Graham V. Horwood, Shabnam Tafreshi, Mona Diab, Efsun S. Kayi, Noura Farra, Kathleen McKeown
This paper describes the ARIEL-CMU submissions to the Low Resource Human Language Technologies (LoReHLT) 2018 evaluations for the tasks Machine Translation (MT), Entity Discovery and Linking (EDL), and detection of Situation Frames in Text and Speech (SF Text and Speech).
1 code implementation • 9 Nov 2018 • Shruti Rijhwani, Jiateng Xie, Graham Neubig, Jaime Carbonell
To address this problem, we investigate zero-shot cross-lingual entity linking, in which we assume no bilingual lexical resources are available in the source low-resource language.
no code implementations • WS 2017 • Paul Michel, Ravich, Abhilasha er, Shruti Rijhwani
We investigate the pertinence of methods from algebraic topology for text data analysis.
no code implementations • WS 2017 • Michael Yoder, Shruti Rijhwani, Carolyn Ros{\'e}, Lori Levin
Code-switching has been found to have social motivations in addition to syntactic constraints.
no code implementations • ACL 2017 • Shruti Rijhwani, Royal Sequiera, Monojit Choudhury, Kalika Bali, Ch Maddila, ra Shekhar
Word-level language detection is necessary for analyzing code-switched text, where multiple languages could be mixed within a sentence.
no code implementations • 23 Jun 2017 • Abhilasha Ravichander, Shruti Rijhwani, Rajat Kulshreshtha, Chirag Nagpal, Tadas Baltrušaitis, Louis-Philippe Morency
In this work, we focus on improving learning for such hierarchical models and demonstrate our method on the task of speaker trait prediction.
no code implementations • 31 May 2017 • Paul Michel, Abhilasha Ravichander, Shruti Rijhwani
We investigate the pertinence of methods from algebraic topology for text data analysis.