1 code implementation • CRAC (ACL) 2021 • Andreas van Cranenburgh, Esther Ploeger, Frank van den Berg, Remi Thüss
We introduce a modular, hybrid coreference resolution system that extends a rule-based baseline with three neural classifiers, one for each of the subtasks of mention detection, mention attributes (gender, animacy, number), and pronoun resolution.
no code implementations • SemEval (NAACL) 2022 • Wessel Poelman, Gijs Danoe, Esther Ploeger, Frank van den Berg, Tommaso Caselli, Lukas Edman
This paper describes our system for SemEval 2022 Task 3: Presupposed Taxonomies - Evaluating Neural-network Semantics.
no code implementations • 11 Dec 2024 • Huiyuan Lai, Esther Ploeger, Rik van Noord, Antonio Toral
Neural machine translation (NMT) systems amplify lexical biases present in their training data, leading to artificially impoverished language in output translations.
no code implementations • 29 Nov 2024 • Angelika Romanou, Negar Foroutan, Anna Sotnikova, Zeming Chen, Sree Harsha Nelaturu, Shivalika Singh, Rishabh Maheshwary, Micol Altomare, Mohamed A. Haggag, Snegha A, Alfonso Amayuelas, Azril Hafizi Amirudin, Viraat Aryabumi, Danylo Boiko, Michael Chang, Jenny Chim, Gal Cohen, Aditya Kumar Dalmia, Abraham Diress, Sharad Duwal, Daniil Dzenhaliou, Daniel Fernando Erazo Florez, Fabian Farestam, Joseph Marvin Imperial, Shayekh Bin Islam, Perttu Isotalo, Maral Jabbarishiviari, Börje F. Karlsson, Eldar Khalilov, Christopher Klamm, Fajri Koto, Dominik Krzemiński, Gabriel Adriano de Melo, Syrielle Montariol, Yiyang Nan, Joel Niklaus, Jekaterina Novikova, Johan Samir Obando Ceron, Debjit Paul, Esther Ploeger, Jebish Purbey, Swati Rajwal, Selvan Sunitha Ravi, Sara Rydell, Roshan Santhosh, Drishti Sharma, Marjana Prifti Skenduli, Arshia Soltani Moakhar, Bardia Soltani Moakhar, Ran Tamir, Ayush Kumar Tarun, Azmine Toushik Wasi, Thenuka Ovin Weerasinghe, Serhan Yilmaz, Mike Zhang, Imanol Schlag, Marzieh Fadaee, Sara Hooker, Antoine Bosselut
The performance differential of large language models (LLMs) between languages hinders their effective deployment in many regions, inhibiting the potential economic and societal value of generative AI tools in many communities.
no code implementations • 8 Nov 2024 • Kushal Tatariya, Artur Kulmizev, Wessel Poelman, Esther Ploeger, Marcel Bollmann, Johannes Bjerva, Jiaming Luo, Heather Lent, Miryam de Lhoneux
Wikipedia's perceived high quality and broad language coverage have established it as a fundamental resource in multilingual NLP.
no code implementations • 30 Aug 2024 • Esther Ploeger, Huiyuan Lai, Rik van Noord, Antonio Toral
Thus, rather than aiming for the rigid increase of lexical diversity, we reframe the task as recovering what is lost in the machine translation process.
1 code implementation • 6 Jul 2024 • Esther Ploeger, Wessel Poelman, Andreas Holck Høeg-Petersen, Anders Schlichtkrull, Miryam de Lhoneux, Johannes Bjerva
We compare sampling methods with a range of metrics and find that our systematic methods consistently retrieve more typologically diverse language selections than previous methods in NLP.
2 code implementations • 6 Feb 2024 • Esther Ploeger, Wessel Poelman, Miryam de Lhoneux, Johannes Bjerva
We recommend future work to include an operationalization of 'typological diversity' that empirically justifies the diversity of language samples.
no code implementations • 2 Feb 2024 • Emi Baylor, Esther Ploeger, Johannes Bjerva
While information from the field of linguistic typology has the potential to improve performance on NLP tasks, reliable typological data is a prerequisite.
1 code implementation • 30 Oct 2023 • Heather Lent, Kushal Tatariya, Raj Dabre, Yiyi Chen, Marcell Fekete, Esther Ploeger, Li Zhou, Ruth-Ann Armstrong, Abee Eijansantos, Catriona Malau, Hans Erik Heje, Ernests Lavrinovics, Diptesh Kanojia, Paul Belony, Marcel Bollmann, Loïc Grobol, Miryam de Lhoneux, Daniel Hershcovich, Michel DeGraff, Anders Søgaard, Johannes Bjerva
Creoles represent an under-explored and marginalized group of languages, with few available resources for NLP research. While the genealogical ties between Creoles and a number of highly-resourced languages imply a significant potential for transfer learning, this potential is hampered by this lack of annotated data.
no code implementations • 20 Oct 2023 • Emi Baylor, Esther Ploeger, Johannes Bjerva
We propose that such a view of typology has significant potential in the future, including in language modeling in low-resource scenarios.