no code implementations • EMNLP 2021 • Aditi Chaudhary, Kayo Yin, Antonios Anastasopoulos, Graham Neubig
Learning fine-grained distinctions between vocabulary items is a key challenge in learning a new language.
1 code implementation • EMNLP (WNUT) 2020 • Md Mahfuz ibn Alam, Antonios Anastasopoulos
The performance of neural machine translation (NMT) systems only trained on a single language variant degrades when confronted with even slightly different language variations.
no code implementations • IWSLT (ACL) 2022 • Antonios Anastasopoulos, Loïc Barrault, Luisa Bentivogli, Marcely Zanon Boito, Ondřej Bojar, Roldano Cattoni, Anna Currey, Georgiana Dinu, Kevin Duh, Maha Elbayad, Clara Emmanuel, Yannick Estève, Marcello Federico, Christian Federmann, Souhir Gahbiche, Hongyu Gong, Roman Grundkiewicz, Barry Haddow, Benjamin Hsu, Dávid Javorský, Vĕra Kloudová, Surafel Lakew, Xutai Ma, Prashant Mathur, Paul McNamee, Kenton Murray, Maria Nǎdejde, Satoshi Nakamura, Matteo Negri, Jan Niehues, Xing Niu, John Ortega, Juan Pino, Elizabeth Salesky, Jiatong Shi, Matthias Sperber, Sebastian Stüker, Katsuhito Sudoh, Marco Turchi, Yogesh Virkar, Alexander Waibel, Changhan Wang, Shinji Watanabe
The evaluation campaign of the 19th International Conference on Spoken Language Translation featured eight shared tasks: (i) Simultaneous speech translation, (ii) Offline speech translation, (iii) Speech to speech translation, (iv) Low-resource speech translation, (v) Multilingual speech translation, (vi) Dialect speech translation, (vii) Formality control for speech translation, (viii) Isometric speech translation.
1 code implementation • ACL 2022 • Damian Blasi, Antonios Anastasopoulos, Graham Neubig
Natural language processing (NLP) systems have become a central technology in communication, education, medicine, artificial intelligence, and many other domains of research and development.
1 code implementation • VarDial (COLING) 2022 • Noëmi Aepli, Antonios Anastasopoulos, Adrian-Gabriel Chifu, William Domingues, Fahim Faisal, Mihaela Gaman, Radu Tudor Ionescu, Yves Scherrer
This report presents the results of the shared tasks organized as part of the VarDial Evaluation Campaign 2022.
no code implementations • ACL (IWSLT) 2021 • Antonios Anastasopoulos, Ondřej Bojar, Jacob Bremerman, Roldano Cattoni, Maha Elbayad, Marcello Federico, Xutai Ma, Satoshi Nakamura, Matteo Negri, Jan Niehues, Juan Pino, Elizabeth Salesky, Sebastian Stüker, Katsuhito Sudoh, Marco Turchi, Alexander Waibel, Changhan Wang, Matthew Wiesner
The evaluation campaign of the International Conference on Spoken Language Translation (IWSLT 2021) featured four shared tasks this year: (i) Simultaneous speech translation, (ii) Offline speech translation, (iii) Multilingual speech translation, (iv) Low-resource speech translation.
1 code implementation • NAACL (SIGMORPHON) 2022 • Jordan Kodner, Salam Khalifa, Khuyagbaatar Batsuren, Hossep Dolatian, Ryan Cotterell, Faruk Akkus, Antonios Anastasopoulos, Taras Andrushko, Aryaman Arora, Nona Atanalov, Gábor Bella, Elena Budianskaya, Yustinus Ghanggo Ate, Omer Goldman, David Guriel, Simon Guriel, Silvia Guriel-Agiashvili, Witold Kieraś, Andrew Krizhanovsky, Natalia Krizhanovsky, Igor Marchenko, Magdalena Markowska, Polina Mashkovtseva, Maria Nepomniashchaya, Daria Rodionova, Karina Scheifer, Alexandra Sorova, Anastasia Yemelina, Jeremiah Young, Ekaterina Vylomova
The 2022 SIGMORPHON–UniMorph shared task on large scale morphological inflection generation included a wide range of typologically diverse languages: 33 languages from 11 top-level language families: Arabic (Modern Standard), Assamese, Braj, Chukchi, Eastern Armenian, Evenki, Georgian, Gothic, Gujarati, Hebrew, Hungarian, Itelmen, Karelian, Kazakh, Ket, Khalkha Mongolian, Kholosi, Korean, Lamahalot, Low German, Ludic, Magahi, Middle Low German, Old English, Old High German, Old Norse, Polish, Pomak, Slovak, Turkish, Upper Sorbian, Veps, and Xibe.
no code implementations • JEP/TALN/RECITAL 2022 • Benjamin Muller, Antonios Anastasopoulos, Benoît Sagot, Djamé Seddah
In this work, by comparing multilingual and monolingual models, we show that such models behave in multiple ways on unseen languages.
no code implementations • WMT (EMNLP) 2021 • Md Mahfuz ibn Alam, Ivana Kvapilíková, Antonios Anastasopoulos, Laurent Besacier, Georgiana Dinu, Marcello Federico, Matthias Gallé, Kweonwoo Jung, Philipp Koehn, Vassilina Nikoulina
Language domains that require very careful use of terminology are abundant and reflect a significant part of the translation industry.
1 code implementation • 20 Oct 2024 • Jonathan Hus, Antonios Anastasopoulos
Machine translation systems for high resource languages perform exceptionally well and produce high quality translations.
no code implementations • 19 Oct 2024 • Nishat Raihan, Antonios Anastasopoulos, Marcos Zampieri
The HumanEval Benchmark, developed by OpenAI, remains the most widely used code generation benchmark.
no code implementations • 7 Oct 2024 • Alexander S. Choi, Syeda Sabrina Akter, JP Singh, Antonios Anastasopoulos
The study, conducted in two stages (Topic Discovery and Topic Assignment), integrates LLMs with expert annotators to observe the impact of LLM suggestions on what is usually human-only analysis.
no code implementations • 22 Aug 2024 • Prabin Bhandari, Antonios Anastasopoulos, Dieter Pfoser
Understanding urban mobility patterns and analyzing how people move around cities helps improve the overall quality of life and supports the development of more livable, efficient, and sustainable urban areas.
1 code implementation • 2 Jul 2024 • Chahat Raj, Anjishnu Mukherjee, Aylin Caliskan, Antonios Anastasopoulos, Ziwei Zhu
Existing works examining Vision-Language Models (VLMs) for social biases predominantly focus on a limited set of documented bias associations, such as gender:profession or race:crime.
no code implementations • 2 Jul 2024 • Chahat Raj, Anjishnu Mukherjee, Aylin Caliskan, Antonios Anastasopoulos, Ziwei Zhu
We propose a unique debiasing technique, Social Contact Debiasing (SCD), that instruction-tunes these models with unbiased responses to prompts.
1 code implementation • 2 Jul 2024 • Anjishnu Mukherjee, Ziwei Zhu, Antonios Anastasopoulos
We present a comprehensive three-phase study to examine (1) the cultural understanding of Large Multimodal Models (LMMs) by introducing DalleStreet, a large-scale dataset generated by DALL-E 3 and validated by humans, containing 9,935 images of 67 countries and 10 concept classes; (2) the underlying implicit and potentially stereotypical cultural associations with a cultural artifact extraction task; and (3) an approach to adapt cultural representation in an image based on extracted associations using a modular pipeline, CultureAdapt.
1 code implementation • 1 Jul 2024 • Pooya Fayyazsanavi, Antonios Anastasopoulos, Jana Košecká
Sign language translation from video to spoken text presents unique challenges owing to the distinct grammar, expression nuances, and high variation of visual appearance across different speakers and contexts.
1 code implementation • 25 Jun 2024 • Milind Agarwal, Joshua Otten, Antonios Anastasopoulos
Language identification is used as the first step in many data collection and crawling efforts because it allows us to sort online text into language-specific buckets.
no code implementations • 29 May 2024 • Michael Fore, Simranjit Singh, Chaehong Lee, Amritanshu Pandey, Antonios Anastasopoulos, Dimitrios Stamoulis
Misinformation regarding climate change is a key roadblock in addressing one of the most serious threats to humanity.
1 code implementation • 11 May 2024 • Nishat Raihan, Dhiman Goswami, Antara Mahmud, Antonios Anastasopoulos, Marcos Zampieri
Code-mixing is a well-studied linguistic phenomenon that occurs when two or more languages are mixed in text or speech.
1 code implementation • 11 Apr 2024 • Fahim Faisal, Antonios Anastasopoulos
We propose an approach that combines the strengths of different types of language models and leverages data augmentation techniques to improve task performance on three South Slavic dialects: Chakavian, Cherkano, and Torlak.
1 code implementation • 3 Apr 2024 • Zaid Sheikh, Antonios Anastasopoulos, Shruti Rijhwani, Lindia Tjuatja, Robbie Jimerson, Graham Neubig
Effectively using Natural Language Processing (NLP) tools in under-resourced languages requires a thorough understanding of the language itself, familiarity with the latest models and training methodologies, and technical expertise to deploy these models.
1 code implementation • 1 Apr 2024 • Syeda Sabrina Akter, Antonios Anastasopoulos
Media framing is the study of strategically selecting and presenting specific aspects of political issues to shape public opinion.
1 code implementation • 29 Mar 2024 • Fahim Faisal, Antonios Anastasopoulos
The capacity and effectiveness of pre-trained multilingual models (MLMs) for zero-shot cross-lingual transfer is well established.
1 code implementation • 16 Mar 2024 • Fahim Faisal, Orevaoghene Ahia, Aarohi Srivastava, Kabir Ahuja, David Chiang, Yulia Tsvetkov, Antonios Anastasopoulos
This allows for a comprehensive evaluation of NLP system performance on different language varieties.
1 code implementation • 4 Mar 2024 • Sina Ahmadi, Daban Q. Jaff, Md Mahfuz ibn Alam, Antonios Anastasopoulos
Kurdish, an Indo-European language spoken by over 30 million speakers, is considered a dialect continuum and known for its diversity in language varieties.
1 code implementation • 27 Feb 2024 • Roy Xie, Orevaoghene Ahia, Yulia Tsvetkov, Antonios Anastasopoulos
Identifying linguistic differences between dialects of a language often requires expert knowledge and meticulous human analysis.
no code implementations • 2 Feb 2024 • Md Mahfuz ibn Alam, Antonios Anastasopoulos
It is relatively easy to mine a large parallel corpus for any machine learning task, such as speech-to-text or speech-to-speech translation.
no code implementations • 2 Feb 2024 • Md Mahfuz ibn Alam, Sina Ahmadi, Antonios Anastasopoulos
In this paper, we propose strategies to synthesize parallel data relying on morpho-syntactic information and using bilingual lexicons along with a small amount of seed parallel data.
no code implementations • 29 Nov 2023 • Angeela Acharya, Sulabh Shrestha, Anyi Chen, Joseph Conte, Sanja Avramovic, Siddhartha Sikdar, Antonios Anastasopoulos, Sanmay Das
Previous research has addressed this data limitation by incorporating medical ontologies and employing transfer learning methods.
no code implementations • 25 Nov 2023 • Md Nishat Raihan, Umma Hani Tanmoy, Anika Binte Islam, Kai North, Tharindu Ranasinghe, Antonios Anastasopoulos, Marcos Zampieri
Identifying offensive content in social media is vital for creating safe online communities.
1 code implementation • 27 Oct 2023 • Dhiman Goswami, Md Nishat Raihan, Antara Mahmud, Antonios Anastasopoulos, Marcos Zampieri
Code-mixing is a well-studied linguistic phenomenon that occurs when two or more languages are mixed in text or speech.
1 code implementation • 27 Oct 2023 • Md Nishat Raihan, Dhiman Goswami, Antara Mahmud, Antonios Anastasopoulos, Marcos Zampieri
Code-mixing is a well-studied linguistic phenomenon that occurs when two or more languages are mixed in text or speech.
no code implementations • 27 Oct 2023 • Aditi Chaudhary, Arun Sampath, Ashwin Sheshadri, Antonios Anastasopoulos, Graham Neubig
This is challenging because i) it requires that such experts be accessible and have the necessary resources, and ii) describing all the intricacies of a language is time-consuming and prone to omission.
1 code implementation • 26 Oct 2023 • Anjishnu Mukherjee, Chahat Raj, Ziwei Zhu, Antonios Anastasopoulos
Finally, we highlight the significance of these social biases and the new dimensions through an extensive comparison of embedding methods, reinforcing the need to address them in pursuit of more equitable language models.
1 code implementation • 12 Oct 2023 • Md Mushfiqur Rahman, Fardin Ahsan Sakib, Fahim Faisal, Antonios Anastasopoulos
To understand the downstream implications of text representation choices, we perform a comparative analysis on language models having diverse text representation modalities including 2 segmentation-based models (BERT, mBERT), 1 image-based model (PIXEL), and 1 character-level model (CANINE).
1 code implementation • 9 Oct 2023 • Prabin Bhandari, Antonios Anastasopoulos, Dieter Pfoser
Despite the impressive performance of Large Language Models (LLMs) for various natural language processing tasks, little is known about their comprehension of geographic data and related ability to facilitate informed geospatial decision-making.
no code implementations • 27 Sep 2023 • Amir Hussein, Brian Yan, Antonios Anastasopoulos, Shinji Watanabe, Sanjeev Khudanpur
Incorporating longer context has been shown to benefit machine translation, but the inclusion of context in end-to-end speech translation (E2E-ST) remains under-studied.
no code implementations • 27 Sep 2023 • Brian Yan, Xuankai Chang, Antonios Anastasopoulos, Yuya Fujita, Shinji Watanabe
Recent works in end-to-end speech-to-text translation (ST) have proposed multi-tasking methods with soft parameter sharing which leverage machine translation (MT) data via secondary encoders that map text inputs to an eventual cross-modal representation.
1 code implementation • 7 Jun 2023 • Claytone Sikasote, Kalinda Siaminwe, Stanly Mwape, Bangiwe Zulu, Mofya Phiri, Martin Phiri, David Zulu, Mayumbo Nyirenda, Antonios Anastasopoulos
The dataset is created for speech recognition but can be extended to multilingual speech processing research for both supervised and unsupervised learning approaches.
1 code implementation • 26 May 2023 • Claytone Sikasote, Eunice Mukonde, Md Mahfuz ibn Alam, Antonios Anastasopoulos
We present BIG-C (Bemba Image Grounded Conversations), a large multimodal dataset for Bemba.
no code implementations • 26 May 2023 • Md Mahfuz ibn Alam, Sina Ahmadi, Antonios Anastasopoulos
Neural machine translation (NMT) systems exhibit limited robustness in handling source-side linguistic variations.
1 code implementation • 25 May 2023 • Sina Ahmadi, Antonios Anastasopoulos
The wide accessibility of social media has provided linguistically under-represented communities with an extraordinary opportunity to create content in their native languages.
no code implementations • 24 May 2023 • Yueqi Song, Catherine Cui, Simran Khanuja, PengFei Liu, Fahim Faisal, Alissa Ostapenko, Genta Indra Winata, Alham Fikri Aji, Samuel Cahyawijaya, Yulia Tsvetkov, Antonios Anastasopoulos, Graham Neubig
Despite the major advances in NLP, significant disparities in NLP system performance across languages still exist.
1 code implementation • 23 May 2023 • Milind Agarwal, Md Mahfuz ibn Alam, Antonios Anastasopoulos
Second, we propose a novel misprediction-resolution hierarchical model, LIMIt, for language identification that reduces error by 55% (from 0.71 to 0.32) on our compiled children's stories dataset and by 40% (from 0.23 to 0.14) on the FLORES-200 benchmark.
no code implementations • 25 Apr 2023 • Md Mahfuz ibn Alam, Ruoyu Xie, Fahim Faisal, Antonios Anastasopoulos
This report describes GMU's sentiment analysis system for the SemEval-2023 shared task AfriSenti-SemEval.
1 code implementation • 3 Apr 2023 • Sina Ahmadi, Milind Agarwal, Antonios Anastasopoulos
The Perso-Arabic scripts are a family of scripts that are widely adopted and used by various linguistic communities around the globe.
1 code implementation • 3 Apr 2023 • Sina Ahmadi, Zahra Azin, Sara Belelli, Antonios Anastasopoulos
One of the major challenges that under-represented and endangered language communities face in language technology is the lack or paucity of language data.
no code implementations • 26 Feb 2023 • Shruti Rijhwani, Daisy Rosenblum, Michayla King, Antonios Anastasopoulos, Graham Neubig
There has been recent interest in improving optical character recognition (OCR) for endangered languages, particularly because a large number of documents and books in these languages are not in machine-readable formats.
1 code implementation • 23 Jan 2023 • Ruoyu Xie, Antonios Anastasopoulos
An ongoing challenge in current natural language processing is how its major advancements tend to disproportionately favor resource-rich languages, leaving a significant number of under-resourced languages behind.
no code implementations • 20 Dec 2022 • Fahim Faisal, Antonios Anastasopoulos
Pretrained language models (PLMs) often fail to fairly represent target users from certain world regions because of the under-representation of those regions in training datasets.
no code implementations • 14 Oct 2022 • Sachin Kumar, Vidhisha Balachandran, Lucille Njoo, Antonios Anastasopoulos, Yulia Tsvetkov
Recent advances in the capacity of large language models to generate human-like text have resulted in their increased adoption in user-facing settings.
no code implementations • 10 Jun 2022 • Aditi Chaudhary, Arun Sampath, Ashwin Sheshadri, Antonios Anastasopoulos, Graham Neubig
This process is challenging because i) it requires that such experts be accessible and have the necessary resources, and ii) even if there are such experts, describing all the intricacies of a language is time-consuming and prone to omission.
no code implementations • NAACL (BEA) 2022 • Cristian Ahumada, Claudio Gutierrez, Antonios Anastasopoulos
Mapuzugun is the language of the Mapuche people.
1 code implementation • 19 May 2022 • Fahim Faisal, Antonios Anastasopoulos
Large pretrained multilingual models, trained on dozens of languages, have delivered promising results due to cross-lingual learning capabilities on a variety of language tasks.
no code implementations • LREC 2022 • Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa, Nizar Habash, Witold Kieraś, Gábor Bella, Brian Leonard, Garrett Nicolai, Kyle Gorman, Yustinus Ghanggo Ate, Maria Ryskina, Sabrina J. Mielke, Elena Budianskaya, Charbel El-Khaissi, Tiago Pimentel, Michael Gasser, William Lane, Mohit Raj, Matt Coler, Jaime Rafael Montoya Samame, Delio Siticonatzi Camaiteri, Benoît Sagot, Esaú Zumaeta Rojas, Didier López Francis, Arturo Oncevay, Juan López Bautista, Gema Celeste Silva Villegas, Lucas Torroba Hennigen, Adam Ek, David Guriel, Peter Dirix, Jean-Philippe Bernardy, Andrey Scherbakov, Aziyana Bayyr-ool, Antonios Anastasopoulos, Roberto Zariquiey, Karina Sheifer, Sofya Ganieva, Hilaria Cruz, Ritván Karahóǧa, Stella Markantonatou, George Pavlidis, Matvey Plugaryov, Elena Klyachko, Ali Salehi, Candy Angulo, Jatayu Baxi, Andrew Krizhanovsky, Natalia Krizhanovskaya, Elizabeth Salesky, Clara Vania, Sardana Ivanova, Jennifer White, Rowan Hall Maudslay, Josef Valvoda, Ran Zmigrod, Paula Czarnowska, Irene Nikkarinen, Aelita Salchak, Brijesh Bhatt, Christopher Straughn, Zoey Liu, Jonathan North Washington, Yuval Pinter, Duygu Ataman, Marcin Wolinski, Totok Suhardijanto, Anna Yablonskaya, Niklas Stoehr, Hossep Dolatian, Zahroh Nuriah, Shyam Ratan, Francis M. Tyers, Edoardo M. Ponti, Grant Aiton, Aryaman Arora, Richard J. Hatcher, Ritesh Kumar, Jeremiah Young, Daria Rodionova, Anastasia Yemelina, Taras Andrushko, Igor Marchenko, Polina Mashkovtseva, Alexandra Serova, Emily Prud'hommeaux, Maria Nepomniashchaya, Fausto Giunchiglia, Eleanor Chodroff, Mans Hulden, Miikka Silfverberg, Arya D. McCarthy, David Yarowsky, Ryan Cotterell, Reut Tsarfaty, Ekaterina Vylomova
The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema.
no code implementations • 25 Mar 2022 • Aditi Chaudhary, Zaid Sheikh, David R Mortensen, Antonios Anastasopoulos, Graham Neubig
Each language has its own complex systems of word, phrase, and sentence construction, the guiding principles of which are often summarized in grammar descriptions for the consumption of linguists or language learners.
1 code implementation • Findings (ACL) 2022 • Nathaniel Krasner, Miriam Wanner, Antonios Anastasopoulos
Recent work by Søgaard (2020) showed that, treebank size aside, overlap between training and test graphs (termed leakage) explains more of the observed variation in dependency parsing performance than other explanations.
no code implementations • ACL 2022 • Fahim Faisal, Yinkai Wang, Antonios Anastasopoulos
As language technologies become more ubiquitous, there are increasing efforts towards expanding the language diversity and coverage of natural language processing (NLP) systems.
1 code implementation • 4 Nov 2021 • Shruti Rijhwani, Daisy Rosenblum, Antonios Anastasopoulos, Graham Neubig
In addition, to enforce consistency in the recognized vocabulary, we introduce a lexically-aware decoding method that augments the neural post-correction model with a count-based language model constructed from the recognized texts, implemented using weighted finite-state automata (WFSA) for efficient and effective decoding.
2 code implementations • 13 Oct 2021 • Damián Blasi, Antonios Anastasopoulos, Graham Neubig
Natural language processing (NLP) systems have become a central technology in communication, education, medicine, artificial intelligence, and many other domains of research and development.
1 code implementation • Findings (EMNLP) 2021 • Fahim Faisal, Sharlina Keshava, Md Mahfuz ibn Alam, Antonios Anastasopoulos
Question answering (QA) systems are now available through numerous commercial applications for a wide variety of domains, serving millions of users that interact with them via speech interfaces.
no code implementations • EMNLP (MRQA) 2021 • Fahim Faisal, Antonios Anastasopoulos
Human knowledge is collectively encoded in the roughly 6500 languages spoken around the world, but it is not distributed equally across languages.
1 code implementation • 13 Sep 2021 • Aditi Chaudhary, Kayo Yin, Antonios Anastasopoulos, Graham Neubig
Learning fine-grained distinctions between vocabulary items is a key challenge in learning a new language.
1 code implementation • 31 Aug 2021 • Jitin Krishnan, Antonios Anastasopoulos, Hemant Purohit, Huzefa Rangwala
Transliteration is very common on social media, but transliterated text is not adequately handled by modern neural models for various NLP tasks.
1 code implementation • 22 Jun 2021 • Md Mahfuz ibn Alam, Antonios Anastasopoulos, Laurent Besacier, James Cross, Matthias Gallé, Philipp Koehn, Vassilina Nikoulina
As neural machine translation (NMT) systems become an important part of professional translator pipelines, a growing body of work focuses on combining NMT with terminologies.
1 code implementation • ACL (NLP4Prog) 2021 • Junayed Mahmud, Fahim Faisal, Raihan Islam Arnob, Antonios Anastasopoulos, Kevin Moran
Automated source code summarization is a popular software engineering research topic wherein machine translation models are employed to "translate" code snippets into relevant natural language descriptions.
no code implementations • ACL 2021 • Sachin Kumar, Antonios Anastasopoulos, Shuly Wintner, Yulia Tsvetkov
State-of-the-art machine translation (MT) systems are typically trained to generate the "standard" target language; however, many languages have multiple varieties (regional varieties, dialects, sociolects, non-native varieties) that are different from the standard language.
1 code implementation • ACL 2021 • Arnab Debnath, Navid Rajabi, Fardina Fathmiul Alam, Antonios Anastasopoulos
Question answering (QA) in English has been widely explored, but multilingual datasets are relatively new, with several methods attempting to bridge the gap between high- and low-resourced languages using data augmentation through translation and cross-lingual transfer.
no code implementations • 4 Apr 2021 • Kathleen Siminyu, Xinjian Li, Antonios Anastasopoulos, David Mortensen, Michael R. Marlo, Graham Neubig
Models pre-trained on multiple languages have shown significant promise for improving speech recognition, particularly for low-resource languages.
1 code implementation • EMNLP 2021 • Adithya Pratapa, Antonios Anastasopoulos, Shruti Rijhwani, Aditi Chaudhary, David R. Mortensen, Graham Neubig, Yulia Tsvetkov
Text generation systems are ubiquitous in natural language processing applications.
1 code implementation • EMNLP (MRL) 2021 • Jitin Krishnan, Antonios Anastasopoulos, Hemant Purohit, Huzefa Rangwala
Predicting user intent and detecting the corresponding slots from text are two key problems in Natural Language Understanding (NLU).
2 code implementations • LREC 2022 • Claytone Sikasote, Antonios Anastasopoulos
We present a preprocessed, ready-to-use automatic speech recognition corpus, BembaSpeech, consisting of over 24 hours of read speech in the Bemba language, a written but low-resourced language spoken by over 30% of the population in Zambia.
no code implementations • COLING 2020 • Xingyuan Zhao, Satoru Ozaki, Antonios Anastasopoulos, Graham Neubig, Lori Levin
Interlinear Glossed Text (IGT) is a widely used format for encoding linguistic information in language documentation projects and scholarly papers.
no code implementations • COLING 2020 • Antonios Anastasopoulos, Christopher Cox, Graham Neubig, Hilaria Cruz
This tutorial will focus on NLP for endangered languages documentation and revitalization.
2 code implementations • EMNLP 2020 • Shruti Rijhwani, Antonios Anastasopoulos, Graham Neubig
There is little to no data available to build natural language processing models for most endangered languages.
no code implementations • 2 Nov 2020 • Aditi Chaudhary, Antonios Anastasopoulos, Zaid Sheikh, Graham Neubig
Active learning (AL) uses a data selection algorithm to select useful training samples to minimize annotation cost.
no code implementations • 20 Oct 2020 • Yiyuan Li, Antonios Anastasopoulos, Alan W Black
In this work, we design a knowledge-base and prediction model embedded system for spelling correction in low-resource languages.
1 code implementation • EMNLP 2020 • Zhengbao Jiang, Antonios Anastasopoulos, Jun Araki, Haibo Ding, Graham Neubig
We further propose a code-switching-based method to improve the ability of multilingual LMs to access knowledge, and verify its effectiveness on several benchmark languages.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Md Mosharaf Hossain, Antonios Anastasopoulos, Eduardo Blanco, Alexis Palmer
As machine translation (MT) systems progress at a rapid pace, questions of their adequacy linger.
1 code implementation • EMNLP 2020 • Aditi Chaudhary, Antonios Anastasopoulos, Adithya Pratapa, David R. Mortensen, Zaid Sheikh, Yulia Tsvetkov, Graham Neubig
Using cross-lingual transfer, even with no expert annotations in the language of interest, our framework extracts a grammatical specification which is nearly equivalent to those created with large amounts of gold-standard annotated data.
no code implementations • EMNLP (NLP-COVID19) 2020 • Antonios Anastasopoulos, Alessandro Cattelan, Zi-Yi Dou, Marcello Federico, Christian Federmann, Dmitriy Genzel, Francisco Guzmán, Junjie Hu, Macduff Hughes, Philipp Koehn, Rosie Lazar, Will Lewis, Graham Neubig, Mengmeng Niu, Alp Öktem, Eric Paquin, Grace Tang, Sylwia Tur
Further, the team is converting the test and development data into translation memories (TMXs) that can be used by localizers from and to any of the languages.
no code implementations • WS 2020 • Nikitha Murikinati, Antonios Anastasopoulos, Graham Neubig
Cross-lingual transfer between typologically related languages has been proven successful for the task of morphological inflection.
no code implementations • WS 2020 • Nikitha Murikinati, Antonios Anastasopoulos
This paper describes the CMU-LTI submission to the SIGMORPHON 2020 Shared Task 0 on typologically diverse morphological inflection.
1 code implementation • WS 2020 • Ekaterina Vylomova, Jennifer White, Elizabeth Salesky, Sabrina J. Mielke, Shijie Wu, Edoardo Ponti, Rowan Hall Maudslay, Ran Zmigrod, Josef Valvoda, Svetlana Toldova, Francis Tyers, Elena Klyachko, Ilya Yegorov, Natalia Krizhanovsky, Paula Czarnowska, Irene Nikkarinen, Andrew Krizhanovsky, Tiago Pimentel, Lucas Torroba Hennigen, Christo Kirov, Garrett Nicolai, Adina Williams, Antonios Anastasopoulos, Hilaria Cruz, Eleanor Chodroff, Ryan Cotterell, Miikka Silfverberg, Mans Hulden
Systems were developed using data from 45 languages and just 5 language families, fine-tuned with data from an additional 45 languages and 10 language families (13 in total), and evaluated on all 90 languages.
1 code implementation • ACL 2020 • Emanuele Bugliarello, Sabrina J. Mielke, Antonios Anastasopoulos, Ryan Cotterell, Naoaki Okazaki
The performance of neural machine translation systems is commonly evaluated in terms of BLEU.
1 code implementation • ACL 2020 • Mengzhou Xia, Antonios Anastasopoulos, Ruochen Xu, Yiming Yang, Graham Neubig
Given the complexity of combinations of tasks, languages, and domains in natural language processing (NLP) research, it is computationally prohibitive to exhaustively test newly proposed models on each possible experimental setting.
no code implementations • LREC 2020 • Graham Neubig, Shruti Rijhwani, Alexis Palmer, Jordan MacKenzie, Hilaria Cruz, Xinjian Li, Matthew Lee, Aditi Chaudhary, Luke Gessler, Steven Abney, Shirley Anugrah Hayati, Antonios Anastasopoulos, Olga Zamaraeva, Emily Prud'hommeaux, Jennette Child, Sara Child, Rebecca Knowles, Sarah Moeller, Jeffrey Micher, Yiyuan Li, Sydney Zink, Mengzhou Xia, Roshan S Sharma, Patrick Littell
Despite recent advances in natural language processing and other language technology, the application of such technology to language documentation and conservation has been limited.
1 code implementation • 24 Apr 2020 • Aman Madaan, Shruti Rijhwani, Antonios Anastasopoulos, Yiming Yang, Graham Neubig
We propose a method of curating high-quality comparable training data for low-resource languages with monolingual annotators.
no code implementations • LREC 2020 • David R. Mortensen, Xinjian Li, Patrick Littell, Alexis Michaud, Shruti Rijhwani, Antonios Anastasopoulos, Alan W. Black, Florian Metze, Graham Neubig
While phonemic representations are language specific, phonetic representations (stated in terms of (allo)phones) are much closer to a universal (language-independent) transcription.
1 code implementation • EMNLP 2020 • Zi-Yi Dou, Antonios Anastasopoulos, Graham Neubig
Back-translation has proven to be an effective method to utilize monolingual data in neural machine translation (NMT), and iteratively conducting back-translation can further improve the model performance.
no code implementations • LREC 2020 • Hilaria Cruz, Gregory Stump, Antonios Anastasopoulos
We present the first resource focusing on the verbal inflectional morphology of San Juan Quiahije Chatino, a tonal Mesoamerican language spoken in Mexico.
1 code implementation • 26 Feb 2020 • Xinjian Li, Siddharth Dalmia, Juncheng Li, Matthew Lee, Patrick Littell, Jiali Yao, Antonios Anastasopoulos, David R. Mortensen, Graham Neubig, Alan W. Black, Florian Metze
Multilingual models can improve language processing, particularly for low resource situations, by sharing parameters across languages.
no code implementations • 10 Jan 2020 • Yiyuan Li, Antonios Anastasopoulos, Alan W. Black
Current grammatical error correction (GEC) models typically consider the task as sequence generation, which requires large amounts of annotated data and limits the applications in data-limited settings.
1 code implementation • 14 Dec 2019 • Keita Kurita, Anna Belova, Antonios Anastasopoulos
We propose a method of generating realistic model-agnostic attacks using a lexicon of toxic tokens, which attempts to mislead toxicity classifiers by diluting the toxicity signal either by obfuscating toxic tokens through character-level perturbations, or by injecting non-toxic distractor tokens.
1 code implementation • LREC 2020 • Mingjun Duan, Carlos Fasola, Sai Krishna Rallabandi, Rodolfo M. Vega, Antonios Anastasopoulos, Lori Levin, Alan W. Black
We present a resource for computational experiments on Mapudungun, a polysynthetic indigenous language spoken in Chile with upwards of 200 thousand speakers.
1 code implementation • ICML 2020 • Xinyi Wang, Hieu Pham, Paul Michel, Antonios Anastasopoulos, Jaime Carbonell, Graham Neubig
To acquire a new skill, humans learn better and faster if a tutor, based on their current knowledge level, informs them of how much attention they should pay to particular content or practice problems.
1 code implementation • ACL 2020 • Antonios Anastasopoulos, Graham Neubig
Most recent work in cross-lingual word embeddings is severely Anglocentric.
no code implementations • IJCNLP 2019 • Zi-Yi Dou, Keyi Yu, Antonios Anastasopoulos
Learning general representations of text is a fundamental problem for many natural language understanding (NLU) tasks.
1 code implementation • IJCNLP 2019 • Zi-Yi Dou, Junjie Hu, Antonios Anastasopoulos, Graham Neubig
The recent success of neural machine translation models relies on the availability of high quality, in-domain data.
4 code implementations • IJCNLP 2019 • Antonios Anastasopoulos, Graham Neubig
Recent years have seen exceptional strides in the task of automatic morphological inflection generation.
1 code implementation • WS 2019 • Shuyan Zhou, Xiangkai Zeng, Yingqi Zhou, Antonios Anastasopoulos, Graham Neubig
While neural machine translation (NMT) achieves remarkable performance on clean, in-domain text, performance is known to degrade drastically when facing text that is full of typos, grammatical errors, and other varieties of noise.
1 code implementation • WS 2019 • Xi-An Li, Paul Michel, Antonios Anastasopoulos, Yonatan Belinkov, Nadir Durrani, Orhan Firat, Philipp Koehn, Graham Neubig, Juan Pino, Hassan Sajjad
We share the findings of the first shared task on improving robustness of Machine Translation (MT).
no code implementations • ACL 2019 • Mengzhou Xia, Xiang Kong, Antonios Anastasopoulos, Graham Neubig
Translation to or from low-resource languages (LRLs) poses challenges for machine translation in terms of both adequacy and fluency.
1 code implementation • ACL 2019 • Yu-Hsiang Lin, Chian-Yu Chen, Jean Lee, Zirui Li, Yuyan Zhang, Mengzhou Xia, Shruti Rijhwani, Junxian He, Zhisong Zhang, Xuezhe Ma, Antonios Anastasopoulos, Patrick Littell, Graham Neubig
Cross-lingual transfer, where a high-resource transfer language is used to improve the accuracy of a low-resource task language, is now an invaluable tool for improving the performance of natural language processing (NLP) on low-resource languages.
no code implementations • WS 2019 • Antonios Anastasopoulos
The quality of Neural Machine Translation (NMT) has been shown to significantly degrade when confronted with source-side noise.
no code implementations • 7 Mar 2019 • Antonios Anastasopoulos, Shankar Kumar, Hank Liao
We report analysis that provides insights into why our multimodal language model improves upon a standard RNN language model.
1 code implementation • WS 2018 • Brian Thompson, Huda Khayrallah, Antonios Anastasopoulos, Arya D. McCarthy, Kevin Duh, Rebecca Marvin, Paul McNamee, Jeremy Gwinnup, Tim Anderson, Philipp Koehn
To better understand the effectiveness of continued training, we analyze the major components of a neural machine translation system (the encoder, decoder, and each embedding space) and consider each component's contribution to, and capacity for, domain adaptation.
2 code implementations • NAACL 2019 • Antonios Anastasopoulos, Alison Lui, Toan Nguyen, David Chiang
Neural Machine Translation (NMT) systems are known to degrade when confronted with noisy data, especially when the system is trained only on clean data.
no code implementations • 27 Jul 2018 • Marcely Zanon Boito, Antonios Anastasopoulos, Marika Lekakou, Aline Villavicencio, Laurent Besacier
This paper presents an extension to a very low-resource parallel corpus collected in an endangered language, Griko, making it useful for computational research.
no code implementations • NAACL 2018 • Antonios Anastasopoulos, David Chiang
We explore multitask models for neural translation of speech, augmenting them in order to reflect two intuitive notions.
no code implementations • WS 2017 • Antonios Anastasopoulos, Sameer Bansal, David Chiang, Sharon Goldwater, Adam Lopez
Vast amounts of speech data collected for language documentation and research remain untranscribed and unsearchable, but often a small amount of speech may have text translations available.
no code implementations • WS 2017 • Antonios Anastasopoulos, David Chiang
For many low-resource or endangered languages, spoken language resources are more likely to be annotated with translations than with transcriptions.
4 code implementations • 15 Jan 2017 • Graham Neubig, Chris Dyer, Yoav Goldberg, Austin Matthews, Waleed Ammar, Antonios Anastasopoulos, Miguel Ballesteros, David Chiang, Daniel Clothiaux, Trevor Cohn, Kevin Duh, Manaal Faruqui, Cynthia Gan, Dan Garrette, Yangfeng Ji, Lingpeng Kong, Adhiguna Kuncoro, Gaurav Kumar, Chaitanya Malaviya, Paul Michel, Yusuke Oda, Matthew Richardson, Naomi Saphra, Swabha Swayamdipta, Pengcheng Yin
In the static declaration strategy that is used in toolkits like Theano, CNTK, and TensorFlow, the user first defines a computation graph (a symbolic representation of the computation), and then examples are fed into an engine that executes this computation and computes its derivatives.
1 code implementation • EMNLP 2016 • Antonios Anastasopoulos, David Chiang, Long Duong
For many low-resource languages, spoken language resources are more likely to be annotated with translations than with transcriptions.