Search Results for author: Antonios Anastasopoulos

Found 117 papers, 65 papers with code

Fine-Tuning MT systems for Robustness to Second-Language Speaker Variations

1 code implementation EMNLP (WNUT) 2020 Md Mahfuz ibn Alam, Antonios Anastasopoulos

The performance of neural machine translation (NMT) systems only trained on a single language variant degrades when confronted with even slightly different language variations.

Machine Translation NMT

Findings of the IWSLT 2022 Evaluation Campaign

no code implementations IWSLT (ACL) 2022 Antonios Anastasopoulos, Loïc Barrault, Luisa Bentivogli, Marcely Zanon Boito, Ondřej Bojar, Roldano Cattoni, Anna Currey, Georgiana Dinu, Kevin Duh, Maha Elbayad, Clara Emmanuel, Yannick Estève, Marcello Federico, Christian Federmann, Souhir Gahbiche, Hongyu Gong, Roman Grundkiewicz, Barry Haddow, Benjamin Hsu, Dávid Javorský, Věra Kloudová, Surafel Lakew, Xutai Ma, Prashant Mathur, Paul McNamee, Kenton Murray, Maria Nădejde, Satoshi Nakamura, Matteo Negri, Jan Niehues, Xing Niu, John Ortega, Juan Pino, Elizabeth Salesky, Jiatong Shi, Matthias Sperber, Sebastian Stüker, Katsuhito Sudoh, Marco Turchi, Yogesh Virkar, Alexander Waibel, Changhan Wang, Shinji Watanabe

The evaluation campaign of the 19th International Conference on Spoken Language Translation featured eight shared tasks: (i) Simultaneous speech translation, (ii) Offline speech translation, (iii) Speech to speech translation, (iv) Low-resource speech translation, (v) Multilingual speech translation, (vi) Dialect speech translation, (vii) Formality control for speech translation, (viii) Isometric speech translation.

Speech-to-Speech Translation Translation

Systematic Inequalities in Language Technology Performance across the World’s Languages

1 code implementation ACL 2022 Damian Blasi, Antonios Anastasopoulos, Graham Neubig

Natural language processing (NLP) systems have become a central technology in communication, education, medicine, artificial intelligence, and many other domains of research and development.

Dependency Parsing Machine Translation

Findings of the IWSLT 2021 Evaluation Campaign

no code implementations ACL (IWSLT) 2021 Antonios Anastasopoulos, Ondřej Bojar, Jacob Bremerman, Roldano Cattoni, Maha Elbayad, Marcello Federico, Xutai Ma, Satoshi Nakamura, Matteo Negri, Jan Niehues, Juan Pino, Elizabeth Salesky, Sebastian Stüker, Katsuhito Sudoh, Marco Turchi, Alexander Waibel, Changhan Wang, Matthew Wiesner

The evaluation campaign of the International Conference on Spoken Language Translation (IWSLT 2021) featured four shared tasks this year: (i) Simultaneous speech translation, (ii) Offline speech translation, (iii) Multilingual speech translation, (iv) Low-resource speech translation.

Translation

SIGMORPHON–UniMorph 2022 Shared Task 0: Generalization and Typologically Diverse Morphological Inflection

1 code implementation NAACL (SIGMORPHON) 2022 Jordan Kodner, Salam Khalifa, Khuyagbaatar Batsuren, Hossep Dolatian, Ryan Cotterell, Faruk Akkus, Antonios Anastasopoulos, Taras Andrushko, Aryaman Arora, Nona Atanalov, Gábor Bella, Elena Budianskaya, Yustinus Ghanggo Ate, Omer Goldman, David Guriel, Simon Guriel, Silvia Guriel-Agiashvili, Witold Kieraś, Andrew Krizhanovsky, Natalia Krizhanovsky, Igor Marchenko, Magdalena Markowska, Polina Mashkovtseva, Maria Nepomniashchaya, Daria Rodionova, Karina Scheifer, Alexandra Sorova, Anastasia Yemelina, Jeremiah Young, Ekaterina Vylomova

The 2022 SIGMORPHON–UniMorph shared task on large scale morphological inflection generation included a wide range of typologically diverse languages: 33 languages from 11 top-level language families: Arabic (Modern Standard), Assamese, Braj, Chukchi, Eastern Armenian, Evenki, Georgian, Gothic, Gujarati, Hebrew, Hungarian, Itelmen, Karelian, Kazakh, Ket, Khalkha Mongolian, Kholosi, Korean, Lamahalot, Low German, Ludic, Magahi, Middle Low German, Old English, Old High German, Old Norse, Polish, Pomak, Slovak, Turkish, Upper Sorbian, Veps, and Xibe.

Morphological Inflection

Back to School: Translation Using Grammar Books

1 code implementation 20 Oct 2024 Jonathan Hus, Antonios Anastasopoulos

Machine translation systems for high-resource languages perform exceptionally well and produce high-quality translations.

Machine Translation Translation

The LLM Effect: Are Humans Truly Using LLMs, or Are They Being Influenced By Them Instead?

no code implementations 7 Oct 2024 Alexander S. Choi, Syeda Sabrina Akter, JP Singh, Antonios Anastasopoulos

The study, conducted in two stages (Topic Discovery and Topic Assignment), integrates LLMs with expert annotators to observe the impact of LLM suggestions on what is usually a human-only analysis.

Urban Mobility Assessment Using LLMs

no code implementations 22 Aug 2024 Prabin Bhandari, Antonios Anastasopoulos, Dieter Pfoser

Understanding urban mobility patterns and analyzing how people move around cities helps improve the overall quality of life and supports the development of more livable, efficient, and sustainable urban areas.

Survey Text Generation

BiasDora: Exploring Hidden Biased Associations in Vision-Language Models

1 code implementation 2 Jul 2024 Chahat Raj, Anjishnu Mukherjee, Aylin Caliskan, Antonios Anastasopoulos, Ziwei Zhu

Existing works examining Vision-Language Models (VLMs) for social biases predominantly focus on a limited set of documented bias associations, such as gender:profession or race:crime.

Breaking Bias, Building Bridges: Evaluation and Mitigation of Social Biases in LLMs via Contact Hypothesis

no code implementations 2 Jul 2024 Chahat Raj, Anjishnu Mukherjee, Aylin Caliskan, Antonios Anastasopoulos, Ziwei Zhu

We propose a unique debiasing technique, Social Contact Debiasing (SCD), that instruction-tunes these models with unbiased responses to prompts.

Crossroads of Continents: Automated Artifact Extraction for Cultural Adaptation with Large Multimodal Models

1 code implementation 2 Jul 2024 Anjishnu Mukherjee, Ziwei Zhu, Antonios Anastasopoulos

We present a comprehensive three-phase study to examine (1) the cultural understanding of Large Multimodal Models (LMMs) by introducing DalleStreet, a large-scale dataset generated by DALL-E 3 and validated by humans, containing 9,935 images of 67 countries and 10 concept classes; (2) the underlying implicit and potentially stereotypical cultural associations with a cultural artifact extraction task; and (3) an approach to adapt cultural representation in an image based on extracted associations using a modular pipeline, CultureAdapt.

Gloss2Text: Sign Language Gloss translation using LLMs and Semantically Aware Label Smoothing

1 code implementation 1 Jul 2024 Pooya Fayyazsanavi, Antonios Anastasopoulos, Jana Košecká

Sign language translation from video to spoken text presents unique challenges owing to the distinct grammar, expression nuances, and high variation of visual appearance across different speakers and contexts.

Data Augmentation Sign Language Translation

Script-Agnostic Language Identification

1 code implementation 25 Jun 2024 Milind Agarwal, Joshua Otten, Antonios Anastasopoulos

Language identification is used as the first step in many data collection and crawling efforts because it allows us to sort online text into language-specific buckets.

Language Identification

Unlearning Climate Misinformation in Large Language Models

no code implementations 29 May 2024 Michael Fore, Simranjit Singh, Chaehong Lee, Amritanshu Pandey, Antonios Anastasopoulos, Dimitrios Stamoulis

Misinformation regarding climate change is a key roadblock in addressing one of the most serious threats to humanity.

Misinformation RAG

EmoMix-3L: A Code-Mixed Dataset for Bangla-English-Hindi Emotion Detection

1 code implementation 11 May 2024 Nishat Raihan, Dhiman Goswami, Antara Mahmud, Antonios Anastasopoulos, Marcos Zampieri

Code-mixing is a well-studied linguistic phenomenon that occurs when two or more languages are mixed in text or speech.

Data-Augmentation-Based Dialectal Adaptation for LLMs

1 code implementation 11 Apr 2024 Fahim Faisal, Antonios Anastasopoulos

We propose an approach that combines the strengths of different types of language models and leverages data augmentation techniques to improve task performance on three South Slavic dialects: Chakavian, Cherkano, and Torlak.

Data Augmentation Natural Language Understanding

CMULAB: An Open-Source Framework for Training and Deployment of Natural Language Processing Models

1 code implementation 3 Apr 2024 Zaid Sheikh, Antonios Anastasopoulos, Shruti Rijhwani, Lindia Tjuatja, Robbie Jimerson, Graham Neubig

Effectively using Natural Language Processing (NLP) tools in under-resourced languages requires a thorough understanding of the language itself, familiarity with the latest models and training methodologies, and technical expertise to deploy these models.

Optical Character Recognition (OCR) speech-recognition

A Study on Scaling Up Multilingual News Framing Analysis

1 code implementation 1 Apr 2024 Syeda Sabrina Akter, Antonios Anastasopoulos

Media framing is the study of strategically selecting and presenting specific aspects of political issues to shape public opinion.

An Efficient Approach for Studying Cross-Lingual Transfer in Multilingual Language Models

1 code implementation 29 Mar 2024 Fahim Faisal, Antonios Anastasopoulos

The capacity and effectiveness of pre-trained multilingual models (MLMs) for zero-shot cross-lingual transfer is well established.

Zero-Shot Cross-Lingual Transfer

Language and Speech Technology for Central Kurdish Varieties

1 code implementation 4 Mar 2024 Sina Ahmadi, Daban Q. Jaff, Md Mahfuz ibn Alam, Antonios Anastasopoulos

Kurdish, an Indo-European language spoken by over 30 million speakers, is considered a dialect continuum and known for its diversity in language varieties.

Automatic Speech Recognition Diversity

Extracting Lexical Features from Dialects via Interpretable Dialect Classifiers

1 code implementation 27 Feb 2024 Roy Xie, Orevaoghene Ahia, Yulia Tsvetkov, Antonios Anastasopoulos

Identifying linguistic differences between dialects of a language often requires expert knowledge and meticulous human analysis.

A Case Study on Filtering for End-to-End Speech Translation

no code implementations 2 Feb 2024 Md Mahfuz ibn Alam, Antonios Anastasopoulos

It is relatively easy to mine a large parallel corpus for any machine learning task, such as speech-to-text or speech-to-speech translation.

Speech-to-Speech Translation Translation

A Morphologically-Aware Dictionary-based Data Augmentation Technique for Machine Translation of Under-Represented Languages

no code implementations 2 Feb 2024 Md Mahfuz ibn Alam, Sina Ahmadi, Antonios Anastasopoulos

In this paper, we propose strategies to synthesize parallel data relying on morpho-syntactic information and using bilingual lexicons along with a small amount of seed parallel data.

Data Augmentation Machine Translation

Teacher Perception of Automatically Extracted Grammar Concepts for L2 Language Learning

no code implementations 27 Oct 2023 Aditi Chaudhary, Arun Sampath, Ashwin Sheshadri, Antonios Anastasopoulos, Graham Neubig

This is challenging because i) it requires that such experts be accessible and have the necessary resources, and ii) describing all the intricacies of a language is time-consuming and prone to omission.

Global Voices, Local Biases: Socio-Cultural Prejudices across Languages

1 code implementation 26 Oct 2023 Anjishnu Mukherjee, Chahat Raj, Ziwei Zhu, Antonios Anastasopoulos

Finally, we highlight the significance of these social biases and the new dimensions through an extensive comparison of embedding methods, reinforcing the need to address them in pursuit of more equitable language models.

To token or not to token: A Comparative Study of Text Representations for Cross-Lingual Transfer

1 code implementation 12 Oct 2023 Md Mushfiqur Rahman, Fardin Ahsan Sakib, Fahim Faisal, Antonios Anastasopoulos

To understand the downstream implications of text representation choices, we perform a comparative analysis on language models having diverse text representation modalities, including 2 segmentation-based models (BERT, mBERT), 1 image-based model (PIXEL), and 1 character-level model (CANINE).

Cross-Lingual Transfer Dependency Parsing

Are Large Language Models Geospatially Knowledgeable?

1 code implementation 9 Oct 2023 Prabin Bhandari, Antonios Anastasopoulos, Dieter Pfoser

Despite the impressive performance of Large Language Models (LLMs) on various natural language processing tasks, little is known about their comprehension of geographic data and their related ability to facilitate informed geospatial decision-making.

Decision Making

Enhancing End-to-End Conversational Speech Translation Through Target Language Context Utilization

no code implementations 27 Sep 2023 Amir Hussein, Brian Yan, Antonios Anastasopoulos, Shinji Watanabe, Sanjeev Khudanpur

Incorporating longer context has been shown to benefit machine translation, but the inclusion of context in end-to-end speech translation (E2E-ST) remains under-studied.

Machine Translation Translation

Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing

no code implementations 27 Sep 2023 Brian Yan, Xuankai Chang, Antonios Anastasopoulos, Yuya Fujita, Shinji Watanabe

Recent works in end-to-end speech-to-text translation (ST) have proposed multi-tasking methods with soft parameter sharing which leverage machine translation (MT) data via secondary encoders that map text inputs to an eventual cross-modal representation.

Decoder Machine Translation

Zambezi Voice: A Multilingual Speech Corpus for Zambian Languages

1 code implementation 7 Jun 2023 Claytone Sikasote, Kalinda Siaminwe, Stanly Mwape, Bangiwe Zulu, Mofya Phiri, Martin Phiri, David Zulu, Mayumbo Nyirenda, Antonios Anastasopoulos

The dataset is created for speech recognition but can be extended to multilingual speech processing research for both supervised and unsupervised learning approaches.

Cross-Lingual Transfer speech-recognition

CODET: A Benchmark for Contrastive Dialectal Evaluation of Machine Translation

no code implementations 26 May 2023 Md Mahfuz ibn Alam, Sina Ahmadi, Antonios Anastasopoulos

Neural machine translation (NMT) systems exhibit limited robustness in handling source-side linguistic variations.

Machine Translation NMT

Script Normalization for Unconventional Writing of Under-Resourced Languages in Bilingual Communities

1 code implementation 25 May 2023 Sina Ahmadi, Antonios Anastasopoulos

The wide accessibility of social media has provided linguistically under-represented communities with an extraordinary opportunity to create content in their native languages.

Language Identification Machine Translation

LIMIT: Language Identification, Misidentification, and Translation using Hierarchical Models in 350+ Languages

1 code implementation 23 May 2023 Milind Agarwal, Md Mahfuz ibn Alam, Antonios Anastasopoulos

Second, we propose a novel misprediction-resolution hierarchical model, LIMIt, for language identification that reduces error by 55% (from 0.71 to 0.32) on our compiled children's stories dataset and by 40% (from 0.23 to 0.14) on the FLORES-200 benchmark.
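The percentages above read as relative error reductions, (before - after) / before. The short check below assumes that interpretation and uses only the two-decimal error rates quoted in the abstract; with those rounded inputs the second figure comes out near 39%, which is consistent with the reported 40% if the underlying error rates were unrounded.

```python
def relative_error_reduction(before: float, after: float) -> float:
    """Fraction of the baseline error removed: (before - after) / before."""
    if before <= 0:
        raise ValueError("baseline error must be positive")
    return (before - after) / before


# Error rates as quoted in the abstract (assumed to be error rates, not accuracies).
stories = relative_error_reduction(0.71, 0.32)  # children's stories dataset
flores = relative_error_reduction(0.23, 0.14)   # FLORES-200 benchmark
print(f"{stories:.1%} {flores:.1%}")  # prints "54.9% 39.1%"
```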

Language Identification Translation

PALI: A Language Identification Benchmark for Perso-Arabic Scripts

1 code implementation 3 Apr 2023 Sina Ahmadi, Milind Agarwal, Antonios Anastasopoulos

The Perso-Arabic scripts are a family of scripts that are widely adopted and used by various linguistic communities around the globe.

Language Identification

Approaches to Corpus Creation for Low-Resource Language Technology: the Case of Southern Kurdish and Laki

1 code implementation 3 Apr 2023 Sina Ahmadi, Zahra Azin, Sara Belelli, Antonios Anastasopoulos

One of the major challenges that under-represented and endangered language communities face in language technology is the lack or paucity of language data.

Language Identification

User-Centric Evaluation of OCR Systems for Kwak'wala

no code implementations 26 Feb 2023 Shruti Rijhwani, Daisy Rosenblum, Michayla King, Antonios Anastasopoulos, Graham Neubig

There has been recent interest in improving optical character recognition (OCR) for endangered languages, particularly because a large number of documents and books in these languages are not in machine-readable formats.

Optical Character Recognition Optical Character Recognition (OCR)

Noisy Parallel Data Alignment

1 code implementation 23 Jan 2023 Ruoyu Xie, Antonios Anastasopoulos

An ongoing challenge in current natural language processing is how its major advancements tend to disproportionately favor resource-rich languages, leaving a significant number of under-resourced languages behind.

Optical Character Recognition Optical Character Recognition (OCR)

Geographic and Geopolitical Biases of Language Models

no code implementations 20 Dec 2022 Fahim Faisal, Antonios Anastasopoulos

Pretrained language models (PLMs) often fail to fairly represent target users from certain world regions because of the under-representation of those regions in training datasets.

Language Generation Models Can Cause Harm: So What Can We Do About It? An Actionable Survey

no code implementations 14 Oct 2022 Sachin Kumar, Vidhisha Balachandran, Lucille Njoo, Antonios Anastasopoulos, Yulia Tsvetkov

Recent advances in the capacity of large language models to generate human-like text have resulted in their increased adoption in user-facing settings.

Language Modelling Survey

Teacher Perception of Automatically Extracted Grammar Concepts for L2 Language Learning

no code implementations 10 Jun 2022 Aditi Chaudhary, Arun Sampath, Ashwin Sheshadri, Antonios Anastasopoulos, Graham Neubig

This process is challenging because i) it requires that such experts be accessible and have the necessary resources, and ii) even if there are such experts, describing all the intricacies of a language is time-consuming and prone to omission.

Phylogeny-Inspired Adaptation of Multilingual Models to New Languages

1 code implementation 19 May 2022 Fahim Faisal, Antonios Anastasopoulos

Large pretrained multilingual models, trained on dozens of languages, have delivered promising results on a variety of language tasks due to their cross-lingual learning capabilities.

Cross-Lingual Transfer

UniMorph 4.0: Universal Morphology

no code implementations LREC 2022 Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa, Nizar Habash, Witold Kieraś, Gábor Bella, Brian Leonard, Garrett Nicolai, Kyle Gorman, Yustinus Ghanggo Ate, Maria Ryskina, Sabrina J. Mielke, Elena Budianskaya, Charbel El-Khaissi, Tiago Pimentel, Michael Gasser, William Lane, Mohit Raj, Matt Coler, Jaime Rafael Montoya Samame, Delio Siticonatzi Camaiteri, Benoît Sagot, Esaú Zumaeta Rojas, Didier López Francis, Arturo Oncevay, Juan López Bautista, Gema Celeste Silva Villegas, Lucas Torroba Hennigen, Adam Ek, David Guriel, Peter Dirix, Jean-Philippe Bernardy, Andrey Scherbakov, Aziyana Bayyr-ool, Antonios Anastasopoulos, Roberto Zariquiey, Karina Sheifer, Sofya Ganieva, Hilaria Cruz, Ritván Karahóǧa, Stella Markantonatou, George Pavlidis, Matvey Plugaryov, Elena Klyachko, Ali Salehi, Candy Angulo, Jatayu Baxi, Andrew Krizhanovsky, Natalia Krizhanovskaya, Elizabeth Salesky, Clara Vania, Sardana Ivanova, Jennifer White, Rowan Hall Maudslay, Josef Valvoda, Ran Zmigrod, Paula Czarnowska, Irene Nikkarinen, Aelita Salchak, Brijesh Bhatt, Christopher Straughn, Zoey Liu, Jonathan North Washington, Yuval Pinter, Duygu Ataman, Marcin Wolinski, Totok Suhardijanto, Anna Yablonskaya, Niklas Stoehr, Hossep Dolatian, Zahroh Nuriah, Shyam Ratan, Francis M. Tyers, Edoardo M. Ponti, Grant Aiton, Aryaman Arora, Richard J. Hatcher, Ritesh Kumar, Jeremiah Young, Daria Rodionova, Anastasia Yemelina, Taras Andrushko, Igor Marchenko, Polina Mashkovtseva, Alexandra Serova, Emily Prud'hommeaux, Maria Nepomniashchaya, Fausto Giunchiglia, Eleanor Chodroff, Mans Hulden, Miikka Silfverberg, Arya D. McCarthy, David Yarowsky, Ryan Cotterell, Reut Tsarfaty, Ekaterina Vylomova

The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema.

Morphological Inflection

AUTOLEX: An Automatic Framework for Linguistic Exploration

no code implementations 25 Mar 2022 Aditi Chaudhary, Zaid Sheikh, David R. Mortensen, Antonios Anastasopoulos, Graham Neubig

Each language has its own complex systems of word, phrase, and sentence construction, the guiding principles of which are often summarized in grammar descriptions for the consumption of linguists or language learners.

Sentence

Revisiting the Effects of Leakage on Dependency Parsing

1 code implementation Findings (ACL) 2022 Nathaniel Krasner, Miriam Wanner, Antonios Anastasopoulos

Recent work by Søgaard (2020) showed that, treebank size aside, overlap between training and test graphs (termed leakage) explains more of the observed variation in dependency parsing performance than other explanations.
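To make "overlap between training and test graphs" concrete, here is a simplified, hypothetical proxy: each unlabeled dependency tree is encoded as its head sequence (CoNLL-U HEAD convention, 0 = root), and leakage is the fraction of test trees whose exact head sequence also appears in training. This exact-match criterion is an assumption chosen for brevity; it is stricter than a graph-isomorphism test and is not presented as the measure used in either paper.

```python
from typing import Sequence

# heads[i] is the 1-based index of token i+1's head; 0 marks the root.
Tree = tuple[int, ...]


def leakage(train: Sequence[Sequence[int]], test: Sequence[Sequence[int]]) -> float:
    """Fraction of test trees whose unlabeled head sequence also occurs in train."""
    if not test:
        return 0.0
    seen: set[Tree] = {tuple(heads) for heads in train}
    return sum(tuple(heads) in seen for heads in test) / len(test)


train = [[2, 0], [2, 0, 2], [0, 1, 1]]
test = [[2, 0], [3, 3, 0], [2, 0, 2], [0, 1]]
print(leakage(train, test))  # 2 of the 4 test trees reappear in train -> 0.5
```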

Dependency Parsing

Dataset Geography: Mapping Language Data to Language Users

no code implementations ACL 2022 Fahim Faisal, Yinkai Wang, Antonios Anastasopoulos

As language technologies become more ubiquitous, there are increasing efforts towards expanding the language diversity and coverage of natural language processing (NLP) systems.

Diversity

Lexically Aware Semi-Supervised Learning for OCR Post-Correction

1 code implementation 4 Nov 2021 Shruti Rijhwani, Daisy Rosenblum, Antonios Anastasopoulos, Graham Neubig

In addition, to enforce consistency in the recognized vocabulary, we introduce a lexically-aware decoding method that augments the neural post-correction model with a count-based language model constructed from the recognized texts, implemented using weighted finite-state automata (WFSA) for efficient and effective decoding.
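The core idea of combining a neural post-correction score with a count-based language model built from the recognized texts can be illustrated without finite-state machinery. The sketch below is an assumption-laden simplification: it uses a plain word-level unigram model with add-alpha smoothing instead of a WFSA, and the interpolation weight `lam`, the smoothing constant, and the candidate scores are all hypothetical.

```python
import math
from collections import Counter


def build_unigram_lm(recognized_texts, alpha=1.0):
    """Add-alpha smoothed unigram LM over words seen in the recognized texts."""
    counts = Counter(word for text in recognized_texts for word in text.split())
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # +1 reserves probability mass for unseen words

    def log_prob(word):
        return math.log((counts[word] + alpha) / (total + alpha * vocab_size))

    return log_prob


def rescore(candidates, lm_log_prob, lam=0.5):
    """Pick the (word, neural_log_prob) pair maximizing neural + lam * LM log-prob."""
    return max(candidates, key=lambda c: c[1] + lam * lm_log_prob(c[0]))


texts = ["the river runs", "the river bends", "a river"]
lm = build_unigram_lm(texts)
# Hypothetical hypotheses for one OCR token: the neural model slightly prefers the typo.
candidates = [("rlver", -1.0), ("river", -1.2)]
print(rescore(candidates, lm)[0])  # prints "river"
```

Because "river" recurs in the recognized texts, the count-based term outweighs the small neural preference for "rlver", which is the vocabulary-consistency effect the abstract describes.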

Language Modelling Optical Character Recognition

Systematic Inequalities in Language Technology Performance across the World's Languages

2 code implementations 13 Oct 2021 Damián Blasi, Antonios Anastasopoulos, Graham Neubig

Natural language processing (NLP) systems have become a central technology in communication, education, medicine, artificial intelligence, and many other domains of research and development.

Dependency Parsing Machine Translation

SD-QA: Spoken Dialectal Question Answering for the Real World

1 code implementation Findings (EMNLP) 2021 Fahim Faisal, Sharlina Keshava, Md Mahfuz ibn Alam, Antonios Anastasopoulos

Question answering (QA) systems are now available through numerous commercial applications for a wide variety of domains, serving millions of users that interact with them via speech interfaces.

Fairness Question Answering

Investigating Post-pretraining Representation Alignment for Cross-Lingual Question Answering

no code implementations EMNLP (MRQA) 2021 Fahim Faisal, Antonios Anastasopoulos

Human knowledge is collectively encoded in the roughly 6500 languages spoken around the world, but it is not distributed equally across languages.

Cross-Lingual Question Answering

When is Wall a Pared and when a Muro? -- Extracting Rules Governing Lexical Selection

1 code implementation 13 Sep 2021 Aditi Chaudhary, Kayo Yin, Antonios Anastasopoulos, Graham Neubig

Learning fine-grained distinctions between vocabulary items is a key challenge in learning a new language.

Cross-Lingual Text Classification of Transliterated Hindi and Malayalam

1 code implementation 31 Aug 2021 Jitin Krishnan, Antonios Anastasopoulos, Hemant Purohit, Huzefa Rangwala

Transliteration is very common on social media, but transliterated text is not adequately handled by modern neural models for various NLP tasks.

Benchmarking Cross-Lingual Transfer

On the Evaluation of Machine Translation for Terminology Consistency

1 code implementation 22 Jun 2021 Md Mahfuz ibn Alam, Antonios Anastasopoulos, Laurent Besacier, James Cross, Matthias Gallé, Philipp Koehn, Vassilina Nikoulina

As neural machine translation (NMT) systems become an important part of professional translator pipelines, a growing body of work focuses on combining NMT with terminologies.

Domain Adaptation Machine Translation

Code to Comment Translation: A Comparative Study on Model Effectiveness & Errors

1 code implementation ACL (NLP4Prog) 2021 Junayed Mahmud, Fahim Faisal, Raihan Islam Arnob, Antonios Anastasopoulos, Kevin Moran

Automated source code summarization is a popular software engineering research topic wherein machine translation models are employed to "translate" code snippets into relevant natural language descriptions.

Code Summarization Machine Translation

Machine Translation into Low-resource Language Varieties

no code implementations ACL 2021 Sachin Kumar, Antonios Anastasopoulos, Shuly Wintner, Yulia Tsvetkov

State-of-the-art machine translation (MT) systems are typically trained to generate the "standard" target language; however, many languages have multiple varieties (regional varieties, dialects, sociolects, non-native varieties) that are different from the standard language.

Machine Translation Translation

Towards More Equitable Question Answering Systems: How Much More Data Do You Need?

1 code implementation ACL 2021 Arnab Debnath, Navid Rajabi, Fardina Fathmiul Alam, Antonios Anastasopoulos

Question answering (QA) in English has been widely explored, but multilingual datasets are relatively new, with several methods attempting to bridge the gap between high- and low-resourced languages using data augmentation through translation and cross-lingual transfer.

Cross-Lingual Transfer Data Augmentation

BembaSpeech: A Speech Recognition Corpus for the Bemba Language

2 code implementations LREC 2022 Claytone Sikasote, Antonios Anastasopoulos

We present a preprocessed, ready-to-use automatic speech recognition corpus, BembaSpeech, consisting of over 24 hours of read speech in the Bemba language, a written but low-resourced language spoken by over 30% of the population in Zambia.

Automatic Speech Recognition Automatic Speech Recognition (ASR)

Automatic Interlinear Glossing for Under-Resourced Languages Leveraging Translations

no code implementations COLING 2020 Xingyuan Zhao, Satoru Ozaki, Antonios Anastasopoulos, Graham Neubig, Lori Levin

Interlinear Glossed Text (IGT) is a widely used format for encoding linguistic information in language documentation projects and scholarly papers.

Cross-Lingual Transfer LEMMA

Endangered Languages meet Modern NLP

no code implementations COLING 2020 Antonios Anastasopoulos, Christopher Cox, Graham Neubig, Hilaria Cruz

This tutorial will focus on NLP for endangered language documentation and revitalization.

OCR Post Correction for Endangered Language Texts

2 code implementations EMNLP 2020 Shruti Rijhwani, Antonios Anastasopoulos, Graham Neubig

There is little to no data available to build natural language processing models for most endangered languages.

Optical Character Recognition (OCR)

Reducing Confusion in Active Learning for Part-Of-Speech Tagging

no code implementations 2 Nov 2020 Aditi Chaudhary, Antonios Anastasopoulos, Zaid Sheikh, Graham Neubig

Active learning (AL) uses a data selection algorithm to select useful training samples to minimize annotation cost.
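As a point of reference for what a "data selection algorithm" can look like, the sketch below shows generic entropy-based uncertainty sampling: rank unlabeled items by the entropy of the model's predicted tag distribution and send the top k to annotators. This is a standard baseline, not the confusion-reducing method of the paper; the pool contents and tag distributions are hypothetical.

```python
import math


def entropy(dist):
    """Shannon entropy (in nats) of a probability distribution over tags."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def select_for_annotation(pool, k):
    """Return ids of the k pool items whose predicted tag distribution is most uncertain."""
    ranked = sorted(pool, key=lambda item: entropy(item[1]), reverse=True)
    return [item_id for item_id, _ in ranked[:k]]


# Hypothetical pool: (token id, model's predicted distribution over three POS tags).
pool = [
    ("w1", [0.98, 0.01, 0.01]),
    ("w2", [0.40, 0.35, 0.25]),
    ("w3", [0.70, 0.20, 0.10]),
    ("w4", [0.34, 0.33, 0.33]),
]
print(select_for_annotation(pool, 2))  # prints ['w4', 'w2']
```

Entropy is used here only because it is a common, simple uncertainty score; margin or least-confidence scoring would slot into the same `key` function.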

Active Learning Part-Of-Speech Tagging

Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages

no code implementations 20 Oct 2020 Yiyuan Li, Antonios Anastasopoulos, Alan W Black

In this work, we design a system that embeds a knowledge base and a prediction model for spelling correction in low-resource languages.

Spelling Correction

X-FACTR: Multilingual Factual Knowledge Retrieval from Pretrained Language Models

1 code implementation EMNLP 2020 Zhengbao Jiang, Antonios Anastasopoulos, Jun Araki, Haibo Ding, Graham Neubig

We further propose a code-switching-based method to improve the ability of multilingual LMs to access knowledge, and verify its effectiveness on several benchmark languages.

Retrieval

Automatic Extraction of Rules Governing Morphological Agreement

1 code implementation EMNLP 2020 Aditi Chaudhary, Antonios Anastasopoulos, Adithya Pratapa, David R. Mortensen, Zaid Sheikh, Yulia Tsvetkov, Graham Neubig

Using cross-lingual transfer, even with no expert annotations in the language of interest, our framework extracts a grammatical specification which is nearly equivalent to those created with large amounts of gold-standard annotated data.

Cross-Lingual Transfer Descriptive

Transliteration for Cross-Lingual Morphological Inflection

no code implementations WS 2020 Nikitha Murikinati, Antonios Anastasopoulos, Graham Neubig

Cross-lingual transfer between typologically related languages has been proven successful for the task of morphological inflection.

Cross-Lingual Transfer Morphological Inflection

Predicting Performance for Natural Language Processing Tasks

1 code implementation ACL 2020 Mengzhou Xia, Antonios Anastasopoulos, Ruochen Xu, Yiming Yang, Graham Neubig

Given the complexity of combinations of tasks, languages, and domains in natural language processing (NLP) research, it is computationally prohibitive to exhaustively test newly proposed models on each possible experimental setting.

Practical Comparable Data Collection for Low-Resource Languages via Images

1 code implementation 24 Apr 2020 Aman Madaan, Shruti Rijhwani, Antonios Anastasopoulos, Yiming Yang, Graham Neubig

We propose a method of curating high-quality comparable training data for low-resource languages with monolingual annotators.

Machine Translation Translation

AlloVera: A Multilingual Allophone Database

no code implementations LREC 2020 David R. Mortensen, Xinjian Li, Patrick Littell, Alexis Michaud, Shruti Rijhwani, Antonios Anastasopoulos, Alan W. Black, Florian Metze, Graham Neubig

While phonemic representations are language specific, phonetic representations (stated in terms of (allo)phones) are much closer to a universal (language-independent) transcription.

speech-recognition Speech Recognition

Dynamic Data Selection and Weighting for Iterative Back-Translation

1 code implementation EMNLP 2020 Zi-Yi Dou, Antonios Anastasopoulos, Graham Neubig

Back-translation has proven to be an effective method to utilize monolingual data in neural machine translation (NMT), and iteratively conducting back-translation can further improve the model performance.

Domain Adaptation Machine Translation

A Resource for Studying Chatino Verbal Morphology

no code implementations LREC 2020 Hilaria Cruz, Gregory Stump, Antonios Anastasopoulos

We present the first resource focusing on the verbal inflectional morphology of San Juan Quiahije Chatino, a tonal Mesoamerican language spoken in Mexico.

Lemmatization Morphological Analysis

Towards Minimal Supervision BERT-based Grammar Error Correction

no code implementations 10 Jan 2020 Yiyuan Li, Antonios Anastasopoulos, Alan W. Black

Current grammatical error correction (GEC) models typically treat the task as sequence generation, which requires large amounts of annotated data and limits their application in data-limited settings.

Grammatical Error Correction Language Modelling

Towards Robust Toxic Content Classification

1 code implementation 14 Dec 2019 Keita Kurita, Anna Belova, Antonios Anastasopoulos

We propose a method of generating realistic model-agnostic attacks using a lexicon of toxic tokens, which attempts to mislead toxicity classifiers by diluting the toxicity signal either by obfuscating toxic tokens through character-level perturbations, or by injecting non-toxic distractor tokens.
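The two attack families described above (character-level obfuscation of lexicon tokens, and injection of non-toxic distractors) can be sketched as simple text transforms. The specific perturbation rule (swapping the two middle characters), appending distractors at the end, and the placeholder lexicon are all hypothetical choices for illustration, not the paper's actual procedure.

```python
def obfuscate(token: str) -> str:
    """Swap the two middle characters of a token (hypothetical perturbation rule)."""
    if len(token) < 4:
        return token  # too short to perturb while keeping it readable
    mid = len(token) // 2
    chars = list(token)
    chars[mid - 1], chars[mid] = chars[mid], chars[mid - 1]
    return "".join(chars)


def attack(text: str, toxic_lexicon: set[str],
           distractors: tuple[str, ...] = (), mode: str = "obfuscate") -> str:
    """Dilute the toxicity signal of `text` using a lexicon of toxic tokens."""
    tokens = text.split()
    if mode == "obfuscate":
        # Perturb only tokens found in the lexicon; leave all others untouched.
        tokens = [obfuscate(t) if t.lower() in toxic_lexicon else t for t in tokens]
    elif mode == "inject":
        # Append benign distractor tokens so they outweigh the toxic ones.
        tokens = tokens + list(distractors)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return " ".join(tokens)


lexicon = {"stupid"}  # placeholder lexicon
print(attack("you are stupid", lexicon))  # prints "you are stpuid"
print(attack("you are stupid", lexicon, ("lovely", "sunny", "day"), mode="inject"))
```

Both transforms are model-agnostic in the sense that they never query the classifier; they only rely on the lexicon.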

Classification Denoising

A Resource for Computational Experiments on Mapudungun

1 code implementation LREC 2020 Mingjun Duan, Carlos Fasola, Sai Krishna Rallabandi, Rodolfo M. Vega, Antonios Anastasopoulos, Lori Levin, Alan W. Black

We present a resource for computational experiments on Mapudungun, a polysynthetic indigenous language spoken in Chile with upwards of 200 thousand speakers.

Machine Translation Speech Recognition

Optimizing Data Usage via Differentiable Rewards

1 code implementation ICML 2020 Xinyi Wang, Hieu Pham, Paul Michel, Antonios Anastasopoulos, Jaime Carbonell, Graham Neubig

To acquire a new skill, humans learn better and faster if a tutor, based on their current knowledge level, informs them of how much attention they should pay to particular content or practice problems.

Image Classification Machine Translation

Improving Robustness of Neural Machine Translation with Multi-task Learning

1 code implementation WS 2019 Shuyan Zhou, Xiangkai Zeng, Yingqi Zhou, Antonios Anastasopoulos, Graham Neubig

While neural machine translation (NMT) achieves remarkable performance on clean, in-domain text, performance is known to degrade drastically when facing text which is full of typos, grammatical errors and other varieties of noise.

Machine Translation Multi-Task Learning

Generalized Data Augmentation for Low-Resource Translation

no code implementations ACL 2019 Mengzhou Xia, Xiang Kong, Antonios Anastasopoulos, Graham Neubig

Translation to or from low-resource languages (LRLs) poses challenges for machine translation in terms of both adequacy and fluency.

Data Augmentation Translation

Choosing Transfer Languages for Cross-Lingual Learning

1 code implementation ACL 2019 Yu-Hsiang Lin, Chian-Yu Chen, Jean Lee, Zirui Li, Yuyan Zhang, Mengzhou Xia, Shruti Rijhwani, Junxian He, Zhisong Zhang, Xuezhe Ma, Antonios Anastasopoulos, Patrick Littell, Graham Neubig

Cross-lingual transfer, where a high-resource transfer language is used to improve the accuracy of a low-resource task language, is now an invaluable tool for improving performance of natural language processing (NLP) on low-resource languages.

Cross-Lingual Transfer

An Analysis of Source-Side Grammatical Errors in NMT

no code implementations WS 2019 Antonios Anastasopoulos

The quality of Neural Machine Translation (NMT) has been shown to significantly degrade when confronted with source-side noise.

Machine Translation NMT

Neural Language Modeling with Visual Features

no code implementations 7 Mar 2019 Antonios Anastasopoulos, Shankar Kumar, Hank Liao

We report an analysis that provides insights into why our multimodal language model improves upon a standard RNN language model.

Language Modelling

Freezing Subnetworks to Analyze Domain Adaptation in Neural Machine Translation

1 code implementation WS 2018 Brian Thompson, Huda Khayrallah, Antonios Anastasopoulos, Arya D. McCarthy, Kevin Duh, Rebecca Marvin, Paul McNamee, Jeremy Gwinnup, Tim Anderson, Philipp Koehn

To better understand the effectiveness of continued training, we analyze the major components of a neural machine translation system (the encoder, decoder, and each embedding space) and consider each component's contribution to, and capacity for, domain adaptation.

Decoder Domain Adaptation

Neural Machine Translation of Text from Non-Native Speakers

2 code implementations NAACL 2019 Antonios Anastasopoulos, Alison Lui, Toan Nguyen, David Chiang

Neural Machine Translation (NMT) systems are known to degrade when confronted with noisy data, especially when the system is trained only on clean data.

Machine Translation NMT

A small Griko-Italian speech translation corpus

no code implementations 27 Jul 2018 Marcely Zanon Boito, Antonios Anastasopoulos, Marika Lekakou, Aline Villavicencio, Laurent Besacier

This paper presents an extension to a very low-resource parallel corpus collected in an endangered language, Griko, making it useful for computational research.

Diversity Translation

Tied Multitask Learning for Neural Speech Translation

no code implementations NAACL 2018 Antonios Anastasopoulos, David Chiang

We explore multitask models for neural translation of speech, augmenting them in order to reflect two intuitive notions.

Decoder Translation

Spoken Term Discovery for Language Documentation using Translations

no code implementations WS 2017 Antonios Anastasopoulos, Sameer Bansal, David Chiang, Sharon Goldwater, Adam Lopez

Vast amounts of speech data collected for language documentation and research remain untranscribed and unsearchable, but often a small amount of speech may have text translations available.

Translation

A case study on using speech-to-translation alignments for language documentation

no code implementations WS 2017 Antonios Anastasopoulos, David Chiang

For many low-resource or endangered languages, spoken language resources are more likely to be annotated with translations than with transcriptions.

Speech Recognition

DyNet: The Dynamic Neural Network Toolkit

4 code implementations 15 Jan 2017 Graham Neubig, Chris Dyer, Yoav Goldberg, Austin Matthews, Waleed Ammar, Antonios Anastasopoulos, Miguel Ballesteros, David Chiang, Daniel Clothiaux, Trevor Cohn, Kevin Duh, Manaal Faruqui, Cynthia Gan, Dan Garrette, Yangfeng Ji, Lingpeng Kong, Adhiguna Kuncoro, Gaurav Kumar, Chaitanya Malaviya, Paul Michel, Yusuke Oda, Matthew Richardson, Naomi Saphra, Swabha Swayamdipta, Pengcheng Yin

In the static declaration strategy that is used in toolkits like Theano, CNTK, and TensorFlow, the user first defines a computation graph (a symbolic representation of the computation), and then examples are fed into an engine that executes this computation and computes its derivatives.

graph construction
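The static declaration strategy described in the abstract can be contrasted with dynamic declaration, the strategy DyNet is built around. The toy plain-Python sketch below shows both. It does not use DyNet's API: the `(name, op, inputs)` node format, `run_static`, the `Value` class and `dynamic_sum_of_products` are all hypothetical.

In the static case, one symbolic graph is defined once and examples are fed through it. In the dynamic case, the graph is built implicitly as ordinary code runs, so its shape can depend on each input, and derivatives are computed by reverse-mode differentiation over whatever graph was built.

```python
from typing import Callable, Dict, List, Tuple

# --- Static declaration: define a symbolic graph once, then feed examples ---
# Each node is (output_name, op, input_names); "x" and "w" are placeholders.
STATIC_GRAPH: List[Tuple[str, str, Tuple[str, ...]]] = [
    ("h", "mul", ("x", "w")),
    ("y", "add", ("h", "w")),
]
OPS: Dict[str, Callable[[float, float], float]] = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}


def run_static(graph, feed: Dict[str, float]) -> Dict[str, float]:
    """Execute the pre-declared graph on one example's placeholder values."""
    values = dict(feed)
    for name, op, inputs in graph:
        values[name] = OPS[op](*(values[i] for i in inputs))
    return values


# --- Dynamic declaration: each operation records itself as it executes ---
class Value:
    def __init__(self, data: float, parents=(), grad_fns=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._grad_fns = grad_fns  # maps upstream grad to grad w.r.t. each parent

    def __add__(self, other: "Value") -> "Value":
        return Value(self.data + other.data, (self, other),
                     (lambda g: g, lambda g: g))

    def __mul__(self, other: "Value") -> "Value":
        return Value(self.data * other.data, (self, other),
                     (lambda g: g * other.data, lambda g: g * self.data))

    def backward(self) -> None:
        """Reverse-mode differentiation over the graph built during execution."""
        order, seen = [], set()

        def visit(v: "Value") -> None:
            if id(v) not in seen:
                seen.add(id(v))
                for p in v._parents:
                    visit(p)
                order.append(v)  # post-order: inputs come before their consumers

        visit(self)
        self.grad = 1.0
        for v in reversed(order):  # consumers are processed before their inputs
            for p, fn in zip(v._parents, v._grad_fns):
                p.grad += fn(v.grad)


def dynamic_sum_of_products(xs: List[float], w: Value) -> Value:
    """Graph shape depends on len(xs): one multiply and one add per element."""
    total = Value(0.0)
    for x in xs:
        total = total + Value(x) * w
    return total
```

With `{"x": 3.0, "w": 2.0}`, the static graph yields `y = 3 * 2 + 2 = 8`. For `dynamic_sum_of_products([1.0, 2.0, 3.0], Value(2.0))`, the output is `12`. After `backward()`, the gradient with respect to `w` is `1 + 2 + 3 = 6`, and a longer input list would simply produce a larger graph.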
