Search Results for author: Antonios Anastasopoulos

Found 68 papers, 37 papers with code

Findings of the IWSLT 2022 Evaluation Campaign

no code implementations IWSLT (ACL) 2022 Antonios Anastasopoulos, Loïc Barrault, Luisa Bentivogli, Marcely Zanon Boito, Ondřej Bojar, Roldano Cattoni, Anna Currey, Georgiana Dinu, Kevin Duh, Maha Elbayad, Clara Emmanuel, Yannick Estève, Marcello Federico, Christian Federmann, Souhir Gahbiche, Hongyu Gong, Roman Grundkiewicz, Barry Haddow, Benjamin Hsu, Dávid Javorský, Vĕra Kloudová, Surafel Lakew, Xutai Ma, Prashant Mathur, Paul McNamee, Kenton Murray, Maria Nǎdejde, Satoshi Nakamura, Matteo Negri, Jan Niehues, Xing Niu, John Ortega, Juan Pino, Elizabeth Salesky, Jiatong Shi, Matthias Sperber, Sebastian Stüker, Katsuhito Sudoh, Marco Turchi, Yogesh Virkar, Alexander Waibel, Changhan Wang, Shinji Watanabe

The evaluation campaign of the 19th International Conference on Spoken Language Translation featured eight shared tasks: (i) Simultaneous speech translation, (ii) Offline speech translation, (iii) Speech to speech translation, (iv) Low-resource speech translation, (v) Multilingual speech translation, (vi) Dialect speech translation, (vii) Formality control for speech translation, (viii) Isometric speech translation.

Speech-to-Speech Translation Translation

Systematic Inequalities in Language Technology Performance across the World’s Languages

1 code implementation ACL 2022 Damian Blasi, Antonios Anastasopoulos, Graham Neubig

Natural language processing (NLP) systems have become a central technology in communication, education, medicine, artificial intelligence, and many other domains of research and development.

Dependency Parsing Machine Translation +4

Findings of the IWSLT 2021 Evaluation Campaign

no code implementations ACL (IWSLT) 2021 Antonios Anastasopoulos, Ondřej Bojar, Jacob Bremerman, Roldano Cattoni, Maha Elbayad, Marcello Federico, Xutai Ma, Satoshi Nakamura, Matteo Negri, Jan Niehues, Juan Pino, Elizabeth Salesky, Sebastian Stüker, Katsuhito Sudoh, Marco Turchi, Alexander Waibel, Changhan Wang, Matthew Wiesner

The evaluation campaign of the International Conference on Spoken Language Translation (IWSLT 2021) featured four shared tasks this year: (i) Simultaneous speech translation, (ii) Offline speech translation, (iii) Multilingual speech translation, (iv) Low-resource speech translation.

Translation

Fine-Tuning MT systems for Robustness to Second-Language Speaker Variations

1 code implementation EMNLP (WNUT) 2020 Md Mahfuz ibn Alam, Antonios Anastasopoulos

The performance of neural machine translation (NMT) systems only trained on a single language variant degrades when confronted with even slightly different language variations.

Machine Translation Translation

UniMorph 4.0: Universal Morphology

no code implementations 7 May 2022 Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa, Nizar Habash, Witold Kieraś, Gábor Bella, Brian Leonard, Garrett Nicolai, Kyle Gorman, Yustinus Ghanggo Ate, Maria Ryskina, Sabrina J. Mielke, Elena Budianskaya, Charbel El-Khaissi, Tiago Pimentel, Michael Gasser, William Lane, Mohit Raj, Matt Coler, Jaime Rafael Montoya Samame, Delio Siticonatzi Camaiteri, Esaú Zumaeta Rojas, Didier López Francis, Arturo Oncevay, Juan López Bautista, Gema Celeste Silva Villegas, Lucas Torroba Hennigen, Adam Ek, David Guriel, Peter Dirix, Jean-Philippe Bernardy, Andrey Scherbakov, Aziyana Bayyr-ool, Antonios Anastasopoulos, Roberto Zariquiey, Karina Sheifer, Sofya Ganieva, Hilaria Cruz, Ritván Karahóǧa, Stella Markantonatou, George Pavlidis, Matvey Plugaryov, Elena Klyachko, Ali Salehi, Candy Angulo, Jatayu Baxi, Andrew Krizhanovsky, Natalia Krizhanovskaya, Elizabeth Salesky, Clara Vania, Sardana Ivanova, Jennifer White, Rowan Hall Maudslay, Josef Valvoda, Ran Zmigrod, Paula Czarnowska, Irene Nikkarinen, Aelita Salchak, Brijesh Bhatt, Christopher Straughn, Zoey Liu, Jonathan North Washington, Yuval Pinter, Duygu Ataman, Marcin Wolinski, Totok Suhardijanto, Anna Yablonskaya, Niklas Stoehr, Hossep Dolatian, Zahroh Nuriah, Shyam Ratan, Francis M. Tyers, Edoardo M. Ponti, Grant Aiton, Aryaman Arora, Richard J. Hatcher, Ritesh Kumar, Jeremiah Young, Daria Rodionova, Anastasia Yemelina, Taras Andrushko, Igor Marchenko, Polina Mashkovtseva, Alexandra Serova, Emily Prud'hommeaux, Maria Nepomniashchaya, Fausto Giunchiglia, Eleanor Chodroff, Mans Hulden, Miikka Silfverberg, Arya D. McCarthy, David Yarowsky, Ryan Cotterell, Reut Tsarfaty, Ekaterina Vylomova

The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema.

Morphological Inflection

AUTOLEX: An Automatic Framework for Linguistic Exploration

no code implementations 25 Mar 2022 Aditi Chaudhary, Zaid Sheikh, David R Mortensen, Antonios Anastasopoulos, Graham Neubig

Each language has its own complex systems of word, phrase, and sentence construction, the guiding principles of which are often summarized in grammar descriptions for the consumption of linguists or language learners.

Revisiting the Effects of Leakage on Dependency Parsing

1 code implementation Findings (ACL) 2022 Nathaniel Krasner, Miriam Wanner, Antonios Anastasopoulos

Recent work by Søgaard (2020) showed that, treebank size aside, overlap between training and test graphs (termed leakage) explains more of the observed variation in dependency parsing performance than other explanations.

Dependency Parsing

Dataset Geography: Mapping Language Data to Language Users

no code implementations ACL 2022 Fahim Faisal, Yinkai Wang, Antonios Anastasopoulos

As language technologies become more ubiquitous, there are increasing efforts towards expanding the language diversity and coverage of natural language processing (NLP) systems.

Lexically Aware Semi-Supervised Learning for OCR Post-Correction

1 code implementation 4 Nov 2021 Shruti Rijhwani, Daisy Rosenblum, Antonios Anastasopoulos, Graham Neubig

In addition, to enforce consistency in the recognized vocabulary, we introduce a lexically-aware decoding method that augments the neural post-correction model with a count-based language model constructed from the recognized texts, implemented using weighted finite-state automata (WFSA) for efficient and effective decoding.

Language Modelling Optical Character Recognition

Systematic Inequalities in Language Technology Performance across the World's Languages

1 code implementation 13 Oct 2021 Damián Blasi, Antonios Anastasopoulos, Graham Neubig

Natural language processing (NLP) systems have become a central technology in communication, education, medicine, artificial intelligence, and many other domains of research and development.

Dependency Parsing Machine Translation +5

Investigating Post-pretraining Representation Alignment for Cross-Lingual Question Answering

no code implementations EMNLP (MRQA) 2021 Fahim Faisal, Antonios Anastasopoulos

Human knowledge is collectively encoded in the roughly 6500 languages spoken around the world, but it is not distributed equally across languages.

Cross-Lingual Question Answering

SD-QA: Spoken Dialectal Question Answering for the Real World

1 code implementation Findings (EMNLP) 2021 Fahim Faisal, Sharlina Keshava, Md Mahfuz ibn Alam, Antonios Anastasopoulos

Question answering (QA) systems are now available through numerous commercial applications for a wide variety of domains, serving millions of users that interact with them via speech interfaces.

Fairness Question Answering +1

When is Wall a Pared and when a Muro? -- Extracting Rules Governing Lexical Selection

1 code implementation 13 Sep 2021 Aditi Chaudhary, Kayo Yin, Antonios Anastasopoulos, Graham Neubig

Learning fine-grained distinctions between vocabulary items is a key challenge in learning a new language.

Cross-Lingual Text Classification of Transliterated Hindi and Malayalam

1 code implementation 31 Aug 2021 Jitin Krishnan, Antonios Anastasopoulos, Hemant Purohit, Huzefa Rangwala

Transliteration is very common on social media, but transliterated text is not adequately handled by modern neural models for various NLP tasks.

Cross-Lingual Transfer Data Augmentation +3

On the Evaluation of Machine Translation for Terminology Consistency

1 code implementation 22 Jun 2021 Md Mahfuz ibn Alam, Antonios Anastasopoulos, Laurent Besacier, James Cross, Matthias Gallé, Philipp Koehn, Vassilina Nikoulina

As neural machine translation (NMT) systems become an important part of professional translator pipelines, a growing body of work focuses on combining NMT with terminologies.

Domain Adaptation Machine Translation +1

Code to Comment Translation: A Comparative Study on Model Effectiveness & Errors

1 code implementation ACL (NLP4Prog) 2021 Junayed Mahmud, Fahim Faisal, Raihan Islam Arnob, Antonios Anastasopoulos, Kevin Moran

Automated source code summarization is a popular software engineering research topic wherein machine translation models are employed to "translate" code snippets into relevant natural language descriptions.

Code Summarization Machine Translation +2

Machine Translation into Low-resource Language Varieties

no code implementations ACL 2021 Sachin Kumar, Antonios Anastasopoulos, Shuly Wintner, Yulia Tsvetkov

State-of-the-art machine translation (MT) systems are typically trained to generate the "standard" target language; however, many languages have multiple varieties (regional varieties, dialects, sociolects, non-native varieties) that are different from the standard language.

Machine Translation Translation

Towards More Equitable Question Answering Systems: How Much More Data Do You Need?

1 code implementation ACL 2021 Arnab Debnath, Navid Rajabi, Fardina Fathmiul Alam, Antonios Anastasopoulos

Question answering (QA) in English has been widely explored, but multilingual datasets are relatively new, with several methods attempting to bridge the gap between high- and low-resourced languages using data augmentation through translation and cross-lingual transfer.

Cross-Lingual Transfer Data Augmentation +2

Phoneme Recognition through Fine Tuning of Phonetic Representations: a Case Study on Luhya Language Varieties

no code implementations 4 Apr 2021 Kathleen Siminyu, Xinjian Li, Antonios Anastasopoulos, David Mortensen, Michael R. Marlo, Graham Neubig

Models pre-trained on multiple languages have shown significant promise for improving speech recognition, particularly for low-resource languages.

Speech Recognition

BembaSpeech: A Speech Recognition Corpus for the Bemba Language

2 code implementations 9 Feb 2021 Claytone Sikasote, Antonios Anastasopoulos

We present a preprocessed, ready-to-use automatic speech recognition corpus, BembaSpeech, consisting of over 24 hours of read speech in the Bemba language, a written but low-resourced language spoken by over 30% of the population in Zambia.

Automatic Speech Recognition

Endangered Languages meet Modern NLP

no code implementations COLING 2020 Antonios Anastasopoulos, Christopher Cox, Graham Neubig, Hilaria Cruz

This tutorial will focus on NLP for endangered languages documentation and revitalization.

Automatic Interlinear Glossing for Under-Resourced Languages Leveraging Translations

no code implementations COLING 2020 Xingyuan Zhao, Satoru Ozaki, Antonios Anastasopoulos, Graham Neubig, Lori Levin

Interlinear Glossed Text (IGT) is a widely used format for encoding linguistic information in language documentation projects and scholarly papers.

Cross-Lingual Transfer TAG

OCR Post Correction for Endangered Language Texts

1 code implementation EMNLP 2020 Shruti Rijhwani, Antonios Anastasopoulos, Graham Neubig

There is little to no data available to build natural language processing models for most endangered languages.

Optical Character Recognition

Reducing Confusion in Active Learning for Part-Of-Speech Tagging

no code implementations 2 Nov 2020 Aditi Chaudhary, Antonios Anastasopoulos, Zaid Sheikh, Graham Neubig

Active learning (AL) uses a data selection algorithm to select useful training samples to minimize annotation cost.

Active Learning Part-Of-Speech Tagging +1

Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages

no code implementations 20 Oct 2020 Yiyuan Li, Antonios Anastasopoulos, Alan W Black

In this work, we design a knowledge-base and prediction model embedded system for spelling correction in low-resource languages.

Spelling Correction

X-FACTR: Multilingual Factual Knowledge Retrieval from Pretrained Language Models

1 code implementation EMNLP 2020 Zhengbao Jiang, Antonios Anastasopoulos, Jun Araki, Haibo Ding, Graham Neubig

We further propose a code-switching-based method to improve the ability of multilingual LMs to access knowledge, and verify its effectiveness on several benchmark languages.

Pretrained Language Models

Automatic Extraction of Rules Governing Morphological Agreement

1 code implementation EMNLP 2020 Aditi Chaudhary, Antonios Anastasopoulos, Adithya Pratapa, David R. Mortensen, Zaid Sheikh, Yulia Tsvetkov, Graham Neubig

Using cross-lingual transfer, even with no expert annotations in the language of interest, our framework extracts a grammatical specification which is nearly equivalent to those created with large amounts of gold-standard annotated data.

Cross-Lingual Transfer

Transliteration for Cross-Lingual Morphological Inflection

no code implementations WS 2020 Nikitha Murikinati, Antonios Anastasopoulos, Graham Neubig

Cross-lingual transfer between typologically related languages has been proven successful for the task of morphological inflection.

Cross-Lingual Transfer Morphological Inflection +1

Predicting Performance for Natural Language Processing Tasks

1 code implementation ACL 2020 Mengzhou Xia, Antonios Anastasopoulos, Ruochen Xu, Yiming Yang, Graham Neubig

Given the complexity of combinations of tasks, languages, and domains in natural language processing (NLP) research, it is computationally prohibitive to exhaustively test newly proposed models on each possible experimental setting.

Practical Comparable Data Collection for Low-Resource Languages via Images

1 code implementation 24 Apr 2020 Aman Madaan, Shruti Rijhwani, Antonios Anastasopoulos, Yiming Yang, Graham Neubig

We propose a method of curating high-quality comparable training data for low-resource languages with monolingual annotators.

Machine Translation Translation

AlloVera: A Multilingual Allophone Database

no code implementations LREC 2020 David R. Mortensen, Xinjian Li, Patrick Littell, Alexis Michaud, Shruti Rijhwani, Antonios Anastasopoulos, Alan W. Black, Florian Metze, Graham Neubig

While phonemic representations are language specific, phonetic representations (stated in terms of (allo)phones) are much closer to a universal (language-independent) transcription.

Speech Recognition

Dynamic Data Selection and Weighting for Iterative Back-Translation

1 code implementation EMNLP 2020 Zi-Yi Dou, Antonios Anastasopoulos, Graham Neubig

Back-translation has proven to be an effective method to utilize monolingual data in neural machine translation (NMT), and iteratively conducting back-translation can further improve the model performance.

Domain Adaptation Machine Translation +1

A Resource for Studying Chatino Verbal Morphology

no code implementations LREC 2020 Hilaria Cruz, Gregory Stump, Antonios Anastasopoulos

We present the first resource focusing on the verbal inflectional morphology of San Juan Quiahije Chatino, a tonal Mesoamerican language spoken in Mexico.

Lemmatization Morphological Analysis +1

Towards Minimal Supervision BERT-based Grammar Error Correction

no code implementations 10 Jan 2020 Yiyuan Li, Antonios Anastasopoulos, Alan W. Black

Current grammatical error correction (GEC) models typically consider the task as sequence generation, which requires large amounts of annotated data and limits the applications in data-limited settings.

Grammatical Error Correction Language Modelling

Towards Robust Toxic Content Classification

1 code implementation 14 Dec 2019 Keita Kurita, Anna Belova, Antonios Anastasopoulos

We propose a method of generating realistic model-agnostic attacks using a lexicon of toxic tokens, which attempts to mislead toxicity classifiers by diluting the toxicity signal either by obfuscating toxic tokens through character-level perturbations, or by injecting non-toxic distractor tokens.

Denoising General Classification

A Resource for Computational Experiments on Mapudungun

1 code implementation LREC 2020 Mingjun Duan, Carlos Fasola, Sai Krishna Rallabandi, Rodolfo M. Vega, Antonios Anastasopoulos, Lori Levin, Alan W. Black

We present a resource for computational experiments on Mapudungun, a polysynthetic indigenous language spoken in Chile with upwards of 200 thousand speakers.

Machine Translation Speech Recognition +2

Optimizing Data Usage via Differentiable Rewards

1 code implementation ICML 2020 Xinyi Wang, Hieu Pham, Paul Michel, Antonios Anastasopoulos, Jaime Carbonell, Graham Neubig

To acquire a new skill, humans learn better and faster if a tutor, based on their current knowledge level, informs them of how much attention they should pay to particular content or practice problems.

Image Classification Machine Translation

Improving Robustness of Neural Machine Translation with Multi-task Learning

1 code implementation WS 2019 Shuyan Zhou, Xiangkai Zeng, Yingqi Zhou, Antonios Anastasopoulos, Graham Neubig

While neural machine translation (NMT) achieves remarkable performance on clean, in-domain text, performance is known to degrade drastically when facing text which is full of typos, grammatical errors and other varieties of noise.

Machine Translation Multi-Task Learning +1

Generalized Data Augmentation for Low-Resource Translation

no code implementations ACL 2019 Mengzhou Xia, Xiang Kong, Antonios Anastasopoulos, Graham Neubig

Translation to or from low-resource languages (LRLs) poses challenges for machine translation in terms of both adequacy and fluency.

Data Augmentation Translation +1

Choosing Transfer Languages for Cross-Lingual Learning

1 code implementation ACL 2019 Yu-Hsiang Lin, Chian-Yu Chen, Jean Lee, Zirui Li, Yuyan Zhang, Mengzhou Xia, Shruti Rijhwani, Junxian He, Zhisong Zhang, Xuezhe Ma, Antonios Anastasopoulos, Patrick Littell, Graham Neubig

Cross-lingual transfer, where a high-resource transfer language is used to improve the accuracy of a low-resource task language, is now an invaluable tool for improving performance of natural language processing (NLP) on low-resource languages.

Cross-Lingual Transfer

An Analysis of Source-Side Grammatical Errors in NMT

no code implementations WS 2019 Antonios Anastasopoulos

The quality of Neural Machine Translation (NMT) has been shown to significantly degrade when confronted with source-side noise.

Machine Translation Translation

Neural Language Modeling with Visual Features

no code implementations 7 Mar 2019 Antonios Anastasopoulos, Shankar Kumar, Hank Liao

We report analysis that provides insights into why our multimodal language model improves upon a standard RNN language model.

Language Modelling

Freezing Subnetworks to Analyze Domain Adaptation in Neural Machine Translation

1 code implementation WS 2018 Brian Thompson, Huda Khayrallah, Antonios Anastasopoulos, Arya D. McCarthy, Kevin Duh, Rebecca Marvin, Paul McNamee, Jeremy Gwinnup, Tim Anderson, Philipp Koehn

To better understand the effectiveness of continued training, we analyze the major components of a neural machine translation system (the encoder, decoder, and each embedding space) and consider each component's contribution to, and capacity for, domain adaptation.

Domain Adaptation Machine Translation +1

Neural Machine Translation of Text from Non-Native Speakers

2 code implementations NAACL 2019 Antonios Anastasopoulos, Alison Lui, Toan Nguyen, David Chiang

Neural Machine Translation (NMT) systems are known to degrade when confronted with noisy data, especially when the system is trained only on clean data.

Machine Translation Translation

A small Griko-Italian speech translation corpus

no code implementations 27 Jul 2018 Marcely Zanon Boito, Antonios Anastasopoulos, Marika Lekakou, Aline Villavicencio, Laurent Besacier

This paper presents an extension to a very low-resource parallel corpus collected in an endangered language, Griko, making it useful for computational research.

Translation

Tied Multitask Learning for Neural Speech Translation

no code implementations NAACL 2018 Antonios Anastasopoulos, David Chiang

We explore multitask models for neural translation of speech, augmenting them in order to reflect two intuitive notions.

Translation

Spoken Term Discovery for Language Documentation using Translations

no code implementations WS 2017 Antonios Anastasopoulos, Sameer Bansal, David Chiang, Sharon Goldwater, Adam Lopez

Vast amounts of speech data collected for language documentation and research remain untranscribed and unsearchable, but often a small amount of speech may have text translations available.

Translation

A case study on using speech-to-translation alignments for language documentation

no code implementations WS 2017 Antonios Anastasopoulos, David Chiang

For many low-resource or endangered languages, spoken language resources are more likely to be annotated with translations than with transcriptions.

Speech Recognition Translation

DyNet: The Dynamic Neural Network Toolkit

4 code implementations 15 Jan 2017 Graham Neubig, Chris Dyer, Yoav Goldberg, Austin Matthews, Waleed Ammar, Antonios Anastasopoulos, Miguel Ballesteros, David Chiang, Daniel Clothiaux, Trevor Cohn, Kevin Duh, Manaal Faruqui, Cynthia Gan, Dan Garrette, Yangfeng Ji, Lingpeng Kong, Adhiguna Kuncoro, Gaurav Kumar, Chaitanya Malaviya, Paul Michel, Yusuke Oda, Matthew Richardson, Naomi Saphra, Swabha Swayamdipta, Pengcheng Yin

In the static declaration strategy that is used in toolkits like Theano, CNTK, and TensorFlow, the user first defines a computation graph (a symbolic representation of the computation), and then examples are fed into an engine that executes this computation and computes its derivatives.

graph construction
