Search Results for author: Vishrav Chaudhary

Found 35 papers, 14 papers with code

Findings of the WMT 2020 Shared Task on Quality Estimation

no code implementations WMT (EMNLP) 2020 Lucia Specia, Frédéric Blain, Marina Fomicheva, Erick Fonseca, Vishrav Chaudhary, Francisco Guzmán, André F. T. Martins

We report the results of the WMT20 shared task on Quality Estimation, where the challenge is to predict the quality of the output of neural machine translation systems at the word, sentence and document levels.

Machine Translation Translation

Findings of the WMT 2021 Shared Task on Quality Estimation

no code implementations WMT (EMNLP) 2021 Lucia Specia, Frédéric Blain, Marina Fomicheva, Chrysoula Zerva, Zhenhao Li, Vishrav Chaudhary, André F. T. Martins

We report the results of the WMT 2021 shared task on Quality Estimation, where the challenge is to predict the quality of the output of neural machine translation systems at the word and sentence levels.

Machine Translation Translation

Findings of the 2021 Conference on Machine Translation (WMT21)

no code implementations WMT (EMNLP) 2021 Farhad Akhbardeh, Arkady Arkhangorodsky, Magdalena Biesialska, Ondřej Bojar, Rajen Chatterjee, Vishrav Chaudhary, Marta R. Costa-Jussa, Cristina España-Bonet, Angela Fan, Christian Federmann, Markus Freitag, Yvette Graham, Roman Grundkiewicz, Barry Haddow, Leonie Harter, Kenneth Heafield, Christopher Homan, Matthias Huck, Kwabena Amponsah-Kaakyire, Jungo Kasai, Daniel Khashabi, Kevin Knight, Tom Kocmi, Philipp Koehn, Nicholas Lourie, Christof Monz, Makoto Morishita, Masaaki Nagata, Ajay Nagesh, Toshiaki Nakazawa, Matteo Negri, Santanu Pal, Allahsera Auguste Tapo, Marco Turchi, Valentin Vydrin, Marcos Zampieri

This paper presents the results of the newstranslation task, the multilingual low-resourcetranslation for Indo-European languages, thetriangular translation task, and the automaticpost-editing task organised as part of the Con-ference on Machine Translation (WMT) 2021. In the news task, participants were asked tobuild machine translation systems for any of10 language pairs, to be evaluated on test setsconsisting mainly of news stories.

Machine Translation Translation

Findings of the WMT 2020 Shared Task on Parallel Corpus Filtering and Alignment

no code implementations WMT (EMNLP) 2020 Philipp Koehn, Vishrav Chaudhary, Ahmed El-Kishky, Naman Goyal, Peng-Jen Chen, Francisco Guzmán

Following two preceding WMT Shared Task on Parallel Corpus Filtering (Koehn et al., 2018, 2019), we posed again the challenge of assigning sentence-level quality scores for very noisy corpora of sentence pairs crawled from the web, with the goal of sub-selecting the highest-quality data to be used to train ma-chine translation systems.

Translation

BERGAMOT-LATTE Submissions for the WMT20 Quality Estimation Shared Task

no code implementations WMT (EMNLP) 2020 Marina Fomicheva, Shuo Sun, Lisa Yankovskaya, Frédéric Blain, Vishrav Chaudhary, Mark Fishel, Francisco Guzmán, Lucia Specia

We explore (a) a black-box approach to QE based on pre-trained representations; and (b) glass-box approaches that leverage various indicators that can be extracted from the neural MT systems.

Data Selection Curriculum for Neural Machine Translation

no code implementations25 Mar 2022 Tasnim Mohiuddin, Philipp Koehn, Vishrav Chaudhary, James Cross, Shruti Bhosale, Shafiq Joty

In this work, we introduce a two-stage curriculum training framework for NMT where we fine-tune a base NMT model on subsets of data, selected by both deterministic scoring using pre-trained methods and online scoring that considers prediction scores of the emerging NMT model.

Machine Translation Translation

Few-shot Learning with Multilingual Language Models

1 code implementation20 Dec 2021 Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li

In this work, we train multilingual autoregressive language models on a balanced corpus covering a diverse set of languages, and study their few- and zero-shot learning capabilities in a wide range of tasks.

Few-Shot Learning Hate Speech Detection +4

Alternative Input Signals Ease Transfer in Multilingual Machine Translation

no code implementations ACL 2022 Simeng Sun, Angela Fan, James Cross, Vishrav Chaudhary, Chau Tran, Philipp Koehn, Francisco Guzman

Further, we find that incorporating alternative inputs via self-ensemble can be particularly effective when training set is small, leading to +5 BLEU when only 5% of the total training data is accessible.

Machine Translation Translation

Classification-based Quality Estimation: Small and Efficient Models for Real-world Applications

no code implementations EMNLP 2021 Shuo Sun, Ahmed El-Kishky, Vishrav Chaudhary, James Cross, Francisco Guzmán, Lucia Specia

Sentence-level Quality estimation (QE) of machine translation is traditionally formulated as a regression task, and the performance of QE models is typically measured by Pearson correlation with human labels.

Machine Translation Model Compression +1

LAWDR: Language-Agnostic Weighted Document Representations from Pre-trained Models

no code implementations7 Jun 2021 Hongyu Gong, Vishrav Chaudhary, Yuqing Tang, Francisco Guzmán

Cross-lingual document representations enable language understanding in multilingual contexts and allow transfer learning from high-resource to low-resource languages at the document level.

Sentence Embeddings Transfer Learning

An Exploratory Study on Multilingual Quality Estimation

no code implementations Asian Chapter of the Association for Computational Linguistics 2020 Shuo Sun, Marina Fomicheva, Fr{\'e}d{\'e}ric Blain, Vishrav Chaudhary, Ahmed El-Kishky, Adithya Renduchintala, Francisco Guzm{\'a}n, Lucia Specia

Predicting the quality of machine translation has traditionally been addressed with language-specific models, under the assumption that the quality label distribution or linguistic features exhibit traits that are not shared across languages.

Machine Translation Translation

Beyond English-Centric Multilingual Machine Translation

4 code implementations21 Oct 2020 Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin

Existing work in translation demonstrated the potential of massively multilingual machine translation by training a single model able to translate between any pair of languages.

Machine Translation Translation

Multilingual Translation with Extensible Multilingual Pretraining and Finetuning

3 code implementations2 Aug 2020 Yuqing Tang, Chau Tran, Xi-An Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan

Recent work demonstrates the potential of multilingual pretraining of creating one model that can be used for various tasks in different languages.

Machine Translation Translation

Unsupervised Quality Estimation for Neural Machine Translation

3 code implementations21 May 2020 Marina Fomicheva, Shuo Sun, Lisa Yankovskaya, Frédéric Blain, Francisco Guzmán, Mark Fishel, Nikolaos Aletras, Vishrav Chaudhary, Lucia Specia

Quality Estimation (QE) is an important component in making Machine Translation (MT) useful in real-world applications, as it is aimed to inform the user on the quality of the MT output at test time.

Machine Translation Translation

CCAligned: A Massive Collection of Cross-Lingual Web-Document Pairs

no code implementations EMNLP 2020 Ahmed El-Kishky, Vishrav Chaudhary, Francisco Guzman, Philipp Koehn

We mine sixty-eight snapshots of the Common Crawl corpus and identify web document pairs that are translations of each other.

Unsupervised Cross-lingual Representation Learning at Scale

24 code implementations ACL 2020 Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, Veselin Stoyanov

We also present a detailed empirical analysis of the key factors that are required to achieve these gains, including the trade-offs between (1) positive transfer and capacity dilution and (2) the performance of high and low resource languages at scale.

Cross-Lingual Transfer Language Modelling +2

Findings of the WMT 2019 Shared Task on Parallel Corpus Filtering for Low-Resource Conditions

no code implementations WS 2019 Philipp Koehn, Francisco Guzm{\'a}n, Vishrav Chaudhary, Juan Pino

Following the WMT 2018 Shared Task on Parallel Corpus Filtering, we posed the challenge of assigning sentence-level quality scores for very noisy corpora of sentence pairs crawled from the web, with the goal of sub-selecting 2{\%} and 10{\%} of the highest-quality data to be used to train machine translation systems.

Machine Translation Translation

WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia

6 code implementations EACL 2021 Holger Schwenk, Vishrav Chaudhary, Shuo Sun, Hongyu Gong, Francisco Guzmán

We present an approach based on multilingual sentence embeddings to automatically extract parallel sentences from the content of Wikipedia articles in 85 languages, including several dialects or low-resource languages.

Sentence Embeddings

Cannot find the paper you are looking for? You can Submit a new open access paper.