Search Results for author: Francisco Guzmán

Found 26 papers, 10 papers with code

BERGAMOT-LATTE Submissions for the WMT20 Quality Estimation Shared Task

no code implementations WMT (EMNLP) 2020 Marina Fomicheva, Shuo Sun, Lisa Yankovskaya, Frédéric Blain, Vishrav Chaudhary, Mark Fishel, Francisco Guzmán, Lucia Specia

We explore (a) a black-box approach to QE based on pre-trained representations; and (b) glass-box approaches that leverage various indicators that can be extracted from the neural MT systems.

Findings of the WMT 2020 Shared Task on Quality Estimation

no code implementations WMT (EMNLP) 2020 Lucia Specia, Frédéric Blain, Marina Fomicheva, Erick Fonseca, Vishrav Chaudhary, Francisco Guzmán, André F. T. Martins

We report the results of the WMT20 shared task on Quality Estimation, where the challenge is to predict the quality of the output of neural machine translation systems at the word, sentence and document levels.

Machine Translation, Translation

Findings of the WMT 2020 Shared Task on Parallel Corpus Filtering and Alignment

no code implementations WMT (EMNLP) 2020 Philipp Koehn, Vishrav Chaudhary, Ahmed El-Kishky, Naman Goyal, Peng-Jen Chen, Francisco Guzmán

Following the two preceding WMT Shared Tasks on Parallel Corpus Filtering (Koehn et al., 2018, 2019), we again posed the challenge of assigning sentence-level quality scores to very noisy corpora of sentence pairs crawled from the web, with the goal of sub-selecting the highest-quality data to be used to train machine translation systems.

Translation
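The sub-selection step described above can be illustrated with a minimal sketch. It assumes every sentence pair already carries a quality score from some scorer (not shown), and greedily keeps the best-scoring pairs until a target-side word budget is filled. The function name `subselect`, the word-based budget, and the whitespace tokenization are illustrative assumptions, not the shared task's official tooling.

```python
from typing import List, Tuple


def subselect(pairs: List[Tuple[str, str, float]], word_budget: int) -> List[Tuple[str, str]]:
    """Hypothetical helper: keep the highest-scoring (source, target) pairs
    whose target-side word counts fit within word_budget.

    Each item in `pairs` is (source, target, quality_score); higher is better.
    """
    selected: List[Tuple[str, str]] = []
    used = 0
    # Visit pairs from best to worst quality score.
    for src, tgt, _score in sorted(pairs, key=lambda p: p[2], reverse=True):
        n_words = len(tgt.split())  # naive whitespace word count (assumption)
        if used + n_words > word_budget:
            continue  # this pair would overflow the budget; try smaller ones
        selected.append((src, tgt))
        used += n_words
    return selected


if __name__ == "__main__":
    toy = [
        ("hola mundo", "hello world", 0.9),
        ("gato", "cat dog bird", 0.2),
        ("adios", "goodbye", 0.7),
    ]
    print(subselect(toy, word_budget=3))
```

Skipping an overflowing pair (rather than stopping) lets shorter, lower-scored pairs still fill any leftover budget; a stricter variant would stop at the first pair that does not fit.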

Classification-based Quality Estimation: Small and Efficient Models for Real-world Applications

no code implementations EMNLP 2021 Shuo Sun, Ahmed El-Kishky, Vishrav Chaudhary, James Cross, Francisco Guzmán, Lucia Specia

Sentence-level quality estimation (QE) of machine translation is traditionally formulated as a regression task, and the performance of QE models is typically measured by Pearson correlation with human labels.

Machine Translation, Model Compression
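The evaluation metric mentioned above, Pearson correlation between predicted QE scores and human labels, can be computed as in the following plain-Python sketch. The function name `pearson` and the toy numbers in the usage line are illustrative only and do not come from the paper.

```python
import math
from typing import Sequence


def pearson(pred: Sequence[float], gold: Sequence[float]) -> float:
    """Pearson correlation between predicted QE scores and human labels."""
    if len(pred) != len(gold) or len(pred) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(pred)
    mean_p = sum(pred) / n
    mean_g = sum(gold) / n
    # Covariance and variances (the common 1/n factors cancel out).
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gold))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_g = sum((g - mean_g) ** 2 for g in gold)
    if var_p == 0 or var_g == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_g)


if __name__ == "__main__":
    # Toy predicted scores vs. toy human direct-assessment scores.
    print(pearson([0.1, 0.4, 0.8], [10.0, 45.0, 70.0]))
```

Because Pearson only rewards a linear relationship with the human labels, it suits the regression framing; a classification framing of QE would usually call for a different metric.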

LAWDR: Language-Agnostic Weighted Document Representations from Pre-trained Models

no code implementations 7 Jun 2021 Hongyu Gong, Vishrav Chaudhary, Yuqing Tang, Francisco Guzmán

Cross-lingual document representations enable language understanding in multilingual contexts and allow transfer learning from high-resource to low-resource languages at the document level.

Sentence Embeddings, Transfer Learning

Improving Zero-Shot Translation by Disentangling Positional Information

1 code implementation ACL 2021 Danni Liu, Jan Niehues, James Cross, Francisco Guzmán, Xian Li

The difficulty of generalizing to new translation directions suggests the model representations are highly specific to those language pairs seen in training.

Machine Translation, Translation

Unsupervised Quality Estimation for Neural Machine Translation

3 code implementations 21 May 2020 Marina Fomicheva, Shuo Sun, Lisa Yankovskaya, Frédéric Blain, Francisco Guzmán, Mark Fishel, Nikolaos Aletras, Vishrav Chaudhary, Lucia Specia

Quality Estimation (QE) is an important component in making Machine Translation (MT) useful in real-world applications, as it aims to inform the user about the quality of the MT output at test time.

Machine Translation, Translation

Machine Translation Evaluation Meets Community Question Answering

no code implementations ACL 2016 Francisco Guzmán, Lluís Màrquez, Preslav Nakov

We explore the applicability of machine translation evaluation (MTE) methods to a very different problem: answer ranking in community Question Answering.

Community Question Answering, Machine Translation

Unsupervised Cross-lingual Representation Learning at Scale

24 code implementations ACL 2020 Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, Veselin Stoyanov

We also present a detailed empirical analysis of the key factors that are required to achieve these gains, including the trade-offs between (1) positive transfer and capacity dilution and (2) the performance of high and low resource languages at scale.

Cross-Lingual Transfer, Language Modelling

WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia

6 code implementations EACL 2021 Holger Schwenk, Vishrav Chaudhary, Shuo Sun, Hongyu Gong, Francisco Guzmán

We present an approach based on multilingual sentence embeddings to automatically extract parallel sentences from the content of Wikipedia articles in 85 languages, including several dialects or low-resource languages.

Sentence Embeddings
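To make the embedding-based extraction above concrete, here is a minimal sketch that assumes sentence embeddings have already been computed by some multilingual encoder (toy 2-D vectors stand in for them here). It pairs source and target sentences that are mutual nearest neighbours under cosine similarity and clears a threshold. Plain cosine similarity, the mutual-nearest-neighbour check, the threshold, and the name `mine_pairs` are all assumptions for illustration; the scoring actually used for WikiMatrix may differ (margin-based scoring is common in this line of work).

```python
import math
from typing import List, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mine_pairs(src: List[Vector], tgt: List[Vector], threshold: float) -> List[Tuple[int, int, float]]:
    """Hypothetical miner: return (src_index, tgt_index, similarity) for
    mutual nearest neighbours whose cosine similarity is >= threshold."""
    if not src or not tgt:
        return []
    sim = [[cosine(s, t) for t in tgt] for s in src]
    # Nearest target for each source sentence, and vice versa.
    best_tgt = [max(range(len(tgt)), key=lambda j: row[j]) for row in sim]
    best_src = [max(range(len(src)), key=lambda i: sim[i][j]) for j in range(len(tgt))]
    pairs = []
    for i, j in enumerate(best_tgt):
        # Keep only pairs that choose each other and are similar enough.
        if best_src[j] == i and sim[i][j] >= threshold:
            pairs.append((i, j, sim[i][j]))
    return pairs


if __name__ == "__main__":
    src_emb = [[1.0, 0.0], [0.0, 1.0]]
    tgt_emb = [[0.0, 1.0], [1.0, 0.1]]
    print(mine_pairs(src_emb, tgt_emb, threshold=0.9))
```

The mutual-nearest-neighbour requirement discards "hub" target sentences that are close to many sources, and the threshold trades recall for precision of the mined pairs.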

Machine Translation Evaluation with Neural Networks

no code implementations 5 Oct 2017 Francisco Guzmán, Shafiq R. Joty, Lluís Màrquez, Preslav Nakov

We present a framework for machine translation evaluation using neural networks in a pairwise setting, where the goal is to select the better translation from a pair of hypotheses, given the reference translation.

Machine Translation, Translation

Discourse Structure in Machine Translation Evaluation

no code implementations CL 2017 Shafiq Joty, Francisco Guzmán, Lluís Màrquez, Preslav Nakov

In this article, we explore the potential of using sentence-level discourse structure for machine translation evaluation.

Machine Translation, Translation
