Search Results for author: Francisco Guzmán

Found 29 papers, 13 papers with code

Putting words into the system’s mouth: A targeted attack on neural machine translation using monolingual data poisoning

1 code implementation • Findings (ACL) 2021 • Jun Wang, Chang Xu, Francisco Guzmán, Ahmed El-Kishky, Yuqing Tang, Benjamin Rubinstein, Trevor Cohn

Data Poisoning Machine Translation

Paper
Code

Findings of the WMT 2021 Shared Task on Large-Scale Multilingual Machine Translation

no code implementations • WMT (EMNLP) 2021 • Guillaume Wenzek, Vishrav Chaudhary, Angela Fan, Sahir Gomez, Naman Goyal, Somya Jain, Douwe Kiela, Tristan Thrush, Francisco Guzmán

There were a total of 10 participating teams for the tasks, with a total of 151 intermediate model submissions and 13 final models.

Machine Translation Translation

Paper
Add Code

Mitigating Data Poisoning in Text Classification with Differential Privacy

no code implementations • Findings (EMNLP) 2021 • Chang Xu, Jun Wang, Francisco Guzmán, Benjamin Rubinstein, Trevor Cohn

NLP models are vulnerable to data poisoning attacks.

Data Poisoning text-classification +1

Paper
Add Code

BERGAMOT-LATTE Submissions for the WMT20 Quality Estimation Shared Task

no code implementations • WMT (EMNLP) 2020 • Marina Fomicheva, Shuo Sun, Lisa Yankovskaya, Frédéric Blain, Vishrav Chaudhary, Mark Fishel, Francisco Guzmán, Lucia Specia

We explore (a) a black-box approach to QE based on pre-trained representations; and (b) glass-box approaches that leverage various indicators that can be extracted from the neural MT systems.

Sentence Task 2

Paper
Add Code

Findings of the WMT 2020 Shared Task on Quality Estimation

no code implementations • WMT (EMNLP) 2020 • Lucia Specia, Frédéric Blain, Marina Fomicheva, Erick Fonseca, Vishrav Chaudhary, Francisco Guzmán, André F. T. Martins

We report the results of the WMT20 shared task on Quality Estimation, where the challenge is to predict the quality of the output of neural machine translation systems at the word, sentence and document levels.

Machine Translation Sentence +1

Paper
Add Code

Findings of the WMT 2020 Shared Task on Parallel Corpus Filtering and Alignment

no code implementations • WMT (EMNLP) 2020 • Philipp Koehn, Vishrav Chaudhary, Ahmed El-Kishky, Naman Goyal, Peng-Jen Chen, Francisco Guzmán

Following two preceding WMT Shared Task on Parallel Corpus Filtering (Koehn et al., 2018, 2019), we posed again the challenge of assigning sentence-level quality scores for very noisy corpora of sentence pairs crawled from the web, with the goal of sub-selecting the highest-quality data to be used to train ma-chine translation systems.

Sentence Translation

Paper
Add Code

Findings of the WMT 2020 Shared Task on Machine Translation Robustness

no code implementations • WMT (EMNLP) 2020 • Lucia Specia, Zhenhao Li, Juan Pino, Vishrav Chaudhary, Francisco Guzmán, Graham Neubig, Nadir Durrani, Yonatan Belinkov, Philipp Koehn, Hassan Sajjad, Paul Michel, Xian Li

We report the findings of the second edition of the shared task on improving robustness in Machine Translation (MT).

Machine Translation Translation

Paper
Add Code

Seamless: Multilingual Expressive and Streaming Speech Translation

1 code implementation • 8 Dec 2023 • Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Coria Meglioli, David Dale, Ning Dong, Mark Duppenthaler, Paul-Ambroise Duquenne, Brian Ellis, Hady Elsahar, Justin Haaheim, John Hoffman, Min-Jae Hwang, Hirofumi Inaguma, Christopher Klaiber, Ilia Kulikov, Pengwei Li, Daniel Licht, Jean Maillard, Ruslan Mavlyutov, Alice Rakotoarison, Kaushik Ram Sadagopan, Abinesh Ramakrishnan, Tuan Tran, Guillaume Wenzek, Yilin Yang, Ethan Ye, Ivan Evtimov, Pierre Fernandez, Cynthia Gao, Prangthip Hansanti, Elahe Kalbassi, Amanda Kallet, Artyom Kozhevnikov, Gabriel Mejia Gonzalez, Robin San Roman, Christophe Touret, Corinne Wong, Carleigh Wood, Bokai Yu, Pierre Andrews, Can Balioglu, Peng-Jen Chen, Marta R. Costa-jussà, Maha Elbayad, Hongyu Gong, Francisco Guzmán, Kevin Heffernan, Somya Jain, Justine Kao, Ann Lee, Xutai Ma, Alex Mourachko, Benjamin Peloquin, Juan Pino, Sravya Popuri, Christophe Ropers, Safiyyah Saleem, Holger Schwenk, Anna Sun, Paden Tomasello, Changhan Wang, Jeff Wang, Skyler Wang, Mary Williamson

In this work, we introduce a family of models that enable end-to-end expressive and multilingual translations in a streaming fashion.

Multimodal Machine Translation Translation

10,177

Paper
Code

SeamlessM4T: Massively Multilingual & Multimodal Machine Translation

2 code implementations • 22 Aug 2023 • Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Cora Meglioli, David Dale, Ning Dong, Paul-Ambroise Duquenne, Hady Elsahar, Hongyu Gong, Kevin Heffernan, John Hoffman, Christopher Klaiber, Pengwei Li, Daniel Licht, Jean Maillard, Alice Rakotoarison, Kaushik Ram Sadagopan, Guillaume Wenzek, Ethan Ye, Bapi Akula, Peng-Jen Chen, Naji El Hachem, Brian Ellis, Gabriel Mejia Gonzalez, Justin Haaheim, Prangthip Hansanti, Russ Howes, Bernie Huang, Min-Jae Hwang, Hirofumi Inaguma, Somya Jain, Elahe Kalbassi, Amanda Kallet, Ilia Kulikov, Janice Lam, Daniel Li, Xutai Ma, Ruslan Mavlyutov, Benjamin Peloquin, Mohamed Ramadan, Abinesh Ramakrishnan, Anna Sun, Kevin Tran, Tuan Tran, Igor Tufanov, Vish Vogeti, Carleigh Wood, Yilin Yang, Bokai Yu, Pierre Andrews, Can Balioglu, Marta R. Costa-jussà, Onur Celebi, Maha Elbayad, Cynthia Gao, Francisco Guzmán, Justine Kao, Ann Lee, Alexandre Mourachko, Juan Pino, Sravya Popuri, Christophe Ropers, Safiyyah Saleem, Holger Schwenk, Paden Tomasello, Changhan Wang, Jeff Wang, Skyler Wang

What does it take to create the Babel Fish, a tool that can help individuals translate speech between any two languages?

Ranked #1 on Machine Translation on flores95-devtest eng-X

Automatic Speech Recognition Speech-to-Speech Translation +3

10,177

Paper
Code

No Language Left Behind: Scaling Human-Centered Machine Translation

7 code implementations • Meta AI 2022 • NLLB team, Marta R. Costa-jussà, James Cross, Onur Çelebi, Maha Elbayad, Kenneth Heafield, Kevin Heffernan, Elahe Kalbassi, Janice Lam, Daniel Licht, Jean Maillard, Anna Sun, Skyler Wang, Guillaume Wenzek, Al Youngblood, Bapi Akula, Loic Barrault, Gabriel Mejia Gonzalez, Prangthip Hansanti, John Hoffman, Semarley Jarrett, Kaushik Ram Sadagopan, Dirk Rowe, Shannon Spruit, Chau Tran, Pierre Andrews, Necip Fazil Ayan, Shruti Bhosale, Sergey Edunov, Angela Fan, Cynthia Gao, Vedanuj Goswami, Francisco Guzmán, Philipp Koehn, Alexandre Mourachko, Christophe Ropers, Safiyyah Saleem, Holger Schwenk, Jeff Wang

Driven by the goal of eradicating language barriers on a global scale, machine translation has solidified itself as a key focus of artificial intelligence research today.

Ranked #1 on Machine Translation on IWSLT2017 French-English (SacreBLEU metric)

Machine Translation Translation

29,237

Paper
Code

OCR Improves Machine Translation for Low-Resource Languages

no code implementations • Findings (ACL) 2022 • Oana Ignat, Jean Maillard, Vishrav Chaudhary, Francisco Guzmán

We aim to investigate the performance of current OCR systems on low resource languages and low resource scripts.

Machine Translation Optical Character Recognition (OCR) +1

Paper
Add Code

Classification-based Quality Estimation: Small and Efficient Models for Real-world Applications

no code implementations • EMNLP 2021 • Shuo Sun, Ahmed El-Kishky, Vishrav Chaudhary, James Cross, Francisco Guzmán, Lucia Specia

Sentence-level Quality estimation (QE) of machine translation is traditionally formulated as a regression task, and the performance of QE models is typically measured by Pearson correlation with human labels.

Machine Translation Model Compression +3

Paper
Add Code

LAWDR: Language-Agnostic Weighted Document Representations from Pre-trained Models

no code implementations • 7 Jun 2021 • Hongyu Gong, Vishrav Chaudhary, Yuqing Tang, Francisco Guzmán

Cross-lingual document representations enable language understanding in multilingual contexts and allow transfer learning from high-resource to low-resource languages at the document level.

Sentence Sentence Embeddings +1

Paper
Add Code

Adapting High-resource NMT Models to Translate Low-resource Related Languages without Parallel Data

1 code implementation • ACL 2021 • Wei-Jen Ko, Ahmed El-Kishky, Adithya Renduchintala, Vishrav Chaudhary, Naman Goyal, Francisco Guzmán, Pascale Fung, Philipp Koehn, Mona Diab

The scarcity of parallel data is a major obstacle for training high-quality machine translation systems for low-resource languages.

Denoising Machine Translation +2

Paper
Code

XLEnt: Mining a Large Cross-lingual Entity Dataset with Lexical-Semantic-Phonetic Word Alignment

no code implementations • EMNLP 2021 • Ahmed El-Kishky, Adithya Renduchintala, James Cross, Francisco Guzmán, Philipp Koehn

Cross-lingual named-entity lexica are an important resource to multilingual NLP tasks such as machine translation and cross-lingual wikification.

Machine Translation Multilingual NLP +2

Paper
Add Code

Quality Estimation without Human-labeled Data

no code implementations • EACL 2021 • Yi-Lin Tuan, Ahmed El-Kishky, Adithya Renduchintala, Vishrav Chaudhary, Francisco Guzmán, Lucia Specia

Quality estimation aims to measure the quality of translated content without access to a reference translation.

Machine Translation Sentence +1

Paper
Add Code

Improving Zero-Shot Translation by Disentangling Positional Information

1 code implementation • ACL 2021 • Danni Liu, Jan Niehues, James Cross, Francisco Guzmán, Xian Li

The difficulty of generalizing to new translation directions suggests the model representations are highly specific to those language pairs seen in training.

Machine Translation Translation

Paper
Code

MLQE-PE: A Multilingual Quality Estimation and Post-Editing Dataset

1 code implementation • LREC 2022 • Marina Fomicheva, Shuo Sun, Erick Fonseca, Chrysoula Zerva, Frédéric Blain, Vishrav Chaudhary, Francisco Guzmán, Nina Lopatina, Lucia Specia, André F. T. Martins

We present MLQE-PE, a new dataset for Machine Translation (MT) Quality Estimation (QE) and Automatic Post-Editing (APE).

Automatic Post-Editing Sentence +1

Paper
Code

TICO-19: the Translation Initiative for Covid-19

no code implementations • EMNLP (NLP-COVID19) 2020 • Antonios Anastasopoulos, Alessandro Cattelan, Zi-Yi Dou, Marcello Federico, Christian Federman, Dmitriy Genzel, Francisco Guzmán, Junjie Hu, Macduff Hughes, Philipp Koehn, Rosie Lazar, Will Lewis, Graham Neubig, Mengmeng Niu, Alp Öktem, Eric Paquin, Grace Tang, Sylwia Tur

Further, the team is converting the test and development data into translation memories (TMXs) that can be used by localizers from and to any of the languages.

Translation

Paper
Add Code

Unsupervised Quality Estimation for Neural Machine Translation

3 code implementations • 21 May 2020 • Marina Fomicheva, Shuo Sun, Lisa Yankovskaya, Frédéric Blain, Francisco Guzmán, Mark Fishel, Nikolaos Aletras, Vishrav Chaudhary, Lucia Specia

Quality Estimation (QE) is an important component in making Machine Translation (MT) useful in real-world applications, as it is aimed to inform the user on the quality of the MT output at test time.

Machine Translation Translation +1

29,233

Paper
Code

Massively Multilingual Document Alignment with Cross-lingual Sentence-Mover's Distance

1 code implementation • Asian Chapter of the Association for Computational Linguistics 2020 • Ahmed El-Kishky, Francisco Guzmán

Document alignment aims to identify pairs of documents in two distinct languages that are of comparable content or translations of each other.

Machine Translation Sentence +2

Paper
Code

Machine Translation Evaluation Meets Community Question Answering

no code implementations • ACL 2016 • Francisco Guzmán, Lluís Màrquez, Preslav Nakov

We explore the applicability of machine translation evaluation (MTE) methods to a very different problem: answer ranking in community Question Answering.

Community Question Answering Machine Translation +1

Paper
Add Code

Unsupervised Cross-lingual Representation Learning at Scale

27 code implementations • ACL 2020 • Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, Veselin Stoyanov

We also present a detailed empirical analysis of the key factors that are required to achieve these gains, including the trade-offs between (1) positive transfer and capacity dilution and (2) the performance of high and low resource languages at scale.

Cross-Lingual Transfer Multilingual NLP +2

124,889

Paper
Code

CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data

3 code implementations • LREC 2020 • Guillaume Wenzek, Marie-Anne Lachaux, Alexis Conneau, Vishrav Chaudhary, Francisco Guzmán, Armand Joulin, Edouard Grave

Pre-training text representations have led to significant improvements in many areas of natural language processing.

Vocal Bursts Intensity Prediction

894

Paper
Code

WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia

6 code implementations • EACL 2021 • Holger Schwenk, Vishrav Chaudhary, Shuo Sun, Hongyu Gong, Francisco Guzmán

We present an approach based on multilingual sentence embeddings to automatically extract parallel sentences from the content of Wikipedia articles in 85 languages, including several dialects or low-resource languages.

Sentence Sentence Embeddings

3,520

Paper
Code

Low-Resource Corpus Filtering using Multilingual Sentence Embeddings

no code implementations • WS 2019 • Vishrav Chaudhary, Yuqing Tang, Francisco Guzmán, Holger Schwenk, Philipp Koehn

In this paper, we describe our submission to the WMT19 low-resource parallel corpus filtering shared task.

Sentence Sentence Embeddings

Paper
Add Code

The FLoRes Evaluation Datasets for Low-Resource Machine Translation: Nepali-English and Sinhala-English

2 code implementations • 4 Feb 2019 • Francisco Guzmán, Peng-Jen Chen, Myle Ott, Juan Pino, Guillaume Lample, Philipp Koehn, Vishrav Chaudhary, Marc'Aurelio Ranzato

For machine translation, a vast majority of language pairs in the world are considered low-resource because they have little parallel data available.

Machine Translation Translation

657

Paper
Code

Machine Translation Evaluation with Neural Networks

no code implementations • 5 Oct 2017 • Francisco Guzmán, Shafiq R. Joty, Lluís Màrquez, Preslav Nakov

We present a framework for machine translation evaluation using neural networks in a pairwise setting, where the goal is to select the better translation from a pair of hypotheses, given the reference translation.

Machine Translation Sentence +1

Paper
Add Code

Discourse Structure in Machine Translation Evaluation

no code implementations • CL 2017 • Shafiq Joty, Francisco Guzmán, Lluís Màrquez, Preslav Nakov

In this article, we explore the potential of using sentence-level discourse structure for machine translation evaluation.

Machine Translation Sentence +1

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.