Search Results for author: Brian Thompson

Found 28 papers, 13 papers with code

The MITLL-AFRL IWSLT 2016 Systems

no code implementations • IWSLT 2016 • Michaeel Kazi, Elizabeth Salesky, Brian Thompson, Jonathan Taylor, Jeremy Gwinnup, Timothy Anderson, Grant Erdmann, Eric Hansen, Brian Ore, Katherine Young, Michael Hutt

This report summarizes the MITLL-AFRL MT and ASR systems and the experiments run during the 2016 IWSLT evaluation campaign.

Machine Translation Translation

Fine-Tuned Machine Translation Metrics Struggle in Unseen Domains

1 code implementation • 28 Feb 2024 • Vilém Zouhar, Shuoyang Ding, Anna Currey, Tatyana Badeka, Jenyuan Wang, Brian Thompson

We introduce a new, extensive multidimensional quality metrics (MQM) annotated dataset covering 11 language pairs in the biomedical domain.

Machine Translation Translation

A Shocking Amount of the Web is Machine Translated: Insights from Multi-Way Parallelism

1 code implementation • 11 Jan 2024 • Brian Thompson, Mehak Preet Dhaliwal, Peter Frisch, Tobias Domhan, Marcello Federico

We show that content on the web is often translated into many languages, and the low quality of these multi-way translations indicates they were likely created using Machine Translation (MT).

Machine Translation Selection bias
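
The multi-way parallelism idea can be illustrated with a small sketch. It assumes that sentence-level translation pairs across languages are already available. Pairs that share a sentence are merged into tuples with a union-find structure, and the number of distinct languages in a tuple gives its degree of parallelism. The input format, the helper names `multiway_tuples` and `parallelism_degree`, and the toy data are hypothetical. This is not the paper's actual pipeline.

```python
from collections import defaultdict


def multiway_tuples(pairs):
    """Group (lang, sentence) nodes linked by translation pairs into tuples.

    `pairs` is a list of ((lang_a, text_a), (lang_b, text_b)) items; this
    input format is a hypothetical stand-in for a mined bitext.
    """
    parent = {}

    def find(x):
        # Path-halving find in the union-find forest.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for a, b in pairs:
        union(a, b)

    # Collect every node under the root of its connected component.
    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())


def parallelism_degree(group):
    # Number of distinct languages covered by one tuple.
    return len({lang for lang, _ in group})


pairs = [
    (("en", "Hello world"), ("fr", "Bonjour le monde")),
    (("en", "Hello world"), ("de", "Hallo Welt")),
    (("de", "Hallo Welt"), ("es", "Hola mundo")),
    (("en", "Good night"), ("fr", "Bonne nuit")),
]
for g in multiway_tuples(pairs):
    print(parallelism_degree(g), sorted(g))
```

Here the first three pairs collapse into one tuple spanning four languages, while the last pair forms a separate two-language tuple. Union-find avoids building an explicit graph and scales well to large numbers of pairs.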

End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation

1 code implementation • 1 Nov 2023 • Juan Zuluaga-Gomez, Zhaocheng Huang, Xing Niu, Rohit Paturi, Sundararajan Srinivasan, Prashant Mathur, Brian Thompson, Marcello Federico

Conventional speech-to-text translation (ST) systems are trained on single-speaker utterances, and they may not generalize to real-life scenarios where the audio contains conversations by multiple speakers.

Automatic Speech Recognition speech-recognition

Speaker Diarization of Scripted Audiovisual Content

no code implementations • 4 Aug 2023 • Yogesh Virkar, Brian Thompson, Rohit Paturi, Sundararajan Srinivasan, Marcello Federico

The media localization industry usually requires a verbatim script of the final film or TV production in order to create subtitles or dubbing scripts in a foreign language.

speaker-diarization Speaker Diarization

Improving Isochronous Machine Translation with Target Factors and Auxiliary Counters

no code implementations • 22 May 2023 • Proyag Pal, Brian Thompson, Yogesh Virkar, Prashant Mathur, Alexandra Chronopoulou, Marcello Federico

To translate speech for automatic dubbing, machine translation needs to be isochronous, i.e., translated speech needs to be aligned with the source in terms of speech durations.

Machine Translation Translation

Jointly Optimizing Translations and Speech Timing to Improve Isochrony in Automatic Dubbing

1 code implementation • 25 Feb 2023 • Alexandra Chronopoulou, Brian Thompson, Prashant Mathur, Yogesh Virkar, Surafel M. Lakew, Marcello Federico

Automatic dubbing (AD) is the task of translating the original speech in a video into target language speech.

Translation

Dubbing in Practice: A Large Scale Study of Human Localization With Insights for Automatic Dubbing

1 code implementation • 23 Dec 2022 • William Brannon, Yogesh Virkar, Brian Thompson

We investigate how humans perform the task of dubbing video content from one language into another, leveraging a novel corpus of 319.57 hours of video from 54 professionally produced titles.

Translation

Improving Robustness of Retrieval Augmented Translation via Shuffling of Suggestions

no code implementations • 11 Oct 2022 • Cuong Hoang, Devendra Sachan, Prashant Mathur, Brian Thompson, Marcello Federico

Several recent studies have reported dramatic performance improvements in neural machine translation (NMT) by augmenting translation at inference time with fuzzy-matches retrieved from a translation memory (TM).

Machine Translation NMT
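
The retrieval-augmented setup described above can be sketched as follows. Translation-memory (TM) entries are ranked by a common character-level fuzzy-match score (one minus the Levenshtein distance, normalized by the longer string). The target sides of the top k entries are kept as suggestions, and their order is optionally shuffled, which is one reading of the shuffling idea in the title. The TM format, the scoring function, `k`, and all function names are assumptions for illustration, not the paper's implementation.

```python
import random


def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance, O(len(a) * len(b)).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def fuzzy_score(a: str, b: str) -> float:
    # 1.0 for identical strings, approaching 0.0 for unrelated ones.
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def retrieve_suggestions(source, memory, k=2, shuffle=True, rng=None):
    """Return the target sides of the k best fuzzy matches for `source`.

    `memory` is a hypothetical list of (tm_source, tm_target) pairs.
    Shuffling (an assumption about the technique) presumably keeps a
    model from relying on the position of the best suggestion.
    """
    ranked = sorted(memory, key=lambda st: fuzzy_score(source, st[0]), reverse=True)
    suggestions = [tgt for _, tgt in ranked[:k]]
    if shuffle:
        (rng or random).shuffle(suggestions)
    return suggestions


memory = [
    ("the cat sat", "le chat s'est assis"),
    ("the dog ran", "le chien a couru"),
    ("a cat sat down", "un chat s'est assis"),
]
print(retrieve_suggestions("the cat sat down", memory, k=2, rng=random.Random(0)))
```

With `shuffle=False` the suggestions come back best match first; with shuffling enabled the same two suggestions are returned in a random order.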

Improving Retrieval Augmented Neural Machine Translation by Controlling Source and Fuzzy-Match Interactions

no code implementations • 10 Oct 2022 • Cuong Hoang, Devendra Sachan, Prashant Mathur, Brian Thompson, Marcello Federico

We explore zero-shot adaptation, where a general-domain model has access to customer or domain specific parallel data at inference time, but not during training.

Machine Translation Retrieval

Embarrassingly Easy Document-Level MT Metrics: How to Convert Any Pretrained Metric Into a Document-Level Metric

1 code implementation • 27 Sep 2022 • Giorgos Vernikos, Brian Thompson, Prashant Mathur, Marcello Federico

Our experimental results support our initial hypothesis and show that a simple extension of the metrics permits them to take advantage of context to resolve ambiguities in the reference.

Machine Translation Sentence

Improving Arabic Diacritization by Learning to Diacritize and Translate

no code implementations • IWSLT (ACL) 2022 • Brian Thompson, Ali Alshehri

We propose a novel multitask learning method for diacritization which trains a model to both diacritize and translate.

Translation

Paraphrase Generation as Zero-Shot Multilingual Translation: Disentangling Semantic Similarity from Lexical and Syntactic Diversity

1 code implementation • WMT (EMNLP) 2020 • Brian Thompson, Matt Post

Recent work has shown that a multilingual neural machine translation (NMT) model can be used to judge how well a sentence paraphrases another sentence in the same language (Thompson and Post, 2020); however, attempting to generate paraphrases from such a model using standard beam search produces trivial copies or near copies.

Machine Translation NMT

ParaCrawl: Web-Scale Acquisition of Parallel Corpora

2 code implementations • ACL 2020 • Marta Bañón, Pin-zhen Chen, Barry Haddow, Kenneth Heafield, Hieu Hoang, Miquel Esplà-Gomis, Mikel L. Forcada, Amir Kamran, Faheem Kirefu, Philipp Koehn, Sergio Ortiz Rojas, Leopoldo Pla Sempere, Gema Ramírez-Sánchez, Elsa Sarrías, Marek Strelec, Brian Thompson, William Waites, Dion Wiggins, Jaume Zaragoza

We report on methods to create the largest publicly available parallel corpora by crawling the web, using open source software.

Machine Translation Parallel Corpus Mining

Benchmarking Neural and Statistical Machine Translation on Low-Resource African Languages

no code implementations • LREC 2020 • Kevin Duh, Paul McNamee, Matt Post, Brian Thompson

In this study, we benchmark state-of-the-art statistical and neural machine translation systems on two African languages that do not have large amounts of resources: Somali and Swahili.

Benchmarking Machine Translation

Automatic Machine Translation Evaluation in Many Languages via Zero-Shot Paraphrasing

1 code implementation • EMNLP 2020 • Brian Thompson, Matt Post

We frame the task of machine translation evaluation as one of scoring machine translation output with a sequence-to-sequence paraphraser, conditioned on a human reference.

Machine Translation NMT
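
One hedged way to formalize scoring a system output h with a sequence-to-sequence paraphraser p_θ conditioned on a human reference r is as the length-normalized log-probability of h given r, where a higher score means the output is a more probable paraphrase of the reference. This is a generic sketch of reference-conditioned scoring; details such as whether the metric also scores r given h, or how segment scores are aggregated, are not stated in this listing.

```latex
\mathrm{score}(h, r) = \frac{1}{|h|} \sum_{t=1}^{|h|} \log p_\theta\left(h_t \mid h_{<t},\, r\right)
```

Normalizing by the output length |h| keeps longer outputs from being penalized simply for containing more tokens.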

Simulated Multiple Reference Training Improves Low-Resource Machine Translation

1 code implementation • EMNLP 2020 • Huda Khayrallah, Brian Thompson, Matt Post, Philipp Koehn

Many valid translations exist for a given sentence, yet machine translation (MT) is trained with a single reference translation, exacerbating data sparsity in low-resource settings.

Machine Translation Sentence

Exploiting Sentence Order in Document Alignment

1 code implementation • EMNLP 2020 • Brian Thompson, Philipp Koehn

We present a simple document alignment method that incorporates sentence order information in both candidate generation and candidate re-scoring.

Sentence

Vecalign: Improved Sentence Alignment in Linear Time and Space

no code implementations • IJCNLP 2019 • Brian Thompson, Philipp Koehn

It substantially outperforms the popular Hunalign toolkit at recovering Bible verse alignments in medium- to low-resource language pairs, and it improves downstream MT quality by 1.7 and 1.6 BLEU in Sinhala-English and Nepali-English, respectively, compared to the Hunalign-based ParaCrawl pipeline.

Machine Translation Sentence

HABLex: Human Annotated Bilingual Lexicons for Experiments in Machine Translation

no code implementations • IJCNLP 2019 • Brian Thompson, Rebecca Knowles, Xuan Zhang, Huda Khayrallah, Kevin Duh, Philipp Koehn

Bilingual lexicons are valuable resources used by professional human translators.

Machine Translation Translation

Overcoming Catastrophic Forgetting During Domain Adaptation of Neural Machine Translation

no code implementations • NAACL 2019 • Brian Thompson, Jeremy Gwinnup, Huda Khayrallah, Kevin Duh, Philipp Koehn

Continued training is an effective method for domain adaptation in neural machine translation.

BIG-bench Machine Learning Domain Adaptation

The JHU Machine Translation Systems for WMT 2018

no code implementations • WS 2018 • Philipp Koehn, Kevin Duh, Brian Thompson

We report on the efforts of the Johns Hopkins University to develop neural machine translation systems for the shared task for news translation organized around the Conference for Machine Translation (WMT) 2018.

Machine Translation Translation

Freezing Subnetworks to Analyze Domain Adaptation in Neural Machine Translation

1 code implementation • WS 2018 • Brian Thompson, Huda Khayrallah, Antonios Anastasopoulos, Arya D. McCarthy, Kevin Duh, Rebecca Marvin, Paul McNamee, Jeremy Gwinnup, Tim Anderson, Philipp Koehn

To better understand the effectiveness of continued training, we analyze the major components of a neural machine translation system (the encoder, decoder, and each embedding space) and consider each component's contribution to, and capacity for, domain adaptation.

Domain Adaptation Machine Translation

Regularized Training Objective for Continued Training for Domain Adaptation in Neural Machine Translation

1 code implementation • WS 2018 • Huda Khayrallah, Brian Thompson, Kevin Duh, Philipp Koehn

Supervised domain adaptation, where a large generic corpus and a smaller in-domain corpus are both available for training, is a challenge for neural machine translation (NMT).

Domain Adaptation Machine Translation