Search Results for author: Brian Thompson

Found 28 papers, 13 papers with code

Fine-Tuned Machine Translation Metrics Struggle in Unseen Domains

1 code implementation 28 Feb 2024 Vilém Zouhar, Shuoyang Ding, Anna Currey, Tatyana Badeka, Jenyuan Wang, Brian Thompson

We introduce a new, extensive multidimensional quality metrics (MQM) annotated dataset covering 11 language pairs in the biomedical domain.

Machine Translation Translation

A Shocking Amount of the Web is Machine Translated: Insights from Multi-Way Parallelism

1 code implementation 11 Jan 2024 Brian Thompson, Mehak Preet Dhaliwal, Peter Frisch, Tobias Domhan, Marcello Federico

We show that content on the web is often translated into many languages, and the low quality of these multi-way translations indicates they were likely created using Machine Translation (MT).

Machine Translation Selection bias

End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation

1 code implementation 1 Nov 2023 Juan Zuluaga-Gomez, Zhaocheng Huang, Xing Niu, Rohit Paturi, Sundararajan Srinivasan, Prashant Mathur, Brian Thompson, Marcello Federico

Conventional speech-to-text translation (ST) systems are trained on single-speaker utterances, and they may not generalize to real-life scenarios where the audio contains conversations by multiple speakers.

Automatic Speech Recognition speech-recognition +3

Speaker Diarization of Scripted Audiovisual Content

no code implementations 4 Aug 2023 Yogesh Virkar, Brian Thompson, Rohit Paturi, Sundararajan Srinivasan, Marcello Federico

The media localization industry usually requires a verbatim script of the final film or TV production in order to create subtitles or dubbing scripts in a foreign language.

speaker-diarization Speaker Diarization +2

Improving Isochronous Machine Translation with Target Factors and Auxiliary Counters

no code implementations 22 May 2023 Proyag Pal, Brian Thompson, Yogesh Virkar, Prashant Mathur, Alexandra Chronopoulou, Marcello Federico

To translate speech for automatic dubbing, machine translation needs to be isochronous, i.e., translated speech needs to be aligned with the source in terms of speech durations.

Machine Translation Translation

Dubbing in Practice: A Large Scale Study of Human Localization With Insights for Automatic Dubbing

1 code implementation 23 Dec 2022 William Brannon, Yogesh Virkar, Brian Thompson

We investigate how humans perform the task of dubbing video content from one language into another, leveraging a novel corpus of 319.57 hours of video from 54 professionally produced titles.


Improving Robustness of Retrieval Augmented Translation via Shuffling of Suggestions

no code implementations 11 Oct 2022 Cuong Hoang, Devendra Sachan, Prashant Mathur, Brian Thompson, Marcello Federico

Several recent studies have reported dramatic performance improvements in neural machine translation (NMT) by augmenting translation at inference time with fuzzy-matches retrieved from a translation memory (TM).

Machine Translation NMT +2

Improving Retrieval Augmented Neural Machine Translation by Controlling Source and Fuzzy-Match Interactions

no code implementations 10 Oct 2022 Cuong Hoang, Devendra Sachan, Prashant Mathur, Brian Thompson, Marcello Federico

We explore zero-shot adaptation, where a general-domain model has access to customer- or domain-specific parallel data at inference time, but not during training.

Machine Translation Retrieval +2

Embarrassingly Easy Document-Level MT Metrics: How to Convert Any Pretrained Metric Into a Document-Level Metric

1 code implementation 27 Sep 2022 Giorgos Vernikos, Brian Thompson, Prashant Mathur, Marcello Federico

Our experimental results support our initial hypothesis and show that a simple extension of the metrics permits them to take advantage of context to resolve ambiguities in the reference.

Machine Translation Sentence

Improving Arabic Diacritization by Learning to Diacritize and Translate

no code implementations IWSLT (ACL) 2022 Brian Thompson, Ali Alshehri

We propose a novel multitask learning method for diacritization which trains a model to both diacritize and translate.


Paraphrase Generation as Zero-Shot Multilingual Translation: Disentangling Semantic Similarity from Lexical and Syntactic Diversity

1 code implementation WMT (EMNLP) 2020 Brian Thompson, Matt Post

Recent work has shown that a multilingual neural machine translation (NMT) model can be used to judge how well a sentence paraphrases another sentence in the same language (Thompson and Post, 2020); however, attempting to generate paraphrases from such a model using standard beam search produces trivial copies or near copies.

Machine Translation NMT +5

Benchmarking Neural and Statistical Machine Translation on Low-Resource African Languages

no code implementations LREC 2020 Kevin Duh, Paul McNamee, Matt Post, Brian Thompson

In this study, we benchmark state-of-the-art statistical and neural machine translation systems on two African languages which do not have large amounts of resources: Somali and Swahili.

Benchmarking Machine Translation +2

Automatic Machine Translation Evaluation in Many Languages via Zero-Shot Paraphrasing

1 code implementation EMNLP 2020 Brian Thompson, Matt Post

We frame the task of machine translation evaluation as one of scoring machine translation output with a sequence-to-sequence paraphraser, conditioned on a human reference.

Machine Translation NMT +1

Exploiting Sentence Order in Document Alignment

1 code implementation EMNLP 2020 Brian Thompson, Philipp Koehn

We present a simple document alignment method that incorporates sentence order information in both candidate generation and candidate re-scoring.


Simulated Multiple Reference Training Improves Low-Resource Machine Translation

1 code implementation EMNLP 2020 Huda Khayrallah, Brian Thompson, Matt Post, Philipp Koehn

Many valid translations exist for a given sentence, yet machine translation (MT) is trained with a single reference translation, exacerbating data sparsity in low-resource settings.

Machine Translation Sentence +2

Vecalign: Improved Sentence Alignment in Linear Time and Space

no code implementations IJCNLP 2019 Brian Thompson, Philipp Koehn

It substantially outperforms the popular Hunalign toolkit at recovering Bible verse alignments in medium- to low-resource language pairs, and it improves downstream MT quality by 1.7 and 1.6 BLEU in Sinhala-English and Nepali-English, respectively, compared to the Hunalign-based Paracrawl pipeline.

Machine Translation Sentence +2

The JHU Machine Translation Systems for WMT 2018

no code implementations WS 2018 Philipp Koehn, Kevin Duh, Brian Thompson

We report on the efforts of the Johns Hopkins University to develop neural machine translation systems for the shared task for news translation organized around the Conference for Machine Translation (WMT) 2018.

Machine Translation Translation

Freezing Subnetworks to Analyze Domain Adaptation in Neural Machine Translation

1 code implementation WS 2018 Brian Thompson, Huda Khayrallah, Antonios Anastasopoulos, Arya D. McCarthy, Kevin Duh, Rebecca Marvin, Paul McNamee, Jeremy Gwinnup, Tim Anderson, Philipp Koehn

To better understand the effectiveness of continued training, we analyze the major components of a neural machine translation system (the encoder, decoder, and each embedding space) and consider each component's contribution to, and capacity for, domain adaptation.

Domain Adaptation Machine Translation +1

Regularized Training Objective for Continued Training for Domain Adaptation in Neural Machine Translation

1 code implementation WS 2018 Huda Khayrallah, Brian Thompson, Kevin Duh, Philipp Koehn

Supervised domain adaptation, where a large generic corpus and a smaller in-domain corpus are both available for training, is a challenge for neural machine translation (NMT).

Domain Adaptation Machine Translation +2

Implicitly-Defined Neural Networks for Sequence Labeling

no code implementations ACL 2017 Michaeel Kazi, Brian Thompson

In this work, we propose a novel, implicitly-defined neural network architecture and describe a method to compute its components.

Part-Of-Speech Tagging
