Search Results for author: Huda Khayrallah

Found 25 papers, 9 with code

On the Evaluation of Machine Translation n-best Lists

no code implementations • EMNLP (Eval4NLP) 2020 • Jacob Bremerman, Huda Khayrallah, Douglas Oard, Matt Post

The first and principal contribution is an evaluation measure that characterizes the translation quality of an entire n-best list by asking whether many of the valid translations are placed near the top of the list.

Tasks: Machine Translation, Translation +1
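
The snippet does not spell out the measure itself, so here is a minimal, hypothetical sketch of a rank-sensitive score in that spirit: valid translations found near the top of the n-best list count for more than valid translations buried lower down. The `valid` set of acceptable translations is an assumed input, not something the paper provides in this form.

```python
def rank_weighted_validity(nbest, valid):
    """Score an n-best list by how near its valid entries sit to the top.

    nbest: list of hypothesis strings, best-first.
    valid: set of acceptable translations (hypothetical input).
    Returns a value in [0, 1]; 1.0 means the valid translations that
    appear are packed at the very top of the list.
    """
    if not nbest:
        return 0.0
    # Weight position i by 1/(i+1): hits near the top count for more.
    gained = sum(1.0 / (i + 1) for i, hyp in enumerate(nbest) if hyp in valid)
    ideal = sum(1.0 / (i + 1) for i in range(min(len(valid), len(nbest))))
    return gained / ideal if ideal else 0.0

print(rank_weighted_validity(
    ["the cat sat", "a cat sat", "cat the sat"],
    {"the cat sat", "a cat sat"}))  # -> 1.0 (both valid entries on top)
```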

On-the-Fly Fusion of Large Language Models and Machine Translation

no code implementations • 14 Nov 2023 • Hieu Hoang, Huda Khayrallah, Marcin Junczys-Dowmunt

We propose the on-the-fly ensembling of a machine translation model with an LLM, prompted on the same task and input.

Tasks: In-Context Learning, Machine Translation +2
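
A minimal sketch of what token-level ensembling of this kind can look like, assuming the two models' next-token distributions have already been mapped onto a shared vocabulary (the paper's exact weighting scheme is not given in this snippet):

```python
import numpy as np

def fused_next_token(p_mt, p_llm, lam=0.5):
    """Interpolate two models' next-token distributions at one decode step.

    p_mt, p_llm: probability vectors over a shared vocabulary (assumed
    aligned here for illustration; real systems must map vocabularies).
    lam: weight on the MT model.
    """
    p = lam * np.asarray(p_mt) + (1.0 - lam) * np.asarray(p_llm)
    return int(np.argmax(p))  # greedy choice; beam search interpolates the same way

# Toy 4-token vocabulary: the models disagree, the ensemble arbitrates.
print(fused_next_token([0.1, 0.6, 0.2, 0.1], [0.5, 0.2, 0.2, 0.1]))  # -> 1
```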

SOTASTREAM: A Streaming Approach to Machine Translation Training

1 code implementation • 14 Aug 2023 • Matt Post, Thamme Gowda, Roman Grundkiewicz, Huda Khayrallah, Rohit Jain, Marcin Junczys-Dowmunt

Many machine translation toolkits make use of a data preparation step wherein raw data is transformed into a tensor format that can be used directly by the trainer.

Tasks: Machine Translation, Management +2
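
As a rough illustration of the streaming alternative (not SOTASTREAM's actual API), a trainer can pull examples from an endless generator that reads and shuffles raw text on the fly, so no tensorized copy of the corpus is ever materialized; the filename is hypothetical:

```python
import itertools
import random

def stream_pairs(path, buffer_size=10000, seed=0):
    """Endlessly stream (source, target) pairs from a raw TSV file,
    shuffling within a bounded buffer instead of pre-tensorizing."""
    rng = random.Random(seed)
    for _ in itertools.count():  # loop over the corpus forever
        buf = []
        with open(path, encoding="utf-8") as f:
            for line in f:
                src, tgt = line.rstrip("\n").split("\t")
                buf.append((src, tgt))
                if len(buf) >= buffer_size:
                    rng.shuffle(buf)
                    yield from buf
                    buf = []
        rng.shuffle(buf)
        yield from buf  # flush the tail, then start the next pass

# The trainer just pulls batches off the iterator:
# batch = list(itertools.islice(stream_pairs("train.tsv"), 32))
```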

Measuring the 'I don't know' Problem through the Lens of Gricean Quantity

no code implementations • NAACL 2021 • Huda Khayrallah, João Sedoc

We consider the intrinsic evaluation of neural generative dialog models through the lens of Grice's Maxims of Conversation (1975).
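
A crude proxy for this kind of measurement, and not the paper's actual metric, is simply the rate at which a model falls back on 'I don't know'-style replies:

```python
IDK_MARKERS = ("i don't know", "i do not know", "i'm not sure")

def idk_rate(responses):
    """Fraction of generated responses that hedge with an
    'I don't know'-style reply (illustrative proxy only)."""
    hits = sum(any(m in r.lower() for m in IDK_MARKERS) for r in responses)
    return hits / len(responses) if responses else 0.0

print(idk_rate(["I don't know.", "It opens at nine.", "I'm not sure, sorry."]))
```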

The JHU Submission to the 2020 Duolingo Shared Task on Simultaneous Translation and Paraphrase for Language Education

no code implementations • WS 2020 • Huda Khayrallah, Jacob Bremerman, Arya D. McCarthy, Kenton Murray, Winston Wu, Matt Post

This paper presents the Johns Hopkins University submission to the 2020 Duolingo Shared Task on Simultaneous Translation and Paraphrase for Language Education (STAPLE).

Tasks: Machine Translation, Translation

Simulated Multiple Reference Training Improves Low-Resource Machine Translation

1 code implementation • EMNLP 2020 • Huda Khayrallah, Brian Thompson, Matt Post, Philipp Koehn

Many valid translations exist for a given sentence, yet machine translation (MT) is trained with a single reference translation, exacerbating data sparsity in low-resource settings.

Tasks: Machine Translation, Sentence +2
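
A deliberately naive sketch of the idea: average the translation loss over several paraphrases of the single reference, simulating a multi-reference signal. The `model.loss` and `paraphrase` hooks are hypothetical stand-ins for an NMT trainer and a paraphraser model.

```python
def multi_reference_loss(model, src, ref, paraphrase, n_samples=4):
    """Average the loss over the reference plus sampled paraphrases of it,
    so training sees several valid targets instead of one.

    model.loss(src, tgt) and paraphrase(ref) are hypothetical hooks."""
    refs = [ref] + [paraphrase(ref) for _ in range(n_samples - 1)]
    return sum(model.loss(src, r) for r in refs) / len(refs)
```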

Improved Lexically Constrained Decoding for Translation and Monolingual Rewriting

1 code implementation • NAACL 2019 • J. Edward Hu, Huda Khayrallah, Ryan Culkin, Patrick Xia, Tongfei Chen, Matt Post, Benjamin Van Durme

Lexically-constrained sequence decoding allows for explicit positive or negative phrase-based constraints to be placed on target output strings in generation tasks such as machine translation or monolingual text rewriting.

Tasks: Data Augmentation, Machine Translation +3
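
A toy illustration of the constraint logic, assuming whitespace-tokenized strings: hypotheses containing a negative (banned) phrase are pruned from the beam, and a hypothesis may only finish once every positive (required) phrase appears. Production implementations track constraints over subword prefixes inside beam search rather than by string matching.

```python
def violates(tokens, negative):
    """True if any banned phrase occurs in the output so far."""
    text = " ".join(tokens)
    return any(phrase in text for phrase in negative)

def satisfied(tokens, positive):
    """True once every required phrase appears in the output."""
    text = " ".join(tokens)
    return all(phrase in text for phrase in positive)

beam = [["she", "signed", "the", "contract"],
        ["she", "signed", "the", "deal"]]
positive, negative = ["the contract"], ["the deal"]
beam = [h for h in beam if not violates(h, negative)]   # prune banned phrases
print([h for h in beam if satisfied(h, positive)])      # only these may finish
```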

Findings of the WMT 2018 Shared Task on Parallel Corpus Filtering

no code implementations • WS 2018 • Philipp Koehn, Huda Khayrallah, Kenneth Heafield, Mikel L. Forcada

We posed the shared task of assigning sentence-level quality scores for a very noisy corpus of sentence pairs crawled from the web, with the goal of sub-selecting 1% and 10% of high-quality data to be used to train machine translation systems.

Tasks: Machine Translation, Outlier Detection +2
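
For illustration only, a toy scorer in this vein, using weak surface heuristics where actual submissions used much stronger signals such as language identification and translation-model scores; `top_fraction` then keeps the best-scoring 1% (or 10%) of pairs:

```python
def pair_score(src, tgt):
    """Toy quality score for a crawled sentence pair: penalize empty or
    copied targets and extreme length ratios (illustrative heuristics)."""
    if not src or not tgt or src == tgt:
        return 0.0
    ratio = len(src.split()) / max(len(tgt.split()), 1)
    return 1.0 / (1.0 + abs(ratio - 1.0))

def top_fraction(pairs, frac=0.01):
    """Keep the highest-scoring fraction of the corpus,
    e.g. frac=0.01 for the 1% condition of the task."""
    ranked = sorted(pairs, key=lambda p: pair_score(*p), reverse=True)
    return ranked[: max(1, int(len(ranked) * frac))]
```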

Freezing Subnetworks to Analyze Domain Adaptation in Neural Machine Translation

1 code implementation • WS 2018 • Brian Thompson, Huda Khayrallah, Antonios Anastasopoulos, Arya D. McCarthy, Kevin Duh, Rebecca Marvin, Paul McNamee, Jeremy Gwinnup, Tim Anderson, Philipp Koehn

To better understand the effectiveness of continued training, we analyze the major components of a neural machine translation system (the encoder, decoder, and each embedding space) and consider each component's contribution to, and capacity for, domain adaptation.

Tasks: Domain Adaptation, Machine Translation +1
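
The freezing itself is a one-liner in a framework like PyTorch; which component to freeze (encoder, decoder, or an embedding space) is the experimental variable. The model attribute names below are illustrative, not from the paper's code:

```python
import torch.nn as nn

def freeze(module: nn.Module):
    """Exclude a component from continued-training updates."""
    for p in module.parameters():
        p.requires_grad = False

# E.g., adapt only the decoder to the new domain (hypothetical names):
# freeze(model.encoder)
# freeze(model.src_embed)
# The optimizer should then be built over trainable parameters only:
# opt = torch.optim.Adam(p for p in model.parameters() if p.requires_grad)
```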

Regularized Training Objective for Continued Training for Domain Adaptation in Neural Machine Translation

1 code implementation • WS 2018 • Huda Khayrallah, Brian Thompson, Kevin Duh, Philipp Koehn

Supervised domain adaptation, where a large generic corpus and a smaller in-domain corpus are both available for training, is a challenge for neural machine translation (NMT).

Tasks: Domain Adaptation, Machine Translation +2
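
A common form of such a regularizer, sketched here in PyTorch without claiming it matches the paper's exact objective, mixes the in-domain cross-entropy with a KL term that pulls the adapted model's predictions back toward the frozen general-domain model:

```python
import torch.nn.functional as F

def regularized_loss(logits_new, logits_orig, target, alpha=0.1):
    """Cross-entropy on in-domain data plus a pull toward the frozen
    general-domain model, to limit catastrophic forgetting.

    logits_*: (batch, vocab) next-token logits; logits_orig comes from
    the original model run without gradient. alpha weights the pull.
    """
    ce = F.cross_entropy(logits_new, target)
    kl = F.kl_div(F.log_softmax(logits_new, dim=-1),
                  F.softmax(logits_orig, dim=-1),
                  reduction="batchmean")
    return (1 - alpha) * ce + alpha * kl
```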

On the Impact of Various Types of Noise on Neural Machine Translation

1 code implementation • WS 2018 • Huda Khayrallah, Philipp Koehn

We examine how various types of noise in the parallel training data impact the quality of neural machine translation systems.

Tasks: Machine Translation, Sentence +1
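
A hypothetical harness for this kind of study corrupts a controlled fraction of the corpus with one noise type at a time before retraining; the categories below (misaligned pairs, copied/untranslated source, scrambled word order) are illustrative of web-crawl noise, not the paper's exact taxonomy:

```python
import random

def inject_noise(pairs, kind, frac=0.1, seed=0):
    """Corrupt a fraction of a parallel corpus with one noise type,
    so its downstream effect on NMT quality can be measured."""
    rng = random.Random(seed)
    out = []
    for src, tgt in pairs:
        if rng.random() < frac:
            if kind == "misaligned":      # target from some other sentence
                tgt = rng.choice(pairs)[1]
            elif kind == "untranslated":  # raw source copied as the target
                tgt = src
            elif kind == "misordered":    # target word order scrambled
                words = tgt.split()
                rng.shuffle(words)
                tgt = " ".join(words)
        out.append((src, tgt))
    return out
```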

Paradigm Completion for Derivational Morphology

no code implementations • EMNLP 2017 • Ryan Cotterell, Ekaterina Vylomova, Huda Khayrallah, Christo Kirov, David Yarowsky

The generation of complex derived word forms has been an overlooked problem in NLP; we fill this gap by applying neural sequence-to-sequence models to the task.
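
For example, completing a derivational paradigm cell can be framed as character-level transduction from a base form plus a transformation tag to the derived form; the tag inventory and formatting below are illustrative, not the paper's exact scheme:

```python
def to_seq2seq_example(base, tag, derived):
    """Format one derivational cell as a character-level seq2seq pair
    (illustrative encoding: space-separated characters plus a tag token)."""
    src = " ".join(base) + " <" + tag + ">"
    return src, " ".join(derived)

# e.g. deriving an agent noun:
print(to_seq2seq_example("bake", "AGENT", "baker"))
# ('b a k e <AGENT>', 'b a k e r')
```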

Deep Generalized Canonical Correlation Analysis

3 code implementations • WS 2019 • Adrian Benton, Huda Khayrallah, Biman Gujral, Dee Ann Reisinger, Sheng Zhang, Raman Arora

We present Deep Generalized Canonical Correlation Analysis (DGCCA), a method for learning nonlinear transformations of arbitrarily many views of data, such that the resulting transformations are maximally informative of each other.

Tasks: Representation Learning, Stochastic Optimization
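
The snippet states the objective informally; in GCCA-style notation, DGCCA learns view-specific networks f_j and projections U_j so that every view agrees with a shared representation G. Conventions vary across write-ups, so treat this as a sketch, with X_j the j-th view and I_r the r-dimensional identity:

```latex
\min_{\{\theta_j\},\, \{U_j\},\, G}\; \sum_{j=1}^{J} \left\| G - f_j(X_j;\theta_j)\, U_j \right\|_F^2
\qquad \text{s.t.}\quad G^\top G = I_r
```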
