no code implementations • EACL (BEA) 2021 • Simon Flachs, Felix Stahlberg, Shankar Kumar
We investigate how best to take advantage of existing data sources for improving grammatical error correction (GEC) systems for languages with limited quantities of high-quality training data.
no code implementations • 20 Oct 2023 • Arya D. McCarthy, Hao Zhang, Shankar Kumar, Felix Stahlberg, Ke Wu
One challenge in speech translation is that plenty of spoken content is long-form, but short units are necessary for obtaining high-quality translations.
no code implementations • 22 Aug 2023 • Yun Zhu, Yinxiao Liu, Felix Stahlberg, Shankar Kumar, Yu-Hui Chen, Liangchen Luo, Lei Shu, Renjie Liu, Jindong Chen, Lei Meng
Large Language Models (LLMs) have demonstrated impressive capabilities for text rewriting.
no code implementations • 19 Dec 2022 • Arya D. McCarthy, Hao Zhang, Shankar Kumar, Felix Stahlberg, Axel H. Ng
A challenge in spoken language translation is that plenty of spoken content is long-form, but short units are necessary for obtaining high-quality translations.
no code implementations • 8 Nov 2022 • Felix Stahlberg, Aashish Kumar, Chris Alberti, Shankar Kumar
We report on novel investigations into training models that make sentences concise.
no code implementations • NAACL (ACL) 2022 • Eric Malmi, Yue Dong, Jonathan Mallinson, Aleksandr Chuklin, Jakub Adamek, Daniil Mirylenka, Felix Stahlberg, Sebastian Krause, Shankar Kumar, Aliaksei Severyn
Text-editing models have recently become a prominent alternative to seq2seq models for monolingual text-generation tasks such as grammatical error correction, simplification, and style transfer.
no code implementations • NAACL 2022 • Felix Stahlberg, Shankar Kumar
The softmax layer in neural machine translation is designed to model a distribution over mutually exclusive tokens.
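The "mutually exclusive" framing can be made concrete with a minimal sketch: softmax turns arbitrary logits into a single probability distribution in which exactly one token is assumed to occur, so the probabilities always sum to one. The logit values below are illustrative only.

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three competing vocabulary tokens.
probs = softmax([2.0, 1.0, 0.1])
# Mutually exclusive outcomes: the probabilities form one distribution.
assert abs(sum(probs) - 1.0) < 1e-9
```

Because mass assigned to one token is taken from all others, softmax cannot natively express outputs where several tokens are simultaneously valid.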
no code implementations • ACL 2022 • Felix Stahlberg, Ilia Kulikov, Shankar Kumar
In many natural language processing (NLP) tasks, the same input (e.g., a source sentence) can have multiple possible outputs (e.g., translations).
no code implementations • 1 Feb 2022 • Jae Hun Ro, Felix Stahlberg, Ke Wu, Shankar Kumar
Text normalization, or the process of transforming text into a consistent, canonical form, is crucial for speech applications such as text-to-speech synthesis (TTS).
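As a toy illustration of what "canonical form" means for a TTS front end, the sketch below expands digit strings into spoken words. This is a hypothetical rule, not the paper's method; real normalizers handle far more categories (dates, currency, abbreviations) and are context-sensitive.

```python
import re

# Illustrative digit-to-word table for a minimal TTS-style normalizer.
DIGITS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
          "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}

def normalize(text):
    # Read each digit run out digit by digit, as one might for a phone number.
    return re.sub(r"\d+", lambda m: " ".join(DIGITS[d] for d in m.group()), text)

normalize("call 911")  # "call nine one one"
```

A production system must also decide *which* reading applies ("911" as a phone number vs. "nine hundred eleven"), which is what makes the task hard for machine learning.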
1 code implementation • EACL (BEA) 2021 • Felix Stahlberg, Shankar Kumar
Synthetic data generation is widely known to boost the accuracy of neural grammatical error correction (GEC) systems, but existing methods often lack diversity or are too simplistic to generate the broad range of grammatical errors made by human writers.
1 code implementation • EMNLP 2020 • Felix Stahlberg, Shankar Kumar
For text normalization, sentence fusion, and grammatical error correction, our approach improves explainability by associating each edit operation with a human-readable tag.
no code implementations • ACL 2020 • Danielle Saunders, Felix Stahlberg, Bill Byrne
We find that each of these lines of research has a clear space in it for the other, and propose merging them with a scheme that allows a document-level evaluation metric to be used in the NMT training objective.
2 code implementations • 4 Dec 2019 • Felix Stahlberg
The field of machine translation (MT), the automatic translation of written text from one natural language into another, has experienced a major paradigm shift in recent years.
no code implementations • IJCNLP 2019 • Felix Stahlberg, Bill Byrne
We report on search errors and model errors in neural machine translation (NMT).
no code implementations • WS 2019 • Felix Stahlberg, Danielle Saunders, Adrià de Gispert, Bill Byrne
Two techniques provide the fabric of the Cambridge University Engineering Department's (CUED) entry to the WMT19 evaluation campaign: elastic weight consolidation (EWC) and different forms of language modelling (LMs).
no code implementations • WS 2019 • Zheng Yuan, Felix Stahlberg, Marek Rei, Bill Byrne, Helen Yannakoudakis
In this paper, we describe our submission to the BEA 2019 shared task on grammatical error correction.
no code implementations • WS 2019 • Felix Stahlberg, Bill Byrne
We describe two entries from the Cambridge University Engineering Department to the BEA 2019 Shared Task on grammatical error correction.
no code implementations • WS 2019 • Danielle Saunders, Felix Stahlberg, Bill Byrne
The 2019 WMT Biomedical translation task involved translating Medline abstracts.
no code implementations • 11 Jun 2019 • Felix Stahlberg, Danielle Saunders, Adria de Gispert, Bill Byrne
Two techniques provide the fabric of the Cambridge University Engineering Department's (CUED) entry to the WMT19 evaluation campaign: elastic weight consolidation (EWC) and different forms of language modelling (LMs).
no code implementations • ACL 2019 • Danielle Saunders, Felix Stahlberg, Adria de Gispert, Bill Byrne
We investigate adaptive ensemble weighting for Neural Machine Translation, addressing the case of improving performance on a new and potentially unknown domain without sacrificing performance on the original domain.
no code implementations • CL 2019 • Hao Zhang, Richard Sproat, Axel H. Ng, Felix Stahlberg, Xiaochang Peng, Kyle Gorman, Brian Roark
One problem that has been somewhat resistant to effective machine learning solutions is text normalization for speech applications such as text-to-speech synthesis (TTS).
no code implementations • NAACL 2019 • Felix Stahlberg, Christopher Bryant, Bill Byrne
Language-model-based GEC (LM-GEC) is a promising alternative that does not rely on annotated training data.
1 code implementation • WS 2018 • Felix Stahlberg, James Cross, Veselin Stoyanov
Neural Machine Translation (NMT) typically leverages monolingual data in training through backtranslation.
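The backtranslation pipeline mentioned here can be sketched in a few lines: a target-to-source model translates monolingual target-side text, producing synthetic (source, target) pairs that are mixed with real bitext. The `toy_reverse` stand-in below is purely illustrative, not an actual model.

```python
def backtranslate(monolingual_target, reverse_translate):
    """Create synthetic (source, target) pairs from target-side monolingual text.

    `reverse_translate` is any target-to-source translation function; here it is
    a hypothetical stand-in, not a trained model.
    """
    return [(reverse_translate(t), t) for t in monolingual_target]

# Stand-in "reverse model" for illustration only.
toy_reverse = lambda s: "<bt> " + s[::-1]

synthetic = backtranslate(["guten tag", "hallo welt"], toy_reverse)
# Each pair keeps the genuine target sentence; only the source side is synthetic.
```

The key property is that the target side of every synthetic pair is real, fluent text, so the forward model still learns to produce natural output even when the synthetic sources are noisy.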
1 code implementation • WS 2018 • Felix Stahlberg, Danielle Saunders, Bill Byrne
We propose to achieve explainable neural machine translation (NMT) by changing the output representation to explain itself.
no code implementations • WS 2018 • Felix Stahlberg, Adria de Gispert, Bill Byrne
The University of Cambridge submission to the WMT18 news translation task focuses on the combination of diverse models of translation.
no code implementations • ACL 2018 • Danielle Saunders, Felix Stahlberg, Adria de Gispert, Bill Byrne
We explore strategies for incorporating target syntax into Neural Machine Translation.
no code implementations • WS 2018 • Felix Stahlberg, Danielle Saunders, Gonzalo Iglesias, Bill Byrne
SGNMT is a decoding platform for machine translation that allows pairing various modern neural models of translation with different kinds of constraints and symbolic models.
1 code implementation • WS 2017 • Eva Hasler, Felix Stahlberg, Marcus Tomalin, Adrià de Gispert, Bill Byrne
We compare several language models for the word-ordering task and propose a new bag-to-sequence neural model based on attention-based sequence-to-sequence models.
1 code implementation • EMNLP 2017 • Felix Stahlberg, Eva Hasler, Danielle Saunders, Bill Byrne
This paper introduces SGNMT, our experimental platform for machine translation research.
no code implementations • EMNLP 2017 • Felix Stahlberg, Bill Byrne
Ensembling is a well-known technique in neural machine translation (NMT) to improve system performance.
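A common form of NMT ensembling, shown here as a hedged sketch, averages the next-token distributions of several models at each decoding step. The two toy distributions and uniform weighting are assumptions for illustration; real ensembles may combine log-probabilities or use tuned weights.

```python
def ensemble_step(distributions):
    """Average the next-token distributions of several models (uniform weights)."""
    n = len(distributions)
    vocab = distributions[0].keys()
    return {tok: sum(d[tok] for d in distributions) / n for tok in vocab}

# Two hypothetical models disagreeing over a three-token vocabulary.
p1 = {"the": 0.7, "a": 0.2, "an": 0.1}
p2 = {"the": 0.5, "a": 0.4, "an": 0.1}
avg = ensemble_step([p1, p2])
# avg["the"] == 0.6, and the averaged distribution still sums to 1.
```

Averaging valid distributions always yields a valid distribution, so the decoder can use the ensemble exactly as it would a single model.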
no code implementations • EACL 2017 • Felix Stahlberg, Adrià de Gispert, Eva Hasler, Bill Byrne
This makes our approach much more flexible than n-best list or lattice rescoring, as the neural decoder is not restricted to the SMT search space.
no code implementations • WS 2016 • Felix Stahlberg, Eva Hasler, Bill Byrne
This paper presents the University of Cambridge submission to WMT16.
no code implementations • ACL 2016 • Felix Stahlberg, Eva Hasler, Aurelien Waite, Bill Byrne
We investigate the use of hierarchical phrase-based SMT lattices in end-to-end neural machine translation (NMT).