Search Results for author: Rico Sennrich

Found 90 papers, 42 papers with code

Contrastive Conditioning for Assessing Disambiguation in MT: A Case Study of Distilled Bias

1 code implementation ACL ARR May 2021 Jannis Vamvas, Rico Sennrich

Lexical disambiguation is a major challenge for machine translation systems, especially if some senses of a word are trained less often than others.

Knowledge Distillation · Machine Translation +2

Wino-X: Multilingual Winograd Schemas for Commonsense Reasoning and Coreference Resolution

no code implementations EMNLP 2021 Denis Emelin, Rico Sennrich

We use this resource to investigate whether neural machine translation (NMT) models can perform CoR that requires commonsense knowledge and whether multilingual language models (MLLMs) are capable of CSR across multiple languages.

Coreference Resolution · Machine Translation +2

Exploring the Importance of Source Text in Automatic Post-Editing for Context-Aware Machine Translation

1 code implementation NoDaLiDa 2021 Chaojun Wang, Christian Hardmeier, Rico Sennrich

They also highlight blind spots in automatic methods for targeted evaluation and demonstrate the need for human assessment to evaluate document-level translation quality reliably.

Automatic Post-Editing · Translation

Distributionally Robust Recurrent Decoders with Random Network Distillation

no code implementations 25 Oct 2021 Antonio Valerio Miceli-Barone, Alexandra Birch, Rico Sennrich

Neural machine learning models can successfully model language that is similar to their training distribution, but they are highly susceptible to degradation under distribution shift, which occurs in many practical applications when processing out-of-domain (OOD) text.

Language Modelling

On the Limits of Minimal Pairs in Contrastive Evaluation

1 code implementation EMNLP (BlackboxNLP) 2021 Jannis Vamvas, Rico Sennrich

Minimal sentence pairs are frequently used to analyze the behavior of language models.
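The idea can be sketched in a few lines: a model "passes" a minimal pair if it scores the acceptable variant higher than the contrastive one. The scorer below is a toy stand-in introduced for illustration; a real evaluation would use model log-probabilities.

```python
def contrastive_accuracy(score, pairs):
    """Fraction of minimal pairs where the model assigns a higher
    score to the acceptable variant than to the contrastive one."""
    hits = sum(score(good) > score(bad) for good, bad in pairs)
    return hits / len(pairs)

# Toy stand-in scorer that simply prefers shorter strings; a real setup
# would use a language model's log-probability.
toy_score = lambda s: -len(s)
pairs = [("he runs", "he run fast now"), ("she is here", "she are here!!")]
print(contrastive_accuracy(toy_score, pairs))
```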

Improving Zero-shot Cross-lingual Transfer between Closely Related Languages by Injecting Character-level Noise

no code implementations 14 Sep 2021 Noëmi Aepli, Rico Sennrich

Cross-lingual transfer between a high-resource language and its dialects or closely related language varieties should be facilitated by their similarity, but current approaches that operate in the embedding space do not take surface similarity into account.

POS · Zero-Shot Cross-Lingual Transfer
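A minimal sketch of the approach: inject random character-level perturbations into the high-resource training data so the model becomes robust to the surface variation found in dialects and related varieties. The specific operations below (deletion and duplication) and the noise rate are illustrative assumptions, not necessarily the exact noise types used in the paper.

```python
import random

def add_char_noise(text, p=0.1, seed=0):
    """Randomly delete or duplicate characters with total probability p.
    The operations are assumed for illustration; the point is that
    surface-level noise exposes the model to dialect-like variation."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        r = rng.random()
        if r < p / 2:
            continue             # delete this character
        out.append(ch)
        if r > 1 - p / 2:
            out.append(ch)       # duplicate this character
    return ''.join(out)

print(add_char_noise("zero-shot transfer", p=0.2))
```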

Vision Matters When It Should: Sanity Checking Multimodal Machine Translation Models

1 code implementation EMNLP 2021 Jiaoda Li, Duygu Ataman, Rico Sennrich

Multimodal machine translation (MMT) systems have been shown to outperform their text-only neural machine translation (NMT) counterparts when visual context is available.

Image Captioning · Multimodal Machine Translation +1

Language Modeling, Lexical Translation, Reordering: The Training Process of NMT through the Lens of Classical SMT

no code implementations EMNLP 2021 Elena Voita, Rico Sennrich, Ivan Titov

Unlike traditional statistical MT, which decomposes the translation task into distinct, separately learned components, neural machine translation uses a single neural network to model the entire translation process.

Language Modelling · Machine Translation +2

How Suitable Are Subword Segmentation Strategies for Translating Non-Concatenative Morphology?

1 code implementation Findings (EMNLP) 2021 Chantal Amrhein, Rico Sennrich

Data-driven subword segmentation has become the default strategy for open-vocabulary machine translation and other NLP tasks, but may not be sufficiently generic for optimal learning of non-concatenative morphology.

Machine Translation · Translation

Beyond Sentence-Level End-to-End Speech Translation: Context Helps

1 code implementation ACL 2021 Biao Zhang, Ivan Titov, Barry Haddow, Rico Sennrich

Document-level contextual information has shown benefits to text-based machine translation, but whether and how context helps end-to-end (E2E) speech translation (ST) is still under-studied.

Feature Selection · Machine Translation +1

Revisiting Negation in Neural Machine Translation

1 code implementation 26 Jul 2021 Gongbo Tang, Philipp Rönchen, Rico Sennrich, Joakim Nivre

In this paper, we evaluate the translation of negation both automatically and manually, in English--German (EN--DE) and English--Chinese (EN--ZH).

Machine Translation · Translation

Understanding the Properties of Minimum Bayes Risk Decoding in Neural Machine Translation

1 code implementation ACL 2021 Mathias Müller, Rico Sennrich

Neural Machine Translation (NMT) currently exhibits biases such as producing translations that are too short and overgenerating frequent words, and shows poor robustness to copy noise in training data or domain shift.

Machine Translation · Translation
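Minimum Bayes Risk decoding, the subject of the paper above, selects the candidate with the highest expected utility against samples from the model, rather than the single highest-probability output. A minimal sketch, using word-level Jaccard overlap as a stand-in utility function (a real setup would use a translation metric such as BLEU or chrF):

```python
def jaccard(a, b):
    """Stand-in utility: word-level Jaccard overlap between two strings."""
    x, y = set(a.split()), set(b.split())
    return len(x & y) / len(x | y) if x | y else 1.0

def mbr_decode(candidates, utility=jaccard):
    """Return the candidate with the highest average utility against all
    sampled candidates, i.e. the (approximate) minimum-Bayes-risk choice."""
    def expected_utility(cand):
        return sum(utility(cand, hyp) for hyp in candidates) / len(candidates)
    return max(candidates, key=expected_utility)

# The candidate most similar to the other samples wins, even though no
# explicit probabilities are involved.
samples = ["the cat sat", "the cat sat down", "the cat slept", "a dog ran"]
print(mbr_decode(samples))
```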

Sparse Attention with Linear Units

2 code implementations EMNLP 2021 Biao Zhang, Ivan Titov, Rico Sennrich

Recently, it has been argued that encoder-decoder models can be made more interpretable by replacing the softmax function in the attention with its sparse variants.

Machine Translation · Translation +1
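The core replacement can be sketched in a few lines: ReLU in place of softmax yields attention weights that can be exactly zero, hence sparse. The paper additionally uses normalization techniques to stabilize training, which this sketch omits.

```python
import numpy as np

def relu_attention(q, k, v):
    """Scaled dot-product attention with ReLU instead of softmax.
    Negative scores become exact zeros, so the weight matrix is sparse."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.maximum(scores, 0.0)  # ReLU replaces softmax
    return weights @ v, weights

rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(4, 8)), rng.normal(size=(6, 8)), rng.normal(size=(6, 8))
out, w = relu_attention(q, k, v)
print(out.shape, float((w == 0).mean()))
```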

On Biasing Transformer Attention Towards Monotonicity

no code implementations NAACL 2021 Annette Rios, Chantal Amrhein, Noëmi Aepli, Rico Sennrich

Many sequence-to-sequence tasks in natural language processing are roughly monotonic in the alignment between source and target sequence, and previous work has facilitated or enforced learning of monotonic attention behavior via specialized attention functions or pretraining.

Morphological Inflection · Transliteration

The Impact of Text Presentation on Translator Performance

no code implementations 11 Nov 2020 Samuel Läubli, Patrick Simianer, Joern Wuebker, Geza Kovacs, Rico Sennrich, Spence Green

Widely used computer-aided translation (CAT) tools divide documents into segments such as sentences and arrange them in a side-by-side, spreadsheet-like view.


Understanding Pure Character-Based Neural Machine Translation: The Case of Translating Finnish into English

no code implementations COLING 2020 Gongbo Tang, Rico Sennrich, Joakim Nivre

The attention distribution pattern shows that separators attract a lot of attention, and we explore a sparse word-level attention mechanism to force character hidden states to capture the full word-level information.

Machine Translation · Translation

Subword Segmentation and a Single Bridge Language Affect Zero-Shot Neural Machine Translation

1 code implementation WMT (EMNLP) 2020 Annette Rios, Mathias Müller, Rico Sennrich

A recent trend in multilingual models is not to train on parallel data between all language pairs, but to use a single bridge language, e.g. English.

Machine Translation · TAG +1

Fast Interleaved Bidirectional Sequence Generation

1 code implementation WMT (EMNLP) 2020 Biao Zhang, Ivan Titov, Rico Sennrich

Instead of assuming independence between neighbouring tokens (semi-autoregressive decoding, SA), we take inspiration from bidirectional sequence generation and introduce a decoder that generates target words from the left-to-right and right-to-left directions simultaneously.

Document Summarization · Machine Translation

Analyzing the Source and Target Contributions to Predictions in Neural Machine Translation

1 code implementation ACL 2021 Elena Voita, Rico Sennrich, Ivan Titov

We find that models trained with more data tend to rely on source information more and to have sharper token contributions; the training process is non-monotonic, with several stages of different nature.

Language Modelling · Machine Translation +1

Adaptive Feature Selection for End-to-End Speech Translation

1 code implementation Findings of the Association for Computational Linguistics 2020 Biao Zhang, Ivan Titov, Barry Haddow, Rico Sennrich

Information in speech signals is not evenly distributed, making it an additional challenge for end-to-end (E2E) speech translation (ST) to learn to focus on informative features.

Data Augmentation · Feature Selection +1

On Romanization for Model Transfer Between Scripts in Neural Machine Translation

no code implementations Findings of the Association for Computational Linguistics 2020 Chantal Amrhein, Rico Sennrich

Our results show that romanization entails information loss and is thus not always superior to simpler vocabulary transfer methods, but can improve the transfer between related languages with different scripts.

Machine Translation · Transfer Learning +1

On Exposure Bias, Hallucination and Domain Shift in Neural Machine Translation

1 code implementation ACL 2020 Chaojun Wang, Rico Sennrich

In experiments on three datasets with multiple test domains, we show that exposure bias is partially to blame for hallucinations, and that training with Minimum Risk Training, which avoids exposure bias, can mitigate this.

Machine Translation · Translation

Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation

2 code implementations ACL 2020 Biao Zhang, Philip Williams, Ivan Titov, Rico Sennrich

Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations.

Machine Translation · Translation

On Sparsifying Encoder Outputs in Sequence-to-Sequence Models

1 code implementation Findings (ACL) 2021 Biao Zhang, Ivan Titov, Rico Sennrich

Inspired by these observations, we explore the feasibility of specifying rule-based patterns that mask out encoder outputs based on information such as part-of-speech tags, word frequency and word position.

Document Summarization · Machine Translation

A Set of Recommendations for Assessing Human-Machine Parity in Language Translation

1 code implementation 3 Apr 2020 Samuel Läubli, Sheila Castilho, Graham Neubig, Rico Sennrich, Qinlan Shen, Antonio Toral

The quality of machine translation has increased remarkably over the past years, to the degree that it was found to be indistinguishable from professional human translation in a number of empirical investigations.

Machine Translation · Translation

X-Stance: A Multilingual Multi-Target Dataset for Stance Detection

1 code implementation 18 Mar 2020 Jannis Vamvas, Rico Sennrich

Unlike stance detection models that have specific target issues, we use the dataset to train a single model on all the issues.

Stance Detection

Domain Robustness in Neural Machine Translation

2 code implementations AMTA 2020 Mathias Müller, Annette Rios, Rico Sennrich

Domain robustness (the generalization of models to unseen test domains) is low for both statistical (SMT) and neural machine translation (NMT).

Machine Translation · Translation

Domain, Translationese and Noise in Synthetic Data for Neural Machine Translation

no code implementations 6 Nov 2019 Nikolay Bogoychev, Rico Sennrich

The quality of neural machine translation can be improved by leveraging additional monolingual resources to create synthetic training data.

Machine Translation · Translation

Root Mean Square Layer Normalization

2 code implementations NeurIPS 2019 Biao Zhang, Rico Sennrich

RMSNorm regularizes the summed inputs to a neuron in one layer according to root mean square (RMS), giving the model re-scaling invariance property and implicit learning rate adaptation ability.
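A minimal NumPy sketch of RMSNorm as described above. `gain` is the learned per-dimension scale; the `eps` value is an assumption added for numerical stability.

```python
import numpy as np

def rms_norm(x, gain, eps=1e-8):
    """RMSNorm: divide by the root mean square of the inputs and apply a
    learned gain. Unlike LayerNorm there is no mean subtraction, which
    gives re-scaling (but not re-centering) invariance."""
    rms = np.sqrt(np.mean(np.square(x), axis=-1, keepdims=True) + eps)
    return (x / rms) * gain

x = np.array([[1.0, 2.0, 3.0, 4.0]])
y = rms_norm(x, gain=np.ones(4))
print(y)
```

Note that scaling the input by any constant leaves the output (almost) unchanged, which is the re-scaling invariance property mentioned in the abstract.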

The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives

no code implementations IJCNLP 2019 Elena Voita, Rico Sennrich, Ivan Titov

In this work, we use canonical correlation analysis and mutual information estimators to study how information flows across Transformer layers and how this process depends on the choice of learning objective.

Language Modelling · Machine Translation +1

Encoders Help You Disambiguate Word Senses in Neural Machine Translation

no code implementations IJCNLP 2019 Gongbo Tang, Rico Sennrich, Joakim Nivre

We find that encoder hidden states significantly outperform word embeddings, which indicates that encoders adequately encode relevant information for disambiguation into their hidden states.

Machine Translation · Translation +1

Understanding Neural Machine Translation by Simplification: The Case of Encoder-free Models

no code implementations RANLP 2019 Gongbo Tang, Rico Sennrich, Joakim Nivre

In this paper, we try to understand neural machine translation (NMT) by simplifying NMT architectures and training encoder-free NMT models.

Machine Translation · Translation +1

Widening the Representation Bottleneck in Neural Machine Translation with Lexical Shortcuts

1 code implementation WS 2019 Denis Emelin, Ivan Titov, Rico Sennrich

The transformer is a state-of-the-art neural translation model that uses attention to iteratively refine lexical representations with information drawn from the surrounding context.

Machine Translation · Translation

A Lightweight Recurrent Network for Sequence Modeling

1 code implementation ACL 2019 Biao Zhang, Rico Sennrich

We apply LRN as a drop-in replacement for existing recurrent units in several neural sequential models.

Revisiting Low-Resource Neural Machine Translation: A Case Study

2 code implementations ACL 2019 Rico Sennrich, Biao Zhang

It has been shown that the performance of neural machine translation (NMT) drops starkly in low-resource conditions, underperforming phrase-based statistical machine translation (PBSMT) and requiring large amounts of auxiliary data to achieve competitive results.

Low-Resource Neural Machine Translation · Translation

When a Good Translation is Wrong in Context: Context-Aware Machine Translation Improves on Deixis, Ellipsis, and Lexical Cohesion

1 code implementation ACL 2019 Elena Voita, Rico Sennrich, Ivan Titov

Though machine translation errors caused by the lack of context beyond one sentence have long been acknowledged, the development of context-aware NMT systems is hampered by several problems.

Machine Translation · Translation

An Analysis of Attention Mechanisms: The Case of Word Sense Disambiguation in Neural Machine Translation

no code implementations WS 2018 Gongbo Tang, Rico Sennrich, Joakim Nivre

Recent work has shown that the encoder-decoder attention mechanisms in neural machine translation (NMT) are different from the word alignment in statistical machine translation.

Machine Translation · Translation +2

A Large-Scale Test Set for the Evaluation of Context-Aware Pronoun Translation in Neural Machine Translation

1 code implementation WS 2018 Mathias Müller, Annette Rios, Elena Voita, Rico Sennrich

We show that, while gains in BLEU are moderate for those systems, they outperform baselines by a large margin in terms of accuracy on our contrastive test set.

Machine Translation · Translation

The Word Sense Disambiguation Test Suite at WMT18

no code implementations WS 2018 Annette Rios, Mathias Müller, Rico Sennrich

We evaluate all German--English submissions to the WMT'18 shared translation task, plus a number of submissions from previous years, and find that performance on the task has markedly improved compared to the 2016 WMT submissions (81%→93% accuracy on the WSD task).

Machine Translation · Translation +1

Has Machine Translation Achieved Human Parity? A Case for Document-level Evaluation

1 code implementation EMNLP 2018 Samuel Läubli, Rico Sennrich, Martin Volk

Recent research suggests that neural machine translation achieves parity with professional human translation on the WMT Chinese--English news translation task.

Machine Translation · Translation

Context-Aware Neural Machine Translation Learns Anaphora Resolution

no code implementations ACL 2018 Elena Voita, Pavel Serdyukov, Rico Sennrich, Ivan Titov

Standard machine translation systems process sentences in isolation and hence ignore extra-sentential information, even though extended context can both prevent mistakes in ambiguous cases and improve translation coherence.

Machine Translation · Translation

Evaluating Machine Translation Performance on Chinese Idioms with a Blacklist Method

no code implementations LREC 2018 Yutong Shao, Rico Sennrich, Bonnie Webber, Federico Fancellu

Our evaluation confirms that a sizable number of idioms in our test set are mistranslated (46.1%), that literal translation error is a common error type, and that our blacklist method is effective at identifying literal translation errors.

Machine Translation · Translation

Evaluating Discourse Phenomena in Neural Machine Translation

no code implementations NAACL 2018 Rachel Bawden, Rico Sennrich, Alexandra Birch, Barry Haddow

Despite gains using BLEU, multi-encoder models give limited improvement in the handling of discourse phenomena: 50% accuracy on our coreference test set and 53.5% for coherence/cohesion (compared to a non-contextual baseline of 50%).

Machine Translation · Translation

Regularization techniques for fine-tuning in neural machine translation

no code implementations EMNLP 2017 Antonio Valerio Miceli Barone, Barry Haddow, Ulrich Germann, Rico Sennrich

We investigate techniques for supervised domain adaptation for neural machine translation where an existing model trained on a large out-of-domain dataset is adapted to a small in-domain dataset.

Domain Adaptation · L2 Regularization +3

Image Pivoting for Learning Multilingual Multimodal Representations

no code implementations EMNLP 2017 Spandana Gella, Rico Sennrich, Frank Keller, Mirella Lapata

In this paper we propose a model to learn multimodal multilingual representations for matching images and sentences in different languages, with the aim of advancing multilingual versions of image search and image understanding.

Image Retrieval · Semantic Textual Similarity

A parallel corpus of Python functions and documentation strings for automated code documentation and code generation

4 code implementations IJCNLP 2017 Antonio Valerio Miceli Barone, Rico Sennrich

Automated documentation of programming source code and automated code generation from natural language are challenging tasks of both practical and scientific interest.

Code Generation · Data Augmentation +2

Practical Neural Machine Translation

no code implementations EACL 2017 Rico Sennrich, Barry Haddow

Neural Machine Translation (NMT) has achieved new breakthroughs in machine translation in recent years.

Machine Translation · Translation

Predicting Target Language CCG Supertags Improves Neural Machine Translation

no code implementations WS 2017 Maria Nadejde, Siva Reddy, Rico Sennrich, Tomasz Dwojak, Marcin Junczys-Dowmunt, Philipp Koehn, Alexandra Birch

Our results on WMT data show that explicitly modeling target syntax improves machine translation quality for German->English, a high-resource pair, and for Romanian->English, a low-resource pair, and also improves several syntactic phenomena including prepositional phrase attachment.

Machine Translation · Prepositional Phrase Attachment +1

Linguistic Input Features Improve Neural Machine Translation

1 code implementation WS 2016 Rico Sennrich, Barry Haddow

Neural machine translation has recently achieved impressive results, while using little in the way of external linguistic information.

Machine Translation · Translation

Edinburgh Neural Machine Translation Systems for WMT 16

1 code implementation WS 2016 Rico Sennrich, Barry Haddow, Alexandra Birch

We participated in the WMT 2016 shared news translation task by building neural translation systems for four language pairs, each trained in both directions: English<->Czech, English<->German, English<->Romanian and English<->Russian.

Machine Translation · Translation

The AMU-UEDIN Submission to the WMT16 News Translation Task: Attention-based NMT Models as Feature Functions in Phrase-based SMT

1 code implementation WS 2016 Marcin Junczys-Dowmunt, Tomasz Dwojak, Rico Sennrich

For the Russian-English task, our submission achieves the top BLEU result, outperforming the best pure neural system by 1.1 BLEU points and our own phrase-based baseline by 1.6 BLEU.

Machine Translation · Translation

Neural Machine Translation of Rare Words with Subword Units

25 code implementations ACL 2016 Rico Sennrich, Barry Haddow, Alexandra Birch

Neural machine translation (NMT) models typically operate with a fixed vocabulary, but translation is an open-vocabulary problem.
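The byte-pair-encoding procedure this paper introduces starts from characters and repeatedly merges the most frequent adjacent symbol pair. A compact sketch on a toy vocabulary in the spirit of the paper's example (`</w>` marks word ends):

```python
import re
from collections import Counter

def get_stats(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_vocab(pair, vocab):
    """Replace every occurrence of the symbol pair with its merged form."""
    bigram = re.escape(' '.join(pair))
    pattern = re.compile(r'(?<!\S)' + bigram + r'(?!\S)')
    return {pattern.sub(''.join(pair), w): f for w, f in vocab.items()}

vocab = {'l o w </w>': 5, 'l o w e r </w>': 2,
         'n e w e s t </w>': 6, 'w i d e s t </w>': 3}
for _ in range(10):  # learn 10 merge operations
    pairs = get_stats(vocab)
    if not pairs:
        break
    vocab = merge_vocab(max(pairs, key=pairs.get), vocab)
print(sorted(vocab))
```

Frequent words collapse into single symbols while rare words remain decomposed into smaller subword units, which is what makes open-vocabulary translation possible with a fixed symbol inventory.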


Modelling and Optimizing on Syntactic N-Grams for Statistical Machine Translation

no code implementations TACL 2015 Rico Sennrich

The role of language models in SMT is to promote fluent translation output, but traditional n-gram language models are unable to capture fluency phenomena between distant words, such as some morphological agreement phenomena, subcategorisation, and syntactic collocations with string-level gaps.

Language Modelling · Machine Translation +1

Zmorge: A German Morphological Lexicon Extracted from Wiktionary

no code implementations LREC 2014 Rico Sennrich, Beat Kunz

We describe a method to automatically extract a German lexicon from Wiktionary that is compatible with the finite-state morphological grammar SMOR.

Machine Translation · Morphological Analysis
