Livonian is one of the most endangered languages in Europe, with just a handful of speakers and virtually no publicly available corpora.
This paper describes the development of the University of Tokyo's NMT systems submitted to the WAT 2020 Document-level Business Scene Dialogue Translation sub-task.
In this paper, we describe the adaptation of a simple word-guessing game that has occupied the hearts and minds of people around the world.
One of the most popular methods for context-aware machine translation (MT) is to use separate encoders for the source sentence and context as multiple sources for one target sentence.
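The multi-source idea can be illustrated with a minimal sketch. Here two toy "encoders" (simple bag-of-words vectors, a deliberate simplification of the neural encoders the abstract refers to) encode the source sentence and its context independently, and their outputs are concatenated for the decoder; all names and the combination strategy are illustrative assumptions.

```python
# Toy sketch of multi-source, context-aware MT: separate encoders for the
# source sentence and its context, combined into one joint representation.

def encode(tokens, vocab):
    """Toy 'encoder': bag-of-words count vector over a fixed vocabulary."""
    vec = [0] * len(vocab)
    for t in tokens:
        if t in vocab:
            vec[vocab[t]] += 1
    return vec

vocab = {"the": 0, "cat": 1, "sat": 2, "it": 3, "was": 4, "tired": 5}

source = "it was tired".split()     # sentence to translate
context = "the cat sat".split()     # preceding sentence, resolves "it"

src_state = encode(source, vocab)   # encoder 1: current sentence
ctx_state = encode(context, vocab)  # encoder 2: context

# Simplest possible combination: concatenate both encoder states and hand
# the joint representation to the decoder (here just printed).
joint = src_state + ctx_state
print(joint)  # [0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0]
```

In a real system each encoder is a neural network and the combination is learned (e.g. via attention over both sources), but the structure, two encoders feeding one target, is the same.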
We analysed sentiment and the frequency of expressions related to smell, taste and temperature in Latvian-language food tweets.
Sentence-level (SL) machine translation (MT) has reached acceptable quality for many high-resourced languages, but document-level (DL) MT has not: it is 1) difficult to train, given the small amount of DL data available; and 2) difficult to evaluate, as the main methods and data sets focus on SL evaluation.
While machine translation of written text has come far in the past several years thanks to the increasing availability of parallel corpora and corpus-based training techniques, automatic translation of spoken text and dialogues remains challenging even for modern systems.
Ranked #1 on Machine Translation on Business Scene Dialogue JA-EN (using extra training data)
We present the Latvian Twitter Eater Corpus - a set of tweets in the narrow domain of food, drinks, eating and drinking.
Ranked #1 on Sentiment Analysis on Latvian Twitter Eater Sentiment Dataset (using extra training data)
Large parallel corpora that are automatically obtained from the web, documents or elsewhere often exhibit many corrupted parts that are bound to negatively affect the quality of the systems and models that learn from these corpora.
Ranked #1 on Machine Translation on WMT 2018 English-Finnish
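Filtering corrupted pairs out of web-crawled parallel data is often done with simple heuristics. The sketch below shows the general kind of rule involved; the specific thresholds and rules are illustrative assumptions, not the paper's actual filtering method.

```python
# Hypothetical corpus-cleaning sketch: heuristic filters of the kind used
# to drop corrupted sentence pairs from automatically obtained corpora.

def keep_pair(src, tgt, max_ratio=3.0, max_len=200):
    """Return True if the (src, tgt) pair passes basic sanity checks.
    Thresholds are illustrative, not taken from the paper."""
    s, t = src.split(), tgt.split()
    if not s or not t:
        return False                          # one side is empty
    if len(s) > max_len or len(t) > max_len:
        return False                          # over-long segment
    if max(len(s), len(t)) / min(len(s), len(t)) > max_ratio:
        return False                          # implausible length ratio
    if src.strip() == tgt.strip():
        return False                          # untranslated copy-through
    return True

pairs = [
    ("a clean sentence", "tīrs teikums"),
    ("hello", ""),                            # corrupted: empty target
    ("same line", "same line"),               # corrupted: copied through
]
clean = [p for p in pairs if keep_pair(*p)]
print(len(clean))  # 1
```

Real cleaning pipelines add language identification and model-based scoring on top of such rules, but length and copy checks alone already remove a large share of the noise.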
In this paper, we describe a tool for debugging the output and attention weights of neural machine translation (NMT) systems and for improved estimations of confidence about the output based on the attention.
Ranked #4 on Machine Translation on WMT 2017 Latvian-English
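One simple way attention weights can feed a confidence estimate, sketched below with made-up numbers, is the entropy of each attention row: a sharply peaked distribution suggests the model "knows" which input token it is translating, while a flat one can flag trouble. This is an illustrative assumption about the general approach, not the tool's actual scoring method.

```python
import math

def attention_entropy(row):
    """Shannon entropy of one attention distribution (nats)."""
    return -sum(p * math.log(p) for p in row if p > 0.0)

peaked = [0.9, 0.05, 0.05]   # attention focused on one input token
flat = [0.34, 0.33, 0.33]    # attention spread evenly: less confident

print(attention_entropy(peaked) < attention_entropy(flat))  # True
```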
Attention distributions of the generated translations are a useful by-product of attention-based recurrent neural network translation models and can be treated as soft alignments between the input and output tokens.
Ranked #3 on Machine Translation on WMT 2017 Latvian-English
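Turning those soft alignments into hard, word-alignment-style links is straightforward: for each output token, pick the input token with the highest attention weight. A minimal sketch with made-up weights (the matrix and function name are illustrative assumptions):

```python
# rows: output tokens, columns: input tokens; each row sums to 1
attention = [
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
]

def soft_to_hard(attn):
    """For each output position, link the argmax input position."""
    return [max(range(len(row)), key=row.__getitem__) for row in attn]

print(soft_to_hard(attention))  # [0, 1, 2]
```

Argmax extraction is the crudest option; softer schemes keep the full distribution or threshold it to allow one-to-many links.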