We describe parBLEU, parCHRF++, and parESIM, which augment baseline metrics with automatically generated paraphrases produced by PRISM (Thompson and Post, 2020a), a multilingual neural machine translation system.
In this work, we study the effect of varying the architecture and training data quality on the data scaling properties of Neural Machine Translation (NMT).
Natural language understanding and generation models follow one of two dominant architectural paradigms: language models (LMs), which process concatenated sequences in a single stack of layers, and encoder-decoder models (EncDec), which use separate layer stacks for input and output processing.
Using simple concatenation-based DocNMT, we explore the effect of three factors on the transfer: the number of teacher languages with document-level data, the balance between document- and sentence-level data during training, and the data condition of the parallel documents (genuine vs. back-translated).
Document-level contextual information has been shown to benefit text-based machine translation, but whether and how context helps end-to-end (E2E) speech translation (ST) is still understudied.
Due to its great potential for facilitating software development, code generation has attracted increasing attention recently.
Recently, it has been argued that encoder-decoder models can be made more interpretable by replacing the softmax function in attention with its sparse variants.
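To make the contrast concrete, the following minimal NumPy sketch compares softmax with sparsemax (Martins and Astudillo, 2016), one such sparse variant; the example scores are illustrative and this is not the specific implementation used in the paper.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sparsemax(z):
    """Sparsemax: Euclidean projection of the scores onto the probability simplex."""
    z_sorted = np.sort(z)[::-1]
    k = np.arange(1, len(z) + 1)
    cumsum = np.cumsum(z_sorted)
    # support: largest k such that 1 + k * z_(k) > sum of the top-k sorted scores
    support = k * z_sorted > cumsum - 1
    k_z = k[support][-1]
    tau = (cumsum[support][-1] - 1) / k_z
    return np.maximum(z - tau, 0.0)

scores = np.array([1.0, 0.8, 0.1, -1.0])
print(softmax(scores))    # dense: every position receives some attention weight
print(sparsemax(scores))  # sparse: low-scoring positions get exactly zero weight
```

Because many attention weights become exactly zero, one can point to the handful of source positions a prediction actually attended to, which is the basis of the interpretability claim.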
Our study further verifies the trade-off between shared capacity and language-specific (LS) capacity in multilingual translation.
Instead of assuming independence between neighbouring tokens (semi-autoregressive decoding, SA), we take inspiration from bidirectional sequence generation and introduce a decoder that generates target words in the left-to-right and right-to-left directions simultaneously.
We combine training data generating networks with bi-level optimization algorithms to obtain a complete framework for which all components can be jointly trained.
Information in speech signals is not evenly distributed, making it an additional challenge for end-to-end (E2E) speech translation (ST) to learn to focus on informative features.
We investigate a long-perceived shortcoming in the typical use of BLEU: its reliance on a single reference.
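As a point of reference for the single-versus-multi-reference issue, standard corpus BLEU can already be computed against several references per hypothesis; the sketch below uses sacrebleu with made-up sentences and shows only ordinary multi-reference scoring, not the paper's proposed remedy.

```python
import sacrebleu

hyps = ["the cat sat on the mat"]
refs = [
    ["the cat sat on the mat"],        # first reference for each hypothesis
    ["a cat was sitting on the mat"],  # second reference for each hypothesis
]
# corpus_bleu takes one hypothesis stream and a list of reference streams
print(sacrebleu.corpus_bleu(hyps, refs).score)
```

With a single reference, hypotheses that are correct but worded differently from that one reference are penalized; additional references widen the set of n-grams that can be matched.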
Inspired by these observations, we explore the feasibility of specifying rule-based patterns that mask out encoder outputs based on information such as part-of-speech tags, word frequency and word position.
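A minimal sketch of what such rule-based masking could look like is given below; the rules, thresholds, and the decision to keep boundary positions are hypothetical illustrations of the three information sources named above (part-of-speech, frequency, position), not the paper's actual patterns.

```python
import numpy as np

CONTENT_POS = {"NOUN", "VERB", "ADJ", "PROPN"}  # assumed content-word tag set

def build_mask(tokens, pos_tags, freqs, max_freq=50_000, keep_boundaries=True):
    """Return a boolean mask over source positions: True = keep the encoder output."""
    mask = np.zeros(len(tokens), dtype=bool)
    for i, (tok, pos) in enumerate(zip(tokens, pos_tags)):
        is_content = pos in CONTENT_POS                      # part-of-speech rule
        is_rare = freqs.get(tok, 0) < max_freq               # word-frequency rule
        is_boundary = keep_boundaries and i in (0, len(tokens) - 1)  # position rule
        mask[i] = is_content or is_rare or is_boundary
    return mask

tokens = ["the", "cat", "sat", "on", "the", "mat"]
pos    = ["DET", "NOUN", "VERB", "ADP", "DET", "NOUN"]
freqs  = {"the": 1_000_000, "on": 400_000, "cat": 20_000, "sat": 15_000, "mat": 8_000}
print(build_mask(tokens, pos, freqs))  # frequent function words in the middle are masked out
```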
Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations.
It has been shown that the performance of neural machine translation (NMT) drops starkly in low-resource conditions, underperforming phrase-based statistical machine translation (PBSMT) and requiring large amounts of auxiliary data to achieve competitive results.
Experiments on WMT14 translation tasks demonstrate that ATR-based neural machine translation can yield competitive performance on English-German and English-French language pairs in terms of both translation quality and speed.
Although neural machine translation (NMT) yields promising translation performance, it unfortunately suffers from over- and under-translation issues [Tu et al., 2016], which have become research hotspots in NMT.
To alleviate this issue, we propose an average attention network as an alternative to the self-attention network in the decoder of the neural Transformer.
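The core of an average attention layer can be sketched in a few lines: each decoder position attends to the uniform average of all preceding positions, so no attention weights have to be computed and incremental decoding only needs a running sum. The sketch below shows just that cumulative-average core; the full average attention network also applies a feed-forward layer and a gating layer on top, which are omitted here.

```python
import numpy as np

def cumulative_average(x):
    """Position j receives the average of embeddings 1..j (uniform 'attention')."""
    csum = np.cumsum(x, axis=0)                    # (T, d) running sums over positions
    steps = np.arange(1, x.shape[0] + 1)[:, None]  # divisors 1, 2, ..., T
    return csum / steps

x = np.random.randn(5, 8)           # 5 decoder positions, model dimension 8
print(cumulative_average(x).shape)  # (5, 8): one averaged context per position
```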
This paper describes our system used for the end-to-end (E2E) natural language generation (NLG) challenge.
Partially inspired by successful applications of variational recurrent neural networks, we propose a novel variational recurrent neural machine translation (VRNMT) model in this paper.
In this paper, we propose a novel GRU-gated attention model (GAtt) for NMT which enhances the degree of discrimination of context vectors by enabling source representations to be sensitive to the partial translation generated by the decoder.
After that, we fully incorporate information from different linguistic units into a bilinear semantic similarity model.
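For readers unfamiliar with the term, a bilinear similarity model scores a pair of vectors as u^T W v; a minimal PyTorch sketch with illustrative dimensions is shown below (the actual model's inputs and dimensions are not specified here).

```python
import torch
import torch.nn as nn

class BilinearSimilarity(nn.Module):
    """Bilinear semantic similarity: score(u, v) = u^T W v + b."""
    def __init__(self, dim_u, dim_v):
        super().__init__()
        self.bilinear = nn.Bilinear(dim_u, dim_v, 1)

    def forward(self, u, v):
        return self.bilinear(u, v).squeeze(-1)

sim = BilinearSimilarity(256, 256)
u, v = torch.randn(4, 256), torch.randn(4, 256)
print(sim(u, v).shape)  # one similarity score per pair: torch.Size([4])
```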
Parallel sentence representations are important for bilingual and cross-lingual tasks in natural language processing.
Vanilla sequence-to-sequence learning (seq2seq) reads and encodes a source sequence into a fixed-length vector only once, which makes it insufficient for modeling the structural correspondence between the source and target sequences.
In this paper, we propose a bidimensional attention-based recursive autoencoder (BattRAE) to integrate clues and source-target interactions at multiple levels of granularity into bilingual phrase representations.
Models of neural machine translation are often from a discriminative family of encoder-decoders that learn a conditional distribution of a target sentence given a source sentence.
Inspired by this, we propose a neural recognizer for implicit discourse relation analysis, which builds upon a semantic memory that stores knowledge in a distributed fashion.
In order to perform efficient inference and learning, we introduce neural discourse relation models to approximate the prior and posterior distributions of the latent variable, and employ these approximated distributions to optimize a reparameterized variational lower bound.
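A reparameterized variational lower bound of this kind typically takes the familiar ELBO form with Gaussian prior and posterior networks; the sketch below shows that generic recipe (reparameterization trick plus closed-form Gaussian KL), with the discourse-specific networks abstracted away, and is not the paper's exact model.

```python
import torch

def reparameterize(mu, logvar):
    """Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I),
    so gradients flow through the posterior parameters mu and logvar."""
    eps = torch.randn_like(mu)
    return mu + torch.exp(0.5 * logvar) * eps

def elbo(recon_log_prob, mu_q, logvar_q, mu_p, logvar_p):
    """Lower bound E_q[log p(y | z, x)] - KL(q(z | x, y) || p(z | x)) for
    diagonal-Gaussian prior p and approximate posterior q."""
    kl = 0.5 * torch.sum(
        logvar_p - logvar_q
        + (torch.exp(logvar_q) + (mu_q - mu_p) ** 2) / torch.exp(logvar_p)
        - 1.0
    )
    return recon_log_prob - kl
```

Maximizing this bound jointly trains the recognizer and the approximating networks, since the sampled latent variable remains differentiable with respect to the posterior parameters.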
Compared to conventional RGB or gray-scale images, multispectral images (MSI) can deliver a more faithful representation of real scenes and enhance the performance of many computer vision tasks.