This paper describes Volctrans' submission to the WMT21 news translation shared task for German->English translation.
However, we argue that there are gaps between the predictor and the estimator in both data quality and training objectives, which preclude QE models from benefiting from a large number of parallel corpora more directly.
Thus REDER enables reversible machine translation by simply flipping the input and output ends.
In this paper, we find an exciting relation between an information-theoretic feature and the performance of NLP tasks such as machine translation with a given vocabulary.
Based on the properties of RPD, we study the relations of word embeddings of different algorithms systematically and investigate the influence of different training processes and corpora.
Training neural machine translation (NMT) models requires a large amount of parallel data, which is scarce for many language pairs.
Document-level machine translation manages to outperform sentence-level models by a small margin, but has failed to be widely adopted.
Intuitively, NLI should rely more on multiple perspectives to form a holistic view to eliminate bias.
However, one critical problem is that current approaches only get high accuracy for questions whose relations have been seen in the training data.
Previous studies have shown that neural machine translation (NMT) models can benefit from explicitly modeling translated (Past) and untranslated (Future) contents, assigning source words to groups of translated and untranslated contents through parts-to-wholes assignment.
Previous studies show that incorporating external information could improve the translation quality of Neural Machine Translation (NMT) systems.
The Past and Future contents are fed to both the attention model and the decoder states, which offers NMT systems the knowledge of translated and untranslated contents.
In the encoder-decoder architecture for neural machine translation (NMT), the hidden states of the recurrent structures in the encoder and decoder carry the crucial information about the sentence. These vectors are generated by parameters which are updated by back-propagation of translation errors through time.
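As a rough illustration of how encoder hidden states are produced from parameters, the following is a minimal sketch of a plain tanh recurrent encoder. It is not the architecture of any particular system described here: the function names `encode` and `init_params`, the weight names `W_xh`, `W_hh`, `b_h`, the toy dimensions, and the random initialization are all assumptions made for illustration. Practical NMT encoders typically use gated units and learn these parameters by back-propagation through time, which is not shown.

```python
import math
import random


def init_params(input_size, hidden_size, seed=0):
    """Create small random weights (hypothetical initialization scheme)."""
    rng = random.Random(seed)
    W_xh = [[rng.uniform(-0.1, 0.1) for _ in range(input_size)]
            for _ in range(hidden_size)]
    W_hh = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
            for _ in range(hidden_size)]
    b_h = [0.0] * hidden_size
    return W_xh, W_hh, b_h


def encode(inputs, W_xh, W_hh, b_h):
    """Run a plain tanh RNN over a sequence and return every hidden state.

    Each state is h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h), with h_0 = 0.
    """
    hidden_size = len(b_h)
    h = [0.0] * hidden_size
    states = []
    for x in inputs:
        # The comprehension reads the previous h before h is rebound.
        h = [
            math.tanh(
                sum(W_xh[i][j] * x[j] for j in range(len(x)))
                + sum(W_hh[i][k] * h[k] for k in range(hidden_size))
                + b_h[i]
            )
            for i in range(hidden_size)
        ]
        states.append(h)
    return states


# Toy "sentence": three 2-dimensional word vectors (made-up values).
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_xh, W_hh, b_h = init_params(input_size=2, hidden_size=4)
states = encode(sentence, W_xh, W_hh, b_h)
```

Here `states[-1]` plays the role of the vector summarizing the whole sentence, and every entry depends only on the inputs and the three parameter arrays, which is why updating those parameters changes what the hidden states encode.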