This paper describes Facebook FAIR's submission to the WMT19 shared news translation task.
Progress in machine learning is often driven by the availability of large datasets and consistent evaluation metrics for comparing modeling approaches.
Despite recent advances in natural language processing, many statistical models for processing text perform extremely poorly under domain shift.
Technologies for abusive language detection are being developed and applied with little consideration of their potential biases.
Large pre-trained neural networks such as BERT have had great recent success in NLP, motivating a growing body of research investigating what aspects of language they are able to learn from unlabeled data.
Our model extends the basic Seq2Seq architecture with a hierarchical joint attention mechanism that incorporates topical concepts and previous interactions into response generation, as sketched below.
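As a rough illustration only (not the model's actual implementation; the module and names such as `HierarchicalJointAttention` are hypothetical), one way to jointly attend over topical concepts and previous interactions is to compute a separate context vector for each source and merge them with a learned gate:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalJointAttention(nn.Module):
    """Hypothetical sketch: attend separately over topic-concept embeddings
    and previous-interaction states, then fuse the two context vectors with
    a learned sigmoid gate before feeding the decoder."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.topic_proj = nn.Linear(hidden_size, hidden_size, bias=False)
        self.hist_proj = nn.Linear(hidden_size, hidden_size, bias=False)
        self.gate = nn.Linear(2 * hidden_size, 1)

    def attend(self, query, keys, proj):
        # query: (batch, hidden); keys: (batch, len, hidden)
        scores = torch.bmm(proj(keys), query.unsqueeze(2)).squeeze(2)  # (batch, len)
        weights = F.softmax(scores, dim=1)
        # weighted sum of keys -> (batch, hidden)
        return torch.bmm(weights.unsqueeze(1), keys).squeeze(1)

    def forward(self, decoder_state, topic_embs, history_states):
        c_topic = self.attend(decoder_state, topic_embs, self.topic_proj)
        c_hist = self.attend(decoder_state, history_states, self.hist_proj)
        g = torch.sigmoid(self.gate(torch.cat([c_topic, c_hist], dim=1)))
        return g * c_topic + (1 - g) * c_hist  # fused context vector


# Example with random tensors: batch of 4, hidden size 256,
# 10 topic concepts and 20 history states per example.
attn = HierarchicalJointAttention(hidden_size=256)
ctx = attn(torch.randn(4, 256), torch.randn(4, 10, 256), torch.randn(4, 20, 256))
```

The gate lets the decoder trade off topical grounding against conversational history at each generation step; this fusion scheme is one plausible reading of "joint" attention, not a confirmed detail of the model.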
Pre-trained word embeddings are used in many downstream applications, as well as for constructing representations for sentences, paragraphs, and documents.
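For instance, one simple and widely used way to build such a representation is to average the pre-trained vectors of a text's tokens. The function below is a minimal sketch assuming `word_vectors` maps tokens to NumPy arrays (e.g., loaded from GloVe or word2vec); out-of-vocabulary tokens are skipped:

```python
import numpy as np

def sentence_embedding(tokens, word_vectors, dim=300):
    """Average pre-trained word vectors to get a fixed-size text representation.
    `word_vectors`: mapping from token -> np.ndarray of shape (dim,)."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:
        return np.zeros(dim)  # fall back for all-OOV input
    return np.mean(vecs, axis=0)
```

More elaborate schemes (e.g., TF-IDF-weighted averaging) follow the same pattern of pooling token vectors into a single fixed-size vector.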