In this work, we introduce a model and beam-search training scheme, based on the work of Daumé III and Marcu (2005), that extends seq2seq to learn global sequence scores.
We demonstrate that standard knowledge distillation applied to word-level prediction can be effective for NMT. We also introduce two novel sequence-level versions of knowledge distillation that further improve performance and, somewhat surprisingly, seem to eliminate the need for beam search (even when applied on the original teacher model).
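As a rough illustration of the word-level variant mentioned above, the student can be trained to match the teacher's per-position distribution over the vocabulary through a cross-entropy term. The sketch below is a minimal pure-Python version under stated assumptions: the function name `word_level_kd_loss`, the list-of-lists input layout, and the `eps` smoothing constant are hypothetical choices for illustration, not taken from the paper, and the sequence-level versions are not covered.

```python
import math


def word_level_kd_loss(teacher_probs, student_probs, eps=1e-12):
    """Average cross-entropy between teacher and student word distributions.

    teacher_probs, student_probs: lists of per-position probability lists,
    one inner list per target position, each over the same vocabulary.
    (Hypothetical layout, chosen only for this sketch.)
    """
    if len(teacher_probs) != len(student_probs):
        raise ValueError("teacher and student must cover the same positions")
    total = 0.0
    for q_t, p_t in zip(teacher_probs, student_probs):
        if len(q_t) != len(p_t):
            raise ValueError("vocabulary sizes differ at some position")
        # Cross-entropy at one position: -sum_k q(k) * log p(k)
        total += -sum(q * math.log(p + eps) for q, p in zip(q_t, p_t))
    return total / len(teacher_probs)


# Toy two-word vocabulary, single target position.
teacher = [[0.9, 0.1]]
close_student = [[0.9, 0.1]]
far_student = [[0.1, 0.9]]
loss_close = word_level_kd_loss(teacher, close_student)
loss_far = word_level_kd_loss(teacher, far_student)
```

In this toy case the student that agrees with the teacher receives the lower loss, which is the behavior the distillation objective relies on.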
We present the Stanford Question Answering Dataset (SQuAD), a new reading comprehension dataset consisting of 100,000+ questions posed by crowdworkers on a set of Wikipedia articles, where the answer to each question is a segment of text from the corresponding reading passage.
Approaches to multimodal pooling include element-wise product or sum, as well as concatenation of the visual and textual representations.
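The three baseline pooling operations named above can be sketched in a few lines. This is a minimal pure-Python illustration, assuming the visual and textual representations are plain lists of floats; the function names and the example vectors are hypothetical and not taken from any particular system.

```python
def _check_same_length(visual, textual):
    # Element-wise operations require both vectors to have the same dimension.
    if len(visual) != len(textual):
        raise ValueError("element-wise pooling needs equal-length vectors")


def elementwise_product(visual, textual):
    _check_same_length(visual, textual)
    return [v * t for v, t in zip(visual, textual)]


def elementwise_sum(visual, textual):
    _check_same_length(visual, textual)
    return [v + t for v, t in zip(visual, textual)]


def concatenate(visual, textual):
    # Concatenation places no constraint on dimensions; the output size is
    # the sum of the two input sizes.
    return list(visual) + list(textual)


# Hypothetical 3-dimensional features, for illustration only.
visual = [1.0, 2.0, 3.0]
textual = [0.5, -1.0, 2.0]

pooled_product = elementwise_product(visual, textual)  # [0.5, -2.0, 6.0]
pooled_sum = elementwise_sum(visual, textual)          # [1.5, 1.0, 5.0]
pooled_concat = concatenate(visual, textual)           # length 6
```

The element-wise variants keep the output at the input dimension but require both modalities to share it, while concatenation preserves both vectors intact and leaves any interaction between them to later layers.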
Such importance degree and text representation are calculated with multiple computational layers, each of which is a neural attention model over an external memory.
This paper explores the task of translating natural language queries into regular expressions that embody their meaning.