Scalable Attentive Sentence-Pair Modeling via Distilled Sentence Embedding

14 Aug 2019 · Oren Barkan, Noam Razin, Itzik Malkiel, Ori Katz, Avi Caciularu, Noam Koenigstein

Recent state-of-the-art natural language understanding models, such as BERT and XLNet, score a pair of sentences (A and B) using multiple cross-attention operations - a process in which each word in sentence A attends to all words in sentence B and vice versa. As a result, computing the similarity between a query sentence and a set of candidate sentences requires propagating every query-candidate sentence pair through a stack of cross-attention layers...
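To make the cost contrast concrete, below is a minimal NumPy sketch of the two scoring regimes the abstract describes. The functions `cross_encoder_score` and `embed` are hypothetical placeholders (random dummies here), standing in for a cross-attentive model and an independent sentence encoder respectively; neither reflects the paper's actual architecture.

```python
import numpy as np

RNG = np.random.default_rng(0)
DIM = 8  # embedding dimension (arbitrary for this sketch)

def cross_encoder_score(query: str, candidate: str) -> float:
    """Placeholder for a cross-attentive model (e.g. BERT) that must
    see the (query, candidate) pair jointly: one full forward pass per pair."""
    return float(RNG.standard_normal())

def embed(sentence: str) -> np.ndarray:
    """Placeholder for an encoder that maps one sentence, on its own,
    to a fixed-size vector."""
    return RNG.standard_normal(DIM)

query = "how tall is the eiffel tower?"
candidates = [f"candidate sentence {i}" for i in range(1000)]

# Cross-attention regime: one expensive forward pass for EVERY pair,
# so ranking one query against N candidates costs N passes.
pair_scores = [cross_encoder_score(query, c) for c in candidates]

# Sentence-embedding regime: candidate vectors can be precomputed once
# (offline); scoring a new query is one forward pass plus a cheap
# matrix product, e.g. cosine similarity.
cand_matrix = np.stack([embed(c) for c in candidates])  # precomputed
q_vec = embed(query)
sims = (cand_matrix @ q_vec) / (
    np.linalg.norm(cand_matrix, axis=1) * np.linalg.norm(q_vec)
)
print(int(np.argmax(sims)))  # index of the best-matching candidate
```

The point of the sketch is the asymmetry: in the first regime the cross-attention stack must run per pair at query time, while in the second the heavy encoding of candidates happens once and query-time scoring reduces to a vector comparison.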

