Deeper Text Understanding for IR with Contextual Neural Language Modeling

22 May 2019 · Zhuyun Dai, Jamie Callan

Neural networks provide new possibilities to automatically learn complex language patterns and query-document relations. Neural IR models have achieved promising results in learning query-document relevance patterns, but little work has explored understanding the text content of a query or a document. This paper studies leveraging a recently proposed contextual neural language model, BERT, to provide deeper text understanding for IR. Experimental results demonstrate that the contextual text representations from BERT are more effective than traditional word embeddings. Compared to bag-of-words retrieval models, the contextual language model can better leverage language structures, bringing large improvements on queries written in natural language. Combining this text understanding ability with search knowledge leads to an enhanced pre-trained BERT model that can benefit related search tasks where training data are limited.
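
The ranker treats relevance as sentence-pair classification: the query and a passage are packed into one BERT input ([CLS] query [SEP] passage [SEP]) and the [CLS] representation feeds a relevance classifier. The sketch below shows that scoring setup, assuming the Hugging Face transformers library; the generic bert-base-uncased checkpoint is a stand-in for the paper's fine-tuned model, so the classification head here is untrained.

```python
# Minimal sketch of BERT query-passage relevance scoring, assuming the
# Hugging Face `transformers` library. The checkpoint is a stand-in:
# the paper fine-tunes BERT on relevance labels before scoring.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # relevant vs. not relevant
)
model.eval()

def score(query: str, passage: str) -> float:
    """Relevance score for one (query, passage) pair."""
    # The pair is encoded as one sequence: [CLS] query [SEP] passage [SEP]
    inputs = tokenizer(query, passage, return_tensors="pt",
                       truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Probability of the "relevant" class serves as the ranking score.
    return torch.softmax(logits, dim=-1)[0, 1].item()
```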

Datasets

TREC Robust04

Results
Task                          Dataset        Model        Metric   Value  Global Rank
Ad-Hoc Information Retrieval  TREC Robust04  BERT-MaxP    nDCG@20  0.469  # 5
Ad-Hoc Information Retrieval  TREC Robust04  BERT-FirstP  nDCG@20  0.444  # 11
Ad-Hoc Information Retrieval  TREC Robust04  BERT-SumP    nDCG@20  0.467  # 6
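
The three variants in the table differ only in how per-passage scores are aggregated into a document score, since documents longer than BERT's 512-token input limit are split into passages: BERT-FirstP uses the first passage's score, BERT-MaxP the best passage's score, and BERT-SumP the sum of all passage scores. A minimal sketch of that aggregation, assuming a per-passage scorer like the `score` function above:

```python
# Sketch of the FirstP / MaxP / SumP aggregation over passage scores.
# `scorer` is any function mapping (query, passage) to a relevance score,
# e.g. the BERT scorer sketched earlier.
from typing import Callable, List

def doc_score(query: str, passages: List[str],
              scorer: Callable[[str, str], float],
              mode: str = "max") -> float:
    scores = [scorer(query, p) for p in passages]
    if mode == "first":   # BERT-FirstP: first passage represents the document
        return scores[0]
    if mode == "max":     # BERT-MaxP: best-scoring passage
        return max(scores)
    if mode == "sum":     # BERT-SumP: evidence accumulated over all passages
        return sum(scores)
    raise ValueError(f"unknown aggregation mode: {mode}")
```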

Methods

BERT