Contrastive Pre-training for Zero-Shot Information Retrieval

Information retrieval is an important component in natural language processing, for knowledge-intensive tasks such as question answering and fact checking. Recently, information retrieval has seen the emergence of dense retrievers, based on neural networks, as an alternative to classical sparse methods based on term frequency. Neural retrievers work well on the problems for which they were specifically trained, but they do not generalize as well as term-frequency methods to new domains or applications. By contrast, in many other NLP tasks, conventional self-supervised pre-training based on masking leads to strong generalization with a small number of training examples. We believe this is not yet the case for information retrieval, because these pre-training methods are not well adapted to this task. In this work, we consider contrastive learning as a more natural pre-training technique for retrieval and show that it leads to models that are competitive with BM25 on many domains or applications, even without training on supervised data. Our dense pre-trained models also compare favorably against BERT pre-trained models in the few-shot setting, and achieve state-of-the-art performance on the BEIR benchmark when fine-tuned on MS-MARCO.
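To make the core idea concrete, here is a minimal sketch of the contrastive (InfoNCE-style) objective commonly used for this kind of pre-training: each query embedding is pulled toward the embedding of its positive passage while the other passages in the batch serve as negatives. This is an illustrative NumPy implementation, not the paper's actual training code; the function name, temperature value, and the use of random vectors as stand-in embeddings are assumptions for the example.

```python
import numpy as np

def contrastive_loss(queries, keys, temperature=0.05):
    """InfoNCE-style contrastive loss.

    queries, keys: (batch, dim) arrays of embeddings. The positive for
    queries[i] is keys[i]; all other rows of `keys` act as in-batch
    negatives. Returns the mean cross-entropy of picking the positive.
    """
    # L2-normalize so the dot product is cosine similarity
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = (q @ k.T) / temperature          # (batch, batch) similarities
    # log-softmax over each row, with the diagonal as the correct class
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

# Toy example: two slightly perturbed "views" of the same documents act
# as query/positive pairs (in practice these come from an encoder over
# augmented text spans, e.g. independent crops of the same document).
rng = np.random.default_rng(0)
docs = rng.normal(size=(8, 32))
views = docs + 0.01 * rng.normal(size=(8, 32))
loss = contrastive_loss(docs, views)
```

Because each positive here is nearly identical to its query while the negatives are random, the loss is close to zero; during pre-training, minimizing this quantity trains the encoder to embed matching text pairs near each other and unrelated pairs far apart.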
