COCO-DR: Combating Distribution Shifts in Zero-Shot Dense Retrieval with Contrastive and Distributionally Robust Learning

27 Oct 2022 · Yue Yu, Chenyan Xiong, Si Sun, Chao Zhang, Arnold Overwijk

We present a new zero-shot dense retrieval (ZeroDR) method, COCO-DR, to improve the generalization ability of dense retrieval by combating the distribution shifts between source training tasks and target scenarios. To mitigate the impact of document differences, COCO-DR continues pretraining the language model on the target corpora to adapt the model to target distributions via COntinuous COntrastive learning. To prepare for unseen target queries, COCO-DR leverages implicit Distributionally Robust Optimization (iDRO) to reweight samples from different source query clusters, improving model robustness over rare queries during fine-tuning. COCO-DR achieves superior average performance on BEIR, the zero-shot retrieval benchmark. At BERT Base scale, COCO-DR Base outperforms other ZeroDR models that are 60x larger. At BERT Large scale, COCO-DR Large outperforms the giant GPT-3 embedding model, which has 500x more parameters. Our analysis shows the correlation between COCO-DR's effectiveness in combating distribution shifts and improving zero-shot accuracy. Our code and model can be found at https://github.com/OpenMatch/COCO-DR.
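The abstract does not spell out the contrastive objective, so the following is only a minimal sketch of one common choice: an InfoNCE loss with in-batch negatives. The function names, the temperature value, and the idea that each anchor/positive pair comes from two text spans of the same target-corpus document are all assumptions made for illustration, not details confirmed by the text above.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce_loss(anchors, positives, temperature=0.05):
    """Mean InfoNCE loss with in-batch negatives (illustrative sketch).

    anchors[i] and positives[i] are embeddings of a matching pair
    (assumed here: two spans from the same document). Every other
    positives[j], j != i, acts as a negative for anchors[i].
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [dot(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over the whole batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of picking the true positive.
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


if __name__ == "__main__":
    aligned = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
    swapped = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
    print(aligned < swapped)
```

The loss is small when each anchor is closest to its own positive and grows when it is closer to another item in the batch, which is the behaviour continued pretraining on a target corpus would rely on under these assumptions.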


Results from the Paper


Task | Dataset | Model | Metric Name | Metric Value | Global Rank
Zero-shot Text Search | ArguAna | COCO-DR Base | nDCG@10 | 49.3 | #2
Zero-shot Text Search | ArguAna | COCO-DR Large | nDCG@10 | 51.5 | #1
Zero-shot Text Search | BEIR | COCO-DR Base (Yu et al., 2022) | Avg. Accuracy | 52.1 | #7
Zero-shot Text Search | BEIR | COCO-DR Base (Yu et al., 2022) | Avg. nDCG@10 | 46.2 | #3
Zero-shot Text Search | BEIR | COCO-DR Large (Yu et al., 2022) | Avg. Accuracy | 54.1 | #2
Zero-shot Text Search | BEIR | COCO-DR Large (Yu et al., 2022) | Avg. nDCG@10 | 48.4 | #1
Zero-shot Text Search | BioASQ | COCO-DR Base | nDCG@10 | 42.9 | #2
Zero-shot Text Search | BioASQ | COCO-DR Large | nDCG@10 | 44.9 | #1
Zero-shot Text Search | CLIMATE-FEVER | COCO-DR Base | nDCG@10 | 21.1 | #2
Zero-shot Text Search | CLIMATE-FEVER | COCO-DR Large | nDCG@10 | 24.7 | #1
Zero-shot Text Search | CQADupStack | COCO-DR Base | nDCG@10 | 37 | #2
Zero-shot Text Search | CQADupStack | COCO-DR Large | nDCG@10 | 39.3 | #1
Zero-shot Text Search | DBpedia | COCO-DR Large | nDCG@10 | 40.7 | #1
Zero-shot Text Search | DBpedia | COCO-DR Base | nDCG@10 | 39.1 | #2
Zero-shot Text Search | FEVER | COCO-DR Base | nDCG@10 | 75.1 | #2
Zero-shot Text Search | FEVER | COCO-DR Large | nDCG@10 | 79.3 | #1
Zero-shot Text Search | FiQA-2018 | COCO-DR Base | nDCG@10 | 30.7 | #2
Zero-shot Text Search | FiQA-2018 | COCO-DR Large | nDCG@10 | 32.9 | #1
Zero-shot Text Search | NFCorpus | COCO-DR Large | nDCG@10 | 35.4 | #2
Zero-shot Text Search | NFCorpus | COCO-DR Base | nDCG@10 | 35.5 | #1
Zero-shot Text Search | NQ | COCO-DR Large | nDCG@10 | 54.7 | #2
Zero-shot Text Search | NQ | COCO-DR Base | nDCG@10 | 50.5 | #3
Zero-shot Text Search | quora | COCO-DR Large | nDCG@10 | 87.2 | #1
Zero-shot Text Search | quora | COCO-DR Base | nDCG@10 | 86.7 | #2
Zero-shot Text Search | Robust04 | COCO-DR Base | nDCG@10 | 44.3 | #2
Zero-shot Text Search | Robust04 | COCO-DR Large | nDCG@10 | 48.2 | #1
Zero-shot Text Search | SciDocs | COCO-DR Base | nDCG@10 | 16 | #2
Zero-shot Text Search | SciDocs | COCO-DR Large | nDCG@10 | 17.8 | #1
Zero-shot Text Search | SciFact | COCO-DR Large | nDCG@10 | 72.2 | #1
Zero-shot Text Search | SciFact | COCO-DR Base | nDCG@10 | 70.9 | #2
Zero-shot Text Search | Signal-1M (RT) | COCO-DR Large | nDCG@10 | 28.5 | #1
Zero-shot Text Search | Signal-1M (RT) | COCO-DR Base | nDCG@10 | 27.1 | #2
Zero-shot Text Search | TREC-COVID | COCO-DR Base | nDCG@10 | 78.9 | #3
Zero-shot Text Search | TREC-COVID | COCO-DR Large | nDCG@10 | 80.4 | #2
Zero-shot Text Search | TREC-COVID | GTR XXL | nDCG@10 | 50.1 | #7
Zero-shot Text Search | TREC-COVID | GTR XL | nDCG@10 | 58.4 | #6
Zero-shot Text Search | TREC-COVID | GPL | nDCG@10 | 70 | #4
Zero-shot Text Search | TREC-COVID | GenQ | nDCG@10 | 61.9 | #5
Zero-shot Text Search | TREC-News | COCO-DR Base | nDCG@10 | 40.3 | #2
Zero-shot Text Search | TREC-News | COCO-DR Large | nDCG@10 | 43.2 | #1
Zero-shot Text Search | Webis-Touché-2020 | COCO-DR Base | nDCG@10 | 23.8 | #2
Zero-shot Text Search | Webis-Touché-2020 | COCO-DR Large | nDCG@10 | 26.3 | #1
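
Nearly every row above reports nDCG@10, so a small sketch of how that metric can be computed for a single query may help in reading the numbers. It uses the linear-gain form of DCG, which is an assumption: the exact evaluation tooling behind these results is not stated on this page. The names `dcg_at_k`, `ndcg_at_k`, and `qrels` are hypothetical. The values in the table appear to be on a 0-100 scale, whereas this function returns a value in [0, 1].

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k graded relevances.

    The item at 0-based position `rank` is discounted by log2(rank + 2),
    so the top result has a discount of 1.
    """
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_doc_ids, qrels, k=10):
    """nDCG@k for one query.

    ranked_doc_ids: document ids in the order the retriever returned them.
    qrels: dict mapping document id to graded relevance (missing means 0).
    """
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_doc_ids]
    # The ideal ranking places the most relevant judged documents first.
    ideal_gains = sorted(qrels.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal_gains, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


if __name__ == "__main__":
    qrels = {"d1": 1, "d3": 1}
    print(round(ndcg_at_k(["d1", "d2", "d3"], qrels), 4))
```

A dataset-level score would then be the mean of `ndcg_at_k` over all test queries, and a benchmark-level average such as the BEIR rows would be a mean over datasets.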

Methods