Large Dual Encoders Are Generalizable Retrievers

It has been shown that dual encoders trained on one domain often fail to generalize to other domains for retrieval tasks. One widespread belief is that the bottleneck layer of a dual encoder, where the final score is simply a dot product between a query vector and a passage vector, is too limited to make dual encoders an effective retrieval model for out-of-domain generalization. In this paper, we challenge this belief by scaling up the size of the dual encoder model while keeping the bottleneck embedding size fixed. With multi-stage training, surprisingly, scaling up the model size brings significant improvement on a variety of retrieval tasks, especially for out-of-domain generalization. Experimental results show that our dual encoders, Generalizable T5-based dense Retrievers (GTR), significantly outperform existing sparse and dense retrievers on the BEIR benchmark (Thakur et al., 2021). Most surprisingly, our ablation study finds that GTR is very data efficient: it needs only 10% of the MS MARCO supervised data to achieve the best out-of-domain performance. All the GTR models are released at https://tfhub.dev/google/collections/gtr/1.
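The scoring scheme the abstract describes is a single dot product between fixed-size query and passage embeddings. Below is a minimal NumPy sketch of that retrieval step; the random vectors stand in for encoder outputs (in GTR these would come from a T5 encoder), and the embedding dimension of 768 is an assumption based on the paper's fixed-bottleneck setup, not something specified in this abstract.

```python
import numpy as np

EMBED_DIM = 768  # assumed bottleneck size; GTR keeps this fixed while scaling the encoder

def score(query_emb: np.ndarray, passage_embs: np.ndarray) -> np.ndarray:
    """Dual-encoder relevance: one dot product per (query, passage) pair.

    query_emb: shape (d,); passage_embs: shape (n, d) -> scores: shape (n,).
    """
    return passage_embs @ query_emb

# Hypothetical stand-ins for encoder outputs; a real system would embed
# text with the query and passage encoders instead.
rng = np.random.default_rng(0)
query_emb = rng.normal(size=EMBED_DIM)
passage_embs = rng.normal(size=(1000, EMBED_DIM))

scores = score(query_emb, passage_embs)
top_k = np.argsort(-scores)[:10]  # indices of the 10 highest-scoring passages
```

Because the score is a plain dot product, the passage embeddings can be precomputed and indexed, which is what makes the fixed-size bottleneck both the efficiency advantage and the suspected generalization limit that the paper investigates.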


Results from the Paper


Task                   Dataset  Model                      Metric         Value  Global Rank
Zero-shot Text Search  BEIR     GTR XXL (Ni et al., 2022)  Avg. Accuracy  51.6   #9
Zero-shot Text Search  BEIR     GTR XXL (Ni et al., 2022)  Avg. nDCG@10   45.8   #5
Zero-shot Text Search  BEIR     GTR XL (Ni et al., 2022)   Avg. Accuracy  51.1   #11
Zero-shot Text Search  BEIR     GTR XL (Ni et al., 2022)   Avg. nDCG@10   45.3   #6
