Siamese BERT-based Model for Web Search Relevance Ranking Evaluated on a New Czech Dataset

3 Dec 2021 · Matěj Kocián, Jakub Náplava, Daniel Štancl, Vladimír Kadlec

Web search engines focus on serving highly relevant results within hundreds of milliseconds. Pre-trained transformer language models such as BERT are therefore hard to use in this scenario due to their high computational demands. We present our real-time approach to the document ranking problem leveraging a BERT-based siamese architecture. The model is already deployed in a commercial search engine and it improves production performance by more than 3%. For further research and evaluation, we release DaReCzech, a unique dataset of 1.6 million Czech user query-document pairs with manually assigned relevance levels. We also release Small-E-Czech, an Electra-small language model pre-trained on a large Czech corpus. We believe this data will support the efforts of both the search relevance and the multilingual-focused research communities.
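The latency argument in the abstract can be illustrated with a minimal sketch of siamese-style ranking: documents are encoded once offline, and only the query is encoded at request time. Everything here is an assumption made for illustration. `toy_encode` is a hypothetical hashed bag-of-words stand-in for a transformer encoder, `DIM` is an arbitrary size, and scoring by dot product of normalised vectors is one common choice rather than the scoring head described in the paper.

```python
import math
import zlib
from typing import Dict, List

DIM = 64  # hypothetical embedding size, chosen only for this sketch


def toy_encode(text: str) -> List[float]:
    """Stand-in for a transformer encoder: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def build_index(docs: Dict[str, str]) -> Dict[str, List[float]]:
    """Offline step: encode every document once and store the vectors."""
    return {doc_id: toy_encode(text) for doc_id, text in docs.items()}


def rank(query: str, index: Dict[str, List[float]]) -> List[str]:
    """Online step: encode only the query, then score by dot product."""
    q = toy_encode(query)
    scores = {
        doc_id: sum(a * b for a, b in zip(q, d)) for doc_id, d in index.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


docs = {"a": "weather praha", "b": "recipe for goulash"}
index = build_index(docs)
ranking = rank("praha weather", index)
```

In a query-doc (cross-encoder) setup the query and document pass through the model together, so nothing on the document side can be precomputed; the two-tower split above is what moves most of the cost offline.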

Code

seznam/dareczech official

Tasks

Document Ranking

Datasets

Introduced in the Paper:

DaReCzech

Results from the Paper

Ranked #1 on Document Ranking on DaReCzech

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Document Ranking	DaReCzech	Query-doc RobeCzech (Roberta-base)	P@10	46.73	#1
Document Ranking	DaReCzech	Query-doc Small-E-Czech (Electra-small)	P@10	46.30	#2
Document Ranking	DaReCzech	Siamese Small-E-Czech (Electra-small)	P@10	45.26	#3
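For readers unfamiliar with the P@10 metric in the table, the following is a minimal sketch of the generic precision-at-k definition. It rests on assumptions: relevance labels are binarised with a hypothetical `threshold`, the score is divided by `k` even when fewer than `k` documents are returned, and the per-query values are averaged. The paper's own evaluation procedure may differ in these details.

```python
from typing import Sequence


def precision_at_k(
    relevance_in_ranked_order: Sequence[float],
    k: int = 10,
    threshold: float = 0.5,  # hypothetical cut-off for "relevant"
) -> float:
    """Fraction of the top-k ranked documents whose label meets the threshold."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = relevance_in_ranked_order[:k]
    hits = sum(1 for label in top if label >= threshold)
    return hits / k


def mean_precision_at_k(
    per_query_labels: Sequence[Sequence[float]], k: int = 10
) -> float:
    """Average P@k over queries; each inner sequence is already sorted by model score."""
    if not per_query_labels:
        raise ValueError("need at least one query")
    return sum(precision_at_k(labels, k) for labels in per_query_labels) / len(
        per_query_labels
    )


# Labels of the documents for one query, in the order the model ranked them.
single = precision_at_k([1, 0, 1, 1, 0, 0, 1, 0, 0, 0])
```

Under this definition, a reported value such as 46.73 would correspond to a mean P@10 of about 0.4673 expressed as a percentage.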

Methods

Adam • Attention Dropout • BERT • Dense Connections • Dropout • GELU • Layer Normalization • Linear Layer • Linear Warmup With Linear Decay • Multi-Head Attention • Residual Connection • Scaled Dot-Product Attention • Softmax • Weight Decay • WordPiece
