The Retrieval Question-Answering (ReQA) benchmark tests a model's ability to efficiently retrieve relevant answers from a large corpus of documents.
10 PAPERS • NO BENCHMARKS YET
MQ2008 is a Learning to Rank dataset containing 800 queries with labelled documents.
27 PAPERS • NO BENCHMARKS YET
The MSLR-WEB10K dataset consists of 10,000 search queries and the documents returned for them. Each query-document pair is described by 136 feature values and a human-judged relevance label on a five-point scale from 0 (irrelevant) to 4 (perfectly relevant). It is a subset of the larger MSLR-WEB30K dataset.
35 PAPERS • NO BENCHMARKS YET
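The MSLR-WEB10K entries above are distributed in the LETOR/SVMlight-style text format: each line holds a relevance label, a `qid`, and `index:value` feature pairs. A minimal parsing sketch (the three-feature example line is hypothetical; real MSLR files carry all 136 features and may append a `#`-prefixed comment):

```python
def parse_letor_line(line: str):
    """Parse one LETOR-format line into (relevance, query_id, features)."""
    # Drop an optional trailing comment such as "#docid = ...".
    body = line.split("#", 1)[0].split()
    relevance = int(body[0])                   # graded label, e.g. 0-4
    query_id = int(body[1].split(":", 1)[1])   # token looks like "qid:10"
    features = {}
    for token in body[2:]:                     # remaining "index:value" pairs
        idx, val = token.split(":", 1)
        features[int(idx)] = float(val)
    return relevance, query_id, features

# Usage on a made-up three-feature line:
rel, qid, feats = parse_letor_line("2 qid:10 1:0.03 2:0.0 3:1.5")
# rel == 2, qid == 10, feats == {1: 0.03, 2: 0.0, 3: 1.5}
```

The same format is used by MQ2007, MQ2008, and the Yahoo! Learning to Rank Challenge data, differing only in the number of features per line.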
The Yahoo! Learning to Rank Challenge dataset consists of 709,877 documents encoded in 700 features and sampled from query logs of the Yahoo! search engine, spanning 29,921 queries.
24 PAPERS • NO BENCHMARKS YET
ART consists of over 20k commonsense narrative contexts and 200k explanations.
9 PAPERS • NO BENCHMARKS YET
The Flickr Cropping Dataset consists of high-quality cropping and pairwise ranking annotations used to evaluate automatic image cropping approaches.
5 PAPERS • NO BENCHMARKS YET
The MQ2007 dataset consists of queries, corresponding retrieved documents and labels provided by human experts. The possible relevance labels for each document are “relevant”, “partially relevant”, and “not relevant”.
30 PAPERS • NO BENCHMARKS YET
A publicly available dataset of naturally occurring factual claims for automatic claim verification. The claims are collected from 26 English-language fact-checking websites, paired with textual sources and rich metadata, and labelled for veracity by expert journalists.
18 PAPERS • NO BENCHMARKS YET
A dataset of English-French sentence pairs annotated with semantic divergence classes and token-level rationales.
3 PAPERS • NO BENCHMARKS YET