MS MARCO (Microsoft Machine Reading Comprehension Dataset)

Introduced by Bajaj et al. in MS MARCO: A Human Generated MAchine Reading COmprehension Dataset

MS MARCO (Microsoft Machine Reading Comprehension) is a collection of datasets focused on deep learning in search. The first dataset was a question answering dataset featuring 100,000 real Bing questions, each paired with a human-generated answer. Over time, the collection was extended with a 1,000,000-question dataset, a natural language generation dataset, a passage ranking dataset, a keyphrase extraction dataset, a crawling dataset, and a conversational search dataset.



