3 dataset results for Supervised Text Retrieval

Reuters-21578

The Reuters-21578 dataset is a collection of news articles. The original corpus has 10,369 documents and a vocabulary of 29,930 words.

63 PAPERS • 6 BENCHMARKS

20 Newsgroups

The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across 20 different newsgroups.

26 PAPERS • 6 BENCHMARKS

COVID-19 Twitter Chatter Dataset

A large-scale curated dataset of over 152 million tweets related to COVID-19 chatter, growing daily. At the time of writing, it covers tweets generated from January 1st to April 4th.

10 PAPERS • 6 BENCHMARKS
