12 dataset results for Hierarchical Multi-label Classification

The RCV1 dataset is a benchmark dataset for text categorization. It is a collection of newswire articles produced by Reuters in 1996-1997. It contains 804,414 manually labeled newswire documents, categorized with respect to three controlled vocabularies: industries, topics, and regions.

320 PAPERS • 6 BENCHMARKS

New York Times Annotated Corpus

The New York Times Annotated Corpus contains over 1.8 million articles written and published by the New York Times between January 1, 1987 and June 19, 2007 with article metadata provided by the New York Times Newsroom, the New York Times Indexing Service and the online production staff at nytimes.com. The corpus includes:

265 PAPERS • 8 BENCHMARKS

WOS (Web of Science Dataset)

Web of Science (WOS) is a document classification dataset that contains 46,985 documents with 134 categories, which include 7 parent categories.

48 PAPERS • 4 BENCHMARKS

Cellcycle Funcat

Hierarchical multi-label classification dataset for functional genomics

2 PAPERS • 1 BENCHMARK

Derisi Funcat