OpenML-CC18

Introduced by Bischl et al. in OpenML Benchmarking Suites

We advocate the use of curated, comprehensive benchmark suites of machine learning datasets, backed by standardized OpenML-based interfaces and complementary software toolkits written in Python, Java and R, and we demonstrate how to run comprehensive benchmarking studies with them. Major distinguishing features of OpenML benchmark suites are (i) ease of use through standardized data formats, APIs, and existing client libraries; (ii) machine-readable meta-information regarding the contents of the suite; and (iii) online sharing of results, enabling large-scale comparisons. As a first such suite, we propose the OpenML-CC18, a machine learning benchmark suite of 72 classification datasets carefully curated from the thousands of datasets on OpenML.
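As a rough illustration of the Python client mentioned above, the following sketch lists the task IDs that make up the suite. It assumes the `openml` Python package is installed and that the suite is registered under the alias `OpenML-CC18`; the helper name `list_cc18_task_ids` is hypothetical, and the call needs network access to the OpenML server.

```python
def list_cc18_task_ids(suite_alias="OpenML-CC18"):
    """Return the OpenML task IDs in the given suite (requires network access).

    The default alias is an assumption; pass a numeric suite ID if preferred.
    """
    # Imported lazily so this module still loads where openml is not installed.
    import openml

    suite = openml.study.get_suite(suite_alias)
    # suite.tasks holds the integer task IDs belonging to the suite.
    return list(suite.tasks)
```

Each returned ID can then be passed to `openml.tasks.get_task` to obtain the task together with its predefined train/test splits.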

The inclusion criteria are:
* classification tasks on dense data sets with independent observations
* number of classes >= 2, each class with at least 20 observations, and the ratio of minority to majority class must exceed 5%
* 500 <= number of observations <= 100000
* number of features after one-hot-encoding < 5000
* no artificial data sets
* no subsets of larger data sets nor binarizations of other data sets
* no data sets which are perfectly predictable by using a single feature or by using a simple decision tree
* source or reference available
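The numeric criteria above can be sketched as a simple check. This is only an illustration, not the curation code used by the authors: the `DatasetSummary` type and `meets_numeric_criteria` function are hypothetical names, and the qualitative criteria (no artificial data, no subsets, not trivially predictable, source available) are not covered because they need manual review.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class DatasetSummary:
    """Hypothetical summary of a candidate classification dataset."""
    n_observations: int
    n_features_one_hot: int          # feature count after one-hot encoding
    class_counts: Dict[str, int]     # observations per class label


def meets_numeric_criteria(d: DatasetSummary) -> bool:
    """Check only the numeric OpenML-CC18 inclusion criteria."""
    counts = list(d.class_counts.values())
    if len(counts) < 2:
        return False                 # need at least two classes
    if min(counts) < 20:
        return False                 # every class needs >= 20 observations
    if min(counts) / max(counts) <= 0.05:
        return False                 # minority/majority ratio must exceed 5%
    if not 500 <= d.n_observations <= 100000:
        return False                 # dataset size bounds
    if d.n_features_one_hot >= 5000:
        return False                 # fewer than 5000 one-hot features
    return True
```

For example, a dataset with 768 observations, 8 features, and classes of size 500 and 268 passes these checks, while one with classes of size 980 and 20 fails on the class ratio.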

If you use this benchmarking suite, please cite:

Bernd Bischl, Giuseppe Casalicchio, Matthias Feurer, Frank Hutter, Michel Lang, Rafael G. Mantovani, Jan N. van Rijn and Joaquin Vanschoren. “OpenML Benchmarking Suites.” arXiv:1708.03731v2 [stat.ML] (2019).

@article{oml-benchmarking-suites,
title={OpenML Benchmarking Suites},
author={Bernd Bischl and Giuseppe Casalicchio and Matthias Feurer and Frank Hutter and Michel Lang and Rafael G. Mantovani and Jan N. van Rijn and Joaquin Vanschoren},
year={2019},
journal={arXiv:1708.03731v2 [stat.ML]}
}
