Tox21

The Tox21 data set comprises 12,060 training samples and 647 test samples that represent chemical compounds. There are 801 "dense features" that represent chemical descriptors, such as molecular weight, solubility or surface area, and 272,776 "sparse features" that represent chemical substructures (ECFP10, DFS6, DFS8; stored in Matrix Market Format ). Machine learning methods can either use sparse or dense data or combine them. For each sample there are 12 binary labels that represent the outcome (active/inactive) of 12 different toxicological experiments. Note that the label matrix contains many missing values (NAs). The original data source and Tox21 challenge site is https://tripod.nih.gov/tox21/challenge/.

Source: Tox21 Machine Learning Data Set

Homepage

Benchmarks

Add a new result Link an existing benchmark

Task	Dataset Variant	Best Model
Molecular Property Prediction	Tox21	BioAct-Het
Drug Discovery	Tox21	elEmBERT-V1
Graph Regression	Tox21	CensNet
Graph Classification	Tox21	GMT
Molecular Property Prediction (1-shot))	Tox21	Meta-MGNN