OQM9HK: A Large-Scale Graph Dataset for Machine Learning in Materials Science
We introduce OQM9HK, a large-scale dataset of quantum-mechanically calculated properties of crystalline materials for graph representation learning, containing approximately 900k entries. The dataset is built from the Open Quantum Materials Database (OQMD) v1.5, which contains more than one million entries, and succeeds OQM6HK, a dataset of approximately 600k entries built from OQMD v1.2. We develop a graph creation algorithm that produces a binary edge-labeled (BEL) graph representing a crystalline material. The BEL graph has a higher capacity to represent crystal structure than edge-unlabeled graphs. In materials property prediction tasks, crystal graph neural networks trained on the BEL graph dataset outperform those trained on the other graph datasets. The OQM9HK graph dataset is available at the Zenodo repository, https://doi.org/10.5281/zenodo.7124330
Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
---|---|---|---|---|---|
Total Magnetization | OQM9HK | CGNN Full Ensemble | MAE | 0.10147 | # 1 |
Total Magnetization | OQM9HK | CGNN Full Ensemble | AUC | 0.96857 | # 1 |
Total Magnetization | OQM9HK | CGNN Trio Ensemble | MAE | 0.10754 | # 2 |
Total Magnetization | OQM9HK | CGNN Trio Ensemble | AUC | 0.96537 | # 2 |
Total Magnetization | OQM9HK | CGNN | MAE | 0.11762 ± 0.00098 | # 3 |
Total Magnetization | OQM9HK | CGNN | AUC | 0.95927 ± 0.00017 | # 3 |
Band Gap | OQM9HK | CGNN Full Ensemble | MAE | 0.4175 | # 1 |
Band Gap | OQM9HK | CGNN Full Ensemble | AUC | 0.97338 | # 1 |
Band Gap | OQM9HK | CGNN Trio Ensemble | MAE | 0.4353 | # 2 |
Band Gap | OQM9HK | CGNN Trio Ensemble | AUC | 0.97127 | # 2 |
Band Gap | OQM9HK | CGNN | MAE | 0.4891 ± 0.0021 | # 3 |
Band Gap | OQM9HK | CGNN | AUC | 0.96449 ± 0.00091 | # 3 |
Formation Energy | OQM9HK | CGNN Full Ensemble | MAE | 0.03433 | # 1 |
Formation Energy | OQM9HK | CGNN Trio Ensemble | MAE | 0.03658 | # 2 |
Formation Energy | OQM9HK | CGNN | MAE | 0.04249 ± 0.00037 | # 3 |
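
For readers unfamiliar with the two metrics in the table, the following is a minimal sketch of how MAE and AUC are conventionally computed, together with one plausible way an ensemble prediction could be formed. It is not the authors' evaluation code. The function names (`mae`, `auc`, `ensemble_mean`) and the toy numbers are hypothetical, and simple averaging across models is an assumption: the listing above does not state how the Full and Trio ensembles combine their members, nor which binary labels the AUC values refer to.

```python
from statistics import mean


def mae(y_true, y_pred):
    """Mean absolute error between targets and predictions."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs in which the
    positive sample gets the higher score; ties count as half a win."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ensemble_mean(predictions):
    """Average per-sample predictions across models.
    ASSUMPTION: plain averaging; the source does not specify the rule."""
    return [mean(per_sample) for per_sample in zip(*predictions)]


# Toy values, purely illustrative (not taken from OQM9HK).
y_true = [0.0, 1.0, 2.0, 3.0]
model_a = [0.5, 1.0, 2.5, 3.0]
model_b = [-0.5, 1.0, 1.5, 3.0]
avg = ensemble_mean([model_a, model_b])

print("single-model MAE:", mae(y_true, model_a))
print("ensemble MAE:", mae(y_true, avg))
print("AUC:", auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

In this toy case the two models err in opposite directions, so averaging cancels their errors. That is the usual intuition for why an ensemble can report a lower MAE than a single model, as the ensemble rows in the table do.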