OQM9HK: A Large-Scale Graph Dataset for Machine Learning in Materials Science

We introduce OQM9HK, a large-scale dataset of quantum-mechanically calculated properties of crystalline materials for graph representation learning, containing approximately 900k entries. The dataset is built from the Open Quantum Materials Database (OQMD) v1.5, which contains more than one million entries, and succeeds OQM6HK, the OQMD v1.2 dataset of approximately 600k entries. We develop a graph creation algorithm that produces a binary edge-labeled (BEL) graph representing a crystalline material. The BEL graph can represent crystal structure more fully than edge-unlabeled graphs. In materials property prediction tasks, crystal graph neural networks trained on the BEL graph dataset outperform those trained on the other graph datasets. The OQM9HK graph dataset is available from the Zenodo repository at https://doi.org/10.5281/zenodo.7124330

Results from the Paper


Task                 Dataset  Model               Metric Name  Metric Value       Global Rank
Total Magnetization  OQM9HK   CGNN Full Ensemble  MAE          0.10147            # 1
Total Magnetization  OQM9HK   CGNN Full Ensemble  AUC          0.96857            # 1
Total Magnetization  OQM9HK   CGNN Trio Ensemble  MAE          0.10754            # 2
Total Magnetization  OQM9HK   CGNN Trio Ensemble  AUC          0.96537            # 2
Total Magnetization  OQM9HK   CGNN                MAE          0.11762 ± 0.00098  # 3
Total Magnetization  OQM9HK   CGNN                AUC          0.95927 ± 0.00017  # 3
Band Gap             OQM9HK   CGNN Full Ensemble  MAE          0.4175             # 1
Band Gap             OQM9HK   CGNN Full Ensemble  AUC          0.97338            # 1
Band Gap             OQM9HK   CGNN Trio Ensemble  MAE          0.4353             # 2
Band Gap             OQM9HK   CGNN Trio Ensemble  AUC          0.97127            # 2
Band Gap             OQM9HK   CGNN                MAE          0.4891 ± 0.0021    # 3
Band Gap             OQM9HK   CGNN                AUC          0.96449 ± 0.00091  # 3
Formation Energy     OQM9HK   CGNN Full Ensemble  MAE          0.03433            # 1
Formation Energy     OQM9HK   CGNN Trio Ensemble  MAE          0.03658            # 2
Formation Energy     OQM9HK   CGNN                MAE          0.04249 ± 0.00037  # 3
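
The table reports MAE and AUC for single models and for ensembles, but this page does not say how the ensembles are formed or how the AUC labels are defined. As a rough illustration only, the Python sketch below assumes an ensemble that averages the predictions of its member models, and it computes MAE and a pairwise (rank-based) ROC AUC on toy numbers. The function names and all values are hypothetical and are not taken from the paper or the dataset.

```python
from statistics import fmean


def mean_absolute_error(y_true, y_pred):
    """Mean of |target - prediction| over all samples."""
    return fmean(abs(t - p) for t, p in zip(y_true, y_pred))


def ensemble_average(predictions):
    """Average per-sample predictions across models.

    Assumption: the ensemble is a plain mean of member outputs;
    the source page does not confirm this.
    """
    return [fmean(per_sample) for per_sample in zip(*predictions)]


def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy regression targets and two hypothetical model outputs.
y_true = [0.0, 1.0, 2.0, 3.0]
model_a = [0.2, 0.8, 2.4, 3.0]
model_b = [-0.2, 1.4, 2.0, 2.6]

mae_a = mean_absolute_error(y_true, model_a)
mae_b = mean_absolute_error(y_true, model_b)
mae_ensemble = mean_absolute_error(y_true, ensemble_average([model_a, model_b]))

# Toy binary labels and scores for the AUC helper.
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.6, 0.4, 0.9])

print(mae_a, mae_b, mae_ensemble, toy_auc)
```

In this toy case the averaged prediction has a lower MAE than either member because the two models err in opposite directions on several samples. Averaging does not guarantee an improvement in general.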
