OGB-LSC: A Large-Scale Challenge for Machine Learning on Graphs

17 Mar 2021 · Weihua Hu, Matthias Fey, Hongyu Ren, Maho Nakata, Yuxiao Dong, Jure Leskovec

Enabling effective and efficient machine learning (ML) over large-scale graph data (e.g., graphs with billions of edges) can have a great impact on both industrial and scientific applications. However, existing efforts to advance large-scale graph ML have been largely limited by the lack of a suitable public benchmark. Here we present OGB Large-Scale Challenge (OGB-LSC), a collection of three real-world datasets for facilitating the advancements in large-scale graph ML. The OGB-LSC datasets are orders of magnitude larger than existing ones, covering three core graph learning tasks -- link prediction, graph regression, and node classification. Furthermore, we provide dedicated baseline experiments, scaling up expressive graph ML models to the massive datasets. We show that expressive models significantly outperform simple scalable baselines, indicating an opportunity for dedicated efforts to further improve graph ML at scale. Moreover, OGB-LSC datasets were deployed at ACM KDD Cup 2021 and attracted more than 500 team registrations globally, during which significant performance improvements were made by a variety of innovative techniques. We summarize the common techniques used by the winning solutions and highlight the current best practices in large-scale graph ML. Finally, we describe how we have updated the datasets after the KDD Cup to further facilitate research advances. The OGB-LSC datasets, baseline code, and all the information about the KDD Cup are available at https://ogb.stanford.edu/docs/lsc/ .


Datasets


Introduced in the Paper:

OGB-LSC, PCQM4Mv2-LSC

Used in the Paper:

OGB

Results from the Paper


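| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Node Classification | MAG240M-LSC | GraphSAGE (NS) | Test Accuracy | 66.25 | #3 |
| Node Classification | MAG240M-LSC | SIGN | Validation Accuracy | 66.64 | #1 |
| Node Classification | MAG240M-LSC | SIGN | Test Accuracy | 66.09 | #4 |
| Node Classification | MAG240M-LSC | R-GraphSAGE (NS) | Test Accuracy | 68.94 | #1 |
| Node Classification | MAG240M-LSC | GAT (NS) | Test Accuracy | 66.63 | #2 |
| Graph Regression | PCQM4M-LSC | GCN-Virtual | Validation MAE | 0.1536 | #7 |
| Graph Regression | PCQM4M-LSC | GCN-Virtual | Test MAE | 0.1579 | #3 |
| Graph Regression | PCQM4M-LSC | GIN-virtual | Validation MAE | 0.1396 | #6 |
| Graph Regression | PCQM4M-LSC | GIN-virtual | Test MAE | 0.1487 | #2 |
| Graph Regression | PCQM4M-LSC | GIN | Test MAE | 0.1678 | #4 |
| Graph Regression | PCQM4M-LSC | GCN | Validation MAE | 0.1684 | #8 |
| Graph Regression | PCQM4M-LSC | GCN | Test MAE | 0.1838 | #5 |
| Graph Regression | PCQM4M-LSC | MLP-fingerprint | Validation MAE | 0.2044 | #9 |
| Graph Regression | PCQM4M-LSC | MLP-fingerprint | Test MAE | 0.2068 | #6 |
| Graph Regression | PCQM4Mv2-LSC | MLP-Fingerprint | Validation MAE | 0.1753 | #18 |
| Graph Regression | PCQM4Mv2-LSC | MLP-Fingerprint | Test MAE | 0.1760 | #13 |
| Knowledge Graphs | WikiKG90M-LSC | ComplEx-Concat | Validation MRR | 0.8425 | #2 |
| Knowledge Graphs | WikiKG90M-LSC | ComplEx-Concat | Test MRR | 0.8637 | #2 |
| Knowledge Graphs | WikiKG90M-LSC | TransE-Concat | Validation MRR | 0.8494 | #1 |
| Knowledge Graphs | WikiKG90M-LSC | TransE-Concat | Test MRR | 0.8548 | #1 |
| Knowledge Graphs | WikiKG90M-LSC | ComplEx-RoBERTa | Validation MRR | 0.7052 | #3 |
| Knowledge Graphs | WikiKG90M-LSC | ComplEx-RoBERTa | Test MRR | 0.7186 | #3 |
| Knowledge Graphs | WikiKG90M-LSC | TransE-RoBERTa | Validation MRR | 0.6039 | #4 |
| Knowledge Graphs | WikiKG90M-LSC | TransE-RoBERTa | Test MRR | 0.6288 | #4 |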
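For readers unfamiliar with the three metrics reported above, the following is a minimal Python sketch of how accuracy, mean absolute error (MAE), and mean reciprocal rank (MRR) are commonly defined. The function names are illustrative assumptions and are not the OGB-LSC evaluator API; the official evaluators may differ in details such as how candidate entities are chosen when ranking for MRR.

```python
from typing import Sequence


def accuracy_percent(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Percentage of predictions equal to the true label (node classification)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return 100.0 * correct / len(y_true)


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between target and prediction (graph regression)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank, where rank is the 1-based position of the correct
    answer among the scored candidates (link prediction). Always in (0, 1]."""
    if len(ranks) == 0 or any(r < 1 for r in ranks):
        raise ValueError("ranks must be non-empty and 1-based")
    return sum(1.0 / r for r in ranks) / len(ranks)


if __name__ == "__main__":
    # Toy inputs, hypothetical and unrelated to the leaderboard numbers.
    print(accuracy_percent([0, 1, 2, 2], [0, 1, 1, 2]))   # 75.0
    print(mean_absolute_error([1.0, 2.0], [1.5, 1.0]))     # 0.75
    print(mean_reciprocal_rank([1, 2, 4]))
```

Under these definitions, higher is better for accuracy and MRR, lower is better for MAE, and MRR cannot exceed 1.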
