N-Gram Graph: Simple Unsupervised Representation for Graphs, with Applications to Molecules

Machine learning techniques have recently been adopted in various applications in medicine, biology, chemistry, and material engineering. An important task is to predict the properties of molecules, which serves as the main subroutine in many downstream applications such as virtual screening and drug design. Despite the increasing interest, the key challenge is to construct proper representations of molecules for learning algorithms. This paper introduces the N-gram graph, a simple unsupervised representation for molecules. The method first embeds the vertices in the molecule graph. It then constructs a compact representation for the graph by assembling the vertex embeddings in short walks in the graph, which we show is equivalent to a simple graph neural network that needs no training. The representations can thus be efficiently computed and then used with supervised learning methods for prediction. Experiments on 60 tasks from 10 benchmark datasets demonstrate its advantages over both popular graph neural networks and traditional representation methods. This is complemented by theoretical analysis showing its strong representation and prediction power.
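The two steps in the abstract (embed the vertices, then assemble the embeddings along short walks) can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formulas, so the details here are assumptions made for illustration: a walk's embedding is taken to be the element-wise product of its vertex embeddings, walks of each length are summed, and the per-length sums are concatenated. The function name `ngram_graph_embedding`, the parameter `max_len`, and the toy inputs are hypothetical, and the vertex embeddings are assumed to be given rather than learned here.

```python
import numpy as np


def ngram_graph_embedding(vertex_emb, adj, max_len=3):
    """Training-free graph representation from vertex embeddings (illustrative sketch).

    vertex_emb: (num_vertices, dim) array, one embedding per vertex (assumed given).
    adj:        (num_vertices, num_vertices) 0/1 adjacency matrix.
    max_len:    longest walk length, counted in vertices (hypothetical parameter).
    Returns a vector of length max_len * dim.
    """
    vertex_emb = np.asarray(vertex_emb, dtype=float)
    adj = np.asarray(adj, dtype=float)

    # Row i holds the summed embeddings of all walks of the current length ending at vertex i.
    walk_emb = vertex_emb.copy()          # walks of length 1 are the vertices themselves
    parts = [walk_emb.sum(axis=0)]

    for _ in range(2, max_len + 1):
        # Extend every walk by one edge: gather the walks ending at each neighbour,
        # then multiply element-wise by the embedding of the newly added vertex.
        # Assumption: a walk's embedding is the product of its vertex embeddings.
        walk_emb = (adj @ walk_emb) * vertex_emb
        parts.append(walk_emb.sum(axis=0))

    # Concatenate the per-length sums into one fixed-size representation.
    return np.concatenate(parts)


# Toy example: a 3-vertex path graph 0 - 1 - 2 with 2-dimensional embeddings.
emb = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
rep = ngram_graph_embedding(emb, adj, max_len=3)  # vector of length 3 * 2 = 6
```

Each loop iteration is one neighbour aggregation followed by an element-wise product, which is the sense in which such a computation resembles a graph neural network layer with no trainable weights. In this matrix form every walk is counted once per direction and backtracking walks are included; whether the original method handles these cases the same way is not stated in the abstract. The resulting vector could then be passed to a supervised learner such as a random forest or gradient-boosted trees, in line with the N-GramRF and N-GramXGB model names in the results tables.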

NeurIPS 2019
| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Molecular Property Prediction | FreeSolv | N-GramXGB | RMSE | 5.061 | #9 |
| Molecular Property Prediction | QM8 | N-GramRF | MAE | 0.0236 | #8 |
| Molecular Property Prediction | QM8 | N-GramXGB | MAE | 0.0215 | #5 |
| Molecular Property Prediction | QM9 | N-GramXGB | MAE | 0.00964 | #5 |
| Molecular Property Prediction | QM9 | N-GramRF | MAE | 0.01037 | #8 |

Results from Other Papers


| Task | Dataset | Model | Metric Name | Metric Value | Rank |
|---|---|---|---|---|---|
| Molecular Property Prediction | BACE | N-GramXGB | ROC-AUC | 79.1 | #10 |
| Molecular Property Prediction | BACE | N-GramRF | ROC-AUC | 77.9 | #11 |
| Molecular Property Prediction | BBBP | N-GramRF | ROC-AUC | 69.7 | #9 |
| Molecular Property Prediction | BBBP | N-GramXGB | ROC-AUC | 69.1 | #11 |
| Molecular Property Prediction | ClinTox | N-GramRF | ROC-AUC | 77.5 | #12 |
| Molecular Property Prediction | ClinTox | N-GramXGB | ROC-AUC | 87.5 | #7 |
| Molecular Property Prediction | FreeSolv | N-GramRF | RMSE | 2.688 | #7 |
| Molecular Property Prediction | Lipophilicity | N-GramXGB | RMSE | 2.072 | #10 |
| Molecular Property Prediction | Lipophilicity | N-GramRF | RMSE | 0.812 | #7 |
| Molecular Property Prediction | QM7 | N-GramRF | MAE | 92.8 | #5 |
| Molecular Property Prediction | QM7 | N-GramXGB | MAE | 81.9 | #3 |
| Molecular Property Prediction | SIDER | N-GramXGB | ROC-AUC | 65.5 | #7 |
| Molecular Property Prediction | SIDER | N-GramRF | ROC-AUC | 66.8 | #5 |
| Molecular Property Prediction | Tox21 | N-GramXGB | ROC-AUC | 75.8 | #9 |
| Molecular Property Prediction | Tox21 | N-GramRF | ROC-AUC | 74.3 | #10 |
