Analyzing Learned Molecular Representations for Property Prediction

Advancements in neural machinery have led to a wide range of algorithmic solutions for molecular property prediction. Two classes of models in particular have yielded promising results: neural networks applied to computed molecular fingerprints or expert-crafted descriptors, and graph convolutional neural networks that construct a learned molecular representation by operating on the graph structure of the molecule. However, recent literature has yet to clearly determine which of these two methods is superior when generalizing to new chemical space. Furthermore, prior research has rarely examined these new models in industry research settings in comparison to existing employed models. In this paper, we benchmark models extensively on 19 public and 16 proprietary industrial datasets spanning a wide variety of chemical endpoints. In addition, we introduce a graph convolutional model that consistently matches or outperforms models using fixed molecular descriptors as well as previous graph neural architectures on both public and proprietary datasets. Our empirical findings indicate that while approaches based on these representations have yet to reach the level of experimental reproducibility, our proposed model nevertheless offers significant improvements over models currently used in industrial workflows.
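To make concrete what "operating on the graph structure of the molecule" can look like, here is a minimal sketch of one generic message-passing round on a toy graph. It is not the paper's D-MPNN: there are no learned weights, and the atom features, the sum aggregation, and the function names (`one_hot`, `message_passing_step`, `readout`) are all assumptions chosen for illustration.

```python
# Toy heavy-atom graph for ethanol (C-C-O); hydrogens omitted.
# Illustrative only: a generic neighbor-sum update, not the paper's model.
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]
ELEMENTS = ["C", "O"]  # hypothetical, minimal feature vocabulary


def one_hot(symbol):
    """Encode an element symbol as a one-hot feature vector."""
    return [1.0 if symbol == e else 0.0 for e in ELEMENTS]


def neighbors(n_atoms, bonds):
    """Build an undirected adjacency list from a list of bonds."""
    adj = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    return adj


def message_passing_step(h, adj):
    """Update each atom as its own state plus the sum of its neighbors' states."""
    new_h = []
    for v, h_v in enumerate(h):
        msg = [0.0] * len(h_v)
        for w in adj[v]:
            msg = [m + x for m, x in zip(msg, h[w])]
        new_h.append([a + b for a, b in zip(h_v, msg)])
    return new_h


def readout(h):
    """Sum atom states into a single molecule-level vector."""
    return [sum(col) for col in zip(*h)]


h0 = [one_hot(a) for a in atoms]
adj = neighbors(len(atoms), bonds)
h1 = message_passing_step(h0, adj)
mol_vec = readout(h1)
print(mol_vec)
```

A fixed fingerprint, by contrast, is computed once by a predefined rule and then fed to a standard network. In a learned representation, the update and readout steps would carry trainable parameters fitted to the property being predicted.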

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
| Molecular Property Prediction | QM8 | D-MPNN | MAE | 0.0190 | #3 |
| Molecular Property Prediction | QM9 | D-MPNN | MAE | 0.00814 | #3 |

Results from Other Papers


| Task | Dataset | Model | Metric Name | Metric Value | Rank |
| Molecular Property Prediction | BACE | D-MPNN | ROC-AUC | 80.9 | #8 |
| Molecular Property Prediction | BBBP | D-MPNN | ROC-AUC | 71.0 | #7 |
| Molecular Property Prediction | ClinTox | D-MPNN | ROC-AUC | 90.6 | #5 |
| Molecular Property Prediction | ESOL | D-MPNN | RMSE | 1.050 | #5 |
| Molecular Property Prediction | FreeSolv | D-MPNN | RMSE | 2.082 | #4 |
| Molecular Property Prediction | Lipophilicity | D-MPNN | RMSE | 0.683 | #3 |
| Molecular Property Prediction | QM7 | D-MPNN | MAE | 103.5 | #7 |
| Molecular Property Prediction | SIDER | D-MPNN | ROC-AUC | 57.0 | #14 |
| Molecular Property Prediction | Tox21 | D-MPNN | ROC-AUC | 75.9 | #8 |
| Molecular Property Prediction | ToxCast | D-MPNN | ROC-AUC | 65.5 | #4 |
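The results above use three metrics: MAE and RMSE for regression (lower is better) and ROC-AUC for classification (higher is better). The following is a minimal, dependency-free sketch of how each is commonly computed. The function names and the toy inputs are assumptions for illustration, not the evaluation code behind the listed numbers. The ROC-AUC values in the table appear to be reported on a 0 to 100 scale, whereas the sketch returns a value between 0 and 1.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average of |target - prediction|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared difference."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    Requires at least one positive (label 1) and one negative (label 0).
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy values, made up for illustration only.
print(mae([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
print(rmse([0.0, 0.0], [3.0, 4.0]))
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise ROC-AUC is quadratic in the number of samples, which is fine for a sketch. A sort-based implementation is the usual choice for large datasets.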
