Uni-Mol: A Universal 3D Molecular Representation Learning Framework

Molecular representation learning (MRL) has gained tremendous attention due to its critical role in learning from limited supervised data for applications like drug design. Most MRL methods treat molecules as 1D sequential tokens or 2D topology graphs, which limits their ability to incorporate 3D information for downstream tasks and, in particular, makes 3D geometry prediction or generation almost impossible. Herein, we propose Uni-Mol, a universal MRL framework that significantly enlarges the representation ability and application scope of MRL schemes. Uni-Mol is composed of two models with the same SE(3)-equivariant transformer architecture: a molecular pretraining model trained on 209M molecular conformations, and a pocket pretraining model trained on 3M candidate protein pockets. The two models are used independently for separate tasks and are combined for protein-ligand binding tasks. By properly incorporating 3D information, Uni-Mol outperforms the state of the art (SOTA) in 14 of 15 molecular property prediction tasks. Moreover, Uni-Mol achieves superior performance in 3D spatial tasks, including protein-ligand binding pose prediction and molecular conformation generation. Finally, we show that Uni-Mol can be successfully applied to tasks with few-shot data, such as pocket druggability prediction. The model and data will be made publicly available at https://github.com/dptech-corp/Uni-Mol.
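
The abstract does not say how 3D information enters the model, so the following is only an illustrative sketch of the symmetry that "SE(3)" refers to, not Uni-Mol's implementation. It shows that interatomic distances stay the same when a conformation is rotated and translated, which is why distance-based features are a common choice for 3D-aware models. The function names `pairwise_distances` and `rigid_transform` and the toy coordinates are hypothetical.

```python
import math

def pairwise_distances(coords):
    """Return the full matrix of Euclidean distances between 3D points."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def rigid_transform(coords, angle, shift):
    """Rotate points about the z-axis by `angle` (radians), then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
        for x, y, z in coords
    ]

def max_abs_difference(m1, m2):
    """Largest elementwise absolute difference between two equally sized matrices."""
    return max(abs(a - b) for r1, r2 in zip(m1, m2) for a, b in zip(r1, r2))

# Toy three-atom "conformation" (assumed coordinates, arbitrary units).
atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
moved = rigid_transform(atoms, angle=0.7, shift=(1.0, -2.0, 3.0))

# The coordinates change, but the distance matrix matches up to rounding error.
difference = max_abs_difference(pairwise_distances(atoms), pairwise_distances(moved))
print(f"max distance change after rigid motion: {difference:.2e}")
```

A model that consumes only such distances gives the same output for every rigid placement of the same conformation; whether and how Uni-Mol does this is described in the paper, not here.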

Results from the Paper


Task                           Dataset        Model    Metric Name    Metric Value  Global Rank
Molecular Property Prediction  BACE           Uni-Mol  ROC-AUC        85.7          # 2
Molecular Property Prediction  BBBP           Uni-Mol  ROC-AUC        72.9          # 11
Molecular Property Prediction  ClinTox        Uni-Mol  ROC-AUC        91.9          # 4
                                                       Molecules (M)  19            # 3
Molecular Property Prediction  ESOL           Uni-Mol  RMSE           0.788         # 1
Molecular Property Prediction  FreeSolv       Uni-Mol  RMSE           1.620         # 2
Molecular Property Prediction  HIV            Uni-Mol  ROC-AUC        80.8          # 1
Molecular Property Prediction  Lipophilicity  Uni-Mol  RMSE           0.603         # 1
Molecular Property Prediction  MUV            Uni-Mol  ROC-AUC        82.1          # 1
Molecular Property Prediction  PCBA           Uni-Mol  ROC-AUC        88.5          # 1
Molecular Property Prediction  QM7            Uni-Mol  MAE            41.8          # 1
Molecular Property Prediction  QM8            Uni-Mol  MAE            0.0156        # 1
Molecular Property Prediction  QM9            Uni-Mol  MAE            0.00467       # 1
Molecular Property Prediction  SIDER          Uni-Mol  ROC-AUC        65.9          # 6
Molecular Property Prediction  Tox21          Uni-Mol  ROC-AUC        79.6          # 3
Molecular Property Prediction  ToxCast        Uni-Mol  ROC-AUC        69.6          # 1
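
For readers unfamiliar with the three metrics in the table, the sketch below shows their standard textbook definitions in plain Python. It is not the evaluation code behind these numbers; the function names and toy inputs are hypothetical. ROC-AUC is returned here as a fraction in [0, 1], whereas the table appears to report it multiplied by 100.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error (lower is better)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error (lower is better)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. Labels must be 0 or 1, with both classes present.
    This pairwise form is quadratic in the sample count, which is fine for a demo.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy regression and classification examples (made-up values).
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
print(mae([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

RMSE penalizes large individual errors more heavily than MAE, which is one reason both appear on regression benchmarks.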
