LEP-AD: Language Embedding of Proteins and Attention to Drugs predicts drug target interactions

Predicting drug-target interactions is a central challenge in drug development and lead optimization. Recent advances include training algorithms to learn drug-target interactions from data and from molecular simulations. Here we use Evolutionary Scale Modeling (ESM-2) to build a Transformer protein language model for drug-target interaction prediction. Our architecture, LEP-AD, combines pre-trained ESM-2 protein embeddings with a Transformer-GCN drug encoder to predict binding affinity values. We report state-of-the-art results against competing methods such as SimBoost, DeepCPI, Attention-DTA, and GraphDTA, among others, on multiple datasets, including Davis, KIBA, DTC, Metz, ToxCast, and STITCH. Finally, we find that the pre-trained protein-embedding model (LEP-AD) outperforms a variant using an explicit AlphaFold 3D representation of proteins (LEP-AD supervised by AlphaFold). The LEP-AD model scales favorably in performance with the size of the training data. Code is available at https://github.com/adaga06/LEP-AD
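For illustration, the protein-embedding step described above can be sketched with the publicly released fair-esm package. The specific checkpoint (esm2_t33_650M_UR50D), the example sequence, and the mean pooling over residues are assumptions made for this sketch, not details confirmed by the paper or repository.

```python
# A minimal sketch of extracting a fixed-size ESM-2 protein embedding,
# assuming the fair-esm package (pip install fair-esm). The checkpoint
# and pooling strategy are illustrative choices, not LEP-AD's exact setup.
import torch
import esm

# Load a pre-trained ESM-2 model and its alphabet (tokenizer).
model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()
batch_converter = alphabet.get_batch_converter()
model.eval()

# Hypothetical target sequence used only as an example input.
data = [("protein1", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")]
_, _, tokens = batch_converter(data)

with torch.no_grad():
    out = model(tokens, repr_layers=[33])

# Mean-pool the per-residue embeddings (excluding BOS/EOS tokens)
# into a single protein representation vector.
reps = out["representations"][33]
seq_len = len(data[0][1])
protein_embedding = reps[0, 1 : seq_len + 1].mean(dim=0)  # shape: (1280,)
```

A vector like this could then be concatenated with a learned drug representation and passed to a regression head to predict binding affinity.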


Datasets

Davis, KIBA, DTC, Metz, ToxCast, and STITCH
Results from the Paper


Task                     Dataset    Model   Metric  Value  Global Rank
Protein Language Model   DAVIS-DTA  LEP-AD  CI      89.5   #1
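Here CI denotes the concordance index: the fraction of comparable affinity pairs whose predicted ordering matches the true ordering (89.5 presumably corresponds to a CI of 0.895 expressed as a percentage). A minimal reference implementation of this standard metric, not taken from the LEP-AD code, might look like:

```python
# A minimal sketch of the concordance index (CI) used to score
# affinity predictions; this follows the standard definition and is
# not code from the LEP-AD repository.
import numpy as np

def concordance_index(y_true, y_pred):
    """Fraction of pairs with y_true[i] > y_true[j] whose predictions
    are correctly ordered; prediction ties count as 0.5."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    num, den = 0.0, 0.0
    for i in range(len(y_true)):
        for j in range(len(y_true)):
            if y_true[i] > y_true[j]:  # comparable pair
                den += 1
                if y_pred[i] > y_pred[j]:
                    num += 1
                elif y_pred[i] == y_pred[j]:
                    num += 0.5
    return num / den if den > 0 else 0.0

# Example: perfectly ordered predictions give CI = 1.0.
print(concordance_index([5.0, 6.2, 7.1], [0.1, 0.5, 0.9]))  # 1.0
```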

Methods

ESM-2 (pre-trained protein language model), Transformer, GCN