ProGAE: A Geometric Autoencoder-based Generative Model for Disentangling Protein Dynamics

1 Jan 2021 · Norman Joseph Tatro, Payel Das, Pin-Yu Chen, Vijil Chenthamarakshan, Rongjie Lai

Understanding molecular dynamics is critical, especially for large flexible molecules like proteins. Protein function, as well as its modulation by ligand binding or changes in environment, is intimately connected with these dynamics. This work focuses on training a generative neural network on molecular simulation trajectories to characterize the distinct dynamics of a protein bound to various drug molecules. Specifically, we use a geometric autoencoder framework to learn separate latent space encodings of the intrinsic and extrinsic geometries of the system. For this purpose, the proposed Protein Geometric AutoEncoder (ProGAE) model is trained on the lengths of the alpha-carbon pseudobonds and the orientations of the backbone bonds of the protein. Using ProGAE latent embeddings, we reconstruct protein simulation trajectories at or near the experimental resolution. Because the learned latent space is disentangled, the extrinsic latent embedding can be used successfully for classification or property prediction of different drugs bound to a specific protein. Additionally, ProGAE can be transferred to a trajectory of a different state of the same protein or to a completely different protein, where only the dense layer decoding from the latent representation needs to be retrained. Results show that our geometric learning-based method is both accurate and efficient at generating complex molecular dynamics, charting the path toward scalable and improved approaches for analyzing and enhancing molecular simulations.
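
To make the two input signals concrete, the following is a minimal sketch of how pseudobond lengths and bond orientations might be extracted from a single simulation frame. It is an illustration under stated assumptions, not the ProGAE implementation: the function name `geometric_features`, the array layouts, and the choice to represent orientation as unit bond vectors are all hypothetical, and the random coordinates stand in for real trajectory data.

```python
import numpy as np


def geometric_features(ca_coords, backbone_coords):
    """Split one frame into intrinsic and extrinsic signals (hypothetical helper).

    ca_coords:       (n_residues, 3) alpha-carbon positions, in chain order.
    backbone_coords: (n_atoms, 3) backbone atom positions, in chain order.
    """
    # Intrinsic signal: lengths of the pseudobonds joining consecutive
    # alpha-carbons. These are unchanged by rigid rotation or translation.
    ca_vecs = np.diff(ca_coords, axis=0)
    intrinsic = np.linalg.norm(ca_vecs, axis=1)

    # Extrinsic signal: orientation of each backbone bond, represented here
    # (an assumption) as a unit vector. These change when the frame rotates.
    bb_vecs = np.diff(backbone_coords, axis=0)
    extrinsic = bb_vecs / np.linalg.norm(bb_vecs, axis=1, keepdims=True)
    return intrinsic, extrinsic


# Toy frame: random points standing in for real coordinates.
rng = np.random.default_rng(0)
backbone = rng.normal(size=(12, 3))   # 4 residues x (N, CA, C), assumed order
ca = backbone[1::3]                   # every third atom taken as the CA

lengths, orientations = geometric_features(ca, backbone)
print(lengths.shape, orientations.shape)
```

The split mirrors the intrinsic/extrinsic distinction in the abstract: rotating the whole frame leaves `lengths` untouched while rotating every row of `orientations`.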
