no code implementations • 19 Apr 2022 • Vijil Chenthamarakshan, Samuel C. Hoffman, C. David Owen, Petra Lukacik, Claire Strain-Damerell, Daren Fearon, Tika R. Malla, Anthony Tumber, Christopher J. Schofield, Helen M. E. Duyvesteyn, Wanwisa Dejnirattisai, Loic Carrique, Thomas S. Walter, Gavin R. Screaton, Tetiana Matviiuk, Aleksandra Mojsilovic, Jason Crain, Martin A. Walsh, David I. Stuart, Payel Das
The COVID-19 pandemic has highlighted the urgency for developing more efficient molecular discovery pathways.
Additionally, our multi-task approach is comprehensive in the sense that it is comparable to state-of-the-art approaches for specific endpoints in in vitro, in vivo and clinical platforms.
Learning effective protein representations is critical in a variety of tasks in biology such as predicting protein function or structure.
Photo-acid generators (PAGs) are compounds that release acids ($H^+$ ions) when exposed to light.
Here we consider three recently proposed deep generative frameworks for protein design: (AR) the sequence-based autoregressive generative model, (GVP) the precise structure-based graph neural network, and Fold2Seq that leverages a fuzzy and scale-free representation of a three-dimensional fold, while enforcing structure-to-sequence (and vice versa) consistency.
Designing novel protein sequences for a desired 3D topological fold is a fundamental yet non-trivial task in protein engineering.
Various representation learning methods in a supervised setting, including the features extracted using graph neural nets, have emerged for such tasks.
Deep generative models have emerged as a powerful tool for learning useful molecular representations and designing novel molecules with desired properties, with applications in drug discovery and material design.
Empowered by the disentangled latent space learning, the extrinsic latent embedding is successfully used for classification or property prediction of different drugs bound to a specific protein.
Machine learning based methods have shown potential for optimizing existing molecules with more desirable properties, a critical step towards accelerating new chemical discovery.
In this work, we propose a method for measuring how well the latent space of deep generative models is able to encode structural and chemical features of molecular datasets by correlating latent space metrics with metrics from the field of topological data analysis (TDA).
Chemical toxicity prediction using machine learning is important in drug development to reduce repeated animal and human testing, thus saving cost and time.
2 code implementations • 22 May 2020 • Payel Das, Tom Sercu, Kahini Wadhawan, Inkit Padhi, Sebastian Gehrmann, Flaviu Cipcigan, Vijil Chenthamarakshan, Hendrik Strobelt, Cicero dos Santos, Pin-Yu Chen, Yi Yan Yang, Jeremy Tan, James Hedrick, Jason Crain, Aleksandra Mojsilovic
De novo therapeutic design is challenged by a vast chemical repertoire and multiple constraints, e. g., high broad-spectrum potency and low toxicity.
Generative feature matching network (GFMN) is an approach for training implicit generative models for images by performing moment matching on features from pre-trained neural networks.
no code implementations • • Vijil Chenthamarakshan, Payel Das, Samuel C. Hoffman, Hendrik Strobelt, Inkit Padhi, Kar Wai Lim, Benjamin Hoover, Matteo Manica, Jannis Born, Teodoro Laino, Aleksandra Mojsilovic
CogMol also includes insilico screening for assessing toxicity of parent molecules and their metabolites with a multi-task toxicity classifier, synthetic feasibility with a chemical retrosynthesis predictor, and target structure binding with docking simulations.
We present the pipeline in an interactive visual tool to enable the exploration of the metrics, analysis of the learned latent space, and selection of the best model for a given task.
Though SSG is sequential in nature, it does not penalize the ordering of the appearance of the set elements and can be applied to a variety of set output problems, such as a set of classification labels or sequences.
Our model learns a rich latent space of the biological peptide context by taking advantage of abundant, unlabeled peptide sequences.
In this paper, we introduce the Fairness GAN, an approach for generating a dataset that is plausibly similar to a given multimedia dataset, but is more fair with respect to protected attributes in allocative decision making.
State of the art approaches for (embedding based) unsupervised semantic search exploits either compositional similarity (of a query and a passage) or pair-wise word (or term) similarity (from the query and the passage).
In traditional rule based IE frameworks, these layout cues are mapped to rules that operate on the HTML source of the webpages.