1 code implementation • 2 Dec 2024 • Zuobai Zhang, Pascal Notin, Yining Huang, Aurélie Lozano, Vijil Chenthamarakshan, Debora Marks, Payel Das, Jian Tang
Designing novel functional proteins crucially depends on accurately modeling their fitness landscape.
no code implementations • 4 Apr 2024 • Jerret Ross, Brian Belgodere, Samuel C. Hoffman, Vijil Chenthamarakshan, Youssef Mroueh, Payel Das
Impressively, we find GP-MoLFormer is able to generate a significant fraction of novel, valid, and unique SMILES even when the number of generated molecules is in the 10 billion range and the reference set is over a billion.
no code implementations • 18 Mar 2024 • Payel Das, Subhajit Chaudhury, Elliot Nelson, Igor Melnyk, Sarath Swaminathan, Sihui Dai, Aurélie Lozano, Georgios Kollias, Vijil Chenthamarakshan, Jiří, Navrátil, Soham Dan, Pin-Yu Chen
Efficient and accurate updating of knowledge stored in Large Language Models (LLMs) is one of the most pressing research challenges today.
no code implementations • 10 Feb 2024 • Zuobai Zhang, Jiarui Lu, Vijil Chenthamarakshan, Aurélie Lozano, Payel Das, Jian Tang
Protein function annotation is an important yet challenging task in biology.
1 code implementation • 7 Feb 2024 • Zuobai Zhang, Jiarui Lu, Vijil Chenthamarakshan, Aurélie Lozano, Payel Das, Jian Tang
To address this issue, we introduce the integration of remote homology detection to distill structural information into protein language models without requiring explicit protein structures as input.
3 code implementations • 11 Mar 2023 • Zuobai Zhang, Chuanrui Wang, Minghao Xu, Vijil Chenthamarakshan, Aurélie Lozano, Payel Das, Jian Tang
Recent sequence representation learning methods based on Protein Language Models (PLMs) excel in sequence-based tasks, but their direct adaptation to tasks involving protein structures remains a challenge.
1 code implementation • NeurIPS 2023 • Zuobai Zhang, Minghao Xu, Aurélie Lozano, Vijil Chenthamarakshan, Payel Das, Jian Tang
Considering the essential protein conformational variations, we enhance DiffPreT by a method called Siamese Diffusion Trajectory Prediction (SiamDiff) to capture the correlation between different conformers of a protein.
no code implementations • 13 Oct 2022 • Sourya Basu, Prasanna Sattigeri, Karthikeyan Natesan Ramamurthy, Vijil Chenthamarakshan, Kush R. Varshney, Lav R. Varshney, Payel Das
We also provide a novel group-theoretic definition for fairness in NLG.
1 code implementation • 5 Oct 2022 • Igor Melnyk, Vijil Chenthamarakshan, Pin-Yu Chen, Payel Das, Amit Dhurandhar, Inkit Padhi, Devleena Das
Results on antibody design benchmarks show that our model on low-resourced antibody sequence dataset provides highly diverse CDR sequences, up to more than a two-fold increase of diversity over the baselines, without losing structural integrity and naturalness.
1 code implementation • 5 Oct 2022 • Igor Melnyk, Aurelie Lozano, Payel Das, Vijil Chenthamarakshan
This model can then be used as a structure consistency regularizer in training the inverse folding model.
no code implementations • 13 Aug 2022 • Brian Belgodere, Vijil Chenthamarakshan, Payel Das, Pierre Dognin, Toby Kurien, Igor Melnyk, Youssef Mroueh, Inkit Padhi, Mattia Rigotti, Jarret Ross, Yair Schiff, Richard A. Young
With the prospect of automating a number of chemical tasks with high fidelity, chemical language processing models are emerging at a rapid speed.
1 code implementation • 8 Jul 2022 • Matteo Manica, Jannis Born, Joris Cadow, Dimitrios Christofidellis, Ashish Dave, Dean Clarke, Yves Gaetan Nana Teukam, Giorgio Giannone, Samuel C. Hoffman, Matthew Buchan, Vijil Chenthamarakshan, Timothy Donovan, Hsiang Han Hsu, Federico Zipoli, Oliver Schilter, Akihiro Kishimoto, Lisa Hamada, Inkit Padhi, Karl Wehden, Lauren McHugh, Alexy Khrabrov, Payel Das, Seiji Takeda, John R. Smith
With the growing availability of data within various scientific domains, generative models hold enormous potential to accelerate scientific discovery.
no code implementations • 20 May 2022 • N. Joseph Tatro, Payel Das, Pin-Yu Chen, Vijil Chenthamarakshan, Rongjie Lai
Massive molecular simulations of drug-target proteins have been used as a tool to understand disease mechanism and develop therapeutics.
no code implementations • 19 Apr 2022 • Vijil Chenthamarakshan, Samuel C. Hoffman, C. David Owen, Petra Lukacik, Claire Strain-Damerell, Daren Fearon, Tika R. Malla, Anthony Tumber, Christopher J. Schofield, Helen M. E. Duyvesteyn, Wanwisa Dejnirattisai, Loic Carrique, Thomas S. Walter, Gavin R. Screaton, Tetiana Matviiuk, Aleksandra Mojsilovic, Jason Crain, Martin A. Walsh, David I. Stuart, Payel Das
To perform target-aware design of novel inhibitor molecules, a protein sequence-conditioned sampling on the generative foundation model is performed.
1 code implementation • 13 Apr 2022 • Bhanushee Sharma, Vijil Chenthamarakshan, Amit Dhurandhar, Shiranee Pereira, James A. Hendler, Jonathan S. Dordick, Payel Das
Additionally, our multi-task approach is comprehensive in the sense that it is comparable to state-of-the-art approaches for specific endpoints in in vitro, in vivo and clinical platforms.
2 code implementations • 11 Mar 2022 • Zuobai Zhang, Minghao Xu, Arian Jamasb, Vijil Chenthamarakshan, Aurelie Lozano, Payel Das, Jian Tang
Despite the effectiveness of sequence-based approaches, the power of pretraining on known protein structures, which are available in smaller numbers only, has not been explored for protein property prediction, though protein structures are known to be determinants of protein function.
no code implementations • 2 Dec 2021 • Samuel C. Hoffman, Vijil Chenthamarakshan, Dmitry Yu. Zubarev, Daniel P. Sanders, Payel Das
Photo-acid generators (PAGs) are compounds that release acids ($H^+$ ions) when exposed to light.
no code implementations • 12 Nov 2021 • Igor Melnyk, Payel Das, Vijil Chenthamarakshan, Aurelie Lozano
Here we consider three recently proposed deep generative frameworks for protein design: (AR) the sequence-based autoregressive generative model, (GVP) the precise structure-based graph neural network, and Fold2Seq that leverages a fuzzy and scale-free representation of a three-dimensional fold, while enforcing structure-to-sequence (and vice versa) consistency.
1 code implementation • 24 Jun 2021 • Yue Cao, Payel Das, Vijil Chenthamarakshan, Pin-Yu Chen, Igor Melnyk, Yang shen
Designing novel protein sequences for a desired 3D topological fold is a fundamental yet non-trivial task in protein engineering.
1 code implementation • 17 Jun 2021 • Jerret Ross, Brian Belgodere, Vijil Chenthamarakshan, Inkit Padhi, Youssef Mroueh, Payel Das
Models based on machine learning can enable accurate and fast molecular property predictions, which is of interest in drug discovery and material design.
no code implementations • 8 Jun 2021 • Yair Schiff, Vijil Chenthamarakshan, Samuel Hoffman, Karthikeyan Natesan Ramamurthy, Payel Das
Deep generative models have emerged as a powerful tool for learning useful molecular representations and designing novel molecules with desired properties, with applications in drug discovery and material design.
no code implementations • 1 Jan 2021 • Norman Joseph Tatro, Payel Das, Pin-Yu Chen, Vijil Chenthamarakshan, Rongjie Lai
Empowered by the disentangled latent space learning, the extrinsic latent embedding is successfully used for classification or property prediction of different drugs bound to a specific protein.
1 code implementation • 3 Nov 2020 • Samuel Hoffman, Vijil Chenthamarakshan, Kahini Wadhawan, Pin-Yu Chen, Payel Das
Machine learning based methods have shown potential for optimizing existing molecules with more desirable properties, a critical step towards accelerating new chemical discovery.
no code implementations • NeurIPS Workshop TDA_and_Beyond 2020 • Yair Schiff, Vijil Chenthamarakshan, Karthikeyan Natesan Ramamurthy, Payel Das
In this work, we propose a method for measuring how well the latent space of deep generative models is able to encode structural and chemical features of molecular datasets by correlating latent space metrics with metrics from the field of topological data analysis (TDA).
no code implementations • 23 Sep 2020 • Kar Wai Lim, Bhanushee Sharma, Payel Das, Vijil Chenthamarakshan, Jonathan S. Dordick
Chemical toxicity prediction using machine learning is important in drug development to reduce repeated animal and human testing, thus saving cost and time.
2 code implementations • 22 May 2020 • Payel Das, Tom Sercu, Kahini Wadhawan, Inkit Padhi, Sebastian Gehrmann, Flaviu Cipcigan, Vijil Chenthamarakshan, Hendrik Strobelt, Cicero dos Santos, Pin-Yu Chen, Yi Yan Yang, Jeremy Tan, James Hedrick, Jason Crain, Aleksandra Mojsilovic
De novo therapeutic design is challenged by a vast chemical repertoire and multiple constraints, e. g., high broad-spectrum potency and low toxicity.
no code implementations • ACL 2020 • Inkit Padhi, Pierre Dognin, Ke Bai, Cicero Nogueira dos santos, Vijil Chenthamarakshan, Youssef Mroueh, Payel Das
Generative feature matching network (GFMN) is an approach for training implicit generative models for images by performing moment matching on features from pre-trained neural networks.
no code implementations • NeurIPS 2020 • Vijil Chenthamarakshan, Payel Das, Samuel C. Hoffman, Hendrik Strobelt, Inkit Padhi, Kar Wai Lim, Benjamin Hoover, Matteo Manica, Jannis Born, Teodoro Laino, Aleksandra Mojsilovic
CogMol also includes insilico screening for assessing toxicity of parent molecules and their metabolites with a multi-task toxicity classifier, synthetic feasibility with a chemical retrosynthesis predictor, and target structure binding with docking simulations.
no code implementations • ICLR Workshop DeepGenStruct 2019 • Tom Sercu, Sebastian Gehrmann, Hendrik Strobelt, Payel Das, Inkit Padhi, Cicero dos Santos, Kahini Wadhawan, Vijil Chenthamarakshan
We present the pipeline in an interactive visual tool to enable the exploration of the metrics, analysis of the learned latent space, and selection of the best model for a given task.
no code implementations • 12 Mar 2019 • Tian Gao, Jie Chen, Vijil Chenthamarakshan, Michael Witbrock
Though SSG is sequential in nature, it does not penalize the ordering of the appearance of the set elements and can be applied to a variety of set output problems, such as a set of classification labels or sequences.
no code implementations • 17 Oct 2018 • Payel Das, Kahini Wadhawan, Oscar Chang, Tom Sercu, Cicero dos Santos, Matthew Riemer, Vijil Chenthamarakshan, Inkit Padhi, Aleksandra Mojsilovic
Our model learns a rich latent space of the biological peptide context by taking advantage of abundant, unlabeled peptide sequences.
no code implementations • 24 May 2018 • Prasanna Sattigeri, Samuel C. Hoffman, Vijil Chenthamarakshan, Kush R. Varshney
In this paper, we introduce the Fairness GAN, an approach for generating a dataset that is plausibly similar to a given multimedia dataset, but is more fair with respect to protected attributes in allocative decision making.
no code implementations • 22 Apr 2018 • Md. Faisal Mahbub Chowdhury, Vijil Chenthamarakshan, Rishav Chakravarti, Alfio M. Gliozzo
State of the art approaches for (embedding based) unsupervised semantic search exploits either compositional similarity (of a query and a passage) or pair-wise word (or term) similarity (from the query and the passage).
no code implementations • 28 Jun 2015 • Vijil Chenthamarakshan, Prasad M Desphande, Raghu Krishnapuram, Ramakrishna Varadarajan, Knut Stolze
In traditional rule based IE frameworks, these layout cues are mapped to rules that operate on the HTML source of the webpages.