Text-based de novo Molecule Generation
11 papers with code • 1 benchmark • 3 datasets
Text-based de novo molecule generation applies natural language processing (NLP) techniques to chemical information in order to generate entirely new molecular structures. In this approach, molecular structures are typically encoded as text strings, most commonly SMILES (Simplified Molecular Input Line Entry System) notation. NLP models such as recurrent neural networks (RNNs) or Transformers then process these text strings to generate novel molecular structures with desired properties.
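As a toy illustration of the idea (not any specific paper's method), the sketch below treats SMILES strings as plain token sequences, fits a trivial bigram model as a stand-in for an RNN or Transformer language model, and samples new strings autoregressively. All names and the tiny corpus are made up for illustration; real systems use learned neural models, richer tokenizers, and chemical validity checks.

```python
import random
from collections import defaultdict

def tokenize(smiles):
    """Split a SMILES string into single-character tokens.
    (Real tokenizers also handle multi-character tokens like 'Cl', 'Br', '[nH]'.)"""
    return list(smiles)

def train_bigram(corpus):
    """Count token-to-token transitions; a stand-in for a neural language model."""
    counts = defaultdict(lambda: defaultdict(int))
    for s in corpus:
        toks = ["<s>"] + tokenize(s) + ["</s>"]
        for a, b in zip(toks, toks[1:]):
            counts[a][b] += 1
    return counts

def sample(counts, max_len=20, seed=0):
    """Autoregressively sample a new token string from the transition counts."""
    rng = random.Random(seed)
    tok, out = "<s>", []
    while len(out) < max_len:
        nxt = counts.get(tok)
        if not nxt:
            break
        choices, weights = zip(*nxt.items())
        tok = rng.choices(choices, weights=weights)[0]
        if tok == "</s>":
            break
        out.append(tok)
    return "".join(out)

corpus = ["CCO", "CCN", "CCC", "c1ccccc1"]  # toy SMILES examples
model = train_bigram(corpus)
print(sample(model))
```

The same autoregressive loop carries over directly when the bigram counts are replaced by a learned model's next-token distribution, which is how the RNN- and Transformer-based generators above operate.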
Most implemented papers
MolFM: A Multimodal Molecular Foundation Model
In this study, we introduce MolFM, a multimodal molecular foundation model designed to facilitate joint representation learning from molecular structures, biomedical texts, and knowledge graphs.
Translation between Molecules and Natural Language
We present MolT5, a self-supervised learning framework for pretraining models on a vast amount of unlabeled natural language text and molecule strings.
Unifying Molecular and Textual Representations via Multi-task Language Modelling
Here, we propose the first multi-domain, multi-task language model that can solve a wide range of tasks in both the chemical and natural language domains.
Empowering Molecule Discovery for Molecule-Caption Translation with Large Language Models: A ChatGPT Perspective
In this work, we propose MolReGPT, a novel LLM-based framework for molecule-caption translation that introduces an In-Context Few-Shot Molecule Learning paradigm, enabling LLMs such as ChatGPT to apply their in-context learning capability to molecule discovery without domain-specific pre-training or fine-tuning.
GIT-Mol: A Multi-modal Large Language Model for Molecular Science with Graph, Image, and Text
Large language models have made significant strides in natural language processing, enabling innovative applications in molecular science by processing textual representations of molecules.
BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations
Recent advancements in biological research leverage the integration of molecules, proteins, and natural language to enhance drug discovery.
Text-Guided Molecule Generation with Diffusion Language Model
In this work, we propose the Text-Guided Molecule Generation with Diffusion Language Model (TGM-DLM), a novel approach that leverages diffusion models to address the limitations of autoregressive methods.
BioT5+: Towards Generalized Biological Understanding with IUPAC Integration and Multi-task Tuning
However, previous efforts like BioT5 faced challenges in generalizing across diverse tasks and lacked a nuanced understanding of molecular structures, particularly in their textual representations (e.g., IUPAC).
LDMol: Text-to-Molecule Diffusion Model with Structurally Informative Latent Space
With the emergence of diffusion models as the frontline of generative models, many researchers have proposed molecule generation techniques with conditional diffusion models.
A Bayesian Flow Network Framework for Chemistry Tasks
In this work, we introduce ChemBFN, a language model that handles chemistry tasks based on Bayesian flow networks working on discrete data.