Text-based de novo Molecule Generation

8 papers with code • 1 benchmark • 1 dataset

Text-based de novo molecule generation applies natural language processing (NLP) techniques to chemical information in order to generate entirely new molecular structures. In this approach, molecules are encoded as text strings, most commonly in SMILES (Simplified Molecular Input Line Entry System) notation. NLP models such as recurrent neural networks (RNNs) or Transformers then process these strings to generate novel molecules with desired properties.
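Because molecules are handled as plain text in this setting, the first practical step is tokenizing SMILES strings so a sequence model can consume them. The sketch below (illustrative only, not taken from any paper listed here; the regex and function name are assumptions) shows a common character-level approach that keeps multi-character atom symbols such as "Cl" and "Br" intact:

```python
import re

# Hypothetical regex covering common SMILES tokens: bracket atoms,
# two-letter atoms, single-letter atoms, bonds, branches, and ring digits.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|Si|@@|[BCNOFPSIbcnops]|[=#\-\+\(\)\\/%0-9@])"
)

def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into tokens a sequence model could consume."""
    tokens = SMILES_TOKEN.findall(smiles)
    # A lossless tokenization must reconstruct the input exactly.
    assert "".join(tokens) == smiles, f"untokenizable characters in {smiles!r}"
    return tokens

# Example: aspirin's SMILES string.
print(tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O"))
```

The resulting token sequence can then be mapped to integer IDs and fed to an RNN or Transformer exactly like word tokens in ordinary text.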

Most implemented papers

MolFM: A Multimodal Molecular Foundation Model

biofm/openbiomed 6 Jun 2023

In this study, we introduce MolFM, a multimodal molecular foundation model designed to facilitate joint representation learning from molecular structures, biomedical texts, and knowledge graphs.

Translation between Molecules and Natural Language

blender-nlp/MolT5 25 Apr 2022

We present MolT5, a self-supervised learning framework for pretraining models on a vast amount of unlabeled natural language text and molecule strings.

Unifying Molecular and Textual Representations via Multi-task Language Modelling

gt4sd/multitask_text_and_chemistry_t5 29 Jan 2023

Here, we propose the first multi-domain, multi-task language model that can solve a wide range of tasks in both the chemical and natural language domains.

Empowering Molecule Discovery for Molecule-Caption Translation with Large Language Models: A ChatGPT Perspective

phenixace/molregpt 11 Jun 2023

In this work, we propose MolReGPT, a novel LLM-based framework for molecule-caption translation that introduces an In-Context Few-Shot Molecule Learning paradigm, enabling LLMs such as ChatGPT to apply their in-context learning capability to molecule discovery without domain-specific pre-training or fine-tuning.

GIT-Mol: A Multi-modal Large Language Model for Molecular Science with Graph, Image, and Text

ai-hpc-research-team/git-mol 14 Aug 2023

Large language models have made significant strides in natural language processing, enabling innovative applications in molecular science by processing textual representations of molecules.

BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations

QizhiPei/BioT5 11 Oct 2023

Recent advancements in biological research leverage the integration of molecules, proteins, and natural language to enhance drug discovery.

Text-Guided Molecule Generation with Diffusion Language Model

deno-v/tgm-dlm 20 Feb 2024

In this work, we propose the Text-Guided Molecule Generation with Diffusion Language Model (TGM-DLM), a novel approach that leverages diffusion models to address the limitations of autoregressive methods.

BioT5+: Towards Generalized Biological Understanding with IUPAC Integration and Multi-task Tuning

QizhiPei/BioT5 27 Feb 2024

However, previous efforts like BioT5 faced challenges in generalizing across diverse tasks and lacked a nuanced understanding of molecular structures, particularly in their textual representations (e.g., IUPAC).