Text-based de novo Molecule Generation
11 papers with code • 1 benchmark • 3 datasets
Text-based de novo molecule generation applies natural language processing (NLP) techniques to chemical information in order to generate entirely new molecular structures. In this approach, molecular structures are typically encoded as text strings, most commonly SMILES (Simplified Molecular Input Line Entry System) notation. NLP models such as recurrent neural networks (RNNs) or Transformers then process these text strings to generate novel molecular structures with desired properties.
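As a toy illustration of the idea (not any specific paper's method), the sketch below treats SMILES strings as plain token sequences, fits a trivial bigram model as a stand-in for an RNN or Transformer language model, and samples new strings autoregressively. All names and the tiny corpus are made up for illustration; real systems use learned neural models, richer tokenizers, and chemical validity checks.

```python
import random
from collections import defaultdict

def tokenize(smiles):
    """Split a SMILES string into single-character tokens.
    (Real tokenizers also handle multi-character tokens like 'Cl', 'Br', '[nH]'.)"""
    return list(smiles)

def train_bigram(corpus):
    """Count token-to-token transitions; a stand-in for a neural language model."""
    counts = defaultdict(lambda: defaultdict(int))
    for s in corpus:
        toks = ["<s>"] + tokenize(s) + ["</s>"]
        for a, b in zip(toks, toks[1:]):
            counts[a][b] += 1
    return counts

def sample(counts, max_len=20, seed=0):
    """Autoregressively sample a new token string from the transition counts."""
    rng = random.Random(seed)
    tok, out = "<s>", []
    while len(out) < max_len:
        nxt = counts.get(tok)
        if not nxt:
            break
        choices, weights = zip(*nxt.items())
        tok = rng.choices(choices, weights=weights)[0]
        if tok == "</s>":
            break
        out.append(tok)
    return "".join(out)

corpus = ["CCO", "CCN", "CCC", "c1ccccc1"]  # toy SMILES examples
model = train_bigram(corpus)
print(sample(model))
```

The same autoregressive loop carries over directly when the bigram counts are replaced by a learned model's next-token distribution, which is how the RNN- and Transformer-based generators above operate.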
Most implemented papers
MolFM: A Multimodal Molecular Foundation Model
In this study, we introduce MolFM, a multimodal molecular foundation model designed to facilitate joint representation learning from molecular structures, biomedical texts, and knowledge graphs.
Translation between Molecules and Natural Language
We present MolT5, a self-supervised learning framework for pretraining models on a vast amount of unlabeled natural language text and molecule strings.
Unifying Molecular and Textual Representations via Multi-task Language Modelling
Here, we propose the first multi-domain, multi-task language model that can solve a wide range of tasks in both the chemical and natural language domains.
Empowering Molecule Discovery for Molecule-Caption Translation with Large Language Models: A ChatGPT Perspective
In this work, we propose MolReGPT, a novel LLM-based framework for molecule-caption translation that introduces an In-Context Few-Shot Molecule Learning paradigm, enabling LLMs such as ChatGPT to apply their in-context learning capability to molecule discovery without domain-specific pre-training or fine-tuning.
GIT-Mol: A Multi-modal Large Language Model for Molecular Science with Graph, Image, and Text
Large language models have made significant strides in natural language processing, enabling innovative applications in molecular science by processing textual representations of molecules.
BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations
Recent advancements in biological research leverage the integration of molecules, proteins, and natural language to enhance drug discovery.
Text-Guided Molecule Generation with Diffusion Language Model
In this work, we propose the Text-Guided Molecule Generation with Diffusion Language Model (TGM-DLM), a novel approach that leverages diffusion models to address the limitations of autoregressive methods.
BioT5+: Towards Generalized Biological Understanding with IUPAC Integration and Multi-task Tuning
However, previous efforts like BioT5 faced challenges in generalizing across diverse tasks and lacked a nuanced understanding of molecular structures, particularly in their textual representations (e.g., IUPAC).
LDMol: Text-to-Molecule Diffusion Model with Structurally Informative Latent Space
With the emergence of diffusion models as the frontline of generative models, many researchers have proposed molecule generation techniques with conditional diffusion models.
A Bayesian Flow Network Framework for Chemistry Tasks
In this work, we introduce ChemBFN, a language model that handles chemistry tasks based on Bayesian flow networks working on discrete data.