InstructMol: Multi-Modal Integration for Building a Versatile and Reliable Molecular Assistant in Drug Discovery

27 Nov 2023  ·  He Cao, Zijing Liu, Xingyu Lu, Yuan YAO, Yu Li

The rapid evolution of artificial intelligence in drug discovery encounters challenges with generalization and extensive training, yet Large Language Models (LLMs) offer promise in reshaping interactions with complex molecular data. Our novel contribution, InstructMol, a multi-modal LLM, effectively aligns molecular structures with natural language via an instruction-tuning approach, utilizing a two-stage training strategy that adeptly combines limited domain-specific data with molecular and textual information. InstructMol showcases substantial performance improvements in drug discovery-related molecular tasks, surpassing leading LLMs and significantly reducing the gap with specialized models, thereby establishing a robust foundation for a versatile and dependable drug discovery assistant.

Results from the Paper


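Task                 Dataset   Model           Metric Name  Metric Value  Global Rank
Molecule Captioning  ChEBI-20  InstructMol-GS  BLEU-2       47.5          # 20
Molecule Captioning  ChEBI-20  InstructMol-GS  BLEU-4       37.1          # 20
Molecule Captioning  ChEBI-20  InstructMol-GS  ROUGE-1      56.6          # 16
Molecule Captioning  ChEBI-20  InstructMol-GS  ROUGE-2      39.4          # 17
Molecule Captioning  ChEBI-20  InstructMol-GS  ROUGE-L      50.2          # 17
Molecule Captioning  ChEBI-20  InstructMol-GS  METEOR       50.9          # 20
Molecule Captioning  ChEBI-20  InstructMol-G   BLEU-2       46.6          # 21
Molecule Captioning  ChEBI-20  InstructMol-G   BLEU-4       36.5          # 21
Molecule Captioning  ChEBI-20  InstructMol-G   ROUGE-1      54.7          # 17
Molecule Captioning  ChEBI-20  InstructMol-G   ROUGE-2      36.5          # 18
Molecule Captioning  ChEBI-20  InstructMol-G   ROUGE-L      47.9          # 18
Molecule Captioning  ChEBI-20  InstructMol-G   METEOR       49.1          # 21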
