Chemformer: a pre-trained transformer for computational chemistry

Transformer models coupled with the simplified molecular-input line-entry system (SMILES) have recently proven to be a powerful combination for solving challenges in cheminformatics. These models, however, are often developed specifically for a single application and can be very resource-intensive to train. In this work we present Chemformer, a Transformer-based model which can be quickly applied to both sequence-to-sequence and discriminative cheminformatics tasks. Additionally, we show that self-supervised pre-training can improve performance and significantly speed up convergence on downstream tasks. On direct synthesis and retrosynthesis prediction benchmark datasets we report state-of-the-art results for top-1 accuracy. We also improve on existing approaches for a molecular optimisation task and show that Chemformer can optimise on multiple discriminative tasks simultaneously. Models, datasets and code will be made available after publication.
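The core recipe described in the abstract is an encoder-decoder Transformer over SMILES strings: pre-train with self-supervised objectives, then fine-tune on sequence-to-sequence tasks such as retrosynthesis, framed as translating a product SMILES into reactant SMILES. The sketch below illustrates that framing only and is not Chemformer's actual code: the regex tokenizer, toy vocabulary, model sizes, and the example reaction (aspirin from salicylic acid and acetyl chloride) are all illustrative placeholders.

```python
# Illustrative sketch (not Chemformer's implementation): a SMILES
# sequence-to-sequence Transformer for retrosynthesis, i.e. translating a
# product SMILES into reactant SMILES with teacher forcing.
import re
import torch
import torch.nn as nn

# Simple atom/bond-level SMILES tokenizer; good enough for this sketch.
SMILES_REGEX = re.compile(
    r"(\[[^\]]+\]|Br|Cl|Si|@@|@|=|#|\(|\)|\.|[A-Za-z0-9+\-%/\\])"
)

def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into chemically meaningful tokens."""
    return SMILES_REGEX.findall(smiles)

# Toy vocabulary built from two example molecules; a real setup would build
# this from the full training corpus (e.g. USPTO-50k).
PAD, BOS, EOS = 0, 1, 2
vocab = {tok: i + 3 for i, tok in enumerate(sorted(set(
    tokenize("CC(=O)Oc1ccccc1C(=O)O") + tokenize("Oc1ccccc1C(=O)O.CC(=O)Cl")
)))}

def encode(smiles: str) -> torch.Tensor:
    ids = [BOS] + [vocab[t] for t in tokenize(smiles)] + [EOS]
    return torch.tensor(ids).unsqueeze(0)  # shape (1, seq_len)

class Seq2SeqSmiles(nn.Module):
    def __init__(self, vocab_size: int, d_model: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model, padding_idx=PAD)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=4,
            num_encoder_layers=2, num_decoder_layers=2,
            batch_first=True,
        )
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, src: torch.Tensor, tgt: torch.Tensor) -> torch.Tensor:
        # Causal mask so the decoder cannot attend to future tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
        hidden = self.transformer(self.embed(src), self.embed(tgt), tgt_mask=mask)
        return self.out(hidden)  # per-token logits over the vocabulary

model = Seq2SeqSmiles(vocab_size=len(vocab) + 3)
product = encode("CC(=O)Oc1ccccc1C(=O)O")       # aspirin
reactants = encode("Oc1ccccc1C(=O)O.CC(=O)Cl")  # salicylic acid + acetyl chloride
logits = model(product, reactants[:, :-1])      # teacher-forced decoding
loss = nn.functional.cross_entropy(
    logits.reshape(-1, logits.size(-1)), reactants[:, 1:].reshape(-1)
)
loss.backward()  # one training step of the retrosynthesis objective
```

The same encoder-decoder can serve the discriminative tasks mentioned above by pooling the encoder output and attaching a task head, which is what makes a single pre-trained backbone reusable across applications.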

Results

Single-step retrosynthesis, USPTO-50k (Chemformer):

Metric             Value   Global Rank
Top-1 accuracy     54.3    # 4
Top-5 accuracy     62.3    # 17
Top-10 accuracy    63.0    # 17
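For reference, the top-k numbers above count a test reaction as correct when the recorded reactant set appears among the model's k highest-ranked candidates (typically produced by beam search and compared as canonical SMILES). A minimal sketch of that metric, with hypothetical inputs:

```python
# Sketch of top-k exact-match accuracy as used on USPTO-50k above.
# `predictions[i]` holds the ranked candidate reactant SMILES for test
# product i (e.g. from beam search); candidates and targets are assumed
# to be canonicalised (e.g. with RDKit) before comparison.
def top_k_accuracy(predictions: list[list[str]], targets: list[str], k: int) -> float:
    hits = sum(t in cands[:k] for cands, t in zip(predictions, targets))
    return 100.0 * hits / len(targets)

# A top-1 accuracy of 54.3 means the single best-ranked candidate exactly
# matched the recorded reactants for 54.3% of the test reactions.
```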
