Energy-based View of Retrosynthesis
Retrosynthesis -- the process of identifying a set of reactants to synthesize a target molecule -- is of vital importance to material design and drug discovery. Existing machine learning approaches based on language models and graph neural networks have achieved encouraging results. In this paper, we propose a framework that unifies sequence- and graph-based methods as energy-based models (EBMs) with different energy functions. This unified perspective provides critical insights about EBM variants through a comprehensive assessment of performance. Additionally, we present a novel dual variant within the framework that performs consistent training over Bayesian forward- and backward-prediction by constraining the agreement between the two directions. This model improves state-of-the-art performance by 9.6% for template-free approaches where the reaction type is unknown.
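As an illustration of the energy-based view, the following minimal sketch ranks candidate reactant sets X for a product y by an energy E(X, y), with p(X | y) proportional to exp(-E(X, y)) over the candidate set. It is not the paper's implementation. The function names, the candidate labels, and the log-probability values are hypothetical, and the dual-style energy here simply sums a prior term log p(X), a forward term log p(y | X), and a backward term log p(X | y), which is an assumed form chosen for illustration.

```python
import math

def candidate_probabilities(energies):
    """Normalize energies over a finite candidate set: p_i ∝ exp(-E_i)."""
    lowest = min(energies)  # shift by the minimum for numerical stability
    weights = [math.exp(-(e - lowest)) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]

def dual_energy(log_prior, log_forward, log_backward):
    """Assumed dual-style energy: negative sum of the three log-probabilities.

    log_prior    ~ log p(X)      (reactant prior)
    log_forward  ~ log p(y | X)  (forward reaction prediction)
    log_backward ~ log p(X | y)  (backward, i.e. retrosynthesis, prediction)
    """
    return -(log_prior + log_forward + log_backward)

def rank_candidates(candidates):
    """Return (name, energy) pairs sorted from lowest to highest energy."""
    scored = [(name, dual_energy(*logps)) for name, logps in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1])

# Hypothetical log-probabilities for three candidate reactant sets.
toy_candidates = {
    "candidate_A": (-1.0, -0.5, -0.7),
    "candidate_B": (-2.0, -1.5, -1.0),
    "candidate_C": (-0.5, -3.0, -2.5),
}

ranking = rank_candidates(toy_candidates)
probabilities = candidate_probabilities([energy for _, energy in ranking])
for (name, energy), prob in zip(ranking, probabilities):
    print(f"{name}: energy={energy:.2f}, probability={prob:.3f}")
```

Under this view, swapping in a different energy function (for example, a sequence model's negative log-likelihood or a graph-based score) changes the model variant while the ranking procedure stays the same.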
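Task | Dataset | Model | Metric Name | Metric Value (%) | Global Rank |
---|---|---|---|---|---|
Single-step retrosynthesis | USPTO-50k | Dual-TF | Top-1 accuracy | 55.3 | # 1 |
Single-step retrosynthesis | USPTO-50k | Dual-TF | Top-3 accuracy | 69.7 | # 8 |
Single-step retrosynthesis | USPTO-50k | Dual-TF | Top-5 accuracy | 73.0 | # 10 |
Single-step retrosynthesis | USPTO-50k | Dual-TF | Top-10 accuracy | 75.0 | # 12 |
Single-step retrosynthesis | USPTO-50k | Dual-TB | Top-1 accuracy | 55.2 | # 2 |
Single-step retrosynthesis | USPTO-50k | Dual-TB | Top-3 accuracy | 74.6 | # 4 |
Single-step retrosynthesis | USPTO-50k | Dual-TB | Top-5 accuracy | 80.5 | # 6 |
Single-step retrosynthesis | USPTO-50k | Dual-TB | Top-10 accuracy | 86.9 | # 4 |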
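
The top-k accuracies in the table above count a target as solved when the ground-truth reactant set appears among the model's k highest-ranked predictions. The sketch below shows that computation on toy data. The function name and the placeholder strings are hypothetical, and plain string equality stands in for the comparison of canonicalized reactant SMILES that an actual evaluation would typically use.

```python
def top_k_accuracy(ranked_predictions, ground_truths, k):
    """Percentage of targets whose ground truth is among the first k predictions.

    ranked_predictions: one list per target, best prediction first.
    ground_truths: one ground-truth reactant string per target.
    """
    if len(ranked_predictions) != len(ground_truths):
        raise ValueError("need exactly one prediction list per ground truth")
    if not ground_truths:
        raise ValueError("cannot compute accuracy on an empty evaluation set")
    hits = sum(
        1
        for predictions, truth in zip(ranked_predictions, ground_truths)
        if truth in predictions[:k]
    )
    return 100.0 * hits / len(ground_truths)

# Toy data: placeholder strings instead of real reactant SMILES.
toy_predictions = [
    ["a", "b", "c"],  # ground truth ranked first
    ["x", "y", "z"],  # ground truth ranked second
    ["p", "q", "r"],  # ground truth ranked third
    ["m", "n", "o"],  # ground truth not predicted
]
toy_truths = ["a", "y", "r", "not_predicted"]

for k in (1, 2, 3):
    print(f"top-{k} accuracy: {top_k_accuracy(toy_predictions, toy_truths, k):.1f}")
```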