Nature Communications

Table 1 Top-k accuracies of the RSGPT model and baselines on the USPTO-50k⁶⁰ dataset with reaction class unknown

From: RSGPT: a generative transformer model for retrosynthesis planning pre-trained on ten billion datapoints

Model	Top-1 (%)	Top-3 (%)	Top-5 (%)	Top-10 (%)
*Template-based*
RetroSim⁶¹	37.3	54.7	63.6	74.1
NeuralSym⁶²	44.4	65.3	72.4	78.9
GLN⁸	52.5	69.0	75.6	83.7
LocalRetro⁵²	53.4	77.5	85.9	92.4
RetroComposer⁹	54.5	77.2	83.2	87.7
*Semi-template-based*
G2G⁶³	48.9	67.6	72.5	75.5
RetroXpert⁵¹	50.4	61.1	62.3	63.4
RetroPrime⁶⁴	51.4	70.8	74.0	76.1
G²Retro⁶⁵	54.1	74.1	81.2	86.7
SemiRetro¹²	54.9	75.3	80.4	84.1
Graph2Edits¹³	55.1	77.3	83.4	89.4
*Template-free*
SCROP¹⁷	43.7	60.0	65.2	68.7
MEGAN⁶⁶	48.1	70.7	78.4	86.1
Graph2SMILES¹⁸	52.9	66.5	70.0	72.9
R-SMILES⁴³	56.3	79.2	86.2	91.0
NAG2G¹⁰	55.1	76.9	83.4	89.9
EditRetro⁴⁵ (×20) ^a	60.8	80.6	86.0	90.3
RSGPT	63.4	84.2	89.2	93.0
RSGPT (×20) ^a	77.0	90.9	94.3	96.7

^a Twenty-fold SMILES augmentation was applied to both the training and test sets.
Results for existing methods are taken from the cited references. The best-performing results are marked in bold. Model types are distinguished by the italicized labels “Template-based”, “Semi-template-based”, and “Template-free”.
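
For readers unfamiliar with the metric, the following is a minimal sketch of how a top-k accuracy of the kind reported in this table is commonly computed: a test reaction counts as a hit if the reference reactant set appears among the model's first k ranked predictions. The function name `top_k_accuracy`, the toy data, and the use of exact string matching are assumptions made for illustration. They are not taken from the RSGPT paper, and in practice predictions are usually compared after SMILES canonicalization.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    references: Sequence[str],
    k: int,
) -> float:
    """Return the percentage of examples whose reference is in the first k predictions.

    Assumption: predictions and references are already normalized strings
    (e.g. canonical SMILES), so plain equality is a valid comparison.
    """
    if len(ranked_predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one example is required")
    if k < 1:
        raise ValueError("k must be a positive integer")

    # Count examples where the reference occurs within the top-k ranked candidates.
    hits = sum(
        reference in predictions[:k]
        for predictions, reference in zip(ranked_predictions, references)
    )
    return 100.0 * hits / len(references)


# Hypothetical toy data: four test reactions, three ranked candidates each.
toy_predictions = [
    ["A", "B", "C"],  # reference ranked first
    ["X", "Y", "Z"],  # reference ranked second
    ["P", "Q", "R"],  # reference ranked third
    ["M", "N", "O"],  # reference not predicted
]
toy_references = ["A", "Y", "R", "W"]

top1 = top_k_accuracy(toy_predictions, toy_references, k=1)
top3 = top_k_accuracy(toy_predictions, toy_references, k=3)
print(f"Top-1: {top1:.1f}%  Top-3: {top3:.1f}%")
```

Because a hit at rank k is also a hit at every larger k, the values in each row of the table can only stay the same or increase from Top-1 to Top-10.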
