Fig. 2: Illustration of synthetic data generation using RDChiral and examples of synthetic data.

a Method for generating synthetic data using RDChiral. Molecules from PubChem [40], ChEMBL [41], and Enamine [42] were fragmented into submolecules. The submolecules were then matched against the reaction centers of templates (grey shaded region), and complete reactions were generated from the corresponding templates by concatenating the reactant SMILES and product SMILES into a complete reaction text. b Examples of synthetic data. Case 1 is a coupling reaction and Case 2 is a nucleophilic substitution reaction. The templates, reaction SMILES, and visualized reaction schemes are shown for both examples.
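The final step of panel a, joining reactant and product SMILES into one reaction text, can be sketched as below. This is a minimal illustration, not the authors' code. It assumes standard reaction-SMILES syntax (`reactants>>products`, with `.` separating molecules on the same side). The helper `build_reaction_smiles` is hypothetical, and the example molecules form a generic coupling-type reaction chosen for illustration rather than Case 1 from the figure. The fragmentation and template-matching steps, which RDChiral performs in the described method, are not reproduced here.

```python
def build_reaction_smiles(reactants, products):
    """Join reactant and product SMILES into one reaction SMILES string.

    Hypothetical helper: molecules on the same side are separated by '.',
    and the two sides are separated by '>>' (no agents are included).
    """
    if not reactants or not products:
        raise ValueError("need at least one reactant and one product")
    return ".".join(reactants) + ">>" + ".".join(products)


# Illustrative coupling-type example (hypothetical, not taken from the figure):
# bromobenzene + phenylboronic acid -> biphenyl
rxn = build_reaction_smiles(
    ["Brc1ccccc1", "OB(O)c1ccccc1"],
    ["c1ccc(-c2ccccc2)cc1"],
)
print(rxn)  # Brc1ccccc1.OB(O)c1ccccc1>>c1ccc(-c2ccccc2)cc1
```

Keeping the two sides as lists until the last moment lets multi-reactant templates, such as those for coupling reactions, produce a single well-formed string without special cases.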