Extended Data Fig. 1: Distribution of types of reactions in the USPTO-50k and ORD datasets.
From: Large language models to accelerate organic chemistry synthesis

(a) Data organization of the USPTO-50k and ORD datasets. All reactions in USPTO-50k are from United States patents; most reactions in ORD are from the literature. (b, c) Power-law fits of the reactant distribution in USPTO-50k (b) and the catalyst distribution in ORD (c). The light-colored points show the probability density and the dark dashed line shows the ideal power-law fit. (d, e) Bar charts of the fifteen most common reactants in USPTO-50k (d) and the fifteen most common catalysts in ORD (e). The light color represents the decimal-scale proportion and the dark color represents the log-scale count.
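
For readers who want to reproduce a fit of the kind shown in panels (b, c), the following is a minimal sketch, not the authors' procedure: the caption does not say how the power law was fitted. It assumes a rank-frequency form p(rank) = c * rank^(-alpha) and estimates alpha by ordinary least squares in log-log space. The function name `fit_power_law` and the toy counts are hypothetical and are not taken from USPTO-50k or ORD.

```python
import math
from collections import Counter


def fit_power_law(counts):
    """Fit p(rank) = c * rank**(-alpha) by least squares in log-log space.

    `counts` holds occurrence counts per item (e.g. per reactant).
    Returns (alpha, c). Assumption: a rank-frequency power law; the
    source caption does not specify the fitting method.
    """
    total = sum(counts)
    # Normalize to proportions and sort from most to least frequent.
    freqs = sorted((n / total for n in counts if n > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(p) for p in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)


# Hypothetical toy data: item counts roughly proportional to 1 / rank.
toy_items = ["A"] * 1000 + ["B"] * 500 + ["C"] * 333 + ["D"] * 250 + ["E"] * 200
toy_counts = list(Counter(toy_items).values())
alpha, c = fit_power_law(toy_counts)
print(f"alpha = {alpha:.2f}, c = {c:.3f}")
```

A log-log least-squares fit is simple but can be biased for heavy-tailed data; a maximum-likelihood estimator would be a reasonable alternative if a more careful estimate of the exponent is needed.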