Extended Data Fig. 4: Toward out-of-distribution generation. | Nature Computational Science

From: PropMolFlow: property-guided molecule generation with geometry-complete flow matching

a, Distributions of DFT-calculated and GVP-predicted property values for molecules generated by PropMolFlow, shown together with the property distribution of the QM9 training data. The vertical black dashed line in each histogram marks the target property value q0.99, the 99th percentile (0.99 quantile) of the training data distribution. Curves overlaid on the histograms are kernel density estimates. b, Three example molecules that do not exist in QM9 but are found in the larger PubChem dataset. Numbers below the structures are DFT-calculated property values for the raw molecules generated by PropMolFlow models. C, H, O, N, and F atoms are shown in gray, white, red, blue, and yellow, respectively. Property values for α, Δϵ, and Cv are in units of Bohr³, eV, and cal/(mol ⋅ K), respectively. c, Maximum Tanimoto similarity of generated and filtered molecules to the training data, computed with Morgan fingerprints. Dashed lines indicate the 0.8 similarity cutoff used to define novel molecules.
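The two quantities in panels a and c can be illustrated with a minimal sketch. It assumes that Morgan fingerprints have already been computed (for example with RDKit) and are stored as sets of "on" bit indices; the fingerprint radius and length used for the figure are not stated here. It also assumes that "novel" means a maximum similarity strictly below 0.8, and that q0.99 uses linear interpolation between sorted training values. The helper names (`target_quantile`, `tanimoto`, `max_similarity`, `is_novel`) are hypothetical and not part of PropMolFlow.

```python
from statistics import quantiles


def target_quantile(train_values: list[float], q: float = 0.99) -> float:
    """Return the q-quantile of training property values.

    Assumption: linear interpolation ("inclusive" method); only whole
    percentiles (q = 0.01 ... 0.99) are supported in this sketch.
    """
    k = round(q * 100)
    if not 1 <= k <= 99:
        raise ValueError("q must be a whole percentile between 0.01 and 0.99")
    cuts = quantiles(train_values, n=100, method="inclusive")  # 99 cut points
    return cuts[k - 1]


def tanimoto(a: frozenset[int], b: frozenset[int]) -> float:
    """Tanimoto similarity |A & B| / |A | B| between two sets of on-bits."""
    union = len(a | b)
    if union == 0:
        return 0.0  # assumption: two empty fingerprints count as dissimilar
    return len(a & b) / union


def max_similarity(query: frozenset[int], training: list[frozenset[int]]) -> float:
    """Highest Tanimoto similarity of one molecule to any training molecule."""
    return max((tanimoto(query, t) for t in training), default=0.0)


def is_novel(query: frozenset[int], training: list[frozenset[int]],
             cutoff: float = 0.8) -> bool:
    """Assumed novelty rule: maximum similarity strictly below the cutoff."""
    return max_similarity(query, training) < cutoff


# Toy fingerprints (hypothetical bit indices, not real molecules).
train_fps = [frozenset({1, 2, 3, 4}), frozenset({10, 11, 12})]
generated = frozenset({1, 2, 3, 5})
print(max_similarity(generated, train_fps))  # 3 shared bits / 5 total bits
print(is_novel(generated, train_fps))
```

For the toy data the generated fingerprint shares 3 of 5 distinct bits with the first training molecule, giving a maximum similarity of 0.6, so it falls below the assumed 0.8 cutoff. Taking the maximum over the whole training set is what makes the check conservative: a molecule is called novel only if it is dissimilar to every training molecule.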

Source data.