  • Brief Communication

PropMolFlow: property-guided molecule generation with geometry-complete flow matching

A preprint version of the article is available at arXiv.

Abstract

Molecule generation is advancing rapidly in chemical discovery and drug design. Flow-matching methods have recently set the state of the art (SOTA) in unconditional molecule generation, surpassing score-based diffusion models. However, diffusion models still lead in property-guided generation. In this work, we introduce PropMolFlow, an approach for property-guided molecule generation based on geometry-complete SE(3)-equivariant flow matching. Integrating five different property embedding methods with a Gaussian expansion of scalar properties, PropMolFlow achieves competitive performance against previous SOTA diffusion models in conditional molecule generation while maintaining high structural stability and validity. Additionally, it enables higher sampling speed with fewer time steps compared with baseline models. We highlight the importance of validating the properties of generated molecules through density functional theory calculations. Furthermore, we introduce a task that tests the model’s ability to propose molecules with under-represented property values, probing its capacity for out-of-distribution generalization.
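
To illustrate what a Gaussian expansion of a scalar property can look like, the following is a minimal sketch rather than the PropMolFlow implementation. The function name, the evenly spaced centers, the number of centers and the choice of width equal to the center spacing are all assumptions made for illustration.

```python
import math

def gaussian_expansion(value, v_min, v_max, n_centers=8, width=None):
    """Expand a scalar into Gaussian radial-basis features.

    Hypothetical helper: parameter names and defaults are illustrative
    assumptions, not taken from the PropMolFlow code.
    """
    if n_centers < 2:
        raise ValueError("need at least two centers")
    step = (v_max - v_min) / (n_centers - 1)
    # Assumption: the Gaussian width equals the spacing between centers.
    sigma = step if width is None else width
    centers = [v_min + i * step for i in range(n_centers)]
    # One feature per center; the feature is largest where the value
    # lies closest to that center.
    return [math.exp(-0.5 * ((value - c) / sigma) ** 2) for c in centers]

# A property value of 2.0 expanded over five centers at 0, 1, 2, 3, 4.
features = gaussian_expansion(2.0, v_min=0.0, v_max=4.0, n_centers=5)
```

The resulting vector replaces a single number with a smooth, localized encoding, which is the usual motivation for expanding scalars before feeding them to an embedding layer.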

Fig. 1: Overview of the PropMolFlow methodology.

Data availability

Source data for Fig. 1d and Extended Data Figs. 1, 2b–d, 3 and 4a,c are available with this Brief Communication. All data can be found in the Zenodo repository (ref. 36), including revised QM9 SDF data, SDF files for generated raw molecules, DFT-calculated structures and their molecular properties saved in the extxyz format, full property MAE and structural-validity results in CSV format for all saved PropMolFlow checkpoint models, sampled structures and their full PoseBusters results for all baseline models, and EGNN property-predictor pretrained models and notebook examples to use the property predictors.

Code availability

Our PropMolFlow implementation is available at https://github.com/Liu-Group-UF/PropMolFlow and on Zenodo (ref. 40). A web interface is also provided for a simple demonstration at https://propmolflow-website.vercel.app.

References

  1. Sanchez-Lengeling, B. & Aspuru-Guzik, A. Inverse molecular design using machine learning: generative models for matter engineering. Science 361, 360–365 (2018).

  2. Hoogeboom, E., Satorras, V. G., Vignac, C. & Welling, M. Equivariant diffusion for molecule generation in 3D. Proc. Mach. Learn. Res. 162, 8867–8887 (2022).

  3. Satorras, V. G., Hoogeboom, E. & Welling, M. E(n) equivariant graph neural networks. Proc. Mach. Learn. Res. 139, 9323–9332 (2021).

  4. Xu, M., Powers, A. S., Dror, R. O., Ermon, S. & Leskovec, J. Geometric latent diffusion models for 3D molecule generation. Proc. Mach. Learn. Res. 202, 38592–38610 (2023).

  5. Liu, X., Gong, C. & Liu, Q. Flow straight and fast: learning to generate and transfer data with rectified flow. In Eleventh International Conference on Learning Representations (ICLR, 2023); https://openreview.net/forum?id=XVjTT1nw5z

  6. Lipman, Y., Chen, R. T. Q., Ben-Hamu, H., Nickel, M. & Le, M. Flow matching for generative modeling. In Eleventh International Conference on Learning Representations (ICLR, 2023); https://openreview.net/forum?id=PqvMRDCJT9t

  7. Albergo, M. S. & Vanden-Eijnden, E. Building normalizing flows with stochastic interpolants. In Eleventh International Conference on Learning Representations (ICLR, 2023); https://openreview.net/forum?id=li7qeBbCR1t

  8. Höllmer, P. et al. Open materials generation with stochastic interpolants. In International Conference on Machine Learning 23417–23450 (PMLR, 2025).

  9. Campbell, A., Yim, J., Barzilay, R., Rainforth, T. & Jaakkola, T. Generative flows on discrete state-spaces: enabling multimodal flows with applications to protein co-design. In International Conference on Machine Learning 5453–5512 (PMLR, 2024).

  10. Dunn, I. & Koes, D. R. Exploring discrete flow matching for 3D de novo molecule generation. Preprint at https://arxiv.org/abs/2411.16644 (2024).

  11. Song, Y. et al. Equivariant flow matching with hybrid probability transport for 3D molecule generation. Adv. Neural Inf. Process. Syst. 36, 549–568 (2023).

  12. Dumitrescu, A. et al. E(3)-equivariant models cannot learn chirality: field-based molecular generation. In Thirteenth International Conference on Learning Representations (ICLR, 2025); https://openreview.net/forum?id=mXHTifc1Fn

  13. Vignac, C., Osman, N., Toni, L. & Frossard, P. MiDi: mixed graph and 3D denoising diffusion for molecule generation. In Machine Learning and Knowledge Discovery in Databases: Research Track (eds Koutra, D. et al.) 560–576 (Springer, 2023).

  14. Morehead, A. & Cheng, J. Geometry-complete diffusion for 3D molecule generation and optimization. Commun. Chem. 7, 150 (2024).

  15. Ramakrishnan, R., Dral, P. O., Rupp, M. & Von Lilienfeld, O. A. Quantum chemistry structures and properties of 134 kilo molecules. Sci. Data 1, 140022 (2014).

  16. Wu, Z. et al. MoleculeNet: a benchmark for molecular machine learning. Chem. Sci. 9, 513–530 (2018).

  17. Gebauer, N. W. A., Gastegger, M., Hessmann, S. S. P., Müller, K.-R. & Schütt, K. T. Inverse design of 3D molecular structures with conditional generative neural networks. Nat. Commun. 13, 973 (2022).

  18. Jing, B., Eismann, S., Suriana, P., Townshend, R. J. L. & Dror, R. Learning from protein structure with geometric vector perceptrons. In International Conference on Learning Representations (ICLR, 2021); https://openreview.net/forum?id=1YLJDvSx6J4

  19. Bao, F. et al. Equivariant energy-guided SDE for inverse molecular design. In Eleventh International Conference on Learning Representations (ICLR, 2023); https://openreview.net/forum?id=r0otLtOwYW

  20. Huang, H., Sun, L., Du, B. & Lv, W. Learning joint 2-D and 3-D graph diffusion models for complete molecule generation. IEEE Trans. Neural Netw. Learn. Syst. 35, 11857–11871 (2024).

  21. Landrum, G. et al. RDKit: Open-Source Cheminformatics Software (RDKit, 2016).

  22. Buttenschoen, M., Morris, G. M. & Deane, C. M. PoseBusters: AI-based docking methods fail to generate physically valid poses or generalise to novel sequences. Chem. Sci. 15, 3130–3139 (2024).

  23. Morgan, H. L. The generation of a unique machine description for chemical structures—a technique developed at Chemical Abstracts Service. J. Chem. Doc. 5, 107–113 (1965).

  24. Kim, S. et al. PubChem in 2021: new data content and improved web interfaces. Nucleic Acids Res. 49, D1388–D1395 (2021).

  25. Ho, J. & Salimans, T. Classifier-free diffusion guidance. In NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications (2021); https://openreview.net/forum?id=qw8AKxfYbI

  26. Karras, T. et al. Guiding a diffusion model with a bad version of itself. Adv. Neural Inf. Process. Syst. 37, 52996–53021 (2024).

  27. Boffi, N. M., Albergo, M. S. & Vanden-Eijnden, E. How to build a consistency model: learning flow maps via self-distillation. In Thirty-ninth Annual Conference on Neural Information Processing Systems (2025); https://openreview.net/forum?id=Di5apl8HSH

  28. Liu, G.-H., Choi, J., Chen, Y., Miller, B. K. & Chen, R. T. Q. Adjoint Schrödinger bridge sampler. In Thirty-ninth Annual Conference on Neural Information Processing Systems (2025); https://openreview.net/forum?id=rMhQBlhh4c

  29. Dunn, I. & Koes, D. R. FlowMol3: flow matching for 3D de novo small-molecule generation. Preprint at https://arxiv.org/abs/2508.12629 (2025).

  30. Dunn, I. & Koes, D. R. Mixed continuous and categorical flow matching for 3D de novo molecule generation. Preprint at https://arxiv.org/abs/2404.19739 (2024).

  31. Gat, I. et al. Discrete flow matching. Adv. Neural Inf. Process. Syst. 37, 133345–133385 (2024).

  32. Le, T., Cremer, J., Noe, F., Clevert, D.-A. & Schütt, K. T. Navigating the design space of equivariant diffusion-based generative models for de novo 3D molecule generation. In Twelfth International Conference on Learning Representations (ICLR, 2024); https://openreview.net/forum?id=kzGuiRXZrQ

  33. Axelrod, S. & Gómez-Bombarelli, R. GEOM, energy-annotated molecular conformations for property prediction and molecular generation. Sci. Data 9, 185 (2022).

  34. Becke, A. D. Density-functional thermochemistry. III. The role of exact exchange. J. Chem. Phys. 98, 5648–5652 (1993).

  35. Lee, C., Yang, W. & Parr, R. G. Development of the Colle–Salvetti correlation-energy formula into a functional of the electron density. Phys. Rev. B 37, 785–789 (1988).

  36. Zeng, C., Jin, J. & Liu, M. PropMolFlow: property-guided molecule generation with geometry-complete flow matching. Zenodo https://doi.org/10.5281/zenodo.17726328 (2025).

  37. Moret, M. et al. Leveraging molecular structure and bioactivity with chemical language models for de novo drug design. Nat. Commun. 14, 114 (2023).

  38. O’Boyle, N. M. et al. Open Babel: an open chemical toolbox. J. Cheminform. 3, 33 (2011).

  39. Frisch, M. J. et al. Gaussian 16 revision C.01 (2016).

  40. Zeng, C., Jin, J. & Liu, M. PropMolFlow code. Zenodo https://doi.org/10.5281/zenodo.17702415 (2025).

Acknowledgements

We acknowledge funding from NSF Grant OAC-2311632 and from the AI and Complex Computational Research Award of the University of Florida. S.M. also acknowledges support from the Simons Center for Computational Physical Chemistry (Simons Foundation grant 839534). We also acknowledge UFIT Research Computing for computational resources and consultation, as well as the Nvidia AI Technology Center at UF. We would like to thank A. Morehead (University of Missouri) for providing the checkpoint models for the conditional generation of GeoLDM.

Author information

Contributions

C.Z., J.J. and M.L. conceptualized the study. C.Z. and J.J. developed the methodology, implemented the code and ran the computational experiments under the guidance of M.L. C.A. developed the PropMolFlow demo web interface under the guidance of C.Z. and M.L. M.T., G.K., E.B.T., R.G.H., A.R. and S.M. provided guidance for various parts of this project. C.Z., M.L. and S.M. wrote the original draft, and all authors participated in the discussion and review of the paper.

Corresponding authors

Correspondence to Stefano Martiniani or Mingjie Liu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Computational Science thanks Andreas Luttens and Arne Schneuing for their contribution to the peer review of this work. Primary Handling Editor: Kaitlin McCardle, in collaboration with the Nature Computational Science team. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Chemical validity and sampling efficiency of PropMolFlow against five baseline models.

a, Molecule stability. b, RDKit validity. c, Uniqueness. d, Closed-shell ratio. e, PoseBusters validity. f, Sampling time. The y-axis for sampling time uses a broken scale to expand 0–150 min and compress 150–360 min by a ratio of 5 for visual clarity. Each box plot summarizes the metric values computed for six molecular properties (n = 6) and 10,000 sampled molecules. The median is shown as a solid line. The edges of the box correspond to the first and third quartiles, and the whiskers extend to the most extreme values within 1.5 × the interquartile range. All individual data points are overlaid as black dots. PropMolFlow results use the top-performing model for each property in the ID tasks for property alignment.
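
The box-plot conventions described in this caption can be sketched as follows. This is an illustrative calculation only: the six input values are made up, the helper names are hypothetical, and linear interpolation between sorted points is assumed as the quantile convention.

```python
def quantile(sorted_vals, q):
    """Linearly interpolated quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def box_stats(values):
    """Median, quartiles and whisker ends for one box in a box plot."""
    vals = sorted(values)
    q1, med, q3 = (quantile(vals, q) for q in (0.25, 0.5, 0.75))
    iqr = q3 - q1
    # Whiskers end at the most extreme data points that still lie
    # inside the fences q1 - 1.5*IQR and q3 + 1.5*IQR.
    low = min(v for v in vals if v >= q1 - 1.5 * iqr)
    high = max(v for v in vals if v <= q3 + 1.5 * iqr)
    return {"q1": q1, "median": med, "q3": q3,
            "whisker_low": low, "whisker_high": high}

# Made-up validity percentages, one per property (n = 6).
stats = box_stats([90.0, 92.0, 93.0, 95.0, 96.0, 60.0])
```

In this toy input the value 60.0 falls below the lower fence, so it would appear only as an individual point and not extend the whisker.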

Extended Data Fig. 2 Performance of GVP property predictors without and with DFT relaxation.

a, Comparison between Target, DFT and GVP shows the reliability of GVP to evaluate the MAE metrics commonly used for property-guided generation. Comparison between GVP and GVP-R, and between DFT and DFT-R shows the structural dependence of GVP-predicted and DFT-predicted properties, respectively. Comparison between GVP and DFT with and without relaxation shows the reliability of GVP in capturing the ground-truth DFT values over raw and relaxed structures. ‘Target’, ‘DFT’ and ‘GVP’ are, respectively, the input, DFT-calculated and GVP-predicted property values on raw molecules. ‘DFT-R’ and ‘GVP-R’ refer to values evaluated on DFT-relaxed molecules. The ‘d’ indicates the MAE distance between two property-value vectors. b, Pairwise comparison between Target, DFT and GVP on raw molecules. c, GVP versus DFT for both raw and DFT-relaxed molecules. d, Root mean squared distances (RMSDs) and normalized property sensitivity due to DFT relaxation. Molecules with the highest RMSDs for each property and their DFT and GVP values are shown. In the molecule representations, gray, red, blue, and white colors indicate C, O, N, and H, respectively. Property values for α, Δϵ, and Cv are in units of Bohr³, eV, and cal/(mol K), respectively.
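
The MAE distance ‘d’ between two property-value vectors in panel a can be sketched as below. The three vectors are made-up numbers for three hypothetical molecules, and the function name is an assumption used only for illustration.

```python
def mae(a, b):
    """Mean absolute error between two equal-length property vectors."""
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and of equal length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

# Made-up property values for three molecules (illustrative only).
target = [1.0, 2.0, 3.0]   # values requested from the generator
dft = [1.5, 2.0, 2.0]      # values recomputed with DFT
gvp = [1.0, 2.5, 2.5]      # values predicted by a property predictor

d_target_dft = mae(target, dft)
d_target_gvp = mae(target, gvp)
d_dft_gvp = mae(dft, gvp)
```

Comparing the three pairwise distances shows how much of the apparent target error comes from the generator and how much from the predictor used to score it.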

Extended Data Fig. 3 Interpolation study by varying property values.

The minimum and maximum target properties in red and corresponding DFT-calculated properties in blue are provided below the configurations. All molecules chosen here pass the filtering criteria and have DFT values closest to the target properties. C, H, O, and N are in gray, white, red, and blue colors, respectively. Property units are provided in the square brackets under each property symbol.

Extended Data Fig. 4 Toward out-of-distribution generation.

a, Distributions of DFT-calculated and GVP-predicted values for PropMolFlow-generated molecules; the property distribution of the QM9 training data is also included. The vertical black dashed line in the histograms represents the target property value q0.99, which corresponds to the 0.99 quantile (99th percentile) of the training data distribution. Curves on top of the histograms are fitted with a kernel density estimation. b, Three example molecules that do not exist in QM9 but are found in a larger PubChem dataset. Numbers below the configurations are DFT-calculated property values on raw molecules generated by PropMolFlow models. C, H, O, N, and F are in gray, white, red, blue, and yellow colors, respectively. Property values for α, Δϵ, and Cv are in units of Bohr³, eV, and cal/(mol K), respectively. c, Maximum Tanimoto similarity of generated and filtered molecules compared to the training data using a Morgan fingerprint. Dashed lines indicate the 0.8 similarity cutoff to define novel molecules.
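
The novelty criterion in panel c can be sketched as follows. This is a simplified stand-in, not the authors' pipeline: fingerprints are represented as plain sets of on-bit indices instead of real Morgan fingerprints, the toy bit sets are made up, and treating a molecule as novel when its maximum similarity is strictly below the cutoff is an assumption.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0

def max_similarity(fp, training_fps):
    """Highest similarity between one molecule and any training molecule."""
    return max((tanimoto(fp, ref) for ref in training_fps), default=0.0)

def is_novel(fp, training_fps, cutoff=0.8):
    # Assumption: novel means the best training-set match is below the cutoff.
    return max_similarity(fp, training_fps) < cutoff

# Toy fingerprints (sets of on-bit indices), purely illustrative.
training = [{1, 2, 3, 4}, {2, 3, 5, 8}]
seen = {1, 2, 3, 4}    # identical to a training fingerprint
unseen = {1, 6, 7}     # shares one bit with the first training entry

seen_is_novel = is_novel(seen, training)
unseen_is_novel = is_novel(unseen, training)
```

With a cheminformatics toolkit, the sets would be replaced by fingerprints computed from the molecules themselves; the comparison logic stays the same.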

Supplementary information

Supplementary Information

Supplementary Figs. 1–14, Tables 1–14, proofs, extended results and discussions, and additional references.

Peer Review file

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zeng, C., Jin, J., Ambrose, C. et al. PropMolFlow: property-guided molecule generation with geometry-complete flow matching. Nat Comput Sci 6, 233–242 (2026). https://doi.org/10.1038/s43588-025-00946-y

  • DOI: https://doi.org/10.1038/s43588-025-00946-y
