Improving diffusion-based protein backbone generation with global-geometry-aware latent encoding

Zhang, Yuyang; Liu, Yuhang; Ma, Zinnia; Li, Min; Xu, Chunfu; Gong, Haipeng

doi:10.1038/s42256-025-01059-x

Article
Published: 18 June 2025

Nature Machine Intelligence volume 7, pages 1104–1118 (2025)

A preprint version of the article is available at bioRxiv.

Abstract

The global structural properties of a protein, such as its shape, fold and topology, strongly affect its function. Although recent breakthroughs in diffusion-based generative models have greatly advanced de novo protein design, particularly in generating diverse and realistic structures, it remains challenging to design proteins with specific geometries without residue-level control over their topological details. A more practical, top-down approach is needed to prescribe the overall geometric arrangement of secondary structure elements in generated protein structures. In response, we propose TopoDiff, an unsupervised framework that learns and exploits a global-geometry-aware latent representation, enabling both unconditional and controllable diffusion-based protein generation. The structure encoder, trained on the Protein Data Bank and CATH datasets, embeds protein global geometries into a 32-dimensional latent space; latent codes drawn from this space by the latent sampler then serve as informative conditions for the diffusion-based backbone decoder. In benchmarks against existing baselines, TopoDiff performs comparably on established metrics, including designability, diversity and novelty, while markedly improving coverage of the fold types of natural proteins in the CATH dataset. Moreover, latent conditioning enables versatile manipulations at the global-geometry level to control the generated protein structures; using these manipulations, we derived a number of novel folds of mainly-beta proteins and validated them comprehensively by experiment.
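To make the encoder, latent sampler and decoder pipeline described in the abstract more concrete, the sketch below shows generic DDPM ancestral sampling of Cα coordinates conditioned on a 32-dimensional latent code, plus linear interpolation between two codes as one simple latent-level manipulation. It is a toy illustration under stated assumptions, not TopoDiff's implementation: the linear noise schedule, the standard-normal "latent sampler", the Cα-only coordinate representation and the `toy_eps_model` stand-in for the trained backbone decoder are all hypothetical.

```python
import numpy as np

LATENT_DIM = 32  # latent size reported in the abstract


def linear_schedule(num_steps=100, beta_start=1e-4, beta_end=0.02):
    """Standard linear DDPM noise schedule (assumed; TopoDiff's own schedule may differ)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


# Hypothetical fixed projection so the toy decoder can "read" the latent code.
_PROJ = np.random.default_rng(42).standard_normal((LATENT_DIM, 3)) / np.sqrt(LATENT_DIM)


def toy_eps_model(x_t, t, z):
    """Hypothetical stand-in for the latent-conditioned backbone decoder.

    Returns a noise prediction with the same shape as x_t (n_res, 3). A real
    decoder would be a trained network that also uses the timestep t.
    """
    return 0.1 * np.tanh(x_t) + z @ _PROJ  # (3,) bias broadcast over residues


def sample_backbone(z, n_res, eps_model=toy_eps_model, num_steps=100, seed=0):
    """DDPM ancestral sampling of C-alpha coordinates conditioned on latent code z."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = linear_schedule(num_steps)
    x = rng.standard_normal((n_res, 3))  # x_T ~ N(0, I)
    for t in reversed(range(num_steps)):
        eps_hat = eps_model(x, t, z)
        # Posterior mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps_hat) / sqrt(alpha_t)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        noise = rng.standard_normal(x.shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise  # sigma_t^2 = beta_t; no noise at t = 0
    return x


def interpolate_latents(z_a, z_b, n_points):
    """Linear interpolation between two latent codes (one simple latent manipulation)."""
    return [(1.0 - w) * z_a + w * z_b for w in np.linspace(0.0, 1.0, n_points)]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Placeholder "latent sampler": standard-normal draws (an assumption).
    z_a, z_b = rng.standard_normal((2, LATENT_DIM))
    backbones = [sample_backbone(z, n_res=64) for z in interpolate_latents(z_a, z_b, 5)]
    print(len(backbones), backbones[0].shape)  # 5 (64, 3)
```

The design point the sketch tries to capture is that the latent code enters only through the noise predictor, so the same sampler serves both unconditional generation (draw z from the sampler) and controlled generation (edit or interpolate z before decoding).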

**Fig. 2: Analysis of TopoDiff’s learned latent representations.**

**Fig. 3: Evaluation of TopoDiff’s generative performance for unconditional sampling.**

**Fig. 4: Exploring controllable protein structure generation with TopoDiff.**

**Fig. 5: Experimental validation of novel mainly beta protein designs.**
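The abstract reports improved coverage of natural fold types, and the reference list includes the density and coverage metrics of Naeem et al. The exact definition used in the paper's evaluation is not reproduced on this page, so the following is only a minimal sketch of the Naeem-style coverage metric, assuming structures have already been mapped to fixed-length embedding vectors (the 32-dimensional random "embeddings" in the usage block are hypothetical). Coverage is the fraction of real samples whose k-nearest-neighbour ball contains at least one generated sample, so it penalizes generators that concentrate on a few folds.

```python
import numpy as np


def pairwise_distances(a, b):
    """Euclidean distances between rows of a (n, d) and rows of b (m, d)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))


def coverage(real, generated, k=5):
    """Coverage in the sense of Naeem et al. (2020).

    Fraction of real samples whose k-nearest-neighbour ball (radius = distance
    to the k-th nearest *real* neighbour) contains at least one generated sample.
    """
    if len(real) <= k:
        raise ValueError("need more than k real samples")
    d_real = pairwise_distances(real, real)
    # After sorting each row, column 0 is the zero self-distance, so column k
    # is the distance to the k-th nearest real neighbour.
    radii = np.sort(d_real, axis=1)[:, k]
    d_cross = pairwise_distances(real, generated)
    covered = np.any(d_cross <= radii[:, None], axis=1)
    return float(np.mean(covered))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.standard_normal((200, 32))              # hypothetical natural-domain embeddings
    collapsed = 0.1 * rng.standard_normal((200, 32))   # a mode-collapsed generator
    spread = rng.standard_normal((200, 32))            # a generator matching the real spread
    print(coverage(real, collapsed), coverage(real, spread))
```

Because coverage is computed from the real samples' side, adding more generated samples near already-covered regions does not raise it; only reaching previously uncovered regions (here, fold types) does.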

Data availability

The dataset used for model training, along with the trained model weights, benchmark data and protein designs selected for experimental validation, is available via Zenodo at https://zenodo.org/records/13879811 (ref. 90). The crystal structure models have been deposited in the Protein Data Bank (accession codes 9KGZ and 9KGY). Source data are provided with this paper.

Code availability

The TopoDiff model is implemented in PyTorch. The full scripts (including the training code) and usage guidance are available via GitHub at https://github.com/meneshail/TopoDiff/tree/main (ref. 91). A reproducible code capsule of TopoDiff is available via CodeOcean at https://doi.org/10.24433/CO.8705528.v1 (ref. 92).

References

1. Chevalier, A. et al. Massively parallel de novo protein design for targeted therapeutics. Nature 550, 74–79 (2017).
2. Silva, D.-A. et al. De novo design of potent and selective mimics of IL-2 and IL-15. Nature 565, 186–191 (2019).
3. Roy, A. et al. De novo design of highly selective miniprotein inhibitors of integrins αvβ6 and αvβ8. Nat. Commun. 14, 5660 (2023).
4. Yeh, A. H.-W. et al. De novo design of luciferases using deep learning. Nature 614, 774–780 (2023).
5. Langan, R. A. et al. De novo design of bioactive protein switches. Nature 572, 205–210 (2019).
6. Chen, Z. et al. De novo design of protein logic gates. Science 368, 78–84 (2020).
7. Pan, X. & Kortemme, T. Recent advances in de novo protein design: principles, methods, and applications. J. Biol. Chem. 296, 100558 (2021).
8. Wu, K. E. et al. Protein structure generation via folding diffusion. Nat. Commun. 15, 1059 (2024).
9. Ni, B., Kaplan, D. L. & Buehler, M. J. Generative design of de novo proteins based on secondary-structure constraints using an attention-based diffusion model. Chem 9, 1828–1849 (2023).
10. Lee, J. S., Kim, J. & Kim, P. M. Score-based generative modeling for de novo protein design. Nat. Comput. Sci. 3, 382–392 (2023).
11. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
12. Baek, M. et al. Efficient and accurate prediction of protein structure using RoseTTAFold2. Preprint at bioRxiv https://doi.org/10.1101/2023.05.24.542179 (2023).
13. Anand, N. & Achim, T. Protein structure and sequence generation with equivariant denoising diffusion probabilistic models. Preprint at https://arxiv.org/abs/2205.15019 (2022).
14. Luo, S. et al. Antigen-specific antibody design and optimization with diffusion-based generative models for protein structures. In Proc. 36th International Conference on Neural Information Processing Systems (eds Koyejo, S. et al.) 9754–9767 (Curran Associates Inc., 2022).
15. Yim, J. et al. SE(3) diffusion model with application to protein backbone generation. In Proc. 40th International Conference on Machine Learning (eds Krause, A. et al.) 40001–40039 (JMLR.org, 2023).
16. Lin, Y. & AlQuraishi, M. Generating novel, designable, and diverse protein structures by equivariantly diffusing oriented residue clouds. In Proc. 40th International Conference on Machine Learning (eds Krause, A. et al.) 20978–21002 (PMLR, 2023).
17. Watson, J. L. et al. De novo design of protein structure and function with RFdiffusion. Nature 620, 1089–1100 (2023).
18. Berman, H. M. et al. The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000).
19. Orengo, C. A. et al. CATH—a hierarchic classification of protein domain structures. Structure 5, 1093–1108 (1997).
20. Sillitoe, I. et al. CATH: increased structural coverage of functional space. Nucleic Acids Res. 49, D266–D273 (2021).
21. Bennett, N. R. et al. Atomically accurate de novo design of single-domain antibodies. Preprint at bioRxiv https://doi.org/10.1101/2024.03.14.585103 (2024).
22. Ingraham, J. B. et al. Illuminating protein space with a programmable generative model. Nature 623, 1070–1078 (2023).
23. Sadreyev, R. I., Kim, B.-H. & Grishin, N. V. Discrete-continuous duality of protein structure space. Curr. Opin. Struct. Biol. 19, 321–328 (2009).
24. Pascual-García, A., Abia, D., Ortiz, A. R. & Bastolla, U. Cross-over between discrete and continuous protein structure space: insights into automatic classification and networks of protein structures. PLoS Comput. Biol. 5, e1000331 (2009).
25. Martin, A. C. et al. Protein folds and functions. Structure 6, 875–884 (1998).
26. Hegyi, H. & Gerstein, M. The relationship between protein structure and function: a comprehensive survey with application to the yeast genome. J. Mol. Biol. 288, 147–164 (1999).
27. Micheletti, C. Prediction of folding rates and transition-state placement from native-state geometry. Proteins 51, 74–84 (2003).
28. Wang, J. & Panagiotou, E. The protein folding rate and the geometry and topology of the native state. Sci. Rep. 12, 6384 (2022).
29. Luo, C. Understanding diffusion models: a unified perspective. Preprint at https://arxiv.org/abs/2208.11970 (2022).
30. Maaten, L. v. d. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).
31. Murzin, A. G., Brenner, S. E., Hubbard, T. & Chothia, C. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247, 536–540 (1995).
32. Hubbard, T. J., Murzin, A. G., Brenner, S. E. & Chothia, C. SCOP: a structural classification of proteins database. Nucleic Acids Res. 25, 236–239 (1997).
33. Cheng, H. et al. ECOD: an evolutionary classification of protein domains. PLoS Comput. Biol. 10, e1003926 (2014).
34. Day, R., Beck, D. A., Armen, R. S. & Daggett, V. A consensus view of fold space: combining SCOP, CATH, and the Dali Domain Dictionary. Protein Sci. 12, 2150–2160 (2003).
35. Csaba, G., Birzele, F. & Zimmer, R. Systematic comparison of SCOP and CATH: a new gold standard for protein structure analysis. BMC Struct. Biol. 9, 23 (2009).
36. Schaeffer, R. D., Kinch, L. N., Pei, J., Medvedev, K. E. & Grishin, N. V. Completeness and consistency in structural domain classifications. ACS Omega 6, 15698–15707 (2021).
37. Mura, C., Veretnik, S. & Bourne, P. E. The Urfold: structural similarity just above the superfold level? Protein Sci. 28, 2119–2126 (2019).
38. Kynkäänniemi, T., Karras, T., Laine, S., Lehtinen, J. & Aila, T. Improved precision and recall metric for assessing generative models. In Proc. 33rd International Conference on Neural Information Processing Systems (eds Wallach, H. M. et al.) 3927–3936 (Curran Associates Inc., 2019).
39. Listov, D., Goverde, C. A., Correia, B. E. & Fleishman, S. J. Opportunities and challenges in design and optimization of protein function. Nat. Rev. Mol. Cell Biol. 25, 639–653 (2024).
40. Chu, A. E., Lu, T. & Huang, P.-S. Sparks of function by de novo protein design. Nat. Biotechnol. 42, 203–215 (2024).
41. Krishna, R. et al. Generalized biomolecular modeling and design with RoseTTAFold All-Atom. Science 384, eadl2528 (2024).
42. Naeem, M. F., Oh, S. J., Uh, Y., Choi, Y. & Yoo, J. Reliable fidelity and diversity metrics for generative models. In Proc. 37th International Conference on Machine Learning (eds Daumé, H. & Singh, A.) 7176–7185 (JMLR.org, 2020).
43. Zhang, Y. & Skolnick, J. Scoring function for automated assessment of protein structure template quality. Proteins 57, 702–710 (2004).
44. Greener, J. G. & Jamali, K. Fast protein structure searching using structure graph embeddings. Bioinform. Adv. 5, vbaf042 (2025).
45. Bose, A. J. et al. Proc. 12th International Conference on Learning Representations (OpenReview.net, 2024).
46. Lin, Y., Lee, M., Zhang, Z. & AlQuraishi, M. Out of many, one: designing and scaffolding proteins at the scale of the structural universe with Genie 2. Preprint at https://arxiv.org/abs/2405.15489 (2024).
47. Huguet, G. et al. Sequence-augmented SE(3)-flow matching for conditional protein generation. In Advances in Neural Information Processing Systems 37 (eds Globerson, A. et al.) 33007–33036 (Curran Associates, Inc., 2024).
48. Chronowska, M., Stam, M. J., Woolfson, D. N., Di Costanzo, L. F. & Wood, C. W. The Protein Design Archive (PDA): insights from 40 years of protein design. Nat. Biotechnol. 43, 669–671 (2024).
49. Hermosilla, A. M., Berner, C., Ovchinnikov, S. & Vorobieva, A. A. Validation of de novo designed water-soluble and transmembrane β-barrels by in silico folding and melting. Protein Sci. 33, e5033 (2024).
50. Liu, Y., Chen, L. & Liu, H. Diffusion in a quantized vector space generates non-idealized protein structures and predicts conformational distributions. Preprint at bioRxiv https://doi.org/10.1101/2023.11.18.567666 (2023).
51. Fu, C. et al. A latent diffusion model for protein structure generation. In Proc. Second Learning on Graphs Conference (eds Villar, S. & Chamberlain, B.) 29:1–29:17 (PMLR, 2024).
52. Ramesh, A., Dhariwal, P., Nichol, A., Chu, C. & Chen, M. Hierarchical text-conditional image generation with CLIP latents. Preprint at https://arxiv.org/abs/2204.06125 (2022).
53. Preechakul, K., Chatthee, N., Wizadwongsa, S. & Suwajanakorn, S. Proc. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, 2022).
54. Kim, S. W. et al. Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, 2023).
55. Praetorius, F. et al. Design of stimulus-responsive two-state hinge proteins. Science 381, 754–760 (2023).
56. Berger, S. et al. Preclinical proof of principle for orally delivered Th17 antagonist miniproteins. Cell 187, 4305–4317.e18 (2024).
57. Glögl, M. et al. Target-conditioned diffusion generates potent TNFR superfamily antagonists and agonists. Science 386, 1154–1161 (2024).
58. Huang, B. et al. Designed endocytosis-inducing proteins degrade targets and amplify signals. Nature 638, 796–804 (2024).
59. Baker, D. et al. De novo designed proteins neutralize lethal snake venom toxins. Nature 639, 225–231 (2024).
60. An, L. et al. Binding and sensing diverse small molecules using shape-complementary pseudocycles. Science 385, 276–282 (2024).
61. Chu, A. E. et al. An all-atom protein generative model. Proc. Natl Acad. Sci. USA 121, e2311500121 (2024).
62. Campbell, A., Yim, J., Barzilay, R., Rainforth, T. & Jaakkola, T. Generative flows on discrete state-spaces: enabling multimodal flows with applications to protein co-design. In Proc. 41st International Conference on Machine Learning (eds Salakhutdinov, R. et al.) 5453–5512 (JMLR.org, 2024).
63. Dietmann, S. et al. A fully automatic evolutionary classification of protein folds: Dali Domain Dictionary version 3. Nucleic Acids Res. 29, 55–57 (2001).
64. Xu, J. & Zhang, J. Impact of structure space continuity on protein fold classification. Sci. Rep. 6, 23263 (2016).
65. Skolnick, J., Arakaki, A. K., Lee, S. Y. & Brylinski, M. The continuity of protein structure space is an intrinsic property of proteins. Proc. Natl Acad. Sci. USA 106, 15690–15695 (2009).
66. Woolfson, D. N. et al. De novo protein design: how do we expand into the universe of possible protein structures? Curr. Opin. Struct. Biol. 33, 16–26 (2015).
67. Greener, J. G., Moffat, L. & Jones, D. T. Design of metalloproteins and novel protein folds using variational autoencoders. Sci. Rep. 8, 16189 (2018).
68. Hawkins-Hooker, A. et al. Generating functional protein variants with variational autoencoders. PLoS Comput. Biol. 17, e1008736 (2021).
69. Guo, X., Du, Y., Tadepalli, S., Zhao, L. & Shehu, A. Generating tertiary protein structures via interpretable graph variational autoencoders. Bioinform. Adv. 1, vbab036 (2021).
70. Eguchi, R. R., Choe, C. A. & Huang, P.-S. Ig-VAE: generative modeling of protein structure by direct 3D coordinate generation. PLoS Comput. Biol. 18, e1010271 (2022).
71. Lai, B., McPartlon, M. & Xu, J. End-to-end deep structure generative model for protein design. Preprint at bioRxiv https://doi.org/10.1101/2022.07.09.499440 (2022).
72. Rombach, R., Blattmann, A., Lorenz, D., Esser, P. & Ommer, B. Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, 2022).
73. Podell, D. et al. Proc. 12th International Conference on Learning Representations (OpenReview.net, 2024).
74. Esser, P. et al. Scaling rectified flow transformers for high-resolution image synthesis. In Proc. 41st International Conference on Machine Learning (eds Salakhutdinov, R. et al.) 12606–12633 (JMLR.org, 2024).
75. Poličar, P. G., Stražar, M. & Zupan, B. openTSNE: a modular Python library for t-SNE dimensionality reduction and embedding. J. Stat. Softw. 109, 1–30 (2024).
76. Scott, D. W. Multivariate Density Estimation: Theory, Practice, and Visualization 1st edn (Wiley, 1992).
77. Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).
78. Zhang, Y. & Skolnick, J. TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 33, 2302–2309 (2005).
79. Kryshtafovych, A., Schwede, T., Topf, M., Fidelis, K. & Moult, J. Critical assessment of methods of protein structure prediction (CASP)-Round XV. Proteins 91, 1539–1549 (2023).
80. Greener, J. G. & Jamali, K. Fast protein structure searching using structure graph embeddings. Bioinform. Adv. 5, vbaf042 (2025).
81. Dauparas, J. et al. Robust deep learning-based protein sequence design using ProteinMPNN. Science 378, 49–56 (2022).
82. Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).
83. Van Kempen, M. et al. Fast and accurate protein structure search with Foldseek. Nat. Biotechnol. 42, 243–246 (2023).
84. Song, J., Meng, C. & Ermon, S. Proc. 9th International Conference on Learning Representations (OpenReview.net, 2021).
85. Otwinowski, Z. & Minor, W. in Methods in Enzymology (ed. Carter, C. W. Jr) 307–326 (Elsevier, 1997).
86. Adams, P. D. et al. PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr. D 66, 213–221 (2010).
87. Emsley, P. & Cowtan, K. Coot: model-building tools for molecular graphics. Acta Crystallogr. D 60, 2126–2132 (2004).
88. The PyMOL Molecular Graphics System (Schrödinger, LLC, 2015).
89. Meng, E. C. et al. UCSF ChimeraX: tools for structure building and analysis. Protein Sci. 32, e4792 (2023).
90. Zhang, Y. et al. Improving diffusion-based protein backbone generation with global-geometry-aware latent encoding. Preprint at bioRxiv https://doi.org/10.1101/2024.10.05.616664 (2024).
91. Zhang, Y. meneshail/TopoDiff: v1.1.0. GitHub https://github.com/meneshail/TopoDiff/tree/main (2025).
92. Zhang, Y., Liu, Y., Ma, Z., Li, M. & Xu, C. CodeOcean release of ‘TopoDiff: improving diffusion-based protein backbone generation with global-geometry-aware latent encoding’, version 1. CodeOcean https://doi.org/10.24433/CO.8705528.v1 (2025).


Acknowledgements

This work has been supported by the Ministry of Science and Technology of China (no. 2023YFF1204400 to H.G.), the National Natural Science Foundation of China (no. 32171243 to H.G.) and the Beijing Frontier Research Center for Biological Structure. We thank the staff of beamlines BL02U1, BL10U2, BL18U1 and BL19U1 at the Shanghai Synchrotron Radiation Facility as well as the X-ray crystallography platform, National Protein Science Facility, Tsinghua University, for assistance in the X-ray diffraction data collection and analysis. We thank J. Hu, Z. Zhu, Y. Xue and C. Song for helpful discussions.

Author information

These authors contributed equally: Yuyang Zhang, Yuhang Liu, Zinnia Ma.

Authors and Affiliations

MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
Yuyang Zhang & Haipeng Gong
Beijing Frontier Research Center for Biological Structure, Tsinghua University, Beijing, China
Yuyang Zhang & Haipeng Gong
National Institute of Biological Sciences, Beijing, China
Yuhang Liu & Chunfu Xu
Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing, China
Yuhang Liu & Chunfu Xu
Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA
Zinnia Ma
National Center for Protein Sciences, Beijing, China
Min Li
X-ray Crystallography Facility, Technology Center for Protein Sciences, Tsinghua University, Beijing, China
Min Li

Contributions

Y.Z. and H.G. conceived the study. Y.Z. and Z.M. designed and implemented the model. Y.Z. and Z.M. performed the in silico experiments and analysed the results. Y.Z. designed the candidate proteins for experimental validation. Y.L. designed, executed and analysed all the wet-laboratory experiments. H.G. supervised the development of the model and the result analysis. C.X. supervised the design of the candidate proteins and wet-laboratory experiments. M.L. contributed to the X-ray structure determination. Y.Z. drafted the initial paper. Y.Z., Y.L. and Z.M. created the final figures. All authors contributed to writing and improving the paper, and approved the submission.

Corresponding authors

Correspondence to Chunfu Xu or Haipeng Gong.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Zhuoran Qiao, Limei Wang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Persistent underrepresentation of mainly-beta proteins in de novo design.

All de novo designed proteins deposited in the PDB up to September 2024 were collected from the Protein Design Archive (PDA) and filtered to exclude small peptides (length ≤ 50) as well as designs derived from sequence mutations or redesigns of naturally occurring backbones (maximum TM-score to the PDB ≥ 0.9). (a) Cumulative number of de novo protein design entries over time. General proteins and mainly-beta proteins (beta ratio ≥ 0.5) are colored blue and purple, respectively. (b) Distribution of the proportion of beta sheets for natural proteins in the CATH dataset (left) and de novo designed proteins (right). (c) Scatter plot of all de novo designed proteins, with novelty (maximum TM-score to the PDB) on the horizontal axis and the proportion of beta sheets on the vertical axis. Each point denotes one protein, colored by protein length. These data are discussed in detail in Supplementary Results 6.3.
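The filtering and labelling rules in this caption can be written down as small predicates. The sketch below assumes each design already carries its length, its maximum TM-score against the PDB and its beta-sheet fraction (the `Design` record and the example entries are hypothetical, not the paper's data). It also includes the standard TM-score formula of Zhang and Skolnick for a single, fixed alignment and superposition; real tools such as TM-align additionally search over alignments and superpositions and report the maximum, which this helper does not do.

```python
from dataclasses import dataclass

import numpy as np


def tm_score(distances, length):
    """TM-score for one fixed alignment and superposition (Zhang & Skolnick, 2004).

    distances: C-alpha distances (in angstroms) between aligned residue pairs.
    length: length used for normalization (here, the design's length).
    """
    d = np.asarray(distances, dtype=float)
    d0 = max(1.24 * np.cbrt(length - 15) - 1.8, 0.5)  # length-dependent scale, floored at 0.5
    return float(np.sum(1.0 / (1.0 + (d / d0) ** 2)) / length)


@dataclass
class Design:
    pdb_id: str
    length: int            # number of residues
    max_tm_to_pdb: float   # novelty proxy: highest TM-score against any PDB entry
    beta_ratio: float      # fraction of residues in beta sheets


def passes_filters(d: Design) -> bool:
    """Drop small peptides (length <= 50) and redesigns of natural backbones (max TM >= 0.9)."""
    return d.length > 50 and d.max_tm_to_pdb < 0.9


def is_mainly_beta(d: Design) -> bool:
    """Mainly-beta label used in panel (a): beta ratio >= 0.5."""
    return d.beta_ratio >= 0.5


if __name__ == "__main__":
    designs = [  # hypothetical entries, not real PDB statistics
        Design("AAAA", 45, 0.40, 0.60),   # too short
        Design("BBBB", 120, 0.95, 0.10),  # near-copy of a natural backbone
        Design("CCCC", 90, 0.55, 0.55),   # kept, mainly beta
        Design("DDDD", 150, 0.62, 0.05),  # kept, not mainly beta
    ]
    kept = [d for d in designs if passes_filters(d)]
    print([d.pdb_id for d in kept], [d.pdb_id for d in kept if is_mainly_beta(d)])
    # ['CCCC', 'DDDD'] ['CCCC']
```

Note that the thresholds are applied as stated in the caption: a length of exactly 50 is excluded, a maximum TM-score of exactly 0.9 is excluded, and a beta ratio of exactly 0.5 counts as mainly beta.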

Source data

Supplementary information

Supplementary Information

Supplementary Figs. 1–28, Tables 1–16, Methods, Results and Algorithms (for pseudocodes).

Reporting Summary

Source data

Source Data Fig. 2

t-SNE-reduced embedding and annotations for all structures in the three databases.

Source Data Fig. 3

Statistical source data for Fig. 3b,c.

Source Data Fig. 4

Statistical source data for Fig. 4b.

Source Data Fig. 5

Source data for the SEC profiles, CD spectra and melting curve.

Source Data Fig. 5

Unprocessed western blots.

Source Data Extended Data Fig./Table 1

Statistics of filtered de novo designed proteins in PDB.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Zhang, Y., Liu, Y., Ma, Z. et al. Improving diffusion-based protein backbone generation with global-geometry-aware latent encoding. Nat Mach Intell 7, 1104–1118 (2025). https://doi.org/10.1038/s42256-025-01059-x


Received: 03 October 2024
Accepted: 12 May 2025
Published: 18 June 2025
Issue date: July 2025
DOI: https://doi.org/10.1038/s42256-025-01059-x