A comprehensive foundation model for cryo-EM image processing

Abstract

Cryogenic electron microscopy (cryo-EM) has become a premier technique for determining high-resolution structures of biological macromolecules. However, its broad application is constrained by the demand for specialized expertise. Here, to address this limitation, we introduce the Cryo-EM Image Evaluation Foundation (Cryo-IEF) model, a versatile tool pre-trained on ~65 million cryo-EM particle images through unsupervised learning. Cryo-IEF performs diverse cryo-EM processing tasks, including particle classification by structure, pose-based clustering and image quality assessment. Building on this foundation, we developed CryoWizard, a fully automated single-particle cryo-EM processing pipeline enabled by fine-tuned Cryo-IEF for efficient particle quality ranking. CryoWizard resolves high-resolution structures across samples of varied properties and effectively mitigates the prevalent challenge of preferred orientation in cryo-EM.

Fig. 1: Contrastive learning framework for Cryo-IEF pre-training.
Fig. 2: Structural classification performance of Cryo-IEF.
Fig. 3: Reconstruction of heterogeneous structures by CryoSolver.
Fig. 4: Pose-clustering performance of Cryo-IEF.
Fig. 5: Particle quality assessment by Cryo-IEF and CryoRanker.
Fig. 6: Automated cryo-EM processing with CryoWizard.
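
As a rough illustration of the contrastive pre-training named in the title of Fig. 1, the sketch below implements a generic InfoNCE-style loss of the kind used in SimCLR- and MoCo-like frameworks (refs. 28–30). It is a minimal sketch under stated assumptions, not the authors' training code: the function names, the temperature of 0.2 and the toy two-dimensional embeddings are all hypothetical, and the exact loss used for Cryo-IEF is described in the paper's Methods rather than here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(queries, keys, temperature=0.2):
    """Mean InfoNCE loss over a batch.

    queries[i] and keys[i] are assumed to be embeddings of two augmented
    views of the same particle image (the positive pair); every other key
    in the batch acts as a negative. The temperature is a hypothetical value.
    """
    losses = []
    for i, q in enumerate(queries):
        logits = [cosine(q, k) / temperature for k in keys]
        top = max(logits)  # subtract the maximum for numerical stability
        log_sum = top + math.log(sum(math.exp(x - top) for x in logits))
        losses.append(log_sum - logits[i])  # -log softmax of the positive
    return sum(losses) / len(losses)


# Toy embeddings (hypothetical): view pairs that agree give a low loss.
views_a = [(1.0, 0.0), (0.0, 1.0)]
views_b = [(0.9, 0.1), (0.1, 0.9)]
aligned = info_nce(views_a, views_b)
shuffled = info_nce(views_a, views_b[::-1])  # positives deliberately mismatched
print(aligned < shuffled)  # prints True
```

Minimising this loss pulls the two views of the same image together in feature space and pushes views of different images apart, which is the general idea behind contrastive pre-training.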

Data availability

Cryo-EM micrograph data and particle image data from EMPIAR used in training are available at https://www.ebi.ac.uk/empiar/; accession codes are given in Supplementary Tables 1 and 2. Density maps used to generate simulated cryo-EM particle images were downloaded from the EMDB, which is available at https://www.ebi.ac.uk/emdb/. Particle data from cryoPPP are available at https://calla.rnet.missouri.edu/cryoppp/. The raw data for 12 simulated and four genuine particle datasets can be found at https://zenodo.org/records/17066236 (ref. 59) and https://zenodo.org/uploads/17066297 (ref. 60), respectively. The resampled version of the CryoBench Ribosembly datasets is available on Zenodo at https://zenodo.org/records/17066704 (ref. 61). The reconstructed results obtained from CryoSolver and CryoWizard can be accessed on Zenodo at https://zenodo.org/records/17062718 (ref. 62).

Code availability

The code, together with usage instructions, is available at https://github.com/westlake-repl/Cryo-IEF. It is implemented in PyTorch.

References

  1. Nogales, E. The development of cryo-EM into a mainstream structural biology technique. Nat. Methods 13, 24–27 (2016).

  2. Cheng, Y. Single-particle cryo-EM—how did it get here and where will it go. Science 361, 876–880 (2018).

  3. Bai, X.-C., McMullan, G. & Scheres, S. H. How cryo-EM is revolutionizing structural biology. Trends Biochem. Sci. 40, 49–57 (2015).

  4. Frank, J. Advances in the field of single-particle cryo-electron microscopy over the last decade. Nat. Protoc. 12, 209–212 (2017).

  5. Holcomb, J. et al. Protein crystallization: eluding the bottleneck of X-ray crystallography. AIMS Biophys. 4, 557–575 (2017).

  6. Scheiner, G. The resolution revolution. Diabetes Self Manag. 32, 28–29 (2015).

  7. Amunts, A. et al. Structure of the yeast mitochondrial large ribosomal subunit. Science 343, 1485–1489 (2014).

  8. Liao, M., Cao, E., Julius, D. & Cheng, Y. Structure of the TRPV1 ion channel determined by electron cryo-microscopy. Nature 504, 107–112 (2013).

  9. McMullan, G., Faruqi, A. & Henderson, R. Direct electron detectors. Methods Enzymol. 579, 1–17 (2016).

  10. Zheng, S. Q. et al. MotionCor2: anisotropic correction of beam-induced motion for improved cryo-electron microscopy. Nat. Methods 14, 331–332 (2017).

  11. Li, X. et al. Electron counting and beam-induced motion correction enable near-atomic-resolution single-particle cryo-EM. Nat. Methods 10, 584–590 (2013).

  12. Bai, X.-C., Fernandez, I. S., McMullan, G. & Scheres, S. H. Ribosome structures to near-atomic resolution from thirty thousand cryo-EM particles. Elife 2, e00461 (2013).

  13. Scheres, S. H. RELION: implementation of a Bayesian approach to cryo-EM structure determination. J. Struct. Biol. 180, 519–530 (2012).

  14. Punjani, A., Rubinstein, J. L., Fleet, D. J. & Brubaker, M. A. cryoSPARC: algorithms for rapid unsupervised cryo-EM structure determination. Nat. Methods 14, 290–296 (2017).

  15. Zhou, Y., Moscovich, A., Bendory, T. & Bartesaghi, A. Unsupervised particle sorting for high-resolution single-particle cryo-EM. Inverse Probl. 36, 044002 (2020).

  16. Zivanov, J. et al. New tools for automated high-resolution cryo-EM structure determination in RELION-3. Elife 7, e42166 (2018).

  17. Bepler, T. et al. Positive-unlabeled convolutional neural networks for particle picking in cryo-electron micrographs. Nat. Methods 16, 1153–1160 (2019).

  18. Wagner, T. et al. SPHIRE-crYOLO is a fast and accurate fully automated particle picker for cryo-EM. Commun. Biol. 2, 218 (2019).

  19. Kimanius, D., Dong, L., Sharov, G., Nakane, T. & Scheres, S. H. W. New tools for automated cryo-EM single-particle analysis in RELION-4.0. Biochem. J. 478, 4169–4185 (2021).

  20. Li, Y., Cash, J. N., Tesmer, J. J. G. & Cianfrocco, M. A. High-throughput cryo-EM enabled by user-free preprocessing routines. Structure 28, 858–869 (2020).

  21. Zhong, E. D., Bepler, T., Berger, B. & Davis, J. H. CryoDRGN: reconstruction of heterogeneous cryo-EM structures using neural networks. Nat. Methods 18, 176–185 (2021).

  22. Pfab, J., Phan, N. M. & Si, D. DeepTracer for fast de novo cryo-EM protein structure modeling and special studies on CoV-related complexes. Proc. Natl Acad. Sci. USA 118, e2017525118 (2021).

  23. Jamali, K. et al. Automated model building and protein identification in cryo-EM maps. Nature 628, 450–457 (2024).

  24. Scheres, S. H. Processing of structurally heterogeneous cryo-EM data in RELION. Methods Enzymol. 579, 125–157 (2016).

  25. Zhu, D. et al. Correction of preferred orientation-induced distortion in cryo-electron microscopy maps. Sci. Adv. 10, eadn0092 (2024).

  26. Zhang, H. et al. CryoPROS: Correcting misalignment caused by preferred orientation using AI-generated auxiliary particles. Nat. Commun. 16, 4565 (2025).

  27. Liu, Y. et al. Overcoming the preferred-orientation problem in cryo-EM with self-supervised deep learning. Nat. Methods 22, 113–123 (2025).

  28. Chen, T., Kornblith, S., Norouzi, M. & Hinton, G. A simple framework for contrastive learning of visual representations. In International Conference on Machine Learning 1597–1607 (PMLR, 2020).

  29. He, K., Fan, H., Wu, Y., Xie, S. & Girshick, R. Momentum contrast for unsupervised visual representation learning. In Proc. IEEE/CVF International Conference on Computer Vision 9726–9735 (IEEE, 2020).

  30. Chen, X., Xie, S. & He, K. An empirical study of training self-supervised vision transformers. In Proc. IEEE/CVF International Conference on Computer Vision 9640–9649 (IEEE, 2021).

  31. Oquab, M. et al. DINOv2: learning robust visual features without supervision. In Transactions on Machine Learning Research https://openreview.net/pdf?id=a68SUt6zFt (2024).

  32. Zhou, Y. et al. A foundation model for generalizable disease detection from retinal images. Nature 622, 156–163 (2023).

  33. Pai, S. et al. Foundation model for cancer imaging biomarkers. Nat. Mach. Intell. 6, 354–367 (2024).

  34. Wang, X. et al. A pathology foundation model for cancer diagnosis and prognosis prediction. Nature 634, 970–978 (2024).

  35. Xu, H. et al. A whole-slide foundation model for digital pathology from real-world data. Nature 630, 181–188 (2024).

  36. Ma, C., Tan, W., He, R. & Yan, B. Pretraining a foundation model for generalizable fluorescence microscopy-based image restoration. Nat. Methods 21, 1558–1567 (2024).

  37. Iudin, A. et al. EMPIAR: the Electron Microscopy Public Image Archive. Nucleic Acids Res. 51, D1503–D1511 (2023).

  38. Dhakal, A., Gyawali, R., Wang, L. & Cheng, J. A large expert-curated cryo-EM image dataset for machine learning protein particle picking. Sci. Data 10, 392 (2023).

  39. El Banani, M. et al. Probing the 3D awareness of visual foundation models. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 21795–21806 (IEEE, 2024).

  40. Zhong, E. D., Lerer, A., Davis, J. H. & Berger, B. CryoDRGN2: ab initio neural reconstruction of 3D protein structures from real cryo-EM images. In Proc. IEEE/CVF International Conference on Computer Vision 4066–4075 (IEEE, 2021).

  41. Luo, Z., Ni, F., Wang, Q. & Ma, J. OPUS-DSD: deep structural disentanglement for cryo-EM single-particle analysis. Nat. Methods 20, 1729–1738 (2023).

  42. Levy, A. et al. CryoDRGN-AI: neural ab initio reconstruction of challenging cryo-EM and cryo-ET datasets. Nat. Methods 22, 1486–1494 (2025).

  43. Qin, B. et al. Cryo-EM captures early ribosome assembly in action. Nat. Commun. 14, 898 (2023).

  44. Jeon, M. et al. CryoBench: diverse and challenging datasets for the heterogeneity problem in cryo-EM. In Proc. 38th Conference on Neural Information Processing Systems 89468–89512 (Curran, 2024).

  45. Hu, M. et al. A particle-filter framework for robust cryo-EM 3D reconstruction. Nat. Methods 15, 1083–1089 (2018).

  46. Tan, Y. Z. et al. Addressing preferred specimen orientation in single-particle cryo-EM through tilting. Nat. Methods 14, 793–796 (2017).

  47. Liu, Z. et al. Determination of the ribosome structure to a resolution of 2.5 Å by single-particle cryo-EM. Protein Sci. 26, 82–92 (2017).

  48. Fan, X. et al. Single particle cryo-EM reconstruction of 52 kDa streptavidin at 3.2 Angstrom resolution. Nat. Commun. 10, 2386 (2019).

  49. Baxter, W. T., Grassucci, R. A., Gao, H. & Frank, J. Determination of signal-to-noise ratios and spectral SNRs in cryo-EM low-dose imaging of molecules. J. Struct. Biol. 166, 126–132 (2009).

  50. Palovcak, E., Asarnow, D., Campbell, M. G., Yu, Z. & Cheng, Y. Enhancing the signal-to-noise ratio and generating contrast for cryo-EM images with convolutional neural networks. IUCrJ 7, 1142–1150 (2020).

  51. Liu, Y.-T., Hu, J. & Zhou, Z. H. Resolving the Preferred Orientation Problem in CryoEM Reconstruction with Self-Supervised Deep Learning (Oxford Univ. Press, 2023).

  52. McInnes, L., Healy, J. & Melville, J. UMAP: uniform manifold approximation and projection for dimension reduction. Preprint at https://arxiv.org/abs/1802.03426 (2018).

  53. Pettersen, E. F. et al. UCSF ChimeraX: structure visualization for researchers, educators, and developers. Protein Sci. 30, 70–82 (2021).

  54. Rohou, A. & Grigorieff, N. CTFFIND4: fast and accurate defocus estimation from electron micrographs. J. Struct. Biol. 192, 216–221 (2015).

  55. Lawson, C. L. et al. EMDataBank unified data resource for 3DEM. Nucleic Acids Res. 44, D396–D403 (2016).

  56. Dosovitskiy, A. et al. An image is worth 16 × 16 words: transformers for image recognition at scale. In International Conference on Learning Representations https://openreview.net/pdf?id=YicbFdNTTy (2021).

  57. Grill, J.-B. et al. Bootstrap your own latent—a new approach to self-supervised learning. In 34th Conference on Neural Information Processing Systems (NeurIPS 2020) https://papers.nips.cc/paper/2020/file/f3ada80d5c4ee70142b17b8192b2958e-Paper.pdf (2020).

  58. Punjani, A., Zhang, H. & Fleet, D. J. Non-uniform refinement: adaptive regularization improves single-particle cryo-EM reconstruction. Nat. Methods 17, 1214–1221 (2020).

  59. Yan, Y., Fan, S., Yuan, F. & Shen, H. Simulated cryo-EM particle datasets from paper ‘A comprehensive foundation model for cryo-EM image processing’. Zenodo https://zenodo.org/records/17066236 (2025).

  60. Yan, Y., Fan, S., Yuan, F. & Shen, H. Genuine particle datasets for paper ‘A comprehensive foundation model for cryo-EM image processing’. Zenodo https://zenodo.org/uploads/17066297 (2025).

  61. Yan, Y., Fan, S., Yuan, F. & Shen, H. The resampled CryoBench Ribosembly dataset used in paper ‘A comprehensive foundation model for cryo-EM image processing’. Zenodo https://zenodo.org/records/17066704 (2025).

  62. Yan, Y., Fan, S., Yuan, F. & Shen, H. The reconstruction results of CryoSolver and CryoWizard. Zenodo https://zenodo.org/records/17062718 (2025).

Acknowledgements

We thank the HPC Center of Westlake University for providing computational facility support and technical assistance. This work was supported by the Ministry of Science and Technology of the People’s Republic of China (2024YFA0916903 to H.S. and 2022ZD0115100 to F.Y.), the National Science Foundation of China (32122042 and 32071208 to H.S. and U21A20427 to F.Y.), the Zhejiang Provincial Natural Science Foundation (DQ24C050001 to H.S.), the Research Center for Industries of the Future, Westlake University and the Westlake Education Foundation (to H.S. and F.Y.). We thank our colleagues Y. Shi, H. Yu, P. Lu, D. Ma, Q. Hu, Q. Zhou, J. Wu, Z. Yan, Z. Shi and J. Chai for generously sharing their in-house cryo-EM data. We also acknowledge the use of data from EMDB and EMPIAR for training our models.

Author information

Contributions

The project was conceived and supervised by F.Y. and H.S. Y.Y. was primarily responsible for training the AI models, while S.F. mainly handled the preparation and processing of cryo-EM data as well as the construction of the automated data-processing pipeline. The initial draft of the manuscript was written by Y.Y. and S.F. and subsequently revised and finalized by F.Y. and H.S. All authors reviewed and provided feedback on the manuscript.

Corresponding authors

Correspondence to Fajie Yuan or Huaizong Shen.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks Wah Chiu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Allison Doerr, in collaboration with the Nature Methods team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Detailed architectures and scale and ablation tests of the AI models.

(a–c) Detailed architectures of the prediction head (a), the projection head (b) and the classifier head (c) in the AI models. (d,e) Effect of dataset size (d) and backbone size (e) on the performance of the Cryo-IEF model. (f,g) Effect of different training parameters (f) and loss functions (g) on the fine-tuned performance of CryoRanker.

Extended Data Fig. 2 Pipeline for preparing training datasets.

The pre-training dataset contains particles from various sources (EMPIAR, cryoPPP and in-house datasets). The fine-tuning dataset is a subset of the pre-training dataset in which particles were processed by 2D classification in CryoSPARC and assigned quality scores that were adjusted based on Class Ranker evaluation.

Extended Data Fig. 3 Pose-dependent feature clustering in Cryo-IEF.

(a) Cryo-IEF feature distributions for particles from four distinct 2D classes in EMPIAR-10217 (n = 3,000 particles per class; see Supplementary Fig. 2 for 2D classification results). (b) Corresponding feature analysis for three particle clusters in EMPIAR-10096 (n = 3,000 particles per cluster; Supplementary Fig. 3). In both datasets, Cryo-IEF demonstrates consistent ability to separate particles by orientation, as evidenced by distinct clustering patterns corresponding to different projection views.

Extended Data Fig. 4 Evaluation of CryoRanker by precision and recall metrics.

The precision scores in relation to the predicted particle scores (left panel) and the recall values in relation to the labeled particle scores (right panel) of the four genuine particle datasets are displayed, indicating the performance of the CryoRanker model.
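
For readers unfamiliar with these metrics, the sketch below shows one common way to compute precision and recall when particles are kept above a score threshold. It is a minimal sketch, assuming binary good/bad labels and a hypothetical list of predicted scores; the paper evaluates recall against labelled particle scores, and its exact definitions are given in Methods and may differ from this simplification.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall when particles scoring >= threshold are kept.

    scores: predicted quality scores (hypothetical ranker outputs).
    labels: 1 for a particle labelled as good, 0 otherwise (assumed binary).
    """
    kept = [lab for s, lab in zip(scores, labels) if s >= threshold]
    true_pos = sum(kept)          # kept particles that are labelled good
    total_pos = sum(labels)       # all particles labelled good
    precision = true_pos / len(kept) if kept else 0.0
    recall = true_pos / total_pos if total_pos else 0.0
    return precision, recall


# Toy data (hypothetical): five particles with scores and good/bad labels.
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0]
for t in (0.3, 0.5, 0.7):
    p, r = precision_recall_at(scores, labels, t)
    print(f"threshold={t:.1f} precision={p:.2f} recall={r:.2f}")
```

Raising the threshold typically trades recall for precision, which is why the two quantities are reported together.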

Extended Data Fig. 5 Correlation between CryoRanker scores and reconstruction resolutions.

Particles from six genuine datasets were divided into five equal-sized stacks based on CryoRanker scores (highest to lowest). Each stack was independently processed through CryoSPARC’s non-uniform refinement pipeline. Higher CryoRanker scores consistently yielded improved reconstruction resolutions, demonstrating the metric’s effectiveness for particle quality assessment.
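
The stacking step described in this caption can be sketched as follows. This is an illustrative sketch only: the function name and the particle identifiers are hypothetical, and the way any remainder is handled when the particle count is not divisible by five is an assumption, not something the caption specifies.

```python
def split_by_score(particle_ids, scores, n_stacks=5):
    """Split particles into n_stacks groups, ordered from highest to lowest score.

    Stacks have equal size when the particle count is divisible by n_stacks;
    otherwise the leftover particles go into the last (lowest-scoring) stack.
    That remainder rule is an assumption made for this sketch.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    size = len(order) // n_stacks
    stacks = []
    for k in range(n_stacks):
        start = k * size
        end = (k + 1) * size if k < n_stacks - 1 else len(order)
        stacks.append([particle_ids[i] for i in order[start:end]])
    return stacks


# Toy data (hypothetical): ten particles with scores 0.0, 0.1, ..., 0.9.
ids = [f"p{i}" for i in range(10)]
scores = [i / 10 for i in range(10)]
stacks = split_by_score(ids, scores)
print(stacks[0])   # highest-scoring stack: ['p9', 'p8']
print(stacks[-1])  # lowest-scoring stack: ['p1', 'p0']
```

Each stack would then be refined independently, so that resolution can be compared across score ranges.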

Extended Data Fig. 6 Detailed flowchart for the fully automated cryo-EM data processing pipeline, CryoWizard.

The default pipeline is marked by blue arrows, while the pipeline for addressing the preferred orientation problem is marked by orange arrows. The preferred orientation problem is diagnosed by calculating the cFAR value of the refined structure during the initial model search step. The current pipeline is implemented in Python and interfaces with CryoSPARC-tools, an open-source Python library that enables scripted access to the CryoSPARC software package.

Extended Data Fig. 7 Automated structure resolution using CryoWizard with RELION.

CryoWizard, with RELION as the data processing platform, resolved a 3.3-Å cryo-EM map from the EMPIAR-10556 dataset. Please refer to Methods for details.

Extended Data Fig. 8 CryoWizard overcomes preferred orientation challenges.

(a) For datasets with severe preferred orientation (for example, EMPIAR-10217), CryoWizard’s clustering module groups Cryo-IEF-extracted features (UMAP visualization) using K-Means++. Among eight resulting classes, Class 5 particles produce an isotropic template (cFAR score: 0.75) for final refinement, while other classes show varying degrees of orientation bias (cFAR scores shown). (b–e) Manually selected particles from EMPIAR-10217 (b) and EMPIAR-10096 (d) yielded severely anisotropic maps (cFAR = 0.01 and 0.03, respectively), whereas CryoWizard processing (c,e) achieved dramatically improved isotropy (cFAR = 0.74 for EMPIAR-10217 at 2.37 Å; cFAR = 0.34 for EMPIAR-10096 at 2.78 Å). Manual selection details are given in Supplementary Figs. 2 and 3.
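
The clustering and class-selection idea in panel (a) can be sketched in a few lines. This is a minimal sketch, not the CryoWizard implementation: it uses a plain-Python k-means with k-means++ seeding on toy two-dimensional points, whereas the paper clusters Cryo-IEF features into eight classes. The function names are hypothetical, the cFAR values are illustrative inputs (cFAR itself is computed by the refinement software, not here), and the sketch assumes there are at least k distinct points.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: later centres are drawn with probability
    proportional to the squared distance to the nearest chosen centre."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


def kmeans(points, k, n_iter=50, seed=0):
    """Lloyd iterations starting from k-means++ seeds; returns one label per point."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's centre
        if new_centers == centers:
            break  # assignments can no longer change
        centers = new_centers
    return labels


def pick_most_isotropic(cfar_by_class):
    """Return the class whose reconstruction has the highest cFAR score."""
    return max(cfar_by_class, key=cfar_by_class.get)


# Toy features (hypothetical): two well-separated groups of "views".
features = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = kmeans(features, k=2)
print(labels[0] != labels[3])  # the two groups land in different clusters

# Illustrative cFAR scores per class; only the 0.75 for Class 5 is from the caption.
best = pick_most_isotropic({1: 0.10, 5: 0.75, 7: 0.30})
print(best)  # 5
```

Selecting the class with the highest cFAR mirrors the caption's description of Class 5 supplying the isotropic template for final refinement.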

Extended Data Fig. 9 Comparative performance of CryoWizard and spIsoNet in addressing preferred orientation.

The figure compares orientation correction results among conventional processing (a,g), CryoWizard (d,j) and subsequent spIsoNet processing for the EMPIAR-10096 (a–f) and EMPIAR-10217 (g–l) datasets. For each dataset, initial maps from manual particle selection (a,g) and CryoWizard (d,j) were processed through spIsoNet’s Anisotropy Correction module (b,e,h,k) or Misalignment Correction module (c,f,i,l). CryoWizard-generated maps served as better inputs for spIsoNet processing than conventional maps, demonstrating the complementary strengths of the two approaches. Manual selection protocols are detailed in Supplementary Figs. 2 and 3, and spIsoNet parameters are described in Methods.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yan, Y., Fan, S., Yuan, F. et al. A comprehensive foundation model for cryo-EM image processing. Nat Methods 23, 88–95 (2026). https://doi.org/10.1038/s41592-025-02916-8

  • DOI: https://doi.org/10.1038/s41592-025-02916-8
