A comprehensive foundation model for cryo-EM image processing

Yan, Yang; Fan, Shiqi; Yuan, Fajie; Shen, Huaizong

doi:10.1038/s41592-025-02916-8

Article
Published: 27 November 2025

Nature Methods volume 23, pages 88–95 (2026)


Abstract

Cryogenic electron microscopy (cryo-EM) has become a premier technique for determining high-resolution structures of biological macromolecules. However, its broad application is constrained by the demand for specialized expertise. Here, to address this limitation, we introduce the Cryo-EM Image Evaluation Foundation (Cryo-IEF) model, a versatile tool pre-trained on ~65 million cryo-EM particle images through unsupervised learning. Cryo-IEF performs diverse cryo-EM processing tasks, including particle classification by structure, pose-based clustering and image quality assessment. Building on this foundation, we developed CryoWizard, a fully automated single-particle cryo-EM processing pipeline enabled by fine-tuned Cryo-IEF for efficient particle quality ranking. CryoWizard resolves high-resolution structures across samples of varied properties and effectively mitigates the prevalent challenge of preferred orientation in cryo-EM.


**Fig. 1: Contrastive learning framework for Cryo-IEF pre-training.**

**Fig. 2: Structural classification performance of Cryo-IEF.**

**Fig. 3: Reconstruction of heterogeneous structures by CryoSolver.**

**Fig. 4: Pose-clustering performance of Cryo-IEF.**

**Fig. 5: Particle quality assessment by Cryo-IEF and CryoRanker.**

**Fig. 6: Automated cryo-EM processing with CryoWizard.**

Data availability

Cryo-EM micrograph data and particle image data from EMPIAR used in training are available at https://www.ebi.ac.uk/empiar/; accession codes are given in Supplementary Tables 1 and 2. Density maps used to generate simulated cryo-EM particle images were downloaded from the EMDB, which is available at https://www.ebi.ac.uk/emdb/. Particle data from cryoPPP are available at https://calla.rnet.missouri.edu/cryoppp/. The raw data for 12 simulated and four genuine particle datasets can be found at https://zenodo.org/records/17066236 (ref. ⁵⁹) and https://zenodo.org/uploads/17066297 (ref. ⁶⁰), respectively. The resampled version of the CryoBench Ribosembly datasets is available on Zenodo (https://zenodo.org/records/17066704)⁶¹. The reconstructed results obtained from CryoSolver and CryoWizard can be accessed at Zenodo (https://zenodo.org/records/17062718)⁶².

Code availability

The code, together with usage instructions, is available at https://github.com/westlake-repl/Cryo-IEF and is based on PyTorch.

References

Nogales, E. The development of cryo-EM into a mainstream structural biology technique. Nat. Methods 13, 24–27 (2016).
Cheng, Y. Single-particle cryo-EM—how did it get here and where will it go. Science 361, 876–880 (2018).
Bai, X.-C., McMullan, G. & Scheres, S. H. How cryo-EM is revolutionizing structural biology. Trends Biochem. Sci. 40, 49–57 (2015).
Frank, J. Advances in the field of single-particle cryo-electron microscopy over the last decade. Nat. Protoc. 12, 209–212 (2017).
Holcomb, J. et al. Protein crystallization: eluding the bottleneck of X-ray crystallography. AIMS Biophys. 4, 557–575 (2017).
Scheiner, G. The resolution revolution. Diabetes Self Manag. 32, 28–29 (2015).
Amunts, A. et al. Structure of the yeast mitochondrial large ribosomal subunit. Science 343, 1485–1489 (2014).
Liao, M., Cao, E., Julius, D. & Cheng, Y. Structure of the TRPV1 ion channel determined by electron cryo-microscopy. Nature 504, 107–112 (2013).
McMullan, G., Faruqi, A. & Henderson, R. Direct electron detectors. Methods Enzymol. 579, 1–17 (2016).
Zheng, S. Q. et al. MotionCor2: anisotropic correction of beam-induced motion for improved cryo-electron microscopy. Nat. Methods 14, 331–332 (2017).
Li, X. et al. Electron counting and beam-induced motion correction enable near-atomic-resolution single-particle cryo-EM. Nat. Methods 10, 584–590 (2013).
Bai, X.-C., Fernandez, I. S., McMullan, G. & Scheres, S. H. Ribosome structures to near-atomic resolution from thirty thousand cryo-EM particles. Elife 2, e00461 (2013).
Scheres, S. H. RELION: implementation of a Bayesian approach to cryo-EM structure determination. J. Struct. Biol. 180, 519–530 (2012).
Punjani, A., Rubinstein, J. L., Fleet, D. J. & Brubaker, M. A. cryoSPARC: algorithms for rapid unsupervised cryo-EM structure determination. Nat. Methods 14, 290–296 (2017).
Zhou, Y., Moscovich, A., Bendory, T. & Bartesaghi, A. Unsupervised particle sorting for high-resolution single-particle cryo-EM. Inverse Probl. 36, 044002 (2020).
Zivanov, J. et al. New tools for automated high-resolution cryo-EM structure determination in RELION-3. Elife 7, e42166 (2018).
Bepler, T. et al. Positive-unlabeled convolutional neural networks for particle picking in cryo-electron micrographs. Nat. Methods 16, 1153–1160 (2019).
Wagner, T. et al. SPHIRE-crYOLO is a fast and accurate fully automated particle picker for cryo-EM. Commun. Biol. 2, 218 (2019).
Kimanius, D., Dong, L., Sharov, G., Nakane, T. & Scheres, S. H. W. New tools for automated cryo-EM single-particle analysis in RELION-4.0. Biochem. J. 478, 4169–4185 (2021).
Li, Y., Cash, J. N., Tesmer, J. J. G. & Cianfrocco, M. A. High-throughput cryo-EM enabled by user-free preprocessing routines. Structure 28, 858–869 (2020).
Zhong, E. D., Bepler, T., Berger, B. & Davis, J. H. CryoDRGN: reconstruction of heterogeneous cryo-EM structures using neural networks. Nat. Methods 18, 176–185 (2021).
Pfab, J., Phan, N. M. & Si, D. DeepTracer for fast de novo cryo-EM protein structure modeling and special studies on CoV-related complexes. Proc. Natl Acad. Sci. USA 118, e2017525118 (2021).
Jamali, K. et al. Automated model building and protein identification in cryo-EM maps. Nature 628, 450–457 (2024).
Scheres, S. H. Processing of structurally heterogeneous cryo-EM data in RELION. Methods Enzymol. 579, 125–157 (2016).
Zhu, D. et al. Correction of preferred orientation-induced distortion in cryo-electron microscopy maps. Sci. Adv. 10, eadn0092 (2024).
Zhang, H. et al. CryoPROS: Correcting misalignment caused by preferred orientation using AI-generated auxiliary particles. Nat. Commun. 16, 4565 (2025).
Liu, Y. et al. Overcoming the preferred-orientation problem in cryo-EM with self-supervised deep learning. Nat. Methods 22, 113–123 (2025).
Chen, T., Kornblith, S., Norouzi, M. & Hinton, G. A simple framework for contrastive learning of visual representations. In International Conference on Machine Learning 1597–1607 (PMLR, 2020).
He, K., Fan, H., Wu, Y., Xie, S. & Girshick, R. Momentum contrast for unsupervised visual representation learning. In Proc. IEEE/CVF International Conference on Computer Vision 9726–9735 (IEEE, 2020).
Chen, X., Xie, S. & He, K. An empirical study of training self-supervised vision transformers. In Proc. IEEE/CVF International Conference on Computer Vision 9640–9649 (IEEE, 2021).
Oquab, M. et al. Dinov2: learning robust visual features without supervision. In Transactions on Machine Learning Research https://openreview.net/pdf?id=a68SUt6zFt (2024).
Zhou, Y. et al. A foundation model for generalizable disease detection from retinal images. Nature 622, 156–163 (2023).
Pai, S. et al. Foundation model for cancer imaging biomarkers. Nat. Mach. Intell. 6, 354–367 (2024).
Wang, X. A pathology foundation model for cancer diagnosis and prognosis prediction. Nature 634, 970–978 (2024).
Xu, H. A whole-slide foundation model for digital pathology from real-world data. Nature 630, 181–188 (2024).
Ma, C., Tan, W., He, R. & Yan, B. Pretraining a foundation model for generalizable fluorescence microscopy-based image restoration. Nat. Methods 21, 1558–1567 (2024).
Iudin, A. et al. EMPIAR: the Electron Microscopy Public Image Archive. Nucleic Acids Res. 51, D1503–D1511 (2023).
Dhakal, A., Gyawali, R., Wang, L. & Cheng, J. A large expert-curated cryo-EM image dataset for machine learning protein particle picking. Sci. Data 10, 392 (2023).
El Banani, M. et al. Probing the 3d awareness of visual foundation models. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 21795–21806 (IEEE, 2024).
Zhong, E. D., Lerer, A., Davis, J. H. & Berger, B. CryoDRGN2: ab initio neural reconstruction of 3D protein structures from real cryo-EM images. In Proc. IEEE/CVF International Conference on Computer Vision 4066–4075 (IEEE, 2021).
Luo, Z., Ni, F., Wang, Q. & Ma, J. OPUS-DSD: deep structural disentanglement for cryo-EM single-particle analysis. Nat. Methods 20, 1729–1738 (2023).
Levy, A. et al. CryoDRGN-AI: neural ab initio reconstruction of challenging cryo-EM and cryo-ET datasets. Nat. Methods 22, 1486–1494 (2025).
Qin, B. et al. Cryo-EM captures early ribosome assembly in action. Nat. Commun. 14, 898 (2023).
Jeon, M. et al. CryoBench: diverse and challenging datasets for the heterogeneity problem in cryo-EM. In Proc. 38th Conference on Neural Information Processing Systems 89468–89512 (Curran, 2024).
Hu, M. et al. A particle-filter framework for robust cryo-EM 3D reconstruction. Nat. Methods 15, 1083–1089 (2018).
Tan, Y. Z. et al. Addressing preferred specimen orientation in single-particle cryo-EM through tilting. Nat. Methods 14, 793–796 (2017).
Liu, Z. et al. Determination of the ribosome structure to a resolution of 2.5 Å by single-particle cryo-EM. Protein Sci. 26, 82–92 (2017).
Fan, X. et al. Single particle cryo-EM reconstruction of 52 kDa streptavidin at 3.2 Angstrom resolution. Nat. Commun. 10, 2386 (2019).
Baxter, W. T., Grassucci, R. A., Gao, H. & Frank, J. Determination of signal-to-noise ratios and spectral SNRs in cryo-EM low-dose imaging of molecules. J. Struct. Biol. 166, 126–132 (2009).
Palovcak, E., Asarnow, D., Campbell, M. G., Yu, Z. & Cheng, Y. Enhancing the signal-to-noise ratio and generating contrast for cryo-EM images with convolutional neural networks. IUCrJ 7, 1142–1150 (2020).
Liu, Y.-T., Hu, J. & Zhou, Z. H. Resolving the Preferred Orientation Problem in CryoEM Reconstruction with Self-Supervised Deep Learning (Oxford Univ. Press, 2023).
McInnes, L., Healy, J. & Melville, J. UMAP: uniform manifold approximation and projection for dimension reduction. Preprint at https://arxiv.org/abs/1802.03426 (2018).
Pettersen, E. F. et al. UCSF ChimeraX: structure visualization for researchers, educators, and developers. Protein Sci. 30, 70–82 (2021).
Rohou, A. & Grigorieff, N. CTFFIND4: fast and accurate defocus estimation from electron micrographs. J. Struct. Biol. 192, 216–221 (2015).
Lawson, C. L. et al. EMDataBank unified data resource for 3DEM. Nucleic Acids Res. 44, D396–D403 (2016).
Dosovitskiy, A. An image is worth 16 × 16 words: transformers for image recognition at scale. In International Conference on Learning Representations https://openreview.net/pdf?id=YicbFdNTTy (2021).
Grill, J.-B. et al. Bootstrap your own latent—a new approach to self-supervised learning. In 34th Conference on Neural Information Processing Systems (NeurIPS 2020) https://papers.nips.cc/paper/2020/file/f3ada80d5c4ee70142b17b8192b2958e-Paper.pdf (2020).
Punjani, A., Zhang, H. & Fleet, D. J. Non-uniform refinement: adaptive regularization improves single-particle cryo-EM reconstruction. Nat. Methods 17, 1214–1221 (2020).
Yan, Y., Fan, S., Yuan, F. & Shen, H. Simulated cryo-EM particle datasets from paper ‘A comprehensive foundation model for cryo-EM image processing’. Zenodo https://zenodo.org/records/17066236 (2025).
Yan, Y., Fan, S., Yuan, F. & Shen, H. Genuine particle datasets for paper ‘A comprehensive foundation model for cryo-EM image processing’. Zenodo https://zenodo.org/uploads/17066297 (2025).
Yan, Y., Fan, S., Yuan, F. & Shen, H. The resampled CryoBench Ribosembly dataset used in paper ‘A comprehensive foundation model for cryo-EM image processing’. Zenodo https://zenodo.org/records/17066704 (2025).
Yan, Y., Fan, S., Yuan, F. & Shen, H. The reconstruction results of CryoSolver and CryoWizard. Zenodo https://zenodo.org/records/17062718 (2025).

Acknowledgements

We thank the HPC Center of Westlake University for providing computational facility support and technical assistance. This work was supported by the Ministry of Science and Technology of the People’s Republic of China (2024YFA0916903 to H.S. and 2022ZD0115100 to F.Y.), the National Science Foundation of China (32122042 and 32071208 to H.S. and U21A20427 to F.Y.), the Zhejiang Provincial Natural Science Foundation (DQ24C050001 to H.S.), the Research Center for Industries of the Future, Westlake University and the Westlake Education Foundation (to H.S. and F.Y.). We thank our colleagues Y. Shi, H. Yu, P. Lu, D. Ma, Q. Hu, Q. Zhou, J. Wu, Z. Yan, Z. Shi and J. Chai for generously sharing their in-house cryo-EM data. We also acknowledge the use of data from EMDB and EMPIAR for training our models.

Author information

These authors contributed equally: Yang Yan, Shiqi Fan.

Authors and Affiliations

Research Center for Industries of the Future, Westlake University, Hangzhou, China
Yang Yan, Shiqi Fan, Fajie Yuan & Huaizong Shen
School of Engineering, Westlake University, Hangzhou, China
Yang Yan, Shiqi Fan & Fajie Yuan
Zhejiang Key Laboratory of Structural Biology, School of Life Sciences, Westlake University, Hangzhou, China
Huaizong Shen
Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, China
Huaizong Shen
Institute of Biology, Westlake Institute for Advanced Study, Hangzhou, China
Huaizong Shen

Authors

Yang Yan
Shiqi Fan
Fajie Yuan
Huaizong Shen

Contributions

The project was conceived and supervised by F.Y. and H.S. Y.Y. was primarily responsible for training the AI models, while S.F. mainly handled the preparation and processing of cryo-EM data as well as the construction of the automated data-processing pipeline. The initial draft of the manuscript was written by Y.Y. and S.F. and subsequently revised and finalized by F.Y. and H.S. All authors reviewed and provided feedback on the manuscript.

Corresponding authors

Correspondence to Fajie Yuan or Huaizong Shen.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks Wah Chiu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Allison Doerr, in collaboration with the Nature Methods team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Detailed architectures and scale and ablation tests of the AI models.

(a–c) Detailed architectures of the prediction head (a), the projection head (b) and the classifier head (c) in the AI models. (d,e) Scaling tests showing the effect of dataset size (d) and backbone size (e) on the performance of the Cryo-IEF model. (f,g) Effect of different training parameters (f) and loss functions (g) on the performance of the fine-tuned CryoRanker.

Extended Data Fig. 2 Pipeline for preparing training datasets.

The pre-training dataset contains particles from various sources (EMPIAR, cryoPPP and in-house datasets). The fine-tuning dataset is a subset of the pre-training dataset, in which particles were processed by 2D classification in CryoSPARC and assigned quality scores adjusted according to the Class Ranker evaluation.

Extended Data Fig. 3 Pose-dependent feature clustering in Cryo-IEF.

(a) Cryo-IEF feature distributions for particles from four distinct 2D classes in EMPIAR-10217 (n = 3,000 particles per class; see Supplementary Fig. 2 for 2D classification results). (b) Corresponding feature analysis for three particle clusters in EMPIAR-10096 (n = 3,000 particles per cluster; Supplementary Fig. 3). In both datasets, Cryo-IEF demonstrates consistent ability to separate particles by orientation, as evidenced by distinct clustering patterns corresponding to different projection views.

Extended Data Fig. 4 Evaluation of CryoRanker by precision and recall metrics.

The precision scores in relation to the predicted particle scores (left panel) and the recall values in relation to the labeled particle scores (right panel) of the four genuine particle datasets are displayed, indicating the performance of the CryoRanker model.
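As a rough illustration of the two metrics named in this caption, the sketch below computes precision and recall for one pair of cut-offs. The cut-off values, the function name `precision_recall` and the toy scores are assumptions made for illustration; the caption does not state how the authors binarized the predicted or labeled scores.

```python
def precision_recall(predicted, labeled, pred_cut, label_cut):
    """Precision and recall for one pair of cut-offs (both hypothetical).

    A particle is 'selected' if its predicted score >= pred_cut and
    counts as 'good' if its labeled score >= label_cut.
    """
    selected = [p >= pred_cut for p in predicted]
    good = [l >= label_cut for l in labeled]
    true_pos = sum(s and g for s, g in zip(selected, good))
    n_selected = sum(selected)
    n_good = sum(good)
    precision = true_pos / n_selected if n_selected else 0.0
    recall = true_pos / n_good if n_good else 0.0
    return precision, recall


# Toy scores, invented for this example only.
predicted = [0.9, 0.8, 0.7, 0.4, 0.2, 0.1]
labeled = [1.0, 0.9, 0.3, 0.8, 0.1, 0.0]
precision, recall = precision_recall(predicted, labeled, pred_cut=0.5, label_cut=0.5)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Sweeping `pred_cut` (or `label_cut`) over a range of values would give curves of the kind the caption describes.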

Extended Data Fig. 5 Correlation between CryoRanker scores and reconstruction resolutions.

Particles from six genuine datasets were divided into five equal-sized stacks based on CryoRanker scores (highest to lowest). Each stack was independently processed through CryoSPARC’s non-uniform refinement pipeline. Higher CryoRanker scores consistently yielded improved reconstruction resolutions, demonstrating the metric’s effectiveness for particle quality assessment.
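The stack-splitting step described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_into_stacks` and the toy scores are hypothetical, and dropping any remainder particles to keep the stacks equal-sized is an assumption.

```python
def split_into_stacks(scores, n_stacks=5):
    """Return n_stacks lists of particle indices, highest-scoring stack first.

    Assumption: if the particle count is not divisible by n_stacks, the
    lowest-scoring remainder is dropped so that all stacks have equal size.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    size = len(order) // n_stacks
    return [order[k * size:(k + 1) * size] for k in range(n_stacks)]


# Toy per-particle quality scores, invented for this example only.
scores = [0.12, 0.95, 0.40, 0.77, 0.05, 0.63, 0.88, 0.31, 0.54, 0.20]
stacks = split_into_stacks(scores)
print(stacks)  # [[1, 6], [3, 5], [8, 2], [7, 9], [0, 4]]
```

Each index list would then be used to subset the particle stack before an independent refinement run.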

Extended Data Fig. 6 Detailed flowchart for the fully automated cryo-EM data processing pipeline, CryoWizard.

The default pipeline is marked by blue arrows, while the pipeline for addressing the preferred orientation problem is marked by orange arrows. The preferred orientation problem is diagnosed by calculating the cFAR value of the refined structure during the initial model search step. The current pipeline is implemented in Python and interfaces with CryoSPARC-tools, an open-source Python library that enables scripted access to the CryoSPARC software package.

Extended Data Fig. 7 Automated structure resolution using CryoWizard with RELION.

CryoWizard, with RELION as the data processing platform, resolved a 3.3-Å cryo-EM map from the dataset EMPIAR-10556. Please refer to Methods for details.

Extended Data Fig. 8 CryoWizard overcomes preferred orientation challenges.

(a) For datasets with severe preferred orientation (for example, EMPIAR-10217), CryoWizard’s clustering module groups Cryo-IEF-extracted features (UMAP visualization) using K-Means++. Among eight resulting classes, Class 5 particles produce an isotropic template (cFAR score: 0.75) for final refinement, while other classes show varying degrees of orientation bias (cFAR scores shown). (b-e) Manually selected particles from EMPIAR-10217 (b) and EMPIAR-10096 (d) yielded severely anisotropic maps (cFAR=0.01 and 0.03, respectively), whereas CryoWizard processing (c,e) achieved dramatically improved isotropy (cFAR=0.74 for EMPIAR-10217 at 2.37 Å; cFAR=0.34 for EMPIAR-10096 at 2.78 Å). Manual selection details in Supplementary Figs. 2 and 3.
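The clustering-then-selection idea in panel (a) can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not CryoWizard's implementation: the helper names (`kmeans`, `kmeans_pp_init`, `pick_template_class`) are hypothetical, the features are synthetic 2D points standing in for Cryo-IEF feature vectors, and choosing the class with the highest cFAR score is an assumed selection rule.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: each new centre is drawn with probability
    proportional to its squared distance from the nearest chosen centre."""
    centres = [rng.choice(points)]
    while len(centres) < k:
        d2 = [min(sq_dist(p, c) for c in centres) for p in points]
        centres.append(rng.choices(points, weights=d2, k=1)[0])
    return centres


def kmeans(points, k, n_iter=50, seed=0):
    """Lloyd iterations from a k-means++ start; returns one label per point."""
    rng = random.Random(seed)
    centres = kmeans_pp_init(points, k, rng)
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: sq_dist(p, centres[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centre if a cluster empties
                centres[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def pick_template_class(cfar_by_class):
    """Assumed rule: use the class whose refined map has the highest cFAR."""
    return max(cfar_by_class, key=cfar_by_class.get)


# Synthetic stand-in features: eight 2D blobs (not real Cryo-IEF features).
rng = random.Random(1)
features = [(cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3))
            for cx, cy in [(i * 5.0, (i % 2) * 5.0) for i in range(8)]
            for _ in range(20)]
labels = kmeans(features, k=8)

# Placeholder cFAR scores; only Class 5's 0.75 is taken from the caption.
best = pick_template_class({1: 0.05, 5: 0.75, 8: 0.30})
print(best)  # 5
```

In practice the cFAR score of each class would come from refining that class's particles, which is outside the scope of this sketch.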

Extended Data Fig. 9 Comparative performance of CryoWizard and spIsoNet in addressing preferred orientation.

The figure compares orientation correction results between conventional processing (a,g), CryoWizard (d,j), and subsequent spIsoNet processing for the EMPIAR-10096 (a-f) and EMPIAR-10217 (g-l) datasets. For each dataset, initial maps from manual particle selection (a,g) and CryoWizard (d,j) were processed through spIsoNet’s Anisotropy Correction module (b,e,h,k) or Misalignment Correction module (c,f,i,l). CryoWizard-generated maps served as superior inputs for spIsoNet processing compared to conventional maps, demonstrating the complementary strengths of both approaches. Manual selection protocols are detailed in Supplementary Figs. 2 and 3, with spIsoNet parameters described in Methods.

Supplementary information

Supplementary Information

Supplementary Figs. 1–4 and Tables 1–5.

Reporting Summary

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yan, Y., Fan, S., Yuan, F. et al. A comprehensive foundation model for cryo-EM image processing. Nat Methods 23, 88–95 (2026). https://doi.org/10.1038/s41592-025-02916-8

Received: 22 November 2024
Accepted: 20 October 2025
Published: 27 November 2025
Version of record: 27 November 2025
Issue date: January 2026
DOI: https://doi.org/10.1038/s41592-025-02916-8
