Background & Summary

Background

Cryo-electron microscopy (cryo-EM) is a revolutionary biological imaging technology that enables near-atomic resolution 3D structure determination of biomolecules in their native states1,2. This technique has become indispensable in structural biology and drug discovery, particularly for studying macromolecular complexes, protein folding, and viral structures3,4. By rapidly freezing samples, cryo-EM preserves the native conformations of biomolecules, providing a powerful tool for investigating heterogeneous or difficult-to-crystallize targets5,6,7,8.

Cryo-EM Data Processing Workflow

The cryo-EM image processing pipeline consists of a series of critical steps aimed at reconstructing a high-resolution 3D structure from raw data. The workflow typically includes motion correction, Contrast Transfer Function (CTF) estimation, micrograph curation, particle picking, and 3D reconstruction (Fig. 1).

Fig. 1

Overview of the cryo-EM data processing pipeline. The pipeline covers several steps from movie capturing, motion correction, CTF estimation, micrograph curation, particle picking and 3D reconstruction. (a) demonstrates rigid motion, bending motion and patch-wise motion estimation of movies. (b) shows the 2D fitting data obtained during CTF estimation of micrographs, with the upper part of the image representing reciprocal space and the lower part showing simulated Thon rings. (c) evaluates the quality of images to determine their suitability for reconstruction. (d) accurately picks all particles from the micrograph. (f) performs 2D classification and averaging of the picked particles, removing unsuitable ones. (g) estimates particle poses and performs reconstruction without prior pose information. Finally, (h) refines the reconstructed volume data to achieve high-resolution results.

Motion correction

During cryo-EM imaging, samples are rapidly frozen and then imaged using an electron beam, producing a time-series of images known as movies. Due to noise and sample drift, these movies require motion correction. Motion correction involves aligning and averaging frames from cryo-EM movies to generate a single micrograph, significantly improving the signal-to-noise ratio (SNR) by reducing motion blur. During imaging, samples often undergo complex 3D deformations due to electron beam exposure, leading to anisotropic motion9, where different regions of the image shift in varying directions. Traditional methods10,11,12,13,14, such as MotionCor214, rely on frame alignment and image processing algorithms like optical flow and subpixel alignment.

CTF estimation

Contrast transfer function (CTF) estimation involves determining the defocus parameters of the objective lens to correct phase and amplitude modulations introduced during the cryo-EM imaging process. The CTF describes how microscope contrast varies with spatial frequency, and its phase inversion effects significantly impact image quality. Uncorrected micrographs can lead to particle phase cancellation, limiting reconstruction resolution. Current methods15,16,17,18,19, such as goCTF17, patch-based CTF18 and CTFFIND416, rely on Thon ring features to match the micrograph’s power spectrum with a CTF model. While they can estimate accurate CTF parameters, no existing CTF correction method is robust across the full range of experimental settings, including solution conditions and hardware.

Micrograph curation

Micrograph curation is crucial for high-resolution 3D reconstruction. Due to challenges such as uneven ice thickness, sample impurities, and low SNR, many micrographs are of insufficient quality for high-precision reconstruction. Traditional curation methods rely on CTF estimation results, using thresholds for defocus, astigmatism, and CTF fitting parameters to filter high-quality data.

Particle picking

Particle picking is a critical step in cryo-EM image processing, aiming to locate particles in micrographs for 3D reconstruction. Its accuracy directly impacts reconstruction resolution and efficiency. However, low SNR and sample heterogeneity make this task challenging. Traditional methods, such as DoG Picker20, Xmipp21, AutoCryoPicker22, and KLT Picker23, rely on template matching or local contrast enhancement but struggle with noise and complex samples.

3D reconstruction

3D reconstruction is the final step in cryo-EM processing, where particles are classified and aligned, and a 3D model is constructed. This step requires accurate particle picking and micrograph curation to ensure high-quality reconstructions. Current software tools like CryoSPARC24, RELION14, and Scipion13 offer sophisticated algorithms for 3D reconstruction, with each providing unique strengths in handling noise, particle heterogeneity, and resolution enhancement.

Cryo-EM in the Deep Learning Era

Deep learning has revolutionized scientific imaging, driving breakthroughs that were once considered unattainable. In cryo-EM, it has become a cornerstone for data analysis, enabling significant advancements across the entire image processing pipeline.

Motion correction

Motion correction has traditionally been hindered by the extremely low SNR of cryo-EM images, making precise, pixel-level motion estimation a formidable challenge. Recent deep learning-based approaches, such as Noiseflow25 and DST-net26, address these limitations by leveraging synthetic cryo-EM movie data for training. These methods achieve state-of-the-art performance, improving motion correction accuracy and enhancing image quality.

Micrograph curation

Micrograph curation has similarly benefited from deep learning’s ability to improve efficiency, consistency, and accuracy. Automated curation tools, including MicrographCleaner27 and Miffi28, have demonstrated their potential in overcoming challenges like subjective bias and manual inefficiency. For instance, Miffi achieves a 93% higher accuracy than traditional CTF-based methods, setting a new standard for automated micrograph curation.

Particle picking

Particle picking remains one of the most critical and challenging steps in cryo-EM processing due to low SNR and sample heterogeneity. Deep learning methods29,30,31,32,33,34,35,36,37,38,39,40, such as APPLE Picker31, crYOLO32, and Topaz33, have significantly enhanced particle localization efficiency and accuracy. Furthermore, recent Transformer-based models like CryoTransformer41, CryoMAE42, and CryoSegNet43 demonstrate remarkable improvements in handling complex samples, setting a new benchmark in particle picking.

Challenges in Constructing Cryo-EM Foundation Model Training Data

Over the past decades, data-driven deep learning methods have achieved unprecedented progress in scientific imaging44,45. Recent advances in foundation models46 have demonstrated remarkable generalization capabilities by leveraging self-supervised learning on large-scale, high-quality datasets47,48. These models have been widely applied to various medical imaging modalities, including CT49,50, X-ray51, ultrasound52, and digital pathology53,54,55,56,57,58,59,60,61, significantly improving downstream task performance.

However, these advancements rely on large-scale, high-quality datasets specific to each imaging modality. Merlin62, a 3D vision-language model, is trained on a high-quality clinical dataset of paired CT scans (6+ million images from 15,331 CTs), EHR diagnosis codes (1.8+ million codes), and radiology reports (6+ million tokens). UNI61, a general-purpose self-supervised model for pathology, is pretrained using more than 100 million images from over 100,000 diagnostic H&E-stained WSIs (>77 TB of data). RETFound59, a foundation model for retinal images, is trained on 1.6 million unlabelled retinal images by means of self-supervised learning and then adapted to disease detection tasks with explicit labels. In cryo-EM, the extremely low signal-to-noise ratio (SNR) and complex sample characteristics11,63,64,65,66,67,68 make it challenging to obtain high-quality images without extensive and meticulous processing. This difficulty has hindered the development of foundation models for cryo-EM.

The largest public repository for cryo-EM data, EMPIAR69, has become a critical resource for algorithm development in the field (see Table 1). The database hosts more than 1,700 single-particle cryo-EM datasets and 10 million cryo-EM movies or micrographs, encompassing diverse stages from sample preparation to image processing. Additionally, the database continues to grow steadily. However, the preprocessed data in EMPIAR comes from diverse sources with inconsistent quality and formats, limiting the efficient construction of high-quality datasets. To address this, we introduce CryoCRAB, the first large-scale, standardized dataset designed for training cryo-EM foundation models. CryoCRAB comprises 152,385 sets of raw movie frames, covering 746 distinct proteins, with a total data volume of 116.8 TB. To generate high-quality micrographs from raw data, we developed an automated processing pipeline incorporating motion correction and Contrast Transfer Function (CTF) estimation. Specifically, to support denoising-related pretraining tasks70, we split each movie into odd and even frames, processing them separately to generate paired micrographs.

Table 1 Overview of recent deep learning methods for cryo-EM image processing tasks.

Additionally, CryoCRAB offers unique visualization and filtering capabilities. We provide detailed statistical analyses of each micrograph, including intensity distributions, CTF parameters (e.g., resolution and defocus), and pixel motion characteristics. These features enable users to filter subsets of data tailored to specific needs, enhancing training efficiency. To optimize storage and access, CryoCRAB employs the HDF5 format with chunked storage, which significantly improves random sampling efficiency and reduces I/O bottlenecks compared to traditional image formats. This efficient data organization ensures reliable support for large-scale foundation model training and complex computational tasks.

We believe that CryoCRAB will provide diverse data support for foundation model development in cryo-EM, driving the widespread adoption of deep learning in cryo-EM data processing and offering new tools for exploring protein structures and dynamics.

Methods

CryoCRAB is a large-scale, high-quality dataset constructed through a systematic data processing pipeline, specifically designed for training cryo-EM foundation models (see Fig. 2). The pipeline encompasses multiple steps, including data collection, curation, processing, model-training preprocessing, and efficient storage, ensuring both the diversity and quality of the dataset.

Fig. 2

Overview of CryoCRAB Dataset. The crucial processing steps include EMPIAR crawling, motion correction, CTF estimation, micrograph curation, and pre-processing. (a) We crawl file path information and experimental metadata from the EMPIAR database and download the curated movies and gain files. (b) We perform gain correction and motion correction for movies to obtain two types of motion annotations, full-diff micrograph pairs and background estimates. (c) We perform CTF estimation for micrographs to estimate CTF parameters such as defocus value, astigmatism, and phase shift. (d) We curate the processed images based on median intensity, rigid motion statistics, and CTF estimation statistics, which classify the quality of images from 0 to 7. (e) We propose a cryo-EM micrograph pre-processing pipeline to transform the images into the input format required for pre-training models by background subtraction, band-limit CTF filtering, contrast normalization and Z-score standardization.

First, our cryo-EM foundation model dataset is sourced from the EMPIAR public database. The foundation models trained on CryoCRAB are applicable to cryo-EM downstream tasks such as motion correction, CTF estimation, and particle picking, all of which take movies or micrographs as input (see Section 2). We obtained raw cryo-EM movies and associated metadata from EMPIAR, followed by rigorous data cleaning and filtering to ensure reliability and accuracy (see Section 2.1). Subsequently, we processed the data using CryoSPARC with standard steps including motion correction, CTF estimation, and micrograph curation. Notably, we split each raw movie into odd and even frames, generating full-even-odd micrograph triplets (see Section 2.2).

During preprocessing, we uniformly removed background from images, band-limited the frequency domain to 3 Å, applied CTF filtering, and performed contrast normalization to filter low image quality outliers. We also computed the mean and standard deviation of the images for Z-score normalization during training (see Section 2.3). Furthermore, we adopted an efficient storage strategy by converting each full-even-odd micrograph triplet into a full-diff micrograph pair and storing the outlier-removed image data along with normalization parameters in HDF5 format, which significantly improves the data I/O speed, ensuring efficiency during large-scale model training (see Section 2.4).
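Two of the preprocessing steps above, band-limiting and Z-score standardization, can be illustrated with a minimal NumPy sketch. The function names, the hard Fourier cutoff, and the 1 Å pixel size are assumptions made for illustration; the actual pipeline also performs background subtraction, CTF filtering and contrast normalization, which are not reproduced here.

```python
import numpy as np

def band_limit(image, pixel_size, resolution=3.0):
    """Zero every Fourier component finer than `resolution` (in angstroms).

    A hard low-pass mask is an illustrative assumption; the filter shape
    used by the actual pipeline is not specified in the text.
    """
    ny, nx = image.shape
    fy = np.fft.fftfreq(ny, d=pixel_size)[:, None]   # cycles per angstrom
    fx = np.fft.rfftfreq(nx, d=pixel_size)[None, :]
    keep = np.hypot(fy, fx) <= 1.0 / resolution
    return np.fft.irfft2(np.fft.rfft2(image) * keep, s=image.shape)

def z_score(image, mean, std):
    """Standardize with a precomputed (stored) mean and standard deviation."""
    return (image - mean) / std

# Synthetic stand-in for a micrograph with a hypothetical 1 angstrom pixel size.
rng = np.random.default_rng(0)
micrograph = rng.normal(size=(64, 64))
filtered = band_limit(micrograph, pixel_size=1.0)
standardized = z_score(filtered, filtered.mean(), filtered.std())
```

Storing the mean and standard deviation next to the image, rather than the standardized image itself, keeps the normalization reversible and lets the training code decide when to apply it.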

Input Data Formats for Cryo-EM Foundation Model Pre-training

To construct cryo-EM foundation models, CryoCRAB leverages raw cryo-EM movies as the primary source of image information, addressing the challenge of limited annotations in the field. From these movies, we derive full-even-odd micrograph triplets and generate high-quality annotations, forming the core of the CryoCRAB dataset. This comprehensive approach enables effective pre-training of foundation models, which can be fine-tuned for downstream tasks such as motion correction, CTF estimation, micrograph denoising and micrograph curation.

In the cryo-EM workflow, the primary forms of image data are movies, micrographs, and particles. Movies are dynamic image sequences composed of multiple frames, typically captured by direct detector devices (DDDs)71, which offer significantly higher detective quantum efficiency (DQE) compared to traditional cameras. This technology allows cryo-EM to record micrographs as multi-frame movies rather than single-exposure images. These movies capture dynamic information, aiding subsequent processing steps, such as motion correction. Micrographs are generated by aligning and averaging the corrected movie frames and serve as the basic unit for further analysis. Particles, extracted from micrographs through particle picking techniques, represent individual molecules or fragments used for high-resolution structure reconstruction and modeling. In particular, since the acquisition of particles requires extensive annotation, CryoCRAB does not include particle data in its unannotated dataset.

To enhance the effectiveness of foundation model training, the CryoCRAB dataset not only includes traditional micrographs generated by full-frame averaging but also incorporates even-odd micrographs created by averaging odd and even frames separately. These three types (full, even, and odd micrographs) are collectively referred to as full-even-odd micrograph triplets. This approach is motivated by insights into noisy image restoration, particularly inspired by the Noise2Noise (N2N)72 method, which demonstrates that image denoising can be achieved using only pairs of noisy images. In the context of cryo-EM, these pairs correspond to the even-odd frames derived from the same movie, a concept first effectively utilized for denoising by Topaz73. Leveraging the characteristics of cryo-EM transmission imaging, each full-even-odd micrograph triplet can be efficiently represented as a full micrograph and a difference of even and odd micrographs (diff micrograph), forming a full-diff micrograph pair.

EMPIAR Datasets Collection and Curation

When constructing the foundation model dataset, we sourced cryo-EM data from public databases rather than direct experimental acquisition. EMPIAR is one of the largest cryo-EM data repositories, and we accessed EMPIAR through its REST API to automatically retrieve all EMPIAR entries as well as the associated EMDB74 entries and PDB75 entries. The EMPIAR entries cover basic image information (e.g., image size and pixel size), while the EMDB and PDB entries provide further experimental and structural details (see Fig. 2).

We selected 200 raw movies per dataset as a trade-off between storage and diversity. These data are further processed for model training. To meet the requirements for data diversity and high quality in model training, we performed a preliminary filtering process. First, we ensured that the selected datasets are generated using single-particle analysis (SPA) and contain raw movie data, resulting in 746 EMPIAR datasets. On the one hand, we aimed to include as many sample preparation and imaging conditions as possible. On the other hand, we recognized the unique value of images captured under different conditions but belonging to the same protein for foundation model training. Therefore, we did not filter datasets by protein type at this stage.

EMPIAR datasets are available in various formats, primarily TIFF, MRC, and EER. TIFF is commonly used for storing multi-frame images with compression, suitable for smaller datasets. MRC stores raw data directly, resulting in larger files that are less storage efficient. EER, supporting thousands of frames, is one of the most primitive formats in cryo-EM. The gain file, which records pixel-specific correction factors, is an essential component of the dataset. However, the formats for gain files in EMPIAR are inconsistent, including DM4, DAT, MRC, and TIFF. Processed gain data is typically stored in MRC or TIFF formats, while raw gain data may be provided in DM4 or DAT formats. To address these inconsistencies and standardize gain data, we utilize the open-source software EMAN2 to convert gain files from DM4 and DAT formats into the MRC format.

These data curation and preprocessing steps ensure that each dataset meets high-quality standards, providing both diversity and consistency for model training.

Processing Movies to Micrographs using CryoSPARC

We employ CryoSPARC for a standardized data processing pipeline. First, we used CryoSPARC’s Patch Motion Correction to align and average movie frames, generating motion-corrected full-even-odd micrograph triplets. Next, we performed CTF estimation to determine key parameters such as defocus, tilt angle, and astigmatism, which are essential for subsequent CTF filtering. Finally, after motion correction and CTF estimation, we curated and annotated the micrographs to ensure high-quality data for foundation model training and downstream task validation.

Motion correction

Motion correction is a critical step in cryo-EM data processing, aimed at compensating for sample displacement caused by radiation pressure, mechanical vibrations, or environmental instability during electron microscopy exposure. This step improves the signal-to-noise ratio (SNR) and contrast of the final images. The motion correction pipeline, as illustrated in Fig. 3, includes: (1) applying gain correction on each frame, (2) estimating motion trajectories from movies, and (3) separately correcting and averaging odd and even frames. CryoCRAB utilizes CryoSPARC’s Patch Motion Correction for this purpose. By integrating experimental parameters from the EMPIAR entries, we generated full-even-odd micrograph triplets from movies and gain files, along with motion data. Notably, leveraging the characteristics of cryo-EM transmission imaging, we converted the full-even-odd micrograph triplet into a full-diff micrograph pair for efficient storage.

Fig. 3

Details of Motion Correction Pipeline. (a) The pipeline starts with the input of raw movies and their corresponding gain reference. First, gain correction is applied to address the detector’s non-uniform response. Then, we use the patch motion correction algorithm in CryoSPARC to estimate motion for each frame, followed by motion correction and alignment of even and odd frames separately. Notably, CryoSPARC performs background estimation and background subtraction after the motion correction. Finally, to reduce data storage overhead, the even micrograph is subtracted from the odd micrograph to generate the full-diff MRC pair. (b) The left image shows the patch-wise motion estimation for all frames, with the starting frame in blue and the ending frame in yellow. The global rigid motion and local patch-level bending motion are combined and amplified by a factor of 20 for visualization. The right image displays the pixel-wise optical flow estimation obtained through spline interpolation, as well as the patch-wise motion direction of the current frame relative to the previous frame. (c) We generate the full-even-odd triplet by combining motion-corrected even and odd frames. This approach supports Noise2Noise training. (d) We perform background estimation on the motion-corrected images to separate and subtract the background, enhancing the image contrast.

Gain correction compensates for the non-uniform response of the detector, ensuring uniform and accurate image intensity. The gain file records correction factors for each pixel. Depending on the detector type, these factors are applied differently. For example, Gatan K2/K3 cameras require dividing raw pixel values by the correction factor, while Falcon 4 cameras require multiplication. Due to the absence of gain reference metadata in EMPIAR, we manually determine the optimal parameters by evaluating all possible flip and rotation combinations for their impact on the uniformity of contrast in the gain-corrected images.
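The search over flip and rotation combinations can be sketched as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, the coefficient of variation is used as a stand-in uniformity measure, and the selection in this work was performed manually rather than by an automatic score. The synthetic frame is noise-free and evenly illuminated, whereas real frames would need to be summed over many exposures before such a score becomes meaningful.

```python
import numpy as np

def orient(gain, k, flip):
    """Rotate the gain reference by k * 90 degrees, then optionally mirror it."""
    rotated = np.rot90(gain, k)
    return np.fliplr(rotated) if flip else rotated

def apply_gain(frame, gain, mode):
    """Apply correction factors by multiplication or division, depending on the detector."""
    return frame * gain if mode == "multiply" else frame / gain

def best_orientation(frame, gain, mode):
    """Score all 8 rotation/flip combinations; lower score means more uniform."""
    scores = {}
    for k in range(4):
        for flip in (False, True):
            candidate = orient(gain, k, flip)
            if candidate.shape != frame.shape:   # non-square detectors rule out some rotations
                continue
            corrected = apply_gain(frame, candidate, mode)
            # Assumed uniformity measure: coefficient of variation of the corrected frame.
            scores[(k, flip)] = corrected.std() / corrected.mean()
    return min(scores, key=scores.get), scores

# Synthetic check: a flat exposure recorded through a gain pattern in an unknown orientation.
rng = np.random.default_rng(0)
gain = rng.uniform(0.8, 1.2, size=(32, 32))
true_gain = orient(gain, 1, True)
raw = 100.0 / true_gain                          # detector whose factors are multiplied back in
best, scores = best_orientation(raw, gain, mode="multiply")
```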

We categorize motion into two types based on its source: (1) Rigid Motion: This refers to mechanical drift of the sample-stage during long-exposure imaging76. Experimental evidence suggests that most observed motion arises from beam-induced bending rather than mechanical instability77. (2) Bending Motion: This is associated with the interaction between the support foil and grid bars due to differential cooling rates during plunge freezing9. The slower cooling of grid bars creates transient tensile stress, which, upon release, can subject the sample to compressive stress, leading to radiation-induced creep and sample warping under electron beam exposure.

For rigid motion, we model the overall sample movement caused by thermal expansion or microscope drift without accounting for anisotropic motion from ice layer changes. This is achieved by iteratively optimizing the full-frame trajectory to maximize inter-frame correlation. For bending motion, we divide the movie into overlapping patches and estimate the displacement of each patch per frame under a spatially and temporally smooth motion model to account for anisotropic motion caused by ice layer changes.
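The idea of maximizing inter-frame correlation for rigid motion can be illustrated with a single-pair, integer-pixel estimate based on FFT cross-correlation. This is a deliberately simplified sketch, not CryoSPARC's algorithm: the function name is hypothetical, shifts are circular, and no sub-pixel refinement, iteration over the full trajectory, or patch-wise modelling is attempted.

```python
import numpy as np

def estimate_shift(reference, frame):
    """Return the integer (dy, dx) by which `frame` must be rolled to match `reference`.

    The peak of the circular cross-correlation, computed via FFT, marks the
    displacement with maximum correlation.
    """
    cross = np.fft.ifft2(np.fft.fft2(reference) * np.conj(np.fft.fft2(frame))).real
    dy, dx = np.unravel_index(np.argmax(cross), cross.shape)
    ny, nx = cross.shape
    # Map peaks in the upper half of the index range to negative shifts.
    if dy > ny // 2:
        dy -= ny
    if dx > nx // 2:
        dx -= nx
    return int(dy), int(dx)

# Synthetic check: displace a random image and recover the correcting shift.
rng = np.random.default_rng(0)
reference = rng.normal(size=(64, 64))
frame = np.roll(reference, (3, -5), axis=(0, 1))
shift = estimate_shift(reference, frame)
aligned = np.roll(frame, shift, axis=(0, 1))
```

For bending motion, the same estimate would be computed per overlapping patch and then regularized to be smooth in space and time, which is beyond the scope of this sketch.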

During frame averaging, motion correction results are applied separately to odd and even frames, producing odd-micrographs and even-micrographs. Averaging these yields the full-micrograph. These corrected micrographs effectively eliminate motion blur caused by ice layer changes or equipment vibrations, enhancing contrast and high-frequency information. CryoCRAB records the full-even-odd micrograph triplet, along with the estimated rigid and bending motion data.
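The relationship between the full-even-odd triplet and the full-diff pair can be made concrete with a short NumPy sketch. It assumes the frames are already motion corrected, that frames with even zero-based indices form the even micrograph (the indexing convention is an assumption), that the full micrograph is the mean of the even and odd micrographs, and that the diff micrograph is odd minus even. Function names are illustrative.

```python
import numpy as np

def full_diff_from_frames(aligned_frames):
    """Average even and odd frames separately, then form the full-diff pair."""
    even = aligned_frames[0::2].mean(axis=0)
    odd = aligned_frames[1::2].mean(axis=0)
    full = 0.5 * (even + odd)
    diff = odd - even
    return full, diff

def triplet_from_full_diff(full, diff):
    """Recover the even and odd micrographs from the stored full-diff pair."""
    even = full - 0.5 * diff
    odd = full + 0.5 * diff
    return full, even, odd

# Synthetic movie with an even number of already-aligned frames.
rng = np.random.default_rng(0)
frames = rng.normal(size=(10, 16, 16))
full, diff = full_diff_from_frames(frames)
_, even_back, odd_back = triplet_from_full_diff(full, diff)
```

Only two images are stored instead of three, and the even and odd micrographs needed for Noise2Noise-style training are recovered with one addition and one subtraction.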

CTF estimation

Contrast transfer function (CTF) estimation involves determining the optical parameters of the objective lens, particularly the defocus, from cryo-EM micrographs. The CTF describes how lens aberrations, including defocus, affect the contrast of the recorded images. By fitting the microscope’s CTF model to the image’s amplitude spectrum, defocus parameters can be estimated, enabling subsequent image correction and processing. This correction is necessary because the CTF introduces frequency-dependent amplitude modulation in the image78.

In single-particle analysis, samples are typically thin (20–100 nm) and can be treated as weak-phase objects77. Biological macromolecules in solution are primarily composed of light elements, which have a weak effect on the phase of electron waves. As a result, images captured without defocus exhibit minimal or no contrast. Therefore, cryo-EM introduces a controlled amount of defocus during imaging to generate phase contrast. However, due to stage tilt, uneven sample surfaces, or non-uniform particle distribution along the optical axis, the CTF may vary across the micrograph.

CryoCRAB employs CryoSPARC’s Patch CTF Estimation for CTF estimation. As illustrated in Fig. 4, we estimated spatially and temporally smooth defocus distributions for tilted, bent, and deformed samples. The CTF parameter model used by CryoSPARC is consistent with the simplified version in CTFFIND416 for computational efficiency79,80. In subsequent preprocessing steps, we also used this CTF parameter model for CTF filtering to enhance image contrast.

$${\bf{CTF}}(\lambda ,{\bf{g}},\Delta {f}_{1},\Delta {f}_{2},{{\rm{C}}}_{s},\Delta \varphi ,\omega ,\alpha )=\cos \,[{\boldsymbol{\chi }}(\lambda ,{\bf{g}},\Delta {f}_{1},\Delta {f}_{2},{{\rm{C}}}_{s},\Delta \varphi ,\omega ,\alpha )]$$
(1)
Fig. 4

Details of CTF Estimation. (a) The CTF estimation pipeline begins with the input of a micrograph. We first perform initial CTF estimation, followed by calculating the envelope function to correct for attenuation effects. Next, we optimize CTF parameters through 2D CTF estimation and patch-wise CTF refinement, ultimately generating a 2D defocus landscape to visualize the defocus variation across the entire micrograph. (b) We display the 1D search plot, where the peak reflects the optimal defocus value, indicating the best match between the ideal CTF curve and the experimental data at that defocus value. (c) The 1D CTF fit is used to evaluate data quality and the accuracy of CTF fitting. The gray line represents the radial average of the image power spectrum, with its oscillations (Thon rings) reflecting CTF characteristics. The red line shows the ideal CTF curve obtained through patch CTF estimation, representing the average defocus of the micrograph. The blue line indicates the correlation between the power spectrum and the ideal CTF, with the green vertical line marking the CTF fit resolution as a reference for data quality. (d) We analyze the uniformity of the ice layer by estimating the ice thickness. The ice thickness is calculated by comparing the background signal centered at 0.265 Å−1 with a broader frequency band, where a higher background signal typically indicates thicker ice.

The CTF parameter model used by CryoSPARC is a 2D cosine function of the frequency-dependent phase shift χ. g is the spatial frequency vector, and λ is the electron wavelength. Δf1 and Δf2 are the most critical parameters in CTF estimation, used to calculate the microscope’s defocus and astigmatism. The remaining four parameters are optical: Cs is the spherical aberration, Δφ is the additional phase shift introduced by a phase plate (if absent, Δφ = 0), ω is the proportion of total contrast due to amplitude contrast (e.g., electrons scattered outside the objective aperture or energy-filtered81), and α is the azimuthal angle or astigmatism angle, representing the angle between the image x-axis and the Δf1 direction.

$$\begin{array}{l}{\boldsymbol{\chi }}(\lambda ,{\bf{g}},\Delta {f}_{1},\Delta {f}_{2},{{\rm{C}}}_{s},\Delta \varphi ,\omega ,\alpha )=\pi \lambda [({\rm{DF}}+{{\rm{df}}}_{xx})\langle {{\bf{g}}}_{x},{{\bf{g}}}_{x}\rangle +2{{\rm{df}}}_{xy}\langle {{\bf{g}}}_{x},{{\bf{g}}}_{y}\rangle +\\ ({\rm{DF}}-{{\rm{df}}}_{xx})\langle {{\bf{g}}}_{y},{{\bf{g}}}_{y}\rangle ]-\frac{1}{2}\pi {\lambda }^{3}{{\rm{C}}}_{s}{\langle {\bf{g}},{\bf{g}}\rangle }^{2}+\Delta \varphi -{\cos }^{-1}(\omega )\end{array}$$
(2)

Here, \(\langle \cdot ,\cdot \rangle \) denotes the vector inner product, and \({{\bf{g}}}_{x}\) and \({{\bf{g}}}_{y}\) are the x- and y-axis components of the 2D frequency vector g. \(\,{\rm{DF}}\,:\,=\frac{1}{2}(\Delta {f}_{1}+\Delta {f}_{2})\) is the defocus along the optical axis, \(\,{\rm{df}}\,:\,=\frac{1}{2}(\Delta {f}_{1}-\Delta {f}_{2})\) is half the astigmatism along the optical axis, and \({{\rm{df}}}_{xx}\,:\,=\cos (2\alpha ){\rm{df}}\) and \({{\rm{df}}}_{xy}\,:\,=\sin (2\alpha ){\rm{df}}\) represent astigmatism along the Δf1 and Δf2 directions, respectively. We provide a more efficient method for calculating the phase shift χ. Specifically, for the same dataset, we no longer need to recompute defocus for all frequency components when Δf1 and Δf2 change. Instead, we only calculate four variables: DF, df, \({{\rm{df}}}_{xx}\), and \({{\rm{df}}}_{xy}\), with the latter two depending solely on the azimuthal angle α and not on the frequency component direction.
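Equations (1) and (2) can be evaluated directly for a single frequency component, as in the following sketch. The function name, argument names and numerical values are illustrative assumptions; all lengths are taken in ångströms (so a spherical aberration of 2.7 mm becomes 2.7e7 Å) and spatial frequencies in Å⁻¹.

```python
import math

def ctf(gx, gy, wavelength, df1, df2, cs, phase_shift, amp_contrast, alpha):
    """Evaluate Eq. (1)-(2) at the frequency component (gx, gy).

    Lengths are in angstroms, frequencies in inverse angstroms and
    angles in radians (unit conventions assumed for this sketch).
    """
    # The four scalars that do not depend on the frequency component.
    DF = 0.5 * (df1 + df2)
    df = 0.5 * (df1 - df2)
    df_xx = math.cos(2.0 * alpha) * df
    df_xy = math.sin(2.0 * alpha) * df

    g2 = gx * gx + gy * gy
    chi = (
        math.pi * wavelength * ((DF + df_xx) * gx * gx
                                + 2.0 * df_xy * gx * gy
                                + (DF - df_xx) * gy * gy)
        - 0.5 * math.pi * wavelength ** 3 * cs * g2 ** 2
        + phase_shift
        - math.acos(amp_contrast)
    )
    return math.cos(chi)

# Illustrative parameters: roughly 300 kV electrons, ~1.5 um defocus, no phase plate.
params = dict(wavelength=0.0197, cs=2.7e7, phase_shift=0.0, amp_contrast=0.07)
at_origin = ctf(0.0, 0.0, df1=15500.0, df2=14500.0, alpha=0.3, **params)
along_x = ctf(0.1, 0.0, df1=15500.0, df2=14500.0, alpha=0.0, **params)
no_astig = ctf(0.1, 0.0, df1=15500.0, df2=15500.0, alpha=0.0, **params)
```

Two properties follow from the formula and serve as sanity checks: at zero frequency the CTF equals the amplitude contrast ω when Δφ = 0, and with α = 0 the CTF along the x-axis depends only on Δf1.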

Cryo-EM samples are often not perfectly “flat”82. Before freezing, particles tend to concentrate near the air-water interface, and the ice surface itself is often irregular83. Since defocus affects the CTF, particles in the same image may have different defocus values, leading to varying contrast transfer functions. We computed a Bézier-curve-smoothed defocus surface by examining multiple regions of the micrograph. First, we performed a coarse CTF estimation on the entire micrograph assuming no astigmatism, finding the best-fit defocus by correlating with the radially averaged power spectrum. Next, we used this coarse defocus estimate to compute a new envelope function84, followed by estimating the 2D CTF for the entire micrograph, including astigmatism. Finally, we refined the defocus estimation for each patch, fitting these patch CTF estimates to a spline function to estimate local defocus across the micrograph.

By utilizing non-dose-weighted full micrographs, we derived smooth defocus estimates for specified 2D coordinates, accompanied by a comprehensive set of CTF-related parameters. These parameters include the defocus values Δf1 and Δf2 along the two principal axes, defocus DF along the optical axis, astigmatism df, azimuthal angle α, amplitude contrast proportion ω, and relative ice thickness.

Micrograph curation

The quality of cryo-EM images is influenced by various factors, including sample solution conditions, microscope acquisition parameters, and the target protein type. Existing approaches for cryo-EM image quality assessment, such as the manual labeling scheme proposed by Miffi28, suffer from limitations including incomplete standards, subjective inconsistencies, and a lack of automation. To address these challenges, we designed a comprehensive quality curation scheme based on seven key metrics to annotate each micrograph in a unified and automated manner. These metrics include the Median Intensity of motion-corrected micrographs, the Total Rigid Motion and Total Rigid Motion Curvature derived from motion correction, and four CTF-related statistics: CTF Fit Resolution, Tilt Angle, Defocus Range, and Astigmatism. They allow us to annotate the quality of micrographs, ensuring that high-quality images are selected for subsequent model training. For each metric, we calculate the mean (μ) and standard deviation (σ) across the dataset, using the 3σ interval to determine whether a metric falls within the acceptable range. The image quality score is computed as follows: each metric within the dataset’s acceptable range contributes 1 to the quality score, resulting in a maximum quality score of 7 and a minimum of 0. As shown in Fig. 5, micrographs are categorized into low quality (0–2), medium quality (3–5), and high quality (6–7) based on this scoring system.
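The scoring rule can be sketched in a few lines of standard-library Python. The metric keys, the use of a two-sided interval [μ − 3σ, μ + 3σ] and the population standard deviation are assumptions made for illustration; the text specifies only that a 3σ interval is used per dataset.

```python
from statistics import mean, pstdev

# Hypothetical keys for the seven metrics named in the text.
METRICS = [
    "median_intensity", "total_rigid_motion", "total_rigid_motion_curvature",
    "ctf_fit_resolution", "tilt_angle", "defocus_range", "astigmatism",
]

def quality_scores(records):
    """Score each micrograph: one point per metric inside the dataset's 3-sigma interval."""
    bounds = {}
    for name in METRICS:
        values = [record[name] for record in records]
        mu, sigma = mean(values), pstdev(values)
        bounds[name] = (mu - 3.0 * sigma, mu + 3.0 * sigma)
    return [
        sum(bounds[name][0] <= record[name] <= bounds[name][1] for name in METRICS)
        for record in records
    ]

def quality_label(score):
    """Map a 0-7 score to the three categories used in Fig. 5."""
    if score <= 2:
        return "low"
    if score <= 5:
        return "medium"
    return "high"

# Synthetic dataset: 50 well-behaved micrographs and one extreme outlier.
records = [{name: 1.0 + 0.01 * i for name in METRICS} for i in range(50)]
records.append({name: 1000.0 for name in METRICS})
scores = quality_scores(records)
labels = [quality_label(score) for score in scores]
```

Because μ and σ are computed per dataset, the same absolute value (for example, a given defocus range) may be acceptable in one dataset and flagged in another.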

Fig. 5

Results of Micrograph Curation. We display micrographs categorized into low quality (0–2), medium quality (3–5), and high quality (6–7) based on our quality screening criteria. To evaluate cryo-EM image quality, we design a screening scheme based on seven key parameters, including median intensity after motion correction, total rigid motion and its rate of change (total rigid motion curvature), as well as CTF fit resolution, tilt angle, defocus range, and astigmatism derived from CTF estimation. For each parameter, we calculate the mean (μ) and standard deviation (σ) of the dataset and use the 3σ interval to annotate each parameter. Each parameter within the screening interval contributes 1 point to the image quality score, resulting in a final quality score ranging from 0 to 7. Through this screening process, we have observed a significant improvement in micrograph quality as the quality score increases.

Algorithm 1

Contrast Normalization Algorithm.

Median intensity

The median pixel intensity of each micrograph is calculated to assess overall brightness and contrast. Cryo-EM images often exhibit strong background noise, particularly when the ice layer is uneven, leading to significant variations in noise intensity across regions. By focusing on the median value, this metric effectively mitigates the influence of background noise, ensuring that the evaluation is more representative of protein particle regions. We also observed that protein particle signals in micrographs are typically weaker than background signals. To more accurately estimate protein pixel intensity using the median, we first applied contrast normalization (see Algorithm 1) before computing the median intensity.

Total rigid motion

During motion correction, multiple consecutive frames are aligned and averaged to produce a single micrograph, significantly improving the signal-to-noise ratio. Rigid motion estimation measures the positional offset of each frame relative to a certain frame to achieve maximum global correlation. Total Rigid Motion represents the cumulative rigid motion between adjacent frames in a movie, reflecting the overall displacement of the sample during acquisition. By calculating rigid motion between frames, we can assess sample stability during imaging and identify potential displacement errors or sample drift issues.

Total rigid motion curvature. In addition to Total Rigid Motion, we compute Total Rigid Motion Curvature, which represents the cumulative rate of change in rigid motion between adjacent frames. A high curvature value often indicates significant motion variations during exposure, which tends to degrade image quality. This metric helps identify abrupt or rapidly changing motions that could destabilize the image, providing valuable information for image curation.
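The two motion metrics can be illustrated with a short sketch. It assumes the rigid motion trajectory is available as a list of per-frame (x, y) shifts, and it reads "curvature" as the summed magnitude of the change between consecutive frame-to-frame displacements (a discrete second difference). Both the function names and this reading are assumptions; the exact CryoSPARC definition may differ.

```python
import math

def total_rigid_motion(shifts):
    """Sum of Euclidean distances between consecutive per-frame (x, y) shifts."""
    return sum(math.dist(a, b) for a, b in zip(shifts, shifts[1:]))

def total_rigid_motion_curvature(shifts):
    """Sum of magnitudes of the change in frame-to-frame displacement."""
    steps = [(bx - ax, by - ay) for (ax, ay), (bx, by) in zip(shifts, shifts[1:])]
    return sum(math.dist(s, t) for s, t in zip(steps, steps[1:]))
```

Under this reading, a sample drifting at constant velocity has non-zero total motion but zero curvature, whereas a trajectory that changes direction accumulates curvature.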

CTF fit resolution. We calculate the correlation between the micrograph’s power spectrum and the ideal CTF derived from Patch CTF Estimation. The spatial frequency corresponding to a correlation threshold is defined as the CTF Fit Resolution. This metric is not a hard constraint on the quality of the data but rather a reference. For example, high-resolution fits may indicate the presence of carbon layers, while low-resolution fits could result from crystalline ice, motion correction failure, or severe radiation damage.

Tilt angle. Unlike “beam tilt” or “coma aberration” in cryo-EM, the Tilt Angle refers to the angle between the tilt axis and the image coordinate axes. To mitigate preferential orientation issues, some SPA experiments collect data with a specific tilt angle applied to the sample stage. However, this angle is not directly input into CryoSPARC during processing but is indirectly estimated through the defocus landscape fitted during Patch CTF estimation. CryoSPARC estimates tilt information by computing the defocus tilt normal vector. Specifically, for a given micrograph, the defocus tilt normal vector [normal[0], normal[1], −1] represents the plane’s normal in 3D coordinates. By normalizing this vector and computing its dot product with the unit normal vector [0, 0, −1], the cosine of the tilt angle is derived, from which the actual tilt angle is calculated.
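The tilt angle computation described above reduces to a few lines. The sketch below takes the two fitted components normal[0] and normal[1] as inputs; the function name is hypothetical.

```python
import math

def tilt_angle_degrees(normal_x, normal_y):
    """Angle between the fitted defocus plane normal and the vector [0, 0, -1]."""
    vec = (normal_x, normal_y, -1.0)
    norm = math.sqrt(sum(c * c for c in vec))
    # The dot product of the normalized vector with [0, 0, -1] keeps only the z term.
    cos_tilt = (-1.0 * vec[2]) / norm
    return math.degrees(math.acos(cos_tilt))
```

A flat defocus landscape (both components zero) gives a tilt of 0°, and the angle grows as the fitted plane becomes steeper.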

Defocus range. CryoSPARC’s Patch CTF Estimation measures the defocus landscape across the entire micrograph. The Defocus Range is the difference between the maximum and minimum defocus values within a micrograph, reflecting variations in focus. A large defocus range may indicate uneven focusing, potentially leading to regions with varying contrast in the image.

Astigmatism. Astigmatism, caused by lens asymmetry, typically manifests as focal shifts in certain regions of the image, resulting in elliptical distortion in the frequency domain. Severe astigmatism can blur the image, adversely affecting subsequent reconstruction quality.

Pre-processing of Cryo-EM Micrograph for the Contrast Enhancement

In cryo-EM data processing, preprocessing steps are crucial for enhancing the quality of images used for model training. We employ several key steps during data preprocessing: background subtraction, band-limiting and CTF filtering, contrast normalization, and Z-score standardization. First, we used Gaussian blur to remove background variations caused by uneven ice layers, thereby enhancing the contrast between particles and the background. Next, we applied band-limiting to the frequency domain and performed CTF filtering to reduce the impact of low signal-to-noise ratios and improve image quality. To further enhance visualization, we adjust the pixel value range through contrast normalization, focusing on protein particle regions to improve the contrast between particles and the background. Additionally, we applied Z-score standardization to normalize pixel values to a distribution with zero mean and unit variance, eliminating brightness variations due to acquisition conditions and ensuring data consistency and training stability. These steps lay a solid foundation for subsequent image analysis, particle picking, and model training.

Algorithm 2

Background Subtraction.

Background subtraction

In cryo-EM micrographs, uneven ice thickness affects image processing and the quality of the final reconstructed density maps. Specifically, non-uniform ice distribution leads to variations in contrast between particles and the background across different regions of the micrograph, particularly impacting the accuracy of particle picking models. To address this, we used Gaussian blur to estimate the ice background and subtracted it from the original image, achieving uniform contrast. The implementation details can be found in Algorithm 2. Notably, CryoSPARC’s Patch Motion Correction algorithm automatically performs background subtraction, while other motion correction algorithms without this feature require an additional step for background removal.
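The idea of Gaussian-blur background subtraction can be sketched as below. This is not a reproduction of Algorithm 2: the blur width `sigma` (in pixels) is a hypothetical parameter, and the blur is applied in Fourier space with NumPy, which implicitly assumes periodic image boundaries.

```python
import numpy as np

def subtract_background(image, sigma=50.0):
    """Estimate a smooth background with a Gaussian blur and subtract it.

    Returns (background-subtracted image, estimated background).
    """
    image = np.asarray(image, dtype=np.float32)
    fy = np.fft.fftfreq(image.shape[0])[:, None]   # cycles per pixel along rows
    fx = np.fft.rfftfreq(image.shape[1])[None, :]  # cycles per pixel along columns
    # Fourier transform of a unit-area Gaussian with standard deviation sigma.
    kernel = np.exp(-2.0 * (np.pi * sigma) ** 2 * (fx ** 2 + fy ** 2))
    background = np.fft.irfft2(np.fft.rfft2(image) * kernel, s=image.shape)
    return image - background, background
```

Because the kernel equals 1 at zero frequency, the estimated background carries the mean intensity of the image, and the subtracted micrograph has approximately zero mean.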

Algorithm 3

CTF Filtering.

Band-limit and CTF filtering

To mitigate the impact of low signal-to-noise ratios on model training, we applied band-limiting and CTF filtering to the micrographs. Specifically, we first downsampled the micrographs to a pixel size of about 3 Å to minimize high-frequency noise. Then, using the CTF parameters obtained from CTF estimation, we corrected the CTF in two regimes: before the first peak, we inverted the CTF by multiplying the frequency domain by the reciprocal of the CTF curve; after the first peak, we applied phase flipping by multiplying by the sign of the CTF curve. The implementation details are in Algorithm 3.
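The two-regime filter can be sketched for a CTF curve sampled along increasing spatial frequency. This is a simplified one-dimensional illustration, not Algorithm 3 itself: the function name is hypothetical, the "first peak" is taken to be the first local maximum of |CTF|, and the clamp `eps` that limits the gain near CTF zeros is an added assumption.

```python
import numpy as np

def ctf_filter_weights(ctf, eps=1e-3):
    """Return (weights, peak_index) for a 1-D CTF curve.

    Before the first peak the weights are 1 / CTF (CTF inversion);
    from the first peak onward they are sign(CTF) (phase flipping).
    """
    ctf = np.asarray(ctf, dtype=np.float64)
    mag = np.abs(ctf)
    # First interior sample that is a local maximum of |CTF|.
    is_peak = (mag[1:-1] >= mag[:-2]) & (mag[1:-1] > mag[2:])
    candidates = np.flatnonzero(is_peak)
    peak = int(candidates[0]) + 1 if candidates.size else ctf.size
    weights = np.sign(ctf)
    # sign / |CTF| equals 1 / CTF wherever |CTF| >= eps.
    weights[:peak] = np.sign(ctf[:peak]) / np.maximum(mag[:peak], eps)
    return weights, peak
```

For a micrograph, the same weights would be evaluated on the two-dimensional frequency grid and multiplied into the image's Fourier transform before transforming back.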

Contrast normalization

In cryo-EM data processing, the pixel values in raw cryo-EM movies typically represent the number of electrons recorded by the detector at each pixel during a single frame, usually in integer format. After preprocessing steps such as gain correction and motion correction, the resulting micrograph pixel values represent the corrected cumulative electron dose, typically in floating-point format. However, cryo-EM micrographs often exhibit significant noise and low contrast, making it difficult to distinguish protein particles from the background. To improve visualization, we introduced contrast normalization. The core idea is that protein particle pixel values are concentrated in a narrow range, while background regions (e.g., ice and carbon layers) have extreme pixel values. By adjusting the pixel value range to focus on the protein particle region, we enhance the contrast between particles and the background. The implementation details are in Algorithm 1.

Z-score standardization

In cryo-EM data processing, for each movie image, we independently applied z-score standardization to the full-even-odd triplet obtained from motion correction. The purpose of z-score standardization is to standardize the pixel values of each image to a distribution with zero mean and unit variance, eliminating brightness variations caused by differences in acquisition conditions or background noise. This not only improves numerical stability during training but also accelerates model convergence. Additionally, when using the Noise2Noise loss function for model training, ensuring consistent pixel value distributions across frames or images is critical. If the distributions differ significantly, the model may fail to learn the noise-to-clean mapping effectively, leading to training instability or failure. Through z-score standardization, we ensure data consistency, avoiding training instability due to brightness or noise distribution variations.
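A minimal sketch of the per-image standardization follows. The function names and the small constant `eps`, which guards against division by zero for constant images, are assumptions for illustration.

```python
import numpy as np

def zscore(image, eps=1e-8):
    """Standardize one image to zero mean and unit variance."""
    image = np.asarray(image, dtype=np.float32)
    return (image - image.mean()) / (image.std() + eps)

def standardize_triplet(full, even, odd):
    """Apply z-score standardization independently to each image of a triplet."""
    return zscore(full), zscore(even), zscore(odd)
```

Standardizing each image with its own statistics puts the full, even, and odd micrographs on the same intensity scale regardless of how many frames each one averages.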

The contrast enhancement process from raw movies to preprocessed micrographs is demonstrated in Fig. 6. Each preprocessing step (e.g., background subtraction, band-limiting, CTF correction, contrast normalization) significantly contributes to improving image contrast, further enhancing the quality and effectiveness of the training data. By comparing preprocessed and original images, we can clearly see how each processing step enhances image quality, providing better input data for training foundational models. While contrast improvements can aid in training some deep learning models, contrast normalization or CTF-filtered micrographs may not be ideal for cryo-EM tasks related to particle reconstruction. Contrast normalization, for example, can lead to information loss by clipping image values into a narrower range. Additionally, traditional reconstruction methods, such as RELION14, do not apply CTF correction directly to the images. Instead, they backproject the effect of the CTF to volume space and use the CTF to regularize the reconstructed 3D volume. For these reasons, we recommend that readers avoid using contrast-enhanced micrographs for reconstruction-related cryo-EM tasks.

Fig. 6
figure 6

Validation details of CryoCRAB. The left column shows the micrograph images and their corresponding frequency domain representations, while the right column displays the intensity histograms of the images along with their minimum and maximum values. The preprocessing pipeline includes the following steps: (a) Input raw images: Displays the original unprocessed images and their frequency domain characteristics. (b) Background subtraction: Eliminates background variations caused by ice layer inhomogeneity using Gaussian blur, enhancing the contrast between signals and the background. (c) Band-limit to 3 Å: Applies band-limiting to the frequency domain of the images, reducing the impact of low signal-to-noise ratio regions and improving image quality. (d) CTF Filtered: Applies CTF filtering to further optimize the signal-to-noise ratio of the images. (e) Normalization and Standardization: Adjusts the pixel value range through contrast normalization, focusing on protein particle regions, followed by Z-score standardization to transform pixel values into a distribution with zero mean and unit variance, eliminating brightness bias and ensuring data consistency.

Storing Data into Half-Precision Full-Diff Micrograph Pairs in HDF5

In cryo-EM data storage, efficiently managing large volumes of micrograph data is a critical challenge. To address this, we propose an optimized data storage strategy aimed at reducing storage requirements and improving data access efficiency. First, we converted each full-even-odd triplet into a full-diff micrograph pair, leveraging the relationships “full = even + odd” and “diff = even - odd” to save one-third of the storage space. Next, we used the HDF5 format for data storage, which supports large-scale data management and random access. The chunked storage and compression capabilities of HDF5 significantly accelerate data reading while reducing storage space. Additionally, we converted data from single-precision float format to half-precision float format, maintaining image quality while further reducing storage requirements and I/O overhead. These techniques provide an efficient and scalable storage solution for large-scale data processing and model training.
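The full-diff conversion and its inverse can be sketched with NumPy as follows. The function names are hypothetical, and the sketch uses the relations stated above, "full = even + odd" and "diff = even - odd", with float16 storage.

```python
import numpy as np

def to_full_diff(even, odd):
    """Pack an even-odd pair into a half-precision full-diff pair."""
    even = np.asarray(even, dtype=np.float32)
    odd = np.asarray(odd, dtype=np.float32)
    full = (even + odd).astype(np.float16)
    diff = (even - odd).astype(np.float16)
    return full, diff

def from_full_diff(full, diff):
    """Recover the full-even-odd triplet from a stored full-diff pair."""
    full = full.astype(np.float32)
    diff = diff.astype(np.float32)
    even = 0.5 * (full + diff)
    odd = 0.5 * (full - diff)
    return full, even, odd
```

Only two arrays are stored instead of three, and the even and odd micrographs are recovered up to float16 rounding error.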

Half-Precision Block Storage

To accommodate the random cropping strategy commonly used in large-scale model training, we adopted a storage scheme better suited for high-resolution images. Traditional cryo-EM micrographs are typically stored in MRC files following the CCP4 format, but MRC format suffers from significant I/O bottlenecks during large-scale random access, which can severely impact training efficiency. Therefore, we introduced HDF5-based storage, which supports efficient chunked storage and significantly accelerates random access. Additionally, HDF5 supports various compression methods to reduce storage space and is compatible with multiple data types, allowing the embedding of metadata (e.g., annotations and statistics) for easier data management and usage. Furthermore, we optimized data precision. Experiments show that converting micrographs from float32 to half-precision float16 format has minimal impact on 3D reconstruction resolution and image details. Therefore, to further reduce storage requirements, we stored all data in float16 format, ensuring data quality while significantly reducing storage space and I/O overhead.

Data Records

The dataset is available at ScienceDB (https://doi.org/10.57760/sciencedb.17922)85. CryoCRAB comprises 152,385 sets of raw movie frames, covering 746 datasets from EMPIAR. Each EMPIAR dataset typically includes approximately 200 cryo-EM images, consisting of raw movies, motion-corrected full-diff micrographs in MRC format along with estimated background images, and preprocessed full-diff micrographs in HDF5 format (see Fig. 2). The entire dataset, including micrographs and metadata from EMPIAR entries86–543, totals approximately 12.18 TB.

Due to the substantial storage size of raw cryo-EM movie data (86.2 TB), which far exceeds the size of the micrograph portion, we have not uploaded the movie data to conserve storage resources and avoid redundancy. However, the metadata files for the micrographs include FTP paths to the original movies and gain references on EMPIAR, allowing users to download them as needed.

The file directory is organized by EMPIAR ID, with each folder containing subfolders named “micrograph” for MRC-format micrographs, “micrograph_h5” for HDF5-format micrographs, and “background” for estimated background images. Additionally, JSON-format metadata files are provided for each dataset.

Movie

The raw cryo-EM data is stored in EMPIAR, with common file formats including MRC, TIFF, and EER. Corresponding gain reference files are provided for gain correction to eliminate the impact of optical artifacts on image contrast.

Micrograph

Raw movies are processed using CryoSPARC’s Patch Motion Correction module to generate motion-corrected micrographs, from which the sample background, estimated via Gaussian blur, is subtracted. To support downstream denoising tasks, CryoCRAB further optimizes the MRC-format data by storing it as Float16-precision full-diff micrographs, achieving data compression while preserving image quality.

These micrographs have undergone CryoSPARC Patch Motion Correction but have not been processed further. They are considered raw micrographs, intended as input for single-particle analysis. Since these micrographs are stored in float16 format, they may not be properly visualized by some commonly used software tools, such as IMOD. To visualize them, we recommend using a custom approach like: (1) read the micrograph using the mrcfile library in Python, and (2) display it using matplotlib with a grayscale colormap.
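The two steps can be sketched as follows. The function name is hypothetical, and the sketch assumes an installed mrcfile version that can read 16-bit float MRC data; the third-party imports are placed inside the function so that it can be defined even where those packages are absent.

```python
def show_micrograph(path):
    """Read an MRC micrograph with mrcfile and display it in grayscale."""
    import mrcfile
    import matplotlib.pyplot as plt

    with mrcfile.open(path) as mrc:
        # Convert the float16 data to float32 before plotting.
        data = mrc.data.astype("float32")

    plt.imshow(data, cmap="gray")
    plt.axis("off")
    plt.show()
    return data
```

Calling `show_micrograph("example.mrc")`, where the file name is a placeholder, opens a window with the micrograph and returns the pixel array for further inspection.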

Micrograph_h5

To enhance micrograph contrast and reduce the impact of high-frequency noise on foundation model training, we applied a series of preprocessing steps to the raw micrographs. These steps include frequency-domain band-limiting, CTF correction, contrast normalization, and Z-score standardization. The final images are stored in HDF5 format as chunked Float16-precision full-diff pairs, significantly improving data loading speed and optimizing the efficiency of foundation model training.

Background

To eliminate the influence of sample background on micrograph contrast, CryoCRAB uses a Gaussian blur-based algorithm to estimate and subtract the background from raw micrographs. We also provide the estimated background images, which can be used for training tasks requiring background-inclusive data or to reconstruct raw micrographs with background through simple operations. Traditional compression methods reduce the background images to a few KB, minimizing storage usage.

Metadata

CryoCRAB records the parameters and results from the processing pipeline for each movie in JSON format. The metadata includes optical parameters used during acquisition, FTP paths to the original movies and gain references on EMPIAR, as well as processed image dimensions, pixel size, storage size, and relative paths within the CryoCRAB dataset. Specifically, the metadata stores rigid motion and bending motion generated during motion correction, along with CTF estimation parameters such as defocus and astigmatism. We recommend importing CryoCRAB metadata into MongoDB and using MongoDB Compass for GUI-based querying and management to enhance usability and efficiency.

3D Reconstruction Metadata

In addition to the image-level metadata generated during image processing, CryoCRAB also provides the corresponding EMDB entry ID for each EMPIAR dataset. This information is stored in the ’cryocrab_emdb.csv’ file alongside the datasets, making it easy to access 3D reconstructions from the public EMDB74 data bank using the provided EMDB ID.

Technical Validation

Analysis of Training DRACO on CryoCRAB

To validate the correctness and diversity of the CryoCRAB data processing pipeline, we present the number of outliers for each metric in Fig. 7(a), the distributions of the seven curation metrics in Fig. 7(b–h), and the distribution of dataset quality scores in Fig. 7(i). To demonstrate that CryoCRAB is of sufficient quality to support the training of cryo-EM foundation models, we trained the cryo-EM foundation model DRACO70 on CryoCRAB and validated its performance on the downstream task of Micrograph Denoising.

Fig. 7
figure 7

Qualitative evaluation of CryoCRAB’s quality on curation statistics. We demonstrate the number of outliers for each curation metric (a) and the distributions of seven curation statistics (b–h), including: Median Intensity (b), Total Rigid Motion (c), Total Rigid Motion Curvature (d), CTF Fit Resolution (e), Tilt Angle (f), Astigmatism (g), and Defocus Range (h). Additionally, we show the distribution of quality scores for the dataset (i).

DRACO, trained on CryoCRAB by dividing full-even-odd micrograph triplets for simultaneous masked autoencoder (MAE)70 pretraining and Noise2Noise (N2N) denoising, demonstrates robust feature extraction capabilities and strong generalization in downstream tasks such as Micrograph Denoising. The training pipeline is shown in Fig. 8. We divided all micrographs with Quality = 7 in CryoCRAB into training and validation sets, with 5 images randomly selected from each dataset to form a validation set of 3,730 images, and the remaining images forming a training set of 139,594 images. Using the same hyperparameters as in the original DRACO paper, we trained DRACO for 200 epochs on two versions of CryoCRAB: Bin1MRC, which includes only background subtraction and is stored in MRC format at the original resolution, and Bin3ÅH5, which undergoes full preprocessing (including frequency-domain truncation to 3 Å) and is stored in chunked HDF5 format. The DRACO model trained on Bin1MRC is denoted as DRACO Bin1, and the one trained on Bin3ÅH5 is denoted as DRACO Bin3Å.

Fig. 8
figure 8

Qualitative evaluation of CryoCRAB’s quality on DRACO’s pre-training. (a) We illustrate the training pipeline of DRACO, which includes MAE pre-training and Noise2Noise (N2N) denoising using full-even-odd micrograph triplets, resulting in a robust cryo-EM image feature extractor with strong generalization capabilities. (b) We compare the original images and denoising results of Bin1MRC and Bin3Å data. Bin3ÅH5 data enhances low-frequency information through frequency domain downsampling to 3 Å and bandpass CTF filtering, significantly improving image contrast after denoising, making it more suitable for visual inspection tasks such as particle picking. In contrast, Bin1MRC data retains the original resolution, making it more suitable for reconstruction tasks. (c) We show the training loss curves and training time comparison of Bin1MRC and Bin3ÅH5 data on the DRACO model. The training loss of Bin3ÅH5 is significantly lower than that of Bin1MRC, and the training time is reduced by nearly 8 times, indicating that the preprocessing pipeline significantly accelerates disk I/O and improves training speed.

In Fig. 8(b), we compare the denoising performance of DRACO Bin1 and DRACO Bin3Å. DRACO Bin3Å produces micrographs with higher contrast, because the preprocessing steps in CryoCRAB enhance low-frequency information through downsampling to 3 Å and correct the phase flips introduced by the CTF during imaging. This correction helps the network distinguish between background and protein particle signals more effectively. In summary, Bin3ÅH5 data is more suitable for visual inspection tasks like particle picking, where resolution is not strictly required, while Bin1MRC data, retaining the original resolution, is better suited for reconstruction-related downstream tasks.

In Fig. 8(c), we present the loss curves for DRACO Bin1 and DRACO Bin3Å during training, as well as a comparison of training times for 200 epochs. We observe that DRACO Bin3Å achieves significantly lower training loss, indicating that Bin3Å data contains richer information, enabling the network to converge to a better solution. Additionally, DRACO Bin3Å reduces training time by nearly six times compared to DRACO Bin1, demonstrating that the preprocessing pipeline in CryoCRAB significantly accelerates disk I/O during training, improving overall training efficiency.

Analysis of Even-Odd Micrographs in CryoCRAB

CryoCRAB consists of even-odd micrograph pairs, which are generated by applying consistent motion correction to the even and odd frames of cryo-EM movies. These pairs are particularly useful for cryo-EM tasks that involve noise, such as micrograph denoising73 and training denoising-reconstruction models70. Each even-odd pair contains the same signal (e.g., protein particles, vitreous ice), but the noise differs between the two micrographs of a pair while following the same distribution. We illustrate this by calculating the signal-to-noise ratio (SNR) of the even-odd micrograph pairs in CryoCRAB across 746 datasets.

Even-odd pair modeling

Consider a frame-averaged cryo-EM micrograph with dimensions m by n. We define its 1-D flattened representation M = S + N as a vector of pixels, where the signal vector \(S\in {{\mathbb{R}}}^{m\times n}\) and the noise vector \(N\in {{\mathbb{R}}}^{m\times n}\) represent the underlying signal and additive noise, respectively. The noise N encompasses all types of signal-independent noise, such as detector shot noise. Since the even-odd pair contains the same signal, we represent the even micrograph as \({M}_{e}=S+{N}_{e}\) and the odd micrograph as \({M}_{o}=S+{N}_{o}\), where the only difference is the i.i.d. noise components \({N}_{e}\) and \({N}_{o}\).

SNR of even-odd micrograph pairs

The SNR of a micrograph is defined as the ratio of the signal variance to the noise variance: \(\,{\rm{SNR}}(M)=\frac{{\rm{Var}}(S)}{{\rm{Var}}(N)}\), where Var(S) and Var(N) represent the variances of the signal and noise, respectively. Under the two key assumptions that (1) the signal and noise are independent, and (2) the noise in the even and odd frames is i.i.d., we can derive the following:

$$\,{\rm{Var}}(S)={\rm{Cov}}\,[{M}_{e},{M}_{o}],\qquad {\rm{Var}}(N)={\rm{Cov}}\,[M,M]-{\rm{Cov}}\,[{M}_{e},{M}_{o}]$$
(3)
$$\,{\rm{Var}}({N}_{e})={\rm{Cov}}\,[{M}_{e},{M}_{e}]-{\rm{Cov}}\,[{M}_{e},{M}_{o}],\qquad {\rm{Var}}({N}_{o})={\rm{Cov}}\,[{M}_{o},{M}_{o}]-{\rm{Cov}}\,[{M}_{e},{M}_{o}]$$
(4)

Here, Cov[X, Y] represents the covariance between two images and is computed as \(\,{\rm{Cov}}\,[X,Y]=\frac{1}{mn}{\sum }_{i=1}^{mn}({X}_{i}-\bar{X})({Y}_{i}-\bar{Y})\). Using this formula, we can calculate the signal-to-noise ratio (SNR) of the even and odd micrographs. For the even micrograph, the SNR is given by \(\,{\rm{SNR}}({M}_{e})=\frac{{\rm{Var}}(S)}{{\rm{Var}}\,({N}_{e})}\), and for the odd micrograph, it is \(\,{\rm{SNR}}({M}_{o})=\frac{{\rm{Var}}(S)}{{\rm{Var}}\,({N}_{o})}\).
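As a hedged illustration, the even and odd SNR estimates of Eqs. (3) and (4) can be computed with NumPy as below. The function names are hypothetical, and the estimate is only meaningful when the two stated assumptions about the noise hold.

```python
import numpy as np

def cov(x, y):
    """Population covariance of two images, treated as flattened vectors."""
    x = np.asarray(x, dtype=np.float64).ravel()
    y = np.asarray(y, dtype=np.float64).ravel()
    return float(np.mean((x - x.mean()) * (y - y.mean())))

def even_odd_snr(even, odd):
    """Return (SNR(M_e), SNR(M_o)) estimated from an even-odd pair."""
    var_s = cov(even, odd)             # Var(S) = Cov[M_e, M_o]
    var_ne = cov(even, even) - var_s   # Var(N_e)
    var_no = cov(odd, odd) - var_s     # Var(N_o)
    return var_s / var_ne, var_s / var_no
```

On synthetic data built as a shared signal plus two independent noise images, the estimates approach the true variance ratio as the number of pixels grows.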

Additionally, based on the two assumptions about cryo-EM noise, we can express the SNR of the full micrographs in an equivalent form as \(\,{\rm{SNR}}(M)=\frac{{\rm{Var}}(S)}{{\rm{Var}}(N)}\). This leads to the following equation for the overall SNR of the full micrographs:

$$\,{\rm{SNR}}\,(M)=\frac{4\,\,{\rm{SNR}}\,({M}_{e})\,\,{\rm{SNR}}({M}_{o})}{{\rm{SNR}}({M}_{e})+{\rm{SNR}}\,({M}_{o})}$$
(5)
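Equation (5) follows in two short steps. The sketch below assumes that the full micrograph is the average of the even and odd micrographs, so that \(N=({N}_{e}+{N}_{o})/2\), and that \({N}_{e}\) and \({N}_{o}\) are independent. If the full micrograph is instead taken as the sum, the signal and noise variances both scale by a factor of four and the ratio is unchanged.

$$\,{\rm{Var}}(N)={\rm{Var}}\left(\frac{{N}_{e}+{N}_{o}}{2}\right)=\frac{{\rm{Var}}({N}_{e})+{\rm{Var}}({N}_{o})}{4}$$

$$\,{\rm{SNR}}(M)=\frac{4\,{\rm{Var}}(S)}{{\rm{Var}}({N}_{e})+{\rm{Var}}({N}_{o})}=\frac{4}{\frac{1}{{\rm{SNR}}({M}_{e})}+\frac{1}{{\rm{SNR}}({M}_{o})}}=\frac{4\,{\rm{SNR}}({M}_{e})\,{\rm{SNR}}({M}_{o})}{{\rm{SNR}}({M}_{e})+{\rm{SNR}}({M}_{o})}$$

In the special case \({\rm{SNR}}({M}_{e})={\rm{SNR}}({M}_{o})\), this reduces to \({\rm{SNR}}(M)=2\,{\rm{SNR}}({M}_{e})\).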

To analyze the relationship between even and odd micrographs, we examined 7,430 Bin1MRC full-diff pairs and generated an even-odd SNR scatter plot (Fig. 9(a)). The data revealed a linear relationship described by \({\rm{SNR}}({M}_{e})\approx k\,{\rm{SNR}}({M}_{o})\) with k = 1.01. From this observation, we draw two conclusions. (1) Because radiation damage from the electron beam accumulates during exposure, earlier frames should have a higher signal-to-noise ratio. Consistent with this, even micrographs show consistently higher SNR values than their odd counterparts, indicating a systematic quality difference between the two subsets. This consistent bias (approximately 2.4%) likely results from beam-induced specimen damage during the acquisition sequence. (2) Despite this quality differential, the strong linear correlation between even and odd micrograph SNRs confirms that both subsets preserve comparable structural information, thus supporting the established practice of utilizing even-odd pairs for effective micrograph denoising.

Fig. 9
figure 9

Quantitative evaluation of CryoCRAB’s SNR distribution. (a) Scatter plot comparing the SNR of even and odd micrographs, demonstrating a consistent linear relationship with similar SNR values for both. (b) Scatter plot for full micrographs vs. even-odd micrographs, illustrating that full micrographs have a higher SNR, approximately twice the value of even-odd micrographs. (c) Histograms of the SNR (in dB) distribution for full, even, and odd micrographs, showing that the SNR of cryo-EM micrographs follows a Gaussian distribution, where SNR in dB is calculated by \({{\rm{SNR}}}_{db}=10{\log }_{10}({\rm{SNR}})\).

SNR relation between full and even-odd micrographs

Raw cryo-EM images are captured using high-speed (~100 FPS) direct detector devices (DDD) as stacks of frames. Full micrographs are generated by averaging all motion-corrected frames, while even-odd micrographs use only half of the frames (either the even or the odd frames). Because a full micrograph averages twice as many frames, its noise variance is approximately half that of an even or odd micrograph. To quantify this relationship, we analyzed SNR scatter plots for full-even and full-odd micrograph pairs as shown in Fig. 9(b). The plots demonstrate that full micrographs consistently exhibit an SNR approximately twice that of even-odd micrographs, which aligns with theoretical expectations based on the noise reduction properties of frame averaging.

Gaussian estimation of cryo-EM SNR distribution

We present three histograms of the SNR for CryoCRAB full-even-odd micrographs in Fig. 9(c). The histograms show that the SNR of cryo-EM micrographs, expressed in dB, can be well approximated by a Gaussian distribution, where \(\,{\rm{SNR}}(M) \sim {\mathcal{N}}(\mu ,\sigma )\), \(\,{\rm{SNR}}\,\,({M}_{e}) \sim {\mathcal{N}}({\mu }_{e},{\sigma }_{e})\) and \(\,{\rm{SNR}}({M}_{o}) \sim {\mathcal{N}}({\mu }_{o},{\sigma }_{o})\). This Gaussian behavior is consistent across all three micrograph types, with the full micrographs exhibiting a higher mean SNR value (μ) compared to even (μe) and odd (μo) micrographs. The standard deviations (σ, σe, σo) exhibit minimal variation due to the properties of the logarithmic transformation applied to the raw SNR.