Fig. 2 | Scientific Data

Fig. 2

From: A large-scale curated and filterable dataset for cryo-EM foundation model pre-training

Overview of the CryoCRAB dataset. The key processing steps include EMPIAR crawling, motion correction, CTF estimation, micrograph curation, and pre-processing. (a) We crawl file path information and experimental metadata from the EMPIAR database and download the curated movies and gain files. (b) We apply gain correction and motion correction to the movies to obtain two types of motion annotations, full-diff micrograph pairs and background estimates. (c) We perform CTF estimation on the micrographs to estimate CTF parameters such as defocus value, astigmatism, and phase shift. (d) We curate the processed images based on median intensity, rigid motion statistics, and CTF estimation statistics, which together classify image quality on a scale from 0 to 7. (e) We propose a cryo-EM micrograph pre-processing pipeline that transforms the images into the input format required for pre-training models through background subtraction, band-limit CTF filtering, contrast normalization, and Z-score standardization.
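
The order of operations in panel (e) can be illustrated with a minimal NumPy sketch. This is not the CryoCRAB implementation: the function names `band_limit` and `preprocess`, the cutoff frequencies, and the percentile clipping used for contrast normalization are all assumptions made for illustration. In particular, the plain radial band-pass below is only a stand-in for band-limit CTF filtering and does not use the estimated CTF parameters.

```python
import numpy as np


def band_limit(image, low_cut, high_cut):
    """Keep spatial frequencies with radius in [low_cut, high_cut] (cycles/pixel).

    Hypothetical stand-in for band-limit CTF filtering: a plain radial
    band-pass that ignores defocus, astigmatism, and phase shift.
    """
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    mask = (radius >= low_cut) & (radius <= high_cut)
    return np.fft.ifft2(np.fft.fft2(image) * mask).real


def preprocess(micrograph, background, low_cut=0.01, high_cut=0.25,
               clip_pct=(1.0, 99.0)):
    """Apply the four steps named in panel (e), in order (assumed details)."""
    # 1. Background subtraction, using a precomputed background estimate.
    img = micrograph.astype(np.float64) - background
    # 2. Band-limiting in Fourier space (assumed cutoffs).
    img = band_limit(img, low_cut, high_cut)
    # 3. Contrast normalization, here by clipping extreme percentiles (assumed).
    lo, hi = np.percentile(img, clip_pct)
    img = np.clip(img, lo, hi)
    # 4. Z-score standardization: zero mean, unit standard deviation.
    std = img.std()
    return (img - img.mean()) / (std if std > 0 else 1.0)


# Usage on synthetic data (random noise on a flat background).
rng = np.random.default_rng(0)
micrograph = rng.normal(loc=100.0, scale=5.0, size=(64, 64))
background = np.full((64, 64), 100.0)
out = preprocess(micrograph, background)
```

Ending with Z-score standardization means every micrograph reaches the model with zero mean and unit variance, regardless of detector or exposure differences between EMPIAR entries.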
