Fig. 1: Analysis pipeline of four GATA1 ChIP-seq datasets on primary erythroblasts.

Raw sequencing data were retrieved from publicly available databases (n = 4 datasets) and processed with the analysis pipeline shown above, which was adapted from the nf-core pipeline for ChIP-seq analysis. The function of each analysis step is shown next to the corresponding arrow, along with the software/packages used (italicized, blue text). The format of the files processed in each step is noted in the flowchart (white text on blue background). Background colors mark the four major steps of the data processing. After all analysis and data filtering steps, 193 GATA1 binding sites were found across 33 blood group-related genes, including 6 for CR1. *The pipeline predicted 193 sites with peaks that overlapped in at least two datasets including the reference dataset (defined as the dataset containing the most peaks), 156 sites with overlapping peaks in at least three datasets including the reference, and 114 sites with overlapping peaks in all four datasets.
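
The overlap criterion in the footnote can be illustrated with a minimal sketch. It assumes that peaks are represented as half-open (chromosome, start, end) intervals and that any shared base counts as an overlap; the function names and the toy coordinates are hypothetical and are not taken from the pipeline itself, which may apply a different overlap rule or tool.

```python
from typing import Dict, List, Tuple

# A peak as (chromosome, start, end); half-open coordinates are an assumption.
Peak = Tuple[str, int, int]


def overlaps(a: Peak, b: Peak) -> bool:
    """True if two peaks lie on the same chromosome and share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def support_counts(datasets: Dict[str, List[Peak]]) -> Dict[Peak, int]:
    """For each reference peak, count the datasets (reference included) supporting it.

    The reference is the dataset with the most peaks, as defined in the caption.
    """
    reference = max(datasets, key=lambda name: len(datasets[name]))
    counts: Dict[Peak, int] = {}
    for ref_peak in datasets[reference]:
        n = 1  # the reference dataset itself
        for name, peaks in datasets.items():
            if name != reference and any(overlaps(ref_peak, p) for p in peaks):
                n += 1
        counts[ref_peak] = n
    return counts


def sites_at_least(counts: Dict[Peak, int], k: int) -> List[Peak]:
    """Reference peaks supported by at least k datasets."""
    return sorted(p for p, n in counts.items() if n >= k)


# Toy data (invented coordinates, for illustration only).
toy = {
    "A": [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)],
    "B": [("chr1", 150, 250), ("chr2", 100, 120)],
    "C": [("chr1", 180, 220)],
    "D": [("chr1", 190, 300), ("chr1", 550, 650)],
}
counts = support_counts(toy)
for k in (2, 3, 4):
    print(k, len(sites_at_least(counts, k)))
```

Because every threshold is evaluated against the same reference peaks, the resulting site sets are nested, which matches the decreasing counts reported for two, three, and four datasets.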