Extended Data Fig. 5: Factors influencing the number of significant eRegulons identified by scMORE.

a. Total number of significant eRegulons (summed across all cell types) as a function of the number of identified cell types per dataset. b. Average number of significant eRegulons per cell type in each dataset, with datasets grouped by the total number of identified cell types. c. Number of significant eRegulons across datasets with varying total cell numbers (1 K, 5 K, 10 K, 15 K, 20 K and 25 K). For panels a and b, we extracted matched cell types from the PBMC single-cell multiomics dataset (n = 10,554 cells) to construct four dataset configurations with cell type counts of 2 (ct2), 4 (ct4), 6 (ct6) and 8 (ct8), each with five random replicates (20 datasets in total). For panel c, we selected the 10 most abundant cell types from the PBMC single-cell multiomics dataset and, while maintaining their original proportions, constructed six datasets with total cell numbers of 1 K, 5 K, 10 K, 15 K, 20 K and 25 K. We then applied scMORE to integrate GWAS of 10 blood cell traits with these 20 cell-type-count-varied and 6 cell-number-varied single-cell datasets to identify significant trait-relevant eRegulons. In panels a to c, statistical significance was assessed with the two-sided Wilcoxon test, and boxplots show the median (center line), interquartile range (IQR, box) and 1.5×IQR bounds (whiskers); minima and maxima are represented by the whiskers. d. Impact of GWAS sample size on the number of significant eRegulons across cell types.
Boxplots show the number of significant scMORE eRegulons identified across seven brain cell types (AS, OPC, EC, T, MG, N and ODC) using Parkinson’s disease (PD) GWAS summary statistics with varying sample sizes (2 K, 6 K, 20 K, 100 K and 500 K). Each colored dot represents the result from a specific sample size. Error bars indicate the standard deviation across eRegulons identified at each sample size. Boxplots show the median (center line), interquartile range (IQR, box) and 1.5×IQR bounds (whiskers); minima and maxima are represented by the whiskers.
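
The dataset construction described for panels a to c can be illustrated with a minimal sketch. This is not the authors' code: the function names `cell_type_count_replicates` and `proportional_subsample` are hypothetical, cells are represented only by a list of cell-type labels, and it is an assumption here that the five replicates per configuration differ in which cell types are randomly drawn and that the cell pool is at least as large as the requested total.

```python
import random
from collections import Counter, defaultdict


def cell_type_count_replicates(labels, k, n_replicates=5, seed=0):
    """Return n_replicates lists of cell indices, each restricted to k
    randomly chosen cell types (assumed reading of 'random replicates')."""
    rng = random.Random(seed)
    cell_types = sorted(set(labels))
    replicates = []
    for _ in range(n_replicates):
        chosen = set(rng.sample(cell_types, k))
        replicates.append([i for i, lab in enumerate(labels) if lab in chosen])
    return replicates


def proportional_subsample(labels, total, top_n=10, seed=0):
    """Keep the top_n most abundant cell types and draw roughly `total`
    cells without replacement, preserving their relative proportions."""
    rng = random.Random(seed)
    counts = Counter(labels)
    top = [ct for ct, _ in counts.most_common(top_n)]

    # Group cell indices by cell type.
    by_type = defaultdict(list)
    for i, lab in enumerate(labels):
        by_type[lab].append(i)

    pool = sum(counts[ct] for ct in top)
    if total > pool:
        raise ValueError("requested more cells than the pool contains")

    picked = []
    for ct in top:
        # Per-type quota; the quotas may sum to slightly more or less
        # than `total` because of rounding.
        quota = round(total * counts[ct] / pool)
        picked.extend(rng.sample(by_type[ct], quota))
    return picked


# Toy labels standing in for annotated cells (hypothetical data).
labels = ["A"] * 600 + ["B"] * 300 + ["C"] * 100
subset = proportional_subsample(labels, total=100, top_n=3)
print(Counter(labels[i] for i in subset))
```

Sampling within each cell type separately, rather than from the pooled cells, keeps the cell-type composition fixed across the size-varied datasets, so that differences in eRegulon counts reflect total cell number rather than a shift in composition.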