Fig. 2: Patterns of the simulation study.

a True log-scale mean expression levels for each cell type \({\boldsymbol{\alpha}}+{\boldsymbol{\beta}}\). Each row represents a gene, and each column corresponds to a cell type. The intrinsic genes that are differentially expressed between cell types can have high, medium high, medium low, or low expression levels. b True batch effects. Each row represents a gene, and each column corresponds to a batch. c True underlying expression levels \({\bf{X}}\). Each row represents a gene, and each column corresponds to a cell. The upper color bar indicates the batches, and the lower color bar indicates the cell types. There are 3000 genes in total, and the four batches contain 300, 300, 200, and 200 cells, respectively. d The simulated observed data \({\bf{Y}}\). The overall dropout rate is 27.3%, whereas the overall zero rate is 50.8%. e The BIC plot. The BIC attains its minimum at K = 5, correctly identifying the true number of cell types. f The estimated log-scale mean expression levels for each cell type \(\widehat{{\boldsymbol{\alpha }}}+\widehat{{\boldsymbol{\beta }}}\). g Estimated batch effects. h Imputed expression levels \(\widehat{{\bf{X}}}\). i Corrected count data \(\widetilde{{\bf{X}}}\) grouped by batch. j Scatter plot of the estimated versus the true cell-specific size factors.
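Panel d reports two different rates, and the gap between them (27.3% versus 50.8%) is easier to read once the two are computed side by side. The sketch below assumes, as an illustration rather than the authors' exact definition, that a dropout is an entry whose underlying count in X is positive but whose observed count in Y is zero, while the zero rate counts every zero in Y (including entries where X itself is zero). Both rates use all gene-by-cell entries as the denominator; the function name and the toy matrices are hypothetical.

```python
import numpy as np


def zero_and_dropout_rates(X, Y):
    """Return (zero_rate, dropout_rate) for underlying counts X and observed counts Y.

    Assumption (illustrative): a dropout is an entry with X > 0 but Y == 0.
    The zero rate counts every zero in Y, so it also includes entries with X == 0,
    which is why it is always at least as large as the dropout rate.
    """
    X = np.asarray(X)
    Y = np.asarray(Y)
    if X.shape != Y.shape:
        raise ValueError("X and Y must have the same shape (genes x cells)")
    # Fraction of all gene-by-cell entries that are zero in the observed data.
    zero_rate = float(np.mean(Y == 0))
    # Fraction of entries that were expressed (X > 0) but observed as zero.
    dropout_rate = float(np.mean((X > 0) & (Y == 0)))
    return zero_rate, dropout_rate


# Toy 2 genes x 4 cells example (hypothetical numbers, not the simulation data).
X = np.array([[0, 3, 5, 2],
              [1, 0, 4, 0]])
Y = np.array([[0, 0, 5, 2],
              [0, 0, 4, 0]])
zero_rate, dropout_rate = zero_and_dropout_rates(X, Y)
print(zero_rate, dropout_rate)  # prints 0.625 0.25
```

In the toy example, five of eight observed entries are zero, but only two of them are dropouts; the other three are zeros already present in X.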