Fig. 2: Expression probability model parameterized by UMI counts per cell.
From: scPower accelerates and optimizes the design of multi-sample single cell transcriptomic studies

a The expression probabilities for genes in the pseudobulk of a newly planned experiment are estimated from the expression prior and the planned experimental parameters. The expression prior is derived from the mean and dispersion parameters of gene-wise negative binomial distributions fitted to a matching pilot data set. b Using this approach, the number of expressed genes expected under our model (dashed line) closely matches the observed number of expressed genes (solid line) as a function of the number of cells per cell type (cell type indicated by point symbol) for one batch of the training PBMC data set (Supplementary Table S2). The data were subsampled to different read depths (indicated by colour). The r² values between estimated and observed numbers of expressed genes were highly significant for both expression thresholds. c The model performed similarly well for the three batches of an independent validation PBMC data set⁴⁷. Expression thresholds used: count > 10 (right panels of b, c) or count > 0 (left panels of b, c) in more than 50% of the individuals.
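
The calculation described in panel a can be illustrated with a minimal Python sketch. It is not the scPower implementation, and it rests on several simplifying assumptions: per-cell counts of a gene follow a negative binomial distribution with variance mean + dispersion × mean², cells and individuals are independent, every individual contributes the same number of cells, and the per-cell mean is supplied directly rather than derived from read depth or UMI counts. All function and parameter names are hypothetical.

```python
import math


def nb_pmf(k, mean, size):
    """Negative binomial pmf in the mean/size parameterization."""
    if mean == 0:
        return 1.0 if k == 0 else 0.0
    log_p = (
        math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
        + size * math.log(size / (size + mean))
        + k * math.log(mean / (size + mean))
    )
    return math.exp(log_p)


def prob_count_above(threshold, mean, size):
    """P(count > threshold) for a negative binomial count."""
    cdf = sum(nb_pmf(k, mean, size) for k in range(threshold + 1))
    return min(1.0, max(0.0, 1.0 - cdf))


def prob_expressed(mean_per_cell, dispersion, cells_per_individual,
                   n_individuals, min_count=10, min_fraction=0.5):
    """Probability that a gene passes the pseudobulk expression threshold.

    Assumption: the pseudobulk count of one individual is the sum of
    independent per-cell counts, so it is again negative binomial with
    mean n * mean_per_cell and size n / dispersion.
    """
    mean = cells_per_individual * mean_per_cell
    size = cells_per_individual / dispersion
    p = prob_count_above(min_count, mean, size)
    # The gene counts as expressed if MORE than min_fraction of the
    # individuals exceed min_count (binomial upper tail).
    k_min = math.floor(min_fraction * n_individuals) + 1
    return sum(
        math.comb(n_individuals, k) * p ** k * (1 - p) ** (n_individuals - k)
        for k in range(k_min, n_individuals + 1)
    )


def expected_expressed_genes(gene_priors, cells_per_individual,
                             n_individuals, min_count=10):
    """Expected number of expressed genes: sum of per-gene probabilities.

    gene_priors is a list of (mean_per_cell, dispersion) pairs, standing in
    for the expression prior fitted on a pilot data set.
    """
    return sum(
        prob_expressed(m, d, cells_per_individual, n_individuals, min_count)
        for m, d in gene_priors
    )
```

Calling `expected_expressed_genes` with the same priors but `min_count=0` or `min_count=10` corresponds to the two thresholds compared in panels b and c. Under these assumptions, increasing `cells_per_individual` raises the pseudobulk mean and therefore the expected number of expressed genes.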