Fig. 2: Single-cell DE methods are biased towards highly expressed genes.
From: Confronting false discoveries in single-cell differential expression

a Schematic illustration of the creation of ‘pseudobulks’ from single-cell data. Top, biological replicate from which each cell was obtained. Bottom, simulated gene expression matrix. Read counts for each gene are aggregated across cells of a given type within each biological replicate. b Mean AUCCs across eighteen ground-truth datasets after dividing the transcriptome into terciles of lowly, moderately, or highly expressed genes. c Mean expression levels of the 100 top-ranked false-positive genes from each DE method. d Spearman correlation between the mean expression of 80 ERCC spike-ins expressed in at least three cells and the –log10 p-value of differential expression assigned by each DE method. e Scatterplots of mean ERCC expression vs. –log10 p-value for exemplary single-cell and pseudobulk DE methods. Trend lines and shaded areas show local polynomial regression and the 95% confidence interval, respectively. f Mean expression levels of the 200 top-ranked genes from each DE method in a collection of 46 scRNA-seq datasets.
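The aggregation described for panel a can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a genes-by-cells count matrix held in a pandas DataFrame and a per-cell metadata table; the function name `make_pseudobulks`, the column names `replicate` and `cell_type`, and the toy values are all hypothetical and chosen only for illustration.

```python
import pandas as pd


def make_pseudobulks(counts: pd.DataFrame, cell_meta: pd.DataFrame, cell_type: str) -> pd.DataFrame:
    """Sum read counts across cells of one type within each biological replicate.

    counts    : genes x cells matrix of raw read counts (hypothetical layout).
    cell_meta : one row per cell, indexed by cell ID, with the assumed
                columns 'replicate' and 'cell_type'.
    Returns a genes x replicates pseudobulk matrix.
    """
    # Keep only the cells annotated with the requested cell type.
    keep = cell_meta.index[cell_meta["cell_type"] == cell_type]
    sub = counts.loc[:, keep]
    # Replicate label for each retained cell, in the same order as the columns of `sub`.
    replicates = cell_meta.loc[keep, "replicate"].to_numpy()
    # Transpose to cells x genes, sum the cells of each replicate, transpose back.
    return sub.T.groupby(replicates).sum().T


# Toy data (made up): two genes, four cells, two replicates.
counts = pd.DataFrame(
    [[1, 0, 2, 3], [0, 5, 1, 1]],
    index=["geneA", "geneB"],
    columns=["c1", "c2", "c3", "c4"],
)
cell_meta = pd.DataFrame(
    {"replicate": ["r1", "r1", "r2", "r2"], "cell_type": ["T", "T", "T", "B"]},
    index=["c1", "c2", "c3", "c4"],
)

pseudobulk = make_pseudobulks(counts, cell_meta, "T")
print(pseudobulk)
```

Each column of the result corresponds to one biological replicate, so the matrix can be passed to a bulk-style DE method in place of the per-cell counts.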