Table 1 Type I error rates of some currently applied tools in single-cell analyses. Type I error rates of ten methods under twenty conditions (five values of Nind, the number of individuals per group, crossed with four values of Ncells, the number of cells) at a significance threshold of p < 0.05. In all, 250,000 iterations were computed to obtain an error rate for each method. The inflated type I error rates computed with mixed models at the lower numbers of individuals per group are a consequence of the two-part hurdle model simultaneously testing two hypotheses and of an overabundance of subsampling with small sample sizes. Type I error rates are well controlled with mixed models and pseudo-bulk methods, whereas they increase with the other methods as additional independent samples or more cells are added. Pseudo-bulk methods are overly conservative, with error rates below the nominal 0.05. Confidence intervals (95%) are included in the enlarged version of this table (Supplementary Table 1).

From: A practical solution to pseudoreplication bias in single-cell studies

| Nind | Ncells | Two-part hurdle: Default | Two-part hurdle: Corrected | Two-part hurdle: RE | Tweedie: GLMM | Tweedie: GLM | GEE1 | Pseudo-bulk: Mean | Pseudo-bulk: Sum | Tobit | Modified t |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 50 | 0.561 | 0.637 | 0.069 | 0.082 | 0.340 | 0.114 | 0.023 | 0.035 | 0.353 | 0.400 |
| 5 | 100 | 0.677 | 0.719 | 0.064 | 0.084 | 0.463 | 0.110 | 0.022 | 0.032 | 0.471 | 0.510 |
| 5 | 250 | 0.798 | 0.778 | 0.066 | 0.083 | 0.609 | 0.103 | 0.023 | 0.028 | 0.628 | 0.644 |
| 5 | 500 | 0.862 | 0.803 | 0.065 | 0.081 | 0.705 | 0.104 | 0.023 | 0.026 | 0.725 | 0.718 |
| 10 | 50 | 0.563 | 0.611 | 0.055 | 0.064 | 0.350 | 0.076 | 0.024 | 0.021 | 0.345 | 0.397 |
| 10 | 100 | 0.689 | 0.718 | 0.053 | 0.065 | 0.462 | 0.077 | 0.024 | 0.020 | 0.470 | 0.502 |
| 10 | 250 | 0.810 | 0.793 | 0.049 | 0.064 | 0.610 | 0.074 | 0.022 | 0.019 | 0.624 | 0.635 |
| 10 | 500 | 0.875 | 0.827 | 0.049 | 0.061 | 0.705 | 0.073 | 0.021 | 0.018 | 0.722 | 0.717 |
| 20 | 50 | 0.562 | 0.606 | 0.051 | 0.056 | 0.344 | 0.063 | 0.024 | 0.016 | 0.343 | 0.393 |
| 20 | 100 | 0.687 | 0.705 | 0.048 | 0.056 | 0.459 | 0.064 | 0.024 | 0.014 | 0.466 | 0.503 |
| 20 | 250 | 0.817 | 0.805 | 0.042 | 0.058 | 0.610 | 0.060 | 0.022 | 0.011 | 0.619 | 0.637 |
| 20 | 500 | 0.884 | 0.844 | 0.042 | 0.055 | 0.705 | 0.062 | 0.021 | 0.010 | 0.720 | 0.716 |
| 30 | 50 | 0.563 | 0.604 | 0.053 | 0.054 | 0.341 | 0.058 | 0.025 | 0.013 | 0.344 | 0.395 |
| 30 | 100 | 0.691 | 0.698 | 0.049 | 0.056 | 0.463 | 0.058 | 0.025 | 0.012 | 0.469 | 0.504 |
| 30 | 250 | 0.818 | 0.803 | 0.044 | 0.055 | 0.608 | 0.057 | 0.022 | 0.010 | 0.624 | 0.636 |
| 30 | 500 | 0.886 | 0.853 | 0.041 | 0.055 | 0.707 | 0.058 | 0.022 | 0.009 | 0.719 | 0.706 |
| 40 | 50 | 0.561 | 0.602 | 0.051 | 0.054 | 0.345 | 0.055 | 0.025 | 0.013 | 0.340 | 0.393 |
| 40 | 100 | 0.689 | 0.699 | 0.049 | 0.053 | 0.455 | 0.055 | 0.026 | 0.012 | 0.467 | 0.502 |
| 40 | 250 | 0.820 | 0.803 | 0.044 | 0.053 | 0.607 | 0.053 | 0.022 | 0.010 | 0.622 | 0.639 |
| 40 | 500 | 0.888 | 0.856 | 0.042 | 0.053 | 0.704 | 0.054 | 0.022 | 0.008 | 0.721 | 0.713 |

  1. Default denotes MAST implemented without random effects; RE denotes random effects; Corrected denotes data batch-corrected for individual with ComBat prior to analysis, without using individual as a random effect; GLM denotes generalized linear model; and GLMM denotes generalized linear mixed-effects model.
  2. Two-part hurdle model as implemented in MAST; Tweedie distribution as implemented in “glmmTMB”; GEE1 as implemented in “geepack”; Pseudo-bulk values averaged (Mean) or summed (Sum) across cells within an individual and analyzed with DESeq2; Modified t as implemented in ROTS; and Tobit as implemented in Monocle.
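
The contrast the table reports, between tests that treat cells as independent and tests that aggregate cells within an individual, can be illustrated with a toy simulation. The sketch below is not the simulation used for Table 1: it assumes Gaussian expression values, a simple individual-level random shift, and a pooled-variance t-test in place of the methods listed above. All function names, parameter values, and the hard-coded critical values are illustrative assumptions.

```python
import random
import statistics


def pooled_t(a, b):
    """Two-sample Student t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = (pooled_var * (1 / na + 1 / nb)) ** 0.5
    return (statistics.fmean(a) - statistics.fmean(b)) / se


def simulate_group(rng, n_ind, n_cells, sd_between, sd_within):
    """One list of cell values per individual; there is no group effect."""
    group = []
    for _ in range(n_ind):
        shift = rng.gauss(0.0, sd_between)  # shared by all cells of one individual
        group.append([shift + rng.gauss(0.0, sd_within) for _ in range(n_cells)])
    return group


def type1_rates(n_iter=1000, n_ind=10, n_cells=50,
                sd_between=0.5, sd_within=1.0,
                crit_cells=1.962, crit_pb=2.101, seed=1):
    """Estimate type I error for a cell-level and a pseudo-bulk (mean) t-test.

    crit_cells and crit_pb are two-sided 5% Student t critical values for
    df = 2*n_ind*n_cells - 2 = 998 and df = 2*n_ind - 2 = 18. They are
    hard-coded for the default sizes and must be changed with n_ind/n_cells.
    """
    rng = random.Random(seed)
    hits_cells = hits_pb = 0
    for _ in range(n_iter):
        g1 = simulate_group(rng, n_ind, n_cells, sd_between, sd_within)
        g2 = simulate_group(rng, n_ind, n_cells, sd_between, sd_within)
        # Cell-level test: every cell is treated as an independent sample.
        cells1 = [y for ind in g1 for y in ind]
        cells2 = [y for ind in g2 for y in ind]
        hits_cells += abs(pooled_t(cells1, cells2)) > crit_cells
        # Pseudo-bulk test: one averaged value per individual.
        means1 = [statistics.fmean(ind) for ind in g1]
        means2 = [statistics.fmean(ind) for ind in g2]
        hits_pb += abs(pooled_t(means1, means2)) > crit_pb
    return hits_cells / n_iter, hits_pb / n_iter


if __name__ == "__main__":
    cell_rate, pb_rate = type1_rates()
    print(f"cell-level type I error:  {cell_rate:.3f}")
    print(f"pseudo-bulk type I error: {pb_rate:.3f}")
```

Under these assumptions the cell-level test should reject far more often than 5% of the time, because the shared individual-level shift makes cells from one individual correlated, while the test on per-individual means should stay close to the nominal level. Unlike the pseudo-bulk columns in the table, this toy version is not expected to be conservative, since it does not reproduce the count-based DESeq2 analysis.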