Extended Data Fig. 1: Persistent batch effect when integrating other published hematopoiesis single cell RNA studies. | Nature Methods

From: The dynamics of hematopoiesis over the human lifespan

Extended Data Fig. 1

(a) Schematic of the ages of human samples profiled by nine other human hematopoiesis single-cell RNA-sequencing (scRNA-seq) studies with accessible read count data, alongside the samples profiled in our current study. Each dot denotes a single point in developmental time captured by a study. Combined, the nine previously published studies span from four weeks’ gestation through age 87, and our current study spans from 10 weeks’ gestation through age 77.

(b) Assessment of integrated scRNA-seq datasets obtained by batch correction using fastMNN, Harmony, Scanorama, scVI, or Seurat. For each of the five batch-correction methods, the resulting dataset is shown as a uniform manifold approximation and projection (UMAP), a k-nearest neighbor batch effect test (kBET), and the lifetime dynamics of lymphoid contribution to total hematopoiesis. For UMAP panels, Louvain clustering was performed, and progenitor cell types were identified using marker gene analysis and SingleCellNet random forest classification against hematopoietic lineage cell type profiles (ref. 21). For kBET panels, box-and-whisker plots denote the kBET rejection rate, where a lower rejection rate indicates fewer persistent batch effects (that is, replicates are well mixed). The ideal/expected value represents the rejection rate expected if batch correction ideally mixes the integrated studies, while the observed value represents the rejection rate after applying the indicated batch-correction method. For the lymphoid-contribution panels, the fraction of hematopoietic progenitor cells identified as lymphoid progenitor cells was calculated for each sample in the dataset produced by each batch-correction method, and a LOESS fit was applied to generate a curve over the time course of all samples. Note that none of the five methods yields a batch-corrected dataset in which there is a spike in lymphoid contribution to hematopoiesis during childhood.
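The lymphoid-contribution curve described for panel (b) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `lymphoid_fraction` and `loess`, the `"lymphoid"` label, the tuple layout of the input, and the `span` of 0.75 are all assumptions made for illustration, and the caption does not state which LOESS implementation or smoothing span was used. The sketch computes the per-sample fraction of progenitor cells labelled lymphoid, then smooths those fractions over age with a local linear regression using tricube weights, which is one common form of LOESS.

```python
from collections import defaultdict


def lymphoid_fraction(cells):
    """Per-sample fraction of progenitor cells labelled lymphoid.

    cells: iterable of (sample_id, age, cell_type) tuples (hypothetical layout).
    Returns a list of (age, fraction) pairs, one per sample, sorted by age.
    """
    totals = defaultdict(int)
    lymphoid = defaultdict(int)
    ages = {}
    for sample_id, age, cell_type in cells:
        totals[sample_id] += 1
        ages[sample_id] = age
        if cell_type == "lymphoid":  # assumed label name
            lymphoid[sample_id] += 1
    return sorted((ages[s], lymphoid[s] / totals[s]) for s in totals)


def loess(xs, ys, span=0.75):
    """Local linear regression with tricube weights, evaluated at each x.

    span is the fraction of points that defines each local window
    (0.75 is an assumed default, not a value taken from the caption).
    """
    n = len(xs)
    k = max(2, min(n, int(round(span * n))))  # points per local window
    fitted = []
    for x0 in xs:
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to k-th nearest point
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0  # fall back to weighted mean
        fitted.append(my + slope * (x0 - mx))
    return fitted
```

In this sketch, the ages and fractions returned by `lymphoid_fraction` would be unzipped into two lists and passed to `loess` to obtain one smoothed value per sample; a spike during childhood would appear as a local maximum in the fitted values at the corresponding ages.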
