Extended Data Fig. 4: CellSpace’s embedding implicitly mitigates donor- and assay-specific batch effects in large-scale scATAC-seq datasets. | Nature Methods


From: Scalable and unbiased sequence-informed embedding of single-cell ATAC-seq data with CellSpace

Extended Data Fig. 4

a. Batches and human donors are well mixed in the CellSpace embedding of the large human hematopoietic dataset (visualized in Fig. 4b). b. CellSpace embedding of the large human hematopoietic dataset, restricted to 30,211 natural killer and T cells. c. CellSpace embedding of 37,818 cells from a basal cell carcinoma tumor microenvironment (TME) scATAC-seq dataset from 7 patients, annotated by cell type and by donor; the embedding recovers immune and stromal cell types with no evident donor batch effect. d. Performance metrics (aggregated biological conservation score, aggregated batch correction score and overall score) for all methods on the large human hematopoietic dataset and the TME dataset (excluding the tumor clusters), shown with 95% confidence intervals from 1,000 bootstrap samples. For each metric, all pairs of methods were compared with two-sided tests on the bootstrap samples, under the null hypothesis that the score difference is zero. Each p-value was computed by confidence interval inversion, and p-values were FDR-adjusted across all comparisons. Only FDR-adjusted p-values comparing CellSpace to other methods are shown; *: adjusted p < 0.05; **: adjusted p < 0.01. e. Seurat’s shared nearest neighbor (SNN)-based clustering after CellSpace joint embedding of two human cortex datasets: the single-modal scATAC-seq dataset and the scATAC-seq readout of the multiome dataset. f. Membership of annotated cell types from the multiome and single-modal scATAC-seq human cortex datasets in the CellSpace clusters shown in e; the clusters are coherent and contain cells from both assays.
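The testing procedure in panel d can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes each method's metric has already been computed on the same 1,000 bootstrap resamples (so replicate b is paired across methods), uses percentile intervals, takes the confidence-interval-inversion p-value to be twice the smaller tail fraction of the bootstrapped score differences, and uses Benjamini-Hochberg as the FDR procedure (the caption does not name one). All function names, method names and toy scores are hypothetical.

```python
import random
from itertools import combinations


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def percentile_ci(values, level=0.95):
    """Percentile bootstrap confidence interval for one metric."""
    s = sorted(values)
    alpha = 1 - level
    return quantile(s, alpha / 2), quantile(s, 1 - alpha / 2)


def ci_inversion_pvalue(diffs):
    """Two-sided p-value for H0: difference == 0, by inverting percentile
    intervals: the smallest alpha whose (1 - alpha) interval excludes zero
    is twice the smaller tail fraction of the bootstrapped differences."""
    n = len(diffs)
    below = sum(d <= 0 for d in diffs) / n
    above = sum(d >= 0 for d in diffs) / n
    return min(1.0, 2 * min(below, above))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg FDR adjustment (step-up, monotone, capped at 1)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def compare_methods(boot_scores, level=0.95):
    """boot_scores maps method name -> list of metric values, where entry b
    was computed on the same bootstrap resample b for every method (paired)."""
    cis = {name: percentile_ci(s, level) for name, s in boot_scores.items()}
    pairs = list(combinations(sorted(boot_scores), 2))
    raw = []
    for a, b in pairs:
        diffs = [x - y for x, y in zip(boot_scores[a], boot_scores[b])]
        raw.append(ci_inversion_pvalue(diffs))
    # Adjust across all pairwise comparisons, as the caption describes.
    return cis, dict(zip(pairs, benjamini_hochberg(raw)))


def stars(p_adj):
    """Significance marks used in the caption."""
    return "**" if p_adj < 0.01 else "*" if p_adj < 0.05 else ""


# Toy, hypothetical bootstrap scores (not data from the paper): a shared
# per-replicate term makes the replicates paired across methods.
rng = random.Random(0)
shared = [rng.gauss(0, 0.01) for _ in range(1000)]
toy = {
    "method_A": [0.70 + e + rng.gauss(0, 0.005) for e in shared],
    "method_B": [0.65 + e + rng.gauss(0, 0.005) for e in shared],
    "method_C": [0.70 + e + rng.gauss(0, 0.005) for e in shared],
}
cis, adjusted = compare_methods(toy)
for (a, b), p in adjusted.items():
    print(f"{a} vs {b}: adjusted p = {p:.4f} {stars(p)}")
```

With 1,000 replicates, the smallest nonzero raw p-value this sketch can return is 0.002, and it returns exactly 0 when no bootstrapped difference crosses zero; a bias-corrected interval or a small-sample correction would be reasonable alternatives to the plain percentile choice made here.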
