Extended Data Fig. 1: ScRNA-seq data collection and processing overview. | Nature Cancer

From: Spatiotemporal analyses of the pan-cancer single-cell landscape reveal widespread profibrotic ecotypes associated with tumor immunity

Related to Fig. 1. (a) Workflow of data collection and processing, including quality control, malignant cell identification, batch effect removal, clustering, and annotation. (b) Quality control for the BRCA_GSE148673 dataset: high-quality cells (blue) are defined as those with more than 1,000 UMI counts and more than 500 detected genes, while low-quality cells (red) do not meet these thresholds. (c) Doublet identification for the BRCA_GSE148673 dataset using Scrublet: doublets are highlighted in red. (d) Malignant cell identification in the BRCA_GSE148673 dataset by CopyKat: copy number variations are indicated as gain (red) and loss (blue), with the left bar designating malignant cells (orange) and non-malignant cells (green). (e) Entropy distribution measuring batch effects across 21 datasets, including 146 patients with associated batch information. In each box (dataset), entropy was computed for each cell based on the patient distribution within its neighborhood (30 nearest neighbors). The datasets were classified into two types, ‘with batch effects’ and ‘without batch effects’, according to the median entropy (cutoff of 0.7). Entropy values for raw and batch-corrected data are shown in green and orange, respectively. The bottom of each box represents the first quartile (Q1) and the top represents the third quartile (Q3). The height of the box is the interquartile range (IQR), and the horizontal line inside the box indicates the median. The whiskers extend to Q1 − 1.5 × IQR and Q3 + 1.5 × IQR. (f) Batch effect removal for the BRCA_GSE114727_10X dataset: the left panel displays cells colored by patient before batch effect removal, and the right panel shows the same cells after batch effect removal. (g) UMAP visualization of the BRCA_GSE148673 dataset: clustering and cell type identification are shown, with distinct colors representing clusters and cell types.
(h) Dot plot showing the expression of representative signature genes for each cell type in the BRCA_GSE148673 dataset.
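
The thresholds in panel (b) and the per-cell entropy in panel (e) can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the brute-force Euclidean neighbor search, the exclusion of each cell from its own neighborhood, the natural logarithm, and the normalization by the log of the number of patients are all assumptions made for illustration. The legend gives only the neighborhood size (30) and the median cutoff (0.7); reading a median below the cutoff as ‘with batch effects’ is likewise an assumption.

```python
import numpy as np


def high_quality_mask(umi_counts, gene_counts, min_umi=1000, min_genes=500):
    """Boolean mask of cells passing both thresholds quoted in panel (b)."""
    umi_counts = np.asarray(umi_counts)
    gene_counts = np.asarray(gene_counts)
    return (umi_counts > min_umi) & (gene_counts > min_genes)


def neighborhood_entropy(embedding, patients, k=30, normalize=True):
    """Shannon entropy of patient labels among each cell's k nearest neighbors.

    embedding : (n_cells, n_dims) array, e.g. principal components (assumed).
    patients  : length-n_cells array of patient labels.
    Requires k < n_cells. Normalization is an assumption, not from the legend.
    """
    embedding = np.asarray(embedding, dtype=float)
    n_cells = embedding.shape[0]
    _, codes = np.unique(np.asarray(patients), return_inverse=True)
    codes = codes.ravel()
    n_patients = int(codes.max()) + 1

    # Brute-force squared Euclidean distances (fine for small examples only).
    sq = (embedding ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * embedding @ embedding.T
    np.fill_diagonal(d2, np.inf)  # assumption: a cell is not its own neighbor

    # Indices of the k nearest neighbors of every cell (unordered).
    nn = np.argpartition(d2, k - 1, axis=1)[:, :k]

    # Count how many neighbors of each cell come from each patient.
    counts = np.zeros((n_cells, n_patients))
    rows = np.repeat(np.arange(n_cells), k)
    np.add.at(counts, (rows, codes[nn].ravel()), 1)

    p = counts / k
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(p > 0, p * np.log(p), 0.0)
    entropy = -terms.sum(axis=1)
    if normalize and n_patients > 1:
        entropy = entropy / np.log(n_patients)
    return entropy


def has_batch_effect(entropy, cutoff=0.7):
    """Assumed reading of panel (e): low median entropy means poor mixing."""
    return bool(np.median(entropy) < cutoff)


# Toy usage: two patients, either well separated or fully mixed.
rng = np.random.default_rng(0)
labels = np.array(["P1"] * 100 + ["P2"] * 100)
separated = np.vstack([rng.normal(0, 1, (100, 5)), rng.normal(100, 1, (100, 5))])
mixed = rng.normal(0, 1, (200, 5))

print(has_batch_effect(neighborhood_entropy(separated, labels)))
print(has_batch_effect(neighborhood_entropy(mixed, labels)))
```

When every neighbor of a cell comes from the same patient, the entropy is zero; when patients are evenly represented in the neighborhood, the normalized entropy approaches one. For real datasets an approximate nearest-neighbor index would replace the dense distance matrix, which scales quadratically with the number of cells.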
