Fig. 2: Clustering performances on scRNA-seq and CyTOF datasets. | Nature Communications


From: Clustering by measuring local direction centrality for data with heterogeneous density and weak connectivity


a Overview of the standard preprocessing pipeline for scRNA-seq clustering [35]. First, the raw Unique Molecular Identifier (UMI) count matrices are input, and unreliable cells are removed by quality control. The feature expression of each cell is then normalized to correct for differences in read counts. Next, the Highly Variable Genes (HVG) are selected according to their Average Expression (AE) and Coefficient of Variation (CV). Finally, PCA is used to reduce the feature dimensions. b Clustering accuracy, measured by the Adjusted Rand Index (ARI), of 15 clustering algorithms run in PCA and UMAP spaces on nine scRNA-seq datasets. Red points denote the maximum ARI scores and white points the average ARI scores. CLARA-PCA, DIANA-PCA, AGNES-U2, and AGNES-PCA are not applicable to VISp and TM. hclust-U2, hclust-PCA, and DIANA-U2 are not applicable to TM. Boxes show the median and the 25–75% range, and whiskers extend to 1.5 times the interquartile range. c A Sankey diagram showing the match between the CDC-U2 results and the published cell type annotation on BM. d t-distributed Stochastic Neighbor Embedding (t-SNE) [63] plots showing that the best CDC-U2 results are almost identical to the ground truth (GT). e Identification accuracy for the different cell types of ALM achieved by CDC-U2, SC3, SNN-Louvain-U2, and Kmeans-U2. f Performance of 16 clustering algorithms on two CyTOF datasets, Levine [39] and Samusik [40]: CDC-U2, ACCENSE [64], ClusterX [65], DensVM [66], FLOCK [67], flowClust [68], flowMeans [69], flowMerge [70], flowPeaks [71], flowSOM [72], immunoClust [73], K-means, MeanShift [74], PhenoGraph [39], Rclusterpp [75], and SamSPECTRAL [76]. flowMerge is not applicable to Levine.
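To make panels a and b concrete, the following is a minimal NumPy sketch of the four preprocessing steps (quality control, normalization, HVG selection, PCA) and of the ARI score. It is an illustration under stated assumptions, not the authors' pipeline: the function names `preprocess` and `adjusted_rand_index`, the thresholds `min_counts` and `min_mean`, the log-normalization, and the rule "keep genes above a mean-expression floor, then rank by CV" are all hypothetical choices made for this example.

```python
import numpy as np


def preprocess(counts, min_counts=200, min_mean=0.01, n_hvg=50, n_pcs=10,
               target_sum=1e4):
    """Sketch of QC -> normalization -> HVG selection -> PCA.

    counts: (cells x genes) UMI count matrix. All thresholds are
    illustrative assumptions, not values taken from the paper.
    Returns the PCA coordinates and the boolean mask of retained cells.
    """
    counts = np.asarray(counts, dtype=float)

    # 1. Quality control: drop cells with too few total UMIs.
    keep = counts.sum(axis=1) >= min_counts
    counts = counts[keep]

    # 2. Normalization: rescale every cell to the same total count,
    #    then log-transform (a common choice, assumed here).
    per_cell = counts.sum(axis=1, keepdims=True)
    norm = np.log1p(counts / per_cell * target_sum)

    # 3. HVG selection: average expression (AE) acts as a floor,
    #    coefficient of variation (CV = std / mean) ranks the genes.
    mean = norm.mean(axis=0)
    std = norm.std(axis=0)
    cv = np.divide(std, mean, out=np.zeros_like(mean), where=mean > min_mean)
    hvg = np.argsort(cv)[::-1][:n_hvg]
    x = norm[:, hvg]

    # 4. PCA via SVD of the column-centered matrix.
    x = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    n_pcs = min(n_pcs, s.size)
    return u[:, :n_pcs] * s[:n_pcs], keep


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index computed from the contingency table."""
    _, t = np.unique(np.asarray(labels_true).ravel(), return_inverse=True)
    _, p = np.unique(np.asarray(labels_pred).ravel(), return_inverse=True)
    table = np.zeros((t.max() + 1, p.max() + 1), dtype=np.int64)
    np.add.at(table, (t, p), 1)

    def pairs(a):
        # Number of unordered pairs within each count, summed.
        return int((a * (a - 1) // 2).sum())

    n = t.size
    sum_ij = pairs(table)
    sum_a = pairs(table.sum(axis=1))
    sum_b = pairs(table.sum(axis=0))
    expected = sum_a * sum_b / (n * (n - 1) / 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. a single cluster
        return 1.0
    return (sum_ij - expected) / (max_index - expected)
```

In this sketch, the coordinates returned by `preprocess` would be passed to a clustering algorithm, and the resulting labels compared with the published annotation through `adjusted_rand_index`. The score is invariant to how cluster labels are named, which is why it suits comparisons across many algorithms.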
