Fig. 3: Multi-round cluster analysis of the MIHPF dataset comprising more than one million cells.

a Confusion matrix of the first-round CDC result with a high recall score (k = 30, ratio = 0.99); the transcriptomic taxonomy tree and the class and subclass labels are from Yao et al. 2021 (ref. 42). b Confusion matrices of the CDC results after the second and third rounds of clustering optimization. c The published cell type annotation of MIHPF. d, e Clustering accuracy, measured by the adjusted Rand index (ARI), of K-means, SNN-Louvain and CDC using single-round clustering (d) and three rounds of clustering optimization (e); SNN-Louvain is run for only two rounds because of its iteration criterion. Boxes show the median and the interquartile (25th–75th percentile) range; whiskers extend to 1.5 times the interquartile range. f Clustering results of CDC in 2D UMAP space across the three rounds of clustering optimization.
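The ARI in panels d and e compares predicted cluster labels against the reference annotation, using the same class-by-cluster contingency counts that underlie the confusion matrices in panels a and b. The caption does not say which ARI implementation was used; the sketch below assumes the standard Hubert–Arabie adjusted Rand index and is illustrative only. The function names `contingency` and `adjusted_rand_index` are hypothetical.

```python
from collections import Counter
from math import comb


def contingency(labels_true, labels_pred):
    """Count co-occurrences of (reference class, predicted cluster) pairs."""
    return Counter(zip(labels_true, labels_pred))


def adjusted_rand_index(labels_true, labels_pred):
    """Hubert-Arabie adjusted Rand index between two partitions of the same cells."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have equal length")
    n = len(labels_true)
    table = contingency(labels_true, labels_pred)
    row_sums = Counter(labels_true)   # cells per reference class
    col_sums = Counter(labels_pred)   # cells per predicted cluster

    # Number of cell pairs placed together in both partitions.
    index = sum(comb(c, 2) for c in table.values())
    sum_rows = sum(comb(c, 2) for c in row_sums.values())
    sum_cols = sum(comb(c, 2) for c in col_sums.values())
    total_pairs = comb(n, 2)

    # Expected index under random labelling with fixed cluster sizes.
    expected = sum_rows * sum_cols / total_pairs if total_pairs else 0.0
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case (e.g. both partitions trivial): treat as perfect agreement.
        return 1.0
    return (index - expected) / (max_index - expected)


# Identical partitions up to relabelling score 1.0; splitting one class scores about 0.571.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]))
```

In practice, `sklearn.metrics.adjusted_rand_score` computes the same quantity. ARI is 1 for identical partitions and close to 0 for random labelling, which makes scores comparable across methods that produce different numbers of clusters.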