Extended Data Fig. 1: Self-projection accuracy comparison between the ground truth annotation and the clustering with under-clustering or over-clustering. | Nature Methods

From: Putative cell type discovery from single-cell gene expression data

This test measures self-projection accuracy under three conditions: 1) the “ground truth” clustering as annotated by human experts (marked as ‘correct-clustering’); 2) over-clustering; and 3) under-clustering. The violin plots in the left column show the distributions of self-projection accuracy (cross-validation in red, test set in green) for these three conditions in every dataset, obtained by repeating the random sampling 100 times. The plots show that the “ground truth” clustering yields the highest self-projection accuracy in almost all cases. Across the datasets tested, Hrvatin (48,266 cells), Tasic2018 (21,874 cells), Shekhar (26,830 cells), Segerstolpe (2,108 cells), Zeisel (3,005 cells), Baron (Mouse, 1,886 cells) and Baron (Human, 8,199 cells), it is possible to identify the best clustering using self-projection as the clustering consistency test. As with any classifier, it is easier to perform well on fewer clusters. Thus, if two clusterings show a similar level of self-projection accuracy, as for Baron (Mouse), the clustering with more clusters should be chosen for consideration.
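
The selection rule in the last two sentences can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `pick_clustering`, the `tolerance` used to decide when two accuracies count as "similar", and all accuracy values and cluster counts below are hypothetical and are not taken from the figure.

```python
from statistics import mean


def pick_clustering(candidates, tolerance=0.01):
    """Pick a clustering from repeated self-projection accuracies.

    candidates maps a clustering name to (n_clusters, accuracies), where
    accuracies holds one value per repeat of the random sampling.
    tolerance is an assumed threshold for "similar" accuracy; the caption
    does not specify one.
    """
    # Summarise each clustering by its cluster count and mean accuracy.
    summary = {name: (n, mean(accs)) for name, (n, accs) in candidates.items()}
    best = max(acc for _, acc in summary.values())
    # Keep every clustering whose mean accuracy is within tolerance of the best.
    near_best = [name for name, (_, acc) in summary.items()
                 if best - acc <= tolerance]
    # Among near-ties, prefer the clustering with more clusters.
    return max(near_best, key=lambda name: summary[name][0])


# Hypothetical values for illustration only (not read from the figure).
candidates = {
    "under-clustering": (8, [0.952, 0.948, 0.950]),
    "correct-clustering": (13, [0.947, 0.951, 0.946]),
    "over-clustering": (20, [0.861, 0.870, 0.858]),
}
chosen = pick_clustering(candidates)
print(chosen)
```

With these made-up numbers, the under-clustered and correct clusterings have nearly identical mean accuracy, so the tie is resolved in favour of the one with more clusters, while the over-clustered candidate is excluded because its accuracy is clearly lower.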
