Fig. 1: Silhouette’s assumptions are not met in data integration contexts. | Nature Biotechnology

From: Shortcomings of silhouette in single-cell integration benchmarking

a, Silhouette was designed to select a suitable cluster number for a single embedding, with cluster membership resulting from unsupervised algorithms (ref. 2). b–d, In data integration, we compare distinct embeddings and assign cluster membership by external labels: cell type (b,c) or batch (d). b, Silhouette’s bias for compact, spherical clusters does not reflect integration quality. c, Label-based clusters can have irregular shapes, violating silhouette’s assumptions and yielding unreliable scores. d, Silhouette’s focus on nearest neighboring clusters misses remaining batch effects if samples are partially integrated, limiting its sensitivity. All data shown are 2D simulated examples.
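
The behavior described in panels c and d can be reproduced with a small sketch. The code below is not the simulation used for the figure: it is an illustrative toy, assuming deterministic 2D points placed on circles, a plain-Python mean silhouette width, and hypothetical helper names (`silhouette`, `ring`). It compares compact versus ring-shaped label clusters (panel c) and partially versus fully mixed batches (panel d).

```python
import math

def silhouette(points, labels):
    """Mean silhouette width for 2D points with externally supplied labels."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) < 2:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the NEAREST other cluster only
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

def ring(cx, cy, r, n):
    """n evenly spaced points on a circle (hypothetical toy data generator)."""
    return [
        (cx + r * math.cos(2 * math.pi * k / n),
         cy + r * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]

# Panel c analogue: both label sets are cleanly separated, but only one is compact.
cell_types = ["T"] * 40 + ["B"] * 40
blobs = ring(0, 0, 0.5, 40) + ring(10, 0, 0.5, 40)  # two compact groups
rings = ring(0, 0, 1.0, 40) + ring(0, 0, 5.0, 40)   # two concentric rings
score_blobs = silhouette(blobs, cell_types)
score_rings = silhouette(rings, cell_types)

# Panel d analogue: batches A/B mix with each other, as do C/D,
# but the two pairs stay far apart (partial integration).
partial = ring(0, 0, 1.0, 40) + ring(10, 0, 1.0, 40)
partial_batches = ["A", "B"] * 20 + ["C", "D"] * 20
# Full integration: all four batches interleaved at one location.
full = ring(0, 0, 1.0, 80)
full_batches = ["A", "B", "C", "D"] * 20
score_partial = silhouette(partial, partial_batches)
score_full = silhouette(full, full_batches)

print("cell-type silhouette, compact blobs:   ", round(score_blobs, 3))
print("cell-type silhouette, concentric rings:", round(score_rings, 3))
print("batch silhouette, partial integration: ", round(score_partial, 3))
print("batch silhouette, full integration:    ", round(score_full, 3))
```

In this toy setup the concentric rings score far lower than the compact groups although both are perfectly separable by label, and the batch silhouette stays close to zero in both the partial and the full integration case, because each batch's nearest other batch is always the one it is mixed with.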