Fig. 2: Unsupervised cluster analysis of TP53 missense variants based on cellular TP53 functional features.

a Correlation matrix plots comparing Z-score-normalized TP53 mutagenesis datasets covering the entire protein (LOF, loss-of-function; DDR, DNA damage repair; DN, dominant-negative activity; TA, transcriptional activity; refs. 13,14). Variants are colour-coded by the functional domain in which they reside: NTD, N-terminal domain (red); DBD, DNA-binding domain (black); OD, oligomerization domain (cyan); CTD, C-terminal domain (purple). Linear relationships between functional screens were assessed using Pearson's correlation coefficient (r). b Principal component analysis (PCA) and unsupervised k-means clustering of TP53 mutagenesis cellular functional assay measurements. c Uniform manifold approximation and projection (UMAP) of the same cellular functional assay measurements, colour-coded by PCA k-means cluster (N = 200, D = 0.4). d Heatmap displaying the codon frequencies and distributions of TP53 variant clusters. Red arrowheads indicate variant hotspots in cluster 5. PRR, proline-rich region. e–h Violin plots comparing the functional consequences of variants within each cluster. Sample sizes: cluster 1 (n = 791), cluster 2 (n = 419), cluster 3 (n = 381), cluster 4 (n = 448), cluster 5 (n = 269). P-values shown on the plots were calculated using Kruskal-Wallis tests. Two-tailed Mann-Whitney U tests were used for pairwise comparisons (****p < 0.0001). Source data are provided as a Source Data file.
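The workflow summarized in panels a and b (Z-score scaling, Pearson correlation between screens, PCA, then k-means) can be illustrated with a minimal sketch. This is not the authors' code: the data below are synthetic, the helper names (`zscore`, `pca_scores`, `kmeans`) are hypothetical, and retaining two principal components and clustering on the PCA scores are assumptions made for illustration. Only the choice of five clusters is taken from the caption.

```python
import numpy as np

def zscore(X):
    """Column-wise Z-score normalization (mean 0, SD 1 per assay)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def pca_scores(Z, n_components=2):
    """Project centred data onto its leading principal components via SVD."""
    Zc = Z - Z.mean(axis=0)
    _, _, Vt = np.linalg.svd(Zc, full_matrices=False)
    return Zc @ Vt[:n_components].T

def kmeans(P, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of P."""
    rng = np.random.default_rng(seed)
    centres = P[rng.choice(len(P), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every centre, then nearest-centre assignment.
        dist = np.linalg.norm(P[:, None, :] - centres[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Recompute centres; keep the old centre if a cluster becomes empty.
        new_centres = np.array([
            P[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels

# Synthetic stand-in for a variants x assays matrix (columns: LOF, DDR, DN, TA).
# These values are random and carry no biological meaning.
rng = np.random.default_rng(1)
latent = rng.normal(size=300)
X = latent[:, None] * np.array([1.0, 0.8, 0.6, 0.9]) + 0.5 * rng.normal(size=(300, 4))

Z = zscore(X)                    # scaled data, as in panel a
r = np.corrcoef(Z, rowvar=False) # 4 x 4 matrix of Pearson's r between assays
scores = pca_scores(Z)           # PCA coordinates, as in panel b
labels = kmeans(scores, k=5)     # five clusters, matching the caption
```

Because Pearson's r is invariant to per-column linear rescaling, Z-scoring changes the axes of the pairwise plots but not the reported correlation coefficients.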