Abstract
Li-Fraumeni syndrome (LFS) has recently been redefined as a ‘spectrum’ cancer predisposition disorder to reflect its broad phenotypic heterogeneity. This variability is thought to stem in part from the diverse functional impacts of TP53 variants, although the underlying mechanisms remain poorly understood and there is an unmet clinical need for effective risk stratification. Here, we apply unsupervised clustering to functional datasets and identify distinct TP53 variant groups with clinical relevance, including a monomeric subgroup enriched in osteosarcoma cases. In cellular validation assays, dermal fibroblasts from carriers of more functionally impaired variants exhibit increased metabolic growth rates, mirroring trends observed in cluster-stratified clinical outcomes. These findings demonstrate the feasibility of developing diagnostic assays to guide personalized cancer risk assessment. More broadly, our results show that nuances in TP53 dysfunction shape the germline TP53-related cancer susceptibility spectrum and provide a framework for functionally delineating variant carriers.
Introduction
Inherited cancer susceptibility in Li-Fraumeni Syndrome (LFS; MIM #151623) was first described in 1969 based on the observation of several families with a high incidence of rare and diverse cancer types1. Since this initial observation, germline TP53 variants were identified as the underlying cause of LFS2,3 and the clinical classification has been adapted by multiple revised sets of criteria4,5,6,7 to incorporate individuals who do not fit the strict ‘classic LFS’ diagnosis8. Recently, LFS was reclassified as a ‘spectrum’ of disease rather than a syndrome to reflect the heterogeneity that has become increasingly evident through genotype-phenotype association studies9. The LFS spectrum encompasses a highly variable age-of-onset with nearly 30% affected during childhood, largely unpredictable sites of tumor manifestation, and inconsistent cancer penetrance. To date, the factors causing this heterogeneity in LFS remain unclear, although unequal TP53 variant functional consequences have been linked to different cancer phenotypes10,11,12.
Increased use of DNA sequencing in the clinic has led to the discovery of hundreds of germline variants that span the entire TP53 gene. Missense mutations leading to single amino acid changes represent the vast majority of TP53 variants that are associated with LFS. There are thousands of possible single amino acid substitutions in TP53, and currently, at least 1,703 different germline variants have been linked to individuals with cancer, as catalogued by the National Cancer Institute’s (NCI) TP53 Database (https://tp53.isb-cgc.org). However, research efforts have focused on variants at a few “hotspot” residues that account for only 30% of germline TP53 missense variants found in patients, leaving the majority largely uncharacterized and many still considered variants of uncertain significance (VUS). Therefore, it is of great interest to classify all variants in order to provide more accurate cancer risk assessments with tailored surveillance strategies and treatment plans.
Mutagenesis studies are an important tool for assessing TP53 variant pathogenicity; however, pathogenicity often remains unclear without substantial clinical evidence, which is difficult to obtain for predominantly rare variants. The results from large-scale mutagenesis experiments have demonstrated a broad range of functional consequences associated with different variants, ranging from near wild-type (WT) TP53 capabilities to severe loss-of-function (LOF)13,14,15. As a multifaceted protein, TP53 is composed of several domains containing complex structured and unstructured regions that enable a myriad of cellular tasks. Primarily operating as a transcription factor, TP53 binds to DNA recognition sequences as a tetramer and regulates the expression of several thousand target genes to orchestrate cellular responses centering around DNA-damage repair (DDR) and cell fate decisions16. Hence, variants arising at different locations of the protein can impart different cellular consequences by altering its transcriptional activity (TA) as well as its non-TA. Some variant TP53 species can also exert a dominant-negative (DN) effect over WT TP5317,18. This is an important feature because LFS and germline TP53-related cancer predisposition follow an autosomal dominant inheritance pattern and most patients are heterozygous carriers who retain one copy of WT TP53. Moreover, TP53 variant LOF can vary under different cellular conditions. For example, the R337H variant, the most frequently reported germline variant due to a genetic founder event in Brazil, has a conditional pH-dependent LOF and is associated with the tissue-specific development of adrenocortical carcinoma (ACC)19,20. Thus, the functional assessments derived from comprehensive variant library screens conducted in varying cellular contexts provide clinically relevant measurements of specific TP53 activities.
Here, we integrated TP53 variant library screens in an unsupervised cluster analysis and uncovered five functionally distinct variant clusters (Fig. 1). When the clusters were applied to germline variant carriers from the NCI TP53 Database and a validation dataset, we identified unique cancer patterns and significant differences between ages at cancer diagnosis. Additionally, assessment of variants in cancer cell lines and primary patient-derived healthy skin and blood samples recapitulated the clinical observations. Our variant clustering model provides a classification tool for the risk stratification of TP53 variant carriers and an additional consideration for variant interpretation.
Results
Unsupervised learning identifies five clusters of TP53 variants based on cellular functional features
Various functional features of TP53 have been evaluated using TP53 saturation mutagenesis screens conducted with different cellular systems and drug treatments. These features include TA, DN activity, LOF, DDR, and proliferation13,14,15. Each study has independently reported a wide-ranging functional gradient using separate variant libraries. In a preliminary analysis, we compared the results of these large-scale assays and observed weak to moderately strong inter-assay correlations (Fig. 2a and Supplementary Fig. 1a, b). By examining the assay results in relation to the functional domains in which the variants reside, we found that the correlations depended on variant location. Strong correlations were observed between variants within the DNA-binding domain (DBD), which displayed a broad gradient of functionality across the assays. In contrast, these relationships were weak or absent for C-terminal domain (CTD) variants, in which most scores indicate the retention of all functional activities with little diversity. Variants in the N-terminal domain (NTD) and oligomerization domain (OD) exhibited a functional gradient similar to DBD variants, showing weak to moderate linear relationships between LOF, DDR, and TA. However, DN activity was predominantly restricted to DBD variants. These findings highlight the phenotypic array and domain-specific features of TP53 variants, prompting the integration of multiple assays to better capture the complex interplay of these diverse functional effects.
a Matrix correlation plots comparing scaled (Z-score normalized) TP53 mutagenesis datasets covering the entire protein (LOF loss-of-function, DDR DNA damage repair, DN dominant negative activity, TA transcriptional activity)13,14. Variants are colour-coded based on the functional domain they reside in: NTD N-terminal domain (red), DBD DNA-binding domain (black), OD oligomerization domain (cyan), CTD C-terminal domain (purple). Linear relationships between functional screens were assessed using Pearson’s correlation coefficient (r). b Principal component analysis (PCA) and unsupervised k-means clustering performed with TP53 mutagenesis cellular functional assay measurements. c Uniform manifold approximation and projection (UMAP) performed using TP53 mutagenesis cellular functional assay measurements and colour-coded based on PCA k-means clustering (N = 200, D = 0.4). d Heatmap displaying the codon frequencies and distributions of TP53 variant clusters. Red arrowheads indicate variant hotspots in cluster 5. PRR = proline-rich region. e–h Violin plots comparing the functional consequences of variants within each cluster. Sample sizes: cluster 1 (n = 791), cluster 2 (n = 419), cluster 3 (n = 381), cluster 4 (n = 448), cluster 5 (n = 269). P-values on plots were calculated using Kruskal-Wallis tests. Two-tailed Mann-Whitney U tests were used for pairwise comparisons (****p < 0.0001). Source data are provided as a Source Data file.
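The scaling and correlation steps named in this caption can be sketched in a few lines. The following is a minimal illustration, not the authors' pipeline: the score lists are hypothetical values invented for the example, and the function names `zscore` and `pearson_r` are our own. It also shows that Pearson's r is unchanged by Z-score normalization, so scaling serves to make assays comparable rather than to alter their correlation.

```python
import math
from statistics import mean, pstdev


def zscore(values):
    """Scale values to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical scores for the same five variants in two assays (not real data).
lof_scores = [0.1, 0.4, 0.5, 0.9, 1.3]
ta_scores = [95.0, 70.0, 62.0, 30.0, 8.0]

r_raw = pearson_r(lof_scores, ta_scores)
r_scaled = pearson_r(zscore(lof_scores), zscore(ta_scores))
```

In this toy example the two assays are strongly anti-correlated, as would be expected if a high LOF score accompanies low transcriptional activity.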
We employed a principal component analysis (PCA) for visualization and dimensionality reduction of the functional features measured by mutagenesis screens covering the TP53 protein. At least two spatially separated clusters were visible within a clearly disjointed distribution of data points (Supplementary Fig. 2a). Variants located in the same domain demonstrated a tendency to cluster together, with minimal spatial separation between domains, except for a distinct subgroup of DBD variants (Supplementary Fig. 2b). Notably, the cancer-associated hotspot variants grouped in close proximity to one another, as did the monomeric TP53 variants (Supplementary Fig. 2b). Next, unsupervised k-means clustering was performed with five predicted clusters (Fig. 2b) as determined by the gap statistic method (Supplementary Fig. 2c). The gap statistic was chosen because it is a robust and objective method that uses statistical testing to determine the ideal number of clusters. Unlike the elbow method, which relies on visual identification of an inflection point, and the silhouette method, which can be sensitive to noise or non-ideal data distributions, the gap statistic compares the clustering structure to that of a null model, reducing the risk of overfitting. To validate this with a non-linear approach, we applied uniform manifold approximation and projection (UMAP), which yielded results congruent with the PCA: a similar spatial distribution of data points and one clearly isolated cluster of DBD variants (Fig. 2c).
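To make the comparison against a null model concrete, the sketch below computes gap values for a toy two-dimensional dataset. It is an illustration under stated assumptions and not the analysis used here. The k-means routine uses a deterministic farthest-point seeding of our own choosing, the reference sets are drawn uniformly from the bounding box of the data, and the three synthetic blobs are invented. The original gap statistic selects k with a one-standard-error rule; picking the largest gap, as done here, is a simplification.

```python
import math
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_inertia(points, k, iters=50):
    """Lloyd's k-means with farthest-point seeding (an assumption made for
    determinism). Returns the within-cluster sum of squared distances."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def gap_statistic(points, k_max, n_refs=10, seed=0):
    """Gap(k) = mean(log W_k over uniform reference sets) - log W_k of the data."""
    rng = random.Random(seed)
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    gaps = {}
    for k in range(1, k_max + 1):
        log_w = math.log(kmeans_inertia(points, k))
        ref_logs = []
        for _ in range(n_refs):
            ref = [tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
                   for _ in points]
            ref_logs.append(math.log(kmeans_inertia(ref, k)))
        gaps[k] = sum(ref_logs) / n_refs - log_w
    return gaps


# Synthetic data: three well-separated blobs (hypothetical, for illustration).
_rng = random.Random(42)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]
points = [(cx + _rng.gauss(0, 0.3), cy + _rng.gauss(0, 0.3))
          for cx, cy in blob_centers for _ in range(15)]

gaps = gap_statistic(points, k_max=5)
best_k = max(gaps, key=gaps.get)  # simplified selection rule
```

With well-separated groups, the gap should rise sharply once k reaches the true number of groups, because the data dispersion collapses while the dispersion of the uniform reference shrinks only gradually.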
TP53 variant clusters have distinct signatures affecting different structural and functional characteristics
By mapping the clusters onto the TP53 coding sequence, we identified unique variant signatures (Fig. 2d). First, variants in cluster 1 were found throughout the protein and broadly coincide with intrinsically disordered regions (IDRs) predicted by the IUPred3 computational method, which are regions expected to be more tolerant to alterations21 (Fig. 2d and Supplementary Fig. 3a). In contrast, clusters 2 and 3 were centered within the highly structured DBD. We found that cluster 3 variants were particularly enriched within helix 2 (H2) of the loop-sheet-helix motif (L1/S/H2) that binds to the major groove of DNA. DNA contact sites were affected most frequently in cluster 3, although interestingly, a variety of different structural/functional elements were impacted in this cluster that also encompassed all hotspot variants. Conversely, cluster 2 variants were primarily located in regions distal to the DNA contact sites (Supplementary Fig. 3b, c). Cluster 4 variants were distinctively concentrated at the CTD—another IDR with various roles implicated in regulating TP53 activities through post-translational modifications (Fig. 2d)22. Lastly, cluster 5 had a wide variant distribution with three sites of markedly increased variant frequencies located in structured regions known to impact protein stability (Fig. 2d). These three loci were found within TAD1 (transactivation domain 1), OD, and an NTD hinge region (residues 86-93) associated with stabilizing the TP53 tetramer through its interaction with the DBD23. Specifically, the residues affected most frequently in cluster 5 are involved in binding to the main TP53 negative regulator MDM2 (residues F19, W23, and L26)24, DBD stabilization (W91)23, TP53 oligomerization (I332, R337, and F341), and transactivation (F19, L22, W23, and L26)25. 
Further, we found a significant difference between the expected frequency of proline substitutions, which tend to be more structurally disruptive changes, when comparing between the clusters (p < 0.0001, chi-squared test). Cluster 5 contains the most substitutions involving proline, even after excluding the proline-rich region (PRR; residues 60–90) where several cluster 5 variants are located (Supplementary Fig. 3d). Clusters 3 and 5 both contain a higher proportion of substitutions to proline compared to clusters 1, 2, and 4, suggesting that variants in clusters 3 and 5 may confer greater structural impairments (Supplementary Fig. 3e).
Grantham’s distance score assessment reveals that cluster 3 has significantly higher scores than all other variant clusters (p < 0.0001, Mann-Whitney U test, Supplementary Fig. 3f), indicating a greater evolutionary distance and suggesting more deleterious effects for amino acid substitutions within this cluster. Post-clustering analyses of the functional datasets, which were used as input features for the PCA, align with the structural evaluations, further supporting the observed cluster distinctions. Variants in cluster 3 display the most severe functional consequences across cellular assays, characterized by a high degree of DN activity (Fig. 2e–h). By contrast, clusters 1 and 4 predominantly consist of variants with hypermorphic or near-WT functionality, consistent with their association with IDRs. Cluster 2 variants are hypomorphic, exhibiting partial TA, WT-like DDR, and minimal DN potential. Lastly, cluster 5 variants show high LOF and impaired DDR; however, most variants in this cluster retain TA and lack DN activity.
TP53 variant clusters display differing associations with cancer
The TP53 Variant Curation Expert Panel (VCEP) plays an important role in evaluating molecular and clinical evidence to provide variant interpretations accessible through ClinVar (https://clinicalgenome.org/affiliation/50013)26. VCEP uses variant functional information, including the mutagenesis assays used in our variant clustering model, in addition to clinical and population data to determine variant pathogenicity. However, TP53 variants can be exceedingly rare and often lack sufficient clinical evidence to establish a clear association with cancer, particularly those with reduced penetrance or adult onset (attenuated phenotypes)27. To date, 109 of 2,314 potential missense variants within the gene have undergone evaluation, and 24 of these remain VUS. We compared the VCEP annotated variants to our unsupervised clustering model to examine the clinical associations of each variant cluster (Fig. 3a). A remarkable division was observed between pathogenic/likely pathogenic (P/LP; p < 0.0001, chi-squared test) and benign/likely benign (B/LB; p < 0.0001, chi-squared test), with cluster 3 displaying a complete separation for P/LP and cluster 1 for B/LB (Supplementary Tables 2 and 3). Within clusters 2, 4, and 5, the distribution of VCEP annotated variants was mixed or contained more VUS. If TP53 variant penetrance is a gradient resulting in a spectrum of cancer risk, then clusters 2, 4, and 5 may represent intermediate risk between B/LB cluster 1 and P/LP cluster 3. Subsequently, we explored germline sequencing datasets to determine the prevalence of each variant cluster in individuals with or without a history of cancer (Supplementary Table 4). Cancer-associated missense variants in the NCI TP53 Database were predominantly classified in cluster 3 (875/1260 or 69%) (Fig. 3b).
Similarly, cluster 3 variants comprised nearly 60% of missense variants identified in a multi-institutional validation dataset (n = 334) collected from cancer centers across Canada, the United States, and Germany (Fig. 3c). Non-cancer associated germline variants from the gnomAD database, representing the general healthy population, were most often classified in cluster 1 (79/186 or 42.5%) (Fig. 3d). Moreover, cluster 1 variants were highly enriched in healthy older women >70 years (19/27 or 70%) with no history of cancer (FLOSSIES database) (Fig. 3e). Overall, the divergence of variant clusters in germline carriers affected versus unaffected by cancer and the alignment with VCEP annotations supports the potential value of our clustering strategy to delineate germline carriers into risk stratified groups.
a Stacked bar plot showing VCEP annotated variants applied to the variant clusters (n = 106). Cluster 1 (42 classified of 791), cluster 2 (18 classified of 419), cluster 3 (21 classified of 381), cluster 4 (22 classified of 448), cluster 5 (3 classified of 269). b Bar graphs displaying the frequencies of germline variants in each cluster found in the NCI TP53 Database (n = 3113), (c) a multi-institutional validation cohort (n = 458), (d) gnomAD non-cancer dataset (n = 186), and (e) FLOSSIES database (n = 27). P-values on plots were calculated using chi-square tests. Source data are provided as a Source Data file.
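The chi-square tests referred to in this caption compare observed counts in a contingency table with the counts expected if cluster membership and annotation were independent. The sketch below is a minimal stdlib-only illustration; the 3 × 3 table is hypothetical, the function names are our own, and no continuity correction is applied. In practice a statistics package would normally be used.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc term plus a finite series.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def chi2_independence(table):
    """Pearson chi-squared test of independence for an r x c table of counts.
    Returns (statistic, degrees of freedom, p-value)."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


# Hypothetical counts: rows are three clusters, columns are P/LP, VUS, B/LB.
example_table = [
    [2, 10, 30],
    [8, 8, 2],
    [20, 1, 0],
]
stat, df, p_value = chi2_independence(example_table)
```

Cells with small expected counts, as in sparsely annotated clusters, weaken the chi-squared approximation, which is one reason exact tests are often preferred for pairwise follow-up comparisons.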
The age of cancer onset is significantly stratified across TP53 variant clusters
Carriers of TP53 variants endure a lifelong high susceptibility to cancer with wide-ranging ages at onset. After separating TP53 variant carriers into cluster-based groups, we found significant differences between the ages at diagnosis of cancer (p < 0.0001, Kruskal-Wallis test) (Fig. 4a, b). Carriers of protein-truncating variants (frameshift, nonsense, and deletions; FS/NS/DEL) were included as another variant group. Cancer diagnoses were significantly earlier in carriers of clusters 3, 5, and FS/NS/DEL variants when compared to carriers of variants in clusters 1 and 4, with as much as 14.5 years separating the median ages at onset of cluster 3 and cluster 1 carriers (Fig. 4a and Supplementary Table 5). Importantly, our validation cohort demonstrated the same pattern of divergent ages at diagnosis (p < 0.0001, Kruskal-Wallis test), where cluster 1 carriers exhibited the oldest onset, while clusters 3, 5, and FS/NS/DEL carriers consistently displayed the earliest onset (Fig. 4b and Supplementary Table 5). These observations are consistent with the structural/functional consequences identified between variant clusters. Cluster 1 and 4 variants are found within IDRs, whereas cluster 3 and 5 variants are located within TP53 structural elements (Supplementary Fig. 3). FS/NS/DEL protein-truncating variants are also expected to result in severe LOF. Cluster 2 variants are primarily in the DBD structural domain; however, unlike cluster 3 variants, they tend to occur away from DNA contact sites and are predicted to be less deleterious (Supplementary Fig. 3). As such, carriers of cluster 2 variants appear to have intermediate cancer onset relative to the other clusters.
a Line plot displaying the cumulative cancer diagnosis ages over time and a box-and-whisker plot representing the ages at cancer diagnosis in the NCI TP53 Database (****p < 0.0001 compared to clusters 1 and 4, *p = 0.014 vs cluster 1, *p = 0.024 vs cluster 4); sample sizes: cluster 1 (n = 81), cluster 2 (n = 181), cluster 3 (n = 1126), cluster 4 (n = 46), cluster 5 (n = 40), FS/NS/DEL (n = 407). b Line plot displaying the cumulative cancer diagnosis ages over time and a box-and-whisker plot representing the ages at cancer diagnosis in the validation cohort (*p < 0.0001 compared to clusters 3 and FS/NS/DEL, *p = 0.014 compared to cluster 2, *p = 0.022 vs cluster 5, *p = 0.054 vs cluster 4); sample sizes: cluster 1 (n = 13), cluster 2 (n = 86), cluster 3 (n = 255), cluster 4 (n = 3), cluster 5 (n = 4), FS/NS/DEL (n = 125). c Line plot displaying the frequency of breast tumor diagnosis over time and a box-and-whisker plot representing the ages at cancer diagnosis (***p = 0.0003 vs cluster 1, ***p < 0.0001 vs cluster 2, ***p = 0.0002 vs cluster 4, **p = 0.0042 vs cluster 1, **p = 0.0064 vs cluster 2, **p = 0.0013 vs cluster 4, ****p < 0.0001 vs clusters 1, 2, and 4); sample sizes: cluster 1 (n = 15), cluster 2 (n = 69), cluster 3 (n = 307), cluster 4 (n = 15), cluster 5 (n = 13), FS/NS/DEL (n = 143). d Line plot displaying the frequency of ACC diagnosis over time and box-and-whisker plot representing the age at ACC onset (**p = 0.0045 vs cluster 2, **p = 0.0002 vs cluster 3, **p < 0.0001 vs FS/NS/DEL); sample sizes: cluster 1 (n = 2), cluster 2 (n = 18), cluster 3 (n = 73), cluster 4 (n = 4), cluster 5 (n = 2), FS/NS/DEL (n = 21). For all box-and-whisker plots, the center line represents the median, the box spans the interquartile range (25th to 75th percentiles), and the whiskers extend to the 10th and 90th percentiles. P-values on plots were calculated using Kruskal-Wallis tests. Two-tailed Mann-Whitney U tests were used for pairwise comparisons. 
Source data are provided as a Source Data file.
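The Kruskal-Wallis test used for these age comparisons ranks all observations together and asks whether the rank sums differ between groups more than chance would allow. A minimal sketch of the H statistic with tie correction follows. The ages are hypothetical values invented for illustration, the function names are our own, and the closed-form p-value shown applies only to the three-group case (two degrees of freedom).

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic and its degrees of freedom."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    return h / correction, len(groups) - 1


# Hypothetical ages at diagnosis for three carrier groups (not real data).
ages = {
    "cluster 1": [45, 52, 38, 60, 41],
    "cluster 2": [33, 29, 40, 36, 47],
    "cluster 3": [12, 25, 31, 8, 19],
}
h_stat, dof = kruskal_wallis_h(list(ages.values()))
# For exactly two degrees of freedom the chi-squared tail is exp(-H/2).
p_approx = math.exp(-h_stat / 2) if dof == 2 else None
```

A significant H only indicates that at least one group differs; pairwise tests such as the Mann-Whitney U test are then needed to locate the differences.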
Upon analysis of cancer diagnoses in the NCI TP53 Database, we noticed a drastic increase in frequency for cluster 5 carriers around age 30 due to a surge of breast tumor cases (Fig. 4a). Breast tumor onset varied considerably between the groups (p < 0.0001, Kruskal-Wallis test). Carriers of clusters 3, 5, and FS/NS/DEL variants each had significantly earlier onset when compared to carriers of variants in clusters 1, 2, and 4 (Fig. 4c and Supplementary Table 5). Furthermore, in our analysis of ACC, which is typically diagnosed within the first few years of life, we found a significant difference in age at diagnosis between the patient groups (p = 0.0144; Kruskal-Wallis test). In this case, cluster 4 variant carriers manifested ACC during young adulthood with a median age at diagnosis of 26.5 years, whereas all other patient groups experienced early childhood-onset ACC (Fig. 4d and Supplementary Table 5).
Classification of TP53 variant carriers based on clusters reveals unique cancer-type distributions and an osteosarcoma-prone subgroup
Individuals carrying a pathogenic germline TP53 variant are affected by a broad range of cancers during childhood and adulthood, and some experience multiple cancers. We classified TP53 variant carriers in the NCI TP53 Database based on the five clusters and investigated the cancer-type distributions. Each group of patients exhibited a diverse range of cancers (Fig. 5a) and our analysis revealed significant differences (chi-squared test) between the occurrence of specific cancer types when comparing across all groups (Supplementary Table 6). Interestingly, cluster 1 variant carriers presented with considerably fewer core LFS cancer types (57.1%) versus all other patient groups that each totalled approximately 75% LFS core cancers (Fig. 5b and Supplementary Data 2). The core cancers that are commonly associated with LFS are osteosarcoma, soft tissue sarcomas (STS), breast tumors, brain tumors, adrenocortical carcinoma (ACC), and leukemia. Colorectal cancer is relatively uncommon in LFS, representing 3.4% (136/3987) of total cancers in the NCI TP53 Database. We found a significant difference between the patient groups (p < 0.0001, chi-squared test) in which colorectal cancers represented 12.4% (13/105) of total cancers affecting cluster 1 variant carriers (Fig. 5c and Supplementary Data 2). This atypical tumor pattern associated with cluster 1 variant carriers demonstrates a less “LFS-like” phenotype. Moreover, cluster 1 carriers, as well as clusters 2 and 4 carriers, were less frequently diagnosed with more than one cancer during their lifetimes (Fig. 5d), each having over two-fold reduced odds (Supplementary Data 2) of developing multiple cancers compared to cluster 3 and FS/NS/DEL carriers. Cluster 5 also had a high frequency of carriers affected by multiple tumors. Although not statistically significant, this result aligns with the severity of functional impairment (Fig. 2f, g) observed in cluster 5 variants located in structurally significant regions (Fig. 2d), as well as their association with early cancer onset (Fig. 4).
a Stacked bar plot displaying the cancer type distribution patterns manifested in germline TP53 variant carriers. Cluster 1 (n = 105), cluster 2 (n = 219), cluster 3 (n = 1255), cluster 4 (n = 55), cluster 5 (n = 50), FS/NS/DEL (n = 467). b Bar graph showing the frequency of LFS-core cancer types (**FDR = 0.0035 vs cluster 2 and 3, **FDR = 0.0012 vs FS/NS/DEL). c Bar graph showing the frequency of colorectal cancers (*FDR = 0.0092 vs cluster 2, *FDR = 0.00015 vs cluster 3 and FS/NS/DEL). d Bar graph showing the frequency of patients experiencing multiple tumors (**FDR = 0.0031 cluster 3 vs cluster 1, **FDR = 0.0002 cluster 3 vs cluster 2, **FDR = 0.0031 cluster 3 vs cluster 4, **FDR = 0.0097 FS/NS/DEL vs cluster 1, **FDR = 0.0031 FS/NS/DEL vs cluster 2, **FDR = 0.0073 FS/NS/DEL vs cluster 4). e Bar graph showing frequency of carriers fulfilling classic LFS or Chompret criteria (****FDR < 0.0001 comparing to clusters 3, 5, and FS/NS/DEL). f Bar graph showing frequency of carriers fulfilling LFS-like criteria (LFL; Eeles and Birch criteria4,5) (**FDR = 0.0027 vs cluster 1, **FDR < 0.0001 vs cluster 2, **FDR = 0.0007 vs cluster 3, **FDR = 0.0005 vs cluster 5, **FDR < 0.0001 vs FS/NS/DEL). g Bar graph showing the frequency of osteosarcomas (**FDR = 0.0003 vs cluster 1, **FDR = 0.0002 vs cluster 2, **FDR = 0.0013 vs cluster 3, **FDR = 0.0003 vs cluster 4, **FDR = 0.004 vs FS/NS/DEL). P-values on plots were calculated using chi-square tests. Two-tailed Fisher’s exact tests were used for pairwise comparisons with FDR-corrected p-values. Source data are provided as a Source Data file.
Next, we analyzed variant carriers across clusters based on their fulfillment of diagnostic criteria for LFS, which incorporate factors such as age of onset, cancer type, number of cancers, and family history28. Classic LFS and Chompret criteria are more stringent compared to the more inclusive criteria for an LFS-like diagnosis (LFL; Eeles and Birch criteria4,5). Significant differences were observed between the variant carrier groups when comparing between stricter LFS diagnoses (classic LFS/Chompret; p < 0.0001, Chi-squared test) as well as LFL diagnoses (p < 0.0001, Chi-squared test; Supplementary Table 7 and Supplementary Data 2). Clusters 1, 2, and 4 each had fewer individuals meeting the stricter LFS criteria compared to clusters 3, 5, and FS/NS/DEL carriers (Fig. 5e), indicating attenuated LFS phenotypes among those in clusters 1, 2, and 4. Notably, cluster 4 had no classic LFS diagnoses but had the most LFL diagnoses (Fig. 5f), again, suggesting attenuated phenotypes or conditional pathogenicity. Collectively, the retention of TP53 functionality observed in cluster 1 variants (Fig. 2e, f), their association with less “LFS-like” phenotypes (Figs. 4a, b and 5), as well as the enrichment of these variants in cancer-free individuals (Fig. 3d, e), suggest that they are B/LB. We propose that other genetic or environmental cancer drivers may play a more significant role in tumorigenesis in carriers of these variants.
Strikingly, osteosarcomas occurred at more than twice the frequency in patients carrying a cluster 5 variant (26%; 13/50 cancers) compared to all other TP53 variant groups (p < 0.0001, chi-squared test) (Fig. 5g and Supplementary Table 8). Pairwise comparisons determined that the odds of developing osteosarcoma are significantly higher for cluster 5 variant carriers than for all other variant carrier groups, with odds ratios (OR) = 8.7 versus cluster 1 (95% confidence interval (CI) = 2.5–39.1; FDR = 0.0007, Fisher’s exact test), OR = 6.0 versus cluster 2 (95% CI = 2.3–15.7; FDR = 0.0007, Fisher’s exact test), OR = 3.5 versus cluster 3 (95% CI = 1.6–6.9; FDR = 0.0024, Fisher’s exact test), OR = 18.5 versus cluster 4 (95% CI = 2.6–817; FDR = 0.0013, Fisher’s exact test), and OR = 3.1 versus FS/NS/DEL (95% CI = 1.4–6.4; FDR = 0.012, Fisher’s exact test) (Supplementary Table 6). All bone tumors in cluster 5 (n = 14) were classified as osteosarcomas with the exception of one unclassified bone sarcoma (not otherwise specified). Notably, upon analysis of the TP53 variants in cluster 5 patients (n = 37), we found that most were located in the OD and abolish oligomerization, resulting in monomeric TP53 species (L344P, R337C, R337L, R337P, and R342P; n = 31 individuals from 20 unrelated families)12,29. Among carriers of these monomeric TP53 variants, osteosarcomas accounted for 29.5% (13/44) of all cancers (Supplementary Tables 8 and 9). To further investigate the significance of TP53 variant clustering, we validated these findings using osteosarcoma cells and carrier-derived non-cancer cells.
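The pairwise statistics quoted above combine three steps: an odds ratio from a 2 × 2 table, a two-sided Fisher's exact test, and a false discovery rate correction across comparisons. The sketch below illustrates each step with the standard library only. The cluster 5 counts (13 osteosarcomas among 50 cancers) come from the text; the comparison group counts and the extra p-values are hypothetical. The odds ratio shown is the simple sample estimate, which may differ slightly from the conditional estimate that some statistical packages report, and Benjamini-Hochberg is assumed as the FDR procedure because the text does not name one.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value: sum the probabilities of all tables
    with the observed margins that are no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Cluster 5: 13 osteosarcomas and 37 other cancers (from the text).
cluster5 = (13, 37)
# Hypothetical comparison group: 10 osteosarcomas and 90 other cancers.
comparison = (10, 90)

or_value = odds_ratio(*cluster5, *comparison)
p_value = fisher_exact_two_sided(*cluster5, *comparison)
# Adjust alongside two made-up p-values standing in for other comparisons.
fdr_values = benjamini_hochberg([p_value, 0.20, 0.003])
```

Very wide confidence intervals, such as the one reported versus cluster 4, are typical when one cell of the table holds only a handful of cases.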
Functional assessments of TP53 variants in cancer cells and patient-derived non-cancer cells validate the biological significance of the variant clusters
To test the biological significance of the variant clusters, we performed cellular assays using cancer cells as well as non-cancer cells from carriers. Since we observed that carriers of monomeric variants had significantly increased osteosarcoma development compared to non-monomeric variant carriers, we expressed monomeric and non-monomeric variants in TP53-null osteosarcoma cells (Saos-2) and performed clonogenic assays. Osteosarcoma cells expressing the monomeric variants R342P and R337C formed significantly more colonies than cells expressing variants representing the other clusters, except for cluster 3 variants (Fig. 6a). Given the severe functional consequences associated with cluster 3 variants that conferred some of the earliest onset ages in carriers (Fig. 4), this result was expected. When the variants were expressed in TP53-null lung adenocarcinoma cells (H1299), the differences between monomeric and non-monomeric variants were less pronounced or not statistically significant, suggesting that TP53 monomers have a greater tumorigenic influence in the context of osteosarcoma (Fig. 6b and Supplementary Data 3). Glutaraldehyde crosslinking and immunoblotting confirmed TP53 variant expression and oligomeric status in cells (Supplementary Fig. 4).
a Bar graphs displaying clonogenic assay results in Saos-2 p53-null osteosarcoma cells and (b) in H1299 p53-null lung adenocarcinoma cells expressing TP53 variants representing each cluster (mean with SEM). c Bar graph showing clonogenic assay results in H1299 cells comparing R175 variants (mean with SEM). d Box-and-whisker plot of metabolic growth rates as measured by the MTT colorimetric assay using unstressed dermal fibroblasts from patient skin biopsies. e Box-and-whisker plot of TP53 variant scores as measured by mean TP53 target gene mRNA expression in patient blood-derived cells (fresh peripheral blood lymphocytes or lymphoblastoid cell lines)35. Wild-type (WT; n = 54), cluster 1 (n = 7), cluster 2 (n = 8), cluster 3 (n = 19), cluster 4 (n = 4), FS/NS/DEL (n = 8). Clonogenic and MTT assays were repeated at least twice and performed in triplicate and sextuplicate, respectively. For all box-and-whisker plots, the center line represents the median, the box spans the interquartile range (25th and 75th percentiles), and the whiskers extend to the minimum and maximum values. All pairwise statistical comparisons in the clonogenic assay are made relative to WT p53. P-values were calculated using one-tailed unpaired t-tests. Exact p-values for all pairwise comparisons are provided in Supplementary Data 3. Source data are provided as a Source Data file.
In many cases, amino acid substitutions at the same position were classified into different clusters. For example, R175C, R175G, R175H, R175L, R175P, and R175S, classified into clusters 4, 3, 3, 2, 2, and 2, respectively. We generated and expressed each R175 variant in H1299 cells to determine whether their differential clustering is substantiated by their oncogenic effects in cells. Based on the functional and clinical associations, we would expect R175C in cluster 4 to exhibit less pathogenicity compared to variants in clusters 2 or 3. Indeed, R175C behaved like wild-type TP53 to significantly repress colony formation (Fig. 6c). Furthermore, the expression of cluster 3 variants R175G/H both resulted in more colony formation compared to the cluster 2 variants R175L/P/S. These results are consistent with the ages at cancer onset between cluster-stratified carriers: cluster 3 was associated with the earliest onset and greater cancer penetrance in terms of LFS diagnoses, cluster 2 variants led to intermediate onset ages with fewer meeting LFS diagnosis criteria, and cluster 4 carriers experienced later onset with LFL rather than stricter LFS diagnoses (Fig. 4, Fig. 5e, f). Counterintuitively, substituting arginine (R) for cysteine (C) is predicted to introduce the largest difference based on the physical properties of amino acids (Grantham distance = 180) (Supplementary Fig. 5), however, this variant had no impact on TP53 activity based on the results of the clonogenic assays (Fig. 6c). In addition, despite having the most similar properties to R, the change from R to histidine (H) (Grantham distance = 29) resulted in severely impaired tumor suppressor activity in the clonogenic assays (Fig. 6c). Analysis of the impact of these specific substitutions on the TP53 structure can explain these discrepancies since R175 is involved in maintaining the structural integrity of a zinc binding pocket. 
R175H is known to destabilize the interaction with zinc, which is necessary for the sequence-specific DNA binding of TP5330. Replacing R with H results in considerable steric hindrance that would alter the zinc-binding region, whereas replacing R with C creates no steric overlap and may still coordinate zinc binding through a thiol group (Supplementary Fig. 5). Substitution with glycine (G) also creates no steric overlap (Supplementary Fig. 5); however, the lack of a side chain might offer greater flexibility and result in the loss of structural stability, which could explain the lack of tumor suppressor activity in the clonogenic assays and the classification of R175G in cluster 3 (Fig. 6c). These results highlight the importance of cellular functional assays for assessing TP53 variant pathogenicity.
Finally, we investigated whether biological differences exist between healthy non-cancerous cells obtained from variant carriers representing different clusters. Unique biochemical and growth characteristics have been detected in dermal fibroblasts from individuals predisposed to cancer31,32,33,34; however, this has not been reported for TP53 variant carriers. Given the central importance of TP53 in controlling cellular growth, we cultured dermal fibroblasts from skin biopsies of variant carriers and measured the metabolic growth rate. We found that all variant carrier fibroblasts had elevated metabolic activity compared to fibroblasts with wild-type TP53 (Fig. 6d). Moreover, the metabolic activity of variant carrier fibroblasts was distinctly stratified according to the variant cluster classifications, reflecting their relative pathogenicity based on our clinical analyses. Specifically, fibroblasts from cluster 3 variant carriers (Y163C and G245S) had the highest metabolic activity, fibroblasts from a cluster 2 carrier (S241Y) had intermediate activity, and fibroblasts from a cluster 1 carrier (P222L) exhibited the lowest activity among the variants tested, which was slightly elevated compared to fibroblasts with wild-type TP53. Although the cluster 5 variant F338S is not monomeric, fibroblasts from this carrier displayed relatively high metabolic activity, supporting the pathogenic nature of cluster 5 observed in clinical analyses. These results indicate that quantifiable cellular properties could be harnessed in the development of a diagnostic test to determine personal cancer risk for TP53 variant carriers. Towards this goal, colleagues have previously attempted to assess the transcriptional function of TP53 VUSs using a blood-based assay. Raad et al. exposed fresh blood lymphocytes or lymphoblastoid cell lines to doxorubicin to induce a TP53 response before measuring mRNA levels of 10 TP53-responsive genes35. 
In total, this assay was conducted on 46 blood samples from variant carriers harboring 31 unique missense variants representing clusters 1, 2, 3, 4, as well as 8 truncating variants. We compiled and stratified this data based on the TP53 variant clusters. Overall, variant carriers had lower TP53 functionality compared to 54 individuals with wild-type TP53 (Fig. 6e). Carriers of cluster 1 variants had a modest reduction in TP53 function, while carriers of variants in clusters 2, 3, 4, and FS/NS/DEL had considerably lower scores. Again, the results of this assay displayed a remarkable resemblance to the cancer onset ages observed in our clinical analysis of cluster-stratified carriers (Fig. 4a–c), demonstrating the biological significance of our variant clustering strategy.
Discussion
The broad phenotypic spectrum related to TP53 variants necessitates a risk stratification strategy. Diagnosing LFS and understanding specific cancer risks remain challenging because TP53 variants are often rare, with limited clinical evidence on which to base decisions. The population prevalence of pathogenic variants is also unclear for the same reasons. Strikingly, TP53 variants leading to protein changes are estimated to occur in 1.3% of the general population (based on gnomAD v4.0, n = 807,162 individuals; not including WT variant P72R); however, most are likely benign. Current estimates suggest that pathogenic variants are found in 1 in 3555–5476 individuals36. This number was suggested to be as high as 1 in 400–865 using a less stringent definition of pathogenicity, although it is now considered an overestimate. Clarity around variant interpretations will help to avoid misclassification and improve patient management.
In this study, we integrated functional screens to investigate the diverse effects of TP53 variants and to gain a comprehensive understanding of the genotype-phenotype relationship. Previous models to predict variant pathogenicity treated it as a binary variable and limited the focus to pathogenic versus benign37,38. These studies did not examine the clinical effects of variants across a spectrum. In contrast, our approach refines the categorization of pathogenicity and links these findings to clinical features, providing a comprehensive exploration of the spectrum of genetic variants. Using unsupervised clustering we identified five distinct TP53 variant clusters with unique functional properties and clinical relevance. Table 1 summarizes the main structural/functional features and clinical outcomes associated with each variant cluster, and Supplementary Data 4 provides a reference for variant cluster assignments. A limitation of this study was the smaller sample size in the validation cohort that restricted the cancer type-specific analyses. Within the validation dataset, patient selection in the German cohort was more stringent for the inclusion of P/LP variants according to the Fortuno criteria39, thus restricting the representation of variants in clusters outside of these criteria.
Cluster 1 and 4 variants performed best overall in functional mutagenesis screens by retaining the most TP53 activity (Fig. 2e–h). Both of these clusters mapped to IDRs of the protein (Supplementary Fig. 3a, b), suggesting that they are more tolerable structurally and functionally. Indeed, patients harboring these germline variants displayed significantly delayed cancer onset in comparison to carriers of variants located in more highly structured TP53 protein regions from clusters 2, 3, 5 or truncating variants (Fig. 4a–c). Cluster 1 variant carriers were affected by fewer LFS core cancer types and, conversely, by a greater proportion of colorectal cancers (Fig. 5b, c). Taken together, the enrichment of cluster 1 variants in populations with no cancer history (Fig. 3d, e), the atypical and less “LFS-like” cancer types (Fig. 5a, b), the later onset ages (Fig. 4a–c), the fewest cases affected by multiple cancers (Fig. 5d), and the fewer LFS diagnoses (Fig. 5e) suggest that cluster 1 variants might be B/LB. In agreement with this, the cluster 1 variants that have been reviewed by VCEP are all categorized as B/LB (Fig. 3a). The relatively higher proportion of colorectal cancer observed in this variant cluster (12.4%; 13/105 cancers) compared to carriers of other variants (2.4%; 50/2046 cancers) reflects the proportion of worldwide cancer diagnoses that are colorectal cancer (10%) according to the World Health Organization. Notably, a previous study suggested that multigene panel testing in colorectal cancer has expanded the mutational analysis of TP53 to a wider range of patients, including those who do not meet classic LFS criteria, which can result in sampling bias that inflates cancer risk estimates40. We propose that the presence of cluster 1 variants in the NCI TP53 germline dataset may be a result of sampling bias and that other cancer risk factors contribute to oncogenesis in these individuals. 
In our assessment of cluster 4 variants, the retention of TP53 function (Fig. 2e–h, Fig. 6a–c), a relatively later age of onset, particularly in cases like ACC (Fig. 4d), and a greater likelihood of meeting LFL criteria rather than stricter LFS criteria (Fig. 5e, f) together suggest a distinct and attenuated LFS phenotype.
In contrast to clusters 1 and 4, variant clusters that mapped to highly structured regions conferred the earliest onset. Cluster 3 variants are centered in the DBD, primarily at DNA-contact sites, are functionally deleterious, and have DN activity (Fig. 2d–h and Supplementary Fig. 3). These features each contribute to the highly pathogenic phenotypes associated with cluster 3 variants, which are the most frequently reported germline variants (Fig. 3b, c). Cluster 2 variants are similarly centered in the DBD; however, unlike cluster 3 variants, they are mostly situated distal to DNA-contact sites and are predicted to be less structurally disruptive (Supplementary Fig. 3). As such, cluster 2 variants conferred an intermediate phenotype between the highly pathogenic early-onset cluster 3 variants and the late-onset or B/LB cluster 1 variants (Fig. 4a–c), with significantly fewer carriers meeting strict LFS criteria than in cluster 3 (Fig. 5e). Furthermore, carriers of cluster 3 variants, along with those harboring highly structurally disruptive variants in cluster 5 and FS/NS/DEL variants, were more frequently affected by multiple cancers over their lifetimes compared to carriers of variants in clusters 1, 2, and 4 (Fig. 5d).
Interestingly, nearly all individuals in cluster 5 had variants located in the OD that disrupt oligomerization and result in monomeric TP53 species. Our functional and clinical evaluations of cluster 5 variants reveal that they are pathogenic, linked to early onset (Fig. 4), and frequently fulfill the diagnostic criteria for LFS (Fig. 5e). A significantly higher incidence of osteosarcoma was found among this group compared to all other variant carrier groups, suggesting an increased susceptibility to develop osteosarcoma (Fig. 5g). To explore this possibility, we expressed monomeric variants in Saos-2 osteosarcoma cells and compared their ability to suppress colony formation versus non-monomeric variants. Monomeric variants led to increased clonogenic activity in Saos-2 cells, and this effect was dampened in H1299 lung adenocarcinoma cells in comparison to variants representing other clusters (Fig. 6a, b and Supplementary Data 3). Based on the functional trends observed in Saos-2 and H1299 cells, and the significantly higher osteosarcoma incidence, these results suggest that monomeric variants may have greater oncogenic potential in bone cells. Future work should explore this mechanism to identify potential therapeutic targets.
Our examination of codon 175 variants underscores how subtle differences in TP53 amino acid substitutions can lead to remarkably diverse cellular outcomes (Fig. 6c). Variants like R175H and R175G lead to uncontrolled cell proliferation, while R175L, R175P, and R175S have more moderate effects, and R175C retains wild-type functionality. Alterations of amino acid residue 175 specifically impact zinc binding, affecting TP53’s affinity for DNA in a sequence-specific manner30. This example highlights the precision of our TP53 variant clustering method in distinguishing between nuanced structural and functional differences. Across TP53, similar differential effects of variations at the same positions could influence a range of TP53 features, including molecular interactions and post-translational modifications.
Although TP53 impairment is a well-known driver of cancer, many additional genetic and epigenetic factors contribute to an individual’s overall cancer risk. We questioned whether differences between TP53 variants could be observed in non-transformed cells, or if compensatory mechanisms might offset TP53 dysfunction. Previous research has revealed characteristic changes in skin fibroblasts from individuals with an elevated cancer risk, but this has not yet been established in TP53 variant carriers31,32,33,34. Using unstressed dermal fibroblasts, we discovered that TP53 variant carriers exhibited an elevated metabolic rate, which aligned with the pathogenicity associated with each variant cluster (Fig. 6d). Furthermore, a comparison of TP53 activity in blood lymphocytes from variant carriers supported the findings observed in dermal fibroblasts (Fig. 6e). Collectively, these cellular assays validate the biological differences between TP53 variant clusters and confirm the accuracy of our clustering strategy. Importantly, our ability to detect these characteristics in healthy, non-cancerous skin and blood-derived cells offers the potential for a personalized predictor of cancer risk in TP53 variant carriers. We are now pursuing this further in the development of a cellular diagnostic assay.
In conclusion, our study reveals a continuum of cancer risk associated with TP53 variants, ranging from classic LFS phenotypes to more attenuated forms. This risk can be stratified through the comprehensive integration of functional assays and clinical data, offering a more nuanced understanding of variant pathogenicity (Fig. 7). The identified variant clusters provide a framework for risk stratification, serving as a tool for VUS interpretations and informing personalized approaches to cancer surveillance. Our findings suggest that tailored adjustments to cancer surveillance protocols based on variant cluster may be beneficial. For example, carriers of monomeric variants could benefit from enhanced monitoring for bone lesions during whole-body MRI, while those with cluster 3, 5, and FS/NS/DEL variants require intensive early screening, particularly for early-onset breast cancer. Additionally, clusters 2 and 4 variants are associated with attenuated phenotypes, whereas our analyses suggest that cluster 1 variants are benign. These findings highlight the need for additional population-based studies to refine their cancer risk profiles. While these findings offer insights into potential adjustments to surveillance, carriers of P/LP variants should continue with established surveillance routines, irrespective of clustering, until further evidence enables individualized tumor surveillance plans based on cluster affiliation. Looking ahead, for carriers of variants associated with attenuated phenotypes, a personalized cellular diagnostic assay could be a valuable tool in determining the need for surveillance. Overall, our clustering approach offers a path toward more precise clinical expectations and surveillance strategies, with the potential to significantly improve outcomes through personalized cancer risk assessment.
This schematic representation illustrates TP53 variant clusters positioned along a continuum of pathogenicity, ordered by increasing cancer risk. WT (teal) corresponds to normal p53 function and no elevated cancer risk. Cluster 1 (red) includes benign or conditionally pathogenic variants associated with lower risk. Cluster 4 (blue) comprises variants that retain WT-like function but are linked to attenuated phenotypes and intermediate risk. Cluster 2 (olive green) includes hypomorphic or partially functional variants that confer intermediate risk. Cluster 5 (magenta) encompasses monomeric variants with high risk as well as others with unclear mechanisms and uncertain risk. T (grey) represents truncating variants associated with high risk due to loss of p53 activity. Cluster 3 (green) consists of DN variants that interfere with WT function and confer high risk.
Methods
Functional datasets and unsupervised cluster analysis
We selected saturation mutagenesis cellular functional screens as features for dimensionality reduction. Giacomelli et al. measured TP53 cellular loss-of-function (LOF; nutlin-3 treatment in TP53NULL human A549 cells), dominant-negative activity (DN; nutlin-3 treatment in isogenic TP53WT A549 cells), and DNA-damage repair (DDR; etoposide treatment of TP53NULL A549 cells) using competitive growth assays14. Kato et al. measured transcriptional activity (TA) in yeast using a transcriptional reporter assay13. We used the median TA values across eight TP53-responsive promoter elements (WAF1, MDM2, BAX, 14-3-3σ, AIP1, GADD45, Noxa, and p53R2), normalized to WT TP53. Matrix correlation plots comparing scaled data (Z-score normalization) were generated with the pairplot function of the Python data visualization library seaborn, called from R through the reticulate package41,42. Dimensionality reduction was performed using PCA (principal component analysis) and UMAP (Uniform Manifold Approximation and Projection). We performed unsupervised clustering using the k-means algorithm with the clusGap() function from the R package cluster, which applies the Gap Statistic method to determine the optimal number of clusters43. For this, we used 50 Monte Carlo samples and tested 1 to 10 clusters with 25 random starts for each. The input data were scaled prior to clustering. PCA clustering results were visualized using the fviz_cluster function from the R package factoextra44. The UMAP results were visualized using the R package umap45.
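The scaling and cluster-number selection steps described above can be illustrated with a minimal Python sketch. It is not the clusGap() implementation used in this study: it assumes a plain Lloyd's k-means, the within-cluster sum of squares as the dispersion measure, and reference datasets drawn uniformly from the bounding box of the scaled data. The function names (zscore, kmeans_wss, gap_statistic) and the toy four-feature dataset are hypothetical and serve only to show the logic.

```python
import numpy as np

def zscore(X):
    """Column-wise Z-score scaling (sample standard deviation)."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def kmeans_wss(X, k, n_starts, rng, n_iter=100):
    """Lloyd's k-means; returns the lowest within-cluster sum of squares
    found over n_starts random initialisations."""
    best = np.inf
    for _ in range(n_starts):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            # Keep the old center if a cluster happens to become empty.
            updated = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(updated, centers):
                break
            centers = updated
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        best = min(best, dist.min(axis=1).sum())
    return best

def gap_statistic(X, k_max=10, n_ref=50, n_starts=25, seed=0):
    """Gap(k) = mean(log W_k of reference data) - log W_k of observed data.
    Defaults mirror the settings quoted in the text (1 to 10 clusters,
    50 Monte Carlo samples, 25 random starts)."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    gaps, ses = [], []
    for k in range(1, k_max + 1):
        log_w = np.log(kmeans_wss(X, k, n_starts, rng))
        ref = np.array([
            np.log(kmeans_wss(rng.uniform(lo, hi, size=X.shape), k, n_starts, rng))
            for _ in range(n_ref)
        ])
        gaps.append(ref.mean() - log_w)
        ses.append(ref.std() * np.sqrt(1 + 1 / n_ref))
    return np.array(gaps), np.array(ses)

# Toy data (hypothetical): three well-separated groups measured on four features.
rng = np.random.default_rng(1)
centers = np.array([[0, 0, 0, 0], [6, 6, 6, 6], [0, 6, 0, 6]], dtype=float)
toy = np.vstack([c + rng.normal(scale=0.5, size=(30, 4)) for c in centers])

# Smaller settings than the defaults keep the demonstration fast.
gaps, ses = gap_statistic(zscore(toy), k_max=5, n_ref=10, n_starts=5)
print(np.round(gaps, 2))
```

In this sketch, a larger gap value indicates that the observed data are more tightly clustered at that k than structureless reference data would be. The rule used to pick the final k from the gap curve is left out here because the text does not state which rule was applied.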
Structure/function analysis
Rendering of the structure of two TP53 core DNA-binding domains bound to DNA (PDB #3EXJ) was done using PyMOL software (version 3.1.0). Structural/functional domain annotations for each variant were obtained from the NCI TP53 Database. Prediction of intrinsically disordered regions (IDRs) was conducted using IUPred321. Post-clustering functional analysis of variant clusters was performed using scaled data (Z-score normalized).
Clinical data curation and selection criteria
The National Cancer Institute’s (NCI) TP53 Database (version R20; https://tp53.isb-cgc.org) is a repository for germline variant carriers that was used as the discovery cohort (n = 1597) to evaluate clinical associations. Carriers of multiple variants were removed from the dataset because the relative contribution of the variants is unknown. In addition, carriers of the R337H variant (n = 291) were removed from the dataset due to a known sampling bias as a founder variant in Brazil. Patients or tumors with no ID were removed to avoid inclusion of duplicates. In addition, only confirmed or obligate germline TP53 variant carriers were included in the analysis. The validation dataset (n = 426 confirmed carriers) is a multi-institutional collection of individuals and families carrying germline TP53 variants from registries at three cancer centers: in Canada (The Hospital for Sick Children)46, the United States (Perelman School of Medicine at the University of Pennsylvania) (Supplementary Data 1), and Germany (Hannover Medical School)47. The cohort from Germany included only carriers of pathogenic or likely pathogenic variants based on the Fortuno classification39. All carriers of protein-truncating variants (frameshift, non-sense, and deletions) were included as a separate group (FS/NS/DEL) in each analysis. The datasets include both male and female participants. No stratified analyses by sex or gender were performed, except for female breast cancer, due to the limited sample sizes within this rare disease cohort, which constrained subgroup analysis. Sex of participants was assigned in the validation cohorts, although it is not reported in the gnomAD database, and it is unknown whether sex was assigned or self-reported in the NCI TP53 Database. All participants provided written informed consent for use of their clinical data and acquisition of skin fibroblasts. For pediatric participants, consent was obtained from a parent or legal guardian. 
No compensation was provided for participation.
Comparisons with non-cancer associated datasets and the Variant Curation Expert Panel (VCEP) annotations
We used the Genome Aggregation Database (gnomAD; https://gnomad.broadinstitute.org/) v3.1.2 non-cancer dataset mapped against the canonical transcript (Ensembl transcript ID ENST00000269305.9) to investigate the prevalence of variants in the general population with no history of cancer. Prevalence estimate of TP53 variants leading to protein changes in the general population was based on gnomAD v4.0 (n = 807,162 individuals), after filtering out 3’UTR, 5’UTR, intronic, and synonymous variants, as well as the WT variant P72R, and accounting for homozygote counts. The allele count was 10,522, corresponding to a prevalence of 1.3%. The Fabulous Ladies Over Seventy (FLOSSIES) Database (https://whi.color.com/) contains germline DNA sequencing from 27 genes (including TP53) collected from 10,000 women over 70 years of age who have never had cancer. We identified 27 TP53 variants in this dataset and mapped them against the TP53 clusters. TP53 variant clusters were also compared to VCEP annotations, which were obtained from the ClinVar database (https://www.ncbi.nlm.nih.gov/clinvar/).
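The prevalence figure quoted above follows directly from the two reported counts, as the short Python check below shows. It assumes, as the homozygote-adjusted count in the text implies, that each counted allele corresponds to one carrier; the variable names are illustrative.

```python
# Counts as reported in the text (gnomAD v4.0, protein-altering TP53 variants,
# excluding P72R and after accounting for homozygotes).
allele_count = 10_522
individuals = 807_162

prevalence = allele_count / individuals   # fraction of individuals carrying a variant
one_in = individuals / allele_count       # the same figure expressed as "1 in N"

print(f"prevalence = {prevalence:.1%}")   # prevalence = 1.3%
print(f"about 1 in {one_in:.0f} individuals")
```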
Construction of TP53 variants
TP53 variants were generated by introducing point mutations in wild-type human TP53 cDNA (inserted in the pcDNA3.1 mammalian expression vector) using site-directed mutagenesis. To introduce the desired variants, PCR amplification of the TP53 gene was conducted using oligonucleotide primers containing the variants of interest (Supplementary Table 1).
Cell culture and transfection
Saos-2 and H1299 cells (American Type Culture Collection #HTB-85 and #CRL-5803, respectively) were cultured in Dulbecco’s Modified Eagle Medium (DMEM; Gibco) supplemented with 15% and 10% (v/v) heat-inactivated fetal bovine serum, respectively. No commonly misidentified cell lines were used in this study. Transfections of pcDNA3.1 constructs were performed using Lipofectamine 3000 reagent following the manufacturer’s instructions (Thermo Fisher Scientific). Skin biopsy samples were collected from patients at The Hospital for Sick Children, dissociated with collagenase for 1.5 h at 37 °C and centrifuged at 370 × g for 10 min. Cell pellets were homogenized with trypsin/EDTA and incubated at 37 °C for 30 min. Cells were centrifuged and washed in phosphate buffered saline (PBS) before incubation with Alpha Minimum Essential Medium (AMEM; Gibco) supplemented with 10% fetal bovine serum. All cells were maintained at 37 °C in a humidified 5% carbon dioxide atmosphere.
MTS colorimetric assay
Dermal fibroblasts were seeded at a density of 2000 cells per well in 96-well tissue culture plates and cultured for 48 h. The protocol was conducted according to the manufacturer’s instructions (ab197010; Abcam). Briefly, two hours after the MTS (3-(4,5-dimethylthiazol-2-yl)-5-(3-carboxymethoxyphenyl)-2-(4-sulfophenyl)-2H-tetrazolium) reagent was added to each well, absorbance at 490 nm was measured using a microplate reader (ELX800; BioTek).
Western blotting
Whole cell lysates were prepared in lysis buffer (25 mM HEPES pH 7.0, 100 mM NaCl, 1% Triton X-100) supplemented with a protease inhibitor cocktail (Roche). Cell lysates were resolved by SDS-polyacrylamide gel electrophoresis (SDS-PAGE; 4–12% acrylamide gradient) and transferred to polyvinylidene fluoride (PVDF) membranes (Amersham). TP53 and β-actin were detected using a mouse anti-human TP53 monoclonal antibody (mAb; 1:1000 dilution; 1C12, Cell Signaling Technology) and a rabbit anti-human β-actin antibody (1:1000 dilution; 13E5, Cell Signaling Technology).
Oligomerization assay
Supernatants from lysed total cell extracts (8 µg) were incubated with 0.025% glutaraldehyde prepared in PBS. Crosslinking reactions were left to proceed for 30 min at 4 °C. Reactions were ended with the addition of 100 mM Tris-HCl pH 7.5. Multimeric TP53 complexes were resolved by SDS-PAGE before transfer to nitrocellulose membrane and immunoblotting.
Clonogenic assay
Saos-2 and H1299 cells were transfected with pcDNA3.1 TP53 variants. After 6 h, the cells were dissociated with trypsin, washed with PBS, and seeded at a density of 2000 cells per well in 6-well plates in DMEM with FBS. The media was replaced after 24 h with media containing 750 µg/mL G418 antibiotic and cells were incubated for 10 days (H1299 cells) or 14 days (Saos-2 cells). Media changes were done every 5 days. Cells were then fixed and stained in 80% methanol, 3.7% formaldehyde, and 0.025% crystal violet, and colonies were counted manually.
3D visualization of protein structures
The crystal structure of a TP53 core tetramer was downloaded from the Protein Data Bank (PDB #3EXJ; https://www.rcsb.org/) and uploaded to the PyMOL Molecular Graphics System version 3.1.0 (Schrödinger, LLC) for structural assessments and renderings. The PyMOL Mutagenesis command was used to analyze amino acid substitutions.
Statistical analysis
All statistical tests were performed using RStudio (version 2022.07.1 + 554; Posit, PBC) or GraphPad Prism (version 9). Linear relationships in matrix correlation plots were assessed using Pearson’s correlation coefficient. Kruskal-Wallis tests were used to determine differences between functional scores, Grantham’s distance, and ages at onset across all groups. Pairwise comparisons were then performed using Mann-Whitney U tests. Chi-squared tests were used to determine significance across groups for cancer type comparisons in variant carriers and differences of proline substitutions in clusters. Subsequently, pairwise comparisons were performed using two-tailed Fisher’s exact tests. All multiple hypothesis corrections for pairwise tests were done using the Benjamini-Hochberg adjustment to obtain the false discovery rate (FDR). Statistical analyses of data from the dermal fibroblast and blood cell-based functional assays were performed using one-tailed unpaired t-tests.
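For readers unfamiliar with the Benjamini-Hochberg adjustment mentioned above, the following Python sketch shows how FDR-adjusted values are obtained from a set of pairwise p-values. It is an illustration only: the analyses in this study were run in RStudio or GraphPad Prism, the function name benjamini_hochberg is hypothetical, and the example p-values are invented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each p-value is multiplied by m / rank (rank 1 = smallest p-value), and a
    running minimum from the largest rank downward enforces monotonicity.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Invented p-values from four hypothetical pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(raw)
print([round(q, 3) for q in fdr])  # [0.02, 0.04, 0.04, 0.02]
```

A comparison is then called significant when its adjusted value falls below the chosen FDR threshold.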
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The data that support the findings of this study are available in The National Cancer Institute’s (NCI) publicly available TP53 Database (version R20; https://tp53.cancer.gov/get_tp53data#get_germ_mut). The Perelman School of Medicine validation cohort dataset is available in Supplementary Data 1. Clinical data from the LFS registry in Germany and in Canada is available in the following publications (https://doi.org/10.1186/s13045-022-01332-1 and https://doi.org/10.1158/2767-9764.CRC-22-0402). TP53 variant functional mutagenesis datasets are available in the following publications: https://doi.org/10.1038/s41588-018-0204-y, https://doi.org/10.1016/j.molcel.2018.06.012, https://doi.org/10.1073/pnas.1431692100. Clinical TP53 variant interpretations (Variant Curation Expert Panel) are available from https://clinicalgenome.org/affiliation/50013/. Population TP53 genomic datasets are available from gnomAD v3.1.2 (https://gnomad.broadinstitute.org) and FLOSSIES Database (https://whi.color.com/). Source data are provided with this paper.
References
Li, F. P. & Fraumeni, J. F. Jr. Rhabdomyosarcoma in children: epidemiologic study and identification of a familial cancer syndrome. J. Natl Cancer Inst. 43, 1365–1373 (1969).
Malkin, D. et al. Germ line p53 mutations in a familial syndrome of breast cancer, sarcomas, and other neoplasms. Science 250, 1233–1238 (1990).
Srivastava, S. et al. Germ-line transmission of a mutated p53 gene in a cancer-prone family with Li-Fraumeni syndrome. Nature 348, 747–749 (1990).
Birch, J. M. et al. Prevalence and diversity of constitutional mutations in the p53 gene among 21 Li-Fraumeni families. Cancer Res 54, 1298–1304 (1994).
Eeles, R. A. Germline mutations in the TP53 gene. Cancer Surv. 25, 101–124 (1995).
Chompret, A. et al. Sensitivity and predictive value of criteria for p53 germline mutation screening. J. Med Genet 38, 43–47 (2001).
Tinat, J. et al. 2009 version of the Chompret criteria for Li Fraumeni syndrome. J. Clin. Oncol. 27, e108–e109 (2009).
Li, F. P. et al. A cancer family syndrome in twenty-four kindreds. Cancer Res 48, 5358–5362 (1988).
Kratz, C. P. et al. Analysis of the Li-Fraumeni Spectrum Based on an International Germline TP53 Variant Data Set: An International Agency for Research on Cancer TP53 Database Analysis. JAMA Oncol. 7, 1800–1805 (2021).
Kennedy, M. C. & Lowe, S. W. Mutant p53: it’s not all one and the same. Cell Death Differ. 29, 983–987 (2022).
Sabapathy, K. & Lane, D. P. Therapeutic targeting of p53: all mutants are equal, but some mutants are more equal than others. Nat. Rev. Clin. Oncol. 15, 13–30 (2018).
Fischer, N. W. et al. Association Between the Oligomeric Status of p53 and Clinical Outcomes in Li-Fraumeni Syndrome. J. Natl Cancer Inst. 110, 1418–1421 (2018).
Kato, S. et al. Understanding the function-structure and function-mutation relationships of p53 tumor suppressor protein by high-resolution missense mutation analysis. Proc. Natl Acad. Sci. USA 100, 8424–8429 (2003).
Giacomelli, A. O. et al. Mutational processes shape the landscape of TP53 mutations in human cancer. Nat. Genet 50, 1381–1387 (2018).
Kotler, E. et al. A Systematic p53 Mutation Library Links Differential Functional Impact to Cancer Mutation Pattern and Evolutionary Conservation. Mol. Cell 71, 873 (2018).
Fischer, M. Census and evaluation of p53 target genes. Oncogene 36, 3943–3956 (2017).
Willis, A. et al. Mutant p53 exerts a dominant negative effect by preventing wild-type p53 from binding to the promoter of its target genes. Oncogene 23, 2330–2338 (2004).
Gencel-Augusto, J. & Lozano, G. p53 tetramerization: at the center of the dominant-negative effect of mutant p53. Genes Dev. 34, 1128–1146 (2020).
Pinto, E. M. et al. Founder effect for the highly prevalent R337H mutation of tumor suppressor p53 in Brazilian patients with adrenocortical tumors. Arq. Bras. Endocrinol. Metab. 48, 647–650 (2004).
DiGiammarino, E. L. et al. A novel mechanism of tumorigenesis involving pH-dependent destabilization of a mutant p53 tetramer. Nat. Struct. Biol. 9, 12–16 (2002).
Erdos, G., Pajkos, M. & Dosztanyi, Z. IUPred3: prediction of protein disorder enhanced with unambiguous experimental annotation and visualization of evolutionary conservation. Nucleic Acids Res 49, W297–W303 (2021).
Laptenko, O. et al. The Tail That Wags the Dog: How the Disordered C-Terminal Domain Controls the Transcriptional Activities of the p53 Tumor-Suppressor Protein. Trends Biochem Sci. 41, 1022–1034 (2016).
Natan, E. et al. Interaction of the p53 DNA-binding domain with its n-terminal extension modulates the stability of the p53 tetramer. J. Mol. Biol. 409, 358–368 (2011).
Kussie, P. H. et al. Structure of the MDM2 oncoprotein bound to the p53 tumor suppressor transactivation domain. Science 274, 948–953 (1996).
Raj, N. & Attardi, L. D. The Transactivation Domains of the p53 Protein. Cold Spring Harb. Perspect. Med. 7 (2017).
Fortuno, C. et al. Specifications of the ACMG/AMP variant interpretation guidelines for germline TP53 variants. Hum. Mutat. 42, 223–236 (2021).
Fischer, N. W., Ma, Y. V. & Gariepy, J. Emerging insights into ethnic-specific TP53 germline variants. J. Natl Cancer Inst. 115, 1145–1156 (2023).
Schneider, K. et al. Li-Fraumeni Syndrome, in GeneReviews®, M. P. Adam et al. Editors. Seattle (WA) (1993).
Kawaguchi, T. et al. The relationship among p53 oligomer formation, structure and transcriptional activity using a comprehensive missense mutation library. Oncogene 24, 6976–6981 (2005).
Butler, J. S. & Loh, S. N. Structure, function, and aggregation of the zinc-free form of the p53 DNA binding domain. Biochemistry 42, 2396–2403 (2003).
Antecol, M. H. et al. Altered biochemical properties of actin in normal skin fibroblasts from individuals predisposed to dominantly inherited cancers. Cancer Res 46, 1867–1873 (1986).
Haggie, J. A. et al. Fibroblasts from relatives of patients with hereditary breast cancer show fetal-like behaviour in vitro. Lancet 1, 1455–1457 (1987).
Fan, M. et al. Altered transcriptome signature of phenotypically normal skin fibroblasts heterozygous for CDKN2A in familial melanoma: relevance to early intervention. Oncotarget 4, 128–141 (2013).
Rubin, H. Saturation density of skin fibroblasts as a quantitative screen for human cancer susceptibility. Cancer Epidemiol. Biomark. Prev. 18, 2366–2372 (2009).
Raad, S. et al. Blood functional assay for rapid clinical interpretation of germline TP53 variants. J. Med Genet 58, 796–805 (2021).
de Andrade, K. C. et al. Higher-than-expected population prevalence of potentially pathogenic germline TP53 variants in individuals unselected for cancer history. Hum. Mutat. 38, 1723–1730 (2017).
Carbonnier, V. et al. Comprehensive assessment of TP53 loss of function using multiple combinatorial mutagenesis libraries. Sci. Rep. 10, 20368 (2020).
Ben-Cohen, G. et al. TP53_PROF: a machine learning model to predict impact of missense mutations in TP53. Brief Bioinform. 23, bbab524 (2022).
Fortuno, C. et al. An updated quantitative model to classify missense variants in the TP53 gene: A novel multifactorial strategy. Hum. Mutat. 42, 1351–1361 (2021).
Terradas, M. et al. TP53, a gene for colorectal cancer predisposition in the absence of Li-Fraumeni-associated phenotypes. Gut 70, 1139–1146 (2021).
Waskom, M. L. seaborn: statistical data visualization. J. Open Source Softw. 6, 3021 (2021).
Ushey, K., Allaire, J. & Tang, Y. reticulate: Interface to ‘Python’. R package version 1.39.0 (https://CRAN.R-project.org/package=reticulate) (2024).
Maechler, M., Rousseeuw, P., Struyf, A., Hubert, M. & Hornik, K. cluster: Cluster Analysis Basics and Extensions. R package version 2.1.6 (https://CRAN.R-project.org/package=cluster) (2023).
Kassambara, A. & Mundt, F. factoextra: Extract and Visualize the Results of Multivariate Data Analyses. R package version 1.0.7 (2020).
Konopka, T. umap: Uniform Manifold Approximation and Projection. R package (https://CRAN.R-project.org/package=umap) (2023).
Subasri, V. et al. Multiple germline events contribute to cancer development in patients with Li-Fraumeni syndrome. Cancer Res. Commun. 3, 738–754 (2023).
Penkert, J. et al. Genotype-phenotype associations within the Li-Fraumeni spectrum: a report from the German Registry. J. Hematol. Oncol. 15, 107 (2022).
Acknowledgements
This work is supported in part by a Terry Fox New Frontiers Program Project grant from the Terry Fox Research Institute (#1084). D.M. holds the CIBC Children’s Foundation Chair in Child Health Research. C.P.K. has been supported by the BMBF ADDRess (01GM2205A) and by the Deutsche Kinderkrebsstiftung (DKS2021.25). During the final editing stages of the manuscript, Prof. Hainaut tragically passed away. He played a critical role in the conceptualization of the project, data analysis and editing of the manuscript.
Author information
Contributions
N.W.F.: conceptualization, data curation, investigation, performed experiments, formal analysis, methodology, data visualization, wrote the manuscript. N.O., P.P., C.G.: assisted with experiments. B.L.: statistical analysis and data visualization. N.A.: data curation. R.K., P.H., E.M.: investigation and data interpretation. K.N.M.: data acquisition and curation. C.P.K.: data acquisition, data curation, writing. D.M.: data interpretation, writing, supervision, funding acquisition.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Svetlana Bajalica-Lagercrantz who co-reviewed with Alexander Sun Zhang; Dung-Fang Lee and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Fischer, N.W., Ong, N., Laverty, B. et al. TP53 variant clusters stratify phenotypic diversity in germline carriers and reveal an osteosarcoma-prone subgroup. Nat Commun 16, 8546 (2025). https://doi.org/10.1038/s41467-025-63528-6
DOI: https://doi.org/10.1038/s41467-025-63528-6