Abstract
Molecular correlates of cancer patient overall survival (OS) can provide new insights. Here, we systematically cataloged pan-cancer, multi-omic correlates of OS transcending tumor lineage across 11,019 patients, involving multiple cancer types while correcting for cancer-type-intrinsic OS differences. Significant fractions of genes with mRNA associated with OS in pan-cancer analyses showed concordant associations at the levels of DNA copy number alteration or methylation. Pan-cancer gene signatures of T-cell and macrophage tumor infiltrates were associated with better and worse OS, respectively. Pathways implicated by molecular OS associations included metabolism, PI3K/Akt, Wnt, and TGF-beta receptor. Significant fractions of worse OS-associated genes were essential for cell growth. A pan-cancer RNA signature of aggressive cancers associated with greater sensitivity in vitro to inhibitors of MEK1/2, glycolysis pathway, and HSP90, and with chemotherapy response in patient breast tumors. With therapeutic implications, pan-cancer molecular associations with patient survival reveal genes and pathways underlying more aggressive diseases.
Introduction
Comprehensive knowledge of the genes underlying human cancers is a critical foundation for improving diagnosis and treatment1. The Cancer Genome Atlas (TCGA) was a large-scale scientific endeavor to comprehensively characterize over 10,000 human cancers at the molecular level across multiple data platforms, including somatic mutation, RNA, and DNA methylation2,3,4,5. For most of the 32 major cancer types profiled by TCGA, the consortium led a systematic molecular study focused on comprehensively characterizing the molecular landscape of the given cancer type as defined by tissue or histology. Each cancer type-specific study often involved defining molecular subtypes of the disease and the top somatically mutated genes. Top mutated genes could involve genes with non-random patterns of small mutation (single-nucleotide variants or insertions-deletions, i.e., indels) or copy number alteration (CNA) that would suggest positive selection for cancer gene drivers1,6. As the same laboratories uniformly generated TCGA data across cancer types, this aspect facilitated later “pan-cancer” studies by TCGA and others, involving cross-cancer type analyses to identify molecular patterns that could cut across or transcend conventional tissue or histology distinctions2,3. TCGA-led pan-cancer studies were largely carried out in two major waves, the first involving some 12 cancer types3,4 and the second involving all cancers that were ever profiled2,7. TCGA pan-cancer studies combined data from multiple cancer types, e.g., to define significant patterns by a particular data type1,6, to define pan-cancer molecular subtypes4,7, to study multiple types falling within a general category (e.g., gynecological cancers)8, or to focus on specific pathways9,10,11.
TCGA collected clinicopathologic annotation data along with multiplatform molecular profiles, allowing for defining molecular correlates of patient outcomes, with some limitations12. For many cancer types, TCGA clinical data involve relatively short-term clinical follow-up intervals, given the worldwide scope and severe time constraints for sample accrual goals determined at the time of TCGA program initiation and funding4,12,13. Using overall survival (OS) as an endpoint has advantages over other outcome measures, as there is minimal ambiguity in defining an OS event, and as TCGA clinical data are more complete for OS12. While most individual studies by cancer type carried out by TCGA did not feature molecular correlates of patient survival, for some cancer types, the length of patient follow-up on average was sufficient for carrying out such analyses14,15,16,17,18. One potential application of molecular signatures of patient survival would be as a prognostic tool in the clinical setting, e.g., to predict the likelihood of disease recurrence based on the data from the resected tumor19. At the same time, the practical utility of prognostic molecular signatures in the clinical setting has been debated20. An alternative application for prognostic signatures is to provide molecular clues about the drivers of aggressive cancers, thereby gaining insights into disease biology. Such an approach was demonstrated in TCGA’s study of clear cell renal cell carcinoma14, whereby aggressive cancers demonstrated evidence of a metabolic shift, suggestive of a “Warburg effect”21, based on pathway analyses of molecular survival correlates from multiple data platforms.
In addition to pan-cancer molecular subtypes or mutated genes, a systematic cataloging of pan-cancer molecular survival correlates could yield insights into the biology of more aggressive disease. While the time of patient follow-up may be fairly short for many individual cancer types in TCGA, we can leverage the very large case numbers in TCGA while correcting for cancer type in statistical models associating each molecular feature with OS. While different cancer types based on histology or tissue of origin may involve differences in patient survival over time4,12, such differences can be corrected for to identify any associations of molecular features with survival that would not be explainable by differences involving tumor type representation alone10,22,23,24. We have utilized this pan-cancer analysis approach to identify molecular survival correlates for gene subsets or gene signatures in the TCGA cohort and other cohorts10,13,22,25,26. Previous pan-cancer molecular studies of patient survival may involve carrying out analyses separately by cancer type, then examining which molecular features may have survival associations common across more than one cancer type27,28. In contrast, here we seek to systematically catalog the genes significantly associated with survival across the entire TCGA cohort, with all tumors evaluated together but correcting for cancer type.
In the present study, we defined pan-cancer, multi-omic correlates of survival transcending tumor lineage across 11,019 patients in the TCGA cohort. At the RNA level, TCGA expression datasets were recently updated to include tens of thousands of additional transcripts beyond protein-coding genes. As gene expression may be impacted by either CNA or DNA methylation, we defined pan-cancer molecular survival correlates for each data level, along with the respective sets of genes with consistent associations between RNA and CNA or between RNA and methylation. We searched the genes with pan-cancer survival associations for associated pathways and integrated our results with those of external cell line and tumor datasets to identify genes essential for cell growth or involved in drug responses. The molecular survival associations explored here would broadly cut across multiple cancer types, though they may not necessarily apply to all cancer types. Rather than deriving molecular signatures to predict patient prognosis in the clinical setting, this study aimed to gain insight into the pathways and processes involving more aggressive diseases.
Results
TCGA pan-cancer datasets
Our study utilized 11,019 patient tumor samples with corresponding OS data, one sample for each patient, representing 32 different major types, for which TCGA generated data on one or more of the following molecular characterization platforms (Supplementary Data 1): RNA sequencing (10,271 patients), reverse-phase protein array (RPPA, 7757), CNA by SNP array (10,737), somatic small mutation by exome sequencing (10,080), and DNA methylation arrays (8818). As expected, OS rates differed substantially by cancer type (Fig. 1a), with survival plots for most cancer types being similar to those of prior independent studies12. These survival differences would entail the need for cancer type correction in any pan-cancer molecular analyses across types, given the widespread global differences by tissue or cell of origin involved4,5.
a Kaplan–Meier plot of TCGA patient overall survival (OS) by cancer type12, indicating the need for cancer type correction in pan-cancer analyses. b Using different statistical cutoffs, the numbers of RNA transcripts significantly associated with patient survival are indicated (by Cox), with or without correction for cancer type. c Based on their tumor transcriptome, TCGA patients are stratified according to a pan-cancer RNA signature of worse survival (FDR < 1%, Cox correcting for cancer type), with a Kaplan–Meier plot showing the differences in survival outcome. d A top set of 9555 RNA transcripts (out of 60,660) were significantly associated with patient survival (FDR < 5%, Cox correcting for cancer type) in pan-cancer analyses across 10,271 cancers. For each individual cancer type, the respective associations with patient survival are indicated for the 9555 RNAs. Also indicated are the numbers of RNAs significantly associated with survival for each cancer type in analyses restricted to the given cancer type. e Pan-cancer survival associations (by Cox correcting for cancer type) were determined for 433 proteins measured by Reverse-Phase Protein Array (RPPA)29. The overlaps of the RPPA-based survival correlates with the RNA survival correlates corresponding to each gene are represented, considering worse versus better survival correlates separately. Genes listed individually are well-established cancer-associated genes by COSMIC57. Enrichment p values by one-sided Fisher’s exact test. f The RNA pan-cancer survival signature from part (d) was applied to eight independent tumor gene expression datasets outside of TCGA (bladder26, breast31, liver32, lung adenocarcinoma33, ovarian34, pediatric brain tumors25, prostate35, renal36) and in each instance was able to stratify patients according to survival outcome.
Without any cancer type correction, most molecular features associated with patient OS across all cancers would likely involve tissue-specific markers rather than true biologically relevant correlates of survival spanning cancer types. To address this issue, multivariate Cox models can incorporate the cancer type in addition to the molecular feature levels, whereby any features statistically significantly associated with survival in pan-cancer analysis would not be explainable by overall cancer type differences alone. The model incorporating cancer type as a covariate would also help to calibrate the relative molecular feature differences intrinsic to cancer type, correcting for these baseline differences when evaluating survival associations by molecular feature. Across the TCGA cohort, the median patient follow-up was 1.95 years. Although TCGA did not set survival analyses as a primary program objective12, resulting in some cancer types having patient follow-up data that would not be optimally mature enough for within-cancer-type survival analyses, pan-cancer analyses could leverage the large patient numbers across types for greater power.
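The effect of cancer type correction can be illustrated with a minimal sketch. The text describes Cox models that include cancer type as a covariate; the sketch below swaps in a stratified Cox model (a separate baseline hazard per cancer type), a related but different adjustment chosen only because it fits in a few lines of standard-library Python. The function name `fit_stratified_cox` and the toy data are hypothetical and are not taken from the study.

```python
import math

def fit_stratified_cox(times, events, x, strata, max_iter=50, tol=1e-9):
    """One-covariate Cox fit by Newton-Raphson, with a separate baseline
    hazard per stratum and Breslow handling of tied event times.
    Returns (beta, standard_error)."""
    groups = {}
    for t, e, xi, s in zip(times, events, x, strata):
        groups.setdefault(s, []).append((t, e, xi))

    beta, info = 0.0, 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for rows in groups.values():
            for t_i, e_i, x_i in rows:
                if not e_i:
                    continue  # censored patients only contribute to risk sets
                s0 = s1 = s2 = 0.0
                # Risk set: same-stratum patients still under observation at t_i
                for t_j, _, x_j in rows:
                    if t_j >= t_i:
                        w = math.exp(beta * x_j)
                        s0 += w
                        s1 += w * x_j
                        s2 += w * x_j * x_j
                mean = s1 / s0
                score += x_i - mean            # first derivative of log partial likelihood
                info += s2 / s0 - mean * mean  # negative second derivative
        if info <= 0.0:
            break
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    # Standard error from the information evaluated just before the final step
    se = 1.0 / math.sqrt(info) if info > 0.0 else math.inf
    return beta, se

# Toy data: a "tissue marker" that is high in type B (long survival) and low
# in type A (short survival), with no net within-type association at beta = 0.
times = [1, 2, 3, 4, 11, 12, 13, 14]
events = [1] * 8
marker = [1, 0, 9, 0, 101, 100, 109, 100]
cancer_type = ["A"] * 4 + ["B"] * 4

beta_pooled, _ = fit_stratified_cox(times, events, marker, ["all"] * 8)
beta_adjusted, _ = fit_stratified_cox(times, events, marker, cancer_type)
```

In this toy setting the pooled fit returns a negative coefficient driven entirely by the difference between cancer types, while the type-adjusted fit returns a coefficient of zero, which mirrors the argument that uncorrected associations largely reflect tissue-specific markers.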
Pan-cancer RNA correlates
Across 10,271 patients, we associated RNA levels in the tumor with OS for each of 60,660 transcripts, 19,962 of which are protein-coding (Supplementary Data 2). Without corrections for cancer type (by Cox analysis), most transcripts were significantly associated with survival (Fig. 1b) due to the widespread molecular differences attributable to cancer type. Incorporating cancer type as a covariate in Cox analyses yielded much smaller numbers of significant survival correlates. At a False Discovery Rate (FDR) of <10%, 12,465 RNA transcripts (including 6660 protein-coding genes) were associated with OS in pan-cancer analyses, 5975 associated with worse survival (higher levels predicting worse survival), and 6490 associated with better survival (higher levels predicting better survival). At a stricter cutoff of FDR < 1%, 5694 RNA transcripts were associated with OS in pan-cancer analyses (Fig. 1b). The selection of statistical criteria is a balance between minimizing false positives and minimizing false negatives. At FDR < 10%, about 90% of significant transcripts are anticipated to represent true positives rather than artifacts of multiple testing. As expected, TCGA patients stratified according to the pan-cancer RNA signature of worse survival at FDR < 1% showed dramatic differences in patient OS (Fig. 1c), with each stratified group involving tumors from all cancer types represented in the cohort (Supplementary Data 1). In a separate training-testing scenario, using half of the TCGA patients as a training set could predict survival in the other half (Supplementary Fig. 1a).
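To make the FDR cutoffs concrete, the following is a minimal sketch of converting per-transcript p values into FDR-adjusted values. The paragraph above does not state which FDR estimator was applied, so the Benjamini-Hochberg step-up procedure used here is an assumption, and the function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value to the smallest so that the adjusted
    # values never increase as the raw p value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy p values for four hypothetical transcripts
raw_p = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(raw_p)
significant = [q < 0.10 for q in fdr]
```

Under this procedure, a transcript called at FDR < 10% belongs to a list in which about one in ten entries is expected to be a false discovery, which corresponds to the roughly 90% true positive expectation stated in the text.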
The gene-level association patterns involving pan-cancer RNA signature of patient survival were broadly reflected in most individual cancer types within the TCGA cohort (Fig. 1d), even where patient follow-up times could vary. Across 7757 patients, we associated protein levels in the tumor with OS for each of a limited set of 433 protein features by RPPA29, of which 116 had significant pan-cancer survival associations (FDR < 10%, Supplementary Data 2). We found highly significant overlapping genes involving the RNA worse survival correlates and the protein worse survival correlates or involving the RNA better survival correlates and the protein better survival correlates (Fig. 1e). Genes with both RNA and protein associated with worse survival included BRD4, EGFR, PDGFRB, and TAZ, and genes with both RNA and protein associated with better survival included PTEN10 and SMAD4. When taking the top RNA survival genes (FDR < 1%) and correlating the expression patterns between all gene pairs across tumors, distinctly correlated groupings of genes could be identified (Supplementary Fig. 1b), suggesting coordinate processes involving different patient subsets. One group of worse prognosis genes involved cell cycle and DNA repair genes, while two other groups of worse prognosis genes involved extracellular matrix and cell adhesion genes (Supplementary Fig. 1c). In contrast, one group of better prognosis genes involved the immune response, another group involved fatty acid oxidation, and another involved neuronal differentiation (Supplementary Fig. 1c). When comparing the top 5721 prognostic RNAs, with p < 0.001, after cancer type correction to results from a separate linear model incorporating tumor purity30 in addition to cancer type (involving 8389 patients), 96% of the 5721 RNAs were significant, with p < 0.05, in the latter model (Supplementary Data 2).
The RNA pan-cancer survival signature represented information as to the risk of poor outcome in individual cancer types, as also demonstrated using gene expression datasets external to TCGA. In each of eight independent tumor gene expression datasets—representing bladder cancer26, breast cancer31, liver cancer32, lung adenocarcinoma33, serous ovarian cancer34, pediatric brain tumors25, prostate cancer35, and clear cell renal cell carcinoma36—we stratified patients according to the TCGA-based RNA signature pattern (distinguishing worse outcome-associated genes from better outcome-associated genes). For each dataset, the TCGA signature was able to stratify patients according to survival outcome (Fig. 1f). For the external breast, ovarian, prostate, and renal expression datasets for which other clinical variables were available, multivariate Cox analyses indicated that the RNA signature could provide additional prognostic information in conjunction with the clinical variables (Supplementary Fig. 1d). This might suggest that the signature, in principle, could help predict patient survival in the clinical setting as part of a multi-gene assay, similar to the established Oncotype DX assays37,38,39, though exploring this possibility here would not be a primary goal of our present study.
Immune system-related correlates
Emerging hallmarks and enabling characteristics of cancer cells include evading attack and elimination by immune cells and tumor-promoting inflammation40. We scored TCGA RNA profiles for each of a set of gene expression signatures representing immune infiltrates41 and determined the association of each signature scoring with patient OS (Supplementary Data 2). Out of 24 different immune signatures evaluated, 11 were associated with patient survival in pan-cancer analyses (FDR < 5%, nominal p < 0.01, Cox correcting for cancer type), of which 10 involved individual canonical gene marker associations consistent with the signature trends (Fig. 2a–d). Immune signatures associated with worse patient survival across all cancers included macrophages and neutrophils, while signatures associated with better patient survival included T cells, B cells, and dendritic cells. In addition to the pan-cancer analysis results, macrophages and neutrophils were significantly associated with worse outcomes for seven and eight individual cancer types, respectively, while T cells were significantly associated with better outcomes for 10 individual cancer types (Fig. 2a). In examining the macrophage and T cell signatures in particular, these could respectively stratify patients into higher and lower risk groups with statistical significance accounting for differences by cancer type (Fig. 2b), though with the absolute risk differences appearing more subtle compared to what might be observed using other genes or signatures.
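Comparing higher and lower risk groups while accounting for cancer type can be done with a stratified log-rank test, which the Fig. 2 legend names for these comparisons. The sketch below is an illustrative standard-library implementation for two groups, not the authors' code; the function name `stratified_logrank` and the toy data are hypothetical.

```python
import math

def stratified_logrank(times, events, group, strata):
    """Two-group log-rank test with observed-minus-expected counts and
    variances summed across strata. `group` holds 0/1 labels.
    Returns (chi_squared, p_value) with 1 degree of freedom."""
    by_stratum = {}
    for t, e, g, s in zip(times, events, group, strata):
        by_stratum.setdefault(s, []).append((t, e, g))

    obs_minus_exp, variance = 0.0, 0.0
    for rows in by_stratum.values():
        for t in sorted({t_j for t_j, e_j, _ in rows if e_j}):
            at_risk = [(t_j, e_j, g_j) for t_j, e_j, g_j in rows if t_j >= t]
            n = len(at_risk)
            n1 = sum(1 for _, _, g_j in at_risk if g_j == 1)
            d = sum(1 for t_j, e_j, _ in at_risk if e_j and t_j == t)
            d1 = sum(1 for t_j, e_j, g_j in at_risk
                     if e_j and t_j == t and g_j == 1)
            obs_minus_exp += d1 - d * n1 / n
            if n > 1:
                variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-squared distribution with 1 degree of freedom
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Toy data: in both cancer types, the signature-high group (1) dies earlier.
times = [1, 2, 3, 4, 10, 20, 30, 40]
events = [1] * 8
high = [1, 1, 0, 0, 1, 1, 0, 0]
ctype = ["A"] * 4 + ["B"] * 4
chi2, p = stratified_logrank(times, events, high, ctype)
```

Because expected event counts are computed within each cancer type, a group that simply contains more patients from a poor-prognosis type does not by itself produce a significant result.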
a TCGA tumor mRNA expression profiles were separately scored for each of 24 gene expression-based signatures of immune cell infiltrates41, with 11 signatures associating with patient survival in pan-cancer analyses (FDR < 5%, nominal p < 0.01, Cox correcting for cancer type). For top immune signatures, heat maps represent the respective associations with patient survival for each individual cancer type. T helper 2 cells are not represented here, as the genes in this signature appeared not specific to T cells41. b Across the TCGA pan-cancer cohort, associations with patient OS of the macrophage mRNA signature (top) and the T cell mRNA signature (bottom). Stratified log-rank p values correct for cancer type. c Diagram of immune checkpoint pathway5 (featuring interactions between T cells and antigen-presenting cells, including tumor cells), with significant pan-cancer mRNA associations with patient survival indicated (Red, higher with worse survival; Blue, higher with better survival; Cox correcting for cancer type). d Similar to part (c), but for immune cell types and their associated markers46 (left) and tumor-associated macrophage roles in the tumor microenvironment44 (right). Genes with CNA associated with survival (p < 0.05, Cox correcting for cancer type) in the same direction as the mRNA association are highlighted using gold rectangles. Genes with CGI associated with survival (p < 0.05, Cox correcting for cancer type) in the opposite direction from the mRNA association are highlighted using purple rectangles. e The macrophage signature and the T cell signature from part (a) were separately applied to eight independent gene expression datasets outside of TCGA (from Fig. 1f). For four datasets, the macrophage signature was associated with worse survival, while for the other four datasets, the T cell signature was associated with better survival.
In addition to RNA signatures, pan-cancer survival associations involving canonical immune cell gene markers similarly reflected potential roles for immune cell infiltrates in more aggressive diseases (Fig. 2c, d). Modulation of immune checkpoint pathways represents a mechanism by which some tumors may avoid elimination by the immune system42. In examining mRNA levels for known genes related to interactions between T cells and antigen-presenting cells (including tumor cells)5, most genes representing the T cell component had higher mRNA levels associated with better outcomes in pan-cancer analysis (Fig. 2c). These genes included PDCD1 (PD1 gene), CTLA4, CD247, CD244, BTLA, CD4, and LAG3. In contrast, genes representing the antigen-presenting cells were not consistently associated with patient survival. However, specific genes encoding cancer testis antigens43, including MAGEA4 and SAGE1, had higher mRNA levels associated with worse survival. Individual gene-level survival associations could lend support to tumor-associated macrophage (TAM) roles in the tumor microenvironment (Fig. 2d)44, including facilitating several processes, such as: epithelial-mesenchymal transition, with mesenchymal marker genes associating with worse survival; tissue remodeling, with matrix metalloproteinase genes associating with worse survival; and hypoxia, with HIF1A and its target genes associating with worse survival.
The RNA-based macrophage and T cell signatures were also associated with patient outcome in gene expression datasets external to TCGA (Fig. 2e). We applied these signatures to each of the eight independent tumor gene expression datasets evaluated above for the TCGA-based RNA survival signature (Fig. 1f). For four datasets—representing breast, bladder, prostate, and ovarian cancers—the macrophage signature was associated with worse survival, while for the other four datasets—representing lung, pediatric brain, liver, and renal cancers—the T cell signature was associated with better survival. These signature associations were consistent with the respective directions of association in the pan-cancer analyses.
Pan-cancer somatic mutation correlates
Across 10,737 patients, we associated gene-level CNA in the tumor with OS for each of 24,776 genes (Supplementary Data 3). Consistent with previous observations25,45,46, the global extent of CNA in the tumor sample was associated with worse patient OS (Fig. 3a), with dramatic differences in relative risk between groups stratified by global CNA, each of which involved tumors from all cancer types (Supplementary Data 1). As with the RNA analyses, incorporating cancer type as a covariate in pan-cancer gene-level CNA analyses yielded much smaller numbers of significant survival correlates (Fig. 3b). At FDR < 10%, 5436 genes had CNAs associated with OS in pan-cancer analyses, of which 1253 had a corresponding RNA patient survival association (FDR < 10%) in the same direction as the CNA association. Of the 1253 genes, 536 had worse survival associations (both higher copy numbers by CNA and higher RNA predicting worse patient outcomes), while 717 genes had better survival associations (Supplementary Data 3). The gene-level CNA association patterns involving the pan-cancer CNA signature of patient survival were broadly reflected in several individual cancer types within the TCGA cohort (Fig. 3c), though no genes had survival associations in most cancer types.
a Across the TCGA pan-cancer cohort, the global extent of DNA copy number alteration (CNA) is associated with worse OS. The CNA index is the standard deviation of the gene-level CNA values across genes in the tumor profile. Stratified log-rank p value corrects for cancer type. b The numbers of genes significantly associated with patient survival are indicated (by Cox), with or without correction for cancer type, and with cancer type correction and with RNA patient survival association (FDR < 10%) in the same direction. c A top set of 1253 genes (out of 24,776) had both RNA and CNA significantly associated with patient survival in the same direction (FDR < 10% for each, Cox correcting for cancer type) in pan-cancer analyses across 10,843 cancers. For each individual cancer type, the respective gene-level CNA associations with patient survival are indicated for the 1253 genes. d Enriched cytoband regions involving the gene sets from part (c) (worse versus better survival gene sets, with both RNA and CNA association with survival in the same direction). These top regions are enriched with FDR < 10%47 by one-sided Fisher’s exact test and with more than one-third of the genes in the cytoband region being included in the pan-cancer survival-associated genes. e The entire set of genes in the 7p15.2 region was applied as a gene signature to eight independent gene expression datasets outside of TCGA (from Fig. 1f). For four datasets, the 7p15.2 signature was associated with worse survival. f Similar to part (e) but applying a signature of all genes in the 10q25.2 region, where four out of eight datasets showed an association with better survival.
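The CNA index defined in the legend above (the standard deviation of gene-level CNA values across genes in a tumor profile) is simple to compute. The sketch below is illustrative; whether a population or sample standard deviation was used is not stated, so the population form is an assumption, and the function name and toy profiles are hypothetical.

```python
import statistics

def cna_index(gene_level_cna):
    """Global extent of copy number alteration for one tumor, taken as the
    standard deviation of its gene-level CNA values.
    Assumption: population standard deviation."""
    return statistics.pstdev(gene_level_cna)

# Toy gene-level CNA profiles (e.g., log-ratio values) for two tumors
quiet_tumor = [0.0, 0.0, 0.0, 0.0]
altered_tumor = [1.0, -1.0, 1.0, -1.0]

quiet_index = cna_index(quiet_tumor)
altered_index = cna_index(altered_tumor)
```

A copy-number-quiet genome has an index near zero, while widespread gains and losses increase it, so tumors can be ranked or grouped by this single value.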
The top 1253 genes with concordant CNA and RNA associations with patient survival in pan-cancer analysis were significantly enriched for a set of 27 cytoband regions (Fig. 3d, FDR < 10%47, one-sided Fisher’s exact test, and involving more than one-third of the genes in the cytoband region). Of the 27 cytoband regions, 13 involved genes associated with worse survival, including regions within 1q31, 7p, and 8q. The 13 regions included genes such as EGFR, EIF3E, RAC148, and genes in the homeobox (HOX) cluster at 7p15.249. The 14 regions involving genes associated with better survival included regions within 4p15, 8p, 9p, and 10q, as well as genes like TCF7L250 and NDUFB651. CNA patterns involving the 27 cytoband regions spanned most cancer types surveyed (Fig. 3d). Of the 27 cytoband regions, we examined two of interest in each of the eight independent tumor gene expression datasets evaluated above for the TCGA-based RNA survival signature (Fig. 1f). We examined the entire set of 7p15.2 genes as a group, which included the HOX cluster genes, and the entire set of 10q25.2 genes, including TCF7L2. We used all genes in the respective cytoband regions, regardless of their associations with survival in the TCGA pan-cancer cohort, to derive signatures of average gene expression, taking the average of the normalized expression values for the signature score. For four of the eight datasets—representing ovarian, pediatric brain, prostate, and lung cancers—the 7p15.2 signature was associated with worse survival (Fig. 3e). For four datasets—representing bladder, breast, prostate, and renal cancers—the 10q25.2 signature was associated with better survival (Fig. 3f).
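The cytoband signature score described above, the average of normalized expression values across all genes in the region, can be sketched as follows. The normalization method is not specified in this passage, so per-gene z-scoring across samples is an assumption; the function name `signature_scores` and the toy values are hypothetical.

```python
import statistics

def signature_scores(expression, genes):
    """Average of per-gene normalized expression for each sample.
    `expression` maps gene -> list of values, one per sample.
    Assumption: normalization is a z-score across samples."""
    n_samples = len(next(iter(expression.values())))
    totals = [0.0] * n_samples
    used = 0
    for gene in genes:
        values = expression.get(gene)
        if values is None:
            continue  # gene not measured in this dataset
        mu = statistics.mean(values)
        sd = statistics.pstdev(values)
        if sd == 0:
            continue  # a constant gene carries no information
        for k, v in enumerate(values):
            totals[k] += (v - mu) / sd
        used += 1
    if used == 0:
        raise ValueError("no usable signature genes in this dataset")
    return [total / used for total in totals]

# Toy dataset with three samples; HOXA9 and HOXA10 stand in for region genes
expression = {
    "HOXA9": [1.0, 2.0, 3.0],
    "HOXA10": [10.0, 20.0, 30.0],
    "OTHER": [5.0, 5.0, 5.0],
}
scores = signature_scores(expression, ["HOXA9", "HOXA10"])
```

Normalizing each gene before averaging keeps highly expressed genes from dominating the score, and patients can then be split on the score (for example, at the median) for survival comparison.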
We also examined associations between small somatic mutation events (e.g., missense mutations or indels) and patient survival. We focused here on a core set of 196 genes involving key oncogenic or tumor-suppressive pathways as assembled previously5. At FDR < 10% (Cox correcting for cancer type), only one gene, IDH1, was significantly associated with patient survival in pan-cancer analysis, with small mutations predicting better survival (p < 1E-7). Using a relaxed cutoff of nominal p < 0.01 (FDR < 21%), eight genes were associated with survival, with STK11 mutations associated with worse patient survival10 and the other genes associated with better survival (Supplementary Data 3).
Pan-cancer DNA methylation correlates
Across 8818 patients, we associated DNA methylation levels in the tumor with OS for each of 114,617 methylation array probes involving CpG Islands (CGIs, involved in gene silencing52) and 17,723 array probes located near one of 23,146 gene enhancers25. At a cutoff of FDR < 10% and p < 0.001 (Cox correcting for cancer type and patient sex), 6415 CGIs and 2397 enhancers were associated with OS in pan-cancer analyses (Fig. 4a and Supplementary Data 4). The association patterns involving these top methylation probes were broadly reflected in most individual cancer types within the TCGA cohort (Fig. 4a), particularly for the probes associated with better survival (i.e., higher methylation levels associated with better patient outcomes). CGI probes associated with patient survival, particularly those associated with worse survival, were strongly enriched for gene body CGIs and anti-enriched for promoter-associated CGIs (Fig. 4b, p < 1E-45 for each, chi-squared test).
a A top set of 6415 CpG Island (CGI) array probes and a top set of 2397 methylation probes near enhancers were significantly associated with patient survival (both FDR < 10% and nominal p < 0.001, Cox correcting for cancer type) in respective pan-cancer analyses across 8905 cancers. For these probes, heat maps represent the respective associations with patient survival for each individual cancer type. b Top: Fraction of promoter-associated CGIs for the CGI probes, respectively, associated with worse or better patient survival in pan-cancer analyses (from part a). Bottom: Breakdown by probe position relative to the gene for CGI probes associated with worse or better survival, respectively. TSS, transcription start site; UTR, untranslated region. c The CGI survival signature from part (a) was applied to four independent DNA methylation array datasets outside of TCGA (pediatric brain tumors25, ependymoma53, glioblastoma54, and breast55) and was able to stratify patients according to survival outcome. d Overlap between CGI probes with pan-cancer methylation survival association (from part a) and the corresponding genes with pan-cancer mRNA survival association (FDR < 10%). COSMIC57 genes with both CGI methylation associated with better survival and mRNA associated with worse survival are listed by name, with the CGI probe numbers involved. For b and d, enrichment p values by chi-squared test. e For top enhancers with methylation probes associated with pan-cancer survival (from part a), the genes within 500 kb of the enhancer were examined for mRNA survival association (FDR < 10%). Chi-squared tests compare the number of genes with mRNA survival association with the chance expected. f For genes EZH2 (top) and KRAS (bottom), respective associations of mRNA and methylation of nearby enhancer (“ENH”) with patient survival across the TCGA cohort. P values by stratified log-rank test correcting for cancer type.
As demonstrated above for the RNA pan-cancer survival signature (Fig. 1f), the methylation CGI signature carried information about the risk of poor outcome in individual cancer types across methylation datasets external to TCGA (Fig. 4c). In each of four independent tumor DNA methylation datasets (representing pediatric brain tumors25, ependymoma53, glioblastoma54, and breast cancer55), we stratified patients according to the TCGA-based CGI signature pattern (distinguishing worse outcome-associated CGIs from better outcome-associated CGIs). For each dataset, the TCGA signature could stratify patients according to survival outcome (Fig. 4c). Furthermore, the genes with CGIs associated with better survival in pan-cancer analysis significantly overlapped with the genes with RNA expression associated with worse survival (Fig. 4d), suggestive of loss of silencing involving increased gene expression56. Of the 3640 CGI probes having better survival association (FDR < 10%, p < 0.001), 1087 involved genes with mRNAs having worse survival association (FDR < 10%), representing 510 unique genes (enrichment p < 1E-30, chi-squared test). These genes included many with well-established cancer associations (e.g., by COSMIC57), including CDK6, DNMT3A, HMGA1, HMGA2, MAP2K1, MET, MYC, PIM1, and SMAD3.
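The overlap between survival-associated CGI probes and survival-associated mRNAs is assessed with a chi-squared test. A minimal sketch for a 2x2 table follows; it uses no continuity correction (an assumption, since the text does not say), and the function name and counts are hypothetical rather than the study's actual tables.

```python
import math

def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test for a 2x2 table [[a, b], [c, d]] without
    continuity correction. Returns (chi_squared, p_value), 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs a nonzero total")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Upper tail of a chi-squared distribution with 1 degree of freedom
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Hypothetical toy counts of CGI probes:
# rows: probe associated with better survival (yes / no)
# columns: gene's mRNA associated with worse survival (yes / no)
chi2, p = chi_squared_2x2(20, 10, 10, 20)
```

The test asks whether probes with a better-survival methylation association fall on genes with a worse-survival mRNA association more often than expected if the two properties were independent.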
We also looked beyond CGIs to examine DNA methylation survival associations involving gene enhancers, as DNA methylation at distal enhancer regions has been implicated in gene regulation58,59,60,61. For each of 23,146 enhancers62, we mapped the nearest DNA methylation probe and determined its methylation association with survival. Of the 2397 enhancers with methylation probes having a pan-cancer survival association (Fig. 4a), 2971 were associated with better survival. Significant fractions of the pan-cancer enhancer methylation survival associations also involved corresponding RNA survival associations for genes within 500 kb of the enhancer (Fig. 4e and Supplementary Data 4). Of the RNA transcripts (out of 60,660 profiled) with a nearby enhancer (within 500 kb) having methylation associated with better survival, 3419 and 3733 had worse and better survival associations (FDR < 10%), respectively, representing significant gene set overlaps (p < 1E-9 and p < 1E-11, respectively, chi-squared tests). EZH2 and KRAS are two example genes with associated enhancers showing better pan-cancer survival association at the methylation level, in conjunction with worse survival associations involving the RNA levels of the respective genes (Fig. 4f).
Associated pathways
The information contained within the pan-cancer gene signatures of patient OS can represent altered pathways or functional gene categories. In terms of represented pathways, we examined the wikiPathway63 gene sets, for which genes were assigned to pathways by manual curation. We carried out pathway enrichment analysis for each of the various gene sets arising from our TCGA pan-cancer analyses involving mRNAs being associated with patient survival. Specifically, we considered the respective sets of genes having RNA associations with worse or better survival (FDR < 10%), along with the subsets of these genes involving either concordant CNA associations (from Fig. 3c) or concordant CGI methylation associations (from Fig. 4d). Concordant CNA associations entailed the same direction of association as the corresponding RNA, and concordant CGI methylation associations entailed the opposite direction of association (consistent with gene silencing relationships). In all, we examined six TCGA-related gene sets (worse and better survival sets for each of RNA, RNA/CNA, and RNA/CGI methylation) for enrichment of wikiPathways (Fig. 5a). Out of 906 wikiPathways, 128 were significantly associated (p < 0.001, one-sided Fisher’s exact test) with at least one of the worse survival-associated TCGA gene sets, while 49 were significantly associated (p < 0.001) with at least one of the better survival-associated gene sets (Supplementary Data 5).
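As an illustration of the enrichment test named above, the following minimal sketch computes a one-sided Fisher’s exact p value as the upper tail of the hypergeometric distribution. The function name, argument names, and toy numbers are hypothetical and are not taken from the study; the choice of gene universe (e.g., all profiled genes) is left to the caller and is an assumption of the sketch.

```python
from math import comb


def enrichment_p(n_universe, n_pathway, n_signature, n_overlap):
    """One-sided Fisher's exact p value (hypergeometric upper tail).

    Probability of observing at least `n_overlap` pathway genes in a
    signature of `n_signature` genes drawn from `n_universe` genes,
    of which `n_pathway` belong to the pathway.
    """
    max_overlap = min(n_pathway, n_signature)
    total = comb(n_universe, n_signature)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_signature - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Toy example (hypothetical numbers): 20 genes in the universe, 5 in the
# pathway, 5 in the survival signature, 3 shared between the two.
p_value = enrichment_p(20, 5, 5, 3)
```

Exact integer arithmetic is used for the binomial coefficients, so the only rounding occurs in the final division.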
a Significance of enrichment (by one-sided Fisher’s exact test) for selected wikiPathway63 gene sets with the respective sets of genes with mRNAs being associated with patient survival in pan-cancer analyses (RNA worse/better, FDR < 10%; RNA/CNA, from Fig. 3c; RNA/CGI meth., or methylation, from Fig. 4d). The pathways represented were significant (FDR < 10%47) for at least one patient survival-associated gene set. b Pathway diagram representing core metabolic pathways5,14,63,64,65, with significant pan-cancer mRNA associations with patient survival indicated (Red, higher with worse survival; Blue, higher with better survival; Cox correcting for cancer type). Genes with CNA associated with survival (p < 0.05, Cox correcting for cancer type) in the same direction as the mRNA association are highlighted using gold rectangles. Genes with CGI associated with survival (p < 0.05, Cox correcting for cancer type) in the opposite direction from the mRNA association are highlighted using purple rectangles. c Similar to part (b), but for the PI3K/Akt pathway10. d Similar to part (b), but for the Wnt signaling pathway63. e Similar to part (b), but for the TGF-beta receptor signaling pathway63.
Selected enriched wikiPathways of interest involved pathways spanning metabolism, PI3K/Akt signaling, Wnt signaling, and TGF-beta receptor signaling (Fig. 5a). Core metabolic pathways5,14,63,64,65, reflected in the genes having pan-cancer expression associations with survival, included pathways related to glycolysis being associated with worse survival (with higher expression of the genes in these pathways being predictive of worse patient outcomes) and pathways related to oxidative phosphorylation and the electron transport chain being associated with better survival (Fig. 5a, b). The differential survival associations between glycolysis and oxidative phosphorylation are suggestive of a Warburg effect underlying more aggressive cancers, whereby tumors from higher-risk patients would tend to express higher levels of glycolysis pathway genes and lower levels of oxidative phosphorylation and electron transport chain genes14,21,66,67. In addition, the glycolysis pathway genes included several for which both higher RNA and lower CGI methylation were associated with worse outcomes, including HK1, HK2, PFKP, PKM, and LDHA (Fig. 5b). In contrast, electron transport chain genes included several for which both lower RNA and decreased copy numbers were associated with worse outcomes (Fig. 5b), including complex I genes (NDUFA7, NDUFA11, NDUFB1, NDUFB6, NDUFB8, NDUFC1, NDUFS4, NDUFS7) and complex III genes (UQCR11 and UQCRC2).
Additional pathways of interest enriched for genes with survival associations included the PI3K/Akt pathway10 (Fig. 5c), with pathway promoter genes (including AKT2, RHEB, and MTOR) having RNA expression associated with worse outcomes, and with pathway suppressor genes (including PIK3R1, PTEN, STK11, and DEPDC5) having RNA expression associated with better outcomes. Concordant CNA survival associations for PI3K/Akt pathway genes involved AKT2, PIK3R1, PTEN, and DEPDC5. Wnt signaling pathway63 genes with RNA associated with worse outcomes included WNT5B, WNT6, WNT7A, WNT11, FZD2, FZD6, FZD7, DVL1, and DVL3, with DVL1 also involving an association of lower CGI methylation with worse outcomes (Fig. 5d). TGF-beta receptor signaling pathway63 genes with RNA associated with worse survival included TGFB1, INHBA, TGFBR1, and SMAD3, with TGFB1 and INHBA involving concordant CNA associations and with SMAD3 involving concordant CGI methylation associations (Fig. 5e).
Essential genes
Significant fractions of mRNAs associated with worse patient survival in pan-cancer analysis were essential for cell growth in vitro (Fig. 6a). The Cancer Dependency Map (DepMap) project globally assessed gene essentiality in CRISPR knockout screens across 1070 cell lines68,69. We examined each of the various gene sets arising from our TCGA pan-cancer analyses involving mRNAs being associated with patient survival (Figs. 2b, 3c, 4d and 5a) for representation of genes found essential (DepMap gene effect score < −0.7569) for at least 10% of cell lines. Out of 3935 unique RNAs associated with worse patient survival in pan-cancer analyses and represented in other databases, 540 (~14%) were essential, i.e., knocking out the gene inhibited cell growth in substantial numbers of cell lines, representing a highly significant overlap (p < 1E-60, one-sided Fisher’s exact test). RNAs associated with worse patient survival were also highly enriched for cancer-associated genes by COSMIC57, somatic driver genes2, genes with associated predisposing germline mutations2, genes core to the DNA Damage Response9, genes involved in microsatellite instability (MSI)2, and genes with potential for drug targeting by DrugBank70 (Fig. 6a). The DepMap patterns involving the 540 essential genes having worse survival association spanned cell lines from all tissue types, with a high fraction of these genes being essential for >50% of cell lines (Fig. 6b).
a Significance of enrichment (by one-sided Fisher’s exact test) for externally annotated gene sets of interest with the respective sets of genes with mRNAs being associated with patient survival in pan-cancer analyses (RNA worse/better, FDR < 10%; RNA/CNA, from Fig. 3c; RNA/CGI meth., or methylation, from Fig. 4d). The annotated gene sets include genes essential in >10% of cell lines by Cancer Dependency Map (DepMap)69, cancer-associated genes by COSMIC57, somatic driver genes2, genes with associated predisposing germline mutations2, genes core to the DNA Damage Response (CoreDDR)9, genes involved in microsatellite instability (MSI)2, and genes with potential for drug targeting by DrugBank70. b For 540 genes with both RNA worse survival association and being essential in >10% of cell lines by DepMap, DepMap gene effect scores are represented, with a high fraction of these genes being essential for >50% of cell lines. c Selected significantly enriched GO terms72 for the respective sets of RNA pan-cancer worse survival genes (FDR < 10%) that were either essential in more than 50% of cell lines by DepMap or essential in >10% and <50% of cell lines. For each GO term, enrichment p values and numbers of genes in the essential gene set are indicated. Enrichment p values by one-sided Fisher’s exact test. d In total, 136 genes had a worse RNA survival association (FDR < 10%), were essential in >10% of cell lines by DepMap, and had a DrugBank entry. Represented here for these genes are the patient survival associations in TCGA pan-cancer and eight independent tumor gene expression datasets outside of TCGA (from Fig. 1f). e For selected genes from part (d)—PGK1, PKMYT1, PLK1, and TTK—associations of mRNA with patient survival across the TCGA cohort. P values by stratified log-rank test correcting for cancer type.
We examined the 540 essential genes for functional gene categories by Gene Ontology (GO)71. For GO term enrichment analysis72, we examined separately the 397 genes essential for most DepMap cell lines and the other 143 genes (Fig. 6c and Supplementary Data 6). Genes with worse survival association and essential in over half of the cell lines were highly enriched (one-sided Fisher’s exact test) for GO terms such as “cell division,” “DNA repair,” “chromosome organization,” “proteasome complex,” and “RNA metabolic process.” Genes essential in less than half but more than 10% of cell lines were significantly enriched for GO terms such as “cell junction,” “focal adhesion,” and “nucleotide biosynthetic process” (Fig. 6c). In total, 136 genes had a worse RNA survival association, were essential by DepMap, and had a DrugBank entry. Most of these genes had associations of higher expression with worse survival across several independent tumor gene expression datasets outside of TCGA (Fig. 6d), and they included genes encoding kinases, a protein class with general potential as druggable targets73,74. Kinase genes from the 136 included PGK1, PKMYT1, PLK1, and TTK (Fig. 6e and Supplementary Data 6), with PGK1 involving glycolysis pathways (Fig. 5a).
Drug response associations
Integrating molecular data on cancer cell lines with their responses to anticancer drugs can identify therapeutic options for cancer subsets75,76,77. Here, we were interested in drug responses in cell lines manifesting the pan-cancer RNA signature of survival (FDR < 10%). We therefore scored transcriptional profiles of 959 cancer cell lines represented in the Genomics of Drug Sensitivity in Cancer (GDSC)75 dataset using the TCGA RNA signature (Fig. 7a, distinguishing worse outcome-associated genes from better outcome-associated genes). For each of 518 drug compounds with half maximal inhibitory concentration (IC50) measurements in these cell lines, we correlated IC50 values with the RNA signature scoring across cell lines (Supplementary Data 6). We found widespread associations of the RNA signature of worse survival with responses to specific drugs, well exceeding chance expectation (Fig. 7b), with 290 drugs showing resistance in cell lines manifesting the worse survival signature (p < 0.001, Pearson’s correlation) and 37 drugs showing sensitivity in cell lines manifesting the signature.
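The per-drug correlation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study’s code: the function names and the input layout (one list of natural log IC50 values per drug, aligned to the cell line order, with None for unmeasured cell lines) are hypothetical, and the significance test on each correlation is omitted. A negative coefficient corresponds to lower IC50, i.e., greater sensitivity, in cell lines with higher signature scores.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either variable is constant
    return sxy / sqrt(sxx * syy)


def signature_ic50_correlations(signature_scores, ln_ic50_by_drug):
    """Correlate signature scores with ln(IC50) for each drug.

    Cell lines without an IC50 measurement (None) are dropped per drug;
    drugs with fewer than three measured cell lines are skipped.
    """
    results = {}
    for drug, values in ln_ic50_by_drug.items():
        pairs = [(s, v) for s, v in zip(signature_scores, values) if v is not None]
        if len(pairs) < 3:
            continue
        scores, ic50s = zip(*pairs)
        results[drug] = pearson_r(scores, ic50s)
    return results


# Toy example with hypothetical values for four cell lines.
scores = [1.0, 2.0, 3.0, 4.0]
toy_ic50 = {
    "drugA": [4.0, 3.0, 2.0, 1.0],    # more sensitive as score increases
    "drugB": [0.5, None, 1.5, 2.0],   # more resistant as score increases
    "drugC": [1.0, None, None, 2.0],  # too few measurements, skipped
}
correlations = signature_ic50_correlations(scores, toy_ic50)
```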
a Transcriptional profiles of 959 cancer cell lines represented in the Genomics of Drug Sensitivity in Cancer (GDSC)75 dataset were scored by the pan-cancer RNA signature of patient survival (FDR < 10%). Expression patterns in cell lines for this RNA signature are shown (involving the genes represented in GDSC), with the cell lines ordered by tertiles of the signature score. b From the GDSC cell lines sorted according to the RNA signature (same ordering as for part a), drug compound treatments with increases or decreases in half maximal inhibitory concentration (IC50) associated with the RNA signature scoring (p < 0.001, Pearson’s correlation) are represented along with their natural log-transformed IC50 values centered across cell lines. Listed by name are selected drug compounds negatively correlated between RNA signature and IC50 (where cell lines with higher signature score tend to be more sensitive to the drug), involving inhibitors of MEK1/2, PKM (glycolysis), and HSP90. c For genes MAP2K1, PKM, and HSP90AA1, associations of mRNA with patient survival across the TCGA cohort. P values by stratified log-rank test correcting for cancer type. d Across cell lines, association of RNA survival signature with increased sensitivity to selected inhibitors of MEK1/2, PKM (glycolysis), and HSP90. Correlations by Pearson’s. e Across cell lines, positive association of MAP2K1 gene essentiality score by DepMap69 (lower score denoting more essential for the cell line) with sensitivity to MEK inhibitor (left), positive association between PKM mRNA expression and sensitivity to PKM inhibitor (middle), and positive association between HSP90AA1 mRNA expression and sensitivity to HSP90 inhibitor (right). Correlations by Pearson’s. Of the 37 drug compounds negatively correlated between RNA signature and IC50 from part (b), all but one remained significant (one-sided p < 0.05) when factoring in cell line cancer type (by linear model).
The 37 drug compounds negatively correlated between RNA signature and IC50 included inhibitors of MEK1/2, PKM (core to the glycolysis pathway), and HSP90. For each of the genes encoding the targets of these three compounds (specifically, MAP2K1, PKM, and HSP90AA1), higher expression was associated with worse patient survival in pan-cancer analyses (Fig. 7c). As the global associations suggested, cell lines with higher signature scores, indicative of more aggressive cancers, tended to be more sensitive to the three drugs (Fig. 7d, p < 0.0001 for each, Pearson’s). Additionally, cell lines sensitive to MEK1/2 inhibition also tended to hold MAP2K1 as an essential gene for cell growth (Fig. 7e, p = 2E-5, Pearson’s), while cell lines sensitive to PKM or HSP90 inhibition tended to show high expression of PKM or HSP90AA1, respectively (Fig. 7e, p ≤ 0.02). The associations involving a PKM inhibitor, in particular, provide another link between more aggressive cancers and higher expression of glycolysis pathway genes (Fig. 5b).
The 37 drug compounds negatively correlated between RNA signature and IC50 (Fig. 7b) included chemotherapeutics and drugs targeting cell cycle, DNA replication, genome integrity, or mitosis (Fig. 8a). We could extend these cell line-based findings to gene expression profiles of breast tumor biopsies from patients treated with neoadjuvant chemotherapy, with responses recorded at the end of treatment78,79,80,81,82,83,84, involving 1240 patients and spanning eight different profile datasets85 (Fig. 8b). When scoring the 1240 breast tumors for the pan-cancer RNA signature of survival (FDR < 10%), patients with pathologic chemotherapy response scored higher for the signature than patients who were non-responders (Fig. 8c), consistent with the survival signature being predictive of sensitivity to chemotherapeutic drugs in cell lines. Basal-like breast cancers scored highly on average for the survival signature, where the basal-like subtype tends to be more responsive to chemotherapy86. As noted above, genes associated with worse survival in pan-cancer analyses were highly enriched for DNA Damage Response and MSI genes (Fig. 6a), along with genes related to cell cycle and DNA mismatch repair pathways (Fig. 5a). Across the 1240 breast tumor expression profiles, DNA Damage Response and MSI genes as a group were higher in responders versus non-responders (Fig. 8d), consistent with the scenario of cancers harboring extensive DNA alterations with accompanying DNA damage being the most sensitive to chemotherapy, while at the same time representing the more aggressive cancers. RNA pan-cancer worse survival genes with higher expression robustly associated with chemotherapy responses across patient breast tumors included DDR/MSI genes FANCI, FEN1, MSH2, MSH6, PCNA, POLE, REV1, RFC3, RFC4, RFC5, RPA3, and TOPBP1 (Fig. 8d).
a With GDSC cell lines sorted according to the RNA signature (same ordering as for Fig. 7b), drug compound treatments with decreases in half maximal inhibitory concentration (IC50) associated with the RNA signature scoring (p < 0.001, Pearson’s correlation), for drugs targeting cell cycle, DNA replication, genome integrity, or mitosis. b We assembled a compendium of eight separate gene expression profiling datasets of pre-treatment breast tumor biopsies from patients treated with neoadjuvant chemotherapy, with patient response recorded at the end of treatment78,79,80,81,82,83,84, representing 1240 tumor expression profiles85. c We scored each of the 1240 breast tumors by the pan-cancer RNA signature of patient survival (FDR < 10%) and observed higher signature scores for tumors from patients with pathologic chemotherapy response versus tumors from patients who were non-responders. P value by t-test. d Across the 1240 breast tumor expression profiles (sorted by chemotherapy responders versus non-responders), expression heat maps for all 81 CoreDDR or MSI genes represented in the dataset (top panel, from Fig. 6a) and for the subset of DDR/MSI genes (bottom panel) that were both in the pan-cancer RNA survival signature and positively correlated with pathologic complete response (path CR; p < 0.05 by linear modeling correcting for PAM50 subtype99).
Discussion
While numerous studies have defined molecular correlates of patient survival for individual cancer types, the pan-cancer molecular correlates of our present study provide a framework for examining pathways or processes underlying more aggressive cancers that would span different types as defined by histology or tissue of origin. Patient survival associations involving individual genes are ubiquitously presented in published studies as evidence for their having functional roles. For example, higher expression of oncogenes is often associated with worse patient outcomes, while higher expression of tumor suppressor genes is often associated with better outcomes. Our study’s results involve large fractions of protein-coding genes (as many as a third of all genes, depending on the statistical cutoffs) having pan-cancer survival associations. With so many genes involved, one might wonder about the biological significance of these survival associations. At the same time, the direction of the association, worse versus better outcomes, has relevance in its interpretation. Our present study also leveraged data from other platforms, e.g., identifying gene-level concordant associations involving RNA and either CNA or DNA methylation. Concordant CNA and RNA survival associations would suggest selective somatic mutations relevant to aggressive disease. Furthermore, we leveraged external information for the top genes to show how they implicate pathways or would be essential in cell lines or associate with drug responses. RNA levels do not always correspond to protein levels22,64 and would not necessarily capture signal transduction events10. Still, the global RNA associations can reflect some of the transcriptional programs that more aggressive cancers may initiate, even if other factors may influence the extent to which these programs are carried out.
Our present study aimed to gain insight regarding more aggressive diseases rather than to develop a pan-cancer gene prognosticator for use in the clinical setting. Commercial gene panel prognostic assays have been developed and approved for clinical use in individual cancer types, such as the Oncotype DX assays for breast37, colon38, and prostate39 cancers. However, a similar type of pan-cancer assay would not seem practical. Clinical variables such as grade or stage are already highly prognostic64. Where grading and staging criteria differ widely by cancer type, building multivariate models incorporating clinical variables in addition to molecular features, to determine which molecular features might add information to the clinical variables, would not be informative. Also, we see most of our pan-cancer molecular survival correlates being significant for some, but not all, individual cancer types surveyed. A lack of association for a given cancer type may involve power issues, e.g., due to immature patient follow-up data12. Our pan-cancer survival associations would be broad and involve cancers from multiple types, but would not necessarily be pertinent to all cancers. For example, associations involving immune signatures or CNA regions appeared to involve only a subset of cancer types. Ultimately, our pan-cancer survival associations should provide clues as to genes and pathways that would be therapeutically predictive versus those that are merely prognostic. In terms of prognosis, many of the significant pan-cancer molecular features do not stratify patients with large differences in terms of risk, and so the prognostic value of the marker may be low, but the impact of targeting a gene or pathway in patients would remain to be examined.
Pathways of interest involving patient survival include those related to metabolism, with differential pan-cancer mRNA survival associations being suggestive of Warburg-like effects underlying more aggressive cancers14,21,66,67. Previous studies of individual cancer types have similarly observed more aggressive cancers involving higher expression of glycolysis pathway genes, with some but not all studies similarly associating lower expression of genes involving the electron transport or oxidative phosphorylation14,64,87,88. Reprogramming of energy metabolism has been considered an “emerging hallmark” in the hallmarks of cancer, entailing acquired functional capabilities that allow cancer cells to survive, proliferate, and disseminate40,89. Otto Warburg had proposed that increased aerobic glycolysis in cancer cells was an essential compensatory mechanism of energy production resulting from impaired or damaged mitochondrial respiration66,67. Tumor mitochondria are structurally and functionally abnormal and incapable of generating normal levels of energy66,90. Interestingly, genes in the PI3K/Akt pathway, which regulate growth and metabolism14,91, are also associated with survival in our pan-cancer analyses. The pathway-level associations observed here primarily involve gene expression associations, rather than somatic mutation, though concordant CNA survival associations would involve PI3K/Akt and complex I and III genes. Drugs specifically targeting cancer metabolism would include TT232, which targets a tumor-specific isoform of pyruvate kinase92. Other therapeutic approaches that could holistically impact entire sets of genes and pathway branches involved in cancer metabolism might also be explored.
Our present study involved several limitations. Historically, TCGA tumor biopsy samples tended to be larger in mass, for multiple data platforms to be run on the same sample, and this cohort may therefore not represent cancers of smaller tumor sizes. For several cancer types in TCGA, the length of patient follow-up was not long on average, and it was not possible for the TCGA Network to continue following up on patients due to how the project was structured. Our study results alone would not provide a path by which clinicians might integrate our signatures into patient care or treatment selection. TCGA clinical annotations do not include any extensive patient treatment data. Truly valuable molecular signatures in oncology serve both prognostic (outcome-indicating) and predictive (treatment response-indicating) functions. While our current study demonstrates widespread prognostic information at the molecular level, without treatment-stratified clinical data, causal relationships, or validation in patient-derived models, the therapeutic implications remain speculative. On a practical level, molecular data exist on a continuous spectrum, and so any clinical assay that might be developed in the future using our results would need to adopt appropriate cut points that would subdivide patients into subtypes predicted to respond similarly to targeted interventions, regardless of tissue origin. Our correlative in vitro findings alone do not establish predictive biomarker status, and more molecular profiling data from patients with corresponding treatment response information are needed.
Our compiled pan-cancer, multi-omic gene correlates of patient survival represent a resource for others to look up genes or pathways of interest arising in our study results. For genes with outside experimental data supporting a role in cancer in vitro or in vivo, survival associations in human tumors provide further support for a role in the patient setting. Our results also represent a tool for discovering additional genes or pathways to be further explored in the experimental setting. Any CNA or DNA methylation survival associations corresponding to the mRNA associations can provide further context regarding how these genes are differentially expressed. Future work can similarly explore aggressive cancers at the protein signaling or metabolite levels22,64,93.
Methods
Patient cohorts
Results are based on data generated by the TCGA Research Network. Molecular data were aggregated from public repositories. Tumors spanned 32 different TCGA projects, each project representing a specific cancer type, listed as follows: LAML, Acute Myeloid Leukemia; ACC, Adrenocortical carcinoma; BLCA, Bladder Urothelial Carcinoma; LGG, Lower Grade Glioma; BRCA, Breast invasive carcinoma; CESC, Cervical squamous cell carcinoma and endocervical adenocarcinoma; CHOL, Cholangiocarcinoma; CRC, Colorectal adenocarcinoma (combining COAD and READ projects); ESCA, Esophageal carcinoma; GBM, Glioblastoma multiforme; HNSC, Head and Neck squamous cell carcinoma; KICH, Kidney Chromophobe; KIRC, Kidney renal clear cell carcinoma; KIRP, Kidney renal papillary cell carcinoma; LIHC, Liver hepatocellular carcinoma; LUAD, Lung adenocarcinoma; LUSC, Lung squamous cell carcinoma; DLBC, Lymphoid Neoplasm Diffuse Large B-cell Lymphoma; MESO, Mesothelioma; OV, Ovarian serous cystadenocarcinoma; PAAD, Pancreatic adenocarcinoma; PCPG, Pheochromocytoma and Paraganglioma; PRAD, Prostate adenocarcinoma; SARC, Sarcoma; SKCM, Skin Cutaneous Melanoma; STAD, Stomach adenocarcinoma; TGCT, Testicular Germ Cell Tumors; THYM, Thymoma; THCA, Thyroid carcinoma; UCS, Uterine Carcinosarcoma; UCEC, Uterine Corpus Endometrial Carcinoma; UVM, Uveal Melanoma. Cancer molecular profiling data were generated through informed consent as part of previously published studies and analyzed in accordance with each original study’s data use guidelines and restrictions.
Molecular profiling datasets
RNA-seq data were obtained from the Genomic Data Commons (GDC, https://portal.gdc.cancer.gov) in January 2024, involving 10,271 cases with corresponding patient survival data. The GDC alignment workflow was performed using a two-pass method with the STAR algorithm94, where STAR aligned each read group separately and then merged the resulting alignments into one. The GDC’s two-pass method includes a splice junction detection step, which is used to generate the final alignment. This workflow outputs a genomic BAM file, which contains both aligned and unaligned reads. Quality assessment was performed pre-alignment with FASTQC and post-alignment with Picard Tools. The RNA-seq gene reference annotation was GENCODE v36, encompassing 60,660 RNA transcripts in total, 19,962 of which are protein-coding. We utilized the upper quartile normalized Fragments per Kilobase of transcript per Million mapped reads (FPKM) RNA values. The GDC workflow generated these data through its pipeline by first aligning reads to the GRCh38 reference genome and then quantifying the mapped reads. To facilitate harmonization across samples, all RNA-Seq reads were treated as unstranded during analyses. In our study, RNA values were log2-transformed for all analyses. Proteomic data for 433 protein features generated by RPPA across 7752 cases with corresponding patient survival data (“Level 4” data, version 4.229) were obtained from The Cancer Proteome Atlas (http://tcpaportal.org/tcpa/). DNA methylation profiles for 8818 cases with corresponding patient survival data were generated by TCGA using the Illumina Infinium HumanMethylation450 (HM450) BeadChips (Illumina, San Diego, CA), which we obtained from The Broad Institute Firehose pipeline (http://gdac.broadinstitute.org/)5.
For each of 30,292 enhancers, we mapped the nearest DNA methylation probe (within 20 kb), utilizing the enhancer annotations provided by Kumar et al.62, and using the UCSC Genome Browser LiftOver tool to convert enhancer coordinates from hg19 to hg38. For 23,146 of the 30,292 enhancers, an array probe within 20 kb could be identified. Gene-level CNA data, generated from Affymetrix SNP 6.0 arrays (involving 10,737 cases with corresponding patient survival data), were obtained from the Broad Institute Firehose pipeline5, with our present study using the log2 tumor/normal ratios (from Firehose’s “copy by gene” results table). Somatic mutation calls were obtained from the publicly available “MC3” TCGA MAF file (covering 10,080 cases with patient survival data)2, described in more detail and provided at https://www.synapse.org/#!Synapse:syn7214402. Variants called by two or more algorithms were used in this study.
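The enhancer-to-probe mapping described above can be illustrated with a minimal sketch. It assumes, for illustration only, that each enhancer is summarized by a single coordinate (e.g., its midpoint), that probe coordinates for the same chromosome and genome build are supplied as a sorted list, and that ties are resolved in favor of the upstream probe; the function name and these conventions are hypothetical rather than taken from the study.

```python
from bisect import bisect_left


def nearest_probe(enhancer_pos, probe_positions, max_dist=20_000):
    """Return (probe_position, distance) for the probe nearest to the
    enhancer coordinate, or None when no probe lies within `max_dist`.

    `probe_positions` must be sorted and on the same chromosome and
    genome build as `enhancer_pos`.
    """
    i = bisect_left(probe_positions, enhancer_pos)
    # Only the probes immediately left and right of the insertion point
    # can be the nearest ones.
    candidates = probe_positions[max(i - 1, 0): i + 1]
    if not candidates:
        return None
    best = min(candidates, key=lambda p: abs(p - enhancer_pos))
    distance = abs(best - enhancer_pos)
    return (best, distance) if distance <= max_dist else None


# Toy coordinates (hypothetical).
probes = [1_000, 50_000, 120_000]
hit = nearest_probe(60_000, probes)   # nearest probe lies 10 kb away
miss = nearest_probe(85_000, probes)  # both neighbors are 35 kb away
```

Enhancers for which the function returns None would correspond to those without an array probe inside the distance window.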
In addition to the TCGA pan-cancer dataset, genes in the RNA pan-cancer survival signature, based on TCGA data, were applied to eight independent tumor gene expression datasets outside of TCGA: bladder26, breast31, liver32, lung adenocarcinoma33, ovarian34, pediatric brain tumors25, prostate35, and renal36. The external bladder expression dataset was a compendium of five different datasets (GSE13507, GSE31684, GSE32894, GSE48075, GSE48276), where log2 expression values within each dataset were centered to standard deviations from the sample median. Similarly, the lung adenocarcinoma expression dataset was a compendium of datasets from 11 different studies33, assembled in a similar manner. For the pediatric brain tumor dataset, we had previously25 obtained processed RNA-seq data from the Children’s Brain Tumor Network (CBTN) from the CBTN Cavatica site from the “CBTN” and “CBTN-X01” folders, from which we generated a batch-corrected dataset to correct for technical differences between the two cohorts; where some patients had multiple tumors, one tumor was randomly selected to represent the patient. The breast cancer expression dataset31 was downloaded from cBioPortal in 2019. The prostate expression dataset35 involved a focused set of 891 genes (both Cancer and Custom gene panels) but was unique in having long patient follow-up data (median > 10 years) for prostate cancer-specific death.
Additionally, we applied CGI probes from the CGI pan-cancer survival signature, based on TCGA data, to four independent DNA methylation array datasets outside of TCGA: pediatric brain tumors25, ependymoma53, glioblastoma54, and breast55. The Chen et al. pediatric brain tumor methylation dataset was previously processed by our group using image files provided by CBTN25. For the other methylation datasets, processed methylation beta values were obtained from the Gene Expression Omnibus. For the breast DNA methylation dataset55, the CGI signature was mapped using the gene identifier; for the other methylation datasets, the CGI signature was mapped using the Illumina probe identifier.
Survival analyses
We obtained TCGA patient survival data12 from the GDC on October 7, 2024. As different cancer types based on histology or tissue of origin may involve differences in patient survival over time, we utilized statistical models to correct for cancer type, whereby any associations of molecular features with survival would not be explainable by differences involving tumor type representation alone10,22,23,24. For RNA, RPPA, and gene signature data, survival analyses were based on log2-transformed expression values, and analyses of DNA methylation were based on logit-transformed array beta values (i.e., “M-values”95). For each molecular feature involving the given profiling dataset (RNA, RPPA, immune signature, CNA, DNA methylation), we used a multivariate Cox model (with cancer type included as a categorical covariate, using as.factor in R) to associate patient OS with molecular feature levels. In addition, for DNA methylation data, CGI probes or enhancer-associated probes located on the X or Y chromosome needed to remain significant after correction for patient sex in addition to sample cancer type. By design, the Cox analyses treat the molecular variable in the model as continuous, i.e., without any dichotomization of the variable. In the multivariate survival analysis, the contrast was between long-term survivors and short-term survivors. We did not consider beta and p values assigned to individual cancer types. For each cancer type, hazard ratios and p values would compare that cancer type to a reference cancer type, where the first cancer type alphabetically by name would be the reference (TCGA-ACC). Changing the cancer type names such that another cancer type is used as the reference would not change the molecular-level coefficients and p values used in our study.
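For reference, the logit transformation of methylation beta values into M-values mentioned above is M = log2(beta / (1 − beta)). The minimal sketch below illustrates it; the clipping constant `eps`, which keeps beta values of exactly 0 or 1 from producing infinite M-values, is an assumption of the sketch and not a parameter reported in the study.

```python
from math import log2


def beta_to_m(beta, eps=1e-6):
    """Convert a methylation beta value (0 to 1) to an M-value.

    M = log2(beta / (1 - beta)). Values are clipped to [eps, 1 - eps]
    (an illustrative choice) so that the logarithm stays finite.
    """
    clipped = min(max(beta, eps), 1 - eps)
    return log2(clipped / (1 - clipped))


# A half-methylated site maps to 0; the transform is symmetric around it.
m_values = [beta_to_m(b) for b in (0.2, 0.5, 0.8)]
```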
Gene signature analyses
To computationally infer the infiltration level of specific immune cell types using RNA-seq data, we used sets of genes from Bindea et al.41, each set being specifically overexpressed in one of 24 immune cell types. Elsewhere, the Bindea signature scoring has been found to yield results consistent with those derived from immunohistochemistry (IHC) methods examining lymphocyte-specific expression patterns within cancer versus non-cancer cellular compartments33,96. For scoring TCGA cancer samples for each of these immune cell signatures, we used the average of the gene expression values (transformed within each cancer type to standard deviations from the median). For selected cytoband regions of interest arising from the CNA analyses, the entire set of genes in the given region was applied as a gene signature to eight independent gene expression datasets outside of TCGA, taking the averages of values as normalized to standard deviations from the median across sample profiles.
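The signature scoring described above (expression values expressed as standard deviations from the median within each cancer type, then averaged over the signature genes) can be sketched in Python as follows. This is a minimal illustration rather than the study's code: the function names, gene names, and toy data are hypothetical, and the use of the sample standard deviation is an assumption, as the text does not state which estimator was used.

```python
from collections import defaultdict
from statistics import median, stdev

def standardize_within_groups(values, groups):
    """Express each value as standard deviations from the median of its group."""
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)
    # Assumption: sample standard deviation; each group needs >= 2 samples.
    stats = {g: (median(vs), stdev(vs)) for g, vs in by_group.items()}
    result = []
    for value, group in zip(values, groups):
        med, sd = stats[group]
        result.append((value - med) / sd if sd > 0 else 0.0)
    return result

def signature_scores(expr, cancer_types, signature_genes):
    """Average the within-cancer-type standardized values over the signature genes.

    expr maps gene name -> list of log2 expression values, one per sample,
    in the same sample order as cancer_types.
    """
    genes = [g for g in signature_genes if g in expr]
    if not genes:
        raise ValueError("no signature genes found in the expression data")
    standardized = [standardize_within_groups(expr[g], cancer_types) for g in genes]
    n_samples = len(cancer_types)
    return [sum(row[i] for row in standardized) / len(genes) for i in range(n_samples)]

# Hypothetical toy data: two genes, six samples from two cancer types.
cancer_types = ["A", "A", "A", "B", "B", "B"]
expr = {
    "GENE1": [1.0, 2.0, 3.0, 10.0, 11.0, 12.0],
    "GENE2": [2.0, 4.0, 6.0, 1.0, 2.0, 3.0],
}
scores = signature_scores(expr, cancer_types, ["GENE1", "GENE2"])
```

Standardizing within each cancer type before averaging removes cancer-type-level differences in baseline expression, so that a high score reflects high signature expression relative to other tumors of the same type.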
The signature of genes in the RNA pan-cancer survival signature, based on TCGA data, was applied to eight external tumor gene expression datasets. In a similar manner, the signature of CGIs in the TCGA pan-cancer methylation signature was applied to four external tumor methylation datasets. In both instances, we scored patient profiles in the external molecular profiling dataset using our previously described “t-score” metric14,15, with the log2-transformed or logit-transformed values (for RNA and DNA methylation, respectively) centered across patients to standard deviations from the median. The t-score represents the two-sided t-statistic when comparing the average of the signature’s worse survival features within each external sample profile with the average of the signature’s better survival features. For example, the t-score for a given sample profile is high when the signature’s worse survival and better survival features are, respectively, high and low on average in the external sample profile. Tumor profiles in the external dataset were then separated based on the signature scoring, with survival differences between groupings assessed using the log-rank test.
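The t-score for a single sample profile, as described above, can be illustrated with the short Python sketch below. It assumes a Welch (unequal-variance) form of the t-statistic, which is a choice made for this example because the text does not specify the variant; the function name, feature names, and toy values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def t_score(profile, worse_features, better_features):
    """t-statistic comparing worse-survival vs better-survival signature
    features within one sample profile.

    profile maps feature name -> value already normalized across patients
    (standard deviations from the median).
    """
    worse = [profile[f] for f in worse_features if f in profile]
    better = [profile[f] for f in better_features if f in profile]
    # Assumption: Welch standard error; each group needs >= 2 features
    # and the two groups must not both have zero variance.
    standard_error = sqrt(variance(worse) / len(worse) + variance(better) / len(better))
    return (mean(worse) - mean(better)) / standard_error

# Hypothetical profile: worse-survival features high, better-survival features low.
profile = {"w1": 1.0, "w2": 2.0, "w3": 3.0, "b1": -1.0, "b2": -2.0, "b3": -3.0}
score = t_score(profile, ["w1", "w2", "w3"], ["b1", "b2", "b3"])
```

A positive score indicates a profile that resembles the worse-survival pattern, and a negative score indicates the opposite; the scores can then be used to separate tumor profiles into groups for survival comparison.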
Pathway analyses and essential genes
We evaluated the enrichment of WikiPathways63, GO annotation terms71, and other annotated gene sets of interest within sets of top TCGA pan-cancer significant genes using SigTerms software72 and one-sided Fisher’s exact tests, with FDRs estimated using the method of Storey and Tibshirani47. DrugBank70 genes were accessed in January 2025. The WikiPathways gene sets were accessed in December 2024. GO terms were accessed from the Entrez database in February 2023. We examined gene effect scores (with low scores denoting essential genes) based on DepMap CRISPR assays, using the dataset as analyzed by the Chronos algorithm from Dempster et al.69. We used a cutoff score of <−0.75 to denote gene essentiality in a given cell line.
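For reference, a one-sided Fisher’s exact test for gene set enrichment of the kind described above is equivalent to an upper-tail hypergeometric probability. The Python sketch below is a generic illustration and is not the SigTerms implementation; the function name, argument names, and counts are hypothetical.

```python
from math import comb

def fisher_one_sided(overlap, set_size, top_size, total_genes):
    """Probability of observing at least `overlap` gene-set members among
    `top_size` top genes drawn from `total_genes`, of which `set_size`
    belong to the gene set (upper-tail hypergeometric probability)."""
    max_overlap = min(set_size, top_size)
    denominator = comb(total_genes, top_size)
    numerator = sum(
        comb(set_size, i) * comb(total_genes - set_size, top_size - i)
        for i in range(overlap, max_overlap + 1)
    )
    return numerator / denominator

# Hypothetical counts: 10 genes in total, a gene set of 3, and 3 top genes,
# 2 of which fall in the gene set.
p_value = fisher_one_sided(overlap=2, set_size=3, top_size=3, total_genes=10)
```

Exact integer arithmetic via `math.comb` avoids the rounding issues of factorial-based formulas, although for genome-scale counts a statistical library routine would typically be faster.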
Drug response associations
Using the GDSC75 resource, we classified 959 cancer cell lines according to the pan-cancer RNA signature of patient survival, using the t-score metric described above. GDSC expression data and drug compound half maximal inhibitory concentration (IC50) data were downloaded in February 2020 (GDSC1-dataset and GDSC2-dataset). We merged the two GDSC IC50 datasets into one. If a drug treatment and cell line were represented in both datasets, we averaged the two values; otherwise, we used whichever IC50 dataset had available data. GDSC IC50 data represented 518 drug compounds. Where a drug compound was represented more than once in the IC50 dataset, one treatment profile was randomly selected to represent the drug. Across the GDSC expression array dataset, log2-transformed gene expression values were normalized to standard deviations from the median. We further evaluated the cell lines for correlations between IC50 drug responses and RNA signature scoring, using Pearson’s correlation and natural log-transformed IC50 values.
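The merging and correlation steps described above can be sketched in Python as follows. This is an illustrative outline under stated assumptions, not the study's code: the data structures, function names, and toy values are hypothetical, and because the text does not state the scale on which duplicated values were averaged, the sketch simply averages the values as given before taking the natural log.

```python
from math import log, sqrt

def merge_ic50(gdsc1, gdsc2):
    """Merge two {(drug, cell_line): IC50} tables, averaging pairs present in both."""
    merged = dict(gdsc1)
    for key, value in gdsc2.items():
        merged[key] = (merged[key] + value) / 2.0 if key in merged else value
    return merged

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def drug_signature_correlation(ic50, drug, scores):
    """Pearson r between signature scores and ln(IC50) for one drug,
    across the cell lines that have both a score and an IC50 value."""
    lines = [c for c in scores if (drug, c) in ic50]
    return pearson([scores[c] for c in lines], [log(ic50[(drug, c)]) for c in lines])

# Hypothetical toy data: one drug, three cell lines, two source datasets.
gdsc1 = {("drugA", "c1"): 2.0, ("drugA", "c2"): 4.0}
gdsc2 = {("drugA", "c1"): 4.0, ("drugA", "c3"): 8.0}
merged = merge_ic50(gdsc1, gdsc2)
scores = {"c1": 3.0, "c2": 2.0, "c3": 1.0}
r = drug_signature_correlation(merged, "drugA", scores)
```

In this toy example, cell lines with higher signature scores have lower IC50 values, so the correlation is negative, which corresponds to greater drug sensitivity with higher signature scoring.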
Statistical analyses
All p values were two-sided unless otherwise specified. Heat map visualizations were performed using JavaTreeview (version 1.1.6r4)97 and matrix2png (version 1.2.1)98. Figures represent biological and not technical replicates. Where each feature in a given dataset was tested for a given statistic, corresponding FDRs were computed using the Storey and Tibshirani method47: [nominal p value of the feature] × [total number of features tested]/[number of features in the dataset with p value less than or equal to the given p value]. For a top feature set with FDR < 10%, the FDR would suggest that on the order of 10% of these features might have nominal significance due to multiple testing (and, conversely, some 90% of the top features would represent bona fide significant associations).
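The FDR formula given above can be written out as a short Python sketch. It follows the formula literally as stated in the text; the function name and toy p values are hypothetical, and the sketch does not add refinements such as estimating the proportion of true null features or enforcing monotonicity across ranked features.

```python
from bisect import bisect_right

def fdr_estimates(p_values):
    """Per-feature FDR estimate: p * (number of features tested) /
    (number of features with p value <= p)."""
    n_features = len(p_values)
    sorted_p = sorted(p_values)
    # bisect_right returns how many sorted p values are <= the given p value.
    return [p * n_features / bisect_right(sorted_p, p) for p in p_values]

# Hypothetical nominal p values for four tested features.
nominal_p = [0.01, 0.02, 0.03, 0.5]
fdrs = fdr_estimates(nominal_p)
```

For these toy values, the three smallest p values each give an estimated FDR of about 0.04, while the largest gives 0.5.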
Data availability
All data used in this study are publicly available. TCGA molecular datasets are available from the Genome Data Commons (GDC, https://portal.gdc.cancer.gov) and the Broad Institute Firehose pipeline (http://gdac.broadinstitute.org/), and raw sequencing data are available via the GDC.
Code availability
Our study did not use specialty code. Survival associations were generated using R version 4.1.3 and the survival library package.
Abbreviations
- TCGA: The Cancer Genome Atlas
- RNA-seq: RNA sequencing
- RPPA: reverse-phase protein arrays
- DepMap: Cancer Dependency Map
- GDSC: Genomics of Drug Sensitivity in Cancer
- CNA: copy number alteration
- CGI: CpG Island
References
Lawrence, M. et al. Discovery and saturation analysis of cancer genes across 21 tumour types. Nature 505, 495–501 (2014).
Ding, L. et al. Perspective on oncogenic processes at the end of the beginning of cancer genomics. Cell 173, 305–320 (2018).
Cancer Genome Atlas Research Network et al. The Cancer Genome Atlas pan-cancer analysis project. Nat. Genet. 45, 1113–1120 (2013).
Hoadley, K. et al. Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin. Cell 158, 929–944 (2014).
Chen, F. et al. Pan-cancer molecular classes transcending tumor lineage across 32 cancer types, multiple data platforms, and over 10,000 cases. Clin. Cancer Res. 24, 2182–2193 (2018).
Zack, T. et al. Pan-cancer patterns of somatic copy number alteration. Nat. Genet. 45, 1134–1140 (2013).
Hoadley, K. et al. Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer. Cell 173, 291–304 (2018).
Berger, A. et al. A comprehensive pan-cancer molecular study of gynecologic and breast cancers. Cancer Cell 33, 690–705 (2018).
Knijnenburg, T. et al. Genomic and molecular landscape of DNA damage repair deficiency across The Cancer Genome Atlas. Cell Rep. 23, 239–254.e236 (2018).
Zhang, Y. et al. A pan-cancer proteogenomic atlas of PI3K/AKT/mTOR pathway alterations. Cancer Cell 31, 820–832.e3 (2017).
Sanchez-Vega, F. et al. Oncogenic signaling pathways in The Cancer Genome Atlas. Cell 173, 321–337 (2018).
Liu, J. et al. An integrated TCGA pan-cancer clinical data resource to drive high-quality survival outcome analytics. Cell 173, 400–416 (2018).
Gibbons, D. & Creighton, C. Pan-cancer survey of epithelial-mesenchymal transition markers across The Cancer Genome Atlas. Dev. Dyn. 247, 555–564 (2018).
The Cancer Genome Atlas Research Network. Comprehensive molecular characterization of clear cell renal cell carcinoma. Nature 499, 43–49 (2013).
Cancer Genome Atlas Research Network. Integrated genomic analyses of ovarian carcinoma. Nature 474, 609–615 (2011).
The Cancer Genome Atlas Research Network. Comprehensive molecular characterization of papillary renal cell carcinoma. N. Engl. J. Med. 374, 135–145 (2015).
Cancer Genome Atlas Network. Genomic classification of cutaneous melanoma. Cell 161, 1681–1696 (2015).
Cancer Genome Atlas Research Network et al. Integrated genomic characterization of endometrial carcinoma. Nature 497, 67–73 (2013).
Sotiriou, C. & Piccart, M. Taking gene-expression profiling to the clinic: when will molecular signatures become relevant to patient care?. Nat. Rev. Cancer 7, 545–553 (2007).
Michiels, S., Ternès, N. & Rotolo, F. Statistical controversies in clinical research: prognostic gene signatures are not (yet) useful in clinical practice. Ann. Oncol. 27, 2160–2167 (2016).
Ward, P. & Thompson, C. Metabolic reprogramming: a cancer hallmark even Warburg did not anticipate. Cancer Cell 21, 297–308 (2012).
Chen, F., Zhang, Y., Chandrashekar, D., Varambally, S. & Creighton, C. Global impact of somatic structural variation on the cancer proteome. Nat. Commun. 14, 5637 (2023).
Zhang, Y. et al. Rearrangement-mediated cis-regulatory alterations in advanced patient tumors reveal interactions with therapy. Cell Rep. 37, 110023 (2021).
Chen, F., Chandrashekar, D., Scheurer, M., Varambally, S. & Creighton, C. Global molecular alterations involving recurrence or progression of pediatric brain tumors. Neoplasia 24, 22–33 (2022).
Chen, F., Zhang, Y., Shen, L. & Creighton, C. The DNA methylome of pediatric brain tumors appears shaped by structural variation and predicts survival. Nat. Commun. 15, 6775 (2024).
Chen, F., Zhang, Y., Sedlazeck, F. & Creighton, C. Germline structural variation globally impacts the cancer transcriptome including disease-relevant genes. Cell Rep. Med. 5, 101446 (2024).
Yuan, Y. et al. Assessing the clinical utility of cancer genomic and proteomic data across tumor types. Nat. Biotechnol. 32, 644–652 (2014).
Nagy, Á, Munkácsy, G. & Győrffy, B. Pancancer survival analysis of cancer hallmark genes. Sci. Rep. 11, 6047 (2021).
Li, J. et al. A protein expression atlas on tissue samples and cell lines from cancer patients provides insights into tumor heterogeneity and dependencies. Nat. Cancer 5, 1579–1595 (2024).
Aran, D., Sirota, M. & Butte, A. Systematic pan-cancer analysis of tumour purity. Nat. Commun. 6, 8971 (2015).
Pereira, B. et al. The somatic mutation profiles of 2,433 breast cancers refines their genomic and transcriptomic landscapes. Nat. Commun. 7, 11479 (2016).
Roessler, S. et al. A unique metastasis gene signature enables prediction of tumor relapse in early-stage hepatocellular carcinoma patients. Cancer Res. 70, 10202–10212 (2010).
Chen, F. et al. Multiplatform-based molecular subtypes of non-small cell lung cancer. Oncogene 36, 1384–1393 (2017).
Tothill, R. et al. Novel molecular subtypes of serous and endometrioid ovarian cancer linked to clinical outcome. Clin. Cancer Res. 14, 5198–5208 (2008).
Nakagawa, T. et al. A tissue biomarker panel predicting systemic progression after PSA recurrence post-definitive prostate cancer therapy. PLoS ONE 3, e2318 (2008).
Zhao, H. et al. Gene expression profiling predicts survival in conventional renal cell carcinoma. PLoS Med. 3, e13 (2006).
Paik, S. et al. A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N. Engl. J. Med. 351, 2817–2826 (2004).
Gray, R. et al. Validation study of a quantitative multigene reverse transcriptase-polymerase chain reaction assay for assessment of recurrence risk in patients with stage II colon cancer. J. Clin. Oncol. 29, 4611–4619 (2011).
Knezevic, D. et al. Analytical validation of the Oncotype DX prostate cancer assay - a clinical RT-PCR assay optimized for prostate needle biopsies. BMC Genom. 14, 690 (2013).
Hanahan, D. & Weinberg, R. Hallmarks of cancer: the next generation. Cell 144, 646–674 (2011).
Bindea, G. et al. Spatiotemporal dynamics of intratumoral immune cells reveal the immune landscape in human cancer. Immunity 39, 782–795 (2013).
Naidoo, J., Page, D. & Wolchok, J. Immune modulation for cancer therapy. Br. J. Cancer 111, 2214–2219 (2014).
Nin, D. & Deng, L. Biology of cancer-testis antigens and their therapeutic implications in cancer. Cells 12, 926 (2023).
Cook, J. & Hagemann, T. Tumour-associated macrophages and cancer. Curr. Opin. Pharmacol. 13, 595–601 (2013).
Hieronymus, H. et al. Tumor copy number alteration burden is a pan-cancer prognostic factor associated with recurrence and death. eLife 7, e37294 (2018).
Zhang, Y. et al. Global impact of somatic structural variation on the DNA methylome of human cancers. Genome Biol. 20, 209 (2019).
Storey, J. D. & Tibshirani, R. Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. USA 100, 9440–9445 (2003).
Liang, J. et al. Rac1, a potential target for tumor therapy. Front. Oncol. 11, 674426 (2021).
Shah, N. & Sukumar, S. The Hox genes and their roles in oncogenesis. Nat. Rev. Cancer 10, 361–371 (2010).
Wenzel, J. et al. Loss of the nuclear Wnt pathway effector TCF7L2 promotes migration and invasion of human colorectal cancer cells. Oncogene 39, 3893–3909 (2020).
Narimatsu, T. et al. Downregulation of NDUFB6 due to 9p24.1-p13.3 loss is implicated in metastatic clear cell renal cell carcinoma. Cancer Med. 4, 112–124 (2015).
Deaton, A. & Bird, A. CpG islands and the regulation of transcription. Genes Dev. 25, 1010–1022 (2011).
Pohl, L. et al. Molecular characteristics and improved survival prediction in a cohort of 2023 ependymomas. Acta Neuropathol. 147, 24 (2024).
Drexler, R. et al. A prognostic neural epigenetic signature in high-grade glioma. Nat. Med. 30, 1622–1635 (2024).
Holm, K. et al. Molecular subtypes of breast cancer are associated with characteristic DNA methylation patterns. Breast Cancer Res. 12, R36 (2010).
Bird, A. DNA methylation patterns and epigenetic memory. Genes Dev. 16, 16–21 (2002).
Forbes, S. et al. COSMIC: somatic cancer genetics at high-resolution. Nucleic Acids Res. 45, D777–D783 (2017).
Héberlé, E. & Bardet, A. Sensitivity of transcription factors to DNA methylation. Essays Biochem. 63, 727–741 (2019).
Lea, A. et al. Genome-wide quantification of the effects of DNA methylation on human gene regulation. eLife 7, e37513 (2018).
Tong, Y. et al. MICMIC: identification of DNA methylation of distal regulatory regions with causal effects on tumorigenesis. Genome Biol. 19, 73 (2018).
Ankill, J. et al. Epigenetic alterations at distal enhancers are linked to proliferation in human breast cancer. NAR Cancer 4, zcac008 (2022).
Kumar, S. et al. Passenger mutations in more than 2500 cancer genomes: Overall molecular functional impact and consequences. Cell 180, 915–927 (2020).
Slenter, D. et al. WikiPathways: a multifaceted pathway database bridging metabolomics to other omics research. Nucleic Acids Res. 46, D661–D667 (2018).
Monsivais, D. et al. Mass-spectrometry-based proteomic correlates of grade and stage reveal pathways and kinases associated with aggressive human cancers. Oncogene 40, 2081–2095 (2021).
Davis, C. et al. The somatic genomic landscape of chromophobe renal cell carcinoma. Cancer Cell 26, 319–330 (2014).
Seyfried, T. & Shelton, L. Cancer as a metabolic disease. Nutr. Metab. 7, 7 (2010).
Warburg, O. On the origin of cancer cells. Science 123, 309–314 (1956).
Tsherniak, A. et al. Defining a Cancer Dependency Map. Cell 170, 564–576 (2017).
Dempster, J. et al. Chronos: a cell population dynamics model of CRISPR experiments that improves inference of gene fitness effects. Genome Biol. 22, 343 (2021).
Law, V. et al. DrugBank 4.0: shedding new light on drug metabolism. Nucleic Acids Res. 42, D1091–D1097 (2014).
Ashburner, M. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25–29 (2000).
Creighton, C., Nagaraja, A., Hanash, S., Matzuk, M. & Gunaratne, P. A bioinformatics tool for linking gene expression profiling results with public databases of microRNA target predictions. RNA 14, 2290–2296 (2008).
Cohen, P., Cross, D. & Jänne, P. Kinase drug discovery 20 years after imatinib: progress and future directions. Nat. Rev. Drug Discov. 20, 551–569 (2021).
Sridhar, R., Hanson-Painton, O. & Cooper, D. Protein kinases as therapeutic targets. Pharm. Res. 17, 1345–1353 (2000).
Iorio, F. et al. A landscape of pharmacogenomic interactions in cancer. Cell 166, 740–754 (2016).
Zhang, Y., Chen, F. & Creighton, C. Pan-cancer molecular subtypes of metastasis reveal distinct and evolving transcriptional programs. Cell Rep. Med. 4, 100932 (2023).
Creighton, C. Widespread molecular patterns associated with drug sensitivity in breast cancer cell lines, with implications for human tumors. PLoS ONE 8, e71158 (2013).
Horak, C. et al. Biomarker analysis of neoadjuvant doxorubicin/cyclophosphamide followed by ixabepilone or Paclitaxel in early-stage breast cancer. Clin. Cancer Res. 19, 1587–1595 (2013).
Iwamoto, T. et al. Gene pathways associated with prognosis and chemotherapy sensitivity in molecular subtypes of breast cancer. J. Natl. Cancer Inst. 103, 264–272 (2011).
Hatzis, C. et al. A genomic predictor of response and survival following taxane-anthracycline chemotherapy for invasive breast cancer. JAMA 305, 1873–1881 (2011).
Shen, K. et al. Cell line derived multi-gene predictor of pathologic response to neoadjuvant chemotherapy in breast cancer: a validation study on US Oncology 02-103 clinical trial. BMC Med. Genom. 5, 51 (2012).
Korde, L. et al. Gene expression pathway analysis to predict response to neoadjuvant docetaxel and capecitabine for breast cancer. Breast Cancer Res. Treat. 119, 685–699 (2010).
Prat, A. et al. Research-based PAM50 subtype predictor identifies higher responses and improved survival outcomes in HER2-positive breast cancer in the NOAH study. Clin. Cancer Res. 20, 511–521 (2014).
Miyake, T. et al. GSTP1 expression predicts poor pathological complete response to neoadjuvant chemotherapy in ER-negative breast cancer. Cancer Sci. 103, 913–920 (2012).
Creighton, C. Gene expression profiles in cancers and their therapeutic implications. Cancer J. 29, 9–14 (2023).
Nunnery, S., Mayer, I. & Balko, J. Triple-negative breast cancer: breast tumors with an identity crisis. Cancer J. 27, 2–7 (2021).
Zhang, Y., Chen, F., Balic, M. & Creighton, C. An essential gene signature of breast cancer metastasis reveals targetable pathways. Breast Cancer Res. 26, 98 (2024).
Chen, G. et al. Protein profiles associated with survival in lung adenocarcinoma. Proc. Natl. Acad. Sci. USA 100, 13537–13542 (2003).
Hanahan, D. & Weinberg, R. The hallmarks of cancer. Cell 100, 57–70 (2000).
Cuezva, J. et al. The bioenergetic signature of cancer: a marker of tumor progression. Cancer Res. 62, 6674–6681 (2002).
Engelman, J., Luo, J. & Cantley, L. The evolution of phosphatidylinositol 3-kinases as regulators of growth and metabolism. Nat. Rev. Genet. 7, 606–619 (2006).
Szokolóczi, O. et al. TT232, a novel signal transduction inhibitory compound in the therapy of cancer and inflammatory diseases. J. Recept Signal Transduct. Res. 25, 217–235 (2005).
Reznik, E. et al. A landscape of metabolic variation across tumor types. Cell Syst. 6, 301–313 (2018).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Du, P. et al. Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis. BMC Bioinform. 11, 587 (2010).
Şenbabaoğlu, Y. et al. Tumor immune microenvironment characterization in clear cell renal cell carcinoma identifies prognostic and immunotherapeutically relevant messenger RNA signatures. Genome Biol. 17, 231 (2016).
Saldanha, A. J. Java Treeview-extensible visualization of microarray data. Bioinformatics 20, 3246–3248 (2004).
Pavlidis, P. & Noble, W. Matrix2png: a utility for visualizing matrix data. Bioinformatics 19, 295–296 (2003).
Creighton, C. The molecular profile of luminal B breast cancer. Biologics 6, 289–297 (2012).
Acknowledgements
This work was supported in part by National Institutes of Health (NIH) grant P30CA125123 (C.C.). We thank the patients who participated in this study.
Author information
Contributions
Conceptualization: C.J.C.; methodology: C.J.C., Y.Z., F.C.; formal analysis: C.J.C., Y.Z., F.C.; data curation: Y.Z., C.J.C.; visualization: C.J.C.; writing: C.J.C.; manuscript review: Y.Z., F.C.; supervision: C.J.C.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethics approval
Tumor molecular profiling data were generated through informed consent as part of the TCGA consortium efforts and analyzed here per TCGA’s data use guidelines and restrictions. The TCGA network collected tumors from 161 tissue source sites (TSSs) across the world, acquiring tumors from 11,160 patients of 32 different cancer types; these tumors were originally diagnosed from 1978 to 2013, with the median diagnosis year of 200912. At each TSS, the approving ethics committee(s) and institutional review board(s) corresponding to the TSS institution ensured that appropriate informed consent to participate in TCGA projects was obtained from participants for whom the TSS obtained tumor biopsies. Our present study utilized public TCGA data, and no additional informed consent or institutional board approval was required. The research was performed in accordance with the Declaration of Helsinki. Our present study should not be considered a multi-center study, as no new patient data were generated here, and only public data were utilized.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Zhang, Y., Chen, F. & Creighton, C.J. Pan-cancer, multi-omic correlates of survival transcending tumor lineage across 11,019 patients reveal targets and pathways. npj Precis. Onc. 9, 226 (2025). https://doi.org/10.1038/s41698-025-01029-x
DOI: https://doi.org/10.1038/s41698-025-01029-x










