A core driver gene set identified based on geMER reveals its potential driver mechanism in pan-cancer

Gan, Jing; Wang, Yuncong; Shi, Zhuoran; Hu, Haoyu; Xu, Manyi; Li, Xinrong; Dong, Wenbo; He, Jiaheng; Zhao, Yusen; Zhang, Yakun; Sun, Yue; Zhang, Caiyu; Lu, Qianyi; Ning, Shangwei; Jin, Yan; Zhi, Hui

doi:10.1038/s41698-025-01060-y

Article
Open access
Published: 09 August 2025

npj Precision Oncology volume 9, Article number: 278 (2025)

Abstract

Increasing evidence underscores the driving role of coding and non-coding variants in cancer development. Analyzing gene sets in biological processes offers deeper insights into the molecular mechanisms of carcinogenesis. Here, we developed geMER to identify candidate driver genes genome-wide by detecting mutation enrichment regions within coding and non-coding elements. We subsequently designed a pipeline to identify a core driver gene set (CDGS) that broadly promotes carcinogenesis across multiple cancers. CDGS comprising 25 genes for 25 cancers displayed instability in DNA aberrations. Variants within the TTN enrichment region may influence the folding of the I-set domain by altering local polarity or side-chain chemistry properties of amino acids, potentially disrupting its antigen-binding capacity in LUAD. Multi-omics analysis revealed that APOB emerged as a candidate oncogene in LIHC, whose genetic alterations within the enrichment region may activate key TFs, upregulate DNA methylation levels, modulate critical histone modifications, and enhance transcriptional activity in the HepG2 and A549 cell lines compared to Panc1. Additionally, CDGS mutation status was an independent prognostic factor for the pan-cancer cohort. High-risk patients tended to develop an immunosuppressive microenvironment and demonstrated a higher likelihood of responding to ICI therapy. Finally, we provided a user-friendly web interface to explore candidate driver genes using geMER (http://bio-bigdata.hrbmu.edu.cn/geMER/).

Introduction

Cancer, classified as a genetic disease, is characterized by the abnormal and uncontrolled proliferation of cells, which results from somatic mutations acquired in the DNA of cancerous cells¹. During cancer evolution, driver mutations undergo positive selection, conferring a growth advantage on the cells that harbor them. These mutations occur more frequently than expected from genome-wide mutation rates, exhibit a stronger functional impact, and are enriched in hallmark pathways and driver genes of cancers^2,3. Therefore, a cancer driver gene has been defined as one whose mutations promote net cellular growth under specific microenvironmental conditions within the cell in vivo⁴. Since the establishment of genetics, identifying driver genes across tumors has been one of the major motivations for cancer research⁵, and it is essential for developing targeted anti-cancer therapies and discovering genomic biomarkers of prognosis and therapy response⁶.

A substantial body of research has demonstrated that driver mutations can contribute to cancer progression without modifying the amino acid sequence of a protein. In fact, over 90% of somatic variants occur in non-coding regions, and increasing evidence underscores the significance of non-coding variants in cancer development⁷. For instance, highly recurrent mutations within the core promoter of the oncogene TERT have been observed to generate binding motifs for E-twenty-six transcription factors (TFs), up-regulating TERT expression in melanoma, and to act as driver mutations across multiple tumor types, including those of the central nervous system, bladder, thyroid, and skin^8,9. Additionally, Puente et al. discovered recurrent mutations in the 3′UTRs of NOTCH1, which caused abnormal splicing and enhanced NOTCH1 activity, driving a more aggressive disease in chronic lymphocytic leukemia¹⁰. Furthermore, Zeraati et al. experimentally validated mutations in the 5′UTRs of TAOK2, BCL2, and CXCL14, showing that these mutations alter mRNA translation in cancer. As a result, mutations in non-coding regions are closely linked to cancer initiation, metastasis, and drug resistance, providing important avenues for new diagnostic methods and therapeutic strategies.

The availability of large-scale whole-genome sequencing (WGS) data for cancer has made it possible to explore the role of driver events in coding and non-coding regions. As computer technology advances, intelligent algorithms have gradually been recognized as effective tools for predicting cancer driver genes¹¹. Current approaches can be broadly classified into several categories¹²: mutation frequency-based methods, such as DriverML¹³, ActiveDriver¹⁴, and OncodriveFML¹⁵; sub-network-based methods, including DriverNet¹⁶, DawnRank¹⁷, and MUFFINN¹⁸; biophysical models and integrated machine learning methods for identifying functional non-coding mutations^19,20,21,22; and methods that identify driver genes by analyzing hotspot mutation regions, where recurrent somatic mutations occur in protein functional regions or disrupt active sites, as seen in tools like DriverDB v3²³, MSEA²⁴, and DiffMut²⁵. In particular, the MSEA method hypothesizes that genes with mutation hotspots may serve as candidate cancer genes. The MSEA-clust module within MSEA modifies the Kolmogorov-Smirnov test to detect mutation enrichment patterns along gene transcripts. In this study, building on the premise that driver mutations occur at a higher frequency than the overall mutation rate across the genome, and on the significance of mutations in both coding and non-coding regions in cancer development, we used MSEA-clust to identify mutation enrichment regions within coding and non-coding genomic elements, aiming to pinpoint candidate driver genes for cancer.

In addition, despite the availability of various computational methods for identifying potential driver genes, it remains unclear how these driver genes induce extensive transcriptional changes and contribute to malignant phenotypes²⁶. In recent years, an increasing number of studies have demonstrated that cancer occurrence is not solely driven by single gene mutations, but is also closely associated with alterations in a series of gene sets. Cancer development is typically the outcome of multiple molecular events, involving various genes, pathways, and their complex interactions²⁷. In general, analyzing gene sets that constitute pathways and biological processes may provide deeper insights into the molecular mechanism of carcinogenesis compared to examining individual genes²⁷. Currently, researchers have identified numerous important gene sets across various research contexts. For instance, a senescence gene set was defined to identify senescent cells and predict pathways associated with senescence²⁸. Clarke et al. developed an mRNA biomarker set as a predictor for post-treatment Lyme disease²⁹. A mutation-based gene set was established to forecast survival benefits following immunotherapy for multiple cancer types³⁰. Additionally, CRSO is a method designed to identify essential and synergistic alteration modules that drive cancer in individual patients, aiming to identify a rule set that represents the unique minimal combinations capable of explaining the study population of a given tumor³¹. Therefore, integrated analysis for multidimensional data has the significant potential to identify a core and robust gene set and enhance our understanding of the mechanism underlying cancer development and progression. However, studies focused on identifying a core driver gene set shared across multiple cancers and investigating its functional impact on oncogenic processes in a pan-cancer context are limited.

Here, we developed geMER to identify candidate driver genes for both coding and non-coding genome elements. We assessed the performance of geMER using several key metrics and compared its results with other genome-wide tools. We investigated the potential biological roles of candidate drivers in cancer development and progression. Subsequently, we systematically identified a novel core driver gene set (CDGS) for multiple cancer types and characterized its mutation landscape. We described the potential driving mechanisms of CDGS through multi-omics analysis encompassing somatic mutations, copy number variations, transcription, DNA methylation, TFs, and histone modifications. Moreover, we demonstrated the ability of CDGS mutations to predict survival outcomes and the response to immune checkpoint inhibitor (ICI) therapy for pan-cancer patients. Finally, we integrated our geMER results into an interactive, web-based resource, freely available at http://bio-bigdata.hrbmu.edu.cn/geMER/. This platform enables users to explore candidate driver genes for 33 cancer types from TCGA or upload their mutation data to identify candidate driver genes for a given cancer.

Results

geMER: A pipeline for genome-wide identification of potential cancer drivers

Over the years, protein-coding somatic variants have been successfully mapped by exome-based analyses. However, the detection of positive selection signals in non-coding elements also highlights driver events that contribute to tumor development³². We analyzed non-synonymous somatic mutations from WGS data across 33 cancer types generated by TCGA, along with five genomic elements provided by the PCAWG³³. When we aligned mutations to both coding and non-coding genomic elements, we found that 94.3% of mutations were located within these elements, with the following distribution: CDS (41.2%), promoters (10.3%), splice sites (32.9%), 3’UTRs (11.3%), and 5’UTRs (4.3%) (Fig. 2A). Therefore, considering the functional impact of mutation enrichment within both coding and non-coding elements, we developed geMER, a method that quantifies the enrichment of mutations and detects enrichment regions across genomic elements to identify candidate driver genes (see methods, Fig. 1). For 33 cancer types from TCGA, we identified a total of 16,667 candidate drivers out of 22,026 eligible unique genes with 2.54 million somatic mutations for analysis, including 15,270 in CDS, 5705 in promoters, 13,784 in splice sites, 8217 in 3’UTRs, and 3387 in 5’UTRs (adj.p < 0.05, Fig. 2B).
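The notion of a mutation enrichment region can be illustrated with a simplified running-sum statistic in the spirit of a Kolmogorov-Smirnov test. The sketch below is not MSEA-clust or geMER itself: the function name `enrichment_region`, the 0-based coordinates, and the uniform background expectation are illustrative assumptions. It walks along an element, adds the observed mutation fraction at each position, subtracts the fraction expected under a uniform background, and reports the interval where the running sum rises the most.

```python
from typing import List, Tuple


def enrichment_region(positions: List[int], length: int) -> Tuple[int, int, float]:
    """Return (start, end, score) of the interval with the largest excess of
    observed mutations over a uniform expectation (0-based, inclusive).

    Hypothetical helper for illustration only; not the MSEA-clust algorithm.
    """
    if length <= 0 or not positions:
        raise ValueError("need a positive length and at least one mutation")

    # Count mutations per position along the element.
    counts = [0] * length
    for p in positions:
        if not 0 <= p < length:
            raise ValueError(f"position {p} outside element of length {length}")
        counts[p] += 1

    n_mut = len(positions)
    best_score = float("-inf")
    best_start = best_end = 0
    running = 0.0
    run_start = 0

    # Maximum-sum interval (Kadane's algorithm) over observed-minus-expected steps.
    for i, c in enumerate(counts):
        step = c / n_mut - 1.0 / length
        if running <= 0:
            running, run_start = step, i  # restart the candidate interval here
        else:
            running += step
        if running > best_score:
            best_score, best_start, best_end = running, run_start, i

    return best_start, best_end, best_score


if __name__ == "__main__":
    # Five clustered mutations plus two scattered ones on a 100-bp element.
    print(enrichment_region([50, 51, 52, 53, 54, 10, 90], 100))
```

In a real analysis the score would still need a significance estimate, for example by permuting mutation positions, and multiple-testing correction across genes; both are omitted here.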

**Fig. 2: Identification of candidate drivers for coding and non-coding elements across 33 cancer types.**

To benchmark geMER, we detected mutation enrichment regions within CDS using geMER across five independent healthy cohorts. The mutation record counts of these healthy cohorts were comparable to or exceeded those in the corresponding cancer types from TCGA. The COSMIC CGC, a catalog of driver genes whose mutations are causally linked to the development of cancer³⁴, served as the gold standard for CDS drivers. By comparing the proportion of CGC driver genes between the healthy and cancer cohorts, we observed that the proportion in the healthy cohorts was consistently and significantly lower than in the corresponding cancer cohorts (Fig. 2C). This finding underscores the capability of geMER to identify potential driver genes specific to tumors. Additionally, we evaluated candidate drivers for genomic elements called by geMER against three other published genome-wide detection tools: ActiveDriverWGS³, OncodriveFML¹⁵, and DriverPower³⁵. For comparison, we supplied the same mutation data to each tool and detected drivers using their default filter criteria (DriverPower, q < 0.1; OncodriveFML, q < 0.1; ActiveDriverWGS, FDR < 0.05). We evaluated two metrics: the enrichment of CGC genes in candidate drivers as a proxy for sensitivity, and the F1 score as a balanced indicator of sensitivity and specificity. We found that geMER outperformed other methods across most cancer types, particularly in PRAD, READ, and OV (Fig. 2D, E). Additionally, the number and proportion of CGC genes in the geMER results were relatively higher (Fig. 2F). Furthermore, we noticed a substantial overlap in candidate drivers for genomic elements between geMER and other methods, especially ActiveDriverWGS and DriverPower (Fig. 2G). When geMER was applied to the COSMIC dataset to identify candidate CDS drivers across 34 cancer types, it successfully identified candidate drivers in 27 cancer types. High enrichment of known cancer genes and strong F1 scores were observed across cancer types (Fig. S1). In conclusion, geMER effectively identifies candidate drivers for both coding and non-coding elements in various cancers.
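The two benchmarking metrics can be made concrete with a small sketch. The exact definitions used in the study are given in its methods and are not reproduced here, so the code below assumes the conventional ones: fold enrichment is the fraction of reference genes among the calls divided by their fraction in the background, and F1 is the harmonic mean of precision and recall. The function name and the toy gene lists are hypothetical.

```python
from typing import Dict, Set


def benchmark_against_reference(
    predicted: Set[str], reference: Set[str], background: Set[str]
) -> Dict[str, float]:
    """Score a set of called genes against a reference catalog (e.g. CGC).

    Assumed, conventional definitions; not necessarily those of the paper.
    """
    # Restrict both sets to genes that were actually testable.
    predicted = predicted & background
    reference = reference & background
    if not predicted or not reference:
        raise ValueError("predicted and reference must overlap the background")

    true_pos = len(predicted & reference)
    precision = true_pos / len(predicted)
    recall = true_pos / len(reference)
    f1 = 0.0 if true_pos == 0 else 2 * precision * recall / (precision + recall)
    # Fraction of reference genes among calls relative to the background rate.
    fold_enrichment = precision / (len(reference) / len(background))

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "fold_enrichment": fold_enrichment,
    }


if __name__ == "__main__":
    # Toy example: ten testable genes, three of them in the reference set.
    background = {"TP53", "KRAS", "BRAF", "TTN", "APOB", "G6", "G7", "G8", "G9", "G10"}
    calls = {"TP53", "KRAS", "TTN", "APOB"}
    catalog = {"TP53", "KRAS", "BRAF"}
    print(benchmark_against_reference(calls, catalog, background))
```

Because any reference catalog is incomplete, calls outside it count as false positives here even though some may be genuine drivers, so F1 computed this way is a conservative estimate.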

Candidate drivers were associated with dysregulated expression and patient prognosis

We conducted a deeper biological function analysis of the candidate element drivers identified by geMER. The abnormalities in mRNA and protein expression regulation are hallmarks of many human diseases including cancer³⁶. Therefore, for each driver, we performed four types of comparison analysis. Firstly, we compared the expression level between healthy and tumor samples using the GTEx and TCGA datasets (Fig. S2A). Next, we grouped tumor patients into mutant- and wild-type groups based on their mutation status to compare gene expression levels and overall survival (OS). Finally, we stratified patients into high- and low-expression groups according to the median expression level and compared intergroup survival outcomes. In total, 12,996 out of 16,667 candidate drivers were significantly associated with either dysregulated expression or patient prognosis in at least one cancer type. Candidate element drivers affecting expression or survival outcomes could be found in numerous cancers (Fig. S2B). For most cancer types, the candidate drivers displayed the highest proportions of differential expression between normal and tumor groups, underscoring their driving roles in cancer progression (Fig. 3A).
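The two patient groupings described above, by mutation status and by median expression, can be sketched as follows. The helper names are hypothetical, and the handling of samples exactly at the median (assigned to the low group here) is an assumption the text does not specify. The resulting groups would then be passed to an expression or survival comparison, which is omitted.

```python
from statistics import median
from typing import Dict, Iterable, List, Set, Tuple


def split_by_mutation(samples: Iterable[str], mutated: Set[str]) -> Tuple[List[str], List[str]]:
    """Return (mutant, wild_type) sample lists for one gene."""
    ordered = sorted(samples)
    mutant = [s for s in ordered if s in mutated]
    wild_type = [s for s in ordered if s not in mutated]
    return mutant, wild_type


def split_by_median(expression: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Return (high, low) sample lists using the median as the cutoff.

    Samples equal to the median go to the low group (an assumption).
    """
    cutoff = median(expression.values())
    high = sorted(s for s, v in expression.items() if v > cutoff)
    low = sorted(s for s, v in expression.items() if v <= cutoff)
    return high, low


if __name__ == "__main__":
    expr = {"P1": 1.0, "P2": 5.0, "P3": 3.0, "P4": 9.0}
    print(split_by_median(expr))
    print(split_by_mutation(expr, {"P2"}))
```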

**Fig. 3: Functional validation of candidate drivers for five genomic elements across 33 cancer types.**

Since a gene can be a candidate for multiple genomic elements, we focused on unique drivers for each cancer type, regardless of the elements (Supplementary Data 1). For the above four types of analysis, three genes were significant in all analyses within their respective cancers (Fig. S2C). Among them, PDGFRA is a well-known oncogene for glioma³⁷, showing significantly upregulated expression in LGG. High PDGFRA expression was identified as a high-risk factor for patients with advanced tumors (Fig. 3B). Similarly, PPL was significantly upregulated in OV, and its high expression level was linked to poor prognosis, suggesting its potential as an oncogene (Fig. 3C). Consistently, overexpression of PPL has been reported to be associated with worse survival outcomes in ovarian cancer patients³⁸. We also observed a higher PDGFRA protein expression level in glioma compared to normal tissue, as evidenced by immunohistochemical (IHC) results from The Human Protein Atlas³⁹ (Fig. 3E). PPL also showed higher protein expression in ovarian cancer (Fig. 3F). PDGFRA mutations clustered within the functional protein domains Ig1_PDGFR-alphabeta and Ig4_PDGFR-alpha in LGG. PPL mutations were concentrated in the SPEC domain in OV (Fig. S2D). In contrast, LAG3 was significantly downregulated in UCEC. Patients with high LAG3 expression showed a better survival outcome, suggesting that LAG3 was a probable tumor suppressor gene (Fig. 3D). Consistently, the expression of LAG3 has been associated with prognosis in a range of tumor types⁴⁰. Numerous missense mutations clustered in its Ig-like domain, indicating that these mutations may impact the immune system and contribute to cancer suppression (Fig. 3G).

In addition, mutations of PDGFRA, PPL, and LAG3 were significantly related to their expression levels. Mutations of PDGFRA and PPL were linked to poorer survival outcomes, while patients with LAG3 mutations showed an improved prognosis (Fig. 3B–D). MEXPRESS⁴¹ was subsequently employed to visualize the complex interplay among their expression, copy number variations, DNA methylation, and somatic mutations. The summary view results confirmed a significant correlation (p-value < 0.05) between mRNA expression and methylation levels of LAG3 in UCEC and PDGFRA in LGG (Fig. 3H and Fig. S2E). Their expression was significantly positively correlated with copy number variations and OS while showing negative correlations with numerous promoter methylation probes. Additionally, increased expression was associated with a higher number of somatic mutations (Fig. 3I and Fig. S2F). The observed correlations and dysregulation across multiple omics layers provide strong evidence that the candidate drivers are likely involved in critical mechanisms of cancers and play a pivotal role in tumorigenesis.

Identification and mutation landscape overview of a core driver gene set

To capture the biological processes regulated comprehensively by candidate element drivers, we performed an integrative pathway enrichment analysis across 33 cancer types. In total, 97 significantly enriched pathways with 2497 unique functional drivers were identified across 25 cancer types (Supplementary Data 2). As expected, major pathways were co-supported by coding and non-coding elements, highlighting the importance of both coding and non-coding drivers in cancers (Fig. S3A, B). Fifteen pathways closely associated with tumor processes were found to be shared across 22 cancer types (Fig. S3C). For example, widespread genomic alterations in axon guidance genes have been implicated in pancreatic ductal adenocarcinoma, joining growing evidence in other cancers^42,43. The EGFR signaling pathway, known for its oncogenic role, contains mutations that predict responses to EGFR tyrosine kinase inhibitors in lung cancer patients⁴⁴. FGFR signaling has been shown to promote tumorigenesis by facilitating proliferation, migration, and invasion⁴⁵. In addition, more than 20 cancer-specific pathways were found to be activated in eight cancer types, potentially influencing cancer onset and progression (Fig. S3D). For instance, BMP signaling has been shown to promote primary colon cancer growth in vivo and might enhance invasion in vitro⁴⁶. Notch1 signaling was found to inhibit the growth of esophageal carcinoma cells by downregulating gene expression⁴⁷. These findings suggest that mutations in functional drivers may contribute to cancer by modulating key molecular pathways involved in tumor initiation, progression, and treatment.

We identified a core driver gene set (CDGS) for 25 cancer types based on functional drivers; it included 10 known cancer genes from the CGC gene set (see methods, hypergeometric test p < 0.05) (Fig. 4A and Supplementary Data 3). We explored the mutation landscape of CDGS to investigate its genomic characteristics. The overall DNA alteration level of CDGS ranged from 9% to 38% (Fig. S3E). CDGS genes exhibited a higher proportion of somatic mutations and were more prone to amplification or deep deletion, indicating their genomic instability in DNA aberrations. Relative distribution analysis of CDGS revealed that somatic mutations in CDGS occurred most frequently in UCEC and COAD, whereas amplifications or deep deletions were more prevalent in OV and BRCA (Fig. 4B). Additionally, we observed cancer-specific alterations in individual CDGS genes. For instance, TTN showed a higher mutation frequency in SKCM and UCEC, BRAF amplification was particularly high in OV, and TP53 deep deletions predominantly occurred in PRAD. CDGS genes significantly co-occurred across pan-cancer and individual cancer types, revealing a tendency to activate collaborating oncogenic pathways⁴⁸ (Fig. 4C and Fig. S3F). KEGG pathway enrichment analysis found that CDGS genes co-regulate key biological processes of various cancer types (Fig. 4D).
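The hypergeometric test mentioned above asks how likely it is to see at least 10 catalog genes in a set of 25 if the set were drawn at random from the background. A minimal standard-library sketch of that upper-tail probability is shown below. The function name is hypothetical, and the background size and catalog size in the usage example are placeholders, since the values used in the study come from its methods and are not stated in this section.

```python
from math import comb


def hypergeom_upper_tail(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) when drawing `draws` items without replacement from a
    population of size `population` that contains `successes` marked items."""
    if not (0 <= successes <= population and 0 <= draws <= population):
        raise ValueError("inconsistent hypergeometric parameters")

    upper = min(successes, draws)
    total = comb(population, draws)
    # Exact integer arithmetic avoids floating-point underflow for large sets.
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(max(k, 0), upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Placeholder values, NOT taken from the paper: background of 20,000 genes
    # and a reference catalog of 700 genes; 10 of 25 core genes are in it.
    assumed_background = 20_000
    assumed_catalog_size = 700
    print(hypergeom_upper_tail(10, assumed_background, assumed_catalog_size, 25))
```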

**Fig. 4: Identification and mutation landscape of CDGS across 25 cancer types.**

Mutation frequency alone is not a reliable indicator of a gene being a driver, as it can be influenced by the gene’s size. For example, TTN had the highest mutation frequency among CDGS genes, but this may be attributable to its large gene length of 281,434 bp. Therefore, we identified Pfam domains affected by CDGS variants, aiming to provide a more nuanced understanding of their functional impact. Somatic mutations in CDGS impacted 87 distinct Pfam domains, of which 16 were affected in over 20 cancer types (Fig. 4E and Supplementary Data 4). Notably, numerous mutations occurred in the fibronectin-type III (FN3) domain and Immunoglobulin I-set (I-set) domain of TTN. TTN mutations have been reported as key intra-tumor potential drivers for MIP/SOL components in lung adenocarcinoma⁴⁹. In this study, TTN appeared to function as a potential tumor suppressor gene (TSG), showing lower expression in LUAD compared to normal tissues (log2FC = −0.66, adj.p = 1.99e−67). Mutations in the enrichment region of TTN were predominantly clustered within the I-set rather than the FN3 domain in LUAD (Fig. 4F). I-set domains are involved in several critical functions, including immune system regulation akin to antibody functions⁵⁰. The variants p.S9785F, p.I9761T, p.D9826N, and p.V9772L in the TTN enrichment region occurred in an I-set domain whose structure could be accurately modeled by SWISS-MODEL with 100% template protein coverage. Notably, the p.S9785F variant replaced serine with phenylalanine and the p.I9761T variant replaced isoleucine with threonine, altering the local side-chain polarity and physicochemical properties of the original amino acid. These changes were likely to affect the folding pattern of the I-set domain, potentially impairing its antigen-binding capacity in LUAD.
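The polarity argument for the four TTN variants can be checked with a small lookup. The sketch below parses single-letter protein variant notation and compares side-chain classes using one common textbook grouping of amino acids. Groupings differ between sources for a few residues (for example glycine, cysteine, and tyrosine), and the helper name `polarity_change` is hypothetical. This is a rough screen, not a substitute for structural modeling.

```python
import re
from typing import Tuple

# One common textbook grouping of side chains (an assumption; sources vary).
SIDE_CHAIN_CLASS = {
    **dict.fromkeys("GAVLIMPFW", "nonpolar"),
    **dict.fromkeys("STCNQY", "polar"),
    **dict.fromkeys("DE", "acidic"),
    **dict.fromkeys("KRH", "basic"),
}

VARIANT_PATTERN = re.compile(r"^p\.([A-Z])(\d+)([A-Z])$")


def polarity_change(variant: str) -> Tuple[str, str, bool]:
    """Return (reference_class, alternate_class, changed) for a variant
    written like 'p.S9785F'."""
    match = VARIANT_PATTERN.match(variant)
    if match is None:
        raise ValueError(f"unsupported variant notation: {variant}")
    ref, _position, alt = match.groups()
    try:
        ref_class, alt_class = SIDE_CHAIN_CLASS[ref], SIDE_CHAIN_CLASS[alt]
    except KeyError as exc:
        raise ValueError(f"unknown amino acid code in {variant}") from exc
    return ref_class, alt_class, ref_class != alt_class


if __name__ == "__main__":
    for v in ("p.S9785F", "p.I9761T", "p.D9826N", "p.V9772L"):
        print(v, polarity_change(v))
```

Under this grouping, p.S9785F and p.I9761T switch between polar and nonpolar classes, p.D9826N loses an acidic side chain, and p.V9772L stays within the nonpolar class.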

Genetic alterations in CDGS may affect transcription and modulate epigenetic regulation

To systematically investigate the cancer-driving mechanisms of somatic mutations in CDGS across multi-omics levels, we first compared the expression levels of CDGS in tumor versus normal tissues across 25 cancer types. This comparison revealed a predominantly dysregulated expression pattern of CDGS genes and supported their important role in driving cancer development (Fig. 5A). We observed a significant difference in the average expression levels of CDGS between mutant- and wild-type groups in the pan-cancer cohort (Fig. 5B). Specifically, CDGS mutations were associated with decreased expression in BLCA, PAAD, and STAD, and with increased expression in READ and THCA (Fig. S4A). In the gene-level analysis of CDGS, we evaluated the proportion of patients with both mutations and altered expression levels (Z-score > 2 for upregulation or Z-score < -2 for downregulation) by jointly analyzing genetic and transcriptomic information from the same samples. TP53, KRAS, COL6A3, and APOB showed a significant proportion of mutated samples with expression abnormalities in eight cancer types (Fig. S4B). Notably, APOB, whose role in tumors has not been well established, showed the highest proportion in LIHC. The expression level of APOB was significantly higher in LIHC compared to normal tissues (Fig. 5C), suggesting that APOB may function as a potential oncogene. Furthermore, patients with APOB mutations showed a worse prognosis than the wild-type patients (Fig. 5D).
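The Z-score rule above (greater than 2 for upregulation, less than -2 for downregulation) can be sketched as follows. The reference distribution used to compute Z-scores is not stated in this section, so the code assumes a user-supplied list of reference expression values and uses the sample standard deviation; both function names are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List, Set


def classify_expression(
    expression: Dict[str, float], reference: List[float], cutoff: float = 2.0
) -> Dict[str, str]:
    """Label each sample 'up', 'down', or 'unchanged' by Z-score against
    a reference distribution (assumed; the paper's choice is not shown)."""
    mu = mean(reference)
    sd = stdev(reference)
    if sd == 0:
        raise ValueError("reference values have zero variance")

    calls = {}
    for sample, value in expression.items():
        z = (value - mu) / sd
        if z > cutoff:
            calls[sample] = "up"
        elif z < -cutoff:
            calls[sample] = "down"
        else:
            calls[sample] = "unchanged"
    return calls


def altered_fraction_in_mutants(calls: Dict[str, str], mutated: Set[str]) -> float:
    """Fraction of mutated, expression-profiled samples with altered expression."""
    profiled = [s for s in mutated if s in calls]
    if not profiled:
        return 0.0
    return sum(calls[s] != "unchanged" for s in profiled) / len(profiled)


if __name__ == "__main__":
    reference = [1.0, 2.0, 3.0, 4.0, 5.0]
    tumor = {"A": 7.0, "B": 3.0, "C": -1.0, "D": 10.0}
    calls = classify_expression(tumor, reference)
    print(calls, altered_fraction_in_mutants(calls, {"A", "B", "C"}))
```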

**Fig. 5: Multi-omics analysis of CDGS across 25 cancer types.**

To further investigate the relationship between the variants and other omics of APOB in LIHC, we modeled the causal regulatory relationships by integrating transcription, DNA methylation, histone modification, and TF data from ENCODE⁵¹. We categorized 25 cancer types into three groups (Supplementary Data 5): Driver&Same-Region, cancers where APOB was identified as a driver with a mutation enrichment region covering that of LIHC; Driver&Different-Region, cancers where APOB was a driver but with a mutation enrichment region distinct from that of LIHC; and Non-Driver, cancers where APOB was not a driver. Compared to G401, Caki2, SK-N-DZ, and Panc1 of the Non-Driver group, APOB transcription levels were significantly higher in HepG2 cells for LIHC, particularly in the mutation enrichment region (Fig. 5E and Fig. S4C), indicating that variants in the enrichment region may contribute to the upregulation of APOB expression. Histone modifications such as H3K4me3 and H3K27ac, which mark active promoters to increase gene expression⁵², showed stronger signals near the APOB promoter region in HepG2 cells compared to Panc1 cells (Fig. 5E, F). Conversely, histone markers of transcriptional repression, including H3K27me3 and H3K9me3⁵³, exhibited weak signals across the APOB gene body in HepG2 cell lines. Additionally, H3K36me3 and H3K4me1, which activate transcription by marking the gene body and active enhancer regions, respectively⁵², displayed consistently stronger signals in HepG2 than in Panc1 cell lines. We searched the TRRUST database⁵⁴ for TFs regulating APOB, which included FOXA1, HNF4A, NR2F1, NR2F6, PPARA, and PPARG. The TFs targeting the enrichment region (erTFs) of APOB in LIHC were predicted by the Cistrome Data Browser⁵⁵ (Fig. S4D). Although no significant difference was found in the association between TFs and APOB expression between the Driver&Same-Region and Non-Driver groups, we observed significantly positive correlations between the expression of APOB and that of the erTFs CEBPA and NR3C1 in a higher number of cancers within the Driver&Same-Region group than in the Driver&Different-Region group. This suggests a potential activating influence of mutations within the enrichment region of APOB on CEBPA and NR3C1 in LIHC (Fig. 5G and Fig. S4E). This enhanced activity was also reflected by their signals covering the promoter and enrichment regions of APOB in HepG2 cell lines. A similar signal pattern of APOB was observed in A549 cells for LUSC from the Driver&Same-Region group (Fig. S4F). These findings suggest that mutations in the APOB enrichment region may upregulate transcription, activate key TFs, and influence crucial histone modifications.

Additionally, we found significantly different methylation levels of CDGS genes when comparing tumor tissues to normal tissues (Fig. 5H). In pan-cancer analysis, the average methylation level of CDGS was significantly lower in the mutant group (Fig. S4G). Mutations of CDGS were also associated with decreased methylation levels across eight specific cancer types (Fig. S4H). Meanwhile, we observed methylation signals of APOB in HepG2 and A549 but not in Panc1 cell lines (Fig. 5E, F and Fig. S4F), suggesting that mutations of APOB may influence its methylation level. Notably, this apparent discrepancy in the direction of the methylation effect arises from the distinct contexts in which APOB and CDGS mutations were analyzed. The objective of the APOB methylation analysis was to investigate the potential impact of mutations within enrichment regions on cancer development. In contrast, the global analysis of CDGS mutations reflects a broader epigenetic effect of the gene set as a collective group.

Meanwhile, RYR3 was found to be downregulated in LUSC, PAAD, and LIHC compared to normal tissues, indicating its potential role as a TSG (Fig. 5A). RYR3 was identified as a candidate driver gene by geMER in LUSC and PAAD, but not in LIHC. Compared to the HepG2 cell line, the signals of transcription, DNA methylation, and histone modifications that promote transcription were weaker, while histone modifications that suppress transcription were stronger in A549 and Panc1 cell lines (Fig. S4I). These observations suggest that mutations within the enrichment regions of RYR3 probably reduced its transcriptional levels, DNA methylation, and histone modifications that activate expression, while enhancing histone modifications that inhibit expression. These findings provide novel mechanistic insights into the role of RYR3 as a potential driver across various cancer types. RELN, recognized as a candidate driver gene across LIHC, LUSC, and PAAD, displayed similar transcriptional, DNA methylation, and histone modification patterns in all three cell lines (Fig. S4J). This demonstrated that mutations within the enrichment regions of RELN may have similar regulatory mechanisms across these cancers.

Cross-talk among CDGS genes and functional consequences of CDGS mutations

We observed that the members of CDGS featured frequent interactions within protein-protein interaction (PPI) networks (Fig. 6A). Their expression levels were generally significantly positively correlated across pan-cancer and individual cancer types (Fig. 6B). These results indicated that the cross-talk among CDGS genes may contribute to the development and progression of multiple cancers.

**Fig. 6: Analysis of CDGS interaction and hallmark pathways.**

Functionally aberrant drivers synergistically or complementarily disrupt downstream pathways within biological networks, with some resulting in the activation of cancer hallmarks⁵⁶. To explore common functional consequences induced by CDGS mutations across pan-cancer, we downloaded and categorized 50 hallmark pathways of cancers from MSigDB⁵⁷ (Supplementary Data 6). These pathways were scored employing single-sample Gene Set Enrichment Analysis (ssGSEA) using RNA-seq data of the PANCAN cohort. We observed generally significant correlations between the expression levels of CDGS genes and pathway activity scores (Fig. 6C). We subsequently performed a differential pathway activity analysis comparing samples with CDGS mutations to wild-type samples. Activity scores for 47 out of 50 pathways were significantly different between groups, with 40 pathways demonstrating significant activation in the CDGS mutant group (Supplementary Data 7). Notably, the pathways most frequently upregulated were associated with cellular proliferation (Fig. 6D). Consistently, Gene Set Enrichment Analysis (GSEA) also indicated an upregulation of proliferation-related signatures in samples harboring CDGS mutations (Fig. 6E). These results contributed to a better understanding of the CDGS interplay and functional impact of their mutations on cancers.

CDGS mutations could be an independent predictor of prognosis for pan-cancer patients

We investigated the prognostic relevance of CDGS mutation status across 25 cancer types. Results revealed significant associations of CDGS mutations with four types of survival outcomes across pan-cancer cohorts, rather than within specific cancer types (Fig. S5A). To investigate the potential reason, we performed survival analyses by sequentially adding cancer types, starting with ACC. Consistent prognostic associations were observed when 16 or more cancer types were incorporated (Fig. S5B). To further evaluate the influence of specific cancer types, we conducted random sampling analyses across various numbers of cancer types, selecting from 2 to 25. For each list of cancers, we randomized 100 times and repeated four types of survival analysis. As the number of cancer types increased, the density of random instances linked to prognosis also rose, highlighting that the prognostic power of CDGS mutations is stronger and not influenced by specific cancer types (Fig. S5C). These results suggest that the lack of significant associations in certain cancer types is likely due to smaller sample sizes, with larger datasets or more cancer types strengthening the prognostic value of CDGS mutations.

To predict prognosis in the pan-cancer cohort based on CDGS mutations, we employed a randomized approach, dividing patients into training and testing sets in a seven-to-three ratio, repeated 100 times. For each randomization, patients were stratified into mutant- and wild-type groups based on CDGS mutation status. Subsequently, we assessed differences in survival outcomes between groups. Both the training and testing cohorts consistently demonstrated significant results across four types of survival outcomes in all 100 randomizations (Supplementary Data 8). For subsequent analysis, we designated the first randomized datasets as the training and internal validation cohorts, and 11 datasets representing eight different cancers from the ICGC as an external validation cohort. In both the training and validation cohorts, patients in the mutant group exhibited notably shorter OS compared to the wild-type group (Fig. 7A). Consequently, we classified the mutant group as high-risk and the wild-type group as low-risk. The poorer survival trends among high-risk patients were specifically observed in ACC, LGG, and PAAD (Fig. S6A).
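The repeated seven-to-three partition described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `repeated_splits`, the seed, and the use of integer patient identifiers are assumptions made for the example.

```python
import random
from typing import List, Sequence, Tuple


def repeated_splits(
    patient_ids: Sequence[int],
    n_repeats: int = 100,
    train_frac: float = 0.7,
    seed: int = 0,
) -> List[Tuple[List[int], List[int]]]:
    """Return n_repeats random (training, testing) partitions of the patients."""
    rng = random.Random(seed)  # fixed seed so the partitions are reproducible
    ids = list(patient_ids)
    n_train = round(train_frac * len(ids))
    splits = []
    for _ in range(n_repeats):
        shuffled = ids[:]
        rng.shuffle(shuffled)
        # First 70% of the shuffled list is training, the remainder is testing.
        splits.append((shuffled[:n_train], shuffled[n_train:]))
    return splits
```

Each partition would then be stratified by CDGS mutation status and passed to the survival comparison; that step is not shown here.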

**Fig. 7: Survival analysis of CDGS mutations.**

In the training cohorts and the entire TCGA cohort, univariate Cox regression analysis demonstrated a significant correlation between CDGS mutations and OS (p-value < 0.001). Upon adjusting for age, gender, and cancer stages using multivariate Cox regression analysis, CDGS mutations remained an independent predictive factor for prognosis (Fig. 7B and Fig. S6B). These findings underscore the robustness of CDGS mutation status as an independent prognostic indicator for the pan-cancer cohort, further supporting its clinical relevance in prognostic assessment. Comparative analysis of clinical characteristics revealed that patients categorized as high-risk had significantly higher ages and a greater proportion of advanced stage categories (stage M, T, N, and TNM) compared to the low-risk group (Fig. 7C and Fig. S6C). Stratification analysis further demonstrated that CDGS mutation status effectively stratifies patients into high- and low-risk groups in the same stratum of age, gender, and various cancer stages (N, M, T). Across each stratum, patients in the high-risk group consistently showed poorer OS outcomes (Fig. 7D and Fig. S6D). These findings highlight the prognostic utility of CDGS mutations across diverse clinical contexts and its potential as a predictive biomarker in prognosis for the pan-cancer cohort.

Predictive potential of CDGS mutations in immunotherapy outcomes for pan-cancer patients

Carcinogenesis is driven by interactions between genetic mutations and the local tumor microenvironment⁵⁸. Mutations in CDGS impacted immune-related domains, such as the I-set domain (Fig. 4E), and activated hallmark pathways associated with immunity or the immune response (Fig. 6D). Therefore, we compared the immune characteristics between high- and low-risk groups in the entire TCGA cohort containing 25 cancer types. We assessed immune cell infiltration using CIBERSORT and found that the abundances of 19 of 22 immune cell types differed significantly between the two groups (Fig. S7A). Patients in the low-risk group showed a predominance of various immune cells at the tumor site, including a higher degree of infiltration by M0 macrophages, naïve B cells, and activated CD4+ memory T cells, among others. In contrast, the high-risk group exhibited increased levels of immunosuppressive cells, such as M2 macrophages, activated NK cells, and neutrophils (Fig. 8A). To validate the accuracy of the CIBERSORT results, we employed the ssGSEA approach to quantify the enrichment levels of 28 immune signatures from He et al.⁵⁹ (Fig. S7B). The distribution of 14 immune cell types according to immune infiltration scores generated by Danaher et al.⁶⁰ was also evaluated (Fig. S7C). Consistently, patients in the high-risk group displayed a significantly higher infiltration of immunosuppressive cells, including M2 macrophages, dendritic cells, regulatory T cells, and Th1 cells. Additionally, MCPcounter analysis revealed that the high-risk group exhibited a significantly greater abundance of fibroblasts (Fig. 8B). Cancer-associated fibroblasts have recently been identified as key promoters of immune evasion⁶¹. This suggests that fibroblasts may facilitate extrinsic immune escape in patients within the high-risk group.

**Fig. 8: Immune profiles of high- and low-risk in 25 cancer types from the TCGA.**

We also calculated enrichment scores for 10 canonical oncogenic pathways from Sanchez-Vega et al.⁶² using ssGSEA and compared them between the high- and low-risk groups (Fig. 8C). The Wnt signaling and cell cycle pathways scored higher in the high-risk group, while the NRF2, PI3K, and RAS pathways scored higher in the low-risk group (Wilcoxon p < 0.05). The Wnt pathway has been linked to mechanisms of immunological exclusion⁶³. Aberrant activity of the cell cycle and its resulting genomic instability are hallmark features of cancer⁶⁴. The NRF2, PI3K, and RAS pathways are implicated in critical processes such as oxidative stress response, cell growth and metabolism, and cell signaling. For instance, NRF2 is a key transcription factor in the cellular response to oxidative stress, and its activation plays a crucial role in regulating antioxidant responses and maintaining cellular homeostasis⁶⁵. These findings highlight distinct immune profiles associated with CDGS mutation status, potentially influencing tumor immune microenvironment dynamics and immune responses in cancer.

Additionally, we compared the expression levels of immune checkpoint genes (ICGs) in the high-risk group relative to the low-risk group to comprehensively characterize the immune profile (Fig. S7D). Several key ICGs, including PD1, PD-L1, and LAG3, were found to be significantly upregulated in the high-risk group (Fig. 8D–F). Notably, the expression of CTLA-4 was markedly increased in the high-risk group, with a logFC of 1.02 and an adj.p of 1.95E-63 (Fig. 8G). These results suggest that the high-risk group may have a potential response to treatment with these immune checkpoint inhibitors (ICIs). Consequently, we investigated the predictive capacity of CDGS mutations regarding the response to ICIs using the training and testing cohorts from Long et al.³⁰. We observed a significantly higher proportion of DCB in the high-risk group compared to the low-risk group (Fisher’s exact test, p < 0.05) (Fig. 8H, I). After receiving immunotherapy, the survival curves of the high-risk and low-risk groups exhibited convergence. The Miao et al. dataset from the testing cohort exhibited consistent results regarding the proportion of responders and survival (Fig. 8J). Two other datasets demonstrated even higher survival rates in the high-risk group, further supporting the potential of CDGS mutations as predictive markers for ICI response across the pan-cancer cohort (Fig. 8K, L).
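The DCB comparison above relies on Fisher's exact test on a 2×2 table (risk group by response category). The sketch below shows one standard way to compute the two-sided p-value from the hypergeometric distribution; the function name is hypothetical, and any counts passed to it in an example are invented for illustration and are not taken from the study.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Rows could be high-risk / low-risk and columns DCB / NDB.
    The p-value sums the probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))
```

For a table with made-up counts such as `fisher_exact_two_sided(3, 1, 1, 3)`, the function sums the probabilities of the four tables that are at most as likely as the observed one.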

Discussion

Cancer diagnosis and therapy hinge on the identification and comprehensive understanding of driver genes that play a causal role in cancer evolution⁶⁶. Given that less than 2% of the human genome encodes proteins, non-coding regions, which contain important regulatory elements such as promoters, 5′UTR, 3′UTR, and splice sites, are of critical importance. Therefore, conducting effective simultaneous functional studies of mutations in both coding and non-coding regions is essential for advancing our understanding of cancer biology and development. In response, we have reported genes with mutation enrichment regions within coding and non-coding genomic elements as putative drivers and developed the geMER method to accurately identify them.

geMER can leverage the large numbers of somatic mutations generated by WGS technology and, when compared with other published methods for identifying driver genes across the genome, demonstrated superior efficacy in the majority of cancer types. Compared to existing tools such as DriverPower, OncodriveFML, and ActiveDriverWGS, geMER offers distinct advantages, particularly in identifying driver mutations in non-coding regions. DriverPower integrates mutational burden with functional impact (FI) information³⁵. OncodriveFML models local mutational backgrounds to compute functional mutation bias of genomic elements¹⁵. The accuracy of both tools relies on predicted FI scores, which are more developed for coding mutations and poorly assessed in non-coding regions. ActiveDriverWGS tests mutation enrichment in predefined genomic elements based on expected background mutation models, which may limit its sensitivity to rare mutation patterns³. In contrast, geMER is hypothesis-free and does not depend on prior functional annotations or background mutation rates. The calculation of MAS and MES relies solely on the genomic positions of mutations and their counts in tumor samples. This approach allows geMER to effectively identify driver mutations with low frequency, especially in non-coding regions where functional annotations and FI scores are sparse or lacking. Therefore, geMER complements existing methods by expanding the scope of driver mutation detection, with enhanced sensitivity for rare and non-coding mutations.

Additionally, understanding how driver mutations contribute to widespread transcriptional dysregulation in cancer remains a fundamental challenge in cancer genomics. geMER advances driver gene identification by explicitly integrating the concept of mutational convergence within functional genomic elements. It incorporates MAS and MES to assess not only the statistical significance of mutational clustering but also the enrichment level of mutations in regulatory sequence modules, such as promoters, UTRs, and splice sites. By concentrating on functional genomic contexts, geMER could prioritize candidate driver mutations that are more likely to impact gene regulation and downstream transcriptional programs. Ultimately, to substantiate the role of driver genes identified by geMER in malignant phenotypes, we conducted comprehensive biological validation across multiple cancer types. Our results demonstrated that a substantial proportion of candidate driver genes exhibited significant associations with aberrant gene expression patterns and clinical prognosis, supporting their potential roles in tumorigenesis. Notably, genes such as PDGFRA in LGG, PPL in OV, and LAG3 in UCEC consistently showed significant differences in expression and survival outcomes when comparing mutant- and wild-type groups. Taken together, geMER establishes a critical link between statistical mutational enrichment and functional interpretation by pinpointing mutation hotspots within regulatory elements that have plausible biological impacts. This approach offers novel mechanistic insights into how driver genes may induce transcriptional changes, ultimately contributing to malignant phenotypes.

Our current analysis focuses on five gene-proximal regions (promoters, 5′UTRs, 3′UTRs, CDS, and splice sites), which may overlook potentially impactful mutations in distal regulatory elements such as enhancers and intergenic regions. These long-range regulatory mutations have been shown to play critical roles in cancer development, and several recent studies have developed computational frameworks to identify them through integrative approaches. For example, bpb3 implemented a Bayesian biophysical model to identify functional mutation blocks (FMBs) by integrating genomic and transcriptomic data, particularly those with high-frequency SNVs, altered TF binding affinity, and proximity to the regulatory regions of differentially expressed genes²⁰. Similarly, the study by Fadason et al. incorporated DNA methylation, gene expression, chromatin organization, topologically associating domains (TADs), and mutations within enhancers to identify mutation block-gene pairs, thereby uncovering long-range regulatory mechanisms involved in cancer¹⁹. As geMER is designed to be hypothesis-free and annotation-independent, it provides a flexible foundation for expansion. Moving forward, geMER could be extended to incorporate distal and unannotated regulatory elements, thereby improving the identification of non-coding driver mutations located far from gene bodies.

We further identified CDGS by comprehensively integrating biological pathways, mutation frequencies, and the enrichment of the CGC gene set, thereby highlighting the most influential driver genes across multiple cancer types. To mitigate potential bias from frequently mutated genes that may not be true drivers, we employed geMER to identify candidate driver genes based on mutation enrichment rather than raw mutation frequency. This enrichment-based approach helps capture functionally relevant genes that may be infrequently mutated but still play a critical role, including those in non-coding regions. In addition, we incorporated pathway enrichment analysis to retain functional drivers with potential biological relevance. The inclusion of the CGC gene enrichment further validated the reliability of the CDGS identification. These combined strategies increase the likelihood that the genes identified as part of CDGS are genuine functional drivers.

The CDGS contained 10 known cancer genes from the CGC list, while the remaining genes have been reported as related to cancers through extensive studies. For instance, RYR3 has been associated with the risk and calcification of breast cancer⁶⁷. In this study, RYR3 was downregulated in BRCA compared to normal tissues, indicating its potential role as a TSG (Fig. 5A). Deep deletions of RYR3 were particularly observed in BRCA and OV (Fig. 4B). Similarly, DMD was identified as a potential TSG with its expression significantly downregulated across 24 cancer types. Amplification and deep deletions of DMD were frequently observed in OV and STAD. Genomic deletions of DMD have been characterized in a subgroup of progressive or higher-grade meningiomas with poor outcomes⁶⁸. The RELN pathway has been recognized as a target for diagnosis and therapy in pancreatic cancer⁶⁹. Consistently, our analysis identified RELN as a candidate driver gene in PAAD, where its expression was significantly elevated compared to normal tissues (Fig. 5A). Polymorphisms of COL6A3 have been potentially linked to lung cancer risk⁷⁰. Meanwhile, we found high proportions of patients with both COL6A3 mutation and expression dysregulation in multiple cancers, including LUAD (Fig. S4B). Additionally, LAMA1 was significantly upregulated in esophageal squamous cell carcinoma, correlating with the aggressive oncogenic phenotype⁷¹. This observation is consistent with our findings, as depicted in Fig. 5A. Thus, CDGS comprises both well-established and potential cancer drivers, whose mutations could significantly influence tumor occurrence and development.

In our study, we not only explored the potential driver mechanisms of CDGS at the pan-cancer level but also demonstrated its relevance to individual cancer types, such as breast cancer. Specifically, the relative distributions of somatic mutations and copy number alterations indicated a significant proportion of CDGS genes exhibit genomic instability in BRCA (Fig. 4B). Furthermore, CDGS genes were enriched in pathways associated with breast cancer (Fig. 4D). When comparing tumor tissue to normal tissue, nearly all CDGS genes exhibit significant differential expression in BRCA (Fig. 5A), with notable differences in their methylation levels (Fig. 5H). Interestingly, we identified a high proportion of COL6A3-mutated samples with expression dysregulation, indicating a potentially important role of COL6A3 in breast cancer progression (Fig. S4B). Additionally, the mutational status of CDGS was significantly associated with OS in BRCA patients (Fig. S5A). These findings underscore the utility of our pipeline in identifying novel pan-cancer driver genes, as well as its potential to offer valuable insights for clinical research and application in specific cancer types.

Finally, we confirmed that the collective mutation status of CDGS served as an independent prognostic factor across the pan-cancer cohort. These mutations were associated with distinct immune profiles, potentially influencing the tumor immune microenvironment and modulating immune responses. High-risk patients were more likely to develop an immunosuppressive microenvironment and showed potential responsiveness to ICI treatment. Looking ahead, we propose that future studies should incorporate pre-treatment survival data to directly evaluate the predictive value of CDGS mutations for immunotherapy outcomes.

In conclusion, the geMER method effectively identified potential driver genes for various cancer types by detecting mutation enrichment regions within coding and non-coding genomic elements. The CDGS shared by multiple cancers offered valuable insights into the underlying oncogenic mechanisms, and its mutations accurately predicted patient survival outcomes and informed decisions related to immunotherapy strategies.

Methods

Data resources

The multi-omics data for 33 cancer types from The Cancer Genome Atlas (TCGA) project were downloaded from the UCSC Xena Database (https://xenabrowser.net/datapages/). These comprehensive datasets encompassed somatic mutations of whole cancer genomes generated by MuTect2 via the DNA-Seq Variant Calling Pipeline, RNA-sequencing (RNA-seq) data produced using the Illumina HiSeq platform and normalized through the log2(fpkm+1) transformation, copy number variation generated using the Illumina platform and computed by GISTIC2, DNA methylation data generated on the Illumina Human Methylation 450K platform, and clinicopathological information. In addition to individual cancer types, we downloaded RNA-seq and methylation data of the PANCAN cohort and extracted samples of the 25 cancer types associated with CDGS for pan-cancer analysis.

The genomic element set, containing the coding sequences (CDS) (n = 20,185), splice sites (n = 18,729), 3′ untranslated regions (3′UTRs) (n = 19,369), 5′ untranslated regions (5′UTRs) (n = 19,188), and promoters (n = 20,164), was generated by PCAWG and downloaded from DriverPower (https://github.com/smshuai/DriverPower/tree/master/data)³⁵. Their genome coordinates were converted from GENCODE v37 to GENCODE v38 using the online UCSC Liftover tool to align them with somatic mutations from TCGA.

RNA-Seq data for normal tissues of healthy individuals was from the Genotype-Tissue Expression (GTEx) dataset (https://commonfund.nih.gov/GTEx/). For the purpose of comparing the expression levels with tumor tissues from TCGA, the GTEx expression values were converted from log2(fpkm+0.01) to log2(fpkm+1).
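The rescaling of GTEx values can be written as a two-step back-transformation. The following sketch assumes the stored value is exactly log2(fpkm+0.01); the function name and the clamp at zero (to absorb rounding in stored values) are illustrative assumptions rather than part of the published pipeline.

```python
import math


def gtex_to_tcga_scale(x: float) -> float:
    """Convert a value on the log2(FPKM + 0.01) scale to log2(FPKM + 1)."""
    fpkm = 2.0 ** x - 0.01       # undo the GTEx pseudocount
    fpkm = max(fpkm, 0.0)        # assumption: clamp tiny negatives from rounding
    return math.log2(fpkm + 1.0)  # apply the TCGA pseudocount
```

For example, a gene with FPKM = 3 is stored in GTEx as log2(3.01) and maps to log2(4) = 2 after conversion.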

The somatic mutations of 34 cancer types were downloaded from the Catalogue Of Somatic Mutations In Cancer (COSMIC) dataset (https://cancer.sanger.ac.uk/cosmic), which is the world’s most comprehensive resource for investigating the somatic mutations’ influence on human tumors⁷².

Multi-omics signal abundances of various cancer cell lines were downloaded from the Encyclopedia of DNA Elements (ENCODE) database⁵¹. Specifically, we downloaded transcriptomic data of the G401 and Caki2 cell lines derived from KIRP or KIRC, and the SK-N-DZ cell line derived from LGG. For PAAD, we downloaded transcriptomic, DNA methylation, and histone modification data from the Panc1 cell line. For LIHC and LUAD, we downloaded transcriptomic, DNA methylation, histone modification, and transcription factor binding data from the HepG2 and A549 cell lines, respectively.

Eleven datasets of eight cancer types simultaneously containing somatic mutations and prognostic information as a prognosis validation set were from the International Cancer Genome Consortium (ICGC) (https://dcc.icgc.org/). These datasets included COAD-COCA-CN, KIRC-RECA-EU, LIHC-LICA-CN, LIHC-LINC-JP, LIHC-LIRI-JP, KIRC-RECA-EU, PAAD-PACA-AU, PAAD-PACA-CA, SKCM-SKCA-BR, STAD-GACA-CN and UCS-UTCA-FR.

The training and testing cohorts, including clinical information for patients undergoing ICI immunotherapy, were from Long et al.³⁰. Mutation data of the training cohort TMB and Immunotherapy (MSK, Nat Genet 2019)⁷³ were downloaded from the cBioPortal database (https://www.cbioportal.org). For the testing cohort, we retained the clinical information provided by Van Allen et al.⁷⁴, Liu et al.⁷⁵, Miao et al.⁷⁶,⁷⁷, Riaz et al.⁷⁸, and Hugo et al.⁷⁹ for patients whose mutation data and ICI immunotherapy response outcomes were accessible. Following Long et al.³⁰, we categorized the immunotherapy response outcomes as durable clinical benefit (DCB), comprising complete response (CR), partial response (PR), or stable disease (SD) with progression-free survival (PFS) lasting > 6 months, and no durable benefit (NDB), comprising progressive disease (PD) or SD with PFS lasting ≤ 6 months.
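The response categories above amount to a small decision rule. The sketch below encodes one reading of it, under the assumption that the 6-month PFS threshold applies only to stable disease; the function name and the string codes are hypothetical.

```python
from typing import Optional


def classify_response(best_response: str, pfs_months: Optional[float]) -> str:
    """Map a best-response code and PFS (months) to 'DCB' or 'NDB'."""
    code = best_response.upper()
    if code in ("CR", "PR"):
        return "DCB"  # complete or partial response
    if code == "PD":
        return "NDB"  # progressive disease
    if code == "SD":
        if pfs_months is None:
            raise ValueError("PFS is required to classify stable disease")
        # Assumption: the > 6 month criterion is applied to SD only.
        return "DCB" if pfs_months > 6 else "NDB"
    raise ValueError(f"unknown response code: {best_response}")
```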

Identification of candidate drivers for cancers across genomic elements by geMER

The mutation enrichment research of genomic elements (geMER) was established by integrating nonsynonymous somatic mutations and the sequence information of genomic elements. First, nonsynonymous somatic mutations were aligned to five genomic elements. For each genomic element, we retained genes with mutations observed in at least four tumor samples within each cancer type. Next, the mutation enrichment regions for genomic elements were detected by the mutation set enrichment analysis (MESA-clust) [20]. Specifically, geMER simulates a walker moving along the sequence of a given genomic element while recording a mutation accumulation score (MAS), which starts at 0. The change of MAS at each genomic site depends on the occurrence and frequency of mutations at that site in tumor patients. The maximum variation of MAS highlights the location and magnitude of mutation enrichment. At each genomic site, MAS increases if a mutation is observed in patients and decreases otherwise. The increment (${S}_{{inc}}$) and decrement (${S}_{{dec}}$) of MAS are separately calculated by:

$${S}_{{inc}}=1/\sum Y,\quad \text{and}$$

(1)

$${S}_{{dec}}=1/Z$$

(2)

where Y represents a vector that records the number of mutations at each genome site, and L denotes the length of the genomic element sequence, that is, Y = (${y}_{1}$, …, ${y}_{L}$). Z denotes the number of non-mutated positions. The sum of increments is equal to the sum of decrements, both amounting to 1. Thus, MAS at the ${i}^{{th}}$ position is calculated by

$${{MAS}}_{i}=\sum _{j\in {L}^{M},j\le i}{y}_{j}\times {S}_{{inc}}-\sum _{j\notin {L}^{M},j\le i}{S}_{{dec}}$$

(3)

where ${L}^{M}$ denotes a vector of positions where mutations occur, and the increment at a mutant position is calculated by ${y}_{j}\times {S}_{{inc}}$, where 1 ≤ j ≤ L and $j\in {L}^{M}$. The decrement at positions without mutations is consistently ${S}_{{dec}}$. Therefore, the MAS profile along the genomic element sequence is expected to increase sharply over a short distance in regions with mutation enrichment. Thus, the Mutation Enrichment Score (MES) is defined as the maximum deviation of MAS, which represents the enrichment region of mutations and the abundance of mutation enrichment:

$${MES}=\max \left({MAS}\right)-\min \left({MAS}\right)$$

(4)
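Equations (1) to (4) can be illustrated with a short sketch. It is a minimal reading of the definitions above and not the authors' R/Java implementation; the function names and the handling of an element with no unmutated positions (decrement set to 0) are assumptions.

```python
from typing import List, Sequence


def mas_profile(y: Sequence[int]) -> List[float]:
    """Mutation accumulation score at each position of one genomic element.

    y[j] is the number of mutations observed at position j (the vector Y).
    """
    total = sum(y)
    if total == 0:
        raise ValueError("the element carries no mutations")
    z = sum(1 for count in y if count == 0)  # Z: non-mutated positions
    s_inc = 1.0 / total                      # Eq. (1)
    s_dec = 1.0 / z if z else 0.0            # Eq. (2); 0 if Z = 0 (assumption)
    profile, running = [], 0.0               # the walker starts at MAS = 0
    for count in y:
        running += count * s_inc if count > 0 else -s_dec  # Eq. (3)
        profile.append(running)
    return profile


def mes(y: Sequence[int]) -> float:
    """Mutation Enrichment Score: max(MAS) - min(MAS), Eq. (4)."""
    profile = mas_profile(y)
    return max(profile) - min(profile)
```

With `y = [0, 0, 3, 2, 0, 0]` the walker drops to −0.5 over the first two unmutated sites, climbs to 0.5 across the two mutated sites, and returns to 0 at the end of the element. The same five mutations would yield a smaller MES if they were scattered along the sequence.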

An empirical p-value is employed to assess the significance of MES through a randomization-based test. The randomization process is conducted 1000 times. In each randomization, the number of mutations is kept the same as the actual mutation records, while the sites where mutations occur are randomly selected across the sequence, allowing for replacement. The p-value is calculated as

$$p=\frac{\#\left\{{MES}(\pi )\ge {MES}\right\}}{1000}$$

(5)

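where ${MES}(\pi )$ is the vector of MES values obtained from the randomly placed mutations in the given genomic element, so that p is the fraction of the 1000 randomizations whose MES is at least as large as the observed MES. The Benjamini-Hochberg procedure is applied to adjust the p-values. Genes with an adjusted p-value (adj.p) < 0.05 are considered candidate drivers for the given element.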
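The randomization test and the multiple-testing correction can be sketched as below. The block re-implements MES compactly so that it runs on its own; the function names, the seed, and the treatment of elements without unmutated positions are assumptions, and the code is an illustration of the described procedure rather than the published implementation.

```python
import random
from typing import List, Sequence


def mes(y: Sequence[int]) -> float:
    """max(MAS) - min(MAS) for a per-position mutation count vector."""
    total = sum(y)
    if total == 0:
        raise ValueError("the element carries no mutations")
    z = sum(1 for count in y if count == 0)
    s_inc, s_dec = 1.0 / total, (1.0 / z if z else 0.0)
    running, lo, hi = 0.0, float("inf"), float("-inf")
    for count in y:
        running += count * s_inc if count > 0 else -s_dec
        lo, hi = min(lo, running), max(hi, running)
    return hi - lo


def empirical_p(y: Sequence[int], n_perm: int = 1000, seed: int = 0) -> float:
    """Fraction of randomizations whose MES is >= the observed MES."""
    rng = random.Random(seed)
    observed, length, n_mut = mes(y), len(y), sum(y)
    hits = 0
    for _ in range(n_perm):
        randomized = [0] * length
        for _mutation in range(n_mut):
            # Same number of mutations, sites drawn with replacement.
            randomized[rng.randrange(length)] += 1
        if mes(randomized) >= observed:
            hits += 1
    return hits / n_perm


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda idx: pvals[idx])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

In use, `empirical_p` would be called once per gene and element, and the resulting p-values passed together to `benjamini_hochberg` before applying the adj.p < 0.05 cutoff.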

Integrative pathway enrichment analysis of candidate drivers using ActivePathways

ActivePathways is an integrated approach that employs statistical data fusion, rationalizes contributing evidence, and highlights relevant genes to explore significantly enriched pathways across various datasets⁸⁰. It incorporates molecular pathways of the Reactome database and biological processes of Gene Ontology. To perform integrative pathway enrichment analysis, a matrix of adjusted p-values for genomic elements from geMER was provided as input to ActivePathways, where the rows represent unique candidate driver genes and the columns represent genomic elements. We ran ActivePathways with default parameters, corrected the enriched pathways for multiple testing, and filtered the significant pathways according to default settings (Holm Q < 0.05). A gene involved in biological pathways identified by ActivePathways was considered a functional driver.

Identification of a core driver gene set for multiple cancer types

To identify a set of the most influential driver genes across pan-cancer and explore potential common mechanisms of tumorigenesis, we identified a core driver gene set shared by multiple cancer types (Fig. 4A). We first mapped the candidate driver genes of genomic elements identified by geMER into biological pathways using ActivePathways across cancer types. The functional drivers that comprehensively regulated the biological processes and molecular pathways were recognized within corresponding cancer types. Next, we integrated somatic mutation data for the pan-cancer cohort containing cancer types corresponding to functional drivers. The mutation frequency of a functional driver was calculated as its mutation records divided by the number of mutations across all genes in the pan-cancer cohort. The functional drivers were subsequently ranked in decreasing order of mutation frequency. Finally, we estimated the enrichment between the top gene set of ranked functional drivers and the Cancer Gene Census (CGC) gene list. Specifically, n represents the number of ranked functional drivers. For the top i (i = 2, 3, …, n) functional drivers, we separately computed their overlap proportion with CGC genes. To evaluate the statistical significance of enrichment, the p-value was calculated using the hypergeometric test:

$$P(k,N,M,i)=\frac{\left(\begin{array}{c}M\\ k\end{array}\right)\left(\begin{array}{c}N-M\\ i-k\end{array}\right)}{\left(\begin{array}{c}N\\ i\end{array}\right)}$$

(6)

where N represents the total number of a combination list for functional drivers and CGC genes, M denotes the number of CGC genes, and k refers to the number of top functional drivers included within CGC genes. A set of functional drivers with the significantly highest proportion of CGC genes was considered a CDGS.
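The scan over top-i gene sets can be sketched as follows. `hypergeom_pmf` implements Eq. (6) as written. The upper-tail sum in `hypergeom_upper_tail` is an assumption about how the enrichment p-value is obtained from it, since enrichment tests conventionally sum the probabilities of k or more overlaps. Function names and the placeholder gene labels in the usage note are hypothetical.

```python
from math import comb
from typing import List, Sequence, Set, Tuple


def hypergeom_pmf(k: int, N: int, M: int, i: int) -> float:
    """Eq. (6): probability of exactly k CGC genes among i drawn genes."""
    return comb(M, k) * comb(N - M, i - k) / comb(N, i)


def hypergeom_upper_tail(k: int, N: int, M: int, i: int) -> float:
    """P(X >= k); assumed form of the enrichment p-value."""
    return sum(hypergeom_pmf(j, N, M, i) for j in range(k, min(M, i) + 1))


def scan_top_sets(
    ranked: Sequence[str], cgc: Set[str]
) -> List[Tuple[int, int, float, float]]:
    """For each top-i set (i = 2..n) return (i, k, overlap proportion, p)."""
    N = len(set(ranked) | cgc)  # combined list of functional drivers and CGC genes
    M = len(cgc)
    rows = []
    for i in range(2, len(ranked) + 1):
        k = sum(1 for gene in ranked[:i] if gene in cgc)
        rows.append((i, k, k / i, hypergeom_upper_tail(k, N, M, i)))
    return rows
```

Calling `scan_top_sets(["a", "b", "c", "d"], {"a", "b", "x", "y"})` on these placeholder labels gives one row per candidate cutoff; the cutoff with the significantly highest overlap proportion would then be taken as the CDGS.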

Analysis of Somatic mutations and copy number alteration for CDGS

Copy-number values equal to 1 and −1 were considered to be amplification and deep deletion, respectively. The relative distributions of mutation, amplification, and deep deletion frequencies for CDGS genes were each calculated as the number of occurrences of an individual gene divided by the total number of occurrences of all CDGS genes across patients, multiplied by one hundred percent. OncoPrint plots of CDGS somatic mutations and copy number variation in the pan-cancer cohort were generated by the “ComplexHeatmap” R package⁸¹. Co-occurrence and mutual exclusivity of CDGS mutations were plotted using the R package “maftools”⁸². Protein families (Pfam) affected by the somatic mutations of CDGS were annotated by Variant Effect Predictor (VEP)⁸³. The clusters of the enrichment region mutations within protein domains of TTN in LUAD were shown by ProteinPaint⁸⁴. The three-dimensional structure of I-set Ig domains was modeled by SWISS-MODEL⁸⁵ and the alterations in amino acids within the protein structure were visualized with PyMOL.
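The relative distribution described above is a simple normalization within one alteration type. The sketch below shows the arithmetic; the function name is hypothetical, and the counts in the usage note are invented for illustration only.

```python
from typing import Dict


def relative_distribution(counts: Dict[str, int]) -> Dict[str, float]:
    """Percentage of all CDGS events of one type contributed by each gene.

    counts maps a gene symbol to its number of occurrences (mutations,
    amplifications, or deep deletions) across patients.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no events recorded for any gene")
    return {gene: 100.0 * n / total for gene, n in counts.items()}
```

For made-up counts such as `{"TTN": 30, "APOB": 10}`, the function returns 75% and 25%, and the percentages for one alteration type always sum to 100.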

Statistical analysis

The hypergeometric test was used to evaluate the enrichment of known cancer genes from the CGC gene set. Differential expression, methylation, and pathway activity analyses between comparison groups were conducted using the R package “limma”. A gene with adj.p < 0.05 as well as fold change (FC) > 2 or < -2 was considered significantly different. CpG sites missing in more than 5% of samples were excluded from the methylation profile⁸⁶. The impute.knn function from the R package “impute” was applied to impute the retained CpGs according to the k-nearest neighbor approach. The M-value transformation was used to normalize the methylation distribution and stabilize the variance: M-value = log2(Beta/(1-Beta)) for beta values. Kaplan-Meier survival curves were used to determine the association between CDGS mutations and prognosis. The log-rank test was used to calculate the statistical significance of the comparison groups. Correlation analysis was completed using the Pearson correlation.
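The methylation preprocessing steps above (missingness filter and M-value transformation) can be sketched in a few lines. The clamp `eps`, which keeps beta values of exactly 0 or 1 finite, is an assumption not stated in the text, as are the function names; k-nearest neighbor imputation is omitted.

```python
import math
from typing import Optional, Sequence


def keep_cpg(values: Sequence[Optional[float]], max_missing: float = 0.05) -> bool:
    """Keep a CpG site unless it is missing in more than 5% of samples."""
    missing = sum(1 for v in values if v is None)
    return missing / len(values) <= max_missing


def beta_to_m(beta: float, eps: float = 1e-6) -> float:
    """M-value = log2(Beta / (1 - Beta))."""
    b = min(max(beta, eps), 1.0 - eps)  # assumption: clamp to avoid log of 0
    return math.log2(b / (1.0 - b))
```

A beta value of 0.5 maps to an M-value of 0, and a beta value of 0.8 maps to log2(4) = 2.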

Data availability

All data analyzed during this study are publicly available as detailed in the data resources section of methods. All data supporting the findings of the current study are listed in Supplementary Tables and our online data portal (http://bio-bigdata.hrbmu.edu.cn/geMER/). In addition, the original source codes (implemented in R and Java) and test datasets for geMER have been deposited in the GitHub repository (https://github.com/S-tian428/geMER).

Abbreviations

CDGS:: Core driver gene set
WGS:: Whole genome sequencing
TFs:: Transcription factors
erTFs:: TFs targeting the enrichment region
ICIs:: Immune checkpoint inhibitors
TCGA:: The Cancer Genome Atlas
RNA-seq:: RNA-sequencing
GTEx:: Genotype-Tissue Expression
COSMIC:: Catalogue Of Somatic Mutations In Cancer
ICGC:: International Cancer Genome Consortium
DCB:: Durable clinical benefit
CR:: Complete response
PR:: Partial response
SD:: Stable disease
OS:: Overall survival
NDB:: No durable benefit
PD:: Progressive disease
CDS:: Coding sequences
geMER:: The Mutation Enrichment Research of Genomic Elements
MAS:: Mutation accumulation score
MES:: Mutation Enrichment Score
adj.p:: Adjusted p-value
CGC:: Cancer Gene Census
VEP:: Variant Effect Predictor
FC:: Fold change
IHC:: Immunohistochemical
Pfam:: Protein family
FN3:: Fibronectin-type III domain
I-set:: Immunoglobulin I-set domain
ENCODE:: Encyclopedia of DNA Elements
Driver & Same-Region:: Cancers where APOB was identified as a driver with mutation enrichment regions overlapping those in LIHC.
Driver & Different-Region:: Cancers where APOB was a driver but with mutation enrichment regions differing from LIHC.
Non-Driver:: Cancers where APOB was not a driver
PPI:: Protein-protein interaction
ssGSEA:: Single-sample Gene Set Enrichment Analysis
GSEA:: Gene Set Enrichment Analysis
DC:: Dendritic cells
TSG:: Tumor suppressor gene
Treg:: Regulatory T cells
ICGs:: Immune checkpoint genes
ACC:: Adrenocortical Cancer
BLCA:: Bladder Cancer
BRCA:: Breast Cancer
CHOL:: Bile Duct Cancer
COAD:: Colon Cancer
KICH:: Kidney Chromophobe
KIRC:: Kidney Clear Cell Carcinoma
LAML:: Acute Myeloid Leukemia
LGG:: Lower Grade Glioma
LIHC:: Liver Cancer
LUAD:: Lung Adenocarcinoma
OV:: Ovarian Cancer
PAAD:: Pancreatic Cancer
PCPG:: Pheochromocytoma & Paraganglioma
PRAD:: Prostate Cancer
READ:: Rectal Cancer
SKCM:: Melanoma
STAD:: Stomach Cancer
THCA:: Thyroid Cancer
THYM:: Thymoma
UCEC:: Endometrioid Cancer

References

Stratton, M. R. Exploring the genomes of cancer cells: progress and promise. Science 331, 1553–1558 (2011).
Stratton, M. R., Campbell, P. J. & Futreal, P. A. The cancer genome. Nature 458, 719–724 (2009).
Zhu, H. et al. Candidate cancer driver mutations in distal regulatory elements and long-range chromatin interaction networks. Mol. Cell 77, 1307–1321.e1310 (2020).
Tokheim, C. J., Papadopoulos, N., Kinzler, K. W., Vogelstein, B. & Karchin, R. Evaluating the evaluation of cancer driver genes. Proc. Natl Acad. Sci. USA 113, 14330–14335 (2016).
Article PubMed Central PubMed Google Scholar
Futreal, P. A. et al. A census of human cancer genes. Nat. Rev. Cancer 4, 177–183, https://doi.org/10.1038/nrc1299 (2004).
Article PubMed Central PubMed Google Scholar
Martínez-Jiménez, F. et al. A compendium of mutational cancer driver genes. Nat. Rev. Cancer 20, 555–572 (2020).
Article PubMed Google Scholar
Piraino, S. W. & Furney, S. J. Beyond the exome: the role of non-coding somatic mutations in cancer. Ann. Oncol. 27, 240–248 (2016).
Article PubMed Google Scholar
Huang, F. W. et al. Highly recurrent TERT promoter mutations in human melanoma. Science 339, 957–959 (2013).
Article PubMed Central PubMed Google Scholar
Vinagre, J. et al. Frequency of TERT promoter mutations in human cancers. Nat. Commun. 4, 2185 (2013).
Article PubMed Google Scholar
Puente, X. S. et al. Non-coding recurrent mutations in chronic lymphocytic leukaemia. Nature 526, 519–524 (2015).
Article PubMed Google Scholar
Pham, V. V. H. et al. Computational methods for cancer driver discovery: A survey. Theranostics 11, 5553–5568 (2021).
Article PubMed Central PubMed Google Scholar
He, Z., Lin, Y., Wei, R., Liu, C. & Jiang, D. Repulsion and attraction in searching: A hybrid algorithm based on gravitational kernel and vital few for cancer driver gene prediction. Comput Biol. Med. 151, 106236 (2022).
Article PubMed Google Scholar
Han, Y. et al. DriverML: a machine learning algorithm for identifying driver genes in cancer sequencing studies. Nucleic Acids Res. 47, e45 (2019).
Article PubMed Central PubMed Google Scholar
Reimand, J. & Bader, G. D. Systematic analysis of somatic mutations in phosphorylation signaling predicts novel cancer drivers. Mol. Syst. Biol. 9, 637 (2013).
Article PubMed Central PubMed Google Scholar
Mularoni, L., Sabarinathan, R., Deu-Pons, J., Gonzalez-Perez, A. & López-Bigas, N. OncodriveFML: a general framework to identify coding and non-coding regions with cancer driver mutations. Genome Biol. 17, 128 (2016).
Article PubMed Central PubMed Google Scholar
Bashashati, A. et al. DriverNet: uncovering the impact of somatic driver mutations on transcriptional networks in cancer. Genome Biol. 13, R124 (2012).
Article PubMed Central PubMed Google Scholar
Hou, J. P. & Ma, J. DawnRank: discovering personalized driver genes in cancer. Genome Med. 6, 56 (2014).
Article PubMed Central PubMed Google Scholar
Cho, A. et al. MUFFINN: cancer gene discovery via network analysis of somatic mutation data. Genome Biol. 17, 129 (2016).
Article PubMed Central PubMed Google Scholar
Farooq, A., Trøen, G., Delabie, J. & Wang, J. Integrating whole genome sequencing, methylation, gene expression, topological associated domain information in regulatory mutation prediction: a study of follicular lymphoma. Comput Struct. Biotechnol. J. 20, 1726–1742 (2022).
Article PubMed Central PubMed Google Scholar
Yang, M., Ali, O., Bjørås, M. & Wang, J. Identifying functional regulatory mutation blocks by integrating genome sequencing and transcriptome data. iScience 26, 107266 (2023).
Article PubMed Central PubMed Google Scholar
Batmanov, K., Wang, W., Bjørås, M., Delabie, J. & Wang, J. Integrative whole-genome sequence analysis reveals roles of regulatory mutations in BCL6 and BCL2 in follicular lymphoma. Sci. Rep. 7, 7040 (2017).
Article PubMed Central PubMed Google Scholar
Wang, J. et al. Predicting regulatory mutations and their target genes by new computational integrative analysis: a study of follicular lymphoma. Comput Biol. Med. 178, 108787 (2024).
Article PubMed Google Scholar
Liu, S. H. et al. DriverDBv3: a multi-omics database for cancer driver gene research. Nucleic Acids Res. 48, D863–d870 (2020).
PubMed Google Scholar
Jia, P. et al. MSEA: detection and quantification of mutation hotspots through mutation set enrichment analysis. Genome Biol. 15, 489 (2014).
Article PubMed Central PubMed Google Scholar
Przytycki, P. F. & Singh, M. Differential analysis between somatic mutation and germline variation profiles reveals cancer-related genes. Genome Med. 9, 79 (2017).
Article PubMed Central PubMed Google Scholar
Ping, Y. et al. Identifying core gene modules in glioblastoma based on multilayer factor-mediated dysfunctional regulatory networks through integrating multi-dimensional genomic data. Nucleic Acids Res. 43, 1997–2007 (2015).
Article PubMed Central PubMed Google Scholar
Boca, S. M., Kinzler, K. W., Velculescu, V. E., Vogelstein, B. & Parmigiani, G. Patient-oriented gene set analysis for cancer mutation data. Genome Biol. 11, R112 (2010).
Article PubMed Central PubMed Google Scholar
Saul, D. et al. A new gene set identifies senescent cells and predicts senescence-associated pathways across tissues. Nat. Commun. 13, 4827 (2022).
Article PubMed Central PubMed Google Scholar
Clarke, D. J. B. et al. Gene set predictor for post-treatment Lyme disease. Cell Rep. Med 3, 100816 (2022).
Article PubMed Central PubMed Google Scholar
Long, J. et al. A mutation-based gene set predicts survival benefit after immunotherapy across multiple cancers and reveals the immune response landscape. Genome Med. 14, 20 (2022).
Article PubMed Central PubMed Google Scholar
Klein, M. I. et al. Identifying modules of cooperating cancer drivers. Mol. Syst. Biol. 17, e9810 (2021).
Article PubMed Central PubMed Google Scholar
Elliott, K. & Larsson, E. Non-coding driver mutations in human cancer. Nat. Rev. Cancer 21, 500–509 (2021).
Article PubMed Google Scholar
Pan-cancer analysis of whole genomes. Nature 578, 82-93 (2020).
Forbes, S. A. et al. COSMIC: exploring the world’s knowledge of somatic mutations in human cancer. Nucleic acids Res. 43, D805–D811 (2015).
Article PubMed Google Scholar
Shuai, S., Gallinger, S. & Stein, L. D. Combined burden and functional impact tests for cancer driver discovery using DriverPower. Nat. Commun. 11, 734 (2020).
Article PubMed Central PubMed Google Scholar
Occhi, G. et al. A novel mutation in the upstream open reading frame of the CDKN1B gene causes a MEN4 phenotype. PLoS Genet. 9, e1003350 (2013).
Article PubMed Central PubMed Google Scholar
Chen, S. et al. CircCDK14 Promotes tumor progression and resists ferroptosis in glioma by regulating PDGFRA. Int J. Biol. Sci. 18, 841–857 (2022).
Article PubMed Central PubMed Google Scholar
Salerno, E. P. et al. Human melanomas and ovarian cancers overexpressing mechanical barrier molecule genes lack immune signatures and have increased patient mortality risk. Oncoimmunology 5, e1240857 (2016).
Article PubMed Central PubMed Google Scholar
Uhlén, M. et al. Proteomics. Tissue-based map of the human proteome. Science 347, 1260419 (2015).
Article PubMed Google Scholar
Andrews, L. P. et al. Molecular pathways and mechanisms of LAG3 in cancer therapy. Clin. Cancer Res. 28, 5030–5039 (2022).
Article PubMed Central PubMed Google Scholar
Koch, A., Jeschke, J., Van Criekinge, W., van Engeland, M. & De Meyer, T. MEXPRESS update 2019. Nucleic Acids Res. 47, W561–w565, https://doi.org/10.1093/nar/gkz445 (2019).
Article PubMed Central PubMed Google Scholar
Mehlen, P., Delloye-Bourgeois, C. & Chédotal, A. Novel roles for Slits and netrins: axon guidance cues as anticancer targets?. Nat. Rev. Cancer 11, 188–197 (2011).
Article PubMed Google Scholar
Biankin, A. V. et al. Pancreatic cancer genomes reveal aberrations in axon guidance pathway genes. Nature 491, 399–405 (2012).
Article PubMed Central PubMed Google Scholar
Chong, C. R. & Jänne, P. A. The quest to overcome resistance to EGFR-targeted therapies in cancer. Nat. Med. 19, 1389–1400 (2013).
Article PubMed Central PubMed Google Scholar
Ruan, R. et al. Unleashing the potential of combining FGFR inhibitor and immune checkpoint blockade for FGF/FGFR signaling in tumor microenvironment. Mol. Cancer 22, 60 (2023).
Article PubMed Central PubMed Google Scholar
Lorente-Trigos, A., Varnat, F., Melotti, A. & Ruiz i Altaba, A. BMP signaling promotes the growth of primary human colon carcinomas in vivo. J. Mol. Cell Biol. 2, 318–332 (2010).
Article PubMed Google Scholar
Zhang, K. J. et al. Notch1 signaling inhibits growth of EC109 esophageal carcinoma cells through downmodulation of HPV18 E6/E7 gene expression. Acta Pharm. Sin. 30, 153–158 (2009).
Article Google Scholar
El Tekle, G. et al. Co-occurrence and mutual exclusivity: what cross-cancer mutation patterns can tell us. Trends Cancer 7, 823–836 (2021).
Article PubMed Google Scholar
Li, J. et al. Spatial whole exome sequencing reveals the genetic features of highly-aggressive components in lung adenocarcinoma. Neoplasia 54, 101013 (2024).
Article PubMed Central PubMed Google Scholar
Teichmann, S. A. & Chothia, C. Immunoglobulin superfamily proteins in Caenorhabditis elegans. J. Mol. Biol. 296, 1367–1383 (2000).
Article PubMed Google Scholar
The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
Chen, K. W. & Chen, L. Epigenetic regulation of BDNF gene during development and diseases. Int. J. Mol. Sci. 18, https://doi.org/10.3390/ijms18030571 (2017).
Jiao, Y. et al. The modification role and tumor association with a methyltransferase: KMT2C. Front Immunol. 15, 1444923 (2024).
Article PubMed Central PubMed Google Scholar
Han, H. et al. TRRUST v2: an expanded reference database of human and mouse transcriptional regulatory interactions. Nucleic Acids Res. 46, D380–d386 (2018).
Article PubMed Google Scholar
Zheng, R. et al. Cistrome Data Browser: expanded datasets and new tools for gene regulatory analysis. Nucleic Acids Res. 47, D729–d735 (2019).
Article PubMed Google Scholar
Li, Y., Zhang, S. W., Xie, M. Y. & Zhang, T. PhenoDriver: interpretable framework for studying personalized phenotype-associated driver genes in breast cancer. Brief Bioinform 24, https://doi.org/10.1093/bib/bbad291 (2023).
Liberzon, A. et al. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst. 1, 417–425 (2015).
Article PubMed Central PubMed Google Scholar
Gourmet, L., Sottoriva, A., Walker-Samuel, S., Secrier, M. & Zapata, L. Immune evasion impacts the landscape of driver genes during cancer evolution. Genome Biol. 25, 168 (2024).
Article PubMed Central PubMed Google Scholar
He, Y., Jiang, Z., Chen, C. & Wang, X. Classification of triple-negative breast cancers based on Immunogenomic profiling. J. Exp. Clin. Cancer Res. 37, 327 (2018).
Article PubMed Central PubMed Google Scholar
Danaher, P. et al. Gene expression markers of Tumor Infiltrating Leukocytes. J. Immunother. Cancer 5, 18 (2017).
Article PubMed Central PubMed Google Scholar
Arpinati, L. & Scherz-Shouval, R. From gatekeepers to providers: regulation of immune functions by cancer-associated fibroblasts. Trends Cancer 9, 421–443 (2023).
Article PubMed Google Scholar
Sanchez-Vega, F. et al. Oncogenic signaling pathways in The Cancer Genome Atlas. Cell 173, 321–337.e310 (2018).
Article PubMed Central PubMed Google Scholar
Luke, J. J., Bao, R., Sweis, R. F., Spranger, S. & Gajewski, T. F. WNT/β-catenin pathway activation correlates with immune exclusion across human cancers. Clin. Cancer Res. 25, 3074–3083 (2019).
Article PubMed Central PubMed Google Scholar
Fuentes-Antrás, J., Bedard, P. L. & Cescon, D. W. Seize the engine: emerging cell cycle targets in breast cancer. Clin. Transl. Med. 14, e1544 (2024).
Article PubMed Central PubMed Google Scholar
Adinolfi, S. et al. The KEAP1-NRF2 pathway: Targets for therapy and role in cancer. Redox Biol. 63, 102726 (2023).
Article PubMed Central PubMed Google Scholar
Zhang, S. W., Xu, J. Y. & Zhang, T. DGMP: identifying cancer driver genes by jointing DGCN and MLP from multi-omics genomic data. Genomics Proteom. Bioinform. 20, 928–938 (2022).
Article Google Scholar
Zhang, L. et al. Functional SNP in the microRNA-367 binding site in the 3’UTR of the calcium channel ryanodine receptor gene 3 (RYR3) affects breast cancer risk and calcification. Proc. Natl Acad. Sci. USA 108, 13653–13658 (2011).
Article PubMed Central PubMed Google Scholar
Juratli, T. A. et al. DMD genomic deletions characterize a subset of progressive/higher-grade meningiomas with poor outcome. Acta Neuropathol. 136, 779–792 (2018).
Article PubMed Google Scholar
Sato, N., Fukushima, N., Chang, R., Matsubayashi, H. & Goggins, M. Differential and epigenetic gene expression profiling identifies frequent disruption of the RELN pathway in pancreatic cancers. Gastroenterology 130, 548–565 (2006).
Article PubMed Google Scholar
Duan, Y. et al. COL6A3 polymorphisms were associated with lung cancer risk in a Chinese population. Respiratory Res. 20, 143 (2019).
Article Google Scholar
Zhou, P. L. et al. Circular RNA hsa_circ_0000277 sequesters miR-4766-5p to upregulate LAMA1 and promote esophageal carcinoma progression. Cell Death Dis. 12, 676 (2021).
Article PubMed Central PubMed Google Scholar
Tate, J. G. et al. COSMIC: the Catalogue Of Somatic Mutations In Cancer. Nucleic Acids Res. 47, D941–d947 (2019).
Article PubMed Google Scholar
Samstein, R. M. et al. Tumor mutational load predicts survival after immunotherapy across multiple cancer types. Nat. Genet. 51, 202–206 (2019).
Article PubMed Central PubMed Google Scholar
Van Allen, E. M. et al. Genomic correlates of response to CTLA-4 blockade in metastatic melanoma. Science 350, 207–211 (2015).
Article PubMed Central PubMed Google Scholar
Liu, D. et al. Integrative molecular and clinical modeling of clinical outcomes to PD1 blockade in patients with metastatic melanoma. Nat. Med. 25, 1916–1927 (2019).
Article PubMed Central PubMed Google Scholar
Miao, D. et al. Genomic correlates of response to immune checkpoint blockade in microsatellite-stable solid tumors. Nat. Genet. 50, 1271–1281 (2018).
Article PubMed Central PubMed Google Scholar
Miao, D. et al. Genomic correlates of response to immune checkpoint therapies in clear cell renal cell carcinoma. Science 359, 801–806 (2018).
Article PubMed Central PubMed Google Scholar
Riaz, N. et al. Tumor and microenvironment evolution during immunotherapy with nivolumab. Cell 171, 934–949.e916 (2017).
Article PubMed Central PubMed Google Scholar
Hugo, W. et al. Genomic and transcriptomic features of response to anti-PD-1 therapy in metastatic melanoma. Cell 165, 35–44 (2016).
Article PubMed Central PubMed Google Scholar
Paczkowska, M. et al. Integrative pathway enrichment analysis of multivariate omics data. Nat. Commun. 11, 735 (2020).
Article PubMed Central PubMed Google Scholar
Gu, Z., Eils, R. & Schlesner, M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics32, 2847–2849 (2016).
Article PubMed Google Scholar
Mayakonda, A., Lin, D. C., Assenov, Y., Plass, C. & Koeffler, H. P. Maftools: efficient and comprehensive analysis of somatic variants in cancer. Genome Res. 28, 1747–1756 (2018).
Article PubMed Central PubMed Google Scholar
McLaren, W. et al. The ensembl variant effect predictor. Genome Biol. 17, 122 (2016).
Article PubMed Central PubMed Google Scholar
Zhou, X. et al. Exploring genomic alteration in pediatric cancer using ProteinPaint. Nat. Genet. 48, 4–6 (2016).
Article PubMed Central PubMed Google Scholar
Waterhouse, A. et al. SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Res 46, W296–w303 (2018).
Article PubMed Central PubMed Google Scholar
Yu, X. et al. Tumor expression quantitative trait methylation screening reveals distinct CpG panels for deconvolving cancer immune signatures. Cancer Res. 82, 1724–1735 (2022).
Article PubMed Central PubMed Google Scholar

Download references

Acknowledgements

This work was supported by the National Natural Science Foundation of China [32170674, 82172353, 32070672, 32370718], the Scientific Research Project of Heilongjiang Province of China [CZKYF2021-2-B012], and the Outstanding Youth Foundation of Heilongjiang Province of China [YQ2022C034].

Author information

These authors contributed equally: Jing Gan, Yuncong Wang

Authors and Affiliations

College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, China
Jing Gan, Yuncong Wang, Haoyu Hu, Manyi Xu, Xinrong Li, Wenbo Dong, Jiaheng He, Yusen Zhao, Yakun Zhang, Yue Sun, Caiyu Zhang, Qianyi Lu, Shangwei Ning, Yan Jin & Hui Zhi
The Second Affiliated Hospital of Harbin Medical University, Harbin, Heilongjiang, China
Zhuoran Shi

Contributions

H.Z., J.Y., and S.W.N. designed and directed all the research. J.G., Y.C.W., Z.R.S., H.Y.H., M.Y.X., X.R.L., W.B.D., J.H.H., Y.S.Z., Y.K.Z., Y.S., C.Y.Z., and Q.Y.L. performed the data processing and experimental analysis. H.Z. and J.G. drafted the manuscript. All authors reviewed and approved the final version of the manuscript.

Corresponding authors

Correspondence to Shangwei Ning, Yan Jin or Hui Zhi.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information (PDF)

Supplementary Information (XLSX)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Gan, J., Wang, Y., Shi, Z. et al. A core driver gene set identified based on geMER reveals its potential driver mechanism in pan-cancer. npj Precis. Onc. 9, 278 (2025). https://doi.org/10.1038/s41698-025-01060-y

Received: 11 April 2025
Accepted: 17 July 2025
Published: 09 August 2025
Version of record: 09 August 2025
DOI: https://doi.org/10.1038/s41698-025-01060-y