Abstract
Cancer development is driven by a small subset of somatic mutations, known as driver mutations, that disrupt key regulatory processes in cells. These mutations occur in specific genes, called cancer driver genes, whose altered functions promote tumor initiation and progression. Accurately identifying driver genes remains a major challenge due to their rarity and the overwhelming presence of passenger mutations. Recent advances in graph-based deep learning have improved the modeling of gene interactions, but most approaches are limited to pairwise connections and fail to capture the higher-order complexity of biological systems. We introduce ONCOPLEX, a hypergraph-based neural network framework that models genes as nodes and curated cancer-related pathways as hyperedges, enabling the representation of multi-gene interactions. Unlike previous methods, ONCOPLEX integrates diverse molecular and phenotypic features, such as somatic mutations, gene expression, and DNA methylation, into a pathway-informed hypergraph structure to learn biologically meaningful gene representations. ONCOPLEX is trained in a supervised manner on labeled driver and non-driver genes, with unlabeled genes included as nodes during representation learning. Comprehensive evaluations across pan-cancer and cancer-type-specific settings show that ONCOPLEX consistently outperforms state-of-the-art methods in classification and ranking metrics. It accurately recovers known driver genes and highlights novel candidates supported by literature and enrichment analyses. These findings underscore the power of pathway-guided hypergraph modeling for advancing cancer driver gene discovery.
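To make the pathway-as-hyperedge idea concrete, the following is a minimal sketch, not the ONCOPLEX implementation. It assumes a toy gene list and two hypothetical pathways (`PATHWAY_A`, `PATHWAY_B`), builds the gene-by-pathway incidence matrix, and applies one generic hypergraph convolution step of the form D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2} X Θ, as commonly used in hypergraph neural networks. The function names, feature dimensions, and random weights are illustrative assumptions; the abstract does not specify the actual layer design.

```python
import numpy as np

def build_incidence(genes, pathways):
    """Return H (n_genes x n_pathways); H[i, j] = 1 if gene i belongs to pathway j."""
    index = {g: i for i, g in enumerate(genes)}
    H = np.zeros((len(genes), len(pathways)))
    for j, members in enumerate(pathways.values()):
        for g in members:
            if g in index:  # ignore pathway members outside the gene list
                H[index[g], j] = 1.0
    return H

def propagation_matrix(H, edge_weights=None):
    """Normalized operator Dv^-1/2 H W De^-1 H^T Dv^-1/2 (generic HGNN-style form)."""
    w = np.ones(H.shape[1]) if edge_weights is None else np.asarray(edge_weights, float)
    dv = H @ w           # weighted degree of each gene (node)
    de = H.sum(axis=0)   # number of genes in each pathway (hyperedge)
    dv_inv_sqrt = np.divide(1.0, np.sqrt(dv), out=np.zeros_like(dv), where=dv > 0)
    de_inv = np.divide(1.0, de, out=np.zeros_like(de), where=de > 0)
    left = (dv_inv_sqrt[:, None] * H) * (w * de_inv)[None, :]
    right = H.T * dv_inv_sqrt[None, :]
    return left @ right

def hypergraph_conv(X, H, theta, edge_weights=None):
    """One propagation step followed by a ReLU nonlinearity."""
    return np.maximum(propagation_matrix(H, edge_weights) @ X @ theta, 0.0)

# Toy data: gene symbols are real, but the pathway memberships are hypothetical.
genes = ["TP53", "KRAS", "BRAF", "GRB2"]
pathways = {
    "PATHWAY_A": ["TP53", "KRAS"],
    "PATHWAY_B": ["KRAS", "BRAF", "GRB2"],
}

rng = np.random.default_rng(0)
H = build_incidence(genes, pathways)
X = rng.normal(size=(len(genes), 3))   # placeholder for per-gene multi-omics features
theta = rng.normal(size=(3, 2))        # placeholder for learnable layer weights
A = propagation_matrix(H)
out = hypergraph_conv(X, H, theta)     # updated gene representations, shape (4, 2)
```

In this sketch a gene that belongs to several pathways (here KRAS) exchanges information with every co-member of each pathway in a single step, which is the higher-order behavior that pairwise graphs cannot express directly.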
Data availability
Core features, including gene expression, mutation, and methylation profiles, were obtained from The Cancer Genome Atlas (TCGA) via the GDC portal (https://portal.gdc.cancer.gov/). The comprehensive feature set was downloaded from DORGE (https://doi.org/10.1126/sciadv.aba6784). Graphs were constructed using MSigDB pathway data (https://www.gsea-msigdb.org/). Cancer driver gene labels were collected from the Network of Cancer Genes v6.0, DigSee, COSMIC CGC v91, and IntOGen v2024.09.204. All processed data, including node features, graph structures, and gene labels, are available at: https://github.com/etab12/ONCOPLEX.
Funding
This study was supported by Dubai RDI Grant (2025/DRDI0406), the Center for Applied and Translational Genomics (CATG), Mohammed Bin Rashid University of Medicine and Health Sciences (MBRU), Dubai Health, Dubai, United Arab Emirates.
Author information
Contributions
V.D.T., E.M.A., and O.S.A. conceived the study. V.D.T. and O.S.A. supervised the project. E.M.A. collected and processed the data, implemented the model, performed the experiments, and prepared the initial draft of the manuscript. All authors contributed to the development of the methodology, analysis of the results, and writing and revision of the manuscript. All authors approved the final version of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Alotaibi, E.M., Alkhnbashi, O.S. & Tran, V.D. ONCOPLEX: an oncology-inspired hypergraph model integrating diverse biological knowledge for cancer driver gene prediction. Sci Rep (2026). https://doi.org/10.1038/s41598-026-36127-8
DOI: https://doi.org/10.1038/s41598-026-36127-8