Mapping the functional network of human cancer through machine learning and pan-cancer proteogenomics

Shi, Zhiao; Lei, Jonathan T.; Elizarraras, John M.; Zhang, Bing

doi:10.1038/s43018-024-00869-z

Analysis
Published: 11 December 2024

Zhiao Shi, Jonathan T. Lei, John M. Elizarraras & Bing Zhang (ORCID: orcid.org/0000-0001-8676-2425)

Nature Cancer volume 6, pages 205–222 (2025)

Abstract

Large-scale omics profiling has uncovered a vast array of somatic mutations and cancer-associated proteins, posing substantial challenges for their functional interpretation. Here we present a network-based approach centered on FunMap, a pan-cancer functional network constructed using supervised machine learning on extensive proteomics and RNA sequencing data from 1,194 individuals spanning 11 cancer types. Comprising 10,525 protein-coding genes, FunMap connects functionally associated genes with unprecedented precision, surpassing traditional protein–protein interaction maps. Network analysis identifies functional protein modules, reveals a hierarchical structure linked to cancer hallmarks and clinical phenotypes, provides deeper insights into established cancer drivers and predicts functions for understudied cancer-associated proteins. Additionally, applying graph-neural-network-based deep learning to FunMap uncovers drivers with low mutation frequency. This study establishes FunMap as a powerful and unbiased tool for interpreting somatic mutations and understudied proteins, with broad implications for advancing cancer biology and informing therapeutic strategies.
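To illustrate the supervised idea summarized above, the sketch below scores gene pairs for cofunctionality from coexpression. It is a minimal sketch under stated assumptions, not the FunMap code: each gene pair is described by its Pearson correlation in every (cohort, data type) matrix, and a classifier is trained against gold-standard labels. The article trains an XGBoost model; scikit-learn's `GradientBoostingClassifier` is swapped in here to avoid the extra dependency, and the simulated matrices, cohort names and labels are all hypothetical.

```python
from itertools import combinations

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
N_GENES, N_SAMPLES, MODULE = 30, 40, 10  # genes 0-9 form a hypothetical module


def simulate_matrix():
    """Hypothetical genes x samples matrix in which genes 0-9 share a latent factor."""
    mat = rng.normal(size=(N_GENES, N_SAMPLES))
    mat[:MODULE] += 2.0 * rng.normal(size=N_SAMPLES)  # shared factor -> coexpression
    return mat


# One matrix per hypothetical (cohort, data type) combination.
matrices = {
    (cohort, dtype): simulate_matrix()
    for cohort in ("cohortA", "cohortB")
    for dtype in ("mRNA", "protein")
}


def pair_features(matrices, pairs):
    """Feature vector per gene pair: its Pearson r in each matrix."""
    columns = []
    for mat in matrices.values():
        r = np.corrcoef(mat)  # rows are genes, so r is genes x genes
        columns.append([r[i, j] for i, j in pairs])
    return np.array(columns).T  # shape (n_pairs, n_matrices)


pairs = list(combinations(range(N_GENES), 2))
X = pair_features(matrices, pairs)
# Hypothetical gold standard: a pair is cofunctional if both genes are in the module.
y = np.array([int(i < MODULE and j < MODULE) for i, j in pairs])

clf = GradientBoostingClassifier(random_state=0).fit(X, y)
scores = clf.predict_proba(X)[:, 1]  # cofunctionality score for every gene pair
```

Scoring the training pairs, as done here for brevity, overstates performance; a real evaluation would score held-out pairs.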

**Fig. 1: Protein coexpression is a strong predictor of gene cofunctionality.**

**Fig. 2: FunMap has high functional relevance, deep proteome coverage and a scale-free, modular and small-world network topology.**

**Fig. 3: FunMap reveals known and previously unidentified dense modules associated with cancer biology and clinical phenotype.**

**Fig. 4: Hierarchical modular organization of FunMap statistically linked to cancer hallmarks.**

**Fig. 5: In-depth analysis of selected FunMap branches and their clinical associations.**

**Fig. 6: Connecting somatic mutations to functional protein modules.**

**Fig. 7: FunMap predicts functions of understudied proteins.**

**Fig. 8: Discovery of cancer drivers with low mutation frequency using FunMap.**

Data availability

Proteomics and RNA-seq data for the ten CPTAC cancer types were derived from the CPTAC pan-cancer study¹⁵ (https://proteomic.datacommons.cancer.gov/pdc/cptac-pancancer). Proteomics and RNA-seq data for hepatocellular carcinoma (HCC) were downloaded from the original publication⁵⁵. The data tables derived from these resources and used as input for FunMap construction are available from Zenodo (https://doi.org/10.5281/zenodo.7948943)⁷³. Derived feature data for XGBoost model training are available from Zenodo (https://doi.org/10.5281/zenodo.7949374)⁷⁴. XGBoost prediction scores for all gene pairs are available from Zenodo (https://doi.org/10.5281/zenodo.10080763)⁷⁵. The FunMap edge list, dense modules and hierarchical modules are available online (https://funmap.linkedomics.org/), where visualization tools are also provided for exploring gene neighborhoods, dense modules and the hierarchical organization of FunMap. Additionally, the FunMap network and modules were integrated into WebGestalt⁷⁶ for enrichment analysis of user-provided gene lists. Cell line annotations and CRISPR knockout (KO) dependency scores can be retrieved from the DepMap website (https://www.depmap.org). Other datasets used in the study include the gene cofunctionality gold standard derived from the Reactome pathway database¹², ProHD¹², BioPlex¹⁸, HuRI¹⁹, HI-Union¹⁹ and BioGRID²⁰. Source data are provided with this paper.
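FunMap modules are offered in WebGestalt for enrichment analysis of user-provided gene lists. Over-representation analysis of this kind is commonly based on a one-sided hypergeometric test; the sketch below shows that test for a single module, assuming a hypothetical gene universe, module and query list. The helper name `ora_pvalue` is hypothetical, and WebGestalt's interface and multiple-testing correction are not reproduced.

```python
from scipy.stats import hypergeom


def ora_pvalue(query, module, universe):
    """One-sided hypergeometric P value for over-representation of `module` genes in `query`."""
    universe = set(universe)
    query = set(query) & universe    # only genes in the universe are counted
    module = set(module) & universe
    k = len(query & module)          # observed overlap
    M, n, N = len(universe), len(module), len(query)
    # scipy parameterization: M population size, n successes, N draws.
    # P(X >= k) is the survival function evaluated at k - 1.
    return float(hypergeom.sf(k - 1, M, n, N))


universe = [f"G{i}" for i in range(100)]   # hypothetical background genes
module = universe[:10]                     # hypothetical FunMap module
query = universe[:5] + universe[50:55]     # hypothetical user gene list
p = ora_pvalue(query, module, universe)
```

With 5 of 10 query genes falling in a 10-gene module drawn from 100 background genes, the overlap is about five times the expected value of 1, so the P value is small; a query with no overlap returns 1.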

Code availability

The FunMap Python package is fully open source and available for download from the Python Package Index (https://pypi.org/project/funmap). The source code is hosted on GitHub (https://github.com/bzhanglab/funmap). Other supporting software is available as follows: scikit-learn 1.3.2 (https://scikit-learn.org/stable/index.html), ICE 1.0.2 (http://ice.zhang-lab.org), NetSAM 1.44.0 (https://www.bioconductor.org/packages/release/bioc/html/NetSAM.html), WebGestaltR 0.4.6 (https://cran.r-project.org/web/packages/WebGestaltR/index.html) and pytorch_geometric 1.7.2 (https://github.com/pyg-team/pytorch_geometric).

References

1. Ostroverkhova, D., Przytycka, T. M. & Panchenko, A. R. Cancer driver mutations: predictions and reality. Trends Mol. Med. 29, 554–566 (2023).
2. Kustatscher, G. et al. Understudied proteins: opportunities and challenges for functional proteomics. Nat. Methods 19, 774–779 (2022).
3. Dinstag, G. & Shamir, R. PRODIGY: personalized prioritization of driver genes. Bioinformatics 36, 1831–1839 (2020).
4. Leiserson, M. D. M. et al. Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes. Nat. Genet. 47, 106–114 (2015).
5. Sharan, R., Ulitsky, I. & Shamir, R. Network-based prediction of protein function. Mol. Syst. Biol. 3, 88 (2007).
6. Kim, M. et al. A protein interaction landscape of breast cancer. Science 374, eabf3066 (2021).
7. Swaney, D. L. et al. A protein network map of head and neck cancer reveals PIK3CA mutant drug sensitivity. Science 374, eabf2911 (2021).
8. Quackenbush, J. Microarrays—guilt by association. Science 302, 240–241 (2003).
9. Yanai, I. et al. Similar gene expression profiles do not imply similar tissue functions. Trends Genet. 22, 132–138 (2006).
10. Wang, J. et al. Proteome profiling outperforms transcriptome profiling for coexpression based gene function prediction. Mol. Cell. Proteomics 16, 121–134 (2017).
11. Ribeiro, D. M., Ziyani, C. & Delaneau, O. Shared regulation and functional relevance of local gene co-expression revealed by single cell analysis. Commun. Biol. 5, 876 (2022).
12. Kustatscher, G. et al. Co-regulation map of the human proteome enables identification of protein functions. Nat. Biotechnol. 37, 1361–1371 (2019).
13. Wu, L. et al. Variation and genetic control of protein abundance in humans. Nature 499, 79–82 (2013).
14. Lapek, J. D. Jr et al. Detection of dysregulated protein-association networks by high-throughput proteomics predicts cancer vulnerabilities. Nat. Biotechnol. 35, 983–989 (2017).
15. Li, Y. et al. Proteogenomic data and resources for pan-cancer analysis. Cancer Cell 41, 1397–1406 (2023).
16. Zhu, H. et al. Proteomics of adjacent-to-tumor samples uncovers clinically relevant biological events in hepatocellular carcinoma. Natl Sci. Rev. 10, nwad167 (2023).
17. Obayashi, T. & Kinoshita, K. Rank of correlation coefficient as a comparable measure for biological significance of gene coexpression. DNA Res. 16, 249–260 (2009).
18. Huttlin, E. L. et al. Dual proteome-scale networks reveal cell-specific remodeling of the human interactome. Cell 184, 3022–3040 (2021).
19. Luck, K. et al. A reference map of the human binary protein interactome. Nature 580, 402–408 (2020).
20. Oughtred, R. et al. The BioGRID database: a comprehensive biomedical resource of curated protein, genetic, and chemical interactions. Protein Sci. 30, 187–200 (2021).
21. Szklarczyk, D. et al. The STRING database in 2021: customizable protein–protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res. 49, D605–D612 (2021).
22. Tsitsiridis, G. et al. CORUM: the comprehensive resource of mammalian protein complexes—2022. Nucleic Acids Res. 51, D539–D545 (2023).
23. Shi, Z., Derow, C. K. & Zhang, B. Co-expression module analysis reveals biological processes, genomic gain, and regulatory mechanisms associated with breast cancer progression. BMC Syst. Biol. 4, 74 (2010).
24. Barabási, A.-L. & Oltvai, Z. N. Network biology: understanding the cell’s functional organization. Nat. Rev. Genet. 5, 101–113 (2004).
25. Shi, Z., Wang, J. & Zhang, B. NetGestalt: integrating multidimensional omics data over biological networks. Nat. Methods 10, 597–598 (2013).
26. Knijnenburg, T. A., Bismeijer, T., Wessels, L. F. A. & Shmulevich, I. A multilevel pan-cancer map links gene mutations to cancer hallmarks. Chin. J. Cancer 34, 439–449 (2015).
27. Chen, Y., Verbeek, F. J. & Wolstencroft, K. Establishing a consensus for the hallmarks of cancer based on gene ontology and pathway annotations. BMC Bioinformatics 22, 178 (2021).
28. Chen, X. & Cubillos-Ruiz, J. R. Endoplasmic reticulum stress signals in the tumour and its microenvironment. Nat. Rev. Cancer 21, 71–88 (2021).
29. Vasaikar, S. et al. Proteogenomic analysis of human colon cancer reveals new therapeutic opportunities. Cell 177, 1035–1049 (2019).
30. Zhang, B. et al. Proteogenomic characterization of human colon and rectal cancer. Nature 513, 382–387 (2014).
31. Giacinti, C. & Giordano, A. RB and cell cycle progression. Oncogene 25, 5220–5227 (2006).
32. Deacu, E. et al. Activin type II receptor restoration in ACVR2-deficient colon cancer cells induces transforming growth factor-β response pathway genes. Cancer Res. 64, 7690–7696 (2004).
33. Yang, P. et al. SET domain containing 1B gene is mutated in primary hepatic neuroendocrine tumors. Int. J. Cancer 145, 2986–2995 (2019).
34. Chorley, B. N. et al. Identification of novel NRF2-regulated genes by ChIP-Seq: influence on retinoid X receptor alpha. Nucleic Acids Res. 40, 7416–7429 (2012).
35. Penning, T. M. Aldo-keto reductase regulation by the NRF2 system: implications for stress response, chemotherapy drug resistance, and carcinogenesis. Chem. Res. Toxicol. 30, 162–176 (2017).
36. Chen, Y.-T., Shi, D., Yang, D. & Yan, B. Antioxidant sulforaphane and sensitizer trinitrobenzene sulfonate induce carboxylesterase-1 through a novel element transactivated by nuclear factor-E2 related factor-2. Biochem. Pharmacol. 84, 864–871 (2012).
37. Thimmulappa, R. K. et al. Identification of NRF2-regulated genes induced by the chemopreventive agent sulforaphane by oligonucleotide microarray. Cancer Res. 62, 5196–5203 (2002).
38. Rojo de la Vega, M., Chapman, E. & Zhang, D. D. NRF2 and the hallmarks of cancer. Cancer Cell 34, 21–43 (2018).
39. Xu, I. M.-J. et al. Transketolase counteracts oxidative stress to drive cancer development. Proc. Natl Acad. Sci. USA 113, E725–E734 (2016).
40. Loignon, M. et al. Cul3 overexpression depletes NRF2 in breast cancer and is associated with sensitivity to carcinogens, to oxidative stress, and to chemotherapy. Mol. Cancer Ther. 8, 2432–2440 (2009).
41. Kalthoff, S., Ehmer, U., Freiberg, N., Manns, M. P. & Strassburg, C. P. Interaction between oxidative stress sensor NRF2 and xenobiotic-activated aryl hydrocarbon receptor in the regulation of the human phase II detoxifying UDP-glucuronosyltransferase 1A10. J. Biol. Chem. 285, 5993–6002 (2010).
42. Bech-Otschir, D. et al. COP9 signalosome-specific phosphorylation targets p53 to degradation by the ubiquitin system. EMBO J. 20, 1630–1639 (2001).
43. Pineau, C. et al. Cell type-specific expression of testis elevated genes based on transcriptomics and antibody-based proteomics. J. Proteome Res. 18, 4215–4230 (2019).
44. Pan, D. et al. A major chromatin regulator determines resistance of tumor cells to T cell-mediated killing. Science 359, 770–775 (2018).
45. Miao, D. et al. Genomic correlates of response to immune checkpoint therapies in clear cell renal cell carcinoma. Science 359, 801–806 (2018).
46. Wu, L. et al. KDM5 histone demethylases repress immune response via suppression of STING. PLoS Biol. 16, e2006134 (2018).
47. Liu, S., Liu, T., Jiang, J., Guo, H. & Yang, R. p53 mutation and deletion contribute to tumor immune evasion. Front. Genet. 14, 1088455 (2023).
48. Hu, C. et al. ATRX loss promotes immunosuppressive mechanisms in IDH1 mutant glioma. Neuro. Oncol. 24, 888–900 (2022).
49. Wang, J., Vasaikar, S., Shi, Z., Greer, M. & Zhang, B. WebGestalt 2017: a more comprehensive, powerful, flexible and interactive gene set enrichment analysis toolkit. Nucleic Acids Res. 45, W130–W137 (2017).
50. Gaudet, P., Livstone, M. S., Lewis, S. E. & Thomas, P. D. Phylogenetic-based propagation of functional annotations within the Gene Ontology consortium. Brief. Bioinform. 12, 449–462 (2011).
51. Srivastava, A. et al. MAB21L4 deficiency drives squamous cell carcinoma via activation of RET. Cancer Res. 82, 3143–3157 (2022).
52. Futreal, P. A. et al. A census of human cancer genes. Nat. Rev. Cancer 4, 177–183 (2004).
53. Pastushenko, I. et al. Fat1 deletion promotes hybrid EMT state, tumour stemness and metastasis. Nature 589, 448–455 (2021).
54. Bennett, H. M., Stephenson, W., Rose, C. M. & Darmanis, S. Single-cell proteomics enabled by next-generation sequencing or mass spectrometry. Nat. Methods 20, 363–374 (2023).
55. Gao, Q. et al. Integrated proteogenomic characterization of HBV-related hepatocellular carcinoma. Cell 179, 561–577 (2019).
56. Chen, T. & Guestrin, C. XGBoost: a scalable tree boosting system. In Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ed. Krishnapuram, B.) (Association for Computing Machinery, 2016).
57. Newman, M. E. J. Modularity and community structure in networks. Proc. Natl Acad. Sci. USA 103, 8577–8582 (2006).
58. Plaisier, C. L., Pan, M. & Baliga, N. S. A miRNA-regulatory network explains how dysregulated miRNAs perturb oncogenic processes across diverse cancers. Genome Res. 22, 2302–2314 (2012).
59. Hirsch, T. et al. Regeneration of the entire human epidermis using transgenic stem cells. Nature 551, 327–332 (2017).
60. Kiefer, J. et al. Abstract 3589: a systematic approach toward gene annotation of the hallmarks of cancer. Cancer Res. 77, 3589 (2017).
61. Hanahan, D. & Weinberg, R. A. Hallmarks of cancer: the next generation. Cell 144, 646–674 (2011).
62. Mongiat, M., Andreuzzi, E., Tarticchio, G. & Paulitti, A. Extracellular matrix, a hard player in angiogenesis. Int. J. Mol. Sci. 17, 1822 (2016).
63. Veličković, P. et al. Graph attention networks. Preprint at https://arxiv.org/abs/1710.10903 (2017).
64. Bekker, J. & Davis, J. Learning from positive and unlabeled data: a survey. Mach. Learn. 109, 719–760 (2020).
65. Mordelet, F. & Vert, J.-P. A bagging SVM to learn from positive and unlabeled examples. Pattern Recognit. Lett. 37, 201–209 (2014).
66. Liu, T.-Y. Learning to Rank for Information Retrieval 1st edn (Springer, 2009).
67. Zhang, E. & Zhang, Y. Average precision. In Encyclopedia of Database Systems (eds Liu, L. & Özsu, M. T.) 192–193 (Springer, 2009).
68. Fey, M. & Lenssen, J. E. Fast graph representation learning with PyTorch Geometric. Preprint at https://arxiv.org/abs/1903.02428 (2019).
69. Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2014).
70. Pedregosa, F. et al. Scikit-learn: machine learning in python. Preprint at https://arxiv.org/abs/1201.0490 (2012).
71. Dempster, J. M. et al. Chronos: a cell population dynamics model of CRISPR experiments that improves inference of gene fitness effects. Genome Biol. 22, 343 (2021).
72. Pacini, C. et al. Integrated cross-study datasets of genetic dependencies in cancer. Nat. Commun. 12, 1661 (2021).
73. Shi, Z. FunMap input expression data matrices. Zenodo https://doi.org/10.5281/zenodo.7948943 (2023).
74. Shi, Z. FunMap feature data file used for model training. Zenodo https://doi.org/10.5281/zenodo.7949374 (2023).
75. Shi, Z. FunMap prediction scores for all gene pairs. Zenodo https://doi.org/10.5281/zenodo.10080763 (2023).
76. Elizarraras, J. M. et al. WebGestalt 2024: faster gene set analysis and new support for metabolomics and multi-omics. Nucleic Acids Res. 52, W415–W421 (2024).


Acknowledgements

We gratefully acknowledge contributions from the CPTAC and its Pan-Cancer Analysis working group. This work was supported by National Institutes of Health grants from the National Cancer Institute (U24 CA210954, U24 CA271076, R01 CA245903 and U01 CA271247 to B.Z.), by the Cancer Prevention and Research Institute of Texas (CPRIT; award RR160027 to B.Z.) and by the McNair Medical Institute at The Robert and Janice McNair Foundation (to B.Z.). B.Z. is a CPRIT scholar in cancer research and a McNair scholar.

Author information

These authors contributed equally: Zhiao Shi, Jonathan T. Lei.

Authors and Affiliations

Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX, USA
Zhiao Shi, Jonathan T. Lei, John M. Elizarraras & Bing Zhang
Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
Zhiao Shi, Jonathan T. Lei, John M. Elizarraras & Bing Zhang

Contributions

Conceptualization, B.Z.; methodology, Z.S. and B.Z.; formal analysis, Z.S. and J.T.L.; investigation, Z.S., J.T.L. and B.Z.; resources, Z.S. and J.M.E.; data curation, Z.S. and J.T.L.; writing—original draft, Z.S., J.T.L. and B.Z.; visualization, Z.S., J.T.L. and J.M.E.; supervision, B.Z.; funding acquisition, B.Z.

Corresponding author

Correspondence to Bing Zhang.

Ethics declarations

Competing interests

B.Z. received research funding from AstraZeneca and consulting fees from Inotiv. The other authors declare no competing interests.

Peer review

Peer review information

Nature Cancer thanks Leeat Keren and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Quantification of inter-sample heterogeneity through gene-wise standard deviation.

A) Distributions of gene-wise standard deviations across individual datasets (n = 17,733 to 19,113 mRNAs and n = 7,961 to 11,815 proteins). For boxplots, the centerline indicates the median, box limits indicate the upper and lower quartiles and whiskers extend to 1.5 times the interquartile range. B) Median values of the median standard deviations across dataset groups. T, tumor; N, normal.
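The heterogeneity measure in Extended Data Fig. 1 is each gene's standard deviation across samples. A minimal NumPy sketch of both panels follows, assuming log2-scale genes x samples matrices that may contain missing values and using the sample standard deviation (ddof = 1); these choices, the dataset names and the simulated values are assumptions, not details taken from the article.

```python
import numpy as np


def gene_wise_sd(mat):
    """Sample standard deviation of each gene (row), ignoring missing values."""
    return np.nanstd(mat, axis=1, ddof=1)


rng = np.random.default_rng(1)
# Hypothetical log2 datasets (genes x samples); suffix _T = tumor, _N = normal.
datasets = {
    "cohortA_protein_T": rng.normal(scale=1.0, size=(500, 60)),
    "cohortB_protein_T": rng.normal(scale=0.9, size=(500, 60)),
    "cohortA_protein_N": rng.normal(scale=0.5, size=(500, 30)),
    "cohortB_protein_N": rng.normal(scale=0.6, size=(500, 30)),
}
# Proteomics matrices usually contain missing values; mask 10% at random (in place).
for mat in datasets.values():
    mat[rng.random(mat.shape) < 0.1] = np.nan

# Panel A plots the distribution of gene_wise_sd per dataset; summarize each by its median.
median_sd = {name: float(np.median(gene_wise_sd(m))) for name, m in datasets.items()}

# Panel B-style summary: the median of those medians within each sample-type group.
group_median = {
    group: float(np.median([v for k, v in median_sd.items() if k.endswith("_" + group)]))
    for group in ("T", "N")
}
```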

Extended Data Fig. 2 Breakdown of feature importance in the XGBoost model.

A) Barplot showing importance of individual features. B) Pie chart depicting aggregated importance by data and sample type pairs.
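Panel B's aggregation, which groups per-feature importances by the data type and sample type each feature was derived from, can be sketched in plain Python. The feature-naming scheme, the helper name `aggregate_importance` and the importance values are hypothetical; with a fitted scikit-learn tree ensemble, per-feature values could come from its `feature_importances_` attribute, but the article's exact importance measure is not specified here.

```python
from collections import defaultdict


def aggregate_importance(importances):
    """Share of total importance per (data type, sample type) pair.

    Assumes hypothetical feature names of the form '<cohort>_<dataType>_<sampleType>'.
    """
    totals = defaultdict(float)
    for name, value in importances.items():
        _cohort, data_type, sample_type = name.rsplit("_", 2)
        totals[(data_type, sample_type)] += value
    grand_total = sum(totals.values())
    return {key: value / grand_total for key, value in totals.items()}


# Hypothetical per-feature importances, e.g. read from a fitted model.
importances = {
    "BRCA_protein_tumor": 0.3,
    "LUAD_protein_tumor": 0.2,
    "LUAD_mRNA_tumor": 0.1,
    "LUAD_protein_normal": 0.4,
}
shares = aggregate_importance(importances)
```

Splitting from the right with `rsplit("_", 2)` keeps cohort names that themselves contain underscores intact.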

Extended Data Fig. 3 Characterization of dense modules.

A) Heatmap depicting the log2 fold change (log2FC) in average protein abundance of dense modules (cliques) in tumor versus normal samples for each of the five cancer cohorts shown. All 78 cliques show concordant tumor over- or underexpression in all five cohorts (FDR < 0.01 in each cohort). The table shows the number and maximum number of edges overlapping with the other networks indicated. GO_BP (Gene Ontology biological process) indicates the top enriched term for each clique, with its FDR (GO_BP_FDR). B,C) Tumor-overexpressed, ECM-associated dense modules Clique 96 (B) and Clique 54 (C). Edge color indicates absence from BioGRID, BioPlex, HI-Union, STRING and CORUM (pink) or presence in any of these resources (gray). D,E) Boxplots comparing average protein abundance of Clique 96 (D) and Clique 54 (E) in tumor and normal samples, demonstrating tumor overexpression in five cancer cohorts. P values were determined by two-sided Wilcoxon rank-sum test. F,G) Kaplan–Meier plots depicting differences in overall survival (OS) for patients in the indicated cohorts, stratified by the median of the average abundance of proteins in Clique 96 (F) and Clique 54 (G). Log-rank P values are shown, together with hazard ratios (HRs) and 95% confidence intervals derived from Cox proportional hazards models. Significance is indicated as ****P < 0.0001. For boxplots, the centerline indicates the median, box limits indicate the upper and lower quartiles and whiskers extend to 1.5 times the interquartile range; the number of samples per group is indicated in parentheses.
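Panels F and G split patients at the median module score and compare overall survival with a log-rank test. A minimal NumPy/SciPy version of that two-group test is sketched below; the survival data are simulated, the function name `logrank_test` is hypothetical, and the Cox-model hazard ratios reported in the figure are not reproduced. A dedicated survival-analysis package would normally be used instead.

```python
import numpy as np
from scipy.stats import chi2


def logrank_test(time, event, group):
    """Two-group log-rank test; returns (chi-square statistic, P value).

    time: follow-up times; event: True if death observed, False if censored;
    group: True for patients in the high-score group.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    group = np.asarray(group, dtype=bool)
    o_minus_e, variance = 0.0, 0.0
    for t in np.unique(time[event]):               # distinct observed death times
        at_risk = time >= t
        n, n1 = at_risk.sum(), (at_risk & group).sum()
        deaths = event & (time == t)
        d, d1 = deaths.sum(), (deaths & group).sum()
        o_minus_e += d1 - d * n1 / n               # observed minus expected in group 1
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    stat = o_minus_e ** 2 / variance
    return float(stat), float(chi2.sf(stat, df=1))


rng = np.random.default_rng(2)
score = rng.normal(size=200)                   # hypothetical module abundance per patient
high = score > np.median(score)                # median split, as in panels F and G
time = rng.exponential(scale=np.where(high, 1.0, 4.0))  # high group fails faster
event = rng.random(200) < 0.8                  # about 20% of patients censored
stat, p = logrank_test(time, event, high)
```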

Extended Data Fig. 4 Connecting somatic mutations to functional protein modules.

A) Average pairwise Pearson’s correlation coefficient for genes in L2_M40 based on mRNA or protein data in different cancer types. B) Average pairwise Pearson’s correlation coefficient for genes in L3_M58 based on mRNA or protein data in different cancer types. C) Comparison of TP53 protein abundance (log2 MS1 intensity) in TP53 wild-type (wt) and mutant (mut) samples across 10 cancer types. P values were derived from two-sided Wilcoxon rank-sum tests. Significance is indicated as *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001; ns, not significant. For boxplots, the centerline indicates the median, box limits indicate the upper and lower quartiles and whiskers extend to 1.5 times the interquartile range; the number of samples per group is indicated in parentheses.
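Panels A and B summarize module coherence as the average pairwise Pearson correlation among member genes. A minimal NumPy sketch, assuming a complete genes x samples matrix for one module in one cancer type; the helper name and toy values are hypothetical.

```python
import numpy as np


def mean_pairwise_pearson(mat):
    """Average Pearson r over all distinct pairs of genes (rows) in a module."""
    r = np.corrcoef(mat)                       # genes x genes correlation matrix
    i, j = np.triu_indices_from(r, k=1)        # each unordered pair once, no diagonal
    return float(r[i, j].mean())


module = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with gene 1 (r = 1)
    [1.0, 3.0, 2.0, 4.0],   # r = 0.8 with each of the first two genes
])
coherence = mean_pairwise_pearson(module)  # approximately 0.867, i.e. (1 + 0.8 + 0.8) / 3
```

Proteomics matrices often contain missing values, which `np.corrcoef` propagates as NaN; pandas `DataFrame.corr` instead computes each correlation from pairwise-complete observations (with genes as columns).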

Extended Data Fig. 5 Illuminating understudied cancer proteins RBM34 and RBM12B.

A) Boxplots comparing protein abundance of RBM34 and RBM12B in tumor and normal samples, demonstrating tumor overexpression in five cancer cohorts. P values were determined by two-sided Wilcoxon rank-sum test. For boxplots, the centerline indicates the median, box limits indicate the upper and lower quartiles and whiskers extend to 1.5 times the interquartile range; the number of samples per group is indicated in parentheses. B) Barplots depicting the frequency of somatic copy number alterations and mutations in RBM34 and RBM12B in the TCGA PanCancer Atlas studies in cBioPortal. C,D) Network neighborhoods of RBM34 (C) and RBM12B (D), with genes associated with the enriched GO terms highlighted.
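The tumor-versus-normal comparisons in panel A use a two-sided Wilcoxon rank-sum test, which is equivalent to the Mann–Whitney U test implemented in SciPy. A minimal sketch for one protein in one cohort follows, with simulated log2 abundances and a hypothetical helper name; reporting the median difference alongside the P value is an assumption made for illustration.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def tumor_vs_normal(tumor, normal):
    """Median difference (tumor minus normal) and two-sided Wilcoxon rank-sum P value."""
    tumor = np.asarray(tumor, dtype=float)
    normal = np.asarray(normal, dtype=float)
    tumor, normal = tumor[~np.isnan(tumor)], normal[~np.isnan(normal)]  # drop missing values
    res = mannwhitneyu(tumor, normal, alternative="two-sided")
    return float(np.median(tumor) - np.median(normal)), float(res.pvalue)


rng = np.random.default_rng(3)
tumor = rng.normal(loc=1.0, size=50)   # hypothetical log2 abundance in tumors
normal = rng.normal(loc=0.0, size=50)  # hypothetical log2 abundance in normals
normal[:3] = np.nan                    # missing values are removed before testing
diff, p = tumor_vs_normal(tumor, normal)
```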

Extended Data Fig. 6 Illuminating understudied cancer proteins CXorf38 and MAB21L4.

A) Boxplots comparing protein abundance of CXorf38 in tumor and normal samples, demonstrating tumor overexpression in five cancer cohorts. P values were determined by two-sided Wilcoxon rank-sum test. B) Relationship between CXorf38 protein abundance and the RNA-seq-inferred ESTIMATE immune score in eight cancer types. P values are from two-sided tests of Spearman’s rank correlation. The shaded area depicts the 95% confidence interval. C) Single-cell data from the Human Protein Atlas showing that CXorf38 is expressed across all cell types, with the highest expression in immune cells. D) Boxplots comparing protein abundance of MAB21L4 in tumor and normal samples in five cancer cohorts. P values were determined by two-sided Wilcoxon rank-sum test. Significance is indicated as *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001; ns, not significant. For boxplots, the centerline indicates the median, box limits indicate the upper and lower quartiles and whiskers extend to 1.5 times the interquartile range; the number of samples per group is indicated in parentheses.

Extended Data Fig. 7 Graph neural network architecture for predicting cancer driver genes based on network topology and mutation data.

The model takes as input a feature matrix of gene-level mutation data. Nodes in the graph correspond to genes: pink nodes are known positive driver genes, orange nodes are hidden positive genes and gray nodes are unlabeled genes. Both the node features and the network topology are processed through hidden layers with ReLU activations. The output layer predicts a label for each gene, with red nodes indicating predicted positive driver genes and blue nodes indicating predicted negative genes.
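The architecture in Extended Data Fig. 7 passes gene-level mutation features through graph layers with ReLU activations. The forward pass below is a simplified sketch that swaps in a standard GCN-style graph convolution (self-loops plus symmetric degree normalization) for illustration; the article's exact layer type, dimensions and training procedure are not reproduced, and the toy graph, features, random weights and function names are hypothetical.

```python
import numpy as np


def normalize_adjacency(adj):
    """GCN-style normalization D^-1/2 (A + I) D^-1/2 of an undirected adjacency matrix."""
    a_hat = adj + np.eye(adj.shape[0])                  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gnn_forward(adj, features, weights):
    """Forward pass: graph convolution per layer, ReLU on hidden layers,
    sigmoid on the single output unit to give a per-gene driver probability."""
    a_norm = normalize_adjacency(adj)
    h = features
    for layer, w in enumerate(weights):
        h = a_norm @ h @ w                              # aggregate neighbors, then transform
        if layer < len(weights) - 1:
            h = np.maximum(h, 0.0)                      # ReLU
    return 1.0 / (1.0 + np.exp(-h[:, 0]))


rng = np.random.default_rng(4)
# Hypothetical 5-gene path network: 0-1-2-3-4.
adj = np.zeros((5, 5))
for a, b in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    adj[a, b] = adj[b, a] = 1.0
features = rng.random((5, 3))              # e.g. per-gene mutation features (hypothetical)
weights = [rng.normal(size=(3, 8)), rng.normal(size=(8, 1))]
probs = gnn_forward(adj, features, weights)
```

Training is omitted: in practice the weights would be learned from the known and hidden positive genes rather than drawn at random.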

Supplementary information

Reporting Summary (PDF)

Supplementary Tables 1–7 (XLSX)

Supplementary Table 1. RNA and protein expression datasets.
Supplementary Table 2. FunMap edges.
Supplementary Table 3. FunMap dense modules and their characterization.
Supplementary Table 4. FunMap hierarchical modules and their characterization.
Supplementary Table 5. Importance of mutant genes in predicting module abundance.
Supplementary Table 6. Dark gene characterization using FunMap.
Supplementary Table 7. Driver gene prediction.

Source data

Source Data Figs. 1–8 (XLSX): statistical source data for each main figure.
Source Data Extended Data Figs. 1–6 (XLSX): statistical source data for each Extended Data figure.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Shi, Z., Lei, J.T., Elizarraras, J.M. et al. Mapping the functional network of human cancer through machine learning and pan-cancer proteogenomics. Nat Cancer 6, 205–222 (2025). https://doi.org/10.1038/s43018-024-00869-z


Received: 06 April 2023
Accepted: 25 October 2024
Published: 11 December 2024
Version of record: 11 December 2024
Issue date: January 2025
DOI: https://doi.org/10.1038/s43018-024-00869-z
