  • Analysis

Mapping the functional network of human cancer through machine learning and pan-cancer proteogenomics

Abstract

Large-scale omics profiling has uncovered a vast array of somatic mutations and cancer-associated proteins, posing substantial challenges for their functional interpretation. Here we present a network-based approach centered on FunMap, a pan-cancer functional network constructed using supervised machine learning on extensive proteomics and RNA sequencing data from 1,194 individuals spanning 11 cancer types. Comprising 10,525 protein-coding genes, FunMap connects functionally associated genes with unprecedented precision, surpassing traditional protein–protein interaction maps. Network analysis identifies functional protein modules, reveals a hierarchical structure linked to cancer hallmarks and clinical phenotypes, provides deeper insights into established cancer drivers and predicts functions for understudied cancer-associated proteins. Additionally, applying graph-neural-network-based deep learning to FunMap uncovers drivers with low mutation frequency. This study establishes FunMap as a powerful and unbiased tool for interpreting somatic mutations and understudied proteins, with broad implications for advancing cancer biology and informing therapeutic strategies.

Fig. 1: Protein coexpression is a strong predictor of gene cofunctionality.
Fig. 2: FunMap has high functional relevance, deep proteome coverage and a scale-free, modular and small-world network topology.
Fig. 3: FunMap reveals known and previously unidentified dense modules associated with cancer biology and clinical phenotype.
Fig. 4: Hierarchical modular organization of FunMap statistically linked to cancer hallmarks.
Fig. 5: In-depth analysis of selected FunMap branches and their clinical associations.
Fig. 6: Connecting somatic mutations to functional protein modules.
Fig. 7: FunMap predicts functions of understudied proteins.
Fig. 8: Discovery of cancer drivers with low mutation frequency using FunMap.

Data availability

Proteomics and RNA-seq data for the ten CPTAC cancer types were derived from the CPTAC pan-cancer study (ref. 15; https://proteomic.datacommons.cancer.gov/pdc/cptac-pancancer). Proteomics and RNA-seq data for hepatocellular carcinoma (HCC) were downloaded from the original publication (ref. 55). The data tables derived from these resources and used as input for FunMap construction are available from Zenodo (https://doi.org/10.5281/zenodo.7948943; ref. 73). Derived feature data for XGBoost model training are available from Zenodo (https://doi.org/10.5281/zenodo.7949374; ref. 74). XGBoost prediction scores for all gene pairs are available from Zenodo (https://doi.org/10.5281/zenodo.10080763; ref. 75). The FunMap edge list, dense modules and hierarchical modules are available online (https://funmap.linkedomics.org/). The same website also provides visualization tools to explore the gene neighborhoods, dense modules and hierarchical organization of FunMap. Additionally, the FunMap network and modules were integrated into WebGestalt (ref. 76) for enrichment analysis of user-provided gene lists. Cell line annotations and CRISPR knockout (KO) dependency scores can be retrieved from the DepMap website (https://www.depmap.org). Other datasets used in the study included the gene cofunctionality gold standard derived from the Reactome pathway database (ref. 12), ProHD (ref. 12), BioPlex (ref. 18), HuRI (ref. 19), HI-Union (ref. 19) and BioGRID (ref. 20). Source data are provided with this paper.
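
As a rough illustration of how a downloaded edge list could be explored, the sketch below builds an undirected adjacency map from gene pairs and lists the neighbors of one gene. The two-column, tab-separated layout, the function names and the toy gene identifiers are all assumptions made for this example; the files served by the FunMap website may be organized differently.

```python
import csv
import io
from collections import defaultdict


def load_edges(handle):
    """Read a two-column, tab-separated edge list into an undirected adjacency map.

    The file layout is an assumption for this sketch, not the documented
    FunMap format.
    """
    adjacency = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        gene_a, gene_b = row[0].strip(), row[1].strip()
        if not gene_a or not gene_b:
            continue
        adjacency[gene_a].add(gene_b)
        adjacency[gene_b].add(gene_a)
    return adjacency


def neighbors(adjacency, gene):
    """Return the sorted network neighborhood of a gene (empty if absent)."""
    return sorted(adjacency.get(gene, ()))


# Toy edges with placeholder gene names (hypothetical, not FunMap data).
TOY_EDGE_LIST = "GENE_A\tGENE_B\nGENE_A\tGENE_C\nGENE_D\tGENE_B\n"
toy_network = load_edges(io.StringIO(TOY_EDGE_LIST))
print(neighbors(toy_network, "GENE_A"))
```

With a real file, `load_edges(open(path, newline=""))` would play the same role, provided the file matches the assumed layout.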

Code availability

The FunMap Python package is fully open source and available for download from the Python Package Index (https://pypi.org/project/funmap). The source code is hosted on GitHub (https://github.com/bzhanglab/funmap). Other supporting software is available as follows: scikit-learn 1.3.2 (https://scikit-learn.org/stable/index.html), ICE 1.0.2 (http://ice.zhang-lab.org), NetSAM 1.44.0 (https://www.bioconductor.org/packages/release/bioc/html/NetSAM.html), WebGestaltR 0.4.6 (https://cran.r-project.org/web/packages/WebGestaltR/index.html) and pytorch_geometric 1.7.2 (https://github.com/pyg-team/pytorch_geometric).

References

  1. Ostroverkhova, D., Przytycka, T. M. & Panchenko, A. R. Cancer driver mutations: predictions and reality. Trends Mol. Med. 29, 554–566 (2023).

  2. Kustatscher, G. et al. Understudied proteins: opportunities and challenges for functional proteomics. Nat. Methods 19, 774–779 (2022).

  3. Dinstag, G. & Shamir, R. PRODIGY: personalized prioritization of driver genes. Bioinformatics 36, 1831–1839 (2020).

  4. Leiserson, M. D. M. et al. Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes. Nat. Genet. 47, 106–114 (2015).

  5. Sharan, R., Ulitsky, I. & Shamir, R. Network-based prediction of protein function. Mol. Syst. Biol. 3, 88 (2007).

  6. Kim, M. et al. A protein interaction landscape of breast cancer. Science 374, eabf3066 (2021).

  7. Swaney, D. L. et al. A protein network map of head and neck cancer reveals PIK3CA mutant drug sensitivity. Science 374, eabf2911 (2021).

  8. Quackenbush, J. Microarrays—guilt by association. Science 302, 240–241 (2003).

  9. Yanai, I. et al. Similar gene expression profiles do not imply similar tissue functions. Trends Genet. 22, 132–138 (2006).

  10. Wang, J. et al. Proteome profiling outperforms transcriptome profiling for coexpression based gene function prediction. Mol. Cell. Proteomics 16, 121–134 (2017).

  11. Ribeiro, D. M., Ziyani, C. & Delaneau, O. Shared regulation and functional relevance of local gene co-expression revealed by single cell analysis. Commun. Biol. 5, 876 (2022).

  12. Kustatscher, G. et al. Co-regulation map of the human proteome enables identification of protein functions. Nat. Biotechnol. 37, 1361–1371 (2019).

  13. Wu, L. et al. Variation and genetic control of protein abundance in humans. Nature 499, 79–82 (2013).

  14. Lapek, J. D. Jr et al. Detection of dysregulated protein-association networks by high-throughput proteomics predicts cancer vulnerabilities. Nat. Biotechnol. 35, 983–989 (2017).

  15. Li, Y. et al. Proteogenomic data and resources for pan-cancer analysis. Cancer Cell 41, 1397–1406 (2023).

  16. Zhu, H. et al. Proteomics of adjacent-to-tumor samples uncovers clinically relevant biological events in hepatocellular carcinoma. Natl Sci. Rev. 10, nwad167 (2023).

  17. Obayashi, T. & Kinoshita, K. Rank of correlation coefficient as a comparable measure for biological significance of gene coexpression. DNA Res. 16, 249–260 (2009).

  18. Huttlin, E. L. et al. Dual proteome-scale networks reveal cell-specific remodeling of the human interactome. Cell 184, 3022–3040 (2021).

  19. Luck, K. et al. A reference map of the human binary protein interactome. Nature 580, 402–408 (2020).

  20. Oughtred, R. et al. The BioGRID database: a comprehensive biomedical resource of curated protein, genetic, and chemical interactions. Protein Sci. 30, 187–200 (2021).

  21. Szklarczyk, D. et al. The STRING database in 2021: customizable protein–protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res. 49, D605–D612 (2021).

  22. Tsitsiridis, G. et al. CORUM: the comprehensive resource of mammalian protein complexes—2022. Nucleic Acids Res. 51, D539–D545 (2023).

  23. Shi, Z., Derow, C. K. & Zhang, B. Co-expression module analysis reveals biological processes, genomic gain, and regulatory mechanisms associated with breast cancer progression. BMC Syst. Biol. 4, 74 (2010).

  24. Barabási, A.-L. & Oltvai, Z. N. Network biology: understanding the cell’s functional organization. Nat. Rev. Genet. 5, 101–113 (2004).

  25. Shi, Z., Wang, J. & Zhang, B. NetGestalt: integrating multidimensional omics data over biological networks. Nat. Methods 10, 597–598 (2013).

  26. Knijnenburg, T. A., Bismeijer, T., Wessels, L. F. A. & Shmulevich, I. A multilevel pan-cancer map links gene mutations to cancer hallmarks. Chin. J. Cancer 34, 439–449 (2015).

  27. Chen, Y., Verbeek, F. J. & Wolstencroft, K. Establishing a consensus for the hallmarks of cancer based on gene ontology and pathway annotations. BMC Bioinformatics 22, 178 (2021).

  28. Chen, X. & Cubillos-Ruiz, J. R. Endoplasmic reticulum stress signals in the tumour and its microenvironment. Nat. Rev. Cancer 21, 71–88 (2021).

  29. Vasaikar, S. et al. Proteogenomic analysis of human colon cancer reveals new therapeutic opportunities. Cell 177, 1035–1049 (2019).

  30. Zhang, B. et al. Proteogenomic characterization of human colon and rectal cancer. Nature 513, 382–387 (2014).

  31. Giacinti, C. & Giordano, A. RB and cell cycle progression. Oncogene 25, 5220–5227 (2006).

  32. Deacu, E. et al. Activin type II receptor restoration in ACVR2-deficient colon cancer cells induces transforming growth factor-β response pathway genes. Cancer Res. 64, 7690–7696 (2004).

  33. Yang, P. et al. SET domain containing 1B gene is mutated in primary hepatic neuroendocrine tumors. Int. J. Cancer 145, 2986–2995 (2019).

  34. Chorley, B. N. et al. Identification of novel NRF2-regulated genes by ChIP-Seq: influence on retinoid X receptor alpha. Nucleic Acids Res. 40, 7416–7429 (2012).

  35. Penning, T. M. Aldo-keto reductase regulation by the NRF2 system: implications for stress response, chemotherapy drug resistance, and carcinogenesis. Chem. Res. Toxicol. 30, 162–176 (2017).

  36. Chen, Y.-T., Shi, D., Yang, D. & Yan, B. Antioxidant sulforaphane and sensitizer trinitrobenzene sulfonate induce carboxylesterase-1 through a novel element transactivated by nuclear factor-E2 related factor-2. Biochem. Pharmacol. 84, 864–871 (2012).

  37. Thimmulappa, R. K. et al. Identification of NRF2-regulated genes induced by the chemopreventive agent sulforaphane by oligonucleotide microarray. Cancer Res. 62, 5196–5203 (2002).

  38. Rojo de la Vega, M., Chapman, E. & Zhang, D. D. NRF2 and the hallmarks of cancer. Cancer Cell 34, 21–43 (2018).

  39. Xu, I. M.-J. et al. Transketolase counteracts oxidative stress to drive cancer development. Proc. Natl Acad. Sci. USA 113, E725–E734 (2016).

  40. Loignon, M. et al. Cul3 overexpression depletes NRF2 in breast cancer and is associated with sensitivity to carcinogens, to oxidative stress, and to chemotherapy. Mol. Cancer Ther. 8, 2432–2440 (2009).

  41. Kalthoff, S., Ehmer, U., Freiberg, N., Manns, M. P. & Strassburg, C. P. Interaction between oxidative stress sensor NRF2 and xenobiotic-activated aryl hydrocarbon receptor in the regulation of the human phase II detoxifying UDP-glucuronosyltransferase 1A10. J. Biol. Chem. 285, 5993–6002 (2010).

  42. Bech-Otschir, D. et al. COP9 signalosome-specific phosphorylation targets p53 to degradation by the ubiquitin system. EMBO J. 20, 1630–1639 (2001).

  43. Pineau, C. et al. Cell type-specific expression of testis elevated genes based on transcriptomics and antibody-based proteomics. J. Proteome Res. 18, 4215–4230 (2019).

  44. Pan, D. et al. A major chromatin regulator determines resistance of tumor cells to T cell-mediated killing. Science 359, 770–775 (2018).

  45. Miao, D. et al. Genomic correlates of response to immune checkpoint therapies in clear cell renal cell carcinoma. Science 359, 801–806 (2018).

  46. Wu, L. et al. KDM5 histone demethylases repress immune response via suppression of STING. PLoS Biol. 16, e2006134 (2018).

  47. Liu, S., Liu, T., Jiang, J., Guo, H. & Yang, R. p53 mutation and deletion contribute to tumor immune evasion. Front. Genet. 14, 1088455 (2023).

  48. Hu, C. et al. ATRX loss promotes immunosuppressive mechanisms in IDH1 mutant glioma. Neuro. Oncol. 24, 888–900 (2022).

  49. Wang, J., Vasaikar, S., Shi, Z., Greer, M. & Zhang, B. WebGestalt 2017: a more comprehensive, powerful, flexible and interactive gene set enrichment analysis toolkit. Nucleic Acids Res. 45, W130–W137 (2017).

  50. Gaudet, P., Livstone, M. S., Lewis, S. E. & Thomas, P. D. Phylogenetic-based propagation of functional annotations within the Gene Ontology consortium. Brief. Bioinform. 12, 449–462 (2011).

  51. Srivastava, A. et al. MAB21L4 deficiency drives squamous cell carcinoma via activation of RET. Cancer Res. 82, 3143–3157 (2022).

  52. Futreal, P. A. et al. A census of human cancer genes. Nat. Rev. Cancer 4, 177–183 (2004).

  53. Pastushenko, I. et al. Fat1 deletion promotes hybrid EMT state, tumour stemness and metastasis. Nature 589, 448–455 (2021).

  54. Bennett, H. M., Stephenson, W., Rose, C. M. & Darmanis, S. Single-cell proteomics enabled by next-generation sequencing or mass spectrometry. Nat. Methods 20, 363–374 (2023).

  55. Gao, Q. et al. Integrated proteogenomic characterization of HBV-related hepatocellular carcinoma. Cell 179, 561–577 (2019).

  56. Chen, T. & Guestrin, C. XGBoost: a scalable tree boosting system. In Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ed. Krishnapuram, B.) (Association for Computing Machinery, 2016).

  57. Newman, M. E. J. Modularity and community structure in networks. Proc. Natl Acad. Sci. USA 103, 8577–8582 (2006).

  58. Plaisier, C. L., Pan, M. & Baliga, N. S. A miRNA-regulatory network explains how dysregulated miRNAs perturb oncogenic processes across diverse cancers. Genome Res. 22, 2302–2314 (2012).

  59. Hirsch, T. et al. Regeneration of the entire human epidermis using transgenic stem cells. Nature 551, 327–332 (2017).

  60. Kiefer, J. et al. Abstract 3589: a systematic approach toward gene annotation of the hallmarks of cancer. Cancer Res. 77, 3589 (2017).

  61. Hanahan, D. & Weinberg, R. A. Hallmarks of cancer: the next generation. Cell 144, 646–674 (2011).

  62. Mongiat, M., Andreuzzi, E., Tarticchio, G. & Paulitti, A. Extracellular matrix, a hard player in angiogenesis. Int. J. Mol. Sci. 17, 1822 (2016).

  63. Veličković, P. et al. Graph attention networks. Preprint at https://arxiv.org/abs/1710.10903 (2017).

  64. Bekker, J. & Davis, J. Learning from positive and unlabeled data: a survey. Mach. Learn. 109, 719–760 (2020).

  65. Mordelet, F. & Vert, J.-P. A bagging SVM to learn from positive and unlabeled examples. Pattern Recognit. Lett. 37, 201–209 (2014).

  66. Liu, T.-Y. Learning to Rank for Information Retrieval 1st edn (Springer, 2009).

  67. Zhang, E. & Zhang, Y. Average precision. In Encyclopedia of Database Systems (eds Liu, L. & Özsu, M. T.) 192–193 (Springer, 2009).

  68. Fey, M. & Lenssen, J. E. Fast graph representation learning with PyTorch Geometric. Preprint at https://arxiv.org/abs/1903.02428 (2019).

  69. Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2014).

  70. Pedregosa, F. et al. Scikit-learn: machine learning in python. Preprint at https://arxiv.org/abs/1201.0490 (2012).

  71. Dempster, J. M. et al. Chronos: a cell population dynamics model of CRISPR experiments that improves inference of gene fitness effects. Genome Biol. 22, 343 (2021).

  72. Pacini, C. et al. Integrated cross-study datasets of genetic dependencies in cancer. Nat. Commun. 12, 1661 (2021).

  73. Shi, Z. FunMap input expression data matrices. Zenodo https://doi.org/10.5281/zenodo.7948943 (2023).

  74. Shi, Z. FunMap feature data file used for model training. Zenodo https://doi.org/10.5281/zenodo.7949374 (2023).

  75. Shi, Z. FunMap prediction scores for all gene pairs. Zenodo https://doi.org/10.5281/zenodo.10080763 (2023).

  76. Elizarraras, J. M. et al. WebGestalt 2024: faster gene set analysis and new support for metabolomics and multi-omics. Nucleic Acids Res. 52, W415–W421 (2024).

Acknowledgements

We gratefully acknowledge contributions from the CPTAC and its Pan-Cancer Analysis working group. This work was supported by National Institutes of Health grants from the National Cancer Institute (U24 CA210954, U24 CA271076, R01 CA245903 and U01 CA271247 to B.Z.), by the Cancer Prevention and Research Institute of Texas (CPRIT; award RR160027 to B.Z.) and by the McNair Medical Institute at The Robert and Janice McNair Foundation (to B.Z.). B.Z. is a CPRIT scholar in cancer research and a McNair scholar.

Author information

Contributions

Conceptualization, B.Z.; methodology, Z.S. and B.Z.; formal analysis, Z.S. and J.T.L.; investigation, Z.S., J.T.L. and B.Z.; resources, Z.S. and J.M.E.; data curation, Z.S. and J.T.L.; writing—original draft, Z.S., J.T.L. and B.Z.; visualization, Z.S., J.T.L. and J.M.E.; supervision, B.Z.; funding acquisition, B.Z.

Corresponding author

Correspondence to Bing Zhang.

Ethics declarations

Competing interests

B.Z. received research funding from AstraZeneca and consulting fees from Inotiv. The other authors declare no competing interests.

Peer review

Peer review information

Nature Cancer thanks Leeat Keren and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Quantification of inter-sample heterogeneity through gene-wise standard deviation.

A) Distributions of gene-wise standard deviations across individual datasets (n = 17,733 to 19,113 mRNAs and n = 7,961 to 11,815 proteins). For boxplots, the center line indicates the median, box limits indicate the upper and lower quartiles and whiskers indicate 1.5× the interquartile range. B) Median values of the median standard deviations across various dataset groups. T: Tumor; N: Normal.
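
The heterogeneity measure in this caption can be sketched in a few lines: compute one standard deviation per gene across samples, then summarize a dataset by the median of those values. This is a minimal illustration with made-up numbers; the function names, the use of the sample standard deviation and the assumption that there are no missing values are choices made for the example, not details confirmed by the caption.

```python
import statistics


def gene_wise_sd(expression):
    """Map each gene to the sample standard deviation of its values across samples.

    `expression` maps gene name -> list of per-sample values. Genes with
    fewer than two values are skipped because a standard deviation is
    undefined for them.
    """
    return {
        gene: statistics.stdev(values)
        for gene, values in expression.items()
        if len(values) >= 2
    }


def median_sd(expression):
    """Summarize one dataset by the median of its gene-wise standard deviations."""
    return statistics.median(gene_wise_sd(expression).values())


# Toy dataset with hypothetical genes and values.
toy_dataset = {
    "GENE_A": [1.0, 2.0, 3.0],
    "GENE_B": [2.0, 2.0, 2.0],
    "GENE_C": [0.0, 2.0, 4.0],
}
print(gene_wise_sd(toy_dataset))
print(median_sd(toy_dataset))
```

Repeating `median_sd` per dataset and taking the median within each dataset group would give a group-level summary of the kind described for panel B.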

Extended Data Fig. 2 Breakdown of feature importance in the XGBoost model.

A) Barplot showing importance of individual features. B) Pie chart depicting aggregated importance by data and sample type pairs.

Extended Data Fig. 3 Characterization of dense modules.

A) Heatmap depicting the log2 fold change (log2FC) of the average protein abundance of dense modules (cliques) in tumor versus normal samples for each of the five cancer cohorts shown. All 78 cliques show concordant tumor over- or under-expression in all five cohorts (FDR < 0.01 in each cohort). The table shows the number and maximum number of overlapping edges with other networks as indicated. GO_BP indicates the top enriched Gene Ontology biological process term for a given clique, and GO_BP_FDR its FDR. B-C) Tumor-overexpressed, ECM-associated dense modules, Clique 96 (B) and Clique 54 (C). Edge color indicates lack of overlap with BioGRID, BioPlex, HI-Union, STRING and CORUM (pink) or overlap with any of these resources (gray). D-E) Boxplots comparing the average protein abundance of Clique 96 (D) and Clique 54 (E) in tumor and normal samples, demonstrating tumor overexpression in five cancer cohorts. The number of samples, n, is indicated in parentheses. P-values determined by two-sided Wilcoxon rank-sum test. F-G) Kaplan-Meier plots depicting the difference in overall survival (OS) between patients from the indicated cohorts stratified by the median value of the average abundance of proteins in Clique 96 (F) and Clique 54 (G). Log-rank p-values and hazard ratios (HR) are shown with 95% confidence intervals derived from Cox proportional hazards models. Significance is indicated as ****p < 0.0001. For boxplots, the center line indicates the median, box limits indicate the upper and lower quartiles and whiskers indicate 1.5× the interquartile range.
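
Two quantities in this caption lend themselves to a short sketch: a per-sample module score (the average abundance of a clique's proteins) and the median split used to stratify patients. The example below assumes abundances are already on a log2 scale, so that the log2 fold change is a difference of means, and it assigns values tied with the median to the "low" group; both are assumptions for illustration, as are the function names and toy numbers. Survival modeling itself is not shown.

```python
import statistics


def module_score(abundance, module_genes):
    """Average abundance of the module's proteins in each sample.

    `abundance` maps protein name -> list of per-sample values, with all
    lists in the same sample order.
    """
    n_samples = len(abundance[module_genes[0]])
    return [
        statistics.fmean(abundance[gene][i] for gene in module_genes)
        for i in range(n_samples)
    ]


def log2_fold_change(tumor_scores, normal_scores):
    """Tumor-versus-normal log2FC, assuming the scores are already log2-scaled."""
    return statistics.fmean(tumor_scores) - statistics.fmean(normal_scores)


def split_by_median(scores):
    """Label each sample 'high' or 'low' relative to the median score.

    Ties with the median go to 'low'; this tie rule is an assumption.
    """
    cutoff = statistics.median(scores)
    return ["high" if score > cutoff else "low" for score in scores]


# Toy log2 abundances for a hypothetical two-protein module.
toy_tumor = {"P1": [3.0, 5.0], "P2": [5.0, 7.0]}
toy_normal = {"P1": [1.0, 3.0], "P2": [3.0, 5.0]}
module = ["P1", "P2"]

tumor_scores = module_score(toy_tumor, module)
normal_scores = module_score(toy_normal, module)
print(log2_fold_change(tumor_scores, normal_scores))
print(split_by_median(tumor_scores))
```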

Extended Data Fig. 4 Connecting somatic mutations to functional protein modules.

A) Average pairwise Pearson’s correlation coefficient for genes in L2_M40 based on mRNA or protein data in different cancer types. B) Average pairwise Pearson’s correlation coefficient for genes in L3_M58 based on mRNA or protein data in different cancer types. C) Comparison of TP53 protein abundance (log2 MS1 intensity) in TP53 wild-type (wt) and mutant (mut) samples across 10 cancer types. The number of samples, n, is indicated in parentheses. P-values were derived from two-sided Wilcoxon rank-sum test. Significance is indicated as *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001, ns: not significant. For boxplots, the center line indicates the median, box limits indicate the upper and lower quartiles and whiskers indicate 1.5× the interquartile range.
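
The module-level statistic in panels A and B can be illustrated as follows: compute Pearson's correlation for every pair of genes in a module and average the results. The sketch uses toy profiles and hypothetical gene names, and it assumes complete data with non-constant profiles; how missing values were handled in the study is not stated in the caption.

```python
import itertools
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(var_x * var_y)


def average_pairwise_pearson(expression, module_genes):
    """Mean Pearson correlation over all unordered gene pairs in a module."""
    correlations = [
        pearson(expression[gene_a], expression[gene_b])
        for gene_a, gene_b in itertools.combinations(module_genes, 2)
    ]
    return sum(correlations) / len(correlations)


# Toy profiles: GENE_A and GENE_B rise together, GENE_C falls.
toy_expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],
    "GENE_C": [4.0, 3.0, 2.0, 1.0],
}
print(average_pairwise_pearson(toy_expression, ["GENE_A", "GENE_B"]))
print(average_pairwise_pearson(toy_expression, ["GENE_A", "GENE_B", "GENE_C"]))
```

Running the same function once on mRNA profiles and once on protein profiles for the same module gives the two values compared per cancer type.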

Extended Data Fig. 5 Illuminating understudied cancer proteins RBM34 and RBM12B.

A) Boxplots comparing protein abundance of RBM34 and RBM12B in tumor and normal samples, demonstrating tumor overexpression in five cancer cohorts. The number of samples, n, is indicated in parentheses. P-values determined by two-sided Wilcoxon rank-sum test. For boxplots, the center line indicates the median, box limits indicate the upper and lower quartiles and whiskers indicate 1.5× the interquartile range. B) Barplots depicting the frequency of somatic copy number alterations and mutations in RBM34 and RBM12B from TCGA PanCancer Atlas Studies in cBioPortal. C-D) Network neighborhood of RBM34 (C) or RBM12B (D) with genes associated with the enriched GO terms highlighted.

Extended Data Fig. 6 Illuminating understudied cancer proteins CXorf38 and MAB21L4.

A) Boxplots comparing protein abundance of CXorf38 in tumor and normal samples, demonstrating tumor overexpression in five cancer cohorts. The number of samples, n, is indicated in parentheses. P-values determined by two-sided Wilcoxon rank-sum test. B) Relationship between protein abundance of CXorf38 and RNA-seq-inferred ESTIMATE ImmuneScore in eight cancer types. P-values were derived from two-sided Spearman’s rank correlation. Shaded area depicts the 95% confidence interval. C) Single-cell data from the Human Protein Atlas showing that CXorf38 is expressed across all cell types, with the highest expression in immune cells. D) Boxplots comparing protein abundance of MAB21L4 in tumor and normal samples in five cancer cohorts. The number of samples, n, is indicated in parentheses. P-values determined by two-sided Wilcoxon rank-sum test. Significance is indicated as *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001, ns: not significant. For boxplots, the center line indicates the median, box limits indicate the upper and lower quartiles and whiskers indicate 1.5× the interquartile range.

Extended Data Fig. 7 Graph neural network architecture for predicting cancer driver genes based on network topology and mutation data.

The model takes as input mutation data for genes represented in a feature matrix. Nodes in the graph correspond to genes, where pink nodes are known positive driver genes, orange nodes are hidden positive genes, and gray nodes are unlabeled genes. Both the node features and network topology are processed through hidden layers with ReLU activations. The output layer predicts gene classifications, with red nodes indicating predicted positive driver genes and blue nodes indicating predicted negative genes.
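
To make the idea of processing node features together with network topology concrete, the sketch below implements one generic message-passing layer in plain Python: each gene averages its own feature vector with those of its neighbors, applies a linear map and then a ReLU. This is not the authors' architecture (the reference list points to graph attention networks and PyTorch Geometric); the mean aggregation, the weights, the feature values and all names here are illustrative assumptions.

```python
def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, value) for value in values]


def mean_aggregate(features, adjacency, node):
    """Average the feature vectors of a node and its network neighbors."""
    group = [node] + sorted(adjacency.get(node, ()))
    width = len(features[node])
    return [
        sum(features[member][k] for member in group) / len(group)
        for k in range(width)
    ]


def linear(vector, weights, bias):
    """Apply a dense layer; `weights` holds one row per output unit."""
    return [
        sum(w * v for w, v in zip(row, vector)) + b
        for row, b in zip(weights, bias)
    ]


def message_passing_layer(features, adjacency, weights, bias):
    """One hidden layer: neighborhood averaging, linear map, then ReLU."""
    return {
        node: relu(linear(mean_aggregate(features, adjacency, node), weights, bias))
        for node in features
    }


# Toy graph of three hypothetical genes with two mutation-derived features each.
toy_features = {"G1": [1.0, 0.0], "G2": [0.0, 1.0], "G3": [0.0, 0.0]}
toy_adjacency = {"G1": {"G2"}, "G2": {"G1", "G3"}, "G3": {"G2"}}
toy_weights = [[1.0, -1.0], [0.5, 0.5]]  # arbitrary illustrative weights
toy_bias = [0.0, 0.0]

hidden = message_passing_layer(toy_features, toy_adjacency, toy_weights, toy_bias)
print(hidden)
```

In a trained model the weights would be learned from labeled driver genes, and an output layer would map the hidden representation to a driver score per gene.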

Supplementary information

Reporting Summary

Supplementary Tables 1–7

Supplementary Table 1. RNA and protein expression datasets. Supplementary Table 2. FunMap edges. Supplementary Table 3. FunMap dense modules and their characterization. Supplementary Table 4. FunMap hierarchical modules and their characterization. Supplementary Table 5. Importance of mutant genes in predicting module abundance. Supplementary Table 6. Dark gene characterization using FunMap. Supplementary Table 7. Driver gene prediction.

Source data

Source Data Fig. 1: Statistical source data.
Source Data Fig. 2: Statistical source data.
Source Data Fig. 3: Statistical source data.
Source Data Fig. 4: Statistical source data.
Source Data Fig. 5: Statistical source data.
Source Data Fig. 6: Statistical source data.
Source Data Fig. 7: Statistical source data.
Source Data Fig. 8: Statistical source data.
Source Data Extended Data Fig. 1: Statistical source data.
Source Data Extended Data Fig. 2: Statistical source data.
Source Data Extended Data Fig. 3: Statistical source data.
Source Data Extended Data Fig. 4: Statistical source data.
Source Data Extended Data Fig. 5: Statistical source data.
Source Data Extended Data Fig. 6: Statistical source data.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Shi, Z., Lei, J.T., Elizarraras, J.M. et al. Mapping the functional network of human cancer through machine learning and pan-cancer proteogenomics. Nat Cancer 6, 205–222 (2025). https://doi.org/10.1038/s43018-024-00869-z

  • DOI: https://doi.org/10.1038/s43018-024-00869-z
