A structurally informed human protein–protein interactome reveals proteome-wide perturbations caused by disease mutations

Abstract

To assist the translation of genetic findings to disease pathobiology and therapeutics discovery, we present an ensemble deep learning framework, termed PIONEER (Protein–protein InteractiOn iNtErfacE pRediction), that predicts protein-binding partner-specific interfaces for all known protein interactions in humans and seven other common model organisms to generate comprehensive structurally informed protein interactomes. We demonstrate that PIONEER outperforms existing state-of-the-art methods and experimentally validate its predictions. We show that disease-associated mutations are enriched in PIONEER-predicted protein–protein interfaces and explore their impact on disease prognosis and drug responses. We identify 586 significant protein–protein interactions (PPIs) enriched with PIONEER-predicted interface somatic mutations (termed oncoPPIs) from analysis of approximately 11,000 whole exomes across 33 cancer types and show significant associations of oncoPPIs with patient survival and drug responses. PIONEER, implemented as both a web server platform and a software package, identifies functional consequences of disease-associated alleles and offers a deep learning tool for precision medicine at multiscale interactome network levels.

Fig. 1: Overview of the PIONEER framework.
Fig. 2: PIONEER-predicted PPI alleles are enriched in disease-associated mutations.
Fig. 3: A landscape of oncoPPIs identified by PIONEER across 33 cancer types (~11,000 cancer genomes).
Fig. 4: PIONEER-predicted oncoPPIs are associated with patient survival.
Fig. 5: PIONEER-predicted PPI-perturbing tumor alleles in ubiquitination by E3 ligases.

Data availability

Mutation data from the TCGA study were downloaded from the National Cancer Institute’s Genomic Data Commons (https://portal.gdc.cancer.gov). The MSK MetTropism dataset was downloaded from the cBioPortal (https://www.cbioportal.org/study/summary?id=msk_met_2021). Variant data from the 1000 Genomes Project were downloaded from the National Center for Biotechnology Information’s FTP site (https://ftp-trace.ncbi.nih.gov/1000genomes/ftp). The ExAC dataset was downloaded from the Genome Aggregation Database (https://gnomad.broadinstitute.org/downloads#exac-variants). Variants collected by the HGMD were downloaded from https://www.hgmd.cf.ac.uk/ac/index.php. Genomic variants and drug response data of human cancer cell lines were downloaded from GDSC datasets (https://www.cancerrxgene.org/downloads/bulk_download). Genomic profiling of PDXs and drug response curve metrics of PDX clinical trials were downloaded from Supplementary Table 1 of the corresponding paper (https://www.nature.com/articles/nm.3954#Sec28). The homologous structures of PPIs that do not have co-crystal structures were collected from Interactome3D (https://interactome3d.irbbarcelona.org). The ModBase data were downloaded from https://modbase.compbio.ucsf.edu. The PDB data were downloaded from the PDB FTP site (https://files.wwpdb.org/pub/pdb/data/structures/divided/pdb). The AlphaFold2-predicted protein structures were downloaded from the AlphaFold2 database (https://alphafold.ebi.ac.uk). All other data supporting the results in this study are available in the supplementary materials and at https://pioneer.yulab.org. Source data are provided with this paper.

Code availability

The source code of PIONEER is available on GitHub at https://github.com/hyulab/PIONEER (ref. 126).

References

  1. Nussinov, R., Jang, H., Nir, G., Tsai, C. J. & Cheng, F. Open structural data in precision medicine. Annu. Rev. Biomed. Data Sci. 5, 95–117 (2022).

  2. Braberg, H., Echeverria, I., Kaake, R. M., Sali, A. & Krogan, N. J. From systems to structure—using genetic data to model protein structures. Nat. Rev. Genet. 23, 342–354 (2022).

  3. Vidal, M., Cusick, M. E. & Barabási, A.-L. Interactome networks and human disease. Cell 144, 986–998 (2011).

  4. Barabási, A.-L., Gulbahce, N. & Loscalzo, J. Network medicine: a network-based approach to human disease. Nat. Rev. Genet. 12, 56–68 (2011).

  5. Meyer, M. J. et al. Interactome INSIDER: a structural interactome browser for genomic studies. Nat. Methods 15, 107–114 (2018).

  6. Wang, X. et al. Three-dimensional reconstruction of protein networks provides insight into human genetic disease. Nat. Biotechnol. 30, 159–164 (2012).

  7. Cheng, F. et al. Comprehensive characterization of protein–protein interactions perturbed by disease mutations. Nat. Genet. 53, 342–353 (2021).

  8. Sahni, N. et al. Widespread macromolecular interaction perturbations in human genetic disorders. Cell 161, 647–660 (2015).

  9. Wierbowski, S. D. et al. A 3D structural SARS-CoV-2-human interactome to explore genetic and drug perturbations. Nat. Methods 18, 1477–1488 (2021).

  10. Evans, R. et al. Protein complex prediction with AlphaFold-Multimer. Preprint at bioRxiv https://doi.org/10.1101/2021.10.04.463034 (2022).

  11. Gao, M., Nakajima An, D., Parks, J. M. & Skolnick, J. AF2Complex predicts direct physical interactions in multimeric proteins with deep learning. Nat. Commun. 13, 1744 (2022).

  12. Burke, D. F. et al. Towards a structurally resolved human protein interaction network. Nat. Struct. Mol. Biol. 30, 216–225 (2023).

  13. Tunyasuvunakool, K. et al. Highly accurate protein structure prediction for the human proteome. Nature 596, 590–596 (2021).

  14. Bianchi, F. M., Grattarola, D., Livi, L. & Alippi, C. Graph neural networks with convolutional ARMA filters. IEEE Trans. Pattern Anal. Mach. Intell. 44, 3496–3507 (2022).

  15. Cho, K. et al. Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Proc. of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) 1724–1734 (Association for Computational Linguistics, 2014).

  16. Zhuang, F. et al. A comprehensive survey on transfer learning. Proc. of the IEEE 109, 43–76 (2021).

  17. Krapp, L. F., Abriata, L. A., Cortes Rodriguez, F. & Dal Peraro, M. PeSTo: parameter-free geometric deep learning for accurate prediction of protein binding interfaces. Nat. Commun. 14, 2175 (2023).

  18. Tubiana, J., Schneidman-Duhovny, D. & Wolfson, H. J. ScanNet: an interpretable geometric deep learning model for structure-based protein binding site prediction. Nat. Methods 19, 730–739 (2022).

  19. Gainza, P. et al. Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning. Nat. Methods 17, 184–192 (2020).

  20. Sanchez-Garcia, R., Macias, J. R., Sorzano, C. O. S., Carazo, J. M. & Segura, J. BIPSPI+: mining type-specific datasets of protein complexes to improve protein binding site prediction. J. Mol. Biol. 434, 167556 (2022).

  21. Zeng, M. et al. Protein–protein interaction site prediction through combining local and global features with deep neural networks. Bioinformatics 36, 1114–1120 (2020).

  22. Townshend, R. J. L., Bedi, R., Suriana, P. A. & Dror, R. O. End-to-end learning on 3D protein structure for interface prediction. 33rd Conference on Neural Information Processing Systems. https://proceedings.neurips.cc/paper_files/paper/2019/file/6c7de1f27f7de61a6daddfffbe05c058-Paper.pdf (NeurIPS, 2019).

  23. Fout, A., Byrd, J., Shariat, B. & Ben-Hur, A. Protein interface prediction using graph convolutional networks. Advances in Neural Information Processing Systems 30. https://papers.nips.cc/paper_files/paper/2017/file/f507783927f2ec2737ba40afbd17efb5-Paper.pdf (NIPS, 2017).

  24. Lensink, M. F. & Wodak, S. J. Score_set: a CAPRI benchmark for scoring protein complexes. Proteins 82, 3163–3169 (2014).

  25. UniProt Consortium. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 47, D506–D515 (2019).

  26. Das, J. & Yu, H. HINT: high-quality protein interactomes and their applications in understanding human disease. BMC Syst. Biol. 6, 92 (2012).

  27. Oughtred, R. et al. The BioGRID interaction database: 2019 update. Nucleic Acids Res. 47, D529–D541 (2019).

  28. Salwinski, L. et al. The database of interacting proteins: 2004 update. Nucleic Acids Res. 32, D449–D451 (2004).

  29. Orchard, S. et al. The MIntAct project—IntAct as a common curation platform for 11 molecular interaction databases. Nucleic Acids Res. 42, D358–D363 (2014).

  30. Licata, L. et al. MINT, the molecular interaction database: 2012 update. Nucleic Acids Res. 40, D857–D861 (2012).

  31. Turner, B. et al. iRefWeb: interactive analysis of consolidated protein interaction data and their supporting evidence. Database 2010, baq023 (2010).

  32. Keshava Prasad, T. S. et al. Human protein reference database—2009 update. Nucleic Acids Res. 37, D767–D772 (2009).

  33. Mewes, H. W. et al. MIPS: curated databases and comprehensive secondary data resources in 2010. Nucleic Acids Res. 39, D220–D224 (2011).

  34. Nelson, L. & Cox, M. Lehninger Principles of Biochemistry 7th edn (W.H. Freeman, 2017).

  35. Rose, G. D., Geselowitz, A. R., Lesser, G. J., Lee, R. H. & Zehfus, M. H. Hydrophobicity of amino acid residues in globular proteins. Science 229, 834–838 (1985).

  36. Aftabuddin, M. & Kundu, S. Hydrophobic, hydrophilic, and charged amino acid networks within protein. Biophys. J. 93, 225–231 (2007).

  37. Tsai, C. J., Lin, S. L., Wolfson, H. J. & Nussinov, R. Studies of protein–protein interfaces: a statistical analysis of the hydrophobic effect. Protein Sci. 6, 53–64 (1997).

  38. Ansari, S. & Helms, V. Statistical analysis of predominantly transient protein–protein interfaces. Proteins 61, 344–355 (2005).

  39. Burley, S. K. et al. RCSB Protein Data Bank: celebrating 50 years of the PDB with new tools for understanding and visualizing biological macromolecules in 3D. Protein Sci. 31, 187–208 (2022).

  40. Wei, X. et al. A massively parallel pipeline to clone DNA variants and examine molecular phenotypes of human disease mutations. PLoS Genet. 10, e1004819 (2014).

  41. Xiong, D., Lee, D., Li, L., Zhao, Q. & Yu, H. Implications of disease-related mutations at protein–protein interfaces. Curr. Opin. Struct. Biol. 72, 219–225 (2022).

  42. Stenson, P. D. et al. The Human Gene Mutation Database: towards a comprehensive repository of inherited mutation data for medical research, genetic diagnosis and next-generation sequencing studies. Hum. Genet. 136, 665–677 (2017).

  43. Auton, A. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).

  44. Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).

  45. Kircher, M. et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat. Genet. 46, 310–315 (2014).

  46. Schymkowitz, J. et al. The FoldX web server: an online force field. Nucleic Acids Res. 33, W382–W388 (2005).

  47. Zhou, Y. et al. A network medicine approach to investigation and population-based validation of disease manifestations and drug repurposing for COVID-19. PLoS Biol. 18, e3000970 (2020).

  48. Plasilova, M. et al. Homozygous missense mutation in the lamin A/C gene causes autosomal recessive Hutchinson–Gilford progeria syndrome. J. Med. Genet. 41, 609–614 (2004).

  49. Favretto, F. et al. The molecular basis of the interaction of cyclophilin A with α-synuclein. Angew. Chem. Int. Ed. 59, 5643–5646 (2020).

  50. Liu, Q. et al. HIF2A germline–mutation-induced polycythemia in a patient with VHL-associated renal-cell carcinoma. Cancer Biol. Ther. 18, 944–947 (2017).

  51. Tarade, D., Robinson, C. M., Lee, J. E. & Ohh, M. HIF-2α-pVHL complex reveals broad genotype-phenotype correlations in HIF-2α-driven disease. Nat. Commun. 9, 3359 (2018).

  52. V, F. R. L. et al. Three novel EPAS1/HIF2A somatic and germline mutations associated with polycythemia and pheochromocytoma/paraganglioma. Blood 120, 2080 (2012).

  53. Chang, K. et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113–1120 (2013).

  54. Nguyen, B. et al. Genomic characterization of metastatic patterns from prospective clinical sequencing of 25,000 patients. Cell 185, 563–575 (2022).

  55. Rabara, D. et al. KRAS G13D sensitivity to neurofibromin-mediated GTP hydrolysis. Proc. Natl Acad. Sci. USA 116, 22122–22131 (2019).

  56. Wang, Z. et al. The diverse roles of SPOP in prostate cancer and kidney cancer. Nat. Rev. Urol. 17, 339–350 (2020).

  57. Song, Y. et al. The emerging role of SPOP protein in tumorigenesis and cancer therapy. Mol. Cancer 19, 2 (2020).

  58. Xu, J. & Lin, D. I. Oncogenic c-terminal cyclin D1 (CCND1) mutations are enriched in endometrioid endometrial adenocarcinomas. PLoS ONE 13, e0199688 (2018).

  59. Ryu, D. et al. Alterations in the transcriptional programs of myeloma cells and the microenvironment during extramedullary progression affect proliferation and immune evasion. Clin. Cancer Res. 26, 935–944 (2020).

  60. Zhang, M. et al. CanProVar 2.0: an updated database of human cancer proteome variation. J. Proteome Res. 16, 421–432 (2017).

  61. Mészáros, B., Kumar, M., Gibson, T. J., Uyar, B. & Dosztányi, Z. Degrons in cancer. Sci. Signal. 10, eaak9982 (2017).

  62. Yang, Q., Zhao, J., Chen, D. & Wang, Y. E3 ubiquitin ligases: styles, structures and functions. Mol. Biomed. 2, 23 (2021).

  63. Senft, D., Qi, J. & Ronai, Z. E. A. Ubiquitin ligases in oncogenic transformation and cancer therapy. Nat. Rev. Cancer 18, 69–88 (2018).

  64. Han, Y., Lee, H., Park, J. C. & Yi, G. S. E3Net: a system for exploring E3-mediated regulatory networks of cellular functions. Mol. Cell. Proteomics 11, O111.014076 (2012).

  65. Li, Z. et al. UbiNet 2.0: a verified, classified, annotated and updated database of E3 ubiquitin ligase–substrate interactions. Database 2021, baab010 (2021).

  66. Mena, E. L. et al. Dimerization quality control ensures neuronal development and survival. Science 362, eaap8236 (2018).

  67. Wang, Q. et al. Alterations of anaphase-promoting complex genes in human colon cancer cells. Oncogene 22, 1486–1490 (2003).

  68. Yin, Q., Wyatt, C. J., Han, T., Smalley, K. S. M. & Wan, L. ITCH as a potential therapeutic target in human cancers. Semin. Cancer Biol. 67, 117–130 (2020).

  69. Li, L. et al. CHIP mediates degradation of Smad proteins and potentially regulates Smad-induced transcription. Mol. Cell. Biol. 24, 856–864 (2004).

  70. Tsai, W.-W. et al. TRIM24 links a non-canonical histone signature to breast cancer. Nature 468, 927–932 (2010).

  71. Lv, D. et al. TRIM24 is an oncogenic transcriptional co-activator of STAT3 in glioblastoma. Nat. Commun. 8, 1454 (2017).

  72. Cuadrado, A. et al. Therapeutic targeting of the NRF2 and KEAP1 partnership in chronic diseases. Nat. Rev. Drug Discov. 18, 295–317 (2019).

  73. Furukawa, M. & Xiong, Y. BTB protein Keap1 targets antioxidant transcription factor Nrf2 for ubiquitination by the Cullin 3-Roc1 ligase. Mol. Cell. Biol. 25, 162–171 (2005).

  74. Fukutomi, T., Takagi, K., Mizushima, T., Ohuchi, N. & Yamamoto, M. Kinetic, thermodynamic, and structural characterizations of the association between Nrf2-DLGex Degron and Keap1. Mol. Cell. Biol. 34, 832–846 (2014).

  75. Gao, H. et al. High-throughput screening using patient-derived tumor xenografts to predict clinical trial drug response. Nat. Med. 21, 1318–1325 (2015).

  76. Vasaikar, S. et al. Proteogenomic analysis of human colon cancer reveals new therapeutic opportunities. Cell 177, 1035–1049.e19 (2019).

  77. Abi-Habib, R. J. et al. BRAF status and mitogen-activated protein/extracellular signal-regulated kinase kinase 1/2 activity indicate sensitivity of melanoma cells to anthrax lethal toxin. Mol. Cancer Ther. 4, 1303–1310 (2005).

  78. Roberts, P. J. & Der, C. J. Targeting the Raf-MEK-ERK mitogen-activated protein kinase cascade for the treatment of cancer. Oncogene 26, 3291–3310 (2007).

  79. Endres, N. F. et al. Conformational coupling across the plasma membrane in activation of the EGF receptor. Cell 152, 543–556 (2013).

  80. Lu, C. F. et al. Structural evidence for loose linkage between ligand binding and kinase activation in the epidermal growth factor receptor. Mol. Cell. Biol. 30, 5432–5443 (2010).

  81. Liang, S. I. et al. Phosphorylated EGFR dimers are not sufficient to activate ras. Cell Rep. 22, 2593–2600 (2018).

  82. Bishayee, A., Beguinot, L. & Bishayee, S. Phosphorylation of tyrosine 992, 1068, and 1086 is required for conformational change of the human epidermal growth factor receptor C-terminal tail. Mol. Biol. Cell. 10, 525–536 (1999).

  83. Siegelin, M. D. & Borczuk, A. C. Epidermal growth factor receptor mutations in lung adenocarcinoma. Lab Invest. 94, 129–137 (2014).

  84. Hillig, R. C. et al. Discovery of potent SOS1 inhibitors that block RAS activation via disruption of the RAS–SOS1 interaction. Proc. Natl Acad. Sci. USA 116, 2551–2560 (2019).

  85. You, X. et al. Unique dependence on Sos1 in KrasG12D-induced leukemogenesis. Blood 132, 2575–2579 (2018).

  86. Hofmann, M. H. et al. Trial in process: phase 1 studies of BI 1701963, a SOS1::KRAS inhibitor, in combination with MEK inhibitors, irreversible KRASG12C inhibitors or irinotecan. Cancer Res. 81, CT210 (2021).

  87. Huijberts, S. C. F. A. et al. Phase I study of lapatinib plus trametinib in patients with KRAS-mutant colorectal, non-small cell lung, and pancreatic cancer. Cancer Chemother. Pharmacol. 85, 917–930 (2020).

  88. Cho, M. et al. A phase I clinical trial of binimetinib in combination with FOLFOX in patients with advanced metastatic colorectal cancer who failed prior standard therapy. Oncotarget 8, 79750–79760 (2017).

  89. Hofmann, M. H. et al. BI-3406, a potent and selective SOS1–KRAS interaction inhibitor, is effective in KRAS-driven cancers through combined MEK inhibition. Cancer Discov. 11, 142–157 (2021).

  90. Liu, F., Yang, X., Geng, M. & Huang, M. Targeting ERK, an Achilles’ Heel of the MAPK pathway, in cancer therapy. Acta Pharm. Sin. B 8, 552–562 (2018).

  91. Tran, T. H. et al. KRAS interaction with RAF1 RAS-binding domain and cysteine-rich domain provides insights into RAS-mediated RAF activation. Nat. Commun. 12, 1176 (2021).

  92. Patelli, G. et al. Strategies to tackle RAS-mutated metastatic colorectal cancer. ESMO Open 6, 100156 (2021).

  93. Li, Z.-N., Zhao, L., Yu, L.-F. & Wei, M.-J. BRAF and KRAS mutations in metastatic colorectal cancer: future perspectives for personalized therapy. Gastroenterol. Rep. 8, 192–205 (2020).

  94. Corcoran, R. B. et al. Combined BRAF, EGFR, and MEK inhibition in patients with BRAFV600E-mutant colorectal cancer. Cancer Discov. 8, 428–443 (2018).

  95. Lin, Q. et al. The association between BRAF mutation class and clinical features in BRAF-mutant Chinese non-small cell lung cancer patients. J. Transl. Med. 17, 298 (2019).

  96. Caunt, C. J., Sale, M. J., Smith, P. D. & Cook, S. J. MEK1 and MEK2 inhibitors and cancer therapy: the long and winding road. Nat. Rev. Cancer 15, 577–592 (2015).

  97. Huang, K. L. et al. Regulated phosphosignaling associated with breast cancer subtypes and druggability. Mol. Cell. Proteomics 18, 1630–1650 (2019).

  98. Landrum, M. J. et al. ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Res. 44, D862–D868 (2016).

  99. Luck, K. et al. A reference map of the human binary protein interactome. Nature 580, 402–408 (2020).

  100. Huttlin, E. L. et al. The BioPlex network: a systematic exploration of the human interactome. Cell 162, 425–440 (2015).

  101. Cho, N. H. et al. OpenCell: endogenous tagging for the cartography of human cellular organization. Science 375, eabi6983 (2022).

  102. Petrey, D., Zhao, H., Trudeau, S. J., Murray, D. & Honig, B. PrePPI: a structure informed proteome-wide database of protein–protein interactions. J. Mol. Biol. 435, 168052 (2023).

  103. Gao, Z. et al. Hierarchical graph learning for protein–protein interaction. Nat. Commun. 14, 1093 (2023).

  104. Pieper, U. et al. ModBase, a database of annotated comparative protein structure models and associated resources. Nucleic Acids Res. 42, D336–D346 (2014).

  105. Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).

  106. Su, J. et al. SaProt: protein language modeling with structure-aware vocabulary. The Twelfth International Conference on Learning Representations. https://openreview.net/pdf?id=6MRm3G4NiU (ICLR, 2023).

  107. Taliun, D. et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature 590, 290–299 (2021).

  108. Gary, W. B. et al. The Alzheimer’s disease sequencing project: study design and sample selection. Neurol. Genet. 3, e194 (2017).

  109. Mosca, R., Céol, A. & Aloy, P. Interactome3D: adding structural details to protein networks. Nat. Methods 10, 47–53 (2013).

  110. Velankar, S. et al. SIFTS: structure integration with function, taxonomy and sequences resource. Nucleic Acids Res. 41, D483–D489 (2013).

  111. Lee, B. & Richards, F. M. The interpretation of protein structures: estimation of static accessibility. J. Mol. Biol. 55, 379–400 (1971).

  112. Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).

  113. Sievers, F. & Higgins, D. G. Clustal Omega for making accurate alignments of many protein sequences. Protein Sci. 27, 135–145 (2018).

  114. Fu, L., Niu, B., Zhu, Z., Wu, S. & Li, W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150–3152 (2012).

  115. Pierce, B. G. et al. ZDOCK server: interactive docking prediction of protein-protein complexes and symmetric multimers. Bioinformatics 30, 1771–1773 (2014).

  116. Scardapane, S., Van Vaerenbergh, S., Totaro, S. & Uncini, A. Kafnets: kernel-based non-parametric activation functions for neural networks. Neural Netw. 110, 19–32 (2019).

  117. Li, Y., Golding, G. B. & Ilie, L. DELPHI: accurate deep ensemble model for protein interaction sites prediction. Bioinformatics 37, 896–904 (2021).

  118. Zhang, J. & Kurgan, L. SCRIBER: accurate and partner type-specific prediction of protein-binding residues from proteins sequences. Bioinformatics 35, i343–i353 (2019).

  119. Zhang, B., Li, J., Quan, L., Chen, Y. & Lü, Q. Sequence-based prediction of protein–protein interaction sites by simplified long short-term memory network. Neurocomputing 357, 86–100 (2019).

  120. Fu, W. et al. Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants. Nature 493, 216–220 (2013).

  121. Walhout, A. J. M. & Vidal, M. High-throughput yeast two-hybrid assays for large-scale protein interaction mapping. Methods 24, 297–306 (2001).

  122. McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016).

  123. Dominguez, C., Boelens, R. & Bonvin, A. M. J. J. HADDOCK: a protein−protein docking approach based on biochemical or biophysical information. J. Am. Chem. Soc. 125, 1731–1737 (2003).

  124. Schneider, C. A., Rasband, W. S. & Eliceiri, K. W. NIH Image to ImageJ: 25 years of image analysis. Nat. Methods 9, 671–675 (2012).

  125. Wu, E. L. et al. CHARMM-GUI Membrane Builder toward realistic biological membrane simulations. J. Comput. Chem. 35, 1997–2004 (2014).

  126. Xiong, D., Lee, D. & Liang, S. GitHub code repository for PIONEER. https://github.com/hyulab/PIONEER (2024).

Acknowledgements

This work was supported by the National Institute of General Medical Sciences (R01GM124559, R01GM125639, R01GM130885 and RM1GM139738) and the National Institute of Diabetes and Digestive and Kidney Diseases (R01DK115398) to H.Y.; the National Institute on Aging (R01AG084250, R56AG074001, U01AG073323, R01AG066707, R01AG076448, R01AG082118, RF1AG082211 and R21AG083003) and the National Institute of Neurological Disorders and Stroke (RF1NS133812) to F.C.; and the National Human Genome Research Institute (U01HG007691), the National Heart, Lung, and Blood Institute (R01HL155107, R01HL155096, R01HL166137 and U54HL119145), the American Heart Association (AHA957729 and 24MERIT1185447) and European Union Horizon Health 2021 (101057619) to J.L. C.E. is the Sondra J. and Stephen R. Hardis Chair of Cancer Genomic Medicine at the Cleveland Clinic. This work partially used Jetstream2 at Indiana University through allocation BIO220060 from the Advanced Cyberinfrastructure Coordination Ecosystem: Services and Support program, which is supported by the National Science Foundation (2138259, 2138286, 2138307, 2137603 and 2138296).

Author information

Contributions

D.X., Y.Q., J.Z., Y.Z., D.L., F.C. and H.Y. conceived and developed the project. Under close supervision of H.Y., D.X., D.L. and S.L. developed the models and conducted computational experiments; and D.X., S.G., M.T. and S.L. built the web server. W.L. and J.K. conducted biological experiments. D.X., Y.Q., J.Z., Y.Z., D.L., F.C. and H.Y. performed the analyses. D.X., Y.Q., J.Z., Y.Z., D.L., S.G., C.E., J.L., F.C. and H.Y. wrote and critically revised the manuscript. All authors discussed the results and reviewed the manuscript.

Corresponding authors

Correspondence to Feixiong Cheng or Haiyuan Yu.

Ethics declarations

Competing interests

J.L. is co-scientific founder of Scipher Medicine, Inc., which applies network medicine strategies to biomarker development and personalized drug selection. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Biotechnology thanks Leng Han and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 PIONEER provides high-quality interfaces for the whole proteome.

a, Workflow for compiling the PIONEER interactome. Interfaces calculated from experimentally determined co-crystal structures or homology models are used where available; the remaining unresolved interactions are predicted by PIONEER. b, Percentage of CAPRI decoys having a given average PIONEER prediction score at interfaces. Percentages are plotted along the y axis for four classes of CAPRI models. The total number of models in each class is indicated in the text in the figure. c, Fraction of interactions disrupted by random population variants in PIONEER-predicted and known interfaces. The error bar denotes standard error for the binomial distribution. Significance was determined by two-sided z-test. The n numbers are shown in Supplementary Table 6. d, Enrichment of disease-associated mutations in PIONEER-predicted and known interfaces. The error bar denotes standard error for the log odds ratio. Significance was determined by two-sided z-test. The n numbers are shown in Supplementary Table 7. e, Enrichment of population variants in PIONEER-predicted and known interfaces. The error bar denotes standard error for the log odds ratio. The n numbers are shown in Supplementary Table 8.
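The enrichment statistic in panels d and e can be illustrated with a minimal sketch. The caption states only that error bars are the standard error of the log odds ratio and that significance comes from a two-sided z-test; the 2×2 table layout, the Woolf standard-error formula and the function name `log_odds_ratio_test` below are assumptions made for illustration, not the authors' implementation.

```python
import math


def log_odds_ratio_test(a, b, c, d):
    """Log odds ratio, its standard error, z statistic and two-sided P value.

    Hypothetical 2x2 layout (an assumption, not taken from the paper):
        a = disease mutations inside interfaces
        b = disease mutations outside interfaces
        c = background residues inside interfaces
        d = background residues outside interfaces
    All four counts must be positive.
    """
    log_or = math.log((a * d) / (b * c))
    # Woolf's approximation for the standard error of the log odds ratio
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = log_or / se
    # Two-sided P value under the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    return log_or, se, z, p


log_or, se, z, p = log_odds_ratio_test(20, 80, 10, 90)
print(f"log OR = {log_or:.3f}, s.e. = {se:.3f}, z = {z:.2f}, P = {p:.3f}")
```

A positive log odds ratio with a small P value would indicate enrichment of mutations in interfaces under these assumptions; a value of zero indicates no enrichment.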

Extended Data Fig. 2 Pharmacogenomic landscape identified by the PIONEER-predicted interactome network.

a, Drug responses evaluated by oncoPPIs in the PDX models. Effect size was quantified by Cohen’s d statistic using the difference between two means divided by a pooled s.d. for the data. Significance was determined by ANOVA adjusted by the Benjamini–Hochberg method. b, Circos plot displaying drug responses evaluated by putative PIONEER-predicted oncoPPIs harboring a statistically significant excess number of missense mutations at PPI interfaces, following a binomial distribution across selected anti-cancer therapeutic agents in cancer cell lines. Each node denotes a specific oncoPPI. Node size denotes significance determined by ANOVA. Effect size was quantified by Cohen’s d statistic using the difference between two means divided by a pooled s.d. for the data. Node color denotes three different types of PPIs: (1) PDB: Red; (2) HM: Blue; and (3) PIONEER: Green. ‘HM’ represents homology models. c, Highlighted examples of drug responses. Data are represented as a box plot with an underlaid violin plot in which the middle line is the median, the lower and upper edges of the box are the first and third quartiles, the whiskers represent IQR × 1.5, and the dots are outlier points. Significance was determined by ANOVA. The n numbers are shown in Supplementary Table 16.
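The effect size described in panels a and b (difference between two means divided by a pooled s.d.) can be sketched as follows. The caption does not specify how the pooled s.d. is weighted, so the degrees-of-freedom weighting used here, the function name `cohens_d` and the example values are assumptions for illustration only.

```python
import math
from statistics import mean, variance


def cohens_d(x, y):
    """Cohen's d: difference in means divided by a pooled standard deviation.

    The pooled variance weights each group's sample variance by its degrees
    of freedom (n - 1); this weighting is an assumption, as the caption does
    not state it. Each group needs at least two observations.
    """
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)


# Hypothetical drug-response values for mutated versus wild-type groups
mutated = [2.0, 4.0, 6.0]
wild_type = [1.0, 3.0, 5.0]
print(cohens_d(mutated, wild_type))
```

The sign of d follows the order of the arguments, so swapping the two groups flips the sign but not the magnitude.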

Extended Data Fig. 3 Proteogenomics of the PIONEER-predicted interactome network.

a, Phosphorylation-associated PPI-perturbing mutations altered the proteomic changes in COAD and UCEC. The abundance of proteins was quantified using the TMT technique. Data are represented as a box plot with an underlaid violin plot in which the middle line is the median, the lower and upper edges of the box are the first and third quartiles, the whiskers represent IQR × 1.5, and the dots are outlier points. Significance was determined by two-tailed Wilcoxon rank-sum test. The n numbers are shown in Supplementary Table 17. b, Phosphorylation-associated PPI-perturbing mutations in the EGFR–RAS–RAF–MEK–ERK signaling cascade. The whole transmembrane EGFR structure was constructed from three crystal structures (PDB: 3NJP, 2M20, 2GS6). The membrane model is shown in green. The phosphorylation sites are indicated by the symbol 'P'. The detailed interface structure of SOS1–KRAS is also shown in the inset. The key mutated residue Gln61 on KRAS forms a hydrogen bond (purple dashed line) with residue Thr935 on SOS1, and Tyr884 on SOS1 is involved in a cation-π interaction (red dashed line) with residue Arg73 on KRAS. The two subunits of the RAF protein structure model were built from RAF1 and BRAF, respectively (PDB: 6VJJ and 6Q0J). The two subunits are connected by a disordered loop indicated by blue cartoon lines. Two heterodimers, KRAS–RAF1 and BRAF–MEK1, constitute the KRAS–RAF–MEK1 complex. The PDB ID of each complex structure model is provided. c, Highlighted examples of drug responses. Data are represented as a box plot with an underlaid violin plot in which the middle line is the median, the lower and upper edges of the box are the first and third quartiles, the whiskers represent IQR × 1.5, and the dots are outlier points. Significance was determined by ANOVA. The n numbers are shown in Supplementary Table 16.

Supplementary information

Supplementary Information

Supplementary Methods and Supplementary Figs. 1–14.

Reporting Summary

Supplementary Table 1

Supplementary Tables 1–18.

Supplementary Data 1

The labeled dataset used for PIONEER model training, validation and testing.

Supplementary Data 2

Somatic mutations in 33 cancer types of TCGA.

Supplementary Data 3

Significance test of somatic mutation enrichment in PPI interfaces by 33 cancer types in TCGA.

Source data

Source Data Fig. 4

Unprocessed western blots for Fig. 4d.

Source Data Fig. 5

Unprocessed western blots for Fig. 5d.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xiong, D., Qiu, Y., Zhao, J. et al. A structurally informed human protein–protein interactome reveals proteome-wide perturbations caused by disease mutations. Nat Biotechnol 43, 1510–1524 (2025). https://doi.org/10.1038/s41587-024-02428-4

  • DOI: https://doi.org/10.1038/s41587-024-02428-4
