Abstract
To assist the translation of genetic findings to disease pathobiology and therapeutics discovery, we present an ensemble deep learning framework, termed PIONEER (Protein–protein InteractiOn iNtErfacE pRediction), that predicts protein-binding partner-specific interfaces for all known protein interactions in humans and seven other common model organisms to generate comprehensive structurally informed protein interactomes. We demonstrate that PIONEER outperforms existing state-of-the-art methods and experimentally validate its predictions. We show that disease-associated mutations are enriched in PIONEER-predicted protein–protein interfaces and explore their impact on disease prognosis and drug responses. We identify 586 significant protein–protein interactions (PPIs) enriched with PIONEER-predicted interface somatic mutations (termed oncoPPIs) from analysis of approximately 11,000 whole exomes across 33 cancer types and show significant associations of oncoPPIs with patient survival and drug responses. PIONEER, implemented as both a web server platform and a software package, identifies functional consequences of disease-associated alleles and offers a deep learning tool for precision medicine at multiscale interactome network levels.
Data availability
Mutation data from the TCGA study were downloaded from the National Cancer Institute’s Genomic Data Commons (https://portal.gdc.cancer.gov). The MSK MetTropism dataset was downloaded from the cBioPortal (https://www.cbioportal.org/study/summary?id=msk_met_2021). Variant data from the 1000 Genomes Project were downloaded from the National Center for Biotechnology Information’s FTP site (https://ftp-trace.ncbi.nih.gov/1000genomes/ftp). The ExAC dataset was downloaded from the Genome Aggregation Database (https://gnomad.broadinstitute.org/downloads#exac-variants). Variants collected by the HGMD were downloaded from https://www.hgmd.cf.ac.uk/ac/index.php. Genomic variants and drug response data of human cancer cell lines were downloaded from GDSC datasets (https://www.cancerrxgene.org/downloads/bulk_download). Genomic profiling of PDXs and drug response curve metrics of PDX clinical trials were downloaded from Supplementary Table 1 of the corresponding paper (https://www.nature.com/articles/nm.3954#Sec28). The homologous structures of PPIs that do not have co-crystal structures were collected from Interactome3D (https://interactome3d.irbbarcelona.org). The ModBase data were downloaded from https://modbase.compbio.ucsf.edu. The PDB data were downloaded from the PDB FTP site (https://files.wwpdb.org/pub/pdb/data/structures/divided/pdb). The AlphaFold2-predicted protein structures were downloaded from the AlphaFold2 database (https://alphafold.ebi.ac.uk). All other data supporting the results in this study are available in the supplementary materials and at https://pioneer.yulab.org. Source data are provided with this paper.
Code availability
The source code of PIONEER is available on GitHub (ref. 126; https://github.com/hyulab/PIONEER).
References
Nussinov, R., Jang, H., Nir, G., Tsai, C. J. & Cheng, F. Open structural data in precision medicine. Annu. Rev. Biomed. Data Sci. 5, 95–117 (2022).
Braberg, H., Echeverria, I., Kaake, R. M., Sali, A. & Krogan, N. J. From systems to structure—using genetic data to model protein structures. Nat. Rev. Genet. 23, 342–354 (2022).
Vidal, M., Cusick, M. E. & Barabási, A.-L. Interactome networks and human disease. Cell 144, 986–998 (2011).
Barabási, A.-L., Gulbahce, N. & Loscalzo, J. Network medicine: a network-based approach to human disease. Nat. Rev. Genet. 12, 56–68 (2011).
Meyer, M. J. et al. Interactome INSIDER: a structural interactome browser for genomic studies. Nat. Methods 15, 107–114 (2018).
Wang, X. et al. Three-dimensional reconstruction of protein networks provides insight into human genetic disease. Nat. Biotechnol. 30, 159–164 (2012).
Cheng, F. et al. Comprehensive characterization of protein–protein interactions perturbed by disease mutations. Nat. Genet. 53, 342–353 (2021).
Sahni, N. et al. Widespread macromolecular interaction perturbations in human genetic disorders. Cell 161, 647–660 (2015).
Wierbowski, S. D. et al. A 3D structural SARS-CoV-2-human interactome to explore genetic and drug perturbations. Nat. Methods 18, 1477–1488 (2021).
Evans, R. et al. Protein complex prediction with AlphaFold-Multimer. Preprint at bioRxiv https://doi.org/10.1101/2021.10.04.463034 (2022).
Gao, M., Nakajima An, D., Parks, J. M. & Skolnick, J. AF2Complex predicts direct physical interactions in multimeric proteins with deep learning. Nat. Commun. 13, 1744 (2022).
Burke, D. F. et al. Towards a structurally resolved human protein interaction network. Nat. Struct. Mol. Biol. 30, 216–225 (2023).
Tunyasuvunakool, K. et al. Highly accurate protein structure prediction for the human proteome. Nature 596, 590–596 (2021).
Bianchi, F. M., Grattarola, D., Livi, L. & Alippi, C. Graph neural networks with convolutional ARMA filters. IEEE Trans. Pattern Anal. Mach. Intell. 44, 3496–3507 (2022).
Cho, K. et al. Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Proc. of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) 1724–1734 (Association for Computational Linguistics, 2014).
Zhuang, F. et al. A comprehensive survey on transfer learning. Proc. of the IEEE 109, 43–76 (2021).
Krapp, L. F., Abriata, L. A., Cortes Rodriguez, F. & Dal Peraro, M. PeSTo: parameter-free geometric deep learning for accurate prediction of protein binding interfaces. Nat. Commun. 14, 2175 (2023).
Tubiana, J., Schneidman-Duhovny, D. & Wolfson, H. J. ScanNet: an interpretable geometric deep learning model for structure-based protein binding site prediction. Nat. Methods 19, 730–739 (2022).
Gainza, P. et al. Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning. Nat. Methods 17, 184–192 (2020).
Sanchez-Garcia, R., Macias, J. R., Sorzano, C. O. S., Carazo, J. M. & Segura, J. BIPSPI+: mining type-specific datasets of protein complexes to improve protein binding site prediction. J. Mol. Biol. 434, 167556 (2022).
Zeng, M. et al. Protein–protein interaction site prediction through combining local and global features with deep neural networks. Bioinformatics 36, 1114–1120 (2020).
Townshend, R. J. L., Bedi, R., Suriana, P. A. & Dror, R. O. End-to-end learning on 3D protein structure for interface prediction. 33rd Conference on Neural Information Processing Systems. https://proceedings.neurips.cc/paper_files/paper/2019/file/6c7de1f27f7de61a6daddfffbe05c058-Paper.pdf (NeurIPS, 2019).
Fout, A., Byrd, J., Shariat, B. & Ben-Hur, A. Protein interface prediction using graph convolutional networks. Advances in Neural Information Processing Systems 30. https://papers.nips.cc/paper_files/paper/2017/file/f507783927f2ec2737ba40afbd17efb5-Paper.pdf (NIPS, 2017).
Lensink, M. F. & Wodak, S. J. Score_set: a CAPRI benchmark for scoring protein complexes. Proteins 82, 3163–3169 (2014).
UniProt Consortium. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 47, D506–D515 (2019).
Das, J. & Yu, H. HINT: high-quality protein interactomes and their applications in understanding human disease. BMC Syst. Biol. 6, 92 (2012).
Oughtred, R. et al. The BioGRID interaction database: 2019 update. Nucleic Acids Res. 47, D529–D541 (2019).
Salwinski, L. et al. The database of interacting proteins: 2004 update. Nucleic Acids Res. 32, D449–D451 (2004).
Orchard, S. et al. The MIntAct project—IntAct as a common curation platform for 11 molecular interaction databases. Nucleic Acids Res. 42, D358–D363 (2014).
Licata, L. et al. MINT, the molecular interaction database: 2012 update. Nucleic Acids Res. 40, D857–D861 (2012).
Turner, B. et al. iRefWeb: interactive analysis of consolidated protein interaction data and their supporting evidence. Database 2010, baq023 (2010).
Keshava Prasad, T. S. et al. Human protein reference database—2009 update. Nucleic Acids Res. 37, D767–D772 (2009).
Mewes, H. W. et al. MIPS: curated databases and comprehensive secondary data resources in 2010. Nucleic Acids Res. 39, D220–D224 (2011).
Nelson, L. & Cox, M. Lehninger Principles of Biochemistry 7th edn (W.H. Freeman, 2017).
Rose, G. D., Geselowitz, A. R., Lesser, G. J., Lee, R. H. & Zehfus, M. H. Hydrophobicity of amino acid residues in globular proteins. Science 229, 834–838 (1985).
Aftabuddin, M. & Kundu, S. Hydrophobic, hydrophilic, and charged amino acid networks within protein. Biophys. J. 93, 225–231 (2007).
Tsai, C. J., Lin, S. L., Wolfson, H. J. & Nussinov, R. Studies of protein–protein interfaces: a statistical analysis of the hydrophobic effect. Protein Sci. 6, 53–64 (1997).
Ansari, S. & Helms, V. Statistical analysis of predominantly transient protein–protein interfaces. Proteins 61, 344–355 (2005).
Burley, S. K. et al. RCSB Protein Data Bank: celebrating 50 years of the PDB with new tools for understanding and visualizing biological macromolecules in 3D. Protein Sci. 31, 187–208 (2022).
Wei, X. et al. A massively parallel pipeline to clone DNA variants and examine molecular phenotypes of human disease mutations. PLoS Genet. 10, e1004819 (2014).
Xiong, D., Lee, D., Li, L., Zhao, Q. & Yu, H. Implications of disease-related mutations at protein–protein interfaces. Curr. Opin. Struct. Biol. 72, 219–225 (2022).
Stenson, P. D. et al. The Human Gene Mutation Database: towards a comprehensive repository of inherited mutation data for medical research, genetic diagnosis and next-generation sequencing studies. Hum. Genet. 136, 665–677 (2017).
Auton, A. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
Kircher, M. et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat. Genet. 46, 310–315 (2014).
Schymkowitz, J. et al. The FoldX web server: an online force field. Nucleic Acids Res. 33, W382–W388 (2005).
Zhou, Y. et al. A network medicine approach to investigation and population-based validation of disease manifestations and drug repurposing for COVID-19. PLoS Biol. 18, e3000970 (2020).
Plasilova, M. et al. Homozygous missense mutation in the lamin A/C gene causes autosomal recessive Hutchinson–Gilford progeria syndrome. J. Med. Genet. 41, 609–614 (2004).
Favretto, F. et al. The molecular basis of the interaction of cyclophilin A with α-synuclein. Angew. Chem. Int. Ed. 59, 5643–5646 (2020).
Liu, Q. et al. HIF2A germline–mutation-induced polycythemia in a patient with VHL-associated renal-cell carcinoma. Cancer Biol. Ther. 18, 944–947 (2017).
Tarade, D., Robinson, C. M., Lee, J. E. & Ohh, M. HIF-2α-pVHL complex reveals broad genotype-phenotype correlations in HIF-2α-driven disease. Nat. Commun. 9, 3359 (2018).
V, F. R. L. et al. Three novel EPAS1/HIF2A somatic and germline mutations associated with polycythemia and pheochromocytoma/paraganglioma. Blood 120, 2080 (2012).
Chang, K. et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113–1120 (2013).
Nguyen, B. et al. Genomic characterization of metastatic patterns from prospective clinical sequencing of 25,000 patients. Cell 185, 563–575 (2022).
Rabara, D. et al. KRAS G13D sensitivity to neurofibromin-mediated GTP hydrolysis. Proc. Natl Acad. Sci. USA 116, 22122–22131 (2019).
Wang, Z. et al. The diverse roles of SPOP in prostate cancer and kidney cancer. Nat. Rev. Urol. 17, 339–350 (2020).
Song, Y. et al. The emerging role of SPOP protein in tumorigenesis and cancer therapy. Mol. Cancer 19, 2 (2020).
Xu, J. & Lin, D. I. Oncogenic c-terminal cyclin D1 (CCND1) mutations are enriched in endometrioid endometrial adenocarcinomas. PLoS ONE 13, e0199688 (2018).
Ryu, D. et al. Alterations in the transcriptional programs of myeloma cells and the microenvironment during extramedullary progression affect proliferation and immune evasion. Clin. Cancer Res. 26, 935–944 (2020).
Zhang, M. et al. CanProVar 2.0: an updated database of human cancer proteome variation. J. Proteome Res. 16, 421–432 (2017).
Mészáros, B., Kumar, M., Gibson, T. J., Uyar, B. & Dosztányi, Z. Degrons in cancer. Sci. Signal. 10, eaak9982 (2017).
Yang, Q., Zhao, J., Chen, D. & Wang, Y. E3 ubiquitin ligases: styles, structures and functions. Mol. Biomed. 2, 23 (2021).
Senft, D., Qi, J. & Ronai, Z. E. A. Ubiquitin ligases in oncogenic transformation and cancer therapy. Nat. Rev. Cancer 18, 69–88 (2018).
Han, Y., Lee, H., Park, J. C. & Yi, G. S. E3Net: a system for exploring E3-mediated regulatory networks of cellular functions. Mol. Cell. Proteomics 11, O111.014076 (2012).
Li, Z. et al. UbiNet 2.0: a verified, classified, annotated and updated database of E3 ubiquitin ligase–substrate interactions. Database 2021, baab010 (2021).
Mena, E. L. et al. Dimerization quality control ensures neuronal development and survival. Science 362, eaap8236 (2018).
Wang, Q. et al. Alterations of anaphase-promoting complex genes in human colon cancer cells. Oncogene 22, 1486–1490 (2003).
Yin, Q., Wyatt, C. J., Han, T., Smalley, K. S. M. & Wan, L. ITCH as a potential therapeutic target in human cancers. Semin. Cancer Biol. 67, 117–130 (2020).
Li, L. et al. CHIP mediates degradation of Smad proteins and potentially regulates Smad-induced transcription. Mol. Cell. Biol. 24, 856–864 (2004).
Tsai, W.-W. et al. TRIM24 links a non-canonical histone signature to breast cancer. Nature 468, 927–932 (2010).
Lv, D. et al. TRIM24 is an oncogenic transcriptional co-activator of STAT3 in glioblastoma. Nat. Commun. 8, 1454 (2017).
Cuadrado, A. et al. Therapeutic targeting of the NRF2 and KEAP1 partnership in chronic diseases. Nat. Rev. Drug Discov. 18, 295–317 (2019).
Furukawa, M. & Xiong, Y. BTB protein Keap1 targets antioxidant transcription factor Nrf2 for ubiquitination by the Cullin 3-Roc1 ligase. Mol. Cell. Biol. 25, 162–171 (2005).
Fukutomi, T., Takagi, K., Mizushima, T., Ohuchi, N. & Yamamoto, M. Kinetic, thermodynamic, and structural characterizations of the association between Nrf2-DLGex Degron and Keap1. Mol. Cell. Biol. 34, 832–846 (2014).
Gao, H. et al. High-throughput screening using patient-derived tumor xenografts to predict clinical trial drug response. Nat. Med. 21, 1318–1325 (2015).
Vasaikar, S. et al. Proteogenomic analysis of human colon cancer reveals new therapeutic opportunities. Cell 177, 1035–1049.e19 (2019).
Abi-Habib, R. J. et al. BRAF status and mitogen-activated protein/extracellular signal-regulated kinase kinase 1/2 activity indicate sensitivity of melanoma cells to anthrax lethal toxin. Mol. Cancer Ther. 4, 1303–1310 (2005).
Roberts, P. J. & Der, C. J. Targeting the Raf-MEK-ERK mitogen-activated protein kinase cascade for the treatment of cancer. Oncogene 26, 3291–3310 (2007).
Endres, N. F. et al. Conformational coupling across the plasma membrane in activation of the EGF receptor. Cell 152, 543–556 (2013).
Lu, C. F. et al. Structural evidence for loose linkage between ligand binding and kinase activation in the epidermal growth factor receptor. Mol. Cell. Biol. 30, 5432–5443 (2010).
Liang, S. I. et al. Phosphorylated EGFR dimers are not sufficient to activate ras. Cell Rep. 22, 2593–2600 (2018).
Bishayee, A., Beguinot, L. & Bishayee, S. Phosphorylation of tyrosine 992, 1068, and 1086 is required for conformational change of the human epidermal growth factor receptor C-terminal tail. Mol. Biol. Cell. 10, 525–536 (1999).
Siegelin, M. D. & Borczuk, A. C. Epidermal growth factor receptor mutations in lung adenocarcinoma. Lab Invest. 94, 129–137 (2014).
Hillig, R. C. et al. Discovery of potent SOS1 inhibitors that block RAS activation via disruption of the RAS–SOS1 interaction. Proc. Natl Acad. Sci. USA 116, 2551–2560 (2019).
You, X. et al. Unique dependence on Sos1 in KrasG12D-induced leukemogenesis. Blood 132, 2575–2579 (2018).
Hofmann, M. H. et al. Trial in process: phase 1 studies of BI 1701963, a SOS1::KRAS inhibitor, in combination with MEK inhibitors, irreversible KRASG12C inhibitors or irinotecan. Cancer Res. 81, CT210 (2021).
Huijberts, S. C. F. A. et al. Phase I study of lapatinib plus trametinib in patients with KRAS-mutant colorectal, non-small cell lung, and pancreatic cancer. Cancer Chemother. Pharmacol. 85, 917–930 (2020).
Cho, M. et al. A phase I clinical trial of binimetinib in combination with FOLFOX in patients with advanced metastatic colorectal cancer who failed prior standard therapy. Oncotarget 8, 79750–79760 (2017).
Hofmann, M. H. et al. BI-3406, a potent and selective SOS1–KRAS interaction inhibitor, is effective in KRAS-driven cancers through combined MEK inhibition. Cancer Discov. 11, 142–157 (2021).
Liu, F., Yang, X., Geng, M. & Huang, M. Targeting ERK, an Achilles’ Heel of the MAPK pathway, in cancer therapy. Acta Pharm. Sin. B 8, 552–562 (2018).
Tran, T. H. et al. KRAS interaction with RAF1 RAS-binding domain and cysteine-rich domain provides insights into RAS-mediated RAF activation. Nat. Commun. 12, 1176 (2021).
Patelli, G. et al. Strategies to tackle RAS-mutated metastatic colorectal cancer. ESMO Open 6, 100156 (2021).
Li, Z.-N., Zhao, L., Yu, L.-F. & Wei, M.-J. BRAF and KRAS mutations in metastatic colorectal cancer: future perspectives for personalized therapy. Gastroenterol. Rep. 8, 192–205 (2020).
Corcoran, R. B. et al. Combined BRAF, EGFR, and MEK inhibition in patients with BRAFV600E-mutant colorectal cancer. Cancer Discov. 8, 428–443 (2018).
Lin, Q. et al. The association between BRAF mutation class and clinical features in BRAF-mutant Chinese non-small cell lung cancer patients. J. Transl. Med. 17, 298 (2019).
Caunt, C. J., Sale, M. J., Smith, P. D. & Cook, S. J. MEK1 and MEK2 inhibitors and cancer therapy: the long and winding road. Nat. Rev. Cancer 15, 577–592 (2015).
Huang, K. L. et al. Regulated phosphosignaling associated with breast cancer subtypes and druggability. Mol. Cell. Proteomics 18, 1630–1650 (2019).
Landrum, M. J. et al. ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Res. 44, D862–D868 (2016).
Luck, K. et al. A reference map of the human binary protein interactome. Nature 580, 402–408 (2020).
Huttlin, E. L. et al. The BioPlex network: a systematic exploration of the human interactome. Cell 162, 425–440 (2015).
Cho, N. H. et al. OpenCell: endogenous tagging for the cartography of human cellular organization. Science 375, eabi6983 (2022).
Petrey, D., Zhao, H., Trudeau, S. J., Murray, D. & Honig, B. PrePPI: a structure informed proteome-wide database of protein–protein interactions. J. Mol. Biol. 435, 168052 (2023).
Gao, Z. et al. Hierarchical graph learning for protein–protein interaction. Nat. Commun. 14, 1093 (2023).
Pieper, U. et al. ModBase, a database of annotated comparative protein structure models and associated resources. Nucleic Acids Res. 42, D336–D346 (2014).
Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).
Su, J. et al. SaProt: protein language modeling with structure-aware vocabulary. The Twelfth International Conference on Learning Representations. https://openreview.net/pdf?id=6MRm3G4NiU (ICLR, 2023).
Taliun, D. et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature 590, 290–299 (2021).
Beecham, G. W. et al. The Alzheimer’s disease sequencing project: study design and sample selection. Neurol. Genet. 3, e194 (2017).
Mosca, R., Céol, A. & Aloy, P. Interactome3D: adding structural details to protein networks. Nat. Methods 10, 47–53 (2013).
Velankar, S. et al. SIFTS: structure integration with function, taxonomy and sequences resource. Nucleic Acids Res. 41, D483–D489 (2013).
Lee, B. & Richards, F. M. The interpretation of protein structures: estimation of static accessibility. J. Mol. Biol. 55, 379–400 (1971).
Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).
Sievers, F. & Higgins, D. G. Clustal Omega for making accurate alignments of many protein sequences. Protein Sci. 27, 135–145 (2018).
Fu, L., Niu, B., Zhu, Z., Wu, S. & Li, W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150–3152 (2012).
Pierce, B. G. et al. ZDOCK server: interactive docking prediction of protein-protein complexes and symmetric multimers. Bioinformatics 30, 1771–1773 (2014).
Scardapane, S., Van Vaerenbergh, S., Totaro, S. & Uncini, A. Kafnets: kernel-based non-parametric activation functions for neural networks. Neural Netw. 110, 19–32 (2019).
Li, Y., Golding, G. B. & Ilie, L. DELPHI: accurate deep ensemble model for protein interaction sites prediction. Bioinformatics 37, 896–904 (2021).
Zhang, J. & Kurgan, L. SCRIBER: accurate and partner type-specific prediction of protein-binding residues from proteins sequences. Bioinformatics 35, i343–i353 (2019).
Zhang, B., Li, J., Quan, L., Chen, Y. & Lü, Q. Sequence-based prediction of protein–protein interaction sites by simplified long short-term memory network. Neurocomputing 357, 86–100 (2019).
Fu, W. et al. Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants. Nature 493, 216–220 (2013).
Walhout, A. J. M. & Vidal, M. High-throughput yeast two-hybrid assays for large-scale protein interaction mapping. Methods 24, 297–306 (2001).
McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016).
Dominguez, C., Boelens, R. & Bonvin, A. M. J. J. HADDOCK: a protein−protein docking approach based on biochemical or biophysical information. J. Am. Chem. Soc. 125, 1731–1737 (2003).
Schneider, C. A., Rasband, W. S. & Eliceiri, K. W. NIH Image to ImageJ: 25 years of image analysis. Nat. Methods 9, 671–675 (2012).
Wu, E. L. et al. CHARMM-GUI Membrane Builder toward realistic biological membrane simulations. J. Comput. Chem. 35, 1997–2004 (2014).
Xiong, D., Lee, D. & Liang, S. GitHub code repository for PIONEER. https://github.com/hyulab/PIONEER (2024).
Acknowledgements
This work was supported by the National Institute of General Medical Sciences (R01GM124559, R01GM125639, R01GM130885 and RM1GM139738) and the National Institute of Diabetes and Digestive and Kidney Diseases (R01DK115398) to H.Y.; the National Institute on Aging (R01AG084250, R56AG074001, U01AG073323, R01AG066707, R01AG076448, R01AG082118, RF1AG082211 and R21AG083003) and the National Institute of Neurological Disorders and Stroke (RF1NS133812) to F.C.; and the National Human Genome Research Institute (U01HG007691), the National Heart, Lung, and Blood Institute (R01HL155107, R01HL155096, R01HL166137 and U54HL119145), the American Heart Association (AHA957729 and 24MERIT1185447) and European Union Horizon Health 2021 (101057619) to J.L. C.E. is the Sondra J. and Stephen R. Hardis Chair of Cancer Genomic Medicine at the Cleveland Clinic. This work partially used Jetstream2 at Indiana University through allocation BIO220060 from the Advanced Cyberinfrastructure Coordination Ecosystem: Services and Support program, which is supported by the National Science Foundation (2138259, 2138286, 2138307, 2137603 and 2138296).
Author information
Contributions
D.X., Y.Q., J.Z., Y.Z., D.L., F.C. and H.Y. conceived and developed the project. Under close supervision of H.Y., D.X., D.L. and S.L. developed the models and conducted computational experiments; and D.X., S.G., M.T. and S.L. built the web server. W.L. and J.K. conducted biological experiments. D.X., Y.Q., J.Z., Y.Z., D.L., F.C. and H.Y. performed the analyses. D.X., Y.Q., J.Z., Y.Z., D.L., S.G., C.E., J.L., F.C. and H.Y. wrote and critically revised the manuscript. All authors discussed the results and reviewed the manuscript.
Ethics declarations
Competing interests
J.L. is co-scientific founder of Scipher Medicine, Inc., which applies network medicine strategies to biomarker development and personalized drug selection. The remaining authors declare no competing interests.
Peer review
Peer review information
Nature Biotechnology thanks Leng Han and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 PIONEER provides high-quality interfaces for the whole proteome.
a, Workflow for compiling the PIONEER interactome. Interfaces calculated from experimentally determined co-crystal structures or homology models are used where available; the remaining unresolved interactions are predicted by PIONEER. b, Percentage of CAPRI decoys having a given average PIONEER prediction score at interfaces. Percentages are plotted along the y axis for four classes of CAPRI models. The total number of models in each class is indicated in the text in the figure. c, Fraction of interactions disrupted by random population variants in PIONEER-predicted and known interfaces. The error bar denotes standard error for the binomial distribution. Significance was determined by two-sided z-test. The n numbers are shown in Supplementary Table 6. d, Enrichment of disease-associated mutations in PIONEER-predicted and known interfaces. The error bar denotes standard error for the log odds ratio. Significance was determined by two-sided z-test. The n numbers are shown in Supplementary Table 7. e, Enrichment of population variants in PIONEER-predicted and known interfaces. The error bar denotes standard error for the log odds ratio. The n numbers are shown in Supplementary Table 8.
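The enrichment statistic in panels d and e can be illustrated with a minimal sketch. It assumes a 2 × 2 contingency table, the natural-log odds ratio and the standard (Woolf) standard error; the legend does not state the log base or the exact table layout, so the function name `log_odds_ratio`, its arguments and the counts below are hypothetical and are not taken from the paper.

```python
import math

def log_odds_ratio(a, b, c, d):
    """Log odds ratio, its standard error, z score and two-sided P value.

    Assumed (hypothetical) 2x2 layout:
        a = disease mutations inside interfaces
        b = disease mutations outside interfaces
        c = background variants inside interfaces
        d = background variants outside interfaces
    All four counts must be positive.
    """
    log_or = math.log((a * d) / (b * c))           # natural-log odds ratio
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf standard error
    z = log_or / se                                # z statistic against log OR = 0
    # Two-sided normal P value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))
    return log_or, se, z, p

# Made-up counts, for illustration only
log_or, se, z, p = log_odds_ratio(30, 70, 10, 90)
print(f"log OR = {log_or:.3f} +/- {se:.3f}, z = {z:.2f}, P = {p:.2g}")
```

A positive log odds ratio indicates that mutations fall inside interfaces more often than the background variants do; the error bar corresponds to `se`.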
Extended Data Fig. 2 Pharmacogenomic landscape identified by the PIONEER-predicted interactome network.
a, Drug responses evaluated by oncoPPIs in the PDX models. Effect size was quantified by Cohen’s d statistic using the difference between two means divided by a pooled s.d. for the data. Significance was determined by ANOVA adjusted by the Benjamini–Hochberg method. b, Circos plot displaying drug responses evaluated by putative PIONEER-predicted oncoPPIs harboring a statistically significant excess number of missense mutations at PPI interfaces, following a binomial distribution across selected anti-cancer therapeutic agents in cancer cell lines. Each node denotes a specific oncoPPI. Node size denotes significance determined by ANOVA. Effect size was quantified by Cohen’s d statistic using the difference between two means divided by a pooled s.d. for the data. Node color denotes three different types of PPIs: (1) PDB: red; (2) HM: blue; and (3) PIONEER: green. ‘HM’ represents homology models. c, Highlighted examples of drug responses. Data are represented as a box plot with an underlaid violin plot in which the middle line is the median, the lower and upper edges of the box are the first and third quartiles, the whiskers represent IQR × 1.5, and the dots are outlier points. Significance was determined by ANOVA. The n numbers are shown in Supplementary Table 16.
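The effect size described in panels a and b, the difference between two means divided by a pooled s.d., can be sketched as follows. This is an illustrative implementation of the textbook Cohen’s d with sample variances (n − 1 denominators), not the authors’ code; the function name `cohens_d` and the example values are hypothetical, and which samples form each group (for example, interface-mutated versus wild-type) is an assumption.

```python
import math
from statistics import mean, variance

def cohens_d(group1, group2):
    """Cohen's d: (mean1 - mean2) divided by the pooled standard deviation.

    Each group needs at least two observations because the pooled
    s.d. is built from the sample variances of both groups.
    """
    n1, n2 = len(group1), len(group2)
    pooled_var = (
        (n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)
    ) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / math.sqrt(pooled_var)

# Made-up drug response values, for illustration only
mutated = [2.0, 4.0, 6.0]
wild_type = [1.0, 3.0, 5.0]
print(cohens_d(mutated, wild_type))  # 0.5
```

The sign of d follows the order of the arguments, so swapping the two groups flips the sign but not the magnitude.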
Extended Data Fig. 3 Proteogenomics of the PIONEER-predicted interactome network.
a, Phosphorylation-associated PPI-perturbing mutations altered the proteomic changes in COAD and UCEC. The abundance of proteins was quantified using the TMT technique. Data are represented as a box plot with an underlaid violin plot in which the middle line is the median, the lower and upper edges of the box are the first and third quartiles, the whiskers represent IQR × 1.5, and the dots are outlier points. Significance was determined by two-tailed Wilcoxon rank-sum test. The n numbers are shown in Supplementary Table 17. b, Phosphorylation-associated PPI-perturbing mutations in the EGFR–RAS–RAF–MEK–ERK cascade signaling pathway. The whole transmembrane EGFR structure was constructed from three crystal structures (PDB: 3NJP, 2M20 and 2GS6). The membrane model is shown in green. The phosphorylation sites are indicated by the symbol 'P'. The detailed interface structure of SOS1–KRAS is also shown in the inset. The key mutated residue Gln61 on KRAS forms a hydrogen bond (purple dashed line) with residue Thr935 on SOS1, and Tyr884 on SOS1 is involved in a cation-π interaction (red dashed line) with residue Arg73 on KRAS. The two subunits of the RAF protein structure model were built from RAF1 and BRAF separately (PDB: 6VJJ and 6Q0J). The two subunits are connected by a disordered loop indicated by blue cartoon lines. Two heterodimers, KRAS–RAF1 and BRAF–MEK1, constitute the KRAS–RAF–MEK1 complex. The PDB ID of each complex structure model is provided. c, Highlighted examples of drug responses. Data are represented as a box plot with an underlaid violin plot in which the middle line is the median, the lower and upper edges of the box are the first and third quartiles, the whiskers represent IQR × 1.5, and the dots are outlier points. Significance was determined by ANOVA. The n numbers are shown in Supplementary Table 16.
Supplementary information
Supplementary Information
Supplementary Methods and Supplementary Figs. 1–14.
Supplementary Table 1
Supplementary Tables 1–18.
Supplementary Data 1
The labeled dataset used for PIONEER model training, validation and testing.
Supplementary Data 2
Somatic mutations in 33 cancer types of TCGA.
Supplementary Data 3
Significance test of somatic mutation enrichment in PPI interfaces by 33 cancer types in TCGA.
Source data
Source Data Fig. 4
Unprocessed western blots for Fig. 4d.
Source Data Fig. 5
Unprocessed western blots for Fig. 5d.
About this article
Cite this article
Xiong, D., Qiu, Y., Zhao, J. et al. A structurally informed human protein–protein interactome reveals proteome-wide perturbations caused by disease mutations. Nat Biotechnol 43, 1510–1524 (2025). https://doi.org/10.1038/s41587-024-02428-4
DOI: https://doi.org/10.1038/s41587-024-02428-4