A structurally informed human protein–protein interactome reveals proteome-wide perturbations caused by disease mutations

Xiong, Dapeng; Qiu, Yunguang; Zhao, Junfei; Zhou, Yadi; Lee, Dongjin; Gupta, Shobhita; Torres, Mateo; Lu, Weiqiang; Liang, Siqi; Kang, Jin Joo; Eng, Charis; Loscalzo, Joseph; Cheng, Feixiong; Yu, Haiyuan

doi:10.1038/s41587-024-02428-4

Article
Published: 24 October 2024

Nature Biotechnology volume 43, pages 1510–1524 (2025)

Abstract

To assist the translation of genetic findings to disease pathobiology and therapeutics discovery, we present an ensemble deep learning framework, termed PIONEER (Protein–protein InteractiOn iNtErfacE pRediction), that predicts protein-binding partner-specific interfaces for all known protein interactions in humans and seven other common model organisms to generate comprehensive structurally informed protein interactomes. We demonstrate that PIONEER outperforms existing state-of-the-art methods and experimentally validate its predictions. We show that disease-associated mutations are enriched in PIONEER-predicted protein–protein interfaces and explore their impact on disease prognosis and drug responses. We identify 586 significant protein–protein interactions (PPIs) enriched with PIONEER-predicted interface somatic mutations (termed oncoPPIs) from analysis of approximately 11,000 whole exomes across 33 cancer types and show significant associations of oncoPPIs with patient survival and drug responses. PIONEER, implemented as both a web server platform and a software package, identifies functional consequences of disease-associated alleles and offers a deep learning tool for precision medicine at multiscale interactome network levels.

**Fig. 1: Overview of the PIONEER framework.**

**Fig. 2: PIONEER-predicted PPI alleles are enriched in disease-associated mutations.**

**Fig. 3: A landscape of oncoPPIs identified by PIONEER across 33 cancer types (~11,000 cancer genomes).**

**Fig. 4: PIONEER-predicted oncoPPIs are associated with patient survival.**

**Fig. 5: PIONEER-predicted PPI-perturbing tumor alleles in ubiquitination by E3 ligases.**

Data availability

Mutation data from the TCGA study were downloaded from the National Cancer Institute’s Genomic Data Commons (https://portal.gdc.cancer.gov). The MSK MetTropism dataset was downloaded from the cBioPortal (https://www.cbioportal.org/study/summary?id=msk_met_2021). Variant data from the 1000 Genomes Project were downloaded from the National Center for Biotechnology Information’s FTP site (https://ftp-trace.ncbi.nih.gov/1000genomes/ftp). The ExAC dataset was downloaded from the Genome Aggregation Database (https://gnomad.broadinstitute.org/downloads#exac-variants). Variants collected by the HGMD were downloaded from https://www.hgmd.cf.ac.uk/ac/index.php. Genomic variants and drug response data of human cancer cell lines were downloaded from GDSC datasets (https://www.cancerrxgene.org/downloads/bulk_download). Genomic profiling of PDXs and drug response curve metrics of PDX clinical trials were downloaded from Supplementary Table 1 of the corresponding paper (https://www.nature.com/articles/nm.3954#Sec28). The homologous structures of PPIs that do not have co-crystal structures were collected from Interactome3D (https://interactome3d.irbbarcelona.org). The ModBase data were downloaded from https://modbase.compbio.ucsf.edu. The PDB data were downloaded from the PDB FTP site (https://files.wwpdb.org/pub/pdb/data/structures/divided/pdb). The AlphaFold2-predicted protein structures were downloaded from the AlphaFold2 database (https://alphafold.ebi.ac.uk). All other data supporting the results in this study are available in the supplementary materials and at https://pioneer.yulab.org. Source data are provided with this paper.

Code availability

The source code of PIONEER is available at GitHub¹²⁶.

References

Nussinov, R., Jang, H., Nir, G., Tsai, C. J. & Cheng, F. Open structural data in precision medicine. Annu. Rev. Biomed. Data Sci. 5, 95–117 (2022).
Braberg, H., Echeverria, I., Kaake, R. M., Sali, A. & Krogan, N. J. From systems to structure—using genetic data to model protein structures. Nat. Rev. Genet. 23, 342–354 (2022).
Vidal, M., Cusick, M. E. & Barabási, A.-L. Interactome networks and human disease. Cell 144, 986–998 (2011).
Barabási, A.-L., Gulbahce, N. & Loscalzo, J. Network medicine: a network-based approach to human disease. Nat. Rev. Genet. 12, 56–68 (2011).
Meyer, M. J. et al. Interactome INSIDER: a structural interactome browser for genomic studies. Nat. Methods 15, 107–114 (2018).
Wang, X. et al. Three-dimensional reconstruction of protein networks provides insight into human genetic disease. Nat. Biotechnol. 30, 159–164 (2012).
Cheng, F. et al. Comprehensive characterization of protein–protein interactions perturbed by disease mutations. Nat. Genet. 53, 342–353 (2021).
Sahni, N. et al. Widespread macromolecular interaction perturbations in human genetic disorders. Cell 161, 647–660 (2015).
Wierbowski, S. D. et al. A 3D structural SARS-CoV-2-human interactome to explore genetic and drug perturbations. Nat. Methods 18, 1477–1488 (2021).
Evans, R. et al. Protein complex prediction with AlphaFold-Multimer. Preprint at bioRxiv https://doi.org/10.1101/2021.10.04.463034 (2022).
Gao, M., Nakajima An, D., Parks, J. M. & Skolnick, J. AF2Complex predicts direct physical interactions in multimeric proteins with deep learning. Nat. Commun. 13, 1744 (2022).
Burke, D. F. et al. Towards a structurally resolved human protein interaction network. Nat. Struct. Mol. Biol. 30, 216–225 (2023).
Tunyasuvunakool, K. et al. Highly accurate protein structure prediction for the human proteome. Nature 596, 590–596 (2021).
Bianchi, F. M., Grattarola, D., Livi, L. & Alippi, C. Graph neural networks with convolutional ARMA filters. IEEE Trans. Pattern Anal. Mach. Intell. 44, 3496–3507 (2022).
Cho, K. et al. Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Proc. of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) 1724–1734 (Association for Computational Linguistics, 2014).
Zhuang, F. et al. A comprehensive survey on transfer learning. Proc. of the IEEE 109, 43–76 (2021).
Krapp, L. F., Abriata, L. A., Cortes Rodriguez, F. & Dal Peraro, M. PeSTo: parameter-free geometric deep learning for accurate prediction of protein binding interfaces. Nat. Commun. 14, 2175 (2023).
Tubiana, J., Schneidman-Duhovny, D. & Wolfson, H. J. ScanNet: an interpretable geometric deep learning model for structure-based protein binding site prediction. Nat. Methods 19, 730–739 (2022).
Gainza, P. et al. Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning. Nat. Methods 17, 184–192 (2020).
Sanchez-Garcia, R., Macias, J. R., Sorzano, C. O. S., Carazo, J. M. & Segura, J. BIPSPI+: mining type-specific datasets of protein complexes to improve protein binding site prediction. J. Mol. Biol. 434, 167556 (2022).
Zeng, M. et al. Protein–protein interaction site prediction through combining local and global features with deep neural networks. Bioinformatics 36, 1114–1120 (2020).
Townshend, R. J. L., Bedi, R., Suriana, P. A. & Dror, R. O. End-to-end learning on 3D protein structure for interface prediction. 33rd Conference on Neural Information Processing Systems. https://proceedings.neurips.cc/paper_files/paper/2019/file/6c7de1f27f7de61a6daddfffbe05c058-Paper.pdf (NeurIPS, 2019).
Fout, A., Byrd, J., Shariat, B. & Ben-Hur, A. Protein interface prediction using graph convolutional networks. Advances in Neural Information Processing Systems 30. https://papers.nips.cc/paper_files/paper/2017/file/f507783927f2ec2737ba40afbd17efb5-Paper.pdf (NIPS, 2017).
Lensink, M. F. & Wodak, S. J. Score_set: a CAPRI benchmark for scoring protein complexes. Proteins 82, 3163–3169 (2014).
UniProt Consortium. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 47, D506–D515 (2019).
Das, J. & Yu, H. HINT: high-quality protein interactomes and their applications in understanding human disease. BMC Syst. Biol. 6, 92 (2012).
Oughtred, R. et al. The BioGRID interaction database: 2019 update. Nucleic Acids Res. 47, D529–D541 (2019).
Salwinski, L. et al. The database of interacting proteins: 2004 update. Nucleic Acids Res. 32, D449–D451 (2004).
Orchard, S. et al. The MIntAct project—IntAct as a common curation platform for 11 molecular interaction databases. Nucleic Acids Res. 42, D358–D363 (2014).
Licata, L. et al. MINT, the molecular interaction database: 2012 update. Nucleic Acids Res. 40, D857–D861 (2012).
Turner, B. et al. iRefWeb: interactive analysis of consolidated protein interaction data and their supporting evidence. Database 2010, baq023 (2010).
Keshava Prasad, T. S. et al. Human protein reference database—2009 update. Nucleic Acids Res. 37, D767–D772 (2009).
Mewes, H. W. et al. MIPS: curated databases and comprehensive secondary data resources in 2010. Nucleic Acids Res. 39, D220–D224 (2011).
Nelson, L. & Cox, M. Lehninger Principles of Biochemistry 7th edn (W.H. Freeman, 2017).
Rose, G. D., Geselowitz, A. R., Lesser, G. J., Lee, R. H. & Zehfus, M. H. Hydrophobicity of amino acid residues in globular proteins. Science 229, 834–838 (1985).
Aftabuddin, M. & Kundu, S. Hydrophobic, hydrophilic, and charged amino acid networks within protein. Biophys. J. 93, 225–231 (2007).
Tsai, C. J., Lin, S. L., Wolfson, H. J. & Nussinov, R. Studies of protein–protein interfaces: a statistical analysis of the hydrophobic effect. Protein Sci. 6, 53–64 (1997).
Ansari, S. & Helms, V. Statistical analysis of predominantly transient protein–protein interfaces. Proteins 61, 344–355 (2005).
Burley, S. K. et al. RCSB Protein Data Bank: celebrating 50 years of the PDB with new tools for understanding and visualizing biological macromolecules in 3D. Protein Sci. 31, 187–208 (2022).
Wei, X. et al. A massively parallel pipeline to clone DNA variants and examine molecular phenotypes of human disease mutations. PLoS Genet. 10, e1004819 (2014).
Xiong, D., Lee, D., Li, L., Zhao, Q. & Yu, H. Implications of disease-related mutations at protein–protein interfaces. Curr. Opin. Struct. Biol. 72, 219–225 (2022).
Stenson, P. D. et al. The Human Gene Mutation Database: towards a comprehensive repository of inherited mutation data for medical research, genetic diagnosis and next-generation sequencing studies. Hum. Genet. 136, 665–677 (2017).
Auton, A. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
Kircher, M. et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat. Genet. 46, 310–315 (2014).
Schymkowitz, J. et al. The FoldX web server: an online force field. Nucleic Acids Res. 33, W382–W388 (2005).
Zhou, Y. et al. A network medicine approach to investigation and population-based validation of disease manifestations and drug repurposing for COVID-19. PLoS Biol. 18, e3000970 (2020).
Plasilova, M. et al. Homozygous missense mutation in the lamin A/C gene causes autosomal recessive Hutchinson–Gilford progeria syndrome. J. Med. Genet. 41, 609–614 (2004).
Favretto, F. et al. The molecular basis of the interaction of cyclophilin A with α-synuclein. Angew. Chem. Int. Ed. 59, 5643–5646 (2020).
Liu, Q. et al. HIF2A germline–mutation-induced polycythemia in a patient with VHL-associated renal-cell carcinoma. Cancer Biol. Ther. 18, 944–947 (2017).
Tarade, D., Robinson, C. M., Lee, J. E. & Ohh, M. HIF-2α-pVHL complex reveals broad genotype-phenotype correlations in HIF-2α-driven disease. Nat. Commun. 9, 3359 (2018).
V, F. R. L. et al. Three novel EPAS1/HIF2A somatic and germline mutations associated with polycythemia and pheochromocytoma/paraganglioma. Blood 120, 2080 (2012).
Chang, K. et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113–1120 (2013).
Nguyen, B. et al. Genomic characterization of metastatic patterns from prospective clinical sequencing of 25,000 patients. Cell 185, 563–575 (2022).
Rabara, D. et al. KRAS G13D sensitivity to neurofibromin-mediated GTP hydrolysis. Proc. Natl Acad. Sci. USA 116, 22122–22131 (2019).
Wang, Z. et al. The diverse roles of SPOP in prostate cancer and kidney cancer. Nat. Rev. Urol. 17, 339–350 (2020).
Song, Y. et al. The emerging role of SPOP protein in tumorigenesis and cancer therapy. Mol. Cancer 19, 2 (2020).
Xu, J. & Lin, D. I. Oncogenic c-terminal cyclin D1 (CCND1) mutations are enriched in endometrioid endometrial adenocarcinomas. PLoS ONE 13, e0199688 (2018).
Ryu, D. et al. Alterations in the transcriptional programs of myeloma cells and the microenvironment during extramedullary progression affect proliferation and immune evasion. Clin. Cancer Res. 26, 935–944 (2020).
Zhang, M. et al. CanProVar 2.0: an updated database of human cancer proteome variation. J. Proteome Res. 16, 421–432 (2017).
Mészáros, B., Kumar, M., Gibson, T. J., Uyar, B. & Dosztányi, Z. Degrons in cancer. Sci. Signal. 10, eaak9982 (2017).
Yang, Q., Zhao, J., Chen, D. & Wang, Y. E3 ubiquitin ligases: styles, structures and functions. Mol. Biomed. 2, 23 (2021).
Senft, D., Qi, J. & Ronai, Z. E. A. Ubiquitin ligases in oncogenic transformation and cancer therapy. Nat. Rev. Cancer 18, 69–88 (2018).
Han, Y., Lee, H., Park, J. C. & Yi, G. S. E3Net: a system for exploring E3-mediated regulatory networks of cellular functions. Mol. Cell. Proteomics 11, O111.014076 (2012).
Li, Z. et al. UbiNet 2.0: a verified, classified, annotated and updated database of E3 ubiquitin ligase–substrate interactions. Database 2021, baab010 (2021).
Mena, E. L. et al. Dimerization quality control ensures neuronal development and survival. Science 362, eaap8236 (2018).
Wang, Q. et al. Alterations of anaphase-promoting complex genes in human colon cancer cells. Oncogene 22, 1486–1490 (2003).
Yin, Q., Wyatt, C. J., Han, T., Smalley, K. S. M. & Wan, L. ITCH as a potential therapeutic target in human cancers. Semin. Cancer Biol. 67, 117–130 (2020).
Li, L. et al. CHIP mediates degradation of Smad proteins and potentially regulates Smad-induced transcription. Mol. Cell. Biol. 24, 856–864 (2004).
Tsai, W.-W. et al. TRIM24 links a non-canonical histone signature to breast cancer. Nature 468, 927–932 (2010).
Lv, D. et al. TRIM24 is an oncogenic transcriptional co-activator of STAT3 in glioblastoma. Nat. Commun. 8, 1454 (2017).
Cuadrado, A. et al. Therapeutic targeting of the NRF2 and KEAP1 partnership in chronic diseases. Nat. Rev. Drug Discov. 18, 295–317 (2019).
Furukawa, M. & Xiong, Y. BTB protein Keap1 targets antioxidant transcription factor Nrf2 for ubiquitination by the Cullin 3-Roc1 ligase. Mol. Cell. Biol. 25, 162–171 (2005).
Fukutomi, T., Takagi, K., Mizushima, T., Ohuchi, N. & Yamamoto, M. Kinetic, thermodynamic, and structural characterizations of the association between Nrf2-DLGex Degron and Keap1. Mol. Cell. Biol. 34, 832–846 (2014).
Gao, H. et al. High-throughput screening using patient-derived tumor xenografts to predict clinical trial drug response. Nat. Med. 21, 1318–1325 (2015).
Vasaikar, S. et al. Proteogenomic analysis of human colon cancer reveals new therapeutic opportunities. Cell 177, 1035–1049.e19 (2019).
Abi-Habib, R. J. et al. BRAF status and mitogen-activated protein/extracellular signal-regulated kinase kinase 1/2 activity indicate sensitivity of melanoma cells to anthrax lethal toxin. Mol. Cancer Ther. 4, 1303–1310 (2005).
Roberts, P. J. & Der, C. J. Targeting the Raf-MEK-ERK mitogen-activated protein kinase cascade for the treatment of cancer. Oncogene 26, 3291–3310 (2007).
Endres, N. F. et al. Conformational coupling across the plasma membrane in activation of the EGF receptor. Cell 152, 543–556 (2013).
Lu, C. F. et al. Structural evidence for loose linkage between ligand binding and kinase activation in the epidermal growth factor receptor. Mol. Cell. Biol. 30, 5432–5443 (2010).
Liang, S. I. et al. Phosphorylated EGFR dimers are not sufficient to activate ras. Cell Rep. 22, 2593–2600 (2018).
Bishayee, A., Beguinot, L. & Bishayee, S. Phosphorylation of tyrosine 992, 1068, and 1086 is required for conformational change of the human epidermal growth factor receptor C-terminal tail. Mol. Biol. Cell. 10, 525–536 (1999).
Siegelin, M. D. & Borczuk, A. C. Epidermal growth factor receptor mutations in lung adenocarcinoma. Lab Invest. 94, 129–137 (2014).
Hillig, R. C. et al. Discovery of potent SOS1 inhibitors that block RAS activation via disruption of the RAS–SOS1 interaction. Proc. Natl Acad. Sci. USA 116, 2551–2560 (2019).
You, X. et al. Unique dependence on Sos1 in Kras^G12D-induced leukemogenesis. Blood 132, 2575–2579 (2018).
Hofmann, M. H. et al. Trial in process: phase 1 studies of BI 1701963, a SOS1::KRAS inhibitor, in combination with MEK inhibitors, irreversible KRASG12C inhibitors or irinotecan. Cancer Res. 81, CT210 (2021).
Huijberts, S. C. F. A. et al. Phase I study of lapatinib plus trametinib in patients with KRAS-mutant colorectal, non-small cell lung, and pancreatic cancer. Cancer Chemother. Pharmacol. 85, 917–930 (2020).
Cho, M. et al. A phase I clinical trial of binimetinib in combination with FOLFOX in patients with advanced metastatic colorectal cancer who failed prior standard therapy. Oncotarget 8, 79750–79760 (2017).
Hofmann, M. H. et al. BI-3406, a potent and selective SOS1–KRAS interaction inhibitor, is effective in KRAS-driven cancers through combined MEK inhibition. Cancer Discov. 11, 142–157 (2021).
Liu, F., Yang, X., Geng, M. & Huang, M. Targeting ERK, an Achilles’ Heel of the MAPK pathway, in cancer therapy. Acta Pharm. Sin. B 8, 552–562 (2018).
Tran, T. H. et al. KRAS interaction with RAF1 RAS-binding domain and cysteine-rich domain provides insights into RAS-mediated RAF activation. Nat. Commun. 12, 1176 (2021).
Patelli, G. et al. Strategies to tackle RAS-mutated metastatic colorectal cancer. ESMO Open 6, 100156 (2021).
Li, Z.-N., Zhao, L., Yu, L.-F. & Wei, M.-J. BRAF and KRAS mutations in metastatic colorectal cancer: future perspectives for personalized therapy. Gastroenterol. Rep. 8, 192–205 (2020).
Corcoran, R. B. et al. Combined BRAF, EGFR, and MEK inhibition in patients with BRAF^V600E-mutant colorectal cancer. Cancer Discov. 8, 428–443 (2018).
Lin, Q. et al. The association between BRAF mutation class and clinical features in BRAF-mutant Chinese non-small cell lung cancer patients. J. Transl. Med. 17, 298 (2019).
Caunt, C. J., Sale, M. J., Smith, P. D. & Cook, S. J. MEK1 and MEK2 inhibitors and cancer therapy: the long and winding road. Nat. Rev. Cancer 15, 577–592 (2015).
Huang, K. L. et al. Regulated phosphosignaling associated with breast cancer subtypes and druggability. Mol. Cell. Proteomics 18, 1630–1650 (2019).
Landrum, M. J. et al. ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Res. 44, D862–D868 (2016).
Luck, K. et al. A reference map of the human binary protein interactome. Nature 580, 402–408 (2020).
Huttlin, E. L. et al. The BioPlex network: a systematic exploration of the human interactome. Cell 162, 425–440 (2015).
Cho, N. H. et al. OpenCell: endogenous tagging for the cartography of human cellular organization. Science 375, eabi6983 (2022).
Petrey, D., Zhao, H., Trudeau, S. J., Murray, D. & Honig, B. PrePPI: a structure informed proteome-wide database of protein–protein interactions. J. Mol. Biol. 435, 168052 (2023).
Gao, Z. et al. Hierarchical graph learning for protein–protein interaction. Nat. Commun. 14, 1093 (2023).
Pieper, U. et al. ModBase, a database of annotated comparative protein structure models and associated resources. Nucleic Acids Res. 42, D336–D346 (2014).
Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).
Su, J. et al. SaProt: protein language modeling with structure-aware vocabulary. The Twelfth International Conference on Learning Representations. https://openreview.net/pdf?id=6MRm3G4NiU (ICLR, 2023).
Taliun, D. et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature 590, 290–299 (2021).
Gary, W. B. et al. The Alzheimer’s disease sequencing project: study design and sample selection. Neurol. Genet. 3, e194 (2017).
Mosca, R., Céol, A. & Aloy, P. Interactome3D: adding structural details to protein networks. Nat. Methods 10, 47–53 (2013).
Velankar, S. et al. SIFTS: structure integration with function, taxonomy and sequences resource. Nucleic Acids Res. 41, D483–D489 (2013).
Lee, B. & Richards, F. M. The interpretation of protein structures: estimation of static accessibility. J. Mol. Biol. 55, 379–400 (1971).
Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).
Sievers, F. & Higgins, D. G. Clustal Omega for making accurate alignments of many protein sequences. Protein Sci. 27, 135–145 (2018).
Fu, L., Niu, B., Zhu, Z., Wu, S. & Li, W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150–3152 (2012).
Pierce, B. G. et al. ZDOCK server: interactive docking prediction of protein-protein complexes and symmetric multimers. Bioinformatics 30, 1771–1773 (2014).
Scardapane, S., Van Vaerenbergh, S., Totaro, S. & Uncini, A. Kafnets: kernel-based non-parametric activation functions for neural networks. Neural Netw. 110, 19–32 (2019).
Li, Y., Golding, G. B. & Ilie, L. DELPHI: accurate deep ensemble model for protein interaction sites prediction. Bioinformatics 37, 896–904 (2021).
Zhang, J. & Kurgan, L. SCRIBER: accurate and partner type-specific prediction of protein-binding residues from proteins sequences. Bioinformatics 35, i343–i353 (2019).
Zhang, B., Li, J., Quan, L., Chen, Y. & Lü, Q. Sequence-based prediction of protein–protein interaction sites by simplified long short-term memory network. Neurocomputing 357, 86–100 (2019).
Fu, W. et al. Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants. Nature 493, 216–220 (2013).
Walhout, A. J. M. & Vidal, M. High-throughput yeast two-hybrid assays for large-scale protein interaction mapping. Methods 24, 297–306 (2001).
McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016).
Dominguez, C., Boelens, R. & Bonvin, A. M. J. J. HADDOCK: a protein−protein docking approach based on biochemical or biophysical information. J. Am. Chem. Soc. 125, 1731–1737 (2003).
Schneider, C. A., Rasband, W. S. & Eliceiri, K. W. NIH Image to ImageJ: 25 years of image analysis. Nat. Methods 9, 671–675 (2012).
Wu, E. L. et al. CHARMM-GUI Membrane Builder toward realistic biological membrane simulations. J. Comput. Chem. 35, 1997–2004 (2014).
Xiong, D., Lee, D. & Liang, S. GitHub code repository for PIONEER. https://github.com/hyulab/PIONEER (2024).

Acknowledgements

This work was supported by the National Institute of General Medical Sciences (R01GM124559, R01GM125639, R01GM130885 and RM1GM139738) and the National Institute of Diabetes and Digestive and Kidney Diseases (R01DK115398) to H.Y.; the National Institute on Aging (R01AG084250, R56AG074001, U01AG073323, R01AG066707, R01AG076448, R01AG082118, RF1AG082211 and R21AG083003) and the National Institute of Neurological Disorders and Stroke (RF1NS133812) to F.C.; and the National Human Genome Research Institute (U01HG007691), the National Heart, Lung, and Blood Institute (R01HL155107, R01HL155096, R01HL166137 and U54HL119145), the American Heart Association (AHA957729 and 24MERIT1185447) and European Union Horizon Health 2021 (101057619) to J.L. C.E. is the Sondra J. and Stephen R. Hardis Chair of Cancer Genomic Medicine at the Cleveland Clinic. This work partially used Jetstream2 at Indiana University through allocation BIO220060 from the Advanced Cyberinfrastructure Coordination Ecosystem: Services and Support program, which is supported by the National Science Foundation (2138259, 2138286, 2138307, 2137603 and 2138296).

Author information

These authors contributed equally: Dapeng Xiong, Yunguang Qiu, Junfei Zhao, Yadi Zhou, Dongjin Lee.

Authors and Affiliations

Department of Computational Biology, Cornell University, Ithaca, NY, USA
Dapeng Xiong, Dongjin Lee, Mateo Torres, Siqi Liang, Jin Joo Kang & Haiyuan Yu
Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, USA
Dapeng Xiong, Dongjin Lee, Shobhita Gupta, Mateo Torres, Siqi Liang, Jin Joo Kang & Haiyuan Yu
Center for Innovative Proteomics, Cornell University, Ithaca, NY, USA
Dapeng Xiong, Shobhita Gupta, Mateo Torres, Jin Joo Kang & Haiyuan Yu
Cleveland Clinic Genome Center, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
Yunguang Qiu, Yadi Zhou, Charis Eng & Feixiong Cheng
Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
Yunguang Qiu, Yadi Zhou & Feixiong Cheng
Department of Systems Biology, Herbert Irving Comprehensive Center, Columbia University, New York, NY, USA
Junfei Zhao
Biophysics Program, Cornell University, Ithaca, NY, USA
Shobhita Gupta
Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, China
Weiqiang Lu
Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH, USA
Charis Eng & Feixiong Cheng
Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH, USA
Charis Eng & Feixiong Cheng
Channing Division of Network Medicine, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
Joseph Loscalzo

Contributions

D.X., Y.Q., J.Z., Y.Z., D.L., F.C. and H.Y. conceived and developed the project. Under the close supervision of H.Y., D.X., D.L. and S.L. developed the models and conducted computational experiments, and D.X., S.G., M.T. and S.L. built the web server. W.L. and J.K. conducted biological experiments. D.X., Y.Q., J.Z., Y.Z., D.L., F.C. and H.Y. performed the analyses. D.X., Y.Q., J.Z., Y.Z., D.L., S.G., C.E., J.L., F.C. and H.Y. wrote and critically revised the manuscript. All authors discussed the results and reviewed the manuscript.

Corresponding authors

Correspondence to Feixiong Cheng or Haiyuan Yu.

Ethics declarations

Competing interests

J.L. is co-scientific founder of Scipher Medicine, Inc., which applies network medicine strategies to biomarker development and personalized drug selection. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Biotechnology thanks Leng Han and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 PIONEER provides high-quality interfaces for the whole proteome.

a, Workflow for compiling the PIONEER interactome. Interfaces calculated from experimentally determined co-crystal structures or homology models are used preferentially; the remaining unresolved interactions are predicted by PIONEER. b, Percentage of CAPRI decoys having a given average PIONEER prediction score at interfaces. Percentages are plotted along the y axis for four classes of CAPRI models. The total number of models in each class is indicated in the text in the figure. c, Fraction of interactions disrupted by random population variants in PIONEER-predicted and known interfaces. The error bar denotes the standard error for the binomial distribution. Significance was determined by two-sided z-test. The n numbers are shown in Supplementary Table 6. d, Enrichment of disease-associated mutations in PIONEER-predicted and known interfaces. The error bar denotes the standard error for the log odds ratio. Significance was determined by two-sided z-test. The n numbers are shown in Supplementary Table 7. e, Enrichment of population variants in PIONEER-predicted and known interfaces. The error bar denotes the standard error for the log odds ratio. The n numbers are shown in Supplementary Table 8.
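The error bars and tests named in panels c to e can be illustrated with a minimal Python sketch. It assumes the textbook forms of each statistic (a pooled two-proportion z-test and a Wald standard error for the log odds ratio); the caption does not specify the exact formulation the authors used, and all function names and the 2×2 table layout here are hypothetical.

```python
import math

def binomial_se(k, n):
    """Standard error of a proportion k/n under a binomial model."""
    p = k / n
    return math.sqrt(p * (1 - p) / n)

def two_sided_p(z):
    """Two-sided p value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def two_proportion_z_test(k1, n1, k2, n2):
    """Pooled two-proportion z-test (assumed form, not confirmed by the source)."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, two_sided_p(z)

def log_odds_ratio(a, b, c, d):
    """Log odds ratio and Wald standard error for a 2x2 table.

    Hypothetical layout: a = interface residues with a mutation,
    b = interface residues without, c = non-interface residues with a
    mutation, d = non-interface residues without. All counts must be > 0.
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se, two_sided_p(log_or / se)
```

A positive log odds ratio in this layout would indicate that mutations fall in interfaces more often than expected from the non-interface background.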

Extended Data Fig. 2 Pharmacogenomic landscape identified by the PIONEER-predicted interactome network.

a, Drug responses evaluated by oncoPPIs in the PDX models. Effect size was quantified by Cohen’s d statistic using the difference between two means divided by a pooled s.d. for the data. Significance was determined by ANOVA adjusted by the Benjamini-Hochberg method. b, Circos plot displaying drug responses evaluated by putative PIONEER-predicted oncoPPIs harboring a statistically significant excess number of missense mutations at PPI interfaces, following a binomial distribution across selected anti-cancer therapeutic agents in cancer cell lines. Each node denotes a specific oncoPPI. Node size denotes significance determined by ANOVA. Effect size was quantified by Cohen’s d statistic using the difference between two means divided by a pooled s.d. for the data. Node color denotes three different types of PPIs: (1) PDB: Red; (2) HM: Blue; and (3) PIONEER: Green. ‘HM’ represents homology models. c, Highlighted examples of drug responses. Data are represented as a box plot with an underlaid violin plot in which the middle line is the median, the lower and upper edges of the box are the first and third quartiles, the whiskers represent IQR × 1.5, and the dots are outlier points. Significance was determined by ANOVA. The n numbers are shown in Supplementary Table 16.
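The effect size and multiple-testing adjustment named in this caption can be sketched in Python as follows. This is a minimal illustration that assumes the standard pooled-variance form of Cohen's d and the standard Benjamini-Hochberg step-up adjustment; the function names are hypothetical and the code is not taken from the PIONEER software.

```python
import math
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Difference between two means divided by the pooled s.d."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

For instance, `cohens_d` could compare drug response values between samples with and without an interface mutation in a given oncoPPI, and `benjamini_hochberg` could then adjust the per-oncoPPI ANOVA p values.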

Extended Data Fig. 3 Proteogenomics of the PIONEER-predicted interactome network.

a, Phosphorylation-associated PPI-perturbing mutations altered the proteomic changes in COAD and UCEC. The abundance of proteins was quantified using the TMT technique. Data are represented as a box plot with an underlaid violin plot in which the middle line is the median, the lower and upper edges of the box are the first and third quartiles, the whiskers represent IQR × 1.5, and the dots are outlier points. Significance was determined by two-tailed Wilcoxon rank-sum test. The n numbers are shown in Supplementary Table 17. b, Phosphorylation-associated PPI-perturbing mutations in the EGFR–RAS–RAF–MEK–ERK cascade signaling pathway. The whole transmembrane EGFR structures were constructed from three crystal structures (PDB: 3NJP, 2M20, 2GS6). The membrane model is shown in green. The phosphorylation sites are indicated by the symbol 'P'. The detailed interface structure of SOS1–KRAS is also shown in the inset. The key mutated residue Gln61 on KRAS forms a hydrogen bond (purple dashed line) with residue Thr935 on SOS1, and Tyr884 on SOS1 is involved in a cation-π interaction (red dashed line) with residue Arg73 on KRAS. Two subunits of the RAF protein structure models were built from RAF1 and BRAF, separately (PDB: 6VJJ and 6Q0J). The two subunits are connected by a disordered loop indicated by blue cartoon lines. Two heterodimers of KRAS–RAF1 and BRAF–MEK1 constitute the KRAS–RAF–MEK1 complex. The PDB ID of each complex structure model is provided. c, Highlighted examples of drug responses. Data are represented as a box plot with an underlaid violin plot in which the middle line is the median, the lower and upper edges of the box are the first and third quartiles, the whiskers represent IQR × 1.5, and the dots are outlier points. Significance was determined by ANOVA. The n numbers are shown in Supplementary Table 16.
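The box plot convention described in these captions (median line, quartile box edges, whiskers at IQR × 1.5, dots for outliers) can be made concrete with a short Python sketch. It assumes the common Tukey reading, in which whiskers extend to the most extreme data points within 1.5 × IQR of the box, and it uses the "inclusive" quartile method from the standard library; plotting packages differ in their quartile conventions, and the function name is hypothetical.

```python
from statistics import quantiles

def box_plot_summary(values, whisker_factor=1.5):
    """Median, quartiles, whisker ends and outliers for one box."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker_factor * iqr
    high_fence = q3 + whisker_factor * iqr
    # Points inside the fences set the whisker ends; the rest are outliers.
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }
```

Under these assumptions, a single extreme measurement such as 100 among the values 1 to 9 would be drawn as an outlier dot rather than stretching the upper whisker.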

Supplementary information

Supplementary Information

Supplementary Methods and Supplementary Figs. 1–14.

Reporting Summary

Supplementary Table 1

Supplementary Tables 1–18.

Supplementary Data 1

The labeled dataset used for PIONEER model training, validation and testing.

Supplementary Data 2

Somatic mutations in 33 cancer types of TCGA.

Supplementary Data 3

Significance test of somatic mutation enrichment in PPI interfaces by 33 cancer types in TCGA.

Source data

Source Data Fig. 4

Unprocessed western blots for Fig. 4d.

Source Data Fig. 5

Unprocessed western blots for Fig. 5d.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xiong, D., Qiu, Y., Zhao, J. et al. A structurally informed human protein–protein interactome reveals proteome-wide perturbations caused by disease mutations. Nat Biotechnol 43, 1510–1524 (2025). https://doi.org/10.1038/s41587-024-02428-4

Received: 01 February 2024
Accepted: 11 September 2024
Published: 24 October 2024
Version of record: 24 October 2024
Issue date: September 2025
DOI: https://doi.org/10.1038/s41587-024-02428-4

This article is cited by

PyPropel: a Python-based tool for efficiently processing and characterising protein data
- Jianfeng Sun
- Jinlong Ru
- Dapeng Xiong
BMC Bioinformatics (2025)
Conserved missense variant pathogenicity and correlated phenotypes across paralogous genes
- Tobias Brünger
- Alina Ivaniuk
- Dennis Lal
Genome Biology (2025)
Decoding the functional impact of the cancer genome through protein–protein interactions
- Haian Fu
- Xiulei Mo
- Andrey A. Ivanov
Nature Reviews Cancer (2025)
Credible inferences in microbiome research: ensuring rigour, reproducibility and relevance in the era of AI
- Alberto Caminero
- Carolina Tropini
- Elena F. Verdu
Nature Reviews Gastroenterology & Hepatology (2025)