Abstract
Protein complexes are fundamental to all biological processes. Public repositories have expanded to include millions of potential protein–protein interactions (PPIs) from humans and diverse model organisms. Yet large-scale structural characterization of these complexes, especially across different biological kingdoms, has lagged far behind, leaving most potential and as yet unidentified interactions unresolved. Here, we present a comprehensive atlas of 1.1 million predicted protein–protein interaction structures generated with the AlphaFold2-based ColabFold framework. This dataset spans proteome-wide interactions from bacteria, archaea, humans, mice, plants, and human–virus pairs. Overall, we identify 181,671 high-confidence protein complex structures, including 37,855 in the human interactome. Structural clustering revealed numerous conserved protein complex architectures shared across kingdoms, providing insights into previously uncharacterized biological functions. Supported by co-immunoprecipitation experiments, we further identify candidate viral receptors for Human mastadenovirus A and Papiine alphaherpesvirus 2. Comparative analyses integrating our complex structures with the AlphaFold monomeric structure database uncovered widespread gene fusion and fission events during evolution. Finally, we demonstrate how our dataset can enhance protein binding-surface prediction using deep learning approaches, illustrating its broad utility beyond structural modeling alone. Altogether, this atlas represents, to our knowledge, one of the most extensive cross-kingdom resources and opens avenues for future discoveries in various biomedical applications.
Data availability
The 1.1 million predicted protein structures generated in this study have been deposited in the ModelScope database (https://www.modelscope.cn/collections/protein_complex_atlas-2ae5e7d4f4a343). The processed, curated high-confidence PPI structures are available at a companion website (https://www.biopredictnavigator.cn). Accession codes for analysed genomes of representative prokaryotes are available in Supplementary Data 1. Source data are provided with this paper.
Code availability
The code for this manuscript is available in a GitHub repository (https://github.com/wensm77/Protein-Complex-Atlas) and on Zenodo (https://doi.org/10.5281/zenodo.18630539).
Acknowledgements
This work was supported by the National Key R&D Program of China (No. 2023YFF1205400 to D.M.), the National Natural Science Foundation of China (No. 32571689 and No. 32301230 to D.M., No. 82573859 to Y.Y. and No. 72174172 to D.E.), the Zhejiang Laboratory PI start program, the Fundamental and Interdisciplinary Disciplines Breakthrough Plan of the Ministry of Education of China (JYB2025XDXM502 to J.Z.), the Noncommunicable Chronic Diseases-National Science and Technology Major Project (No. 2024ZD0525100 to J.Z.) and the Scientific and Technological Innovation Team for Qinghai-Tibetan Plateau Research in Southwest Minzu University (Grant No. 2024CXTD20). The authors thank Yuanzhao Pan (Beijing National Day School) for valuable support and insightful discussions. Xitong Li (Jiangnan University), Weizhen Ou (Jiangnan University), Jijun Fan (Jiangnan University), Wenbo Deng (China University of Mining and Technology), and Shuhao Niu (Jiangnan University) provided suggestions for language revisions.
Author information
Contributions
D.M. conceived this project. X.Q., C.Y., J.L., S.W., Yuanyuan L., K.D., Yongfu H., J.F., W.M., L.L., Z.L., Y.S., H.Z., Yayun H., R.Z., P.J., Yafei L., B.L., H.W., Yuxuan C., Z.M., P.Y., X.X., J.W., Y.Z., Q.Z., W.Z., K.Y., S.L., H.X., and D.E. performed the computational analysis. J.F. and Y.Y. performed wet-lab experiments. D.M., Y.Y., Ying C., and C.S. supervised the project. D.M., R.Z., Z.X., J.Z., D.E., H.X., and W.Z. wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no conflicts of interest.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Qi, X., Ye, C., Liang, J. et al. Atlas of predicted protein complex structures across kingdoms. Nat Commun (2026). https://doi.org/10.1038/s41467-026-70884-4
DOI: https://doi.org/10.1038/s41467-026-70884-4


