Structural genomics sheds light on protein functions and remote homologs across the insect tree of life

Abstract

Protein structure bridges the sequence–function relationship, enabling deep exploration of biological processes across diverse organisms. Insects, the most diverse animal lineage, accounting for over 50% of all described animal species, provide an exceptional system for exploring sequence–structure–function relationships. Here, we reconstructed a comprehensive and well-resolved phylogeny of 4854 insects, spanning all orders. Leveraging this framework, we created an atlas of 13.29 million predicted protein structures from 824 representative species, including 11.63 million newly predicted structures. Structural clustering revealed that proteins with divergent sequences but similar structures could be effectively grouped together. Structural similarity searches against proteins with well-characterized functions yielded annotations for 7.61 million insect proteins, including up to 14% of previously unannotated proteins. We further identified 750 million remote homologs between insect proteins, many of which trace back to ancient branches of the insect phylogeny. Remarkably, despite extensive sequence divergence, cGAS-like receptors (cGLRs) were structurally conserved across all 824 insects. Experimental assays demonstrated that these structurally identified cGLRs play a crucial role in antiviral defense in the yellow fever mosquito. Our findings highlight the significance of structural genomics for understanding protein function and evolution across the tree of life.
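The abstract does not say which structural similarity score or clustering procedure was used. The sketch below is hypothetical and only illustrates the general idea of grouping proteins by structure rather than by sequence. It makes two assumptions:

- **Similarity score:** it uses TM-score, a widely used measure that is length-normalized, lies between 0 and 1, and is commonly read as indicating the same fold above about 0.5.
- **Clustering scheme:** it uses a simple greedy representative-based clustering. The names `tm_d0`, `tm_score` and `greedy_cluster` are illustrative only.

`tm_score` evaluates one fixed superposition. Real structure aligners additionally search over superpositions to maximize the score.

```python
from typing import Dict, List, Sequence, Tuple


def tm_d0(length: int) -> float:
    """Distance scale d0 (angstroms) used to normalize TM-score.

    Uses the standard length-dependent formula and clamps it at 0.5 for
    short chains (assumed convention).
    """
    if length <= 21:
        return 0.5
    return max(0.5, 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances: Sequence[float], target_length: int) -> float:
    """TM-score of one fixed superposition, normalized by the target length.

    `distances` holds the distance (angstroms) between each aligned residue
    pair after superposition. Unaligned target residues contribute nothing,
    so partial alignments score below 1.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def greedy_cluster(
    ids: List[str],
    score: Dict[Tuple[str, str], float],
    threshold: float = 0.5,
) -> Dict[str, List[str]]:
    """Hypothetical greedy clustering on precomputed pairwise scores.

    Each still-unassigned protein becomes a representative and absorbs every
    unassigned protein whose score to it exceeds `threshold`. Pairs missing
    from `score` (in either order) are treated as dissimilar (score 0.0).
    """
    clusters: Dict[str, List[str]] = {}
    assigned = set()
    for rep in ids:
        if rep in assigned:
            continue
        members = [rep]
        assigned.add(rep)
        for other in ids:
            if other in assigned:
                continue
            s = score.get((rep, other), score.get((other, rep), 0.0))
            if s > threshold:
                members.append(other)
                assigned.add(other)
        clusters[rep] = members
    return clusters
```

Consider a toy case with three proteins A, B and C. A and B score 0.8 against each other, even if their sequences were highly divergent. C scores 0.2 against A and 0.3 against B. Here `greedy_cluster(["A", "B", "C"], {("A", "B"): 0.8, ("A", "C"): 0.2, ("B", "C"): 0.3})` returns `{"A": ["A", "B"], "C": ["C"]}`. A and B fall into one structural cluster, and C forms its own.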


Fig. 1: A comprehensive and well-resolved phylogeny of 4854 insects.
Fig. 2: The insect protein-structure universe and statistics.
Fig. 3: Insect functional genomics.
Fig. 4: Structural alignments reveal massive numbers of remote homologous proteins across the insect tree of life.
Fig. 5: A notable case of remote homologs: cGLRs are structurally conserved but show marked sequence divergence across the insect tree of life, with functional characterization in the yellow fever mosquito.


Data availability

All gene alignments and gene trees are available in the figshare repository (https://doi.org/10.25452/figshare.plus.25906339). Raw RNA sequencing data have been deposited in GenBank under BioProject ID PRJNA1173893. Protein structures are freely available in the TIPS database (http://tips.shenxlab.com/). Its web server supports searching, visualizing and downloading protein structures, and browsing the comprehensive insect tree of life.

References

  1. Lewin, H. A. et al. The Earth BioGenome Project 2020: Starting the clock. Proc. Natl. Acad. Sci. USA 119, e2115635118 (2022).

  2. Thomas, G. W. C. et al. Gene content evolution in the arthropods. Genome Biol. 21, 1–14 (2020).

    Article  Google Scholar 

  3. Marks, R. A., Hotaling, S., Frandsen, P. B. & VanBuren, R. Representation and participation across 20 years of plant genome sequencing. Nat. Plants 7, 1571–1578 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  4. Opulente, D. A. et al. Genomic factors shape carbon and nitrogen metabolic niche breadth across Saccharomycotina yeasts. Science 384, eadj4503 (2024).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  5. Hug, L. A. et al. A new view of the tree of life. Nat. Microbiol. 1, 1–6 (2016).

    Article  Google Scholar 

  6. Venter, J. C. et al. The sequence of the human genome. Science 291, 1304–1351 (2001).

    Article  CAS  PubMed  Google Scholar 

  7. Perez-Sepulveda, B. M. et al. An accessible, efficient and global approach for the large-scale sequencing of bacterial genomes. Genome Biol. 22, 349 (2021).

    Article  PubMed  PubMed Central  Google Scholar 

  8. Shen, X.-X. et al. Tempo and mode of genome evolution in the budding yeast subphylum. Cell 175, 1533–1545.e20 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  9. Blackstock, W. P. & Weir, M. P. Proteomics: quantitative and physical mapping of cellular proteins. Trends Biotechnol. 17, 121–127 (1999).

    Article  CAS  PubMed  Google Scholar 

  10. Anderson, N. L. & Anderson, N. G. Proteome and proteomics: new technologies, new concepts, and new words. Electrophoresis 19, 1853–1861 (1998).

    Article  CAS  PubMed  Google Scholar 

  11. Hamamsy, T. et al. Protein remote homology detection and structural alignment using deep learning. Nat. Biotechnol. 42, 975–985 (2024).

    Article  CAS  PubMed  Google Scholar 

  12. Kilinc, M., Jia, K. & Jernigan, R. L. Improved global protein homolog detection with major gains in function identification. Proc. Natl. Acad. Sci. USA 120, e2211823120 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  13. Rost, B. Twilight zone of protein sequence alignments. Protein Eng. Des. Sel. 12, 85–94 (1999).

    Article  CAS  Google Scholar 

  14. Chothia, C. & Lesk, A. M. The relation between the divergence of sequence and structure in proteins. EMBO J 5, 823–826 (1986).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  15. Sali, A., Glaeser, R., Earnest, T. & Baumeister, W. From words to literature in structural proteomics. Nature 422, 216–225 (2003).

    Article  CAS  PubMed  Google Scholar 

  16. Seong, K. & Krasileva, K. V. Prediction of effector protein structures from fungal phytopathogens enables evolutionary analyses. Nat. Microbiol. 8, 174–187 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  17. Barrio-Hernandez, I. et al. Clustering predicted structures at the scale of the known protein universe. Nature 622, 637–645 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  18. Illergård, K., Ardell, D. H. & Elofsson, A. Structure is three to ten times more conserved than sequence—A study of structural response in protein cores. Proteins Struct. Funct. Bioinform. 77, 499–508 (2009).

    Article  Google Scholar 

  19. Nomburg, J. et al. Birth of protein folds and functions in the virome. Nature 633, 710–717 (2024).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  20. Kim, R. S., Levy Karin, E., Mirdita, M., Chikhi, R. & Steinegger, M. BFVD—a large repository of predicted viral protein structures. Nucleic Acids Res. 53, D340–D347 (2025).

    Article  CAS  PubMed  Google Scholar 

  21. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  22. Mirdita, M. et al. ColabFold: making protein folding accessible to all. Nat. Methods 19, 679–682 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  23. Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).

    Article  CAS  PubMed  Google Scholar 

  24. Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  25. Varadi, M. et al. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res. 50, D439–D444 (2022).

    Article  CAS  PubMed  Google Scholar 

  26. Burley, S. K. et al. RCSB Protein Data Bank: powerful new tools for exploring 3D structures of biological macromolecules for basic and applied research and education in fundamental biology, biomedicine, biotechnology, bioengineering and energy sciences. Nucleic Acids Res. 49, D437–D451 (2021).

    Article  CAS  PubMed  Google Scholar 

  27. Sillitoe, I. et al. CATH: increased structural coverage of functional space. Nucleic Acids Res. 49, D266–D273 (2021).

    Article  CAS  PubMed  Google Scholar 

  28. Lau, A. M. et al. Exploring structural diversity across the protein universe with The Encyclopedia of Domains. Science 386, eadq4946 (2024).

    Article  CAS  PubMed  Google Scholar 

  29. Stork, N. E. How many species of insects and other terrestrial arthropods are there on earth? Annu. Rev. Entomol. 63, 31–45 (2018).

    Article  CAS  PubMed  Google Scholar 

  30. May, R. M. How many species are there on earth?. Science 241, 1441–1449 (1988).

    Article  CAS  PubMed  Google Scholar 

  31. Misof, B. et al. Phylogenomics resolves the timing and pattern of insect evolution. Science 346, 763–767 (2014).

    Article  CAS  PubMed  Google Scholar 

  32. Rainford, J. L., Hofreiter, M., Nicholson, D. B. & Mayhew, P. J. Phylogenetic distribution of extant richness suggests metamorphosis is a key innovation driving diversification in insects. PLoS One 9, e109085 (2014).

    Article  PubMed  PubMed Central  Google Scholar 

  33. Manni, M., Berkeley, M. R., Seppey, M., Simão, F. A. & Zdobnov, E. M. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol. Biol. Evol. 38, 4647–4654 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  34. Whitfield, J. B. & Kjer, K. M. Ancient rapid radiations of insects: challenges for phylogenetic analysis. Annu. Rev. Entomol. 53, 449–472 (2008).

    Article  CAS  PubMed  Google Scholar 

  35. Sharma, P. P. Integrating morphology and phylogenomics supports a terrestrial origin of insect flight. Proc. Natl. Acad. Sci. USA 116, 2796–2798 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  36. Wipfler, B. et al. Evolutionary history of Polyneoptera and its implications for our understanding of early winged insects. Proc. Natl. Acad. Sci. USA 116, 3024–3029 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  37. van Kempen, M. et al. Fast and accurate protein structure search with Foldseek. Nat. Biotechnol. 42, 243–246 (2024).

    Article  PubMed  Google Scholar 

  38. Yeo, J. et al. Metagenomic-scale analysis of the predicted protein structure universe. bioRxiv https://doi.org/10.1101/2025.04.23.650224 (2025).

  39. Akdel, M. et al. A structural biology community assessment of AlphaFold2 applications. Nat. Struct. Mol. Biol. 29, 1056–1067 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  40. Monzon, V., Haft, D. H. & Bateman, A. Folding the unfoldable: using AlphaFold to explore spurious proteins. Bioinforma. Adv. 2, vbab043 (2022).

    Article  Google Scholar 

  41. Steinegger, M. & Söding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat. Biotechnol. 35, 1026–1028 (2017).

    Article  CAS  PubMed  Google Scholar 

  42. Zhong, X. et al. Structural mechanisms for regulation of GSDMB pore-forming activity. Nature 616, 598–605 (2023).

    Article  CAS  PubMed  Google Scholar 

  43. Johnson, A. G. et al. Structure and assembly of a bacterial gasdermin pore. Nature 628, 657–663 (2024).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  44. Johnson, A. G. et al. Bacterial gasdermins reveal an ancient mechanism of cell death. Science 375, 221–225 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  45. Wang, C. et al. Structural basis for GSDMB pore formation and its targeting by IpaH7.8. Nature 616, 590–597 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  46. Devant, P. & Kagan, J. C. Molecular mechanisms of gasdermin D pore-forming activity. Nat. Immunol. 24, 1064–1075 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  47. Prashar, A. et al. Crystal structures of PirA and PirB toxins from Photorhabdus akhurstii subsp. akhurstii K-1. Insect Biochem. Mol. Biol. 162, 104014 (2023).

    Article  CAS  PubMed  Google Scholar 

  48. Lee, C.-T. et al. The opportunistic marine pathogen Vibrio parahaemolyticus becomes virulent by acquiring a plasmid that expresses a deadly toxin. Proc. Natl. Acad. Sci. USA 112, 10798–10803 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  49. Wang, H.-C. et al. A bacterial binary toxin system that kills both insects and aquatic crustaceans: Photorhabdus insect-related toxins A and B. PLoS Pathog. 19, e1011330 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  50. Saito, M. et al. Fanzor is a eukaryotic programmable RNA-guided endonuclease. Nature 620, 660–668 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  51. Jiang, K. et al. Programmable RNA-guided DNA endonucleases are widespread in eukaryotes and their viruses. Sci. Adv. 9, eadk0171 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  52. Altae-Tran, H. et al. The widespread IS200/IS605 transposon family encodes diverse programmable RNA-guided endonucleases. Science 374, 57–65 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  53. Bao, W. & Jurka, J. Homologues of bacterial TnpB_IS605 are widespread in diverse eukaryotic transposable elements. Mob. DNA 4, 12 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  54. Yoon, P. H. et al. Eukaryotic RNA-guided endonucleases evolved from a unique clade of bacterial enzymes. Nucleic Acids Res. 51, 12414–12427 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  55. Gligorijević, V. et al. Structure-based protein function prediction using graph convolutional networks. Nat. Commun. 12, 3168 (2021).

    Article  PubMed  PubMed Central  Google Scholar 

  56. Xu, J. & Zhang, Y. How significant is a protein structure similarity with TM-score = 0.5? Bioinformatics 26, 889–895 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  57. Zhang, Y., Hubner, I. A., Arakaki, A. K., Shakhnovich, E. & Skolnick, J. On the origin and highly likely completeness of single-domain protein structures. Proc. Natl. Acad. Sci. USA 103, 2605–2610 (2006).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  58. Shinoda, T. & Itoyama, K. Juvenile hormone acid methyltransferase: a key regulatory enzyme for insect metamorphosis. Proc. Natl. Acad. Sci. USA 100, 11986–11991 (2003).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  59. Jindra, M., Palli, S. R. & Riddiford, L. M. The juvenile hormone signaling pathway in insect development. Annu. Rev. Entomol. 58, 181–204 (2013).

    Article  CAS  PubMed  Google Scholar 

  60. Bänziger, C. et al. Wntless, a conserved membrane protein dedicated to the secretion of Wnt proteins from signaling cells. Cell 125, 509–522 (2006).

    Article  PubMed  Google Scholar 

  61. Korkut, C. et al. Trans-synaptic transmission of vesicular Wnt signals through Evi/Wntless. Cell 139, 393–404 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  62. Martin-Martin, I. et al. ADP binding by the Culex quinquefasciatus mosquito D7 salivary protein enhances blood feeding on mammals. Nat. Commun. 11, 2911 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  63. Martin-Martin, I. et al. Aedes aegypti D7 long salivary proteins modulate blood feeding and parasite infection. MBio 14, e0228923 (2023).

    Article  PubMed  Google Scholar 

  64. Holleufer, A. et al. Two cGAS-like receptors induce antiviral immunity in Drosophila. Nature 597, 114–118 (2021).

    Article  CAS  PubMed  Google Scholar 

  65. Slavik, K. M. et al. cGAS-like receptors sense RNA and control 3′2′-cGAMP signalling in Drosophila. Nature 597, 109–113 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  66. Li, Y. et al. cGLRs are a diverse family of pattern recognition receptors in innate immunity. Cell 186, 3261–3276.e20 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  67. Wang, J. & Meng, W. cGAS: Bridging immunity and metabolic regulation. J. Mol. Cell Biol. mjaf018 (2025).

  68. Palmer, C. S. Innate metabolic responses against viral infections. Nat. Metab. 4, 1245–1259 (2022).

    Article  PubMed  Google Scholar 

  69. Liu, H., Wang, F., Cao, Y., Dang, Y. & Ge, B. The multifaceted functions of cGAS. J. Mol. Cell Biol. 14, mjac031 (2022).

  70. Cai, H. et al. 2′3′-cGAMP triggers a STING- and NF-κB–dependent broad antiviral response in Drosophila. Sci. Signal. 13, eabc4537 (2020).

    Article  CAS  PubMed  Google Scholar 

  71. Antonova, Y., Alvarez, K. S., Kim, Y. J., Kokoza, V. & Raikhel, A. S. The role of NF-κB factor REL2 in the Aedes aegypti immune response. Insect Biochem. Mol. Biol. 39, 303–314 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  72. Martin, M., Hiroyasu, A., Guzman, R. M., Roberts, S. A. & Goodman, A. G. Analysis of Drosophila STING reveals an evolutionarily conserved antimicrobial function. Cell Rep. 23, 3537–3550.e6 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  73. Kristensen, N. P. Phylogeny of insect orders. Annu. Rev. Entomol. 26, 135–157 (1981).

    Article  Google Scholar 

  74. Ribeiro, T. M. & Espíndola, A. Integrated phylogenomic approaches in insect systematics. Curr. Opin. Insect Sci. 61, 101150 (2024).

    Article  PubMed  Google Scholar 

  75. Chesters, D. The phylogeny of insects in the data-driven era. Syst. Entomol. 45, 540–551 (2020).

    Article  Google Scholar 

  76. Trautwein, M. D., Wiegmann, B. M., Beutel, R., Kjer, K. M. & Yeates, D. K. Advances in insect phylogeny at the dawn of the postgenomic era. Annu. Rev. Entomol. 57, 449–468 (2012).

    Article  CAS  PubMed  Google Scholar 

  77. Behura, S. K. Insect phylogenomics. Insect Mol. Biol. 24, 403–411 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  78. Yeates, D. K., Meusemann, K., Trautwein, M., Wiegmann, B. & Zwick, A. Power, resolution and bias: recent advances in insect phylogeny driven by the genomic revolution. Curr. Opin. Insect Sci. 13, 16–23 (2016).

    Article  PubMed  Google Scholar 

  79. Giribet, G. & Edgecombe, G. D. The phylogeny and evolutionary history of arthropods. Curr. Biol. 29, R592–R602 (2019).

    Article  CAS  PubMed  Google Scholar 

  80. Johnson, K. P. Putting the genome in insect phylogenomics. Curr. Opin. Insect Sci. 36, 111–117 (2019).

    Article  PubMed  Google Scholar 

  81. Tihelka, E. et al. The evolution of insect biodiversity. Curr. Biol. 31, R1299–R1311 (2021).

    Article  CAS  PubMed  Google Scholar 

  82. Kohli, M. et al. Evolutionary history and divergence times of Odonata (dragonflies and damselflies) revealed through transcriptomics. iScience 24, 103324 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  83. Kawahara, A. Y. et al. Phylogenomics reveals the evolutionary timing and pattern of butterflies and moths. Proc. Natl. Acad. Sci. USA. 116, 22657–22663 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  84. Johnson, K. P. et al. Phylogenomics and the evolution of hemipteroid insects. Proc. Natl. Acad. Sci. USA 115, 12775–12780 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  85. Peters, R. S. et al. Evolutionary history of the hymenoptera. Curr. Biol. 27, 1013–1018 (2017).

    Article  CAS  PubMed  Google Scholar 

  86. Blaimer, B. B. et al. Key innovations and the diversification of Hymenoptera. Nat. Commun. 14, 1212 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  87. Almeida, E. A. B. et al. The evolutionary history of bees in time and space. Curr. Biol. 33, 3409–3422.e6 (2023).

    Article  CAS  PubMed  Google Scholar 

  88. de Moya, R. S. et al. Phylogenomics of parasitic and nonparasitic lice (Insecta: Psocodea): combining sequence data and exploring compositional bias solutions in next generation data sets. Syst. Biol. 70, 719–738 (2021).

    Article  PubMed  Google Scholar 

  89. Kawahara, A. Y. et al. A global phylogeny of butterflies reveals their evolutionary history, ancestral hosts and biogeographic origins. Nat. Ecol. Evol. 7, 903–913 (2023).

    Article  PubMed  PubMed Central  Google Scholar 

  90. McKenna, D. D. et al. The evolution and genomic basis of beetle diversity. Proc. Natl. Acad. Sci. USA 116, 24729–24737 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  91. Delsuc, F., Brinkmann, H. & Philippe, H. Phylogenomics and the reconstruction of the tree of life. Nat. Rev. Genet. 6, 361–375 (2005).

    Article  CAS  PubMed  Google Scholar 

  92. Philippe, H., Delsuc, F., Brinkmann, H. & Lartillot, N. Phylogenomics. Annu. Rev. Ecol. Evol. Syst. 36, 541–562 (2005).

    Article  Google Scholar 

  93. Steenwyk, J. L., Li, Y., Zhou, X., Shen, X.-X. & Rokas, A. Incongruence in the phylogenomics era. Nat. Rev. Genet. 24, 834–850 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  94. Shen, X.-X., Steenwyk, J. L. & Rokas, A. Dissecting incongruence between concatenation- and quartet-based approaches in phylogenomic data. Syst. Biol. 70, 997–1014 (2021).

    Article  PubMed  Google Scholar 

  95. Shen, X.-X., Hittinger, C. T. & Rokas, A. Contentious relationships in phylogenomic studies can be driven by a handful of genes. Nat. Ecol. Evol. 1, 0126 (2017).

    Article  Google Scholar 

  96. Mutti, G., Ocaña-Pallarès, E. & Gabaldón, T. Newly developed structure-based methods do not outperform standard sequence-based methods for large-scale phylogenomics. Mol. Biol. Evol. 42, msaf149 (2025).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  97. Durairaj, J. et al. Uncovering new families and folds in the natural protein universe. Nature 622, 646–653 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  98. Mifsud, J. C. O. et al. Mapping glycoprotein structure reveals Flaviviridae evolutionary history. Nature 633, 695–703 (2024).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  99. Huang, J. et al. Discovery of deaminase functions by structure-based protein clustering. Cell 186, 3182–3195.e14 (2023).

    Article  CAS  PubMed  Google Scholar 

  100. Himmel, N. J., Moi, D. & Benton, R. Remote homolog detection places insect chemoreceptors in a cryptic protein superfamily spanning the tree of life. Curr. Biol. 33, 5023–5033.e4 (2023).

    Article  CAS  PubMed  Google Scholar 

  101. Liu, W. et al. PLMSearch: Protein language model powers accurate and fast sequence search for remote homology. Nat. Commun. 15, 2775 (2024).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  102. Hong, L. et al. Fast, sensitive detection of protein homologs using deep dense retrieval. Nat. Biotechnol. 43, 983–995 (2025).

    Article  CAS  PubMed  Google Scholar 

  103. Jenson, J. M. & Chen, Z. J. cGAS goes viral: a conserved immune defense system from bacteria to humans. Mol. Cell 84, 120–130 (2024).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  104. Wein, T. & Sorek, R. Bacterial origins of human cell-autonomous innate immune mechanisms. Nat. Rev. Immunol. 22, 629–638 (2022).

    Article  CAS  PubMed  Google Scholar 

  105. Hobbs, S. J. & Kranzusch, P. J. Nucleotide immune signaling in CBASS, Pycsar, thoeris, and CRISPR antiphage defense. Annu. Rev. Microbiol. 78, 255–276 (2024).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  106. Sun, L., Wu, J., Du, F., Chen, X. & Chen, Z. J. Cyclic GMP-AMP synthase is a cytosolic DNA sensor that activates the type i interferon pathway. Science 339, 786–791 (2013).

    Article  CAS  PubMed  Google Scholar 

  107. Culbertson, E. M. & Levin, T. C. Eukaryotic CD-NTase, STING, and viperin proteins evolved via domain shuffling, horizontal transfer, and ancient inheritance from prokaryotes. PLoS Biol 21, e3002436 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  108. Millman, A., Melamed, S., Amitai, G. & Sorek, R. Diversity and classification of cyclic-oligonucleotide-based anti-phage signalling systems. Nat. Microbiol. 5, 1608–1615 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  109. McFarland, A. P. et al. Sensing of bacterial cyclic dinucleotides by the oxidoreductase RECON promotes NF-κB activation and shapes a proinflammatory antibacterial state. Immunity 46, 433–445 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  110. Xia, P. et al. The ER membrane adaptor ERAdP senses the bacterial second messenger c-di-AMP and initiates anti-bacterial immunity. Nat. Immunol. 19, 141–150 (2018).

    Article  CAS  PubMed  Google Scholar 

  111. Chow, K. L., Hall, D. H. & Emmons, S. W. The mab-21 gene of Caenorhabditis elegans encodes a novel protein required for choice of alternate cell fates. Development 121, 3615–3626 (1995).

    Article  CAS  PubMed  Google Scholar 

  112. Yamada, R. et al. Cell-autonomous involvement of Mab21l1 is essential for lens placode development. Development 130, 1759–1770 (2003).

    Article  CAS  PubMed  Google Scholar 

  113. Li, L. et al. Hydrolysis of 2′3′-cGAMP by ENPP1 and design of nonhydrolyzable analogs. Nat. Chem. Biol. 10, 1043–1048 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  114. Hou, Y. et al. SMPDL3A is a cGAMP-degrading enzyme induced by LXR-mediated lipid metabolism to restrict cGAS-STING DNA sensing. Immunity 56, 2492–2507.e10 (2023).

    Article  CAS  PubMed  Google Scholar 

  115. Maltbaek, J. H., Cambier, S., Snyder, J. M. & Stetson, D. B. ABCC1 transporter exports the immunostimulatory cyclic dinucleotide cGAMP. Immunity 55, 1799–1812.e4 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  116. Porta-Pardo, E., Ruiz-Serra, V., Valentini, S. & Valencia, A. The structural coverage of the human proteome before and after AlphaFold. PLoS Comput. Biol. 18, e1009818 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  117. Derry, A., Carpenter, K. A. & Altman, R. B. Training data composition affects performance of protein structure analysis algorithms. Pac. Symp. Biocomput. 27, 10–21 (2022).

    PubMed  PubMed Central  Google Scholar 

  118. Necci, M. et al. Critical assessment of protein intrinsic disorder prediction. Nat. Methods 18, 472–481 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  119. Gramates, L. S. et al. FlyBase: a guided tour of highlighted features. Genetics 220, iyac035 (2022).

    Article  PubMed  PubMed Central  Google Scholar 

  120. Sayers, E. W. et al. Database resources of the National Center for Biotechnology Information in 2023. Nucleic Acids Res. 51, D29–D38 (2023).

    Article  CAS  PubMed  Google Scholar 

  121. Mei, Y. et al. InsectBase 2.0: a comprehensive gene resource for insects. Nucleic Acids Res. 50, D1040–D1045 (2022).

    Article  CAS  PubMed  Google Scholar 

  122. Poelchau, M. et al. The i5k Workspace@NAL—enabling genomic data access, visualization and curation of arthropod genomes. Nucleic Acids Res. 43, D714–D719 (2015).

    Article  CAS  PubMed  Google Scholar 

  123. Martin, F. J. et al. Ensembl 2023. Nucleic Acids Res. 51, D933–D941 (2023).

    Article  CAS  PubMed  Google Scholar 

  124. Kent, W. J. et al. The human genome browser at UCSC. Genome Res. 12, 996–1006 (2002).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  125. Bai, X. et al. Database resources of the National Genomics Data Center, China National Center for Bioinformation in 2024. Nucleic Acids Res. 52, D18–D32 (2024).

    Article  Google Scholar 

  126. Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 29, 644–652 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  127. Kriventseva, E. V. et al. OrthoDB v10: sampling the diversity of animal, plant, fungal, protist, bacterial and viral genomes for evolutionary and functional annotations of orthologs. Nucleic Acids Res. 47, D807–D811 (2019).

    Article  CAS  PubMed  Google Scholar 

  128. Li, Y. et al. HGT is widespread in insects and contributes to male courtship in lepidopterans. Cell 185, 2975–2987.e10 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  129. Zhao, T. et al. Whole-genome microsynteny-based phylogeny of angiosperms. Nat. Commun. 12, 3498 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  130. Steenwyk, J. L., Shen, X.-X., Lind, A. L., Goldman, G. H. & Rokas, A. A robust phylogenomic time tree for biotechnologically and medically important fungi in the genera Aspergillus and Penicillium. MBio 10, 1–25 (2019).

    Article  Google Scholar 

  131. Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  132. Capella-Gutierrez, S., Silla-Martinez, J. M. & Gabaldon, T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  133. Nguyen, L.-T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2015).

    Article  CAS  PubMed  Google Scholar 

  134. Minh, B. Q. et al. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol. Biol. Evol. 37, 1530–1534 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  135. Yin, J., Zhang, C. & Mirarab, S. ASTRAL-MP: scaling ASTRAL to very large datasets using randomization and parallelization. Bioinformatics 35, 3961–3969 (2019).

    Article  CAS  PubMed  Google Scholar 

  136. Zhang, C., Rabiee, M., Sayyari, E. & Mirarab, S. ASTRAL-III: polynomial time species tree reconstruction from partially resolved gene trees. BMC Bioinformatics 19, 153 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  137. Letunic, I. & Bork, P. Interactive Tree Of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res. 49, W293–W296 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  138. Hu, G. et al. flDPnn: accurate intrinsic disorder prediction with putative propensities of disorder functions. Nat. Commun. 12, 4438 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  139. Deiana, A., Forcelloni, S., Porrello, A. & Giansanti, A. Intrinsically disordered proteins and structured proteins with intrinsically disordered regions have different functional roles in the cell. PLoS One 14, e0217889 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  140. Wells, J. et al. Chainsaw: protein domain segmentation with fully convolutional neural networks. Bioinformatics 40, btae296 (2024).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  141. Lau, A. M., Kandathil, S. M. & Jones, D. T. Merizo: a rapid and accurate protein domain segmentation method using invariant point attention. Nat. Commun. 14, 8445 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  142. Zhu, K., Su, H., Peng, Z. & Yang, J. A unified approach to protein domain parsing with inter-residue distance matrix. Bioinformatics 39, btad070 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  143. Lees, J. et al. Gene3D: a domain-based resource for comparative genomics, functional annotation and protein network analysis. Nucleic Acids Res. 40, D465–D471 (2012).

  144. Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236–1240 (2014).

  145. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).

  146. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).

  147. Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).

  148. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).

  149. Ge, S. X., Jung, D. & Yao, R. ShinyGO: a graphical gene-set enrichment tool for animals and plants. Bioinformatics 36, 2628–2629 (2020).

  150. Cai, H. et al. The virus-induced cyclic dinucleotide 2′3′-c-di-GMP mediates STING-dependent antiviral immunity in Drosophila. Immunity 56, 1991–2005.e9 (2023).

Acknowledgements

We thank Antonis Rokas for insightful comments and Xin Qiao for help with the cell culture and transfection. This work was conducted in part using the resources of the Information Technology Center and State Key Lab of CAD&CG at Zhejiang University. This work was supported by the Scientific Research Innovation Capability Support Project for Young Faculty (ZYGXQNJSKYCXNLZCXM-A12 to X.-X.S.), the Key Program of National Natural Science Foundation of China (32530086 to X.-X.S.), the National Science Foundation for Distinguished Young Scholars of Zhejiang Province (LR23C140001 to X.-X.S.), Shanghai Municipal Science and Technology Major Project (S.W.), the New Cornerstone Science Foundation (NCI202328 to S.W.), the National Natural Science Foundation of China (32230015 and 32021001 to S.W., 32200395 to C.C.), Zhejiang Provincial Natural Science Foundation of China (LZ23C020002 to R.P.), the Key-Area Research and Development Program of Guangdong Province (2018B020205003 and 2020B0202090001 to X.Z.), and the Key International Joint Research Program of National Natural Science Foundation of China (31920103005 to X.-X.C.).

Author information

Contributions

X.-X.S. and S.W. conceived and designed the study. X.-X.S., W.W., C.C., Y.Z., J.C., Q.Z., Y.W., H.C., Z.L., H.G., G.-Z.O., C.L., and M.T. performed computational analyses and experiments. X.-X.S., S.W., W.W., C.C., Y.Z., X.Z., Y.C., R.P., J.Y., H.C., G.Z., and X.-X.C. interpreted results. X.-X.S. wrote the manuscript with input from all authors. X.-X.S., S.W., W.W., C.C., X.Z., and H.C. edited the manuscript.

Corresponding authors

Correspondence to Xiaofan Zhou, Sibao Wang or Xing-Xing Shen.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary information, Fig. S1 (PDF)

Supplementary information, Fig. S2 (PDF)

Supplementary information, Fig. S3 (PDF)

Supplementary information, Fig. S4 (PDF)

Supplementary information, Fig. S5 (PDF)

Supplementary information, Fig. S6 (PDF)

Supplementary information, Fig. S7 (PDF)

Supplementary information, Fig. S8 (PDF)

Supplementary information, Fig. S9 (PDF)

Supplementary information, Fig. S10 (PDF)

Supplementary information, Fig. S11 (PDF)

Supplementary information, Fig. S12 (PDF)

Supplementary information, Fig. S13 (PDF)

Supplementary information, Fig. S14 (PDF)

Supplementary information, Fig. S15 (PDF)

Supplementary information, Fig. S16 (PDF)

Supplementary information, Fig. S17 (PDF)

Supplementary information, Fig. S18 (PDF)

Supplementary information, Fig. S19 (PDF)

Supplementary information, Fig. S20 (PDF)

Supplementary information, Fig. S21 (PDF)

Supplementary information, Fig. S22 (PDF)

Supplementary information, Fig. S23 (PDF)

Supplementary information, Table S1 (XLSX)

Supplementary information, Table S2 (XLSX)

Supplementary information, Table S3 (XLSX)

Supplementary information, Table S4 (XLSX)

Supplementary information, Table S5 (XLSX)

Supplementary information, Table S6 (XLSX)

Supplementary information, Table S7 (XLSX)

Supplementary information, Table S8 (XLSX)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wu, W., Cui, C., Zhu, Y. et al. Structural genomics sheds light on protein functions and remote homologs across the insect tree of life. Cell Res (2026). https://doi.org/10.1038/s41422-026-01220-0

  • DOI: https://doi.org/10.1038/s41422-026-01220-0
