Nature Communications
The Extreme Environment Microbiome Catalog (EEMC): a global resource for microbial diversity and antimicrobial discovery
  • Article
  • Open access
  • Published: 02 April 2026

  • Puzi Jiang (江浦滋), ORCID: orcid.org/0000-0001-5872-9943; affiliations 1, 2; equal contribution
  • Zhengjiao Liang (梁正佼); affiliations 1, 2; equal contribution
  • Vladimir Kovacevic, ORCID: orcid.org/0000-0002-9843-6261; affiliation 3; equal contribution
  • Jingya Shi (石静雅); affiliations 1, 2; equal contribution
  • Nikola Milicevic; affiliation 3
  • Feng Wang (王峰); affiliation 1
  • Lin Liu (刘林); affiliation 4
  • Yue Liu (刘悦), ORCID: orcid.org/0009-0009-1093-4097; affiliations 1, 2
  • Yunjiang Jiang (蒋云江); affiliations 1, 2
  • Mo Han (韩默), ORCID: orcid.org/0000-0002-4404-297X; affiliations 1, 2
  • Xiaonan Lin (林晓楠); affiliation 1
  • Časlav Petronić; affiliation 3
  • Nikola Stanojevic; affiliation 3
  • Lingqin Wang (王凌琴); affiliation 5
  • Suwan Wang (王苏婉); affiliation 5
  • Haixian Cheng (程海鲜); affiliation 5
  • Jiani Li (李佳妮); affiliation 5
  • Rouxi Chen (陈柔汐); affiliations 1, 2
  • Yong Zhang (张勇), ORCID: orcid.org/0000-0001-9950-1793; affiliation 4
  • Yuxiang Li (黎宇翔), ORCID: orcid.org/0000-0002-1575-3692; affiliation 4
  • Junhua Li (李俊桦), ORCID: orcid.org/0000-0001-6784-1873; affiliation 3
  • Xiaodong Fang (方晓东); affiliations 1, 2, 6
  • Zhen Yue (岳震), ORCID: orcid.org/0000-0001-6993-6067; affiliations 1, 2
  • Chuang Xue (薛闯), ORCID: orcid.org/0000-0002-3856-8457; affiliations 5, 7
  • Peng Yin (殷鹏), ORCID: orcid.org/0000-0003-2407-4074; affiliation 4
  • Haixin Chen (陈海新), ORCID: orcid.org/0000-0002-3659-3747; affiliations 1, 2

Nature Communications (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Applied microbiology
  • Data mining
  • Environmental microbiology
  • Microbiome

Abstract

Microorganisms in extreme environments represent a promising source of novel metabolites, yet their global diversity and biosynthetic potential remain underexplored. Here, we reconstruct 78,213 bacterial and archaeal genomes from 2293 publicly available metagenomes and 3214 microbial isolates to establish a unified database, the Extreme Environment Microbiome Catalog (EEMC). The EEMC expands known global phylogenetic diversity, encompassing 32,715 representative species and nearly 4 billion non-redundant genes, 63.00% and 19.21% of which are previously unannotated, respectively. It also comprises 163,693 biosynthetic gene clusters, grouped into 64,733 gene cluster families, 58.68% of which are classified as novel, underscoring the functional diversity of microbial communities across various extreme habitats. We further develop protein large language models to predict genome-encoded candidate antimicrobial peptides (cAMPs) from the EEMC, identifying 3032 non-toxic candidates. Of 100 synthesized peptides, 84% demonstrate antibacterial activity, and all 50 tested cAMPs exhibit low cytotoxicity. Notably, six of the most potent cAMPs show significant efficacy against multidrug-resistant, Gram-negative pathogens in vitro, indicating their biomedical potential. Together, our study establishes the EEMC as a foundational resource for uncovering novel microbial lineages and biosynthetic capabilities, highlighting its substantial potential for drug discovery and laying the foundation for future advances in biotechnology and biomedicine.
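
As a rough check on the scale of these figures, the short sketch below converts the percentages quoted above into approximate counts. The dictionary keys and the helper name `implied_count` are illustrative and not part of the study. Because the published percentages are rounded, the derived counts are estimates only.

```python
# Headline figures quoted in the abstract: (total, percentage).
# The derived counts are approximations, since the percentages are rounded.
REPORTED = {
    "representative species": (32_715, 63.00),  # % previously unannotated
    "gene cluster families": (64_733, 58.68),   # % classified as novel
    "synthesized peptides": (100, 84.0),        # % with antibacterial activity
}


def implied_count(total: int, percent: float) -> int:
    """Return the count implied by a total and a rounded percentage."""
    return round(total * percent / 100)


# Hypothetical helper output: approximate count per category.
implied = {
    name: implied_count(total, percent)
    for name, (total, percent) in REPORTED.items()
}

for name, count in implied.items():
    total, percent = REPORTED[name]
    print(f"{name}: about {count:,} of {total:,} ({percent:.2f}%)")
```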

Data availability

All 74,999 MAGs generated in this study, together with 83 in-house isolate genomes from the deep sea, the non-redundant gene sets from assembled contigs and genomes, and 163,693 BGCs, have been deposited in the China National GeneBank DataBase (CNGBdb) under accession number CNP0007106. The accession IDs of publicly available bacterial and archaeal reference genomes from the NCBI Genome database are provided in Supplementary Data 2. The reference representative genomes used in this study, including 113,104 from GTDB release R220, 22,732 from GEM (ref. 47), 24,195 from GOMC (ref. 19), and 957 from Tara Oceans (ref. 46), are available at https://gtdb.ecogenomic.org/, https://portal.nersc.gov/GEM/genomes/, https://db.cngb.org/maya/datasets/MDB0000002, and https://merenlab.org/data/tara-oceans-mags/, respectively. The 4472 representative genomes from UHGG v2.0 (ref. 108) used in this study are available at https://www.ebi.ac.uk/metagenomics/genome-catalogues/human-gut-v2-0-2. All additional data supporting the findings of this study are provided within the main text, Supplementary Information files, or via the provided repositories. Source data are provided with this paper.
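
The genome counts quoted in this statement can be tallied with a few lines of Python. This is a sketch only: it assumes that the 78,213 genomes reported in the abstract are the sum of the 74,999 MAGs and the 3214 isolate genomes, and the variable names are illustrative. The reference total is a plain sum of the quoted set sizes, not a figure reported by the authors, and the sets may overlap.

```python
# Reference genome sets as quoted in the Data availability statement:
# name -> (number of representative genomes, source URL).
REFERENCE_SETS = {
    "GTDB R220": (113_104, "https://gtdb.ecogenomic.org/"),
    "GEM": (22_732, "https://portal.nersc.gov/GEM/genomes/"),
    "GOMC": (24_195, "https://db.cngb.org/maya/datasets/MDB0000002"),
    "Tara Oceans": (957, "https://merenlab.org/data/tara-oceans-mags/"),
    "UHGG v2.0": (
        4_472,
        "https://www.ebi.ac.uk/metagenomics/genome-catalogues/human-gut-v2-0-2",
    ),
}

MAGS = 74_999        # MAGs generated in the study (Data availability)
ISOLATES = 3_214     # microbial isolates (abstract)
EEMC_TOTAL = 78_213  # genomes in the EEMC (abstract)

# Assumption: the EEMC total is MAGs plus isolate genomes.
eemc_is_consistent = (MAGS + ISOLATES) == EEMC_TOTAL

# Plain sum of the quoted reference set sizes (sets may overlap).
reference_total = sum(count for count, _url in REFERENCE_SETS.values())

print(f"MAGs + isolates match EEMC total: {eemc_is_consistent}")
print(f"Reference genomes across quoted sets: {reference_total:,}")
```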

Code availability

The trained model weights and corresponding datasets are publicly available on Zenodo (https://zenodo.org/records/17613552). The inference scripts needed to reproduce the model results are publicly available in our GitHub repository (https://github.com/BGI-METAI/Metagenome-AI), together with Python Jupyter notebooks for generating the figures and tables of model results reported in this manuscript.
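
For readers who want to script the download, the following sketch lists the files attached to the Zenodo record. It assumes that Zenodo's public REST API serves record metadata at `/api/records/<id>` and that each file entry carries a `key` and a `links.self` URL. Both assumptions should be checked against the current Zenodo API documentation. The function names are hypothetical and are not part of the authors' repository.

```python
import json
import urllib.request

ZENODO_RECORD_ID = "17613552"  # record ID from the Code availability statement
# Assumption: record metadata is served as JSON at this API path.
API_URL = f"https://zenodo.org/api/records/{ZENODO_RECORD_ID}"


def list_files(record: dict) -> list[tuple[str, str]]:
    """Return (file name, download URL) pairs from record metadata.

    Assumes each entry in record["files"] has a "key" (file name) and a
    "links" mapping with a "self" download URL.
    """
    return [(entry["key"], entry["links"]["self"]) for entry in record.get("files", [])]


def fetch_record(url: str = API_URL) -> dict:
    """Download and decode the record metadata (requires network access)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


# Usage (needs network access):
#     for name, link in list_files(fetch_record()):
#         print(name, link)
```

Keeping the parsing step separate from the network call means the parsing can be checked offline against a saved copy of the metadata.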

References

  1. Shu, W.-S. & Huang, L.-N. Microbial diversity in extreme environments. Nat. Rev. Microbiol. 20, 219–235 (2022).

  2. Ando, N. et al. The molecular basis for life in extreme environments. Annu. Rev. biophysics 50, 343–372 (2021).

  3. Hemmerling, F. & Piel, J. Strategies to access biosynthetic novelty in bacterial genomes for drug discovery. Nat. Rev. Drug Discov. 21, 359–378 (2022).

  4. Sayed, A. M. et al. Extreme environments: microbiology leading to specialized metabolites. J. Appl. Microbiol. 128, 630–657 (2020).

  5. Dick, G. J. The microbiomes of deep-sea hydrothermal vents: distributed globally, shaped locally. Nat. Rev. Microbiol. 17, 271–283 (2019).

  6. Yang, Z.-W. et al. Cultivation strategies for prokaryotes from extreme environments. Imeta 2, e123 (2023).

  7. Rüttimann, C., Cotoras, M., Zaldívar, J. & Vicuna, R. DNA polymerases from the extremely thermophilic bacterium Thermus thermophilus HB-8. Eur. J. Biochem. 149, 41–46 (1985).

  8. Williams, G. B., Ma, H., Khusnutdinova, A. N., Yakunin, A. F. & Golyshin, P. N. Harnessing extremophilic carboxylesterases for applications in polyester depolymerisation and plastic waste recycling. Essays Biochem. 67, 715–729 (2023).

  9. Rateb, M. E. et al. Chaxamycins A–D, bioactive ansamycins from a hyper-arid desert Streptomyces sp. J. Nat. Prod. 74, 1491–1499 (2011).

  10. Jensen, P. R., Moore, B. S. & Fenical, W. The marine actinomycete genus Salinispora: a model organism for secondary metabolite discovery. Nat. Prod. Rep. 32, 738–751 (2015).

  11. Quinn, G. A. & Dyson, P. J. Going to extremes: progress in exploring new environments for novel antibiotics. npj Antimicrobials Resistance 2, 8 (2024).

  12. Hugenholtz, P., Pitulle, C., Hershberger, K. L. & Pace, N. R. Novel division level bacterial diversity in a Yellowstone hot spring. J. Bacteriol. 180, 366–376 (1998).

  13. Eder, W., Ludwig, W. & Huber, R. Novel 16S rRNA gene sequences retrieved from highly saline brine sediments of Kebrit Deep, Red Sea. Arch. Microbiol. 172, 213–218 (1999).

  14. Medema, M. H., de Rond, T. & Moore, B. S. Mining genomes to illuminate the specialized chemistry of life. Nat. Rev. Genet. 22, 553–571 (2021).

  15. Paoli, L. et al. Biosynthetic potential of the global ocean microbiome. Nature 607, 111–118 (2022).

  16. Cheng, M. et al. A genome and gene catalog of the aquatic microbiomes of the Tibetan Plateau. Nat. Commun. 15, 1438 (2024).

  17. Qian, L. et al. Vertically stratified methane, nitrogen and sulphur cycling and coupling mechanisms in mangrove sediment microbiomes. Microbiome 11, 71 (2023).

  18. Liu, Y. et al. A genome and gene catalog of glacier microbiomes. Nat. Biotechnol. 40, 1341–1348 (2022).

  19. Chen, J. et al. Global marine microbial diversity and its potential in bioprospecting. Nature 633, 371–379 (2024).

  20. Wang, L. et al. Mining of novel secondary metabolite biosynthetic gene clusters from acid mine drainage. Sci. Data 9, 760 (2022).

  21. Naghavi, M. et al. Global burden of bacterial antimicrobial resistance 1990–2021: a systematic analysis with forecasts to 2050. Lancet 404, 1199–1226 (2024).

  22. Plackett, B. Why big pharma has abandoned antibiotics. Nature 586, S50–S50 (2020).

  23. Walesch, S. et al. Fighting antibiotic resistance–strategies and (pre) clinical developments to find new antibacterials. EMBO Rep. 24, e56033 (2023).

  24. Genilloud, O. Natural products discovery and potential for new antibiotics. Curr. Opin. Microbiol. 51, 81–87 (2019).

  25. Lewis, K. et al. Sophisticated natural products as antibiotics. Nature 632, 39–49 (2024).

  26. Newman, D. J. & Cragg, G. M. Natural products as sources of new drugs over the nearly four decades from 01/1981 to 09/2019. J. Nat. Prod. 83, 770–803 (2020).

  27. Montalbán-López, M. et al. New developments in ripp discovery, enzymology and engineering. Nat. Prod. Rep. 38, 130–239 (2021).

  28. Li, Y. & Rebuffat, S. The manifold roles of microbial ribosomal peptide–based natural products in physiology and ecology. J. Biol. Chem. 295, 34–54 (2020).

  29. Merwin, N. J. et al. Deepripp integrates multiomics data to automate discovery of novel ribosomally synthesized natural products. Proc. Natl. Acad. Sci. 117, 371–380 (2020).

  30. Ma, Y. et al. Identification of antimicrobial peptides from the human gut microbiome using deep learning. Nat. Biotechnol. 40, 921–931 (2022).

  31. Wei, B. et al. Global analysis of the biosynthetic chemical space of marine prokaryotes. Microbiome 11, 144 (2023).

  32. Veltri, D., Kamath, U. & Shehu, A. Deep learning improves antimicrobial peptide recognition. Bioinformatics 34, 2740–2747 (2018).

  33. Meher, P. K., Sahu, T. K., Saini, V. & Rao, A. R. Predicting antimicrobial peptides with improved accuracy by incorporating the compositional, physico-chemical and structural features into chou’s general pseaac. Sci. Rep. 7, 42362 (2017).

  34. Bhadra, P., Yan, J., Li, J., Fong, S. & Siu, S. W. Ampep: Sequence-based prediction of antimicrobial peptides using distribution patterns of amino acid properties and random forest. Sci. Rep. 8, 1697 (2018).

  35. Li, T. et al. A foundation model identifies broad-spectrum antimicrobial peptides against drug-resistant bacterial infection. Nat. Commun. 15, 7538 (2024).

  36. Szymczak, P. et al. Discovering highly potent antimicrobial peptides with deep generative model hydramp. Nat. Commun. 14, 1453 (2023).

  37. Santos-Junior, C. D., Pan, S., Zhao, X.-M. & Coelho, L. P. Macrel: antimicrobial peptide screening in genomes and metagenomes. PeerJ 8, e10555 (2020).

  38. Wan, F., Torres, M. D., Peng, J. & de la Fuente-Nunez, C. Deep-learning-enabled antibiotic discovery through molecular de-extinction. Nat. Biomed. Eng. 8, 854–871 (2024).

  39. Torres, M. D., Wan, F. & de la Fuente-Nunez, C. Deep learning reveals antibiotics in the archaeal proteome. Nat. Microbiol. 1–15 (2025).

  40. Lei, J. et al. The antimicrobial peptides and their potential clinical applications. Am. J. Transl. Res. 11, 3919 (2019).

  41. Elnaggar, A. et al. Prottrans: Toward understanding the language of life through self-supervised learning. IEEE Trans. pattern Anal. Mach. Intell. 44, 7112–7127 (2021).

  42. Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).

  43. Hayes, T. et al. Simulating 500 million years of evolution with a language model. Science 387, 850–858 (2025).

  44. O’Leary, N. A. et al. Reference sequence (refseq) database at ncbi: current status, taxonomic expansion, and functional annotation. Nucleic acids Res. 44, D733–D745 (2016).

  45. Bowers, R. M. et al. Minimum information about a single amplified genome (misag) and a metagenome-assembled genome (mimag) of bacteria and archaea. Nat. Biotechnol. 35, 725–731 (2017).

  46. Delmont, T. O. et al. Nitrogen-fixing populations of Planctomycetes and Proteobacteria are abundant in surface ocean metagenomes. Nat. Microbiol. 3, 804–813 (2018).

  47. Nayfach, S. et al. A genomic catalog of Earth’s microbiomes. Nat. Biotechnol. 39, 499–509 (2021).

  48. Suzek, B. E. et al. Uniref clusters: a comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics 31, 926–932 (2015).

  49. Boeckmann, B. et al. The swiss-prot protein knowledgebase and its supplement trembl in 2003. Nucleic acids Res. 31, 365–370 (2003).

  50. Tatusov, R. L. et al. The cog database: an updated version includes eukaryotes. BMC Bioinforma. 4, 41 (2003).

  51. Kanehisa, M., Furumichi, M., Tanabe, M., Sato, Y. & Morishima, K. Kegg: new perspectives on genomes, pathways, diseases and drugs. Nucleic acids Res. 45, D353–D361 (2017).

  52. Levasseur, A., Drula, E., Lombard, V., Coutinho, P. M. & Henrissat, B. Expansion of the enzymatic repertoire of the cazy database to integrate auxiliary redox enzymes. Biotechnol. biofuels 6, 41 (2013).

  53. Consortium, G. O. The gene ontology resource: 20 years and still going strong. Nucleic acids Res. 47, D330–D338 (2019).

  54. Jia, B. et al. CARD 2017: expansion and model-centric curation of the comprehensive antibiotic resistance database. Nucleic Acids Res. gkw1004 (2016).

  55. Liu, B., Zheng, D., Zhou, S., Chen, L. & Yang, J. Vfdb 2022: a general classification scheme for bacterial virulence factors. Nucleic acids Res. 50, D912–D917 (2022).

  56. Ghoul, M. & Mitri, S. The ecology and evolution of microbial competition. Trends Microbiol. 24, 833–845 (2016).

  57. Gavriilidou, A. et al. Compendium of specialized metabolite biosynthetic diversity encoded in bacterial genomes. Nat. Microbiol. 7, 726–735 (2022).

  58. Du, R., Xiong, W., Xu, L., Xu, Y. & Wu, Q. Metagenomics reveals the habitat specificity of biosynthetic potential of secondary metabolites in global food fermentations. Microbiome 11, 115 (2023).

  59. Zhang, J. et al. Large-scale biosynthetic analysis of human microbiomes reveals diverse protective ribosomal peptides. Nat. Commun. 16, 3054 (2025).

  60. Yu, Y., Mai, Y., Zheng, Y. & Shi, L. Assessing and mitigating batch effects in large-scale omics studies. Genome Biol. 25, 254 (2024).

  61. Torres, M. D. et al. Mining human microbiomes reveals an untapped source of peptide antibiotics. Cell 187, 5453–5467 (2024).

  62. Santos-Júnior, C. D. et al. Discovery of antimicrobial peptides in the global microbiome with machine learning. Cell 187, 3761–3778 (2024).

  63. Rathore, A. S., Choudhury, S., Arora, A., Tijare, P. & Raghava, G. P. Toxinpred 3.0: An improved method for predicting the toxicity of peptides. Computers Biol. Med. 179, 108926 (2024).

  64. Agrawal, P., Amir, S., Barua, D., Mohanty, D. et al. Rippminer-genome: a web resource for automated prediction of crosslinked chemical structures of ripps by genome mining. J. Mol. Biol. 433, 166887 (2021).

  65. Wang, X.-F. et al. Prot-diff: A modularized and efficient strategy for de novo generation of antimicrobial peptide sequences by integrating protein language and diffusion models. Adv. Sci. 11, 2406305 (2024).

  66. Abramson, J. et al. Accurate structure prediction of biomolecular interactions with alphafold 3. Nature 630, 493–500 (2024).

  67. Kabsch, W. & Sander, C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolym.: Original Res. Biomolecules 22, 2577–2637 (1983).

  68. Atanasov, A. G., Zotchev, S. B., Dirsch, V. M. & Supuran, C. T. Natural products in drug discovery: advances and opportunities. Nat. Rev. Drug Discov. 20, 200–216 (2021).

  69. Zheng, M. et al. Sequencing-guided re-estimation and promotion of cultivability for environmental bacteria. Nat. Commun. 15, 9051 (2024).

  70. Van Goethem, M. W. et al. Long-read metagenomics of soil communities reveals phylum-specific secondary metabolite dynamics. Commun. Biol. 4, 1302 (2021).

  71. Huang, R. et al. Long-read metagenomics of marine microbes reveals diversely expressed secondary metabolites. Microbiol. Spectr. 11, e01501–23 (2023).

  72. Blin, K. et al. antismash 7.0: new and improved predictions for detection, regulation, chemical structures and visualisation. Nucleic acids Res. 51, W46–W50 (2023).

  73. Hannigan, G. D. et al. A deep learning genome-mining strategy for biosynthetic gene cluster prediction. Nucleic acids Res. 47, e110–e110 (2019).

  74. Mesbah, N. M. Industrial biotechnology based on enzymes from extreme environments. Front. Bioeng. Biotechnol. 10, 870083 (2022).

  75. Kroll, A., Ranjan, S., Engqvist, M. K. & Lercher, M. J. A general model to predict small molecule substrates of enzymes based on machine and deep learning. Nat. Commun. 14, 2787 (2023).

  76. Wang, Z. et al. Robust enzyme discovery and engineering with deep learning using catapro. Nat. Commun. 16, 2736 (2025).

  77. Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one fastq preprocessor. Bioinformatics 34, i884–i890 (2018).

  78. Li, D., Liu, C.-M., Luo, R., Sadakane, K. & Lam, T.-W. Megahit: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de bruijn graph. Bioinformatics 31, 1674–1676 (2015).

  79. Uritskiy, G. V., DiRuggiero, J. & Taylor, J. Metawrap–a flexible pipeline for genome-resolved metagenomic data analysis. Microbiome 6, 158 (2018).

  80. Kang, D. D. et al. Metabat 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ 7, e7359 (2019).

  81. Wu, Y.-W., Simmons, B. A. & Singer, S. W. Maxbin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics 32, 605–607 (2016).

  82. Alneberg, J. et al. Binning metagenomic contigs by coverage and composition. Nat. Methods 11, 1144–1146 (2014).

  83. Nayfach, S., Shi, Z. J., Seshadri, R., Pollard, K. S. & Kyrpides, N. C. New insights from uncultivated genomes of the global human gut microbiome. Nature 568, 505–510 (2019).

  84. Wick, R. R., Judd, L. M., Gorrie, C. L. & Holt, K. E. Unicycler: resolving bacterial genome assemblies from short and long sequencing reads. PLoS computational Biol. 13, e1005595 (2017).

  85. Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. Checkm: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 25, 1043–1055 (2015).

  86. Nawrocki, E. P. & Eddy, S. R. Infernal 1.1: 100-fold faster rna homology searches. Bioinformatics 29, 2933–2935 (2013).

  87. Kalvari, I. et al. Rfam 14: expanded coverage of metagenomic, viral and microrna families. Nucleic acids Res. 49, D192–D200 (2021).

  88. Olm, M. R., Brown, C. T., Brooks, B. & Banfield, J. F. drep: a tool for fast and accurate genomic comparisons that enables improved genome recovery from metagenomes through de-replication. ISME J. 11, 2864–2868 (2017).

  89. Parks, D. H. et al. Gtdb: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. Nucleic acids Res. 50, D785–D794 (2022).

  90. Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinforma. 11, 119 (2010).

  91. Steinegger, M. & Söding, J. Mmseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat. Biotechnol. 35, 1026–1028 (2017).

  92. Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using diamond. Nat. Methods 12, 59–60 (2015).

  93. Huerta-Cepas, J. et al. eggnog 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic acids Res. 47, D309–D314 (2019).

  94. Huerta-Cepas, J. et al. Fast genome-wide functional annotation through orthology assignment by eggnog-mapper. Mol. Biol. evolution 34, 2115–2122 (2017).

  95. Chu, Y. et al. Viruses in human-impacted estuarine ecotones: Distribution, metabolic potential, and environmental risks. Water Res. 282, 123750 (2025).

  96. Navarro-Muñoz, J. C. et al. A computational framework to explore large-scale biosynthetic diversity. Nat. Chem. Biol. 16, 60–68 (2020).

  97. Kautsar, S. A., van der Hooft, J. J., de Ridder, D. & Medema, M. H. Big-slice: A highly scalable tool maps the diversity of 1.2 million biosynthetic gene clusters. Gigascience 10, giaa154 (2021).

  98. Wang, G., Li, X. & Wang, Z. Apd3: the antimicrobial peptide database as a tool for research and education. Nucleic acids Res. 44, D1087–D1093 (2016).

  99. Gawde, U. et al. Campr4: a database of natural and synthetic antimicrobial peptides. Nucleic acids Res. 51, D377–D383 (2023).

  100. Pirtskhalava, M. et al. Dbaasp v3: database of antimicrobial/cytotoxic activity and structure of peptides as a resource for development of new therapeutics. Nucleic acids Res. 49, D288–D297 (2021).

  101. Ma, T. et al. Dramp 4.0: an open-access data repository dedicated to the clinical translation of antimicrobial peptides. Nucleic Acids Res. 53, D403–D410 (2025).

  102. Jhong, J.-H. et al. dbamp: an integrated resource for exploring antimicrobial peptides with functional activities and physicochemical properties on transcriptome and proteome data. Nucleic acids Res. 47, D285–D297 (2019).

  103. Ye, G. et al. Lamp2: a major update of the database linking antimicrobial peptides. Database 2020, baaa061 (2020).

  104. Di Luca, M., Maccari, G., Maisetta, G. & Batoni, G. Baamps: the database of biofilm-active antimicrobial peptides. Biofouling 31, 193–199 (2015).

  105. Gautam, A. et al. Hemolytik: a database of experimentally determined hemolytic and non-hemolytic peptides. Nucleic acids Res. 42, D444–D449 (2014).

  106. Dalla-Torre, H. et al. Nucleotide transformer: building and evaluating robust foundation models for human genomics. Nat. Methods 22, 287–297 (2025).

  107. Wiegand, I., Hilpert, K. & Hancock, R. E. Agar and broth dilution methods to determine the minimal inhibitory concentration (mic) of antimicrobial substances. Nat. Protoc. 3, 163–175 (2008).

  108. Almeida, A. et al. A unified catalog of 204,938 reference genomes from the human gut microbiome. Nat. Biotechnol. 39, 105–114 (2021).


Acknowledgements

This study was supported by the Key Research and Development Program of Hainan Province (grant No. ZDYF2024SHFZ046 to Haixin Chen), the Project of Sanya Yazhou Bay Science and Technology City (grant No. SKJC-2024-01-002 to X.F., SKJC-2024-01-001 to Haixin Chen, and SCKJ-JYRC-2023-41 to P.J.), Hainan Provincial Natural Science Foundation of China (grant No. 826QN0909 to P.J.), and Hainan Yazhou Bay Seed Lab (grant No. JBGS B23YQ2003 to Z.Y.). Computations in this study were supported by the High-performance Computing Platform of YaZhou Bay Science and Technology City Advanced Computing Center. The authors thank Kui Zhu (China Agricultural University) for kindly donating the strains S. aureus ATCC 29213, E. faecalis ATCC 29212, E. faecalis VRE10, and E. coli ATCC 25922 for AMP antibacterial tests; Cong Shen (Guangdong Provincial Hospital of Chinese Medicine) for providing bacterial strains E. faecium BM4105, P. aeruginosa ATCC 27853, and P. aeruginosa PAO1 for AMP antibacterial tests; and our colleague Jun Wang for his help with the bioinformatic analysis. The authors thank the China National GeneBank, BGI Research, Shenzhen 518120, China, for their support. The authors thank ChatGPT-4o mini (OpenAI) for assistance with language editing and phrasing. Supplementary Fig. 1 was created with Biorender.com.

Author information

Author notes
  1. These authors contributed equally: Puzi Jiang, Zhengjiao Liang, Vladimir Kovacevic, Jingya Shi.

Authors and Affiliations

  1. BGI Research, Sanya, China

    Puzi Jiang  (江浦滋), Zhengjiao Liang  (梁正佼), Jingya Shi  (石静雅), Feng Wang  (王峰), Yue Liu  (刘悦), Yunjiang Jiang  (蒋云江), Mo Han  (韩默), Xiaonan Lin  (林晓楠), Rouxi Chen  (陈柔汐), Xiaodong Fang  (方晓东), Zhen Yue  (岳震) & Haixin Chen  (陈海新)

  2. Hainan Technology Innovation Center for Marine Biological Resources Utilization (Preparatory Period), BGI Research, Sanya, China

    Puzi Jiang  (江浦滋), Zhengjiao Liang  (梁正佼), Jingya Shi  (石静雅), Yue Liu  (刘悦), Yunjiang Jiang  (蒋云江), Mo Han  (韩默), Rouxi Chen  (陈柔汐), Xiaodong Fang  (方晓东), Zhen Yue  (岳震) & Haixin Chen  (陈海新)

  3. BGI Research, Belgrade, Serbia

    Vladimir Kovacevic, Nikola Milicevic, Časlav Petronić, Nikola Stanojevic & Junhua Li  (李俊桦)

  4. BGI Research, Wuhan, China

    Lin Liu  (刘林), Yong Zhang  (张勇), Yuxiang Li  (黎宇翔) & Peng Yin  (殷鹏)

  5. MOE Key Laboratory of Bio-Intelligent Manufacturing, State Key Laboratory of Fine Chemicals, Frontiers Science Centre for Smart Materials Oriented Chemical Engineering, School of Bioengineering, Dalian University of Technology, Dalian, China

    Lingqin Wang  (王凌琴), Suwan Wang  (王苏婉), Haixian Cheng  (程海鲜), Jiani Li  (李佳妮) & Chuang Xue  (薛闯)

  6. BGI Research, Shenzhen, China

    Xiaodong Fang  (方晓东)

  7. Ningbo Institute of Dalian University of Technology, Ningbo, China

    Chuang Xue  (薛闯)

Contributions

H.C. (Haixin Chen), P.Y., C.X., Z.Y., and P.J. conceived and supervised the study. P.J., Z.L., F.W., and L.L. collected the data and contributed to formal analyses. P.J. and Z.L. conducted bioinformatic analyses and data visualization. P.Y. and V.K. conceived the development of the MAI framework. V.K. and N.M. developed the MAI algorithm. C.P. and N.S. evaluated the MAI framework and published AMP models. M.H., Y.Z., Y.X.L., and J.H.L. provided computational resources and bioinformatic analysis support. C.X., J.S., and P.J. designed the experimental validation. C.X., X.F., and R.C. provided experimental platforms and resources. Y.L. isolated and sequenced microbial strains from cold seeps. J.S., X.L., L.W., S.W., H.C. (Haixian Cheng), J.N.L., and Y.J. performed all in vitro experiments. J.S. and P.J. analyzed and interpreted the experimental results. P.J., V.K., and Z.L. drafted the manuscript. All authors reviewed and approved the final version of the manuscript.

Corresponding authors

Correspondence to Zhen Yue  (岳震), Chuang Xue  (薛闯), Peng Yin  (殷鹏) or Haixin Chen  (陈海新).

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Marla Trindade, Leonardo van Zyl, and the other, anonymous, reviewer for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information (PDF)

Description of Additional Supplementary Files (PDF)

Dataset 1-5 (XLSX)

Dataset 6 (XLSX)

Dataset 7-22 (XLSX)

Reporting Summary (PDF)

Transparent Peer Review file (PDF)

Source data

Source data (XLSX)

Rights and permissions

Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Jiang, P., Liang, Z., Kovacevic, V. et al. The Extreme Environment Microbiome Catalog (EEMC): a global resource for microbial diversity and antimicrobial discovery. Nat Commun (2026). https://doi.org/10.1038/s41467-026-71145-0

  • Received: 23 June 2025

  • Accepted: 13 March 2026

  • Published: 02 April 2026

  • DOI: https://doi.org/10.1038/s41467-026-71145-0

Nature Communications (Nat Commun)

ISSN 2041-1723 (online)

© 2026 Springer Nature Limited