Uncovering evolutionarily remote and highly potent antimicrobial peptides with protein language models

Abstract

Identifying evolutionarily remote antimicrobial peptides (AMPs) is crucial for discovering underexplored clinical candidates to combat antibiotic resistance. Existing experimental and computational methods are limited by their reliance on sequence identity to known AMPs, missing distant homologues. Here we introduce HMD-AMP, a protein language model-based approach for AMP discovery. HMD-AMP outperforms previous methods in identifying evolutionarily distant AMPs and enables the discovery of unknown and highly potent AMPs from metagenomic data. Applied to host and gut microorganism genomes of nine mammals, HMD-AMP revealed over 37 million predicted AMPs. Of 91 high-confidence sequences experimentally validated, 74 showed strong antibacterial activity and 48 were evolutionarily remote from known AMPs. Four of these AMPs exhibited broad-spectrum antibacterial activity at low effective concentrations and showed low toxicity, with the most potent peptide demonstrating therapeutic efficacy in a mouse model of peritoneal Escherichia coli infection. This study introduces an effective strategy to uncover AMPs.
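The validation counts quoted above can be turned into rates with a few lines of arithmetic. This is only a sketch: it assumes that both the 74 active peptides and the 48 evolutionarily remote peptides are counted out of the same 91 validated sequences, which the abstract implies but does not state outright. The variable and function names are illustrative.

```python
# Counts taken from the abstract; the shared denominator of 91 is an assumption.
validated = 91   # high-confidence sequences tested experimentally
active = 74      # showed strong antibacterial activity
remote = 48      # evolutionarily remote from known AMPs


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


active_pct = percent(active, validated)
remote_pct = percent(remote, validated)

print(f"Active: {active}/{validated} = {active_pct}%")
print(f"Remote: {remote}/{validated} = {remote_pct}%")
```

Under that assumption, roughly four in five validated candidates were active and about half were remote from known AMPs.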

Fig. 1: The workflow for discovering evolutionarily remote and highly potent AMPs.
Fig. 2: HMD-AMP outperforms other models on cross-validation and cross-kingdom test.
Fig. 3: HMD-AMP shows robust performance on homology-partitioned independent test and AMP target annotation tasks.
Fig. 4: The discovery of highly diverse and potent AMPs from swine metagenomic data by HMD-AMP.
Fig. 5: Structure and sequence features of evolutionarily remote AMPs associated with antimicrobial function.
Fig. 6: Characterization of the most potent AMPs discovered from the swine metagenomes.

Data availability

All the datasets we used are described in the Methods and are publicly available. The training dataset for HMD-AMP and the predicted sORFs for all nine mammals are available via Zenodo at https://doi.org/10.5281/zenodo.15622525 (ref. 69). All other relevant data supporting the key findings of this study, such as the AMP discovery results, are available within this Article and its Supplementary Information. Source data are provided with this paper.

Code availability

The open-source code for HMD-AMP, including trained weights, inference scripts and training scripts, is available via GitHub at https://github.com/ml4bio/HMD-AMP. Logo plots were created in WebLogo (https://weblogo.threeplusone.com/). Multiple sequence alignments of peptides were visualized in MEGA11. Structures were visualized in PyMOL (https://pymol.org/). Helical wheel plots were generated with HeliQuest (https://heliquest.ipmc.cnrs.fr).

References

  1. Holmes, A. H. et al. Understanding the mechanisms and drivers of antimicrobial resistance. Lancet 387, 176–187 (2016).

  2. De Breij, A. et al. The antimicrobial peptide SAAP-148 combats drug-resistant bacteria and biofilms. Sci. Transl. Med. 10, eaan4044 (2018).

  3. Mwangi, J. et al. The antimicrobial peptide ZY4 combats multidrug-resistant Pseudomonas aeruginosa and Acinetobacter baumannii infection. Proc. Natl Acad. Sci. USA 116, 26516–26522 (2019).

  4. Thapa, R. K., Diep, D. B. & Tønnesen, H. H. Topical antimicrobial peptide formulations for wound healing: current developments and future prospects. Acta Biomater. 103, 52–67 (2020).

  5. Zhang, R. et al. Antimicrobial peptide DP7 with potential activity against SARS coronavirus infections. Signal Transduct. Target. Ther. 6, 140 (2021).

  6. Pierre, J. F. et al. Peptide YY: a Paneth cell antimicrobial peptide that maintains Candida gut commensalism. Science 381, 502–508 (2023).

  7. Fjell, C. D., Hiss, J. A., Hancock, R. E. & Schneider, G. Designing antimicrobial peptides: form follows function. Nat. Rev. Drug Discov. 11, 37–51 (2012).

  8. Porto, W. F. et al. In silico optimization of a guava antimicrobial peptide enables combinatorial exploration for peptide design. Nat. Commun. 9, 1490 (2018).

  9. Ma, Y. et al. Identification of antimicrobial peptides from the human gut microbiome using deep learning. Nat. Biotechnol. 40, 921–931 (2022).

  10. Huang, J. et al. Identification of potent antimicrobial peptides via a machine-learning pipeline that mines the entire space of peptide sequences. Nat. Biomed. Eng. 7, 797–810 (2023).

  11. Maasch, J. R., Torres, M. D., Melo, M. C. & de la Fuente-Nunez, C. Molecular de-extinction of ancient antimicrobial peptides enabled by machine learning. Cell Host Microbe 31, 1260–1274 (2023).

  12. Santos-Júnior, C. D. et al. Discovery of antimicrobial peptides in the global microbiome with machine learning. Cell 187, 3761–3778 (2024).

  13. Torres, M. D. et al. Mining human microbiomes reveals an untapped source of peptide antibiotics. Cell 187, 5453–5467 (2024).

  14. Wan, F., Torres, M. D., Peng, J. & de la Fuente-Nunez, C. Deep-learning-enabled antibiotic discovery through molecular de-extinction. Nat. Biomed. Eng. 8, 854–871 (2024).

  15. Das, P. et al. Accelerated antimicrobial discovery via deep generative models and molecular dynamics simulations. Nat. Biomed. Eng. 5, 613–623 (2021).

  16. Szymczak, P. et al. Discovering highly potent antimicrobial peptides with deep generative model HydrAMP. Nat. Commun. 14, 1453 (2023).

  17. Li, T. et al. A foundation model identifies broad-spectrum antimicrobial peptides against drug-resistant bacterial infection. Nat. Commun. 15, 7538 (2024).

  18. Lazzaro, B. P., Zasloff, M. & Rolff, J. Antimicrobial peptides: application informed by evolution. Science 368, eaau5480 (2020).

  19. Loewenstein, Y. et al. Protein function annotation by homology-based inference. Genome Biol. 10, 1–8 (2009).

  20. Pirtskhalava, M. et al. DBAASP v3: database of antimicrobial/cytotoxic activity and structure of peptides as a resource for development of new therapeutics. Nucleic Acids Res. 49, D288–D297 (2021).

  21. Wang, G., Li, X. & Wang, Z. APD3: the antimicrobial peptide database as a tool for research and education. Nucleic Acids Res. 44, D1087–D1093 (2016).

  22. Hancock, R. E. Cationic peptides: effectors in innate immunity and novel antimicrobials. Lancet Infect. Dis. 1, 156–164 (2001).

  23. Brogden, K. A., Ackermann, M., McCray Jr, P. B. & Tack, B. F. Antimicrobial peptides in animals and their role in host defences. Int. J. Antimicrob. Agents 22, 465–478 (2003).

  24. Rives, A. et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc. Natl Acad. Sci. USA 118, e2016239118 (2021).

  25. Hong, L. et al. Fast, sensitive detection of protein homologs using deep dense retrieval. Nat. Biotechnol. 43, 983–995 (2025).

  26. Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).

  27. Zhou, Z.-H. & Feng, J. Deep forest. Natl Sci. Rev. 6, 74–86 (2019).

  28. Gull, S., Shamim, N. & Minhas, F. AMAP: hierarchical multi-label prediction of biologically active and antimicrobial peptides. Comput. Biol. Med. 107, 172–181 (2019).

  29. Bhadra, P., Yan, J., Li, J., Fong, S. & Siu, S. W. AmPEP: sequence-based prediction of antimicrobial peptides using distribution patterns of amino acid properties and random forest. Sci. Rep. 8, 1–10 (2018).

  30. Xiao, X., Wang, P., Lin, W.-Z., Jia, J.-H. & Chou, K.-C. iAMP-2L: a two-level multi-label classifier for identifying antimicrobial peptides and their functional types. Anal. Biochem. 436, 168–177 (2013).

  31. Veltri, D., Kamath, U. & Shehu, A. Deep learning improves antimicrobial peptide recognition. Bioinformatics 34, 2740–2747 (2018).

  32. The UniProt Consortium. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 47, D506–D515 (2019).

  33. Zhang, Y. & Skolnick, J. TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 33, 2302–2309 (2005).

  34. Monzon, V., Haft, D. H. & Bateman, A. Folding the unfoldable: using AlphaFold to explore spurious proteins. Bioinform. Adv. 2, vbab043 (2022).

  35. Chung, C.-R., Kuo, T.-R., Wu, L.-C., Lee, T.-Y. & Horng, J.-T. Characterization and identification of antimicrobial peptides with different functional activities. Brief. Bioinform. 21, 1098–1114 (2020).

  36. Wylensek, D. et al. A collection of bacterial isolates from the pig intestine reveals functional and taxonomic diversity. Nat. Commun. 11, 6389 (2020).

  37. Chen, C. et al. Expanded catalog of microbial genes and metagenome-assembled genomes from the pig gut microbiome. Nat. Commun. 12, 1106 (2021).

  38. Scicchitano, D. et al. Dispersion of antimicrobial resistant bacteria in pig farms and in the surrounding environment. Anim. Microbiome 6, 17 (2024).

  39. Holman, D. B., Brunelle, B. W., Trachsel, J. & Allen, H. K. Meta-analysis to define a core microbiota in the swine gut. mSystems 2, 10–1128 (2017).

  40. van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).

  41. VanderWaal, K. & Deen, J. Global trends in infectious diseases of swine. Proc. Natl Acad. Sci. USA 115, 11495–11500 (2018).

  42. Pillai, A., Ueno, S., Zhang, H., Lee, J. M. & Kato, Y. Cecropin P1 and novel nematode cecropins: a bacteria-inducible antimicrobial peptide family in the nematode Ascaris suum. Biochem. J. 390, 207–214 (2005).

  43. Wang, J. et al. The conserved domain database in 2023. Nucleic Acids Res. 51, D384–D388 (2023).

  44. Carter, M. M. et al. Ultra-deep sequencing of Hadza hunter-gatherers recovers vanishing gut microbes. Cell 186, 3111–3124 (2023).

  45. Levin, D. et al. Diversity and functional landscapes in the microbiota of animals in the wild. Science 372, eabb5352 (2021).

  46. Bailey, T. L. STREME: accurate and versatile sequence motif discovery. Bioinformatics 37, 2834–2840 (2021).

  47. Mookherjee, N., Anderson, M. A., Haagsman, H. P. & Davidson, D. J. Antimicrobial host defence peptides: functions and clinical potential. Nat. Rev. Drug Discov. 19, 311–332 (2020).

  48. Shen, J. et al. Unbiased organism-agnostic and highly sensitive signal peptide predictor with deep protein language model. Nat. Comput. Sci. 4, 29–42 (2024).

  49. Yu, T. et al. Enzyme function prediction using contrastive learning. Science 379, 1358–1363 (2023).

  50. Ahn, S., Kim, S., Ko, J. & Yun, S. -Y. Fine-tuning pre-trained models for robustness under noisy labels. In Proc. Thirty-Third International Joint Conference on Artificial Intelligence 3643–3651 (International Joint Conferences on Artificial Intelligence Organization, 2024).

  51. Chen, C. H. & Lu, T. K. Development and challenges of antimicrobial peptides for therapeutic applications. Antibiotics 9, 24 (2020).

  52. Oliveira Júnior, N. G., Souza, C. M., Buccini, D. F., Cardoso, M. H. & Franco, O. L. Antimicrobial peptides: structure, functions and translational applications. Nat. Rev. Microbiol. 23, 687–700 (2025).

  53. Fu, L., Niu, B., Zhu, Z., Wu, S. & Li, W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150–3152 (2012).

  54. Burdukiewicz, M. et al. Proteomic screening for prediction and design of antimicrobial peptides with AmpGram. Int. J. Mol. Sci. 21, 4310 (2020).

  55. Sidorczuk, K. et al. Benchmarks in antimicrobial peptide prediction are biased due to the selection of negative data. Brief. Bioinform. 23, bbac343 (2022).

  56. Su, X., Xu, J., Yin, Y., Quan, X. & Zhang, H. Antimicrobial peptide identification using multi-scale convolutional network. BMC Bioinformatics 20, 1–10 (2019).

  57. Chaumeil, P.-A., Mussig, A. J., Hugenholtz, P. & Parks, D. H. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics 36, 1925–1927 (2019).

  58. Cao, K., Wei, C., Gaidon, A., Arechiga, N. & Ma, T. Learning imbalanced datasets with label-distribution-aware margin loss. In Proc. 33rd Advances in Neural Information Processing Systems 1567–1578 (Curran Associates, Inc., 2019).

  59. Chen, T. & Guestrin, C. XGBoost: a scalable tree boosting system. In Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16 785–794 (ACM, 2016).

  60. Loshchilov, I. & Hutter, F. Decoupled weight decay regularization. In Proc. International Conference on Learning Representations (2019).

  61. Rost, B. Twilight zone of protein sequence alignments. Protein Eng. 12, 85–94 (1999).

  62. Xu, J. & Zhang, Y. How significant is a protein structure similarity with TM-score = 0.5? Bioinformatics 26, 889–895 (2010).

  63. Crooks, G. E., Hon, G., Chandonia, J.-M. & Brenner, S. E. WebLogo: a sequence logo generator. Genome Res. 14, 1188–1190 (2004).

  64. Tamura, K., Stecher, G. & Kumar, S. MEGA11: molecular evolutionary genetics analysis version 11. Mol. Biol. Evol. 38, 3022–3027 (2021).

  65. Mirdita, M. et al. ColabFold: making protein folding accessible to all. Nat. Methods 19, 679–682 (2022).

  66. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).

  67. The PyMOL Molecular Graphics System, Version 3.0 Schrödinger, LLC.

  68. Deutsch, E. W. et al. The ProteomeXchange consortium at 10 years: 2023 update. Nucleic Acids Res. 51, D1539–D1548 (2023).

  69. Yu, Q. Supporting data for: uncovering evolutionarily remote and highly potent antimicrobial peptides with protein language models. Zenodo https://doi.org/10.5281/zenodo.15622525 (2025).

Acknowledgements

This study was supported by the Shenzhen Medical Research Fund (award nos. A2503002 to Y.L. and B2302036 to L.D.), the National Key R&D Program of China (grant no. 2025YFA0923500 to Y.L.), the Chinese University of Hong Kong (CUHK; award nos. 4937025, 4937026, 5501517, 5501329 and SHIAE BME-p1-24 to Y.L.), the IdeaBooster Fund (award nos. IDBF23ENG05 and IDBF24ENG06 to Y.L.) and the Shenzhen Key Laboratory for the Intelligent Microbial Manufacturing of Medicines (award no. ZDSYS20210623091810032 to L.D.) and partially supported by grants from the Research Grants Council of the Hong Kong Special Administrative Region (Hong Kong SAR), China (project nos. CUHK 24204023 and 14208525 to Y.L.), and grants from the Innovation and Technology Commission of the Hong Kong SAR, China (project nos. GHP/065/21SZ, ITS/247/23FP and PRP/033/24FX to Y.L.). This research was also supported by the Research Matching Grant Scheme at CUHK (award nos. 8601603 and 8601663 to Y.L.) and the National Natural Science Foundation of China (award no. 32201313 to H.L.). We thank P. Wang and Q. Shen from the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China, for their help with animal experiments. We also thank S. Qiao from the China Agricultural University, China, for providing the enterotoxigenic Escherichia coli K88 and K99 strains used in this work.

Author information

Contributions

Q.Y., H.L., H.S., Z.D., L.D. and Y.L. conceived the research; Q.Y., H.L. and Y.A. collected the dataset; Q.Y. implemented the main algorithm; Q.Y., H.L., H.S., J.S., C.Z. and L.S. performed the experiments; Q.Y., H.L., H.S., J.S. and L.Z. conducted the analysis; Q.Y., H.L. and H.S. wrote the paper; and L.D. and Y.L. supervised the project. All authors read and approved the paper.

Corresponding authors

Correspondence to Lei Dai or Yu Li.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Biomedical Engineering thanks Martin Steinegger and Dong Xu for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Source data

Source Data Figs. 2–6

Source data for Figs. 2–6.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yu, Q., Liu, H., Shi, H. et al. Uncovering evolutionarily remote and highly potent antimicrobial peptides with protein language models. Nat. Biomed. Eng. (2026). https://doi.org/10.1038/s41551-026-01630-w

  • DOI: https://doi.org/10.1038/s41551-026-01630-w
