Abstract
Lateral gene transfer (LGT), also known as horizontal gene transfer, facilitates genomic diversification in microbial populations. While previous work has surveyed LGT in human-associated microbial isolate genomes, the landscape of LGT arising in personal microbiomes is not well understood, as there are no widely adopted methods to characterize LGT from complex communities. Here we developed, benchmarked and validated a computational algorithm (WAAFLE or Workflow to Annotate Assemblies and Find LGT Events) to profile LGT from assembled metagenomes. WAAFLE prioritizes specificity while maintaining high sensitivity for intergenus LGT. Applying WAAFLE to >2,000 human metagenomes from diverse body sites, we identified >100,000 high-confidence previously uncharacterized LGT (~2 per microbial genome-equivalent). These were enriched for mobile elements, as well as restriction–modification functions associated with the destruction of foreign DNA. LGT frequency was influenced by biogeography, phylogenetic similarity of involved pairs (for example, Fusobacterium periodonticum and F. nucleatum) and donor abundance. These forces manifest as networks in which hub taxa donate unequally with phylogenetic neighbours. Our findings suggest that human microbiome LGT may be more ubiquitous than previously described.
Data availability
HMP1-II metagenomes are available from the HMP DACC (http://hmpdacc.org) and from SRA BioProjects PRJNA48479 and PRJNA275349. Metadata and precomputed taxonomic profiles for HMP1-II samples are also available from the HMP DACC. The HMP2 IBDMDB metagenomes used in this work are available from SRA BioProject PRJNA398089. Metadata for HMP2 samples are available from the IBDMDB website (https://ibdmdb.org). WAAFLE’s databases (as used in this work), synthetic evaluation contigs, HMP1-II and HMP2 assemblies, and LGT calls are available from the WAAFLE website (https://github.com/biobakery/waafle). Source data are provided with this paper.
Code availability
WAAFLE is a free and open-source Python 3 software package available from GitHub (https://github.com/biobakery/waafle) and PyPI (https://pypi.org/project/waafle/). Installation and usage details, including links to download the databases and analysis products used in this work, are described in more detail on the WAAFLE website (https://github.com/biobakery/waafle). Additional Python and R code used in the statistical analyses and visualizations from this work is available from the authors upon request.
Acknowledgements
This work was supported by National Institutes of Health grants T32CA009001 (E.N.), U54DE023798 (C.H.), R24DK110499 (C.H.) and K23DK125838 (L.H.N.), the American Gastroenterological Association Research Foundation’s Research Scholars Award (L.H.N.), the Crohn’s and Colitis Foundation Career Development Award (L.H.N.), and the MGH/Chen Institute Transformative Scholars Award in Medicine (L.H.N.). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. We thank A. Pawluk, K. N. Thompson, K. Curry and T. Treangen for comments on the manuscript and helpful discussions; M. Michaud, C. Dulong and Y. Yan for help with validation experiments. Computational work was conducted on the FASRC Cannon cluster supported by the FAS Division of Science Research Computing Group at Harvard University.
Author information
Contributions
T.Y.H., E.N., L.H.N. and E.A.F. had full access to all of the data in the study and took responsibility for the integrity of the data and the accuracy of the data analysis. T.Y.H., E.N., C.H., L.H.N. and E.A.F. conceptualized and designed the project. All authors acquired, analysed and interpreted data, and critically reviewed the paper for important intellectual content. T.Y.H. and E.N. wrote the paper draft. T.Y.H. and E.N. performed statistical analysis. C.H. and L.H.N. obtained funding. C.H., L.H.N. and E.A.F. provided administrative, technical or material support. L.H.N. and E.A.F. supervised the project.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Microbiology thanks the anonymous reviewers for their contribution to the peer review of this work.
Extended data
Extended Data Fig. 1 Relative accuracy of WAAFLE, DarkHorse, and MetaCHIP on synthetic LGT and control contigs.
WAAFLE was penalized with a 20% holdout of its search database, while DarkHorse was evaluated using a translated version of the complete database, and MetaCHIP was evaluated without further constraints on its respective input. (a) DarkHorse only achieved non-negligible sensitivity (TPR) for the longest contigs (rightmost column) containing the most “extreme” LGT events (that is, between pairs of species with kingdom- or phylum-level LCAs). WAAFLE’s false positive rate (FPR) is stratified according to the taxonomic level of the LGT LCA as in Fig. 1 of the main text (for example, an intragenus false positive is counted as a true negative at the family level; x-axis). This level of stratification was not possible for DarkHorse, and so a single FPR value is plotted at “genus” resolution for comparison. DarkHorse offered better specificity than WAAFLE on shorter contigs (where it made relatively few LGT calls) but not on the longest contigs. (b) An additional comparison was performed between WAAFLE and MetaCHIP using a separate synthetic dataset designed for MetaCHIP compatibility. TPR and FPR were computed and plotted as in (a), with TPR calculations restricted to taxonomic ranks assigned to at least 100 LGT LCAs (that is, kingdom, order and genus). Results are stratified according to the completeness of the metagenomic bins into which LGT and control contigs were grouped. WAAFLE’s sensitivity here was similar to that observed in the preceding evaluations and consistently higher than MetaCHIP’s. While MetaCHIP’s specificity was correspondingly very high, WAAFLE again exhibited a peak FPR of only ~0.5% at the intragenus level, which improved at higher ranks. Notably, WAAFLE’s performance was not dependent on bin completeness, while MetaCHIP proved less sensitive to LGT events in less-complete bins.
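The rank-stratified false positive rate described in this caption can be illustrated with a short sketch. It is not taken from the WAAFLE code base: the function names, the rank list and the input encoding (one entry per control contig, holding either `None` or the LCA rank of a spurious LGT call) are assumptions made for illustration only.

```python
# Illustrative sketch only; names and input encoding are hypothetical,
# not part of WAAFLE.

# LCA ranks ordered from the closest relationship to the most remote one.
RANKS = ["genus", "family", "order", "class", "phylum", "kingdom"]
REMOTENESS = {rank: i for i, rank in enumerate(RANKS)}


def counts_at(call_rank, threshold):
    """Return True if a call with LCA `call_rank` counts as a call at `threshold`.

    A call whose LCA is the genus (an intragenus call) counts at the genus
    level but is treated as "no call" at the family level and above.
    """
    if call_rank is None:  # no LGT was called on this contig
        return False
    return REMOTENESS[call_rank] >= REMOTENESS[threshold]


def fpr_by_rank(control_calls):
    """FPR at each rank for control (non-LGT) contigs.

    `control_calls` holds one entry per control contig: None if no LGT was
    called, otherwise the LCA rank of the (spurious) call.
    """
    n = len(control_calls)
    return {t: sum(counts_at(c, t) for c in control_calls) / n for t in RANKS}


def tpr(lgt_calls):
    """TPR for synthetic LGT contigs; `lgt_calls` holds one bool per contig."""
    return sum(lgt_calls) / len(lgt_calls)


# Toy data: 200 control contigs with two intragenus and one intrafamily false call.
controls = [None] * 197 + ["genus", "genus", "family"]
rates = fpr_by_rank(controls)
```

With these toy inputs the two intragenus calls contribute to the FPR only at the genus level, so the rate drops from the genus level to the family level and reaches zero at the order level.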
Extended Data Fig. 2 Relationship between phylogenetic relatedness and WAAFLE performance.
(a) Distribution of phylogenetic distance (PD) by taxonomic rank. Genomes from the WAAFLE synthetic benchmarking dataset were compared pairwise; each dot represents the PD value of a pair of genomes, and pairs were binned by the level of their taxonomic LCA. The box limits display the first and third quartiles, with a horizontal line indicating the median. The whiskers extend to 1.5x the interquartile range below Q1 and above Q3 (the inner fence), outside of which outliers are plotted as individual data points. Phylogenetic relatedness generally corresponds to the hierarchical rank of taxonomy. Nevertheless, taxa distantly related by taxonomy may have similar phylogenetic distances and vice versa. Taxonomic rank, s: species (n = 237), g: genus (n = 238), f: family (n = 233), o: order (n = 227), c: class (n = 216), p: phylum (n = 237), k: kingdom (n = 227). (b) The true positive rate (TPR) for WAAFLE calling an LGT as a function of PD. The relationship between phylogenetic relatedness and WAAFLE sensitivity follows an exponential fit of y = 0.779*(1-exp(−2.012*x)). The vertical dotted lines indicate the 90th percentile of intraspecies and intragenus PD values from panel a. (c) WAAFLE controlled the false positive rate for closely related genomes, similar to what was observed for genomes within the same genus. (d) Comparison of WAAFLE and MetaCHIP sensitivity using a MetaCHIP-compatible synthetic dataset. MetaCHIP had consistently lower sensitivity for detecting LGT among genomes whose PD is greater than or equal to the 90th percentile of intragenus PD values. (e) MetaCHIP and WAAFLE each adequately controlled the false positive rate, which improved with greater PD.
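To make the fitted curve in panel b easier to read, the following minimal sketch evaluates y = 0.779*(1-exp(−2.012*x)) and inverts it. Only the two constants come from the caption; the function names are hypothetical, and the sketch says nothing about how the fit itself was obtained.

```python
import math

# Constants reported in the caption for panel b.
ASYMPTOTE = 0.779  # TPR that the fit approaches at large phylogenetic distance
RATE = 2.012       # exponential rate constant of the fit


def fitted_tpr(pd):
    """Fitted WAAFLE sensitivity at phylogenetic distance `pd` (pd >= 0)."""
    return ASYMPTOTE * (1.0 - math.exp(-RATE * pd))


def pd_for_tpr(target):
    """Phylogenetic distance at which the fit reaches `target` sensitivity."""
    if not 0.0 <= target < ASYMPTOTE:
        raise ValueError("target must lie in [0, asymptote)")
    return -math.log(1.0 - target / ASYMPTOTE) / RATE


# Distance at which the fit reaches half of its asymptotic sensitivity;
# algebraically this is ln(2) / RATE.
half_saturation_pd = pd_for_tpr(ASYMPTOTE / 2)
```

Under this functional form the fitted sensitivity is zero at PD = 0 and saturates at 0.779, so the curve cannot reach 100% sensitivity at any distance.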
Extended Data Fig. 3 Rates of undirected and directed inter-genus LGT for HMP1-II metagenomes profiled by WAAFLE.
LGT rates normalized to total assembly size in thousands of genes, stratified by body site and colored by body area. (a) All inter-genus LGT events were considered, regardless of whether the donor and recipient clades were known. (b) Only directed inter-genus LGT events were considered (that is, cases where the donor and recipient clade were clear from gene adjacency). Major body sites are labeled in bold type. (c) Total assembly sizes for the same set of samples; only genes resolved to at least the genus level were counted. Only the first sequenced visit from each unique HMP1-II participant is plotted. Each box displays the first and third quartiles, with a horizontal line indicating the median, while the whiskers extend to 1.5x the interquartile range below Q1 and above Q3 (the inner fence), outside of which outliers are plotted as individual data points.
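The normalization and box-plot conventions in this caption can be sketched as follows. The function names are hypothetical, and the quartile method (`statistics.quantiles` with `method="inclusive"`) is an assumption; the plotting software used for the figure may compute quartiles slightly differently.

```python
import statistics


def lgt_rate_per_1000_genes(n_lgt_events, n_genes):
    """LGT events per 1,000 assembled genes for one sample.

    `n_genes` is assumed to count only genes resolved to at least genus level.
    """
    return 1000.0 * n_lgt_events / n_genes


def box_summary(values):
    """Quartiles, inner fences (1.5x IQR beyond Q1 and Q3) and outliers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers,
    }


# Toy example: per-sample rates for one hypothetical body site.
rates = [1, 2, 3, 4, 5, 20]
summary = box_summary(rates)
```

In this toy example the value 20 lies beyond the upper inner fence and would be drawn as an individual outlier point.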
Extended Data Fig. 4 Differential abundance of putative antimicrobial resistance genes in LGT vs. non-LGT contigs.
(a) ARG enrichment patterns in HMP1-II metagenomes are similar across body areas and higher in LGT contigs. (b) Cross-tabulation of ARG abundance by resistance mechanism demonstrates that genes related to antibiotic inactivation are more abundant in LGT contigs. (c) Cross-tabulation of ARGs by drug class demonstrates an enrichment of genes conferring resistance to ribosome-targeting antimicrobial drugs in LGT contigs. (d) Differential abundance of ARG families in LGT vs. non-LGT contigs. Specifically, those conferring resistance to antimicrobials acting on protein synthesis (for example, ribosome-targeting drugs like aminoglycosides) and folate synthesis (for example, sul genes) are enriched in LGT contigs, while ARGs conferring resistance to drugs disrupting cell wall biosynthesis (for example, glycopeptides) are depleted. Heatmaps indicate the log-scaled fold enrichment of ARGs in LGT vs. non-LGT contigs.
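As a rough illustration of what a log-scaled fold enrichment means, the sketch below compares the fraction of genes annotated as ARGs in LGT contigs with the fraction in non-LGT contigs. The caption does not specify the exact definition, so the base-2 logarithm, the pseudocount and the function name are all assumptions rather than the authors' method.

```python
import math


def log2_fold_enrichment(arg_lgt, genes_lgt, arg_non_lgt, genes_non_lgt,
                         pseudocount=0.5):
    """log2 of (ARG fraction in LGT contigs) / (ARG fraction in non-LGT contigs).

    The pseudocount (an assumption of this sketch) keeps the ratio finite
    when a category has zero ARG hits.
    """
    frac_lgt = (arg_lgt + pseudocount) / (genes_lgt + pseudocount)
    frac_non_lgt = (arg_non_lgt + pseudocount) / (genes_non_lgt + pseudocount)
    return math.log2(frac_lgt / frac_non_lgt)


# Toy counts: an ARG family four times as frequent in LGT contigs.
enriched = log2_fold_enrichment(40, 1000, 10, 1000, pseudocount=0)
# Toy counts: an ARG family with no hits in LGT contigs.
depleted = log2_fold_enrichment(0, 1000, 10, 1000)
```

Positive values indicate enrichment in LGT contigs and negative values indicate depletion, which corresponds to the two color directions of a diverging heatmap.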
Supplementary information
Supplementary Information
Supplementary Appendix.
Supplementary Tables 1–8
Supplementary Table 1: assembly statistics of the HMP1-II metagenomes. Supplementary Tables 2–5: counts and rates of detected LGT. Supplementary Tables 6 and 7: molecular function data. Supplementary Table 8: experimental validation PCR data and results. All contigs and LGT profiles are provided on the WAAFLE website (https://github.com/biobakery/waafle).
Source data
Source Data Unprocessed Gels
Unprocessed gel images for both Fig. 6 and Supplementary Fig. 15.
Source Data Figs. 1–6 and Extended Data Figs. 1–4
Source data provided in separate, labelled sheets.
About this article
Cite this article
Hsu, T.Y., Nzabarushimana, E., Wong, D. et al. Profiling lateral gene transfer events in the human microbiome using WAAFLE. Nat Microbiol 10, 94–111 (2025). https://doi.org/10.1038/s41564-024-01881-w
DOI: https://doi.org/10.1038/s41564-024-01881-w