Introduction

Antimicrobial resistance (AMR) is an escalating global health crisis with profound implications for human medicine, animal husbandry, and ecological stability1,2,3. Among the various contributors to this crisis, the gut microbiota of animals has emerged as a critical reservoir and vehicle for AMR genes (ARGs), virulence factor genes (VFGs), and mobile genetic elements (MGEs). These microbial communities facilitate the emergence, enrichment, and horizontal transfer of resistance determinants within and between species, as well as into the environment4,5,6. Understanding the distribution and transmission pathways of these genetic elements is therefore essential for assessing the risk of AMR spread across ecological and host boundaries.

Caprinae species—including domestic sheep (Ovis aries), goats (Capra hircus), and bharals (Pseudois nayaur)—occupy a broad range of ecological niches and play a vital role in global agriculture and rural livelihoods. Their diverse husbandry practices, close interaction with human populations, and ecological mobility position them as potentially significant players in the resistome landscape7. While previous studies have characterized AMR in the gut microbiota of ruminants8,9, comprehensive metagenomic investigations focused specifically on Caprinae species remain limited. A previous study provided an initial overview of the gut microbial composition in Caprinae; however, that analysis was constrained by a small sample size (n = 30) and focused exclusively on a limited number of species within China7. Consequently, there is still a lack of high-resolution genomic data spanning larger cohorts and diverse geographic regions.

In addition to bacterial populations, bacteriophages and other gut-associated viruses may also influence AMR dynamics. Although phage-mediated transfer of ARGs is not yet considered a primary mechanism, growing evidence suggests that under certain ecological conditions, viruses can serve as ARG carriers and facilitate their dissemination10,11,12,13,14. The presence of resistance genes in viral genomes within the gut ecosystem, therefore, warrants closer scrutiny.

In this study, we performed a large-scale metagenomic analysis of 779 gut samples from Caprinae animals to systematically characterize the distribution of ARGs, VFGs, and MGEs. We also assessed the presence of ARGs within the viral fraction of the microbiome. By integrating taxonomic and functional profiling, this work enhances our understanding of the role of Caprinae gut microbiota in shaping AMR ecology. Our findings provide a foundational framework for future investigations into resistance transmission and offer critical insights into the broader One Health implications of AMR dissemination in livestock-associated environments.

Results

A MAG catalog of the Caprinae gut microbiota

We analyzed 779 gut metagenomic samples from four Caprinae species: domestic goats (Capra hircus, n = 622), domestic sheep (Ovis aries, n = 135), Siberian ibex (Capra ibex, n = 12), and blue sheep (Pseudois nayaur, n = 10). From these samples, we reconstructed 63,126 metagenome-assembled genomes (MAGs). Following quality filtering (completeness ≥ 50%, contamination ≤ 10%) and dereplication at 99% average nucleotide identity (ANI), we curated a dataset of 17,023 non-redundant MAGs, comprising 13,359 medium-quality MAGs (78.48%) and 3664 high-quality MAGs (21.52%) (Fig. 1A; Supplementary Data 1). Among the medium-quality MAGs, 6124 (45.84%) were not classified at the species level, compared with 1384 (37.77%) of the high-quality MAGs. This suggests that the relatively relaxed thresholds for medium-quality MAGs may have left more of them without a species-level classification. The 17,023 MAGs varied in genomic quality, with completeness ranging from 50% to 100% (mean: 76.95%) and contamination from 0% to 10% (mean: 2.09%) (Fig. 1B). GC content ranged from 22.47% to 73.02% (mean: 47.27%), and genome size from 0.22 to 8.15 Mb (mean: 1.92 Mb) (Fig. 1C), reflecting the phylogenetic and functional diversity within the dataset.

Fig. 1: Comprehensive genomic landscape of the Caprinae gut microbiota.

A Workflow illustrating the selection and quality filtering process of MAGs from Caprinae gut samples. B Scatter plot depicting completeness and contamination metrics for the 17,023 MAGs, with each dot representing a single MAG. C Distribution of GC content versus genome size across the MAG dataset. D Sankey diagram showing the taxonomic composition of the 17,023 MAGs across taxonomic levels from domain to species. At each level, only the ten most abundant taxa are shown, with the remaining taxa grouped as “Other” and unclassified taxa designated as “Unknown”.

Taxonomic classification revealed one MAG that could not be assigned to any known domain, while 218 were identified as Archaea, and the remainder were of bacterial origin (Fig. 1D; Supplementary Data 1). Collectively, these 17,023 MAGs spanned 32 phyla, 45 classes, 109 orders, 260 families, 1253 genera, and 3612 species. Remarkably, 7508 MAGs (44.11%) could not be assigned to any known species, underscoring a rich reservoir of previously uncharacterized microbial diversity in the Caprinae gut. Among the identified species, Alistipes sp015060115 (0.31%), Succiniclasticum sp900315925 (0.23%), and JAHHUH01 sp017623055 (0.20%) were the most abundant. At higher taxonomic levels, the phyla Bacillota_A (48.09%) and Bacteroidota (28.29%) were predominant. At the genus level, Prevotella (5.13%), Cryptobacteroides (4.10%), and Alistipes (3.67%) were most abundant, highlighting the dominance of anaerobic commensals commonly associated with herbivorous gut ecosystems. Taxonomic profiling of the MAGs reconstructed from the two primary hosts revealed a shared core microbiota. In Ovis aries, the 5891 recovered MAGs were predominantly assigned to the genera Prevotella (n = 429, 7.28%), Cryptobacteroides (n = 219, 3.72%), and Alistipes (n = 189, 3.21%). Similarly, the 10,752 MAGs derived from Capra hircus were dominated by Cryptobacteroides (n = 462, 4.30%), Prevotella (n = 444, 4.13%), and Alistipes (n = 379, 3.52%). These findings indicate that while the rank order of dominant genera differs slightly, distinct Caprinae species share a conserved microbial framework.

ARG landscape in the Caprinae gut microbiota

To assess the prevalence and distribution of ARGs in the Caprinae gut microbiome, all 17,023 MAGs were screened against the Comprehensive Antibiotic Resistance Database (CARD). This analysis identified 2440 ARG hits representing 208 distinct ARGs, of which 1490 hits (61.07%) were from medium-quality MAGs and 950 (38.93%) from high-quality MAGs (Supplementary Data 2). In total, 1448 MAGs (8.51%) harbored at least one ARG. The most frequently detected resistance categories targeted tetracyclines (60.96%), multiple antibiotic classes (classified as multi-drug resistance, 18.22%), and lincosamides (6.19%) (Fig. 2A). The dominant resistance mechanisms were antibiotic target protection (62.98%), antibiotic inactivation (15.17%), and antibiotic efflux (13.72%) (Fig. 2B).

Fig. 2: Distribution patterns of ARGs in the Caprinae gut microbiota.

A Relative abundance of ARGs categorized by drug classes. B Relative abundance of ARGs grouped by resistance mechanisms. C The top 10 bacterial families with the highest cumulative relative abundance of ARGs across all samples. The y-axis indicates the cumulative relative abundance of ARGs, while stacked colors represent the associated drug classes. D The 10 MAGs harboring the greatest number of ARGs, with their taxonomic classification shown in parentheses. E Prevalence distribution of ARGs, where each point represents an ARG colored by its drug class.

ARGs were predominantly associated with Bacteroidaceae, Selenomonadaceae, and the F082 family (Fig. 2C). At the species level, HGM04593 sp017503645 and Campylobacter sp002139875 emerged as the top carriers (Supplementary Fig. 1A). While most ARG-positive MAGs harbored 1–10 ARGs, 11 MAGs carried 11–50, and four Escherichia coli MAGs harbored over 50 ARGs each (Supplementary Fig. 1B and Fig. 2D). These E. coli MAGs were highly prevalent (>60% of samples) and reached high relative abundance (>1%) specifically in goats (Supplementary Data 3).

The distribution of ARGs across the recovered MAGs revealed that 80 ARGs were confined to a single MAG, 79 were found in 2–10 MAGs, 45 occurred in 11–50 MAGs, and four ARGs were broadly distributed across more than 50 MAGs (Supplementary Fig. 1C). In terms of prevalence, two ARGs were highly widespread, each detected in over 75% of the ARG-positive samples: tet(W/N/W) (75.48%), a mosaic gene encoding a ribosomal protection protein that mediates tetracycline resistance, and adeF (75.22%), which encodes the membrane fusion protein component of the AdeFGH multidrug efflux complex (Fig. 2E; Supplementary Data 4).

VFGs in Caprinae gut microbiota and their link to antibiotic resistance

To explore the landscape of virulence within the gut microbiota of Caprinae animals, we screened 17,023 MAGs against the Virulence Factor Database (VFDB). This analysis revealed a substantial presence of VFGs: a total of 5401 VFGs were identified, representing 141 distinct virulence factors (VFs) across 13 virulence factor classes (VFCs). Interestingly, 2694 MAGs (15.83%) contained at least one VFG (Supplementary Data 5), highlighting the widespread potential for pathogenicity in the Caprinae gut ecosystem. At the individual gene level, the most prevalent VF was EF-Tu (a surface-expressed protein that mediates bacterial attachment by interacting with host cell nucleolin) found in 53.15% of VFG-positive MAGs (Fig. 3A). This was followed by the capsule, which enhances immune evasion by inhibiting phagocytosis and complement-mediated killing while promoting adhesion and colonization (20.18%), and GroEL (a stress-induced chaperonin that can be released extracellularly and facilitate adhesion or invasion of host tissues) (8.88%). When categorized by VFCs, the majority of VFGs were linked to adherence systems (63.64%), with immune modulation (21.98%) and motility mechanisms (5.47%) also well represented (Fig. 3B). In terms of prevalence across samples, EF-Tu (76.38%), Capsule (75.35%), and GroEL (74.33%) emerged as the most widespread VFs (Fig. 3C; Supplementary Data 6).

Fig. 3: Features of VFs in the Caprinae gut microbiota and their relationship with ARGs.

A Relative abundance distribution of VFs. B Relative abundance of VFCs. C Prevalence of VFs, with each dot representing a VF colored by its respective VFC. D Circular genome map of SRR12529377.bin.455_1. E Procrustes analysis demonstrating a significant correlation between the relative abundances of ARGs and VFGs. F Correlation network of the top 20 most prevalent ARGs and VFs, showing significant positive associations (r > 0.6, P < 0.05).

The number of VFGs per genome varied widely. The MAG SRR12529377.bin.455, classified as Pseudomonas aeruginosa, harbored the highest number with 199 VFGs, and also carried 26 ARGs (Fig. 3D). This genomic profile represents a substantial virulence repertoire, as typical pathogenic reference strains like PAO1 and PA14 are known to possess ~250–290 VFGs15,16. The presence of such an extensive virulence arsenal, combined with its high number of ARGs, underscores its pathogenic potential. Other MAGs carrying large numbers of VFGs included E. coli strains SRR14556685.bin.22 (107 VFGs) and SRR7755428.bin.33 (104 VFGs) (Supplementary Fig. 2A). Importantly, a strong positive correlation was observed between the abundance of VFGs and ARGs within the microbiota. Diversity analyses revealed significant associations in both Shannon index (R = 0.85, P < 2.2 × 10⁻¹⁶) and richness index (R = 0.86, P < 2.2 × 10⁻¹⁶) (Supplementary Fig. 2B, 2C), indicating that communities rich in virulence genes are also rich in resistance genes. Procrustes analysis further reinforced this relationship (M² = 0.726, P = 0.001) (Fig. 3E).
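As a minimal illustration of the two per-sample diversity measures compared above, the sketch below computes richness and the Shannon index from a vector of gene relative abundances. It assumes the natural logarithm and treats zero-abundance genes as absent; the function names are illustrative and do not come from the study's own scripts.

```python
import math


def richness(abundances):
    """Number of genes with non-zero abundance in one sample."""
    return sum(1 for a in abundances if a > 0)


def shannon(abundances):
    """Shannon index H = -sum(p_i * ln p_i) over genes with non-zero abundance."""
    present = [a for a in abundances if a > 0]
    total = sum(present)
    if total == 0:
        return 0.0
    return -sum((a / total) * math.log(a / total) for a in present)


# Hypothetical ARG abundance profile for a single sample
sample_args = [12.0, 3.5, 0.0, 7.25]
print(richness(sample_args), round(shannon(sample_args), 4))
```

Computing both indices per sample for the ARG and VFG profiles yields paired vectors that can then be correlated across samples.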

Focusing on the 30 most prevalent ARGs and VFs, we identified 23 significant pairwise correlations (r > 0.6, P < 0.05) (Supplementary Data 7; Fig. 3F). Interestingly, KatA (a catalase that helps bacteria counteract host-derived reactive oxygen species, thereby enhancing survival and infection) was strongly associated with 13 ARGs, while GroEL correlated with six.

MGEs and phage-mediated resistance in the Caprinae gut microbiota

To investigate the potential for horizontal gene transfer and dissemination of antibiotic resistance in Caprinae gut microbiota, we analyzed 17,023 MAGs against a curated MGE database. This revealed 457 MGE-associated genes across 246 MAGs (1.45%), encompassing 139 unique MGEs grouped into eight functional types (Supplementary Data 8). MGE detection was slightly more frequent in high-quality MAGs (3.57%, 130/3664) than in medium-quality ones (2.45%, 327/13,359). This suggests that overall genomic integrity may influence the recovery of MGEs, which often reside in complex or repetitive regions; thus, our findings may provide a conservative estimate of the resistome mobility.

At the gene level, the most abundant MGEs were 2204_tnpA_AB646744.1 (56.23%), 2548_tnpA_U75371.3 (22.35%), and 1897_IS91_MNRK01000014.1 (4.09%) (Fig. 4A). Transposases overwhelmingly dominated the MGE landscape, accounting for 92.87% of all elements, followed by insertion sequences, particularly from the IS91 family (4.80%) (Fig. 4B). In terms of sample prevalence, 2204_tnpA_AB646744.1 was detected in 71.63% of microbiome samples, followed by 2548_tnpA_U75371.3 (50.96%) and 1897_IS91_MNRK01000014.1 (43.13%) (Fig. 4C; Supplementary Data 9).

Fig. 4: Characteristics of MGEs in the Caprinae gut microbiota and their association with ARGs.

A Relative abundance of MGEs. B Relative abundance distribution across different MGE types. C Prevalence distribution of MGEs, with each point representing an MGE colored by its type. D Circular genome map of GX5.bin.331_1. E Schematic of ARG–MGE co-localization within contigs; arrows indicate gene orientation (right-pointing for forward strand, left-pointing for reverse strand). F Correlation network of the top 20 most prevalent ARGs and MGEs, showing significant positive correlations (r > 0.6, P < 0.05).

Some genomes exhibited extensive MGE loads: the E. coli genome GX5.bin.331 carried the highest number (22 MGEs), followed by E. coli JL4.bin.478, Dielma fastidiosa SRR14556658.bin.12, and E. coli SX5.bin.389 (each with 13 MGEs) (Supplementary Fig. 3A). Remarkably, GX5.bin.331 also harbored 48 ARGs, underscoring a strong potential for multi-drug resistance and genetic mobility (Fig. 4D).

To assess the potential for horizontal gene transfer, we mapped ARGs located within ±5 kb of MGE sequences, uncovering 19 unique MGE–ARG co-localizations. Notable combinations included tet(B)-1484_tnpA10_HQ840942.1, acrB-386_tnpA_JX077110.1, and Ecol_EFTu_KIR-1724_IS91_LRKE01000216.1 (Fig. 4E), suggesting that these resistance genes may be mobilizable.
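The co-localization criterion can be sketched as an interval-distance check on shared contigs. The example below is an assumption-laden illustration rather than the study's pipeline: it measures the gap between the nearest gene boundaries (the text does not state how distance was measured), and the tuple layout and the function name `colocalized_pairs` are hypothetical.

```python
def colocalized_pairs(arg_hits, mge_hits, window=5000):
    """Return the set of (ARG, MGE) names lying within `window` bp on the same contig.

    Each hit is a tuple (contig_id, start, end, name) with start <= end.
    The gap is measured between the nearest boundaries; overlapping genes have gap 0.
    """
    mges_by_contig = {}
    for contig, start, end, name in mge_hits:
        mges_by_contig.setdefault(contig, []).append((start, end, name))

    pairs = set()
    for contig, a_start, a_end, a_name in arg_hits:
        for m_start, m_end, m_name in mges_by_contig.get(contig, []):
            gap = max(0, max(a_start, m_start) - min(a_end, m_end))
            if gap <= window:
                pairs.add((a_name, m_name))
    return pairs


# Hypothetical coordinates, for illustration only
args_found = [("contig_1", 1000, 2000, "tet(B)"), ("contig_2", 500, 1500, "acrB")]
mges_found = [("contig_1", 6000, 7000, "tnpA"), ("contig_2", 9000, 9800, "IS91")]
print(colocalized_pairs(args_found, mges_found))
```

In this toy input only the first pair qualifies (gap of 4000 bp); the second is 7500 bp apart and is excluded.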

Diversity-based correlation analyses further reinforced the connection between MGEs and ARGs. Both the Shannon index (R = 0.82, P < 2.2 × 10⁻¹⁶) and richness index (R = 0.89, P < 2.2 × 10⁻¹⁶) showed strong positive associations (Supplementary Fig. 3B, 3C). Procrustes analysis supported this finding, revealing significant compositional overlap between MGE and ARG profiles (M² = 0.7233, P = 0.001) (Supplementary Fig. 3D). Among the 20 most prevalent MGEs and ARGs, 17 significant correlations (r > 0.6, P < 0.05) were observed, particularly involving three MGEs (1897_IS91_MNRK01000014.1, 2548_tnpA_U75371.3, and 2204_tnpA_AB646744.1) and ten ARGs (Fig. 4F; Supplementary Data 10).

Beyond MGEs, the gut virome, particularly bacteriophages, may also serve as an important reservoir and vector for the dissemination of ARGs. In this study, a total of 69,226 viral sequences were identified, of which 4008 were of medium-to-high quality and complete. After clustering these medium-to-high-quality complete sequences, 3765 representative viral sequences were obtained, and ARGs were identified within these sequences. We identified three ARGs within viral genomes: poxtA (unclassified viral genome), lnuC (Myoviridae), and tetO (Myoviridae) (Supplementary Data 11). These genes confer resistance to multiple antibiotic classes (i.e., multi-drug resistance), as well as resistance to lincosamides and tetracyclines, mediated through mechanisms such as antibiotic target protection and antibiotic inactivation. Predicted bacterial hosts comprised unclassified species from the genera Coprovivens, CAG-196, and Zag111.

Unique and shared ARG profiles in the Caprinae gut microbiota

To characterize host-specific antibiotic resistance profiles, we compared the ARG counts of Caprinae gut microbiota with those of humans. Across both hosts, multi-drug resistance emerged as the most dominant resistance type (Supplementary Data 12). In Caprinae, multi-drug resistance accounted for 42.31% of ARG counts, followed by resistance to tetracyclines (9.13%) and aminoglycoside antibiotics (7.69%) (Fig. 5A). In humans, multi-drug resistance represented 46.25% of ARG counts, followed by aminoglycoside (7.85%) and fluoroquinolone resistance (5.29%) (Fig. 5B). In terms of resistance mechanisms, Caprinae microbiota was primarily shaped by antibiotic efflux (42.79% of ARG counts), inactivation (25.48%), and target alteration (16.83%) (Supplementary Fig. 4A). Human-associated resistomes were dominated by antibiotic inactivation (45.39% of ARG counts), followed by efflux (29.18%) and target alteration (13.31%) (Supplementary Fig. 4B).

Fig. 5: Comparative analysis of ARG counts across gut microbiota from Caprinae animals and humans.

A Top 10 ARG classes with the highest gene counts in Caprinae gut microbiota. B Top 10 ARG classes with the highest gene counts in human gut microbiota. C Venn diagram illustrating the unique and shared ARGs (based on gene counts) between Caprinae and humans. D Taxonomic composition of ARG-carrying microbial genomes at the family level, based on gene counts. E Taxonomic composition of ARG-carrying microbial genomes at the genus level, based on gene counts.

Host-specific and overlapping ARG profiles

A comparative analysis of ARG repertoires revealed both unique and overlapping ARG counts between Caprinae and humans (Fig. 5C). Caprinae harbored 24 unique ARGs, in contrast to 402 unique ARGs in humans. Importantly, 184 distinct ARGs were shared between Caprinae and humans (Supplementary Data 13), underscoring a substantial common resistome across species. At the taxonomic level, Enterobacteriaceae emerged as the primary ARG-carrying family across both hosts (Fig. 5D), with Escherichia being the most represented genus (Fig. 5E). However, Caprinae microbiota showed greater diversity in ARG-carrying taxa, exhibiting a broader distribution across genera compared to the more genus-dominated resistome of humans.

Clinically relevant ARGs in Caprinae

To assess potential clinical implications, we screened for ARGs potentially conferring resistance to critical last-line antibiotics, including tigecycline, vancomycin, polymyxins, and β-lactams (Supplementary Data 14). Using stringent criteria (identity ≥ 90%, coverage ≥ 70%, E-value ≤ 1e−10), BLASTn analysis of the 184 distinct ARGs shared between Caprinae and humans identified four high-priority resistance genes (tetX1, tetX4, vanD, and vanR). In total, 17 gene sequences corresponding to these resistance genes were detected in the Caprinae-derived genomes (Supplementary Data 15). These genes are of concern due to their association with last-line antibiotic resistance and their potential for horizontal transmission.

Discussion

This study presents a metagenomic analysis of ARGs, VFGs, and MGEs in the gut microbiota of Caprinae animals, using 779 publicly available metagenomic datasets. We observed a highly conserved core microbiota in both O. aries and C. hircus, predominantly composed of Prevotella, Cryptobacteroides, and Alistipes. This suggests that despite distinct host lineages, domestic Caprinae share a stable microbial framework essential for their herbivorous physiology. By extending the analysis beyond bacterial genomes to include viral components, particularly bacteriophages, we provide a broader view of the potential vectors involved in ARG dissemination. Our findings highlight Caprinae animals as important reservoirs and potential facilitators of AMR gene persistence and horizontal transfer within host-associated microbial ecosystems.

Among the 17,023 MAGs, 8.51% harbored at least one ARG, spanning 208 distinct ARGs. The dominant ARGs were associated with tetracycline resistance and resistance to multiple antibiotic classes (i.e., multi-drug resistance), a pattern that aligns with the historical use of these antibiotics in livestock production4,8. Four ARGs were each detected in more than 50 MAGs, suggesting high host plasticity and an increased likelihood of horizontal gene transfer5,17,18. Some MAGs carried exceptionally large ARG loads, up to 56 resistance genes in a single genome, with E. coli consistently identified as a major ARG carrier. This mirrors trends observed in humans and pigs, reinforcing E. coli’s role as a cross-species amplifier of resistance elements and a sentinel for environmental AMR monitoring19,20,21.

Our analysis identified 5401 VFGs across 2694 MAGs, with EF-Tu, capsule, and GroEL being the most prevalent virulence factors and adherence, immune modulation, and motility the most represented functional classes. Importantly, ARG and VFG profiles showed strong positive correlations in both diversity and relative abundance, supporting the idea of co-selection under antibiotic pressure22. This linkage between virulence and resistance suggests that antibiotic use may inadvertently select for more pathogenic strains, thereby increasing health risks for both animals and humans22,23. Such dual-functional strains may also disrupt gut microbial equilibrium, potentially reducing resilience to opportunistic pathogens24,25,26. These findings highlight a potentially synergistic relationship between virulence and antibiotic resistance in the Caprinae gut microbiome, underscoring the importance of integrated surveillance for both traits in animal-associated microbial communities.

Although MGEs were detected in only 1.45% of MAGs, they showed a strong positive correlation with ARG diversity, consistent with their recognized role in mediating horizontal gene transfer. Nineteen physically linked MGE–ARG pairs were identified within ±5 kb, pointing to a potential for mobilization27. Transposases, particularly 2204_tnpA_AB646744.1, were highly prevalent, present in over 70% of samples—highlighting their importance as genetic vehicles facilitating the horizontal transfer of ARGs28. In addition, three ARGs were identified in viral genomes, primarily from Myoviridae phages, further suggesting that viruses may contribute to ARG dynamics29,30. While current evidence does not yet support a dominant role for phage-mediated ARG transmission, the ecological and evolutionary significance of these findings warrants further experimental investigation31,32. These findings suggest a multi-layered architecture of antibiotic resistance in Caprinae gut microbiota, potentially involving mobile genetic elements and viruses, highlighting the importance of integrated surveillance to monitor AMR.

To assess host specificity and overlap, we compared the Caprinae gut resistome to that of humans. Caprinae microbiota contained 24 unique ARGs, while 184 distinct ARGs were shared with humans. These overlaps suggest potential routes of genetic exchange, particularly in shared environments where humans and livestock intersect33,34. Interestingly, while multi-drug resistance dominated across both hosts, the Caprinae microbiota primarily relied on efflux pumps, contrasting with the enzymatic inactivation mechanisms more prevalent in humans35.

Of particular concern, four ARGs shared between Caprinae and human microbiota conferred resistance to clinically important antibiotics, including tigecycline and vancomycin. The presence of such high-priority resistance genes in animal microbiota raises concerns about livestock serving as environmental reservoirs for ARGs with potential public health implications33,36. While this does not confirm direct transfer to human pathogens, it underscores the importance of continued surveillance under a One Health framework that integrates human, animal, and environmental health.

This study has several limitations. First, the use of publicly available datasets introduces specific sampling biases. Geographically, most samples originate from China, while major regions such as Africa and South America are underrepresented. Taxonomically, our dataset primarily focuses on C. hircus and O. aries—species typically associated with livestock production. Consequently, our findings may not fully reflect the microbiota dynamics of wild or isolated Caprinae populations. Furthermore, many other genera within the Caprinae subfamily (e.g., Budorcas or Ovibos) are largely missing, which may limit the broader applicability of our results. Second, functional annotation relied on existing databases, which could exclude novel genes, and virulence factor identification was based solely on sequence similarity. As many virulence-associated genes are also present in non-pathogenic strains, and only specific variants or genomic contexts determine pathogenicity, this approach may overestimate the actual virulence potential. Third, the comparison of Caprinae and human resistomes was limited to ARG counts rather than read-based abundance, restricting quantitative interpretation. Additionally, MGE identification is inherently limited by genomic integrity, as these elements often reside in repetitive regions that are difficult to assemble and bin37, potentially leading to an underestimation of their prevalence in fragmented MAGs. Finally, sequence co-localization provides only indirect evidence of horizontal gene transfer, requiring experimental validation. Future studies integrating functional metagenomics, long-read sequencing, and in vitro assays, as well as accounting for both abundance measures and sampling variation, will be essential for a more comprehensive understanding of host-specific resistome and mobilome dynamics.

Conclusion

This study offers the first large-scale metagenomic exploration of ARGs, VFGs, and MGEs in the gut microbiota of Caprinae animals. Our findings reveal a broad distribution of ARGs and VFGs across diverse microbial taxa, with E. coli emerging as a major resistance reservoir. Significant correlations between ARGs, VFGs, and MGEs suggest co-selection and potential genetic linkage, which may facilitate the persistence and spread of these traits within and between hosts. Importantly, we also detected ARGs in viral genomes, implicating bacteriophages in resistance gene dynamics, although their exact role remains to be fully clarified. Comparative analysis with human gut microbiota revealed substantial overlaps, including ARGs conferring resistance to tigecycline and vancomycin, antibiotics of critical importance to human health. These findings suggest that Caprinae gut microbiota may act as a hidden reservoir of high-priority ARGs, with implications for both veterinary and human medicine. This work lays a foundation for future studies on the ecological and evolutionary drivers of AMR in livestock and emphasizes the need for integrated AMR monitoring strategies across sectors.

Methods

Sample collection

The gut metagenomic datasets of Caprinae animals were retrieved from the NCBI database and the China National Center for Bioinformation (CNCB) by searching with a combination of keywords, including but not limited to “Caprinae,” “gut,” “feces,” and “metagenome.” The following filtering criteria were applied: (1) Samples with fewer than ten million sequencing reads were excluded; (2) Only short-read sequencing data were retained; (3) Samples without clear host information were excluded, and only those belonging to the Caprinae subfamily were kept; (4) Only samples derived from feces or intestinal contents were included.

After applying these criteria, a total of 779 gut metagenomic samples were obtained (Supplementary Data 16), including 622 from Capra hircus, 135 from Ovis aries, 12 from C. ibex, and 10 from P. nayaur. Among them, 743 samples were collected in China, 16 in the United States, 12 in Italy, 6 in Saudi Arabia, and 2 from unknown locations.

Preprocessing and assembly

The 779 raw metagenomic datasets were quality-trimmed and filtered using Fastp (v0.23.0)38 with the parameters “-q 20 -u 30 -n 5 -y -Y 30 -l 80 --trim_poly_g” to remove low-quality reads and adapter contamination. The number of reads before and after quality control is provided in Supplementary Data 17. Host-derived sequences were then removed by aligning reads against host reference genomes using Bowtie2 (v2.5.0)39. The reference genome accession numbers were GCF_016772045.1 for O. aries and P. nayaur, and GCF_001704415.2 for C. hircus and C. ibex. The resulting clean reads (reads that have been quality-trimmed, filtered, and host-depleted) were assembled into contigs using MEGAHIT (v1.2.9)40 with the parameter “--k-list 21,41,61,81,101,121,141” while all other settings were kept at their defaults. Clean reads were mapped back to the resulting contigs using BWA (v0.7.17-r1198)41, and sequencing depth was calculated with SAMtools (v1.18)42 alongside the jgi_summarize_BAM_contig_depths script. Metagenome-assembled genomes (MAGs) were binned using MetaBAT2 (v2.15)43 with parameters set to -m 2000, -s 200000, and --seed 2024. Each bin’s completeness and contamination were assessed using CheckM2 (v1.0.1)44, retaining only those with ≥50% completeness and ≤10% contamination6. Among these, bins with >90% completeness and <5% contamination were classified as high-quality MAGs, while the remaining bins were considered medium-quality MAGs45,46. To eliminate redundancy, MAGs were dereplicated using dRep (v3.4.3)47 at a 99% ANI threshold, applying parameters -pa 0.9 and -sa 0.99, resulting in a final non-redundant set of MAGs. The representative for each cluster was selected based on a quality score (Completeness − 5 × Contamination)48, aiming to prioritize high-integrity genomes while preserving strain-level genetic diversity.
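The quality tiers and the representative-selection score described above reduce to a few comparisons. The following sketch restates them under the thresholds given in the text; it is not the CheckM2 or dRep implementation (dRep applies its own internal scoring), and the dictionary keys and function names are assumptions made for illustration.

```python
def mag_quality_tier(completeness, contamination):
    """Classify a bin as 'high', 'medium', or None (discarded).

    Retained: completeness >= 50% and contamination <= 10%.
    High quality: completeness > 90% and contamination < 5%.
    """
    if completeness < 50.0 or contamination > 10.0:
        return None
    if completeness > 90.0 and contamination < 5.0:
        return "high"
    return "medium"


def quality_score(completeness, contamination):
    """Score used to rank genomes within a cluster: Completeness - 5 x Contamination."""
    return completeness - 5.0 * contamination


def pick_representative(cluster):
    """Return the MAG with the highest quality score from a list of dicts."""
    return max(cluster, key=lambda m: quality_score(m["completeness"], m["contamination"]))


# Hypothetical cluster of near-identical MAGs (>= 99% ANI)
cluster = [
    {"id": "bin.1", "completeness": 96.0, "contamination": 4.0},
    {"id": "bin.2", "completeness": 92.0, "contamination": 1.0},
]
print(pick_representative(cluster)["id"])
```

Here bin.2 is selected (score 87.0 versus 76.0), showing how the contamination penalty can outweigh a higher completeness.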

Taxonomic annotation and gene prediction of MAGs

Taxonomic classification of the MAGs was conducted using GTDB-Tk (v2.3.2)49 based on the Genome Taxonomy Database (GTDB; release 214). Open reading frames (ORFs) were predicted with Prodigal (v2.6.3)50 using default parameters.

Functional annotation

ARGs were identified by aligning predicted protein sequences against the CARD database (v3.1.4)51 using RGI (v5.2.1)51 with the command “rgi main --input_type protein --alignment_tool DIAMOND”. Only hits classified as “Perfect” or “Strict” were retained for downstream analyses. Genes conferring resistance to more than two antibiotic classes were classified as multi-drug resistance genes, while those linked to more than two resistance mechanisms were defined as multi-mechanism resistance genes. VFGs were detected by aligning protein sequences against the VFDB (v2023723)52 using DIAMOND (v2.1.8.162)53 with the parameters “--min-score 60 --id 80 --query-cover 80 --evalue 1e-5”6. MGEs were identified by BLASTN (v2.13.0) searches against a customized MGE database curated by Pärnänen et al.54, with alignment thresholds set to an E-value ≤ 1e−5, minimum 80% identity, and at least 80% query coverage. To estimate the relative abundances of ARGs, VFGs, and MGEs at the read level, 20 million clean reads were randomly subsampled from each metagenomic dataset and aligned to the gene set derived from all high- and medium-quality MAGs using Bowtie2 (v2.5.0)39 under default settings. Gene relative abundance was quantified as reads per kilobase per million mapped reads (RPKM), which normalizes for gene length and sequencing depth, using SAMtools and custom in-house scripts. ARG-, VFG-, and MGE-associated genes were then extracted from this gene-level relative abundance table to generate read-level relative abundance profiles for each functional category.
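For reference, RPKM follows the standard definition: reads mapped to a gene, divided by the gene length in kilobases and by the total mapped reads in millions. The sketch below implements that definition and sums it per functional category; it is a generic illustration, not the in-house scripts mentioned above, and the function names and input layout are hypothetical.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def category_profile(gene_counts, gene_lengths, gene_category, total_mapped_reads):
    """Sum RPKM over genes belonging to each category (e.g. an ARG drug class)."""
    profile = {}
    for gene, count in gene_counts.items():
        category = gene_category.get(gene)
        if category is None:
            continue  # gene is not an ARG, VFG, or MGE
        value = rpkm(count, gene_lengths[gene], total_mapped_reads)
        profile[category] = profile.get(category, 0.0) + value
    return profile


# Hypothetical counts for one sample
counts = {"geneA": 100, "geneB": 50, "geneC": 10}
lengths = {"geneA": 1000, "geneB": 2000, "geneC": 500}
categories = {"geneA": "tetracycline", "geneB": "tetracycline"}
print(category_profile(counts, lengths, categories, 10_000_000))
```

With 10 million mapped reads, geneA contributes 10 RPKM and geneB 2.5 RPKM, giving 12.5 RPKM for the tetracycline category; geneC is ignored because it carries no annotation.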

Identification and processing of viral sequences

To identify viruses potentially involved in the dissemination of AMR, established methods were used to screen contigs >5000 bp from 2709 MAGs carrying ARGs55,56. Initially, CheckV (v1.0.1)57 was used to assess the ratio of viral to host genes. Contigs containing more than 10 host genes or where host genes outnumbered viral genes by more than fivefold were excluded. Proviral fragments were also identified using CheckV. Next, multiple viral detection strategies were employed, including (1) viral gene enrichment as determined by CheckV, (2) identification by DeepVirFinder (v1.0.19)58 with a score >0.90 and p-value < 0.01, and (3) viral classification by VIBRANT (v1.2.1)59 using default parameters. Contigs meeting any of these criteria were retained as putative viral sequences. To remove potential bacterial contamination, BUSCOs60 were used in combination with hmmsearch to detect bacterial single-copy orthologs within the viral candidates. The BUSCO ratio (calculated as the number of BUSCOs divided by total gene count) was computed, and sequences with a ratio of 5% or higher were excluded. The remaining sequences underwent quality assessment with CheckV, and only viral genomes with medium or higher completeness were retained for downstream analyses.
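
The successive filters described above can be summarized in a short Python sketch. It assumes that the outputs of CheckV, DeepVirFinder, VIBRANT, and the BUSCO search have already been parsed into one record per contig. The `Candidate` fields are hypothetical, the CheckV enrichment call is taken as a precomputed flag because its exact rule is not specified here, and the quality labels are assumed to follow CheckV's tier names.

```python
from dataclasses import dataclass

# Quality tiers counted as "medium or higher" (assumed CheckV label spellings).
ACCEPTED_QUALITY = {"Medium-quality", "High-quality", "Complete"}


@dataclass
class Candidate:
    length_bp: int
    viral_genes: int       # genes CheckV flags as viral
    host_genes: int        # genes CheckV flags as host
    total_genes: int
    busco_hits: int        # bacterial single-copy orthologs detected
    checkv_enriched: bool  # criterion 1, precomputed upstream
    dvf_score: float       # DeepVirFinder score
    dvf_pvalue: float      # DeepVirFinder p-value
    vibrant_viral: bool    # criterion 3
    checkv_quality: str


def is_retained(c: Candidate) -> bool:
    """Apply the length, host-gene, detection, BUSCO, and quality filters."""
    if c.length_bp <= 5000:
        return False
    # Exclude contigs with >10 host genes or host genes > 5x viral genes.
    if c.host_genes > 10 or c.host_genes > 5 * c.viral_genes:
        return False
    # A contig qualifies if any one of the three detection criteria is met.
    detected = (c.checkv_enriched
                or (c.dvf_score > 0.90 and c.dvf_pvalue < 0.01)
                or c.vibrant_viral)
    if not detected:
        return False
    # BUSCO ratio >= 5% suggests bacterial contamination (integer comparison).
    if c.total_genes > 0 and c.busco_hits * 100 >= 5 * c.total_genes:
        return False
    return c.checkv_quality in ACCEPTED_QUALITY
```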

Viral clustering and representative sequence selection

To eliminate redundancy among viral sequences, viral operational taxonomic units (vOTUs) were constructed following a systematic workflow. First, all viral sequences underwent pairwise alignment using BLASTN with the parameters set to an e-value of 1e−10, a word size of 20, and allowing up to 99,999 alignments. Sequences sharing 95% or greater nucleotide identity over at least 70% of their aligned region were clustered into the same vOTU. Within each vOTU cluster, the longest sequence was chosen as the representative sequence for subsequent analyses.
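
One way to picture this clustering step is a greedy, centroid-based scheme in which sequences are processed from longest to shortest, so that each cluster's seed is automatically its longest member. The clustering algorithm itself is not specified above, so the greedy strategy in the following Python sketch is an assumption, as are the function name and the input layout (precomputed identity and coverage percentages per sequence pair).

```python
from typing import Dict, List, Tuple


def cluster_votus(lengths: Dict[str, int],
                  hits: Dict[Tuple[str, str], Tuple[float, float]],
                  min_identity: float = 95.0,
                  min_coverage: float = 70.0) -> Dict[str, List[str]]:
    """Greedy clustering; returns {representative: [members]}.

    `hits` maps a sequence pair to (percent identity, percent coverage),
    both assumed to be derived from the all-vs-all BLASTN output.
    """
    # Longest first, ties broken by name for reproducibility.
    order = sorted(lengths, key=lambda s: (-lengths[s], s))
    clusters: Dict[str, List[str]] = {}
    for seq in order:
        for rep in clusters:
            pair = hits.get((seq, rep)) or hits.get((rep, seq)) or (0.0, 0.0)
            identity, coverage = pair
            if identity >= min_identity and coverage >= min_coverage:
                clusters[rep].append(seq)
                break
        else:
            # No existing representative matched: seed a new vOTU.
            clusters[seq] = [seq]
    return clusters
```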

Taxonomic clustering of viral populations

To assign vOTUs to genus- and family-level taxonomic groups, pairwise protein sequence comparisons were performed using DIAMOND with an e-value cutoff of 1e−5 and a maximum target sequence count of 99,999. From these alignments, the proportion of shared genes and the average amino acid identity (AAI) between vOTU pairs were calculated. Following the criteria established by Nayfach et al. 61, hierarchical clustering was conducted using the Markov Cluster Algorithm. For family-level clustering, connections with AAI > 20% and shared gene content above 10% were retained, applying an inflation parameter of 1.2. For genus-level clustering, more stringent thresholds were used, requiring AAI > 50% and shared gene content exceeding 20%, with an inflation parameter of 2.
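
The edge-filtering step that precedes Markov clustering can be sketched as follows. The thresholds are those given above; the table layout, the function name, and the tuple format are hypothetical, and the clustering itself is left to the external MCL program.

```python
from typing import Dict, List, Tuple

# level -> (minimum AAI %, minimum shared genes %, MCL inflation parameter)
THRESHOLDS: Dict[str, Tuple[float, float, float]] = {
    "family": (20.0, 10.0, 1.2),
    "genus": (50.0, 20.0, 2.0),
}


def retained_edges(pairs: List[Tuple[str, str, float, float]],
                   level: str) -> List[Tuple[str, str]]:
    """Keep vOTU pairs whose AAI and shared gene content exceed the cutoffs.

    Each input tuple is (votu_a, votu_b, aai_percent, shared_gene_percent).
    """
    min_aai, min_shared, _inflation = THRESHOLDS[level]
    return [(a, b) for a, b, aai, shared in pairs
            if aai > min_aai and shared > min_shared]
```

A pair with 35% AAI and 15% shared genes would therefore be connected in the family-level graph but not in the genus-level graph.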

Viral taxonomic annotation

Taxonomic annotation of viral genomes was conducted through protein sequence alignments. Predicted proteins were compared against a comprehensive reference database, including the Virus-Host DB (May 2024 version), crAss-like viral proteins62, and other published viral protein datasets. Alignments were performed using DIAMOND with parameters set to a minimum identity of 30%, subject coverage of 50%, query coverage of 50%, and a minimum score of 50. For viral genomes with fewer than 30 genes, classification to a viral family required at least 20% of the encoded proteins to match proteins from the same family. For genomes containing 30 or more genes, at least 10 proteins were required to match the same viral family for taxonomic assignment55.
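
The two-tier family assignment rule can be expressed compactly. The Python sketch below assumes that each protein's best hit has already been reduced to a family label; the function name and the tie-breaking behavior (the first most common family wins) are assumptions made for illustration.

```python
from collections import Counter
from typing import List, Optional


def assign_family(n_genes: int, hit_families: List[str]) -> Optional[str]:
    """Return the assigned viral family, or None if support is insufficient.

    `hit_families` holds one family label per protein with a qualifying hit.
    """
    if n_genes <= 0 or not hit_families:
        return None
    family, count = Counter(hit_families).most_common(1)[0]
    if n_genes < 30:
        # Small genomes: at least 20% of proteins must match one family.
        supported = count * 100 >= 20 * n_genes
    else:
        # Larger genomes: at least 10 proteins must match one family.
        supported = count >= 10
    return family if supported else None
```

Under this rule, a 20-gene genome needs four proteins matching the same family, while a 40-gene genome needs ten.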

Correlation analysis

Correlation analyses were conducted using R (v4.4.1). Procrustes analysis was performed with the procrustes function from the ‘vegan’ package. Spearman’s rank correlation coefficient was applied to evaluate the relationships between ARGs and VFGs, as well as between ARGs and MGEs, considering both diversity and abundance profiles.
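
The analyses above were run in R. For readers less familiar with the statistic, the following dependency-free Python sketch shows how Spearman's coefficient is obtained as the Pearson correlation of average ranks. It is an illustration only, not the code used in this study, and the function names are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5
```

Because only ranks are used, any monotonic relationship between, for example, ARG and MGE abundance yields a coefficient of 1 or −1 regardless of its shape.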

Comparative analysis of ARGs in Caprinae and human gut microbiota

To assess similarities and differences in ARG profiles between Caprinae animals and humans, a comparative analysis was performed using a publicly available human gut microbiome dataset. A total of 60,664 human-derived MAGs were obtained from the Integrated Gut Genomes (IGG) database63. All human MAGs underwent consistent downstream processing, including taxonomic classification, gene prediction, quality assessment, and functional annotation. Applying the same quality criteria as for Caprinae MAGs (completeness ≥ 50%, contamination ≤ 10%, and 99% ANI dereplication), a total of 36,467 human-derived MAGs were retained, including 12,909 high-quality MAGs (35.40%) and 23,558 medium-quality MAGs (64.60%) (Supplementary Data 18). ARG annotation was performed uniformly across both datasets. Using these annotations, we systematically compared ARG counts across gut microbiota from Caprinae animals and humans, with special attention to shared and host-specific distributions of ARG-associated drug classes and their differences in gene numbers.

Identification of clinically critical ARGs

We collected ARGs potentially conferring resistance to clinically important antibiotics, including tigecycline, vancomycin, polymyxins, and β-lactams, from the NCBI database (Supplementary Data 14). In total, 113 ARG sequences representing 80 distinct ARGs were retrieved. The ARG sequences shared between Caprinae and human gut microbiota were then aligned against these ARGs using BLASTN, with thresholds of identity ≥ 90%, coverage ≥ 70%, and E-value ≤ 1e−10.
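
A minimal Python sketch of how such thresholds might be applied to tabular BLAST output is given below. It assumes the standard 12-column tabular format (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and interprets coverage as the aligned fraction of the query. Both are assumptions, since the output format and the coverage definition are not stated above, and the function name is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def filter_hits(lines: Iterable[str],
                query_lengths: Dict[str, int],
                min_identity: float = 90.0,
                min_coverage: float = 70.0,
                max_evalue: float = 1e-10) -> List[Tuple[str, str]]:
    """Return (query, subject) pairs passing all three thresholds."""
    kept: List[Tuple[str, str]] = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        qseqid, sseqid = fields[0], fields[1]
        identity = float(fields[2])
        qstart, qend = int(fields[6]), int(fields[7])
        evalue = float(fields[10])
        # Query coverage in percent (assumed definition of "coverage").
        coverage = 100.0 * (abs(qend - qstart) + 1) / query_lengths[qseqid]
        if (identity >= min_identity and coverage >= min_coverage
                and evalue <= max_evalue):
            kept.append((qseqid, sseqid))
    return kept
```

With the thresholds changed to 80% identity, 80% coverage, and an E-value of 1e−5, the same function would reproduce the style of filter described for the MGE search.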

Statistics and reproducibility

Genome structure visualization and target gene annotation were performed using the CGView platform (https://proksee.ca/). Gene arrow diagrams were created with the ‘gggenes’ package (v0.5.1), and network visualizations were generated using the ‘ggraph’ package (v2.1.0). Venn diagrams were plotted with the ‘ggvenn’ package (v0.1.10), while all other figures were produced using the ‘ggplot2’ package (v3.3.6). All statistical analyses were carried out in the R programming environment (v4.4.1).

Ethics declarations

Ethics approval was not required for this study as it exclusively utilized publicly available metagenomic data.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.