Introduction

Fermented foods have a long history of consumption in China and remain an integral part of the contemporary diet. In recent years, fermented foods have attracted growing consumer interest, driven by their perceived health benefits and a rising body of research linking gut microbiota to human health1,2,3. Fermented foods offer a potential means to replenish “missing microbes” depleted by industrialized or Westernized dietary patterns4. Accumulating evidence indicates that fermented foods can directly or indirectly influence the composition and activity of gut microbiota, resulting in measurable impacts on human health, including improved gut barrier function and modulation of the immune system5,6. In 2019, the International Scientific Association for Probiotics and Prebiotics (ISAPP) recommended including fermented foods in dietary guidelines, citing their significant content of live and potentially health-promoting microorganisms7. Studies have shown that microorganisms present in fermented foods can survive gastric transit and reach the colon8,9. A notable study demonstrated that food-associated lactic acid bacteria were significantly represented in the human fecal metagenome following fermented food consumption10. Upon entering the gastrointestinal tract, these microorganisms can transiently colonize, synthesize bioactive compounds that inhibit enteric pathogens7, and mediate regulatory effects on the intestinal epithelium11. Long-term consumption of fermented foods may reinforce these dynamic interactions, further underscoring their relevance to human health.

Despite the documented health benefits associated with microorganisms in fermented foods, significant safety concerns remain. Fermented foods can harbor foodborne pathogens, such as Salmonella spp. and Listeria monocytogenes12, some of which may be introduced during the production process13,14. Moreover, fermented foods have historically served as important vectors for the transmission of antibiotic resistance genes via plasmids and other mobile genetic elements15,16. Bacteria isolated from fermented foods carrying mobile antibiotic resistance genes can transfer these elements to human commensal and pathogenic microbiota through horizontal gene transfer17. Recent research indicated that dietary interventions involving fermented foods can significantly increase antibiotic resistance within the gut microbiota of most individuals15. The public health risks associated with fermented food consumption are likely underestimated, particularly among vulnerable populations, such as those with gastrointestinal disorders or compromised immune function. These considerations highlight the need for a comprehensive assessment of both the benefits and potential risks of fermented food consumption, as well as the importance of enhanced food safety monitoring.

Fermented soybean products are widely consumed in China, where traditional fermentation techniques, often involving spontaneous or minimally controlled microbial processes, are deeply rooted in cultural and culinary practices18,19. Key examples include fermented tofu (fu ru, FR), soybean paste (dou jiang, DJ), and fermented black beans (dou chi, DC), which are central to Chinese cuisine. Fermentation of soybean produces numerous functional and bioactive compounds that may offer health benefits, including antioxidative, anti-inflammatory, and blood sugar- and lipid-lowering effects20,21,22. Previous studies examining the microbiota of these fermented foods have predominantly focused on fermentation microorganisms, with limited investigation into their broader microbial profiles and potential impacts on gut health. Although a comprehensive review has highlighted safety concerns associated with fermented foods in China23, most microbiota studies to date have relied on 16S rRNA gene amplicon sequencing, which generally offers resolution only at the genus level and limited insight into functional potential. In contrast, shotgun metagenomic sequencing enables species- and strain-level taxonomic resolution and permits genome assembly, providing a deeper understanding of microbial composition and functionality within food matrices. This approach offers new opportunities to illuminate the complex interactions between fermented soy products and human gut health.

In this study, we applied shotgun metagenomic sequencing to characterize the microbial and functional compositions of three commonly consumed fermented soy products in China. By comparing the microbiota and functional profiles across these products, we aimed to delineate product-specific microbial characteristics. Furthermore, through comparative analysis with the gut microbiota of the Chinese population, our study pursued two objectives: (i) to explore the relationship between microorganisms in fermented soy products and the gut microbiota of the Chinese population; and (ii) to investigate the potential transfer of antibiotic resistance genes from fermented foods to gut microbes. Our findings provide detailed insights into the microbiota of fermented soy products, shedding light on potential public health risks associated with their consumption.

Results

Microbial composition in three fermented soy products

A total of 93 fermented soy products (DC, DJ and FR) were purchased from online and offline supermarkets across 20 provinces. All samples underwent shotgun metagenomic sequencing, generating a total of 203.4 Gbp of data, with an average of 12,691,369 reads per sample (Supplementary Table S1).

Compared to DC and FR, DJ samples exhibited a significantly higher abundance of eukaryotic organisms (Fig. 1A), likely due to its two-stage fermentation process, which involves mold-driven koji preparation followed by yeast proliferation during brine fermentation24,25. To minimize false positives while maintaining sensitivity for rare taxa, species with more than 500 assigned reads in at least 10 samples per group were retained for further analysis (Kraken2-derived taxonomic abundances are provided in Supplementary Table S2). Notably, DJ samples contained 10 distinct eukaryotic species, while DC and FR samples had only three and one identified species, respectively (Supplementary Fig. S1A). All three fermented foods shared a single eukaryotic species, Aspergillus oryzae, which was highly prevalent in each (Supplementary Fig. S1B), consistent with its established role as a starter culture in Chinese fermentations26.
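
The retention rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: the function name retain_species, the input layout (a dictionary of per-sample read counts and a dictionary of sample-to-group labels) and the toy data are all assumptions made for the example.

```python
from collections import defaultdict


def retain_species(counts, groups, min_reads=500, min_samples=10):
    """Return, per food group, the species that pass the prevalence filter.

    counts: {sample_id: {species_name: assigned_read_count}}  (hypothetical layout)
    groups: {sample_id: group_label}, e.g. "DC", "DJ" or "FR"
    A species is kept for a group when it has MORE than `min_reads` reads
    in AT LEAST `min_samples` samples of that group.
    """
    # (group, species) -> number of samples exceeding the read threshold
    hits = defaultdict(int)
    for sample, species_counts in counts.items():
        group = groups[sample]
        for species, reads in species_counts.items():
            if reads > min_reads:
                hits[(group, species)] += 1

    retained = defaultdict(set)
    for (group, species), n_samples in hits.items():
        if n_samples >= min_samples:
            retained[group].add(species)
    return dict(retained)


if __name__ == "__main__":
    # Toy data: 12 DJ samples; "Species X" exceeds 500 reads in only 9 of them.
    toy_counts = {
        f"dj{i}": {"Aspergillus oryzae": 600, "Species X": 600 if i < 9 else 100}
        for i in range(12)
    }
    toy_groups = {sample: "DJ" for sample in toy_counts}
    print(retain_species(toy_counts, toy_groups))
```

Applying the threshold within each group, rather than across all samples, keeps taxa that are consistently present in only one product type while still discarding sporadic low-count assignments.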

Fig. 1: Distinct microbial composition of DC, DJ and FR.

A Relative abundance at the Domain, Phylum, Order, and Genus level. B Alpha diversity comparison using the Shannon index. C Non-metric multidimensional scaling (NMDS) plot showing distinct microbial compositions. D Mantel test assessing the relationship between geographical distance and Bray–Curtis distance of microbiome. Euclidean distance was used for geographical distance (green circles), and Bray–Curtis distance for gut microbiome (purple circles). E LEfSe results at Phylum, Order, and Species level (LDA score >4). F Presence of foodborne pathogens in fermented soy products (read count >1000 as cut-off).

Regarding bacterial composition, the dominant phyla found in DC, FR, and DJ samples were Firmicutes (80.9%, 48.8%, 46.2%, respectively) and Proteobacteria (9%, 41.1%, 25.3%, respectively) (Fig. 1A). In DC, Bacillales was the dominant order (66%), whereas Lactobacillales was most prevalent in FR and DJ samples (39.9% and 30.1%, respectively), and was also the second most abundant order in DC (14.4%) (Fig. 1A). Notably, Lactobacillales encompasses a diverse group of lactic acid bacteria that play crucial roles in food production, fermentation, and development of probiotic products. The primary genera in DC were Bacillus (50.3%), Staphylococcus (5.8%), Tetragenococcus (5.31%), and Weissella (2.46%). In DJ samples, dominant genera included Staphylococcus (9.38%), Tetragenococcus (6.60%), Weissella (5.49%), and Leuconostoc (3.56%). FR samples primarily contained Tetragenococcus (15.4%), Enterobacter (9.19%), Lactococcus (8.04%), Pseudomonas (7.22%), and Leuconostoc (5.33%) (Fig. 1A). FR had the highest proportion of identified bacterial species (76.2% of total bacterial species), with 52.9% exclusive to FR samples (Supplementary Fig. S1A). Genera found in over 90% of samples from each group included Enterococcus, Lactococcus, Leuconostoc, Tetragenococcus, and Weissella, all lactic acid bacteria belonging to the order Lactobacillales within the class Bacilli.

Alpha diversity analysis revealed that the DC group exhibited the lowest species diversity among the three fermented soy products (Shannon index, Fig. 1B; richness indices Observed and Chao1 in Supplementary Fig. S1C, Supplementary Table S3). This observation is likely due to the predominantly solid-state nature of DC samples27. To investigate the microbial community structures, non-metric multidimensional scaling (NMDS) based on Bray–Curtis dissimilarity was performed. The NMDS plot showed significant differences in microbial communities across the three fermented foods (Fig. 1C), with PERMANOVA indicating that food type explained 19% of the total variability (R2 = 0.19, p = 0.001). Notably, geographic origin accounted for an even larger proportion of dissimilarity (R2 = 0.26, p = 0.001). These results indicate that microbial composition and structure in fermented soy products are influenced by both food type and geographic origin. A plausible hypothesis is that proximity between regions results in similar fermentation practices. To test this hypothesis, the Mantel test was applied to assess the correlation between geographical distance and Bray–Curtis dissimilarity of the microbiome (Supplementary Table S4). The results of the Mantel test, whether applied to the entire sample set (r = 0.092, p = 0.0045) or to subgroups of samples, supported this relationship (Fig. 1D). This indicated a weak but statistically significant positive correlation between the geographic distance and Bray–Curtis dissimilarity matrices (Supplementary Fig. S1D).
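
For readers less familiar with these statistics, the sketch below shows how the Shannon index, Bray–Curtis dissimilarity and a permutation-based Mantel test can be computed. It is an illustrative standard-library implementation under stated assumptions, not the software used in the study: the Mantel statistic is taken to be the Pearson correlation of the upper triangles, the p-value is one-sided, and all function names and the permutation count are choices made for the example.

```python
import math
import random


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors of equal length."""
    return sum(abs(a - b) for a, b in zip(u, v)) / (sum(u) + sum(v))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def upper_triangle(matrix, order):
    """Flatten the upper triangle after reordering rows and columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(d1, d2, permutations=999, seed=0):
    """One-sided Mantel test: correlate two distance matrices, permuting d2."""
    rng = random.Random(seed)
    identity = list(range(len(d1)))
    x = upper_triangle(d1, identity)
    r_obs = pearson(x, upper_triangle(d2, identity))
    at_least_as_large = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)  # relabel samples in the second matrix
        if pearson(x, upper_triangle(d2, perm)) >= r_obs:
            at_least_as_large += 1
    p_value = (at_least_as_large + 1) / (permutations + 1)
    return r_obs, p_value


if __name__ == "__main__":
    print(shannon([50, 30, 20]))
    print(bray_curtis([10, 0, 5], [2, 8, 5]))
```

Because the p-value depends only on how often a permuted correlation reaches the observed one, a small r (such as the 0.092 reported here) can still be statistically significant when the number of samples is large.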

Linear discriminant analysis Effect Size (LEfSe) was used to identify enriched microbial taxa in fermented foods. The results highlighted that Firmicutes and Proteobacteria were the dominant phyla in DC and FR samples, respectively, while DJ samples exhibited enrichment in others (Fig. 1E). Notably, 56, 89, and 115 species were enriched in DC, DJ, and FR samples, respectively (LDA score > 2). The most enriched species in each fermented food included fermentation starters: Bacillus subtilis (LDA = 5.07) in DC, Aspergillus oryzae (LDA = 4.88) in DJ, and Tetragenococcus halophilus (LDA = 4.81) in FR. B. subtilis was identified as a major component in several DC products, and other Bacillus species, such as B. velezensis, B. amyloliquefaciens, and B. licheniformis, are also commonly used in fermented soy products. These Bacillus species are known for producing bioactive compounds with health benefits28,29. FR samples were enriched with lactic acid bacteria, including Lactococcus lactis, Ligilactobacillus acidipiscis, Leuconostoc citreum, Leuconostoc lactis, and Leuconostoc mesenteroides (Supplementary Table S5).

A notable presence of opportunistic pathogenic species from the Proteobacteria phylum and Enterobacteriaceae family was observed across all groups. These included Klebsiella pneumoniae, Klebsiella aerogenes, Klebsiella oxytoca, Klebsiella michiganensis and Enterobacter hormaechei enriched in FR, as well as Enterobacter cancerogenus in DJ. Using a read count threshold of >1000, we also detected several foodborne pathogens, including Bacillus cereus, Clostridium botulinum, Clostridium perfringens, Cronobacter sakazakii, Escherichia coli, Listeria monocytogenes, Salmonella enterica, Staphylococcus aureus, Vibrio anguillarum and Yersinia enterocolitica (Fig. 1F, pathogen abundance in Supplementary Table S6). Among these, Salmonella enterica was notably present in FR (LDA = 2.63).

Functional landscapes of the microbiome in fermented soy products

To better understand the functional potential of the microbiome in fermented soy products, sequenced reads were assembled into contigs, and predicted genes were dereplicated. This process generated the Soybean Fermented Food non-redundant Gene Catalogue (SFFGC, see Methods), which contained a total of 2,359,387 genes. Functional annotation was performed using multiple databases, including Clusters of Orthologous Genes (COG), Kyoto Encyclopedia of Genes and Genomes (KEGG), Enzyme Commission Categories (EC), Comprehensive Antibiotic Resistance Database (CARD), and Carbohydrate Active Enzymes (CAZy).

Overall, 84.4% of the SFFGC genes were successfully annotated with at least one database (Fig. 2A). Among the annotated genes, only 8.54% originated from DJ samples, 38.1% from DC, and 53.4% from FR, aligning with the richer microbial diversity observed in FR (76.2% contribution to the total detected species). In total, 11,930 KEGG Orthology (KO) groups and 327 CAZy gene families were identified. Functional abundances were calculated based on the cumulative abundance of SFFGC genes assigned to each functional element. Additionally, the abundance of 562 MetaCyc pathways was profiled using HUMAnN3. Bray–Curtis-based NMDS plots of KO profiles revealed significant functional differentiation among the three fermented food types (Fig. 2B).

Fig. 2: Functional landscapes of the fermented soy product (DC, DJ and FR) microbiomes and comparative analysis.

A UpSet plot of annotated gene counts (log10) across 5 databases. The number above each bar represents the total number of features in the corresponding intersection. B NMDS plot of Bray–Curtis dissimilarity computed from KO abundances. C Treemap of LEfSe analysis illustrating the number of food type-specific enriched functional elements across multiple categories, including KO, CARD, CAZy and MetaCyc pathways. D Bar plot showing food type-specific enriched CAZy gene counts across 6 CAZy gene families: glycoside hydrolases (GHs), glycosyltransferases (GTs), polysaccharide lyases (PLs), carbohydrate esterases (CEs), auxiliary activities (AAs) and carbohydrate-binding modules (CBMs). E Venn diagram depicting the distribution of annotated antibiotic resistance genes (ARGs) across three fermented soy products.

LEfSe analysis identified widespread functional enrichment across the fermented foods, with differentially abundant features comprising 7.4% of KO genes, 79.8% of CAZy families, and 21.5% of MetaCyc pathways (LDA score >2, Supplementary Table S7). Notably, FR was enriched for antibiotic resistance genes (28.34%), while DJ was most enriched for CAZy families (41.28%) (Fig. 2C). A high proportion of HUMAnN3-unassigned reads was observed in all fermented foods (DJ: 66.7%; DC: 40%; FR: 35.4%), suggesting that their microbial communities harbor a substantial reservoir of uncharacterized or poorly annotated metabolic functions absent from current reference databases. In addition, a significant increase in auxiliary activity (AA) and glycosyltransferase (GT) counts was observed among DJ-enriched CAZy gene families (Fig. 2D), likely reflecting microbial adaptations for degrading complex soybean substrates (AA) and synthesizing functional glycoconjugates (GT) during DJ’s multi-phase fermentation.

Regarding the resistome, 2406 distinct antibiotic resistance genes were identified and annotated with 734 terms from the Antibiotic Resistance Ontology (ARO). Notably, resistance-nodulation-cell division (RND) antibiotic efflux pump resistance genes were the most abundant (640 genes, 26.6%), followed by the major facilitator superfamily (MFS) antibiotic efflux pump genes (424 genes, 17.6%). Of the 734 ARO terms, 373 were shared across all three fermented soy products, each present in at least 10% of samples (Fig. 2E). LEfSe analysis (LDA score >2) revealed that 54.4% (399) of the 734 ARO terms exhibited significant variation across the different food types (Fig. 2C). The co-enrichment of both ARGs and Enterobacteriaceae species in FR is biologically consistent, as this bacterial family is a well-established reservoir of antibiotic resistance genes30.

Antagonistic activity of Lactobacillales or Bacillales against Enterobacterales

To investigate interspecies relationships, co-abundance microbial species networks were constructed for each food category. Only species present in at least five samples per group were included, with network inference performed using the SparCC algorithm and co-occurrence/co-exclusion relationships determined at FDR < 0.05. We identified 458 correlations in DC, 371 in DJ, and 401 in FR (DC: Fig. 3A, FR: Fig. 3B and DJ: Fig. 3C; correlation details provided in Supplementary Table S8). Species tended to form “bacterial cliques” based on taxonomic order affiliations.

Fig. 3: Antagonistic activity inferred from species interaction network.

A–C SparCC network plots showing co-abundance and co-exclusion correlations between species in DC, FR and DJ samples. Nodes represent species involved in significant co-abundance (red edges) or co-exclusion (blue edges) relationships. Node color indicates the taxonomic Order of each species. D Box plots illustrating the distribution of correlation coefficients between species, separated into intra-Order and inter-Order comparisons.

Competition between Lactobacillales and Enterobacterales was consistently observed across all three categories. In each group of fermented soy products, species belonging to the same taxonomic order demonstrated positive correlations, whereas species from different orders, such as Bacillales and Enterobacterales, or Enterobacterales and Lactobacillales, exhibited negative correlations. An exception was noted between Bacillales and Lactobacillales, which showed positive correlations in DC and DJ but negative correlations in FR (Fig. 3D). Species with a degree above the 80th percentile were designated as keystone species. In the DJ and FR networks, 50% (DJ) and 55.6% (FR) of the identified keystone species belonged to Enterobacterales (Supplementary Fig. S2), highlighting their pivotal role in the microbial ecosystem of fermented foods.
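
The keystone designation above is a degree-based cut-off on the correlation network. The sketch below illustrates one way to apply it. It is an assumption-laden example rather than the study's code: the edge list is hypothetical, the function name keystone_species is invented for illustration, and the 80th percentile is computed with the nearest-rank method, which may differ slightly from the percentile definition used by the authors.

```python
import math
from collections import Counter


def keystone_species(edges, percentile=80):
    """Return species whose network degree exceeds the given percentile.

    edges: iterable of (species_a, species_b) pairs, one per significant
           correlation (co-abundance or co-exclusion) in the network.
    """
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1

    ranked = sorted(degree.values())
    # Nearest-rank percentile: smallest value with at least `percentile`%
    # of the degrees at or below it (an assumed convention).
    rank = math.ceil(percentile / 100 * len(ranked))
    cutoff = ranked[rank - 1]
    return {species for species, d in degree.items() if d > cutoff}


if __name__ == "__main__":
    # Hypothetical toy network: one hub linked to five other species.
    toy_edges = [
        ("hub", "sp1"), ("hub", "sp2"), ("hub", "sp3"),
        ("hub", "sp4"), ("hub", "sp5"), ("sp1", "sp2"),
    ]
    print(keystone_species(toy_edges))
```

Degree counts every significant edge regardless of sign, so a species can be flagged as keystone because of many negative (co-exclusion) links as well as positive ones.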

Recovery of fermented food bacteria genomes from metagenomes

To better characterize the genomic distinctions among the three fermented soy products, we used the binning module in metaWRAP to generate metagenome-assembled genomes (MAGs) from assembled contigs. A total of 707 MAGs were recovered, each with ≥70% completeness (average: 89.08%) and <10% contamination (average: 2.5%), ensuring retention of only medium- to high-quality genomes (Supplementary Fig. S3A).

The median genome size of the 707 MAGs was 2.31 megabases (Mb) (2.08–3.75 Mb), with N50 values ranging from 1.7 kilobases to 2.8 Mb, indicating a continuum from fragmented to near chromosome-scale assemblies. Of the total dataset, 363 MAGs (51.3%) achieved >90% completeness and <5% contamination, with the majority assigned to the genera Bacillus (n = 98), Enterococcus (n = 56), Weissella (n = 51), Leuconostoc (n = 51) and Lactococcus (n = 41). Ten MAGs achieved 100% completeness, representing near-complete genomes, and included both beneficial (e.g., probiotic Leuconostoc falkenbergense) and pathogenic (e.g., Proteus mirabilis) species. Among the recovered MAGs, 36%, 12%, and 52% originated from DC, DJ and FR, respectively (Supplementary Table S9). The 707 MAGs were dereplicated at 95% average nucleotide identity (ANI), resulting in 304 non-redundant MAGs, which were used to construct the phylogenetic tree (FR: Fig. 4, DC: Supplementary Fig. S3B, DJ: Supplementary Fig. S3C).
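
Two of the quantities used above, N50 and the completeness/contamination tiers, are simple to compute once contig lengths and quality estimates are available. The following sketch is illustrative only: the function names and tier labels are assumptions, the thresholds are those quoted in the text, and completeness and contamination are assumed to be percentages supplied by an external quality-estimation tool.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("contig_lengths must not be empty")


def quality_tier(completeness, contamination):
    """Classify a MAG using the thresholds quoted in the text (values in %)."""
    if completeness > 90 and contamination < 5:
        return "high"
    if completeness >= 70 and contamination < 10:
        return "medium"
    return "rejected"


if __name__ == "__main__":
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))   # toy contig lengths
    print(quality_tier(89.08, 2.5))        # the average values reported above
```

A MAG with the reported average completeness (89.08%) and contamination (2.5%) falls in the medium tier under these thresholds, which is consistent with only about half of the recovered genomes reaching the stricter >90% / <5% criteria.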

Fig. 4: Phylogenetic tree of dereplicated metagenome-assembled genomes (MAGs) from FR samples.

Progressing outward from the center, concentric rings indicate the food origin of each MAG, the metaProbiotics score, and whether the MAG remains unclassified (outermost ring).

The 707 recovered MAGs comprised 2 archaea and 705 bacterial genomes. The majority of bacterial genomes belonged to the phyla Firmicutes (n = 533, 76%), Proteobacteria (n = 87, 12%), Actinobacteriota (n = 50, 7%) and Bacteroidota (n = 30, 4%). A substantial proportion of MAGs (n = 648 or 91.6%) were successfully assigned to 184 known species, while 59 MAGs (8.4%) remained unclassified (Supplementary Fig. S3D). These unclassified MAGs may represent novel microbial lineages or understudied taxa specific to traditional fermentation environments, highlighting gaps in current reference databases.

To assess overlap with previously reconstructed food microbiomes, we compared our MAGs against 666 food-associated MAGs from Pasolli et al.10, using a 95% ANI threshold, corresponding to intraspecies-level similarity. The 666 food MAGs were reconstructed from 303 metagenomes across 12 datasets, predominantly representing Western fermented foods. Only 69 (22.7%) of our dereplicated MAGs matched 275 (41.3%) of the Pasolli et al. MAGs, suggesting that Chinese fermented foods harbor distinct microbial populations not commonly found in Western counterparts.
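
The overlap figures above follow from pairwise ANI hits between the two MAG sets. As a minimal sketch, assuming the hits are available as (query MAG, reference MAG, ANI) tuples, the matched genomes on each side and the corresponding percentages can be derived as follows. The function names and the toy identifiers are hypothetical, and the input layout is an assumption rather than the output format of any specific tool.

```python
def ani_overlap(hits, threshold=95.0):
    """Collect MAGs on each side with at least one hit at or above the ANI threshold.

    hits: iterable of (query_mag, reference_mag, ani_percent) tuples.
    Returns (matched_query_mags, matched_reference_mags) as sets.
    """
    matched_queries, matched_refs = set(), set()
    for query, ref, ani in hits:
        if ani >= threshold:
            matched_queries.add(query)
            matched_refs.add(ref)
    return matched_queries, matched_refs


def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


if __name__ == "__main__":
    toy_hits = [
        ("food_mag_1", "ref_mag_A", 97.2),
        ("food_mag_1", "ref_mag_B", 95.0),
        ("food_mag_2", "ref_mag_A", 93.9),  # below threshold, ignored
    ]
    print(ani_overlap(toy_hits))
    # Counts reported in the text: 69 of 304 and 275 of 666.
    print(percent(69, 304), percent(275, 666))
```

One query MAG can match several reference MAGs, which is why the number of matched genomes differs between the two sides (69 versus 275 in the text).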

Potential probiotic candidates were predicted using metaProbiotics31, which applies natural language processing (NLP) techniques to DNA 8-mers and builds random forest models for probiotic prediction. Based on a metaProbiotics score threshold >0.5, 138 of the 304 dereplicated MAGs were classified as potential probiotics (Fig. 4). Among these, 53.1% belonged to the order Lactobacillales, predominantly Leuconostoc (12.7%), Weissella (11.9%), Tetragenococcus (10.4%), and Lactococcus (6.5%), and 22.0% to Bacillales, primarily Bacillus (21.5%) with minor representation of Priestia (0.5%). Thirteen species, including Leuconostoc mesenteroides, Pediococcus acidilactici, Pediococcus pentosaceus, Leuconostoc citreum, Lactococcus lactis, Xanthomonas campestris, Priestia megaterium, Lapidilactobacillus dextrinicus, Priestia megaterium, Lactobacillus delbrueckii, Streptococcus thermophilus, Acidipropionibacterium acidipropionici and Limosilactobacillus panis, were listed on the Qualified Presumption of Safety (QPS) list32 (Supplementary Table S9), collectively highlighting fermented foods as a reservoir of functionally diverse probiotic candidates.

Among the 2406 annotated antibiotic resistance genes (ARGs), 721 were detected in 209 MAGs, with 342 ARGs (47.43%) found in MAGs of the order Enterobacterales. The five most prevalent ARG-harboring species were Klebsiella pneumoniae (32 instances), Enterobacter hormaechei_A (31 instances; the “_A” suffix is a GTDB-Tk subspecies designation), Acinetobacter baumannii (28 instances), Enterobacter cloacae (27 instances), and Enterobacter kobei (26 instances). Notably, ARG families associated with Enterobacterales included resistance-nodulation-cell division antibiotic efflux pumps and major facilitator superfamily antibiotic efflux pumps (Supplementary Fig. S4). These findings suggest that Enterobacterales species serve as a principal reservoir of antimicrobial resistance in fermented soy products.

Comparative analysis suggests Klebsiella strains in the gut of Chinese individuals may originate from fermented foods

To investigate the prevalence of fermented food-derived bacteria within the gut microbiome of healthy Chinese individuals, we analyzed the taxonomic profiles of 689 individuals from 8 publicly available Chinese metagenomic datasets curated in curatedMetagenomicData33. For consistency, taxonomic profiling of our fermented food metagenomic data was also performed using MetaPhlAn3. Only species present in more than 50% of samples within each food group were included, resulting in a total of 67 species for comparative analysis.

The results revealed that the most prevalent species belonged to the order Enterobacterales, including three Klebsiella species: Klebsiella pneumoniae (62.4%), Klebsiella variicola (50.5%) and Klebsiella quasipneumoniae (42.5%). Additionally, members of the Enterobacter cloacae complex (16.9%) and Citrobacter freundii (12.04%) were prominent (Fig. 5A).

Fig. 5: Comparison analysis between fermented food microbiota and the gut microbiota of healthy Chinese individuals.

A Heatmap of microbial species shared across all three fermented soy products, with adjacent bar plots showing their prevalence across 689 gut metagenomes from Chinese individuals. B Overlap between MAGs from fermented foods and Chinese gut MAGs, based on fastANI (≥95% ANI threshold for species-level match). The x-axis denotes eight Chinese gut metagenomic datasets sourced from curatedMetagenomicData. C Phylogenetic reconstruction using StrainPhlAn of Klebsiella quasipneumoniae and Klebsiella pneumoniae strains from fermented foods (DC, DJ, FR) and Chinese gut samples. Strains that cluster within the same clade or are separated by short phylogenetic distances indicate potential strain-level transmission from food to gut.

Among lactic acid bacteria, Weissella confusa (7.55%) and Lactococcus lactis (4.35%) were the most common. Notably, the prevalence of Lactococcus lactis in the Chinese population was lower than the global average of 7.5% reported by Pasolli et al.10. To assess potential associations between Enterobacterales and lactic acid bacteria in the gut microbiome, samples containing both K. pneumoniae and either W. confusa or L. lactis were selected. Chi-squared tests demonstrated a mutually exclusive relationship between Weissella confusa/Lactococcus lactis and Klebsiella pneumoniae (both p-values < 2.2e−16), supporting findings from the network analysis that suggested an antagonistic relationship between these groups.
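
A chi-squared test of this kind operates on a 2×2 presence/absence table for two species across samples. The sketch below shows the calculation using only the Python standard library. It is an illustration under stated assumptions: no continuity correction is applied (statistical packages often apply Yates' correction to 2×2 tables by default, so their p-values may differ), the function name is invented for the example, and the counts in the usage example are hypothetical, not data from the study.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test for a 2x2 table, without continuity correction.

    Table layout (counts of samples):
                      species B present   species B absent
    species A present         a                  b
    species A absent          c                  d
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


if __name__ == "__main__":
    # Hypothetical counts: the two species rarely occur in the same sample.
    stat, p = chi2_2x2(2, 48, 45, 5)
    print(f"chi2 = {stat:.2f}, p = {p:.3g}")
```

A significant result only indicates that co-occurrence departs from independence. Whether the departure reflects exclusion or association has to be read from the direction of the deviation, that is, whether the observed co-occurrence count is below or above its expected value.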

To explore bacterial relationships at the genomic level, we retrieved 24,417 gut MAGs from Chinese individuals from the 154,723 human MAGs assembled by Pasolli et al.34. Using fastANI (95% ANI threshold for intraspecies identification), we identified 203 matches between the 303 food-derived MAGs and gut MAGs. Notably, MAGs annotated as K. pneumoniae had the highest number of matches (Fig. 5B), suggesting a close genomic relationship between strains found in fermented foods and those present in the gut. Additionally, potentially pathogenic species were more prevalent than putative beneficial species among gut MAGs. Only two probiotic species, Weissella confusa and Streptococcus thermophilus, matched four and two gut MAGs, respectively. In contrast, we detected multiple pathogenic species, including Proteus mirabilis, a widespread pathogen associated with various infections35, and Morganella morganii, a species enriched in the gut microbiota of individuals with inflammatory bowel disease (IBD) and colorectal cancer (CRC)36.

Finally, to investigate relationships between fermented food and gut microbial species at the strain level, StrainPhlAn337 was used to reconstruct the phylogeny of species common to both fermented foods and gut samples of Chinese individuals. Only two phylogenetic trees, those of Klebsiella quasipneumoniae and Klebsiella pneumoniae, showed terminal branches representing lineages found in both food and gut samples (Fig. 5C). The phylogenetic tree for K. pneumoniae displayed a distinct branch specific to gut samples. However, several gut strains were interspersed among lineages predominantly populated by fermented food strains, suggesting potential transmission of K. pneumoniae from fermented foods to the human gut.

Potential horizontal gene transfer of antibiotic resistance genes from fermented foods to the human gut microbiome

We conducted a comprehensive analysis to assess the potential risk of ARGs in fermented foods being transmitted to the human gut microbiota. A blastp search was performed using the 2406 ARG protein sequences identified from the three fermented soy products against 216,849 ARGs from the human microbiome, curated by Lee et al.38.

This analysis revealed 898 matches between fermented food-derived ARGs and 4518 analogous ARGs in the human gut microbiome, of which 4219 hits (93.38%) were derived from fecal metagenomes. The matched ARGs spanned 140 ARG families, representing 32.4% of the known human gut ARG repertoire. Given that Lee et al. used the same Species-Level Genome Bins (SGBs) dataset curated by Pasolli et al. (http://segatalab.cibio.unitn.it/data/Pasolli_et_al.html), we further explored potential ARG transfer pathways by linking fermented food-derived ARGs to Chinese SGB genomes (Supplementary Table S10). Seventeen known species in fermented foods were identified as potential ARG donors, predominantly from the family Enterobacteriaceae (e.g., Klebsiella pneumoniae, Enterobacter hormaechei_A and Enterobacter cloacae). Notably, ARGs were also found in non-Enterobacteriaceae probiotic species such as Lactococcus lactis and Leuconostoc lactis. On the recipient side, 17 gut bacterial species carrying homologous ARGs were identified, also mainly affiliated with Enterobacterales. Five species were detected on both the donor and recipient sides: Proteus mirabilis, Klebsiella pneumoniae, Enterobacter cloacae, Enterococcus faecalis, and Morganella morganii (Fig. 6).

Fig. 6: Visualization of potential ARG transfer.

The Sankey diagram illustrates inferred HGT pathways of ARGs from fermented food-associated bacteria (left) to human gut bacteria species (right), with ARG families implicated in transfer events shown in the center. Nodes representing the same species in both food and gut microbiomes are color-matched, highlighting overlapping taxa potentially involved in ARG exchange.

Most of the transferred ARGs were associated with efflux pump systems, including acrD, acrB, oqxB, emrA, emrB, and mdtC. These genes encode components of Resistance-Nodulation-Division (RND) and Major Facilitator Superfamily (MFS) efflux systems, which actively expel a broad range of antibiotics, such as β-lactams and fluoroquinolones39,40. The second most prevalent functional category involved modulation of membrane permeability, including regulators such as OmpK37, ramA, and marA. These genes control outer membrane porin expression (e.g., OmpK) and transcriptionally regulate antibiotic uptake, particularly for carbapenems41,42,43. The frequent co-occurrence of membrane permeability genes with efflux pump genes (e.g., acrB) suggests potential synergistic resistance mechanisms conferring enhanced resistance.

We observed consistent ARG profiles between fermented food- and gut-derived strains of Klebsiella pneumoniae, with similar patterns in Proteus mirabilis and Morganella morganii. Importantly, ARGs from Enterobacter hormaechei in fermented foods were linked to multiple gut bacterial species, suggesting the possibility of cross-species horizontal gene transfer (Fig. 6). Collectively, these findings support the potential for ARGs in fermented foods to be horizontally transferred to the human gut microbiota, highlighting complex microbial interactions at the food-gut interface. However, it should be noted that while comparative genomic evidence supports potential ARG transmission, the study design did not directly address the molecular mechanisms underlying horizontal gene transfer events.

The resistome risk score analysis revealed that FR had the highest ecological resistome risk (ERR), followed by DC. In contrast, DC showed a relatively higher human health resistome risk (HHRR) than the other groups (Supplementary Fig. S5). Further analysis of the 5 DC samples with the highest HHRR identified 3 as originating from natto in Japan. These findings underscore the variability in antibiotic resistance risks among different types of fermented foods.

Discussion

The ISAPP has advocated for the inclusion of fermented foods as a distinct category in dietary guidelines7. Accumulating evidence supports their health benefits44,45,46,47, particularly through modulation of the gut microbiome4,27,48,49,50. However, a more comprehensive understanding of the microbial communities in fermented foods is needed, particularly in relation to safety and their potential impact on the human gut microbiome. Our study addressed this gap by applying metagenomic approaches to three traditional Chinese fermented soy products, examining both their microbial compositions and potential interactions with the gut microbiota of the Chinese population. These findings provide a microbiological basis that may inform future dietary recommendations in China.

Each product displayed a distinct microbial profile, shaped by its fermentation process, starter culture, and environmental conditions. DJ exhibited particularly high fungal diversity, likely due to its two-stage fungal-yeast fermentation process, which creates ecological niches conducive to diverse fungi and yeasts24,25. DC was dominated by Bacillales, especially Bacillus spp., consistent with solid-state fermentation conditions that favor aerobic, spore-forming bacteria51. In contrast, DJ and FR were enriched in halophilic and heterofermentative Lactobacillales, including Tetragenococcus and Leuconostoc, which thrive under salt-rich, anaerobic or microaerophilic conditions52,53. These patterns reinforce previous findings21,22,54,55 and demonstrate how salt concentration, oxygen availability, and fermentation style shape microbial succession.

Geography exerted a modest but significant influence on microbial composition (Mantel test: R = 0.093, p = 0.005), suggesting that local environmental factors, raw materials, and artisanal practices contribute to microbial variability. This supports the concept of microbial terroir, wherein geographic origin imparts distinctive microbial signatures to fermented foods56,57.

An unexpected finding was the detection of Plasmodium species in DJ samples (Supplementary Fig. S1B), with >500 reads in at least 10 samples identified via Kraken2 classification. As Plasmodium is not typically associated with fermented foods, potential sources include contamination from raw materials or insect exposure during fermentation. While the presence of these sequences does not indicate viable pathogens, it highlights the utility of metagenomics for food safety surveillance – capable of detecting biological contaminants often missed by conventional screening methods. Similarly, low-abundance reads from opportunistic (Klebsiella, Enterobacter) and foodborne pathogens (Salmonella enterica, Listeria monocytogenes) underscore the importance of strict hygiene controls, particularly in small-scale or artisanal production settings.

The construction of co-abundance networks revealed that microbial species tend to cluster into ‘bacterial cliques’ aligned with taxonomic orders. Such clustering may underlie observed antagonistic interactions, potentially reflecting niche competition, resource partitioning, or production of inhibitory compounds such as bacteriocins. For example, species within Lactobacillales, Bacillales, or Enterobacterales showed strong intra-order positive correlations, likely due to cooperative interactions such as nutrient sharing or co-metabolism. Conversely, consistent negative correlations were observed between Lactobacillales and Enterobacterales, as well as between Bacillales and Enterobacterales, across all three fermented soy products. These antagonistic relationships likely reflect competitive exclusion driven by resource competition and the production of inhibitory metabolites.

Lactic acid bacteria (LAB), a major group within Lactobacillales, produce organic acids such as lactic acid, which reduce pH and inhibit the growth of acid-sensitive pathogens like Enterobacterales58,59. This mechanism is well-established in food microbiology60, and parallels interactions observed in the gut, where LAB can inhibit colonization of multidrug-resistant Enterobacteriaceae61. In vitro and animal studies further suggest that Lactobacillus spp. suppress pathogens like Klebsiella pneumoniae59,62. Likewise, Bacillales such as Bacillus subtilis produce antimicrobial compounds and enzymes that contribute to pathogen suppression63,64. These interactions are integral to fermentation stability and food safety.

Interestingly, while Bacillales and Lactobacillales showed predominantly positive associations in DC and DJ, their interactions were negative in FR. This contrast may reflect differences in fermentation conditions or microbial succession dynamics, such as competition versus cross-feeding. These variations highlight the importance of environmental context in shaping microbial interactions during fermentation.

The recovery of 707 MAGs from three fermented soy products offers the first genomic-resolution overview of their microbial consortia. Nearly 10% of MAGs could not be assigned to known species, while approximately 45% were predicted to have probiotic potential, highlighting fermented foods as a reservoir for novel microbial species and candidate probiotics. This MAG collection not only enriches existing microbial genome databases but also advances our understanding of the ecological roles and safety implications of microbes in fermented foods.

Functional annotation revealed marked differences across the three fermented soy products. Notably, FR exhibited a significant enrichment of ARGs, consistent with previous studies65. Nearly half of these ARG-containing MAGs were affiliated with the Enterobacterales order, suggesting that Enterobacterales serve as key ARG carriers in FR. These findings underscore the need for strengthened food safety monitoring in fermented food production.

Comparative analysis further suggests that fermented foods may serve as a source of gut-associated Klebsiella species. High prevalence rates of Klebsiella pneumoniae (62.4%), Klebsiella variicola (50.5%), and Klebsiella quasipneumoniae (42.5%) in healthy Chinese individuals align with their frequent detection in fermented soy products, indicating possible dietary origins6. Synergistic interactions among Klebsiella species may enhance their capacity for gut colonization66. An increased relative abundance of Klebsiella pneumoniae has been associated with elevated risks of bacteremia, nosocomial transmission, and persistent colonization67. Beyond its established role in hospital-acquired infections, a high prevalence of Klebsiella species in the gut presents broader health concerns. Gut colonization by K. pneumoniae and related taxa can serve as reservoirs for antimicrobial resistance, facilitating gene dissemination and increasing susceptibility to subsequent infections68,69. Additionally, Klebsiella has been implicated in the onset and exacerbation of gastrointestinal disorders, including inflammatory bowel disease (IBD), through its capacity to provoke pro-inflammatory responses in the gut epithelium70.

A high abundance of Klebsiella species may also antagonize beneficial lactic acid bacteria (e.g., Weissella confusa, Lactococcus lactis), contributing to the relatively lower prevalence of these bacteria in the Chinese gut microbiome10. This dietary-driven imbalance could have broader implications for gut homeostasis and susceptibility to disease.

Genomic analysis revealed strong evidence for horizontal gene transfer (HGT) of ARGs between food-associated microbes and the human gut microbiome. Shared ARG profiles between fermented food-derived and gut-resident strains of K. pneumoniae, Proteus mirabilis, and Morganella morganii, along with inferred ARG transfer from Enterobacter hormaechei to multiple gut species, suggest active cross-species gene flow. These events are likely mediated by mobile genetic elements (MGEs), such as plasmids, transposons, and integrons, consistent with the known role of the gut as a hotspot for HGT71,72,73. The identification of ARGs in strains commonly used as starter cultures or probiotics, including lactic acid bacteria, raises additional concerns about fermented foods as vectors of antimicrobial resistance27,74. Evidence has been growing of continuous gene exchange between pathogenic strains and ostensibly harmless or even beneficial commensal species. The implication is that the latter are now considered “reservoirs” of ARGs, especially lactic acid bacteria in fermented dairy products75,76,77. Multidrug-resistant K. pneumoniae strains with high minimum inhibitory concentrations (MICs) to clinically relevant antibiotics further heighten the public health risk15,78.

While these genomic patterns suggest a plausible route for ARG dissemination via diet, direct evidence of HGT events following consumption remains lacking71. Future studies integrating metagenomics with functional assays, plasmid tracing, and longitudinal human sampling are essential to validate these findings and assess clinical implications. In summary, our study highlights fermented foods, particularly those containing Enterobacteriaceae, as potential reservoirs and vectors of antibiotic resistance, necessitating more rigorous microbial risk assessments in public health policy.

Limitations of this study include the narrow focus on three soy-based fermented foods, which, while representative, do not capture the full diversity of Chinese fermented products. Moreover, the use of metagenomic sequencing precluded assessment of viable cell counts, a critical factor given that many commercial fermented foods are heat-treated. The gut microbiome samples, although informative, were regionally biased and sourced from public datasets rather than collected contemporaneously or derived from controlled experiments, even though our comparative analysis encompassed species-, genome-, and strain-level resolutions between fermented foods and the gut microbiota. Consequently, direct experimental validation (e.g., through colonization or gene transfer assays) is warranted to confirm these findings and assess their biological relevance under standardized conditions.

Although ISAPP recommends the inclusion of fermented foods in national dietary guidelines, rigorous evaluation of their microbiological safety remains limited. This study systematically characterized the microbial composition and functions of three traditional Chinese fermented soy products and compared them with the gut microbiome of healthy individuals. While these foods harbor beneficial microbes and serve as potential sources of probiotics, they also contain opportunistic and foodborne pathogens – some of which may contribute to antibiotic resistance in the human gut through HGT.

We support ISAPP’s position on the potential health benefits of fermented foods. However, our findings highlight the urgent need for systematic, safety-focused investigations across diverse fermented food types. Only through comprehensive risk-benefit assessments can informed, safe dietary recommendations be made that safeguard both public health and microbial ecology.

Materials and methods

Sample information

A total of 93 fermented soy products were collected from markets, including 42 dou chi (fermented soybean), 33 fu ru (fermented bean curd), and 18 dou jiang (fermented soybean sauce) samples. Selection was based on region, sales volume, raw materials, and flavor to ensure the representativeness of commonly consumed foods in China. Samples were purchased from both online and offline supermarkets. Homogenized samples were shipped frozen to the laboratory and stored at −80 °C until analysis.

Library preparation, metagenomic sequencing, and bioinformatic pipeline

DNA was extracted using the PowerSoil DNA Isolation Kit (QIAGEN Microbial Solutions) according to the manufacturer’s instructions. DNA concentrations were measured using the Qubit Fluorometric Quantitation system (dsDNA High-Sensitivity Kit, ThermoFisher Cat. No. Q32851). DNA sequencing libraries were prepared with the Nextera XT DNA Library Prep Kit (Illumina Cat. No. FC-131-1096) and sequenced on the Illumina NovaSeq X platform.

Raw paired-end reads were subjected to quality control using Trimmomatic (v0.39)79. Taxonomic profiling was performed on the clean reads using Kraken2/Bracken80,81 with the k2PlusPF database. To reduce noise, low-abundance filters were applied, retaining only taxonomic features with a read count greater than 500 and present in at least 10 samples. The 500-read cutoff was intended to exclude artifacts while maintaining sensitivity for rare taxa, and the 10-sample prevalence threshold (approximately 10% of the 93 samples) was intended to filter out transient contaminants.
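The low-abundance filter described above can be illustrated with a minimal Python sketch. It assumes one plausible reading of the criterion, namely that a taxon is retained when its read count exceeds 500 in at least 10 samples; the function name, data layout, and toy values are hypothetical and are not taken from the original pipeline.

```python
from typing import Dict, List


def filter_low_abundance(
    counts: Dict[str, List[int]],
    min_reads: int = 500,
    min_samples: int = 10,
) -> Dict[str, List[int]]:
    """Keep taxa with more than `min_reads` reads in at least `min_samples` samples.

    `counts` maps a taxon name to its per-sample read counts (assumed layout).
    """
    kept = {}
    for taxon, per_sample in counts.items():
        # Count the samples in which this taxon exceeds the read cutoff.
        n_passing = sum(1 for c in per_sample if c > min_reads)
        if n_passing >= min_samples:
            kept[taxon] = per_sample
    return kept


# Toy table with 12 samples per taxon (hypothetical values).
toy_counts = {
    "taxon_A": [900] * 12,            # retained: >500 reads in 12 samples
    "taxon_B": [900] * 9 + [10] * 3,  # dropped: only 9 samples exceed 500 reads
    "taxon_C": [500] * 12,            # dropped: 500 is not greater than 500
}
filtered = filter_low_abundance(toy_counts)
print(sorted(filtered))
```

Under a different reading (for example, total reads across samples greater than 500), the counting step would change accordingly.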

Clean reads from each sample were assembled into contigs using MEGAHIT (v1.1.3) with default parameters82. Gene prediction was performed on the contigs using Prodigal (v2.6.3)83. Redundant genes were removed using CD-HIT (v4.8.1, with parameters: -c 0.95 -aS 0.9)84, resulting in a non-redundant SFFGC. Gene annotation was performed by aligning to the eggNOG database (v5.0) using eggnog-mapper (v2.1.3)85. ARGs were annotated by aligning to CARD (v4.0.0) using RGI (v5.1.1)86. CAZymes were annotated using dbCAN287, which integrates three tools: HMMER (with dbCAN HMMdb v11), DIAMOND (against CAZy database), and Hotpep. The run_dbCAN.py pipeline was run with default parameters, with final annotations based on consensus hits across all three methods.

The abundance of each gene in each sample was estimated by mapping clean reads to the non-redundant gene catalog using BWA (v0.7.17)88 and normalized to relative abundance using reads/kilobase/million mapped reads (RPKM). Functional abundances for KO, KEGG pathway, eggNOG Orthology, ARO, and CAZy categories were calculated by summing the abundances of genes annotated to each category. In addition to functional abundance derived from gene alignment, metabolic pathway profiling was also performed using HUMAnN389.
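As an illustration of the normalization and aggregation steps above, the following Python sketch computes RPKM for individual genes and sums the values per functional category. It is a simplified sketch rather than the pipeline used in the study: the function names, the one-category-per-gene mapping, and the toy numbers are assumptions.

```python
from typing import Dict


def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of gene per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # count / (length_bp / 1e3) / (total_mapped / 1e6) == count * 1e9 / (length_bp * total_mapped)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def sum_by_category(
    gene_rpkm: Dict[str, float], gene_to_category: Dict[str, str]
) -> Dict[str, float]:
    """Sum gene abundances within each annotation category (e.g., a KO or an ARO)."""
    totals: Dict[str, float] = {}
    for gene, value in gene_rpkm.items():
        category = gene_to_category.get(gene)
        if category is None:
            continue  # unannotated genes do not contribute to any category
        totals[category] = totals.get(category, 0.0) + value
    return totals


# Toy sample with 1,000,000 mapped reads (hypothetical values).
total_mapped = 1_000_000
genes = {"gene1": (200, 2000), "gene2": (50, 1000), "gene3": (10, 500)}  # (reads, length in bp)
gene_rpkm = {g: rpkm(n, length, total_mapped) for g, (n, length) in genes.items()}
annotation = {"gene1": "K00001", "gene2": "K00001"}  # gene3 left unannotated
category_abundance = sum_by_category(gene_rpkm, annotation)
print(category_abundance)
```

A gene annotated to several categories would need a one-to-many mapping; the sketch keeps a single category per gene for brevity.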

Metagenome-assembled genomes (MAGs)

MAGs were reconstructed using the MetaWRAP pipeline90. Contigs were binned using the binning module and refined with the bin_refinement module (parameters: -c 70 -x 10), retaining only bins with ≥70% completeness and <10% contamination. Dereplication of refined bins was performed using dRep v3.2.291 with -strW 0, prioritizing sequence similarity over coverage during dereplication. Taxonomic classification of the dereplicated MAGs was conducted using GTDB-Tk (v2.0.0) with the GTDB r207 reference database. Phylogenetic relationships among bacterial MAGs were inferred using PhyloPhlAn (v.0.99)92, based on 400 universal marker genes, with the options “--diversity high --accurate”. A custom configuration was used: DIAMOND (v2.0.5) for protein mapping, MAFFT (v7.505) for multiple sequence alignment, trimAl (v1.4.rev15) for alignment trimming, FastTree for initial tree construction, and RAxML (v8.2.12) for the final tree generation. The resulting tree was visualized and annotated using iTOL (https://itol.embl.de).

Network construction and clustering

Co-abundance networks were generated separately for the DC, DJ and FR microbiomes. Species-level read count data from Kraken2 were analyzed using FastSpar93, a scalable implementation of the SparCC algorithm for correlation inference. SparCC was chosen for its robustness in handling compositional microbiome data. Correlation p-values were adjusted using the Benjamini–Hochberg (BH) method, and co-occurrence/exclusion edges with FDR < 0.05 were retained. Networks were visualized in Cytoscape, with species represented as nodes and statistically significant correlations as edges.
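The edge-filtering step can be sketched as follows. This is a generic Python implementation of the Benjamini–Hochberg adjustment applied to a list of correlation p-values, not the code used in the study; the function name and the toy p-values are hypothetical.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Toy p-values for four candidate edges (hypothetical values).
edge_p = [0.001, 0.02, 0.04, 0.3]
edge_fdr = benjamini_hochberg(edge_p)
# Retain only edges whose adjusted p-value is below 0.05.
retained_edges = [i for i, q in enumerate(edge_fdr) if q < 0.05]
print(edge_fdr, retained_edges)
```

In this toy example the third edge has a raw p-value below 0.05 but is discarded after adjustment, which is the behaviour the FDR threshold is meant to provide.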

Comparison with the Chinese gut microbiome

Microbiome data from 689 healthy Chinese individuals were obtained from eight publicly available metagenomic datasets via curatedMetagenomicData33. The datasets included: JieZ_2017 (ERP023788)94, LiJ_2014 (ERP004605)95, LiJ_2017 (PRJEB13870)96, QinJ_2012 (SRA045646 and SRA050230)97, QinN_2014 (ERP005860)98, YeZ_2018 (PRJNA356225)99, YuJ_2015 (PRJEB10878)100, and ZhuF_2020 (CNP0000119)101. As taxonomic profiling in these datasets was performed using MetaPhlAn3102, we used the same tool (v3.1.0; --index: mpa_v30_CHOCOPhlAn_201901) for our fermented food samples to ensure comparability. Only species present in over 50% of samples within each group were included in comparisons.

At the genome level, 24,417 Chinese gut MAGs were retrieved from the human MAG collection published by Pasolli et al.10. Intra-species genome relationships were defined using ANI with a 95% threshold, computed via fastANI103. Strain-level profiling was conducted using StrainPhlAn3104 with default parameters. A total of 82 species with sufficient coverage in the fermented food metagenome were included. For comparative analysis, 100 gut microbiome samples were randomly selected from 689 individuals to match the fermented food sample size (n = 93), ensuring a balanced design for StrainPhlAn3 analysis.

The ARG proteins from the fermented food microbiome were aligned against 216,849 ARGs from the human microbiome cataloged by Lee et al.38 using the ‘blastp’ function in DIAMOND (v2.0.15)105. BLAST hits were filtered using a maximum e-value of 1e−20, a minimum identity of 80%, and a minimum length of 100. A Sankey diagram was constructed using the ggsankey (https://github.com/davidsjoberg/ggsankey) R package to illustrate potential horizontal gene transfer events from foodborne microbial species to gut-associated species in Chinese individuals.
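The hit-filtering criteria above can be expressed as a short Python sketch. It assumes the alignments were written in the standard 12-column BLAST tabular format (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and that the thresholds are inclusive; the function name, the sequence identifiers, and the toy records are hypothetical.

```python
from typing import Iterable, List, Tuple

Hit = Tuple[str, str, float, int, float]  # (query, subject, identity %, alignment length, e-value)


def filter_hits(
    lines: Iterable[str],
    max_evalue: float = 1e-20,
    min_identity: float = 80.0,
    min_length: int = 100,
) -> List[Hit]:
    """Keep tabular alignment records that satisfy all three thresholds."""
    kept: List[Hit] = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        pident = float(fields[2])
        length = int(fields[3])
        evalue = float(fields[10])
        if evalue <= max_evalue and pident >= min_identity and length >= min_length:
            kept.append((fields[0], fields[1], pident, length, evalue))
    return kept


# Toy records (hypothetical identifiers and values).
toy_lines = [
    "\t".join(["food_arg_1", "gut_arg_9", "95.2", "310", "5", "0", "1", "310", "1", "310", "1e-150", "600"]),
    "\t".join(["food_arg_2", "gut_arg_4", "72.0", "250", "70", "0", "1", "250", "1", "250", "1e-60", "300"]),  # identity too low
    "\t".join(["food_arg_3", "gut_arg_7", "99.0", "60", "1", "0", "1", "60", "1", "60", "1e-25", "120"]),  # alignment too short
    "\t".join(["food_arg_4", "gut_arg_2", "88.0", "150", "18", "0", "1", "150", "1", "150", "1e-10", "90"]),  # e-value too high
]
kept_hits = filter_hits(toy_lines)
print(kept_hits)
```

Each retained query-subject pair could then be mapped to its source species to build the links shown in a Sankey diagram.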

Resistome risk scores were computed using MetaCompare2106, which provides two indices: ERR and HHRR. ERR reflects the environmental mobility potential of ARGs, while HHRR emphasizes Rank I ARGs and human-associated pathogens based on an omics-informed framework of high-risk resistance determinants.

Statistical analysis

Alpha diversity was assessed using the Shannon index, calculated from species-level relative abundances with the vegan R package107. Pairwise comparisons between groups were performed using the Wilcoxon rank sum test, and multiple testing correction was applied using the Benjamini–Hochberg method. Bray–Curtis distances were computed with the phyloseq108 package, and PERMANOVA was performed with the adonis2 function in vegan to assess compositional differences among groups. To evaluate geographic effects, pairwise distances between sample collection sites were calculated using the geosphere R package. The correlation between geographic and microbial community distances was examined using Mantel tests with Pearson correlation (1999 permutations, significance at p < 0.05), implemented in vegan.
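To make the Mantel procedure concrete, the following pure-Python sketch correlates two distance matrices with Pearson's r and estimates significance by jointly permuting the rows and columns of one matrix. It is an illustrative re-implementation under stated assumptions, not the vegan code used in the study: the function names and toy matrices are hypothetical, and the one-sided p-value is computed as (hits + 1) / (permutations + 1).

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _lower_triangle(matrix: Matrix, order: Sequence[int]) -> List[float]:
    """Flatten the lower triangle after reordering rows and columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(1, n) for j in range(i)]


def _pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(dist_a: Matrix, dist_b: Matrix, permutations: int = 1999, seed: int = 0) -> Tuple[float, float]:
    """Return (Pearson r, one-sided permutation p-value) for two square distance matrices."""
    n = len(dist_a)
    identity = list(range(n))
    b_vec = _lower_triangle(dist_b, identity)
    observed = _pearson(_lower_triangle(dist_a, identity), b_vec)
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)  # relabel the samples of the first matrix
        if _pearson(_lower_triangle(dist_a, perm), b_vec) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Toy example: five sites on a line; community distance grows with geographic distance.
positions = [0, 1, 2, 3, 4]
geo = [[float(abs(a - b)) for b in positions] for a in positions]
community = [[0.1 * abs(a - b) for b in positions] for a in positions]
r_obs, p_val = mantel(geo, community)
print(r_obs, p_val)
```

With real data, `geo` would hold pairwise distances between collection sites and `community` the Bray–Curtis distances between the corresponding samples.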

Microbial biomarkers differing across groups were identified using Linear Discriminant Analysis Effect Size (LEfSe)109. Features were first normalized by total sum scaling. The Kruskal–Wallis test (p < 0.05) was applied to detect significant differences, followed by linear discriminant analysis (LDA) to estimate effect sizes. Features with LDA scores >2.0 were considered biologically relevant.