Introduction

Cobalamin, commonly known as vitamin B12 (VB12), is an essential micronutrient for nearly all living organisms in Earth’s biosphere1,2. Beyond its critical role in natural ecosystems, VB12 serves as a vital cofactor in key metabolic processes, including DNA synthesis and nervous system function, particularly in mammalian hosts such as the human gut microbiome3. However, most animals lack the biosynthetic capacity to produce VB12 de novo4, necessitating its acquisition either from dietary sources or through microbial synthesis—primarily by select bacteria and archaea, provided sufficient cobalt is bioavailable5.

The microbial synthesis of VB12 is a highly complex, enzymatically driven process that starts from one of two distinct precursor routes: the five-carbon (C5) pathway from glutamate or the four-carbon (C4) pathway from glycine and succinyl-CoA6,7. This biosynthesis involves a series of condensation, dehydrogenation, and amination reactions, culminating in the formation of precorrin-2, a pivotal intermediate that then enters either the aerobic or the anaerobic route6,7. Subsequent enzymatic steps, including ring closure, deacetylation, and cobalt incorporation mediated by methyltransferases, yield cob(II)alamin, which is further metabolized into active VB12 via the adenosylcobalamin (AdoCbi) pathway6,7. Additionally, alternative routes such as the pseudocobalamin salvage pathway contribute to VB12 production8. Notably, VB12 deficiency in humans can disrupt methyl group metabolism, leading to severe neurological and hematological disorders such as methylmalonic aciduria and hyperhomocysteinemia9.

Concurrently, the global rise in antibiotic resistance represents one of the most pressing public health crises of the 21st century10. The proliferation of antibiotic resistance genes (ARGs) in environmental reservoirs—driven by both natural bacterial evolution and anthropogenic activities (e.g., clinical misuse, agricultural overuse)12,13,14,15—has accelerated the emergence of multidrug-resistant pathogens10,11. Environmental microbiota serve as dynamic hubs for ARG dissemination, facilitating horizontal gene transfer (HGT) among diverse bacterial taxa16,17. This genetic mobility, compounded by selective pressures from antibiotic pollution, threatens to render once-treatable infections increasingly recalcitrant to therapy18.

Despite the dual significance of VB12 biosynthesis and ARGs in microbial ecology and human health, their potential interplay remains unexplored. Given that both processes are predominantly mediated by prokaryotes, we hypothesize that ecological drivers (e.g., nutrient availability) and genetic linkages may concurrently shape their distributions. To test this, we use urban lakes, which are anthropogenically influenced ecosystems, as model systems and combine metagenomics with geochemical analyses to address four key questions: (1) how VB12 synthesis genes are distributed across urban lakes of varying trophic states; (2) which environmental drivers modulate VB12-associated gene abundance; (3) whether VB12 biosynthesis capacity is linked to the antibiotic resistome; and (4) what genomic features characterize VB12-producing strains with ARG transmission potential. This research aims to clarify the relationships among VB12 synthesis, microbial communities, and antibiotic resistance within urban lakes, and to inform the management of their consequences for human health and the environment.

Materials and methods

Study sites

In this study, 23 surface water samples were collected from five representative urban lakes in Wuhan, China. Tangxunhu (TXH) and Donghu (DH), the two largest urban lakes in China, are mesoeutrophic and have water areas of 47.6 km² and 33.9 km², respectively. Yanxihu (YXH), a eutrophic lake, has a water area of 11.8 km². In contrast, Nanhu (NH) and Shahu (SH), both hypereutrophic, are surrounded by hospitals, sewage treatment plants, universities, and densely populated areas, with water areas of 7.6 km² and 3.1 km², respectively19,20. Sampling points were distributed across each lake at intervals of approximately 1–5 km, with the spacing adjusted to lake size to ensure representative coverage. Because Lake SH is relatively small, it was assigned only three sampling points, whereas each of the other lakes was sampled at five predetermined locations. Detailed information can be found in Fig. S1A.

Sampling, shotgun metagenomic sequencing and measurement of environmental factors

Total DNA was extracted using the DNeasy PowerWater Kit (QIAGEN, Hilden, Germany). The concentration and purity of the isolated DNA were assessed with a Qubit 3.0 fluorometer (Thermo Fisher Scientific, USA). Subsequently, a paired-end sequencing library (150 bp read length) was prepared and sequenced on an Illumina HiSeq platform (Illumina, San Diego, CA) by Biomarker Technologies Corporation (Beijing, China), as previously described21. A multi-parameter probe (WTW 3430) was used to record in situ measurements of dissolved oxygen (DO), electrical conductivity (EC), and pH. Laboratory analyses were conducted for nitrate, nitrite, ammonia, orthophosphate, dissolved organic carbon (DOC), and chlorophyll a (Chl a) using standardized methods21. The detailed physicochemical parameters can be found in the supplementary material of Mao et al. (2023).

Metagenome analysis

Raw metagenome sequencing reads were filtered to remove low-quality reads using FASTP22 and were subsequently de novo assembled using MEGAHIT23 with k-mer sizes ranging from 21 to 149. Assembled contigs longer than 500 bp were processed for open reading frame (ORF) prediction via Prodigal24, and a non-redundant gene catalog was generated using CD-HIT (98% identity, 90% coverage)25. Gene abundance was normalized to transcripts per million (TPM) to enable cross-sample comparisons. This approach minimizes the impact of variations in total read count and gene length26. Predicted genes were functionally annotated against the VB12Path database with DIAMOND (E-value ≤ 1e-5). All samples were rarefied to the same sequencing depth (33,055,308 sequences per sample) through random resampling. ARGs and metal resistance genes (MRGs) were annotated using the DeepARG27 and metal resistance gene28 databases, respectively. The predicted ORFs were first translated into amino acid sequences using EMBOSS29. Subsequently, BLASTp was used for annotation against these databases, with the same alignment thresholds as Mao et al. (2023). Finally, MetaCompare was used to evaluate resistome risk by estimating the co-occurrence of ARGs, mobile genetic elements (MGEs), and human pathogens30.
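The TPM normalization described above can be illustrated with a minimal sketch. It assumes that per-gene mapped read counts and gene lengths are already available; the function name `tpm` and all numbers are hypothetical and are not taken from the study's pipeline.

```python
def tpm(read_counts, gene_lengths):
    """Return TPM-style abundances for one sample.

    read_counts:  dict mapping gene name -> mapped read count
    gene_lengths: dict mapping gene name -> gene length in bp
    """
    # Length-normalize first: reads per base for each gene.
    rates = {g: read_counts[g] / gene_lengths[g] for g in read_counts}
    total = sum(rates.values())
    if total == 0:
        return {g: 0.0 for g in rates}
    # Rescale so that the values within a sample sum to one million.
    return {g: r / total * 1e6 for g, r in rates.items()}


# Hypothetical read counts and gene lengths, for illustration only.
counts = {"hemL": 300, "cobA": 100, "cbiC": 50}
lengths = {"hemL": 1500, "cobA": 1000, "cbiC": 500}
abund = tpm(counts, lengths)
```

Because every sample is rescaled to the same total, values are comparable across samples regardless of sequencing depth, and dividing by gene length first prevents long genes from appearing more abundant simply because they recruit more reads.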

Metagenomic binning and metagenome-assembled genome (MAG) annotation

Genome assembly and binning procedures were executed using the MetaWRAP pipeline31. First, quality-filtered sequences were assembled into contigs with MEGAHIT23. Next, genome binning was performed on the assembled contigs with MetaBAT232, and the resulting bins were refined using the Bin_refinement module. These bins were then dereplicated with dRep33 at a 99% average nucleotide identity (ANI) threshold for strain-level clustering. The quality of the dereplicated bins was evaluated with CheckM34 based on completeness and contamination; only bins with > 50% completeness and < 10% contamination were retained35. Taxonomic classification of the obtained MAGs was performed using GTDB-Tk36. Gene prediction for the MAGs was conducted with Prodigal, followed by functional annotation against the METABOLIC, VB12Path, KEGG, COG, pathogen-host interaction (PHI), CRISPR-Cas, and mobileOG-db databases37,38,39,40.
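The retention criterion for bins can be written as a simple filter. The sketch below is illustrative only: it assumes completeness and contamination percentages have already been parsed from a quality report, and the `Bin` type, bin names, and values are hypothetical.

```python
from typing import NamedTuple


class Bin(NamedTuple):
    name: str
    completeness: float   # percent
    contamination: float  # percent


def passes_quality(b, min_completeness=50.0, max_contamination=10.0):
    """Apply the strict thresholds stated in the text (> 50%, < 10%)."""
    return b.completeness > min_completeness and b.contamination < max_contamination


# Hypothetical bins, for illustration only.
bins = [
    Bin("bin.1", 92.3, 1.2),   # retained
    Bin("bin.2", 48.0, 0.5),   # too incomplete
    Bin("bin.3", 75.0, 12.4),  # too contaminated
    Bin("bin.4", 50.0, 3.0),   # exactly 50% fails the strict inequality
]
retained = [b.name for b in bins if passes_quality(b)]
```

The strict inequalities mirror the wording of the criterion; whether boundary values were in fact excluded in the original analysis is not stated.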

Statistical analysis

The data were first subjected to logarithmic transformation, followed by Z-score normalization. Next, analysis of variance (ANOVA) was used to assess the statistical significance of differences in mean gene abundance among samples. Data were visualized using online tools (https://www.bioincloud.tech). Principal coordinate analysis (PCoA) was performed with the “amplicon” R package to evaluate β-diversity patterns of functional genes across lakes. Additionally, permutational multivariate analysis of variance (PERMANOVA, adonis) tested the significance of inter-group differences. To examine associations, Spearman rank correlation analysis was applied to (1) environmental factors vs. VB12 functional genes and (2) VB12 genes vs. resistance genes (ARGs/MRGs). Univariate linear regression further modeled relationships between VB12 genes and resistance risk. For visualization, phylogenetic trees were refined using iTOL (https://itol.embl.de/tree), and genome circular maps were generated with CGView (https://paulstothard.github.io/cgview).
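As a rough illustration of the transformation and correlation steps, the following standard-library sketch applies a log transform followed by Z-scoring and computes a Spearman coefficient as the Pearson correlation of average ranks. The pseudocount, the base-10 logarithm, the use of the population standard deviation, and all data values are assumptions made for the example; the text does not specify them, and significance testing is omitted.

```python
import math
from statistics import mean, pstdev


def log_zscore(values, pseudocount=1.0):
    """Log10-transform (with an assumed pseudocount), then Z-score."""
    logged = [math.log10(v + pseudocount) for v in values]
    mu, sigma = mean(logged), pstdev(logged)
    if sigma == 0:
        return [0.0] * len(logged)
    return [(x - mu) / sigma for x in logged]


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


# Hypothetical per-site values, for illustration only.
nitrate = [0.2, 0.5, 1.1, 2.4, 3.0]
cobA = [10, 14, 30, 55, 80]
hemL = [900, 700, 650, 400, 380]
rho_cobA = spearman(nitrate, cobA)
rho_hemL = spearman(nitrate, hemL)
z_cobA = log_zscore(cobA)
```

Because Spearman's coefficient depends only on ranks, it is unchanged by monotonic transformations such as the log and Z-score steps, so those steps matter for ANOVA and visualization rather than for the rank correlations themselves.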

Results and discussion

Spatial distribution variations of VB12 functional genes and environmental drivers

The recently developed VB12Path database provides a robust metagenomic platform to investigate cobalamin (VB12) biosynthesis across environmental and human microbiomes7. In our study, five functional genes critical for VB12 synthesis were identified across 23 sampling sites (Fig. 1A), annotated as: cobalt-precorrin-8 methylmutase (cbiC), uroporphyrinogen-III c-methyltransferase (cobA), precorrin-8X methylmutase (cobH), siroheme synthase (cysG), and glutamate-1-semialdehyde 2,1-aminomutase (hemL). These genes are involved in three pathways related to VB12 synthesis, namely anaerobic (cbiC), precorrin-2 synthesis (cysG, cobA, hemL), and the aerobic (cobH) pathway. Among them, the functional genes hemL and cobA, involved in the precorrin-2 synthesis pathway, dominated the gene pool (collectively 97.0% relative abundance), with hemL alone exceeding 70% in all lakes (Fig. 1A). This aligns with reports by Zhou et al. (2021), who observed high hemL abundance in marine and gut microbiomes7,8. Given its role in initiating VB12 synthesis7, hemL’s prevalence suggests a bottleneck effect in the pathway, where its activity may gatekeep downstream VB12 production.

Fig. 1

Differential distribution characteristics of VB12 synthesis genes in different lakes. (A) Abundance distribution patterns of identified VB12 synthesis genes in the five lakes. (B) Variations in the distribution of VB12 synthesis genes among the five lakes. Data were standardized using Z-scores; the letters (a, b, ab, bc) denote homogeneous subsets based on Duncan’s multiple range test, where groups sharing no common letter are significantly different (p < 0.05).

Differential analysis revealed significant inter-lake variability (p < 0.01) for all genes except cysG (Fig. 1B; Tables S1–S2). Consistent with this, principal coordinates analysis (PCoA) based on the Bray-Curtis matrix further confirmed the spatial distribution differences of these functional genes (p < 0.01; Fig. 2A). Distinct separation was observed among NH, TXH, YXH, and SH lakes, and the ADONIS test confirmed that the differences between them were statistically significant (p < 0.05). These pronounced variations in functional genes were consistent with the significant distribution patterns of microbial communities21,41. Given the significant relationships between microbial communities and nutrients, we speculate that microbial VB12 synthesis in lakes may be associated with their eutrophication status. Therefore, a correlation analysis was conducted between all VB12 functional genes and environmental factors. The cobA gene exhibited strong positive correlations with nitrate and nitrite (p < 0.05; Fig. 2B, Table S3), implicating nitrogen availability in precorrin-2 pathway activation. In contrast, hemL showed inverse relationships with most measured factors (p < 0.05; Fig. 2B), suggesting suppression under eutrophic conditions. Previous research has also indicated that VB12 deficiency is associated with dietary, environmental, and genetic factors42. Nitrate nitrogen, nitrite nitrogen, and chlorophyll a, which are established indicators of aquatic eutrophication43,44, correlated significantly with VB12 synthesis-related genes in this study, suggesting a potentially dichotomous regulation of VB12 biosynthesis: (1) stimulation of cobA by nitrogen influx, potentially accelerating early pathway steps, and (2) concurrent repression of hemL, which dominates the gene pool (79.8%). We hypothesize that this paradox, in which nutrient enrichment favors pathway initiation while limiting its keystone enzyme, could constrain net VB12 output. VB12 is essential for one-carbon metabolism and epigenetic regulation45. Limited VB12 synthesis may therefore impair cellular reprogramming and tissue repair, particularly in ecosystems where eutrophication alters microbial niche partitioning. Future work should quantify VB12 titers directly to test whether gene abundance patterns translate to functional bottlenecks.
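For readers unfamiliar with the distance underlying the PCoA, the Bray-Curtis dissimilarity between two gene-abundance profiles can be sketched as follows. The profiles are hypothetical and the function is a plain illustration of the standard formula, not the R code used in the study.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance profiles.

    Returns 0 for identical profiles and 1 for profiles sharing no genes.
    """
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0


# Hypothetical abundances of (hemL, cobA, other genes) at two sites.
site_a = [80, 17, 3]
site_b = [70, 25, 5]
d_ab = bray_curtis(site_a, site_b)  # (10 + 8 + 2) / 200
```

Computing this value for every pair of sites yields the distance matrix that PCoA ordinates and that PERMANOVA tests for between-lake differences.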

Fig. 2

Distribution patterns and driving factors of functional genes among different lakes. (A) Principal coordinates analysis (PCoA) of VB12 functional genes based on the Bray-Curtis distances. The significance of the differential distribution was statistically tested by adonis. (B) Association analysis between VB12 functional genes and environmental factors (Spearman’s, ***p < 0.001; **p < 0.01; *p < 0.05).

Association between VB12 biosynthesis pathways and antimicrobial resistance: mechanistic insights and ecological implications

VB12 functional genes and ARGs represent inherent genomic features of prokaryotic organisms that have evolved through complex evolutionary processes. These genetic elements can confer either beneficial or detrimental traits to human health through diverse mechanisms46,47,48. In this study, we observed a positive correlation between prokaryotic diversity and the abundance of four VB12 biosynthesis genes, which aligns with theoretical expectations (Fig. S1C). In addition, horizontal gene transfer (HGT) serves as a key evolutionary mechanism facilitating gene flux across diverse microorganisms and plays a critical role in the rapid dissemination of genetic traits within microbial communities49. Understanding potential associations between different gene categories is therefore essential for unveiling functional interactions within microbial systems. We therefore used Spearman rank correlation to investigate potential associations between VB12 functional genes and resistance genes (Fig. 3). The results indicated a significant negative correlation between cbiC, which is involved in the anaerobic pathway, and ARGs (p < 0.05; Fig. 3A, Table S4). In contrast, cobH and cobA, which participate in the aerobic pathway and the precorrin-2 synthesis pathway, respectively, correlated significantly and positively with ARGs (p < 0.05; Fig. 3A, Table S4). This suggests a higher likelihood of co-occurrence between ARGs and cobH/cobA. Although hemL was the most abundant of all genes, it showed no significant association with any ARGs. MGEs play a crucial role in the dissemination of ARGs in urban lakes21. The significant relationships observed in this study between ARGs and VB12 synthesis genes suggest that MGEs, as crucial mediators of HGT50, may exhibit selectivity during gene movement51.

Notably, the correlation analysis revealed significant positive correlations between all VB12 functional genes and diverse MRGs (p < 0.05; Fig. 3A, Table S5). In particular, cbiC, cobH, and cobA correlated positively with the majority of MRGs (p < 0.05; Fig. 3A, Table S5), indicating that VB12-producing microbes may simultaneously evolve metal detoxification strategies. The metallation process during VB12 synthesis could structurally facilitate metal ion binding/sequestration52. This aligns with cellular-level evidence showing that VB12 enzymes participate in metal coordination52. Furthermore, univariate linear regression between VB12 functional genes and resistance risk indicated a significant negative relationship between cbiC and resistance risk (Fig. 3B, Fig. S1B). This suggests that the anaerobic pathway might compete for cellular resources with resistance mechanisms, thereby mitigating antimicrobial resistance development.
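The univariate regression of resistance risk on a single gene's abundance amounts to an ordinary least-squares fit with one predictor. The sketch below shows that calculation under stated assumptions: the helper `linear_fit` and the data points are hypothetical, and no significance test is included.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical standardized cbiC abundance and resistome risk scores.
cbiC_abundance = [1.0, 2.0, 3.0, 4.0]
risk_score = [9.0, 7.0, 5.0, 3.0]
slope, intercept, r2 = linear_fit(cbiC_abundance, risk_score)
```

A negative slope corresponds to the inverse relationship reported for cbiC, and the coefficient of determination indicates how much of the variation in risk the single predictor explains.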

Fig. 3

Correlation analysis between genes related to VB12 synthesis and resistance genes, as well as resistance risk. (A) The potential associations between VB12 functional genes and antibiotic resistance genes (ARGs) as well as metal resistance genes (MRGs) (Spearman’s, ***p < 0.001; **p < 0.01; *p < 0.05). (B) Regression analysis between VB12 functional genes and resistance risk.

Evolutionary analysis of MAGs co-harboring VB12 functional genes and resistance genes

Metagenomic binning techniques are valuable for identifying genes with specific functions, such as resistance genes and pathway-related metabolic genes53. Building on these methodological advances, we conducted binning analysis across all samples and identified 26 medium-to-high-quality MAGs (contamination < 10%, completeness > 50%) associated with VB12 synthesis. These MAGs spanned 6 phyla, 9 classes, 13 orders, 15 families, and 21 genera (Fig. 4, Table S6), revealing distinct phylogenetic patterns in VB12 synthesis pathways. Specifically, 13 MAGs were assigned to Bacteroidota and were involved in the salvage pathway, while 6 belonged to Proteobacteria and participated solely in precorrin-2 synthesis (Fig. 4, Table S6). This indicates distinct roles for Bacteroidota and Proteobacteria in the VB12 synthesis process. Four MAGs assigned to Cyanobacteria uniquely harbored three VB12 functional genes and engaged in both the aerobic (cobB/cobM) and precorrin-2 (hemC) pathways. This phylogenetic divergence aligns with existing literature: Proteobacteria are known to express vitamin biosynthesis genes54, while both Bacteroidota and Proteobacteria play established roles in biochar-driven composting55. Notably, cyanobacterial VB12 synthesis has been previously documented in planktonic species56, corroborating our findings.

All 26 MAGs carried ARGs (1–16 per MAG), with 18 concurrently hosting metal resistance genes (MRGs; 1–6 per MAG). Interestingly, MAGs involved in the precorrin-2 synthesis pathway carried more MRGs, while those participating in the salvage pathway exhibited more ARGs (Fig. 4, Table S6). This dichotomy suggests divergent evolutionary pressures on resistance gene co-occurrence: the precorrin-2 pathway may favor MRG retention, whereas the salvage pathway appears more permissive to ARGs. These observations support our earlier conclusion that VB12 synthesis potentially mitigates ARG-associated resistance risk while promoting MRG coexistence.

Functional characteristics of microbial taxa with VB12 synthesis potential

Microbial functional traits are of paramount importance for human societal development57. To characterize functional profiles of VB12-synthesizing microorganisms carrying resistance determinants, we selected MAGs involved in various VB12 synthesis pathways, carrying more than 10 ARGs, and simultaneously harboring MRGs for further functional analysis. The results reveal that selected MAGs possessed the potential for horizontal gene transfer. Each MAG carried 12–15 MGEs, contributing significantly to the horizontal transfer of genes (Fig. 5, Fig. S2, S3, Table S7). Notably, these MAGs carried an average of 175 pathogen-host interaction (PHI) genes—critical mediators of human disease transmission58. When PHI genes undergo MGE-mediated transfer to human-associated microbes, they may potentiate disease emergence59. This risk is particularly concerning for ARGs, as their HGT could exacerbate antibiotic resistance in clinical settings60.

Fig. 4

Evolutionary relationship between MAGs with the potential for resistance and VB12 synthesis. MAGs are named after their annotated genera and colored at the phylum level. The figure shows the phylum, class, order, and family to which each MAG belongs. The height of the right column indicates the number of resistance genes carried by each MAG.

In addition, understanding the CRISPR-Cas system is essential for gaining profound insights into how bacteria recognize and eliminate invading viral genomes, contributing to the development of innovative antiviral strategies61,62. Our investigation revealed that these MAGs carry 2–8 Cas proteins, predominantly Cas3 and Cas4 (Fig. 5, Fig. S2, S3, Table S7). The Cas3 protein, with DNA cleavage and helicase activities, functions as a core enzyme in the Type I CRISPR-Cas system63, while the Cas4 protein is involved in integrating foreign DNA sequences into bacterial DNA64,65. Bin.709 was also found to host a Cas2 protein site (Fig. 6, Table S7); Cas2 is a core protein in the CRISPR system responsible for the initial phase of integrating foreign fragments66,67,68. These findings offer crucial insights for developing efficient and precise gene-editing tools.

Furthermore, the aforementioned MAGs were functionally annotated using the COG and KEGG databases69 (Fig. 6, Fig. S4–S9, Table S7). The annotations from the two databases were highly consistent, with most genes associated with metabolic pathways. This observation suggests that metabolic functions likely play a more critical role in these MAGs than other functional categories. Microbial metabolic functions encompass the degradation, synthesis, and transformation of organic compounds, as well as participation in biochemical processes linked to cellular energy and material metabolism40,70. Consequently, microorganisms harboring a greater abundance of metabolic genes may contribute significantly to environmental nutrient cycling, energy flow, and ecological equilibrium. These findings imply that such microorganisms could possess adaptive advantages in their respective habitats71, enabling more efficient utilization and conversion of available nutrients. Additionally, KEGG annotation revealed that each MAG contained genes associated with infectious diseases and antimicrobial resistance (Fig. 6, Fig. S4–S9, Table S7), which corroborates the earlier functional annotations from individual databases.

Fig. 5

The genomic features of the metagenome-assembled genome bin.709. The characteristics of mobile genetic elements (MGEs) and the distribution of CRISPR-Cas sites are also shown separately.

Fig. 6

The KEGG and COG functional annotation results for the metagenome-assembled genome bin.709. The size of the circles and the height of the bars represent the number of genes, with different colors indicating various functional classifications.

Conclusion

Firstly, this study employed metagenomic approaches to investigate the distribution patterns and environmental drivers of microbial genes associated with vitamin B12 (VB12) biosynthesis in urban lake surface waters. Our analysis revealed that hemL was the predominant gene among all VB12 synthesis-related genes examined. Notably, we observed substantial spatial heterogeneity in the distribution of microbial VB12 synthesis genes across different urban lakes, with eutrophication levels (particularly nitrate nitrogen concentrations) showing a negative correlation with hemL gene abundance (Fig. S10). However, it should be noted that the VB12 synthesis-related genes detected in this study were identified through genomic prediction approaches. Further experimental validation via PCR and quantitative real-time PCR (qPCR) remains necessary to confirm these findings.

Secondly, by employing specific, meticulously curated functional gene databases alongside widely used public databases, this study conducted a more precise and comprehensive analysis of the genomic metabolic characteristics of microbes with the potential for synthesizing VB12. It was discovered that 26 metagenome-assembled genomes (MAGs) had the capability to synthesize VB12 and concurrently harbored ARGs. Among these, 4 MAGs simultaneously carried ARGs and MRGs, while also demonstrating the ability to synthesize VB12. These MAGs not only possessed functional elements conducive to horizontal gene transfer but also exhibited pathogenicity. Freshwater systems are increasingly recognized as critical reservoirs for antibiotic-resistant bacteria (ARB) and ARGs, posing substantial public and veterinary health risks21,72,73. Indirect transmission further escalates risks via contaminated aquatic food chains (e.g., raw shellfish) and urban agricultural irrigation, which can introduce ARB into the food supply. Compounding these threats, eutrophic waters serve as hotspots for horizontal gene transfer, accelerating the spread of resistance mechanisms to clinically relevant pathogens74. Addressing this multifaceted challenge demands a “One Health” approach75, integrating advanced wastewater treatment technologies, systematic ARB/ARG monitoring, and public awareness campaigns to curb antibiotic misuse. Without coordinated intervention, these urban aquatic reservoirs will continue to fuel the global antimicrobial resistance crisis, undermining both human and animal health.

These findings provide crucial evidence for future research exploring the link between the VB12 synthesis metabolic mechanism and human resistance risks.