Introduction

Groundwater is one of the major sources of drinking and agricultural water in regions where population density is high and economic activities such as industrial agriculture are well developed, even when surface water resources are sufficiently available1,2. Groundwater is an especially important water source in limestone areas. The Okinawa-Jima area of Japan as well as the remote island areas nearby are composed of limestone and have groundwaters with higher flow rates than river water; in fact, underground dams have been constructed to secure stable sources of irrigation water3,4. Groundwater is also a major water resource in other island countries around the world5,6.

In recent years, groundwater contamination has become a serious problem in many areas of the world. Various substances have been identified as contaminants, and contamination still occurs despite the implementation of various control measures. Problems associated with nitrate-nitrogen (NO3-N) contamination are particularly severe in areas where livestock production is a major industry7,8,9,10,11. NO3-N contamination is thought to be caused by fertilizer application to agricultural fields12,13,14,15 or to originate from sewage from swine and cattle barns, as well as from livestock manure16. Continued exposure to groundwater containing high levels of NO3-N is harmful to human health17. Measures have been taken to reduce the amount of NO3-N in groundwater, and drinking water standards have been established in many countries. However, the current level of pollution has yet to be evaluated, and a fundamental solution has yet to be reached.

Elevated NO3-N concentrations present groundwater contamination challenges in the southwestern islands of Japan, including the main island of Okinawa. The southern part of Okinawa Island near Naha City, Okinawa Prefecture, consists of a field cropping area, and groundwater has long been used for agricultural and domestic purposes in this region. Under the National Government's Southern Okinawa Main Island Agricultural Water Conservancy Project (1992–2005)18, two underground dams were constructed at Komesu and Giza to secure water for agriculture and develop water resources, and the groundwater has been used for agricultural purposes and drinking water19,20,21.

Studies have demonstrated that NO3-N is consumed by denitrification and nitrate reduction; these functions are carried out by microbial nitrogen metabolism and lead to a decrease in NO3-N concentration22,23. Microbial communities play particularly important roles in primary nitrogen metabolism and circulation24,25. Such functions have been ascertained in soil, water, and activated sludge environments, and some responsible microbial species have been identified26,27,28,29.

To study microorganisms that have the abovementioned water purification functions, 16S rRNA amplicon analysis combined with functional gene analysis by real-time PCR has been widely employed30,31. For example, Guo et al.30 adopted metagenomic methods to identify microbial communities and functions for removing nitrogen and phosphorus from activated sludge. The bacterial communities and denitrification gene (nirS/nirK) expression in the groundwater of limestone islands have also been analysed by PCR-denaturing gradient gel electrophoresis (DGGE) and real-time PCR methods32. However, a comprehensive analysis of the microbial communities and associated nitrogen metabolism has not yet been conducted. We have previously performed periodic shotgun metagenomic surveys in an urban river, the Tama River, and an enclosed bay, Ofunato Bay, to comprehensively analyse microbial communities and to clarify the relationship between microbial functional genes and environmental factors33,34,35,36.

In this study, the microbial community of the groundwater and the functional genes involved in nitrogen metabolism, such as the nitrate reduction, nitrogen fixation, ammonia oxidation, and denitrification genes, were analysed by 16S rRNA amplicon and shotgun metagenomic analyses, respectively. The results revealed that nitrogen metabolism genes, mainly those associated with denitrification, were abundant in areas with high microbial diversity.

Materials and methods

Sample collection

Yaese Town and Itoman City in the southern region of Okinawa-Jima Island were selected as the study areas. The groundwater was sampled at three sites in this area in November and December 2021 and January 2022 (Fig. 1). The water sample at site 1 was collected from a faucet at the water plant. The water samples at sites 2 and 3 were collected with a motorized water sampling device, GEO-pump-Bennett-1400 (GEO Science Laboratory, Nagoya, Japan), from observation wells (with strainers placed down to the base rock), and the groundwater levels were measured with a groundwater level meter ML50 M (Sanyo Measuring Tools Co., Ltd., Tokyo, Japan). The water samples were transported to the laboratory of the Graduate School of Engineering and Science, University of the Ryukyus, in sterilized 1 L glass containers that were kept at a low temperature. In the laboratory, the samples were filtered with a vacuum manifold QIAvac 24 Plus (Qiagen GmbH, Hilden, Germany) through a Sterivex™ HV 0.22 µm filter unit (Millipore, Darmstadt, Germany). The filters were stored at −80 °C until DNA extraction.

Figure 1

Map showing the three groundwater sampling locations (sites 1–3). The left panel indicates site 1, where samples were collected from a faucet at the water plant, and sites 2 and 3, where samples were collected from groundwater wells. The left panel also indicates groundwater velocities, whereas the right panel indicates the sampling area in Okinawa Island.

Methods for measuring various environmental factors

Water temperature, pH, electrical conductivity (EC), dissolved oxygen (DO), and oxidation-reduction potential (ORP) were measured onsite using a portable pH meter (Horiba D-54 and D-55, Horiba, Tokyo, Japan). Bicarbonate ions (HCO3) were also measured onsite by an alkalinity titration method using an AL-DT digital titrator (HACH Company, Loveland, CO, USA). The following analyses were carried out in the laboratory. Cations and anions were determined with an Aquion ion chromatography system (Thermo Fisher Scientific, Waltham, MA, USA), and metal elements were analysed by inductively coupled plasma-mass spectrometry (ICP-MS) (X-Series II, Thermo Fisher Scientific). Dissolved organic carbon (DOC) was analysed using a total organic carbon analyser (TOC-L Analyser, Shimadzu, Kyoto, Japan). Suspended solids (SS) were measured using a filtration system equipped with a glass filter (Grade CF/B, Whatman, Little Chalfont, UK), and the filter was dried at 105–110 °C according to Japanese Industrial Standard (JIS) K0102. The DO values of the groundwater were available only for November 2021 at the three sites because the portable pH meter equipped with the DO sensor did not work in December 2021 and January 2022.

DNA extraction

The cells trapped on the filters were stored frozen and, after thawing, subjected to DNA extraction using a DNeasy Power Water Sterivex Kit (Qiagen GmbH) at the School of Marine Biosciences, Kitasato University, according to the manufacturer’s instructions. DNA concentrations were quantified with a Qubit dsDNA HS assay kit (Invitrogen, Carlsbad, CA, USA) and read with a Qubit Fluorometer (Invitrogen).

16S rRNA amplicon analysis

The DNA extracts prepared for different sampling months and sites were diluted to 5 ng/µL to use as a PCR template. PCR primers, 530F (5′-ACACTCTTTCCCTACACGACGCTCTTCCGATCT-NNNNNGTGCCAGCMGCCGCGG-3′) and 907R (5′-GTGACTGGAGTTCAGACGTGTGC-TCTTCCGATCTNNNNNCCGTCAATTCMTTTRAGTTT-3′)37, adapted with dual-index barcodes for Illumina MiSeq, were used to amplify the region spanning V4 and V5 of the 16S rRNA gene. Samples prepared for different sampling months and sites were pooled and sequenced with an Illumina MiSeq using a MiSeq Reagent Kit v3 (600 cycles) (Illumina). The amplicon metagenomic reads thus obtained have been deposited into the DNA Data Bank of Japan (DDBJ) Sequence Read Archive under the accession number DRA017557.
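The gene-specific 3′ portions of the two primers contain IUPAC degenerate bases (M in 530F; M and R in 907R). The following is a minimal sketch, not part of the authors' workflow, that expands those portions into the concrete sequences they encode. The function name `expand_degenerate` and the decision to omit the adapter and NNNNN spacer are illustrative assumptions.

```python
from itertools import product

# Standard IUPAC nucleotide codes; only the subset needed for these primers.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC",  # A or C
    "R": "AG",  # A or G
}

def expand_degenerate(seq):
    """Return every concrete sequence encoded by a degenerate primer (hypothetical helper)."""
    choices = [IUPAC[base] for base in seq]
    return ["".join(combo) for combo in product(*choices)]

# Gene-specific 3' portions as given in the text (adapter and NNNNN spacer omitted).
FWD_530F = "GTGCCAGCMGCCGCGG"
REV_907R = "CCGTCAATTCMTTTRAGTTT"

fwd_variants = expand_degenerate(FWD_530F)
rev_variants = expand_degenerate(REV_907R)

print(len(fwd_variants), fwd_variants)
print(len(rev_variants), rev_variants)
```

Each degenerate position multiplies the number of primer variants in the synthesized pool, so the forward primer is a mixture of two sequences and the reverse primer a mixture of four.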

The amplicon reads were first merged by overlapping the forward and reverse reads using FLASH software (version 1.2.10, https://ccb.jhu.edu/software/FLASH/)38 (minimum overlap = 10; maximum overlap = 65; maximum mismatch density = 0.25; allow outie pairs = false; cap mismatch quals = false; combiner threads = 20; input format = FASTQ, phred_offset = 33; output format = FASTQ, phred_offset = 33). The obtained data were further processed using Seqkit (version 0.5.5, https://bioinf.shenwei.me/seqkit/)39 to remove the Illumina sequencing adapters. Low-quality reads were filtered using the FASTX-toolkit (version 0.6.6, http://hannonlab.cshl.edu/fastx_toolkit)40, and reads in which 20% or more of the bases had a quality score ≥ 20 were retained. Reads of < 50 bp (P error limit = 0.05; Q score = 30) were trimmed, and reads passing quality control were paired so that both ends of each fragment were represented. Taxonomic assignment was performed by submitting 100,000 reads of each sample to the SILVAngs platform (version 1.9.10/1.4.9, SILVA: r138.1)41,42,43,44, with the parameter “--unclassified_estimation” used to estimate the proportion of the metagenome composed of microbes not included in the database.
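The retention rule described above (a read is kept when at least 20% of its bases have a quality score ≥ 20) can be sketched as follows. This is an illustration of the rule as stated in the text, assuming Phred+33 encoding as specified for FLASH; it is not the FASTX-toolkit implementation, and the function names are hypothetical.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in quality_line]

def fraction_at_or_above(quality_line, q_min=20, offset=33):
    """Fraction of bases whose Phred score is at least q_min."""
    scores = phred_scores(quality_line, offset)
    if not scores:
        return 0.0
    return sum(score >= q_min for score in scores) / len(scores)

def keep_read(quality_line, q_min=20, min_fraction=0.20, offset=33):
    """Apply the retention rule: enough bases must reach the quality threshold."""
    return fraction_at_or_above(quality_line, q_min, offset) >= min_fraction

# 'I' encodes Q40 and '!' encodes Q0 under Phred+33.
half_good = "IIII!!!!"     # 50% of bases at Q40
mostly_bad = "!!!!!!!!!I"  # 10% of bases at Q40

print(keep_read(half_good))   # retained
print(keep_read(mostly_bad))  # discarded
```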

Shotgun metagenomic sequencing analysis

DNA libraries for shotgun metagenomic sequencing were prepared using the Nextera XT DNA Library Preparation Kit (Illumina, San Diego, CA, USA). The quality and size of the products were assessed with an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). Finally, 0.2 µg of each of the prepared DNA libraries was sequenced with an Illumina MiSeq using a MiSeq Reagent Kit v3 (600 cycles) (Illumina). The whole genome shotgun (WGS) reads thus obtained have been deposited into the DDBJ Sequence Read Archive under the accession number DRA015527.

The WGS reads were processed with fastp (version 0.23.4)45 to remove low-quality reads, N-containing reads, and adapters with the following parameters: “--qualified_quality_phred 20” and “--n_base_limit 20”. This process yielded clean data (defined here as “standard data”), which were subsequently assembled de novo using MEGAHIT (version 1.2.9)46 with a k-min of 21, k-max of 141 and k-step of 12, producing the contig data. The metagenome assembly was evaluated using QUAST (version 5.0.2)47. Gene annotation was carried out with blastn analyses48 using Prokka (version 1.14.6)49 and Prodigal (version 2.6.3)50 on all contig data obtained for the different months and sampling sites. Transcripts per kilobase million (TPM)51 values were calculated with Salmon (version 1.10.2)52. All contig data were binned into metagenome-assembled genomes (MAGs) with MetaWRAP53 in Portable pipeline54, and the bin abundances were depicted as a heatmap. The assembled MAGs were visualized with Anvi’o (version 2.2.2)55. The completeness and contamination of the resulting bins were calculated using CheckM (version 1.1.3)56,57. Furthermore, GTDB-Tk (version 2.3.2) was used to assign taxonomic classifications to bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB, R214)58.
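For readers unfamiliar with TPM, the normalization can be sketched as below. This is a simplified illustration that uses raw gene lengths; Salmon estimates TPM from effective lengths and probabilistic read assignment, so the function `tpm` here is a hypothetical stand-in and will not reproduce Salmon's values exactly. The read counts and lengths are invented for the example.

```python
def tpm(read_counts, lengths_bp):
    """Length-normalize read counts and rescale so the values sum to one million."""
    # Reads per kilobase for each gene.
    rates = [count / (length / 1000.0) for count, length in zip(read_counts, lengths_bp)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    return [rate / total * 1_000_000 for rate in rates]

# Invented example: three genes whose counts scale exactly with their lengths.
counts = [100, 200, 300]
lengths = [1000, 2000, 3000]

values = tpm(counts, lengths)
print(values)
```

Because counts are divided by gene length before rescaling, the three genes in this example receive equal TPM values even though their raw counts differ.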

We calculated the specific abundances of nitrogen metabolism genes per L of groundwater based on the DNA concentration of the groundwater, as shown in the following equation:

$$\text{specific abundance of nitrogen metabolism gene} = \text{TPM} \times \left( \text{amount of DNA extracted (ng)} \,/\, \text{L of groundwater} \right).$$
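As a worked illustration of this equation, the sketch below multiplies a TPM value by the DNA yield per litre of filtered groundwater. The function names and all numeric inputs are hypothetical and are not taken from the study's tables.

```python
def dna_yield_ng_per_l(dna_ng, filtered_volume_l):
    """DNA yield per litre of groundwater (ng/L)."""
    if filtered_volume_l <= 0:
        raise ValueError("filtered volume must be positive")
    return dna_ng / filtered_volume_l

def specific_abundance(tpm_value, dna_ng, filtered_volume_l):
    """Specific abundance = TPM x DNA extracted (ng) per L of groundwater."""
    return tpm_value * dna_yield_ng_per_l(dna_ng, filtered_volume_l)

# Hypothetical inputs: a gene with TPM 50 in a sample that yielded
# 120 ng of DNA from 2.0 L of filtered groundwater.
example = specific_abundance(tpm_value=50.0, dna_ng=120.0, filtered_volume_l=2.0)
print(example)  # 50 * (120 / 2.0) = 3000.0
```

Scaling by DNA yield means that two samples with the same TPM for a gene can still differ in specific abundance if one contains more microbial biomass per litre.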

In addition, the WGS reads were subjected to taxonomic analysis using the same procedures described above for the 16S rRNA amplicon datasets, except that the SILVAngs platform (version 1.9.10) was replaced by MetaPhlAn 459 (version 4.0).

Statistical analysis

Nonmetric multidimensional scaling (NMDS) analysis was performed using R software60 (version 4.2.0, https://cran.r-project.org/bin/windows/base/) to compare the similarity of the microbial communities among samples. The NMDS plots were generated from the Bray-Curtis dissimilarity matrix using the vegan package61, with csv files of the results of the analysis using the MEGAN database serving as the microbial community data. The correlation between the microbial communities and environmental factors was calculated using “envfit” (vegan). To examine the relationship between microbial community composition and environmental factors, distance-based redundancy analysis (dbRDA)62 was also performed using “dbrda” (vegan). The multicollinearity among environmental factors was determined using a standard function in R.
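The Bray-Curtis dissimilarity underlying the NMDS and dbRDA analyses can be sketched in a few lines. This pure-Python version is only an illustration of the standard formula and is not the vegan code used in the study; the abundance vectors are invented.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(a) + sum(b)
    return numerator / denominator if denominator else 0.0

def dissimilarity_matrix(samples):
    """Symmetric matrix of pairwise Bray-Curtis dissimilarities."""
    return [[bray_curtis(s, t) for t in samples] for s in samples]

# Invented genus-level counts for three samples.
sample_a = [10, 0, 5]
sample_b = [5, 5, 5]
sample_c = [0, 20, 0]

matrix = dissimilarity_matrix([sample_a, sample_b, sample_c])
for row in matrix:
    print([round(value, 3) for value in row])
```

A value of 0 means two samples have identical composition and a value of 1 means they share no taxa, which is why samples from the same site are expected to plot close together in the ordination.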

Microbial community diversity was determined by the Shannon diversity index63 and Simpson diversity index64 using R software (version 4.2.0) “renyi” (vegan) at the genus level to examine alpha diversity. Chao I65 and Abundance-based Coverage Estimator (ACE)66 were also determined using R software (version 4.2.0) “estimate” to determine alpha diversity at the genus level.
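The alpha diversity indices named above can be illustrated with the following sketch. It assumes the natural-log Shannon index, the Gini-Simpson form of the Simpson index (1 minus the sum of squared proportions), and the bias-corrected Chao1 estimator; which variants the R functions in the study returned is not stated in the text, so these choices are assumptions. The counts are invented.

```python
import math

def shannon(counts):
    """Shannon index H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def simpson(counts):
    """Gini-Simpson index, 1 - sum(p^2) (assumed variant)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def chao1(counts):
    """Bias-corrected Chao1 richness: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))

even_community = [10, 10, 10, 10]       # four equally abundant genera
skewed_community = [5, 3, 1, 1, 2, 1]   # three singletons, one doubleton

print(shannon(even_community))   # ln(4)
print(simpson(even_community))   # 0.75
print(chao1(skewed_community))   # 6 + 3*2 / (2*2) = 7.5
```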

Results

Environmental factors in the groundwater samples collected from three sites in the Ryukyu limestone aquifer

Table 1 shows the analytical data for environmental factors in the groundwater samples collected from three sites in the Ryukyu limestone aquifer. The groundwater at site 3 was closer to the ground surface (2.4 m) than that at site 2 (18.3 m), whereas the sample at site 1 was collected from a faucet at the water plant (original depth, 34.5 m). The concentrations of NO3-N averaged over three months at sites 1, 2 and 3 were 8.75 ± 0.46, 22.7 ± 1.75 and 6.89 ± 2.80 mg/L (mean ± SD), respectively. The water temperature, EC, HCO3, and concentrations of Cl, NO3-N, SO42−, Na+, Mg2+ and Ca2+ were low in samples collected in December 2021 and January 2022 at site 3. DO could not be measured in December 2021 and January 2022 due to equipment failure. In November 2021, the concentration of DO at site 3 was very low, at less than 2 mg/L.

Table 1 Summary of geochemical parameters in groundwater samples collected from the three sites in the Ryukyu limestone aquifer in November and December 2021, and January 2022.

Microbial communities in the groundwater samples

The numbers of 16S rRNA amplicon and WGS reads, groundwater sampling volume (= filtration volume), and DNA yield per L of groundwater for each sample are shown in Table 2. Site 1 had the lowest DNA yield, followed by sites 2 and 3 in increasing order. Details of the 16S rRNA amplicon and WGS reads are shown in Supplementary Tables S1 and S2.

Table 2 The numbers of 16S rRNA amplicon and WGS reads together with water volumes and DNA yields obtained from groundwater samples collected from the three sites in the Ryukyu limestone aquifer in November and December 2021, and January 2022.

Figure 2a shows the microbial communities at the domain level based on the 16S rRNA amplicon datasets. Bacteria accounted for 94.3 to 98.3% and archaea for 1.7 to 5.7%, indicating that bacteria constituted the majority of the microbial communities. As shown in Fig. 2b, classes Cyanobacteriia, Oligoflexia and Kapabacteria accounted for 12.0 to 28.5%, 5.7 to 6.7%, and 3.3 to 9.9%, respectively, at site 1. At site 2, classes Alphaproteobacteria, Elusimicrobia and Vampirivibrionia accounted for 30.2 to 41.1%, 4.5 to 7.5%, and 2.4 to 4.7%, respectively. At site 3, classes Planctomycetes and Vicinamibacteria were dominant and accounted for 12.1% and 2.3 to 4.5%, respectively, in November 2021, but these bacteria were not found in other samples. As shown in Fig. 2c, genus Microcystis accounted for 4.2 to 10.4% at site 1, whereas lineage IV belonging to Elusimicrobia and others (< 1%) accounted for 4.5 to 7.5% and 29.4 to 35.4%, respectively, at site 2. Unidentified bacteria were more abundant at site 3 than at the other two sites (Fig. 2c).

Figure 2

The relative abundances of microbes annotated for the 16S rRNA amplicon datasets at the domain (a), class (b) and genus (c) levels for groundwater samples collected from the Ryukyu limestone aquifer in November and December 2021 and in January 2022. “Others (< 1%)” indicates taxa with relative abundances of less than 1%. Refer to the legend of Fig. 1 for the three sampling sites. Only taxonomic groups at the class and genus levels with relative abundances greater than 2.0% are listed.

MetaPhlAn 4 analysis of the WGS reads revealed that unclassified reads accounted for 74 to 100% across the three sites (Supplementary Figs. S1a, S1b and S1c). While unclassified reads accounted for 97.8 to 100% at sites 2 and 3, site 1 had the most classified hits among the three sites, most of which belonged to phylum Cyanobacteria or domain Bacteria.

Diversity of microbial communities obtained by 16S rRNA amplicon analysis

The diversity indices of the microbial communities obtained by the 16S rRNA amplicon analysis for the groundwater samples at site 3 were extremely high in December 2021 and January 2022 (Table 3), whereas those at site 1 were markedly low in November 2021, for all of the Shannon, Simpson, Chao I and ACE indices. The Shannon and Simpson diversity indices at site 3 in November 2021 were also higher than those at sites 1 and 2.

Table 3 Summary of the diversity indices for the microbial communities obtained by the 16S rRNA amplicon analysis for groundwater samples collected from three sites in November and December 2021, and January 2022.

Relationship between microbial communities and environmental factors

Figure 3a shows the results of the NMDS analysis, which was conducted to examine the relationship between the microbial communities based on the 16S rRNA amplicon datasets and various environmental factors. The plots of the microbial communities clustered by site. In particular, the microbial communities at site 3 in December 2021 and January 2022 were highly similar to each other. When all 17 environmental factors shown in Table 1 were included in the NMDS analysis, several environmental factors, including groundwater level, SS, T-P, PO4-P, NO3-N, Na+, K+ and Mg2+, were found to have a significant relationship with the microbial communities. The microbial communities at site 3 were correlated with PO4-P, T-P, SS and K+, whereas those at site 1 were correlated with Mg2+ and NO3-N. The groundwater level and Na+ appeared to be correlated with the communities at both site 1 and site 2. We then focused on 8 environmental factors, namely pH, ORP, DOC, HCO3, SS, T-P, NO3-N and SO42−, which showed low multicollinearity (Supplementary Table S3). As shown in Fig. 3b, the dbRDA analysis revealed that all 8 of these environmental factors had a significant relationship with the microbial communities. In particular, T-P and SS were highly correlated with the microbial communities at site 3 in all three months, and SO42− and NO3-N with those at site 2.

Figure 3

The relationship between microbial communities based on the 16S rRNA amplicon datasets and environmental parameters in the groundwater samples collected from the three sites in November and December 2021 and in January 2022. Panel a shows the results of nonmetric multidimensional scaling (NMDS) analysis using all environmental factors listed in Table 1. Panel b shows the results of distance-based redundancy analysis (dbRDA), where the environmental factors adopted were pH and ORP, and the concentrations of HCO3, DOC, SS, T-P, NO3-N and SO42−, which showed low collinearity (Supplementary Table S3). Green, yellow and red circles indicate the microbial communities at sites 1, 2 and 3, respectively.

The abundances of nitrogen metabolism genes based on all contig dataset obtained from shotgun metagenomics

Figure 4 shows the specific abundances of nitrogen metabolism genes, calculated from all contig data assembled from the shotgun metagenomes with MEGAHIT and from the DNA yield per L of groundwater, at the three sites in November and December 2021 and in January 2022. The genes are arranged from top to bottom in the order of dissimilatory nitrate reduction, assimilatory nitrate reduction, denitrification, nitrogen fixation, and nitrification. Twenty-five genes were found to be involved in nitrogen metabolism in the collected samples. The genes involved in dissimilatory nitrate reduction, denitrification and nitrogen fixation were almost absent at site 1, whereas the abundances of all genes at site 3 were considerably higher than those at sites 1 and 2. At site 3, nasA, which participates in assimilatory nitrate reduction, and napA, which participates in denitrification, were highly abundant in December 2021. The abundance of narG was also markedly high at site 3 in January 2022. The abundances of nitrogen metabolism genes in November 2021 were much lower than those in December 2021 and January 2022 at both sites 2 and 3.

Figure 4

Heatmap of the specific abundances of nitrogen metabolism genes in groundwater samples collected from the three sites in November and December 2021 and in January 2022. The biological processes of nitrogen metabolism are arranged according to the oxidation states of nitrogen. The calculation of the specific abundances of nitrogen metabolism genes is detailed in the methods section and involves multiplying the TPM by the amount of DNA extracted per L of groundwater (ng/L).

Binning of shotgun metagenomics datasets into MAGs and the distribution of nitrogen metabolism genes

Figure 6 shows the results of binning all contigs obtained from shotgun metagenomics into MAGs with MetaWRAP, which were further visualized with Anvi’o (version 2.2.2) (Fig. 5). Furthermore, CheckM and GTDB-Tk were used to check the genomic features and assign taxonomic classifications (Table 4), and the assigned results are shown in Fig. 5. In total, 11 bins were constructed, of which bins 2, 6, 7, 8 and 11 were abundant at site 1 but almost absent at sites 2 and 3 (Fig. 6). Bins 2 and 6 corresponded to family Cyclobacteriaceae belonging to class Bacteroidia and family Kapabacteriaceae belonging to class Kapabacteria, respectively, whereas bins 7 and 11 both corresponded to family Nostocaceae belonging to class Cyanobacteriia but with different lineages (Table 4). Bin 8 corresponded to family Microcystaceae, which also belongs to class Cyanobacteriia. Bins 3 and 4, which corresponded to family Micropepsaceae and undefined family JACAEB01, respectively, both belonged to class Alphaproteobacteria and were found only at sites 2 and 3 in all three months, as was bin 5, which corresponded to class Elusimicrobia. Bin 9, which corresponded to family Chitinophagaceae belonging to class Bacteroidia, was found only in the sample collected at site 2 in December 2021. Bin 10, which corresponded to class Kapabacteria with a family lineage (Palsa-1295) different from the family Kapabacteriaceae found in bin 6, was found only in the samples collected at site 2 in all three months. Bin 1, which corresponded to family Fredricksoniimonadaceae in phylum Omnitrophota, was found in the sample collected at site 3 in November 2021 and in those collected at site 2 in the three months. These results indicate that the bin cluster at site 1 was very different from those at sites 2 and 3, as also shown in the dendrogram of Fig. 6. The total reads mapped were remarkably low at site 3 (Fig. 5).

Figure 5

Anvi'o representation of the bin collection for the contig datasets obtained by shotgun metagenomics and binned into MAGs with MetaWRAP for groundwater samples collected from the three sites in the three months. The layers, from the inside out, are as follows: (1) the tree displaying the hierarchical clustering obtained by the co-assembly of the contigs from the draft genomes of the 9 samples; (2) the actual length of a given split; (3) the GC content; (4)–(12) the “mean coverage” of each bin in the 9 samples from the metagenomic dataset; (13) and (14) the ribosomal RNAs identified in each bin from the metagenomic dataset; and (15) and (16) the two outermost layers, which show the bin annotation. The dendrogram in the inset shows the total reads mapped from each sample to the assembly.

Table 4 Summary of the genomic features and taxonomic characterization of bins for groundwater samples collected from three sites in November and December 2021, and January 2022.

Figure 6

The heatmap of bin abundance estimated from MAGs with MetaWRAP for samples collected from the three sites in the three months.

Figure 7 shows the relative abundance of bacteria harbouring various nitrogen metabolism genes in all groundwater samples from sites 1, 2, and 3. Although most genes involved in nitrate reduction and denitrification could not be binned, a significant portion of the nitrogen fixation genes, including nifD, nifH, and nifK (except for vnfK), were assigned to JACAEB01 (Alphaproteobacteria) as bin 4, UBA9628 (Elusimicrobia) as bin 5, and to Nostocaceae (Cyanobacteriia) as bins 7 and 11. A smaller portion of the assimilatory nitrate reduction genes, such as nasA, and dissimilatory nitrate reduction/denitrification genes, such as narG, napA and nosZ, were also classified into JACAEB01 (Alphaproteobacteria) and UBA9628 (Elusimicrobia) as bins 4 or 5, respectively.

Figure 7

The relative abundance of bacteria harbouring nitrogen metabolism genes in the groundwater samples. Taxonomic classification was based on binning, and the total gene abundance (TPM) was calculated for each nitrogen metabolism gene across all groundwater samples collected from sites 1, 2 and 3. The abbreviation "n.d." stands for "not detected".

Discussion

The topography of the southern part of Okinawa Island is characterized by the Pleistocene Ryukyu limestone terrace, which unconformably overlies the basement mudstone of the Miocene-to-Pliocene Shimajiri Group as the surface geology21. Groundwater has long been used in this region for agricultural and domestic purposes. In recent years, however, the condition of the groundwater has become a serious problem in this area. Indeed, the groundwater at site 2 showed a markedly high concentration of NO3-N (not less than 20 mg/L, Table 1) and was thus considered unsuitable for drinking because the environmental quality standard in Japan is set to < 10 mg/L. Although the well at site 2 is not used for drinking water, it is located upstream of the drinking water supply. Therefore, the increase in nitrate nitrogen concentration in this local area is of serious concern. In contrast, site 3 tended to have the lowest DO values throughout the year among the three sampling sites21. Note that the groundwater at site 2 lies farther from the ground surface than that at site 3. DO is considered an indicator of the denitrification response, and the reported values suggested that site 2 had a mildly reducing environment18.

Taxonomic analysis based on the 16S rRNA amplicon datasets revealed that class Cyanobacteriia accounted for 12.0 to 28.5% of the microbial community at site 1, which was markedly different from the other sites (Fig. 2b). The abundance of Cyanobacteriia was further confirmed by bins 7, 8 and 11, which were assigned to this class by taxonomic classification with GTDB-Tk (Fig. 5). Class Kapabacteria was also abundant at site 1 in the 16S rRNA amplicon datasets, which was further confirmed by binning (bin 6 in Fig. 6). Class Alphaproteobacteria, which was abundant at site 2 in the 16S rRNA amplicon analysis, was assigned to bins 3 and 4. Class Elusimicrobia was dominant at sites 2 and 3 in the 16S rRNA amplicon analysis and was also assigned to bin 5. Thus, the 16S rRNA amplicon and shotgun metagenomic analyses identified the same dominant bacterial classes, demonstrating that the bacterial communities at site 1 were very different from those at sites 2 and 3. These differences among the three sites seem to reflect the environmental factors shown in Fig. 3.

The water samples at site 1 were collected from a faucet at the water treatment plant, whereas samples were collected via pumping from observation wells at the other sites. One possibility is that Cyanobacteria grew near the faucet from which the water was sampled, and some of them entered the water during sampling. The second possibility is that the groundwater was contaminated with Cyanobacteria from the surface filtration tank for observation. The growth rate of Cyanobacteria is relatively low67, suggesting that in the latter case this microbe was constantly being mixed into the groundwater. MetaPhlAn 4 analysis yielded more than 90% unclassified reads in many samples, suggesting that this analysis was not suitable for the present study.

This study demonstrated that various nitrogen metabolism genes were more abundant at site 3 than at sites 1 and 2 (Fig. 4), suggesting that nitrogen metabolism was particularly active at site 3. Although the representative denitrification marker genes, nirS/nirK32, were not detected, other denitrification genes and nitrate reduction genes were considerably more abundant at site 3 than at the other two sites (Fig. 4). This may be related to the relatively anaerobic conditions of the groundwater at site 3, which could be attributed to the low groundwater levels and low DO concentrations, as shown in Table 1. In addition, the high SS and DOC concentrations at site 3 indicate organic matter-driven active nitrogen metabolism in the microbial communities at this site. Site 3 also tended to have a high Shannon diversity index (Table 3). The presence of a variety of microorganisms is thought to increase the abundance of nitrogen metabolism genes.

We quantified nitrogen metabolism genes by defining the 'specific abundance of nitrogen metabolism gene' as the product of TPM and the amount of DNA extracted per L of groundwater (ng/L), as shown in Fig. 4. This approach takes into account not only the composition of nitrogen metabolism genes but also the abundance of bacteria in the groundwater. Hence, a high 'specific abundance' of a nitrogen metabolism gene suggests a high abundance of bacteria carrying that gene, indicating active nitrogen metabolism within the groundwater ecosystem. The specific abundances of nitrogen metabolism genes tended to be higher in the samples collected in December 2021 and January 2022 at sites 2 and 3 than in those collected in November 2021 (Fig. 4). This was possibly related to the rainfall event that occurred on the day before the samples were collected; the public atmospheric data showed heavy rainfall in December 2021 (Supplementary Table S4). It is plausible that the rain facilitated the leaching of organic matter and nutrients into the groundwater, resulting in increased bacterial proliferation, including that of bacteria carrying nitrate reduction genes. This environmental condition may have affected the bacterial community.

As shown in Fig. 7, many genes related to nitrate reduction and denitrification could not be assigned to bins, which may be associated with the fact that the groundwater examined in this study contained a high proportion of bacteria with relative abundances below 1% (Fig. 2). However, it is important to note that these low-abundance bacteria, as part of the overall microbial community, likely work together in nitrate reduction and denitrification. In contrast, in environments such as soil, bacteria harbouring genes for nitrate reduction and denitrification have been identified27, further highlighting the distinctive microbial communities present in groundwater compared to soil. Furthermore, the total reads mapped were remarkably low at site 3 (Fig. 5). These results suggest a unique feature of the microbial community at site 3. Nitrogen metabolism plays very important roles in ecosystems and encompasses very complex pathways68. NO3-N is one of the most important nutrients for microorganisms and plants on Earth69. NO3 is converted to NO2 as an intermediate and ultimately to NH3 through either assimilatory or dissimilatory nitrate reduction70. It has been reported that the narGHI and nasA genes are the key enzyme genes in the dissimilatory and assimilatory nitrate reduction pathways, respectively71. Interestingly, both genes were found to be abundant at site 3 (Fig. 4). The reaction process changes depending on the reaction environment, and various microorganisms contain genes for the respective functions. It is important to understand the comprehensive picture of these processes. However, it was impossible in this study to identify the microbial species or genera that may function in nitrogen metabolism. Shotgun metagenomics has been developed72, and data based on the simultaneous analysis of microbial communities and functional genes have been accumulated73,74. For the further development of the current research, it is important to isolate or identify the groundwater microbes that contain the genes related to nitrogen metabolism.

Further research is needed to verify the correlation between nitrate reduction, denitrification, and other functions that consume NO3-N in groundwater to prevent nitrate pollution.