Genetic isolation and metabolic complexity of an Antarctic subglacial microbiome

Kim, Kyung Mo; Hwang, Kyuin; Lee, Hanbyul; Cho, Ahnna; Davis, Christina L.; Christner, Brent C.; Priscu, John C.; Kim, Ok-Sun

doi:10.1038/s41467-025-62753-3

Article
Open access
Published: 18 August 2025

Nature Communications volume 16, Article number: 7501 (2025)

Abstract

Microbes inhabiting and evolving in aquatic ecosystems beneath polar ice sheets subsist under energy-limited conditions while in relative isolation from surface gene pools and their common ancestral populations of origin. Samples obtained from beneath the West Antarctic Ice Sheet (WAIS) allowed us to examine the evolutionary relationships of, and identify metabolic pathways in, microbial genomes recovered from the Mercer Subglacial Lake (SLM) ecosystem. We obtained 1,374 single-cell amplified genomes (SAGs) from individual bacterial and archaeal cells isolated from samples of SLM’s water column and sediments. These genomes reveal a diverse microbial assemblage in SLM that includes Patescibacteria. Comparative analyses show that most genomes correspond to new species and taxonomic groups, with phylogenomic and functional evidence supporting their genetic isolation from marine and surface biomes. The genomic data reveal diverse metabolisms in SLM capable of oxidizing organic and inorganic compounds via aerobic or anaerobic respiration. The subglacial populations show distinct metabolic guild structures, in which trophic shifts from organotrophy to chemolithotrophy may depend on oxygen availability. Our SAG data suggest versatile metabolic capabilities in the characterized microbial assemblage, reveal key energy-generating strategies in the subglacial aquatic ecosystem, and provide a framework to assess microbial evolution beneath WAIS.

Introduction

Ice streams along West Antarctica’s Siple Coast flow over deformable, water-saturated sediments¹ and overlie an extensive subglacial hydrological network². Sediments upstream of the modern West Antarctic Ice Sheet (WAIS) grounding line consist of crustal sedimentary infill and overridden marine sediment, indicating that the extent of the ice sheet has varied over geological time^{3,4,5,6}. The subglacial hydrological systems beneath the Whillans Ice Stream (WIS) and Mercer Ice Stream (MIS) consist of numerous lakes connected by a network of drainage pathways, with episodic flows eventually discharging water⁷, solutes⁸, and sediments⁹ to the marine cavity beneath the Ross Ice Shelf¹⁰. Research in this region has revealed the roles of water and sediments in ice stream flow¹¹, the historical dynamics of WAIS^{4,5,6,12,13,14}, and the geobiology of its subglacial aquatic ecosystems^{15,16,17,18,19}. Genomic¹³ and glaciological¹⁴ evidence suggests that the form, evolution, distribution, and function of subglacial ecosystems were heavily influenced by ice sheet extent and sea level from the Late Pleistocene to the Middle Holocene¹²; new observations from beneath WAIS provide a means to evaluate this hypothesis.

Drilling projects that explored subglacial lakes in West Antarctica [i.e., Whillans Subglacial Lake (SLW) and Mercer Subglacial Lake (SLM)] have discovered metabolically active microbial communities^{12,15,16,20} and evidence for metacommunity structure across distinct subglacial hydrological basins¹⁷. Because WAIS is a barrier to mixing between marine, surface, and subglacial systems, the sources of inorganic nutrients and organic matter to the lakes are basal ice melting²¹, rock comminution²², material stored in the sediments¹⁶, and in situ transformations in the water column and sediments^{9,12}. In the absence of sunlight, the metabolic energy sources available in the subglacial environment include the oxidation of methane¹⁹ and of reduced nitrogen, iron, and sulfur compounds^{15,16,18,20}. The influence of microbial metabolism in SLW and SLM extends beyond the subglacial ecosystems to downstream geochemical and biological systems when the lakes discharge to the Southern Ocean. For instance, several studies have highlighted the potential for nutrients in subglacial discharge to affect oceanic primary productivity³, bacterial productivity²¹, and greenhouse gas levels^{3,19}.

SLM is overlain by 1,087 m of ice, is located near the confluence of MIS and WIS, and formed ~180 y before present with the stagnation of Kamb Ice Stream⁹ (Fig. 1). When sampled in December 2018, SLM had a surface area of ~143 km² and a water column depth of ~15 m at the drill location²³. Approximately 80% of the freshwater in SLM is sourced from WAIS basal melt, with the remainder originating from East Antarctica^{7,12}. The lake undergoes a fill-drain cycle every 4 to 6 years⁹. Measurements in the water column of SLM indicated a temperature of −0.74 °C, a pH of 8.2, low dissolved organic carbon (DOC; ~30 µM), low inorganic nutrient concentrations (NH₄⁺, NO₃⁻, and soluble reactive P of 0.5, 3.0, and 2.0 µM, respectively), and oxygen supersaturation^{17,23,24}. Sediment cores retrieved from SLM consisted of a ~12 cm thick surface unit of finely laminated material deposited from glacially sourced lake water. The laminated sediments overlie a diamict that pre-dates the lake and was deposited from the upstream catchment⁹. Porewater chloride concentration and conductivity increase with sediment depth, indicating the influence of past marine incursion events, the most recent of which occurred during the Middle Holocene¹².

**Fig. 1: Location and physicochemical characteristics of Mercer Subglacial Lake (SLM).**

We used a single-cell genomic approach to recover microbial genomes from the water column and surficial sediments of SLM. Current information on the structure and function of microbial communities in WAIS subglacial lakes has relied largely on studies of 16S rRNA genes^{15,16,17}, metabolic marker genes^{18,19}, and biogeochemical data^{20,21}. Because single-gene datasets are limited for inferring evolutionary relationships and metabolic diversity, we used single-cell genomic data, which provide a large complement of identifiable functional genes for predicting metabolic pathways and for conducting robust phylogenomic analysis. Single-cell genomics has several advantages over shotgun metagenomics. It can resolve strain heterogeneity within a species that is collapsed in metagenome assemblies²⁵, and it is well suited to samples in which recoverable cell biomass is too low to yield sufficient DNA for robust shotgun library construction, sequencing, and assembly²⁶. Our data from 1,374 single-cell amplified genomes (SAGs) allowed us to reconstruct metabolic pathways contributing to biogeochemical cycling in SLM and offer insight into the emerging understanding of microbial evolution beneath WAIS.

Results

Microbial diversity of SLM SAGs

A total of 3,170 microbial cells were sorted by flow cytometry²⁷ from four filter-concentrated lake water samples (>3 µm, 3 to 0.8 µm, 0.8 to 0.2 µm, and <0.2 µm), five sediment core samples (10 cm depth, sectioned at 2 cm intervals), and a bulk surficial sediment sample. After genome amplification and sequencing, bioinformatic analyses identified 1,374 SAGs with <5% contamination as estimated by CheckM²⁸ (Supplementary Note 1). Using GTDB-Tk²⁹, the SAG sequences were aligned to GTDB marker genes and classified by their placement in the GTDB reference tree³⁰. Because placement-based approaches can produce incorrect taxonomic assignments when queries are distantly related to reference genomes³¹, the taxonomy inferred from these alignments was evaluated with FastTree³². This check showed that 57 of the 1,374 SAGs were misidentified (e.g., Supplementary Fig. 1), so their taxonomic assignments were revised based on the FastTree phylogeny (Supplementary Data 1).
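
The curation logic described above (a contamination filter followed by reconciliation of two taxonomic calls) can be sketched as follows. This is a minimal illustration under stated assumptions. Each SAG is represented by a hypothetical `SagRecord` holding a CheckM contamination estimate, a placement-based lineage (such as GTDB-Tk reports), and an optional lineage read off a de novo tree (such as one built with FastTree). The rule "prefer the tree-based lineage when the two disagree" paraphrases the text. The sketch does not reproduce either tool’s interface or output format.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MAX_CONTAMINATION = 5.0  # percent; SAGs must fall below this CheckM estimate


@dataclass
class SagRecord:
    """Hypothetical per-SAG record; field names are illustrative only."""
    sag_id: str
    contamination: float                 # CheckM contamination estimate (%)
    placement_lineage: str               # lineage from placement into the reference tree
    tree_lineage: Optional[str] = None   # lineage from a de novo tree, if one was built


def curate(records: List[SagRecord]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Filter SAGs by contamination and reconcile two taxonomic assignments.

    Returns the curated (sag_id, lineage) pairs and the IDs whose lineage was
    revised because the tree-based call disagreed with the placement-based call.
    """
    curated: List[Tuple[str, str]] = []
    revised: List[str] = []
    for rec in records:
        if rec.contamination >= MAX_CONTAMINATION:
            continue  # fails the <5% contamination filter
        lineage = rec.placement_lineage
        if rec.tree_lineage is not None and rec.tree_lineage != lineage:
            lineage = rec.tree_lineage  # prefer the tree-based assignment
            revised.append(rec.sag_id)
        curated.append((rec.sag_id, lineage))
    return curated, revised
```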

Analysis of the curated SAGs revealed that water column and sediment assemblages differed in taxonomic composition (Fig. 2) and structure (Supplementary Fig. 2a) (P < 0.05, PERMANOVA). These clear taxonomic differences suggest that the two habitats differ environmentally. The six sediment samples exhibited similar microbial community structures (Supplementary Fig. 2a), so they were pooled into a single sample (756 SAGs in total, hereafter sediment_SAGs). The three water samples from the >0.2 µm fractions, which shared similar community structures, were also pooled (428 SAGs, hereafter lake_SAGs), whereas the remaining water fraction (<0.2 µm) was examined separately (190 SAGs, hereafter lake_small_SAGs).
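
The PERMANOVA comparison can be illustrated with a small, self-contained sketch. It assumes Bray-Curtis dissimilarities between per-sample taxon count vectors, because the distance measure is not stated in this section. From these it computes the pseudo-F statistic, the ratio of among-group to within-group sums of squared distances, each scaled by its degrees of freedom, and a P-value obtained by permuting group labels. The sample vectors used to exercise it are made up.

```python
import random
from itertools import combinations
from typing import List, Sequence, Tuple


def bray_curtis(u: Sequence[float], v: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def pseudo_f(dist: List[List[float]], labels: Sequence[str]) -> float:
    """PERMANOVA pseudo-F from a square distance matrix and group labels."""
    n = len(labels)
    groups = sorted(set(labels))
    a = len(groups)
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in groups:
        idx = [i for i in range(n) if labels[i] == g]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)
    if ss_within == 0:
        return float("inf")  # groups are internally identical
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(samples: List[Sequence[float]], labels: Sequence[str],
              permutations: int = 999, seed: int = 0) -> Tuple[float, float]:
    """Return (pseudo-F, permutation P-value) for the grouping in `labels`."""
    dist = [[bray_curtis(u, v) for v in samples] for u in samples]
    f_obs = pseudo_f(dist, labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # break any link between labels and samples
        if pseudo_f(dist, shuffled) >= f_obs:
            hits += 1
    return f_obs, (hits + 1) / (permutations + 1)
```

With only three samples per group there are just 20 distinct labelings, and two of them reproduce the observed split. The permutation P-value therefore cannot fall much below 0.1, however strong the separation.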

**Fig. 2: Phylogenomic trees of bacteria (1367 SAGs) and archaea (7 SAGs) of Subglacial Lake Mercer.**

The sediment_SAGs classify within seven bacterial phyla, but the vast majority are Actinomycetota (69.2% of SAGs) and Pseudomonadota (16.8%) (Fig. 2 and Supplementary Data 2). Similarly, the lake_SAGs showed high abundances of Pseudomonadota (53.7%) and Actinomycetota (19.9%) among the 14 bacterial phyla and 1 archaeal phylum identified. Patescibacteria (83.7%) and Actinomycetota (10.0%) were the most abundant of the eight phyla comprising the lake_small_SAGs. Archaeal SAGs were rare in the dataset. They were represented by five lake_SAGs affiliated with the genus Nitrosarchaeum and two lake_small_SAGs affiliated with the family GW2011-AR1 (phylum Nanoarchaeota). Only two SAGs were taxonomically assignable at the species level (Supplementary Data 1). Nearly 95% of the SAGs (1,296) could be identified as members of 162 GTDB genera (e.g., Polaromonas and UBA1004). Of the remaining 76 SAGs that could not be classified at the lowest levels of the taxonomic hierarchy, 5 belong to unclassified families (e.g., unclassified Hydrogenedentiales) and 71 are members of 28 unclassified genera (e.g., unclassified Nanopelagicaceae). The CHAO1 statistic predicts 351 genera. The 162 genera (including the two species) and 33 higher taxonomic groups represented by the 1,374 SAGs therefore account for approximately half of the microbial richness in SLM. Rarefaction analysis also indicated that the SAG data underrepresent the diversity of the water and sediment communities (Supplementary Fig. 3a).
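
The richness estimate and rarefaction analysis rest on two standard formulas. The sketch below takes per-genus SAG counts as input (e.g., `collections.Counter(genus_labels).values()` for a hypothetical list `genus_labels`). It implements the Chao1 estimator in two forms, where F1 and F2 are the numbers of taxa seen exactly once and twice. The bias-corrected form is S_obs + F1(F1 - 1)/(2(F2 + 1)), and the classic form is S_obs + F1²/(2F2). It also implements Hurlbert’s expected-richness rarefaction. This section does not state which Chao1 variant the analysis used.

```python
from math import comb
from typing import Iterable


def chao1(counts: Iterable[int], bias_corrected: bool = True) -> float:
    """Chao1 lower-bound richness estimate from per-taxon abundances."""
    counts = [c for c in counts if c > 0]
    s_obs = len(counts)
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    if bias_corrected or f2 == 0:
        # Bias-corrected form; also used when there are no doubletons.
        return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
    return s_obs + f1 ** 2 / (2 * f2)


def rarefy(counts: Iterable[int], n: int) -> float:
    """Expected number of taxa in a random subsample of n individuals."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must lie between 0 and the total count")
    denom = comb(total, n)
    # A taxon is missed only if all n draws come from the other taxa.
    return sum(1 - comb(total - c, n) / denom for c in counts)
```

For counts `[1, 1, 1, 2, 2, 5]` (six taxa, three singletons, two doubletons), the bias-corrected estimate is 7 and the classic estimate is 8.25.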

The SAG data indicate that viruses are integral components of the SLM ecosystem. VirSorter³³, which searches for viral hallmark and viral-like genes, detected four phage contigs (in 4 SAGs) and two prophage-containing contigs (in 2 SAGs) among its most confident predictions. Its less confident predictions yielded 98 phage contigs and 50 prophages from 89 and 49 SAGs, respectively. Except for five phage contigs and three prophages, all of these contigs contain viral hallmark and viral-like genes that classify within Caudovirales. In addition, CRISPR spacer sequences were detected in 854 SAGs at an average of 3.2 copies per SAG, providing direct evidence of past viral infection in SLM’s microbial populations.

A total of 13,022 amplicon sequence variants (ASVs) from SLM, SLW, and the Whillans Grounding Zone (WGZ)¹⁷ were compared with the 686 SAGs containing 16S rRNA sequences longer than 1,000 bp (Supplementary Data 1). Of these SAGs, 668 (97.4%) had 16S rRNA sequences with ≥99% identity to 803 ASVs, comprising 616 from SLM, 340 from SLW, and 24 from WGZ (Supplementary Fig. 4b). The remaining 18 SAGs did not match any ASV. Notably, 645 of the 668 SAGs shared ≥99% sequence identity with the 616 ASVs from SLM. Approximately 94% of the SAGs (645 of 686) are therefore represented in the SLM ASV dataset. Comparison of abundance-weighted beta diversity showed that community structure based on the SAG-derived 16S rRNA sequences was distinct from that based on the SLM ASVs (P < 0.05, PERMANOVA) (Supplementary Fig. 2b, c). This disparity in taxonomic abundance between the SAG and ASV datasets is likely attributable to well-established technical biases associated with single-cell sorting, whole genome amplification, and 16S rRNA amplicon sequencing. These biases include differences in cell size, lysis efficiency, particle association, taxon abundance, GC content, and primer affinity (Supplementary Note 2)²⁵.
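
The ≥99% identity matching between SAG-derived 16S rRNA genes and ASVs can be sketched as below. The sketch assumes that each SAG sequence and each ASV have already been aligned and trimmed to the amplicon region, so the two sequences have equal length. Identity is defined as the fraction of matching aligned columns: a gap opposite a base counts as a mismatch, and gap-versus-gap columns are ignored. This section does not give the actual search tool or identity definition, and the input dictionaries are hypothetical.

```python
from typing import Dict, List


def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length ('-' = gap)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper()) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)  # gap-gap columns were removed above
    return 100.0 * matches / len(columns)


def match_sags_to_asvs(sag_seqs: Dict[str, str], asv_seqs: Dict[str, str],
                       min_identity: float = 99.0) -> Dict[str, List[str]]:
    """For each SAG, list the ASVs it matches at >= min_identity percent."""
    return {
        sag: [asv for asv, aseq in asv_seqs.items() if percent_identity(sseq, aseq) >= min_identity]
        for sag, sseq in sag_seqs.items()
    }


def fraction_matched(matches: Dict[str, List[str]]) -> float:
    """Share of SAGs with at least one ASV hit."""
    return sum(1 for hits in matches.values() if hits) / len(matches) if matches else 0.0
```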

Evolutionary divergence of SLM SAGs

Phylogenomic analysis of the 19 most abundant genera (each >1% of the total SAGs; together 71% of the 1,374 SAGs; Fig. 2) revealed that most SAGs cluster together and are phylogenetically separated from genomes from other environmental sources (P < 0.05, genealogical sorting index test³⁴; Fig. 3a and Supplementary Figs. 6–25). To assess evolutionary divergence, we examined genomic average nucleotide identity (ANI). ANI values between 1,367 SAGs and their closest GTDB genomes averaged 77.0% (Fig. 3b). The remaining 7 SAGs, which showed less than 10% alignment with any GTDB reference genome, were classified as ‘No match’. ANI values were also calculated for the 64 of 66 SAGs recovered from seawater beneath the Ross Ice Shelf (RIS; EMBL BioProject PRJEB35712)³⁵ that had <5% CheckM contamination. This site is ~800 km from the location where outflow from SLM enters the marine cavity below RIS. Its SAGs were more similar to their closest GTDB genomes than were the SLM SAGs, with an average ANI of 87.5% (P < 0.01, two-tailed t-test). Only one RIS SAG was classified as ‘No match’. We also computed the amino acid sequence similarity of individual SAG proteins against all proteins in the NCBI non-redundant database. To do this, the 1,399,955 predicted protein sequences from the 1,374 SAGs were grouped into 534,964 clusters at a >95% sequence similarity cutoff and searched against the database with BLAST. The same procedure was applied to the 63,006 protein-encoding genes (45,895 clusters) of the 64 RIS SAGs. The two sites showed distinctly different distributions of protein clusters, with significantly different sequence similarity to their nearest-neighbor proteins in GenBank (P < 0.001, t-test; Fig. 3c). The lower mean amino acid similarity for SLM SAGs (67.8%, versus 82.1% for RIS) is consistent with a higher degree of evolutionary divergence in Antarctic subglacial microbial populations than in those inhabiting the marine system.
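
The ANI summary and between-site comparison above can be sketched in a few lines. The sketch takes per-SAG pairs of (ANI, aligned fraction) against the closest reference as input and applies the 10% alignment cutoff for ‘No match’ described in the text. It then computes Welch’s unequal-variance t statistic with Welch-Satterthwaite degrees of freedom. The text reports a two-tailed t-test without specifying the variance assumption, so Welch’s form is an assumption here. A two-sided P-value could be obtained with `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
from math import sqrt
from statistics import mean, variance
from typing import List, Sequence, Tuple

NO_MATCH_AF = 0.10  # aligned fraction below which a SAG is called 'No match'


def summarize_ani(hits: Sequence[Tuple[float, float]]) -> Tuple[List[float], int]:
    """Split (ANI %, aligned fraction) pairs into matched ANI values and a 'No match' count."""
    matched = [ani for ani, af in hits if af >= NO_MATCH_AF]
    return matched, len(hits) - len(matched)


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # squared standard errors
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```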

**Fig. 3: Genetic isolation of SLM’s SAGs.**

We reconstructed phylogenomic trees for the 19 most abundant genera (each >1% of SAGs; Supplementary Figs. 7–25) to identify sister taxa of the SAGs on rooted trees. This allowed us to examine the possible environmental (e.g., freshwater versus marine) and evolutionary origins of SLM’s extant populations. The environmental origins of sister taxa to the SLM lineages were determined by extracting the isolation source from NCBI metadata for 1,810 GTDB genomes taxonomically related to the 19 genera (Supplementary Data 3, 4). Most of the inferred source habitats were terrestrial ecosystems, including freshwater and ice (e.g., groundwater, soil, glacial ice, acid-mine drainage, and wastewater), whereas few related genomes came from marine environments (Supplementary Figs. 7–25, Supplementary Data 5). We also compared the relative abundance of metabolic pathways (i.e., the proportion of genomes; see below for details) between SLM taxa and their sister taxa within the phylogenomic trees of the most abundant genera. This comparison revealed clear functional divergence between the two phylogenetic groups (Supplementary Figs. 7–25; Supplementary Data 10).
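
Assigning isolation-source metadata to habitat categories requires some normalization of free-text fields. The sketch below maps NCBI-style isolation-source strings to broad habitats and tallies them. Its categories, regular-expression patterns, and first-match-wins ordering are all hypothetical. They are not the curation procedure behind Supplementary Data 5.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Hypothetical keyword rules, checked in order; the first matching rule wins.
HABITAT_RULES = [
    ("marine", re.compile(r"\b(sea\s*water|marine|ocean|sea)\b")),
    ("glacial/ice", re.compile(r"\b(glacier|glacial|ice|snow|cryoconite)\b")),
    ("freshwater", re.compile(r"\b(lake|river|stream|groundwater|freshwater)\b")),
    ("soil/sediment", re.compile(r"\b(soil|sediment|permafrost)\b")),
    ("engineered", re.compile(r"\b(wastewater|sludge|bioreactor|mine drainage)\b")),
]


def classify_source(text: Optional[str]) -> str:
    """Map a free-text isolation source to a broad habitat label."""
    if not text:
        return "unknown"
    lowered = text.lower()
    for habitat, pattern in HABITAT_RULES:
        if pattern.search(lowered):
            return habitat
    return "other"


def habitat_counts(sources: Iterable[Optional[str]]) -> Counter:
    """Tally habitat labels across many isolation-source strings."""
    return Counter(classify_source(s) for s in sources)
```

Rule order is a design decision with consequences. With the order above, "marine sediment" is counted as marine and "lake sediment" as freshwater, so any real curation should make that choice explicit.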

Metabolic pathways inferred from SAG data

Annotation of the SAGs yielded an average of 1,058 genes and 550 KEGG Orthology (KO) identifiers per SAG. Compared with the 14,536 KOs predicted by CHAO1, the 11,645 KOs observed in the 1,374 SAGs represented approximately 80% of the expected functional diversity of SLM (Supplementary Fig. 3b). The metabolic potential of individual SAGs was determined by identifying key enzymes and applying other criteria (Supplementary Data 6–8). The true proportion of genomes with a specific metabolic potential was estimated by correcting for the incompleteness of the SAGs using the formula of Acker et al. (2022)³⁶. The 20 SAGs with <10% completeness inflate the estimated proportion of genomes and were therefore excluded from this analysis. Notably, these 20 large genomes show low genomic contamination (0.2%) and a clear correlation between genome size and gene number (Supplementary Fig. 26). This suggests that they are genuine rather than artifacts of whole genome amplification. Bacterial strains within a species can exhibit distinct functional traits³⁷, and our statistical analysis revealed strain heterogeneity (Supplementary Fig. 28 and Supplementary Note 3). Three near-complete SAGs (89.4% average completeness) of the same species (>98% ANI) shared 52.3% of their KEGG orthologs, significantly fewer (P < 0.01, Monte Carlo test) than the expected value of 71.6%. Because bacterial strains of the same species can differ significantly in their ecological functions, we examined the metabolic potential of each SAG individually.
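
This section does not reproduce the exact formula of Acker et al. (2022), so the sketch below substitutes a simple moment estimator. It should be read as an assumption, not as the published method. Suppose each genome carries a pathway with probability p, and a SAG recovers it with probability equal to its completeness. The expected number of detections is then p times the summed completeness, so p is estimated as total detections divided by summed completeness. The sketch also applies the 10% completeness cutoff from the text. It shows how estimated proportions can exceed 100% when genes are multi-copy, or when detections fall in very incomplete genomes.

```python
from typing import Sequence

MIN_COMPLETENESS = 0.10  # SAGs below 10% completeness are excluded, as in the text


def corrected_proportion(detections: Sequence[int], completeness: Sequence[float]) -> float:
    """Completeness-corrected proportion of genomes with a metabolic potential.

    detections: per-SAG number of detections (0, 1, or more for multi-copy genes)
    completeness: per-SAG completeness as a fraction in (0, 1]
    Assumed estimator: sum(detections) / sum(completeness) over retained SAGs.
    """
    kept = [(d, c) for d, c in zip(detections, completeness) if c >= MIN_COMPLETENESS]
    total_c = sum(c for _, c in kept)
    if total_c == 0:
        raise ValueError("no SAGs pass the completeness filter")
    return sum(d for d, _ in kept) / total_c
```

Under this estimator, a single detection in a SAG that is 5% complete would add 1 to the numerator but only 0.05 to the denominator. This illustrates why very incomplete SAGs inflate the estimate.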

The 1,374 SAGs had an average genome completeness of 38.7% and contamination of 0.4% (Supplementary Data 1). Based on the minimum information standards for microbial genomes³⁸, approximately 73% of these SAGs were classified as low-quality drafts (<50% completeness). To assess whether low genome completeness biased the estimation of metabolic potential, we statistically compared the genomic proportions of 70 metabolic pathways between high-quality reference genomes and versions of those genomes subjected to artificial genome reduction. We first identified the closest GTDB genomes for 1,229 SAGs based on ANI values. SAGs with extremely low completeness, no close GTDB genome, or higher completeness than their GTDB counterpart were excluded. From the 1,229 GTDB genomes (average completeness of 82.4%), the true proportion of genomes for each metabolic pathway was estimated while correcting for genome incompleteness³⁶ (Supplementary Data 9). Next, each GTDB genome was artificially reduced to match the genome completeness of its corresponding SAG (see Methods), and the expected proportion of genomes for each metabolic pathway was calculated from the set of 1,229 reduced GTDB genomes. This process was repeated 1,000 times, generating distributions of genomic proportions for the metabolic pathways. These distributions were compared with the proportions from the unreduced GTDB genomes. Only 4 of the 70 metabolic pathways (glycolysis, gluconeogenesis, glyoxylate cycle, and hydrogenase) were significantly underrepresented (fold change <1 and FDR-adjusted P < 0.05). This suggests that 94.3% of pathways (66 of 70) were not significantly underestimated at ~38% genome completeness. However, 19 of the 70 metabolic pathways were significantly overrepresented (fold change >1 and FDR-adjusted P < 0.05), including aa₃-type cytochrome c oxidase, the ED pathway, fermentation metabolisms, nitrogen metabolisms, and sulfur metabolisms (Supplementary Data 9). Thus, caution is warranted when interpreting metabolic potential for these pathways, with reference to the observed inflation levels (e.g., fold changes in Supplementary Data 9). Restricting analyses to higher-quality SAGs (e.g., those with >50% genome completeness) has been proposed as a means to improve the reliability of functional inference. However, this approach can reduce observed microbial diversity and bias the community structure. For example, when the analysis was restricted to the 372 SAGs with >50% completeness, the share of Actinomycetota rose disproportionately from 45.6% of the complete dataset to 74.5% of the filtered subset. Such filtering also excluded rare but ecologically important taxa, including Nitrosarchaeum (completeness <22.8%) and an ammonia-oxidizing Nitrosomonadaceae SAG with 48.2% completeness. Therefore, we included all 1,374 SAGs in the functional analyses, regardless of genome completeness. We first present a quantitative assessment of metabolic potential for lake_SAGs and sediment_SAGs. A separate analysis of lake_small_SAGs follows, because highly incomplete pathways in these smaller genomes may lead to underestimation of overall metabolic potential.
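
The artificial-reduction check and the FDR correction can be sketched as follows, under several assumptions. Reduction keeps a uniformly random subset of a reference genome’s genes, sized to the paired SAG’s completeness; the actual scheme is described in the Methods and may differ. A pathway is called present only if all of its required genes survive. Proportions are corrected by dividing detections by the summed completeness, which is an assumed stand-in for the correction used in the text. P-values across pathways are then adjusted with the Benjamini-Hochberg procedure. This is a standard FDR method, but the text does not name the one used.

```python
import random
from typing import List, Sequence, Set


def reduce_genome(genes: Set[str], completeness: float, rng: random.Random) -> Set[str]:
    """Keep a random subset of genes matching the target completeness (assumed scheme)."""
    pool = sorted(genes)  # sort so results are reproducible for a given seed
    k = round(completeness * len(pool))
    return set(rng.sample(pool, k))


def simulate_proportions(references: Sequence[Set[str]], completeness: Sequence[float],
                         required: Set[str], replicates: int = 1000,
                         seed: int = 0) -> List[float]:
    """Completeness-corrected pathway proportion in each replicate of reduced genomes.

    A pathway counts as present only if every gene in `required` survives reduction.
    """
    total_c = sum(completeness)
    if total_c == 0:
        raise ValueError("completeness values must not all be zero")
    rng = random.Random(seed)
    out = []
    for _ in range(replicates):
        hits = sum(required <= reduce_genome(g, c, rng)
                   for g, c in zip(references, completeness))
        out.append(hits / total_c)
    return out


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg FDR-adjusted P-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In this sketch, a pathway that requires m genes survives reduction to completeness c with a probability of roughly c^m. Dividing by c alone therefore tends to underestimate multi-gene pathways. This is one mechanism by which reduction can bias proportions, not a claim about which mechanism drives the results above.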

None of the SAGs possess genes for photosynthetic pathways, but genes involved in autotrophic CO₂ fixation were identified in 38.2% of lake_SAGs and 10.6% of sediment_SAGs (Fig. 4). All putative autotrophs possessed hallmark genes of the Calvin-Benson-Bassham (CBB) cycle, except two lake_SAGs belonging to the family UBA6902 (class Thermodesulfovibrionia), which contained genes for the Wood-Ljungdahl pathway. We found no genetic evidence for the hydroxypropionate/4-hydroxybutyrate cycle typical of ammonia-oxidizing archaea³⁹ in any of the five Nitrosarchaeum lake_SAGs, but their CheckM completeness estimates were low (average of 18.3%). The water column taxa with primary production potential were dominated by the genera Nitrotoga (18 SAGs), SURF-13 (7), Polaromonas (4), and 39-52-133 (4). In the sediments, Nitrotoga (11 SAGs), SPCO01 (8), UBA4592 (5), and Polaromonas (5) were the most abundant inferred autotrophs (Supplementary Data 8).
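
Hallmark-based pathway calls can be made concrete with a short sketch, which also quantifies the caveat about the low-completeness Nitrosarchaeum SAGs. The hallmark sets below are illustrative assumptions, written as enzyme names; they are not the criteria in Supplementary Data 6–8. The detection probability assumes that each gene is recovered independently with probability equal to genome completeness.

```python
from typing import Dict, List, Set

# Illustrative hallmark enzymes (descriptive names, not KO identifiers); a real
# analysis would use curated pathway or module definitions.
HALLMARKS: Dict[str, Set[str]] = {
    "Calvin-Benson-Bassham": {"RuBisCO", "phosphoribulokinase"},
    "Wood-Ljungdahl": {"CO dehydrogenase/acetyl-CoA synthase"},
    "3HP/4HB": {"4-hydroxybutyryl-CoA dehydratase"},
}


def call_pathways(annotations: Set[str]) -> List[str]:
    """Return pathways whose hallmark enzymes are all annotated in a genome."""
    return [name for name, required in HALLMARKS.items() if required <= annotations]


def detection_probability(completeness: float, n_hallmarks: int) -> float:
    """Chance that all hallmarks are recovered from a genome that truly has them,
    assuming each gene is recovered independently with probability `completeness`."""
    return completeness ** n_hallmarks
```

At the average Nitrosarchaeum completeness of 18.3%, this independence assumption implies that a single hallmark gene would be recovered only about 18% of the time, and two hallmarks together about 3% of the time. Their absence from these SAGs is therefore weak evidence that the pathway is truly absent.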

**Fig. 4: Metabolic potential inferred from SLM SAGs.**

Multiple copies of genes encoding glycoside hydrolases (GHs) and peptidases were present in nearly all SAGs. There is also evidence for beta-oxidation capacity in 37.3% of lake_SAGs and 41.9% of sediment_SAGs (Figs. 4 and 5). The glycolytic pathway was present in 82.0% of lake_SAGs and 83.5% of sediment_SAGs. The Embden-Meyerhof (64.8% and 73.0%, respectively), Entner-Doudoroff (ED; 21.3% and 11.8%, respectively), and semi-phosphorylative ED (23.1% and 18.0%, respectively) pathways were all identified (Supplementary Data 8). The TCA cycle (75.5% of lake_SAGs and 109.9% of sediment_SAGs), glyoxylate shunt (16.9% and 58.4%, respectively), and gluconeogenesis (51.5% and 75.1%, respectively) were also represented. Proportions above 100% indicate that multi-copy genes are present in some SAGs. Genes for fermentation pathways that produce butyrate, lactate, acetate, formate, and ethanol from pyruvate were observed at low frequency in the SAGs (Supplementary Fig. 29).

**Fig. 5: Taxonomic distribution and network representation of metabolic potential.**

The aa₃-type cytochrome c oxidase (cyt-aa₃, low affinity for oxygen) was detected in 103.0% of lake_SAGs and 147.2% of sediment_SAGs, whereas the cyt-cbb₃ type (high affinity for oxygen) appeared in 50.6% of lake_SAGs and 12.7% of sediment_SAGs (Figs. 4 and 5). SAGs of Actinomycetota exclusively possessed cyt-aa₃, while the co-occurrence of both cytochrome genes was frequently observed in Pseudomonadota (Supplementary Data 8). Both Nitrotoga and SURF-13 possessed only cyt-cbb₃.

Neither N₂ fixation nor anammox genes were present in the SAG dataset. The complete operon for ammonium monooxygenase (amoCAB) was detected in a lake_SAG classifying within the family Nitrosomonadaceae, but not in the five Nitrosarchaeum SAGs (Supplementary Data 8)³⁹. Hydroxylamine oxidation genes were observed in 0.9% of lake_SAGs (Polaromonas) and 3.1% of sediment_SAGs (mostly the genus 12-FULL-67-14b). The operon for dissimilatory nitrate reductase (narGHI) was identified in 53.3% of lake_SAGs and 44.7% of sediment_SAGs. Periplasmic nitrate reductase (napAB) was detected in 16.9% of lake_SAGs and 5.6% of sediment_SAGs. Reversible nitrite reductase (16.9% lake_SAGs and 50.6% sediment_SAGs), nitrite reductase (6.2% and 2.8%, respectively), nitric oxide reductase (9.7% and 5.3%, respectively), and nitrous oxide reductase (8.0% and 9.6%, respectively) genes involved in denitrification were also present. Ammonifying nitrite reductase for dissimilatory nitrate reduction to ammonium (DNRA) was common to 67.5% of the lake_SAGs and 58.1% of sediment_SAGs. Nitrogen mineralization pathways were highly represented in the dataset (113.3% of the lake_SAGs and 151.5% of the sediment_SAGs; Supplementary Note 4).

Many of the genes encoding enzymes involved in reverse dissimilatory sulfate reduction (rDSR) were abundant in the SAG dataset (Fig. 4). These include sulfate adenylyltransferase (62.2% of lake_SAGs and 17.4% of sediment_SAGs), adenylylsulfate reductase (57.7% and 13.7%, respectively), dissimilatory sulfur oxidase (46.2% and 9.3%, respectively), sulfite oxidase (12.4% and 16.5%, respectively), and sulfide oxidase (86.2% and 101.5%, respectively). Phylogenetic analysis confirmed that the dsrA genes from SLM SAGs are more closely related to the oxidative form of dsrA found in sulfur-oxidizing lithotrophs than to the reductive form involved in sulfite reduction (Supplementary Fig. 30). No other genes for dissimilatory sulfate reduction were present in SAGs from the water column and surficial sediment of SLM. Genes for the SOX complex indicated a potential for thiosulfate oxidation (54.2% of lake_SAGs and 13.7% of sediment_SAGs), whereas genes for thiosulfate oxidases were less abundant (2.7% and 0.9%, respectively). Genes encoding enzymes that oxidize thiosulfate to tetrathionate (65.7% of lake_SAGs and 24.8% of sediment_SAGs) were more abundant than those catalyzing the reverse reaction (5.3% and 3.7%, respectively). The rDSR and SOX complexes occurred exclusively in Pseudomonadota, whereas sulfide oxidase was widely distributed across diverse phyla (Fig. 5a). Sulfur mineralization pathways were encoded in 95.6% of lake_SAGs and 141.3% of sediment_SAGs (Supplementary Note 4).

A subset of SAGs contained genes involved in iron cycling and trace gas metabolism (Fig. 4). Three types of iron oxidase were detected: neutrophilic iron oxidase (1.8% of lake_SAGs and 1.6% of sediment_SAGs), probable neutrophilic iron oxidase (mtoAB; 15.1% and 10.6%, respectively), and acidophilic iron oxidase (5.3% and 3.1%, respectively). The SAGs of putative iron oxidizers were mostly Pseudomonadota, but five were affiliated with Acidobacteriota. Genes for iron reductases and probable iron reductases (mtrABC) were found in 1.8% of lake_SAGs and 0.6% of sediment_SAGs. Only three lake_SAGs, all belonging to the genus Methylobacter_A, possessed aerobic methane oxidation genes (pmoCAB). Genes for methanogenic pathways were absent from the SAG dataset, while genes encoding C-P lyase were detected in two lake_SAGs and two sediment_SAGs. The potential to oxidize hydrogen was encoded in 7.1% of lake_SAGs and 2.2% of sediment_SAGs, which possessed at least one of the following hydrogenases: hoxHFUY, hndABCD, hyaABC, and mbhLKJ. Aerobic CO oxidases (coxLMS) were present in 411 SAGs (77.3% of lake_SAGs and 96.6% of sediment_SAGs). Phylogenetic analysis of CoxL sequences (Supplementary Fig. 31) identified 216 SAGs (19.5% of lake_SAGs and 57.1% of sediment_SAGs) that encode form I-like CoxL. This protein is a marker of the molybdenum-containing carbon monoxide dehydrogenase (Mo-CODH) used to derive energy from CO oxidation⁴⁰. Form I-like coxL was found only in SAGs of the phyla Actinomycetota (159 SAGs), Pseudomonadota (37 SAGs), and Chloroflexota (20 SAGs).

Networks constructed from the co-occurrence of biogeochemically relevant genes in SAGs resolved four metabolic guilds in SLM’s water column and sediments, each with distinct functional associations (Fig. 5b, c). The water column guild lake_G1 is characterized by aerobic (Cyt-aa₃) and nitrate respiration coupled with catabolism of monosaccharides (EMP and TCA), fatty acids (beta-oxidation), carbon monoxide, and nitroalkanes. The lake_G2 guild contains the potential for lithotrophic sulfur oxidation and probable iron oxidation (MtoAB) coupled to Cyt-cbb₃, urea hydrolysis, and carbon fixation. The guild sediment_G1 is inferred to be capable of using urea as an N source (urea amidolyase) and resembles lake_G1, but lacks CO oxidation and nitroalkane oxidation. The guild sediment_G2 is similar to lake_G2 but also includes nodes associated with CO oxidation, DNRA, denitrification, nitroalkane oxidation, and alkane sulfonate oxidation. Guilds lake_G2 and sediment_G2 are composed almost exclusively of Pseudomonadota SAGs, whereas lake_G1 and sediment_G1 contain various taxonomic groups.
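
This section does not specify the network construction behind the four guilds. The sketch below is an assumption-laden stand-in. It links two genes when the sets of SAGs carrying them have a Jaccard similarity at or above a threshold, and it reports connected components of that graph as candidate guilds. The published analysis may have used a different association measure, threshold, or community-detection method, and the gene labels and SAG IDs used to exercise the sketch are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets of SAG IDs."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def guilds(presence: Dict[str, Set[str]], threshold: float = 0.5,
           min_size: int = 2) -> List[List[str]]:
    """Group genes into candidate guilds via a thresholded co-occurrence graph.

    presence maps each gene (or pathway) label to the set of SAGs that carry it.
    """
    genes = sorted(presence)
    adjacency: Dict[str, Set[str]] = {g: set() for g in genes}
    for g1, g2 in combinations(genes, 2):
        if jaccard(presence[g1], presence[g2]) >= threshold:
            adjacency[g1].add(g2)
            adjacency[g2].add(g1)
    seen: Set[str] = set()
    result: List[List[str]] = []
    for start in genes:
        if start in seen:
            continue
        component: Set[str] = set()
        stack = [start]
        while stack:  # depth-first traversal of one connected component
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(adjacency[node] - component)
        seen |= component
        if len(component) >= min_size:
            result.append(sorted(component))
    return result
```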

The lake_small_SAGs (<0.2 µm fraction) had significantly smaller genomes (1,279,069 bp on average) than the lake_SAGs from cells >0.2 µm (2,837,467 bp) and the sediment_SAGs (2,979,839 bp) (P < 0.05, Welch’s two-tailed t-test). Because the genes and metabolic pathways in these genomes are highly streamlined, combining lake_small_SAGs with lake_SAGs would underestimate the proportion of genomes with a given metabolic potential. This proportion was therefore estimated separately for lake_small_SAGs, accounting for SAG incompleteness³⁶. In contrast to the higher metabolic complexity of lake_SAGs and sediment_SAGs, lake_small_SAGs showed a marked deficiency in genes involved in carbon, nitrogen, sulfur, and iron cycling (Supplementary Fig. 27 and Supplementary Data 8). Genes associated with aerobic energy conservation through glycolysis or beta-oxidation were detected, but these pathways occurred predominantly in non-Patescibacteria SAGs (31 of the 190 lake_small_SAGs). Aside from limited sulfur oxidation potential (7% of lake_small_SAGs for sulfide and 3% for thiosulfate), no complete pathways for the oxidation of inorganic compounds (e.g., nitrification and ferrous iron oxidation) were detected. The nitrite reductase gene (nirK) was identified in both Patescibacteria and non-Patescibacteria, but no complete denitrification pathway was found.

The phylum Patescibacteria (i.e., Candidate Phyla Radiation), which comprised the majority of lake_small_SAGs (159 out of 190 SAGs) and a smaller subset of lake_SAGs (43 out of 428 SAGs), had highly streamlined genomes averaging 1.15 Mbp in estimated size (Supplementary Fig. 32). These genomes lack complete biosynthetic pathways for nucleotides, most amino acids, and lipids (Fig. 6 and Supplementary Data 8). Instead, genes for nutrient uptake and degradation that could compensate for these metabolic deficiencies were widely distributed in the Patescibacteria SAGs. For example, genes involved in DNA uptake, including those for type IV pilus assembly proteins (K02654, 125.5%; K02662, 160.1%; K02669, 142.8%) and the competence proteins ComEC (K02238, 156.9%) and ComFA (K02242, 111.4%), were overrepresented when the proportion of genomes was corrected for genome incompleteness (mean completeness of the Patescibacteria SAGs, 33.8%). Both exodeoxyribonuclease III (K01142, 97.3%) and TatD DNase (K03424, 149.1%), which are involved in DNA degradation, were also highly prevalent among the Patescibacteria genomes. The gene encoding the peptide/nickel transport system substrate-binding protein (K02035), responsible for oligopeptide uptake, was present at a high genomic proportion (160.1%). A wide array of peptidases was detected, including ClpP (K01358, 120.8%) and DegP (K04771, 86.3%). Genes involved in the oxidation of inorganic electron donors (e.g., sulfur, nitrogen, iron, and hydrogen) appear to be underrepresented in Patescibacteria from SLM. Instead, genes for using external carbohydrates as an energy source were frequently observed. These included a multiple sugar transport system (K02027, 124.0%) and glycoside hydrolases such as alpha-amylase (K07405, 43.9%), glucoamylase (K01178, 31.4%), and beta-glucosidase (K05349 and K05350, 31.4%). Interestingly, phosphofructokinase, a key enzyme in the energy investment phase of glycolysis, was not detected in any Patescibacteria SAG. In contrast, pyruvate kinase (K00873), which acts in the energy payoff phase of glycolysis, was detected in 30 Patescibacteria SAGs. Complete pathways for the TCA cycle and beta-oxidation were not observed. A complete electron transport chain was not detected, although cytochrome bo₃ oxidase was found in six SAGs of the genus UBA1550 (class Paceibacteria) and one SAG of the genus JAICHO01 (class Saccharimonadia). Genes encoding subunits of the F-type ATPase (K02108–K02115), however, were common, with genomic proportions ranging from 62.8% to 92.6%. Intriguingly, transaldolase, a key enzyme of the non-oxidative pentose phosphate pathway that serves as a metabolic shunt of glycolysis, was found in only a few Patescibacteria SAGs. Genes encoding lactate dehydrogenases were found in 24 Patescibacteria SAGs (21 encoding the D-lactate form, K03778, and three the L-lactate form, K00016). Pyruvate fermentation pathways to ethanol or acetate were not complete in any individual SAG, although 38 SAGs possessed genes encoding acylphosphatase (K01512, 35 SAGs) or acetate kinase (K00925, 3 SAGs). Additionally, four SAGs possessed the phosphoketolase gene (K01621), enabling conversion of xylulose-5-phosphate to acetyl phosphate. Genes encoding superoxide dismutase (K04564 [SOD2], K00518, and K04565), which play a critical role in protecting bacteria from free radicals in oxic environments, were found in 85 Patescibacteria SAGs at a genomic proportion of 133.4%. Haemolysin (K11068) was detected in a small fraction (3.1%) of the SAGs. Finally, CRISPR spacers, the genetic footprints of past viral infections, were present in 65 SAGs, averaging 1.9 copies per SAG.
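Several proportions above exceed 100% because the observed proportion is scaled up to correct for incomplete genomes. The sketch below illustrates that correction, assuming (as the Methods describe for the Acker et al. approach) that the observed proportion is multiplied by the ratio of expected to observed genome size, with expected size taken as assembled size divided by estimated completeness. Aggregating sizes by summation across SAGs is our assumption, and the input values are hypothetical.

```python
def corrected_proportion(has_trait, observed_sizes, completeness):
    """Proportion of genomes carrying a trait, corrected for incompleteness.

    has_trait      -- one bool per SAG (trait detected or not)
    observed_sizes -- assembled size of each SAG in bp
    completeness   -- estimated completeness of each SAG as a fraction in (0, 1]
    """
    if not (len(has_trait) == len(observed_sizes) == len(completeness)):
        raise ValueError("inputs must describe the same SAGs")
    observed_prop = sum(has_trait) / len(has_trait)
    # Expected full genome size of each SAG: assembled size / completeness
    expected_total = sum(s / c for s, c in zip(observed_sizes, completeness))
    observed_total = sum(observed_sizes)
    return observed_prop * expected_total / observed_total


# Hypothetical example: 10 SAGs at 50% completeness, 4 carrying the trait
trait = [True] * 4 + [False] * 6
sizes = [500_000] * 10
compl = [0.5] * 10
print(corrected_proportion(trait, sizes, compl))  # about 0.8, i.e., 80%
```

With a mean completeness near 33.8%, the scale factor is roughly 3, so an observed proportion of about 42% would correspond to roughly 125%; this is how corrected values above 100% can arise.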

**Fig. 6: Schematic representation of metabolic potential for Patescibacteria in SLM.**

Discussion

SLM is below sea level, is proximal to the oceanographic system (~50 km from the current grounding line), and has sediments containing evidence of recent marine interactions¹², yet the relatively low chloride concentration in sediment pore waters indicates deposition by an upstream freshwater catchment⁴¹. This conclusion aligns with our high-resolution phylogenomic analysis of single-cell populations in SLM’s water column and sediments, which reveals that nearly all genomes are most closely related to those documented in terrestrial aquatic ecosystems. They rarely formed sister groups with genomes recovered from marine environments (Supplementary Figs. 6–25), providing little support for the hypothesis that SLM’s extant species evolved from marine taxa.

The SLM SAGs form cohesive phylogenetic clusters to the exclusion of all 596,859 genomes available in release 220 of GTDB (Fig. 3a and Supplementary Figs. 7–25) and encode ORFs with low similarity to known proteins in the NCBI non-redundant database (Fig. 3c). These phylogenetic and genetic differences are mirrored at the functional level across the most abundant genera (Supplementary Figs. 7–25 and Supplementary Data 10), which is likely an evolutionary outcome of microbial isolation in Antarctic subglacial environments. The average ANI of 77.0% (Fig. 3b) is far below the 95% cutoff commonly used to demarcate prokaryotic species⁴², providing a glimpse of the unknown biodiversity beneath the Antarctic ice sheet and implying that the 1,372 SAGs likely represent candidate new species or novel higher-rank taxa. Further considerations for SLM populations are the low-energy conditions and slow growth rates²⁰, which should fundamentally influence evolutionary rates⁴³. As such, the level of diversification observed in SLM SAGs would not be expected if they derived from populations introduced to this region ~6300 years before present¹². Data based on 16S rRNA gene sequences¹⁷ showed that approximately half of the taxa detected in SLW were also present in SLM, suggesting a common upglacier source of microbial inocula for both lakes and implicating co-transport with subglacial water and sediment as a dispersal mechanism. If SLM is part of a metacommunity linked via subglacial hydrological systems, then the upper timeframe for population isolation may correspond to the most recent destabilization of WAIS within the last few million years^44,45. Evolutionary dating was not performed in this study, because differences in evolutionary rates between subglacial microorganisms and their nearest neighbors will need to be carefully considered before accurate divergence times can be estimated (Supplementary Note 5)⁴⁶.

Because Antarctic subglacial environments lack sunlight and photosynthesis, the sole source of oxygen is atmospheric gas entrapped in ice and introduced to subglacial water during basal melting²¹. Our study revealed that both cyt-aa₃ and cyt-cbb₃ type cytochrome c oxidases were closely associated with metabolic guilds in the water column and surficial sediments (Fig. 5), indicating the importance of oxygen as an electron acceptor for redox reactions in SLM. Specifically, the prevalence of cyt-aa₃ (103.0% of lake_SAGs and 147.2% of sediment_SAGs) over cyt-cbb₃ (50.6% of lake_SAGs and 12.7% of sediment_SAGs) is consistent with high oxygen availability. Interestingly, SAGs of Actinomycetota possess only the cyt-aa₃ type, whereas those of Pseudomonadota often have both types, suggesting that the proteobacterial species may tolerate a wider range of oxygen availability.

The four metabolic guilds inferred by the co-occurrence network analysis also highlight oxygen as a parameter shaping the microbial community and SLM biogeochemical processes (Fig. 5b, c). Pathways for oxidizing simple sugars (glycolysis in 83.5% of sediment_SAGs) and fatty acids (beta-oxidation in 41.9% of sediment_SAGs) may be coupled with either aerobic or anaerobic respiration in surficial sediments (sediment_G1 in Fig. 5c). The widespread occurrence of genes encoding GHs and peptidases in the SAGs indicates a high potential for using complex carbon sources, including carbohydrates, peptides, organic sulfur (e.g., sulfate esters), and organic nitrogen (e.g., nitroalkanes) (Figs. 4 and 5). Both sediment guilds possess the genetic capacity for aerobic respiration but differ in their preferred electron donors (inorganic vs. organic; Fig. 5c). In sediment_G2, ~10% of sediment_SAGs contain genes for lithotrophic energy generation and carbon dioxide fixation, including the oxidation of a variety of reduced inorganic compounds such as sulfide, thiosulfate, carbon monoxide, and Fe²⁺ using oxygen or nitrate. DOC supplied by upward diffusion from sediments⁸ and by chemosynthesis in the water column (lake_G2 in Fig. 5b) would support organotrophic lifestyles (lake_G1 in Fig. 5b). The presence of genes enabling carbon monoxide metabolism (sediment_G2, 57.1% of sediment_SAGs; lake_G1, 19.5% of lake_SAGs) suggests that CO oxidation could supplement cellular energy production in both the water column and surficial sediment communities.

Although SLM’s water column was supersaturated with oxygen at the time of sampling²⁴, our single-cell genomes show features consistent with the flexibility to metabolize under suboxic and anoxic conditions. For example, ~13% of sediment_SAGs contained the gene complement for Fe²⁺ oxidation coupled to cyt-cbb₃ for energy generation (sediment_G2 in Fig. 5c). At the relatively lower oxygen concentrations of the surficial sediments, this pathway may allow biological Fe²⁺ oxidation to compete more effectively with chemical oxidation for oxygen⁴⁷. However, because a similar proportion of genomes from iron oxidizers (19.5% of lake_SAGs) also occurred with cyt-cbb₃ in lake_G2, it is unclear whether distinct cytochrome c oxidases have phenotypic consequences that enhance iron oxidation across a range of oxygen concentrations. Denitrification and fermentation, both known to occur under anoxic conditions, were also detected in the water column and sediments (Fig. 4 and Supplementary Fig. 29). Their presence may be explained by anoxic microenvironments⁴⁸ within oxygenated habitats. Alternatively, these metabolisms may operate when SLM becomes anoxic as dissolved oxygen is depleted by chemical oxidation and microbial respiration⁴⁹. While methanotrophic functional genes were abundant and methane oxidation rates were measured in SLW^17,19, our single-cell data from SLM contain only five lake_SAGs carrying the full complement of genes to oxidize methane, all affiliated with the genus Methylobacter_A. However, rare but active taxa may still contribute to this process⁵⁰.

None of the SAGs encode genes involved in nitrogen fixation, whereas genes for the complete denitrification pathway are relatively abundant (Fig. 4). This suggests that microbially mediated nitrogen cycling in SLM may diminish the biogenic N pool⁵⁰. The main sources of nitrogen (e.g., ammonium and nitrate) are presumably basal ice melt, the underlying sediments, and bedrock comminution^22,51. Ammonium is believed to be an important electron donor in many subglacial environments^15,16,52, but surprisingly, only one SAG (family Nitrosomonadaceae) contained genes for ammonium oxidation (Supplementary Data 1). The putative nitrite oxidoreductase gene (nxrA) was identified by KEGG annotation in only three of 108 Nitrotoga SAGs, although this genus is known for nitrite oxidation⁵³. Because NxrA shares the KEGG ortholog K00370 with nitrate reductase (NarG), we compared the three sequences against curated NarG and NxrA sequences from the nitrogen cycling database NCycDB⁵⁴. The results indicated higher similarity to NarG than to NxrA. BLASTp comparisons further revealed that the sequences are more similar to NarG of the nitrate reducer Sulfuriferula multivorans (88% sequence similarity with WP_124703789.1 in GenBank) than to NxrA of the experimentally verified nitrifier Candidatus Nitrotoga arctica (23% similarity with WP_239795842.1). Hence, the 108 Nitrotoga SAGs we characterized from SLM show no evidence of the capacity for nitrite oxidation. Instead, their gene complement suggests that they conserve energy organotrophically or grow lithotrophically on a variety of inorganic electron donors, including reduced sulfur compounds, Fe²⁺, and CO (Fig. 5a). To evaluate the overall potential for nitrite oxidation in SLM, we screened all proteins of the 1,374 SAGs against NxrA sequences from NCycDB. This analysis identified one nxrA gene, in the single cell SLM_LV1_0208-B14, which is affiliated with the family Opitutaceae in the phylum Verrucomicrobiota (Supplementary Data 1). A subsequent BLASTp comparison of this sequence revealed 70.8% similarity to NxrA of the known nitrite oxidizer Thiocapsa KS1⁵⁵ (CRI68048.1 in GenBank), and only 24.2% similarity to NarG of the nitrate reducer Opitutus terrae⁵⁶ (WP_012374288.1 in GenBank), a member of Opitutaceae. These results suggest the possible presence of nitrite oxidizers in SLM.

Chemosynthetic primary production could be the most important process supporting heterotrophic life in dark subglacial environments¹⁶. Of the 1,374 SAGs, 77 possess the Calvin-Benson-Bassham (CBB) cycle. These putative autotrophs were found in 11% of sediment_SAGs and 38% of lake_SAGs (Fig. 4), suggesting that both habitats have the potential to support primary production. Although some primary producers were exclusive to one habitat (e.g., the genera SURF-13 and 39-52-133 in the water column, and SPCO01 and UBA4592 in the sediments), Nitrotoga and Polaromonas were common to both the water column and sediments (Fig. 5a). In addition, we observed a strong association between the CBB cycle and the cbb₃-type cytochrome c oxidase (Fig. 5b, c). Cells with the cyt-cbb₃ type (27 of 102 SAGs; 26.5%) were more likely to possess the CBB cycle than those with the cyt-aa₃ type (26 of 610; 4.3%). For example, Nitrotoga, the most abundant autotrophic genus in the SAG datasets (33 of 77 cells), was exclusively associated with the cyt-cbb₃ type (Supplementary Data 8). SAGs containing both the cyt-cbb₃ type and the CBB cycle were frequently linked to the oxidation of reduced sulfur compounds (Fig. 5b, c). These findings suggest that chemolithoautotrophs harboring the cyt-cbb₃ type may become more competitive under low-oxygen conditions, in which the cbb₃-type cytochrome c oxidase functions optimally. Only 7 of the 77 SAGs (6 of them Nitrotoga) possessed both the CBB cycle and nitrate reductase, suggesting that carbon fixation can also occur when the SLM environment becomes anoxic. The major primary producers identified in this study (e.g., 31 Nitrotoga SAGs, 9 Polaromonas SAGs, and 7 SURF-13 SAGs) are all considered mixotrophs, since they possess both organotrophic pathways (e.g., the EMP pathway, TCA cycle, or β-oxidation) and the CBB cycle (Fig. 5a). Interestingly, mixotrophic Polaromonas species are also considered major contributors to primary production in groundwater environments⁵⁷, but not all are able to fix carbon dioxide (e.g., supraglacial Polaromonas species rely on existing organic carbon for growth while using carbon monoxide as an electron donor⁵⁸). These taxa are predicted to oxidize reduced sulfur compounds and Fe²⁺ through both aerobic and nitrate-based anaerobic respiration, representing a range of trophic strategies from heterotrophy to chemoautotrophy. This metabolic versatility could allow microbes in SLM to adapt physiologically to the dynamic changes in oligotrophic subglacial lake environments associated with periodic fill-drainage events.

Approximately 15% of the 1,374 SAGs from SLM belonged to Patescibacteria, and most of these cells were isolated from the <0.2 µm filtrate, indicating ultrasmall cell sizes. Patescibacteria also had the smallest genomes of all bacterial phyla in SLM (Supplementary Fig. 32). Both their abundance and their reduced genomes suggest that Patescibacteria are competitive in the oligotrophic SLM environment. Genome reduction results from selection-driven streamlining that eliminates genes for energy-demanding molecular functions⁵⁹. As a consequence, small microbes such as Patescibacteria have leaky metabolisms with respect to energy conservation and biosynthesis. Comparative genomic analysis in this study revealed that Patescibacteria in SLM possess rudimentary energy metabolisms (Fig. 6), consistent with other studies^60,61,62,63,64,65. Because complete pathways for the TCA cycle and the electron transport chain were not detected, Patescibacteria in SLM probably ferment organic substrates to conserve energy. These bacteria likely take up sugars via transporters and use glycoside hydrolases to degrade carbohydrates. However, metabolizing glucose to pyruvate does not appear possible, because phosphofructokinase was absent from the Patescibacteria SAGs. Salvage of glucose-6-phosphate into fructose-6-phosphate and glyceraldehyde-3-phosphate has been proposed⁶⁰, but this shunt is probably rare here because the non-oxidative pentose phosphate pathway was complete in only a few Patescibacteria SAGs. Given the presence of pyruvate kinase in 30 Patescibacteria SAGs, ATP could be generated through the second half of glycolysis if glyceraldehyde-3-phosphate or other three-carbon glycolytic intermediates are supplied by unknown pathways or external sources^63,66. In addition, fermentation of pyruvate to acetate is unlikely, since the acetate kinase gene was found in only three Patescibacteria SAGs. Although various hypotheses have been proposed (e.g., capturing protons from host cells and generating ATP using the F-type ATPase)⁶³, Patescibacteria strains will need to be cultivated to elucidate their energy metabolism⁶⁷. Given their limited biosynthetic capabilities (e.g., for amino acids, lipids, and nucleotides), Patescibacteria likely rely on close associations with co-occurring microbes in SLM to survive. The widespread presence of genes encoding type IV pilus assembly proteins supports their potential for epi-symbiotic, cell-to-cell interactions with host bacteria. Additionally, the high abundance of competence proteins, nucleases, and peptidases may compensate for their metabolic deficiencies by enabling the breakdown of macromolecules and the acquisition of essential building blocks for growth. These metabolic dependencies highlight the communal lifestyle of subglacial Patescibacteria. However, the occasional detection of haemolysin genes suggests that a small proportion of these bacteria could adopt a parasitic strategy⁶⁵. In contrast to Patescibacteria from low-oxygen groundwater⁶¹, the frequent occurrence of CRISPR spacers in patescibacterial SAGs implies a potential mutualistic role, in which they may act as decoys that divert phage infections away from their host cells⁶³. Collectively, Antarctic subglacial Patescibacteria likely exhibit a continuum of symbiotic relationships, ranging from communal to parasitic and mutualistic interactions, as previously reported for freshwater members of the Candidate Phyla Radiation⁶⁴.

In conclusion, the metabolic pathways inferred from SLM genomes reveal that microorganisms in this subglacial ecosystem have the capacity to conserve energy by oxidizing various organic and inorganic compounds (e.g., reduced sulfur and Fe²⁺) with oxygen, nitrate, or Fe³⁺ as electron acceptors. Oxygen availability plays a key role in shaping the microbial community and influencing metabolic processes in SLM. Comparative analysis of functional and evolutionary divergence in the single-cell genomes from SLM supports the genetic isolation of the subglacial populations from those in contemporary surface and marine biomes. Despite a marine incursion at the current location of SLM ~6300 years ago¹², its SAGs are not closely related to those of contemporary marine microorganisms. If the microorganisms in the lake originate upglacier in the ice sheet interior, then SLM may be a constituent of a metacommunity that has evolved in relative isolation beneath WAIS. Examining organismal and genetic flow among SLM, SLW, and other subglacial lakes in this region could determine whether a metacommunity structure spans discrete basins beneath the Antarctic ice sheet.

Methods

Sampling of lake water and sediment from SLM

The lake was directly accessed through a ~0.4 m diameter borehole melted through 1,087 m of ice with an environmentally clean hot water drilling system^16,23. Using a Large Volume Water Transfer System (WTS-LV; McLane Research Laboratories Inc.), lake water was sampled during the first cast (LV1) at mid-depth of the ~15 m water column and concentrated in situ on filters with pore sizes of 3.0, 0.8, and 0.2 µm²³. A sediment catcher attached to the base of the WTS-LV frame was used to collect bulk surface (BS) sediment. Sediment cores were obtained using a multicorer (UWITEC)⁶⁸. The second core of the first multicore cast (MC1B) was horizontally sectioned at 2 cm intervals, and inner portions from each interval were collected for molecular and single-cell genome analysis. All samples were collected aseptically and immediately cryopreserved at −80 °C as described previously^16,17.

Microbial sample treatments and single-cell genomics

Triplicate lake water samples (1 ml each) were amended with glycerol (5% v/v) and 1× TE buffer (final concentrations; GlyTE) and stored at −80 °C until analysis. Approximately 5 g of sediment was mixed with 20 ml of sterile-filtered PBS, vortexed for 30 s, and centrifuged for 30 s at 2,500 × g to remove large particles. The resulting supernatant was amended with GlyTE as described above. Before cell sorting, samples were passed through a 40 µm cell strainer (Becton Dickinson) and stained with SYTO 9 (5 µM, Thermo Fisher Scientific) for 1 h. Fluorescence-activated cell sorting (FACS) of single cells, followed by whole genome amplification (WGA), was carried out at the Single Cell Genomics Center at Bigelow Laboratory for Ocean Sciences (SCGC) as previously described²⁷. Briefly, stained and filtered samples were processed on a BD Influx flow cytometer equipped with a 488 nm laser, with sorting gates configured to isolate single cells while minimizing background signals (Supplementary Fig. 33). The sorted cells were then subjected to WGA for downstream genomic analysis.

Libraries were prepared using the TruSeq Nano DNA High Throughput Library Prep Kit (Illumina) following the manufacturer’s protocol. The resulting products were purified and PCR-amplified. The library pool was quantified using a KAPA qPCR library quantification kit (KAPA Biosystems, Wilmington, MA, USA), and library quality was assessed on an Agilent 4200 TapeStation with D1000 ScreenTape (Agilent Technologies). Sequencing was carried out on an Illumina HiSeq X Ten system.

Genome assembly and annotation

Raw Illumina reads were trimmed and filtered using Trimmomatic v0.39 with the following parameters: ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:2:keepBothReads LEADING:0 TRAILING:5 SLIDINGWINDOW:4:15 MINLEN:36. Low-complexity reads, in which any nucleotide accounted for less than 5% of the bases, were discarded²⁷. The remaining reads were mapped to the human reference genome (GRCh38) using BWA v0.7.17⁶⁹, and reads mapping with ≥95% similarity were considered human DNA and removed. To improve assembly quality⁷⁰, the reads were normalized in silico using kmernorm v1.05 (https://sourceforge.net/projects/kmernorm/) with the options -k 21 -t 30 -c 3.
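The low-complexity filter above can be read as: discard a read if any of the four nucleotides makes up less than 5% of its bases. The sketch below implements that reading in Python; how the original pipeline²⁷ treats ambiguous bases (N) is not stated, so here they simply count toward read length, and the example reads are hypothetical.

```python
def is_low_complexity(read, min_fraction=0.05):
    """Return True if any of A, C, G, T makes up < min_fraction of the read."""
    seq = read.upper()
    if not seq:
        return True  # treat empty reads as unusable
    return any(seq.count(base) / len(seq) < min_fraction for base in "ACGT")


def filter_low_complexity(reads, min_fraction=0.05):
    """Keep only reads that pass the base-composition check."""
    return [r for r in reads if not is_low_complexity(r, min_fraction)]


reads = [
    "ACGT" * 10,      # balanced composition: kept
    "A" * 40,         # homopolymer: removed
    "A" * 39 + "C",   # C is 2.5%, G and T absent: removed
    "AACCGGTT" * 5,   # balanced composition: kept
]
kept = filter_low_complexity(reads)
```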

Genomes were assembled using SPAdes v3.13.0 with the parameters --sc --careful⁷⁰. Each resulting contig was trimmed by 100 bp at its ends, and contigs shorter than 2 kbp were removed. The completeness and contamination of the assemblies were evaluated with CheckM v1.1.3²⁸, and their taxonomic positions were determined with GTDB-Tk v2.4.3 and GTDB r220^29,71. After genomes with >5% contamination or a failed taxonomic assignment were discarded, the remaining genomes were further screened with NCBI’s Foreign Contamination Screen tool suite to remove contaminant sequences⁷². The quality and taxonomy of the genomes were then re-examined with CheckM and GTDB-Tk. Structural and functional annotations were performed using Prokka v1.13⁷³. To predict gene function, translated protein sequences were searched with BLASTP against KEGG DB release 2020-03-23⁷⁴, and the KEGG ortholog (KO) of the top-scoring subject sequence was assigned to a query protein when the alignment had >30% sequence similarity and >70% query coverage⁷⁵. To distinguish NxrA from NarG, which share the same KEGG ortholog (K00370), protein sequences from the SAGs were searched with DIAMOND v2.1.11.165 against curated NxrA and NarG sequences in NCycDB⁵⁴. Genes involved in iron metabolism and viral sequences were detected with FeGenie v1.0⁷⁶ and VirSorter v1.0.6³³, respectively. For sulfur-metabolizing genes absent from the KEGG database, BLAST searches were conducted using reference sequences from the literature⁷⁷. Average nucleotide identity (ANI) between genomes was calculated using orthoANIu v1.2 with default options⁷⁸.
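The KO assignment rule above (take the top-scoring KEGG hit, then require >30% similarity and >70% query coverage) can be sketched over BLAST+ tabular output. This assumes a custom format such as `-outfmt "6 qseqid sseqid pident qstart qend qlen bitscore"`, uses percent identity (`pident`) as the similarity measure, and takes the subject-to-KO mapping as a given dictionary; these are illustrative assumptions, not the authors’ scripts, and the rows below are hypothetical.

```python
def parse_hits(lines):
    """Parse tab-separated rows: qseqid sseqid pident qstart qend qlen bitscore."""
    for line in lines:
        q, s, pident, qstart, qend, qlen, bits = line.rstrip("\n").split("\t")
        yield q, s, float(pident), int(qstart), int(qend), int(qlen), float(bits)


def assign_kos(hits, subject_to_ko, min_identity=30.0, min_qcov=70.0):
    """Assign the KO of each query's top-scoring hit if it passes both cutoffs."""
    best = {}
    for q, s, pident, qstart, qend, qlen, bits in hits:
        # Keep only the highest-bit-score hit per query (index 5 is bitscore)
        if q not in best or bits > best[q][5]:
            best[q] = (s, pident, qstart, qend, qlen, bits)
    kos = {}
    for q, (s, pident, qstart, qend, qlen, _bits) in best.items():
        qcov = 100.0 * (abs(qend - qstart) + 1) / qlen
        if pident > min_identity and qcov > min_qcov and s in subject_to_ko:
            kos[q] = subject_to_ko[s]
    return kos


# Hypothetical BLAST rows and subject-to-KO table
rows = [
    "prot1\tsubjA\t45.0\t1\t90\t100\t210.0",
    "prot1\tsubjB\t80.0\t1\t40\t100\t150.0",   # higher identity, lower score
    "prot2\tsubjC\t25.0\t1\t100\t100\t120.0",  # fails identity cutoff
    "prot3\tsubjA\t50.0\t1\t50\t100\t90.0",    # fails coverage cutoff
]
ko_map = {"subjA": "K00370", "subjB": "K02035", "subjC": "K01142"}
assignments = assign_kos(parse_hits(rows), ko_map)
```

Filtering only the top-scoring hit, rather than searching for any hit that passes the cutoffs, follows the order described in the text: the best hit is chosen first and then accepted or rejected.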

Phylogenomic reconstruction of single-cell genomes

To determine the phylogeny of the bacterial SAGs, 120 marker genes were aligned to the GTDB multiple sequence alignment using GTDB-Tk v2.4.3²⁹, and a phylogenomic tree was reconstructed using RAxML v8.2.12⁷⁹ with 100 nonparametric bootstrap replicates and the PROTGAMMAWAG substitution model⁷¹. The bacterial tree was rooted at the most recent common ancestor of the Candidate Phyla Radiation (i.e., Patescibacteria), a deep-branching lineage of the domain Bacteria⁸⁰. An unrooted phylogenomic tree of the archaeal SAGs was constructed similarly, but using the archaeal GTDB alignment of 53 marker genes. The trees were visualized using either Dendroscope v3.5.10⁸¹ or ETE v3.1.3⁸².

Estimating taxonomic and functional diversity

Chao1 was used to estimate taxonomic richness and, from the genomic occurrence of every KEGG ortholog, functional richness. Because ANI calculations between incomplete SAGs often failed, GTDB genera or higher-rank taxonomic annotations were used as operational taxonomic units instead of de novo SAG clusters defined by ANI. For both taxonomic and functional richness, rarefaction curves with 95% confidence intervals were generated from 100 randomizations using in-house Python scripts.
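A minimal sketch of the richness estimates, assuming one taxon label (e.g., GTDB genus) per SAG. It uses the bias-corrected Chao1 formula, S_obs + F1(F1 − 1) / (2(F2 + 1)), where F1 and F2 are the numbers of taxa observed once and twice; which Chao1 variant the in-house scripts used is not stated. The rarefaction curve reports mean richness over random orderings of the SAGs with simple empirical 2.5th and 97.5th percentiles. Functional richness from KO occurrences would need an incidence-based adaptation that is not shown.

```python
import random
from collections import Counter


def chao1(labels):
    """Bias-corrected Chao1 richness from one taxon label per sampled SAG."""
    counts = Counter(labels)
    s_obs = len(counts)
    f1 = sum(1 for c in counts.values() if c == 1)  # singletons
    f2 = sum(1 for c in counts.values() if c == 2)  # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def rarefaction(labels, iterations=100, seed=0):
    """(n, mean richness, ~2.5th pct, ~97.5th pct) for each subsample size n."""
    rng = random.Random(seed)
    curves = []
    for _ in range(iterations):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        seen, curve = set(), []
        for label in shuffled:  # richness after adding each SAG in turn
            seen.add(label)
            curve.append(len(seen))
        curves.append(curve)
    results = []
    for i in range(len(labels)):
        values = sorted(curve[i] for curve in curves)
        lo = values[int(0.025 * (iterations - 1))]
        hi = values[int(0.975 * (iterations - 1))]
        results.append((i + 1, sum(values) / iterations, lo, hi))
    return results


# Hypothetical genus labels for six SAGs
genera = ["Nitrotoga", "Nitrotoga", "Polaromonas", "SURF-13", "SURF-13", "UBA4592"]
print(chao1(genera))
curve = rarefaction(genera)
```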

Comparing 16S rRNA gene sequences from SAGs to amplicon data from SLM, SLW and WGZ

Sequences of the V4 region of 16S rRNA genes from SLM (BioProject PRJNA790995), SLW (PRJNA244335), and WGZ (PRJNA869494) were retrieved from NCBI¹⁷. The sequences were trimmed using Cutadapt v4.2⁸³ and analyzed with DADA2 v1.25.2, including quality filtering, error correction, chimera removal, and taxonomic annotation with SILVA DB v138.1⁸⁴. The resulting ASVs were searched with BLASTn against the 16S rRNA gene sequences from SAGs using thresholds of >70% query coverage, >200 bit score, and <1% sequence dissimilarity. A Venn diagram was used to compare the number of ASVs shared among the three sites. The ASVs were further clustered into OTUs at >97% sequence similarity using VSEARCH v2.24⁸⁵, compared with the 16S rRNA gene sequences of SAGs using Bray-Curtis dissimilarity, and visualized by nonmetric multidimensional scaling (NMDS) with vegan v2.6-4 in R v4.3.1. Differences in community structure between microbial habitats, and between 16S rRNA amplicon and SAG data, were assessed statistically by PERMANOVA with 10,000 permutations.
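For reference, Bray-Curtis dissimilarity between two samples is the sum of absolute count differences divided by the total count across both samples, ranging from 0 (identical) to 1 (no shared OTUs). In R, vegan’s `vegdist(x, method = "bray")` computes the same quantity; the Python sketch below only illustrates the formula on hypothetical OTU counts.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two {otu: count} dictionaries."""
    otus = set(u) | set(v)
    diff = sum(abs(u.get(o, 0) - v.get(o, 0)) for o in otus)
    total = sum(u.get(o, 0) + v.get(o, 0) for o in otus)
    return diff / total if total else 0.0


# Hypothetical OTU counts: amplicon reads per OTU vs. SAGs per OTU
amplicon = {"OTU1": 6, "OTU2": 7, "OTU3": 4}
sags = {"OTU1": 10, "OTU3": 6}
print(round(bray_curtis(amplicon, sags), 3))  # 0.394
```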

Evaluation of the divergence of SAGs and functional genes

ANI values were calculated using orthoANIu⁷⁸ for all possible pairs of SAGs and GTDB representative genomes. When less than 10% of a SAG sequence aligned with any GTDB genome, the SAG was classified as ‘No match’. To estimate the divergence of predicted SAG proteins, sequences were first clustered at 95% similarity using CD-HIT v4.8.1⁸⁶. The longest translated amino acid sequence from each protein cluster was searched with BLASTp against the NCBI NR DB released on 27 September 2021. Sequences with no significant hit at an E-value threshold of 10 were excluded from the calculation of the mean and mode. To compare the divergence of SAG proteins from known homologs, the longest amino acid sequence from each cluster and its best BLASTp hit were analyzed. For comparison with the SLM SAGs, SAG data from previous studies were analyzed using the same procedure. Datasets were selected according to the following criteria: (1) >30 SAGs were produced, allowing robust statistical analysis; (2) the SAGs were publicly available at the time of the data survey in November 2021; (3) single cells were randomly chosen and sequenced; (4) the study site was a natural habitat; and (5) to avoid self-hits, the study’s protein sequences were not already recorded in the NCBI NR DB. Screening of metadata and sequence comparisons (~100% identity) showed that the only dataset fulfilling these criteria was a single-cell genomic study of seawater beneath the Ross Ice Shelf (RIS)³⁵, which was therefore chosen for the analysis. Genome quality was evaluated using CheckM with a threshold of <5% contamination. ANI values between RIS SAGs and GTDB genomes were also calculated using orthoANIu. Because the RIS SAGs are included in GTDB r220, self-comparisons were excluded from the ANI calculations. The proteins of RIS SAGs were structurally annotated, clustered, and compared with extant proteins as described above.
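The ‘No match’ rule and a per-SAG ANI summary can be sketched as follows. We assume each SAG’s comparisons are available as (aligned fraction, ANI) pairs and that the closest genome is the one with the highest ANI among comparisons covering at least 10% of the SAG; we also count SAGs below the 95% species cutoff mentioned in the Discussion. The data structure and values are hypothetical.

```python
from statistics import mean


def closest_ani(comparisons, min_aligned=0.10):
    """Best ANI for one SAG, or None ('No match') if no genome aligns >= 10%."""
    usable = [ani for aligned, ani in comparisons if aligned >= min_aligned]
    return max(usable) if usable else None


def summarize(sag_results, species_cutoff=95.0):
    """Summarize closest-genome ANI across SAGs."""
    best = {sag: closest_ani(c) for sag, c in sag_results.items()}
    matched = [v for v in best.values() if v is not None]
    return {
        "no_match": sum(v is None for v in best.values()),
        "mean_ani": mean(matched) if matched else None,
        "below_species_cutoff": sum(v < species_cutoff for v in matched),
    }


# Hypothetical comparisons: SAG -> [(fraction of SAG aligned, ANI %), ...]
results = {
    "SAG_1": [(0.50, 78.0), (0.30, 80.0)],
    "SAG_2": [(0.05, 90.0)],  # too little aligned: 'No match'
    "SAG_3": [(0.90, 96.0)],
}
summary = summarize(results)
```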

Phylogenomic tree reconstruction for the 19 most abundant genera

For genera with a relative abundance >1% of the SAGs, the GenBank assembly accessions of genomes were extracted from the GTDB metadata and combined with those retrieved from NCBI. Because a SAG might represent a novel genus, we used broader taxonomic sampling that included all GTDB genomes from the same family as each SAG and an outgroup phylogenetically external to the genomes analyzed. A rooted phylogenetic tree was constructed using FastTree v2.1.11³² within the de_novo_wf workflow of GTDB-Tk, and node confidence was evaluated using the Shimodaira-Hasegawa test⁸⁷. A subtree was extracted with ETE at the node representing the most recent common ancestor of the genus of interest and its neighboring genera. In the subtree, the monophyly of the SAGs was evaluated using the genealogical sorting index (GSI) with 10,000 permutations³⁴. Protein sequences from GTDB genomes identified as sister taxa in each phylogenomic tree were functionally annotated using BLASTp searches against the KEGG database, as described above.

Extracting environmental sources of public genomes from metadata

The environmental source of genomes deposited in public databases such as NCBI is often omitted or described using non-standardized terminology. To extract source environments from the metadata of a large set of genomes, we used a two-step procedure. First, habitat-describing keywords (Supplementary Data 3) were searched against the metadata of all NCBI BioProjects and BioSamples; when a keyword matched, the corresponding habitat flag was turned on (e.g., the flag for freshwater includes the keywords lake and lentic). Second, the combination of flags turned on was used to classify the habitat of each genome for which metadata were available (Supplementary Data 4). Habitats, habitat flags, and their associated keywords were designed with reference to the GOLD Ecosystem Classification paths⁸⁸ and a previous study⁶².
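The two-step keyword-flag procedure can be sketched in a few lines. Only the freshwater keywords lake and lentic come from the text; the other habitats and keywords are hypothetical placeholders (the real lists are in Supplementary Data 3), and the plain substring matching used here is a simplification that, for example, also matches "lakeside".

```python
# Hypothetical keyword table; only "lake" and "lentic" (freshwater) are from
# the text. The full lists are in Supplementary Data 3.
HABITAT_KEYWORDS = {
    "freshwater": ["lake", "lentic"],
    "marine": ["seawater", "ocean"],
    "soil": ["soil"],
}


def habitat_flags(metadata_text, keywords=HABITAT_KEYWORDS):
    """Step 1: turn on a flag for each habitat whose keyword appears."""
    text = metadata_text.lower()
    return {habitat for habitat, words in keywords.items()
            if any(word in text for word in words)}


def classify_habitat(metadata_text, keywords=HABITAT_KEYWORDS):
    """Step 2: use the combination of flags that are on as the habitat label."""
    flags = habitat_flags(metadata_text, keywords)
    return "+".join(sorted(flags)) if flags else "unclassified"


print(classify_habitat("Water column of a subglacial lake"))  # freshwater
print(classify_habitat("Lake sediment mixed with seawater"))  # freshwater+marine
```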

Sequence-based determination of metabolic function

We identified phylogenetically distinct forms of the dissimilatory sulfite reductase alpha subunit (DsrA) that catalyze either sulfide oxidation or sulfite reduction, allowing prediction of the likely reaction direction of the queried gene products. DsrA sequences from cultivated sulfur oxidizers and reducers available in public databases^89,90 were combined with those identified in the SAGs and aligned using ClustalW v2.1⁹¹, and a maximum-likelihood tree was constructed using RAxML with 100 bootstrap replicates and a protein substitution model selected with ModelTest-NG⁹². Based on the tree topology, the DsrA sequences of SAGs were classified as preferentially catalyzing either sulfide oxidation or sulfite reduction. Similarly, form I of the aerobic carbon monoxide dehydrogenase large subunit (CoxL), which likely functions in CO oxidation, was identified by its clear phylogenetic separation from the functionally uncharacterized form II of CoxL. Form I and II sequences were analyzed as described previously⁴⁰.

Determination of metabolic potential encoded in the SAGs

The potential of a SAG to encode a single-step pathway was determined by the presence or absence of the genes responsible for that reaction (Supplementary Data 6). For example, a SAG was considered to have the potential for sulfite oxidation when one or more of the following were found: dsrAB, fsrNC, and hdr-like genes. A set of genes (e.g., dsrAB) was considered present in a SAG when half or more of its genes were detected; accessory subunits (e.g., dsrL) were excluded from these determinations. The potential of a SAG to encode a multiple-step pathway (e.g., glycolysis) was determined by the presence or absence of genes encoding key enzymes. Because SAGs are not complete genomes, a SAG was additionally required to possess genes for >50% pathway completeness (Supplementary Data 7). Extensive surveys of the literature^74,77,93,94 and KEGG metabolic pathways guided the selection of genes encoding enzymes involved in key metabolic reactions and multiple-step pathways (Supplementary Data 6-7). The proportion of SAGs with a metabolic pathway of interest was corrected for genome incompleteness using the method described by Acker et al. (2022)³⁶; in essence, the observed proportion was adjusted by the ratio of expected²⁸ to observed genome sizes.
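The presence and absence rules above can be written as three small predicates. This is a sketch under stated assumptions: gene sets and pathway steps are given as plain gene-name tuples (the actual definitions are in Supplementary Data 6-7), each step of a multiple-step pathway counts as satisfied if any gene for it is found, and all gene names other than dsrA, dsrB, and dsrL are hypothetical placeholders.

```python
def gene_set_present(detected, gene_set):
    """A gene set (e.g., dsrA + dsrB) counts as present if >= half is detected."""
    found = sum(1 for gene in gene_set if gene in detected)
    return found >= len(gene_set) / 2


def single_step_present(detected, alternative_sets):
    """A single-step pathway is present if any alternative gene set is present."""
    return any(gene_set_present(detected, s) for s in alternative_sets)


def multistep_present(detected, steps, min_completeness=0.5):
    """A multiple-step pathway requires > min_completeness of its key steps.

    steps: one tuple per key enzymatic step, listing genes that can each
    catalyse that step.
    """
    done = sum(1 for step in steps if any(gene in detected for gene in step))
    return done / len(steps) > min_completeness


# Hypothetical gene calls for one SAG. Accessory subunits such as dsrL are
# simply left out of the gene sets, mirroring the rule in the text.
sag_genes = {"dsrA", "stepA_gene", "stepB_gene", "stepC_gene"}
dsr_set = ("dsrA", "dsrB")
pathway = [("stepA_gene",), ("stepB_gene",), ("stepC_gene",),
           ("stepD_gene",), ("stepE_gene",)]
```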

Artificial reduction of GTDB genomes closest to SAGs

To simulate the quantitative impact of low genome completeness on inferred metabolic potential, each GTDB genome closest to a SAG by ANI was artificially reduced to match the genome completeness of its corresponding SAG using the following procedure: (1) contigs were randomly shuffled; (2) to preserve operon structure, whole contigs were selected sequentially, starting from the first, until the cumulative genome completeness matched that of the SAG counterpart; and (3) from the final contig, a DNA fragment containing the number of genes required to reach the target completeness was randomly selected so that genes had an equal probability of inclusion regardless of their position on the contig. Metabolic pathways of the GTDB genomes were predicted by BLASTP searches of translated protein sequences against KEGG DB release 2020-03-23⁷⁴. KEGG orthologs (KOs) were assigned to query proteins when alignments between query and subject sequences had >30% sequence similarity and >70% query coverage⁷⁵. Only 70 of the 80 metabolic pathways surveyed in this study were analyzed, because the remaining 10 were either based on non-KEGG annotations or entirely undetected in the GTDB genomes. Statistical significance was assessed using a Monte Carlo permutation test (1,000 permutations), with p-values corrected for the false discovery rate (FDR).
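
The three-step reduction can be sketched as below. This is a simplified illustration, not the authors' genome-completeness-reduction-pipeline: it assumes each contig is given as an ordered list of gene identifiers and approximates completeness as the fraction of genes retained, whereas the study matched marker-based completeness. The function `reduce_genome` and the toy contigs are hypothetical, and the final fragment here simply starts at a uniformly random offset.

```python
import random
from typing import List, Sequence


def reduce_genome(
    contigs: Sequence[Sequence[str]], completeness: float, seed: int = 0
) -> List[List[str]]:
    """Subsample a genome to a target completeness (approximated as fraction of genes kept)."""
    rng = random.Random(seed)
    total = sum(len(c) for c in contigs)
    target = round(completeness * total)
    order = list(contigs)
    rng.shuffle(order)  # step 1: random contig order
    kept: List[List[str]] = []
    count = 0
    for contig in order:  # step 2: keep whole contigs in shuffled order
        remaining = target - count
        if remaining <= 0:
            break
        if len(contig) <= remaining:
            kept.append(list(contig))
            count += len(contig)
        else:
            # step 3: a contiguous fragment of the final contig with exactly the genes still needed
            start = rng.randrange(len(contig) - remaining + 1)
            kept.append(list(contig[start:start + remaining]))
            count += remaining
            break
    return kept


# Hypothetical toy genome with 10 genes on three contigs.
example_contigs = [["a1", "a2", "a3"], ["b1", "b2"], ["c1", "c2", "c3", "c4", "c5"]]
```

Keeping whole contigs until the last one preserves gene neighborhoods (and hence operons) in most of the retained sequence; only one contig per reduced genome is truncated.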
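
The significance test can be sketched per pathway as a two-sided permutation test on the difference in presence frequency between SAGs and reduced GTDB genomes, followed by FDR adjustment. Several details are assumptions not stated in the text: the test statistic, the (k + 1)/(n + 1) Monte Carlo p-value, and the use of the Benjamini–Hochberg procedure for the FDR correction. Function names are hypothetical.

```python
import random
from typing import List, Sequence


def permutation_pvalue(a: Sequence[int], b: Sequence[int], n_perm: int = 1000, seed: int = 0) -> float:
    """Two-sided Monte Carlo permutation test on the difference in presence frequency (0/1 data)."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down, enforcing monotonicity
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In this scheme, one p-value would be computed for each of the 70 analyzed pathways and the whole vector passed to `benjamini_hochberg` at once.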

Genomic co-occurrence network of biogeochemically-relevant genes

For each of the 1,374 SAGs, a profile was prepared describing the presence or absence of key metabolic functions of ecological importance. Co-occurrence of functional categories across the profiles was evaluated with a hypergeometric test in CoNet v1.1.1⁹⁵ (Cytoscape v3.9.0), applying an E-value threshold of 0.1. Significant co-occurrences were used to construct a network, which was divided into individual metabolic guilds using the fast agglomerate algorithm FAG-EC in ClusterViz v1.0.3⁹⁶. The co-occurrence analysis was conducted separately for lake water and sediment SAGs.
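
The pairwise co-occurrence test can be sketched with the hypergeometric upper tail. This is an illustration, not CoNet itself: it assumes the E-value is the p-value multiplied by the number of function pairs tested (a Bonferroni-style definition that may differ from CoNet's internals), and it omits the FAG-EC clustering step. Function names and the toy profiles are hypothetical.

```python
from itertools import combinations
from math import comb
from typing import Dict, List, Set, Tuple


def hypergeom_upper_tail(n_total: int, n_a: int, n_b: int, n_both: int) -> float:
    """P(X >= n_both): chance that n_b SAGs drawn from n_total include >= n_both of the n_a carrying A."""
    upper = min(n_a, n_b)
    hits = sum(comb(n_a, k) * comb(n_total - n_a, n_b - k) for k in range(n_both, upper + 1))
    return hits / comb(n_total, n_b)


def cooccurrence_edges(
    profiles: Dict[str, Set[str]], e_threshold: float = 0.1
) -> List[Tuple[str, str, float]]:
    """Return function pairs whose assumed E-value (p * number of pairs) falls below the threshold."""
    functions = sorted(set().union(*profiles.values()))
    n_total = len(profiles)
    pairs = list(combinations(functions, 2))
    edges = []
    for a, b in pairs:
        n_a = sum(a in f for f in profiles.values())
        n_b = sum(b in f for f in profiles.values())
        n_both = sum(a in f and b in f for f in profiles.values())
        e_value = hypergeom_upper_tail(n_total, n_a, n_b, n_both) * len(pairs)
        if e_value < e_threshold:
            edges.append((a, b, e_value))
    return edges


# Hypothetical profiles: five SAGs share two functions, five others carry a third.
example_profiles = {f"SAG{i}": {"sulfide oxidation", "CO oxidation"} for i in range(5)}
example_profiles.update({f"SAG{i}": {"nitrate reduction"} for i in range(5, 10)})
```

On the toy profiles only the sulfide oxidation and CO oxidation pair forms an edge (p = 1/252, E ≈ 0.012); the resulting edge list is what a clustering step such as FAG-EC would then partition into guilds.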

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The genome sequence data generated in this study have been deposited in NCBI under the project PRJNA1084198. Source data are provided with this paper.

Code availability

The custom Python scripts used in this study are available at the following repositories: 1) habitat classification of microorganisms based on GTDB and NCBI metadata at https://github.com/kyuinHwang/genome-habitat-classification-pipeline, 2) KO-based inference of metabolic potential in microbial genomes at https://github.com/kyuinHwang/ko2pathway, and 3) microbial genome reduction to specified completeness levels at https://github.com/kyuinHwang/genome-completeness-reduction-pipeline.

References

1. Horgan, H. J. et al. Subglacial Lake Whillans—Seismic observations of a shallow active reservoir beneath a West Antarctic ice stream. Earth Planet. Sci. Lett. 331, 201–209 (2012).
2. Fricker, H. A., Scambos, T., Bindschadler, R. & Padman, L. An active subglacial water system in West Antarctica mapped from space. Science 315, 1544–1548 (2007).
3. Wadham, J. L. et al. The potential role of the Antarctic Ice Sheet in global biogeochemical cycles. Earth Environ. Sci. Trans. R. Soc. Edinb. 104, 55–67 (2013).
4. Venturelli, R. et al. Mid-Holocene grounding line retreat and readvance at Whillans Ice Stream, West Antarctica. Geophys. Res. Lett. 47, e2020GL088476 (2020).
5. Hodson, T. et al. Physical processes in Subglacial Lake Whillans, West Antarctica: inferences from sediment cores. Earth Planet. Sci. Lett. 444, 56–63 (2016).
6. Kingslake, J. et al. Extensive retreat and re-advance of the West Antarctic Ice Sheet during the Holocene. Nature 558, 430–434 (2018).
7. Carter, S., Fricker, H. & Siegfried, M. Evidence of rapid subglacial water piracy under Whillans Ice Stream, West Antarctica. J. Glaciol. 59, 1147–1162 (2013).
8. Michaud, A. B. et al. Solute sources and geochemical processes in Subglacial Lake Whillans, West Antarctica. Geology 44, 347–350 (2016).
9. Siegfried, M. et al. The life and death of a subglacial lake in West Antarctica. Geology 51, 434–438 (2023).
10. Vick-Majors, T. J. et al. Biogeochemical connectivity between freshwater ecosystems beneath the West Antarctic Ice Sheet and the sub-ice marine environment. Glob. Biogeochem. Cycles 34, e2019GB006446 (2020).
11. Siegfried, M. R., Fricker, H. A., Carter, S. P. & Tulaczyk, S. Episodic ice velocity fluctuations triggered by a subglacial flood in West Antarctica. Geophys. Res. Lett. 43, 2640–2648 (2016).
12. Venturelli, R. A. et al. Constraints on the timing and extent of deglacial grounding line retreat in West Antarctica. AGU Adv. 4, e2022AV000846 (2023).
13. Lau, S. C. Y. et al. Genomic evidence for West Antarctic Ice Sheet collapse during the Last Interglacial. Science 382, 1384–1389 (2023).
14. Grieman, M. M. et al. Abrupt Holocene ice loss due to thinning and ungrounding in the Weddell Sea Embayment. Nat. Geosci. 17, 227–232 (2024).
15. Achberger, A. M. et al. Microbial community structure of subglacial lake Whillans, West Antarctica. Front. Microbiol. 7, 1457 (2016).
16. Christner, B. C. et al. A microbial ecosystem beneath the West Antarctic ice sheet. Nature 512, 310–313 (2014).
17. Davis, C. L. et al. Biogeochemical and historical drivers of microbial community composition and structure in sediments from Mercer Subglacial Lake, West Antarctica. ISME Commun. 3, 8 (2023).
18. Purcell, A. M. et al. Microbial sulfur transformations in sediments from Subglacial Lake Whillans. Front. Microbiol. 5, 594 (2014).
19. Michaud, A. B. et al. Microbial oxidation as a methane sink beneath the West Antarctic Ice Sheet. Nat. Geosci. 10, 582–586 (2017).
20. Vick-Majors, T. J. et al. Physiological ecology of microorganisms in subglacial Lake Whillans. Front. Microbiol. 7, 1705 (2016).
21. Vick-Majors, T., Achberger, A., Michaud, A. & Priscu, J. Metabolic and taxonomic diversity in Antarctic subglacial environments. in Life in Extreme Environments: Insights in Biological Capability (Ecological Reviews) 279–296 (2020).
22. Gill-Olivas, B. et al. Subglacial erosion has the potential to sustain microbial processes in Subglacial Lake Whillans, Antarctica. Commun. Earth Environ. 2, 134 (2021).
23. Priscu, J. C. et al. Scientific access into Mercer Subglacial Lake: scientific objectives, drilling operations and initial observations. Ann. Glaciol. 62, 340–352 (2021).
24. Hawkings, J. R. et al. Enhanced trace element mobilization by Earth’s ice sheets. Proc. Natl. Acad. Sci. 117, 31648–31659 (2020).
25. Woyke, T., Doud, D. F. R. & Schulz, F. The trajectory of microbial single-cell sequencing. Nat. Methods 14, 1045–1054 (2017).
26. Rinke, C. et al. Validation of picogram- and femtogram-input DNA libraries for microscale metagenomics. PeerJ 4, e2486 (2016).
27. Stepanauskas, R. et al. Improved genome recovery and integrated cell-size analyses of individual uncultured microbial cells and viral particles. Nat. Commun. 8, 84 (2017).
28. Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 25, 1043–1055 (2015).
29. Chaumeil, P.-A., Mussig, A. J., Hugenholtz, P. & Parks, D. H. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics 36, 1925–1927 (2019).
30. Matsen, F. A., Kodner, R. B. & Armbrust, E. V. pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinforma. 11, 538 (2010).
31. Czech, L., Stamatakis, A., Dunthorn, M. & Barbera, P. Metagenomic analysis using phylogenetic placement—a review of the first decade. Front. Bioinforma. 2, 44 (2022).
32. Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree: computing large minimum evolution trees with profiles instead of a distance matrix. Mol. Biol. Evol. 26, 1641–1650 (2009).
33. Roux, S., Enault, F., Hurwitz, B. L. & Sullivan, M. B. VirSorter: mining viral signal from microbial genomic data. PeerJ 3, e985 (2015).
34. Cummings, M. P., Neel, M. C. & Shaw, K. L. A genealogical approach to quantifying lineage divergence. Evolution 62, 2411–2422 (2008).
35. Martínez-Pérez, C. et al. Phylogenetically and functionally diverse microorganisms reside under the Ross Ice Shelf. Nat. Commun. 13, 117 (2022).
36. Acker, M. et al. Phosphonate production by marine microbes: exploring new sources and potential function. Proc. Natl. Acad. Sci. 119, e2113386119 (2022).
37. Schloter, M., Lebuhn, M., Heulin, T. & Hartmann, A. Ecology and evolution of bacterial microdiversity. FEMS Microbiol. Rev. 24, 647–660 (2000).
38. Bowers, R. M. et al. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nat. Biotechnol. 35, 725–731 (2017).
39. Jung, M.-Y., Islam, M. A., Gwak, J.-H., Kim, J.-G. & Rhee, S.-K. Nitrosarchaeum koreense gen. nov., sp. nov., an aerobic and mesophilic, ammonia-oxidizing archaeon member of the phylum Thaumarchaeota isolated from agricultural soil. Int. J. Syst. Evolut. Microbiol. 68, 3084–3095 (2018).
40. King, G. M. & Weber, C. F. Distribution, diversity and ecology of aerobic CO-oxidizing bacteria. Nat. Rev. Microbiol. 5, 107–118 (2007).
41. Campbell, T. D. et al. Dynamic subglacial meltwater history archived in Antarctic subglacial lake sediments. GSA Bull. 137, 3055–3068 (2025).
42. Jain, C., Rodriguez-R, L. M., Phillippy, A. M., Konstantinidis, K. T. & Aluru, S. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat. Commun. 9, 5114 (2018).
43. Biddle, J. F. et al. Prospects for the study of evolution in the deep biosphere. Front. Microbiol. 2, 285 (2012).
44. Naish, T. et al. Obliquity-paced Pliocene West Antarctic ice sheet oscillations. Nature 458, 322–328 (2009).
45. McKay, R. et al. Pleistocene variability of Antarctic ice sheet extent in the Ross embayment. Quat. Sci. Rev. 34, 93–112 (2012).
46. Weller, C. & Wu, M. A generation-time effect on the rate of molecular evolution in bacteria. Evolution 69, 643–652 (2015).
47. Bonnissel-Gissinger, P., Alnot, M., Ehrhardt, J.-J. & Behra, P. Surface oxidation of pyrite as a function of pH. Environ. Sci. Technol. 32, 2839–2845 (1998).
48. Kappler, A. et al. An evolving view on biogeochemical cycling of iron. Nat. Rev. Microbiol. 19, 360–374 (2021).
49. Siegert, M. J. et al. Physical, chemical and biological processes in Lake Vostok and other Antarctic subglacial lakes. Nature 414, 603–609 (2001).
50. Wilhelm, L. et al. Rare but active taxa contribute to community dynamics of benthic biofilms in glacier-fed streams. Environ. Microbiol. 16, 2514–2524 (2014).
51. Achberger, A. M. et al. Microbiology of subglacial environments. in Psychrophiles: from biodiversity to biotechnology 83–110 (Springer, 2017).
52. Boyd, E. S. et al. Diversity, abundance, and potential activity of nitrifying and nitrate-reducing microbial assemblages in a subglacial ecosystem. Appl. Environ. Microbiol. 77, 4778–4787 (2011).
53. Boddicker, A. M. & Mosier, A. C. Genomic profiling of four cultivated Candidatus Nitrotoga spp. predicts broad metabolic potential and environmental distribution. ISME J. 12, 2864–2882 (2018).
54. Tu, Q., Lin, L., Cheng, L., Deng, Y. & He, Z. NCycDB: a curated integrative database for fast and accurate metagenomic profiling of nitrogen cycling genes. Bioinformatics 35, 1040–1048 (2019).
55. Schott, J., Griffin, B. M. & Schink, B. Anaerobic phototrophic nitrite oxidation by Thiocapsa sp. strain KS1 and Rhodopseudomonas sp. strain LQ17. Microbiology 156, 2428–2437 (2010).
56. Chin, K.-J., Liesack, W. & Janssen, P. H. Opitutus terrae gen. nov., sp. nov., to accommodate novel strains of the division ‘Verrucomicrobia’ isolated from rice paddy soil. Int. J. Syst. Evolut. Microbiol. 51, 1965–1968 (2001).
57. Taubert, M. et al. Bolstering fitness via CO2 fixation and organic carbon uptake: mixotrophs in modern groundwater. ISME J. 16, 1153–1162 (2022).
58. Franzetti, A. et al. Light-dependent microbial metabolisms drive carbon fluxes on glacier surfaces. ISME J. 10, 2984–2988 (2016).
59. Morris, J. J., Lenski, R. E. & Zinser, E. R. The Black Queen Hypothesis: evolution of dependencies through adaptive gene loss. mBio 3, e00036-12 (2012).
60. Chaudhari, N. M. et al. The economical lifestyle of CPR bacteria in groundwater allows little preference for environmental drivers. Environ. Microbiome 16, 24 (2021).
61. Tian, R. et al. Small and mighty: adaptation of superphylum Patescibacteria to groundwater environment drives their genome simplicity. Microbiome 8, 51 (2020).
62. Luef, B. et al. Diverse uncultivated ultra-small bacterial cells in groundwater. Nat. Commun. 6, 6372 (2015).
63. Castelle, C. J. et al. Biosynthetic capacity, metabolic variety and unusual biology in the CPR and DPANN radiations. Nat. Rev. Microbiol. 16, 629–645 (2018).
64. Chiriac, M.-C. et al. Ecogenomics sheds light on diverse lifestyle strategies in freshwater CPR. Microbiome 10, 84 (2022).
65. Moreira, D., Zivanovic, Y., López-Archilla, A. I., Iniesto, M. & López-García, P. Reductive evolution and unique predatory mode in the CPR bacterium Vampirococcus lugosii. Nat. Commun. 12, 2454 (2021).
66. Kadnikov, V. V., Mardanov, A. V., Beletsky, A. V., Karnachuk, O. V. & Ravin, N. V. Complete genome of a member of a new bacterial lineage in the microgenomates group reveals an unusual nucleotide composition disparity between two strands of DNA and limited metabolic potential. Microorganisms 8, 320 (2020).
67. He, X. et al. Cultivation of a human-associated TM7 phylotype reveals a reduced genome and epibiotic parasitic lifestyle. Proc. Natl. Acad. Sci. 112, 244–249 (2015).
68. Rosenheim, B. E. et al. A method for successful collection of multicores and gravity cores from Antarctic subglacial lakes. Limnol. Oceanogr.: Methods 21, 279–294 (2023).
69. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
70. Bankevich, A. et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19, 455–477 (2012).
71. Parks, D. H. et al. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat. Biotechnol. 36, 996–1004 (2018).
72. Astashyn, A. et al. Rapid and sensitive detection of genome contamination at scale with FCS-GX. Genome Biol. 25, 60 (2024).
73. Seemann, T. Prokka: rapid prokaryotic genome annotation. Bioinformatics 30, 2068–2069 (2014).
74. Kanehisa, M. & Goto, S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30 (2000).
75. Kang, I., Kim, S., Islam, M. R. & Cho, J.-C. The first complete genome sequences of the acI lineage, the most abundant freshwater Actinobacteria, obtained by whole-genome-amplification of dilution-to-extinction cultures. Sci. Rep. 7, 42252 (2017).
76. Garber, A. I. et al. FeGenie: A comprehensive tool for the identification of iron genes and iron gene neighborhoods in genome and metagenome assemblies. Front. Microbiol. 11, 499513 (2020).
77. Vigneron, A. et al. Genomic evidence for sulfur intermediates as new biogeochemical hubs in a model aquatic microbial ecosystem. Microbiome 9, 46 (2021).
78. Yoon, S.-H., Ha, S.-M., Lim, J., Kwon, S. & Chun, J. A large-scale evaluation of algorithms to calculate average nucleotide identity. Antonie Van Leeuwenhoek 110, 1281–1286 (2017).
79. Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).
80. Bokhari, R. H. et al. Bacterial origin and reductive evolution of the CPR Group. Genome Biol. Evol. 12, 103–121 (2020).
81. Huson, D. H. & Scornavacca, C. Dendroscope 3: an interactive tool for rooted phylogenetic trees and networks. Syst. Biol. 61, 1061–1067 (2012).
82. Huerta-Cepas, J., Serra, F. & Bork, P. ETE 3: reconstruction, analysis, and visualization of phylogenomic data. Mol. Biol. Evol. 33, 1635–1638 (2016).
83. Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet. J. 17, 10–12 (2011).
84. Callahan, B. J. et al. DADA2: High-resolution sample inference from Illumina amplicon data. Nat. Methods 13, 581–583 (2016).
85. Rognes, T., Flouri, T., Nichols, B., Quince, C. & Mahé, F. VSEARCH: a versatile open source tool for metagenomics. PeerJ 4, e2584 (2016).
86. Fu, L., Niu, B., Zhu, Z., Wu, S. & Li, W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150–3152 (2012).
87. Shimodaira, H. & Hasegawa, M. Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Mol. Biol. Evol. 16, 1114–1116 (1999).
88. Ivanova, N. et al. A call for standardized classification of metagenome projects. Environ. Microbiol. 12, 1803–1805 (2010).
89. Tully, B. J., Wheat, C. G., Glazer, B. T. & Huber, J. A. A dynamic microbial community with high functional redundancy inhabits the cold, oxic subseafloor aquifer. ISME J. 12, 1–16 (2018).
90. Dahl, C. et al. Novel genes of the dsr gene cluster and evidence for close interaction of Dsr proteins during sulfur oxidation in the phototrophic sulfur bacterium Allochromatium vinosum. J. Bacteriol. 187, 1392–1404 (2005).
91. Thompson, J. D., Higgins, D. G. & Gibson, T. J. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680 (1994).
92. Darriba, D. et al. ModelTest-NG: a new and scalable tool for the selection of DNA and protein evolutionary models. Mol. Biol. Evol. 37, 291–294 (2020).
93. Graham, E., Heidelberg, J. & Tully, B. Potential for primary productivity in a globally-distributed bacterial phototroph. ISME J. 12, 1861–1866 (2018).
94. Kuypers, M. M., Marchant, H. K. & Kartal, B. The microbial nitrogen-cycling network. Nat. Rev. Microbiol. 16, 263–276 (2018).
95. Faust, K. & Raes, J. CoNet app: inference of biological association networks using Cytoscape. F1000Research 5, 1519 (2016).
96. Wang, J. et al. ClusterViz: a cytoscape APP for cluster analysis of biological network. IEEE/ACM Trans. Comput. Biol. Bioinforma. 12, 815–822 (2014).
97. Haran, T. et al. MEaSUREs MODIS Mosaic of Antarctica 2013-2014 (MOA2014) Image Map, Version 1. [Dataset]. Boulder, Colorado USA. NASA National Snow and Ice Data Center Distributed Active Archive Center. https://doi.org/10.5067/RNF17BP824UM (2018).
98. Smith, B. E., Fricker, H. A., Joughin, I. R. & Tulaczyk, S. An inventory of active subglacial lakes in Antarctica detected by ICESat (2003–2008). J. Glaciol. 55, 573–595 (2009).
99. Sauthoff, W. & Freer, B. Subglacial water flow paths beneath Whillans Ice Plain, West Antarctica (Version 1.0) [Data set]. NERC EDS UK Polar Data Centre. https://doi.org/10.5285/0df5d4e9-2fcd-4420-b403-24d76848a5a5 (2024).
100. Rignot, E., Mouginot, J. & Scheuchl, B. MEaSUREs Antarctic Grounding Line from Differential Satellite Radar Interferometry, Version 2. [Dataset]. Boulder, Colorado USA. NASA National Snow and Ice Data Center Distributed Active Archive Center. https://doi.org/10.5067/IKBWW4RYHF1Q (2016).

Acknowledgements

We thank Dr. Kitae Kim at Korea Polar Research Institute for sharing his knowledge about chemical oxidation and Ms. Soyeon Kim for technical assistance. This work was supported by the Korea Polar Research Institute (grant numbers PE18340, PE20130 and PE25130 to OSK) and the US National Science Foundation, Section for Antarctic Sciences, Antarctic Integrated System Science program, as part of the interdisciplinary project Subglacial Antarctic Lakes Scientific Access (SALSA): Integrated study of carbon cycling in hydrologically-active subglacial environments (NSF-OPP 1543537 to JCP and NSF-OPP 1543396 to BCC).

Author information

Christina L. Davis
Present address: Department of Natural Resource Sciences, McGill University, Ste-Anne-de-Bellevue, QC, Canada
These authors contributed equally: Kyung Mo Kim, Kyuin Hwang.

Authors and Affiliations

Division of Life Sciences, Korea Polar Research Institute, 26 Songdomirae-ro, Yeonsu-gu, Incheon, 21990, Republic of Korea
Kyung Mo Kim, Kyuin Hwang, Hanbyul Lee, Ahnna Cho & Ok-Sun Kim
Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, USA
Christina L. Davis & Brent C. Christner
Department of Land Resources and Environmental Sciences-Emeritus, Montana State University, Bozeman, MT, USA
John C. Priscu
Desert Research Institute, Reno, NV, USA
John C. Priscu

Contributions

K.M.K.: supervised research, analyzed data, wrote the original draft, and revised the manuscript. K.H.: analyzed data, wrote the original draft, and revised the manuscript. H.L.: revised the manuscript. A.C.: conducted sample processing and analysis. C.L.D.: collected samples and revised the manuscript. B.C.C.: Project PI, acquired funding, collected samples, supervised research, and revised the manuscript. J.C.P.: designed the study, served as Chief Scientist, assisted with sample collection, and revised the manuscript. O.S.K.: Project PI, designed the study, acquired funding, supervised research, and revised the manuscript.

Corresponding authors

Correspondence to Kyung Mo Kim, John C. Priscu or Ok-Sun Kim.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Mukan Ji, Yongqin Liu, and the other, anonymous, reviewer for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Description of Additional Supplementary Files

Supplementary Data 1

Supplementary Data 2

Supplementary Data 3

Supplementary Data 4

Supplementary Data 5

Supplementary Data 6

Supplementary Data 7

Supplementary Data 8

Supplementary Data 9

Supplementary Data 10

Reporting summary

Transparent Peer Review file

Source data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Kim, K.M., Hwang, K., Lee, H. et al. Genetic isolation and metabolic complexity of an Antarctic subglacial microbiome. Nat Commun 16, 7501 (2025). https://doi.org/10.1038/s41467-025-62753-3

Received: 11 December 2024
Accepted: 29 July 2025
Published: 18 August 2025
DOI: https://doi.org/10.1038/s41467-025-62753-3
