Background & Summary

The Arctic is undergoing rapid climate change, warming by 1.9 °C over the last 30 years, which is two to three times faster than the global average1,2,3. As a consequence, the Arctic marine ecosystem is changing rapidly, with reductions in sea-ice cover and thawing of submarine permafrost3,4,5,6. Environmental changes in the coastal marine ecosystem are also accelerated by terrestrial impacts. For example, freshwater and sedimentary deposits from enhanced terrestrial runoff pose a threat to shallow estuaries and coastal benthic ecosystems of the continental shelf6,7,8.

Marine sediments are the largest organic carbon reservoirs, supporting rich and diverse benthic microbial communities9. Benthic microorganisms play key roles in biogeochemical cycles through diverse metabolisms, including the oxidation of organic matter, the production of carbon oxide and other hydrocarbons, and the removal of sulfates9,10,11. The structure and function of benthic microbial communities are influenced by various environmental factors12,13,14,15,16,17,18,19. Previous studies of benthic microbial communities in Arctic coastal areas of the Beaufort Sea, Greenland, and Svalbard have revealed a close relationship between these communities and environmental variables such as organic matter and salinity, which are largely linked to terrestrial impacts16,17,18,19. Changes in organic matter and salinity in coastal environments affect microbial communities and their metabolisms, which play an increasingly significant role in coastal biogeochemistry and carbon fluxes. It is therefore important to characterize microbial communities in the Arctic seas across a wide range of continental shelves. This characterization is crucial for understanding and predicting the responses and functional changes of microbial communities along the marine environmental gradient20.

The East Siberian Sea (ESS) is the widest of the Arctic Ocean shelf seas and the shallowest, with a mean depth of 52 m21. It is mostly underlain by subsea permafrost and is experiencing rapid warming22. With a vast coastline and significant sea-ice coverage, the ESS receives substantial terrestrial inflows from coastal erosion and river runoff, accounting for approximately 2.2 million tons and 1.9 million tons of organic carbon, respectively22. Additionally, emissions of dissolved organic matter from the subsea permafrost, which contain substantial amounts of organic carbon (ranging from 943 to 2,240 g C m−2 yr−1 at the continuous-discontinuous transition zone of subsea permafrost and from 10 to 55 g C m−2 yr−1 in the remaining shelf and slope sites) underscore the crucial role of the ESS in Arctic climate dynamics22. These processes also drive shifts in benthic microbial communities19. Changes in organic matter inputs have led to the dominance of specific bacterial groups, affecting the decomposition and remineralization of organic carbon23. Furthermore, alterations in the structure of microbial communities impact their function in terms of nutrient cycling processes24. However, studies on microbial structures and functions, which would provide important insights into microbial contributions to nutrient cycling in the ESS, have not been performed yet.

Here, we present 16S rRNA gene amplicon and shotgun metagenome sequencing datasets from surface sediments from 13 and 7 sites, respectively, in the ESS, covering latitudes from 73°N to 77°N (Fig. 1a and Tables 1–3). A schematic representation of the metagenomic analysis in this study is shown in Fig. 1b. Taxonomic classification by 16S rRNA amplicon sequencing revealed that the bacterial community was dominated by the phylum Pseudomonadota (51.1 ± 6.6%), followed by Bacteroidota (16.4 ± 9.5%), Planctomycetota (8.9 ± 3.8%), Acidobacteriota (6.5 ± 4.2%), and Actinomycetota (2.6 ± 2.0%) (Fig. 2a). In the archaeal community, Thaumarchaeota (70.9 ± 11.1%) and Euryarchaeota (27.7 ± 11.4%) were predominant (Fig. 2b). Some microbial taxa changed significantly with latitude: the proportions of Alphaproteobacteria and Acidobacteriota increased, while those of Bacteroidota, Deltaproteobacteria, and Thaumarchaeota decreased (Fig. 2a,b). At the amplicon sequence variant (ASV) level, bacterial and archaeal communities exhibited distinct patterns according to water depth (with a 100 m threshold), with significant analysis of similarities (ANOSIM) R values of 0.83 and 0.75 (p < 0.0001), respectively (Fig. 2c,d).

Fig. 1

Sampling location and sequencing approaches. (a) Geographic location of samples of marine sediments. The exclusive economic zone (EEZ) is labeled as a gray dashed line. (b) Schematic representation of 16S rRNA gene amplicon and shotgun metagenome sequencing analysis pipeline.

Table 1 Sample and sequencing information.
Table 2 16S rRNA amplicon sequencing dataset.
Table 3 Shotgun metagenome sequencing dataset.
Fig. 2

The relative abundance of microbial communities at the phylum level, with Pseudomonadota shown at the class level. (a) Bacterial community at the phylum and proteobacterial class level. ‘Others’ comprises 66 phyla, each with an average relative abundance of less than 0.5%; among these, Rhodothermota, Calditrichota, candidate division GN04, Candidatus phylum TM6, Spirochaetota, and Lentisphaerota each have an average relative abundance of more than 0.1%. (b) Archaeal community at the phylum level. (c) Non-metric multidimensional scaling (NMDS) analysis for the bacterial community at the amplicon sequence variant (ASV) level. (d) NMDS analysis for the archaeal community at the ASV level. n/a (not applicable) indicates communities that were not analyzed because too few sequences were obtained from the sample.

For metagenome sequencing, surface sediments at a depth of 1 centimeter below the seafloor (cmbsf) from 7 sites were used (Fig. 1; Table 1). Shotgun metagenome sequencing generated a total of 229.1 Gbp (31.9–34.0 Gbp per sample) and 1.51 billion paired-end reads, with an average of 216.8 million reads per sample (Table 3). After quality control to discard low-quality reads, 1.45 billion paired-end reads were retained, accounting for an average of 95.75% of raw reads. The reads from each sample were then assembled individually into contigs. Using the metaWRAP (v1.3.2) pipeline25, we reconstructed 211 metagenome-assembled genomes (MAGs); their quality metrics are summarized in Table S1 (see supplementary xlsx file). All the MAGs had >70% completeness and <10% contamination, and 61 were high-quality MAGs (>90% completeness and <5% contamination) (Fig. 3a). Among the high-quality MAGs, 28 showed >95% completeness and <5% contamination, and 3 showed >97% completeness and <1% contamination. Completeness and contamination were weakly negatively correlated (R = −0.17, p = 0.016; Fig. 3a). The genome size of the MAGs ranged from 1.11 to 5.93 Mbp, with an average of 2.67 Mbp, and the majority of the genomes fell within the 2 to 3 Mbp range (Fig. 3b; Table S2, see supplementary xlsx file). A total of 124 MAGs (58%) had an N50 greater than 10 Kbp, with a maximum of 131 Kbp (Fig. 3c). Genome size and N50 were positively correlated (R = 0.25, p = 0.00029; Fig. 3c). Half of the MAGs consisted of fewer than 300 contigs (Fig. 3d). The GC content of the MAGs ranged from 30.88% to 70.26%, with an average of 54.5% (Table S1, see supplementary xlsx file).
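For readers who want to recompute these summary statistics from the supplementary tables, the following is a minimal Python sketch, not the code used in this study. It assumes contig lengths and CheckM estimates are already available as plain numbers; the function names and the tier labels ("high", "retained", "rejected") are illustrative choices, while the thresholds are the ones quoted above.

```python
from typing import List


def n50(contig_lengths: List[int]) -> int:
    """Return the N50: the contig length L such that contigs of
    length >= L together cover at least half of the assembly."""
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    half_total = sum(contig_lengths) / 2
    running = 0
    # Walk from the longest contig down until half the bases are covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


def quality_tier(completeness: float, contamination: float) -> str:
    """Assign a MAG to a tier using the thresholds quoted in the text.

    The tier names are hypothetical labels for this sketch.
    """
    if completeness > 90 and contamination < 5:
        return "high"      # high-quality MAG
    if completeness > 70 and contamination < 10:
        return "retained"  # passes the bin-refinement cut-off
    return "rejected"


if __name__ == "__main__":
    print(n50([100, 200, 300, 400]))
    print(quality_tier(95.2, 1.3))
```

Applying quality_tier to the completeness and contamination columns of Table S1 should, under these assumptions, reproduce the split between high-quality and other retained MAGs.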

Fig. 3

Overview of 211 MAGs recovered from the East Siberian Sea. (a) The relationship between the completeness and contamination of MAGs. (b) The bar plot compares the genome sizes of MAGs. (c) The relationship between the genome size and N50 length of MAGs. (d) The bar plot compares the number of contigs of MAGs.

Using the Genome Taxonomy Database Toolkit (GTDB-tk, v2.1.1)26, the taxonomic classification of the MAGs identified 209 bacteria and 2 archaea (Fig. 4; Table S2, see supplementary xlsx file). The two archaeal MAGs belonged to the family Nitrosopumilaceae of the phylum Thermoproteota. Among the 209 bacterial MAGs, 15 phyla were identified, with the most abundant being Pseudomonadota (n = 82), Actinobacteriota (n = 38), and Desulfobacterota (n = 23) (Figs. 4, 5). Notably, 86% of the MAGs (n = 183; 2 archaea and 181 bacteria) could not be taxonomically assigned to any entry in the Genome Taxonomy Database (GTDB), suggesting that most of the recovered MAGs belong to unknown microbial taxa at different taxonomic levels, including 2 orders, 9 families, 51 genera, and 121 species (Table S2, see supplementary xlsx file). Based on an average nucleotide identity (ANI) > 95%27, 130 MAGs were classified into 48 species, with varying recovery across the sites, while the remaining 81 MAGs were each reconstructed from a single metagenome (Table S3, see supplementary xlsx file).

Fig. 4

Taxonomic classification of 211 MAGs recovered from the East Siberian Sea. (a) The Sankey diagram represents the classification of MAGs at different taxonomic ranks using GTDB-tk26. Unclassified MAGs are not shown. (b) The bar plot indicates the taxonomic novelty of the constructed MAGs. The number of classified taxa at each taxonomic level corresponds to the summed number of taxa shown in Fig. 4a.

Fig. 5

Phylogenetic tree of 209 bacterial MAGs constructed from 120 bacterial single-copy genes. The circle colors at the ends of the branches indicate the phylum of the corresponding MAG.

Our study unveiled the microbial communities and microbial genomes harbored in the sediments of the ESS. Microbial communities were clearly differentiated by water depth, which may be partially related to the impact of terrestrial input. In addition, the MAGs reconstructed from metagenome sequences revealed a high proportion of unknown genomes. These findings suggest that microbial communities in the ESS surface sediments are correlated with water depth and latitude, and that the benthic communities harbor a largely unexplored microbial diversity. To the best of our knowledge, this is the first report to recover microbial genomes from the surface sediments of the ESS. The presented datasets can be further used to understand the structure and function of microorganisms in the rapidly changing oceanic environment in the Arctic.

Methods

Sample preparation and sequencing of 16S rRNA gene amplicon and metagenome

Marine sediment samples were collected at 13 stations in the ESS outside Russia’s exclusive economic zone (EEZ) (Fig. 1; Table 1). Sampling was conducted using a box corer or multi corer in September of 2016 and 2019 during the ARA07C and ARA10C cruises of the Korean ice-breaker RV Araon. The total length of the cores ranged from 27 to 65 cmbsf. Upon recovery, core sediments were sliced into 1 cm sections (2–5 cm intervals for samples of ST16). The edges of each slice were removed, and a portion of each 1 cm slice was subsampled on board for 16S rRNA gene sequencing and metagenome analyses and stored at –80 °C until analysis.

Genomic DNA was extracted from core samples at depths of 1, 2, and 3 cmbsf at each station using the FastDNA spin kit for soil (MP Biomedicals, USA). DNA samples were submitted to the Integrated Microbiome Resource (IMR), Dalhousie University, Canada (http://cgeb-imr.ca) for PCR amplification, library preparation, and paired-end (2 × 300 bp) sequencing on the Illumina MiSeq system (Illumina, USA) (Table 2). Two primer sets were used independently to amplify bacterial and archaeal 16S rRNA genes. The primer pair 515 F (5′-GTGYCAGCMGCCGCGGTAA)/926 R (5′-CCGYCAATTYMTTTRAGTTT) was used to amplify bacterial 16S rRNA genes targeting the V4-V5 regions, and the primer pair 956 F (5′-TYAATYGGANTCAACRCC)/1401 R (5′-CRGTGWGTRCAAGGRGCA) was used to amplify archaeal 16S rRNA genes targeting the V6-V8 regions28,29. In total, 59 sequencing datasets were used for community analyses: 22 datasets obtained with the 956 F/1401 R primer set for archaeal community analysis and 37 datasets obtained with the 515 F/926 R primer set for bacterial community analysis (Tables 1, 2). Based on community similarity analysis of the 16S rRNA gene amplicon sequences from 13 sites at depths of 1–3 cmbsf, 7 sites (ST07, ST05, ST04, ST03, ST02, ST08, and ST20) were selected, and metagenome sequencing of sediments at 1 cmbsf was performed (Table 3). All metagenomic libraries were shotgun sequenced to generate 151 bp paired-end reads using the Illumina HiSeq X system (Illumina, USA) at Phyzen (Seongnam, Korea).

16S rRNA gene sequences processing

The adapter and primer sequences were removed using Cutadapt (v2.10)30, and the resultant sequences were processed using DADA2 (v0.9.5)31 to infer amplicon sequence variants (ASVs). Reads were quality-filtered with the options maxN = 0, maxEE = c(2,2), and truncQ = 2, and the low-quality tails of the forward and reverse reads were removed with truncLen = c(270,210). Denoising was performed after trimming based on the DADA2 error model. Sequences were dereplicated, and the core sample inference algorithm was applied to the dereplicated data. Paired reads were merged, and chimeric sequences were removed. After the ASV sequence table was constructed, taxonomy was assigned using the mothur package (v1.44.1)32. Taxonomic assignments of representative ASV sequences were determined against the EzBiocloud database by sequence similarity searches33. After taxonomic assignment, archaeal and unknown ASVs were removed for bacterial analysis, and bacterial and unknown ASVs were removed for archaeal analysis. Non-metric multidimensional scaling (NMDS) analysis was performed on Bray-Curtis dissimilarities calculated from the relative abundance matrix using the vegan package (v2.6-4) in R. An analysis of similarities (ANOSIM) was performed with 9,999 permutations using the same package.
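To make the two statistics concrete, the sketch below computes the Bray-Curtis dissimilarity between relative abundance profiles and the ANOSIM R statistic from the ranked dissimilarities. It is an illustrative pure-Python reimplementation under stated assumptions, not the vegan code used in this study: the function names and the toy profiles are hypothetical, and the permutation test that yields the p value is omitted.

```python
from itertools import combinations
from typing import List, Sequence


def bray_curtis(u: Sequence[float], v: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity: sum(|u_i - v_i|) / sum(u_i + v_i)."""
    if len(u) != len(v):
        raise ValueError("profiles must have the same number of taxa")
    denom = sum(u) + sum(v)
    if denom == 0:
        raise ValueError("both profiles are empty")
    return sum(abs(a - b) for a, b in zip(u, v)) / denom


def average_ranks(values: List[float]) -> List[float]:
    """Rank values ascending (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def anosim_r(profiles: Sequence[Sequence[float]], groups: Sequence[str]) -> float:
    """ANOSIM R = (mean between-group rank - mean within-group rank) / (M / 2),
    where M = n(n - 1) / 2 is the number of sample pairs."""
    pairs = list(combinations(range(len(profiles)), 2))
    ranks = average_ranks([bray_curtis(profiles[i], profiles[j]) for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    if not within or not between:
        raise ValueError("need both within-group and between-group pairs")
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    return (mean_between - mean_within) / (len(pairs) / 2)


if __name__ == "__main__":
    # Hypothetical relative abundances of three taxa at four sites.
    toy_profiles = [[0.6, 0.4, 0.0], [0.5, 0.5, 0.0],
                    [0.0, 0.1, 0.9], [0.0, 0.2, 0.8]]
    toy_groups = ["shallow", "shallow", "deep", "deep"]
    print(anosim_r(toy_profiles, toy_groups))
```

R approaches 1 when every between-group dissimilarity exceeds every within-group dissimilarity and lies near 0 when grouping carries no information. Significance is then obtained by recomputing R over random permutations of the group labels.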

Metagenomic assembly, binning, and refinement

Raw reads were quality-filtered using Sickle (v1.33) (https://github.com/najoshi/sickle) with the options -n -q 20 -l 60. The filtered reads from each sample were then assembled individually using MEGAHIT (v1.2.9)34 with the option --min-contig-len 500.
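As a hedged illustration of how these two steps could be scripted per sample, the Python sketch below only builds the command lines; it is not the script used in this study. The options -n, -q 20, -l 60, and --min-contig-len 500 come from the text. The file names, the output naming scheme, and the quality encoding passed to -t (sanger, i.e. Phred+33) are assumptions made for the example.

```python
from typing import List


def sickle_pe_command(fwd: str, rev: str, out_prefix: str,
                      quality_type: str = "sanger") -> List[str]:
    """Build a paired-end Sickle call with the options quoted in the text.

    quality_type is an assumption (Phred+33); the source does not state it.
    Output file names are hypothetical.
    """
    return [
        "sickle", "pe",
        "-f", fwd, "-r", rev,
        "-t", quality_type,
        "-o", f"{out_prefix}_R1.trimmed.fastq",
        "-p", f"{out_prefix}_R2.trimmed.fastq",
        "-s", f"{out_prefix}_singles.fastq",
        "-n",        # truncate reads at the first N
        "-q", "20",  # quality threshold
        "-l", "60",  # minimum read length after trimming
    ]


def megahit_command(fwd: str, rev: str, out_dir: str) -> List[str]:
    """Build a per-sample MEGAHIT assembly call keeping contigs >= 500 bp."""
    return [
        "megahit",
        "-1", fwd, "-2", rev,
        "-o", out_dir,
        "--min-contig-len", "500",
    ]


if __name__ == "__main__":
    # Hypothetical sample name; print the commands instead of running them.
    trim = sickle_pe_command("ST07_R1.fastq", "ST07_R2.fastq", "ST07")
    assemble = megahit_command(trim[trim.index("-o") + 1],
                               trim[trim.index("-p") + 1],
                               "ST07_megahit")
    print(" ".join(trim))
    print(" ".join(assemble))
```

Each list can be passed to subprocess.run(cmd, check=True) once both tools are installed. Building the commands as lists avoids shell quoting problems with unusual file names.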

Assembled contigs longer than 1,000 bp were binned to recover MAGs using metaWRAP (v1.3.2)25. Binning was based on tetranucleotide frequencies, GC content, and coverage, and used the binning tools MaxBin2 (v2.2.6)35, MetaBAT2 (v2.12.1)36, and CONCOCT (v1.0.0)37 integrated within the metaWRAP pipeline. Afterward, the “bin_refinement” module of metaWRAP was run to improve bin quality with the options -c 70 and -x 10 (completeness >70% and contamination <10%). The completeness and contamination of the bins were assessed using CheckM (v1.2.2)38 as part of the metaWRAP pipeline. The bins were then reassembled using the “reassemble_bins” module of metaWRAP with the options -c 70 and -x 10. To dereplicate the bin sets recovered from the seven individual assemblies, dRep (v3.5)39 was used with a 95% ANI threshold. Finally, a total of 211 MAGs were retained (Table S2, see supplementary xlsx file).

Taxonomic assignment and phylogenetic assessment

Taxonomic classification of MAGs was performed using the “classify_wf” module in the Genome Taxonomy Database toolkit (GTDB-tk, v2.1.1)26 with the Genome Taxonomy Database (GTDB, Release 207 v2).

The phylogenetic tree of the 209 bacterial MAGs was constructed from 120 bacterial single-copy marker genes obtained from the GTDB-tk analysis (Table S4, see supplementary xlsx file). Concatenated single-copy genes were aligned using MUSCLE (v3.8.31)40 with default options, and the evolutionary distance between the 209 MAGs was calculated using the maximum-likelihood method with 1,000 bootstrap replications in MEGA (v6.06)41. The final tree was visually annotated using the Interactive Tree of Life (iTOL, v6)42.

Data Records

The 16S rRNA gene sequencing data, metagenome sequencing data, and reconstructed MAGs generated in this study are publicly available at the European Nucleotide Archive (ENA) under the accession number PRJEB76672 (ref. 43). The 211 reconstructed MAGs have been deposited at the DDBJ/ENA/GenBank database under accession numbers listed in Table S2 (see supplementary xlsx file).

Technical Validation

All software and parameters used in this study are described in the Methods section. Adapter sequences were removed using Cutadapt, and low-quality reads were trimmed using Sickle. CheckM (v1.2.2)38 was used to assess the completeness and contamination of the constructed MAGs. To investigate the distribution of MAGs among the 7 samples, all-against-all comparisons were performed using OrthoANI (v.140)44.
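The grouping of MAGs into species-level clusters from all-against-all ANI values can be sketched as follows. This is a minimal Python illustration, not the procedure used in this study: it assumes single-linkage grouping (any pair with ANI > 95% is joined, implemented with a union-find structure), and the genome names and ANI values are hypothetical. With 130 MAGs falling into 48 shared species and 81 MAGs forming their own clusters, such a grouping would give 48 + 81 = 129 species in total.

```python
from typing import Dict, List, Tuple


def cluster_by_ani(genomes: List[str],
                   ani: Dict[Tuple[str, str], float],
                   threshold: float = 95.0) -> List[List[str]]:
    """Group genomes into clusters by single linkage on ANI > threshold.

    Single linkage is an assumption of this sketch.
    """
    parent = {g: g for g in genomes}

    def find(g: str) -> str:
        # Follow parent links to the cluster root, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), value in ani.items():
        if value > threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters: Dict[str, List[str]] = {}
    for g in genomes:
        clusters.setdefault(find(g), []).append(g)
    # Sort clusters by their first member for a stable output order.
    return sorted(clusters.values(), key=lambda members: members[0])


if __name__ == "__main__":
    # Hypothetical MAG names and pairwise ANI values (percent).
    mags = ["MAG_A", "MAG_B", "MAG_C", "MAG_D"]
    pairwise = {
        ("MAG_A", "MAG_B"): 98.7,
        ("MAG_A", "MAG_C"): 82.1,
        ("MAG_B", "MAG_C"): 81.5,
        ("MAG_C", "MAG_D"): 95.0,  # not strictly above the threshold
    }
    print(cluster_by_ani(mags, pairwise))
```

The comparison is strict, matching the "ANI > 95%" wording in the text, so a pair at exactly 95.0% stays in separate clusters.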

Usage Notes

This study provides 16S rRNA gene and shotgun metagenome sequencing datasets of surface sediments collected from the ESS, spanning latitudes from 73°N to 77°N. The comprehensive dataset from the ESS, where microbial communities have not been investigated, serves as a reference for comparing with other microbial communities and understanding their role in the rapidly changing Arctic.

Researchers should consider the sampling depth of surface sediments in this study. For 16S rRNA gene sequencing, sediments from depths of 1–3 cmbsf were sequenced, while for shotgun metagenome sequencing, only sediments from 1 cmbsf were used. Any interpretation of the data should take these depths into account. In addition, a total of 211 MAGs were categorized into 129 species based on all-against-all comparisons with an ANI >95%.

In the Methods section, we described the procedures for sampling, DNA extraction, library preparation, sequencing, and data processing and analysis used in this study. Detailed information about the samples is provided in Tables 1–3. Statistics for the constructed MAGs are listed in Tables S1, S2 (see supplementary xlsx file).