Background & Summary

The hadal zone is the deepest habitat of the ocean, comprising regions with water depths greater than 6000 meters. It accounts for approximately 1%–2% of the global benthic area but constitutes the deepest 45% of the vertical depth gradient1. Tectonically, the hadal zone lies in subduction zones, where V-shaped depressions form a unique topographic feature of the deep ocean2,3. The geophysical and geochemical features of the hadal zone are distinct from those of other deep-ocean habitats1,4. Topography, geographical isolation, and spatio-temporal variation in food supply, together with low temperature and extremely high hydrostatic pressure, create a unique habitat that accommodates a diverse and active microbial community4. With advances in deep-sea sampling technologies and high-throughput sequencing, knowledge of the hadal biosphere has improved greatly. Sediments of the hadal zone harbor microbial communities with high abundance and diverse metabolic functions, showing clear shifts in composition and assembly strategies from bathyal and abyssal sediments to deep hadal zones5,6,7,8,9. Heterotrophic microorganisms dominate the microbial communities in hadal sediments10,11. They are able to degrade various organic compounds, such as aromatic compounds, alkanes, and long-chain hydrocarbons, as revealed by previous metagenome sequencing-based analyses5,6,8,12. Further, growing evidence indicates that chemoautotrophic carbon fixation occurs within hadal trenches13,14. Despite the increasing number of studies of the hadal biosphere, the current microbiome data are not sufficient for a comprehensive investigation of microbial diversity and function in the sediments of hadal trenches. Therefore, knowledge of the diversity, composition, and function of microbial communities in hadal sediments remains limited.

The Yap Trench is located at the southern end of the Philippine Sea Plate, in a tectonic region of convergence among the Philippine Sea, Pacific and Caroline plates in the southwestern Pacific Ocean. Three trenches, namely the Yap Trench, the Mariana Trench and the Palau Trench, were created by tectonic plate collisions15. The Yap Trench lies between the Mariana Trench and the Palau Trench, extending about 700 km in length and 50 km in width from the trench axis to the island arc. The Yap Trench is much narrower than other arc-trench systems, forming a sharp “V” shape. It is divided into northern and southern sections, with the boundary between them marked at 8°26′N, based on its relation to the Caroline Ridge16,17. The geological, geophysical, and geochemical characteristics differ between the northern and southern sections16,17. For instance, the southern Yap Trench has a gentler trench slope and lower seismic intensity than the northern section18. Additionally, the concentration of organic matter in the sediment of the southern section is higher than that of the northern section19,20. These contrasting characteristics may shape different microbial communities in the sediments of the two sections of the Yap Trench, and studying them is important for broadening our understanding of microbial community functions in the hadal zone.

To better understand the diversity, composition and function of sediment microbial communities in the Yap Trench, and to compare the microbiomes of its northern and southern parts, we collected three push cores from different water depths, covering abyssal (Sites 1 and 2) and hadal trench (Site 3) regions in the northern and southern parts of the Yap Trench. A total of 35 metagenomes were obtained from the top to bottom layers of the three push cores (Fig. 1A and Supplementary Table S1). Through metagenome assembly and binning (Fig. 1B), we obtained 32 million non-redundant predicted genes and 404 metagenome-assembled genomes (MAGs) with completeness >50% and contamination <10% from the whole dataset. Of these MAGs, 142 were estimated to be >70% complete, accounting for 35% of the total (Supplementary Table S2). Based on the taxonomic classification and relative abundance of these MAGs, Alpha- and Gammaproteobacteria, Phycisphaerae, Nitrospiria and Dehalococcoidia were the dominant classes across all samples. Gammaproteobacteria and Acidimicrobiia were highly abundant in abyssal sediments, while Alphaproteobacteria and Dehalococcoidia were the dominant classes in hadal sediments (Fig. 2). The assembled contigs of all samples were pooled, and redundant genes were removed at a sequence similarity cutoff of 99% (Fig. 1B). After clustering, 3,976,582 non-redundant genes were retrieved from the datasets. We searched these non-redundant genes against the KEGG, Pfam, CAZy, and eggNOG databases to predict their functions, and 63% of them could be assigned to known genes in the databases (Fig. 3). The MAGs with more than 70% completeness and less than 10% contamination were used to construct a phylogenomic tree (Fig. 4). The MAGs spanned 26 phyla, indicating a highly diverse microbial community in Yap Trench sediments (Fig. 4). Among them, the largest numbers of MAGs were affiliated with Pseudomonadota (n = 50), Acidobacteriota (n = 17) and Chloroflexota (n = 8). The archaeal MAGs belonged to Thermoproteota and Nanoarchaeota (Supplementary Table S2). These datasets will enable further study of the diversity, composition and function of microbiota in the hadal trench, and highlight their critical roles in the hadal biosphere.

Fig. 1

Map of sampling sites and pipeline of metagenome analysis. (A) Sampling location. The red dots show the location of the sampling sites. (B) The pipeline of metagenomic analysis for sediment samples.

Fig. 2

Relative abundance and distribution of recovered MAGs at the three sites. The stacked bar plot is based on the relative abundance of MAGs obtained in this study.

Fig. 3

Functional characterization of the non-redundant gene catalog. Non-annotation indicates genes that were not annotated in any of the following databases: eggNOG, Pfam, KEGG, and CAZy. Filled-in cells in the lower panel indicate which databases are part of an intersection. The vertical bars in the top panel represent the number of annotated genes in each intersection, i.e. shared between the indicated databases. The horizontal bars in the left panel indicate the total number of annotated genes in each database.

Fig. 4

Phylogenetic tree of MAGs including bacteria and archaea. The tree was constructed from a concatenated alignment of 37 conserved single-copy proteins. The black points on the branches of the tree represent bootstrap values >0.7. The MAGs recovered in this study are labeled in red. Phyla are color-coded, and taxonomy is from the Genome Taxonomy Database (GTDB). The gray bars in the outer circle indicate the relative abundance of MAGs.

Methods

Sample collection

Three push cores were retrieved from the western trench slope of the Yap Trench during the R/V Xiangyanghong 10th cruise with the manned submersible Jiaolong (Fig. 1). On board, the sediment was subsampled with sterilized tools at 1-cm intervals, and at 2-cm intervals below 10 cm. Only the interior sections of the sediment were used for microbiological study to avoid potential contamination21. A total of 35 subsamples obtained from the 3 push cores were analyzed (Supplementary Table S1). Sediments for microbiological analyses were stored at −80 °C until further processing.
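The sectioning scheme above can be written down as a short sketch. The function name `section_intervals` and the 16-cm core length are hypothetical and chosen only for illustration; the actual core lengths are given in Supplementary Table S1. The sketch assumes that the 1-cm scheme applies from the sediment surface down to 10 cm.

```python
def section_intervals(core_length_cm: int) -> list[tuple[int, int]]:
    """Return (top, bottom) depth intervals in cm below the sediment surface.

    Assumed scheme: 1-cm slices above 10 cm depth, 2-cm slices below.
    """
    intervals = []
    top = 0
    while top < core_length_cm:
        step = 1 if top < 10 else 2
        # The last slice is truncated at the end of the core.
        bottom = min(top + step, core_length_cm)
        intervals.append((top, bottom))
        top = bottom
    return intervals


if __name__ == "__main__":
    # Hypothetical 16-cm core, for illustration only.
    for top, bottom in section_intervals(16):
        print(f"{top}-{bottom} cm")
```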

DNA extraction and sequencing

Total DNA was extracted from the sediments with the PowerSoil DNA Isolation Kit (Qiagen, Germany) according to the manufacturer’s instructions. The DNA was purified and concentrated with the Genomic DNA Clean & Concentrator kit (Zymo Research, USA). The DNA was then fragmented with a Covaris instrument (Covaris, USA), and 300–500 bp fragments were selected to construct libraries with the Illumina Nextera DNA library kit (Illumina, USA). The libraries were sequenced on an Illumina HiSeq X-Ten platform (Wuhan Onemore-tech Co., Ltd.).

Metagenome assembly and binning

Raw reads were trimmed using Trimmomatic v.0.39. The clean reads of each sample were assembled using MEGAHIT v1.2.9 with the parameters ‘--k-min 21 --k-max 144 --k-step 10’22. Contigs longer than 1000 bp were used for downstream analysis. The coverage of contigs was determined using BWA (v0.7.17; BWA-MEM algorithm)23. Binning was performed separately with the metaWRAP binning module (v1.3.2; parameters: --metabat2, --maxbin2, --concoct, -m 2000)24 and with VAMB25 using default parameters. The reconstructed MAGs were refined using the ‘bin_refinement’ module of MetaWRAP v1.312, and their quality and taxonomy were determined using CheckM2 v1.0.226 and GTDB-Tk v2.4.027 with the GTDB-Tk reference database (version 220), respectively. MAGs with completeness of more than 50% and contamination of less than 10% were used for downstream analysis. A total of 404 representative MAGs were obtained with dRep v3.5.028 based on an average nucleotide identity (ANI) cutoff of 95%. The coverage of each MAG was calculated using CoverM in genome mode (v0.6.1; https://github.com/wwood/CoverM; parameters: --min-read-percent-identity 0.95, --min-read-aligned-percent 0.75, --trim-min 0.10, --trim-max 0.90, -m relative_abundance).
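As a minimal sketch of the quality filter described above, the following Python snippet selects MAGs from a tab-separated quality table. The table contents, the bin names and the column names (`Name`, `Completeness`, `Contamination`) are assumptions made for illustration and do not reproduce the actual quality report of this study. The thresholds are applied as strict inequalities, following the wording ‘more than 50%’ and ‘less than 10%’.

```python
import csv
import io

# Hypothetical quality table; column names and values are illustrative assumptions.
EXAMPLE_REPORT = (
    "Name\tCompleteness\tContamination\n"
    "bin_001\t92.4\t1.3\n"
    "bin_002\t55.0\t9.8\n"
    "bin_003\t48.7\t0.5\n"
    "bin_004\t81.2\t12.0\n"
    "bin_005\t70.0\t2.0\n"
)


def select_mags(report_text, min_completeness, max_contamination):
    """Return names of MAGs with completeness > min and contamination < max."""
    reader = csv.DictReader(io.StringIO(report_text), delimiter="\t")
    kept = []
    for row in reader:
        completeness = float(row["Completeness"])
        contamination = float(row["Contamination"])
        if completeness > min_completeness and contamination < max_contamination:
            kept.append(row["Name"])
    return kept


# MAGs retained for downstream analysis (>50% complete, <10% contaminated).
downstream = select_mags(EXAMPLE_REPORT, 50.0, 10.0)
# Stricter subset of the kind used for the phylogenetic tree (>70% complete).
tree_set = select_mags(EXAMPLE_REPORT, 70.0, 10.0)
print(downstream, tree_set)
```

Because the comparison is strict, a bin with exactly 70.0% completeness is kept for downstream analysis but excluded from the stricter subset; whether boundary values were treated this way in the original analysis is not stated.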

Functional gene annotation and phylogenetic analysis

The open reading frames (ORFs) of the genomes and contigs were predicted using Prodigal v2.6.329 with the ‘-p meta’ parameter and then annotated against the Kyoto Encyclopedia of Genes and Genomes (KEGG) (version Jan. 1st, 2025) using KofamScan v1.3.030 with E-values ≤ 1e-20, and against TIGRFAM31 using hmmscan (v3.3.2)32. Peptidase- and proteinase-encoding genes were annotated against the MEROPS database 12.433 using DIAMOND blastp v0.9.1434 with thresholds of coverage >40% and E-value < 1e-20.
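The hit filter applied to the peptidase search can be sketched as follows. The text does not specify how coverage was defined, so this sketch assumes query coverage, i.e. the aligned part of the query divided by the query length. The `Hit` record, its field names and all example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One alignment hit; field names are illustrative, not a DIAMOND output format."""
    query: str
    subject: str
    qstart: int   # 1-based start of the alignment on the query
    qend: int     # 1-based end of the alignment on the query
    qlen: int     # full query length
    evalue: float


def query_coverage(hit: Hit) -> float:
    """Percentage of the query sequence covered by the alignment (assumed definition)."""
    return 100.0 * (hit.qend - hit.qstart + 1) / hit.qlen


def passes(hit: Hit, min_cov: float = 40.0, max_evalue: float = 1e-20) -> bool:
    """Keep hits with coverage > 40% and E-value < 1e-20, as stated in the text."""
    return query_coverage(hit) > min_cov and hit.evalue < max_evalue


# Hypothetical hits for illustration only.
hits = [
    Hit("gene_1", "subject_A", 1, 200, 250, 1e-50),   # 80% coverage, strong E-value
    Hit("gene_2", "subject_B", 10, 59, 300, 1e-30),   # about 16.7% coverage
    Hit("gene_3", "subject_C", 1, 150, 200, 1e-10),   # 75% coverage, weak E-value
]
kept = [h.query for h in hits if passes(h)]
print(kept)
```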

We used the 142 MAGs with completeness >70% and contamination <10% to construct the phylogenetic tree. A concatenated set of 37 conserved single-copy genes, identified with hidden Markov model profiles, was used for phylogenetic analysis with IQ-TREE (v2.2.0.3)35 under the best-fit model (Q.pfam + I + I + R9) with 1000 ultrafast bootstrap replicates. The tree file was edited using the online tool iTOL (https://itol.embl.de/).
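The concatenation step can be illustrated with a small sketch that joins per-marker alignments into one supermatrix. It is not the tool used in this study. Filling a missing marker with gap characters is an assumption, since the text does not state how incomplete marker sets were handled, and the MAG names and sequences below are invented.

```python
def concatenate_alignments(marker_alignments, genomes):
    """Concatenate per-marker alignments into one sequence per genome.

    marker_alignments: list of dicts mapping genome name -> aligned sequence;
    all sequences within one marker must have the same length.
    Genomes lacking a marker receive a run of gaps of that marker's width.
    """
    parts = {genome: [] for genome in genomes}
    for alignment in marker_alignments:
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one marker must have equal length")
        width = lengths.pop()
        for genome in genomes:
            parts[genome].append(alignment.get(genome, "-" * width))
    return {genome: "".join(chunks) for genome, chunks in parts.items()}


# Two toy marker alignments; the real analysis used 37 markers.
markers = [
    {"MAG_A": "MK-L", "MAG_B": "MKAL"},
    {"MAG_A": "GGS", "MAG_C": "GAS"},
]
concat = concatenate_alignments(markers, ["MAG_A", "MAG_B", "MAG_C"])
for name, seq in concat.items():
    print(name, seq)
```

Every genome ends up with a sequence of the same total length, which is what tree-building programs expect from a concatenated alignment.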

Data Records

The 35 raw metagenome sequences are available in the NCBI Sequence Read Archive (SRA) under BioProject number PRJNA131417336 and accession number SRP61789737. A total of 404 non-redundant MAGs from these metagenomes, in FASTA format, are available at the European Nucleotide Archive (ENA) under accession codes PRJEB10696838, PRJEB10696939 and PRJEB10691440. Detailed information on these qualified MAGs, including genomic quality, GTDB taxonomy, accession number and relative abundance, is given in Supplementary Table S2.

Technical Validation

To avoid contamination of the sediment samples, all sampling tools and containers were sterilized before sampling, and only the interior sections of the sediment cores were collected for DNA extraction. After collection, the sediment samples were stored at −80 °C until further processing. All DNA extraction and library construction steps were carried out in an ultra-clean lab. To ensure the quality of gene prediction, we selected assembled contigs longer than 1000 bp. To maximize the number of MAGs, contigs longer than 1000 bp and four different binning tools, namely CONCOCT, MetaBAT2, MaxBin2 and VAMB, were used in the binning process. The quality of MAGs was assessed with CheckM2. MAGs were considered high-quality when completeness was >50% and contamination was <10%. To increase the accuracy of the phylogenetic analysis, we used only MAGs with completeness >70% and contamination <10% to construct the phylogenetic tree.
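The contig length filter mentioned above can be sketched in plain Python. This is an illustrative stand-in rather than the script used in this study; the contig names and sequences are invented, and ‘larger than 1000 bp’ is interpreted as a strict inequality.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the identifier before the first whitespace.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def filter_contigs(handle, min_length=1000):
    """Return contigs strictly longer than min_length bp."""
    return [(name, seq) for name, seq in read_fasta(handle) if len(seq) > min_length]


# Hypothetical assembly with three contigs; contig_3 spans two sequence lines.
example = io.StringIO(
    ">contig_1\n" + "A" * 1500 + "\n"
    ">contig_2\n" + "C" * 1000 + "\n"
    ">contig_3\n" + "G" * 600 + "\n" + "T" * 600 + "\n"
)
kept = filter_contigs(example)
print([name for name, _ in kept])
```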

Usage Notes

The biosphere in hadal zone sediments remains enigmatic and only partially explored. This study provides comprehensive metagenomic data from sediments retrieved from different depths of the northern and southern Yap Trench, covering abyssal and trench sediments. The datasets contain 21 and 14 metagenomes from abyssal and trench sediments, respectively. All data were analyzed with a commonly used pipeline, generating high-quality bacterial and archaeal MAGs. The datasets can be used to explore the diversity and potential metabolic functions of microorganisms inhabiting hadal sediments and to compare them with the microbiomes of other hadal trenches.