35 metagenomic datasets from the northern and southern parts of the Yap trench sediments

Niu, Mingyang; Fu, Lulu; Yan, Qingyun; He, Zhili; Li, Dong; Zhen, Yu; Wang, Minxiao; Li, Chaolun

doi:10.1038/s41597-026-06812-4


Data Descriptor
Open access
Published: 11 February 2026


Scientific Data volume 13, Article number: 422 (2026)

Abstract

The hadal trench is the deepest part of the global ocean and harbors highly abundant microbial cells. However, the diversity and function of the majority of microbial communities in this part of the ocean are still unclear. Here, we collected 35 metagenomes from three push cores across different sites in both the northern and southern Yap trench to construct a comprehensive gene and genome dataset. A total of 32 million non-redundant genes were predicted from the whole metagenome datasets, with 63% assigned to known functional groups based on currently available databases. A total of 404 metagenome-assembled genomes (MAGs) with completeness >50% and contamination <10% were retrieved, and their taxonomy was highly diverse across 26 phyla. Alpha- and Gammaproteobacteria, Phycisphaerae, Nitrospiria, and Dehalococcoidia were dominant classes across all samples. The nonredundant gene and MAG datasets are valuable resources for advancing our understanding of the diversity, composition, and functions of microbiota in the sediment of the hadal trench.

Background & Summary

The hadal zone is the deepest habitat of the ocean, comprising regions with water depths >6000 m. It accounts for approximately 1%–2% of the global benthic area but constitutes the deepest 45% of the vertical depth gradient¹. Tectonically, the hadal zone lies in subduction zones, which create V-shaped depressions that form a unique topographic feature in the deep ocean^2,3. The geophysical and geochemical features of the hadal zone are distinct from those of other deep-ocean habitats^1,4. Topography, geographical isolation, and spatio-temporal variation in food supply, together with low temperature and extremely high hydrostatic pressure, create a unique habitat that accommodates a diverse and active microbial community⁴. With advances in deep-sea sampling technologies and high-throughput sequencing, knowledge of the hadal biosphere has improved greatly. Sediments of the hadal zone harbor abundant microbial communities with diverse metabolic functions, showing clear shifts in composition and assembly strategies from bathyal and abyssal sediments to deep hadal zones^5,6,7,8,9. Heterotrophic microorganisms dominate in hadal sediments^10,11. They are able to degrade various organic compounds, such as aromatic compounds, alkanes, and long-chain hydrocarbons, as revealed by previous metagenome sequencing-based analyses^5,6,8,12. Further, growing evidence indicates that chemoautotrophic carbon fixation occurs within hadal trenches^13,14. Despite the increasing number of studies of the hadal biosphere, the current microbiome data are not sufficient for a comprehensive investigation of microbial diversity and function in hadal trench sediments, and knowledge of the diversity, composition, and function of these communities remains limited.

The Yap Trench is located at the southern end of the Philippine Sea Plate, in a tectonic region of convergence among the Philippine Sea, Pacific and Caroline plates in the southwestern Pacific Ocean. Tectonic plate collisions created three different trenches here, namely the Yap Trench, the Mariana Trench and the Palau Trench¹⁵. The Yap Trench lies between the Mariana Trench and the Palau Trench, extending about 700 km in length and 50 km in width from the trench axis to the island arc. The Yap Trench is much narrower than other arc-trench systems, forming a sharp “V” shape. It is divided into northern and southern sections, with the boundary between them at 8°26′N, based on its relation to the Caroline Ridge^16,17. The geological, geophysical, and geochemical characteristics differ between the northern and southern sections^16,17. For instance, the southern Yap Trench has a gentler trench slope and lower seismic intensity than the northern section¹⁸. Additionally, the concentration of organic matter in the sediment of the southern section is higher than that of the northern section^19,20. These contrasting characteristics may shape different microbial communities in the sediments of the two sections of the Yap Trench, and comparing them is important for broadening our understanding of microbial community functions in the hadal zone.

To better understand the diversity, composition and function of sediment microbial communities in the Yap Trench, and to compare the microbiomes of its northern and southern parts, we collected three push cores from different water depths, covering abyssal (Sites 1 and 2) and hadal trench (Site 3) regions in the northern and southern parts of the Yap Trench. In total, 35 metagenomes were obtained from the top to bottom layers of the three push cores (Fig. 1A and Supplementary Table S1). Through metagenome assembly and binning (Fig. 1B), we obtained 32 million non-redundant predicted genes and 404 metagenome-assembled genomes (MAGs) with completeness >50% and contamination <10% from the whole dataset. Of these MAGs, 142 were estimated to be >70% complete, accounting for 35% of the total (Supplementary Table S2). Based on the taxonomic classification and relative abundance of these MAGs, Alpha- and Gammaproteobacteria, Phycisphaerae, Nitrospiria and Dehalococcoidia were the dominant classes across all samples. Gammaproteobacteria and Acidimicrobiia were highly abundant in abyssal sediment, while Alphaproteobacteria and Dehalococcoidia were the dominant classes in the hadal sediments (Fig. 2). Genes predicted from the assembled contigs of all samples were pooled, and redundant genes were removed at a sequence similarity cutoff of 99% (Fig. 1B). After clustering, 3,976,582 non-redundant genes were retrieved from the datasets. We searched these non-redundant genes against the KEGG, Pfam, CAZy, and eggNOG databases to predict their functions, and 63% of them could be assigned to known genes in the databases (Fig. 3). The MAGs with more than 70% completeness and less than 10% contamination were used to construct a phylogenomic tree (Fig. 4). The MAGs spanned 26 phyla, indicating a highly diverse microbial community in Yap Trench sediment (Fig. 4). Among them, the three phyla with the most MAGs were Pseudomonadota (n = 50), Acidobacteriota (n = 17) and Chloroflexota (n = 8). The archaeal MAGs belonged to Thermoproteota and Nanoarchaeota (Supplementary Table S2). These datasets will enable further study of the diversity, composition and function of microbiota in the hadal trench, and highlight their critical roles in the hadal biosphere.

Methods

Sample collection

Three push cores were retrieved from the western trench slope of the Yap Trench during the R/V Xiangyanghong 10th cruise with the manned submersible Jiaolong (Fig. 1). On board, the sediment was subsampled with sterilized tools at 1-cm intervals in the upper 10 cm and at 2-cm intervals below 10 cm. Only the interior sections of the sediment were used for microbiological study to avoid potential contamination²¹. A total of 35 subsamples from the 3 push cores were analyzed (Supplementary Table S1). Sediments for microbiological analyses were stored at −80 °C until further processing.

DNA extraction and sequencing

Total DNA was extracted from the sediments with the PowerSoil DNA Isolation Kit (Qiagen, Germany) according to the manufacturer’s instructions. The DNA was purified and concentrated with the Genomic DNA Clean & Concentrator kit (Zymo Research, USA). DNA was fragmented with a Covaris instrument (Covaris, USA), and 300–500 bp fragments were selected to construct libraries with the Illumina Nextera DNA library kit (Illumina, USA). Libraries were sequenced on the Illumina HiSeq X-Ten platform (Wuhan Onemore-tech Co., Ltd.).

Metagenome assembly and binning

Raw reads were trimmed using Trimmomatic v.0.39. The clean reads of each sample were assembled using MEGAHIT v1.2.9 with the parameters ‘--k-min 21 --k-max 144 --k-step 10’²². Contigs longer than 1000 bp were used for downstream analysis. Contig coverage was determined using BWA (v0.7.17; BWA-MEM algorithm)²³. Binning was performed with the metaWRAP binning module (v1.3.2; parameters: --metabat2, --maxbin2, --concoct, -m 2000)²⁴ and with VAMB²⁵ using default parameters. The reconstructed MAGs were refined using the ‘bin_refinement’ module of metaWRAP v1.3.2, and their quality and taxonomy were determined using CheckM2 v1.0.2²⁶ and GTDB-Tk v2.4.0²⁷ with the GTDB reference database (release 220), respectively. MAGs with completeness of more than 50% and contamination of less than 10% were used for downstream analysis. Dereplication with dRep v3.5.0²⁸ at an average nucleotide identity (ANI) cutoff of 95% yielded 404 representative MAGs. The coverage of each MAG was calculated using CoverM in genome mode (v0.6.1; https://github.com/wwood/CoverM; parameters: --min-read-percent-identity 0.95, --min-read-aligned-percent 0.75, --trim-min 0.10, --trim-max 0.90, -m relative_abundance).
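The quality cutoff described above (completeness >50%, contamination <10%) can be sketched as a simple table filter. This is a minimal illustration, not the authors' script: the column names `Name`, `Completeness` and `Contamination` are assumed to follow a CheckM2-style tab-separated quality report, and the bin names and values are hypothetical.

```python
import csv
import io

# Thresholds as stated in the Methods (percent).
MIN_COMPLETENESS = 50.0
MAX_CONTAMINATION = 10.0


def filter_mags(report_tsv, min_completeness=MIN_COMPLETENESS,
                max_contamination=MAX_CONTAMINATION):
    """Return the names of MAGs passing both quality thresholds.

    report_tsv is the text of a tab-separated report whose header
    contains Name, Completeness and Contamination (assumed layout).
    """
    reader = csv.DictReader(io.StringIO(report_tsv), delimiter="\t")
    kept = []
    for row in reader:
        completeness = float(row["Completeness"])
        contamination = float(row["Contamination"])
        if completeness > min_completeness and contamination < max_contamination:
            kept.append(row["Name"])
    return kept


# Hypothetical rows for illustration only.
example_report = (
    "Name\tCompleteness\tContamination\n"
    "bin.1\t92.4\t1.8\n"
    "bin.2\t48.9\t0.5\n"
    "bin.3\t71.0\t12.3\n"
    "bin.4\t55.2\t9.9\n"
)

print(filter_mags(example_report))
```

Calling the same function with `min_completeness=70.0` would select the stricter subset used for the phylogenetic tree.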

Functional gene annotation and phylogenetic analysis

The open reading frames (ORFs) of the genomes and contigs were predicted using Prodigal v2.6.3²⁹ with the ‘-p meta’ parameter. They were then annotated against the Kyoto Encyclopedia of Genes and Genomes (KEGG) (version Jan. 1st, 2025) using KofamScan v1.3.0³⁰ with E-values ≤ 1e-20, and against TIGRFAMs³¹ using hmmscan (v3.3.2)³². Genes encoding peptidases and proteinases were annotated against the MEROPS database 12.4³³ using Diamond blastp v0.9.14³⁴ with thresholds of coverage >40% and E-value < 1e-20.
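As a rough sketch of how the MEROPS thresholds above (coverage >40%, E-value < 1e-20) could be applied to tabular alignment output, the function below filters hits and keeps one per query. Several points are assumptions rather than details from the text: the five-column layout (query, subject, E-value, bit score, query coverage), the use of query coverage rather than subject coverage, the choice to keep the highest-scoring hit, and the identifiers in the example.

```python
def passes(evalue, coverage_pct, max_evalue=1e-20, min_coverage=40.0):
    """Apply the stated thresholds: E-value < 1e-20 and coverage > 40%."""
    return evalue < max_evalue and coverage_pct > min_coverage


def best_hits(lines):
    """Return {query: subject} for the highest-bit-score passing hit.

    Each line is assumed to hold five tab-separated fields:
    query id, subject id, E-value, bit score, query coverage (%).
    """
    best = {}
    for line in lines:
        if not line.strip():
            continue
        qseqid, sseqid, evalue, bitscore, qcov = line.rstrip("\n").split("\t")
        evalue, bitscore, qcov = float(evalue), float(bitscore), float(qcov)
        if not passes(evalue, qcov):
            continue
        if qseqid not in best or bitscore > best[qseqid][1]:
            best[qseqid] = (sseqid, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


# Hypothetical hits for illustration only.
example_hits = [
    "gene_1\tMER0001\t1e-50\t200.0\t85.0",
    "gene_1\tMER0002\t1e-80\t310.0\t90.0",
    "gene_2\tMER0003\t1e-10\t60.0\t95.0",   # fails the E-value threshold
    "gene_3\tMER0004\t1e-30\t120.0\t35.0",  # fails the coverage threshold
]

print(best_hits(example_hits))
```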

We used the 142 MAGs with completeness >70% and contamination <10% to construct the phylogenetic tree. A concatenated set of 37 conserved single-copy genes, identified with hidden Markov model profiles, was used for phylogenetic analysis with IQ-TREE (v2.2.0.3)³⁵ under the best-fit model (Q.pfam + I + I + R9) with 1000 ultrafast bootstrap replicates. The tree file was edited using the online tool iTOL (https://itol.embl.de/).
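The text does not say how the marker-gene alignments were concatenated, so the following is only a generic sketch of the idea: per-marker alignments are joined end to end for each genome, and a genome lacking a marker receives a gap block of that marker's width. The genome names and sequences are hypothetical.

```python
def concatenate_alignments(marker_alignments, genomes):
    """Join per-marker alignments into one sequence per genome.

    marker_alignments: list of dicts mapping genome name -> aligned
    sequence; all sequences within one marker must share one length.
    genomes: ordered list of genome names to include in the output.
    """
    parts = {genome: [] for genome in genomes}
    for alignment in marker_alignments:
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within a marker must have equal length")
        width = lengths.pop()
        for genome in genomes:
            # Genomes missing this marker are padded with gap characters.
            parts[genome].append(alignment.get(genome, "-" * width))
    return {genome: "".join(chunks) for genome, chunks in parts.items()}


# Two hypothetical marker alignments across three hypothetical MAGs.
markers = [
    {"MAG_A": "MK-L", "MAG_B": "MKAL"},
    {"MAG_A": "GG", "MAG_C": "GA"},
]
supermatrix = concatenate_alignments(markers, ["MAG_A", "MAG_B", "MAG_C"])
for name, sequence in supermatrix.items():
    print(name, sequence)
```

Padding with gaps keeps every row the same length, which tree-building tools require of a concatenated alignment.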

Data Records

The 35 raw metagenome sequences are available in the NCBI Sequence Read Archive (SRA) under BioProject number PRJNA1314173³⁶ and accession number SRP617897³⁷. A total of 404 non-redundant MAGs from these metagenomes are available in FASTA format at the European Nucleotide Archive (ENA) under accession codes PRJEB106968³⁸, PRJEB106969³⁹ and PRJEB106914⁴⁰. Detailed information on these MAGs, including genomic quality, GTDB taxonomy, accession number and relative abundance, is given in Supplementary Table S2.
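For convenience, the record pages for these accessions can be assembled from the URL patterns that appear in the reference list. The helper names below are hypothetical; only the accessions and the two URL patterns come from the article.

```python
# Accessions as given in the Data Records section.
BIOPROJECT = "PRJNA1314173"
SRA_STUDY = "SRP617897"
ENA_PROJECTS = ["PRJEB106968", "PRJEB106969", "PRJEB106914"]


def ena_url(accession):
    """ENA browser page for a project accession."""
    return f"https://www.ebi.ac.uk/ena/browser/view/{accession}"


def identifiers_url(namespace, accession):
    """identifiers.org link, e.g. namespace 'ncbi/bioproject'."""
    return f"https://identifiers.org/{namespace}:{accession}"


record_urls = [
    identifiers_url("ncbi/bioproject", BIOPROJECT),
    identifiers_url("ncbi/insdc.sra", SRA_STUDY),
] + [ena_url(accession) for accession in ENA_PROJECTS]

for url in record_urls:
    print(url)
```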

Technical Validation

To avoid contamination of sediment samples, all sampling tools and containers were sterilized before sampling, and only the interior sections of the sediment core were collected for DNA extraction. After collection, the sediment samples were stored at −80 °C until further processing. All DNA extraction and library construction steps were carried out in an ultra-clean lab. To ensure the quality of gene prediction, we selected assembled contigs longer than 1000 bp. To maximize the number of MAGs, contigs longer than 1000 bp and four different binning tools (CONCOCT, MetaBAT2, MaxBin2 and VAMB) were used in the binning process. The quality of MAGs was assessed with CheckM2. MAGs were considered high quality when completeness was >50% and contamination was <10%. To increase the accuracy of phylogenetic analysis, we used only MAGs with completeness >70% and contamination <10% to construct the phylogenetic tree.
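The contig length cutoff mentioned above can be illustrated with a small FASTA filter. This is a sketch under stated assumptions, not the authors' code: the parser is a minimal hand-written one, "larger than 1000 bp" is read as strictly greater than 1000, and the contig names and sequences are hypothetical.

```python
def read_fasta(text):
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-delimited token as the contig name.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def filter_contigs(text, min_length=1000):
    """Keep contigs strictly longer than min_length base pairs."""
    return [(name, seq) for name, seq in read_fasta(text) if len(seq) > min_length]


# Hypothetical contigs of 1500, 1000 and 400 bp.
example_fasta = (
    f">contig_1 sample=S1\n{'A' * 1500}\n"
    f">contig_2\n{'C' * 600}\n{'G' * 400}\n"
    f">contig_3\n{'T' * 400}\n"
)

kept = filter_contigs(example_fasta)
print([name for name, _ in kept])
```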

Usage Notes

The biosphere in hadal zone sediments remains enigmatic and only partially explored. This study provides comprehensive metagenomic data from sediments retrieved from different depths of the northern and southern Yap Trench, covering abyssal and trench sediments. The datasets contain 21 and 14 metagenomes from abyssal and trench sediments, respectively. All data were analyzed with a commonly used pipeline, generating bacterial and archaeal high-quality MAGs. The datasets can be used to explore the diversity and potential metabolic functions of microorganisms inhabiting hadal sediment and to compare them with the microbiomes of other hadal trenches.

Data availability

Metagenome sequences are deposited in the NCBI Sequence Read Archive (SRA) under BioProject number PRJNA1314173³⁶ and accession number SRP617897³⁷. The 404 high-quality non-redundant MAGs retrieved from the metagenomes of the three cores are deposited in the ENA database under accession codes PRJEB106968³⁸, PRJEB106969³⁹ and PRJEB106914⁴⁰, respectively. The metadata of the MAGs and custom code are available on Figshare⁴¹.

Code availability

The custom scripts to generate the datasets are publicly available on Figshare⁴¹. The parameters and versions of all bioinformatics tools used for the metagenomic analysis are described in the Methods section.

References

1. Jamieson, A. J., Fujii, T., Mayor, D. J., Solan, M. & Priede, I. G. Hadal trenches: the ecology of the deepest places on Earth. Trends in Ecology & Evolution 25(3), 190–7 (2010).
2. Stewart, H. A. & Jamieson, A. J. Habitat heterogeneity of hadal trenches: Considerations and implications for future studies. Progress in Oceanography 161, 47–65 (2018).
3. Stern, R. J. Subduction zones. Reviews of Geophysics 40(4), 3-1–3-38 (2002).
4. Du, M. et al. Geology, environment, and life in the deepest part of the world’s oceans. The Innovation 2(2), 100109 (2021).
5. Zhou, Y.-L., Mara, P., Cui, G.-J., Edgcomb, V. P. & Wang, Y. Microbiomes in the Challenger Deep slope and bottom-axis sediments. Nature Communications 13(1), 1515 (2022).
6. Liu, R. et al. Novel Chloroflexi genomes from the deepest ocean reveal metabolic strategies for the adaptation to deep-sea habitats. Microbiome 10(1), 75 (2022).
7. Li, Y., Cao, W., Wang, Y. & Ma, Q. Microbial diversity in the sediments of the southern Mariana Trench. Journal of Oceanology and Limnology 37(3), 1024–9 (2019).
8. Xiao, X. et al. Microbial ecosystems and ecological driving forces in the deepest ocean sediments. Cell 188(5), 1363–77.e9 (2025).
9. Fu, L. et al. Characteristics of the archaeal and bacterial communities in core sediments from Southern Yap Trench via in situ sampling by the manned submersible Jiaolong. Science of The Total Environment 703, 134884 (2020).
10. Peoples, L. M. et al. Microbial Community Diversity Within Sediments from Two Geographically Separated Hadal Trenches. Front Microbiol 10, 2019 (2019).
11. Hiraoka, S. et al. Microbial community and geochemical analyses of trans-trench sediments for understanding the roles of hadal environments. ISME J 14(3), 740–56 (2020).
12. Wang, Y. et al. Genomics insights into ecotype formation of ammonia-oxidizing archaea in the deep ocean. Environ Microbiol 21(2), 716–29 (2019).
13. Wenzhöfer, F. et al. Benthic carbon mineralization in hadal trenches: Assessment by in situ O2 microprofile measurements. Deep Sea Research Part I: Oceanographic Research Papers 116, 276–86 (2016).
14. Luo, M., Gieskes, J., Chen, L., Shi, X. & Chen, D. Provenances, distribution, and accumulation of organic matter in the southern Mariana Trench rim and slope: Implication for carbon cycle and burial in hadal trenches. Marine Geology 386, 98–106 (2017).
15. Yang, Y. et al. Geology of the Yap Trench: new observations from a transect near 10°N from manned submersible Jiaolong. International Geology Review 60(16), 1941–53 (2018).
16. Xia, C.-L. et al. Geological and geophysical differences between the north and south sections of the Yap trench-arc system and their relationship with Caroline Ridge subduction. Geological Journal 55(12), 7775–89 (2020).
17. Fujiwara, T. et al. Morphology and tectonics of the Yap Trench. Marine Geophysical Researches 21(1), 69–86 (2000).
18. Jamieson, A. The Hadal Zone: Life in the Deepest Oceans. Cambridge: Cambridge University Press (2015).
19. Li, D. et al. Spatial heterogeneity of organic carbon cycling in sediments of the northern Yap Trench: Implications for organic carbon burial. Mar Chem 223, 103813 (2020).
20. Li, D. et al. Comparison of sedimentary organic carbon loading in the Yap Trench and other marine environments. Journal of Oceanology and Limnology 38(3), 619–33 (2020).
21. Lever, M. A. et al. Life under extreme energy limitation: a synthesis of laboratory- and field-based investigations. FEMS Microbiology Reviews (2015).
22. Li, D., Liu, C.-M., Luo, R., Sadakane, K. & Lam, T.-W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31(10), 1674–6 (2015).
23. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nature Methods 9, 357 (2012).
24. Uritskiy, G. V., DiRuggiero, J. & Taylor, J. MetaWRAP—a flexible pipeline for genome-resolved metagenomic data analysis. Microbiome 6(1), 158 (2018).
25. Nissen, J. N. et al. Improved metagenome binning and assembly using deep variational autoencoders. Nature Biotechnology 39(5), 555–60 (2021).
26. Chklovski, A., Parks, D. H., Woodcroft, B. J. & Tyson, G. W. CheckM2: a rapid, scalable and accurate tool for assessing microbial genome quality using machine learning. Nature Methods 20(8), 1203–12 (2023).
27. Chaumeil, P.-A., Mussig, A. J., Hugenholtz, P. & Parks, D. H. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics 36(6), 1925–7 (2019).
28. Olm, M. R., Brown, C. T., Brooks, B. & Banfield, J. F. dRep: a tool for fast and accurate genomic comparisons that enables improved genome recovery from metagenomes through de-replication. ISME J 11(12), 2864–8 (2017).
29. Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11(1), 119 (2010).
30. Aramaki, T. et al. KofamKOALA: KEGG Ortholog assignment based on profile HMM and adaptive score threshold. Bioinformatics 36(7), 2251–2 (2019).
31. Haft, D. H. et al. TIGRFAMs and Genome Properties in 2013. Nucleic Acids Research 41(D1), D387–D95 (2012).
32. Eddy, S. R. Accelerated Profile HMM Searches. PLOS Computational Biology 7(10), e1002195 (2011).
33. Rawlings, N. D. & Bateman, A. How to use the database and website to help understand peptidase specificity. Protein Science 30(1), 83–92 (2021).
34. Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nature Methods 12, 59 (2014).
35. Minh, B. Q. et al. IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. Mol Biol Evol 37(5), 1530–4 (2020).
36. NCBI BioProject https://identifiers.org/ncbi/bioproject:PRJNA1314173 (2025).
37. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP617897 (2025).
38. European Nucleotide Archive https://www.ebi.ac.uk/ena/browser/view/PRJEB106968 (2026).
39. European Nucleotide Archive https://www.ebi.ac.uk/ena/browser/view/PRJEB106969 (2026).
40. European Nucleotide Archive https://www.ebi.ac.uk/ena/browser/view/PRJEB106914 (2026).
41. Mingyang, N. et al. 35 metagenomic datasets from the northern and southern slope of Yap trench sediments. Figshare https://doi.org/10.6084/m9.figshare.29328314 (2025).


Acknowledgements

This work was supported by the National Natural Science Foundation of China (42030407), the National Key Basic Research and Development Project of China (2015CB755904), the National Natural Science Foundation of China (42006083, 41906124), and the Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai) (SML2023SP220, Dong, SML2024SP002).

Author information

Authors and Affiliations

Marine Synthetic Ecology Research Center, Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Zhuhai, 519082, China
Mingyang Niu, Qingyun Yan & Zhili He
Center of Deep Sea Research & Key Laboratory of Marine Geology and Environment, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China
Lulu Fu, Minxiao Wang & Chaolun Li
Key Laboratory of Marine Ecosystem Dynamics, Second Institute of Oceanography, Ministry of Natural Resources, Hangzhou, 310012, China
Dong Li
College of Environmental Science and Engineering, Ocean University of China, Qingdao, 266100, China
Yu Zhen
State Key Laboratory of Tropical Oceanography, South China Sea Institute of Oceanology, Chinese Academy of Sciences, Guangzhou, 510301, China
Chaolun Li
Guangdong Provincial Key Laboratory of Applied Marine Biology, South China Sea Institute of Oceanology, Chinese Academy of Sciences, Guangzhou, 510301, China
Chaolun Li
University of Chinese Academy of Sciences, Beijing, 10049, China
Chaolun Li


Contributions

M.N. and L.F. designed this study, performed data analysis, interpreted the data and wrote the manuscript. D.L. collected samples. D.L., Q.Y., M.W., C.L. and Z.H. edited the manuscript.

Corresponding authors

Correspondence to Lulu Fu or Dong Li.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Table (XLSX)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Niu, M., Fu, L., Yan, Q. et al. 35 metagenomic datasets from the northern and southern parts of the Yap trench sediments. Sci Data 13, 422 (2026). https://doi.org/10.1038/s41597-026-06812-4


Received: 18 June 2025
Accepted: 03 February 2026
Published: 11 February 2026
Version of record: 20 March 2026
DOI: https://doi.org/10.1038/s41597-026-06812-4