Recovery of 679 metagenome-assembled genomes from different soil depths along a precipitation gradient

Kazarina, Anna; Wiechman, Hallie; Sarkar, Soumyadev; Richie, Tanner; Lee, Sonny T. M.

doi:10.1038/s41597-025-04884-2

Download PDF

Data Descriptor
Open access
Published: 28 March 2025

Recovery of 679 metagenome-assembled genomes from different soil depths along a precipitation gradient

Scientific Data volume 12, Article number: 521 (2025) Cite this article

5005 Accesses
4 Citations
11 Altmetric
Metrics details

Subjects

Abstract

Soil contains a diverse community of organisms; these can include archaea, fungi, viruses, and bacteria. In situ identification of soil microorganisms is challenging. The use of genome-centric metagenomics enables the assembly and identification of microbial populations, allowing the categorization and exploration of potential functions living in the complex soil environment. However, the heterogeneity of the soil-inhabiting microbes poses a tremendous challenge, with their functions left unknown, and difficult to culture in lab settings. In this study, using genome assembling strategies from both field core samples and enriched monolith samples, we assembled 679 highly complete metagenome-assembled genomes (MAGs). The ability to identify these MAGs from samples across a precipitation gradient in the state of Kansas (USA) provided insights into the impact of precipitation levels on soil microbial populations. Metabolite modeling of the MAGs revealed that more than 80% of the microbial populations possessed carbohydrate-active enzymes, capable of breaking down chitin and starch.

Depth-dependent Metagenome-Assembled Genomes of Agricultural Soils under Managed Aquifer Recharge

Article Open access 24 May 2025

A genomic catalogue of soil microbiomes boosts mining of biodiversity and genetic resources

Article Open access 11 November 2023

Soil microbial responses to multiple global change factors as assessed by metagenomics

Article Open access 31 May 2025

Background & Summary

The soil environment is host to many microbes with diverse functions^1,2, but the high heterogeneity of the soil-associated microbial community results in challenging microbial culturing conditions in lab settings^3,4. For example, some bacterial soil taxa have a slow growth rate or complex nutrient demands including the dependence on the “cross-feeding” in the natural soil environment^5,6. These characteristics not only complicate the establishment of optimal growth conditions, but also require precise environmental mimicking to achieve successful cultivation. The difficulty of some microbes to be cultured has propelled the field of metagenomics to grow both in use and popularity^7,8. Metagenomics has been used to categorize the complex microbes; their identity and functional potential, living in the environment^9,10.

Metagenome-assembled genomes (MAGs) are assembled from the short fragments of DNA, and potentially increase the identity resolution of the microbial populations in a community^11,12. The use of genome resolved metagenomes has enabled ground-breaking work in exploring potential functions across various communities like soil¹³, human and animal guts¹⁴, and many more^15,16. Soil-associated MAGs are challenging to assemble in vivo^17,18. One of the reasons is because of the complexity and biodiversity within the soil community, and the low presence of the soil microorganisms in the reference databases^19,20. The soil environment contains many microorganisms like viruses, fungi, archaea, and bacteria, adding to the challenge of assembling microbial genomes^19,21. Similarly, some soil-associated microbes are low in abundance, and getting enough coverage, coupled with limited information on these microbes^22,23, makes assembling these microbial populations extremely hard to achieve^17,24.

Understanding the distribution of bacterial populations across precipitation gradients is crucial for understanding how environmental factors such as moisture influence microbial diversity, function and overall ecosystem health. Microbes play essential roles in nutrient cycling²⁵, organic matter decomposition^2,26, and plant health²⁷. All of which can be affected significantly with changes in precipitation^28,29. With the ability to assemble genomes, and analyze them across different precipitation levels, thus, enabling the identification of adaptive traits and community shifts that respond to water availability, and therefore, provide insights into how ecosystems may respond to global climate change. This is vital for predicting and managing the impacts of shifting precipitation patterns on soil health, agricultural productivity, conservation efforts, and overall ecosystem stability^13,30.

In this study, we assembled 679 metagenome-assembled genomes (MAGs) using a co-assembly strategy of 18 field soil core samples and 36 enriched monolith samples from two locations in Kansas, USA. The two locations were Hays, Kansas (Western Kansas; WKS), which has less yearly precipitation compared to Ottawa, Kansas (Eastern Kansas; EKS) (Fig. 1). One of the field core samples did not meet the minimum sequencing requirements, resulting in a total of 53 samples for this study. While co-assembly strategy has its limitations, we used this method to recover MAGs because (1) it resulted in higher read depth to enhance more robust assembly³¹, capturing the diversity across the system³²; (2) it facilitates our comparison of the MAGs across the samples^17,33; and (3) it used differential coverage across the samples to substantially improve our ability of recovering genomes from the metagenomes^31,34. The monolith samples enriched the respective field soil cores which harbored a long history of land precipitation. This allowed us to successfully assemble soil-associated MAGs. We assigned the assembled microbial genomes as MAGs if they had a completion ≥70% and a redundancy <10%. Of all the assembled genomes, 5 MAGs had a completion of 100%; two of the five MAGs had a redundancy of 5.6%, with the rest having a redundancy of 1.4%, 1.4%, and 2.8% respectively (Supplementary Table S1). Of the ~ 42 ± 21 million metagenomic reads, an average of ~ 39 ± 19 million reads per sample passed quality control criteria, and were used for the 4 co-assemblies. The average total length of the MAGs was ~ 3 ± 1.4 Mbp, with the average number of contigs being ~ 464 ± 360. The average N50 was ~ 15991 ± 20615, and the average GC content was ~ 63 ± 8% (Supplementary Table S1).

In this study, we resolved a large number of high-quality MAGs to the phyla Acidobacteriota (n = 96), Actinobacteriota (n = 271), and Proteobacteria (n = 105). The other MAGs were resolved to Bacteroidota (n = 1), Chloroflexota (n = 16), CSP1-3 (n = 1), Desulfobacterota (n = 10), Dormibacterota (n = 3), Eremiobacterota (n = 1), Firmicutes (n = 1), Gemmatimonadota (n = 17), Halobacteriota (n = 1), Methylomirabilota (n = 31), Myxococcota (n = 5), Nitrospirota (n = 7), Planctomycetota (n = 1), Tectomicrobia (n = 2), Thermoproteota (n = 42), Verrucomicrobiota (n = 36) (Fig. 2). There were 32 MAGs which were unidentified. There were 457 MAGs annotated to the genus level, with the highest number of MAGs annotated to the genera AV55 (n = 28) and Methyloceanibacter (n = 17). 395 MAGs were completed to the species level.

We observed that MAGs were more highly detected in the monolith as compared to the field samples, which highlighted the enrichment of the field samples in the monoliths (Fig. 3). Our results also highlighted the influence of locations and depth of sampling on the detection of the MAGs. Most of the MAGs were more detected in Eastern Kansas where the precipitation level is higher than the locations in Western Kansas. Furthermore, we also observed higher detection of the MAGs in the shallow region (5 cm) as compared to the deeper regions (15 cm and 30 cm) (Fig. 3; Supplementary Table S2).

We used Distilled and Refined Annotation of Metabolism (DRAM) to profile and compare the metabolic potentials of the MAGs. We observed that around 80% of the MAGs had enzymes capable of breaking down chitin, and ~ 75% of the MAGs contain enzymes that can metabolize starch (Fig. 4, Supplementary Table S3). We also observed that 17 MAGs possessed beta-galactan cleaving enzymes, while none of the MAGs could potentially degrade mucin. It was surprising to see that only ~25% of the MAGs contained enzymes necessary to convert nitrite to nitric oxide, while only 4 of 679 MAGs could convert nitrogen to ammonia. Based on the metabolic potential profiles, it appeared that the MAGs were more likely to be involved in denitrification rather than the nitrification process. Putting it all together, we showed that precipitation had a strong impact on the MAGs’ detection as well as their metabolic potentials, across the two locations.

Our findings provided insights into further understanding of how the soil microbial community, their essential functions and metabolism^35,36 would be affected due to precipitation change and land use regime^37,38. The availability of these high-resolution MAGs from this study would also provide the framework for exploring microbe-microbe interactions and microbial functional shifts under abiotic stresses^39,40.

Methods

Experimental design and sample preparation

We used the Giddings probe (Giddings Machine Company, Windsor, CO, USA) to collect field soil samples (n = 3; depth 60 cm, diameter 5 cm) in June 2018 from 2 sites across the precipitation gradient (Fig. 1). The sites in Western (WKS) experience an average of 533 mm/year precipitation, and Eastern (EKS) Kansas average rainfall of 1045 mm/year⁴¹. Within the 3 plots at each precipitation site, we collected 1 soil core from 3 subplots. Soil collecting plots represented the land-use legacies: WKS (native/N: 38.84°N, 99.30°W, agricultural/Ag: 38.84°N, 99.31°W, and post-agricultural/PAg: 38.84°N, 99.32°W), and EKS (native/N: 38.18°N, 95.27°W, agricultural/Ag: 38.54°N, 95.25°W, and post-agricultural/PAg: 38.18°N, 95.27°W). Since we were interested in the representative samples of the land use rather than differences within plots, we then pooled and homogenized the soil cores to create 1 composite sample per plot. In this report, we were also interested in the gene distribution across the precipitation gradient. We aliquoted 25 g of soil for each depth segment (0–5 cm, 6–30 cm, 31–60 cm) for shotgun sequencing, and the rest of the samples were stored at −80 °C for archiving.

From August to October 2018, we collected monolith samples for the enrichment of soil microbial communities⁴². We collected intact soil cores (depth 60 cm, diameter 30 cm) from the same sites described above. We dried and stored monolith samples in the greenhouse (University of Kansas Field Station) until further processing. The monolith experiment was set up in April 2019 and continued for 6 months. Each monolith soil sample was rewetted and randomly assigned to the “dry” and “wet” watering treatment groups. The amount of water per treatment was determined by averaging the 30 years of EKS and WKS annual rainfall data and adding 50% of the total to account for the greenhouse conditions. As a result, the “dry” treatment group received 1000 mm/yr and the “wet” treatment reserved 2000 mm/yr. We also mimicked the summer rainstorm, by applying 450 mm/yr of each treatment in three 150 mm intense watering events. In November 2019, we harvested monolith samples, and segmented them according to the depth (0–5 cm, 6–30 cm, 31–60 cm) for shotgun sequencing, and the rest of the samples were stored at −80 °C for archiving.

Field soil samples were homogenized, and total genomic DNA was extracted using the DNeasy PowerSoil Pro Kit following the manufacturer’s protocol (Qiagen, Germantown, MD, USA). We extracted total DNA from 0.160 g of monolith soil samples using an Omega E.Z.N.A. Soil DNA Kit (Omega Biotek, Inc., Norcross, GA, United States) as per the manufacturer’s protocol with a slight modification. We mechanically lysed the cells in a Qiagen TissueLyser II (Qiagen, Hilden, Germany) including bead-beating (20 rev/s, 2 mins) and vortexing (3 mins) prior to downstream DNA extraction steps. The extracted DNA from both kits was eluted to a 100-μL final volume. Extracted DNA was sequenced on the Illumina NovaSeq. 6000 platform (Illumina, San Diego, CA, USA) at the University of Kansas Medical Center Genome Sequencing Facility. We used an S1 flow cell to undertake a 150-paired sequencing strategy using Nextera DNA Flex library preparation.

Bioinformatics

We co-assembled reads from 12 “metagenomic sets” - 6 from the field and 6 from the monolith soils based on the geographical locations (WKS, EKS) and their soil depths (0–5 cm, 6–30 cm, 31–60 cm). We automated our metagenomics bioinformatics workflow using the ‘anvi-run-workflow’⁴³ in anvi’o v7.1^44,45. The workflows use Snakemake⁴³ to implement numerous tasks. Details of each process as outlined below:

We used the program ‘iu-filer-quality-minoche’ to process the short metagenomic reads and removed low-quality reads accordingly⁴⁶. We used MEGAHIT v1.2.9³¹ to co-assemble quality-filtered short reads into longer contiguous sequences (contigs). Following the assembling of contigs, we used ‘anvi-gen-contigs- database’ to compute k-mer frequencies and identify open reading frames (ORFs) using Prodigal v2.6.3⁴⁷; ‘anvi-run-hmms’ to identify sets of bacterial and archaeal single-copy core genes using HMMER v.3.2.1⁴⁸; ‘anvi-run-ncbi-cogs’ to annotate ORFs from NCBI’s Clusters of Orthologous Groups (COGs)⁴⁹; and ‘anvi-run-kegg-kofams’ to annotate ORFs from KOfam HMM databases of KEGG orthologs⁵⁰. Next, we mapped metagenomic short reads to contigs using Bowtie2 v2.3.5⁵¹ and profiled the BAM files using ‘anvi-profile’ with a minimum contig length of 1000 bp. Finally, we used ‘anvi-merge’ to combine all profiles into an anvi’o merged profile for all downstream analyses. For the construction of metagenome-assembled genomes (MAGs), we first used ‘anvi-cluster-contigs’ to group contigs into initial bins using CONCOCT v1.1.0⁵², and used ‘anvi-refine’ to manually curate the bins based on tetranucleotide frequency and different coverage across the samples. We marked bins that were more than 70% complete and less than 10% redundant as MAGs. The completion and redundancy values of the MAGs were based on matching single copy genes in the MAGs with multiple hidden Markov models (HMM): Bacteria_71⁵³, Archaea_76⁵⁴, and Protista_83. Finally, we used ‘anvi-compute-genome-similarity’ to calculate the average nucleotide identity (ANI) of our MAGs using PyANI v0.2.9⁵⁵, and identified non-redundant MAGs.

We annotated the non-redundant MAGs with Distilled and Refined Annotation of Metabolism (DRAM), to provide a metabolic profile for each of the MAGs⁵⁶. We used Prodigal⁴⁷ v.2.6.3 to detect open reading frames (ORFs) and predict their amino acid sequences. We then used DRAM to search for all amino acid sequences against KEGG⁵⁷, UniRef90⁵⁸, and MEROPS⁵⁹ using MMseqs. 2⁶⁰, with the best hits (defined by bit score with a default minimum threshold of 60). We also used DRAM to perform HMM profile searches of the Pfam⁶¹ database, and HMMER3⁶² for dbCAN⁶³, with coverage length > 35% of the model and e-value < 10–15 to be considered a hit 40.

Data Records

The shotgun metagenome reads as well as the sequences for the MAGs generated from this study are publicly available on the NCBI Sequence Read Archive (SRA) under BioProject PRJNA855256, SRR20019782-20019834⁶⁴. All other analyzed data in the form of databases and fasta files are also accessible in github (https://github.com/SonnyTMLee/Recovery-of-679-soil-MAGs)⁶⁵, and also available on NCBI GenBank⁶⁶ - see Supplementary Table 1 for the individual genome (MAGs) accession numbers.

Technical Validation

All data processing steps, and software used in this study are described in the “Methods” section. The DNA yield and quality were measured using the NanoDrop One C (NanoDrop Technologies Inc, Wilmington, DE, USA) and a Qubit 4 Fluorometer dsDNA BR Assay Kit (Life Technologies, Paisley, UK).

Code availability

All intermediate databases, files and MAGs fasta are available in github (https://github.com/SonnyTMLee/Recovery-of-679-soil-MAGs)⁶⁵. Bioinformatic codes to generate the MAGs are also available in github.

References

Ramoneda, J. et al. Building a genome-based understanding of bacterial pH preferences. Sci Adv 9, eadf8998 (2023).
Article CAS PubMed PubMed Central Google Scholar
Hoorman, J. J. The role of soil bacteria. Ohio State University Extension, Columbus 1–4 (2011).
Nguyen, T. M. et al. Effective Soil Extraction Method for Cultivating Previously Uncultured Soil Bacteria. Appl Environ Microbiol 84 (2018).
Lloyd, K. G., Steen, A. D., Ladau, J., Yin, J., Crosby, L. Phylogenetically Novel Uncultured Microbial Cells Dominate Earth Microbiomes. mSystems 3 (2018).
Hibbing, M. E., Fuqua, C., Parsek, M. R. & Peterson, S. B. Bacterial competition: surviving and thriving in the microbial jungle. Nat Rev Microbiol 8, 15–25 (2009).
Article Google Scholar
Butaitė, E., Baumgartner, M., Wyder, S. & Kümmerli, R. Siderophore cheating and cheating resistance shape competition for iron in soil and freshwater Pseudomonas communities. Nat Commun 8, 414 (2017).
Article ADS PubMed PubMed Central Google Scholar
Waschulin, V. et al. Biosynthetic potential of uncultured Antarctic soil bacteria revealed through long-read metagenomic sequencing. ISME J 16, 101–111 (2022).
Article CAS PubMed Google Scholar
Rondon, M. R. et al. Cloning the soil metagenome: a strategy for accessing the genetic and functional diversity of uncultured microorganisms. Appl Environ Microbiol 66, 2541–2547 (2000).
Article ADS CAS PubMed PubMed Central MATH Google Scholar
Hugenholtz, P., Tyson, G. W. Metagenomics. Nature Publishing Group UK. https://doi.org/10.1038/455481a. Retrieved 21 November 2023 (2008).
Sorbara, M. T. et al. Functional and Genomic Variation between Human-Derived Isolates of Lachnospiraceae Reveals Inter- and Intra-Species Diversity. Cell Host Microbe 28, 134–146.e4 (2020).
Article CAS PubMed PubMed Central MATH Google Scholar
Almeida, A. et al. A new genomic blueprint of the human gut microbiota. Nature 568, 499–504 (2019).
Article ADS CAS PubMed PubMed Central MATH Google Scholar
Tyson, G. W. et al. Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature 428, 37–43 (2004).
Article ADS CAS PubMed MATH Google Scholar
Nelkner J et al. Effect of Long-Term Farming Practices on Agricultural Soil Microbiome Members Represented by Metagenomically Assembled Genomes (MAGs) and Their Predicted Plant-Beneficial Genes. Genes 10 (2019).
Chen, C. et al. Expanded catalog of microbial genes and metagenome-assembled genomes from the pig gut microbiome. Nat Commun 12, 1–13 (2021).
ADS Google Scholar
Wilkins, L. G. E., Ettinger, C. L., Jospin, G. & Eisen, J. A. Metagenome-assembled genomes provide new insight into the microbial diversity of two thermal pools in Kamchatka, Russia. Sci Rep 9, 3059 (2019).
Article ADS PubMed PubMed Central Google Scholar
Nathani, N. M. et al. 309 metagenome assembled microbial genomes from deep sediment samples in the Gulfs of Kathiawar Peninsula. Sci Data 8, 194 (2021).
Article PubMed PubMed Central Google Scholar
Alteio LV et al. Complementary Metagenomic Approaches Improve Reconstruction of Microbial Diversity in a Forest Soil. mSystems 5 (2020).
Geisen, S. et al. A methodological framework to embrace soil biodiversity. Soil Biol Biochem 136, 107536 (2019).
Article CAS MATH Google Scholar
Fierer, N. et al. Metagenomic and small-subunit rRNA analyses reveal the genetic diversity of bacteria, archaea, fungi, and viruses in soil. Appl Environ Microbiol 73, 7059–7066 (2007).
Article ADS CAS PubMed PubMed Central MATH Google Scholar
Curtis, T. P., Sloan, W. T. & Scannell, J. W. Estimating prokaryotic diversity and its limits. Proc Natl Acad Sci USA 99, 10494–10499 (2002).
Article ADS CAS PubMed PubMed Central MATH Google Scholar
White RA et al Moleculo Long-Read Sequencing Facilitates Assembly and Genomic Binning from Complex Soil Metagenomes. mSystems 1 (2016).
Jin, H. et al. Hybrid, ultra-deep metagenomic sequencing enables genomic and functional characterization of low-abundance species in the human gut microbiome. Gut Microbes 14, 2021790 (2022).
Article PubMed PubMed Central Google Scholar
Riley, R. et al. Terabase-Scale Coassembly of a Tropical Soil Microbiome. Microbiol Spectr 11, e0020023 (2023).
Article PubMed Google Scholar
Sun, Y. et al. Genome enrichment of rare and unknown species from complicated microbiomes by nanopore selective sequencing. Genome Res 33, 612–621 (2023).
Article CAS PubMed PubMed Central MATH Google Scholar
Jiao, S., Xu, Y., Zhang, J., Hao, X., Lu, Y. Core Microbiota in agricultural soils and their potential associations with nutrient cycling. mSystems 4 (2019).
Yuan, Q. et al. Soil bacterial community mediates the effect of plant material on methanogenic decomposition of soil organic matter. Soil Biol Biochem 116, 99–109 (2018).
Article CAS MATH Google Scholar
Kumar, A., Singh, S., Mukherjee, A., Rastogi, R. P. & Verma, J. P. Salt-tolerant plant growth-promoting Bacillus pumilus strain JPVS11 to enhance plant growth attributes of rice and improve soil health under salinity stress. Microbiol Res 242, 126616 (2021).
Article CAS PubMed Google Scholar
Zuo, X. et al. Contrasting relationships between plant-soil microbial diversity are driven by geographic and experimental precipitation changes. Sci Total Environ 861, 160654 (2023).
Article CAS PubMed MATH Google Scholar
Fahey, C., Koyama, A., Antunes, P. M., Dunfield, K. & Flory, S. L. Plant communities mediate the interactive effects of invasion and drought on soil microbial communities. ISME J 14, 1396–1409 (2020).
Article PubMed PubMed Central Google Scholar
Sarkar, S. et al. Pseudomonas cultivated from Andropogon gerardii rhizosphere show functional potential for promoting plant host growth and drought resilience. BMC Genomics 23, 784 (2022).
Article PubMed PubMed Central MATH Google Scholar
Li, D., Liu, C.-M., Luo, R., Sadakane, K. & Lam, T.-W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31, 1674–1676 (2015).
Article CAS PubMed Google Scholar
Li, D. et al. MEGAHIT v1.0: A fast and scalable metagenome assembler driven by advanced methodologies and community practices. Methods 102, 3–11 (2016).
Article CAS PubMed MATH Google Scholar
Nelkner, J. et al. Abundance, classification and genetic potential of Thaumarchaeota in metagenomes of European agricultural soils: a meta-analysis. Environ Microbiome 18, 26 (2023).
Article CAS PubMed PubMed Central Google Scholar
Cleary, B. et al. Detection of low-abundance bacterial strains in metagenomic datasets by eigengenome partitioning. Nat Biotechnol 33, 1053–1060 (2015).
Article CAS PubMed PubMed Central MATH Google Scholar
Nascimento Lemos, L. et al. Metagenome assembled-genomes reveal similar functional profiles of CPR/Patescibacteria phyla in soils. Environ Microbiol Rep 12, 651–655 (2020).
Article CAS PubMed Google Scholar
Li, Z. et al. Genome-Resolved Proteomic Stable Isotope Probing of Soil Microbial Communities Using 13CO2 and 13C-Methanol. Front Microbiol 10, 2706 (2019).
Article PubMed PubMed Central MATH Google Scholar
Bei, Q. et al. Extreme summers impact cropland and grassland soil microbiomes. ISME J 17, 1589–1600 (2023).
Article CAS PubMed PubMed Central MATH Google Scholar
Nelson, A. R. et al. Wildfire-dependent changes in soil microbiome diversity and function. Nat Microbiol 7, 1419–1430 (2022).
Article CAS PubMed PubMed Central MATH Google Scholar
Braga, L. P. P. et al. Genome-resolved metagenome and metatranscriptome analyses of thermophilic composting reveal key bacterial players and their metabolic interactions. BMC Genomics 22, 652 (2021).
Article CAS PubMed PubMed Central Google Scholar
Bulgarelli, D. et al. Structure and function of the bacterial root microbiota in wild and domesticated barley. Cell Host Microbe 17, 392–403 (2015).
Article CAS PubMed PubMed Central MATH Google Scholar
Galliart, M. et al. Local adaptation, genetic divergence, and experimental selection in a foundation grass across the US Great Plains’ climate gradient. Glob Chang Biol 25, 850–868 (2019).
Article ADS PubMed MATH Google Scholar
Sarkar, S. et al. Ammonia-oxidizing archaea and bacteria differentially contribute to ammonia oxidation in soil under precipitation gradients and land legacy. bioRxiv https://doi.org/10.1101/2023.11.08.566028 (2023).
Shaiber, A. et al. Functional and genetic markers of niche partitioning among enigmatic members of the human oral microbiome. Genome Biol 21, 292 (2020).
Article PubMed PubMed Central Google Scholar
Eren, A. M. et al. Community-led, integrated, reproducible multi-omics with anvi’o. Nat Microbiol 6, 3–6 (2021).
Article CAS PubMed PubMed Central MATH Google Scholar
Eren, A. M. et al. Minimum entropy decomposition: unsupervised oligotyping for sensitive partitioning of high-throughput marker gene sequences. ISME J 9, 968–979 (2015).
Article CAS PubMed MATH Google Scholar
Minoche, A. E., Dohm, J. C. & Himmelbauer, H. Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and genome analyzer systems. Genome Biol 12, R112 (2011).
Article CAS PubMed PubMed Central Google Scholar
Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119 (2010).
Article PubMed PubMed Central MATH Google Scholar
DIE Himmers – Startseite. http://himmer.org. Retrieved 2 January (2024).
COG - NCBI. https://www.ncbi.nlm.nih.gov/research/cog (2024). Retrieved 2 January.
Kanehisa, M. & Goto, S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28, 27–30 (2000).
Article CAS PubMed PubMed Central MATH Google Scholar
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat Methods 9, 357–359 (2012).
Article CAS PubMed PubMed Central MATH Google Scholar
Alneberg, J. et al. Binning metagenomic contigs by coverage and composition. Nat Methods 11, 1144–1146 (2014).
Article CAS PubMed MATH Google Scholar
Hug, L. A. et al. A new view of the tree of life. Nat Microbiol 1, 16048 (2016).
Article CAS PubMed MATH Google Scholar
Lee, M. D. GToTree: a user-friendly workflow for phylogenomics. Bioinformatics 35, 4162–4164 (2019).
Article CAS PubMed PubMed Central MATH Google Scholar
Pritchard, L., Glover, R. H., Humphris, S., Elphinstone, J. G. & Toth, I. K. Genomics and taxonomy in diagnostics for food security: soft-rotting enterobacterial plant pathogens. Anal Methods 8, 12–24 (2015).
Article Google Scholar
Shaffer, M. et al. DRAM for distilling microbial metabolism to automate the curation of microbiome function. Nucleic Acids Res 48, 8883–8900 (2020).
Article CAS PubMed PubMed Central MATH Google Scholar
Kanehisa, M., Goto, S., Furumichi, M., Tanabe, M. & Hirakawa, M. KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Res 38, D355–60 (2010).
Article CAS PubMed Google Scholar
Suzek, B. E., Huang, H., McGarvey, P., Mazumder, R. & Wu, C. H. UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics 23, 1282–1288 (2007).
Article CAS PubMed MATH Google Scholar
Rawlings, N. D. et al. The MEROPS database of proteolytic enzymes, their substrates and inhibitors in 2017 and a comparison with peptidases in the PANTHER database. Nucleic Acids Res 46, D624–D632 (2018).
Article CAS PubMed Google Scholar
Mirdita, M., Steinegger, M. & Söding, J. MMseqs. 2 desktop and local web server app for fast, interactive sequence searches. Bioinformatics 35, 2856–2858 (2019).
Article CAS PubMed PubMed Central Google Scholar
Mistry, J. et al. Pfam: The protein families database in 2021. Nucleic Acids Res 49, D412–D419 (2021).
Article CAS PubMed MATH Google Scholar
Wheeler, T. J. & Eddy, S. R. nhmmer: DNA homology search with profile HMMs. Bioinformatics 29, 2487–2489 (2013).
Article CAS PubMed PubMed Central MATH Google Scholar
Yin, Y. et al. dbCAN: a web resource for automated carbohydrate-active enzyme annotation. Nucleic Acids Res 40, W445–51 (2012).
Article CAS PubMed PubMed Central Google Scholar
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP385305
Kazarina, A. et al. Recovery of 679 metagenome-assembled genomes from different soil depths along a precipitation gradient. GitHub https://github.com/SonnyTMLee/Recovery-of-679-soil-MAGs
Kazarina, A. et al. Recovery of 679 metagenome-assembled genomes from different soil depths along a precipitation gradient. GenBank https://identifiers.org/ncbi/bioproject:PRJNA855256

Download references

Acknowledgements

The work is supported by the National Science Foundation under Award No. OIA-1656006 and matching support from the State of Kansas through the Kansas Board of Regents. This study was also funded by the United States Department of Agriculture, National Institute of Food and Agriculture (USDA NIFA), under the Award Number: 2020-67019-31803. This work is also supported by the National Science Foundation under Award No. 2238633. We are thankful to the National Institutes of Health (NIH) grant support - Kansas Intellectual and Developmental Disabilities Research Center (NIH U54 HD 090216), the Molecular Regulation of Cell Development and Differentiation – COBRE (P30 GM122731-03) - the NIH S10, High-End Instrumentation Grant (NIH S10OD021743) and the Frontiers CTSA grant (UL1TR002366), at the University of Kansas Medical Center, Kansas City, KS 66160. We appreciate the Kansas Medical Center Genomics Core for the metagenomic data generated. We are thankful for the help extended by Paige M. Hansen, Willow Kessler, Terry D. Loecke, Benjamin A. Sikes in the coordination of the collection of samples.

Author information

These authors contributed equally: Anna Kazarina, Hallie Wiechman.

Authors and Affiliations

Division of Biology, Kansas State University, Manhattan, Kansas, USA
Anna Kazarina, Hallie Wiechman, Tanner Richie & Sonny T. M. Lee
Center for Fundamental and Applied Microbiomics, Biodesign Institute, Arizona State University, Tempe, Arizona, USA
Soumyadev Sarkar

Authors

Anna Kazarina
View author publications
Search author on:PubMed Google Scholar
Hallie Wiechman
View author publications
Search author on:PubMed Google Scholar
Soumyadev Sarkar
View author publications
Search author on:PubMed Google Scholar
Tanner Richie
View author publications
Search author on:PubMed Google Scholar
Sonny T. M. Lee
View author publications
Search author on:PubMed Google Scholar

Contributions

A.K. and H.W. performed the data analysis and wrote the manuscript. S.T.M.L. and S.S. conceptualized and designed the study. S.T.M.L. collected samples. S.S. performed the monolith experiments and extracted DNA for the Illumina sequencing. S.T.M.L. supervised the study. All authors reviewed and approved of this manuscript.

Corresponding author

Correspondence to Sonny T. M. Lee.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Table S2

Supplementary Table S3

Supplementary Table S1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Cite this article

Kazarina, A., Wiechman, H., Sarkar, S. et al. Recovery of 679 metagenome-assembled genomes from different soil depths along a precipitation gradient. Sci Data 12, 521 (2025). https://doi.org/10.1038/s41597-025-04884-2

Download citation

Received: 20 March 2024
Accepted: 24 March 2025
Published: 28 March 2025
Version of record: 28 March 2025
DOI: https://doi.org/10.1038/s41597-025-04884-2

This article is cited by

A genomic view of Earth’s biomes
- Gitta Szabó
- Emiley A. Eloe-Fadrosh
- Tanja Woyke
Nature Reviews Genetics (2026)
Bioactive molecules unearthed by terabase-scale long-read sequencing of a soil metagenome
- Ján Burian
- Robert E. Boer
- Sean F. Brady
Nature Biotechnology (2025)