Background & Summary

The Qaidam Basin, located in the northeastern Tibetan Plateau, is one of the driest (average annual precipitation < 45 mm) and highest (average elevation ~2800 m) deserts on Earth [1,2]. It is characterized by polyextreme conditions, including hyperaridity, low temperatures, intense solar radiation, and hypersaline soils, making it a unique and representative Mars-analog site [3,4,5]. The western Qaidam Basin (aridity index: 0.01–0.05) is the most arid region of the desert [1]. Mars-like geomorphological features (e.g., yardangs, gullies, playas, and dunes) and evaporites (e.g., sulfates and chlorides) are widespread in this region owing to prolonged drought and intense aeolian erosion [3,5].

In hyperarid deserts with low biomass and biodiversity, such as the Qaidam Basin [6,7], the Antarctic Desert [8], and the Atacama Desert [9], microbial life plays an essential role in biogeochemical cycling and biogeological processes [10]. The ability of microorganisms to survive and thrive in these extreme environments expands our understanding of the limits of life and has implications for the search for potential extraterrestrial life [11]. Moreover, extremophiles in desert ecosystems represent largely unexplored reservoirs of biological and genetic resources [12,13]. However, the vast majority of microorganisms in hyperarid deserts remain uncultured and uncharacterized, and are commonly referred to as “microbial dark matter” [10]. Compared with microbial research, studies on desert soil viruses and their interactions with microbiomes are still nascent, even though viruses play vital roles in shaping microbial community structure and function [14]. Previous studies have revealed that viral auxiliary metabolic genes (AMGs) related to stress tolerance may enhance host adaptation and resilience in extreme environments such as the Atacama Desert [15] and the Antarctic Desert [16]. Because viruses are difficult to isolate and culture, the viral communities in hyperarid deserts remain poorly characterized [17,18]. Nonetheless, the rapid progress of metagenomics, along with the development of bioinformatics tools, standardized protocols, and reference databases, has enabled the recovery of metagenome-assembled genomes (MAGs), facilitating the identification of uncultured microbes and viruses and offering insights into their diversity and ecological roles in desert ecosystems.

In this study, we reconstructed 1,773 microbial metagenome-assembled genomes (mMAGs) and 2,060 viral metagenome-assembled genomes (vMAGs) from 58 soil metagenomes collected across different landforms and depths in the Qaidam Basin desert (Fig. 1a,b and Table S1). All 1,773 mMAGs met at least the medium-quality criteria (completeness > 50% and contamination < 10%), and 327 of them were classified as high-quality genomes (completeness ≥ 90% and contamination ≤ 5%) (Fig. 2a) [19,20]. Bin length of mMAGs ranged from 0.6 Mb to 8.8 Mb, and 74.5% (n = 1,326) had GC content ≥ 60% (Table S2). Based on GTDB R220 [21], 94.5% (n = 1,675) of the mMAGs represent novel taxa. These novel taxa comprise 4 orders, 29 families, 501 genera, and 1,141 species (Fig. 2b). The recovered mMAGs were assigned to 31 phyla, including 27 bacterial phyla (n = 1,630) and 4 archaeal phyla (n = 143) (Fig. 2c). The bacterial and archaeal phyla with the largest numbers of mMAGs were Actinomycetota (n = 565) and Halobacteriota (n = 111), respectively (Figs. 2c, 4). Among the 2,060 vMAGs, 325 were classified as high-quality (completeness ≥ 90%) and 552 as medium-quality (50% ≤ completeness < 90%) (Fig. 3a). Viral bin length ranged from 10.2 kb to 363.0 kb (Table S3). Only 43.1% (n = 887) of the vMAGs could be taxonomically classified, among which two were assigned to unknown phyla (Fig. 3b, c). Notably, the vast majority (n = 853, 96.2%) of the classified vMAGs could not be assigned at the species level. The viral phylum with the largest number of vMAGs was Uroviricota (n = 836) within the realm Duplodnaviria. Other classified vMAGs belonged to Nucleocytoviricota (n = 32), Dividoviricota (n = 7), Saleviricota (n = 5), Preplasmiviricota (n = 4), and Hofneiviricota (n = 1).
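The quality tiers quoted above can be expressed as a short Python sketch. This is only an illustration of the stated thresholds, not the authors' code; the function names are hypothetical, and completeness and contamination are assumed to be percentages as reported by tools such as CheckM and CheckV.

```python
from typing import Optional


def mmag_quality(completeness: float, contamination: float) -> Optional[str]:
    """Tier a microbial MAG using the thresholds stated in the text.

    Returns "high", "medium", or None when the genome fails the
    retention cutoff (completeness > 50% and contamination < 10%).
    """
    if completeness >= 90 and contamination <= 5:
        return "high"
    if completeness > 50 and contamination < 10:
        return "medium"
    return None


def vmag_quality(completeness: float) -> Optional[str]:
    """Tier a viral MAG by completeness only, as described in the text.

    Genomes below 50% completeness are not given a tier name in the
    text, so this sketch returns None for them (an assumption).
    """
    if completeness >= 90:
        return "high"
    if completeness >= 50:
        return "medium"
    return None


# Hypothetical example values, not taken from the dataset.
print(mmag_quality(93.2, 1.8), mmag_quality(71.0, 6.5), vmag_quality(62.0))
```

Note that the boundaries are asymmetric: the retention cutoff uses strict inequalities, whereas the high-quality tier includes its boundary values.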

Fig. 1

Sample collection and metagenomic analysis. (a) Sampling sites in the Qaidam Basin. A total of 27 regolith samples and 31 soil samples from five vertical profiles were collected. (b) Schematic representation of the workflow for the recovery of MAGs.

Fig. 2

Quality and taxonomic classification of mMAGs. (a) Assessment of the completeness and contamination of 1,773 mMAGs. (b) Taxonomic novelty of mMAGs at various taxonomic levels. (c) The number of mMAGs from different prokaryotic phyla.

Fig. 3

Quality and taxonomic classification of vMAGs. (a) Assessment of the completeness and bin length of 2,060 vMAGs. (b) The percentage of classified vMAGs. (c) The number of vMAGs from different viral phyla.

Fig. 4

Phylogenomic tree of 1,773 mMAGs and their genomic characteristics. The tree was constructed using 400 universal marker genes. The archaeal branch is marked in the tree.

Methods

Sample collection and soil physicochemical characteristics

A total of 58 desert soil samples were collected from the Qaidam Basin. Sampling sites were primarily distributed in the western Qaidam Basin, the driest region of the basin. The samples comprised 27 surface soil samples (0–5 cm depth) and 31 subsurface soil samples (Table S1). The subsurface samples were collected from five vertical profiles, with 3–10 depth intervals per profile, to a maximum depth of 50 cm. At each sampling site, three replicate soil samples were collected aseptically with sterilized instruments and transferred into clean 50-mL centrifuge tubes. All samples were transported to the laboratory and stored at −80 °C until further processing. For each sample, the three replicates were combined before analysis. Soil pH was measured in a 1:2 (w/v) soil-to-water slurry with a Mettler Toledo DELTA 320 pH meter (Mettler-Toledo, Switzerland). Soil electrical conductivity (EC) was measured in a 1:5 (w/v) soil-to-water mixture with a HACH HQ40D meter (HACH, USA). Soil water content was calculated from the weight loss after drying at 105 °C for 24 hours. Total organic carbon (TOC) was measured with an Elemental Analyzer ECS 4024 (NC Technologies, Italy) using powdered soil pretreated with 3 M HCl to remove inorganic carbon. The soil physicochemical metadata are provided in Table S1.
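As an illustration of the water-content calculation, the following Python sketch converts the weight loss on drying into a percentage. The text does not state whether the result is expressed relative to dry or fresh soil mass, so the `basis` parameter is an assumption introduced here, as are the function name and the example masses.

```python
def water_content_percent(fresh_mass_g: float, dry_mass_g: float,
                          basis: str = "dry") -> float:
    """Gravimetric water content (%) from the weight loss on drying.

    basis="dry" divides the weight loss by the oven-dry mass;
    basis="wet" divides it by the fresh mass. Which basis was used
    in the study is not stated, so both are offered (assumption).
    """
    if basis not in ("dry", "wet"):
        raise ValueError("basis must be 'dry' or 'wet'")
    if dry_mass_g <= 0 or fresh_mass_g < dry_mass_g:
        raise ValueError("masses must satisfy 0 < dry mass <= fresh mass")
    loss = fresh_mass_g - dry_mass_g
    denominator = dry_mass_g if basis == "dry" else fresh_mass_g
    return 100.0 * loss / denominator


# Hypothetical masses for a very dry soil sample.
print(round(water_content_percent(10.0, 9.8), 2))
print(round(water_content_percent(10.0, 9.8, basis="wet"), 2))
```

For hyperarid soils the two conventions give nearly identical values, but the difference grows with moisture, so the basis is worth recording alongside the metadata.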

Metagenomic sequencing and assembly

Metagenomic DNA was extracted from soil samples using a modified protocol described in previous studies [6,7]. Briefly, DNA was extracted from approximately 30 g of soil using the PowerMax Soil DNA Isolation Kit (Qiagen, Hilden, Germany). The quality of the extracted DNA was evaluated by agarose gel electrophoresis. Libraries were prepared using the TruSeq™ DNA PCR-Free Library Prep Kit (Illumina, USA), with an insert size of approximately 400 bp. Paired-end sequencing (Illumina NovaSeq 6000 platform, 2 × 150 bp) was conducted at Shanghai Majorbio Bio-pharm Technology (Majorbio, Shanghai, China). Raw metagenomic data were processed using fastp v1.0.1 [22] to remove adapters, short reads (< 50 bp), and low-quality reads (quality scores < 20). The clean reads were then additionally trimmed and quality-controlled using the “Read_qc” module of MetaWRAP v1.3.2 [23]. Filtered reads were assembled de novo with MEGAHIT v1.1.3 [24], and contigs < 2,000 bp were removed.
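The final step above, discarding contigs shorter than 2,000 bp, can be sketched in plain Python as follows. This is a minimal illustration and not the tool the authors used for the filtering; the function names and the toy records are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(records: Iterable[Tuple[str, str]],
                   min_len: int = 2000) -> Iterator[Tuple[str, str]]:
    """Keep contigs of at least min_len bp (contigs < min_len are dropped)."""
    for header, seq in records:
        if len(seq) >= min_len:
            yield header, seq


# Toy assembly: only contigs of 2,000 bp or longer survive.
toy = [">c1", "A" * 1999, ">c2", "A" * 1000, "C" * 1000, ">c3", "G" * 2500]
print([name for name, _ in filter_contigs(parse_fasta(toy))])
```

To apply the same sketch to a real assembly, an open file handle can be passed to `parse_fasta` in place of the toy list.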

Microbial genome binning, taxonomic assignment, and phylogenetic analysis

Assembled contigs were binned using the “Binning” module of MetaWRAP [23] with MetaBAT2 v2.12.1 [25], MaxBin2 v2.2.6 [26], and CONCOCT v1.0.0 [27]. The resulting bins were refined using MetaWRAP’s “Bin_refinement” module (parameters: -c 50 -x 10). All bins were subsequently combined and dereplicated at 99% average nucleotide identity (ANI) using dRep v3.4.0 [28] (parameters: -pa 0.9 -sa 0.99). The quality of the MAGs was evaluated using CheckM v1.2.3 [29] with the “lineage_wf” function, and MAGs with completeness > 50% and contamination < 10% were retained. The coverage of mMAGs was calculated with CoverM v0.7.0 [30]. Taxonomic classification of the 1,773 mMAGs was performed using GTDB-Tk v2.4.0 [21] with GTDB Release R220 and the “classify_wf” function. Phylogenetic analysis was conducted using PhyloPhlAn v3.0.58 [31] with 400 universal marker genes (parameters: -d phylophlan --diversity low -f supermatrix_aa.cfg). During the phylogenetic analysis, 21 MAGs were excluded because fewer than 100 universal marker genes were detected in them, including genomes from Patescibacteria (n = 12), Thermoproteota (n = 4), Nanohaloarchaeota (n = 2), Proteobacteria (n = 2), and Thermoplasmatota (n = 1). The final phylogenetic tree was visualized and annotated using the Interactive Tree Of Life (iTOL v6) [32].
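To make the idea of dereplication at 99% ANI concrete, the sketch below groups genomes whose pairwise ANI meets the threshold and keeps one representative per group. It is a deliberately simplified stand-in: it uses single-linkage grouping and a user-supplied score, whereas dRep's own clustering and genome scoring are more elaborate. The function name, the score, and the toy values are assumptions for illustration only.

```python
from typing import Dict, List, Tuple


def dereplicate(scores: Dict[str, float],
                ani: Dict[Tuple[str, str], float],
                threshold: float = 0.99) -> List[str]:
    """Pick one representative genome per ANI cluster.

    scores: genome name -> quality score (higher is better; hypothetical).
    ani: (genome_a, genome_b) -> pairwise ANI as a fraction (0 to 1).
    Genomes linked by ANI >= threshold are merged (single linkage).
    """
    parent = {g: g for g in scores}

    def find(g: str) -> str:
        # Union-find lookup with path compression.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), value in ani.items():
        if value >= threshold:
            parent[find(a)] = find(b)

    best: Dict[str, str] = {}
    for genome, score in scores.items():
        root = find(genome)
        if root not in best or score > scores[best[root]]:
            best[root] = genome
    return sorted(best.values())


# Toy example: A and B are near-identical, C is distinct.
toy_scores = {"A": 95.0, "B": 80.0, "C": 70.0}
toy_ani = {("A", "B"): 0.995, ("B", "C"): 0.97}
print(dereplicate(toy_scores, toy_ani))
```

In the toy example, A and B collapse into one cluster represented by the higher-scoring A, while C remains on its own.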

Viral identification, genome binning, and classification

Viral sequences were identified from the metagenomic assemblies following the ViWrap v1.3.1 [33] pipeline (parameters: --identify_method vb-vs --input_length_limit 5000). The intersection of the results of VIBRANT v1.2.1 [34] and VirSorter2 v2.2.3 [35] was retained to generate an accurate viral scaffold collection, and viral scaffolds < 5,000 bp were removed. A total of 20,270 viral scaffolds were obtained from the 58 samples. Subsequently, vRhyme v1.1.0 [36] was used to bin the viral scaffolds into vMAGs for each sample, yielding a total of 2,060 vMAGs. The quality of the viral genomes was evaluated using CheckV v1.0.1 [37]. Genus-level clusters were defined using vConTACT2 v0.11.0 [38] (parameters: --rel-mode Diamond --pcs-mode MCL --vcs-mode ClusterONE), and species-level clusters were defined using dRep v3.4.0 [28] (parameters: -pa 0.8 -sa 0.95 -nc 0.85). Viral taxonomy was assigned by ViWrap v1.3.1 [33] against the NCBI RefSeq viral protein database [39], the VOG HMM marker protein database [40], and IMG/VR v4.1 high-quality vOTU representative proteins [41].
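The consensus step described above, keeping only scaffolds flagged by both identification tools and at least 5,000 bp long, amounts to a set intersection followed by a length filter. The sketch below illustrates that logic only; it assumes scaffold identifiers have already been normalized to the same names in both result sets, and the function name and toy identifiers are hypothetical.

```python
from typing import Dict, Iterable, List


def consensus_viral_scaffolds(vibrant_ids: Iterable[str],
                              virsorter2_ids: Iterable[str],
                              lengths: Dict[str, int],
                              min_len: int = 5000) -> List[str]:
    """Return scaffolds called viral by both tools and >= min_len bp.

    lengths maps scaffold name -> length in bp; scaffolds with an
    unknown length are dropped (a conservative assumption).
    """
    shared = set(vibrant_ids) & set(virsorter2_ids)
    return sorted(s for s in shared if lengths.get(s, 0) >= min_len)


# Toy inputs: s1 is shared and long enough, s2 is shared but short,
# s3 and s4 are each reported by only one tool.
toy_lengths = {"s1": 12000, "s2": 4999, "s3": 30000, "s4": 8000}
print(consensus_viral_scaffolds(["s1", "s2", "s3"], ["s1", "s2", "s4"], toy_lengths))
```

Taking the intersection trades sensitivity for precision, which is consistent with the stated aim of an accurate viral scaffold collection.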

Data Records

The metagenomic sequencing data have been deposited in the NCBI Sequence Read Archive under SRP566180 [42] (accession numbers SRR32481248-SRR32481298, n = 51) and SRP336290 [43] (accession numbers SRR15827331-SRR15827333, n = 3, and SRR24765285-SRR24765288, n = 4). The retrieved mMAGs are available through NCBI BioProject PRJNA1213342 [44] under accession numbers SAMN49651471-SAMN49653500. The mMAGs and vMAGs are also available at https://doi.org/10.5281/zenodo.15743210 [45].
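For readers who want to retrieve the runs programmatically, the accession ranges above can be expanded into individual run identifiers. The Python sketch below is a convenience illustration, not part of the published workflow; the function name is hypothetical, and the ranges are copied from the text.

```python
import re
from typing import List


def expand_accessions(start: str, end: str) -> List[str]:
    """Expand an inclusive accession range such as SRR1-SRR5 into a list."""
    first = re.fullmatch(r"([A-Za-z]+)(\d+)", start)
    last = re.fullmatch(r"([A-Za-z]+)(\d+)", end)
    if not first or not last or first.group(1) != last.group(1):
        raise ValueError("accessions must share the same letter prefix")
    lo, hi = int(first.group(2)), int(last.group(2))
    if lo > hi:
        raise ValueError("start accession must not exceed end accession")
    width = len(first.group(2))  # preserve any zero padding
    return [f"{first.group(1)}{n:0{width}d}" for n in range(lo, hi + 1)]


# Ranges as listed in the Data Records section.
ranges = [
    ("SRR32481248", "SRR32481298"),
    ("SRR15827331", "SRR15827333"),
    ("SRR24765285", "SRR24765288"),
]
runs = [acc for start, end in ranges for acc in expand_accessions(start, end)]
print(len(runs))  # 51 + 3 + 4 = 58 runs, matching the 58 metagenomes
```

The resulting list can be handed to any SRA download tool of choice.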

Technical Validation

The metagenomic data were evaluated for quality using fastp [22] and further quality-controlled using the “Read_qc” module of MetaWRAP v1.3.2 [23]. The binning of mMAGs and vMAGs was performed according to the pipelines of MetaWRAP v1.3.2 [23] and ViWrap v1.3.1 [33], respectively. The quality of mMAGs and vMAGs was assessed using CheckM v1.2.3 [29] and CheckV v1.0.1 [37], respectively. Only mMAGs with completeness > 50% and contamination < 10% were retained for further analysis.