Abstract
Holobionts are highly organized assemblages of eukaryotic hosts, cellular microbial symbionts, and viruses, whose interactions and evolution involve complex biological processes. It is largely unknown which specific determinants drive similarity or individuality in genetic diversity between holobionts. Here, we combine short- and long-read sequencing and DNA-proximity-linkage technologies to investigate intraspecific diversity of the microbiomes, including host-resolved viruses, in individuals of a model marine sponge. We find strong impacts of the sponge host and the cellular hosts of viruses on strain-level organization of the holobiont, whereas substantial overlap in nucleotide diversity between holobionts suggests frequent exchanges of microbial cells and viruses at intrastrain level in the local sponge population. Immune-evasive arms races likely restricted virus-host co-evolution at the intrastrain level, generated holobiont-specific genome variations, and linked virus-host genetics through recombination. Our work shows that a decoupling of strain- and intrastrain-level interactions is a key factor in the genetic diversification of holobionts.
Introduction
Holobionts are ecological and evolutionary entities consisting of host animals or plants and their associated communities of symbiotic microbial eukaryotes, bacteria, archaea, and viruses (collectively the microbiome)1,2. Information on the biological processes that structure microbiomes and generate genetic variability within and between holobiont individuals is essential to understanding holobiont health and evolution. In addition to processes such as ecological drift and intra-host diversification, animal hosts can structure their microbiome composition by selective processes (e.g., immune interactions, cell-cell recognition, and metabolic mutualism) or by dispersal restriction (e.g., strict vertical transmission of microbial symbionts and geographic isolation)3,4. Evaluating the relative contribution of these host-driven processes, along with the combination of non-host processes, such as horizontal symbiont transmission and genome recombination, requires careful methodological and experimental design5.
Sponges (phylum Porifera) are early-diverging sessile animals6 that can reproduce both sexually and asexually and play increasingly important roles in reef ecosystems, where corals are declining due to climate change or anthropogenic impacts7. Studies have revealed that complex communities of sponge microbiomes are often highly specific to a particular sponge species8,9,10,11, thereby implying a degree of co-evolution9,12. However, the existence and extent of horizontal symbiont transmission in sponge holobionts are still being actively debated. For example, previous studies reported that strict vertical transmission of microbial symbionts between sponge generations was unlikely to exist13 and that sponges shared a large component of their microbial symbionts with sediments and other reef invertebrates by horizontal transmission14. In contrast, another study identified cophylogeny between sponge hosts and microbial symbionts over their long evolutionary history, suggesting vertical transmission15. All these studies were based on sequencing of partial 16S rRNA genes, resolving at best species-level taxonomy1. As microbial cells of the same species can vary substantially in metabolism, virulence, cell surface characteristics, and consequently their interactions with holobiont hosts16, sponge individuals may selectively host distinct strains of the same microbial species. Therefore, to resolve the ongoing debates on microbiome transmission modes in sponge holobionts, it will be critical to analyze sponge-associated microorganisms with intraspecific taxonomic resolution, such as at strain or intrastrain levels.
Like cellular microbiomes, the taxonomic composition of viromes is also generally specific to a sponge species and distinct from the ambient free-living viral communities11,17,18. The hosts of viruses (i.e., the animal/plant host plus the cellular microbial symbionts) should be a dominant factor in influencing the virome structure of a holobiont. However, due to the technical difficulties in linking viruses to their cellular hosts at the community level19, the mechanisms of virome assembly in holobionts are still largely unknown. Moreover, technological limitations in taxonomic resolution have obscured an understanding of the mode of viral transmission. For example, the extent of maternal transmission in human infant viromes was reported at various levels in studies based on viral assemblies of short sequencing reads, which provide only consensus sequences of viral populations20,21. As with cellular microbiome studies, analytical tools with nucleotide resolution that operate at strain or intrastrain levels may provide more robust approaches to understanding viral transmission between holobiont individuals22.
The collection of genomes from all holobiont members provides a rich pool of genetic diversity, which underpins the potential of a holobiont to adapt to environmental changes23,24. Although many mechanisms are involved, virus-host interactions are expected to be an important factor in driving intraspecific diversity of both the viruses and their cellular hosts within and between holobiont individuals24. In principle, viral activities may promote intraspecific genetic variability by selectively killing microbial strains, facilitating mutations in the virus-host arms race, and transferring, disrupting, or rearranging genes in and between microbial hosts25. Pioneering studies have demonstrated that bacteria26 and viruses26,27 in human gut microbiomes evolve rapidly. However, it is still uncertain how much of their evolution, especially the generation of intraspecific diversity, is attributable to virus-host interactions.
In this study, we apply Oxford nanopore sequencing, high-throughput chromosome conformation capture (HiC) metagenomics (a DNA proximity linkage technology), and single-nucleotide variation analysis to investigate the strain- and intrastrain-level organization, interactions, and co-evolution of cellular microbial symbionts and their viruses in a subset of a local population of Cymbastela concentrica, a widely distributed and ecologically important temperate sponge in the Indo-Pacific ocean with a relatively stable microbial community28,29,30. These technologies have recently been successfully applied to recover high-quality viral genomes31, to identify virus-host interactions previously missed by clustered regularly interspaced short palindromic repeats (CRISPR)-based methods32, and to reveal bacterial and viral transmission between holobiont individuals22, respectively. We elucidate the intraspecific genetic diversity of holobiont members in individuals of C. concentrica and then test whether the genetics of the sponge host impact the microbiome structure at the strain and intrastrain levels. Host specificity and intracellular activity of sponge-associated viruses are then evaluated by revealing their infectious relationships with their cellular hosts. Finally, we investigate the level of microbial transmission and the mechanisms of virus-host interactions in generating intrastrain genetic variations within and between sponge individuals.
Results
Rich intraspecific diversity in the C. concentrica holobiont
The definitions of taxonomic ranks below species for bacteria, archaea, and viruses are still under discussion and are largely study-dependent16,22,33. For practical purposes, we here define “strain-level diversity” as the genetic difference between bacterial, archaeal, or viral genomes sharing average nucleotide identity (ANI) higher than 95% but lower than 99%. The lower limit of 95% is taken as commonly suggested by the scientific community, including in a recently published review16, whereas the upper limit of 99% is generally recommended for distinguishing metagenome-assembled genomes (MAGs) when using Illumina short reads34. Accordingly, we define “intrastrain-level diversity” as single nucleotide polymorphism between cells or viral variants within the above defined “strain”.
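As an illustration only, the working definitions above can be written as a small Python helper. The function name and the handling of values exactly equal to 95% or 99% are assumptions made for this sketch; the text does not state how boundary values are treated.

```python
def classify_pair(ani_percent: float) -> str:
    """Classify a genome pair by average nucleotide identity (ANI, in percent).

    Thresholds follow the working definitions in the text:
    95% < ANI < 99% is treated as strain-level diversity.
    Boundary handling (exactly 95 or 99) is an assumption of this sketch.
    """
    if not 0.0 <= ani_percent <= 100.0:
        raise ValueError("ANI must be a percentage between 0 and 100")
    if ani_percent <= 95.0:
        return "different species"
    if ani_percent < 99.0:
        return "same species, different strains"
    # At or above 99% the genomes are treated as one strain; remaining
    # differences are intrastrain single-nucleotide variation.
    return "same strain"
```

Under these assumptions, a pair of MAGs sharing 97% ANI would be reported as different strains of one species, whereas a pair at 99.5% would be treated as one strain.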
From a subset of a local population of C. concentrica at Bare Island, Sydney, Australia, DNA samples of C. concentrica (CC) individuals CC1-CC7 were sequenced on the Illumina, Roche 454, and Oxford nanopore platforms, and RNA samples of individuals CC8-CC31 were sequenced on the Illumina platforms in this and our previous studies35,36,37 (Supplementary Data 1). The sequencing reads were used to reconstruct the genomes of the sponge host, the cellular microbial symbionts, and the viruses (Supplementary Fig. 1). The analysis of strain-level community composition and intrastrain-level nucleotide diversity was performed with Illumina sequencing reads of CC1-CC4 and CC8-CC28 (Supplementary Data 1).
In total, 37 bacterial and two archaeal non-redundant (pairwise ANI < 99%) MAGs (Fig. 1a and Supplementary Data 2) were generated through a customized pipeline combining short- and long-read sequences (Supplementary Fig. 1a). The average completeness and contamination of these MAGs were 80.70% ( ± 12.56%) and 0.94% ( ± 0.84%), respectively, as evaluated by CheckM38, and 80.63% ( ± 15.09%) and 2.31% ( ± 1.77%), respectively, as evaluated by CheckM239. We note that the completeness of some MAGs could have been underestimated. For example, one complete circular MAG (M411805 Nitrosopumilus) was estimated to be only 85.04% and 65.18% complete by CheckM and CheckM2, respectively (Supplementary Data 2).
a Phylogeny and abundance of archaeal and bacterial MAGs. Gray boxes show zero values. The maximum likelihood tree was constructed based on 40 single-copy marker proteins. The archaeal tree is rooted when the bacteria are treated as the outgroup, and the bacterial tree is rooted when the archaea are treated as the outgroup. The tree scale is shown. Branch supporting values ≥ 95% are shown as dots based on 1000 iterations of ultrafast bootstrapping of IQ-TREE. b Nucleotide diversity of the archaeal, bacterial, diatom, and sponge host MAGs. c Abundance of viral families. Gray boxes show zero values. d Nucleotide diversity of the viral families. For boxplots, data points of independent samples and the sample sizes are shown. The center lines represent median values. Box limits represent upper and lower quartiles. Whiskers represent 1.5 × interquartile range. Outliers are detected based on chi-squared scores using the R package outliers and are shown as dark red points. Source data are provided as a Source Data file.
The combination of long- and short-read sequencing enabled us to obtain a partial genome of the sponge host and a nearly complete genome of a symbiotic diatom Bacillariophyta (Fig. 1a). The other MAGs were classified into one archaeal and eight bacterial phyla. Read recovery analysis shows these MAGs were present in almost all sponge individuals but varied in their relative abundance.
Genetic analysis at single-nucleotide resolution of holobiont members identified substantial intrastrain genetic variations in sponge individuals. In individuals CC1-CC4, the nucleotide diversity (π) of cellular microbial symbionts ranged from 0 to 0.0065 (mean = 0.0028) (Fig. 1b and Supplementary Data 2). The highest values were observed in the bacteria M10 Rhizobiaceae (mean = 0.0052 ± 0.0016), M38 Pseudohongiellaceae (mean = 0.0037 ± 0.0013), and M11238 Alphaproteobacteria (mean = 0.0036 ± 0.0012), whereas the lowest were observed in M45 Bdellovibrionales (mean = 0.0017 ± 0.0010), M6398 Opitutales (mean = 0.0019 ± 0.0014), and M9 Verrucomicrobiales (mean = 0.0020 ± 0.0015). Moreover, intrastrain-level nucleotide diversities of the sponge hosts (mean = 0.0024 ± 0.0009) were close to the median of the microbial symbionts (0.0029), suggesting each sponge individual contains a population of genetically diverged sponge cells (Fig. 1b and Supplementary Data 2). A similar trend was observed in the metatranscriptomes of individuals CC8-CC28, where the π of the sponge ranked 23rd among all 41 MAGs (Fig. 1b).
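For readers unfamiliar with π, the sketch below shows one common per-site formulation: one minus the sum of squared base frequencies at a position, averaged over positions. It is a minimal illustration under that assumption and is not necessarily the exact formula or software routine used in this study; the function names and any read counts passed to them are hypothetical.

```python
from typing import Dict, Iterable


def site_diversity(base_counts: Dict[str, int]) -> float:
    """Per-site diversity as 1 - sum(f_b^2) over observed base frequencies."""
    depth = sum(base_counts.values())
    if depth == 0:
        raise ValueError("site has no mapped bases")
    return 1.0 - sum((count / depth) ** 2 for count in base_counts.values())


def mean_diversity(sites: Iterable[Dict[str, int]]) -> float:
    """Average the per-site values across all covered positions."""
    values = [site_diversity(counts) for counts in sites]
    if not values:
        raise ValueError("no sites supplied")
    return sum(values) / len(values)
```

With this formulation a position covered only by one base contributes 0, and a position split evenly between two bases contributes 0.5, so genome-wide means in the range reported here (around 0.002 to 0.005) correspond to a small fraction of positions carrying minor alleles.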
Sequences of 15,400 viral operational taxonomic units (vOTUs) clustered based on an ANI of 99% and 66 plasmids were obtained from metagenome and metatranscriptome assemblies of sponge tissue, “microbial cell fraction” (particle size > 0.22 and < 3 µm) and “virion fraction” (particle size < 0.22 µm) samples of C. concentrica individuals (Supplementary Data 1, Supplementary Data 3, and Supplementary Fig. 1b). Completeness assessment using CheckV40 found that 0.3% of vOTUs were complete, 1.3% were of high completeness, 0.9% were of medium completeness, and 10.3% were of low completeness. However, the completeness for the majority of vOTUs (87.1%) failed to be determined by CheckV. This proportion is much higher than those of seawater (3.3%)41 but close to those of deep-sea sediments (73.2%)42. Thus, like the deep-sea sediments, the C. concentrica holobiont may possess a substantial diversity of newly discovered viruses, whose completeness was difficult to evaluate with the current CheckV database. Supporting this hypothesis, only 933 (6.1%) vOTUs could be annotated to viral families. Double-strand DNA (dsDNA) viruses were the most abundant viral type, followed by single-strand DNA (ssDNA) and RNA viruses (Fig. 1c and Supplementary Data 3). The abundance of plasmids was generally low, ranging from 0 to 6.55 reads per kilobase per million mapped reads (RPKM) (Supplementary Data 3), consistent with previous reports on sponge holobionts43. The most abundant vOTUs were present in all sponge individuals with different relative abundances (Supplementary Fig. 2).
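The RPKM unit used for plasmid abundance above follows a standard normalization: read count scaled by sequence length in kilobases and by library size in millions of mapped reads. A minimal sketch is given below; the function and parameter names are chosen for illustration and do not come from the study's pipeline.

```python
def rpkm(mapped_reads: int, length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase per million mapped reads.

    mapped_reads: reads mapped to the sequence of interest
    length_bp: length of that sequence in base pairs
    total_mapped_reads: all mapped reads in the library
    """
    if length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    # Dividing by (length / 1e3) and (total / 1e6) is the same as
    # multiplying the read count by 1e9 and dividing by both raw values.
    return mapped_reads * 1e9 / (length_bp * total_mapped_reads)
```

For instance, 100 reads on a 2,000-bp sequence in a library of 10 million mapped reads gives an RPKM of 5.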
The intrastrain nucleotide diversities of vOTUs ranged from 0 to 0.1 (mean = 0.0032) in the metagenome of the microbial cell fractions of CC1-CC4, 0 to 0.16 (mean = 0.0019) in the metagenome of the virion fractions of CC1-CC3, and 0 to 0.16 (mean = 0.0009) in the metatranscriptome of the sponge tissue of CC8-CC28 (Supplementary Data 3). Such a level of intrastrain nucleotide diversity is even higher than the previously reported intraspecific nucleotide diversity of seawater viruses (approximately mean = 0.0004)44, suggesting faster evolutionary rates in sponge-associated viruses. The highest π values were observed in the viral families Podoviridae-like (single value = 0.0054), Pithoviridae (mean = 0.0048 ± 0.0008), and Pithoviridae (mean = 0.0039 ± 0.0064), whereas the lowest values were observed in the families Lipothrixviridae-like (mean = 0.0015 ± 0.0010), Caliciviridae-like (mean = 0.0002 ± 0.0002) and Siphoviridae-like (mean = 0.0002 ± 0.0006) in the metagenomes of the microbial cell fractions of CC1-CC4, the metagenomes of the virion fractions of CC1-CC3, and the metatranscriptome of the sponge tissue of CC8-CC28, respectively (Fig. 1d).
High strain-level host-specificity and restricted interspecific gene transfer in viruses associated with C. concentrica
Metagenomic HiC is an approach to retrieve physical proximity between DNA fragments and has the potential of linking extrachromosomal DNA like viruses and plasmids to their cellular hosts45. We identified 186 potential virus-host interactions based on their DNA proximity (140, through HiC-read pairing of individuals CC1-CC3), immune memory (33, through CRISPR spacer-protospacer matching), and/or provirus insertion (15, through virus-provirus matching) (Fig. 2 and Supplementary Data 4). No plasmid could be linked to any MAG, possibly due to the low abundance of plasmids found in the data (Supplementary Data 3). Substantial variation in the numbers and types of viral interactions was found among the MAGs — some MAGs and viruses were mainly linked through spacer-protospacer matching (e.g., M411805 Nitrosopumilus), whereas others by virus-provirus matching (e.g., M6398 Opitutales), or solely by HiC read pairing (e.g., M18 Mariniblastus). With two exceptions, all virus-host interactions were exclusively detected by only one method, highlighting the capability of the HiC approach in capturing virus-host interactions that are not immunized by CRISPR-Cas systems or in the form of proviruses. In addition, HiC was effective at discovering viruses of eukaryotic hosts, including the sponge host and the diatom, which are difficult to predict from metagenome sequences using only bioinformatic tools46.
Virus-host links based on CRISPR, provirus, and HiC evidence are shown as lines in different colors connecting MAGs and vOTUs. HiC-links are retrieved from the metagenomes of the tissue of individual CC1 and the microbial cell fractions of individuals CC1-CC3. The thickness of a HiC link reflects the number of HiC read pairs supporting the link. Genome recombination events between MAGs and vOTUs are shown as gray lines. Two MAGs with ANI > 95% and < 99% are connected by spiral lines.
Whereas the sponge host and many cellular microbial symbionts were infected by multiple viruses of different species (ANI < 95%), 94.9% (167 out of 176) of viruses each only infected one host strain (ANI < 99%) and 98.3% (173 out of 176) each only infected one host species (ANI < 95%), indicating a high degree of host-specificity for viruses in the C. concentrica holobiont (Fig. 2). Exceptions included a virus V957 linked to M10 and M11238, which both belonged to Alphaproteobacteria but shared a low genome identity (average amino acid identity, AAI = 58.5%), and V1226 and V6338, which were linked to two and three Verrucomicrobiota MAGs, respectively (AAI < 71.0%). Another six vOTUs were linked to two closely related Acidimicrobiales MAGs (AAI = 97.5%). Consistent with the paucity of broad-host-range viruses, we found very few (43) possible horizontal gene transfer (HGT) events between bacterial symbionts in the C. concentrica holobiont (Supplementary Fig. 3 and Supplementary Data 5). This HGT frequency was much lower than that recently reported for a seaweed surface biofilm using the same approach (i.e., 1.1 transfers per sponge microbial symbiont versus 2.1 transfers per seaweed microbial symbiont)47.
High intracellular viral activity and potential chronic infection
Many host-linked viruses of the C. concentrica holobionts maintained high intracellular abundance and activity, showing features of potential chronic infection. We used the read coverage of a virus in the tissue of CC1 relative to its bacterial or archaeal host in the tissue of CC1 (VT/HT ratio) to identify active viruses. We also used the read coverage of a virus in the microbial cell fractions of individuals CC1-CC4 relative to its bacterial or archaeal host in the microbial cell fractions of individuals CC1-CC4 (VC/HC ratio) to specify intracellularly active viruses. In total, 62.0% (67 out of 108) of viruses linked to bacteria or archaea were found either to be “active” (VT/HT > 2) or “intracellularly active” (VC/HC > 2) (Supplementary Fig. 4 and Supplementary Data 4). This proportion was approximately seven times higher than that reported in gut microbiomes45,48, suggesting intensive and ongoing microbe-virus interactions in the sponge holobiont. Of these, 28.7% (31 out of 108) were determined to be intracellularly active in at least one of the individuals CC1-CC4. The intracellular activity of viruses was generally supported by their transcriptional level in individuals CC8-CC28 (Supplementary Fig. 4 and Supplementary Data 4). It is likely that these viruses were maintaining a high number of genome copies in host cells without packing and releasing virions, indicating chronic infection. Of the links between intracellularly active viruses and their hosts, 58.8% (20 out of 34) were based on CRISPR matching, whereas only 15.1% (13 out of 86) of the links between non-intracellularly active viruses and their hosts were based on this method. This result implies that viruses with high intracellular genome copies are more likely to be captured by CRISPR-Cas adaptive immune systems than those maintaining low intracellular copies.
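The activity criteria described above reduce to two coverage ratios compared against a threshold of 2. The following sketch shows that logic only; the function names and the decision to return no ratio when the host has zero coverage are assumptions of this illustration, not details taken from the study.

```python
from typing import Optional, Set


def coverage_ratio(virus_cov: float, host_cov: float) -> Optional[float]:
    """Virus-to-host read coverage ratio, or None if the host is uncovered."""
    if host_cov <= 0:
        return None
    return virus_cov / host_cov


def activity_labels(vt_ht: Optional[float],
                    vc_hc: Optional[float],
                    threshold: float = 2.0) -> Set[str]:
    """Label a virus from its tissue (VT/HT) and cell-fraction (VC/HC) ratios."""
    labels: Set[str] = set()
    if vt_ht is not None and vt_ht > threshold:
        labels.add("active")
    if vc_hc is not None and vc_hc > threshold:
        labels.add("intracellularly active")
    return labels
```

A virus with three times the coverage of its host in tissue but half the host coverage in the cell fraction would receive only the "active" label under these rules.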
Host impacts on the strain-level compositions of the C. concentrica microbiomes
The individuals of C. concentrica investigated here contained structured genetic backgrounds. A pattern resembling interspecific phylosymbiosis (i.e., that the “microbial community relationships recapitulate the phylogeny of their host”49) was observed between sponge genetics and the strain-level compositions of both the sponge host viromes and the cellular microbiomes (Fig. 3a). Specifically, Illumina metagenomic sequences of individuals CC1-CC4 were used to extract the gene sequences of ribosomal protein S6e of the sponge host (Supplementary Fig. 5). Phylogenetic analysis of this protein shows that CC1 and CC3 were genetically closer to each other than to CC2 and CC4, whereas CC4 was genetically distant from the other three individuals (Fig. 3a). The community composition (beta diversity) of viruses infecting the sponge host largely mirrored the genetic heterogeneity of the latter (Robinson-Foulds test, dist = 0, P = 0.042), indicating a strong host influence (Fig. 3a). Similarly, the clustering of CC1-CC4 based on dissimilarity of their cellular microbiome composition was consistent with the genetic relationship of their respective sponge hosts (Robinson-Foulds test, dist = 0, P = 0.042).
Illumina metagenomes of microbial cell fractions of individuals CC1-CC4 are used for analysis. a The analysis for phylosymbiosis-like patterns between sponge host genetics and community composition of the sponge host viromes (HiC-linked), the cellular microbiomes, and the viromes of cellular microbial symbionts (HiC-linked). The maximum likelihood tree of the sponge host is inferred from the consensus sequences of sponge ribosomal protein S6e in sponge individuals. The tree scale is shown. The tree is rooted to CC4 as its sequence shows the largest deviation from the others. Branch support value is based on 1000 iterations of ultrafast bootstrapping in IQ-TREE. The community compositions are clustered based on Bray-Curtis dissimilarity using the average method. The information on virus-host links is based on the results of Fig. 2. RF, Robinson-Foulds test (one-sided test). b The analysis for phylosymbiosis-like patterns between the popANI relationships of bacterial symbionts and the community compositions of their viromes. The virome compositions are clustered based on Bray-Curtis dissimilarity using the average method. RF, Robinson-Foulds test (one-sided test). c and d popANIs (c) and SGCs (d) of the sponge host viromes, cellular microbial symbionts, and their viromes between sponge individuals in comparison with the phylogenetic distance of the sponge host. For boxplots, data points of independent samples and the sample sizes are shown. The center lines represent median values. Box limits represent upper and lower quartiles. Whiskers represent 1.5 × interquartile range. Outliers are detected based on chi-squared scores using the R package outliers and are shown as dark red points. e and f Consensus trees of sample clusters based on popANIs (e) and SGCs (f) of cellular microbial symbionts (n = 33). Source data are provided as a Source Data file.
In comparison, such consistency with the sponge individual relationships was not found in the viromes of cellular microbial symbionts: individuals CC1 and CC4 clustered together, as did CC2 and CC3 (Robinson-Foulds test, dist = 1, P = 1, Fig. 3a). To further test the potential impact of cellular microbial symbionts on their virome compositions, we clustered sponge individuals based on the population ANIs (popANIs) or shared genome coverages (SGCs)22 of their microbial symbionts. These two values were used as a proxy for the genetic relationships of these microbial symbionts between sponge individuals. The higher the values, the more similar the genetic backgrounds of the microbial cell populations in two sponge individuals. Of the four bacterial symbionts studied, each with over eight HiC-linked vOTUs, two Acidimicrobiales had strain-level virome compositions that were shaped by their genetic patterns, as indicated by popANI-based clustering (Robinson-Foulds test, dist = 0, P = 0.042), whereas in the other two this influence was weaker (Fig. 3b). No significant host impact was found by SGC-based clustering (Supplementary Fig. 6).
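The figure legend states that community compositions were clustered on Bray-Curtis dissimilarity. As a reminder of what that measure computes, a minimal sketch for two abundance profiles is given below. It covers only the pairwise dissimilarity, not the subsequent average-linkage clustering or the Robinson-Foulds comparison, and the function name is chosen for illustration.

```python
from typing import Sequence


def bray_curtis(a: Sequence[float], b: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity between two non-negative abundance profiles.

    Both profiles must list the same taxa (e.g. vOTUs or MAGs) in the same order.
    Returns 0 for identical profiles and 1 for profiles sharing no taxa.
    """
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("both profiles are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / total
```

Computing this value for every pair of sponge individuals yields the distance matrix on which a hierarchical clustering, such as the average method named in the legend, can then be run.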
Intrastrain-level genetic variations between sponge holobionts
We further investigated the intrastrain-level genetic diversity of the cellular microbial symbionts and viruses between sponge individuals CC1-CC4 by using the Illumina reads of their microbial cellular fractions (Supplementary Data 1). For each MAG or vOTU, which represents strain-level classification, its nucleotide variation (i.e., intrastrain-level heterogeneity) between sponge individuals was compared by calculating the popANI or SGC values.
Although the four sponge hosts showed clear heterogeneity in phylogenetic distance, the intrastrain nucleotide diversities of their associated microorganisms were generally highly similar, as shown by a median popANI value of 1 for sponge host viruses, values between 0.99979 and 0.99992 for cellular microbial symbionts, and a value of 1 for viruses of cellular microbial symbionts (Fig. 3c). The correlations between sponge host phylogenetic distance and average popANI values were not significant for sponge viruses, cellular microbial symbionts, or viruses of cellular microbial symbionts (Supplementary Fig. 7a). The inconsistency between the phylogenetic distances of sponge hosts and the nucleotide diversities of their microbial cells and viruses is further supported by SGC-based analysis (Fig. 3d and Supplementary Fig. 7b). These results suggest that host selection and/or isolation signals in the intrastrain-level nucleotide variation between sponge individuals were weak.
A more specific analysis identified a few cases of host impact on intrastrain-level nucleotide variation of cellular microbial symbionts or viruses. For example, in only one of the 33 analyzed cellular microbial symbionts, including the diatom, symbiont popANI reflected sponge host phylogenetics (Robinson-Foulds test, dist = 0, P = 0.042), whereas in the others such coherence was not supported (Supplementary Data 6). For SGC patterns, this ratio was five in 33. The topologies of sponge individual clusters based on the popANIs or SGCs of the 33 cellular microbial symbionts varied greatly. Thus, a consensus tree of these clusters with strong branch supports (> 80%) was not obtained (Fig. 3e, f). Furthermore, at single nucleotide resolution, the popANI patterns of nine in 35 bacteriophages reflected the popANI structures of their specific bacterial hosts (Robinson-Foulds test, dist = 0, P = 0.042) (Supplementary Data 6). For SGC patterns, this ratio was six in 35. Common topology in popANI- or SGC-based sponge individual clustering was also rare between viruses of the same bacterial host. Strong branch supports (> 80%) were only found in parts of the consensus trees for the viruses of M10 Rhizobiaceae (popANI, Supplementary Fig. 8a), M5 Acidimicrobiales and M61 Acidimicrobiales (SGC, Supplementary Fig. 8b).
The intensive immune-related arms race between holobiont members
Our previous studies found enriched defense systems against foreign DNA in sponge microbiomes, such as CRISPR-Cas and restriction-modification (R-M) systems, in comparison to planktonic microbiomes30,50. This finding was confirmed and expanded here by analyzing more recently identified defense systems, such as Wadjet, Zorya, and the cyclic-oligonucleotide-based anti-phage signaling system51,52 (Fig. 4a). The most prevalent systems were R-M, followed by CRISPR-Cas and Abi, and they were actively transcribed in sponge samples (Supplementary Fig. 9a). In total, 56 CRISPR spacer arrays were identified in bacterial and archaeal MAGs, of which eight contained 50 to 100 spacers and three contained over 100 spacers (Supplementary Data 7), indicating intensive ongoing battles between the adaptive CRISPR-Cas systems of the microbial symbionts and viruses in the C. concentrica holobiont. For example, M411805 Nitrosopumilus encoded a type I CRISPR-Cas system containing 158 spacers, to which ten vOTUs found from the C. concentrica holobiont could be matched, showing successful immunity by this CRISPR-Cas system (Fig. 2 and Supplementary Data 7). Supporting this finding, only one virus with ongoing infection was found for this archaeon by HiC-links. In comparison, M6398 Opitutales might benefit from homoimmunity by harboring proviruses that inhibit secondary infections by closely related viruses25.
a Abundance of defense systems against mobile genetic elements in archaeal and bacterial symbionts. Gray boxes show zero values. Cbass, cyclic-oligonucleotide-based anti-phage signaling system. The phylogenetic trees are the same as the ones in Fig. 1a. b Presence of proteins encoding immune- and transposase-related functions in host-linked bacterial and archaeal viruses. Detailed information is in Supplementary Data 8.
In comparison, many bacterial and archaeal viruses also encode a variety of immune system-related proteins, which might contribute to viral immune evasion or to the host’s immune ability against other viruses53,54. These genes were also actively transcribed, as shown by the metatranscriptome data (Supplementary Fig. 9b). R-M-related DNA methyltransferases or methylases were frequently encoded, especially in viruses of the abundant bacterial group Acidimicrobiales (Fig. 4b and Supplementary Data 8). These proteins might help viruses counteract the R-M systems of their bacterial hosts by protecting their DNA from restriction digestion54. In addition, viruses can encode CRISPR-Cas systems to counteract the defense island of the host55. One CRISPR-associated Cas1 and two Cas2 proteins were encoded by the virus V1128 infecting M1 Verrucomicrobiales. This virus was matched by 72 spacers of the CRISPR-Cas system of M1 and was active in individual CC1, suggesting a balanced coexistence and an ongoing arms race between immune protection by the CRISPR-Cas system of Verrucomicrobiales and immune evasion by the virus (Supplementary Fig. 4).
Holobiont-specific intrastrain genetic diversity driven by the virus-host arms race
Arms races between viruses and the immune systems of microorganisms are a primary driver of genetic variation in both viral and cellular host populations. Based on the nucleotide diversity data of individuals CC1-CC4, we found that 31 and 28 genes encoding proteins potentially involved in virus-host interactions, such as CRISPR-associated proteins, integrases, methylases, and methyltransferases, were under negative frequency-dependent selection56 (maximum pN/pS ratio > 1) in prokaryotic symbionts and viruses, respectively (Supplementary Data 9). Further analyses revealed heterogeneity of nucleotide mutation and genome structure variation in prokaryotic symbionts and their viruses between sponge individuals directly caused by immune interactions. For example, site mutation is a mechanism for viruses to evade restriction digestion by microbial hosts and consequently causes genome variations in viral populations54. We found frequent mutations in recognition sites of viral DNA targeted by host R-M systems (Fig. 5a). Of the recognition sites in host-linked viruses, 86.2% (25 out of 29) were found mutated in at least one sponge individual of CC1-CC4. These individuals also varied in the proportion of mutated recognition site loci and the degree of base mutation (Fig. 5a).
a Mutation proportion and degree at restriction recognition sites in genomes of prokaryotic host-linked viruses in the metagenomes of microbial cell fractions of individuals CC1-CC4. Mutated proportion is calculated by dividing the number of mutated loci by the number of all recognition loci found in the viruses. The mutation degree is calculated by summing up the proportion of mutated bases in all recognition sites found in viruses. Gray boxes show viruses with recognition sites having average mapped bases of less than 100, which are excluded from the analysis. b Read coverage of bacterial contigs containing CRISPR arrays in the metagenomes of microbial cell fractions of individuals CC1-CC4. The array regions are shown with a shaded background. Sequencing depth is the number of bases mapped to each site. c The Spearman correlation (two-sided test) between intrastrain nucleotide diversity of viruses and their bacterial/archaeal hosts in their entire genomes in the metagenomes of microbial cell fractions of individuals CC1-CC4. d The Spearman correlation (two-sided test) between nucleotide diversity of viruses and their bacterial/archaeal hosts in their recombination regions in the metagenomes of microbial cell fractions of individuals CC1-CC4. A sequence identity threshold of 95% is used in the mapping step of inStrain to ensure read coverage. e Genomic map of the transposable virus V957 infecting the bacterium M10 Rhizobiaceae and M11238 Alphaproteobacteria. f Genome recombination regions (shaded sections) between V63657 and M6 Arenicellales, V36967 and M6398 Opitutales, and V4473 and M8024 Alphaproteobacteria. Source data are provided as a Source Data file.
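The two summary values defined in panel a can be sketched as follows. This is one possible reading of the legend, not the authors' implementation: each recognition locus is represented here as a list of per-base mutated fractions (the share of mapped bases at that position that differ from the reference), a locus counts as mutated if any of its bases has a non-zero fraction, and the coverage filter of 100 mapped bases is not modeled. The data layout and function names are hypothetical.

```python
from typing import List, Sequence

# One recognition locus = per-base mutated fractions, each between 0 and 1.
Locus = Sequence[float]


def mutated_proportion(loci: List[Locus]) -> float:
    """Number of mutated loci divided by the number of all recognition loci."""
    if not loci:
        raise ValueError("no recognition loci supplied")
    mutated = sum(1 for locus in loci if any(frac > 0 for frac in locus))
    return mutated / len(loci)


def mutation_degree(loci: List[Locus]) -> float:
    """Sum of the mutated-base fractions over all recognition loci."""
    return sum(frac for locus in loci for frac in locus)
```

Under this reading, the proportion reports how many recognition loci are affected at all, while the degree also reflects how far each mutation has spread through the viral population.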
Differences were apparent in the length and content of CRISPR arrays in populations of bacterial symbionts from sponge individuals CC1-CC4, as indicated by analyzing the distribution of sequencing depth over contigs containing CRISPR arrays calculated from mapped reads. For example, the CRISPR arrays in contigs M1_10 matching virus V1128 and M1_96 matching viruses V1128 and V6338 of M1 Verrucomicrobiales were mostly recovered by reads of CC1, whereas the array in contig M1_20 was mostly recovered by reads of CC2 and CC3 (Fig. 5b). Moreover, half of the CRISPR array in contig M6_3 matching virus V3791 was mapped exclusively by reads of M6 Arenicellales in CC2 and CC3, but the other half was also found in CC1 (Fig. 5b). This variation in CRISPR array configuration between sponge individuals reflects the holobiont-specific co-evolutionary history of microbial symbionts and viruses, and points to a mechanism for generation of genetic variants across sponge individuals.
Genome recombination and coupled nucleotide diversity between viruses and their hosts likely promoted by virus-encoded transposases
In virus-host pairs, nucleotide diversity of the paired genomes was positively correlated, suggesting coupled generation of genetic diversity in their long-term co-evolution in the C. concentrica holobiont (r = 0.58, P = 9.66e-33) (Fig. 5c). Although immune interactions, as indicated above, may explain this observation, exchange of genome fragments between viruses and their cellular hosts could have also directly generated linked genomic nucleotide diversities. Indeed, 61.9% (109 out of 176) of vOTUs showed sequence similarity to their hosts (alignment length ≥ 300 bp, identity ≥ 90%), including 14 matched to proviruses (Fig. 2). The virus-host coupling in nucleotide diversity was stronger in these recombination regions than in the entire genomes (r = 0.65, P = 3.17e-16) (Fig. 5d). These prevalent remains of DNA recombination events in the genomes of viruses and their specific hosts likely recorded their long history of DNA exchange. Of the recombination activities, 90.1% were detected within each strain-level virus-host pair, suggesting that such genome communication was restricted to intrastrain levels (Fig. 2) while interspecific HGT was largely blocked (Supplementary Fig. 3). Moreover, virus-host recombination was found only in HiC- or provirus-inferred virus-host pairs and not in CRISPR-inferred pairs, implying that recombination activity was restricted between viruses and immunized hosts (Fig. 2).
Abundant transposases detected in the genomes of sponge-associated symbionts and viruses here and in previous studies50,57 may explain frequent recombination events (Fig. 4b and Supplementary Data 8). Viruses carrying transposons may jump into the host genomes upon infection58 and inadvertently bring adjacent viral genomic sequences into the host genomes. Similarly, host genomic sequences may be carried to viral genomes through transposon activity. Multiple transposases, such as DDE and Mu-like prophage FluMu transposases, were found in the viruses interacting with symbionts of the taxa Verrucomicrobiales, Acidimicrobiales, and Alphaproteobacteria (Fig. 4b). Specifically, the virus V957 infecting the abundant bacterium M10 Rhizobiaceae and M11238 Alphaproteobacteria was identified as a transposable phage (Fig. 5e). In addition, REP-associated tyrosine transposase was found at the recombination regions of M6398 Opitutales and its virus V36967 (Fig. 5f and Supplementary Data 10). Insertion sequences belonging to the IS66 and IS1380 families were identified at the recombination regions of M8024 Alphaproteobacteria and its virus V4473, and M6 Arenicellales and its virus V63657, respectively.
Discussion
The organization of holobiont microbiomes is a complex process, with many factors potentially driving community assembly and composition3,4. Among these factors, host impacts are often dominant, reflected by congruence between the host’s genetic heterogeneity and microbiome composition dissimilarity, namely phylosymbiosis49. However, the host’s genetic signal can often be confounded by factors such as diet and habitat preference, especially when conspecific host populations with small-scale genetic differences are studied59,60. In this study, individuals of the sponge C. concentrica from the same location were investigated to control for variations in host metabolism and geography. Sponge individuals with different levels of genetic variation were sampled to assess the strength of host impact on microbial and viral community compositions. In line with previous studies based on 16S rRNA genes15,59,60, we detected a strong and significant phylosymbiosis-like pattern between host phylogeny and the strain-level composition of cellular microbiomes (Figs. 3a, 6). Moreover, few studies have reported phylosymbiosis between a host and its virome17,61. Here we used HiC metagenomics to identify virus-host links and detected various levels of impact from the cellular hosts on the strain-level compositions of the viruses that infected them. Like the cellular microbiomes, the congruence between sponge host phylogeny and the composition of viruses infecting the sponge host was strong in individuals CC1-CC4 (Fig. 3a). In contrast, congruence between the genetics of prokaryotic symbionts and their viromes was found to be inconsistent, suggesting that the virome composition of the C. concentrica holobiont might be influenced by both their cellular hosts and other unknown factors (Fig. 3b and Supplementary Fig. 6). However, we acknowledge that this study would benefit from a larger sample size by recruiting more individuals of the local sponge population. We consider this an important objective for future research.
At the strain level (ANI < 99% MAGs and vOTUs as defined in this study), the community composition of the microbial symbionts and sponge viruses is shaped by the genetic heterogeneity of the sponge hosts. At the intrastrain level (ANI > 99% MAGs and vOTUs down to single cells or viruses), this host impact generally disappears, and microbial cells and viruses are frequently exchanged between sponge hosts. The specificity between viruses and their cellular microbial hosts in sponge holobionts is preserved at the strain to species levels, and interspecific HGTs are generally blocked because of strong immunity. At the intrastrain level, interactions and coevolution between virus-host pairs are frequent, resulting in holobiont-specific genetic diversity.
The extent of vertical transmission of microbial symbionts is an essential parameter when testing the mode of selection in holobiont evolution1,62,63. However, the methodology is known to affect predictions of vertical transmission13,14,15. We investigated potential patterns of transmission for both cellular microbiomes and viromes between C. concentrica individuals at single nucleotide resolution. Most cellular microbial symbionts and viruses shared a high overlap in nucleotide diversity between sponge individuals with various genetic distances, suggesting an overall genetic decoupling between the sponge host and other holobiont members at the intrastrain level (Fig. 3c, d). This could be best explained by a scenario in which symbiotic microbial cells and viral particles are frequently exchanged between sponge individuals of the local population (Fig. 6). Detailed analysis of each MAG supports this notion: only a small portion of microbial symbionts showed a signal of vertical transmission (Supplementary Data 6), consistent with the long-standing speculation that sponge holobionts acquire their symbionts through a mix of vertical and horizontal transmission10,64. Furthermore, it is reasonable to expect that viruses are transmitted along with their specific bacterial or archaeal hosts between holobionts. However, whereas strain-level virus-host specificity was consistent between sponge holobionts in our study, nucleotide diversity analysis of each MAG and its linked viruses showed only a small number of co-transmission cases (Supplementary Data 6). This suggests a shared pool of genetic variations for most microbial symbionts and viruses in the local sponge population (Fig. 6). While their infectious specificity was maintained, microbial cells and viruses were thus likely assembled among sponge individuals following independent rules. Many bacterial vOTUs had very high levels of genetic overlap (up to 100% popANI and 99.5% SGC) between sponge individuals (Fig. 3c, d). These potentially shared viral variants thus showed epidemic-like characteristics in the local C. concentrica population.
At the strain level, 94.9% of viruses specifically infected a single host, suggesting strict host specificity (Fig. 2). In contrast, recent HiC metagenome studies in other environments generally found lower host specificity. For example, host specificity was found in 70% of viruses in human guts at the species level and in 68% of viruses in wastewater at the genus level45,65, whereas 7.1% and 13.9% of viral species in hydrothermal mats were found to infect only one host based on CRISPR-matching and HiC-ligation, respectively66. In addition, HGT frequency between archaeal and bacterial species in C. concentrica was very low, again reflecting the lack of multi-host viruses in the sponge holobiont (Supplementary Fig. 3). Limited HGT by the prokaryotic microbial symbionts of C. concentrica is also supported by the absence of plasmids linked to any MAG. In comparison, much higher gene exchange activity occurs in animal guts67, which are considered melting pots of HGT68. Intensive immune interactions during the evolution of viruses and cellular hosts may have contributed to their high host specificity (Fig. 6). As sponges continuously pump large amounts of seawater containing diverse viruses through their bodies10,64, microbial symbionts living in highly dense communities need to combat high levels of viral attack and have thus developed powerful anti-viral systems under this long-term selective force (Fig. 4a). As a side effect, it is likely that these defense systems limit HGT across species of cellular microbial symbionts by suppressing the existence of plasmids and broad-spectrum viruses. In turn, sponge-associated viruses have developed mechanisms to evade host immunity, for example, by encoding methyltransferases (Fig. 4b). The balance between defense and anti-defense systems of microbial symbionts and their viruses resulting from their long-term interactions and coevolution may have led to viruses maintaining a chronic infection lifestyle with high intracellular viral copies69, as exemplified by M9 Verrucomicrobiales and its viruses V6338, V1226 and V4379 (Supplementary Fig. 4). Long-term exposure between the viruses and the host immune systems iteratively evolves new resistances and infection targets70 and may also drive the high host specificity of viruses associated with C. concentrica.
While C. concentrica-associated viruses showed strict host specificity at the strain level, their transmission and evolution at the intrastrain level were substantial in local sponge populations, forming shared (as discussed above) and holobiont-specific genetic patterns, respectively (Fig. 6). Specifically, we identified the involvement of at least three forms of mutation generation by virus-host interaction. First, randomly generated point mutations during replication in the restriction recognition sites of viral genomes were selected to evade digestion by host restriction endonucleases. Heterogeneous mutation patterns in these sites were identified in the viromes between sponge individuals (Fig. 5a). Second and on the host side, microorganisms encoding CRISPR-Cas systems acquired viral DNA fragments and developed specific spacer arrays reflecting the holobiont-specific history of viral attack. This process generated holobiont-specific genome structural variations in microbial symbionts between sponge individuals (Fig. 5b). Third, viruses frequently exchanged genome fragments with their highly specific cellular hosts by recombination during their prolonged intrastrain-level coevolutionary history (Fig. 2). This process might have linked the genetic diversification of the viruses and the cellular hosts (Fig. 5d). Supporting this speculation, genome recombination was exclusively found in HiC- and provirus-linked virus-host pairs but not in CRISPR-linked ones (Fig. 2). We propose that the intracellular recombination activities of viruses targeted by host CRISPR-Cas systems might be suppressed and consequently these viruses had little chance for genetic exchange with their immunized host during their coevolutionary history. Moreover, we posit that the activity of the highly abundant transposases, including the virus-encoded transposases identified here (Fig. 5e, f), may have driven these recombination events, which would have promoted genome reorganization and increased genetic diversity within the symbiont population71.
In summary, using the marine sponge C. concentrica as a model, we revealed the intraspecific organization and evolution of microbiomes in a subset of its local population. We found evidence for the sponge host’s impact on the strain-level composition of both the cellular microbiomes and the sponge host viromes and observed that intrastrain-level transmission of microbial cells and viruses was likely frequent between sponge individuals. Strict specificity between viruses and their cellular hosts was identified at the strain level, and virus-host coevolution was restricted to the intrastrain level within these virus-host pairs. Immune and evasive interactions were likely responsible for establishing this specificity and simultaneously drove holobiont-specific genetic variation in both the cellular hosts and the viruses. These results provide a structured map for understanding the ecological and evolutionary processes occurring within marine holobionts.
Methods
Overview of experiment design and sample processing
Sponge samples were collected under permit FP23/6, and the guidelines within, issued by the Department of Primary Industries of the New South Wales State Government. No ethical approval is required by the University of New South Wales’s institutional policies when sampling invertebrates. The geographic range of the sampling site was in the order of 2500 m², and there were around 200 individuals of the sponge C. concentrica in a patchy distribution. In total, data of 31 sponge individuals as a subset of the local population were analyzed (Supplementary Data 1). Individuals CC1-CC3 and CC7 were collected and sequenced in this study, and sequencing reads of all other individuals were generated in our previous studies35,36,37. Oxford Nanopore sequences of sponge tissue DNA of CC7, Roche 454 sequences of microbial cellular DNA of CC4-CC6, and paired-end (CC1-CC3) and mate-pair (CC4) Illumina sequences of microbial cellular DNA of CC1-CC4 were used to generate MAGs. All but the Roche 454 DNA data, plus the Illumina sequences of sponge tissue total RNA of CC8-CC31, were used to generate viral contigs. All Illumina data except those of CC29-31 were used for MAG and vOTU abundance analyses. All Illumina data except those of CC29-31 and sponge tissue of CC1 were used for nucleotide diversity analyses. HiC sequencing information of CC1-CC3 was used in reconstructing the virus-host network.
Sponge sampling and processing
The samples of the sponge C. concentrica were collected at Bare Island, Sydney, Australia at a depth of 8 m in July 2020. Phenotypically healthy sponge individuals were randomly sampled within a radius of 20 m. Parts of the sponge tissue were directly snap-frozen in liquid nitrogen on-site, while the remaining samples were transported in ambient seawater to the laboratory (about 20 min drive), and then directly processed to obtain the microbial cell fractions and virion fractions. Specifically, 50 g of sponge tissue was washed three times in sterile calcium- and magnesium-free seawater on a platform rocker for 3 min, homogenized in 50 ml of sterile calcium- and magnesium-free seawater and subsequently filtered through a 125-µm pore-sized sieve. The filtrate was centrifuged twice at 300 × g and 4 °C for 15 min. The supernatant was collected and filtered twice using 11-µm and then 3-µm pore-sized filters. The final filtrate was centrifuged at 15,000 × g and 4 °C for 15 min to pellet bacterial and archaeal cells. Ice-cold polyethylene glycol 8000 (Sigma-Aldrich) was added at a final concentration of 6% v/v to the remaining supernatant, which was then incubated overnight at 4 °C. The mixture was then centrifuged at 76,000 × g and 4 °C for 30 min to collect the virion pellets, which were immediately snap-frozen in liquid nitrogen for further processing.
DNA sequencing of the sponge tissue and the microbial cell fraction
For nanopore sequencing, total DNA was extracted from the tissue of individual CC7 (randomly selected) using the Blood and Cell Culture DNA mini kit, genomic-tip 20/G (Qiagen). A nanopore library was constructed using the SQK-LSK110 kit (Oxford Nanopore Technologies, City, UK). Metagenomic sequencing and base calling were performed on the GridION platform, and low-quality reads (scores < 7) were removed. For HiC metagenome sequencing, snap-frozen tissue of individual CC1 and enriched microbial cell fractions from individuals CC1, CC2, and CC3 were used to generate the HiC DNA libraries following the standard ProxiMeta protocol developed by Phase Genomics (Seattle, WA, USA). Metagenomic sequencing was conducted on the Novaseq 6000 platform (Illumina) at Shanghai Biozeron Biotechnology Co., Ltd (Shanghai, China).
DNA Sequencing of the virion fraction
The virion fractions from CC1, CC2, and CC3 were processed for DNA extraction following a previously described SDS method72. Extracted viral DNA was then amplified with the REPLI-g Cell WGA & WTA Kits (150054, Qiagen, Hilden, Germany). Illumina sequencing libraries were generated for the amplified viral DNA using the NEBNext Ultra™ DNA Library Prep Kit for Illumina (New England Biolabs, City, MA, USA). Sequencing was performed on the Novaseq 6000 platform (Illumina) at Magigene (Guangzhou, China). For the same amplified viral DNA, nanopore PromethION libraries were constructed using the SQK-LSK109 kit (Oxford Nanopore Technologies, City, UK). Sequencing was performed on the PromethION platform (Oxford Nanopore) at Magigene (Guangzhou, China), and low-quality reads of scores < 7 were removed.
Read processing
End N/Xs of raw reads were trimmed by using ngsqctoolkit (v.2.3.3, -t5 -t3 -n 75)73. Illumina adapters and low-quality bases were trimmed, and reads with an average Q score below 25 or shorter than 75 nt were excluded using Flexbar (v.3.5.0)74 and Skewer (v.0.2.2)75. For Illumina HiC paired-end reads, the 3’-end of the ligation site in each read was removed.
Metagenome assembly, MAG generation and characterization
The assembly and binning processes are summarized as a flowchart in Supplementary Fig. 1a. Specifically, Illumina mate-paired reads were assembled using metaSPAdes (v.3.15.3)76 with default setting. Illumina HiC paired-end reads were assembled using metaSPAdes with default setting and MEGAHIT (v.1.1.3, --presets meta-sensitive)77. Illumina contigs longer than 1 K nt were processed along with nanopore reads for binning using BASALT (v.1.0.0, --autopara more-sensitive --refinepara deep)78. In parallel, Illumina contigs were pooled with the contigs of 454 sequencing that we generated previously50 and processed through CDHIT (v.4.8.1, -c 1)79 to remove redundant sequences. The non-redundant contigs were further assembled with nanopore reads using Opera-MS (v.0.9.0)80 with the default setting. Nanopore reads were also separately assembled using Flye (v.2.9, -meta)81. These contig sets were then pooled and processed through CDHIT (-c 1), and polished by using Illumina reads with Nextpolish (v.1.4.0) with default setting82. Polished contigs were processed for binning using VAMB (v.3.0.7)83 with the default setting. Bins generated in the above two independent processes were pooled with polished nanopore contigs longer than 500 K nt and analyzed using both CheckM (v.1.1.3, lineage_wf) and CheckM2 (v.1.0.1) with the default setting. The contamination values from CheckM were adjusted by excluding duplicated markers with AAI no less than 95%, which were expected to come from the same species84 (adjusted contamination = contamination × (100 - heterogeneity) / 100). Bins with completeness over 50% and contamination lower than 5% evaluated by both CheckM and CheckM2 were pooled with our previously obtained MAGs28,37, with an exception of the dominant MAG M8024, which has a contamination value of 0.87 and 7.88 as evaluated by CheckM and CheckM2, respectively (Supplementary Data 2). dRep (v.2.6.2, -c 0.99)34 was used to obtain the final prokaryotic MAGs. 
Bins with a length over 2.5 Mb and having more than 90% eukaryotic content as estimated with Tiara85 with the default setting, were identified as eukaryotic MAGs. Prokaryotic and eukaryotic MAGs were further decontaminated using ACDC86 with default setting and their qualities were finally assessed using CheckM, CheckM2, and BUSCO (v.5.3.2)87 with default setting.
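The contamination adjustment and the bin-retention thresholds described above can be illustrated with a short Python sketch. This is not the authors' code; the function names and the dictionary keys (`completeness`, `contamination`, `heterogeneity`) are hypothetical, and it assumes that the adjustment is applied only to the CheckM estimate, using CheckM's strain heterogeneity value in percent.

```python
def adjusted_contamination(contamination: float, heterogeneity: float) -> float:
    """Adjusted contamination = contamination x (100 - heterogeneity) / 100.

    Both arguments are percentages as reported by CheckM (assumption).
    """
    return contamination * (100.0 - heterogeneity) / 100.0


def passes_thresholds(completeness: float, contamination: float) -> bool:
    """Completeness over 50% and contamination lower than 5%."""
    return completeness > 50.0 and contamination < 5.0


def keep_bin(checkm: dict, checkm2: dict) -> bool:
    """Retain a bin only if it passes under both CheckM and CheckM2.

    Assumption: the heterogeneity adjustment is applied to the CheckM
    value only; CheckM2 contamination is used as reported.
    """
    adj = adjusted_contamination(checkm["contamination"], checkm["heterogeneity"])
    return (passes_thresholds(checkm["completeness"], adj)
            and passes_thresholds(checkm2["completeness"], checkm2["contamination"]))
```

Under these assumptions, a bin with 8% raw CheckM contamination and 50% heterogeneity has an adjusted contamination of 4% and would be retained if its CheckM2 values also pass.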
Taxonomic classifications of prokaryotic and eukaryotic MAGs were conducted using GTDB-Tk (v.1.7.0)88 based on the GTDB taxonomy r202 database and CAT89 based on the NCBI NR database with default settings, respectively. A phylogenetic tree of prokaryotic MAGs was inferred based on 40 single-copy marker proteins90 using IQ-TREE (v.2.0, -m MFP -B 1000 -bnni)91. Pairwise ANI and AAI comparisons of MAGs were calculated using FastANI (v.1.33)92 and CompareM (https://github.com/dparks1134/CompareM) with default settings, respectively.
Viral genome assembly and classification
The bioinformatic workflow for vOTU generation and plasmid identification is shown in Supplementary Fig. 1b. Specifically, Illumina reads of metagenomes were assembled using metaSPAdes in the virus and plasmid assembly mode. Non-redundant contigs were further assembled with nanopore reads using Opera-MS with the default setting. Viral DNA nanopore reads were also individually assembled using Flye (-meta). The two contig sets were pooled, processed through CDHIT (-c 1), and polished by using Nextpolish with default setting. Illumina reads of metatranscriptomes were assembled using both Trinity (v.2.13.2)93 and metaSPAdes with the transcriptome mode. Contigs retrieved by the two methods were then pooled and processed through CDHIT (-c 0.99).
Putative viral contigs were identified from the above pooled metagenome and metatranscriptome contigs with lengths greater than 1.5 K nt (for circular) or 5 K nt (for linear) using VirSorter94, VirSorter295, VirFinder96, Deepvirfinder97, and CAT, similarly to the method previously described44. Briefly, contigs ≥ 5 K nt and sorted as VirSorter categories 1 and 2, and/or VirSorter2 score > 0.9 and/or VirFinder score > 0.9 and P < 0.05 and/or Deepvirfinder score > 0.9 and P < 0.05, and contigs ≥ 1.5 K nt sorted as “circular” by VirSorter were defined as viral genome fragments. Contigs that had over two records in the following four conditions were also classified as viral genome fragments: VirSorter category 3, VirSorter2 score 0.7–0.9, VirFinder score 0.7–0.9 and P < 0.05, Deepvirfinder score 0.7–0.9 and P < 0.05. Contigs that had only one record were further run through CAT, and only those having over 40% of genes classified as viral were added to the viral contig set. The identified viral contigs were then grouped into vOTUs by using anicluster40 based on 99% pairwise ANI and 85% sequence coverage (Supplementary Data 3, vOTUs with prefix V were from metagenomes, and with Vr were from metatranscriptomes). Plasmids were identified using Mash distance search against PLSDB98 with a max distance = 0.1 (Supplementary Data 3, prefix P).
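The tiered decision rules for calling viral contigs can be sketched in Python as follows. This is an illustrative sketch rather than the authors' implementation: the `ContigEvidence` fields are hypothetical names for per-contig tool outputs, the lower-confidence conditions are assumed to apply only to contigs ≥ 5 K nt, and the number of lower-confidence records required is exposed as the parameter `min_weak_hits` (default 2, an assumption, because the text contrasts "over two records" with "only one record").

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContigEvidence:
    """Per-contig tool outputs (hypothetical field names)."""
    length: int
    virsorter_category: Optional[int] = None
    virsorter_circular: bool = False
    virsorter2_score: float = 0.0
    virfinder_score: float = 0.0
    virfinder_p: float = 1.0
    dvf_score: float = 0.0
    dvf_p: float = 1.0
    cat_viral_gene_fraction: float = 0.0  # fraction of genes CAT calls viral


def is_viral(c: ContigEvidence, min_weak_hits: int = 2) -> bool:
    # Circular contigs >= 1.5 K nt flagged by VirSorter are accepted directly.
    if c.length >= 1500 and c.virsorter_circular:
        return True
    # Assumption: all remaining rules apply to contigs >= 5 K nt only.
    if c.length < 5000:
        return False
    # High-confidence evidence from any single tool.
    if (c.virsorter_category in (1, 2)
            or c.virsorter2_score > 0.9
            or (c.virfinder_score > 0.9 and c.virfinder_p < 0.05)
            or (c.dvf_score > 0.9 and c.dvf_p < 0.05)):
        return True
    # Count lower-confidence records across the four conditions.
    weak_hits = sum([
        c.virsorter_category == 3,
        0.7 <= c.virsorter2_score <= 0.9,
        0.7 <= c.virfinder_score <= 0.9 and c.virfinder_p < 0.05,
        0.7 <= c.dvf_score <= 0.9 and c.dvf_p < 0.05,
    ])
    if weak_hits >= min_weak_hits:
        return True
    # A single record is accepted only with CAT support (> 40% viral genes).
    if weak_hits == 1:
        return c.cat_viral_gene_fraction > 0.40
    return False
```

Setting `min_weak_hits=3` gives the stricter literal reading of "over two records".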
The completeness of the identified vOTUs was assessed using CheckV with default setting. Taxonomic classification of vOTUs with length ≥ 10 K nt was conducted using PhaGCN2 based on the ICTV virus taxonomy profile (2021)99. Taxonomy of the remaining vOTUs was defined at the family level if > 50% of the genome’s proteins were assigned to the same viral family using BlastP (v.2.8.1, identity > 30%, bit-score > 50, and e-value < 0.001)100 against the Viral RefSeq (release 201).
Abundance, activity and nucleotide diversity analysis of MAGs and vOTUs
rRNA and tRNA genes of MAGs and repeat regions of MAGs and viral contigs were masked using dustmasker (https://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/lxr/source/src/app/dustmasker/) with default setting. Illumina reads of the metagenomes and metatranscriptomes were mapped to these masked genomes using BWA-MEM (v.0.7.17, -5SP)101. Low-quality mappings were filtered using Samtools (v.1.9, -q 20, -F 2314)102 and Bamm (v.1.7.3, --percentage_id 0.99 --percentage_aln 0.9) (https://github.com/Ecogenomics/BamM). Mapped reads were counted using BBMap (v.38.73)103 with default setting and normalized to each genome size by calculating RPKM values. Gene expression levels in metatranscriptomes were also calculated as RPKM values. The community-wide differences of the microbiomes or viromes in sponge individuals were evaluated as Bray-Curtis dissimilarity and visualized as a cluster using the average distance method in the R package vegan104. The Robinson-Foulds test was used to compare the congruency of dendrogram topologies of sponge host genetics and microbial community composition using the R package phangorn (https://cran.r-project.org/web/packages/phangorn/index.html). Robinson-Foulds P-values were determined by the probability of 100,000 randomized bifurcating dendrogram topologies yielding equivalent or more congruent phylosymbiotic patterns than the tested dendrogram, as specified previously105.
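For readers unfamiliar with the two quantities used here, the following Python sketch shows the standard definitions of RPKM and Bray-Curtis dissimilarity. It is illustrative only and does not reproduce the BBMap or vegan implementations; in particular, the use of the total number of mapped reads per sample as the RPKM denominator is an assumption, as the text does not state which read total was used.

```python
def rpkm(read_count: int, genome_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of genome per million mapped reads.

    Assumption: total_mapped_reads is the number of reads in the sample
    that passed the mapping filters.
    """
    return read_count * 1e9 / (genome_length_bp * total_mapped_reads)


def bray_curtis(a: list, b: list) -> float:
    """Bray-Curtis dissimilarity between two abundance profiles.

    a and b hold abundances (e.g. RPKM) for the same genomes in the
    same order; the result ranges from 0 (identical) to 1 (disjoint).
    """
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator if denominator else 0.0
```

A pairwise matrix of such dissimilarities is what the average-linkage clustering of sponge individuals would take as input.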
Reads were further filtered to keep those with Q score ≥ 30 to conduct nucleotide diversity analyses. Intrastrain-level nucleotide diversity (π) and gene pN/pS ratio of sponge host, microbial symbionts and viruses were analyzed using inStrain (v.1.6.4) by setting mapping identity threshold to 99%. Nucleotide diversity was calculated based on the formula: π = 1 − [(frequency of A)² + (frequency of C)² + (frequency of G)² + (frequency of T)²]. The genetic overlap ratios between sponge individuals were calculated using inStrain with the compare mode. Nucleotide diversity overlaps of MAGs and vOTUs between sponge individuals were assessed using the popANI and SGC values generated with inStrain and relevant clustering of sponge individuals were obtained. The consensus trees of the clusters were generated using IQ-TREE (-con).
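The per-site nucleotide diversity formula given above, one minus the sum of the squared frequencies of A, C, G, and T, can be written as a short Python function. This is a minimal sketch of the formula only, not of the inStrain implementation; the input format (a dictionary of base counts at one position) and the choice to return 0.0 for an uncovered site are assumptions.

```python
def site_nucleotide_diversity(base_counts: dict) -> float:
    """Nucleotide diversity at one position: 1 - sum of squared base frequencies.

    base_counts maps 'A', 'C', 'G', 'T' to the number of reads supporting
    each base at the position (missing bases count as zero).
    """
    total = sum(base_counts.get(base, 0) for base in "ACGT")
    if total == 0:
        # Diversity is undefined without coverage; 0.0 is returned by convention here.
        return 0.0
    return 1.0 - sum((base_counts.get(base, 0) / total) ** 2 for base in "ACGT")
```

A fixed site gives 0, a site with two equally frequent bases gives 0.5, and the maximum of 0.75 is reached when all four bases are equally frequent.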
Putative coding genes of prokaryotic MAGs were predicted using Prodigal (v.2.6.3, -meta)106. Protein sequences of the genes were searched against the REBASE database (http://rebase.neb.com/rebase) using BlastP (e-value < 1e-5, identity ≥ 50%) to classify restriction enzymes. Their potential recognition sites were obtained from the database. Metagenomes of microbial cell fractions of individuals CC1-CC4 were selected for site mutation calculation as viruses in these fractions had sufficient sequencing depth for nucleotide diversity analysis. The recognition sites were then searched in host-linked viruses and the base mutation probabilities were calculated using reads with Q score ≥ 30. Only the recognition sites of a virus having on average > 100 mapped bases per site per sample were included in the analysis.
Virus-host pairing
Links between viruses and the cellular hosts they infected were generated based on three methods. (1) Provirus-based links. Putative proviruses in MAGs were identified using Phaster (http://phaster.ca) with scores > 70. The identified provirus regions were searched in vOTUs by using Minimap2107 with default setting; (2) CRISPR-based links. CRISPR spacers in prokaryotic MAGs were identified using CRISPRDetect3108. Spacer sequences were then matched to vOTUs and plasmids using SpacePHARER (v.5.c2e680a)109 with default setting; and (3) HiC-based links. HiC Illumina reads were mapped to sequences of MAGs, vOTUs and plasmids using BWA-MEM (-5SP). To keep only reliable HiC-links, we filtered out the aligned read pairs that (a) had MAPQ score < 20, nucleotide identity < 99%, or minimum aligned coverage < 90% (to remove unspecific alignments); (b) mapped to multi-sites with flags XA:Z: and SA:Z: (to exclude potential intragenomic repeats or potential virus-provirus links); or (c) were likely duplicates of another read pair identified using CDHIT (-c 0.99) (to remove remaining amplification artifacts generated during Illumina library preparation). The contig links were extracted from the filtered BAM files. Links between a MAG and a vOTU (that is, read pairs in which one read mapped to a contig of a MAG and the other read mapped to a vOTU contig) that were supported by two or more HiC read pairs were considered as potential virus-host infectious links. Background noise in the HiC data, which likely resulted from links between extracellular free DNA, was further removed by calculating the ratio of HiC read pairs with two read ends mapped to a vOTU and a MAG, respectively, versus those with both read ends mapped to vOTUs. The top 140 vOTU-MAG links with the highest ratio values were kept.
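The final HiC-link scoring step (minimum read-pair support, ratio calculation, and retention of the top-ranked links) can be illustrated with the Python sketch below. It is not the authors' code. The input representation (each filtered read pair as two `(kind, identifier)` tuples) is hypothetical, the denominator is assumed to count read pairs with both ends on the same vOTU, and a floor of 1 on the denominator is an added assumption to avoid division by zero.

```python
from collections import Counter


def rank_hic_links(read_pairs, min_pairs=2, top_n=140):
    """Score vOTU-MAG links from already-filtered HiC read pairs.

    read_pairs: iterable of ((kind_a, id_a), (kind_b, id_b)), where kind is
    "MAG" or "vOTU" (hypothetical representation of the filtered BAM records).
    Returns up to top_n tuples (votu, mag, n_pairs, ratio), highest ratio first.
    """
    link_counts = Counter()   # read pairs joining a vOTU and a MAG
    intra_votu = Counter()    # read pairs with both ends on the same vOTU
    for (kind_a, id_a), (kind_b, id_b) in read_pairs:
        if {kind_a, kind_b} == {"MAG", "vOTU"}:
            mag, votu = (id_a, id_b) if kind_a == "MAG" else (id_b, id_a)
            link_counts[(votu, mag)] += 1
        elif kind_a == kind_b == "vOTU" and id_a == id_b:
            intra_votu[id_a] += 1

    scored = []
    for (votu, mag), n_pairs in link_counts.items():
        if n_pairs < min_pairs:
            continue  # require support from two or more read pairs
        # Assumption: floor the denominator at 1 to avoid division by zero.
        ratio = n_pairs / max(intra_votu[votu], 1)
        scored.append((votu, mag, n_pairs, ratio))
    scored.sort(key=lambda record: record[3], reverse=True)
    return scored[:top_n]
```

A link with few vOTU-MAG read pairs relative to the virus's own intra-genome contacts receives a low ratio, which is the behavior expected of spurious ligation between free DNA molecules.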
The recombination regions between vOTUs and MAGs were identified using BlastN (v.2.8.1, e-value < 1e-5, alignment length ≥ 300 bp, identity ≥ 90%)100 on repeat region-masked genome sequences. The combined host-virus network was visualized in Cytoscape (v.3.8.2)110 with colored lines representing three prediction methods of host-virus linkages, and genome recombination events. HGT events between prokaryotic MAGs and vOTUs were predicted using MetaCHIP111 with default setting and visualized in Cytoscape.
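The BlastN thresholds used to call recombination regions (e-value < 1e-5, alignment length ≥ 300 bp, identity ≥ 90%) can be applied to tabular BLAST output as in the following Python sketch. The output format is not stated in the text; the sketch assumes the default 12-column tabular layout (`-outfmt 6`: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), and the function name is hypothetical.

```python
def recombination_hits(blast_tab_lines, min_len=300, min_identity=90.0, max_evalue=1e-5):
    """Filter BLAST tabular lines (assumed default outfmt 6 column order).

    Returns tuples of (query, subject, qstart, qend, sstart, send) for
    alignments that meet the length, identity, and e-value thresholds.
    """
    hits = []
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        pident = float(fields[2])
        aln_len = int(fields[3])
        evalue = float(fields[10])
        if aln_len >= min_len and pident >= min_identity and evalue < max_evalue:
            hits.append((fields[0], fields[1],
                         int(fields[6]), int(fields[7]),
                         int(fields[8]), int(fields[9])))
    return hits
```

The retained coordinate pairs would then mark the shared regions on the vOTU and MAG sequences, for example as shaded sections on a genome map.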
Viral activity classification
Community-wide virus/host abundance ratios greater than 1 or 2 have previously been considered to define active viruses45,48. Here, we used the virus/host ratio in the CC1 tissue sample (containing intracellular viral DNAs and extracellular virions) to identify active viruses and used the virus/host ratio in the microbial cell fraction samples (containing mostly intracellular viral DNAs) to identify intracellularly active viruses. Specifically, we defined viruses linked to prokaryotic MAGs as generally active (VT/HT ratio > 2), intracellularly active (VC/HC ratio > 2), and dormant (VT/HT ratio < 2 or VC/HC ratio < 2).
Functional annotation of MAGs and viral contigs
Annotation of prokaryotic putative coding sequences was performed by searching them against PFAM112 and KOfam113 using HMMER (v.3.3.2) (http://hmmer.org) with an e-value threshold of 1e-15 and coverage of 35%. Antiviral defense systems in prokaryotic MAGs were identified and classified using PADLOC (v.1.1.0)114 with default setting.
Putative coding sequences of vOTUs were predicted using PHANOTATE (v.1.5.0)115 with default setting. Functional annotation of proteins was performed based on PFAM, KOfam, and VOG databases using HMMER with an e-value threshold of 1e-15 and coverage of 35%. Transposable phages were identified based on the co-occurrence of conserved marker proteins using BlastP (e-value < 1e-5) following previously described instructions116. Insertion sequence elements in identified similar DNA fragments were detected using ISEScan (v.1.7.2.3)117 with default setting.
Sponge phylogeny and potential genetic relationship
During the fractionation process, some sponge cells could have been lysed. The released DNA could have passed through the filter pores and contaminated the microbial cell pellets. Therefore, sponge sequences were present in the sequencing data of the microbial fraction, and they were used to study the sponge’s genetics. The ribosomal protein S6e of the sponge host was selected as a marker gene to infer the phylogeny of sponge individuals CC1-CC4 because this gene contains no repeat regions and had sufficient read coverage in all four individuals. Metagenome Illumina reads of individuals CC1-CC4 were mapped to this gene in the sponge host MAG and consensus sequences of each sponge individual were then aligned using BWA-MEM (-5SP). The multiple sequence alignment was manually checked and columns containing missing data in any of the four individuals were removed (Supplementary Fig. 5). Phylogeny was processed using IQ-TREE (-m MFP -B 1000 -bnni). The phylogenetic distance between branches was calculated using MEGA-11118 with the p-distance model.
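The p-distance used for the sponge marker gene is simply the proportion of aligned nucleotide sites at which two sequences differ. The Python sketch below illustrates this definition; it is not the MEGA implementation, and skipping positions where either sequence has a gap or ambiguous base is an assumption (in the workflow above, such columns had already been removed from the alignment).

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of compared sites at which two aligned sequences differ.

    Positions where either sequence has a character other than A, C, G, or T
    (gaps, N) are skipped (assumption: pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared
```

Applied to every pair of consensus sequences from CC1-CC4, this yields the pairwise host genetic distances against which microbiome dissimilarities can be compared.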
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
All raw sequencing data generated in this study have been deposited in both the NCBI database under project PRJNA944799 and the National Omics Data Encyclopedia under project OEP004196. Accession numbers of samples are listed in Supplementary Data 1. The prokaryotic genome sequences are available in LMSG under accession numbers LMSG_G000026884.1-LMSG_G000026920.1. The specific links for genomes are provided in Supplementary Data 2. The sequences of vOTUs and eukaryotic genomes are available in figshare (https://doi.org/10.6084/m9.figshare.24540976.v4). Source data are provided with this paper.
Code availability
The code used in this study can be found at Zenodo (https://doi.org/10.5281/zenodo.13293443)119.
References
Douglas, A. E. & Werren, J. H. Holes in the hologenome: Why host-microbe symbioses are not holobionts. mBio 7, e02099 (2016).
Rosenberg, E. & Zilber-Rosenberg, I. The hologenome concept of evolution after 10 years. Microbiome 6, 78 (2018).
Kohl, K. D. Ecological and evolutionary mechanisms underlying patterns of phylosymbiosis in host-associated microbial communities. Philos. Trans. R. Soc. Lond. B Biol. Sci. 375, 20190251 (2020).
Mallott, E. K. & Amato, K. R. Host specificity of the gut microbiome. Nat. Rev. Microbiol. 19, 639–653 (2021).
Mazel, F. et al. Is host filtering the main driver of phylosymbiosis across the tree of life? mSystems 3, e00097–00018 (2018).
Schultz, D. T. et al. Ancient gene linkages support ctenophores as sister to other animals. Nature 618, 110–117 (2023).
Bell, J. J. et al. Climate change alterations to ecosystem dominance: how might sponge-dominated reefs function? Ecology 99, 1920–1931 (2018).
Nguyen, M., Wemheuer, B., Laffy, P. W., Webster, N. S. & Thomas, T. Taxonomic, functional and expression analysis of viral communities associated with marine sponges. PeerJ 9, e10715 (2021).
Thomas, T. et al. Diversity, structure and convergent evolution of the global sponge microbiome. Nat. Commun. 7, 11870 (2016).
Webster, N. S. & Thomas, T. The sponge hologenome. mBio 7, e00135–00116 (2016).
Pascelli, C. et al. Viral ecogenomics across the Porifera. Microbiome 8, 144 (2020).
O’Brien, P. A. et al. Diverse coral reef invertebrates exhibit patterns of phylosymbiosis. ISME J. 14, 2211–2222 (2020).
Bjork, J. R., Diez-Vives, C., Astudillo-Garcia, C., Archie, E. A. & Montoya, J. M. Vertical transmission of sponge microbiota is inconsistent and unfaithful. Nat. Ecol. Evol. 3, 1172–1183 (2019).
Cleary, D. F. R. et al. The sponge microbiome within the greater coral reef microbial metacommunity. Nat. Commun. 10, 1644 (2019).
Sabrina Pankey, M. et al. Cophylogeny and convergence shape holobiont evolution in sponge-microbe symbioses. Nat. Ecol. Evol. 6, 750–762 (2022).
Van Rossum, T., Ferretti, P., Maistrenko, O. M. & Bork, P. Diversity within species: interpreting strains in microbiomes. Nat. Rev. Microbiol. 18, 491–506 (2020).
Laffy, P. W. et al. Reef invertebrate viromics: diversity, host specificity and functional capacity. Environ. Microbiol. 20, 2125–2141 (2018).
Jahn, M. T. et al. A phage protein aids bacterial symbionts in eukaryote immune evasion. Cell Host Microbe 26, 542–550 (2019).
Coclet, C. & Roux, S. Global overview and major challenges of host prediction methods for uncultivated phages. Curr. Opin. Virol. 49, 117–126 (2021).
Maqsood, R. et al. Discordant transmission of bacteria and viruses from mothers to babies at birth. Microbiome 7, 156 (2019).
Walters, W. A. et al. Longitudinal comparison of the developing gut virome in infants and their mothers. Cell Host Microbe 31, 187–198 e183 (2023).
Olm, M. R. et al. inStrain profiles population microdiversity from metagenomic data and sensitively detects shared microbial strains. Nat. Biotechnol. 39, 727–736 (2021).
Garud, N. R. & Pollard, K. S. Population genetics in the human microbiome. Trends Genet. 36, 53–67 (2020).
Zilber-Rosenberg, I. & Rosenberg, E. Microbial-driven genetic variation in holobionts. FEMS Microbiol. Rev. 45, fuab022 (2021).
Chevallereau, A., Pons, B. J., van Houte, S. & Westra, E. R. Interactions between bacterial and phage communities in natural environments. Nat. Rev. Microbiol. 20, 49–62 (2021).
Minot, S. et al. Rapid evolution of the human gut virome. Proc. Natl. Acad. Sci. USA 110, 12450–12455 (2013).
Yaffe, E. & Relman, D. A. Tracking microbial evolution in the human gut using Hi-C reveals extensive horizontal gene transfer, persistence and adaptation. Nat. Microbiol. 5, 343–353 (2020).
Moitinho-Silva, L. et al. Integrated metabolism in sponge-microbe symbiosis revealed by genome-centered metatranscriptomics. ISME J. 11, 1651–1666 (2017).
Taylor, M. W., Schupp, P. J., Dahllof, I., Kjelleberg, S. & Steinberg, P. D. Host specificity in marine sponge-associated bacteria, and potential implications for marine microbial diversity. Environ. Microbiol. 6, 121–130 (2004).
Thomas, T. et al. Functional genomic signatures of sponge bacteria reveal unique and shared features of symbiosis. ISME J. 4, 1557–1567 (2010).
Du, Y., Fuhrman, J. A. & Sun, F. ViralCC retrieves complete viral genomes and virus-host pairs from metagenomic Hi-C data. Nat. Commun. 14, 502 (2023).
Bickhart, D. M. et al. Assignment of virus and antimicrobial resistance genes to microbial hosts in a complex microbial community by combined long-read assembly and proximity ligation. Genome Biol. 20, 153 (2019).
Johansen, J. et al. Genome binning of viral entities from bulk metagenomics data. Nat. Commun. 13, 965 (2022).
Olm, M. R., Brown, C. T., Brooks, B. & Banfield, J. F. dRep: a tool for fast and accurate genomic comparisons that enables improved genome recovery from metagenomes through de-replication. ISME J. 11, 2864–2868 (2017).
Diez-Vives, C., Moitinho-Silva, L., Nielsen, S., Reynolds, D. & Thomas, T. Expression of eukaryotic-like protein in the microbiome of sponges. Mol. Ecol. 26, 1432–1451 (2017).
Taylor, J. A., Diez-Vives, C., Nielsen, S., Wemheuer, B. & Thomas, T. Communality in microbial stress response and differential metabolic interactions revealed by time-series analysis of sponge symbionts. Environ. Microbiol. 24, 2299–2314 (2022).
Taylor, J. A. et al. Phylogeny resolved, metabolism revealed: functional radiation within a widespread and divergent clade of sponge symbionts. ISME J. 15, 503–519 (2021).
Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 25, 1043–1055 (2015).
Chklovski, A., Parks, D. H., Woodcroft, B. J. & Tyson, G. W. CheckM2: a rapid, scalable and accurate tool for assessing microbial genome quality using machine learning. Nat. Methods 20, 1203–1212 (2023).
Nayfach, S. et al. CheckV assesses the quality and completeness of metagenome-assembled viral genomes. Nat. Biotechnol. 39, 578–585 (2021).
Zhao, J. et al. Significant differences in planktonic virus communities between “cellular fraction” (0.22 ~ 3.0 μm) and “viral fraction” (<0.22 μm) in the ocean. Microb. Ecol. 86, 825–842 (2023).
He, T. et al. Environmental viromes reveal the global distribution signatures of deep-sea DNA viruses. J. Adv. Res. S2090-1232, 00115–00117 (2023).
Oliveira, V. et al. Characterization of putative circular plasmids in sponge-associated bacterial communities using a selective multiply-primed rolling circle amplification. Mol. Ecol. Resour. 21, 110–121 (2021).
Gregory, A. C. et al. Marine DNA viral macro- and microdiversity from pole to pole. Cell 177, 1109–1123 e1114 (2019).
Marbouty, M., Thierry, A., Millot, G. A. & Koszul, R. MetaHiC phage-bacteria infection network reveals active cycling phages of the healthy human gut. Elife 10, e60608 (2021).
Andrade-Martínez, J. S. et al. Computational tools for the analysis of uncultivated phage genomes. Microbiol. Mol. Biol. Rev. 86, e00004–e00021 (2022).
Song, W., Wemheuer, B., Steinberg, P. D., Marzinelli, E. M. & Thomas, T. Contribution of horizontal gene transfer to the functionality of microbial biofilm on a macroalgae. ISME J. 15, 807–817 (2021).
Kieft, K. & Anantharaman, K. Deciphering active prophages from metagenomes. mSystems 7, e00084–00022 (2022).
Lim, S. J. & Bordenstein, S. R. An introduction to phylosymbiosis. Proc. Biol. Sci. 287, 20192900 (2020).
Fan, L. et al. Functional equivalence and evolutionary convergence in complex communities of microbial sponge symbionts. Proc. Natl. Acad. Sci. USA 109, E1878–E1887 (2012).
Millman, A., Melamed, S., Amitai, G. & Sorek, R. Diversity and classification of cyclic-oligonucleotide-based anti-phage signalling systems. Nat. Microbiol. 5, 1608–1615 (2020).
Doron, S. et al. Systematic discovery of antiphage defense systems in the microbial pangenome. Science 359, eaar4120 (2018).
Correa, A. M. S. et al. Revisiting the rules of life for viruses of microorganisms. Nat. Rev. Microbiol. 19, 501–513 (2021).
Hampton, H. G., Watson, B. N. J. & Fineran, P. C. The arms race between bacteria and their phage foes. Nature 577, 327–336 (2020).
Seed, K. D., Lazinski, D. W., Calderwood, S. B. & Camilli, A. A bacteriophage encodes its own CRISPR/Cas adaptive response to evade host innate immunity. Nature 494, 489–491 (2013).
Christie, M. R., McNickle, G. G., French, R. A. & Blouin, M. S. Life history variation is maintained by fitness trade-offs and negative frequency-dependent selection. Proc. Natl. Acad. Sci. USA 115, 4441–4446 (2018).
Gauthier, M.-E. A., Watson, J. R. & Degnan, S. M. Draft genomes shed light on the dual bacterial symbiosis that dominates the microbiome of the coral reef sponge Amphimedon queenslandica. Front. Mar. Sci. 3, 196 (2016).
Toussaint, A. & Rice, P. A. Transposable phages, DNA reorganization and transfer. Curr. Opin. Microbiol. 38, 88–94 (2017).
Easson, C. G., Chaves-Fonnegra, A., Thacker, R. W. & Lopez, J. V. Host population genetics and biogeography structure the microbiome of the sponge Cliona delitrix. Ecol. Evol. 10, 2007–2020 (2020).
Griffiths, S. M. et al. Host genetics and geography influence microbiome composition in the sponge Ircinia campana. J. Anim. Ecol. 88, 1684–1695 (2019).
Leigh, B. A., Bordenstein, S. R., Brooks, A. W., Mikaelyan, A. & Bordenstein, S. R. Finer-scale phylosymbiosis: insights from insect viromes. mSystems 3, e00131–00118 (2018).
Ronai, I. et al. “Microbiota, symbiosis and individuality summer school” meeting report. Microbiome 8, 117 (2020).
Moran, N. A. & Sloan, D. B. The hologenome concept: helpful or hollow? PLoS Biol. 13, e1002311 (2015).
Hentschel, U., Piel, J., Degnan, S. M. & Taylor, M. W. Genomic insights into the marine sponge microbiome. Nat. Rev. Microbiol. 10, 641–654 (2012).
Chen, Y., Wang, Y., Paez-Espino, D., Polz, M. F. & Zhang, T. Prokaryotic viruses impact functional microorganisms in nutrient removal and carbon cycle in wastewater treatment plants. Nat. Commun. 12, 5398 (2021).
Hwang, Y., Roux, S., Coclet, C., Krause, S. J. E. & Girguis, P. R. Viruses interact with hosts that span distantly related microbial domains in dense hydrothermal mats. Nat. Microbiol. 8, 946–957 (2023).
Smillie, C. S. et al. Ecology drives a global network of gene exchange connecting the human microbiome. Nature 480, 241–244 (2011).
Shterzer, N. & Mizrahi, I. The animal gut as a melting pot for horizontal gene transfer. Can. J. Microbiol. 61, 603–605 (2015).
Alarcon-Schumacher, T., Naor, A., Gophna, U. & Erdmann, S. Isolation of a virus causing a chronic infection in the archaeal model organism Haloferax volcanii reveals antiviral activities of a provirus. Proc. Natl. Acad. Sci. USA 119, e2205037119 (2022).
de Jonge, P. A., Nobrega, F. L., Brouns, S. J. J. & Dutilh, B. E. Molecular and evolutionary determinants of bacteriophage host range. Trends Microbiol. 27, 51–63 (2019).
O’Brien, S., Kummerli, R., Paterson, S., Winstanley, C. & Brockhurst, M. A. Transposable temperate phages promote the evolution of divergent social strategies in Pseudomonas aeruginosa populations. Proc. Biol. Sci. 286, 20191794 (2019).
Chen, H., Rangasamy, M., Tan, S. Y., Wang, H. & Siegfried, B. D. Evaluation of five methods for total DNA extraction from western corn rootworm beetles. PLoS ONE 5, e11963 (2010).
Patel, R. K. & Jain, M. NGS QC Toolkit: a toolkit for quality control of next generation sequencing data. PLoS ONE 7, e30619 (2012).
Roehr, J. T., Dieterich, C. & Reinert, K. Flexbar 3.0 - SIMD and multicore parallelization. Bioinformatics 33, 2941–2942 (2017).
Jiang, H., Lei, R., Ding, S.-W. & Zhu, S. Skewer: a fast and accurate adapter trimmer for next-generation sequencing paired-end reads. BMC Bioinformatics 15, 1–12 (2014).
Nurk, S., Meleshko, D., Korobeynikov, A. & Pevzner, P. A. metaSPAdes: a new versatile metagenomic assembler. Genome Res. 27, 824–834 (2017).
Li, D. et al. MEGAHIT v1.0: A fast and scalable metagenome assembler driven by advanced methodologies and community practices. Methods 102, 3–11 (2016).
Qiu, Z. et al. BASALT refines binning from metagenomic data and increases resolution of genome-resolved metagenomic analysis. Nat. Commun. 15, 2179 (2024).
Fu, L., Niu, B., Zhu, Z., Wu, S. & Li, W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150–3152 (2012).
Bertrand, D. et al. Hybrid metagenomic assembly enables high-resolution analysis of resistance determinants and mobile elements in human microbiomes. Nat. Biotechnol. 37, 937–944 (2019).
Kolmogorov, M. et al. metaFlye: scalable long-read metagenome assembly using repeat graphs. Nat. Methods 17, 1103–1110 (2020).
Hu, J., Fan, J., Sun, Z. & Liu, S. NextPolish: a fast and efficient genome polishing tool for long-read assembly. Bioinformatics 36, 2253–2255 (2020).
Nissen, J. N. et al. Improved metagenome binning and assembly using deep variational autoencoders. Nat. Biotechnol. 39, 555–560 (2021).
Konstantinidis, K. T., Rossello-Mora, R. & Amann, R. Uncultivated microbes in need of their own taxonomy. ISME J. 11, 2399–2406 (2017).
Karlicki, M., Antonowicz, S. & Karnkowska, A. Tiara: deep learning-based classification system for eukaryotic sequences. Bioinformatics 38, 344–350 (2022).
Lux, M. et al. acdc - automated contamination detection and confidence estimation for single-cell genome data. BMC Bioinformatics 17, 543 (2016).
Manni, M., Berkeley, M. R., Seppey, M. & Zdobnov, E. M. BUSCO: Assessing genomic data quality and beyond. Curr. Protoc. 1, e323 (2021).
Chaumeil, P. A., Mussig, A. J., Hugenholtz, P. & Parks, D. H. GTDB-Tk: a toolkit to classify genomes with the genome taxonomy database. Bioinformatics 36, 1925–1927 (2019).
von Meijenfeldt, F. A. B., Arkhipova, K., Cambuy, D. D., Coutinho, F. H. & Dutilh, B. E. Robust taxonomic classification of uncharted microbial sequences and bins with CAT and BAT. Genome Biol. 20, 217 (2019).
Martinez-Gutierrez, C. A. & Aylward, F. O. Phylogenetic signal, congruence, and uncertainty across bacteria and archaea. Mol. Biol. Evol. 38, 5514–5527 (2021).
Nguyen, L. T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2015).
Jain, C., Rodriguez-R, L. M., Phillippy, A. M., Konstantinidis, K. T. & Aluru, S. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat. Commun. 9, 5114 (2018).
Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 29, 644–652 (2011).
Roux, S., Enault, F., Hurwitz, B. L. & Sullivan, M. B. VirSorter: mining viral signal from microbial genomic data. PeerJ 3, e985 (2015).
Guo, J. et al. VirSorter2: a multi-classifier, expert-guided approach to detect diverse DNA and RNA viruses. Microbiome 9, 37 (2021).
Ren, J., Ahlgren, N. A., Lu, Y. Y., Fuhrman, J. A. & Sun, F. VirFinder: a novel k-mer based tool for identifying viral sequences from assembled metagenomic data. Microbiome 5, 69 (2017).
Ren, J. et al. Identifying viruses from metagenomic data using deep learning. Quant. Biol. 8, 64–77 (2020).
Schmartz, G. P. et al. PLSDB: advancing a comprehensive database of bacterial plasmids. Nucleic Acids Res. 50, D273–D278 (2022).
Jiang, J. Z. et al. Virus classification for viral genomic fragments using PhaGCN2. Brief. Bioinform. 24, bbac505 (2023).
Ye, J., McGinnis, S. & Madden, T. L. BLAST: improvements for better sequence analysis. Nucleic Acids Res. 34, W6–W9 (2006).
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://doi.org/10.48550/arXiv.1303.3997 (2013).
Danecek, P. et al. Twelve years of SAMtools and BCFtools. Gigascience 10, giab008 (2021).
Bushnell, B. BBMap: a fast, accurate, splice-aware aligner. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA, United States (2014).
Dixon, P. VEGAN, a package of R functions for community ecology. J. Veg. Sci. 14, 927–930 (2003).
Brooks, A. W., Kohl, K. D., Brucker, R. M., van Opstal, E. J. & Bordenstein, S. R. Phylosymbiosis: relationships and functional effects of microbial communities across host evolutionary history. PLoS Biol. 14, e2000225 (2016).
Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 1–11 (2010).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Biswas, A., Staals, R. H., Morales, S. E., Fineran, P. C. & Brown, C. M. CRISPRDetect: A flexible algorithm to define CRISPR arrays. BMC Genomics 17, 356 (2016).
Zhang, R. et al. SpacePHARER: Sensitive identification of phages from CRISPR spacers in prokaryotic hosts. Bioinformatics 37, 3364–3366 (2021).
Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).
Song, W., Wemheuer, B., Zhang, S., Steensen, K. & Thomas, T. MetaCHIP: community-level horizontal gene transfer identification through the combination of best-match and phylogenetic approaches. Microbiome 7, 36 (2019).
Finn, R. D. et al. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res. 44, D279–D285 (2016).
Aramaki, T. et al. KofamKOALA: KEGG ortholog assignment based on profile HMM and adaptive score threshold. Bioinformatics 36, 2251–2252 (2020).
Payne, L. J. et al. Identification and classification of antiviral defence systems in bacteria and archaea with PADLOC reveals new system types. Nucleic Acids Res. 49, 10868–10878 (2021).
McNair, K., Zhou, C., Dinsdale, E. A., Souza, B. & Edwards, R. A. PHANOTATE: a novel approach to gene identification in phage genomes. Bioinformatics 35, 4537–4542 (2019).
Zhang, M. et al. Unexplored diversity and ecological functions of transposable phages. ISME J. 17, 1015–1028 (2023).
Xie, Z. & Tang, H. ISEScan: automated identification of insertion sequence elements in prokaryotic genomes. Bioinformatics 33, 3340–3347 (2017).
Tamura, K., Stecher, G. & Kumar, S. MEGA11: molecular evolutionary genetics analysis version 11. Mol. Biol. Evol. 38, 3022–3027 (2021).
Wang, W. Decoupling of strain- and intrastrain-level interactions of microbiomes in a sponge holobiont. Zenodo, https://doi.org/10.5281/zenodo.13293443 (2024).
Acknowledgements
We thank Jianjun Wang (Institute of Geography and Limnology, Chinese Academy of Sciences), Huahua Jian (Shanghai Jiao Tong University), and Yantao Liang (Ocean University of China) for their valuable suggestions on data analysis. This study was supported by the National Natural Science Foundation of China (No. 42376113 L.F.), the Betty and Gordon Moore Foundation (T.T.), the National Natural Science Foundation of China (Nos. 42406103 W.W., 41776138 Z.L., and 31861143020 Z.L. and 91951120 L.F.), the Open Project of Key Laboratory of Environmental Biotechnology, CAS (No. KF2021006 L.F.), and the Guangdong Basic and Applied Basic Research Foundation (No. 2021B1515120080 L.F.). Computation in this study was supported by the Center for Computational Science and Engineering at the Southern University of Science and Technology.
Author information
Contributions
T.T. and L.F. conceived this study. W.W., W.S., M.M., B.X., J.T., and Y.Z. carried out the investigation and data analysis. W.W., W.S., X.F., Z.L., P.Q., N.S.W., T.T., and L.F. were involved in the writing and revision of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Kshitij Tandon and the other anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Wang, W., Song, W., Majzoub, M.E. et al. Decoupling of strain- and intrastrain-level interactions of microbiomes in a sponge holobiont. Nat Commun 15, 8205 (2024). https://doi.org/10.1038/s41467-024-52464-6
DOI: https://doi.org/10.1038/s41467-024-52464-6