Introduction

Microbial ecosystems thrive on intricate webs of interactions, often obscured by their vast diversity and hidden by limitations in conventional culture-based methods1,2. While studying pairwise interactions provides valuable insights3,4, understanding the complete picture within natural complex communities remains challenging5. With the advances in sequencing and computational modeling, microbial co-occurrence networks, constructed from patterns of species co-occurrence across environmental samples, have become a powerful tool to infer interspecies connections6,7,8. The use of co-occurrence network analysis became popular after several ready-to-use workflows with user-friendly visualizations were published9,10,11,12,13,14. However, as several previous studies argue, co-occurrence alone does not provide conclusive evidence of ecological interactions15,16,17. To address this limitation, we combine co-occurrence data with evidence of metabolic complementarity to infer more robust interactions and explore potential underlying mechanisms.

Metabolic dependencies have been proposed as a major driver of species co-occurrence18. However, it has also been suggested that species co-occurrence could in turn drive metabolic dependencies2. This highlights the need for a nuanced understanding of the dynamic interplay between metabolic interactions and microbial community assembly. Long-read sequencing and improved assembly enable high-quality metagenome-assembled genomes (MAGs), opening doors to map metabolic cooperation and competition in complex communities through nutrient exchange19,20. Although genome-scale metabolic models (GSMMs), along with flux balance analysis (FBA), have shown potential in predicting metabolite exchange and growth under curated laboratory conditions21,22, there remains substantial potential for further exploration in this area. Integrating these models with co-occurrence networks to validate and refine predictions of metabolic dependencies within uncontrolled natural habitats is an untapped frontier.

Geothermal springs, characterized by extreme temperatures, fluctuating ionic strengths, and enriched minerals, harbor a unique community of thermophiles with specialized physiological and metabolic adaptations23,24,25. These adaptations often involve genome streamlining, leading to reduced genomes and temperature-adapted proteins26,27,28,29,30,31. Yet, thermophiles exhibit remarkable genomic plasticity and metabolic flexibility32. Metabolic reconstructions of some geothermal phyla suggest dependence on interspecies exchange of amino acids, vitamins, and cofactors, implying close syntrophy33,34. This is further supported by observations of highly cooperative communities formed by small-genome auxotrophs, where cross-feeding promotes mutual benefit35,36,37. This suggests that harsh geothermal conditions, while restricting growth, may simultaneously drive synergistic interactions. Nevertheless, this hypothesis requires a systematic examination. Addressing this question may yield insights into the ongoing debate regarding the prevalence of either antagonistic or synergistic interactions within natural microbial communities, as well as how environmental stresses affect their prevalence38,39,40,41,42.

Here, we leverage the power of random matrix theory (RMT) for robust co-occurrence and metabolic network constructions while tapping into the rich information contained within MAGs9,43. This synergy allows us to identify species co-occurrence patterns and predict potential metabolic interdependencies based on metabolic pathways. Adopting this powerful tool, we show that metabolic complementarity intensifies with rising temperatures, as thermophiles under heat stress increasingly rely on interspecies exchange of essential metabolites for survival. Additionally, we demonstrate that phylogenetic distance dictates cooperative strategies, with closely related thermophiles competing for similar resources while distantly related ones engage in mutually beneficial metabolic exchanges. Furthermore, our findings reveal that species with small genomes, potentially lacking vital metabolic pathways, depend heavily on metabolic partnerships for survival and growth. Our research offers a deeper understanding of microbial synergy in challenging environments, providing valuable insights into the interplay between environmental stress, metabolic dependencies, and the evolution of cooperative strategies within microbial communities.

Results

Network construction and thermal preferences of thermophiles

To comprehensively analyze both co-occurrence and metabolic complementarity networks among thermophiles, we developed a unique bioinformatic workflow (Fig. 1) and seamlessly integrated it into the iNAP platform (https://inap.denglab.org.cn)44 (Supplementary Fig. 1), as updated iNAP 2.045. This workflow leverages multiple metabolic complementarity indices, infers the network thresholds of co-occurrence and metabolic complementarity using the random matrix theory (RMT), extracts shared interactions within both network inferences, and identifies potential exchangeable metabolites.

Fig. 1: A schematic diagram of our study workflow.

The entire analytical procedure can be segmented into four distinct parts. a Samples were collected, and DNA extraction was performed. Long contigs were obtained by hybrid assembly of short and long reads. b MAG abundance was calculated by read-length mapping after genome binning. c The genome-scale metabolic model of MAG was constructed by protein prediction after genome binning, and the pairwise metabolic complementarity index (MIcomplementarity) of MAGs was calculated. d An RMT-based method was used to generate a co-occurrence network and a metabolic complementary network, and their common edges were analyzed further to provide metabolic clues for potential co-occurrence of species. This figure was created with BioRender.com released under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International license.

A total of 449 million Illumina short reads (223.35 Gb of raw bases, N = 40) and 51 million Nanopore long reads (52.01 Gb of raw bases, N = 40) were obtained from the hot spring sediment samples across a temperature gradient (63.5–85.8 °C), with eight samples each having five replicates. The Nonpareil method confirmed that the estimated average coverage of each sample was above 80%, indicating sufficient sequencing depth (Supplementary Fig. 2). Sequence diversity (Nd) increased from high to low temperatures (Fig. 2a), indicating that extremely high temperatures reduced the sequence diversity of the microbial communities and suggesting that the species present might be more specialized. After assembly, binning, and CheckM quality control, 401 medium- and high-quality metagenome-assembled genomes (MAGs) were retained for subsequent network analysis.

Fig. 2: Display of sequence diversity and MAG taxonomic distribution across samples.

a Sequence diversity (Nd) for each temperature group, quantified using Illumina sequencing. Each lollipop plot represents a specific temperature group, with the endpoints indicating the maximum and minimum Nd values observed across five replicates. The central point on each lollipop plot denotes the average Nd value for the respective temperature group, calculated from the same five replicates. The smoothed line was drawn using the loess method to display the trend. b The phylogenetic tree of 401 medium- and high-quality MAGs. The background color of the tree represented the domain of MAGs (light blue for bacteria, light red for archaea). Branches were colored as the representations of the phylum of MAGs. The bar plot in the outer circle represents the relative abundance of MAGs. c Non-metric multidimensional scaling (NMDS) analysis was used to show the among-sample dissimilarities, and samples were grouped into three temperature categories. d Relative abundance at the phylum level.

The taxonomic assignment revealed that 85.78% of MAGs belonged to 38 bacterial phyla, whereas 57 MAGs were affiliated with seven archaeal phyla (Fig. 2b, Supplementary Data 1). The MAG abundance varied significantly across temperature ranges (PERMANOVA, p < 0.001) (Fig. 2c), leading to the classification of sampling sites into three groups: extremely thermal (ET, 78.5–85.8 °C, 2 samples × 5 replicates), highly thermal (HT, 67.5–73.9 °C, 3 samples × 5 replicates), and moderately thermal (MT, 63.5–65.8 °C, 3 samples × 5 replicates). Most MAGs exhibited distinct temperature preferences, with only a few evenly distributed across the gradient. For example, 80.52% of the relative abundance assigned to Thermoproteota MAGs originated from the ET group, highlighting this archaeal phylum’s strong preference for extremely high-temperature habitats. In contrast, Cyanobacteria and Bacteroidota preferred the cooler groups, with 96.28% and 96.98% of their relative abundance, respectively, originating from the HT and MT groups (Fig. 2d, Supplementary Data 2).

Archaea also displayed clear temperature preferences despite accounting for only 7.31% of relative abundance across all samples. While Thermoproteota favored the ET group, Micrarchaeota, the second most abundant, exhibited a unique distribution, with nearly equal contributions from HT and MT groups. These findings underscore the diverse thermal adaptations and preferences of thermophile communities within this hot spring ecosystem.

The co-occurrence patterns of thermophiles within three temperature groups

Three co-occurrence networks were constructed from the relative abundance information using the RMT-based method (Fig. 3, Supplementary Data 3). Positive edges constituted the majority in the three networks (ET: 98.82%; HT: 75.54%; MT: 77.99%). After extracting the positive edges as subnetworks, the R² values of the power-law model for the co-occurrence networks of the HT and MT groups were 0.59 and 0.53, respectively, whereas the ET group did not fit the power-law distribution well and thus lost its scale-free property. The ET group subnetwork exhibited a higher network density, shorter average path distance (GD), lower harmonic geodesic distance (HD), higher average clustering coefficient, and reduced modularity compared to the other two subnetworks (Table 1). These characteristics suggested a tighter interaction structure in the ET group subnetwork than in the other two subnetworks, implying that a harsh environment (high temperature) may induce tight and synergistic interactions among thermophiles. Furthermore, the global topological properties of the three networks were significantly different from those of the respective randomized networks (generated 100 times), demonstrating that the interactions retained by the RMT cutoff were not randomly connected (Table 1, one-sample t-test).
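
The topological indices compared above can be illustrated with a minimal sketch. It assumes an undirected, unweighted edge list and is not the iNAP implementation; the function names and the toy edges are hypothetical, and harmonic geodesic distance and modularity are omitted.

```python
from collections import deque
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from edge pairs."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # skip self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def density(adj):
    """Fraction of possible undirected edges that are present."""
    n = len(adj)
    if n < 2:
        return 0.0
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    return 2 * m / (n * (n - 1))


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with < 2 neighbors count as 0."""
    if not adj:
        return 0.0
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count neighbor pairs that are themselves connected.
        closed = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2 * closed / (k * (k - 1))
    return total / len(adj)


def average_path_distance(adj):
    """Mean shortest-path length over all connected node pairs (BFS per node)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Toy network (hypothetical MAG labels): a triangle A-B-C with a pendant node D.
toy = build_adjacency([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
print(density(toy), average_clustering(toy), average_path_distance(toy))
```

A denser network with a higher clustering coefficient and a shorter average path distance, as reported for the ET subnetwork, corresponds to larger values from the first two functions and a smaller value from the third.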

Fig. 3: The co-occurrence networks of three temperature groups and their corresponding positively-linked subnetworks.

a–c Network visualizations of co-occurrence networks of three temperature groups. d–f Corresponding positively-linked subnetworks of three temperature groups.

Table 1 The statistical properties of the co-occurrence networks (positive links) of MAGs within three temperature groups

The inferred pairwise metabolic complementarity among thermophiles

To assess metabolic dependencies, we reconstructed genome-scale metabolic models for each MAG. We used the PhyloMint metabolic complementarity index (MIcomplementarity) to quantitatively assess the degree of metabolic dependencies between each pair of MAGs within three temperature groups (see “Methods” section). Specifically, 29 MAGs were unique to the ET group, while 70 MAGs were detected across all temperatures, likely representing heat-tolerant generalists. Both approaches revealed a surprising rarity of synergistic metabolic interactions (Fig. 4), while the RMT-based threshold yielded an even more stringent classification, with less than 7% of interactions deemed significant (ET: 4.13%; HT: 4.68%; MT: 6.46%) (Fig. 4).

Fig. 4: The distributions of three categories of interactions based on the pairwise MIcomplementarity values.

In network graphs, gray links represent asymmetrical synergy (commensalism), while green links represent symmetric synergy (mutualism).

Notably, the observed synergistic interactions displayed a marked asymmetry of MIcomplementarity values in all three groups. We categorized these into mutualistic (both MIcomplementarity values of pairwise MAGs exceeding the threshold) and commensalistic (only one MIcomplementarity value of pairwise MAGs exceeding the threshold) metabolic interactions. Regardless of the threshold used, mutualistic interactions were extremely rare, constituting less than 0.3% of all pairs in any group (Fig. 4). This suggests that metabolism-based synergy within the hot spring community is primarily driven by unidirectional feeding, with one thermophile benefiting from the metabolic products of another. In particular, the species pairs involved in commensalistic and mutualistic interactions exhibited dramatically different patterns of genome size differences. Across the three temperature groups, the average genome size differences (estimated genome size of the giver minus that of the taker) for interactions identified as commensalistic by the RMT threshold were ET: 716 Kbp, HT: 1204 Kbp, and MT: 879 Kbp. However, the average genome size differences for mutualistic interactions were close to 0 for all three temperature groups. Among all commensalistic interactions, those in which the estimated genome size of the taker was less than 2 Mbp and that of the giver was more than 2 Mbp accounted for 31.29% (87 pairs) in the ET group, 37.01% (624 pairs) in the HT group, and 29.87% (1057 pairs) in the MT group. These results indicated that metabolic dependency of a streamlined genome on a more comprehensive genome is a prevalent pattern in commensalistic interactions.
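
The classification rule described above can be sketched as follows. This is an illustrative sketch rather than the workflow's own code: it assumes that the index stored under an ordered pair (taker, giver) measures how much the first MAG could draw on the second, and the threshold of 0.3, the MAG labels, and the genome sizes are hypothetical.

```python
def classify_pairs(mi, threshold):
    """Label each unordered MAG pair from its two directed index values.

    mi maps an ordered pair (taker, giver) to a complementarity value
    (the direction convention is an assumption of this sketch).
    """
    labels = {}
    for a, b in mi:
        pair = tuple(sorted((a, b)))
        if pair in labels:
            continue
        x, y = pair
        x_takes = mi.get((x, y), 0.0) > threshold
        y_takes = mi.get((y, x), 0.0) > threshold
        if x_takes and y_takes:
            labels[pair] = "mutualism"      # both directions exceed the threshold
        elif x_takes or y_takes:
            labels[pair] = "commensalism"   # only one direction exceeds it
        else:
            labels[pair] = "none"
    return labels


def mean_giver_minus_taker(mi, genome_size, threshold):
    """Average genome size difference (giver minus taker) over commensal pairs."""
    diffs = []
    for (x, y), label in classify_pairs(mi, threshold).items():
        if label != "commensalism":
            continue
        taker, giver = (x, y) if mi.get((x, y), 0.0) > threshold else (y, x)
        diffs.append(genome_size[giver] - genome_size[taker])
    return sum(diffs) / len(diffs) if diffs else None


# Hypothetical values for three MAGs; genome sizes in Mbp.
toy_mi = {
    ("m1", "m2"): 0.40, ("m2", "m1"): 0.35,
    ("m1", "m3"): 0.50, ("m3", "m1"): 0.10,
    ("m2", "m3"): 0.05, ("m3", "m2"): 0.02,
}
toy_sizes = {"m1": 0.9, "m2": 1.0, "m3": 2.5}
print(classify_pairs(toy_mi, 0.3))
print(mean_giver_minus_taker(toy_mi, toy_sizes, 0.3))
```

In this toy input the small-genome MAG m1 is the taker in the only commensal pair, so the giver-minus-taker difference is positive, mirroring the sign of the averages reported for the three temperature groups.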

Following RMT threshold determination (Fig. 4), the constructed metabolic network for the ET group possessed more robust scale-free properties than the corresponding co-occurrence network (Table 2). In addition, the topological properties of metabolic networks varied across temperatures. The hottest ET network, despite its smaller size, exhibited the highest average clustering coefficient, highest density, and shortest communication paths (Table 2), suggesting a highly interconnected community in this harsh environment. However, the patterns in average degree, average path distance, and harmonic geodesic distance were opposite to those of the co-occurrence networks. Such findings illustrated that metabolic complementarity and co-occurrence networks capture different characteristics of microbial communities, which were then reflected in the topology of the networks. The hub node identification using zi–Pi analysis illuminated the key players in the metabolic networks (Supplementary Data 4). The ET network, with 36 hubs, hosted 6 archaeal hubs (3 connector hubs, 1 module hub, and 2 network hubs), including 4 assigned to Thermoproteota, 1 to Asgardarchaeota, and 1 to Aenigmatarchaeota. The larger HT and MT networks contained more hubs (137 and 97, respectively), and both included hubs from Thermoproteota, Micrarchaeota, and Methanobacteriota. These results indicated that despite their low abundance, archaea, particularly Thermoproteota and Methanobacteriota, emerge as essential hubs across all temperature networks, demonstrating their unique metabolic capabilities and indispensable roles in the thermophile communities.
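
For readers unfamiliar with zi–Pi analysis, a minimal sketch of the two quantities is given here: the within-module connectivity z-score (zi) and the among-module participation coefficient (Pi). The module assignment is taken as given, the population standard deviation is used, and the cutoffs of 2.5 and 0.62 are widely used defaults assumed for illustration rather than values stated in this study.

```python
from collections import defaultdict
from math import sqrt


def zi_pi(edges, module_of):
    """Return (z, p): within-module z-score and participation coefficient per node.

    module_of must assign a module label to every node that appears in edges.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    # Number of links from each node into each module.
    links = {n: defaultdict(int) for n in module_of}
    for node in module_of:
        for nbr in adj[node]:
            links[node][module_of[nbr]] += 1

    # Within-module degree, grouped by module for the z-score.
    within = {n: links[n][module_of[n]] for n in module_of}
    by_module = defaultdict(list)
    for n, m in module_of.items():
        by_module[m].append(within[n])

    z, p = {}, {}
    for n, m in module_of.items():
        values = by_module[m]
        mean = sum(values) / len(values)
        sd = sqrt(sum((x - mean) ** 2 for x in values) / len(values))
        z[n] = (within[n] - mean) / sd if sd > 0 else 0.0
        k = len(adj[n])
        p[n] = 1.0 - sum((c / k) ** 2 for c in links[n].values()) if k else 0.0
    return z, p


def role(z, p, z_cut=2.5, p_cut=0.62):
    """Assign a topological role from zi and Pi (cutoffs are assumed defaults)."""
    if z > z_cut and p > p_cut:
        return "network hub"
    if z > z_cut:
        return "module hub"
    if p > p_cut:
        return "connector hub"
    return "peripheral"


# Toy network with two hypothetical modules.
toy_edges = [("A", "B"), ("A", "C"), ("A", "F"), ("C", "D"), ("D", "E")]
toy_modules = {"A": 1, "B": 1, "C": 1, "F": 1, "D": 2, "E": 2}
z, p = zi_pi(toy_edges, toy_modules)
print({n: role(z[n], p[n]) for n in toy_modules})
```

Under these cutoffs, module hubs are highly connected inside their own module, connector hubs spread their links across modules, and network hubs do both.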

Table 2 The statistical properties of the metabolic complementarity networks of MAGs within three temperature groups

To understand how microbial interactions arise, we investigated the relationships between phylogenetic distance (Faith’s PD) and the co-occurrence patterns/metabolic complementarity. The co-occurrence strength, measured by Spearman’s correlation, was negatively correlated with PD in all temperature groups (Fig. 5a–c). However, a distinct pattern emerged after RMT filtering (cutoff = 0.830). The density distribution of PD exhibited two distinct peaks: one near 0 and another near 2. This suggested two potential drivers of co-occurrence: the close association of similar species with shared resource requirements (the peak near PD = 0) and the co-occurrence of phylogenetically distant species with complementary metabolic needs (the peak near PD = 2). In addition, significant positive correlations between MIcomplementarity and PD were observed in all temperature groups (Fig. 5d–f), indicating that distantly related MAGs were more likely to exhibit strong metabolic dependencies. This trend was particularly pronounced at the highest temperature. Notably, after RMT filtering, MAG pairs with PD lower than 1 became rare, with most values clustering around 2. This suggested that the analysis of metabolic complementarity omitted co-occurrences based on phylogenetic proximity, highlighting the importance of metabolic complementarity between distantly related species.

Fig. 5: Scatter Plot of relationships between phylogenetic distance (PD) and Spearman’s correlation (Cor)/PhyloMint Complementarity Index (MIcomplementarity).

a–c The correlations between Spearman’s correlation (Cor) and phylogenetic distance (PD) of three temperature groups. d–f The correlations between PhyloMint complementarity index (MIcomplementarity) and phylogenetic distance (PD) of three temperature groups. a–f The dashed lines indicate the threshold of RMT. Dark points represent the interactions (edges) selected into networks, and light points are not. The fitted curve was modeled using the generalized linear model and the shaded area indicates the 95% confidence interval. The correlation was statistically tested using Kendall’s rank correlation test (two-sided). The density charts describe the distribution of interactions. The density curves above each plot indicate the distribution of interactions along PD. The inset graphs present the 2-dimensional density distribution of nodes.

Genomic clues to species metabolic synergies

Genome sizes and contents were shown to be crucial connecting links for metabolic interactions within the hot spring community, and were also intricately linked to nutrient requirements. The correlation between the estimated genome size of MAGs and their abundance shifted from negative in the ET group (T85 and T78) to positive at lower temperatures (Fig. 6a). In harsh environments with extremely high temperatures, small-genome species tend to occupy higher abundances. However, when the environmental temperature drops to a range where most species can adapt, the small-genome species no longer hold this abundance advantage and tend to occupy only a small abundance in the environment. Genome size was also confirmed to correlate with the proportion of genes in the genome that perform different functions (Fig. 6b). Estimated genome size showed a significant positive correlation with the proportion of genes involved in Secondary metabolites biosynthesis, transport and catabolism (COG-Q, Spearman’s Rho = 0.670, p < 0.001), Carbohydrate transport and metabolism (COG-G, Spearman’s Rho = 0.473, p < 0.001), Lipid transport and metabolism (COG-I, Spearman’s Rho = 0.234, p < 0.001) and Inorganic ion transport and metabolism (COG-P, Spearman’s Rho = 0.125, p < 0.05). Conversely, smaller genomes prioritized housekeeping functions like Translation, ribosomal structures and biogenesis (COG-J, Spearman’s Rho = −0.932, p < 0.001), Nucleotide transport and metabolism (COG-F, Spearman’s Rho = −0.634, p < 0.001) and Replication, recombination and repair (COG-L, Spearman’s Rho = −0.462, p < 0.001). Furthermore, a linear mixed-effects model with genome size as a random effect revealed that differences in biological functions such as energy production and conversion and the metabolism of nucleotides, amino acids, and lipids had a strong positive effect on MIcomplementarity (Fig. 6c).
The MIcomplementarity of pairwise MAGs significantly increased with the difference in genome sizes (Fig. 6d–f). This correlation was strongest in the ET group, suggesting that genome size plays a more crucial role in shaping metabolic partnerships in harsh environments. These results reflected a trade-off in smaller genomes, where the genes associated with genetic information storage and processing were retained to a greater degree while genes related to various metabolisms underwent loss, leaving these species reliant on synergistic interactions to compensate for their limited metabolic repertoire. However, it is essential to consider that the absence of specific genes might also be due to incomplete genome reconstruction, given the accepted cutoff for MAG completeness at 50%.
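
The rank correlations reported in this section can be reproduced in principle with a short routine. The sketch below computes Spearman’s rho as the Pearson correlation of average ranks (ties share the mean of their positions); it omits the significance test, and the input values are hypothetical rather than taken from the data.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sx * sy)


# Hypothetical genome size differences (Mbp) and complementarity values.
size_diff = [0.1, 0.4, 0.9, 1.3, 2.0]
mi_values = [0.05, 0.02, 0.20, 0.15, 0.30]
print(spearman_rho(size_diff, mi_values))
```

Because only ranks enter the calculation, the statistic captures any monotonic increase of the complementarity index with genome size difference, not just a linear one.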

Fig. 6: Relationships between MIcomplementarity and estimated genome size and biological function proportion.

a Correlation (Spearman’s rank correlation test, two-sided) between normalized average abundance of each temperature group and the estimated genome size. b Correlation matrix on proportions of twelve COG biological functions in MAGs. The lines between estimated genome size and each COG biological function indicate Spearman’s correlation (two-sided). The color shade of the lines represents the strength of the correlation, and the thickness represents the confidence level. c Effect of COG biological function proportion on MIcomplementarity by linear mixed-effects models (LMMs) fit by restricted maximum likelihood (REML) estimation. Bar lengths and error bars indicate the mean values ± standard errors of the estimated effect sizes. Significance test was conducted using a t-test of Satterthwaite’s methods (number of observations n = 80,200 for non-self-loop pairwise interactions between 401 MAGs). Significant effects are represented by asterisks: ***p < 0.001, **p < 0.01, ‘ns’ stands for not significant. d–f Correlation between MIcomplementarity and genome size difference. The fitted curve was modeled using the generalized linear model. The correlation was tested using Spearman’s rank correlation test (two-sided).

More genomic clues of species synergies were found in the overlapping partnerships between co-occurrence and metabolic complementarity networks. Only a handful of partnerships (7, 49, and 58 in the ET, HT, and MT groups, respectively) exhibited strong metabolic complementarity alongside persistent co-occurrence (Supplementary Data 5). Among these partnerships, 274 metabolites were detected as potentially transferable, of which 58 were coenzyme A derivatives. Amino acids (with peptides and analogs) and carbohydrates (and carbohydrate conjugates) followed, with 43 and 36, respectively. There were also 30 molecules classified as nucleosides, nucleotides, and analogs deemed to be potentially transferable. Notably, between those paired species, the types of metabolites potentially transferred from one to the other were quite different (Supplementary Data 5).
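
Extracting the partnerships supported by both networks amounts to intersecting two edge sets. The following minimal sketch assumes each network is available as a list of undirected MAG pairs; the MAG labels are hypothetical.

```python
def undirected(edges):
    """Normalize edge pairs so that (a, b) and (b, a) compare equal."""
    return {frozenset(edge) for edge in edges if edge[0] != edge[1]}


def shared_edges(cooccurrence_edges, metabolic_edges):
    """Return the partnerships present in both networks."""
    return undirected(cooccurrence_edges) & undirected(metabolic_edges)


# Hypothetical edge lists for one temperature group.
cooccurrence = [("bin.1", "bin.2"), ("bin.2", "bin.3"), ("bin.4", "bin.5")]
metabolic = [("bin.2", "bin.1"), ("bin.3", "bin.5"), ("bin.5", "bin.4")]

for pair in sorted(tuple(sorted(p)) for p in shared_edges(cooccurrence, metabolic)):
    print(pair)
```

Storing each edge as a frozenset makes the comparison independent of the order in which the two MAGs are listed, which matters when the metabolic edges carry a giver and taker direction but the co-occurrence edges do not.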

In exploring the intricacies of species metabolic synergies, our focus was drawn to 17 archaea-bacteria interactions, which were supported by both metabolic networks and co-occurrence networks, indicating that inter-domain synergies were prevalent in harsh environments such as hot springs. In these interactions, some MAGs exhibited co-occurrence and metabolic complementarity with multiple other species. For instance, an archaeal MAG (T64.bin.92, Micrarchaeota phylum) had four distinct bacterial partners (Fig. 7). This centrally positioned MAG, characterized by its smaller estimated genome size of 0.857 Mbp, predominantly assumed the role of a taker in these interactions. This was evidenced by its substantial reliance on obtaining several types of metabolites from other species, such as some coenzyme A derivatives and other crucial metabolic needs that it cannot fulfill endogenously. Despite its taker-centric position, T64.bin.92 contributed to these commensalistic relationships by providing potential surplus materials like carbohydrates and nucleoside/nucleotide-related substances, embodying a reciprocal dynamic in these interactions. This necessity for external coenzyme A derivatives, vital for biosynthesis reactions involving acyl transfers, underlines the significant dependence of T64.bin.92 on its partners. Furthermore, its requirement for certain inorganic substances, including various metal ions and non-metal compounds from its metabolic counterparts, further illustrated the complexity and significance of its role as a taker. However, it is important to note that these conclusions are based on genetic potential inferred from genomic data. To confirm the actual metabolic exchanges and interactions, additional validation with functional omics and metabolomics data is necessary.

Fig. 7: The potentially transferable metabolite map of the ET group.

The arrows illustrate the direction of metabolite transfer, and each sphere represents a class of metabolites. Here, T64.bin.92 serves as a metabolic taker while the other four serve as metabolic givers. Sankey diagram displaying the categorical correspondence of metabolites in the HMDB database and in this study. This figure was created with BioRender.com released under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International license.

The validation of our study on composting system experiment

In hot spring habitats, we noticed that symmetric interactions (mutualism) were rare. In addition, the genome size differences between species pairs differed between commensalistic and mutualistic interactions. Our findings from a composting system, where temperature is a key stress factor, mirror those from hot springs. A low occurrence of mutualistic interactions was observed across different sampling time (temperature) groups: D00 (Tavg = 26.02 °C), D05 (Tavg = 64.29 °C), and D25 (Tavg = 41.35 °C), with mutualism rates at 0.23%, 0.087%, and 0.23%, respectively. Asymmetrical interactions, suggesting commensalism, were more common at 4.23%, 2.73%, and 4.18%, respectively. The genome size differences in commensalistic interactions at D05 were around 1.36 Mbp, similar to findings in hot springs, whereas mutualistic interactions across all groups showed negligible genome size differences (Supplementary Fig. 3). The correlation between phylogenetic distance and pairwise MIcomplementarity was also significant in the composting system. Specifically, D05, as the highest temperature sample in the experiment, showed the strongest positive correlation. Furthermore, the pattern that a greater genome size difference between two species corresponds to a higher complementarity potential was confirmed by testing the significance of the correlation between MIcomplementarity and genome size difference (D00: Spearman’s Rho = 0.122, p < 0.001; D05: Spearman’s Rho = 0.213, p < 0.001; D25: Spearman’s Rho = 0.090, p < 0.001).

The sequencing data from the composting experiment recovered few overly streamlined genomes, suggesting that such genomes might arise from long-term selection rather than short-term temperature changes. Nevertheless, we observed asymmetrical synergistic interactions, with a significant proportion being commensalistic (Supplementary Fig. 4). These interactions showed a considerable difference in genome size, particularly in the highest temperature group (genome size difference: D00, 815 Kbp; D05, 1368 Kbp; D25, 525 Kbp), similar to the hot spring habitats. This implies that high temperatures are a crucial factor driving synergy between genomes of vastly different sizes.

Discussion

Although some evidence has indicated that metabolic exchanges are ubiquitous, detecting those synergistic metabolic interactions in natural communities is still highly challenging2. Here, we built a bioinformatic workflow integrating co-occurrence with metabolic network approaches using metagenomic sequencing datasets. There are two primary considerations for this workflow construction. First, both co-occurrence and metabolic network approaches have their advantages and drawbacks. The current metabolic network approaches mainly infer synergistic interaction by measuring the metabolic complementarity dependency or metabolite exchange potential between any two microbial genomes18,46,47. However, those two species may be out of synergy in time and space due to lack of contact. Conversely, co-occurrence network approaches infer species relationships in a series of real natural communities, but they only reflect superficial co-occurrence patterns that might not indicate true ecological interactions16,17,48. Therefore, we aim to combine the strengths of both approaches to address their limitations (Fig. 1). Second, we applied the random matrix theory (RMT) to construct the metabolic network models, reducing the number of unreliable synergistic interactions. The RMT method could obtain the lowest false positives in correlation-based networks49,50. In a synergistic metabolic network, there is a similar requirement for threshold determination on metabolic complementarity dependencies. Therefore, we established a publicly available workflow integrated into our iNAP pipeline to facilitate analyses based on both co-occurrence and metabolic networks.

Using this workflow, we gained some insights into how thermophiles within geothermal ecosystems adapt to the scorching heat through their synergies. Both co-occurrence and metabolic complementarity networks revealed that the network density increased significantly with the rising temperatures (Figs. 3 and 4). This suggests that thermophiles may form tighter connections under extreme heat and engage in more frequent material exchanges. These collaborations could serve as a crucial adaptation strategy, fostering community survival and enhancing resistance stability51,52. Some previous studies have demonstrated that thermophiles may prioritize amino acid assimilation to cope with nitrogen limitation53, while some may even adapt by exchanging or acquiring DNA from the environment54. Our metabolic networks provided additional evidence that the species achieved high community function stability at high temperatures by enhancing the efficiency of resource and information transfer (Fig. 6c, Supplementary Data 5).

Our analysis also revealed some genomic clues for metabolic complementarity among thermophiles. As species diverged further on the evolutionary tree, their metabolic synergy intensified, particularly at extreme temperatures (Fig. 5). This aligns with previous findings that species with less overlap in their metabolic abilities find greater benefit in partnering with distant relatives42. This observation likely stems from underlying genomic differences. Distant phylogenetic relationships often translate to greater genomic disparities, leading to variations in essential compounds needed for survival35,55,56. Furthermore, we also found that thermophiles inhabiting extremely hot niches harbored the smallest average genomes (Fig. 6a), which were streamlined to minimize substrate and energy requirements (Supplementary Data 5), consistent with the Black Queen Hypothesis26,57. Consequently, to maintain their vital metabolic activities, species with reduced genome sizes were compelled to engage in more frequent metabolic interactions with other species. Finally, differences in genomic features, particularly in gene function distribution, are more likely to occur between distantly related species with complementary metabolisms (Fig. 6c). We observed that species with smaller genomes prioritize genes crucial for genetic information storage and processing, potentially reflecting a strategy to conserve essential functions while relying on external sources for metabolic needs33,34,58. This suggested a symbiotic approach to survival under harsh stress, where core genetic elements are preserved while costly metabolic tasks are outsourced to their synergistic partners.

Notably, these synergistic relationships can be asymmetrical59. The distinct patterns of genome size difference between commensalistic and mutualistic interactions underscore this imbalance. Our results showed that a considerable proportion of synergistic relationships are commensalistic (Fig. 4). In these commensalistic interactions, there is a significant tendency for one participant (the taker) to have a streamlined genome, while the other (the giver) possesses a comparatively larger genome. This is particularly evident in the extremely thermal group, where over half of the commensalistic interactions involve a taker whose genome is streamlined to less than 2 Mbp (Fig. 7). Similar trends, though less pronounced, are observed in the HT and MT groups. These findings suggest an ecological strategy among geothermal microbial communities. Commensalistic relationships often form between microorganisms of which one has undergone genome streamlining with accompanying loss of function (LOF), suggesting a specialized, energy-efficient or metabolism-conserving role, while its partner maintains a larger, potentially more versatile genomic repertoire. This dynamic could reflect an evolutionary optimization in which streamlined genomes reduce metabolic redundancy and rely instead on the metabolic versatility of partners with more complete genomes and more varied capabilities. In contrast, mutualistic interactions, in which both participants benefit, showed different genome patterns. These relationships often involve species with more balanced genomic capabilities, suggesting that both partners contribute to and benefit from their combined metabolic activities. By accurately distinguishing between commensalism and mutualism, our findings provide a clearer understanding of the ecological and evolutionary dynamics within geothermal microbial communities.

We applied a novel approach to confirm that metabolic complementarity is significantly associated with high temperatures in hot spring habitats. To extend the applicability of our findings, we also examined a composting system, another environment in which high temperature is a primary stressor. Here, as in hot springs, mutualistic interactions, which involve symmetric synergies, were notably infrequent (Supplementary Fig. 4). Among asymmetrical synergistic interactions, i.e., commensalistic ones, the group subjected to the highest temperature (D05) displayed the most pronounced genome size differences between interacting parties (Supplementary Fig. 4). However, compared to hot springs, fewer extremely streamlined genomes were identified in composting samples. Furthermore, we observed a positive correlation between the complementarity index (MIComplementarity) and phylogenetic distance, which was particularly strong in the highest temperature group (Supplementary Fig. 3). This suggests that temperature significantly influences the establishment of metabolic complementarity between distantly related species in the composting system as well. Although the MAGs retrieved from the composting system did not span as wide a range of genome sizes as those from hot springs, a clear correlation between metabolic complementarity, as determined by the RMT approach, and genome size difference was evident, especially at higher temperatures. These findings underscore the critical role of temperature in shaping metabolic complementarity across diverse environments.

While the developed metabolic pipeline marks progress in microbial network analysis, it also highlights areas that require further refinement for practical application in microbiome studies. In this study, only a few connections were shared between the co-occurrence networks and the metabolic networks (Supplementary Data 5), for which there are several possible reasons. First, co-occurrence networks are built on the abundance of microbial species measured by metagenome sequencing, so the statistical correlation between a pair of species may itself be biased or even erroneous. In complex microbial communities, higher-order interactions (HOIs) might be a non-negligible factor that pairwise interaction models cannot capture. Modeling and mechanistic studies of HOIs must consider much more than substance exchange alone; for example, the production of a substance by one species may require the coexistence of two or more other species60,61. Second, the metabolic approach only identified specific metabolites exchanged between species, whereas species interactions are far more varied than metabolite exchange. For instance, the exchange of information or signal molecules, as in quorum sensing, may mediate species interactions62,63. Since microbial interactions in natural habitats are difficult to reproduce under laboratory conditions, our approach offers a way to observe co-occurrence and interpret the underlying metabolic interactions.

Methods

Sample collection, DNA extraction, and sequencing

Sediment samples were collected from a hot spring in Tengchong, Yunnan Province, China (N24°56′ ~ 25°27′, E98°26′ ~ 98°27′) in June 2020. The temperature measured at this hot spring ranged from 63.5 °C to 85.8 °C, and the water was slightly alkaline (pH range = 8.36~8.70) (Supplementary Fig. 5a). Eight sampling sites displaying a gradual temperature decrease were selected along the spring flow, with five replicate samples collected at each site. According to the measured temperature of each sample, the eight sites were labeled as T85, T78, T73, T70, T67, T65, T64, and T63 (Supplementary Fig. 5b). After collection, samples were promptly placed in a liquid nitrogen container and transported to the laboratory within 2 days, where they were stored at −80 °C. Before storage, sediment samples for DNA extraction were pre-divided to prevent DNA damage due to repeated freeze-thaw cycles in later experiments. After 48 h of lyophilization, total DNA of the microbial community was extracted from 1.5 g of freeze-dried sediment using the grind plus kit method as previously described64.

The acquired DNA was used for Illumina NovaSeq6000 PE250 metagenomic sequencing (DNA library insertion size: 450 bp) and Oxford Nanopore sequencing (the PromethION R9.4 flow cells FLO-PR0002, Oxford Nanopore PromethION sequencer). The Illumina sequencing was conducted on all 40 samples by Magigene Biotechnology Co., Ltd. (Guangzhou, China). Replicate samples of each site were mixed and sequenced using Nanopore sequencing by Benagen Technology Co., Ltd (Wuhan, China).

Metagenome assembly, genome binning, and MAG classification

Raw Illumina metagenomic reads were quality trimmed using Trimmomatic (v0.39, LEADING:3 TRAILING:3 MINLEN:50 SLIDINGWINDOW:4:20)65. The remaining reads of each sample were de novo assembled using IDBA-UD (v1.1.3) with default parameters66. Here, the reads of the 5 replicates from one sampling site were co-assembled to improve the robustness of the assembly and the chance of recovering a higher diversity of genomes within the homogeneous environment. The sequencing coverage and read diversity were estimated using the Nonpareil method67,68. Then, contigs of samples from the same sampling site were pooled together and used for co-assembly with Nanopore long reads using OPERA-MS (v0.9.0) with default parameters69. Longer contigs generated by this hybrid assembly, with a minimum length of 1000 bp (1500 bp for metaBAT2), were used for genome binning using metaWRAP (v1.3.2)70, with metaBAT2 (v2.12.1)71, MaxBin2 (v2.2.6)72, and CONCOCT (v1.0.0)73 as the core binning tools. Draft bins were quality controlled using the metaWRAP bin_refinement module with parameters -c 50 -x 10, so that only MAGs with completeness higher than 50% and contamination lower than 10% were retained for the following analysis; these were regarded as medium- and high-quality MAGs74. The refined bins were then dereplicated using dRep (v3.5.0)75 with parameters -pa 0.9 -sa 0.99. The estimated genome size was calculated by dividing the genome size by the sum of completeness and contamination76. The taxonomic classification of the selected MAGs was conducted with the GTDB-Tk classify_wf workflow against the GTDB genome database (Release 202)77. The phylogenetic trees for bacterial and archaeal MAGs were constructed using the multiple sequence alignments generated by the GTDB-Tk workflow. The unrooted bacterial and archaeal phylogenetic trees were rooted using the midpoint rooting method, performed with the midpoint.root function in the phytools R package78, and were then visualized using the online tree display tool iTOL (v6.6)79. The phylogenetic distance was then calculated using the cophenetic.phylo function in R.
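The quality filter and the genome size estimate described above can be illustrated with a minimal Python sketch. It is not the metaWRAP implementation: the Bin class, the function names, and the example values are hypothetical, and the size estimate takes the sentence in the text literally, with completeness and contamination converted from percentages to fractions (an assumption about units).

```python
from dataclasses import dataclass


@dataclass
class Bin:
    """Hypothetical record for one refined bin (field names are illustrative)."""
    name: str
    size_bp: int          # assembled bin length in base pairs
    completeness: float   # percent, 0-100
    contamination: float  # percent, 0-100


def passes_quality(b, min_completeness=50.0, max_contamination=10.0):
    """Thresholds as stated in the text: completeness > 50% and contamination < 10%."""
    return b.completeness > min_completeness and b.contamination < max_contamination


def estimated_genome_size(b):
    """Literal reading of the text: size / (completeness + contamination).

    Assumption: both quantities are expressed as fractions of 1.
    """
    return b.size_bp / ((b.completeness + b.contamination) / 100.0)


bins = [
    Bin("bin_1", 1_900_000, 90.0, 5.0),
    Bin("bin_2", 1_200_000, 45.0, 2.0),   # fails the completeness threshold
    Bin("bin_3", 3_000_000, 80.0, 12.0),  # fails the contamination threshold
]
retained = [b for b in bins if passes_quality(b)]
for b in retained:
    print(b.name, round(estimated_genome_size(b)))
```

In this toy input only bin_1 is retained, and its estimated size is its assembled length divided by 0.95.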

Quantification of MAGs and co-occurrence network construction

The refined bins were quantified using CoverM (v0.7.0, https://github.com/wwood/CoverM, genome mode with coupled reads as input) with the parameter --method relative_abundance. In the output relative abundance table (referred to as the MAG table), every column, representing a sample, includes the percentage of unmapped reads, making it reasonable to compare relative abundance across samples. A non-metric multidimensional scaling (NMDS) analysis was conducted with the metaMDS function in the vegan package (R) to investigate community-level differences among sampling sites. A dissimilarity test based on the Bray–Curtis distance was then applied to divide samples into three groups. MAGs were assigned to three groups (ET, HT, MT) based on their relative abundance in samples. Each group comprised multiple sampling sites with five replicates each. A MAG was considered present in a group if it had non-zero abundance in more than half of the samples (≥6 for ET; ≥8 for HT and MT). If a MAG did not meet this criterion in any group, it was assigned to the group with the highest total relative abundance. Additionally, MAGs were ensured to be in the group corresponding to their sample prefix (e.g., T85 and T78 in ET). With the abundance table, the pairwise Spearman’s rank correlation between every two MAGs was calculated. Since the relative abundance table originated from the distribution of hypothetical species in a natural environment, it can be regarded as the matrix that the random matrix theory (RMT)-based approach requires. Before using the RMT cutoff tool on the iNAP website44, the majority_selection tool was used to retain only MAGs that had zero abundance in fewer than half of the samples. This step excludes biases in the correlation calculations caused by too many zeros. After noise was separated by the RMT cutoff, the remaining correlations formed a network of MAG pairs with highly correlated abundance, from which their co-occurrence was inferred. The network’s general properties were calculated using the igraph R package and visualized using Cytoscape (v3.10.0)80.
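To make the presence rule and the correlation step concrete, the following Python sketch computes Spearman's rank correlation from scratch and keeps MAG pairs above a cutoff. It is an illustration only and not the iNAP workflow: the cutoff is passed in as a fixed number standing in for the RMT-derived threshold, the use of the absolute correlation is an assumption, and all function names and abundance values are hypothetical.

```python
from itertools import combinations


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # constant profile: correlation undefined, treated as 0 here
    return cov / (vx * vy) ** 0.5


def present_in_group(abundances):
    """Presence rule from the text: non-zero in more than half of the samples."""
    nonzero = sum(1 for a in abundances if a > 0)
    return nonzero > len(abundances) / 2


def cooccurrence_edges(mag_table, cutoff):
    """Keep MAG pairs whose |rho| reaches the cutoff (stand-in for the RMT cutoff)."""
    edges = []
    for a, b in combinations(sorted(mag_table), 2):
        rho = spearman(mag_table[a], mag_table[b])
        if abs(rho) >= cutoff:
            edges.append((a, b, rho))
    return edges


# Toy MAG table: relative abundances of three hypothetical MAGs in five samples.
mag_table = {
    "MAG1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "MAG2": [2.0, 4.0, 6.0, 8.0, 10.0],
    "MAG3": [5.0, 3.0, 4.0, 1.0, 2.0],
}
edges = cooccurrence_edges(mag_table, cutoff=0.9)
print(edges)
```

With 10 samples the presence rule requires at least 6 non-zero values, and with 15 samples at least 8, matching the counts given in the text for ET and for HT and MT.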

Genome-scale metabolic models and interaction network construction

Genome-scale metabolic models (GSMMs) for all MAGs were constructed using CarveMe (v1.4.1) with default parameters81. The input for CarveMe was the coding sequences of each MAG, which were predicted using Prokka82. The interactions between pairwise GSMMs were predicted using PhyloMint47. All modeling software mentioned above required the installation of an optimization solver, which for our study was IBM CPLEX Optimizer (ILOG COS 20.10 Linux x86-64 version). The PhyloMint metabolic complementarity index (MIComplementarity) was used to represent the cooperation potential of pairwise GSMMs, or MAGs. For a given MAG pair, two MIComplementarity values, each ranging from 0 to 1, were calculated, because the index is asymmetric and a MAG can be both the metabolic giver and the taker. The higher of the two MIComplementarity values was therefore regarded as the maximum metabolic cooperation ability of the MAG pair and was used for the following analysis. The MIComplementarity threshold was determined using the RMT cutoff tool on the iNAP website, and the pairwise interactions above this threshold were selected to form the PhyloMint complementarity network. In the network, synergistic interactions were classified according to whether the original directional PhyloMint values of the species pair represented by each edge exceeded the RMT threshold. If the PhyloMint value exceeded the RMT threshold in only one direction, the edge was defined as a commensalistic interaction; if the values in both directions exceeded the RMT threshold, the edge was defined as a mutualistic interaction. The network attributes were computed in the same way as for the co-occurrence network. The predicted coding sequences of each MAG were searched against the database of Clusters of Orthologous Genes (COGs, https://www.ncbi.nlm.nih.gov/research/cog/)83 and were categorized into various biological functions. The effects of biological functions on the complementarity potential between pairwise MAGs were estimated using linear mixed-effects models (LMMs) to eliminate the impact of genome size on functions. This step was performed using the R package lme4. Pairwise MIComplementarity was regarded as the response variable, and the gene proportion differences of each biological function were regarded as fixed effects. The genome size difference was treated as a random intercept effect.
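The rule for labeling edges as commensalistic or mutualistic can be sketched as follows. This is a minimal Python illustration of the rule as worded above, not code from PhyloMint or iNAP: the function name and the example index values are hypothetical, and the cutoff of 0.280 is borrowed from the composting analysis purely as an example value.

```python
def classify_pair(mi_ab, mi_ba, rmt_cutoff):
    """Classify one MAG pair from its two directional complementarity values.

    mi_ab and mi_ba are the two MIComplementarity values of the pair (0 to 1).
    Returns (label, edge_weight), where edge_weight is the higher of the two
    values, or (None, None) when neither direction exceeds the cutoff.
    """
    above_ab = mi_ab > rmt_cutoff
    above_ba = mi_ba > rmt_cutoff
    if above_ab and above_ba:
        label = "mutualism"       # both directions exceed the threshold
    elif above_ab or above_ba:
        label = "commensalism"    # only one direction exceeds the threshold
    else:
        return None, None         # no edge in the complementarity network
    return label, max(mi_ab, mi_ba)


# Hypothetical directional index values for three MAG pairs.
pairs = {
    ("MAG1", "MAG2"): (0.35, 0.10),
    ("MAG1", "MAG3"): (0.30, 0.31),
    ("MAG2", "MAG3"): (0.10, 0.20),
}
for pair, (mi_ab, mi_ba) in pairs.items():
    print(pair, classify_pair(mi_ab, mi_ba, rmt_cutoff=0.280))
```

The sketch deliberately does not name which member of a pair is the giver or the taker, because that depends on the direction convention of the index, which is not restated here.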

Extraction of shared interactions and determination of potential exchangeable metabolites

If a specific pairwise MAG interaction possessed both the co-occurrence property detected by the co-occurrence network and the metabolic cooperation potential detected by the PhyloMint MIComplementarity network, it was defined as a “dual interaction”. A dual interaction indicated that two MAGs shared both a strong correlation in environmental co-occurrence patterns and complementarity in their metabolite profiles. Using the PhyloMint algorithm, seed metabolites were defined by the strongly connected component and represented substrates acquired exogenously. When considering two MAGs, A and B, a metabolite found in A’s seed set but not in B’s seed set indicated a potential transferability from B to A. This suggests that B could synthesize the metabolite while A could utilize it. This definition aligns with the computation method of the PhyloMint complementarity index. In line with the previously described definitions, the potential of one MAG to utilize the metabolites produced by another was determined by the overlap between one MAG’s seed metabolite set and the other MAG’s non-seed metabolite set. The metabolite profiles were taken directly from the corresponding genome-scale metabolic models generated by CarveMe with the BiGG Models database (Version 1.6)84. The metabolites were categorized using the Human Metabolome Database (HMDB Version 5.0)85. Because metabolite nomenclature is heterogeneous, we manually looked up all related metabolites by name in the BiGG Models database to recheck their classification in HMDB, and modified the HMDB classification rules as follows (at the “super class” level): (i) fatty acyl CoAs were separated from lipids and lipid-like molecules and grouped with other CoA derivatives to form the class “Coenzyme A derivatives”; (ii) compounds identified as amino acids, peptides, and analogs were separated from organic acids and derivatives into a separate class; (iii) carbohydrates and carbohydrate conjugates were separated from organic oxygen compounds into a separate class; (iv) all inorganic compounds were combined into one class, “Inorganics”, which included homogeneous metal and non-metal compounds (inorganic compounds that contain solely metal or non-metal elements, respectively); (v) organic nitrogen compounds, organic oxygen compounds, organoheterocyclic compounds, benzenoids, and organosulfur compounds were combined into one class, “Organic compounds with specific atoms or structures”; (vi) oxidized ferredoxin, reduced ferredoxin, protoheme, staphyloferrin B, and pyoverdine P. putida specific were categorized into “Other metabolites”. With these modified rules, all metabolites involved in our study were classified into nine categories (Supplementary Fig. 6, Supplementary Data 5).
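The two set operations described in this subsection, the overlap that defines potentially exchangeable metabolites and the intersection of the two networks that defines dual interactions, can be sketched in Python as below. This is an illustration under stated assumptions rather than the PhyloMint code: the function names are hypothetical, the seed and non-seed sets are toy inputs written with BiGG-style identifiers, and edges are treated as undirected when the two networks are compared.

```python
def transferable_metabolites(seed_taker, nonseed_giver):
    """Metabolites the taker must acquire exogenously (its seed set) that the
    giver's network can produce (the giver's non-seed set)."""
    return set(seed_taker) & set(nonseed_giver)


def dual_interactions(cooccurrence_edges, complementarity_edges):
    """MAG pairs present in both networks, ignoring the order within a pair."""
    co = {frozenset(edge) for edge in cooccurrence_edges}
    comp = {frozenset(edge) for edge in complementarity_edges}
    return co & comp


# Toy seed / non-seed sets for two hypothetical MAGs, A (taker) and B (giver).
seed_a = {"ala__L", "glc__D", "nh4"}
nonseed_b = {"ala__L", "pyr", "ac"}
print(transferable_metabolites(seed_a, nonseed_b))

# Toy edge lists from the two networks.
co_edges = [("MAG1", "MAG2"), ("MAG2", "MAG3")]
comp_edges = [("MAG2", "MAG1"), ("MAG3", "MAG4")]
shared = dual_interactions(co_edges, comp_edges)
print([sorted(pair) for pair in shared])
```

Using frozenset for each pair means that an edge written as (MAG2, MAG1) in one network matches (MAG1, MAG2) in the other.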

Analysis of the composting system

Zhao et al. conducted a 30-day composting experiment in 10 composting piles at a food waste composting facility. As the temperature changed during the experiment, they selected six sampling time points, with three replicates from each composting pile for each sampling, obtaining a total of 180 samples for metagenomic sequencing. After co-assembly and binning, they obtained 159 high-quality MAGs (completeness >90%, contamination <5%) for further study. Three sampling time points were selected, Day 0, 5, and 25 (tagged as D00, D05, and D25), with average temperatures of 26.02 °C, 64.29 °C, and 41.35 °C. These three time points were regarded as three states of temperature change in the composting experiment (Day 0: the beginning of the experiment, the coolest temperature; Day 5: the highest temperature; Day 25: the temperature dropped in the late stage of the experiment). The relative abundance table of MAGs at these three sampling time points was summarized as a proxy for the state of these MAGs at that time point. The metabolic models and metabolic complementarity indices of MAGs were constructed and calculated using the methods described above. In the three groups, MAGs with relative abundance greater than 0 were assigned to the corresponding group for the construction of metabolic complementarity networks. The threshold for identifying metabolic complementarity within networks was established at an RMT cutoff of 0.280.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.