Introduction

The Arctic Ocean is a unique ecosystem undergoing major transitions under climate change. Over the past two decades, temperatures have risen more than twice as fast as the global average1, and the accompanying reduction in sea ice and snow cover exacerbates the warming trend. In particular, the extent of Arctic sea ice has declined1. These environmental changes have a wide range of consequences, including profound shifts in biodiversity2, and thus have a fundamental impact on ecosystems of the Arctic Ocean. There are first signs that the geographical ranges of temperate species are shifting northwards3, while polar fish and ice-associated species experience a reduction in their habitat due to changing environmental conditions. These ecological changes affect the stability of the entire ecosystem1. The complex relationship between biodiversity and ecosystem stability remains poorly understood, particularly in the Arctic Ocean. Consequently, the rapid changes in Arctic sea ice and environmental conditions require an improved understanding of the mechanisms governing the resilience and stability of biological processes and ecosystem functions in the Arctic Ocean. Within marine ecosystems, primary production is a key service supporting all trophic levels4,5, with implications for biodiversity, the abundance and community structure at higher trophic levels, and carbon sequestration. This distinct ecosystem feature is supported by a highly productive microalgal community that thrives in sea ice, accompanied by a remarkably diverse heterotrophic community ranging from bacteria to metazoans6.

Recent decades have seen an increase in pelagic phytoplankton and primary production in the Arctic Ocean, a direct consequence of global warming7,8,9. In the Central Arctic Ocean (CAO), sea-ice algae rather than phytoplankton account for much of the primary production10,11 as they have the potential to initiate pelagic blooms beneath the ice12. Typically, phytoplankton growth starts mainly within the marginal ice zone in spring, co-occurring with increased solar radiation and meltwater-induced stratification13. Over the past three decades, increasing evidence has documented the occurrence of under-ice blooms in the Arctic Ocean14,15,16,17, while phytoplankton in the water column below the ice shows significant differences from the microalgal communities in the sea ice18. However, changes in the diversity of key species related to the increase in Arctic pelagic primary production and its impact on the marine ecosystem stability are currently unresolved.

Recent findings indicate that high temperatures in natural ecosystems may affect ecological stability, whereas the consequences of alterations to biodiversity remain variable19. Nevertheless, the underlying mechanisms remain debated and poorly understood20. The presence of nearly 2000 phytoplankton taxa and 1000 ice-associated protists in the Arctic21 indicates the relevance of identifying keystone species in this wealth of Arctic marine microbial diversity that accounts for ecosystem stability22,23,24.

Understanding biological and ecological dynamics across seasonal environmental gradients is substantially fostered by novel statistical approaches. In polar ecosystems, these gradients include polar day and night, as well as variations in sea ice cover, stratification, or nutrient concentrations. Techniques are now accessible to assess the impact of ecological variables on ecosystem stability. For instance, co-occurrence networks (CON) determine and visualize how species coexist within communities or ecosystems25,26. However, in natural ecosystems, species interactions are subject to variation as a result of changes in environmental conditions, which can cause a transition from one stable state of co-occurrence to another27. Convergent cross mapping (CCM) helps to identify the causality of co-occurrence in complex ecosystems, i.e., which organisms might share mutual or other direct relationships. Energy landscape analysis (ELA) aids in building ecological models that simulate and predict how ecosystems respond to disturbances or changes in environmental parameters28,29,30.

In this study, we establish a mathematical methodology to reveal seasonal patterns, suggest causal ecological relationships, and identify microbial keystone species in Western Fram Strait.

This major gateway between the Arctic and Atlantic Oceans has been studied for over 20 years within the framework of the long-term HAUSGARTEN and FRAM observatories31. Our study contributes an extended mathematical perspective on microbial inventories in Fram Strait, showing seasonal patterns and the influence of sea ice on microbial dynamics and the biological carbon pump32,33,34,35,36. Based on a 4-year metabarcoding dataset of microeukaryotic taxa in the context of rich oceanographic data, sampled year-round at approximately biweekly intervals, we develop scenarios of their long-term resilience. Additionally, we predict taxa that play a crucial role in maintaining stable communities within the Arctic eukaryotic planktonic food web. Furthermore, we seek to define keystone species that can serve as indicators for monitoring the consequences of environmental change for Arctic marine ecosystem stability. Using an unprecedented combination of network analysis techniques, namely co-occurrence networks and convergent cross mapping, along with energy landscape analysis, our objective is to elucidate which factors might determine the stability of Arctic marine ecosystems. This approach will greatly improve our understanding of the effects of climate change on this ecosystem.

Results

We examined a dataset of 1019 eukaryotic ASVs and eight environmental parameters compiled over 4 years at mooring site F4 in the West Spitsbergen Current (WSC) in Fram Strait. The aim was to characterize species communities, analyze causal relationships between ASVs, and identify keystone and resilient taxa with respect to the impact of various environmental conditions.

To accomplish this, we established a novel computational pipeline, coupling co-occurrence analysis with convergent cross mapping and energy landscape analysis. This allowed us to identify causal interactions among taxa in a co-occurring community and to identify stable community states across different environmental conditions.

Co-occurrence network reveals seasonal dynamics

The co-occurrence network (CON) comprised eight connected components, with a major component accounting for 98% (935) of all nodes, which are connected by 8610 edges. In the following, we focus on this major connected component. The resulting undirected graph notably displays a clear seasonal cyclic pattern (Fig. 1A).

Fig. 1: Co-occurrence networks and microeukaryotic composition at mooring F4 from 2016-08-01 to 2020-09-17.

Each network node represents an ASV, and each edge represents a similar temporal pattern of two ASVs. The edge weights correspond to the Pearson correlation coefficients determined from the comparison of the individual ASV temporal profiles. ASVs are connected if the coefficient is r > 0.7, p < 0.05. Thus, an edge in this figure indicates that the temporal profiles of two nodes display a significant correlation. A Node color reflects the month in which the ASV exhibits maximal abundance, calculated from the maximum abundance mode for each year ranging from January to December. B In this representation, nodes are colored based on the community membership that was determined by the Louvain community detection algorithm. C The proportion of the top 10 taxonomic classes by cluster (“HS” high light summer, “LW” low light winter, “TS” transition spring and “TA” transition autumn). Color shades illustrate the assignment to auto- (green), mixo- (orange) or heterotroph (purple).
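The edge rule stated in this caption (r > 0.7, p < 0.05) can be illustrated with a minimal sketch. It assumes plain abundance profiles as input and uses a one-sided permutation test for the p value; the function names, the toy profiles, and the choice of a permutation test are illustrative assumptions and not the authors' implementation.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: treated as uncorrelated
    return sxy / math.sqrt(sxx * syy)

def permutation_p(x, y, r_obs, n_perm=999, seed=0):
    """One-sided permutation p value for r >= r_obs (assumed test)."""
    rng = random.Random(seed)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if pearson(x, y_perm) >= r_obs:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def cooccurrence_edges(profiles, r_min=0.7, alpha=0.05):
    """Return {(asv_a, asv_b): r} for pairs with r > r_min and p < alpha."""
    names = sorted(profiles)
    edges = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(profiles[a], profiles[b])
            # the permutation test only runs for pairs that pass the r threshold
            if r > r_min and permutation_p(profiles[a], profiles[b], r) < alpha:
                edges[(a, b)] = r
    return edges
```

Calling `cooccurrence_edges` on a dictionary that maps hypothetical ASV identifiers to their temporal profiles returns the weighted, undirected edge list from which a network can be assembled.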

The network was partitioned using the Louvain community detection algorithm37, revealing ten discrete community clusters (Fig. 1B) labeled by the season in which the majority of cluster members had their maximum abundance (Table 1). To further group the clusters, we merge each 3-month period into one season. Two clusters were assigned to the transition autumn period (01TA and 02TA), three clusters were associated with the low light winter period (03LW, 04LW, and 05LW), and three clusters with the transition spring period (06TS, 07TS, and 08TS). Finally, clusters 09 and 10 were allocated to the high light summer period (09HS, 10HS).
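The labeling step described here, in which a cluster receives the season in which most of its members peak, can be sketched as follows. The month-to-season mapping is an assumption taken from the 3-month groups named later in the Results; the tie-breaking rule and all identifiers are hypothetical, and the Louvain partition itself is taken as given.

```python
from collections import Counter

# Assumed mapping of calendar months to the four season codes used in the text.
SEASON_OF_MONTH = {12: "LW", 1: "LW", 2: "LW",
                   3: "TS", 4: "TS", 5: "TS",
                   6: "HS", 7: "HS", 8: "HS",
                   9: "TA", 10: "TA", 11: "TA"}

def peak_month(yearly_peaks):
    """Mode of the per-year peak months of one ASV (ties: earliest month)."""
    counts = Counter(yearly_peaks)
    best = max(counts.values())
    return min(month for month, c in counts.items() if c == best)

def cluster_season(members, peaks_by_asv):
    """Season in which the majority of a cluster's members peak."""
    votes = Counter(SEASON_OF_MONTH[peak_month(peaks_by_asv[a])] for a in members)
    return votes.most_common(1)[0][0]
```

For example, a cluster whose members mostly peak in June or July would be labeled "HS" under this mapping.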

Table 1 Co-occurrence network clusters

Community composition

We examined the taxonomic composition of each cluster to explore the seasonal associations of specific taxonomic groups. Alpha diversity, measured by Shannon entropy, decreases from summer through autumn and winter, gradually decreasing towards spring (Supplementary Information Fig. S1), except for cluster 05LW. Beta diversity, measured by Bray-Curtis distance (Supplementary Information Fig. S2), between the winter and spring clusters (03LW, 04LW, 05LW, and 06TS) is notably lower than between most other cluster pairs, except that between 01TA and 10HS. Cluster 02TA shows, on average, a higher beta diversity compared to all other clusters, which can be explained by the fact that 02TA is the smallest cluster in terms of the number of ASVs (Supplementary Information Fig. S2).
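For reference, the two diversity measures named in this paragraph can be written down in a few lines. This is a generic sketch of the textbook definitions (Shannon entropy with the natural logarithm, Bray-Curtis dissimilarity on abundance vectors); how the authors aggregated ASVs per cluster before computing them is not specified here and is not reproduced.

```python
import math

def shannon_entropy(counts):
    """Shannon entropy H = -sum(p_i * ln p_i) over non-zero abundances."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum|u_i - v_i| / sum(u_i + v_i)."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0
```

An even community of four taxa gives H = ln 4, and two samples that share no taxa give a Bray-Curtis dissimilarity of 1.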

We found distinct taxonomic compositions within various clusters. Photosynthetic organisms like Ochrophyta and Haptophyta dominate the light phases38. In late spring (cluster 08TS), phototrophs make up more than 75% of ASVs, while during summer (clusters 08TS and 09HS) and early autumn (10HS), they still comprise over 25% of all ASVs. Mixotrophs are highly abundant in most clusters, while they clearly dominate during the late autumn transition (cluster 02TA). Throughout the dark period (clusters 03LW, 04LW, and 05LW) as well as in early spring (06TS and 07TS), heterotrophs, particularly Syndiniales, are dominant with a clear peak of abundance (more than 90% of ASVs) in mid-winter (cluster 04LW). During early spring (cluster 06TS), when sunlight returns, mixotrophs increase in abundance, highlighting the nuanced trophic dynamics during the annual cycle. This analysis at the taxon level provides insights into the composition of these clusters, shedding light on the prevalence and distribution of specific classes within distinct seasonal communities (Fig. 1C).

Convergent cross mapping identifies community interactions

Convergent cross mapping (CCM) was applied to predict causal relationships within and between seasonal clusters based on the underlying ASV dynamics. We project the CCM-derived weights onto the co-occurrence network, resulting in a directed graph consisting of 17,220 directed edges and 935 nodes. Here, a directed edge indicates that knowledge of the dynamics of the source node allows predicting the dynamics of the target node.
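The idea behind CCM can be illustrated with a compact sketch: the time series of a putative effect is delay-embedded into a shadow manifold, and the values of the putative cause are estimated from nearest neighbours on that manifold. The sketch scores the estimate with the Pearson correlation, whereas the study reports Normalized Mutual Information; the embedding dimension, the lag, the coupled logistic maps used as toy data, and all function names are assumptions made for illustration.

```python
import math

def embed(series, E=2, tau=1):
    """Delay embedding: one vector (x_t, x_{t-tau}, ...) per usable time t."""
    start = (E - 1) * tau
    return [[series[t - k * tau] for k in range(E)]
            for t in range(start, len(series))]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

def cross_map_skill(cause, effect, E=2, tau=1):
    """Correlation between `cause` and its estimate from the manifold of `effect`."""
    manifold = embed(effect, E, tau)
    observed = cause[(E - 1) * tau:]        # cause values at the embedded times
    estimated = []
    for i, point in enumerate(manifold):
        # E + 1 nearest neighbours of the point, excluding the point itself
        nearest = sorted((math.dist(point, other), j)
                         for j, other in enumerate(manifold) if j != i)[:E + 1]
        d_min = max(nearest[0][0], 1e-12)   # avoid division by zero
        weights = [math.exp(-d / d_min) for d, _ in nearest]
        estimated.append(sum(w * observed[j] for w, (_, j) in zip(weights, nearest))
                         / sum(weights))
    return pearson(observed, estimated)

def coupled_logistic(n=200, beta_yx=0.1):
    """Toy data: two logistic maps in which x influences y but not vice versa."""
    x, y = [0.4], [0.2]
    for _ in range(n - 1):
        x_next = x[-1] * (3.8 - 3.8 * x[-1])
        y_next = y[-1] * (3.5 - 3.5 * y[-1] - beta_yx * x[-1])
        x.append(x_next)
        y.append(y_next)
    return x, y
```

With `x, y = coupled_logistic()`, the call `cross_map_skill(x, y)` asks how well x can be recovered from the history of y. A full analysis would additionally check that this skill converges as the time series length increases.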

A comparative analysis of edge weights within the CCM network was conducted. The connectivity derived from the co-occurrence network was compared with theoretical edge weights and randomly permuted connections. To perform this comparison, a two-sided Kolmogorov-Smirnov test was used. The theoretical edge weights were derived from all possible connections between pairs of nodes in the co-occurrence network, excluding the existing true links (Section “Convergent cross mapping identifies community interactions”). The findings clearly show that there is a stronger causal influence (higher NMI values) between co-occurring ASVs than between randomly connected or unconnected nodes (Supplementary Information Figs. S8 and S9 and Tables S1 and S2). The significance level was set at a nonparametric p value of less than 0.05, calculated similarly to ref. 26. Trimming edges with non-significant NMI values (Supplementary Information Fig. S10) produced a network graph consisting of 4597 edges and 719 nodes, divided into 18 disconnected components, with the largest module encompassing 706 nodes with 4572 edges (Fig. 2A, B).
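The comparison of two edge-weight distributions can be sketched with the two-sample Kolmogorov-Smirnov statistic, i.e., the largest absolute difference between the two empirical distribution functions. The p value below is obtained by permuting the pooled samples, which is an assumption made to keep the sketch free of external libraries; it is not necessarily the procedure used in the study.

```python
import random
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sided KS statistic: max |F_a(v) - F_b(v)| over all observed values."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for v in a + b:
        cdf_a = bisect_right(a, v) / len(a)
        cdf_b = bisect_right(b, v) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

def ks_permutation_test(sample_a, sample_b, n_perm=999, seed=0):
    """Return (D, p), with p estimated by relabeling the pooled samples."""
    rng = random.Random(seed)
    observed = ks_statistic(sample_a, sample_b)
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:n_a], pooled[n_a:]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Here `sample_a` would hold the weights of edges present in the co-occurrence network and `sample_b` the weights of the reference set (unconnected or permuted pairs).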

Fig. 2: Convergent cross mapping networks of microeukaryotes at mooring F4 from 2016-08-01 to 2020-09-17.

Each node in the CCM network represents an ASV, and each edge represents a causal influence. The edge weight corresponds to the Normalized Mutual Information determined from the comparison of each individual ASV and its predicted representation in the shadow manifold. ASVs are connected if the smoothed p value of the weight is p < 0.05 (Supplementary Information Fig. S3). A Node color reflects the month in which the ASV exhibits maximal abundance, calculated from the maximum abundance mode for each year ranging from January to December. B In this representation, nodes are colored based on the community membership that was determined by the Louvain community detection algorithm. “HS” labels denote high light summer, “LW” represents low light winter, “TS” corresponds to transition spring, and “TA” indicates transition autumn. C The Normalized Mutual Information aggregated across the edges between the clusters, visually represented by thickness of the arrows corresponding to their respective values. Colors visually represent the clusters. D Interaction analysis between taxonomic clusters. For each of the ten clusters, the interactions between ASV groups are examined at class level, considering “Syndiniales”, “Dinophyceae”, “Bacillariophyta” and “MAST”. The cluster assignments are marked by different colors. The thickness of the arrows denotes the strength of the interaction, while the shapes represent the various taxa groups at the class level.

Of the total 12,648 edges eliminated during trimming, 18.16% represent connections between taxa reaching their peak abundance in March. This selective removal has a profound impact on the network structure (Fig. 2A). In contrast to the co-occurrence network, the causal interaction network “breaks” during spring, as demonstrated in Fig. 2A (months March and April). The winter cluster 05LW and the spring cluster 06TS collapse (see Fig. 2B), meaning that nodes disintegrate and large parts of the clusters are no longer connected to the rest of the network. This suggests that the corresponding connections in the co-occurrence network are not a result of causal interactions but rather result from other factors, possibly caused by the prevailing environmental conditions. Analysis of the betweenness centrality reveals that ASVs of Picozoa, Leegaardiella, Acantharea, Dinophyceae, MAST-1, and Syndiniales serve as essential hub nodes throughout the seasonal cycle in the network, highlighting their crucial function in maintaining network stability.
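Betweenness centrality, used here to identify hub nodes, counts how often a node lies on shortest paths between other nodes. The following sketch implements Brandes' algorithm for a directed, unweighted graph; treating the CCM network as unweighted and leaving the scores unnormalized are simplifying assumptions, and the adjacency format is hypothetical.

```python
from collections import deque

def betweenness(adj):
    """Unnormalized betweenness for a directed graph {node: [successors]}."""
    nodes = set(adj) | {v for succ in adj.values() for v in succ}
    score = {v: 0.0 for v in nodes}
    for s in nodes:
        sigma = {v: 0 for v in nodes}   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in nodes}
        order = []
        queue = deque([s])
        while queue:                    # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj.get(v, []):
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):       # accumulate dependencies backwards
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score
```

In a chain a → b → c, only b lies on a shortest path between two other nodes and therefore receives a score of 1.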

Community interaction

Analyzing cluster interactions revealed distinct patterns. We measure the separation between clusters by a “network distance”, computed as the distance between the cluster centroids in the dimension-reduced UMAP embedding space (Supplementary Information Fig. S4). The distances thus determined range from one to seven. Proximate clusters (a network distance of two to four) exhibited notably higher connectivity than clusters situated further apart (a distance of five to seven). Figure 2C provides a visual and quantitative representation of these interactions. Subsequent analysis revealed a prevalence of connections at a network distance of two (92% of total connections), followed by distances of three (6% of total connections), and four (2% of total connections) (Supplementary Information Figs. S4–S6).
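The centroid-based distance can be sketched as follows, assuming that each ASV already has coordinates in a low-dimensional embedding. The coordinates and identifiers are hypothetical, and the sketch returns the raw Euclidean distance; any rounding or binning that yields the integer distances reported in the text is not reproduced.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equally long coordinate tuples."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))

def network_distance(coords, cluster_a, cluster_b):
    """Euclidean distance between the centroids of two clusters."""
    centre_a = centroid([coords[asv] for asv in cluster_a])
    centre_b = centroid([coords[asv] for asv in cluster_b])
    return math.dist(centre_a, centre_b)
```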

For each seasonal cluster, we investigate in detail the mutual influence (NMI) of four taxonomic groups selected from the top ten classifications (Fig. 2D): Bacillariophyta, Syndiniales, Dinophyceae, and MAST (all MAST-X variants were classified under MAST). Bacillariophyta primarily comprises photo-autotrophic species39, while Syndiniales include parasitic species, most of them characterized by their heterotrophic lifestyle40. Dinophyceae are known for their diverse array of species and ecological roles, from symbionts to planktonic autotrophs41. MASTs are heterotrophic pico-eukaryotic protists and contribute substantially to protist abundances in the ocean. They play a crucial role in marine ecosystems, being among the dominant eukaryotes in the Arctic Ocean41,42.

These four taxonomic groups are primarily distinguished by their unique lifestyles and ecological roles as primary producers, consumers, parasites, or endosymbiotic interactors. These distinctions form the basis for our analysis of their contributions to the ecosystem. Hence, by summarizing the members of each group into single nodes, we analyzed their cross-interactions using the information obtained from the CCM network.
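Collapsing the members of each taxonomic group into a single node amounts to aggregating the directed edge weights between groups. The sketch below sums the weights; whether the study used sums, means, or another aggregate is not stated here, so the choice of a sum, like the data layout, is an assumption.

```python
from collections import defaultdict

def aggregate_by_group(edges, group_of):
    """Sum directed edge weights between groups.

    edges:    {(source_asv, target_asv): weight}
    group_of: {asv: group name}; ASVs without a group are ignored.
    Edges within one group are skipped.
    """
    totals = defaultdict(float)
    for (src, dst), weight in edges.items():
        g_src, g_dst = group_of.get(src), group_of.get(dst)
        if g_src is None or g_dst is None or g_src == g_dst:
            continue
        totals[(g_src, g_dst)] += weight
    return dict(totals)
```

The result maps ordered group pairs such as ("MAST", "Syndiniales") to an aggregated influence, which corresponds to one arrow in a group-level diagram.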

The strength and direction of interactions between these taxonomic groups varied over the annual cycle (see Fig. 2D). During the spring-summer and summer-autumn transition, clusters 02TA and 06TS displayed fewer and weaker connections compared to other clusters (see Fig. 2B). This suggests dynamic changes in community structure during these transition phases, with ecological interactions between individual species either yet to be established or no longer present. At the beginning of the polar night (cluster 05LW), we detected the most substantial influence from the dinoflagellates (shown by the thickest arrow in Fig. 2D) to the pico-eukaryotic heterotrophic groups Syndiniales and MAST, suggesting an ecological role of dinoflagellates for the establishment of the winter community. Overall, the strength of the links between the taxonomic groups decreases as the polar night progresses and has its minimum at the peak of the polar night in December (Cluster 04LW and 03LW). The lack of strong connections between taxonomic groups during the deepest polar night indicates that there are only a few ecologically significant interactions within the microeukaryotic community.

During polar day, the connections between taxonomic groups become stronger and reach their peak during the time span around summer solstice (Cluster 10HS). Notably, Bacillariophyta (i.e., diatoms) showed the most robust connections during the polar day, owing to their role as predominant phototrophic biomass producers at the foundation of the marine food web. However, towards the end of the growth period, the impact of MAST on Dinoflagellates becomes more pronounced (as shown by the thicker arrow), indicating an essential involvement of this pico-eukaryotic heterotroph in the ecosystem during the late polar day; a signature of the transition from primary production to recycling.

Community and environment interactions

For a more detailed understanding of which environmental conditions align with seasonal community clusters, we conducted a correlation analysis (Fig. 3). For our calculation, we define the time series of a cluster as the weighted sum of the abundance values of the ASVs assigned to this cluster. Cluster 10HS displayed a significant positive correlation with Photosynthetically Active Radiation (PAR) (0.64) and temperature (0.52) but a significant negative correlation with Mixed Layer Depth (MLD) (−0.64). This cluster thus thrives in environments with high light and temperature levels and a shallow mixed layer, i.e., stronger stratification. Cluster 03LW exhibits the opposite behavior, showing a moderately positive correlation with MLD (0.25) and polar water fraction (PW_frac) (0.21) while displaying an inverse relationship with PAR and temperature (−0.35 and −0.37, respectively).
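The construction of a cluster time series and its correlation with an environmental parameter can be sketched as follows. The weights are not specified in this section, so they are passed in explicitly as an assumption; the Pearson coefficient is likewise assumed as the correlation measure, and all names are hypothetical.

```python
import math

def cluster_series(abundance, weights):
    """Weighted sum of member ASV abundances at each time point.

    abundance: {asv: [value per time point]}
    weights:   {asv: weight}, supplied by the caller (assumed input)
    """
    n_times = len(next(iter(abundance.values())))
    return [sum(weights[asv] * abundance[asv][t] for asv in abundance)
            for t in range(n_times)]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0
```

Correlating the resulting series with, for example, a PAR record sampled at the same time points gives one cell of a cluster-by-parameter correlation matrix.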

Fig. 3: Environmental data and their correlation with Louvain clusters.

A Environmental data for F4 from 2016-08-01 to 2020-09-17. The x-axis represents the time period, while the y-axis indicates the following parameters: Mixed Layer Depth (MLD) [m], Temperature [°C], Chlorophyll Fluorescence [μg l−1], Polar Water Fraction [%], Photosynthetically Active Radiation (PAR) [μmol photons m−2 d−1], Salinity [Practical Salinity Units (PSU)], Oxygen Concentration [μmol l−1], Depth of measurement [m]. B Correlations between environmental parameters and seasonal Louvain clusters. The chart shows the environmental parameters from panel A in relation to seasonal clusters. These clusters are characterized by the cumulative relative abundance of ASVs. The time series for the clusters are calculated as the weighted sum of the abundance values of the ASVs assigned to them. “TA” denotes transition autumn, “LW” low light winter, “TS” transition spring and “HS” high light summer. The color gradient of the heatmap illustrates the strength of the correlation, with blue shades indicating negative correlations and red shades positive correlations. A significance mask has been applied so that only statistically significant correlations are shown.

Energy landscape analysis determines stability of microbial communities

We employed energy landscape analysis (ELA) to assess the stability of communities under the prevailing environmental conditions. We focus on four clusters representing the four seasons (01TA for autumn, 03LW for winter, 08TS for spring, and 10HS for summer). For each of these clusters, we determined the energy landscape, which is a highly complex function that depends on the abundances of all ASVs and the environmental parameters (see Fig. 4). To approximately visualize this landscape, we plot an interpolated smooth surface as a function of the two most significant NMDS dimensions. In addition, for each time point, we evaluate the energy landscape function and represent each energy value by a point in the three-dimensional diagram, where the z-axis represents the energy value.
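To make the notion of an energy landscape more concrete, the following sketch evaluates an energy function of the pairwise maximum entropy type that is commonly used in energy landscape analysis and lists its local minima. It assumes binary presence/absence states and the form E = −Σ h_i s_i − Σ g_ik e_k s_i − Σ_{i<j} J_ij s_i s_j; the parameter names and toy values are hypothetical, and the sketch does not fit any parameters to data.

```python
from itertools import product

def energy(state, h, J, g=None, env=None):
    """Energy of a binary community state under assumed model parameters.

    h[i]    : intrinsic tendency of taxon i to be present
    J[i][j] : pairwise association between taxa i and j (symmetric)
    g[i][k] : response of taxon i to environmental parameter k
    env[k]  : value of environmental parameter k
    """
    n = len(state)
    e = -sum(h[i] * state[i] for i in range(n))
    if g is not None and env is not None:
        e -= sum(g[i][k] * env[k] * state[i]
                 for i in range(n) for k in range(len(env)))
    e -= sum(J[i][j] * state[i] * state[j]
             for i in range(n) for j in range(i + 1, n))
    return e

def stable_states(h, J, g=None, env=None):
    """States whose energy is below that of every single-flip neighbour."""
    n = len(h)
    minima = []
    for state in product((0, 1), repeat=n):   # exhaustive: small n only
        e = energy(state, h, J, g, env)
        neighbours = (state[:i] + (1 - state[i],) + state[i + 1:]
                      for i in range(n))
        if all(e < energy(nb, h, J, g, env) for nb in neighbours):
            minima.append((state, e))
    return minima
```

In this toy setting, changing `env` shifts which states are local minima, which mirrors the idea that different environmental conditions support different stable communities. The exhaustive search is feasible only for a handful of taxa.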

Fig. 4: Energy landscapes depicting community structure dynamics.

The plots display the energy landscape reconstructed on the NMDS surface for one example cluster of each season. The z-axis displays the energy, while the x- and y-axes display the first and second NMDS dimensions. The landscape contours were estimated using a smoothing spline approach with optimized penalty parameters. Community states, defined by their ASV compositions, that occupy lower-energy regions of the landscape indicate higher stability. A The transition autumn cluster 01TA. B The low light winter cluster 03LW. C The transition spring cluster 08TS. D The high light summer cluster 10HS.

For the landscape reconstructed for cluster 01TA (Fig. 4A), the autumn communities display lower energy values than the communities found in other seasons. This demonstrates that the autumn communities exhibit high stability. For the winter cluster 03LW (Fig. 4B), the picture is less clear. Whereas the interpolated energy landscape has a more pronounced minimum, the energy values of the observed communities are not clearly separated. As a tendency, the summer communities have a high energy value, demonstrating that summer communities are unstable in winter conditions. However, spring and autumn communities exhibit comparable energy values as winter communities, which indicates that stable community structures in winter conditions are not clearly defined. This trend is even more pronounced for the spring cluster 08TS (Fig. 4C). Here, the interpolated energy landscape shows a broad and shallow minimum, and the energy values of all observed communities, regardless of the season in which they are found, are very similar. This suggests that under spring conditions, community structures are not very stable and that community compositions show a high plasticity. As a consequence, many different communities may exist under spring conditions.

These findings demonstrate that knowledge of the composition of winter communities does not allow for the prediction of the composition of spring communities, which is consistent with the results of the CCM analysis and the gap between winter and spring clusters (Fig. 2B). Finally, the energy landscape in summer (Fig. 4D) shows a pronounced minimum, in which the observed summer communities are also found. This indicates that summer conditions support well-defined communities with a high degree of stability.

Figure 5 displays how the energy values of the spring, summer, autumn, and winter clusters vary over the 4-year period from August 2016 to September 2020. These values show clear seasonal patterns. In general, the Autumn Cluster (01TA) displays minimal energy values during the autumn months (September, October, and November), indicating that autumn communities are typically more stable during this season (Fig. 5). In contrast, the seasonal minima for the Winter Cluster (03LW) are less clear, showing that the stability of this cluster is rather independent of the season. The Spring Cluster (08TS) exhibits minima predominantly during the spring months (March, April, and May), suggesting a tendency towards stability during this season (Fig. 5). Likewise, the Summer Cluster (10HS) shows local minima during the summer months (June, July, and August), indicating a tendency towards stability during this season (Fig. 5). Overall, each cluster demonstrates a proclivity for higher stability during its respective season, with the notable exception of the winter cluster. This underscores the influence of seasonal dynamics on the stability of microbial communities in the Fram Strait but also indicates a potentially high plasticity in winter conditions.

Fig. 5: Energy landscape analysis illustrating seasonal dynamics of community energy levels over a 4-year span (08.2016–09.2020), with each subplot depicting a distinct season (autumn, winter, spring, summer).

The energy curves, colored to represent specific clusters, are based on abundance and environmental data, as indicated. The x-axis denotes time over the specified period. Lower energy values indicate more stable community states than higher energy values. A The transition autumn cluster 01TA. B The low light winter cluster 03LW. C The transition spring cluster 08TS. D The high light summer cluster 10HS.

Predicting keystone microeukaryotes in the Fram Strait

According to our definition (Section “Keystone species definition”), a keystone species is highly connected in the co-occurrence network, has a high influence on other species, and appears in a stable community. By contextualizing the evidence from CON, CCM, and ELA, we predict 38 keystone species across the annual cycle within the measured environmental profile (Table 2). Fourteen of these keystone species are associated with summer clusters (three and eleven are found in clusters 09HS and 10HS, respectively), 13 with winter (eleven and two in the winter clusters 03LW and 04LW, respectively), eight are associated with autumn (cluster 01TA) and three with spring (cluster 07TS). The 14 keystone species from the summer clusters belong to the taxonomic groups Ochrophyta (6), Dinophyceae (4), Ciliophora (3), and Cryptophyta (1). These groups include Fragilariopsis, Pseudo-nitzschia, and Thalassiosira, major diatom taxa during Arctic blooms43 that also serve as prey for microzooplankton44,45. Notably, Fragilariopsis and Thalassiosira exhibited the highest abundance within this cluster. The keystone species in the winter clusters comprise Syndiniales (10), Radiolaria (1), Ochrophyta (1), and Dinophyceae (1); autumn cluster keystone species are Syndiniales (3), Ochrophyta (2), Chlorophyta (1), Dinophyceae (1) and unclassified Eukaryota (1). The spring keystone species belong to Syndiniales (1) and Radiolaria (2), reflecting the major ecological strategies, including primary production, heterotrophy, and parasitism. The finding of only a few spring keystone species aligns with the greatest variability, as shown by ELA (Section “Energy landscape analysis determines stability of microbial communities”). The emergence of Chlorophyta during early autumn suggests a shift in primary production from Ochrophyta to Chlorophyta, including taxa that may prefer colder temperatures46 and are better adapted to nutrient limitation47.
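The three-part keystone criterion stated at the start of this paragraph can be expressed as a simple filter. The thresholds, the use of node degree for connectivity, the use of summed outgoing CCM weights for influence, and the boolean stability flag are all illustrative assumptions; the actual cut-offs are defined in the section “Keystone species definition” and are not reproduced here.

```python
def keystone_candidates(con_degree, ccm_out_strength, in_stable_state,
                        min_degree, min_influence):
    """ASVs that satisfy all three criteria (hypothetical thresholds).

    con_degree:       {asv: number of co-occurrence edges}
    ccm_out_strength: {asv: summed weight of outgoing CCM edges}
    in_stable_state:  {asv: True if the ASV occurs in a stable community}
    """
    return sorted(
        asv for asv in con_degree
        if con_degree[asv] >= min_degree
        and ccm_out_strength.get(asv, 0.0) >= min_influence
        and in_stable_state.get(asv, False)
    )
```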

Table 2 ASV identified as potential keystone species for clusters 10HS, 06TS, 03LW and 01TA

Discussion

In this study, we present a new strategy for investigating ecological time series data based on 18S metabarcoding by combining three data analysis methods: Co-Occurrence Networks (CON)48, Convergent Cross Mapping (CCM)28, and Energy Landscape Analysis (ELA)30. Integrating these three methodological approaches, we aimed to predict and characterize the abundance of keystone microeukaryotes in the West Spitsbergen Current across different seasons, environmental parameters, and in relation to other organisms. We also investigated how different taxa groups affect others and how their effects vary with seasonal shifts and environmental factors.

Sequence-based surveys of the 18S rRNA gene are effective for identifying and comparing variations in microbial eukaryote communities in marine environments. This approach offers a valuable tool for understanding the diversity and distribution of different taxonomic groups in the ocean49. However, several aspects must be kept in mind when analyzing the data.

The taxonomic level of an amplicon sequence variant can vary in resolution due to missing references in the database or lacking diversity in the marker loci between strains. The presence of intracellular polymorphism, variation in rDNA copy number, and the existence of pseudogenes, in conjunction with the high sequencing depth of next-generation technologies, can lead to an inflation of diversity metrics. This is due to the fact that they capture a multitude of copies, pseudogenes, and other 18S rRNA variants within each organism, which consequently increases the number of predicted OTUs50,51. For example, when determining dinoflagellate dominance using molecular methods, caution is required due to the influence of their disproportionately large genome size. Indeed, in the Fram Strait and extensive areas of the central Arctic Ocean, a significant proportion of sequence reads were attributed to Phaeocystaceae, Micromonas sp., Dinophyceae, and Syndiniales52.

A more detailed discussion about advantages and disadvantages of the metabarcoding approach is found in ref. 50. On the other hand, some studies showed that quality-controlled amplicon methods have high quantitative potential for determining the diversity and composition of marine protist communities and the relative abundances of specific ASVs and lineages53,54.

Our co-occurrence network based on Fourier decomposition differs from previous methods that rely directly on the raw time series signals26,55,56. The resulting network accurately captured seasonal states and transitions, revealing community clusters that reflect the prevailing community structure57: in spring (cluster 08TS), primary producers such as Bacillariophyta appear and remain throughout the summer (09HS, 10HS), while mixotrophs increase in autumn (01TA, 02TA, 03LW) until almost exclusively heterotrophic and parasitic taxa dominate in winter (04LW, 05LW, 07TS). The considerable difference of spring clusters to other seasonal clusters (Supplementary Information Fig. S2) can be explained by the rapid environmental changes during this period (i.e., change from darkness to constant daylight within 20 days). The predominance of dinoflagellates in the intermediate phases of spring and autumn indicates that these mixotrophic organisms play a crucial role during transition phases58,59,60.

Our CCM analysis revealed that by no means all co-occurring ASVs actually influence each other (Fig. 2). A striking example is the pair of clusters 06TS and 05LW, which were closely connected in the CON but not in the CCM network (Fig. 2). This co-occurrence without apparent causal connections could be explained by unique environmental conditions shaping both of these clusters, such as polar water influx. Even more pronounced is the separation of cluster 03LW, mainly heterotrophs, and 08TS, mainly phototrophs, which are tightly connected by co-occurrence but do not show a single causal link in the CCM network. The organisms in these two clusters are primarily influenced by environmental parameters, particularly light. Additionally, these photosynthetic and heterotrophic organisms are sometimes preyed upon by the same predators61 such as Syndiniales. This explains the simultaneous occurrence and similar seasonality of these taxa but indicates that they do not have a direct influence on each other. The lack of causal influence during the transition from polar night to day is clearly visible in the CCM network (see Fig. 2). We interpret this gap between winter and spring clusters as a “winter reset” (Supplementary Information Fig. S11). This phase is characterized by the predominance of Syndiniales and Dinophyceae. With the emergence of light, a new period of primary production begins, shaped by the prevailing environmental conditions. The ambient environmental conditions then determine which species will subsequently prevail. By capturing causal interactions between species, the CCM network reflects the cyclic microbiome structure even more strongly than the co-occurrence network. The cycle begins with photoautotrophs (cluster 08TS) in early spring and ends with the hetero- and mixotrophs (cluster 01TA) in late autumn. As light intensity decreases, mixotrophs become more prevalent than photoautotrophs, leading to a shift towards a heterotrophic lifestyle and a transition from carbon fixation to consumption. This transition into a low light period is characterized by parasitic species, suggesting an “eat and be eaten” scenario. The causal links from autumn to winter are much fewer than between other seasons (except winter to spring; Fig. 2B).

Energy landscape analysis was used to assess the stability of observed communities in their respective environments. The reconstructed landscape for autumn cluster 01TA shows that autumn communities are highly stable. However, for the winter cluster 03LW, the energy values of observed communities lack clear separation, making the situation less straightforward. Communities of the winter clusters which are still present in summer tend to display high energy values, indicating instability in winter conditions. The spring cluster 08TS shows an even more notable trend, indicating that community structures lack stability and exhibit high plasticity under spring conditions (Fig. 5). This suggested plasticity during spring strongly supports our hypothesis of a “winter reset”, which was first proposed to explain the weak causal interactions between winter and spring clusters in the CCM network.

The “winter reset” phenomenon marks an ecological transition characterized by a major reorganization of microbial communities. This reset shifts the ecosystem from winter heterotroph dominance (e.g., Cluster 03LW) to spring photoautotroph prominence (e.g., Cluster 08TS). The lack of direct causal connections between these clusters in the Convergent Cross Mapping (CCM) network, despite their co-occurrence in the Co-Occurrence Network (CON), suggests that these transitions are driven by external environmental factors rather than direct species interactions. A similar observation was made with the 16S data, indicating that seasonal variations in environmental variables play a more significant role than trophic interactions62.

The increasing light availability in early spring drives the growth of photoautotrophic organisms. However, how well individual species thrive, and which of them will dominate the population, depends critically on various other factors. The ambient temperature will favor some species more than others, as will nutrient availability and salinity. These factors, in turn, are heavily influenced by water stratification, which creates its own unique spatial pattern of environmental conditions. Together with ocean currents, which carry material and organisms, this creates a situation in which low-abundance resident populations of phototrophic microbes compete with species introduced by currents from other locations34,38. Since every species has its own preferences for external factors such as temperature, salinity or nutrients, the outcome of this competition is likely to be highly sensitive to the prevailing environmental parameters, and much less so to the composition of the communities in the previous winter months, providing a possible explanation for the winter reset.

According to traditional ecological theory, keystone species are often defined as those with the most biomass63,64. Here, we followed a more elaborate approach and defined keystone species as those found in stable communities (as determined by ELA) that are also highly connected in the CON and CCM networks. In this way, keystone species have a strong causal interaction with many other species and form part of stable communities. Besides identifying highly abundant keystone species (for example Fragilariopsis or Pseudo-nitzschia diatoms), we also identified several low-abundance ASVs as keystone species, suggesting that both common and rare members contribute to ecosystem stability. The keystone species can be linked to central ecological functions carried out within the community network, including primary production, consumption, and parasitic interactions. During the beginning of autumn, Chlorophyta emerges as a keystone species, indicating a shift in primary production from Ochrophyta to Chlorophyta. This shift may be explained by the preference of Chlorophyta for colder temperatures and a better adaptation to nutrient limitation.

Our approach represents an advance in the field of ecological network analysis, combining the techniques of co-occurrence network construction and convergent cross mapping to elucidate species interactions and community structures. In comparison to traditional methodologies that frequently rely on simplistic correlation analyses, our approach offers a number of distinct advantages. Traditional ecological network studies frequently employ simple correlation coefficients to infer species interactions, which may result in non-linear relationships being overlooked65. Clustering based on Fourier coefficients captures temporal profiles with greater fidelity, effectively summarizing seasonal and long-term dynamics of species abundance66, and additionally reduces the compositional bias38. We chose the Hellinger transformation to normalize the abundance data for our network analysis. Previous work demonstrated that a transformation based on relative abundance is crucial for the appropriate analysis of eDNA metabarcoding data and indicated that the Hellinger transformation has a slight advantage over other methods67. The Hellinger distance is recommended for clustering or ordination of species abundance data68,69. For the energy landscape analysis, we employed the normalization method recommended by the authors of the method, after which the abundance data were binarised29,30.

A key limitation of correlation analyses is that they do not offer insights into causal relations, and therefore, it is difficult to infer species interactions from co-occurrence networks only70. Integrating CCM addresses this limitation and extends beyond conventional correlation-based techniques by explicitly evaluating causal relationships between species pairs28. Specifically, we compared our approach with two previous studies71,72. While our approach focuses on the specific interactions within and between clusters, Ushio et al. provide a more general framework for predicting community diversity based on interaction capacity, temperature and abundance. The emphasis on mechanistic explanations for observed ecological patterns distinguishes the two approaches. Our methodology provides a comprehensive understanding of keystone species in a specific context, while Ushio’s study provides broader insights into the factors influencing community diversity in different ecosystems. Both studies use similar techniques such as correlation and CCM71. Fujita’s study used controlled experiments with six isolated community replicates, subjected to diverse treatments over 110 days. Regarding Takens’ theorem and convergent cross mapping, Fujita et al. used Simplex projection to forecast population size72, while our study utilized pairwise CCM on ASV time series signals within clusters to predict keystone species.

The results presented in this study not only have practical implications for ecosystem management by improving our understanding and ability to predict change in complex ecological systems but also provide systematic insights into the mechanisms responsible for shaping and maintaining spatiotemporal heterogeneity in ecosystem composition29.

Methods

Sampling and data

Samples were collected with Remote Access Samplers (RAS; McLane) deployed in conjunction with oceanographic sensors over four annual cycles (01.08.2016 to 16.09.2020; 96 samples) at the F4 mooring (79.0118°N, 6.9648°E) of LTER HAUSGARTEN and FRAM in the Fram Strait38,73. Each RAS contains 48 sterile bags, each collecting water samples of 500 mL at programmed sampling intervals.

The samples collected from 2016 to 2018 reflect the pool of up to two samples collected 1 h apart in two individual bags. Since 2018, we pooled samples taken 7 to 8 days apart from two consecutive weeks74. The samples were preserved by adding 700 μl of mercuric chloride (7.5% w/v) to the bags prior to sampling. Pooled samples were filtered onto Sterivex filter cartridges with a pore size of 0.22 μm (Millipore, USA).

Filters were stored at −20 °C until DNA extraction and ribosomal metabarcoding of 18S rRNA reads using primers 528iF (GCGGTAATTCCAGCTCCAA) and 926iR (ACTTTCGTTCTTGATYRR)75. The resulting amplicon sequence variants (ASVs) were classified using the PR2 4.12 database (see Supplementary Information Supplementary Methods: Data Preprocessing). We normalized raw ASV counts for CON and CCM using the Hellinger transformation. For the energy landscape analysis, we instead used the proportional normalization of the rELA implementation30.

Temperature, salinity, and oxygen concentration were measured with a CTD-O2 attached to the RAS. Physical oceanography sensors were manufacturer-calibrated and processed as described previously34. Raw and processed mooring data are available at PANGAEA https://doi.org/10.1594/PANGAEA.904565, https://doi.org/10.1594/PANGAEA.940744, https://doi.org/10.1594/PANGAEA.941125 and https://doi.org/10.1594/PANGAEA.946447. For chemical sensors, the raw sensor readouts are reported. The fractions of Atlantic and Polar Water were computed following ref. 76 for each sampling event and reported along with distance below the surface (due to mooring blowdown). Sea ice concentrations derived from the Advanced Microwave Scanning Radiometer sensor AMSR-277 were downloaded from the Institute of Environmental Physics, University of Bremen (https://seaice.uni-bremen.de/sea-ice-concentration-amsr-eamsr2). Sentinel 3A OLCI chlorophyll surface concentrations were downloaded from https://earth.esa.int/web/sentinel/sentinel-data-access. For all satellite-derived data, we considered grid points within a radius of 15 km around the moorings. Surface water Photosynthetically Active Radiation (PAR) data, with a 4 km grid resolution, were obtained from AQUA-MODIS (Level-3 mapped; SeaWiFS, NASA) and extracted in QGIS v3.14.16 (http://www.qgis.org).

We considered eight environmental variables: mixed layer depth (MLD in m), water temperature (temp °C), polar-water fraction (PW_frac %), chlorophyll concentration from the in situ sensor (chl_sens ~μg l−1), PAR (μmol photons m−2 d−1), salinity (PSU), oxygen concentration (O2_conc μmol l−1) and sampling depth (depth m)34.

Co-occurrence network

The abundance of taxa over the full observation period was converted into temporal profiles by applying Fourier transformation techniques78 to the time-series signals. These temporal profiles rely on 14 Fourier coefficients.

We chose 14 coefficients because they reflect the majority (662 of 1019) of the observed ASV abundance patterns within the 4 years, namely those that peak at least once but no more than twice a year. To investigate the similarity of temporal profiles between ASV pairs, we performed pairwise correlations between the individual temporal profiles; pairs with higher Pearson correlation values have more similar temporal profiles. Pairs with a Pearson correlation of at least 0.7 (p < 0.05 after Benjamini–Hochberg correction for multiple testing) were then visualized in an undirected graph.
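The profile construction and the correlation threshold can be illustrated with a minimal sketch. It assumes that "14 coefficients" means the 14 lowest-frequency terms of a discrete Fourier transform, which the text does not specify; the function names and the toy series are hypothetical, and the p-value and multiple-testing steps are left out.

```python
import cmath
import math


def fourier_profile(signal, n_coeff=14):
    """Rebuild a real-valued series from its first n_coeff Fourier coefficients."""
    n = len(signal)
    if not 0 < n_coeff < n / 2:
        raise ValueError("n_coeff must be positive and below half the series length")
    # Plain discrete Fourier transform of the lowest n_coeff frequencies.
    coeffs = [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n_coeff)
    ]
    profile = []
    for t in range(n):
        # k = 0 carries the mean; every k > 0 counts twice because a real
        # signal has a mirrored (complex-conjugate) coefficient at n - k.
        value = coeffs[0].real + 2 * sum(
            (coeffs[k] * cmath.exp(2j * math.pi * k * t / n)).real
            for k in range(1, n_coeff)
        )
        profile.append(value / n)
    return profile


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical toy data: 96 samples over 4 years, i.e. 24 samples per year.
steps = range(96)
series = {
    "asv_a": [1 + math.cos(2 * math.pi * t / 24) for t in steps],
    "asv_b": [5 + 3 * math.cos(2 * math.pi * t / 24) for t in steps],     # same season
    "asv_c": [1 + math.cos(2 * math.pi * (t - 12) / 24) for t in steps],  # half a year later
}
profiles = {name: fourier_profile(values) for name, values in series.items()}

names = sorted(profiles)
edges = [
    (a, b)
    for i, a in enumerate(names)
    for b in names[i + 1:]
    if pearson(profiles[a], profiles[b]) >= 0.7  # threshold taken from the text
]
print(edges)
```

In this toy case only the two ASVs that peak in the same season are connected; the ASV peaking half a year later is anti-correlated with both and receives no edge.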

Only positive correlations were retained in the co-occurrence network to later focus exclusively on co-operative relationships, where both taxa were present. These relationships were then used to prune the causal network, ensuring that only co-operative interactions were considered in the following analysis. To identify strongly connected components that reflect the existing communities of co-occurring taxa, we applied the Louvain community detection algorithm37 on the entire graph. The entire process was implemented using the CCM and networkx packages in Python; visualization was performed using Cytoscape with the Edge-weighted Spring-Embedded Layout79. The whole co-occurrence network construction is described in Supplementary Information Supplementary Methods: Co-Occurrence Network.

Distance between clusters

To measure the distance between previously defined Louvain communities (clusters), we applied UMAP80 on time-series signals obtained after Fourier decomposition of the abundance data. From this, we generated a three-dimensional embedding space. Centroids for each cluster were calculated within this space (see Supplementary Information Figs. S4S6). The network distance between clusters was determined as the Euclidean distance between their centroids. Subsequently, a distance matrix was created, and distances were rounded to integers, with only significant connections retained.
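The centroid and distance steps can be sketched as follows. The sketch starts from an already computed three-dimensional embedding (the UMAP step itself is not reproduced), and the coordinates, labels and function names are hypothetical. The filter for significant connections is not specified in this section and is omitted.

```python
import math
from collections import defaultdict


def cluster_centroids(embedding, labels):
    """Mean position of each cluster in the embedding space."""
    grouped = defaultdict(list)
    for point, label in zip(embedding, labels):
        grouped[label].append(point)
    return {
        label: tuple(sum(axis) / len(points) for axis in zip(*points))
        for label, points in grouped.items()
    }


def centroid_distances(centroids):
    """Euclidean distance between every pair of centroids, rounded to integers."""
    names = sorted(centroids)
    return {
        (a, b): round(math.dist(centroids[a], centroids[b]))
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical 3D embedding coordinates of five ASVs and their cluster labels.
embedding = [(0, 0, 0), (2, 0, 0), (1, 4, 0), (1, 4, 0), (1, 0, 3)]
labels = ["08TS", "08TS", "09HS", "09HS", "03LW"]

centroids = cluster_centroids(embedding, labels)
distances = centroid_distances(centroids)
print(distances)
```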

Convergent cross mapping

Convergent cross mapping (CCM) identifies potential causal relationships between variables in time series data. It quantifies how knowledge of the time series of one taxon allows predicting the time series of another. We first built a CCM network from all pairwise combinations. From this, we extracted the in- and outgoing edges between nodes that are also connected in the co-occurrence network. We used the implementation of Normalized Mutual Information (NMI) by Petri Laarne (https://github.com/polsys/ennemi) and the Convergent Cross Mapping implementation by Prince Javier (https://github.com/PrinceJavier/causal_ccm; ref. 81) to measure the strength of the causal relationship while also considering non-linear relations. We showed that the NMI-based implementation yields similar findings to the original implementation based on Pearson correlation28,82 (see Supplementary Information Fig. S7).
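The cross-mapping step can be illustrated with a minimal, dependency-free sketch. It is not the causal_ccm implementation used in this study: it scores predictions with Pearson correlation, as in the original formulation, rather than NMI, and it evaluates a single library size, so the convergence check is left out. The embedding dimension, lag, function names and the coupled logistic-map toy data are assumptions made for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def cross_map_skill(source, target, embed_dim=2, lag=1):
    """Skill of estimating `target` from the delay embedding of `source`.

    High skill is read as evidence that `target` influences `source`.
    """
    start = (embed_dim - 1) * lag
    # Delay-embedding vectors of the source series (its shadow manifold).
    manifold = [
        tuple(source[t - d * lag] for d in range(embed_dim))
        for t in range(start, len(source))
    ]
    actual = target[start:]
    predicted = []
    for i, point in enumerate(manifold):
        # The embed_dim + 1 nearest neighbours, excluding the point itself.
        neighbours = sorted(
            (math.dist(point, other), j)
            for j, other in enumerate(manifold)
            if j != i
        )[: embed_dim + 1]
        scale = max(neighbours[0][0], 1e-12)
        # Exponential weights relative to the closest neighbour.
        weights = [math.exp(-dist / scale) for dist, _ in neighbours]
        estimate = sum(w * actual[j] for w, (_, j) in zip(weights, neighbours))
        predicted.append(estimate / sum(weights))
    return pearson(predicted, actual)


# Hypothetical toy system: two coupled logistic maps, x acting on y more strongly.
x, y = [0.4], [0.2]
for _ in range(199):
    x_next = x[-1] * (3.8 - 3.8 * x[-1] - 0.02 * y[-1])
    y_next = y[-1] * (3.5 - 3.5 * y[-1] - 0.1 * x[-1])
    x.append(x_next)
    y.append(y_next)

skill_x_on_y = cross_map_skill(y, x)  # x recovered from the manifold of y
skill_y_on_x = cross_map_skill(x, y)  # y recovered from the manifold of x
print(round(skill_x_on_y, 2), round(skill_y_on_x, 2))
```

A full analysis would repeat the calculation for increasing library sizes and accept a link only if the skill converges, which this sketch does not attempt.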

Using a permutation approach26 on the connectivity of the network, we calculated significance values for the edge weights, quantifying whether the respective NMI values are greater than expected for random edges (see Supplementary Information Fig. S8). The whole CCM network construction and validation are described in Supplementary Information Supplementary Methods: Convergent Cross Mapping and Supplementary Notes 2: Convergent Cross Mapping.

Aggregation on cluster level

We simplify the network of interactions between single ASVs into a network of interactions between clusters. For this, we assign a weight to a directed edge between two clusters by calculating the arithmetic mean of NMI of all (directed) edges connecting ASVs belonging to the respective clusters. This process effectively reduces the number of items in the node cloud, representing clusters through a unified composite node.
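The aggregation can be sketched in a few lines. The edge list, NMI values and cluster assignment below are hypothetical, and the sketch keeps edges within a cluster as self-loops, which is an assumption not stated in the text.

```python
from collections import defaultdict


def aggregate_to_clusters(edges, cluster_of):
    """Average the NMI of directed ASV edges for each ordered pair of clusters."""
    grouped = defaultdict(list)
    for source, target, nmi in edges:
        grouped[(cluster_of[source], cluster_of[target])].append(nmi)
    # Arithmetic mean per directed cluster pair.
    return {pair: sum(values) / len(values) for pair, values in grouped.items()}


# Hypothetical directed ASV edges: (source ASV, target ASV, NMI).
edges = [
    ("asv1", "asv3", 0.4),
    ("asv2", "asv3", 0.6),
    ("asv3", "asv1", 0.2),
]
cluster_of = {"asv1": "08TS", "asv2": "08TS", "asv3": "09HS"}

cluster_edges = aggregate_to_clusters(edges, cluster_of)
print(cluster_edges)
```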

Energy landscape analysis

Energy landscape analysis is a method based on statistical physics. From data for many points in time, which contain taxon abundance and environmental variables, an energy landscape is reconstructed. This energy landscape is a function that maps ASV abundance and environmental variables to an energy value. In analogy to the potential energy in physics, a (local) minimum of this energy landscape indicates a stable community state. Here, we reconstruct the energy landscape function based on the complete time series of ASV abundance together with the available environmental data. We use the reconstructed function to determine the stability of observed communities, and in particular the seasonal clusters determined by the co-occurrence network, and we predict the most stable community compositions. Details of our analysis, including parameters and thresholds applied, are described in Supplementary Information Supplementary Methods: Energy Landscape Analysis. Understanding the existence and the nature of stable community states and how they change in response to environmental shifts is crucial for comprehending the resilience and adaptability of ecosystems in the face of various ecological challenges.

Definition of state space, pairwise maximum entropy models and energy landscape

Formalizing the stability landscape concept requires a precise definition of the state space within an energy landscape. We represent a community composition as a binary vector of length S, where S signifies the total number of ASVs. Within this framework, there exist \({2}^{S}\) unique community compositions. Specifically, a community composition of the k-th sample is denoted as \({\sigma }^{(k)}=({\sigma }_{1}^{(k)},{\sigma }_{2}^{(k)},\ldots ,{\sigma }_{S}^{(k)})\), where \({\sigma }_{i}^{(k)}\in \{0,1\}\) indicates the presence/absence status of the i-th ASV. To establish links between community compositions, we adopt the assumption that transitions occur incrementally. Consequently, two community compositions are linked if and only if they differ in the presence/absence status of precisely one taxon. This leads to the formation of a structured network wherein each node is connected to S neighbors29,30. We attribute energy values to individual community compositions and establish the potential structure within the state space through the introduction of the extended pairwise maximum entropy model. This model governs the likelihood of observing community composition \({\sigma }^{(k)}\) under an environmental condition, denoted by \(\epsilon =({\epsilon }_{1},{\epsilon }_{2},\ldots ,{\epsilon }_{M})\), where M denotes the number of environmental variables and \({\epsilon }_{i}\) represents the continuous values denoting these environmental factors, such as resource availability, pH, temperature, salinity, etc. The probability of \({\sigma }^{(k)}\) occurring in condition \(\epsilon\) is given by:

$$P({\sigma }^{(k)}| \epsilon )=\frac{{e}^{-E({\sigma }^{(k)},\epsilon )}}{Z},$$
(1)

with the energy defined as:

$$E({\sigma }^{(k)},\epsilon )=-{\sum}_{i=1}^{S}{h}_{i}{\sigma }_{i}^{(k)}-{\sum}_{i=1}^{M}{\sum}_{j=1}^{S}{g}_{ij}{\epsilon }_{i}{\sigma }_{j}^{(k)}-{\sum}_{j=1}^{S}{\sum}_{i=1,i\ne j}^{S}{J}_{ij}{\sigma }_{i}^{(k)}{\sigma }_{j}^{(k)}/2,$$
(2)

where \(E({\sigma }^{(k)},\epsilon )\) represents the energy of community composition \({\sigma }^{(k)}\) and:

$$Z={\sum}_{k=0}^{{2}^{S}-1}{e}^{-E({\sigma }^{(k)},\epsilon )}.$$
(3)

The function E represents the energy landscape and is defined by the parameters \(h=({h}_{1},{h}_{2},\ldots ,{h}_{S})\), matrix \(J=({J}_{ij})\), and matrix \(g=({g}_{ij})\). Based on the observed communities and the associated environmental factors, these parameters are estimated by Energy Landscape Analysis through a maximum likelihood approach29,30. These estimates can be acquired through stochastic approximation for the extended pairwise maximum entropy model (Eq. (2)), as described in Supplementary Information Supplementary Methods: Energy Landscape Analysis.

Here, \({h}_{i}\) signifies the net impact of unobserved environmental factors favoring (\({h}_{i} > 0\)) or hindering (\({h}_{i} < 0\)) the presence of taxon i, and \({g}_{ij}\) represents the influence of the i-th observed environmental factor on the occurrence of taxon j. The model captures pairwise relationships, as each taxon is interconnected with all others through \({J}_{ij}\). It is pertinent to note that the term \(E({\sigma }^{(k)},\epsilon )\) is labeled as energy due to its analog in statistical physics83, although it serves as an exponent in Eq. (1) and indicates the likelihood of observing a community composition within an ecological context. It does not correspond directly to physical energy as used in ecological studies.

Rather, the energy signifies the directionality of transitions between community compositions. For two adjacent nodes, \({\sigma }^{(k)}\) and \({\sigma }^{({k}_{0})}\), if \(E({\sigma }^{(k)})\, > \, E({\sigma }^{({k}_{0})})\), then the transition from \({\sigma }^{(k)}\) to \({\sigma }^{({k}_{0})}\) is more likely than the reverse.
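For a small system, Eqs. (1) to (3) can be evaluated directly by enumerating all community compositions. The sketch below reads g_ij as the influence of environmental factor i on taxon j, as defined in the text; all parameter values are invented for illustration, and the parameter estimation itself is not shown.

```python
import itertools
import math


def energy(sigma, eps, h, g, J):
    """Energy of a binary community composition under environment eps (Eq. (2))."""
    S, M = len(sigma), len(eps)
    implicit = sum(h[i] * sigma[i] for i in range(S))
    # g[i][j]: influence of environmental factor i on taxon j.
    environment = sum(g[i][j] * eps[i] * sigma[j] for i in range(M) for j in range(S))
    pairwise = sum(
        J[i][j] * sigma[i] * sigma[j] for i in range(S) for j in range(S) if i != j
    ) / 2
    return -implicit - environment - pairwise


def state_probabilities(eps, h, g, J):
    """Probability of every composition (Eq. (1)), normalized by Z (Eq. (3))."""
    states = list(itertools.product((0, 1), repeat=len(h)))
    weights = [math.exp(-energy(state, eps, h, g, J)) for state in states]
    Z = sum(weights)
    return {state: weight / Z for state, weight in zip(states, weights)}


# Hypothetical parameters for S = 2 taxa and M = 1 environmental variable.
h = [0.5, -0.5]
g = [[1.0, 0.0]]
J = [[0.0, 1.0], [1.0, 0.0]]
eps = [0.2]

probabilities = state_probabilities(eps, h, g, J)
most_likely = max(probabilities, key=probabilities.get)
print(most_likely, round(energy(most_likely, eps, h, g, J), 2))
```

In this toy landscape the composition with both taxa present has the lowest energy, so a transition from (1, 0) to its neighbor (1, 1) runs downhill and is more likely than the reverse. Full enumeration is feasible only for small S, because the number of states grows as \({2}^{S}\).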

In ecological studies, analyzing microbial communities often involves sifting through vast amounts of data, which presents major computational challenges due to the exponential increase in possible states as the number of amplicon sequence variants (ASVs) increases. To address this, limiting the analysis to the top 100 ASVs provides a practical compromise between computational feasibility and analytical depth (as recommended by the authors of the framework; see Suzuki et al.30). This approach allows for efficient management of the data while still yielding meaningful insights into the community structure. By concentrating on the most abundant ASVs, the key contributors to the microbial ecosystem dynamics can be captured because these variants generally hold greater ecological and biological significance due to their higher prevalence. Furthermore, focusing on the top 100 ASVs helps minimize the impact of less abundant variants, which may introduce noise and potentially distort the results. However, this method is not without its drawbacks; there is a risk of bias toward dominant ASVs, potentially overlooking less prevalent but ecologically important species. Such biases could influence the interpretation of community dynamics and functional roles, highlighting the need for careful consideration of these limitations in ecological analyses. In Supplementary Information Fig. S12, we validated that the cutoff of N = 100 ASVs is a good choice by testing the stricter cutoff values N = 50 and N = 75. Outliers were excluded solely for the purpose of plotting.

Abundance normalization

For the network approaches (CON and CCM), we apply the Hellinger transformation to the abundance matrix. In contrast to previous studies that employed Pearson correlation for pairwise comparisons of relative abundance to ascertain co-occurrence patterns, we utilized Fourier decomposition to extract distinctive temporal profiles for each ASV. This approach mitigates the potential for bias associated with Pearson correlation in compositional data analysis.

Hellinger transformation

In the matrix \(M=\{{m}_{i,j}\}\), the columns represent observed ASVs, and the rows represent different samples. The entries are the read counts of the ASVs in the given sample. The Hellinger transformation was applied to facilitate data normalization and comparison of ecological data, reducing the impact of differences in the scale of abundance values across samples (see Fig. 6A). The normalization is applied column-wise to the raw abundance data as follows:

Fig. 6: Schematic flowchart of our analytical approach.

A Network analysis. The abundance table is normalized using the Hellinger transformation (Section “Hellinger transformation”), followed by co-occurrence network (CON) and convergent cross mapping network (CCMN) analysis. B Energy landscape analysis. The abundance table is normalized using the proportional normalization (Section “Proportional normalization”), followed by energy landscape analysis (ELA).

  1) Calculate the square root of each element:

    $${s}_{i,j}=\sqrt{{m}_{i,j}}$$
    (4)

  2) Calculate the l2-norm (Euclidean norm) of each column:

    $${r}_{i}=\sqrt{{\sum}_{j}{s}_{i,j}^{2}}\,{\mbox{(column-wise)}}$$
    (5)

  3) Normalize each column by dividing it by its l2-norm:

    $${h}_{i,j}=\frac{{s}_{i,j}}{{r}_{i}}$$
    (6)
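The three steps can be sketched as follows. The sketch follows Eqs. (4) to (6) as written: the norm in Eq. (5) runs over the second index j for a fixed first index i, and each inner list below stands for one such i. Which axis of the abundance table this corresponds to in the actual pipeline is an assumption here, and the counts are invented.

```python
import math


def hellinger(matrix):
    """Apply Eqs. (4) to (6) to every inner list of a nested count matrix."""
    transformed = []
    for vector in matrix:
        roots = [math.sqrt(value) for value in vector]      # Eq. (4)
        norm = math.sqrt(sum(root * root for root in roots))  # Eq. (5)
        # Eq. (6); an all-zero vector is left as zeros to avoid dividing by zero.
        transformed.append([root / norm if norm > 0 else 0.0 for root in roots])
    return transformed


# Hypothetical read counts.
counts = [
    [1, 4, 4],
    [0, 9, 16],
]
normalized = hellinger(counts)
print(normalized)
```

After the transformation each non-empty vector has unit Euclidean norm, so vectors with very different total read counts become directly comparable.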

The energy landscape analysis uses the following normalization.

Proportional normalization

Proportional normalization is applied to the abundance matrix A (see Fig. 6B). Given a matrix A of size m × n, the normalized matrix N is computed as follows30:

  (1) Compute the row sums: define a column vector s where each element \({s}_{i}\) is the sum of the elements in the i-th row of A:

    $${s}_{i}={\sum}_{j}{a}_{i,j}$$
    (7)

    where \(A=\{{a}_{i,j}\}\).

  (2) Add a small constant:

    $${d}_{i}={s}_{i}+1{0}^{-16}$$
    (8)

  (3) Normalize the matrix: compute the normalized matrix \(N=\{{n}_{i,j}\}\) by dividing each element by the adjusted sum of its row:

    $${n}_{i,j}=\frac{{a}_{i,j}}{{d}_{i}}$$
    (9)

This normalization ensures that each row of the resulting matrix sums to 1, with a small adjustment to maintain numerical stability.
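A minimal sketch of Eqs. (7) to (9), with invented counts; the function name is hypothetical. The added constant matters only for rows that sum to zero, where it prevents a division by zero and leaves the row at zero.

```python
def proportional_normalize(matrix, constant=1e-16):
    """Divide each row by its sum plus a small constant (Eqs. (7) to (9))."""
    normalized = []
    for row in matrix:
        adjusted_sum = sum(row) + constant  # Eqs. (7) and (8)
        normalized.append([value / adjusted_sum for value in row])  # Eq. (9)
    return normalized


# Hypothetical read counts; the second row is empty on purpose.
abundance = [
    [2, 6, 2],
    [0, 0, 0],
]
proportions = proportional_normalize(abundance)
print(proportions)
```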

Keystone species definition

After collecting attributes from the co-occurrence analysis and distinguishing between potential ecological influence and occurrence just by chance, we calculated the stable states for different clusters using ELA. This information was merged to suggest potential keystone species. We defined a keystone species as an ASV with (1) a significant influence on other ASVs in the network (significant NMI value), (2) a high centrality (closeness) value within its co-occurrence community, and (3) presence in at least one stable state as predicted by ELA. A significantly high centrality value was determined by comparing the centrality value of each node to the average centrality value of all nodes in the graph using a one-sided, one-sample t-test with Benjamini–Hochberg correction for multiple testing (similar to refs. 84,85).
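How the three criteria combine can be sketched as follows. The sketch assumes that the per-ASV p-values of the centrality test and the flags for criteria (1) and (3) are already available; the t-test itself is not reproduced, the significance level of 0.05 for the adjusted p-values is an assumption, and all names and values are invented.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda index: p_values[index])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value to enforce monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical per-ASV summary of the three criteria.
candidates = {
    "asv_a": {"centrality_p": 0.01, "significant_nmi": True, "in_stable_state": True},
    "asv_b": {"centrality_p": 0.04, "significant_nmi": True, "in_stable_state": True},
    "asv_c": {"centrality_p": 0.03, "significant_nmi": False, "in_stable_state": True},
    "asv_d": {"centrality_p": 0.50, "significant_nmi": True, "in_stable_state": False},
}
asv_names = list(candidates)
adjusted_p = benjamini_hochberg([candidates[name]["centrality_p"] for name in asv_names])

keystone = [
    name
    for name, p in zip(asv_names, adjusted_p)
    if p < 0.05  # assumed significance level
    and candidates[name]["significant_nmi"]
    and candidates[name]["in_stable_state"]
]
print(keystone)
```

In this toy example asv_b would pass an uncorrected test at 0.05 but is no longer significant after the correction, so only asv_a satisfies all three criteria.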

Season definition

To assess the results in the context of the entire annual variability over which samples were collected, we defined the seasons as follows, based on month and the availability of light (PAR). In the case of Cluster 01TA, the month of maximal abundance is August. However, several nodes are also present in September and October. Consequently, we mapped this cluster to the autumn season in order to model a transition from the autumn cluster (see Fig. 7).

Fig. 7: Schematic cluster definition.

Names are based on light availability (as defined in ref. 38), categorized in transition areas between dark and light (T), high light (H) and low light (L) phases based on the PAR parameter (Fig. 3), and on the seasons spring (S), summer (S), autumn (A) and winter (W).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.