Introduction

Tropical monsoon regions, concentrated within 30 latitude of the equator, are marked by abundant precipitation and exceptional ecological diversity. Monsoon precipitation is vital for maintaining regional ecosystems1, supporting agriculture2,3, and water resource management4. Beyond their local significance, tropical monsoons influence global atmospheric circulation, water cycling5, and energy balance6.

However, accurately forecasting monsoon precipitation over long timescales remains a formidable challenge due to the complex interplay of initial conditions, external forcings, and internal variability7. For instance, small errors in initial conditions can amplify over time, making accurate long-term forecasts difficult8. Additionally, uncertainties in future external forcings, such as greenhouse gas emissions and volcanic activity, further complicate predictions9,10. The strong natural variability of monsoons adds another layer of difficulty, posing significant challenges for seasonal and decadal-scale forecasting11.

Despite progress, climate models struggle to capture the nonlinear processes and complex interactions that define the climate system, limiting their long-term forecasting accuracy12. Furthermore, these models require extensive observational data and sophisticated parameterizations, complicating their use in regions with limited historical records13,14. Recently, AI-based methods have enabled the development of machine learning (ML) weather prediction systems, such as FourCastNet15, GraphCast16, Pangu-Weather17, NowcastNet18, and GenCast19. However, these systems excel at short- to medium-term (up to 15 days) predictions and are currently ill-suited for addressing the complexities of long-term and seasonal-scale precipitation forecasting.

To address these challenges, network theory offers a robust approach for analyzing the dynamic and structural properties of complex systems20,21,22,23. Its application in climate science24,25,26,27 has revealed hidden connections and teleconnections within the global climate system, providing a novel basis for prediction28,29,30. By leveraging network-derived metrics, researchers have successfully predicted phenomena like El Niño events31,32 and monsoonal rainfall26,33, demonstrating the approach’s feasibility and reliability.

This study aims to address the longstanding challenge of long-lead seasonal monsoon precipitation prediction by proposing a climate network-based framework that leverages global teleconnections. Specifically, we (1) develop a network-based prediction system grounded in spatiotemporal climate connectivity; (2) apply this unified framework to forecast monsoon precipitation across four major tropical monsoon systems-the South American monsoon (Eastern Amazon Region (EAR)), the East Asian monsoon (Hainan Island (HI)), the West African monsoon (Sahel region of East Africa (SEA)), and the Indian monsoon (Central India (CI)); and (3) assess its forecasting skill relative to established dynamical models, namely European Center for Medium-Range Weather Forecasts Seasonal Forecast System 5 (SEAS5)34 and Climate Forecast System version 2 (CFSv2)35. These four regions represent diverse climatic regimes, enabling robust evaluation of the framework’s generalizability. Notably, the SEA focus area, encompassing South Sudan within the Sahel belt, exemplifies a region where predictive urgency intersects with ecological and socioeconomic vulnerability: Around 80% of South Sudan’s population depends on rain-fed agriculture36, while its ecologically sensitive landscapes-including Boma National Park, a critical refuge for African wildlife-face escalating climate risks. This dual pressure underscores the need for reliable monsoon forecasts to support food security, biodiversity conservation, and climate-resilient development. Therefore, by integrating network theory with climate predictability science, we aim to provide both improved forecasting capabilities and enhanced insight into the underlying drivers of monsoon variability.

Results

Mining network predictors

The rainy season, characterized by significant increases in precipitation, varies widely in timing, duration, and intensity across different regions. For this study, we focused on four regions within the tropical monsoon belt, illustrated as blue boxes in Fig. 1a–d: EAR, HI, SEA, and CI (See Table 1 for the specific latitudinal/longitudinal ranges of these regions). These regions represent diverse monsoonal systems driven by distinct atmospheric and oceanic processes, making them ideal test cases for our predictive framework (as depicted in Fig. 1e). To standardize the analysis, we defined the rainy season as May–October in the Northern Hemisphere and November to next year’s April in the Southern Hemisphere, based on historical precipitation patterns.

Table 1 Monsoon regions and Predictors’ information for monsoon precipitation forecasting

In our endeavor to identify reliable predictors for rainy season precipitation, we constructed a series of climate networks (CNs) spanning January 1950–December 2022. The dataset used for this analysis comprises daily mean 2 m air temperature data from NCEP-Reanalysis I, covering 6242 global grid points. Each grid point was treated as a node within the network, and its in-degree and out-degree were computed based on correlations with other nodes (See Methods). In our climate network framework, we quantify regional climate interactions through two fundamental metrics: in-degree and out-degree. For a given year y, the in-degree of node i, denoted as \({k}_{i}^{\,{\mbox{in}}\,+}(y)\) (positive) or \({k}_{i}^{\,{\mbox{in}}\,-}(y)\) (negative), measures the cumulative influence the region receives from global climate patterns. Physically, this metric represents the number of strong climate teleconnections (e.g., ENSO or Rossby wave activity) affecting the region, serving as an indicator of its sensitivity to external climate drivers. A high in-degree suggests the region is strongly coupled to large-scale climate variability, making it particularly responsive to global forcing mechanisms.

Conversely, the out-degree \({k}_{i}^{\,{\mbox{out}}\,+}(y)\) or \({k}_{i}^{\,{\mbox{out}}\,-}(y)\) quantifies the region’s influence on other parts of the climate system, where the sign indicates whether the relationship is reinforcing (positive) or opposing (negative). Regions exhibiting high out-degree values function as important climate signal transmitters- their temperature variations can significantly impact remote areas through atmospheric teleconnections. For instance, a node with elevated \({k}_{i}^{\,{\mbox{out}}\,+}(y)\) might amplify similar temperature anomalies downstream, while high \({k}_{i}^{\,{\mbox{out}}\,-}(y)\) could induce opposite thermal responses. In our predictor selection, we particularly focus on high out-degree nodes located within known teleconnection zones, as their robust network connectivity often translates into strong predictive relationships with monsoon precipitation variability across our target regions.

Our primary objective was to identify specific regions within the tropical monsoon belt where rainy season precipitation could be reliably predicted, alongside their corresponding network predictors. To identify robust climate predictors, we systematically examined network nodes whose in-degree or out-degree metrics exhibited strong correlations with monsoon precipitation across the four target regions (EAR, HI, SEA, and CI). A multi-stage validation procedure was employed to ensure the temporal stability and physical plausibility of these relationships under evolving climate conditions. First, we evaluated correlation strength and statistical significance (p < 0.05) across two distinct historical periods, 1950–1989 and 1990–2022, retaining only those predictors that remained significant in both intervals and exhibited stronger associations during the recent, warming-dominated period.

To further assess robustness, we performed sensitivity analyses across varying time windows and emphasized predictors that consistently demonstrated stable performance. The final set of predictors (highlighted in red dots in Fig. 1 a–d) were selected based on three complementary criteria: (1) highest correlation magnitude during the recent 33-year period (1990–2022); (2) sustained statistical significance across the full analysis window; and (3) physical alignment with established large-scale climate teleconnections, such as ENSO variability and Rossby wave propagation pathways. This selection strategy, supported by the global node screening presented in Supplementary Fig. S1, ensures that our predictors are both statistically sound and mechanistically interpretable within the broader context of climate system dynamics.

Fig. 1: Network-based prediction framework.
Fig. 1: Network-based prediction framework.
Full size image

The red dots on the map indicate the locations of the identified predictors, while the blue boxes delineate the targeted forecast regions: a EAR, b HI, c SEA, and d CI. e presents the proposed prediction framework, highlighting the integration of network-based metrics and statistical modeling for monsoon precipitation forecasting. The geographic visualization was generated using Gephi combined with Google Earth Engine (Image ©2023 Google).

We designate the corresponding in-degree or out-degree of these nodes as predictors (as shown in Table. 1) for rainy season precipitation in these four regions. The calculation of rainy season precipitation for these regions is based on their respective geographical locations (See Methods). An illustrative depiction of the linear relationship between network predictors and rainy season precipitation is presented in Fig. 2, offering a clear visualization of their correlation.

Fig. 2: A visualization of the linear relationship between network predictors and rainy season precipitation.
Fig. 2: A visualization of the linear relationship between network predictors and rainy season precipitation.
Full size image

a EAR, b HI, c SEA, and d CI. The blue curves (left-hand y-axis) represent the time series of observed seasonal total precipitation rates in tropical monsoon regions, while the red curves (right-hand y-axis) represent their corresponding network predictors, which are derived from the red dots in Fig.1 for the corresponding regions. The linear regression slopes are displayed in each panel. The black and green vertical lines indicate the years 1990 and 1999, respectively, dividing the entire time span into three phases: the learning phase (1950–1989), the training phase (1990–1999), and the forecasting phase (2000–2022).

Forecasting the monsoon precipitation

We identified a strong linear relationship between specific predictors and precipitation rates in each of the targeted regions (illustrated in Fig. 2). While the SEA region exhibits a negative correlation between predicted and observed precipitation, making visual pattern matching less intuitive, its predictive power remains statistically robust. Leveraging this relationship, we developed reliable forecasting algorithms using linear regression models. However, given the non-stationary nature of the climate system, where dynamics change over time, we assumed that the linear relationships would also evolve dynamically. To address this, we partitioned the most recent 33 years into two phases and fine-tuned our model by determining the optimal window length, Lopt, during the training phase (1990–1999). Rather than relying on all historical data since 1950, the model employs a moving window of data with a fixed length Lopt. Subsequently, the forecasting phase (2000–2022) is used to perform actual predictions and evaluate the algorithm’s performance. This adaptive approach allows the model to adjust to recent trends and variability, mitigating the influence of outdated or fluctuating data.

To assess the performance of our forecasting framework, we employed three key metrics: root mean square error (RMSE)37, mean absolute percentage error (MAPE)38, and Pearson correlation coefficient (PCC)39. RMSE measures the average magnitude of prediction errors, highlighting significant deviations. MAPE expresses errors as a percentage, enabling easy comparison across datasets. PCC evaluates the linear correlation between predicted and observed values, indicating trend consistency. We systematically optimized these metrics to determine the best parameter set for our prediction framework by minimizing RMSE and MAPE while maximizing PCC. Specifically, we defined an evaluation criterion λ and selected the value of L that maximized λ as the optimal window length:

$${L}_{opt}=\arg {\max }_{L}\lambda \triangleq \arg {\max }_{L}\left(\frac{{{\mbox{PCC}}}_{{{\rm{norm}}}}}{{{\mbox{RMSE}}}_{{{\rm{norm}}}}\cdot {{\mbox{MAPE}}}_{{{\rm{norm}}}}}\right),$$
(1)

where PCCnorm, RMSEnorm and MAPEnorm represent the normalized PCC, RMSE, and MAPE respectively. Our investigation revealed that employing a fixed L of 40 years, corresponding to the 40-year period preceding the predicted year y, resulted in poor performance in terms of RMSE and MAPE. This finding suggests the presence of significant systematic errors (see Supplementary Fig. S2 for details).

Consequently, we determined the optimal learning set lengths Lopt for four monsoon regions as follows: 37 years for EAR, 40 years for HI, 11 years for SEA, and 33 years for CI (see Supplementary Fig. S3 for details). Using these optimal learning set lengths, we forecasted rainy season total precipitation for the respective regions from 2000 to 2022, yielding favorable results. The forecasts achieved: RMSE values below 1.2 × 10−5 kgm−2s−1, MAPE values below 16%, and PCC values higher than 0.6 (refer to Table 2).

Table 2 A comparison of the forecast performance among SEAS5, CFSv2, and the network-based prediction method

This prediction model demonstrates a significant advantage over alternative forecasting methods, particularly in terms of lead time. For the SEA, CI, and HI regions, our model enables precipitation forecasts with a lead time of up to 4 months. For the EAR region, the forecast lead time extends to 10 months. This advantage arises from our methodology: the climate network is constructed annually using 2 m air temperature data from January to December of the preceding year, enabling predictor values to be generated in January and forecasts to be issued well before the onset of each region’s monsoon season. Such extended forecasting capabilities are crucial for societal, economic, and environmental applications, offering improved responses to climate-related disasters, enhanced resource management, and additional temporal windows for informed decision-making and planning.

Comparative study: seasonal forecast systems vs. network-based prediction method

We conducted a comparative analysis to evaluate the predictive performance of our proposed network-based prediction method against two established seasonal climate prediction systems: the European Center for Medium-Range Weather Forecasts (ECMWF) Seasonal Forecast System 5 (SEAS5) developed by the ECMWF and the Climate Forecast System version 2 (CFSv2) developed by the National Oceanic and Atmospheric Administration (NOAA). Both SEAS5 and CFSv2 are known to provide a reasonable fit for the observed spatial distribution of precipitation, as shown in Supplementary Fig. S4. It is noteworthy that the hindcast data for SEAS5 is only available up to 2022, while the hindcast data for CFSv2 has gaps during 1990–1992 and 2017–2018. Therefore, only the period from 2000 to 2016 was used to calculate the hindcast performance of CFSv2. Numerous studies have evaluated the performance of SEAS5 and CFSv2 in tropical monsoon regions40,41,42. For instance, Ferreira et al.40 demonstrated that SEAS5 effectively captures extreme precipitation anomalies in northern South America and northeastern Brazil, despite some spatial heterogeneity. Chevuturi et al.41 reported a PCC of approximately 0.33 between SEAS5 hindcast data and observed data for the Indian monsoon. Similarly, Lang et al.42 noted that CFSv2’s forecast skill scores for rainy season precipitation in 17 regions across China, including the Pearl River Basin (HI), generally remained below 0.3.

To compare the performance of SEAS5, CFSv2, and our network-based method, we evaluated their predictive capabilities for four tropical monsoon regions—EAR, HI, SEA, and CI-using hindcast data. SEAS5 and CFSv2 hindcast data covered the periods 1990–2022 and 1993–2016, respectively, with lead times of 1 and 4 months. SEAS5 utilizes 25 ensemble members34, while CFSv2 employs 24 ensembles35. We compared these systems’ predictive performance against our network-based method using key metrics (e.g., RMSE, MAPE, and PCC). Table. 2 summarizes their performance for lead times of 1 month (refer to Table. S1 for lead-4 month results).

SEAS5 and CFSv2 exhibited their best performance in the EAR region, achieving a maximum PCC of 0.67 at a lead time of 1 month. However, our network-based method matched their PCC (0.63) at a much longer lead time of 10 months, while also delivering significantly lower MAPE and RMSE values (Table. 2). In contrast, SEAS5 and CFSv2 showed poorer performance in the other three regions, with PCC values ranging from 0.26 to 0.47 and higher errors compared to our method, as illustrated in Fig. 3 and Supplementary Fig. S5. These results are consistent with findings from prior studies40,41,42, further underscoring the advantages of our forecasting method.

Fig. 3: Prediction results.
Fig. 3: Prediction results.
Full size image

a, c, e, and g display time series of predicted values (green dashed lines) and observed values (blue solid lines). The SEAS5 hindcast values with lead times of 1 month and 4 months are shown as pink and black dotted lines, respectively. The gray shaded area indicates the training phase (1990-1999), while the white area represents the forecasting phase (2000–2022). b, d, f, and h display the scatter plot of predicted values versus observed values during the forecasting phase (2000–2022) for corresponding regions. The black solid line represents the linear regression line, with the gray shaded area indicating its 99% confidence interval. The Pearson correlation coefficient (r) is provided for each relationship. Significance levels are denoted by markers: **** (p < 10−4) and ** (p < 10−2).

Beyond its superior predictive performance, the network-based approach provides valuable insights into the physical mechanisms driving climate variability, addressing the opacity of AI models that often act as “black boxes." By analyzing the annual averaged absolute weights of network links connected to predictors, we identified regions with significant influences on predictor dynamics. These insights also helped relate network indices’ predictability to underlying climate mechanisms. Key regions influencing predictors for the EAR and SEA (depicted in Fig. 4a, c) align with the Rossby wave train, consistent with the findings of Gelbrecht et al.43, which highlight the pivotal role of Rossby waves in modulating South American precipitation patterns. The propagation of Rossby waves across the atmosphere establishes teleconnections that influence precipitation variability in both the Eastern Amazon Rainforest and the Sahel region. For the HI region, influential predictors are primarily located in the eastern Pacific (Fig. 4b), a region strongly governed by the ENSO. ENSO’s global impacts on atmospheric circulation and weather patterns, including its modulation of East Asian monsoons, have been extensively documented44,45,46. The connection between ENSO phases and HI predictors underscores the critical role of large-scale ocean-atmosphere interactions in shaping rainfall patterns over East Asia. In the CI region, predictors cluster predominantly in East Asia, illustrating a teleconnection with the East Asian monsoon (Fig. 4d). This is consistent with prior studies, such as Kripalani et al.47, which demonstrated an out-of-phase relationship between monsoon precipitation in southern Japan and India. This teleconnection reflects the interplay between subtropical and tropical circulation systems, contributing to the variability of Indian monsoon rainfall.

Fig. 4: Spatial distribution of the annually averaged (1950–2022) absolute link strengths (unitless) that comprise the predictor indices.
Fig. 4: Spatial distribution of the annually averaged (1950–2022) absolute link strengths (unitless) that comprise the predictor indices.
Full size image

a EAR, b HI, c SEA, and d CI. Red dots represent the locations of the predictors, while black points indicate locations of the predictors which show the top 5% strongest link, and the thickness of black lines represents the strength of the link.

These findings validate the network-based method as a powerful tool for understanding and forecasting monsoonal precipitation. By identifying and quantifying the influence of key teleconnections, this approach not only enhances predictive accuracy but also provides critical insights into the underlying physical mechanisms driving climate variability.

Discussion

Traditional global climate models (GCMs), such as SEAS5 and CFSv2, simulate atmospheric processes by numerically solving fundamental physical equations. While effective for capturing large-scale patterns, these models struggle with regional precipitation variability-particularly over land in monsoon-affected areas. For instance, SEAS5’s performance over regions like SEA and CI is constrained by its sensitivity to cumulus parameterizations and the dominance of sea surface temperature (SST)-driven signals over tropical oceans12,34. The homogenization effect in gridded GCM outputs often dilutes land-based precipitation signals, which are governed by localized land-atmosphere feedbacks and convective processes.

In contrast, our network-based framework achieves dual advantages: (1) targeted extraction of dynamically relevant signals from observational data, and (2) implicit encoding of complex interactions, such as ENSO- and Rossby wave-driven teleconnections-within the network topology. By annually updating the network using global temperature fields, the framework reduces dimensionality while preserving key spatiotemporal features governing monsoon variability. This strategy enables early-warning signal detection and robustness against structural uncertainties inherent in GCMs. Taking EAR as an illustrative example, the superior predictive performance in this region results primarily from the inherently stable interannual variability of its climatic predictors. EAR’s regional precipitation is predominantly influenced by well-defined ENSO-driven teleconnections and Rossby wave activities. These processes generate coherent, large-scale atmospheric signals, which our network is particularly adept at capturing. Conversely, regions like CI or SEA exhibit more complex land-ocean-topography interactions, introducing significant localized noise and thus decreasing signal stability and predictive accuracy. EAR’s more robust ocean-atmosphere coupling yields a higher signal-to-noise ratio, significantly enhancing the network’s predictive reliability and accuracy. Consequently, our method outperformed SEAS5 and CFSv2, achieving superior predictive accuracy (higher PCCs, lower MAPEs and RMSEs) with lead times up to 10 months.

The framework’s interpretability further addresses a critical limitation of purely AI-driven models. By identifying influential predictor regions (e.g., ENSO-associated zones) and their climatic linkages, it clarifies physical mechanisms, such as Rossby wave propagation-that underpin monsoon variability. This transparency bridges the gap between AI’s predictive power and the “black box” critique. Importantly, the framework complements rather than replaces GCMs, it bridges planetary-scale predictability with region-specific decision-making by leveraging observational constraints and teleconnection structures, thereby translating global model outputs into actionable regional insights.

Despite its strengths, several limitations require attention. First, relying on single-node predictors risks overfitting; ensemble-mean predictors (tested preliminarily) reduced overfitting but slightly lowered accuracy during extreme events. Second, while predictor regions align with known teleconnections (e.g., ENSO), causal relationships remain hypothetical, necessitating causality-based validation. Third, persistent climate shifts may demand periodic re-evaluation of network structures, despite our adaptive training strategy. Lastly, the linear framework may overlook nonlinear dynamics, suggesting opportunities to integrate ML for hybrid modeling48,49.

Emerging approaches combining CNs with ML, such as embedding network topology into AI architectures, could enhance nonlinear pattern recognition while retaining interpretability. Such hybrids may advance monsoon teleconnection analysis and refine predictions, synergizing physical insights with computational power. Future work should also explore incorporating additional variables (e.g., soil moisture, atmospheric pressure) to better resolve land-atmosphere feedbacks and extend the framework to extratropical systems or drought prediction.

By addressing GCM limitations and offering actionable insights into monsoon mechanisms, our framework advances climate prediction for disaster preparedness, agriculture, and water management. Its interpretability, adaptability, and extended lead times establish it as a critical tool for climate resilience. Future integration with AI/ML and causality-driven methods promises to further bridge the gap between global modeling and regional decision-making, fostering a more sustainable society.

Methods

Data pre-process

For each node i (i.e., longitude–latitude grid point), we calculate daily 2 m air temperature standardized anomalies Ti(d) (actual temperature value minus the climatological average and divided by the climatological standard deviation) for each calendar day. Specifically, given an actual record \({\widetilde{T}}^{y}(d)\), where y is the year (from 1948 to 2021), and d stands for the day (from 1 to 365), the anomalies record is defined as

$${T}^{y}(d)=\frac{{\widetilde{T}}^{ \, y}(d)-\,{\mbox{mean}}\,({\widetilde{T}}^{ \, y}(d))}{\,{\mbox{std}}\,({\widetilde{T}}^{ \, y}(d))},$$
(2)

where

$$\,{\mbox{mean}}\,({\widetilde{T}}^{ \, y}(d))=\left\{\begin{array}{cc}\frac{1}{y-1948}{\sum }_{k=1949}^{y-1}{\widetilde{T}}_{i}^{ \, k}(d)&y\ge 1953\\ \frac{1}{4}{\sum }_{k=1949}^{1952}{\widetilde{T}}_{i}^{k}(d)&y < 1953\end{array}\right.,$$
(3)

Similarly, \(\,{\mbox{std}}\,({\widetilde{T}}^{y}(d))\) is calculated in the same segments. Leap days were excluded to maintain consistency in the calendar year duration and simplify the analysis.

To analyze regional precipitation patterns, we selected multiple uniformly distributed latitude-longitude grid points (referred to as nodes) to represent four monsoon regions in the tropical monsoon belt: EAR, HI, SEA, and CI. The number of nodes in each region is denoted as: NEARNHINSEA, and NCI, respectively. The spatially averaged total precipitation rates, denoted as R, were calculated for each region during their respective rainy seasons from 1950 to 2022. Taking HI as an example, where the rainy season spans from May to October, the total rainy season precipitation is computed as:

$${R}_{{{\rm{HI}}}}(y)=\frac{1}{{N}_{{{\rm{HI}}}}}{\sum }_{i=1}^{{N}_{{{\rm{HI}}}}}{\sum }_{m={\mbox{May}}}^{{\mbox{Oct}}\,}{{{\mathcal{R}}}}_{i}^{m,y},\quad y=1950,1951,\ldots ,2022,$$
(4)

where y indicates the year when the rainy season starts, and \({{{\mathcal{R}}}}_{i}^{m,y}\) is the monthly averaged precipitation rate for node i in month m, year y.

Climate network construction

In this study, we construct a series of CNs for each calendar year y from 1949 to 2021. Each CN is built based on statistical relationships between temperature anomaly time series from different spatial locations (nodes), where each node corresponds to a specific latitude–longitude grid point on the Earth’s surface. The strength and direction of links between node pairs are determined by the measure of similarity of their respective temperature time series. To account for Earth’s spherical geometry, we selected 6242 latitude-longitude grid points as CN nodes, ensuring an approximately homogeneous global coverage. Specifically, we define the base longitude resolution near the Equator (at latitude ϕ ≈ 0.9524) as Δ = 1.875, which corresponds to n0 = 192 nodes per latitude circle. Then, for a given latitude ϕi (in degrees), the number of nodes is adjusted to ni = 192/bi, where \({b}_{i}=\,{\mbox{round}}\,(1/\cos {\phi }_{i})\) dynamically adjusts the longitudinal sampling to maintain near-constant surface density. The total number of nodes is \(N={\sum }_{i\in {{\mathcal{I}}}}{n}_{i}\approx {\sum }_{k = 0}^{46}2\cdot \lfloor 192\cdot \cos {\phi }_{2k}\rfloor =6242\), where \({{\mathcal{I}}}=\{0,2,\cdots \,,192\}\) indexes latitudes sampled every two grid points, and the factor of two accounts for symmetric contributions from the Northern and Southern Hemispheres. This cosine-weighted thinning ensures quasi-uniform spacing across the sphere, countering the original grid’s \(\rho \propto 1/\cos \phi\) density divergence.

Links

To construct the CN for a given year y, we compute the time-lagged Pearson correlation between all possible pairs of nodes i and j following25. This yields a cross-correlation function as follows:

$${C}_{ij}^{y}(-\tau )=\frac{\langle {T}_{i}^{y}(t){T}_{j}^{y}(t-\tau )\rangle -\langle {T}_{i}^{y}(t)\rangle \langle {T}_{j}^{y}(t-\tau )\rangle }{\sqrt{\langle {({T}_{i}^{y}(t)-\langle {T}_{i}^{y}(t)\rangle )}^{2}\rangle }\sqrt{\langle {({T}_{j}^{y}(t-\tau )-\langle {T}_{j}^{y}(t-\tau )\rangle )}^{2}\rangle }},$$
(5)

and

$${C}_{ij}^{y}(\tau )=\frac{\langle {T}_{i}^{y}(t-\tau ){T}_{j}^{y}(t)\rangle -\langle {T}_{i}^{y}(t-\tau )\rangle \langle {T}_{j}^{y}(t)\rangle }{\sqrt{\langle {({T}_{i}^{y}(t-\tau )-\langle {T}_{i}^{y}(t-\tau )\rangle )}^{2}\rangle }\sqrt{\langle {({T}_{j}^{y}(t)-\langle {T}_{j}^{y}(t)\rangle )}^{2}\rangle }},$$
(6)

where \(\tau \in [0,{\tau }_{\max }]\) is the time lag, with \({\tau }_{\max }=200\) days ensuring a reliable estimate of the background noise level, and 〈 〉 denote the average over the past 365 days, i.e.,

$$\langle f(t)\rangle =\frac{1}{365}\mathop{\sum }_{k=1}^{365}f(t-k),$$
(7)

where t stands for the end of time series (i.e., 31 Dec. of the year y).

Next, we identify two key correlation values:

  • \(\max {{C}_{ij}^{y}}\): the maximum (most positive) correlation and corresponding lag τ+;

  • \(\min {{C}_{ij}^{y}}\): the minimum (most negative) correlation and corresponding lag τ.

These correlations extrema define the directionality of the network:

  • if τ+ > 0, the positive link is directed from node i to node j, indicating that changes in i precede similar changes in j;

  • if τ > 0, a negative link is directed from i to j, indicating that changes in i precede opposite changes in j.

This process yields two directed unweighted networks: one for positive and one for negative correlations. We then assign weights to each link based on the z-scores of the correlation values (standardized relative to the null distribution), resulting in two directed weighted CNs per year as follows:

$${W}_{ij}^{+}(y)=\frac{{C}_{ij}^{y}({\tau }^{+})-{{\rm{mean}}}({C}_{ij}^{y})}{{{\rm{std}}}({C}_{ij}^{y})},$$
(8)

and

$${W}_{ij}^{-}(y)=\frac{{C}_{ij}^{y}({\tau }^{-})-{{\rm{mean}}}({C}_{ij}^{y})}{{{\rm{std}}}({C}_{ij}^{y})},$$
(9)

where mean and std are the mean and standard deviation of the cross-correlation function, respectively.

Subsequently, we establish two distinct networks for each calendar year to differentiate between positive and negative link strengths. This dual-network approach allows us to capture both positive and negative interactions between nodes. Each network is characterized by an anti-symmetric adjacency matrix denoted as A(y), where each entry aij(y) signifies the presence of a link from node i to node j with a value of 1, while a link from node j to node i is represented by  − 1 for the year y.

Node degree

To explore the dynamic aspects of the global climate system related to precipitation within tropical monsoon regions, we analyze the temporal evolution of key CNs properties. Our focus is on the degree centrality of nodes, a fundamental parameter in network theory that quantifies the total number of connected links21. Within the positive and negative CNs of a given year y, each node has two distinct degree measures: in-degree and out-degree. The in-degree represents the average strength of incoming links, while the out-degree signifies the average strength of outgoing links. Mathematically, these degree measures are defined as:

$${k}_{i}^{in\pm }(y)=\frac{{\sum }_{j = 1,j\ne i}^{N}{W}_{ij}^{\pm }(y){I}_{{a}_{ij}(y) = -1}}{{\sum }_{j=1,j\ne i}^{N}{I}_{{a}_{ij}(y) = -1}},$$
(10)

and

$${k}_{j}^{out\pm }(y)=\frac{{\sum }_{i = 1,i\ne j}^{N}{W}_{ij}^{\pm }(y){I}_{{a}_{ij}(y) = 1}}{{\sum }_{i=1,i\ne j}^{N}{I}_{{a}_{ij}(y) = 1}},$$
(11)

where N equals to the total number of nodes (i.e., 6242) and I is the indicator function.

Linear Models Construction

Within our proposed framework, we employ a linear model with an updating learning set to forecast precipitation for a given year y. In this approach, we utilize data from the L preceding years to make the prediction, where L ranges from 5 to 40 years. Specifically, for predicting the precipitation rate in year y, the learning set comprises the L years immediately preceding that target year as follows:

$${{{\bf{Y}}}}^{y} \, \triangleq \, \{R(\eta ):\eta =y-1,y-2,...,y-L+1,y-L\}.$$
(12)

Moreover, we have a vector of Predictor corresponding to the precipitation rate vector Yy, which is denoted as:

$${{{\bf{X}}}}^{y-1} \, \triangleq \, \{k(\eta ):\eta =y-2,y-3,...,y-L,y-L-1\}.$$
(13)

Next, to solve the linear model,

$${{{\bf{Y}}}}^{y}={\beta }^{y}\cdot {{{\bf{X}}}}^{y-1}+{\alpha }^{y}.$$
(14)

we employ the ordinary least squares regression technique, which provides the optimal linear unbiased estimator of the model parameters. Thus, the predicted value of year y can be calculated as:

$$\tilde{R}(y)={\beta }^{y}\cdot k(y-1)+{\alpha }^{y}.$$
(15)

In constructing the climate network, we use 2 m air temperature anomaly time series from January 1st to December 31st of each year. Therefore, the full set of network-based predictors for a given year y, can be computed in early January of year y + 1. This allows us to generate precipitation forecasts several months ahead of the upcoming monsoon seasons. Specifically, For HI, CI, and SEA regions, where the monsoon rainy seasons begin in May, predictions can be made in January, yielding a lead time of ~4 months. For the EAR region, where the monsoon typically begins in November, this approach provides a lead time of up to 10 months.

Evaluation metrics

In addition to the conventional use of PCC, we also evaluate the forecasting skill associated with each specific L using metrics, such as RMSE and MAPE. RMSE is widely employed in climate and environmental research37. For the predicted values \(\tilde{R}(y)\) and the observed values R(y) of our proposed model, the RMSE is calculated as

$${{\rm{RMSE}}} \, =\sqrt{\frac{1}{n}{\sum }_{y={y}_{1}}^{{y}_{n}}{\left(R(y)-{\tilde{R}}(y)\right)}^{2}}.$$
(16)

On the other hand, MAPE provides a measure of the relative error and is defined as:

$${{\rm{MAPE}}} \, =\frac{1}{n}{\sum }_{y={y}_{1}}^{{y}_{n}}\left| \frac{R(y)-\tilde{R}(y)}{R(y)}\right|.$$
(17)

MAPE is particularly useful in tasks where sensitivity to relative variations is more important than sensitivity to absolute variations. Both RMSE and MAPE have a numerical range of [0, + ), with smaller values indicating better predictive performance.