Abstract
Tropical monsoons play a critical role in shaping regional and global climate systems, with profound ecological and socio-economic impacts. However, their long-term prediction remains challenging due to the complex interplay of regional dynamics, global climate drivers, large-scale teleconnections, and inherent non-stationarities in the climate system. Here, we introduce a unified network-based framework for predicting monsoon precipitation across diverse tropical regions. By leveraging global 2-meter air temperature fields, this approach captures large-scale climate teleconnections, such as the El Niño-Southern Oscillation and Rossby waves, enabling accurate forecasts for four key monsoon systems: the South American, East Asian, West African, and Indian monsoons. Our framework achieves remarkable forecasting accuracy with lead times of 4-10 months, outperforming traditional systems such as Seasonal Forecast System 5 and Climate Forecast System version 2. Beyond its predictive capabilities, the framework offers flexibility for application to other regions and climate phenomena, advancing our understanding of global climate dynamics. These findings have far-reaching implications for disaster preparedness, resource management, and sustainable development.
Similar content being viewed by others
Introduction
Tropical monsoon regions, concentrated within 30∘ latitude of the equator, are marked by abundant precipitation and exceptional ecological diversity. Monsoon precipitation is vital for maintaining regional ecosystems1, supporting agriculture2,3, and water resource management4. Beyond their local significance, tropical monsoons influence global atmospheric circulation, water cycling5, and energy balance6.
However, accurately forecasting monsoon precipitation over long timescales remains a formidable challenge due to the complex interplay of initial conditions, external forcings, and internal variability7. For instance, small errors in initial conditions can amplify over time, making accurate long-term forecasts difficult8. Additionally, uncertainties in future external forcings, such as greenhouse gas emissions and volcanic activity, further complicate predictions9,10. The strong natural variability of monsoons adds another layer of difficulty, posing significant challenges for seasonal and decadal-scale forecasting11.
Despite progress, climate models struggle to capture the nonlinear processes and complex interactions that define the climate system, limiting their long-term forecasting accuracy12. Furthermore, these models require extensive observational data and sophisticated parameterizations, complicating their use in regions with limited historical records13,14. Recently, AI-based methods have enabled the development of machine learning (ML) weather prediction systems, such as FourCastNet15, GraphCast16, Pangu-Weather17, NowcastNet18, and GenCast19. However, these systems excel at short- to medium-term (up to 15 days) predictions and are currently ill-suited for addressing the complexities of long-term and seasonal-scale precipitation forecasting.
To address these challenges, network theory offers a robust approach for analyzing the dynamic and structural properties of complex systems20,21,22,23. Its application in climate science24,25,26,27 has revealed hidden connections and teleconnections within the global climate system, providing a novel basis for prediction28,29,30. By leveraging network-derived metrics, researchers have successfully predicted phenomena like El Niño events31,32 and monsoonal rainfall26,33, demonstrating the approach’s feasibility and reliability.
This study aims to address the longstanding challenge of long-lead seasonal monsoon precipitation prediction by proposing a climate network-based framework that leverages global teleconnections. Specifically, we (1) develop a network-based prediction system grounded in spatiotemporal climate connectivity; (2) apply this unified framework to forecast monsoon precipitation across four major tropical monsoon systems-the South American monsoon (Eastern Amazon Region (EAR)), the East Asian monsoon (Hainan Island (HI)), the West African monsoon (Sahel region of East Africa (SEA)), and the Indian monsoon (Central India (CI)); and (3) assess its forecasting skill relative to established dynamical models, namely European Center for Medium-Range Weather Forecasts Seasonal Forecast System 5 (SEAS5)34 and Climate Forecast System version 2 (CFSv2)35. These four regions represent diverse climatic regimes, enabling robust evaluation of the framework’s generalizability. Notably, the SEA focus area, encompassing South Sudan within the Sahel belt, exemplifies a region where predictive urgency intersects with ecological and socioeconomic vulnerability: Around 80% of South Sudan’s population depends on rain-fed agriculture36, while its ecologically sensitive landscapes-including Boma National Park, a critical refuge for African wildlife-face escalating climate risks. This dual pressure underscores the need for reliable monsoon forecasts to support food security, biodiversity conservation, and climate-resilient development. Therefore, by integrating network theory with climate predictability science, we aim to provide both improved forecasting capabilities and enhanced insight into the underlying drivers of monsoon variability.
Results
Mining network predictors
The rainy season, characterized by significant increases in precipitation, varies widely in timing, duration, and intensity across different regions. For this study, we focused on four regions within the tropical monsoon belt, illustrated as blue boxes in Fig. 1a–d: EAR, HI, SEA, and CI (See Table 1 for the specific latitudinal/longitudinal ranges of these regions). These regions represent diverse monsoonal systems driven by distinct atmospheric and oceanic processes, making them ideal test cases for our predictive framework (as depicted in Fig. 1e). To standardize the analysis, we defined the rainy season as May–October in the Northern Hemisphere and November to next year’s April in the Southern Hemisphere, based on historical precipitation patterns.
In our endeavor to identify reliable predictors for rainy season precipitation, we constructed a series of climate networks (CNs) spanning January 1950–December 2022. The dataset used for this analysis comprises daily mean 2 m air temperature data from NCEP-Reanalysis I, covering 6242 global grid points. Each grid point was treated as a node within the network, and its in-degree and out-degree were computed based on correlations with other nodes (See Methods). In our climate network framework, we quantify regional climate interactions through two fundamental metrics: in-degree and out-degree. For a given year y, the in-degree of node i, denoted as \({k}_{i}^{\,{\mbox{in}}\,+}(y)\) (positive) or \({k}_{i}^{\,{\mbox{in}}\,-}(y)\) (negative), measures the cumulative influence the region receives from global climate patterns. Physically, this metric represents the number of strong climate teleconnections (e.g., ENSO or Rossby wave activity) affecting the region, serving as an indicator of its sensitivity to external climate drivers. A high in-degree suggests the region is strongly coupled to large-scale climate variability, making it particularly responsive to global forcing mechanisms.
Conversely, the out-degree \({k}_{i}^{\,{\mbox{out}}\,+}(y)\) or \({k}_{i}^{\,{\mbox{out}}\,-}(y)\) quantifies the region’s influence on other parts of the climate system, where the sign indicates whether the relationship is reinforcing (positive) or opposing (negative). Regions exhibiting high out-degree values function as important climate signal transmitters- their temperature variations can significantly impact remote areas through atmospheric teleconnections. For instance, a node with elevated \({k}_{i}^{\,{\mbox{out}}\,+}(y)\) might amplify similar temperature anomalies downstream, while high \({k}_{i}^{\,{\mbox{out}}\,-}(y)\) could induce opposite thermal responses. In our predictor selection, we particularly focus on high out-degree nodes located within known teleconnection zones, as their robust network connectivity often translates into strong predictive relationships with monsoon precipitation variability across our target regions.
Our primary objective was to identify specific regions within the tropical monsoon belt where rainy season precipitation could be reliably predicted, alongside their corresponding network predictors. To identify robust climate predictors, we systematically examined network nodes whose in-degree or out-degree metrics exhibited strong correlations with monsoon precipitation across the four target regions (EAR, HI, SEA, and CI). A multi-stage validation procedure was employed to ensure the temporal stability and physical plausibility of these relationships under evolving climate conditions. First, we evaluated correlation strength and statistical significance (p < 0.05) across two distinct historical periods, 1950–1989 and 1990–2022, retaining only those predictors that remained significant in both intervals and exhibited stronger associations during the recent, warming-dominated period.
To further assess robustness, we performed sensitivity analyses across varying time windows and emphasized predictors that consistently demonstrated stable performance. The final set of predictors (highlighted in red dots in Fig. 1 a–d) were selected based on three complementary criteria: (1) highest correlation magnitude during the recent 33-year period (1990–2022); (2) sustained statistical significance across the full analysis window; and (3) physical alignment with established large-scale climate teleconnections, such as ENSO variability and Rossby wave propagation pathways. This selection strategy, supported by the global node screening presented in Supplementary Fig. S1, ensures that our predictors are both statistically sound and mechanistically interpretable within the broader context of climate system dynamics.
The red dots on the map indicate the locations of the identified predictors, while the blue boxes delineate the targeted forecast regions: a EAR, b HI, c SEA, and d CI. e presents the proposed prediction framework, highlighting the integration of network-based metrics and statistical modeling for monsoon precipitation forecasting. The geographic visualization was generated using Gephi combined with Google Earth Engine (Image ©2023 Google).
We designate the corresponding in-degree or out-degree of these nodes as predictors (as shown in Table. 1) for rainy season precipitation in these four regions. The calculation of rainy season precipitation for these regions is based on their respective geographical locations (See Methods). An illustrative depiction of the linear relationship between network predictors and rainy season precipitation is presented in Fig. 2, offering a clear visualization of their correlation.
a EAR, b HI, c SEA, and d CI. The blue curves (left-hand y-axis) represent the time series of observed seasonal total precipitation rates in tropical monsoon regions, while the red curves (right-hand y-axis) represent their corresponding network predictors, which are derived from the red dots in Fig.1 for the corresponding regions. The linear regression slopes are displayed in each panel. The black and green vertical lines indicate the years 1990 and 1999, respectively, dividing the entire time span into three phases: the learning phase (1950–1989), the training phase (1990–1999), and the forecasting phase (2000–2022).
Forecasting the monsoon precipitation
We identified a strong linear relationship between specific predictors and precipitation rates in each of the targeted regions (illustrated in Fig. 2). While the SEA region exhibits a negative correlation between predicted and observed precipitation, making visual pattern matching less intuitive, its predictive power remains statistically robust. Leveraging this relationship, we developed reliable forecasting algorithms using linear regression models. However, given the non-stationary nature of the climate system, where dynamics change over time, we assumed that the linear relationships would also evolve dynamically. To address this, we partitioned the most recent 33 years into two phases and fine-tuned our model by determining the optimal window length, Lopt, during the training phase (1990–1999). Rather than relying on all historical data since 1950, the model employs a moving window of data with a fixed length Lopt. Subsequently, the forecasting phase (2000–2022) is used to perform actual predictions and evaluate the algorithm’s performance. This adaptive approach allows the model to adjust to recent trends and variability, mitigating the influence of outdated or fluctuating data.
To assess the performance of our forecasting framework, we employed three key metrics: root mean square error (RMSE)37, mean absolute percentage error (MAPE)38, and Pearson correlation coefficient (PCC)39. RMSE measures the average magnitude of prediction errors, highlighting significant deviations. MAPE expresses errors as a percentage, enabling easy comparison across datasets. PCC evaluates the linear correlation between predicted and observed values, indicating trend consistency. We systematically optimized these metrics to determine the best parameter set for our prediction framework by minimizing RMSE and MAPE while maximizing PCC. Specifically, we defined an evaluation criterion λ and selected the value of L that maximized λ as the optimal window length:
where PCCnorm, RMSEnorm and MAPEnorm represent the normalized PCC, RMSE, and MAPE respectively. Our investigation revealed that employing a fixed L of 40 years, corresponding to the 40-year period preceding the predicted year y, resulted in poor performance in terms of RMSE and MAPE. This finding suggests the presence of significant systematic errors (see Supplementary Fig. S2 for details).
Consequently, we determined the optimal learning set lengths Lopt for four monsoon regions as follows: 37 years for EAR, 40 years for HI, 11 years for SEA, and 33 years for CI (see Supplementary Fig. S3 for details). Using these optimal learning set lengths, we forecasted rainy season total precipitation for the respective regions from 2000 to 2022, yielding favorable results. The forecasts achieved: RMSE values below 1.2 × 10−5 kgm−2s−1, MAPE values below 16%, and PCC values higher than 0.6 (refer to Table 2).
This prediction model demonstrates a significant advantage over alternative forecasting methods, particularly in terms of lead time. For the SEA, CI, and HI regions, our model enables precipitation forecasts with a lead time of up to 4 months. For the EAR region, the forecast lead time extends to 10 months. This advantage arises from our methodology: the climate network is constructed annually using 2 m air temperature data from January to December of the preceding year, enabling predictor values to be generated in January and forecasts to be issued well before the onset of each region’s monsoon season. Such extended forecasting capabilities are crucial for societal, economic, and environmental applications, offering improved responses to climate-related disasters, enhanced resource management, and additional temporal windows for informed decision-making and planning.
Comparative study: seasonal forecast systems vs. network-based prediction method
We conducted a comparative analysis to evaluate the predictive performance of our proposed network-based prediction method against two established seasonal climate prediction systems: the European Center for Medium-Range Weather Forecasts (ECMWF) Seasonal Forecast System 5 (SEAS5) developed by the ECMWF and the Climate Forecast System version 2 (CFSv2) developed by the National Oceanic and Atmospheric Administration (NOAA). Both SEAS5 and CFSv2 are known to provide a reasonable fit for the observed spatial distribution of precipitation, as shown in Supplementary Fig. S4. It is noteworthy that the hindcast data for SEAS5 is only available up to 2022, while the hindcast data for CFSv2 has gaps during 1990–1992 and 2017–2018. Therefore, only the period from 2000 to 2016 was used to calculate the hindcast performance of CFSv2. Numerous studies have evaluated the performance of SEAS5 and CFSv2 in tropical monsoon regions40,41,42. For instance, Ferreira et al.40 demonstrated that SEAS5 effectively captures extreme precipitation anomalies in northern South America and northeastern Brazil, despite some spatial heterogeneity. Chevuturi et al.41 reported a PCC of approximately 0.33 between SEAS5 hindcast data and observed data for the Indian monsoon. Similarly, Lang et al.42 noted that CFSv2’s forecast skill scores for rainy season precipitation in 17 regions across China, including the Pearl River Basin (HI), generally remained below 0.3.
To compare the performance of SEAS5, CFSv2, and our network-based method, we evaluated their predictive capabilities for four tropical monsoon regions—EAR, HI, SEA, and CI-using hindcast data. SEAS5 and CFSv2 hindcast data covered the periods 1990–2022 and 1993–2016, respectively, with lead times of 1 and 4 months. SEAS5 utilizes 25 ensemble members34, while CFSv2 employs 24 ensembles35. We compared these systems’ predictive performance against our network-based method using key metrics (e.g., RMSE, MAPE, and PCC). Table. 2 summarizes their performance for lead times of 1 month (refer to Table. S1 for lead-4 month results).
SEAS5 and CFSv2 exhibited their best performance in the EAR region, achieving a maximum PCC of 0.67 at a lead time of 1 month. However, our network-based method matched their PCC (0.63) at a much longer lead time of 10 months, while also delivering significantly lower MAPE and RMSE values (Table. 2). In contrast, SEAS5 and CFSv2 showed poorer performance in the other three regions, with PCC values ranging from 0.26 to 0.47 and higher errors compared to our method, as illustrated in Fig. 3 and Supplementary Fig. S5. These results are consistent with findings from prior studies40,41,42, further underscoring the advantages of our forecasting method.
a, c, e, and g display time series of predicted values (green dashed lines) and observed values (blue solid lines). The SEAS5 hindcast values with lead times of 1 month and 4 months are shown as pink and black dotted lines, respectively. The gray shaded area indicates the training phase (1990-1999), while the white area represents the forecasting phase (2000–2022). b, d, f, and h display the scatter plot of predicted values versus observed values during the forecasting phase (2000–2022) for corresponding regions. The black solid line represents the linear regression line, with the gray shaded area indicating its 99% confidence interval. The Pearson correlation coefficient (r) is provided for each relationship. Significance levels are denoted by markers: **** (p < 10−4) and ** (p < 10−2).
Beyond its superior predictive performance, the network-based approach provides valuable insights into the physical mechanisms driving climate variability, addressing the opacity of AI models that often act as “black boxes." By analyzing the annual averaged absolute weights of network links connected to predictors, we identified regions with significant influences on predictor dynamics. These insights also helped relate network indices’ predictability to underlying climate mechanisms. Key regions influencing predictors for the EAR and SEA (depicted in Fig. 4a, c) align with the Rossby wave train, consistent with the findings of Gelbrecht et al.43, which highlight the pivotal role of Rossby waves in modulating South American precipitation patterns. The propagation of Rossby waves across the atmosphere establishes teleconnections that influence precipitation variability in both the Eastern Amazon Rainforest and the Sahel region. For the HI region, influential predictors are primarily located in the eastern Pacific (Fig. 4b), a region strongly governed by the ENSO. ENSO’s global impacts on atmospheric circulation and weather patterns, including its modulation of East Asian monsoons, have been extensively documented44,45,46. The connection between ENSO phases and HI predictors underscores the critical role of large-scale ocean-atmosphere interactions in shaping rainfall patterns over East Asia. In the CI region, predictors cluster predominantly in East Asia, illustrating a teleconnection with the East Asian monsoon (Fig. 4d). This is consistent with prior studies, such as Kripalani et al.47, which demonstrated an out-of-phase relationship between monsoon precipitation in southern Japan and India. This teleconnection reflects the interplay between subtropical and tropical circulation systems, contributing to the variability of Indian monsoon rainfall.
a EAR, b HI, c SEA, and d CI. Red dots represent the locations of the predictors, while black points indicate locations of the predictors which show the top 5% strongest link, and the thickness of black lines represents the strength of the link.
These findings validate the network-based method as a powerful tool for understanding and forecasting monsoonal precipitation. By identifying and quantifying the influence of key teleconnections, this approach not only enhances predictive accuracy but also provides critical insights into the underlying physical mechanisms driving climate variability.
Discussion
Traditional global climate models (GCMs), such as SEAS5 and CFSv2, simulate atmospheric processes by numerically solving fundamental physical equations. While effective for capturing large-scale patterns, these models struggle with regional precipitation variability-particularly over land in monsoon-affected areas. For instance, SEAS5’s performance over regions like SEA and CI is constrained by its sensitivity to cumulus parameterizations and the dominance of sea surface temperature (SST)-driven signals over tropical oceans12,34. The homogenization effect in gridded GCM outputs often dilutes land-based precipitation signals, which are governed by localized land-atmosphere feedbacks and convective processes.
In contrast, our network-based framework achieves dual advantages: (1) targeted extraction of dynamically relevant signals from observational data, and (2) implicit encoding of complex interactions, such as ENSO- and Rossby wave-driven teleconnections-within the network topology. By annually updating the network using global temperature fields, the framework reduces dimensionality while preserving key spatiotemporal features governing monsoon variability. This strategy enables early-warning signal detection and robustness against structural uncertainties inherent in GCMs. Taking EAR as an illustrative example, the superior predictive performance in this region results primarily from the inherently stable interannual variability of its climatic predictors. EAR’s regional precipitation is predominantly influenced by well-defined ENSO-driven teleconnections and Rossby wave activities. These processes generate coherent, large-scale atmospheric signals, which our network is particularly adept at capturing. Conversely, regions like CI or SEA exhibit more complex land-ocean-topography interactions, introducing significant localized noise and thus decreasing signal stability and predictive accuracy. EAR’s more robust ocean-atmosphere coupling yields a higher signal-to-noise ratio, significantly enhancing the network’s predictive reliability and accuracy. Consequently, our method outperformed SEAS5 and CFSv2, achieving superior predictive accuracy (higher PCCs, lower MAPEs and RMSEs) with lead times up to 10 months.
The framework’s interpretability further addresses a critical limitation of purely AI-driven models. By identifying influential predictor regions (e.g., ENSO-associated zones) and their climatic linkages, it clarifies physical mechanisms, such as Rossby wave propagation-that underpin monsoon variability. This transparency bridges the gap between AI’s predictive power and the “black box” critique. Importantly, the framework complements rather than replaces GCMs, it bridges planetary-scale predictability with region-specific decision-making by leveraging observational constraints and teleconnection structures, thereby translating global model outputs into actionable regional insights.
Despite its strengths, several limitations require attention. First, relying on single-node predictors risks overfitting; ensemble-mean predictors (tested preliminarily) reduced overfitting but slightly lowered accuracy during extreme events. Second, while predictor regions align with known teleconnections (e.g., ENSO), causal relationships remain hypothetical, necessitating causality-based validation. Third, persistent climate shifts may demand periodic re-evaluation of network structures, despite our adaptive training strategy. Lastly, the linear framework may overlook nonlinear dynamics, suggesting opportunities to integrate ML for hybrid modeling48,49.
Emerging approaches combining CNs with ML, such as embedding network topology into AI architectures, could enhance nonlinear pattern recognition while retaining interpretability. Such hybrids may advance monsoon teleconnection analysis and refine predictions, synergizing physical insights with computational power. Future work should also explore incorporating additional variables (e.g., soil moisture, atmospheric pressure) to better resolve land-atmosphere feedbacks and extend the framework to extratropical systems or drought prediction.
By addressing GCM limitations and offering actionable insights into monsoon mechanisms, our framework advances climate prediction for disaster preparedness, agriculture, and water management. Its interpretability, adaptability, and extended lead times establish it as a critical tool for climate resilience. Future integration with AI/ML and causality-driven methods promises to further bridge the gap between global modeling and regional decision-making, fostering a more sustainable society.
Methods
Data pre-process
For each node i (i.e., longitude–latitude grid point), we calculate daily 2 m air temperature standardized anomalies Ti(d) (actual temperature value minus the climatological average and divided by the climatological standard deviation) for each calendar day. Specifically, given an actual record \({\widetilde{T}}^{y}(d)\), where y is the year (from 1948 to 2021), and d stands for the day (from 1 to 365), the anomalies record is defined as
where
Similarly, \(\,{\mbox{std}}\,({\widetilde{T}}^{y}(d))\) is calculated in the same segments. Leap days were excluded to maintain consistency in the calendar year duration and simplify the analysis.
To analyze regional precipitation patterns, we selected multiple uniformly distributed latitude-longitude grid points (referred to as nodes) to represent four monsoon regions in the tropical monsoon belt: EAR, HI, SEA, and CI. The number of nodes in each region is denoted as: NEAR, NHI, NSEA, and NCI, respectively. The spatially averaged total precipitation rates, denoted as R, were calculated for each region during their respective rainy seasons from 1950 to 2022. Taking HI as an example, where the rainy season spans from May to October, the total rainy season precipitation is computed as:
where y indicates the year when the rainy season starts, and \({{{\mathcal{R}}}}_{i}^{m,y}\) is the monthly averaged precipitation rate for node i in month m, year y.
Climate network construction
In this study, we construct a series of CNs for each calendar year y from 1949 to 2021. Each CN is built based on statistical relationships between temperature anomaly time series from different spatial locations (nodes), where each node corresponds to a specific latitude–longitude grid point on the Earth’s surface. The strength and direction of links between node pairs are determined by the measure of similarity of their respective temperature time series. To account for Earth’s spherical geometry, we selected 6242 latitude-longitude grid points as CN nodes, ensuring an approximately homogeneous global coverage. Specifically, we define the base longitude resolution near the Equator (at latitude ϕ ≈ 0.9524∘) as Δ = 1.875∘, which corresponds to n0 = 192 nodes per latitude circle. Then, for a given latitude ϕi (in degrees), the number of nodes is adjusted to ni = ⌊192/bi⌋, where \({b}_{i}=\,{\mbox{round}}\,(1/\cos {\phi }_{i})\) dynamically adjusts the longitudinal sampling to maintain near-constant surface density. The total number of nodes is \(N={\sum }_{i\in {{\mathcal{I}}}}{n}_{i}\approx {\sum }_{k = 0}^{46}2\cdot \lfloor 192\cdot \cos {\phi }_{2k}\rfloor =6242\), where \({{\mathcal{I}}}=\{0,2,\cdots \,,192\}\) indexes latitudes sampled every two grid points, and the factor of two accounts for symmetric contributions from the Northern and Southern Hemispheres. This cosine-weighted thinning ensures quasi-uniform spacing across the sphere, countering the original grid’s \(\rho \propto 1/\cos \phi\) density divergence.
Links
To construct the CN for a given year y, we compute the time-lagged Pearson correlation between all possible pairs of nodes i and j following25. This yields a cross-correlation function as follows:
and
where \(\tau \in [0,{\tau }_{\max }]\) is the time lag, with \({\tau }_{\max }=200\) days ensuring a reliable estimate of the background noise level, and 〈 ⋅ 〉 denote the average over the past 365 days, i.e.,
where t stands for the end of time series (i.e., 31 Dec. of the year y).
Next, we identify two key correlation values:
-
\(\max {{C}_{ij}^{y}}\): the maximum (most positive) correlation and corresponding lag τ+;
-
\(\min {{C}_{ij}^{y}}\): the minimum (most negative) correlation and corresponding lag τ−.
These correlations extrema define the directionality of the network:
-
if τ+ > 0, the positive link is directed from node i to node j, indicating that changes in i precede similar changes in j;
-
if τ− > 0, a negative link is directed from i to j, indicating that changes in i precede opposite changes in j.
This process yields two directed unweighted networks: one for positive and one for negative correlations. We then assign weights to each link based on the z-scores of the correlation values (standardized relative to the null distribution), resulting in two directed weighted CNs per year as follows:
and
where mean and std are the mean and standard deviation of the cross-correlation function, respectively.
Subsequently, we establish two distinct networks for each calendar year to differentiate between positive and negative link strengths. This dual-network approach allows us to capture both positive and negative interactions between nodes. Each network is characterized by an anti-symmetric adjacency matrix denoted as A(y), where each entry aij(y) signifies the presence of a link from node i to node j with a value of 1, while a link from node j to node i is represented by − 1 for the year y.
Node degree
To explore the dynamic aspects of the global climate system related to precipitation within tropical monsoon regions, we analyze the temporal evolution of key CNs properties. Our focus is on the degree centrality of nodes, a fundamental parameter in network theory that quantifies the total number of connected links21. Within the positive and negative CNs of a given year y, each node has two distinct degree measures: in-degree and out-degree. The in-degree represents the average strength of incoming links, while the out-degree signifies the average strength of outgoing links. Mathematically, these degree measures are defined as:
and
where N equals to the total number of nodes (i.e., 6242) and I is the indicator function.
Linear Models Construction
Within our proposed framework, we employ a linear model with an updating learning set to forecast precipitation for a given year y. In this approach, we utilize data from the L preceding years to make the prediction, where L ranges from 5 to 40 years. Specifically, for predicting the precipitation rate in year y, the learning set comprises the L years immediately preceding that target year as follows:
Moreover, we have a vector of Predictor corresponding to the precipitation rate vector Yy, which is denoted as:
Next, to solve the linear model,
we employ the ordinary least squares regression technique, which provides the optimal linear unbiased estimator of the model parameters. Thus, the predicted value of year y can be calculated as:
In constructing the climate network, we use 2 m air temperature anomaly time series from January 1st to December 31st of each year. Therefore, the full set of network-based predictors for a given year y, can be computed in early January of year y + 1. This allows us to generate precipitation forecasts several months ahead of the upcoming monsoon seasons. Specifically, For HI, CI, and SEA regions, where the monsoon rainy seasons begin in May, predictions can be made in January, yielding a lead time of ~4 months. For the EAR region, where the monsoon typically begins in November, this approach provides a lead time of up to 10 months.
Evaluation metrics
In addition to the conventional use of PCC, we also evaluate the forecasting skill associated with each specific L using metrics, such as RMSE and MAPE. RMSE is widely employed in climate and environmental research37. For the predicted values \(\tilde{R}(y)\) and the observed values R(y) of our proposed model, the RMSE is calculated as
On the other hand, MAPE provides a measure of the relative error and is defined as:
MAPE is particularly useful in tasks where sensitivity to relative variations is more important than sensitivity to absolute variations. Both RMSE and MAPE have a numerical range of [0, + ∞), with smaller values indicating better predictive performance.
Data availability
The datasets used in this study were obtained from authoritative and publicly accessible sources as detailed below: (1) 2 m air temperature data (daily mean, 1948–2021, UTC 00:00) were obtained from the NCEP/NCAR Reanalysis I dataset. The data are provided on a Gaussian grid with a spatial resolution of approximately 1.875∘ × 1.875∘. This dataset was selected due to (a) its well-established correlation with SST fields, facilitating the representation of large-scale climate interactions, and (b) its regular updates, supporting long-term climate analysis. (Access: https://psl.noaa.gov/data/gridded/data.ncep.reanalysis.html). (2) Monthly mean precipitation rate data (1950–2022) were also sourced from NCEP/NCAR Reanalysis I, covering the same spatial resolution. (Access: https://downloads.psl.noaa.gov/Datasets/ncep.reanalysis/Monthlies/surface_gauss/). (3) Hindcast precipitation data from two operational seasonal forecasting systems were used for validation: the ECMWF and the National Centers for Environmental Prediction. These institutes support two seasonal forecast systems, SEAS5 and CFSv2, respectively. (a) The SEAS5 (system id = 5) system provides continuous data coverage from 1981 to 2022, with subsequent data being generated by the updated SEAS5 system (system id = 51). (b) The CFSv2 system offers data from 1993 to present, with missing records for 2017 and 2018. (Data accessible at: https://cds.climate.copernicus.eu/datasets/seasonal-monthly-single-levels?tab=download). (4) ERA5 monthly total precipitation data (1990–2022) were used for supplementary validation and were sourced from the Copernicus Climate Change Service. (Access: https://cds.climate.copernicus.eu/datasets/reanalysis-era5-single-levels-monthly-means?tab=download).
Code availability
The Python codes used for the analysis are available on GitHub (https://github.com/fanjingfang/Monsoon).
References
Fu, C. & Wen, G. Variation of ecosystems over East Asia in association with seasonal, interannual and decadal monsoon climate variability. Clim. Change 43, 477–494 (1999).
Wang, B., Gadgil, S. & Rupa Kumar, K. The asian monsoon-agriculture and economy. Asian Monsoon 651–683 (Springer, 2006).
Gadgil, S. & Gadgil, S. The indian monsoon, GDP and agriculture. Econ. Polit. Weekly 41, 4887–4895 (2006).
Balek, J. Hydrology and Water Resources in Tropical Regions (Elsevier, 1983).
Worden, J., Noone, D. & Bowman, K. Importance of rain evaporation and continental convection in the tropical water cycle. Nature 445, 528–532 (2007).
Trenberth, K. E., Stepaniak, D. P. & Caron, J. M. The global monsoon as seen through the divergent atmospheric circulation. J. Clim. 13, 3969–3993 (2000).
Zhou, T., Hsu, H.-H. & Matsumoto, J. Summer monsoons in East Asia, Indochina and the western North Pacific. In Proc. Global Monsoon System: Research and Forecast, 43–72 (World Scientific, 2011).
Slingo, J. & Palmer, T. Uncertainty in weather and climate prediction. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 369, 4751–4767 (2011).
Collins, M. et al. Long-term Climate Change: Projections, Commitments and Irreversibility (IPCC, 2013).
Wang, B. et al. Rethinking indian monsoon rainfall prediction in the context of recent global warming. Nat. Commun. 6, 7154 (2015).
Kajtar, J.B., Santoso, A., England, M.H. & Cai, W. Tropical climate variability: interactions across the Pacific, Indian, and Atlantic oceans.Clim. Dyn. 48, 2173–2190 (2017).
Sperber, K. et al. The Asian summer monsoon: an intercomparison of CMIP5 vs. CMIP3 simulations of the late 20th century. Clim. Dyn. 41, 2711–2744 (2013).
Fan, K., Liu, Y. & Chen, H. Improving the prediction of the east asian summer monsoon: new approaches. Weather Forecast. 27, 1017–1030 (2012).
Shi, P. et al. Significant land contributions to interannual predictability of east asian summer monsoon rainfall. Earth’s. Future 9, e2020EF001762 (2021).
Pathak, J. et al. Fourcastnet: a global data-driven high-resolution weather model using adaptive fourier neural operators. arXiv preprint arXiv:2202.11214 (2022).
Lam, R. et al. Learning skillful medium-range global weather forecasting. Science 382, 1416–1421 (2023).
Bi, K. et al. Accurate medium-range global weather forecasting with 3D neural networks. Nature 619, 533–538 (2023).
Zhang, Y. et al. Skilful nowcasting of extreme precipitation with nowcastnet. Nature 619, 526–532 (2023).
Price, I. et al. Gencast: diffusion-based ensemble forecasting for medium-range weather. arXiv preprint arXiv:2312.15796 (2023).
Watts, D. J. & Strogatz, S. H. Collective dynamics of -small-world’networks. Nature 393, 440–442 (1998).
Albert, R. & Barabási, A.-L. Statistical mechanics of complex networks. Rev. Mod. Phys. 74, 47 (2002).
Newman, M. Networks (Oxford University Press, 2018).
Cohen, R. & Havlin, S. Complex Networks: Structure, Robustness and Function (Cambridge University Press, 2010).
Tsonis, A. A. & Roebber, P. J. The architecture of the climate network. Phys. A Stat. Mech. Appl. 333, 497–504 (2004).
Yamasaki, K., Gozolchiani, A. & Havlin, S. Climate networks around the globe are significantly affected by El Niño. Phys. Rev. Lett. 100, 228501 (2008).
Boers, N. et al. Prediction of extreme floods in the eastern central Andes based on a complex networks approach. Nat. Commun. 5, 5199 (2014).
Boers, N. et al. Complex networks reveal global pattern of extreme-rainfall teleconnections. Nature 566, 373–377 (2019).
Feng, Q. Y. & Dijkstra, H. Are North Atlantic multidecadal SST anomalies westward propagating? Geophys. Res. Lett. 41, 541–546 (2014).
Dijkstra, H. A., Hernández-García, E., Masoller, C. & Barreiro, M. Networks in Climate (Cambridge University Press, 2019).
Ludescher, J. et al. Network-based forecasting of climate phenomena. Proc. Natl. Acad. Sci. 118, e1922872118 (2021).
Ludescher, J. et al. Improved El Niño forecasting by cooperativity detection. Proc. Natl. Acad. Sci. 110, 11742–11745 (2013).
Meng, J. et al. Complexity-based approach for El Niño magnitude forecasting before the spring predictability barrier. Proc. Natl. Acad. Sci. 117, 177–183 (2020).
Fan, J. et al. Network-based approach and climate change benefits for forecasting the amount of indian monsoon rainfall. J. Clim. 35, 1009–1020 (2022).
Johnson, S. J. et al. Seas5: the new ECMWF seasonal forecast system. Geosci. Model Dev. 12, 1087–1117 (2019).
Saha, S. et al. The NCEP climate forecast system version 2. J. Clim. 27, 2185–2208 (2014).
Abdalla, A. A. & Abdel Nour, H. The agricultural potential of Sudan. Exec. Intell. Rev. 28, 37–45 (2001).
Willmott, C. J. & Matsuura, K. Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Clim. Res. 30, 79–82 (2005).
De Myttenaere, A., Golden, B., Le Grand, B. & Rossi, F. Mean absolute percentage error for regression models. Neurocomputing 192, 38–48 (2016).
Cohen, I. et al. Pearson correlation coefficient. Noise Reduction in Speech Processing 1–4 (Springer, 2009).
Ferreira, G. W., Reboita, M. S. & Drumond, A. Evaluation of ECMWF-SEAS5 seasonal temperature and precipitation predictions over South America. Climate 10, 128 (2022).
Chevuturi, A. et al. Forecast skill of the indian monsoon and its onset in the ECMWF seasonal forecasting system 5. Clim. Dyn. 56, 2941–2957 (2021).
Lang, Y. et al. Evaluating skill of seasonal precipitation and temperature predictions of NCEP CFSv2 forecasts over 17 hydroclimatic regions in China. J. Hydrometeorol. 15, 1546–1559 (2014).
Gelbrecht, M., Boers, N. & Kurths, J. Phase coherence between precipitation in South America and Rossby waves.Sci. Adv. 4, eaau3191 (2018).
Changnon, S. A. El Niño 1997-1998: the Climate Event of the Century (Oxford University Press, 2000).
Ronghui, H. & Yifang, W. The influence of ENSO on the summer climate change in China and its mechanism. Adv. Atmos. Sci. 6, 21–32 (1989).
Yuan, Y. et al. The 2016 summer floods in China and associated physical mechanisms: a comparison with 1998. J. Meteorol. Res. 31, 261–277 (2017).
Kripalani, R. H. & Kulkarni, A. Monsoon rainfall variations and teleconnections over south and east Asia. Int. J. Climatol. A J. R. Meteorol. Soc. 21, 603–616 (2001).
Nooteboom, P. D., Feng, Q. Y., López, C., Hernández-García, E. & Dijkstra, H. A. Using network theory and machine learning to predict El Niño. Earth Syst. Dyn. 9, 969–983 (2018).
Petersik, P. J. & Dijkstra, H. A. Probabilistic forecasting of El Niño using neural network model. Geophys. Res. Lett. 47, e2019GL086423 (2020).
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant No. 42450183, 12275020, 12135003, 12205025, and 42461144209) and the Ministry of Science and Technology of China (2023YFE0109000). J.F. is supported by the Fundamental Research Funds for the Central Universities.
Author information
Authors and Affiliations
Contributions
J.M. and J.F. designed the research. G.R. performed the analysis. G.R., J.M., and J.F. prepared the manuscript, G.R., J.M., and J.F. discussed results, and contributed to writing the manuscript. J.M. and J.F. led the writing of the manuscript.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Communications Earth & Environment thanks and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary handling editor: A.L. [A peer review file is available].
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Ran, G., Meng, J. & Fan, J. Tropical monsoon rainfall can be predicted with lead times up to 10 months. Commun Earth Environ 6, 417 (2025). https://doi.org/10.1038/s43247-025-02391-1
Received:
Accepted:
Published:
Version of record:
DOI: https://doi.org/10.1038/s43247-025-02391-1






