Introduction

El Niño-Southern Oscillation (ENSO) is one of the most prominent modes of interannual climate variability, characterized by shifts in sea surface temperatures (SST) across the tropical Pacific Ocean and the weakening of equatorial trade winds. ENSO exerts profound global influences on weather patterns, agriculture, and socio-economic systems by driving variability in precipitation, temperature, and extreme events such as droughts and floods1,2,3. Traditional statistical and dynamical models have demonstrated predictive skill within a lead time of about 12 months (where effective forecast lead time is defined as the period during which the correlation between the forecasted ENSO index and the observed value remains above 0.50)4,5,6,7,8,9. However, ENSO prediction remains a formidable challenge due to the system’s inherent nonlinearity, stochasticity, and multivariate dependencies5,10,11,12,13,14, with one of the most persistent limitations being the spring predictability barrier (SPB)15.

Recent advances in deep learning (DL) have demonstrated transformative potential in ENSO forecasting. Convolutional neural networks (CNNs) have demonstrated remarkable skill in capturing spatial features, extending the effective forecast lead time beyond 15 months16,17. Other DL techniques, such as recurrent neural networks (RNNs)18, convolutional long short-term memory (LSTM) neural networks19, graph neural networks20, and transformers21,22,23, have further enhanced spatiotemporal dependency modeling, successfully extending forecast lead times to 18 months and beyond, with some models achieving performance exceeding 20 months23,24,25,26. Among these advancements, transformer-based architectures have emerged as a particularly powerful approach, leveraging self-attention mechanisms to capture complex, long-range dependencies across three-dimensional ENSO dynamics. However, despite these improvements, no single architecture is universally optimal for ENSO forecasting. For instance, CNNs and transformers, two of the most widely adopted DL models in this field, each exhibit distinct strengths and limitations. CNNs, while adept at extracting spatial features, struggle to capture long-term dependencies. Meanwhile, transformers, despite their ability to model global interactions, often require large datasets and lack the inductive biases necessary to recognize key local ENSO precursors16,27,28. To address these challenges, CNN-transformer hybrid models have been introduced in geoscience29,30,31, and their application to ENSO forecasting has demonstrated robust performance up to 18 months across diverse test datasets32. However, this current approach predominantly relies on an encoder-based, SST-only architecture, failing to incorporate the critical ocean-atmosphere interactions that govern ENSO evolution33.

In response to these limitations, we introduce CTEFNet (Convolutional Transformer ENSO Forecast Network), a novel hybrid deep learning model that synergistically integrates CNNs within a transformer encoder-decoder architecture. By leveraging the complementary strengths of these architectures and incorporating a comprehensive set of oceanic and atmospheric variables, CTEFNet effectively captures the multivariate precursors of ENSO evolution. Our model achieves state-of-the-art forecast performance, extending the effective lead time to 20 months. Moreover, CTEFNet successfully mitigates the SPB, underscoring its robustness and reliability in long-range ENSO forecasting.

Beyond its predictive superiority, CTEFNet enables further physical interpretability when combined post hoc with a novel gradient-based sensitivity analysis34,35, inspired by the principles of adjoint modeling techniques36,37,38,39. Unlike conventional sensitivity analysis that relies on an ensemble of forward modeling with perturbed inputs16,21, adjoint sensitivity analysis, widely used in ocean and climate modeling, quantifies how perturbations in an objective function propagate backward through the evolution of a system38. However, adjoint models are computationally expensive and often constrained by linearity assumptions, which limit their applicability to complex, nonlinear systems. In contrast, backpropagation gradients offer a computationally efficient and nonlinear assessment of input influence on the objective, in this case, ENSO evolution. This approach enables a systematic evaluation of the relative importance of different inputs across varying temporal and spatial scales25,32,40. While conventional DL gradient-based methods primarily evaluate how input perturbations affect predictive performance23, our approach combines DL gradients with adjoint principles to derive a dynamic, spatiotemporally evolving sensitivity analysis, revealing the physical mechanisms that drive ENSO formation from a data-driven perspective.

Our sensitivity analysis uncovers physical precursors of ENSO events consistent with established mechanisms41,42,43,44,45,46,47,48,49. Furthermore, it reveals new insights into the development of inter-basin interactions, advancing our understanding of ENSO’s global influence. By improving both predictive skill and interpretability, this study highlights the critical role of multivariate coupling in ENSO dynamics and underscores the value of deep learning in climate science. These findings establish CTEFNet as a practical and scalable solution for long-term ENSO forecasting, bridging the gap between data-driven predictions and physical understanding of climate variability.

Results

CTEFNet, built upon a novel CNN-transformer hybrid architecture (Methods), integrates key ocean-atmosphere variables from the Coupled Model Intercomparison Project Phase 6 (CMIP6) SSP370 dataset, spanning 2015 to 2100. This dataset, representing a medium-to-high emissions scenario, incorporates future climate forcings, providing a comprehensive depiction of ENSO dynamics in a warming climate. Using a 12-month predictor window, CTEFNet incorporates SST, heat content (HC), mixed layer depth (MLD), sea surface salinity (SSS), sea level pressure (SLP), and the zonal and meridional components of ocean surface current velocity (UO, VO) and wind stress (TAUU, TAUV). CTEFNet predicts the evolution of the Niño 3.4 index over a 24-month horizon, and its performance is rigorously assessed through prediction skill evaluations and sensitivity analysis, utilizing reanalysis data from the Global Ocean Data Assimilation System (GODAS) and the fifth-generation ECMWF atmospheric reanalysis (ERA5) from 1980 to 2021.

Predictive skill of CTEFNet in ENSO forecasting

To assess the forecasting skill of CTEFNet, we utilize the Niño 3.4 index, a widely used SST anomaly-based metric, to characterize ENSO variability. CTEFNet exhibits superior forecast skill, significantly outperforming the North American Multi-Model Ensemble (NMME) dynamical models50, as well as state-of-the-art DL approaches including CNN16, Geoformer21, ResCNN24, ResoNet32, and STPNet26. As shown in Fig. 1, CTEFNet demonstrates markedly higher predictive accuracy in terms of correlation coefficient, particularly for mid-to-long-term forecasts with lead times beyond 6 months. It maintains all-season correlation skills above 0.7 for forecast leads up to 12 months, extending its predictive horizon by 3 months relative to existing deep learning models and by over 5 months relative to dynamical models. Additionally, CTEFNet sustains all-season correlation skills above 0.6 for forecast leads extending beyond 17 months, a 4-month improvement over existing deep learning models. Although previously developed DL models exhibit strong forecasting performance on specific datasets, their forecasting capabilities typically diminish when trained on the CMIP6 SSP370 data, often limiting their effective lead time to around 17 months. One possible reason is that these models may fail to capture physically meaningful patterns, which can hinder performance on datasets governed by diverse climate dynamics. To better understand this limitation, as well as the strong predictive performance of CTEFNet, we present a sensitivity analysis in a later section. For rigorous benchmarking, we also compare CTEFNet with the ECMWF SEAS5-20C reforecast system51 under matched initialization periods, where CTEFNet consistently achieves higher forecast skill (Fig. S1). Furthermore, to illustrate the ability of CTEFNet to forecast specific events, we present a case study of the 1997–1998 El Niño.
This analysis demonstrates that CTEFNet achieves more accurate predictions of both the timing and magnitude of the event relative to other models (Fig. S2).

Fig. 1: ENSO correlation skill in CTEFNet and other models.

The all-season correlation skill of the three-month-moving-averaged Niño 3.4 index as a function of the forecast lead month in CTEFNet (solid orange), CNN (solid deep blue), Geoformer (solid purple), ResCNN (solid green), ResoNet (solid brown), STPNet (solid pink), persistence forecast (dashed black), and the dynamical forecast systems included in the NMME project (dashed, other colors). The shading around the lines for the DL models denotes the 95% confidence interval, based on the bootstrap method. The validation period is between 1980 and 2021.
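
The skill curves and confidence bands described in this caption can be illustrated with a minimal NumPy sketch. It assumes forecasts and observations are already arranged as arrays of shape (samples, leads); the function names, the number of bootstrap replicates, and the resampling over forecast cases are illustrative assumptions, not the authors' evaluation code.

```python
import numpy as np

def correlation_skill(pred, obs):
    """Pearson correlation at each lead; pred and obs have shape (n_samples, n_leads)."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    p = pred - pred.mean(axis=0)
    o = obs - obs.mean(axis=0)
    return (p * o).sum(axis=0) / np.sqrt((p ** 2).sum(axis=0) * (o ** 2).sum(axis=0))

def effective_lead(skill, threshold=0.5):
    """Number of consecutive leads, counted from the first, whose skill exceeds the threshold."""
    below = np.flatnonzero(np.asarray(skill) <= threshold)
    return int(below[0]) if below.size else len(skill)

def bootstrap_ci(pred, obs, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval of the skill, resampling forecast cases with replacement.

    n_boot and the case-resampling scheme are assumptions made for this sketch.
    """
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    rng = np.random.default_rng(seed)
    n = pred.shape[0]
    samples = np.empty((n_boot, pred.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        samples[b] = correlation_skill(pred[idx], obs[idx])
    # Row 0 is the lower bound, row 1 the upper bound, one value per lead.
    return np.quantile(samples, [alpha / 2, 1 - alpha / 2], axis=0)
```

With a threshold of 0.5, `effective_lead` corresponds to the effective forecast lead time as defined in the Introduction.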

To further evaluate seasonal variations in forecast performance, Fig. 2a presents the seasonal correlation skill of CTEFNet across different lead times. The correlation skill remains above 0.5 for over 20 months from June to December, and for 16 months during the boreal spring (March to May), despite the visible influence of the SPB. Additionally, Fig. 2b, c shows that CTEFNet significantly outperforms CNN and Geoformer in predicting both autumn and winter conditions, while also excelling in mid- to long-term predictions for the boreal spring. These results highlight CTEFNet’s capability not only in achieving high accuracy but also in mitigating the challenges of seasonal forecast degradation.

Fig. 2: The seasonality and lead-time of CTEFNet’s performance.

a Contour plot of correlation skills for CTEFNet across calendar months during the test period (1980–2021) at different lead times. The horizontal axis denotes the forecast lead month, while the vertical axis represents the calendar month. b Same as a, but showing the correlation skill difference between CTEFNet and CNN. c Same as a, but showing the correlation skill difference between CTEFNet and Geoformer.
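
A calendar-month by lead-time skill matrix of the kind plotted in panel a could be assembled roughly as follows. This is a hedged sketch: it assumes the vertical axis refers to the target calendar month (the caption does not say whether it is the target or the initialization month), and `target_months` and `seasonal_skill` are hypothetical helper names.

```python
import numpy as np

def target_months(last_input_index, n_leads):
    """Calendar month (1-12) of each forecast target.

    last_input_index counts months from a January (0 = January of the first year)
    and marks the final input month of each forecast case (an assumed convention).
    """
    last_input_index = np.asarray(last_input_index)
    leads = np.arange(1, n_leads + 1)
    return (last_input_index[:, None] + leads[None, :]) % 12 + 1

def seasonal_skill(pred, obs, months):
    """Correlation skill per (calendar month, lead); all inputs have shape (n_samples, n_leads)."""
    n_leads = pred.shape[1]
    skill = np.full((12, n_leads), np.nan)
    for lead in range(n_leads):
        for m in range(1, 13):
            sel = months[:, lead] == m
            if sel.sum() > 2:  # a correlation needs at least a few cases
                skill[m - 1, lead] = np.corrcoef(pred[sel, lead], obs[sel, lead])[0, 1]
    return skill
```

Panels b and c would then follow as the difference between two such matrices computed for different models.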

Unveiling ENSO precursors through gradient-based sensitivity analysis

By computing the backpropagation gradients of the predicted Niño 3.4 index with respect to the multivariate inputs, we quantify the relative influence of key ocean-atmosphere variables across different lead times, a measure we refer to as sensitivity52,53,54 (Fig. 3, Methods, and Figs. S3–S7). Through systematic sensitivity analysis based on this novel approach, we demonstrate that CTEFNet captures the seasonal evolution and propagation of ENSO signals prior to maturity. The representation of ENSO’s physical mechanisms learned by CTEFNet, as revealed through the sensitivity analysis, is a plausible reason for the model’s enhanced predictive performance.

Fig. 3: The precursors and underlying mechanisms of ENSO forecasting revealed by CTEFNet.

a Sensitivity analysis periods, with red indicating predicted target periods and gray representing 12-month input periods. b–e Averaged sensitivities across multiple El Niño events, retaining only grid point values that are statistically significant at the 95% confidence level, illustrating the contributions of various predictors across different months. Colors denote the sensitivities of SST, HC, SLP, and MLD, while vectors represent those of UO, VO, TAUU, and TAUV. The green box marks the Niño 3.4 region.

The sensitivity analysis was conducted for 11 El Niño events (1982, 1987, 1991, 1994, 1997, 2002, 2004, 2006, 2009, 2015, and 2018) from 1980 to 2021. The prediction targets were determined based on the Niño 3.4 index in November of the El Niño year (when El Niño typically peaks) and the subsequent months (when the index exceeds 0.5), with corresponding inputs derived from the 12-month period preceding November (Fig. 3a). To identify robust precursor signals, we computed the average sensitivities across these El Niño events at each lead month, revealing a sequence of physically meaningful precursor patterns. These sensitivities were statistically assessed using a Student’s t-test, and only grid points with sensitivities significantly different from zero at the 95% confidence level are retained. This filtering step ensures that our interpretations are grounded in statistically significant and robust signals, thereby enhancing the reliability of the identified precursors (Fig. 3b–e). This approach enables a systematic quantification of both positive and negative sensitivities, providing clear insights into the distinct impact of each variable on ENSO evolution. Additionally, the use of normalized input data allows for direct comparison of variable contributions across different time periods and regions, establishing a comprehensive framework for understanding ENSO predictability. To provide a broader perspective, we also conducted a parallel sensitivity analysis for La Niña events (Fig. S7). While La Niña sensitivity patterns generally exhibit a spatial distribution opposite to those of El Niño events, asymmetries emerge in the seasonal evolution of inter-basin interactions across the equatorial oceans. These findings highlight the potential for further investigations into asymmetric ENSO dynamics, which could refine our understanding of ENSO predictability32.
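
The event averaging and significance filtering described above can be sketched as a one-sample Student's t-test against zero at every grid point. The sketch assumes the per-event sensitivities are stacked along the first axis; the function name is hypothetical, and the critical value is hard-coded for 11 events (10 degrees of freedom) rather than looked up.

```python
import numpy as np

# Two-sided 95% critical value of Student's t for 10 degrees of freedom (11 events).
T_CRIT_DF10 = 2.228

def significant_mean(sens, t_crit=T_CRIT_DF10):
    """Average sensitivities over events and mask grid points that are not significant.

    sens has shape (n_events, ...). Returns the event mean, with NaN wherever the
    mean is not significantly different from zero.
    """
    sens = np.asarray(sens, dtype=float)
    n = sens.shape[0]
    mean = sens.mean(axis=0)
    sem = sens.std(axis=0, ddof=1) / np.sqrt(n)  # standard error of the mean
    with np.errstate(divide="ignore", invalid="ignore"):
        t_stat = mean / sem
    return np.where(np.abs(t_stat) > t_crit, mean, np.nan)
```

For a different number of events, `t_crit` would need to be replaced by the critical value for the matching degrees of freedom.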

Our sensitivity analysis reveals that CTEFNet captures the early precursors of El Niños, particularly through the sensitivity fields of HC and MLD in the equatorial Pacific observed in November, approximately 11 months before the El Niño peaks (Fig. 3b). These signals align with the recharge phase of the recharge oscillation mechanism42, where positive HC sensitivity in the equatorial western Pacific (WP) indicates the accumulation of warm surface waters and a deepening thermocline, establishing favorable preconditions for El Niño development. Concurrently, negative MLD sensitivity across the tropical Pacific, typically associated with a thickened ocean barrier layer55,56, suggests a suppression of vertical entrainment and mixing, facilitating heat retention at the surface and reinforcing conditions conducive to El Niño initiation57,58.

By April (eight months before El Niño peaks), the El Niño-related signals are ubiquitous throughout the global equatorial oceans (Fig. 3c). The positive sensitivity of HC in the WP intensifies and extends into the central Pacific (CP), eastern Pacific (EP), and eastern Indian Ocean (IO). This eastward expansion of the Pacific warm pool coincides with a strengthening positive SST sensitivity in the tropical EP and an intensified westerly wind sensitivity over the tropical Pacific, indicating multivariate ocean-atmosphere interactions preceding El Niño events. These patterns mark the initiation of the Bjerknes positive feedback mechanism41, wherein a rise in SST in the EP weakens the zonal temperature gradient, enhancing westerly winds, which in turn amplify eastward warm water transport, reinforcing positive SST anomalies in the EP. In contrast, HC and SST in the North Tropical Atlantic (NTA) exhibit negative sensitivities (Fig. 3c). Cooling in the NTA has been shown to suppress convection in the tropical Atlantic43, which alters the position and intensity of the Pacific Intertropical Convergence Zone (ITCZ) through air-sea interactions, particularly via moist static energy feedback processes44,59,60. These atmospheric adjustments weaken the Pacific trade winds, reduce upwelling, and facilitate the accumulation of warm water in the EP and CP. This sequence of processes ultimately enhances the eastward propagation of warm Kelvin waves, disrupts the normal Pacific circulation, and establishes a critical inter-basin teleconnection pathway linking Atlantic variability to ENSO61,62. Additionally, increased ocean current speeds and enhanced wind stress to the south in the tropical Atlantic contribute positively to El Niño formation in our sensitivity analysis.
These changes could intensify cross-equatorial heat transport, modulate surface fluxes, and alter large-scale atmospheric circulation, which may promote adjustments in the Walker circulation and further amplify El Niño development.

From April to August (three months before El Niño peaks), the positive SST sensitivity field in the EP progressively expands westward into the CP and WP, while westerly wind stress sensitivity in the WP intensifies and negative MLD sensitivity in the EP and CP becomes more pronounced (Fig. 3d). These shifts align with the canonical evolution of El Niño, highlighting the complex multivariate interactions driving ENSO development. The evolving SST pattern alters atmospheric pressure systems, further weakening the trade winds, which in turn amplifies warm SST anomalies across a broader region of the Pacific. As the trade winds weaken and SST variations intensify, shoaling MLD in the EP and CP reduces heat exchange between the deep ocean and surface waters, accelerating the warming of surface waters. During this period, the negative SST sensitivity field in the NTA propagates southward, encompassing the entire tropical Atlantic. This cooling suppresses convection and strengthens descending motions over the Atlantic, reinforcing the descending branch of the Walker circulation and leading to compensatory westerly wind anomalies over the tropical Pacific46. Additionally, wind stress sensitivities associated with this Atlantic cooling, characterized by easterlies off Central America and over South America, further strengthen the teleconnection between the Atlantic and Pacific, establishing a robust link between Atlantic variability and ENSO dynamics45,46. Simultaneously, negative SST sensitivities emerge in the eastern IO, influencing the intensity and spatial configuration of the Indian Ocean Dipole (IOD). This shift in the IOD modulates the Walker circulation, generating compensatory westerly wind anomalies over the tropical Pacific, further promoting El Niño development47,49. The interaction between the IO and Pacific SSTs further strengthens the ocean–atmosphere feedback mechanisms driving the intensification of El Niño events.

By October (one month before El Niño peaks), SST sensitivity in the tropical Pacific continues to increase as El Niño conditions mature (Fig. 3e), further amplifying ENSO’s ocean-atmosphere feedback loop. Simultaneously, wind stress sensitivity and ocean surface current sensitivity indicate that strengthening westerly wind stress and intensified eastward ocean current velocity transport warm water from the WP to the CP and EP, leading to warm water accumulation in the EP and a corresponding decrease in HC in the WP. This results in contrasting HC sensitivities: while the WP experiences a negative impact, the EP exhibits a positive response. Moreover, the negative MLD sensitivity field across the tropical Pacific suggests that shoaling of the mixed layer further enhances surface heat accumulation, reinforcing the positive feedback loop driving El Niño development. Concurrently, negative SST sensitivity persists in the tropical Atlantic, along with basin-wide negative sensitivity fields over the northern and equatorial IO, the South China Sea, and off the northern Australian coasts. Notably, cooling in the IO induces divergence and westerly winds over the western tropical Pacific Ocean, a process consistent with the Matsuno–Gill response63,64. This dynamic adjustment triggers the eastward propagation of downwelling Kelvin waves, which further reinforces El Niño development48. The resulting SST sensitivity patterns are accompanied by notable negative SLP sensitivity fields in the EP and CP, alongside positive SLP sensitivity fields over the Atlantic and Australian regions. This pressure pattern modulates large-scale wind fields and strengthens ocean current sensitivity toward the eastern Pacific, further amplifying the transport of warm water to the CP and EP, reinforcing El Niño intensification.

The identified precursor signals and sensitivity patterns derived from CTEFNet align closely with the canonical evolution of El Niño, demonstrating that our model effectively represents key physical processes underpinning ENSO development and propagation.

Discussion

Recent advancements in DL have revolutionized ENSO forecasting, offering a powerful data-driven framework capable of capturing its highly nonlinear dynamics. However, despite these significant strides, challenges persist in achieving robust and interpretable multivariate ENSO predictions. To address these gaps, we introduce CTEFNet, a novel CNN-transformer hybrid model designed to effectively capture the coupled spatiotemporal interactions governing ENSO evolution. Compared to the CNN-transformer model proposed by Lyu et al.32, which relies on an encoder-based architecture to process only SST through separate CNN and Transformer modules, CTEFNet employs a distinctive encoder-decoder framework. A key advantage of this design lies in its sequential prediction capability, which enables CTEFNet to dynamically capture the evolving multivariate interactions at each time step. This architecture makes CTEFNet particularly well-suited for sequence generation and long-range forecasting. To systematically assess the importance of the model design and input features, we performed ablation studies comparing CTEFNet with CNN-based and Transformer-based models16,21, and evaluating different input combinations (using only SST, SST plus sea variables, SST plus air variables, and all combined) (Fig. S8). Results show that both the hybrid architecture and key multivariate inputs are essential for CTEFNet’s superior forecasting skill. CTEFNet’s predictive performance is further enhanced through training on an ensemble dataset from CMIP6, allowing it to account for subtle differences in physical mechanisms across multiple climate models. This ensemble-based learning approach significantly improves CTEFNet’s ability to capture implicit multivariate ENSO dynamics, which may not be fully represented in a deterministic dynamical model. 
As a result, CTEFNet achieves effective long-lead ENSO forecasts up to 20 months while significantly reducing the SPB, outperforming both traditional dynamical models and state-of-the-art DL models.

Beyond its superior predictive performance, CTEFNet contributes to improved interpretability of deep learning-based climate forecasting when coupled with a post hoc gradient-based sensitivity analysis. We propose this gradient-based approach as a practical alternative to conventional adjoint models, to identify global ENSO precursors and their underlying mechanisms. Unlike traditional adjoint methods, which are often computationally prohibitive and constrained by linearity assumptions, our gradient-based approach offers a more flexible and nonlinear representation of ENSO dynamics. This method overcomes the common challenge of inadequately managing nonlinear responses in physical oceanography and climate science, providing a more faithful representation of complex system dynamics. Meanwhile, a major strength of our approach is its efficiency, derived naturally as a byproduct of the DL model, eliminating the need for additional computationally intensive integration steps. This inherent efficiency makes it highly scalable and well-suited for large-scale climate simulations. The integration of this methodology represents a step forward in enhancing the robustness and applicability of climate sensitivity analysis, particularly in scenarios where nonlinear interactions are significant. Our sensitivity analysis with CTEFNet reveals physical precursors to El Niño events that are consistent with some established mechanisms. Specifically, the sequential buildup of heat in the WP and its eventual release to the CP and EP aligns with the recharge-oscillation mechanism. Additionally, the positive feedback loop in the tropical Pacific, characterized by rising SSTs in the CP and EP and the amplification of westerly winds, strongly reflects the Bjerknes feedback mechanism. 
Notably, our analysis also highlights the role of inter-basin interactions in ENSO variability, revealing that cooling in the tropical Atlantic and Indian Oceans influences large-scale wind patterns and atmospheric circulation, ultimately modulating El Niño formation through cross-basin teleconnections. In parallel, our analysis for La Niña events reveals generally opposite spatial sensitivity patterns, with asymmetries in their seasonal evolution and inter-basin interactions, suggesting event-wise complexity in ENSO dynamics.

In this regard, another recent approach using Swin Transformer65 achieves comparable predictive performance for the Niño 3.4 index with good computational efficiency. However, it exhibits less stable gradient-based sensitivity analysis than CTEFNet, suggesting that it fails to consistently produce physically meaningful attribution patterns under limited data conditions (Fig. S9). This limitation likely stems from the absence of strong inductive biases, which are inherently provided by the CNN component in CTEFNet. CNNs enable robust feature extraction, maintaining stability even in the presence of noisy data66,67,68. While Swin Transformer incorporates spatial localization through its hierarchical structure and sliding windows, its inductive biases are weaker and less explicitly defined compared to those of CNNs. Consequently, Swin Transformer may exhibit instability when trained on limited datasets, leading to gradient fluctuations and reduced reliability in sensitivity analysis69,70.

Furthermore, CTEFNet’s sensitivity analysis reveals new insights into El Niño’s seasonal evolution, particularly regarding inter-basin influences. For instance, we identify a persistent cooling signal in the tropical Atlantic from spring through autumn, with easterly wind anomalies near Central and South America strengthening the teleconnection between the Atlantic and Pacific Oceans. In the Indian Ocean, sensitivity fields extend from the eastern basin in summer to the northern and equatorial regions by autumn, underscoring the evolving nature of inter-basin interactions. These findings paint a more dynamic picture of ENSO’s seasonal progression than conventional models suggest, indicating that inter-basin interactions may be far more critical in driving ENSO’s peak-phase characteristics than previously recognized.

Despite its strengths, CTEFNet’s current implementation remains focused on predicting the Niño 3.4 index, limiting its direct application to ENSO diversity71,72,73. A natural extension of this work would involve enhancing the model’s capability to differentiate between Modoki and canonical ENSO events, which exhibit distinct climatic impacts. Additionally, while CTEFNet has demonstrated significant improvements in mitigating the SPB, it remains a formidable challenge. The persistent decline in model performance during spring is typically attributed to the complex dynamics of the tropical climate system, including shifts in wind patterns and ocean currents that are not well-captured by existing models. Addressing this limitation requires a concerted effort to dissect the underlying mechanisms of SPB and to develop refined modeling approaches that can account for these intricate seasonal variations. Innovative methodologies, possibly integrating higher-resolution data and advanced machine learning techniques, could improve predictions during this challenging season. Such advancements would not only enhance the accuracy of climate models like CTEFNet but also broaden their applicability in real-world climate strategy and policy-making, where understanding and anticipating climate variability is crucial.

Methods

Data and processing methods

The performance of DL models is largely determined by both the quantity and quality of training data. However, observational data for extreme climate events, such as ENSO, are often insufficient to provide adequate sampling. To address this limitation, simulation data from 18 CMIP6 climate models (2015–2100) are utilized for model training (Table S1). While CMIP6 models are known to exhibit certain biases in simulating ENSO evolution74,75, recent studies have demonstrated that they nevertheless provide sufficient physical information to enable deep learning models to achieve strong predictive skill16,76. For model evaluation and selection, reanalysis datasets from the Ocean Reanalysis System 5 (ORAS5) and ERA5 (1958–1978) are used as validation sets. To further evaluate the model’s generalization ability, it is tested using data from GODAS and ERA5 (1980–2021).

Before inputting the data into CTEFNet, a uniform preprocessing procedure is applied. First, monthly anomalies for each input variable are calculated by removing long-term trends and climatology, and this operation is performed separately for each CMIP6 model. The data are then standardized to a uniform spatial resolution of 1° × 2° through linear interpolation, covering the spatial domain from 60 °S to 60 °N in latitude and 0° to 360° in longitude. Grids corresponding to land areas (except for wind stress and sea level pressure) and missing data are assigned a value of zero. The processed fields are then normalized and concatenated along the layer axis to form datasets comprising nine layers.
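
The anomaly and normalization steps above might look as follows for a single variable from a single model. This is a sketch under stated assumptions: the trend is taken to be linear and is removed before the monthly climatology, normalization is taken to mean division by the field's overall standard deviation, and land or missing points are marked by NaN. The text specifies none of these details, and both function names are hypothetical.

```python
import numpy as np

def monthly_anomaly(field):
    """Remove a linear trend and the monthly climatology at every grid point.

    field has shape (n_months, n_lat, n_lon), starts in January, and spans whole years.
    """
    n_months = field.shape[0]
    if n_months % 12 != 0:
        raise ValueError("expected a whole number of years")
    t_c = np.arange(n_months, dtype=float)
    t_c -= t_c.mean()
    flat = field.reshape(n_months, -1)
    # Least-squares slope of the linear trend at each grid point.
    slope = (t_c[:, None] * (flat - flat.mean(axis=0))).sum(axis=0) / (t_c ** 2).sum()
    detrended = flat - slope * t_c[:, None]
    # Mean of each calendar month over all years.
    clim = detrended.reshape(-1, 12, flat.shape[1]).mean(axis=0)
    anom = detrended - np.tile(clim, (n_months // 12, 1))
    return anom.reshape(field.shape)

def normalize_and_fill(anom):
    """Scale to unit standard deviation (assumed scheme), then set NaN points to zero."""
    scaled = anom / np.nanstd(anom)
    return np.nan_to_num(scaled, nan=0.0)
```

Because the operation is performed separately for each CMIP6 model, `monthly_anomaly` would be called once per model and per variable before the fields are interpolated and stacked.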

The input data includes SST, HC, MLD, SSS, SLP, UO, VO, TAUU, and TAUV from the current and previous eleven months. These variables are combined in an overlapping manner, resulting in a data format of size [12 × 9 × 120 × 180], where the four dimensions represent the temporal duration of the input data, the number of variable types, and the latitude and longitude grids. The target variable for training CTEFNet is the Niño 3.4 index for the subsequent 24 months, with the corresponding data format being [24 × 1].
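
The overlapping construction of predictor and target pairs can be illustrated with a sliding window over the monthly record. The sketch assumes the nine variables are already stacked into one array and that the Niño 3.4 index is available as a monthly series; `make_samples` is a hypothetical helper, not the authors' data loader.

```python
import numpy as np

def make_samples(fields, nino34, n_in=12, n_out=24):
    """Slice a monthly record into overlapping (input, target) pairs.

    fields: (n_months, n_vars, n_lat, n_lon); nino34: (n_months,).
    Returns x of shape (n_samples, n_in, n_vars, n_lat, n_lon)
    and y of shape (n_samples, n_out).
    """
    n_months = fields.shape[0]
    n_samples = n_months - n_in - n_out + 1
    if n_samples < 1:
        raise ValueError("record is shorter than one input window plus one target window")
    # Sample i uses months i .. i+n_in-1 as input and the next n_out months as target.
    x = np.stack([fields[i:i + n_in] for i in range(n_samples)])
    y = np.stack([nino34[i + n_in:i + n_in + n_out] for i in range(n_samples)])
    return x, y
```

With nine variables on the 120 × 180 grid, each `x[i]` has the [12 × 9 × 120 × 180] format given above and each `y[i]` holds the 24 target values. For the full record, a lazy, index-based loader would avoid materializing every window in memory.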

Architecture of CTEFNet

CTEFNet comprises two primary components: a CNN-based feature extractor and a Transformer spatiotemporal analysis module (Fig. 4). The CNN-based feature extractor performs multi-scale downsampling of input variables, capturing key regional spatial features. The Transformer module utilizes self-attention mechanisms and parallel processing to model multivariable relationships and long-range dependencies in sequential data. Unlike previous transformer-based models, which predict the entire forecast region21,22, CTEFNet directly predicts the Niño 3.4 index. This design enables early downsampling within the pipeline, optimizing computational efficiency. As a result, CTEFNet can process larger, global input data, improving ENSO prediction while enhancing sensitivity analysis of global multivariate patterns.

Fig. 4: Architecture of CTEFNet for ENSO predictions.

CTEFNet consists of an input layer, a CNN-based feature extractor, a Transformer spatiotemporal analysis module, two fully connected layers, and an output layer. The input predictors include SST, HC, MLD, SSS, SLP, UO, VO, TAUU, and TAUV anomaly fields, all spanning 12 consecutive months in the region defined by (60°S–60°N, 0°–360°E). The Niño 3.4 index for the subsequent 24 months serves as the predictand for supervised training.

To extract the spatiotemporal features from the input variables, we employ a stack of three CNN-based blocks. Each block consists of two convolution layers, two batch normalization layers, two ReLU activation functions, and one global average pooling layer. The convolution operations, with their local receptive fields, enable the model to capture critical local information while minimizing global context noise. The hierarchical structure of the CNN blocks also facilitates the extraction of multi-scale spatial features, making them well-suited for focusing on specific regions. The Transformer, with its encoder-decoder architecture, excels in modeling spatiotemporal sequences. Its self-attention mechanism efficiently captures long-range dependencies across time steps and spatial locations, enabling the model to identify and leverage key factors that drive climate change, regardless of their position in the sequence.

Model training strategy

CTEFNet processes batches of input variables (batch size = 8), where each batch contains 12 consecutive months of data as predictors, and the Niño 3.4 index for the subsequent 24 months as the target predictands. The model is trained using a rolling prediction strategy21, with the RMSE of the Niño 3.4 index serving as the loss function to quantify the deviation between the predictions and the target values.

$$\mathrm{Loss}=\frac{1}{T_{\mathrm{out}}}\sum_{t=1}^{T_{\mathrm{out}}}\sqrt{\left(\mathrm{Nino3.4}_{t}^{\mathrm{out}}-\mathrm{Nino3.4}_{t}^{\mathrm{tg}}\right)^{2}}$$

where \(\mathrm{Nino3.4}_{t}^{\mathrm{out}}\) and \(\mathrm{Nino3.4}_{t}^{\mathrm{tg}}\) represent the output and target Niño 3.4 index, respectively, which are derived from normalized sea surface temperature anomalies at a depth of 5 m, and \(T_{\mathrm{out}}\) is the number of forecast months (24). The Adam optimization algorithm is employed to optimize CTEFNet during training, with a learning rate warm-up technique applied77,78, starting with an initial learning rate of \(2 \times 10^{-5}\). The computational cost associated with model training and inference is detailed in the Supplementary Information.
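
As a rough illustration of the loss and the warm-up schedule, the sketch below evaluates the formula above on a batch and ramps the learning rate linearly from its initial value. Several choices are assumptions not stated in the text: the squared error is averaged over the batch before the square root is taken (for a single sample the expression reduces to the absolute error at each lead), the warm-up is linear, and `warmup_steps` and `lr_peak` are placeholder values.

```python
import numpy as np

def rmse_loss(pred, target):
    """Per-lead RMSE over the batch, averaged over the T_out leads.

    pred and target have shape (batch, T_out). Averaging the squared error over
    the batch before the root is an assumption made for this sketch.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    per_lead_rmse = np.sqrt(np.mean((pred - target) ** 2, axis=0))
    return float(per_lead_rmse.mean())

def warmup_lr(step, warmup_steps=1000, lr_init=2e-5, lr_peak=2e-4):
    """Linear warm-up from lr_init to lr_peak, then constant.

    Only lr_init comes from the text; warmup_steps and lr_peak are placeholders.
    """
    if step >= warmup_steps:
        return lr_peak
    return lr_init + (lr_peak - lr_init) * step / warmup_steps
```

In a training loop, the value returned by `warmup_lr` would be assigned to the optimizer's learning rate before each update.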

Gradient-based sensitivity analysis

Our sensitivity analysis is grounded in recent explainable artificial intelligence research in climate science. Previous studies35,52,53,54 have validated gradient-based explanation methods using synthetic benchmark datasets with known ground-truth attributions and real-world datasets. These studies demonstrate that gradient-based techniques can reliably recover feature relevance in controlled settings, providing both theoretical justification and empirical support for our approach.

Building on this foundation, we apply backpropagation to compute the gradient of the predicted Niño 3.4 index with respect to each input variable. These gradients quantify the local sensitivity of the output to inputs with backpropagation across layers of the neural network, enabling us to trace the precursors of ENSO events as a function of spatial location and lead time.

Specifically, we use the Niño 3.4 index from November of the ENSO years and subsequent months (when the index exceeds 0.5) within the valid period (1980–2021) as target values for prediction. Gradients are computed for each target value with respect to the corresponding inputs. The average gradient values across all target months are then used to determine the overall contribution of the input variables. This process is mathematically expressed as follows:

$$\mathrm{Grad}_{t}=\left.\frac{\partial\,\mathrm{Nino3.4}_{t}}{\partial\,\mathrm{Inputs}}\right\vert_{\mathrm{Inputs}_{t}}$$
$$\mathrm{AvgGrad}=\mathrm{mean}_{t}\left(\mathrm{Grad}_{t}\right)$$

where \(\mathrm{Nino3.4}_{t}\) and \(\mathrm{Inputs}_{t}\) represent the target predicted Niño 3.4 index and the input variables at target month t, respectively, \(\mathrm{Grad}_{t}\) denotes the gradient value of the input variables obtained at month t through backpropagation in CTEFNet, and \(\mathrm{AvgGrad}\) represents the averaged gradients across all target months, reflecting the overall contribution of input variables to the Niño 3.4 index.
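
The two expressions above can be made concrete with a small stand-in network whose input gradient is computed by hand-written backpropagation. `TinyForecaster` is a hypothetical two-layer model used only to show the mechanics; it is not CTEFNet, and in practice the gradient would come from the automatic differentiation of whichever deep learning framework hosts the trained model.

```python
import numpy as np

class TinyForecaster:
    """Hypothetical stand-in for a trained network: y = W2 @ tanh(W1 @ x + b1) + b2."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.standard_normal((n_hidden, n_in)) / np.sqrt(n_in)
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.standard_normal((n_out, n_hidden)) / np.sqrt(n_hidden)
        self.b2 = np.zeros(n_out)

    def forward(self, x):
        """Predicted index for every target month, from the flattened input fields."""
        return self.W2 @ np.tanh(self.W1 @ x.ravel() + self.b1) + self.b2

    def input_gradient(self, x, t):
        """Grad_t: derivative of the output at target month t with respect to x."""
        h = np.tanh(self.W1 @ x.ravel() + self.b1)
        grad_h = self.W2[t]               # d y[t] / d h
        grad_z = grad_h * (1.0 - h ** 2)  # back through the tanh activation
        return (self.W1.T @ grad_z).reshape(x.shape)

def average_gradient(model, inputs_by_target):
    """AvgGrad: mean of Grad_t over a list of (inputs_t, t) pairs."""
    grads = [model.input_gradient(x, t) for x, t in inputs_by_target]
    return np.mean(grads, axis=0)
```

Each `Grad_t` has the same shape as the input, so it can be mapped back onto months, variables, and grid points. Unlike a perturbation ensemble, one backward pass yields the sensitivity to every input element at once.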