Introduction

In recent decades, the frequency, intensity, and duration of climate extremes have increased across many regions, largely in response to anthropogenic climate change1. This trend culminated in 2024, which set a new global temperature record. More notably, 2024 became the first calendar year to exceed 1.5 °C above pre-industrial levels, crossing the critical threshold identified in the Paris Agreement2. This milestone underscores the accelerating pace of global warming: all ten warmest years on record occurred within the past decade. Multiple regions worldwide, including Antarctica3,4, experienced record-breaking temperatures in 2024. Across Europe, the 2024 summer mean temperature was the highest on record, with an anomaly of +1.54 °C relative to the 1991–2020 climatological mean, exceeding the previous high of +1.34 °C set in 20222. In southern Romania, heatwave conditions persisted for 63 of the 92 summer days (68%), making 2024 one of the most prolonged and intense heatwave seasons on record5.

Geopotential height at 500 hPa (Z500) reflects large-scale atmospheric circulation patterns, which steer weather systems and can foster conditions conducive to heat extremes. Persistent high-pressure systems marked by elevated geopotential heights (i.e., atmospheric blocking) are closely tied to prolonged heatwaves6,7,8, which are typically defined as periods of at least three consecutive days with maximum temperatures exceeding the local 90th percentile9,10. Key large-scale circulation patterns, such as the North Atlantic Oscillation (NAO11) and European blocking12,13,14, strongly influence temperature regimes across Europe. A positive NAO phase is typically associated with warmer conditions in northern Europe due to enhanced westerly flow and heat advection, while negative phases favor cold air incursions in southern Europe15. Blocking events often lead to clear skies and stagnant air masses, promoting extreme surface heating. These circulation-driven mechanisms are further modulated by interactions with surface conditions, including soil moisture deficits and land–atmosphere feedbacks, which can amplify temperature extremes in vulnerable regions16. For example, deficits in root-zone soil moisture have been shown to precondition the land surface for heat extremes by suppressing evaporative cooling and enhancing sensible heat fluxes16. This land–atmosphere feedback is particularly strong over eastern Europe, where summer soils often enter a “soil-moisture-limited” regime17.
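The heatwave definition above (at least three consecutive days with TX above the local 90th percentile) can be sketched as follows. This is a minimal illustration, assuming a single percentile threshold per location computed from a reference sample of daily TX; many studies use a calendar-day moving window instead. The function name `heatwave_days` and the synthetic data are hypothetical.

```python
import numpy as np


def heatwave_days(tx, threshold, min_len=3):
    """Flag days that belong to runs of >= min_len consecutive days with tx > threshold.

    tx        : 1-D array of daily maximum temperature at one location
    threshold : scalar or array broadcastable to tx (e.g., a local 90th percentile)
    returns   : boolean array, True on heatwave days
    """
    hot = np.asarray(tx, dtype=float) > threshold
    flags = np.zeros(hot.shape, dtype=bool)
    run_start = None
    for i, is_hot in enumerate(hot):
        if is_hot and run_start is None:
            run_start = i  # a hot spell begins
        elif not is_hot and run_start is not None:
            if i - run_start >= min_len:
                flags[run_start:i] = True  # spell long enough to count as a heatwave
            run_start = None
    # close a spell that is still running on the last day
    if run_start is not None and hot.size - run_start >= min_len:
        flags[run_start:] = True
    return flags


# Hypothetical usage with synthetic data standing in for a reference period and one summer.
rng = np.random.default_rng(0)
reference_tx = rng.normal(28.0, 3.0, size=30 * 92)
p90 = np.percentile(reference_tx, 90)
summer_tx = rng.normal(31.0, 3.0, size=92)
print(heatwave_days(summer_tx, p90).sum(), "heatwave days out of 92")
```

Counting the flagged days of a summer gives statistics of the kind quoted for southern Romania (63 of 92 days), although the exact threshold definition used there is not specified in this text.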

This study investigates the drivers of the exceptional summer heat of 2024 over Eastern Europe. We focus on how large-scale atmospheric circulation (represented by Z500), radiative forcing (represented by sunshine duration, SD), land–atmosphere coupling (reflected by volumetric soil moisture, SM), and thermodynamic feedbacks (reflected by daily minimum temperature, TN) interact to modulate daily maximum temperature (TX) anomalies. We apply a novel approach that combines multivariate statistical analyses, causal inference methods, and regression models to better understand the physical interactions that shape temperature extremes in this region. These insights are crucial for improving climate resilience and for designing adaptive strategies to mitigate the escalating risks of extreme heat.

Results

European and global context of summer 2024

The summer of 2024 was characterized by significant regional temperature anomalies. In Eastern Europe, most countries (e.g., Czech Republic, Hungary, Lithuania, Poland, Romania, Moldova, Ukraine, and Slovakia) experienced record-breaking summer temperatures (Fig. 1). Meanwhile, Antarctica experienced near-record temperatures, up to +6 °C above average (Fig. S1). The extremes were evident in all metrics, including the mean, minimum, and maximum daily temperatures (Figs. 1 and S1). The largest anomalies were observed for the daily maximum temperature (Fig. 1f), which exceeded climatology by more than +5 °C, especially in the countries in the vicinity of the Black Sea (i.e., Romania, Moldova, and Ukraine). In addition to record-breaking temperatures, large-scale circulation and surface radiation also reached record-breaking levels in summer 2024 (Fig. S2). Eastern Europe, in particular, experienced an exceptional increase in sunshine duration (Fig. S2a), with up to 240 additional sunshine hours relative to climatology, especially over Romania and Ukraine (Fig. S2b). Furthermore, the geopotential height at 500 mb (Z500) reached record-breaking values over large parts of the mid-latitudes (Fig. S2c), with anomalies of up to 80 m over eastern Europe (Fig. S2d). ERA5-Land SM anomalies reveal that Eastern Europe entered summer 2024 with a record-breaking soil-moisture deficit, especially over Romania and Ukraine (Fig. S2e). SM values during summer 2024 were below −1.5 σ relative to the 1981–2010 climatology (Fig. S2f).

Fig. 1: State of the climate in summer 2024.

(a) Rank of the 2024 summer (June–August mean) daily minimum temperature; (b) daily minimum temperature anomaly in summer 2024; (c) as in (a) but for the mean daily temperature; (d) as in (b) but for the mean daily temperature; (e) as in (a) but for the daily maximum temperature; and (f) as in (b) but for the daily maximum temperature. In (a, c, e), “1” denotes the warmest summer over the analyzed period (1950–2024) and “2” the second warmest; ranks beyond fifth appear white. In (b, d, f), anomalies are computed relative to the 1981–2010 climatological period.
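The ranking shown in panels (a, c, e) can be reproduced with a few lines of array code. The sketch below is a minimal illustration, assuming the June–August means are already stored as an array of shape (years, lat, lon); ties share the same rank, and ranks beyond fifth are masked to mimic the white shading. Function names are hypothetical.

```python
import numpy as np


def warmth_rank_map(summer_means, years, target_year):
    """Rank of target_year at every grid point (1 = warmest summer).

    summer_means : array (n_years, n_lat, n_lon) of June-August means
    years        : sequence of the n_years calendar years
    """
    data = np.asarray(summer_means, dtype=float)
    target = data[list(years).index(target_year)]
    # rank = 1 + number of summers strictly warmer than the target summer
    return 1 + np.sum(data > target, axis=0)


def top_five_only(rank_map):
    """Mask ranks beyond fifth (shown white in the figure)."""
    return np.ma.masked_greater(rank_map, 5)
```

For example, `top_five_only(warmth_rank_map(tx_jja, range(1950, 2025), 2024))` would yield a map analogous to Fig. 1e, with `tx_jja` a hypothetical array of summer-mean TX.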

Sunshine duration (SD), a proxy for surface solar radiation (SSR), directly affects TX by controlling the amount of incoming solar energy available for surface heating. Its variability depends on cloud cover, aerosol concentrations, and local meteorological conditions18. In summer 2024, Eastern Europe recorded up to 240 sunshine hours above climatological norms (Fig. S2). Since the 1980s, Europe has experienced a marked increase in SSR, attributed to reductions in aerosol pollution and cloud cover during the so-called “brightening period”18,19. This solar brightening has been linked to a significant intensification of TX and an increase in the frequency of heat extremes, particularly during summer20,21. Regional differences in SD trends, driven by variations in cloud dynamics and aerosol emissions, highlight the importance of considering these factors when assessing TX variability. The interaction between large-scale atmospheric circulation and SD adds to the complexity of TX-related extremes in Europe22, creating a feedback loop that enhances surface heating and exacerbates heat extremes23. The daily minimum temperature (TN) plays a significant role in setting the baseline for daily temperature fluctuations22. High TN values often exacerbate heatwave conditions by limiting nocturnal cooling, which amplifies the following day’s maximum temperature7. The 2024 extremes also had profound environmental and socioeconomic consequences: human-caused climate change added an average of 41 extra days of dangerous heat worldwide, exacerbating health risks and economic losses2. In the same year, the Mauna Loa Observatory recorded a 3.6 ppm jump in atmospheric CO2, which reached 427 ppm.

Eastern Europe—the warming hotspot in summer 2024

Figure 2 shows a clear tendency towards positive anomalies in TX, TN, Z500, and SD, and negative anomalies in SM and AOT, especially since the early 2000s. Most of the analyzed parameters (i.e., TX, TN, Z500, and SD) exhibited record-breaking values during summer 2024, particularly over eastern Europe (Figs. 1 and S2). The TX, TN, Z500, SM, and SD time series, averaged over the regions with record-breaking values (depicted by the gray box in Fig. 1a for TX, TN, SM, and SD, and in Fig. S2c for Z500), underscore the exceptional magnitude of the summer 2024 anomalies. The TX anomaly reached an unprecedented +3.61 °C, surpassing the 2023 anomaly by 1.9 °C (Fig. 2a), while TN reached a record anomaly of +3.0 °C (Fig. 2b). Atmospheric conditions were equally extreme, with Z500 anomalies reaching +40 m (Fig. 2c). From 1950 to the mid-1980s, the SM time series shows slightly positive anomalies, consistent with cooler TX and TN and weakly negative Z500 anomalies (Fig. 2d). From the late 1980s onward, however, SM turns persistently negative, reaching −0.05 m³ m⁻³ or lower in multiple summers after 2000, including summer 2024 (Fig. 2d). Sunshine duration also reached unprecedented levels, with more than 116 additional sunshine hours above climatological norms (Fig. 2e). In addition, summer 2024 was marked by negative anomalies in aerosol optical thickness (AOT), although no AOT records were broken during this period (Fig. 2f). Notably, since 2006, only positive anomalies have been recorded for TX, TN, and Z500 (relative to the 1981–2010 climatological mean) and for SD (relative to the 1983–2010 climatological mean), whereas AOT has shown only negative anomalies since 1994 (Fig. 2f). The AOT ranking and anomaly maps (Figs. S3a and S3b, respectively) indicate extremely low AOT values in summer 2024, especially over eastern Europe, where the largest negative anomalies were observed, although no record-breaking values were detected (Fig. S3). Over the same region, low cloud cover (LCC) was reduced by ~20% in 2024 (Fig. S3d). Low clouds exert a strong cooling effect by reflecting solar radiation, and their decline enhanced solar absorption and reinforced warming24,25. The resulting increase in incoming solar irradiance directly raises daytime surface temperatures, leading to higher daily maximum temperatures. The total cloud cover (TCC) anomaly map for 2024 (Fig. S3f) closely follows the LCC anomaly pattern (Fig. S3c, d), confirming a substantial decline over Eastern Europe. The combination of reduced cloud cover and low AOT produced an exceptionally clear-sky environment in summer 2024. With fewer persistent low clouds, the resulting increase in surface solar heating further intensified the anomalously high temperatures26,27,28. This combined decrease in low clouds and AOT implies a daytime positive feedback mechanism29 that likely amplified the record-breaking warming observed in summer 202426,27,28.

Fig. 2: Climatic anomalies at the regional level.

Summer anomalies averaged over eastern Europe (i.e., the black square in Fig. 1a, c) of (a) the daily maximum temperature (TX) and (b) the daily minimum temperature (TN); (c) summer geopotential height anomalies at 500 mb (Z500) averaged over the region [15°E–45°E, 30°N–55°N]; (d) summer volumetric soil moisture (SM) averaged over the region [20°E–35°E, 40°N–55°N]; (e) summer sunshine duration (SD) averaged over the region [20°E–35°E, 40°N–55°N]; and (f) as in (e) but for the aerosol optical thickness at 550 nm (AOT 550). Anomalies are computed relative to the 1981–2010 climatological period for TX, TN, Z500, SM, and AOT, and relative to 1983–2010 for SD.
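Regional averages over the boxes listed in the caption can be sketched as below. The caption does not state whether the averages are area-weighted; this minimal sketch assumes cos(latitude) weighting on a regular latitude–longitude grid, a common choice but an assumption here. `box_mean` is a hypothetical helper.

```python
import numpy as np


def box_mean(field, lats, lons, lat_range, lon_range):
    """Cos(latitude)-weighted mean of field (..., n_lat, n_lon) over a lat/lon box."""
    lats = np.asarray(lats, dtype=float)
    lons = np.asarray(lons, dtype=float)
    lat_sel = (lats >= lat_range[0]) & (lats <= lat_range[1])
    lon_sel = (lons >= lon_range[0]) & (lons <= lon_range[1])
    sub = np.asarray(field, dtype=float)[..., lat_sel, :][..., lon_sel]
    # grid-cell area is proportional to cos(latitude) on a regular grid
    weights = np.cos(np.deg2rad(lats[lat_sel]))[:, None] * np.ones(lon_sel.sum())
    return np.sum(sub * weights, axis=(-2, -1)) / weights.sum()


# e.g. the SD box of panel (e): box_mean(sd, lats, lons, (40, 55), (20, 35))
```

Leading dimensions (such as time) are preserved, so a (years, lat, lon) array yields one regional value per year.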

Fig. 3: Dominant coupled mode of variability between summer TX, on the one hand, and Z500/SD/TN/SM, on the other.

(a) Spatial structure of TX in the first CCA pair (17.09% of variance explained) and the coupled patterns of (b) Z500 (11.82% of variance explained), (c) SD (13.14%), (d) TN (18.10%), and (e) SM (13.11%). (f) The corresponding normalized time series; their correlation is 0.99.

Coupled modes of variability

To investigate the coupled variability between summer maximum temperature, on the one hand, and its key influencing factors, on the other, we employed canonical correlation analysis (CCA)30, a multivariate statistical method that identifies pairs of coupled patterns of variability by maximizing the correlation between their associated temporal evolutions (see “Methods” section).

Figure 3 shows the first CCA pair (CCA1) of TX, on the one hand, and summer Z500, SD, TN, and SM, on the other. It highlights positive TX anomalies over eastern Europe (Fig. 3a, ~17% of variance explained), coupled with an extended high-pressure system centered over eastern Europe and a low-pressure system over the central North Atlantic basin (Fig. 3b, ~12% of variance explained), as well as positive SD (Fig. 3c, ~13% of variance explained) and TN anomalies (Fig. 3d, ~18% of variance explained), and negative SM anomalies (Fig. 3e, ~13% of variance explained) over eastern Europe. The year-to-year variations of the normalized temporal components of the first CCA pair are shown in Fig. 3f. The two time series are significantly correlated (r = 0.99, 99% significance level), indicating a strong coupling between their corresponding patterns. Both CCA1 time series exhibit strong interannual variability during 1983–2024, superimposed on an overall trend. Notably, both time series identify the summers of 2007, 2012, and 2024 as extreme (i.e., dry and hot). Like 2024, the summers of 2007 and 2012 were characterized by long-lasting heatwaves and droughts and by an increased frequency of atmospheric blocking events over Eastern Europe31,32,33,34,35.

Causality testing

The CCA results suggest that the observed variability in summer TX over eastern Europe is mainly linked to a combination of dynamical and thermodynamical factors (i.e., TN, Z500, SM, and SD). However, CCA is correlation-based and therefore does not provide definitive information about causal links between its coupled patterns. To address this limitation, we use the Convergent Cross Mapping (CCM) method36 to test for causality between these drivers (i.e., TN, Z500, SD, and SM) and summer TX (see “Methods” for further details).

Figure 4 shows the main causal directions we tested (Z500, SD, SM, and TN cause TX), as well as the corresponding causality maps. In CCM, cross-mapping is performed in the direction opposite to the hypothesized causality (i.e., from the affected variable TX to each driver), and the maps show the spatial distribution of the causal influence. The cross-map skill from the TX index to the Z500, SM, TN, and SD time series increases abruptly with library size, quickly attaining a high, statistically significant level. This convergence to high skill (ρ ≈ 0.8) appears in all panels (Fig. 4a, c, e, g) and indicates a strong causal influence of Z500, SD, SM, and TN on TX. Correspondingly, all four causality maps show high CCM skill across the Eastern European block (Fig. 4b, d, f, h). The influence is less pronounced over Northern Europe, westernmost Europe, and the mid-latitude North Atlantic.

Fig. 4: Causality testing.

Cross-map skill vs library size (red lines), showing statistically significant convergence for the cross-maps from TX to Z500 (a), SD (c), SM (e), and TN (g). The corresponding CCM maps (maximum cross-map skill at each grid point) are shown in (b, d, f, h). Statistical significance is assessed against two surrogate models at the 95% confidence level: Ebisuzaki (light gray areas in (a, c, e, g)) and bootstrap (dark gray areas in (a, c, e, g)). Marked grid points in the CCM maps denote statistically significant cross-maps under the Ebisuzaki model. For all CCMs (graphs and maps), an embedding dimension of E = 5 and lags between −4 and 0 are used, consistent with an in-phase causal signal that is nearly concurrent at monthly resolution.
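The Ebisuzaki test mentioned in the caption builds a null distribution from surrogate series that keep the power spectrum (hence the autocorrelation) of the original series but randomize its Fourier phases. The sketch below is a minimal illustration, assuming a one-sided threshold at the 95th percentile of a generic skill statistic; the function names are hypothetical and this is not the authors' code.

```python
import numpy as np


def ebisuzaki_surrogate(x, rng):
    """Phase-randomized surrogate of x with the same power spectrum, mean, and std."""
    x = np.asarray(x, dtype=float)
    n = x.size
    amp = np.abs(np.fft.rfft(x - x.mean()))
    phases = rng.uniform(0.0, 2.0 * np.pi, amp.size)
    spec = amp * np.exp(1j * phases)
    spec[0] = 0.0  # zero-mean surrogate; the mean is restored below
    if n % 2 == 0:
        # the Nyquist coefficient of a real-valued series must itself be real
        spec[-1] = np.sqrt(2.0) * amp[-1] * np.cos(phases[-1])
    surr = np.fft.irfft(spec, n=n)
    return surr / surr.std() * x.std() + x.mean()


def surrogate_threshold(stat_fn, x, y, n_surr=500, q=95.0, seed=0):
    """q-th percentile of stat_fn(surrogate_of_x, y), used as the significance threshold."""
    rng = np.random.default_rng(seed)
    null = [stat_fn(ebisuzaki_surrogate(x, rng), y) for _ in range(n_surr)]
    return float(np.percentile(null, q))
```

An observed statistic (for example, a cross-map skill) would then be deemed significant if it exceeds the returned threshold.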

Figure S4 shows the reversed directions of causality. Bidirectional causality is usually investigated by introducing time delays into the CCM calculations, a method referred to here as Time Delay CCM (TDCCM; see Methods). For example, X may influence Y with a lag of l = l1 years (in which case a peak at l = −l1 appears in the Y xmap X TDCCM), and Y may influence X with a lag of l = l2 years (corresponding to a peak at l = −l2 in the X xmap Y TDCCM). This is the case for the Z500–TX, SM–TX, and TN–TX pairs (Figs. 4 and S4): convergence occurs in both directions, indicating a two-way causal feedback between the variables in each pair. Together, the CCM results in Figs. 4 and S4 identify four drivers of the maximum temperature index (TX): Z500, SD, SM, and TN.

The contributions to TX variability are further quantified with a simple regression model in which TX is the predictand and Z500, SD, SM, and TN are the predictors (Fig. S5). This analysis is based on the time series presented in Fig. 2. In this context, Z500 represents the effect of large-scale atmospheric circulation, SD the impact of solar radiation, and SM the land–atmosphere coupling, while TN can be interpreted, to some extent, as a proxy for air-mass properties. The regression model fit (red line) is shown in Fig. S5 alongside the observed TX anomalies (black line). Over the common analysis period (1983–2024), the regression model attributes 37.4% of the TX variability to TN, followed by 26.2% to Z500, 20.5% to SD, and 15.9% to SM.

Discussion

The summer of 2024 stands out as the warmest on record, with Eastern Europe emerging as a hotspot of extreme heat anomalies (Figs. S1, S2, and S6). From a climatological perspective, persistent high-pressure systems, such as atmospheric blocking, are the primary natural drivers of extreme summer temperatures, as they are associated with clear skies and increased solar radiation8,13,37,38,39,40. Long-lasting blocking patterns, often linked to stationary Rossby waves, are widely recognized as major contributors to European summer heatwaves8,14. Beyond atmospheric blocking, deficits in root-zone soil moisture have repeatedly been shown to precondition the land surface for heat extremes by suppressing evaporative cooling and enhancing sensible heat fluxes17. However, the roles of SD and TN in modulating TX variability have received comparatively less attention20,22,41. In this study, we focus only on the variables that showed record-breaking values over eastern Europe in 2024: TX, TN, Z500, SM, and SD. While previous studies have investigated atmospheric circulation anomalies or land–atmosphere interactions during European heatwaves42,43,44, our study presents a novel integration of multivariate statistical analysis (CCA and a regression model) with causal inference methods (CCM and TDCCM). This combination enables a robust, data-driven identification, based on reanalysis and satellite records, of co-varying patterns and causal pathways across atmospheric dynamics, surface conditions, and radiative processes. In contrast to earlier work43,45,46, which often focused on a single driver or relied solely on correlation-based diagnostics, our approach uncovers the directionality and temporal evolution of the interactions between these coupled variables, quantifying the compound influence behind the extreme heat of 2024.

The Z500 spatial structure linked with the 2024 extreme temperatures indicates that the high-pressure system over eastern Europe (letter “H” in Fig. 3b) leads to the advection of warm, dry air over the analyzed region, reducing cloud cover and increasing incoming solar radiation. This, in turn, favors extremely high temperatures and dry soils, affecting both TX and TN. Higher TN values may result in warmer morning conditions, shortening the time required for solar radiation to warm the surface and consequently driving an increase in TX. This effect becomes especially significant during heatwaves, when prolonged clear skies and arid conditions enhance daytime heating7. Climate model simulations indicate that surface warming enhances convective activity and lifts clouds to higher, colder altitudes, from which they emit less infrared radiation to space, reducing outgoing longwave cooling and amplifying surface warming47,48. A decrease in aerosol concentrations leads to a reduction in cloud condensation nuclei, which can result in fewer, larger cloud droplets, ultimately reducing cloud albedo and cloud cover49,50. Consequently, negative AOT anomalies may have contributed to changes in cloud cover in summer 202451,52. Our study also underscores the interplay between large-scale atmospheric dynamics (represented by Z500) and local radiative contributions (represented by SD) in shaping extreme heat events and record-breaking temperatures. The observed drying and the sharp warming in both TX and TN after the 2000s coincide with a regime shift towards positive Z500 anomalies, consistent with more frequent blocking highs, and with a step-like rise in SD linked to reduced AOT. These results are in good agreement with prior studies linking reduced cloudiness and aerosol concentrations to increased summer temperatures in Europe18,22,53.
Furthermore, elevated TN, often linked to insufficient nighttime cooling, accelerates soil drying through increased evapotranspiration during the day. Drier soils limit the partitioning of incoming solar energy into latent heat (via evaporation), thereby increasing the sensible heat flux and further elevating TX16. A higher TN implies a shallower nocturnal inversion, weaker radiative cooling, and more turbulent kinetic energy at sunrise. Consequently, less sensible heat input is needed to break the inversion, and the daytime boundary layer deepens earlier, favoring a higher TX54.

CCM and TDCCM point to strong, statistically significant causal links from Z500, SD, SM, and TN to TX, with cross-map skill of about 0.8. Notably, bidirectional causality exists between TX and both Z500 and TN, indicating that feedback mechanisms, particularly those involving nighttime warming and atmospheric dynamics, may intensify heat extremes. While the unidirectional causality from SD to TX emphasizes the amplifying role of radiative forcing, the feedback between TN and TX suggests that nocturnal warming limits nighttime cooling, accelerating surface warming the next day. Additionally, dry soils and reduced evapotranspiration during such events further enhance surface heating, reinforcing the atmospheric ridge and maintaining elevated geopotential heights16. TDCCM results indicate that the causal influence of Z500, SD, and SM on temperature anomalies occurs with minimal lag at the monthly resolution of the analysis. However, this apparent simultaneity reflects the coarse temporal scale of the data and does not exclude delayed or cumulative effects operating at finer temporal resolutions or across seasons. The regression model shows high explanatory power, with an adjusted R² of 0.99, a mean absolute error (MAE) of 0.14 °C, and a root mean square error (RMSE) of 0.18 °C. It captures the primary features of interannual variability and reproduces the observed upward trend in TX over 1983–2024. The combined influence of TN, Z500, SM, and SD accounts for 99.1% of the TX anomaly observed in 2024 (i.e., 3.61 °C observed vs. 3.69 °C predicted by the model). The residuals are approximately Gaussian, and diagnostic plots reveal no significant violations of the fundamental assumptions underlying the regression analysis.

This study combines statistical and causal inference methods to disentangle the causal interplay between atmospheric circulation, surface radiative forcing, and soil moisture deficits that fueled the extreme summer temperatures in Eastern Europe in 2024. We find that high-pressure anomalies over Eastern Europe, coupled with enhanced SD, depleted soil moisture, and elevated nighttime temperatures, produced a persistent, compound warming signal. Causal analysis confirms that these drivers not only co-vary with maximum temperature but also exert a directional influence on it, with evidence for feedbacks between atmospheric dynamics and nocturnal warming. The regression model constructed from these variables reproduces the full magnitude of the 2024 heat anomaly. These results highlight the need to consider multiple, interacting drivers when diagnosing or predicting extreme temperature events. As land–atmosphere coupling intensifies under anthropogenic warming, the ability of climate models to resolve such compound mechanisms becomes critical for improving seasonal forecasting and informing adaptation strategies. The combined statistical–causal framework presented here can be applied to other extreme events and regions, offering both mechanistic insight and predictive value for future climate risk assessments.

Methods

Data sources

The daily minimum, mean, and maximum temperatures, as well as the monthly geopotential height at 500 mb, the potential evaporation, LCC, and TCC, were extracted from the European Centre for Medium-Range Weather Forecasts Reanalysis version 5 (ERA5) dataset55, while volumetric soil moisture (SM) was taken from ERA5-Land. ERA5 is a global reanalysis offering a comprehensive climate record from 1940 to the present, with hourly estimates of atmospheric, land-surface, and ocean-wave variables at a spatial resolution of 31 km. ERA5 has several advantages: (i) its high temporal and spatial resolution is valuable for detailed climate and weather analysis; (ii) its record spans about 85 years (at the time of writing), which is useful for trend analysis and model calibration; and (iii) it uses modern data assimilation techniques, including four-dimensional variational assimilation (4D-Var), leading to more accurate reconstructions of past weather. In addition, ERA5 includes a 10-member ensemble that allows for uncertainty quantification. It assimilates multiple data types, including radiosondes, aircraft reports, satellite observations, and in situ measurements, ensuring comprehensive coverage. Improvements in surface pressure bias correction and tropical cyclone representation have enhanced data consistency. However, ERA5 also has certain limitations. Owing to the varying availability of observations, data accuracy is higher in later years (after 1979, when satellite data became available) and lower in earlier periods, especially over the Southern Hemisphere. The absence of upper-air observations before 1946 results in a cold bias in lower stratospheric temperatures55,56.
While improvements have been made, soil moisture still undergoes long spin-up adjustments, which may affect early hydrological estimates. Moreover, ERA5 uses prescribed sea-surface temperature (SST) datasets, which may introduce biases, particularly in earlier periods when SST measurement techniques were less refined.

We use SD data from the Surface Solar Radiation Data Set – Heliosat (SARAH-3)57. SARAH-3 is the latest edition of a satellite-based climate data record that provides SSR parameters from 1983 onward at high spatial (0.05° × 0.05°) and temporal (30-min, daily, and monthly) resolution. It delivers seven key parameters: global radiation (SIS), direct irradiance (SID), direct normal irradiance, sunshine duration (SD), photosynthetically active radiation (PAR), daylight (DAL), and effective cloud albedo (CAL). These are derived with an improved Heliosat retrieval scheme that incorporates the HelSnow algorithm to better distinguish snow from clouds, enhancing accuracy and temporal stability. Validation against high-quality ground-based networks such as the Baseline Surface Radiation Network and the Global Energy Balance Archive shows small mean biases (around ±2 W m⁻² for monthly means) and high, significant correlations (exceeding 0.9), confirming its suitability for climate monitoring and model evaluation. The SARAH-3 dataset covers the region 65°S–65°N and 65°W–65°E. For the current study, we used monthly SD.

AOT is taken from the Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2). The MERRA-2 AOT at 550 nm used in this study is obtained from a high-resolution reanalysis (0.5° × 0.625°, ca. 50 km in the latitudinal direction, with 72 model layers up to 0.01 hPa). Developed by the National Aeronautics and Space Administration’s Global Modeling and Assimilation Office (NASA GMAO), MERRA-2 is a comprehensive atmospheric reanalysis spanning January 1980 to the present58,59. It integrates a wide array of observational data, including satellite-based aerosol observations, to provide detailed insights into aerosol distributions and their interactions within the Earth’s climate system. A distinctive feature of MERRA-2 is its assimilation of bias-corrected AOT measurements from various instruments, such as the Advanced Very High Resolution Radiometer, the Moderate Resolution Imaging Spectroradiometer aboard the Aqua and Terra satellites, and the Multi-angle Imaging SpectroRadiometer. This assimilation ensures that the reanalysis is well constrained by observations, enhancing the accuracy of the aerosol representation58,60.

Anomalies are computed relative to the 1981–2010 World Meteorological Organization reference period for all variables except SD, which uses 1983–2010 due to the availability window of the SARAH-3 dataset. A detailed description of all abbreviations used in the paper can be found in Table S1.
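The anomaly computation with variable-specific reference periods can be sketched as below. This is a minimal illustration, assuming annual (summer-mean) series indexed by year; the optional standardization (division by the reference-period sample standard deviation) is one way to express anomalies in σ units, as for SM in Fig. S2f, and is an assumption here. Names are hypothetical.

```python
import numpy as np

# Reference periods as stated in the text; SD starts in 1983 because SARAH-3 does.
BASELINES = {"TX": (1981, 2010), "TN": (1981, 2010), "Z500": (1981, 2010),
             "SM": (1981, 2010), "AOT": (1981, 2010), "SD": (1983, 2010)}


def anomalies(values, years, variable, standardize=False):
    """Anomalies of values (n_years, ...) relative to the variable's reference period."""
    values = np.asarray(values, dtype=float)
    years = np.asarray(years)
    start, end = BASELINES[variable]
    in_base = (years >= start) & (years <= end)
    anom = values - values[in_base].mean(axis=0)
    if standardize:
        # sigma units, assuming the sample (ddof=1) standard deviation of the baseline
        anom = anom / values[in_base].std(axis=0, ddof=1)
    return anom
```

Because the baseline mean is taken along the first axis, the same function works for regional time series and for (year, lat, lon) fields.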

Canonical correlation analysis

CCA is a multivariate statistical technique that identifies pairs of patterns whose associated time series are maximally correlated30, subject to the constraint that the time series of each successive pair are uncorrelated with those of the preceding pairs. In other words, CCA determines the extent to which two structures, each associated with a variable, are linked. In this study, we chose to represent TX as a univariate field and Z500, SD, TN, and SM as a combined multivariate field. This choice was guided by our goal of identifying large-scale atmospheric and radiative patterns maximally coupled with extreme maximum temperatures. Therefore, while CCA does not assign predictor or predictand roles in a strict statistical sense, we structured the analysis to explore how variability in TX co-evolves with concurrent anomalies in Z500, SD, TN, and SM. Mathematically, CCA identifies two sets of weight vectors (one for TX and one for Z500, TN, SD, and SM) such that the correlations between the projections of the variables onto these vectors are maximized. To avoid degeneracy of the covariance matrix, it is recommended to reduce the number of degrees of freedom prior to CCA61. Therefore, the CCA inputs are the principal component (PC) time series of the leading EOFs, with an equal number of eigenmodes for each variable. Here, we reconstructed the initial fields (TX and Z500/SD/TN/SM) based on the first 10 EOF modes, which explain more than 70% of the variance in each field30. CCA has previously been used successfully in climate research to identify links between large-scale patterns in temperature, clouds, and other key climate variables62,63.
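The EOF prefiltering and CCA steps described above can be sketched with NumPy linear algebra. This is a minimal sketch, not the authors' implementation. It assumes each field is a (time, space) array, that the combined Z500/SD/TN/SM field is formed by concatenating the fields after scaling each by its overall anomaly standard deviation (our assumption, so that no variable dominates), and that k PCs are kept per side (k = 10 in the text). Because the retained PCs are mutually uncorrelated, scaling them to unit variance whitens them, and the canonical correlations are then the singular values of their cross-correlation matrix. All names are hypothetical.

```python
import numpy as np


def leading_pcs(field, k):
    """Unit-variance PC time series (time, k) and explained-variance fractions of field (time, space)."""
    anom = field - field.mean(axis=0)
    u, s, _ = np.linalg.svd(anom, full_matrices=False)
    pcs = u[:, :k] * s[:k]
    explained = s[:k] ** 2 / np.sum(s ** 2)
    return pcs / pcs.std(axis=0, ddof=1), explained


def combine_fields(*fields):
    """Concatenate (time, space_i) anomaly fields along space, each scaled by its overall std."""
    scaled = []
    for f in fields:
        anom = f - f.mean(axis=0)
        scaled.append(anom / anom.std())
    return np.hstack(scaled)


def cca_on_pcs(a, b):
    """CCA of two sets of uncorrelated, unit-variance PCs, each (time, k)."""
    n = a.shape[0]
    cross = a.T @ b / (n - 1)           # cross-correlation matrix of the whitened PCs
    u, r, vt = np.linalg.svd(cross)     # singular values = canonical correlations
    return r, a @ u, b @ vt.T           # correlations and canonical time series


def canonical_pattern(field, series):
    """Regression pattern of field (time, space) onto a unit-variance canonical time series."""
    anom = field - field.mean(axis=0)
    return anom.T @ series / (len(series) - 1)
```

The first columns of the two returned time-series arrays correspond to the CCA1 pair; mapping `canonical_pattern` back onto the grid gives patterns analogous to Fig. 3a–e.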

Regression analysis

Multiple linear regression models are constructed using a forward stepwise approach64, which reduces the number of explanatory variables when many candidates are presented to the initial model. Model quality is assessed using several metrics, such as MAE, RMSE, and the adjusted coefficient of determination (adjusted R²)65. A detailed description of all metrics is given in the supplementary file (Table S2). Additionally, diagnostic plots (not shown), including residuals vs. fitted values, Q–Q plots, scale-location plots, and residuals vs. leverage, are examined to identify potential violations of fundamental model assumptions. The regression equation is:
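The skill metrics named above follow standard definitions. A minimal sketch is given below, assuming n observations and p predictors; the exact formulas used by the authors are listed in Table S2, which is not reproduced here. `regression_metrics` is a hypothetical helper.

```python
import numpy as np


def regression_metrics(y, y_hat, n_predictors):
    """MAE, RMSE, R2, and adjusted R2 of predictions y_hat for observations y."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    resid = y - y_hat
    n = y.size
    mae = np.mean(np.abs(resid))
    rmse = np.sqrt(np.mean(resid ** 2))
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    # adjusted R2 penalizes the number of predictors p: 1 - (1 - R2)(n - 1)/(n - p - 1)
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
    return {"MAE": mae, "RMSE": rmse, "R2": r2, "adj_R2": adj_r2}
```

With four predictors and 42 summers (1983–2024), the adjusted R² differs from R² by the factor 41/37 applied to the unexplained fraction.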

$$Y=\beta_0+\beta_1x_1+\beta_2x_2+\cdots+\beta_nx_n+\varepsilon$$
(1)

where Y represents the TX index, β0, β1, β2, …, βn are coefficients estimated by least squares, x1, x2, …, xn are the predictors (here TN, Z500, SM, and SD), and ε is the error term.
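A minimal sketch of fitting Eq. (1) by ordinary least squares and computing the skill metrics named above, assuming the standard textbook definitions (the study's exact definitions are those in Table S2). In particular, adjusted R² is taken as $1-(1-R^2)\frac{n-1}{n-p-1}$ for $n$ samples and $p$ predictors. The predictors, coefficients, and the helper names `fit_ols` and `skill_metrics` are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least-squares estimates of beta_0..beta_n in Eq. (1) and fitted values."""
    A = np.column_stack([np.ones(len(y)), X])  # intercept column + predictors
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta, A @ beta

def skill_metrics(y, y_hat, n_predictors):
    """Textbook MAE, RMSE, R^2 and adjusted R^2 (assumed definitions)."""
    resid = y - y_hat
    n = len(y)
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return {
        "MAE": float(np.mean(np.abs(resid))),
        "RMSE": float(np.sqrt(np.mean(resid ** 2))),
        "R2": float(r2),
        "adj_R2": float(1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)),
    }

rng = np.random.default_rng(42)
n = 150
X = rng.standard_normal((n, 4))  # synthetic stand-ins for TN, Z500, SM, SD
y = X @ np.array([0.9, 0.5, -0.4, 0.3]) + 0.3 * rng.standard_normal(n)
beta, y_hat = fit_ols(X, y)
metrics = skill_metrics(y, y_hat, n_predictors=X.shape[1])
print(np.round(beta, 3), {k: round(v, 3) for k, v in metrics.items()})
```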

In forward stepwise regression, predictors are added to the model one at a time. The first is the candidate most correlated with the predictand, and each subsequent one is the candidate with the highest partial correlation given the predictors already in the model. At each step, an F test determines whether a predictor is significant enough to enter, or to remain in, the final regression equation (significance levels of 0.05 for entry and 0.1 for removal). We chose stepwise regression because it ranks predictors by partial correlation, and high, significant partial correlations are likely to reflect underlying physical processes. The final model equation in our study is:

$$\mathrm{TX}=-0.009565458+0.87\,\mathrm{TN}+0.004717385\,\mathrm{Z500}+0.010034\,\mathrm{SD}-9.59\,\mathrm{SM}$$
(2)
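The forward stepwise procedure can be sketched as follows. This is an illustrative implementation under stated assumptions, not the software used in the study. Predictors enter by the smallest partial-F p-value, which at a given step is equivalent to the largest partial correlation. The levels of 0.05 and 0.1 are treated as entry and removal p-value thresholds. The sketch uses NumPy and SciPy (`scipy.stats.f.sf` for the F-distribution tail probability); the data, the coefficients, and the extra candidate `X5` are synthetic.

```python
import numpy as np
from scipy import stats

def rss_and_beta(X, y):
    """OLS with intercept; returns residual sum of squares and coefficients."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid), beta

def partial_f_p(X, y, base, extra):
    """p-value of the partial F test for adding column `extra` to the columns in `base`."""
    rss0, _ = rss_and_beta(X[:, np.asarray(base, dtype=int)], y)
    rss1, _ = rss_and_beta(X[:, np.asarray(base + [extra], dtype=int)], y)
    df2 = len(y) - len(base) - 2  # n minus (k + 1) slopes minus intercept
    f_stat = (rss0 - rss1) / (rss1 / df2)
    return float(stats.f.sf(f_stat, 1, df2))

def stepwise(X, y, names, p_enter=0.05, p_remove=0.10, max_steps=50):
    selected = []
    for _ in range(max_steps):
        changed = False
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        if candidates:  # entry step: most significant remaining candidate
            p_in = {j: partial_f_p(X, y, selected, j) for j in candidates}
            best = min(p_in, key=p_in.get)
            if p_in[best] < p_enter:
                selected.append(best)
                changed = True
        if len(selected) > 1:  # removal step: least significant selected predictor
            p_out = {j: partial_f_p(X, y, [k for k in selected if k != j], j) for j in selected}
            worst = max(p_out, key=p_out.get)
            if p_out[worst] > p_remove:
                selected.remove(worst)
                changed = True
        if not changed:
            break
    _, beta = rss_and_beta(X[:, np.asarray(selected, dtype=int)], y)
    return [names[j] for j in selected], beta

rng = np.random.default_rng(7)
n = 300
names = ["TN", "Z500", "SD", "SM", "X5"]  # X5: hypothetical irrelevant candidate
X = rng.standard_normal((n, 5))
y = X[:, :4] @ np.array([0.9, 0.5, 0.3, -0.4]) + rng.standard_normal(n)
selected_names, beta = stepwise(X, y, names)
print(selected_names, np.round(beta, 3))
```

With these strong synthetic effects all four drivers should enter. Whether the irrelevant `X5` is excluded depends on the random draw, as with any test at the 5% level.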

To obtain a robust estimate of model skill, we applied three resampling schemes: leave-one-out cross-validation (LOOCV), 5-fold CV, and repeated 10-fold CV (see supplementary file for details)66,67. The near-identical performance across the LOOCV, 5-fold, and repeated 10-fold splits indicates that the high apparent R² is not an artifact of high-leverage points and that the fitted coefficients are stable under resampling. The small shrinkage from the apparent to the cross-validated R² further suggests that collinearity among the four predictors does not cause problematic overfitting. Regardless of fold size, the model retains >98% of its explanatory power on unseen data (see supplementary file), and the average loss of fit is <0.6% of the variance.
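The three resampling schemes can be sketched with plain NumPy as follows. The sketch assumes that skill is summarized by an out-of-sample R² computed from pooled held-out predictions (1 − PRESS/TSS). The study's exact scheme, including the number of repeats for 10-fold CV, is given in the supplementary file; the 10 repeats used here are an arbitrary choice, and the data and helper names are hypothetical.

```python
import numpy as np

def ols_predict(X_train, y_train, X_test):
    """Fit OLS with intercept on the training split and predict the test split."""
    A = np.column_stack([np.ones(len(y_train)), X_train])
    beta, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ beta

def cv_r2(X, y, n_folds, n_repeats=1, seed=0):
    """Out-of-sample R^2 (1 - PRESS/TSS) averaged over repeats; n_folds=len(y) gives LOOCV."""
    rng = np.random.default_rng(seed)
    tss = float(((y - y.mean()) ** 2).sum())
    scores = []
    for _ in range(n_repeats):
        pred = np.empty(len(y))
        for test in np.array_split(rng.permutation(len(y)), n_folds):
            train = np.setdiff1d(np.arange(len(y)), test)
            pred[test] = ols_predict(X[train], y[train], X[test])
        scores.append(1.0 - float(((y - pred) ** 2).sum()) / tss)
    return float(np.mean(scores))

rng = np.random.default_rng(3)
n = 200
X = rng.standard_normal((n, 4))  # synthetic predictors (illustrative only)
y = X @ np.array([0.9, 0.5, 0.3, -0.4]) + 0.2 * rng.standard_normal(n)
tss = float(((y - y.mean()) ** 2).sum())
apparent = 1.0 - float(((y - ols_predict(X, y, X)) ** 2).sum()) / tss
scores = {
    "LOOCV": cv_r2(X, y, n_folds=n),
    "5-fold": cv_r2(X, y, n_folds=5),
    "repeated 10-fold": cv_r2(X, y, n_folds=10, n_repeats=10),
}
shrinkage = {k: apparent - v for k, v in scores.items()}
print(round(apparent, 4), {k: round(v, 4) for k, v in scores.items()})
```

The `shrinkage` dictionary holds the drop from apparent to cross-validated R². Small, similar values across the three schemes correspond to the pattern described above.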

Convergent cross mapping

To assess the spatial distribution of the causal signal from the three drivers (TN, Z500, and SD), we perform CCM36 at each grid point, cross-mapping from the affected variable (TX) to TN, Z500, and SD. For each grid point, we map the highest statistically significant cross-map skill. This yields a causality map that complements the regression map: the causality map indicates whether a causal signal is present at a grid point, while the regression map gives the approximate sign of that signal (assuming a linear causal relationship). The CCM map therefore tests whether the associations in the regression map are causal or merely spurious correlations. At each grid point, the lag used is the one that maximizes ρ within the embedding window −τ(E − 1) ≤ l ≤ 0, i.e., we assume in-phase causation. Dashed points on the map indicate statistically significant and convergent cross-map skill.

CCM is a technique derived from dynamical systems theory that detects causal relationships in time series by reconstructing phase spaces through time-delay embedding. If variable X influences variable Y (X → Y), the states of X (the cause) can be estimated from Y (the effect), because in deterministic systems information about the cause is encoded in the effect. The approach relies on Takens' embedding theorem, which states that the dynamics of a system can be reconstructed from time-lagged observations of a single variable. By building a multidimensional phase space (shadow manifold) from Y, X can be cross-mapped, and the accuracy of these estimates reflects the causal information about X embedded in Y. Importantly, when X causes Y, cross-mapping is performed from Y to X (denoted Y xmap X). To ensure that cross-map skill is not driven by coincidental correlation, convergence must be established: cross-map skill, measured by the Pearson correlation between predicted and observed values, must increase with library length (the amount of data used) and approach a plateau. This convergence indicates that causal information is being effectively extracted. Cross-map skill combined with convergence thus provides a robust criterion for causality36.
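A minimal sketch of CCM with simplex-projection cross-mapping and a convergence check, in NumPy. The test system is a two-species coupled logistic map often used to illustrate CCM, in which x forces y more strongly than y forces x. The coupling coefficients, E = 2, τ = 1, the random-library sampling, and the exponential neighbour weighting are illustrative assumptions, not the settings used for the climate fields. Because x → y, the causal test cross-maps from y to x (y xmap x).

```python
import numpy as np

def embed(x, E, tau=1):
    """Shadow manifold: row i = (x[t], x[t-tau], ..., x[t-(E-1)tau]) with t = (E-1)*tau + i."""
    n = len(x) - (E - 1) * tau
    return np.column_stack([x[(E - 1 - k) * tau:(E - 1 - k) * tau + n] for k in range(E)])

def ccm_rho(effect, cause, E, lib_size, tau=1, n_samples=10, seed=0):
    """Mean skill of 'effect xmap cause' (tests cause -> effect) over random libraries."""
    rng = np.random.default_rng(seed)
    M = embed(effect, E, tau)
    target = cause[(E - 1) * tau:]  # cause value at the time of each manifold row
    rhos = []
    for _ in range(n_samples):
        lib = rng.choice(len(M), size=lib_size, replace=False)
        preds = np.empty(len(M))
        for i in range(len(M)):
            d = np.linalg.norm(M[lib] - M[i], axis=1)
            d[lib == i] = np.inf  # never use the predicted point itself
            nn = np.argsort(d)[:E + 1]  # simplex: E + 1 nearest neighbours
            w = np.exp(-d[nn] / max(d[nn[0]], 1e-12))
            preds[i] = w @ target[lib[nn]] / w.sum()
        rhos.append(np.corrcoef(preds, target)[0, 1])
    return float(np.mean(rhos))

def coupled_logistic(n, burn=100):
    """Synthetic system: x forces y (coefficient 0.1) more strongly than y forces x (0.02)."""
    x, y = np.empty(n + burn), np.empty(n + burn)
    x[0], y[0] = 0.4, 0.2
    for t in range(n + burn - 1):
        x[t + 1] = x[t] * (3.8 - 3.8 * x[t] - 0.02 * y[t])
        y[t + 1] = y[t] * (3.5 - 3.5 * y[t] - 0.1 * x[t])
    return x[burn:], y[burn:]

x, y = coupled_logistic(500)
libs = [20, 100, 450]
skill = [ccm_rho(y, x, E=2, lib_size=L) for L in libs]  # y xmap x tests x -> y
print(dict(zip(libs, np.round(skill, 3))))
```

Skill for y xmap x should rise with library length. Under the convergence criterion above, a flat curve would argue against a causal link even if ρ were high at a single library size.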

Causality often involves time delays, so cross-mapping with a lag can optimize the detection of causal signals. Keeping the library length fixed and varying the lag identifies the lag that maximizes cross-map skill, a procedure known as time-delayed convergent cross mapping (TDCCM)68,69. Because prediction runs from effect to cause, the optimal lag should be negative; a peak at a positive lag would contradict the principle that causes precede effects and could indicate spurious correlation or unidirectional slave dynamics. After the optimal negative lag is identified with TDCCM, convergence is verified through CCM at that lag. The embedding dimension (E) is selected similarly, by cross-mapping over a range of E values and retaining the one that maximizes cross-map skill. Once E is fixed, TDCCM is carried out, followed by CCM at the identified lag.

TDCCM results show two close peaks at negative lags for the pairs linking Z500, SM, and TN to TX. Both lags fall within the embedding window −τ(E − 1) ≤ l ≤ 0 (where τ = 1 is the embedding lag, E = 5 is the embedding dimension, and l is the causal lag), indicating an in-phase causal signal from Z500, SM, and TN to TX. The TX xmap SD link also shows convergence and statistical significance (Fig. 4c), and at first glance the TDCCM representation (Fig. S4d) would imply a bidirectional causal signal. However, SD is less likely to be influenced by TX, and this asymmetry is revealed by the shape of the convergence curve in the reverse direction of causality, TX → SD (SD xmap TX, Fig. S4c).
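The E selection and TDCCM lag scan can be sketched as follows, again on a synthetic coupled logistic map. Here it is modified (hypothetically) so that x forces y with a two-step delay. The full series serves as the library (leave-one-out), the candidate E range of 2 to 6 and the lag range of −8 to 4 are arbitrary, and convergence at the chosen lag would still need to be checked with CCM as described in the text. Helper names are illustrative.

```python
import numpy as np

def embed(x, E, tau=1):
    """Row i = (x[t], x[t-tau], ..., x[t-(E-1)tau]) with t = (E-1)*tau + i."""
    n = len(x) - (E - 1) * tau
    return np.column_stack([x[(E - 1 - k) * tau:(E - 1 - k) * tau + n] for k in range(E)])

def xmap_rho(effect, cause, E, tau=1, lag=0):
    """Simplex skill of 'effect xmap cause', predicting cause[t + lag] from M_effect(t)."""
    M = embed(effect, E, tau)
    times = np.arange((E - 1) * tau, len(effect))  # time index of each manifold row
    ok = (times + lag >= 0) & (times + lag < len(cause))  # rows whose shifted target exists
    M, target = M[ok], cause[times[ok] + lag]
    preds = np.empty(len(M))
    for i in range(len(M)):
        d = np.linalg.norm(M - M[i], axis=1)
        d[i] = np.inf  # leave the predicted point out of the library
        nn = np.argsort(d)[:E + 1]
        w = np.exp(-d[nn] / max(d[nn[0]], 1e-12))
        preds[i] = w @ target[nn] / w.sum()
    return float(np.corrcoef(preds, target)[0, 1])

def coupled_logistic(n, delay=2, burn=100):
    """Hypothetical test system: x forces y with a `delay`-step lag; y forces x weakly."""
    x, y = np.full(n + burn, 0.4), np.full(n + burn, 0.2)
    for t in range(delay, n + burn - 1):
        x[t + 1] = x[t] * (3.8 - 3.8 * x[t] - 0.02 * y[t])
        y[t + 1] = y[t] * (3.5 - 3.5 * y[t] - 0.1 * x[t - delay])
    return x[burn:], y[burn:]

x, y = coupled_logistic(500)
# 1) Embedding dimension: the E that maximizes y xmap x at lag 0
best_E = max(range(2, 7), key=lambda E: xmap_rho(y, x, E))
# 2) TDCCM: scan cross-map lags at fixed (full) library length
rho_by_lag = {l: xmap_rho(y, x, best_E, lag=l) for l in range(-8, 5)}
best_lag = max(rho_by_lag, key=rho_by_lag.get)
in_phase = -(best_E - 1) <= best_lag <= 0  # tau = 1 here
print(best_E, best_lag, in_phase)
```

Because x drives y, the skill of y xmap x should peak at a negative lag. The flag `in_phase` reports whether that peak lies within the window −τ(E − 1) ≤ l ≤ 0.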