Introduction

As the effects of human-induced climate change emerge more and more distinctly, formerly extreme conditions are rapidly becoming the new normal. Nonetheless, with 12 consecutive months of record-breaking global mean temperatures (GMT), each exceeding 1.5 °C above pre-industrial levels, 2023/24 was beyond exceptional. The temperature anomaly peaked in September 2023 at 1.8 °C, marking an unprecedented temperature jump of 0.59 °C compared to September 2022 (GISTEMP, ref. 1; Copernicus: 0.58 °C, ref. 2; HadCRUT: 0.56 °C, ref. 3). Following the onset of an El Niño event in May 2023, the scientific community had expected warm temperatures in late 2023, but the heat, in its extraordinary magnitude, persistence and geographic extent, far exceeded predictions4. Peak heat occurred surprisingly early in September, at a time of the year typically characterized by low global interannual variability, and sooner than the temperature impacts of El Niño usually manifest.

And yet, a key question remains: Could the unusual heat and particularly the temperature jump from 2022 to 2023 simply represent an extreme manifestation of internal variability, or was it also amplified by external forcings? In addition, the lack of comparable temperature anomalies simulated by climate models participating in the Coupled Model Intercomparison Project Phase 6 (CMIP6) has raised concerns about their ability to accurately capture long-term warming and interannual variability5. Understanding the relevant processes that drove the 2023 heat anomaly is crucial to determine whether this event was a mere outlier or signals a permanent shift in the climate system, such that similar heat events might occur more frequently from now on. Analyses so far have linked the event to a distinct peak in the Earth’s energy imbalance6,7,8, which was larger than the positive long-term trend documented since the early 2000s9,10,11, but the relative roles of internal variability versus external forcing in producing this record-high energy imbalance remain uncertain.

Previous work assessing the contribution of internal variability to the 2023 temperature spike has identified the transition from a three-year La Niña to an El Niño phase12 as a key driver13,14, suggesting that the 2023/24 heat event represents an unusual, but not unprecedented manifestation of internal variability15,16. Consistently, the 2023–24 jump in ocean temperature was reported as unlikely but not unpredictable17. However, other studies highlight that elevated global temperatures due to El Niño typically exhibit a greater lag than observed in 2023 (refs. 18, 19), or assign only a modest temperature increase to this natural climate fluctuation7. Beyond the El Niño Southern Oscillation (ENSO), the ascending phase of the 11-year solar cycle may have marginally contributed to the warming18, though its overall impact is generally considered negligible7,20. Additionally, analyses based on large ensemble climate model simulations find no instances of September temperature records being broken by a margin comparable to that of 2023 (ref. 21). Together, these lines of evidence indicate that internal variability alone is unlikely to account for the extraordinary heat of September 2023.

Among the external factors that could drive changes in radiative forcing beyond greenhouse gases (GHGs), elevated stratospheric water vapour levels after the Hunga Tonga–Hunga Ha’apai eruption in January 2022 have been proposed22,23, although more recent findings conclude that this effect was outweighed by enhanced radiation attenuation of the volcano’s sulfur aerosol plume24. Another widely discussed potential driver is the 2020 International Maritime Organization Low Sulphur Regulation (IMO2020) and the resulting decline in sulfur aerosol concentrations25,26,27,28,29,30,31,32,33. Estimates of the associated temperature effect generally range from 0.03 to 0.08 °C, although substantially higher values up to 0.2 °C have been suggested (ref. 5). Declining aerosol concentrations might also have contributed to the observed reductions in low-level cloud cover and a record-low planetary albedo in 2023/24, with an associated temperature impact similarly estimated at around 0.2 °C (ref. 7).

In this study, we investigate whether the extreme global heat in 2023/24 was truly unpredictable, or whether temperatures fell within the realm of expectation given the current level of anthropogenic warming and internal climate variability as simulated in climate models. By combining extreme event attribution techniques with a regional and process-based analysis, we find that the 2023 September temperature jump was nearly impossible under standard anthropogenic forcing, but reaches a low, yet non-negligible probability when accounting for additional external drivers beyond those routinely represented in climate models. Land regions contributed disproportionately to the heat, which was associated with an unusual combination of shortwave (SW) forcing and water vapour feedback. CMIP6 simulations generally do not reproduce temperature jumps as large as the 2023 event; however, a climate model simulation constrained by observed surface ocean conditions and atmospheric circulation closely captures the land warming pattern, suggesting that models can generate such extremes given realistic boundary conditions. Comparable temperature jumps are projected to become gradually more likely under further warming, yet a purely internally driven event of similar magnitude remains unlikely in the coming decades.

Results

The temperature jump in September 2023 was effectively unpredictable

From July 2023 through June 2024, each month set a new record in global temperature (Fig. 1a). To pinpoint when and over which time span the global heat was most pronounced, we conduct probabilistic attribution for each individual month and all possible multi-month windows within this period. Based on 143 years of observational data (1880–2022), September 2023 stands out as the most statistically extreme month, both when considering monthly temperature anomalies (Supplementary Fig. S1a) and year-to-year temperature jumps (Supplementary Fig. S1b). Given its exceptional nature, all subsequent analyses focus on this month.

Fig. 1: Occurrence probabilities for the 2023 global heat and temperature jump under current and additional global warming.

a Observed monthly global mean temperature (GMT) during 1950–2023 from GISTEMP, displayed as anomalies with respect to 1850–1900. The arrows mark the absolute temperature anomaly (light green) in September 2023 and the temperature difference to September 2022 (dark green). b Zero-centered generalized extreme value (GEV) distribution for September temperature anomalies from GISTEMP (thick grey line), with the 95% uncertainty range shown by the thin grey lines. Histograms display the distribution of residuals for 1880–2022, i.e., the difference between observed temperature and the conditional estimate given GMT and ENSO for the respective year based on observations. The residual of the year 2023 is indicated by a vertical red line. c Same as b but for year-to-year temperature jumps. d Future occurrence probability estimates for the absolute temperature anomaly of September 2023 in CMIP6 models under SSP5-8.5. Years are shown relative to when each model reaches the GWL observed in 2023; each model’s calendar is shifted so that the year corresponding to this GWL is set to 2023. Boxplots show the probability estimate per climate model (mean over all ensemble members). The distribution fit is obtained using 1850–2099 temperature data. Note that the horizontal axis increments by individual years up to 2043, and by decades thereafter. Boxplots show the median, quartiles, and whiskers extending to 1.5 × the interquartile range (IQR); points beyond are shown as outliers. e Same as d but for the temperature jump between September 2022 and 2023.

We assess the extremeness of the September 2023 GMT anomaly using multiple approaches. Statistical modelling of the event using different extreme value distributions fitted to the historical observational record suggests an occurrence probability near zero (Fig. 1b). Applying the same probabilistic analysis to CMIP6 model data supports these findings, yielding vanishingly small likelihoods of the September 2023 temperature anomaly. As a consistency check, we additionally applied simple count statistics in place of statistical distributions to identify instances where CMIP6 simulated temperatures exceed the observed anomaly. This approach shows that CMIP6 models generally do not produce temperature anomalies comparable to those observed in September 2023 at the corresponding global warming level (GWL) (Supplementary Fig. S2a). However, even modest additional warming leads to a rapid increase in the probability of such events. Under SSP5-8.5, the probability rises from near zero in 2023 to 5% by 2028 and reaches 60% by 2034 (Fig. 1d), with little scenario dependence before 2050, as pathways only diverge markedly afterwards34. This suggests that, within the next decade, heat events of comparable intensity to September 2023 may occur roughly every other year.
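The count-statistics check and the link between an annual probability and a recurrence interval can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the example values are hypothetical, and the recurrence interval assumes independent years with a constant probability.

```python
def empirical_exceedance(simulated, threshold):
    """Fraction of simulated values that reach or exceed the threshold."""
    values = list(simulated)
    if not values:
        raise ValueError("need at least one simulated value")
    return sum(v >= threshold for v in values) / len(values)


def mean_recurrence_years(annual_probability):
    """Mean waiting time in years, assuming independent years with constant probability."""
    if not 0.0 < annual_probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return 1.0 / annual_probability


# Hypothetical September anomalies (°C) pooled from model runs at one warming level
pooled = [1.2, 1.5, 1.8, 2.0]
print(empirical_exceedance(pooled, 1.8))
print(mean_recurrence_years(0.60))  # annual probability quoted for 2034
print(mean_recurrence_years(0.05))  # annual probability quoted for 2028
```

Under these assumptions, an annual probability of 60% corresponds to a mean recurrence interval of a little under two years.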

Similarly, the year-to-year temperature increase from September 2022 to 2023 is extremely unlikely under current warming: again, observation-based estimates indicate near-zero probabilities (Fig. 1c; GISTEMP: 0 [0–10⁻²], HadCRUT: 10⁻⁴ [0–10⁻³]); the best estimate based on the CMIP6 ensemble (median over all models) is 10⁻⁶, and even the upper end of the uncertainty range (97.5th percentile) remains lower than 1% for 44 out of 49 models. Similar temperature jumps are also virtually absent in CMIP6 simulations at the 2023 GWL (Supplementary Fig. S2b). In contrast to the absolute temperature anomaly, the likelihood of such a large temperature jump remains extremely low over the next decades (Fig. 1e), and does not exceed 1% even towards the end of the 21st century. To screen for biases in our statistical framework and in CMIP6 simulations, we employ extreme value distributions to estimate occurrence probabilities of earlier record-breaking jumps. The GEV-derived likelihoods for past events match empirical probabilities in both observations and CMIP6 models reasonably well (Supplementary Fig. S3), indicating that our approach does not systematically underestimate the rarity of record-breaking temperature jumps. Thus, at current warming levels and based on well-established probabilistic extreme event attribution techniques, the September 2023 temperature anomaly was effectively unpredictable. However, under higher warming levels, the GMT anomaly quickly becomes less unusual, whereas the extreme jump from 2022 to 2023 remains highly unlikely. For this reason, the remainder of the analysis concentrates on understanding the drivers and mechanisms behind the September 2023 temperature jump, rather than the temperature anomaly itself.
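For readers unfamiliar with how an occurrence probability follows from a fitted GEV, the sketch below evaluates the standard closed-form exceedance probability. It is an illustration only: the parameter values are hypothetical rather than the fitted values from this study, and the fitting step itself (and the conditioning on GMT and ENSO) is not shown.

```python
import math


def gev_exceedance(x, loc, scale, shape):
    """P(X > x) for a GEV with location `loc`, scale `scale` > 0 and shape `shape` (xi).

    Uses the xi sign convention common in climate science; note that
    scipy.stats.genextreme defines its shape parameter with the opposite sign.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    z = (x - loc) / scale
    if abs(shape) < 1e-12:
        # Gumbel limit (shape -> 0): F(x) = exp(-exp(-z))
        return -math.expm1(-math.exp(-z))
    t = 1.0 + shape * z
    if t <= 0.0:
        # Outside the support: above the upper bound (shape < 0)
        # or below the lower bound (shape > 0)
        return 0.0 if shape < 0 else 1.0
    # F(x) = exp(-t ** (-1 / shape)); return 1 - F(x) without cancellation
    return -math.expm1(-(t ** (-1.0 / shape)))


# Hypothetical zero-centred residual distribution (°C) and a hypothetical 2023 residual
p = gev_exceedance(0.40, loc=-0.05, scale=0.10, shape=-0.2)
print(f"exceedance probability: {p:.2e}")
```

With a negative shape parameter the distribution has a finite upper bound, so a value beyond that bound has an exceedance probability of exactly zero. This is one way a best estimate of 0 can arise; the text does not state whether that is the case for the GISTEMP fit.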

Next, we examine the spatial fingerprint of the unprecedented heat through regional contributions to the global temperature spike for ocean and land areas in the tropics (latitudes ≤ 23.5°) and extratropics. Although land covers only one-third of Earth’s surface, it accounted for more than half of the observed temperature jump (Fig. 2a), exceeding its contribution to the observed long-term global warming. Consistently, tropical and extratropical land show very low exceedance probabilities for September 2023 based on observational data (Fig. 2b). When the same analysis is applied to the CMIP6 ensemble, extratropical land stands out even more prominently (Fig. 2c). In contrast, ocean regions appear comparatively less anomalous during this month, especially in light of the concurrent El Niño event (Fig. 2b and c). While a more fine-grained regional analysis might reveal additional insights for more localized temperature extremes, such as those in the North Atlantic35, our analysis underscores that global probabilities remain lower than any regional estimate, highlighting that the event’s broad spatial extent — rather than isolated hotspots — is what made it particularly anomalous. Moreover, despite the extremely low probability estimates, our findings are not only consistent between observational and modelled data, but remain remarkably robust to the choice of temperature datasets and variations in the attribution framework design, including sample size and distribution choice (Fig. 2c, Methods).
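The regional decomposition can be illustrated with a short sketch. It assumes a regular latitude-longitude grid, so that cell area is proportional to the cosine of latitude. The function names and the three-cell example are hypothetical and not taken from the study.

```python
import math

TROPIC_LAT = 23.5  # degrees; the tropics are latitudes <= 23.5° as in the text


def region_label(lat_deg, is_land):
    """Assign a grid cell to one of the four regions used in the analysis."""
    zone = "tropical" if abs(lat_deg) <= TROPIC_LAT else "extratropical"
    return f"{zone} {'land' if is_land else 'ocean'}"


def regional_contributions(cells):
    """cells: iterable of (lat_deg, is_land, jump_degC) on a regular lat-lon grid.

    Returns {region: contribution in °C}. The contributions sum to the
    area-weighted global mean jump.
    """
    cells = list(cells)
    weights = [math.cos(math.radians(lat)) for lat, _, _ in cells]
    total = sum(weights)
    contributions = {}
    for w, (lat, is_land, jump) in zip(weights, cells):
        key = region_label(lat, is_land)
        contributions[key] = contributions.get(key, 0.0) + w * jump / total
    return contributions


# Hypothetical three-cell "planet"
demo = regional_contributions([(0.0, False, 0.4), (45.0, True, 1.0), (-60.0, False, 0.3)])
for region, value in demo.items():
    print(f"{region}: {value:.3f} °C")
```

Because each contribution is the product of a region's area share and its mean jump, a region can contribute more than its area share only if it warms more than the global mean, which is the sense in which land contributed disproportionately.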

Fig. 2: Regional contributions to the September 2023 temperature jump.

a Area-weighted contributions of different regions to the September 2023 temperature jump based on GISTEMP. Tropical regions are defined as latitudes between 23.5° S and 23.5° N; extratropical regions lie outside this band. b Same as Fig. 1c but for regional temperature jumps. c Occurrence probabilities of the September 2023 temperature jump obtained for different datasets and using multiple distributions.

Drivers of the 2023 global heat build-up

Since the temperature jump in September 2023 was effectively unpredictable based on past records of heat extremes and yet occurred, it raises the question of which processes and interactions drove this unprecedented event. Conceptually, such an extreme temperature anomaly must be associated with unusually strong heat transfer to the atmosphere. We thus employ reanalysis data to examine the evolution of global mean temperature alongside four key drivers of the surface energy budget: downwelling SW radiation and water vapour, closely linked to downwelling longwave radiation, indicate the available energy at the surface, whereas soil moisture governs the partitioning between latent and sensible turbulent heat fluxes over land, and the NINO3.4 index represents the influence of ENSO.

In the months preceding September 2023, all four drivers shifted towards conditions conducive to elevated temperatures (Fig. 3a): a continuous increase in SW radiation and atmospheric water vapour, the onset of an El Niño event, and declining soil moisture levels (especially in the tropics). This implies that no single driver accounts for the September 2023 temperature jump; instead, it arose from a combination of forcings and ensuing feedbacks. Next, we use short-term fluctuations in these drivers as predictors to quantify their respective contributions to the September 2023 jump at each grid cell (detailed in “Process-based analysis”). Our multiple linear regression model represents the warming pattern reasonably well (Fig. 3b), demonstrating that the chosen drivers can predict most of the observed temperature evolution (R² = 0.79). Given that we employ a simple approach to model a complex system with intricate feedbacks and interactions, we cannot claim direct causation, but the selected variables either drive temperature themselves or closely relate to mechanisms that do. Some warming remains unexplained, particularly in the tropics, which might hint at an unusually strong and early ENSO influence. Additional unexplained warming is expected due to the use of local, concurrent predictors, which do not account for heat advection or antecedent temperature anomalies. Each driver’s contribution to the temperature change depends on the strength of its anomaly and local temperature sensitivity, both of which vary considerably across regions (Supplementary Fig. S4). For example, the response of surface temperature to changes in ENSO is strongest over the tropical ocean, as expected, whereas sensitivity to incoming SW radiation is generally higher over land. Aggregating these contributions over land and ocean in the tropics and extratropics highlights a pronounced contribution of water vapour, especially in the extratropics (Fig. 3c). However, since tropospheric water vapour content increases with warming, this strong signal likely reflects not only direct longwave forcing, but also the atmospheric memory of cumulative effects from preceding SW forcing and El Niño-induced elevated sea surface temperature (SST) anomalies.
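The idea that a driver's contribution is its local sensitivity multiplied by its anomaly can be sketched for a single grid cell as below. This is a generic ordinary-least-squares illustration under stated assumptions, not the study's implementation: how the predictors are detrended or standardized is described in the Methods and not reproduced here, and all names and numbers are hypothetical.

```python
def fit_ols(X, y):
    """Ordinary least squares with intercept, solved via the normal equations.

    X: list of rows (one value per predictor); y: list of targets.
    Returns [intercept, b1, ..., bk]. Assumes the predictors are not collinear.
    """
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented normal-equation matrix [A^T A | A^T y]
    M = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        piv = max(range(col, k), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        pivot = M[col][col]
        M[col] = [v / pivot for v in M[col]]
        for i in range(k):
            if i != col:
                f = M[i][col]
                M[i] = [a - f * b for a, b in zip(M[i], M[col])]
    return [M[i][k] for i in range(k)]


def driver_contributions(coefs, delta, names):
    """Contribution of each driver = fitted sensitivity x change in that driver."""
    return {n: b * d for n, b, d in zip(names, coefs[1:], delta)}


# Hypothetical standardized predictor anomalies and temperature anomalies at one cell
names = ["SW", "water vapour", "soil moisture", "NINO3.4"]
X = [(0.1, 0.2, -0.1, 0.5), (-0.3, 0.0, 0.2, -0.4), (0.4, 0.5, -0.3, 0.1),
     (0.0, -0.2, 0.1, -0.6), (0.2, 0.3, 0.0, 0.8), (-0.1, -0.4, 0.3, 0.0),
     (0.5, 0.1, -0.2, -0.2)]
y = [0.30, -0.25, 0.45, -0.30, 0.50, -0.20, 0.20]
coefs = fit_ols(X, y)
print(driver_contributions(coefs, delta=(0.6, 0.8, -0.4, 1.2), names=names))
```

Repeating the fit at every grid cell and area-averaging the resulting products would give regional contributions of the kind shown in Fig. 3c. Because the predictors are local and concurrent, the residual absorbs anything they cannot represent, such as heat advection.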

Fig. 3: Observation-derived and simulated drivers of temperature jumps.

a Global mean temperature (GMT, red shading), total column water vapour, downwelling shortwave radiation, soil moisture, and the NINO3.4 index during 2023, all based on ERA5 and expressed as standardized anomalies relative to 1980–2023. b Absolute GMT anomaly in September 2023 from the ERA5 reanalysis (left) and predicted anomaly based on grid cell linear regression (right). c Regional averages of contributions from different predictors to the GMT jump in September 2023. d Global contributions of the same predictors to 40 extreme temperature jumps simulated in a subset of CMIP6 models with more than 10 ensemble members available (n = 6) at the 2023 GWL. Boxplots show the median, quartiles, and whiskers extending to 1.5 × the IQR; points beyond are shown as outliers.

Finally, we examine how the roles of different drivers inferred from observations compare to those simulated in CMIP6 models under comparable background warming (Fig. 3d), that is, for the most extreme September jumps produced at the 2023 GWL (1.2 °C, using a ± 0.5 °C window); see “Process-based analysis” for details. Across drivers, contributions derived from the ERA5 reanalysis generally align with the CMIP6 ensemble. The main exception is downwelling SW radiation, for which ERA5 values exceed the CMIP6 range, particularly over extratropical oceans (Supplementary Fig. S5). This discrepancy is compatible with recent studies linking the 2023 heat to reduced aerosol loads (e.g., due to IMO2020; refs. 5, 28, 30), which impact mostly mid- and high-latitude oceans, and declines in low-level cloud cover7. Although CMIP6 models generally fail to reproduce the September 2023 temperature jump under current warming levels, their sensitivity towards water vapour, SW radiation, soil moisture, and ENSO, as well as the variability in these drivers, is consistent with reanalysis data. Based on this general agreement between reanalysis data and CMIP6 models (Fig. 3d), we hypothesize that in years with a strong temperature contribution from internal variability, such as during a transition from La Niña to El Niño, a relatively small, externally forced perturbation in one or more drivers (particularly incoming SW radiation) would be sufficient for CMIP6 models to produce temperature jumps comparable in magnitude to that observed in September 2023.

To estimate the contribution of SW forcing required to reconcile observations with models, we construct a ‘SW-adjusted’ jump by reducing the observed SW contribution to the CMIP6 median for extreme temperature jumps (i.e., the distance between the orange cross and the SW boxplot median in Fig. 3d). This yields an adjusted jump of approximately 0.51 °C, suggesting that roughly 0.07 °C of the total jump may stem from excess SW forcing. The associated likelihood of this SW-adjusted jump increases from 10⁻⁶ to 0.1% (Fig. 4a), still rare but no longer negligible. Whether this excess reflects external drivers such as aerosol reductions or internal variability not fully captured in this CMIP6 subset remains an open question, though the magnitude is consistent with recent estimates of aerosol-related warming (see Discussion). The underlying cause of the high SW forcing (and, more broadly, any other external influence) will also determine whether it continues to affect the probability of extreme temperature jumps in the future.

Fig. 4: Occurrence probabilities of temperature jumps under additional global warming.

a Median occurrence probabilities for temperature jumps of varying magnitude across CMIP6 models. A lower bound of -10 is applied to likelihood values. Contour lines indicate constant probability levels. The white cross marks the September 2023 jump; the white circle shows the same event after adjusting the shortwave radiation contribution. The corresponding GWLs of each year are based on the CMIP6 ensemble mean GMT. b Change in magnitude of temperature jumps with the same likelihood as the 2023 event by 2100 under SSP5-8.5. Boxplots and grey markers represent individual model simulations. Boxplots show the median, quartiles, and whiskers extending to 1.5 × the IQR. Red markers denote best estimates based on the pooled simulations when multiple simulations are available. Positive values imply that a jump in 2100 would need to be more intense than the 2023 event to have the same probability, indicating increasing temperature variability in September under higher warming levels.

We additionally perform a CESM2 simulation in which surface ocean conditions are prescribed and the atmospheric circulation — only the winds, not temperature nor humidity — is nudged towards ERA5. In this setup, the spatial pattern of warming is very well reproduced over land (and, by design, over the ocean; Supplementary Fig. S6). The resulting temperature jump of 0.56 °C closely matches the ERA5 value of 0.58 °C, demonstrating that CESM2 can produce temperature jumps of nearly the observed magnitude when forced by suitable SST and circulation anomalies. Similarly, the contributions from the different drivers are largely consistent between CESM2 and ERA5 (Supplementary Fig. S7), though precise agreement is not expected given the distinct methodologies and associated uncertainties inherent to both reanalyses and climate models. We therefore also screen for comparable temperature jumps in a 100-member ensemble of fully interactive CESM2 simulations (CESM2-LE), which follow the same emission scenario and share identical model physics as the observationally constrained simulation and thus enable a more meaningful comparison. Under present-day warming, no free-running simulations reach the magnitude of 0.56 °C achieved in the observationally constrained experiment, despite the large sample size. This indicates that internal variability alone (as simulated by CESM2) does not account for the full event magnitude. Notably, we find no evidence for an underestimation of internal variability in CESM2: apart from the September 2023 event, the distribution of temperature jumps in ERA5 is consistent with that simulated by CESM2-LE members (Supplementary Fig. S8a). Together, these results therefore suggest that the prescribed SST and wind patterns already contain a component of external forcing.

Gradually rising probabilities of future temperature jumps

Even in the absence of external drivers beyond those represented in climate models, projections indicate that the frequency and intensity of extreme temperature jumps may still change in the long term due to the combined effects of internal variability and non-linear warming. In the CMIP6 ensemble, the frequency of such jumps tends to increase towards the late 21st century (Fig. 4a). For example, the probability of observing a jump comparable to that of September 2023 rises from 10⁻⁶ [0–10⁻³] in 2023 to 10⁻⁵ [0–10⁻²] in 2050, and to 0.1% [0–6.0%] in 2100 under the high emission scenario SSP5-8.5. This rise in probability and intensity could be driven both by accelerating warming in future decades and by intensifying internal variability. Decomposing these contributions reveals that on average, changes in the warming rate account for only about 6.5% of the rise in jump intensity, whereas 93.5% stems from increased internal variability (Supplementary Fig. S9). This dominance of internal variability suggests that the projected jump intensification is more closely tied to the global warming level reached than to the warming rate itself. Expanding our analysis to the CESM2-LE confirms this pattern of rising probabilities; while it does not produce any comparable temperature jumps under current warming levels, the probability of such events under higher warming is low but non-negligible; for example, at a global warming level of 4 °C, such jumps occur with a probability on the order of 10⁻³ (Supplementary Fig. S8b).

Although projected jump probabilities rise on average, we find considerable spread in the evolution of extreme jump frequency between and within models. To quantify this in tangible terms, Fig. 4b shows, for each model, how a jump in 2100 would have to differ from that of September 2023 in order to have the same occurrence probability within that model’s simulated climate. Most models show positive values, indicating increasing jump intensity and frequency over time; however, a few project only little change or even a modest decrease in September temperature variability. Likewise, even models showing an overall increase in temperature jumps may feature individual ensemble members that exhibit no clear trend or the opposite signal. This underscores the need for leveraging both multiple models and large ensembles when analyzing future heat extremes. On average, a jump in 2100 would need to be 0.05 °C warmer than that of September 2023 to remain equally (un)likely.

Discussion

Our probabilistic attribution framework assigns an extremely low likelihood to the temperature jump in September 2023. One potential explanation for such low exceedance probabilities is that internal variability might be underestimated in climate models and/or within the statistical framework, thereby biasing exceedance probabilities downward. However, the GEV-derived likelihood estimates for previous record-breaking temperature jumps closely match their observed frequencies. Likewise, CMIP6 models reproduce the occurrence rates of previous temperature jumps reasonably well, as does CESM2-LE. Recent work has also shown that September temperature variability in CMIP6 models is, if anything, larger than in observations, so the resulting probability estimates should be conservative21. Together, these results provide no evidence for a systematic underestimation of temperature variability in climate models or the attribution framework, and suggest that the low likelihood of the September 2023 temperature jump does not simply result from underestimated variability. However, we cannot rule out that the representation of extreme tail behaviour, unlike overall variability, remains incomplete in climate models, which could lead to the probability of rare events being underestimated.

The probabilistic attribution implicitly assumes that temperature extremes are enabled by the long-term global warming and internal variability. All probability estimates, therefore, describe the likelihood that the temperature jump was caused by internal variability alone, given the current warming level. Under this assumption, the September 2023 temperature jump was extremely unlikely. This conclusion holds robustly across multiple observational temperature datasets and methodological choices in the attribution framework. It is further corroborated by CMIP6 model simulations, which collectively span approximately 40,000 years but simulate only very few comparable temperature jumps, most of which occur at higher warming levels. A recent analysis of record-breaking temperature margins likewise concluded that internal variability alone could not explain the September 2023 event (ref. 21).

In light of these findings, can our results be reconciled with previous studies suggesting that internal variability alone was sufficient to explain the 2023/24 global heat and that the event was therefore predictable? Two recent studies have evaluated the predictability of the 2023/24 heat with a boosted forecast ensemble initialized in late 2022 (ref. 36) and an ensemble forecast simulation for 2023 and 2024 (ref. 37). In both studies, single ensemble members approach, but do not fully reach, the observational GMT estimate for 2023 (ref. 37) or September 2023 (ref. 36). Internal variability and particularly the strong El Niño in 2023, therefore, seem to have substantially contributed to the event, but could not fully account for the magnitude of the observed temperature jump in September. Several studies have highlighted the role of ENSO in driving the 2023 heat; for example, it has been shown that temperature spikes, such as the one from 2022 to 2023, become more frequent following a transition from a prolonged La Niña to an El Niño state13. However, since that analysis focuses on annual mean values, it does not fully capture the magnitude of the heat event, as record-breaking temperatures in 2023 occurred only from July onward and lasted well into 2024. Our focus on September temperatures is consistent with a daily-data assessment that identified September 15–29, 2023, as the period of most extreme temperatures15. When considering annual means, our observation-based probability estimate for the 2023 temperature jump is 8.7%, close to the value of 10.3% reported in previous work13. Other studies highlight the role of SST patterns in driving GMT and conclude that the atmospheric response to SST patterns38 or the combined forced and internal contributions16 causing the 2023/24 global heat were not unprecedented; however, these analyses also focus on annual and not monthly means.

Besides the temporal scale, the spatial footprint is also crucial to characterize the exceptional nature of the heat. Notably, regional occurrence probabilities are consistently higher than global estimates. In line with this, SSTs in 2023/24 have been estimated as an unlikely but not unprecedented 1-in-512-year event (ref. 17). Therefore, the 2023/24 SST spike might have been caused by internal variability alone, but when looking at the global scale and in particular September temperatures, this explanation appears insufficient when compared to model-based estimates of internal variability. Supporting this, a physical climate model experiment with prescribed observed SSTs and ERA5-nudged winds produces a temperature jump of 0.56 °C, nearly matching the ERA5 value of 0.58 °C. However, no free-running simulations in a 100-member ensemble of the same model reach this magnitude under present-day warming. Since the climate model does not seem to underestimate internal variability in general, the near reproduction in the constrained simulation suggests that the prescribed SSTs and possibly the wind patterns already include a component of external forcing. If such a forcing started to play a role only recently, both observational and model-based probability estimates from the attribution analysis would underestimate the true likelihood of the 2023 temperature spike: the observational record would contain too few years influenced by that forcing for reliable statistical representation, and the forcing itself might be only partially represented, or entirely absent, in climate models.

Indeed, we find unusually strong SW forcing in September 2023, particularly over extratropical oceans and tropical land, where the forcing is clearly outside the CMIP6 model range. Near-surface temperature, however, is more sensitive to SW anomalies over land than over the ocean, such that even moderate radiative perturbations can have a substantial effect on surface temperature. This enhanced response might also help to explain why the temperature jump was most unusual over land. Several mechanisms could contribute to the observed SW forcing, including recent changes in anthropogenic aerosol emissions. In this context, the impact of the IMO2020 regulation on shipping emissions has been extensively discussed. Estimates of the near-term temperature increase due to IMO2020 range from 0.03 °C (refs. 25, 27) to 0.08 °C (ref. 28). However, recent work also shows that, while IMO2020 temperature effects may reach up to 0.16 °C in single years, they should remain indistinguishable from internal variability25. Additionally, large-scale aerosol reductions in East Asia have been found to cause 0.07 ± 0.05 °C of additional global warming, in line with the observed increase in surface warming trends since 2010 (ref. 39). However, this likely concerns decadal warming rather than short-term temperature variability. Our estimate of the excess SW contribution to the September 2023 temperature jump of 0.07 °C falls within the ranges reported in these studies. Attributing the observed excessive SW forcing to a certain driver, such as reductions in anthropogenic or natural aerosol loads, is, however, beyond the scope of this study. Nevertheless, accounting for the excessive SW forcing, independent of its underlying cause, renders the September 2023 temperature jump unlikely but no longer impossible.

The exceptional global heat of 2023 has raised concerns about a potential acceleration of the warming rate40, especially since climate models generally fail to reproduce the heat event18. Our analysis contributes to this debate by showing that CMIP6 models are consistent with observations and appear to capture the key processes required to reproduce the observed temperature jump, aside from the unusually high SW forcing and even though comparable temperature jumps at the 2023 GWL are hardly achieved. In addition, a hypothetically increased warming rate would likely impact the frequency of large absolute temperature anomalies more than the occurrence of abrupt year-to-year temperature jumps. Therefore, even an increased warming rate would likely not suffice to explain the temperature jump in September 2023 on its own, since the extremely low probability of the warming spike points to an exceptional contribution of internal variability.

Nevertheless, our current results are not sufficient to either fully defend or decisively challenge the performance of climate models. When accounting for the excessive SW forcing, the September 2023 temperature jump remains an event with a low, yet plausible probability of occurrence that cannot be ruled out given the potential role of internal variability. Should another comparably large jump occur in the near future, especially one with an even stronger apparent contribution from internal variability, our current understanding and the representation of the Earth System in climate models would be subject to serious scrutiny. Whether such an event could substantially shift confidence in model projections depends on the estimates of internal and externally forced contributions; these quantities are only partially captured by climate models and our statistical approach, even when informed by observations.

Towards the end of the century and under a high-emission scenario, however, most CMIP6 models simulate a near-linear increase in the frequency of extreme temperature jumps with global warming. This suggests that, in the future, internal variability could play a more prominent (and potentially even exclusive) role in driving temperature jumps of similar magnitude to September 2023. Such events would therefore be less surprising and thus less diagnostic of model performance. Further work is needed to better understand the physical mechanisms behind this projected increase, for example, whether it is linked to changes in ENSO dynamics. In the meantime, we provide the necessary framework and data such that, if another extreme jump were to happen and initial estimates of external forcing contributions — if any — became available, the plausibility of such an event according to CMIP6-based extreme value statistics under the then-current level of background warming could be immediately assessed.

Methods

Observational and reanalysis data

Our analysis is based on observed and simulated monthly near-surface air temperature data. The observational datasets include GISTEMPv4 (1880–present, 2° × 2° spatial resolution41) and HadCRUT5 (1850–present, 5° × 5° spatial resolution3). These temperature products are derived from 2-metre air measurements over land and sea surface temperatures over oceans, and we refer to them as global mean temperature (GMT). Some results are shown for GISTEMP only, as HadCRUT5 yields very similar outcomes.

For observation-based analyses beyond temperature, we employ the fifth-generation atmospheric reanalysis of the global climate (ERA5), produced by the European Centre for Medium-Range Weather Forecasts (ECMWF)42. We use monthly values of 2-metre temperature, downwelling SW radiation, total column water vapour from ERA5, and soil moisture in the upper 100 cm from ERA5-Land43 at 0.25° × 0.25° resolution to analyse the drivers behind the 2023 global heat. To ensure appropriate data quality, we use reanalysis data from 1980 onwards, when the introduction of satellite data led to substantial improvements in the global observing system, particularly in the Southern hemisphere44.

Although the ERA5 definition of temperature is more directly comparable to climate model output than the observational GMT estimates, which combine 2-metre air temperature over land with SSTs over ocean, we chose to use the observational products for the attribution analysis due to their longer temporal coverage. We verified that the results of the probabilistic attribution are robust to the choice of GMT definition. The spatial patterns of September 2023 temperature anomalies from ERA5 and the observational products are consistent, with no indication of a systematically larger discrepancy over the ocean than over land (Supplementary Fig. S10). Event definitions (e.g., the magnitude of the temperature anomaly and jump) and global warming levels (GWLs) are computed as the average across GISTEMP, HadCRUT, and ERA5 datasets.

GWLs are calculated as non-centered 5-year running means of GMT relative to the pre-industrial baseline (1850–1900). To account for the influence of ENSO, we employ a relative NINO3.4 index independent of long-term background warming45. The index is retrieved from the Royal Netherlands Meteorological Institute’s (KNMI) climate explorer46. To assess the sensitivity towards the ENSO index, we repeated our analysis with NOAA’s Multivariate ENSO Index Version 2 (MEI.v2)47, and obtained very similar results.
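
The GWL definition above can be illustrated with a minimal Python sketch. It assumes that "non-centered" means a trailing window ending in the year of interest; the function name `global_warming_level` and the synthetic linear GMT series are hypothetical and serve illustration only.

```python
def global_warming_level(years, gmt, baseline=(1850, 1900), window=5):
    """Trailing (non-centered) running mean of GMT relative to the baseline mean.

    Assumption: the window covers the year of interest and the preceding
    `window - 1` years. Returns a dict mapping year -> GWL.
    """
    base = [t for y, t in zip(years, gmt) if baseline[0] <= y <= baseline[1]]
    offset = sum(base) / len(base)  # pre-industrial reference level
    gwl = {}
    for i in range(window - 1, len(gmt)):
        chunk = gmt[i - window + 1 : i + 1]  # current year and the 4 before it
        gwl[years[i]] = sum(chunk) / window - offset
    return gwl


# Synthetic example: linear warming of 0.01 degC per year (illustrative only).
years = list(range(1850, 2024))
gmt = [0.01 * (y - 1850) for y in years]
gwl = global_warming_level(years, gmt)
```

With a trailing window, the GWL of a given year never contains information from later years, at the price of lagging the underlying trend by about two years.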

Climate model data

Model simulations are sourced from the 6th phase of the Coupled Model Intercomparison Project (CMIP6)48 as available in the CMIP6 next generation (CMIP6ng) archive49, comprising 51 CMIP6 models with a total of 348 realizations (Supplementary Table S1). Specifically, we use Earth System Model simulations at 2.5° × 2.5° horizontal resolution from i) 1850–2014, based on historical greenhouse gas, aerosol, and land use forcings, and ii) projections from 2015–2100, following the high-emission scenario SSP5-8.550. We choose SSP5-8.5 to obtain a clear signal of the impact of internal variability under continuous global warming. Again, variables of interest are near-surface air temperature at 2 metres (‘tas’), total atmospheric water vapour (‘prw’), incoming solar radiation (‘rsds’), and soil moisture in the upper 100 cm (vertically aggregated ‘mrsol’). The NINO3.4 index is derived from ocean surface temperature (‘ts’) following Van Oldenborgh (2021)45; for a consistent comparison between models and observations, observed NINO3.4 values of specific events (e.g., September 2023) are mapped onto the corresponding model distribution using percentile mapping.

We also analyse the occurrence of extreme temperature jumps in the Community Earth System Model version 2 Large Ensemble (CESM2-LE51), which spans 1850–2100. The CESM2-LE consists of 100 initial conditions ensemble members that follow historical forcings until the end of 2014 and SSP3-7.0 forcings thereafter.

Observationally constrained climate model simulation

We conduct a climate model simulation that incorporates the observed surface ocean state as well as large-scale atmospheric circulation, which — in contrast to ‘regular’ climate model experiments — captures historical weather and climate (extremes). The resulting spatial pattern and temporal evolution of meteorological variables, even though air temperature and humidity are not prescribed and always freely calculated, are in good agreement with observations52.

The approach has been established in previous studies52,53,54, and is applied here to the Community Earth System Model (CESM 2.2.2). Winds in the free troposphere are ‘nudged’ towards ERA5. Observed SSTs and sea ice concentrations are obtained from ERSSTv5 and a blend of HadISST1 (1870–1981) with NOAA’s OI SST v2 (1982–2023), consistent with CESM2 simulations using prescribed surface ocean conditions from the National Center for Atmospheric Research (NCAR). The merging procedure is described by Hurrell et al. (2008)55 and is performed for the baseline period of 1982 to 2011. Soil moisture is calculated interactively, but its temporal evolution is indirectly constrained by observations due to the model setup. To reduce computational expense, land ice evolution is not considered.

Using this configuration, i.e., reanalysis and observation-constrained large-scale winds and surface ocean, a global simulation with roughly 1° horizontal resolution is performed for 1980–2023, including 3 years of spin-up (1977–1979), and using a high-emission Shared Socioeconomic Pathway (SSP3-7.0) from 2015 onwards. Owing to the special simulation set-up, this model experiment can be considered closely constrained towards the observed state of the climate system, and hence represents a model realization of the climate system that mimics the ‘true’ internal ocean and atmospheric variability. This simulation also allows for a comparison with the CESM2-LE, which samples internal variability but does not reproduce the specific circulation conditions of 2023.

Non-stationary Generalized Extreme Value distribution

We represent monthly temperature anomalies and temperature jumps as realizations of a non-stationary Generalized Extreme Value (GEV) distribution. Assuming that the extreme heat in September 2023 was primarily driven by long-term global warming, and exacerbated by the onset of El Niño, we formulate the non-stationary location parameter as:

$$\mu ={\beta }_{0}+{\beta }_{1}\times {{{\rm{GMT}}}}+{\beta }_{2}\times {{{\rm{NINO}}}}3.4$$
(1)

Annual GMT values are smoothed to minimize the leakage of the analysis period into its own long-term warming estimate and thus better separate the signals of internal variability and background warming. We therefore apply a 15-year locally weighted scatterplot smoothing (LOWESS) filter56 to the GMT time series, which fits local regressions using a tri-cube weighting kernel based on available GMT observations of the nearest 15 years. Compared to the 4-year running mean recommended by the World Weather Attribution protocol57, this approach results in a stronger smoothing, an adjustment motivated by the global scale of the analysis: Most attribution analyses focus on regional events, where the GMT covariate contains less of the event’s temperature signal, since it is a regional and not global temperature mean. At the global scale, a stronger smoothing ensures a cleaner separation of the event from the explanatory covariate.
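
The smoothing step can be sketched as follows. This is a simplified, single-pass LOWESS (local linear fits with tri-cube weights over the 15 nearest years, without robustness iterations) and not necessarily the implementation used for the analysis; the function name `lowess_smooth` is hypothetical.

```python
import numpy as np


def lowess_smooth(years, values, window=15):
    """Single-pass LOWESS: local linear regression with a tri-cube kernel.

    For each year, the `window` nearest years are selected and weighted by
    (1 - (d / d_max)^3)^3, where d is the distance in years.
    """
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    smoothed = np.empty_like(values)
    for i, y0 in enumerate(years):
        dist = np.abs(years - y0)
        nearest = np.argsort(dist, kind="stable")[:window]
        d = dist[nearest]
        w = (1.0 - (d / d.max()) ** 3) ** 3  # tri-cube weights in [0, 1]
        # Weighted least squares via rescaling rows by sqrt(w)
        X = np.column_stack([np.ones(len(nearest)), years[nearest] - y0])
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(X * sw[:, None], values[nearest] * sw, rcond=None)
        smoothed[i] = coef[0]  # fitted value at y0 (local intercept)
    return smoothed


# Synthetic check: a purely linear series should be returned unchanged.
demo_years = np.arange(1850, 2023)
demo_gmt = 0.01 * (demo_years - 1850)
demo_smoothed = lowess_smooth(demo_years, demo_gmt)
```

Because each local fit is linear, a steady warming trend passes through the filter unchanged, whereas year-to-year fluctuations are damped.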

In contrast, the NINO3.4 index represents internal variability; it is therefore not smoothed and defined over the same time scale as the heat event, i.e. the attribution of the absolute temperature anomaly in September 2023 uses the NINO3.4 value for that month, and for the temperature jump it is the difference in NINO3.4 between September 2022 and 2023. For attribution based on climate models, observed NINO3.4 values of specific events (e.g., September 2023) are mapped onto the corresponding model distribution using percentile mapping.
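
A minimal sketch of such a percentile mapping is given below. The empirical-quantile convention (mid-rank plotting positions with linear interpolation) and the function name `percentile_map` are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right


def percentile_map(obs_value, obs_sample, model_sample):
    """Map an observed value onto a model distribution at equal percentile.

    The percentile of `obs_value` within `obs_sample` is estimated with
    mid-rank plotting positions; the model value at that percentile is then
    obtained by linear interpolation between the sorted model values.
    """
    obs_sorted = sorted(obs_sample)
    model_sorted = sorted(model_sample)
    n, m = len(obs_sorted), len(model_sorted)
    # Mid-rank percentile of the observed value in the observed sample
    rank = 0.5 * (bisect_left(obs_sorted, obs_value) + bisect_right(obs_sorted, obs_value))
    p = rank / n
    # Position in the sorted model sample with the same plotting position
    pos = min(max(p * m - 0.5, 0.0), m - 1.0)
    lo = int(pos)
    hi = min(lo + 1, m - 1)
    frac = pos - lo
    return model_sorted[lo] + frac * (model_sorted[hi] - model_sorted[lo])


# Illustrative example: a model whose ENSO amplitude is twice the observed one.
obs_index = [-2.0, -1.0, 0.0, 1.0, 2.0]
model_index = [-4.0, -2.0, 0.0, 2.0, 4.0]
mapped = percentile_map(1.0, obs_index, model_index)
```

In this toy example, an observed index value of 1.0 corresponds to a model value of about 2.0, i.e., the same rank within a distribution of twice the amplitude.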

As expected, the explanatory power of the GMT covariate for the location parameter is high for absolute temperature anomalies, but very low for temperature jumps, which reflect temperature variability rather than long-term warming. The ENSO index impacts the likelihood of anomalies and jumps, since internal variability usually makes an important contribution to both.

Unlike most attribution studies of heat extremes57, we also account for non-stationarity in the scale parameter by incorporating GMT as a covariate. This allows us to capture potential changes in internal variability under higher warming levels, for example, due to land-atmosphere feedbacks and other non-linear processes that might intensify heat extremes under climate change58. We specify the GMT-dependent scale parameter as:

$$\sigma ={\gamma }_{0}+{\gamma }_{1}\times {{{\rm{GMT}}}}$$
(2)

From the fitted conditional distribution, we extract year-by-year residuals to quantify the deviation of individual events from statistical expectation. For a given year, the residual is the difference between the observed temperature (or temperature jump) and the mean of the GEV conditional on GMT and NINO3.4 of that year. Each residual, therefore, represents how much warmer or colder the year was relative to what was expected given the state of the two covariates. To ease comparison across different GWLs, we then standardize the residuals by their GMT-dependent scale and re-project them onto the scale corresponding to the 2023 GWL. Similarly, all distributions shown in plots display residuals (i.e., they are centered at 0) with the scale parameter of 2023. To facilitate a consistent analysis across models and ensure that attribution results are not biased by different warming rates between climate models, we identify the year when each model reaches the observed 2023 GWL. The model’s calendar is then shifted so that this year corresponds to 2023, ensuring all models share the same observed GWL in 2023.
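
The residual calculation can be sketched as follows, assuming the standard GEV parameterization in which a positive shape parameter ξ denotes a heavy upper tail and the mean equals μ + σ(Γ(1 − ξ) − 1)/ξ for ξ < 1. All function names and parameter values are hypothetical and are not fitted estimates.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant (Gumbel mean)


def conditional_params(gmt, nino, beta, gamma):
    """Location (Eq. 1) and scale (Eq. 2) for given covariate values."""
    mu = beta[0] + beta[1] * gmt + beta[2] * nino
    sigma = gamma[0] + gamma[1] * gmt
    return mu, sigma


def gev_mean(mu, sigma, xi):
    """Mean of a GEV distribution (defined for xi < 1)."""
    if abs(xi) < 1e-12:
        return mu + sigma * EULER_GAMMA
    if xi >= 1.0:
        raise ValueError("GEV mean is undefined for xi >= 1")
    return mu + sigma * (math.gamma(1.0 - xi) - 1.0) / xi


def residual(obs, gmt, nino, beta, gamma, xi):
    """Observed value minus the conditional GEV mean of that year."""
    mu, sigma = conditional_params(gmt, nino, beta, gamma)
    return obs - gev_mean(mu, sigma, xi)


def reprojected_residual(obs, gmt, nino, gmt_ref, beta, gamma, xi):
    """Residual standardized by its own scale, re-expressed at the reference GWL."""
    _, sigma = conditional_params(gmt, nino, beta, gamma)
    _, sigma_ref = conditional_params(gmt_ref, nino, beta, gamma)
    return residual(obs, gmt, nino, beta, gamma, xi) / sigma * sigma_ref


# Hypothetical parameter values, for illustration only.
beta, gamma, xi = (0.0, 1.0, 0.1), (0.1, 0.1), -0.2
raw = residual(0.5, 0.2, 0.5, beta, gamma, xi)
rep = reprojected_residual(0.5, 0.2, 0.5, 1.2, beta, gamma, xi)
```

In this example the scale grows from 0.12 at a GMT of 0.2 °C to 0.22 at the reference level of 1.2 °C, so the re-projected residual is the raw residual multiplied by 0.22/0.12.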

Uncertainty estimation in the probabilistic attribution framework

We estimate distribution parameters using a Markov Chain Monte Carlo (MCMC) sampler, drawing 25,000 samples from the converged posterior distribution, and quantify uncertainty using the posterior median and the 2.5th–97.5th percentiles, following the methodology of Hauser et al. (2017)59, as implemented in the accompanying Python package60. In rare instances, a small offset was added to the GMT covariate (≤ 0.1 °C) to obtain a valid initial fit; whenever this was not successful, the respective simulation was omitted. Where multiple ensemble members are available, we pool simulations to obtain more robust parameters. When assessing temperature jumps at different warming levels (Fig. 4b), however, we fit parameters to each CMIP6 simulation individually. This allows us to evaluate the robustness of the scale parameter’s response to GMT changes across individual model runs and to better understand the uncertainty in observation-based non-stationary scale estimates, since the length of observational records is far more comparable to an individual model run than to pooled simulations. Probability estimates across the CMIP6 ensemble are then obtained by taking the median of the individual model probabilities.

We fit each distribution excluding the 2023 event, typically using 173 years of data (1850–2022), except for probability estimates of future temperature extremes, for which data until 2100 are included. To account for the increasing uncertainty in estimating the far tails of probability distributions, probability estimates are truncated at \({10}^{-10}\); values below this threshold cannot be distinguished with confidence.
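
For illustration, the exceedance probability under the non-stationary GEV of Eqs. (1) and (2), including the truncation described above, could be evaluated as in the following sketch. It assumes the standard sign convention for the shape parameter ξ (positive values denote a heavy upper tail); the function name and all parameter values are hypothetical.

```python
import math


def gev_exceedance(x, gmt, nino, beta, gamma, xi, floor=1e-10):
    """P(X > x) for a GEV with covariate-dependent location and scale.

    Probabilities below `floor` are reported as `floor`, mirroring the
    truncation of far-tail estimates.
    """
    mu = beta[0] + beta[1] * gmt + beta[2] * nino  # Eq. (1)
    sigma = gamma[0] + gamma[1] * gmt              # Eq. (2)
    if sigma <= 0.0:
        raise ValueError("scale parameter must be positive")
    z = (x - mu) / sigma
    if abs(xi) < 1e-12:
        s = math.exp(-z)  # Gumbel limit
    else:
        t = 1.0 + xi * z
        if t <= 0.0:
            # Outside the support: below the lower bound (xi > 0) the value
            # is always exceeded; above the upper bound (xi < 0) it never is.
            return 1.0 if xi > 0 else floor
        s = t ** (-1.0 / xi)
    p = -math.expm1(-s)  # equals 1 - exp(-s), accurate for very small s
    return max(p, floor)


# Hypothetical parameters, for illustration only.
beta, gamma, xi = (0.0, 1.0, 0.1), (0.1, 0.02), -0.2
p_neutral = gev_exceedance(1.3, gmt=1.2, nino=0.0, beta=beta, gamma=gamma, xi=xi)
p_el_nino = gev_exceedance(1.3, gmt=1.2, nino=1.5, beta=beta, gamma=gamma, xi=xi)
p_bounded = gev_exceedance(3.0, gmt=1.2, nino=0.0, beta=beta, gamma=gamma, xi=xi)
```

A negative shape parameter implies a bounded upper tail, so sufficiently large values receive the floor probability rather than zero.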

We note that ideally, the GEV distribution is applied to block maxima, such as annual maximum temperatures. However, extracting multi-year block maxima from the relatively short observational time series would yield too few data points for a meaningful statistical fit. We therefore apply the GEV distribution directly to the full time series of temperature anomalies based on the good empirical fit with the observed September data. To further assess the sensitivity of probability estimates to the choice of distribution, we additionally fit the temperature data to (i) a Gaussian distribution with non-stationary location and scale parameters, and (ii) a Generalized Pareto Distribution (GPD). The GPD is fitted to the highest 20% of residuals from a multiple linear regression of the temperature anomalies (jumps) on a GMT time series and NINO3.4 anomalies (year-to-year changes) to estimate the probability of absolute temperature anomalies (jumps). Since GPD estimates are conditional on exceeding the 80th percentile threshold, the resulting probabilities are multiplied by 0.2 to obtain unconditional values, allowing comparison with those from the GEV and Gaussian fits. Both the GPD and Gaussian distribution fits yield qualitatively similar results to the GEV-based likelihood estimates.
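
The conversion from conditional GPD probabilities to unconditional ones can be written out as a short sketch. The function name and the parameter values are hypothetical; `excess` denotes the amount by which a residual exceeds the 80th-percentile threshold.

```python
import math


def gpd_unconditional_exceedance(excess, sigma, xi, tail_fraction=0.2):
    """Unconditional exceedance probability from a GPD fitted to the upper tail.

    The GPD survival function gives P(Y > excess | above threshold); multiplying
    by the share of data above the threshold (0.2 for the 80th percentile)
    yields the unconditional probability.
    """
    if excess < 0.0:
        raise ValueError("excess must be non-negative (value above threshold)")
    if abs(xi) < 1e-12:
        conditional = math.exp(-excess / sigma)  # exponential limit
    else:
        t = 1.0 + xi * excess / sigma
        # t <= 0 occurs only for xi < 0, beyond the upper bound of the support
        conditional = t ** (-1.0 / xi) if t > 0.0 else 0.0
    return tail_fraction * conditional


# Illustrative values only.
p_at_threshold = gpd_unconditional_exceedance(0.0, sigma=0.1, xi=-0.1)
p_half = gpd_unconditional_exceedance(0.1 * math.log(2.0), sigma=0.1, xi=0.0)
```

At the threshold itself the unconditional probability equals 0.2 by construction, which provides a simple consistency check.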

Process-based analysis

We set up a multiple linear regression to model temperature fluctuations for each grid cell:

$${T}^{{\prime} }={\beta }_{0}+{\beta }_{1}\times {{{\rm{tcwv}}}}^{{\prime} }+{\beta }_{2}\times {{{\rm{SW}}}}{\downarrow }^{{\prime} }+{\beta }_{3}\times {{{\rm{NINO3.4}}}}^{{\prime} }+{\beta }_{4}\times {{{\rm{sm}}}}^{{\prime} }$$
(3)

where T represents near-surface temperature, tcwv is total column water vapour, SW refers to downwelling solar radiation, NINO3.4 is the ENSO index, and sm denotes moisture in the upper 100 cm of the soil (excluding Antarctica, defined as land south of 60°S). All variables are expressed as short-term fluctuations with respect to the mean in a changing climate (indicated by \({\prime}\)), computed by applying a 31-year LOWESS filter to each variable and calculating the deviation from the smoothed time series. This means that all predictors and the predictand represent deviations from respective long-term averages. The analysis is based on 43 years of reanalysis data (1980–2022). For climate models, we use the 43 years preceding the year in which the observed GWL of 2023 is reached. For consistent treatment of observational and model data, GWLs are calculated in both cases using 5-year non-centered rolling means of global temperature from GISTEMP. To evaluate multicollinearity among predictors, we computed variance inflation factors (VIFs), which indicate the degree of linear dependence among the predictors61. VIFs remain below 5 in most regions (Supplementary Fig. S11), indicating that multicollinearity does not critically distort the regression coefficients62, although predictors are inherently coupled within the Earth system.
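
As an illustration of the multicollinearity check, VIFs can be computed by regressing each predictor on all others and evaluating 1/(1 − R²). The sketch below uses synthetic data and a hypothetical function name; it is not the code used to produce Supplementary Fig. S11.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF for each column of X (rows: years, columns: predictors)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        y = X[:, j]
        # Regress predictor j on an intercept plus all remaining predictors
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1.0 - resid.var() / y.var()
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs


# Synthetic example: independent predictors versus one nearly collinear pair.
rng = np.random.default_rng(0)
independent = rng.normal(size=(500, 4))
vif_independent = variance_inflation_factors(independent)
collinear = independent.copy()
collinear[:, 3] = collinear[:, 0] + 0.1 * rng.normal(size=500)
vif_collinear = variance_inflation_factors(collinear)
```

Independent predictors yield VIFs close to 1, whereas the nearly collinear pair produces values far above the threshold of 5.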

Based on the regression model, we can quantify the contribution of individual drivers to a specific temperature jump by multiplying the respective regression coefficients and the observed year-to-year difference in each driver. To evaluate consistency between reanalysis data and climate projections, we compute the contributions to the September 2023 temperature jump for each driver and compare these to extreme temperature jumps simulated in CMIP6 models at the same GWL as observed in 2023 (±0.5 °C). We analyse the six CMIP6 models with more than 10 ensemble members available, enabling sufficient sampling of jumps with magnitudes comparable to the 2023 event. For each model, we select the temperature jumps exceeding the 99th percentile, which yields 40 events across 141 simulations. Different thresholds (e.g., the maximum jump per model or those exceeding the 95th percentile) yield qualitatively similar results.
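
A minimal sketch of the regression in Eq. (3) and of the resulting driver contributions for a single grid cell is shown below. The variable names, the synthetic anomalies, and the coefficient values are hypothetical and chosen only to make the arithmetic traceable.

```python
import numpy as np

DRIVERS = ("tcwv", "sw_down", "nino34", "sm")


def fit_grid_cell(anoms):
    """Ordinary least squares fit of Eq. (3) for one grid cell.

    `anoms` maps "t" and each driver name to a 1-D array of anomalies.
    """
    n = len(anoms["t"])
    X = np.column_stack([np.ones(n)] + [anoms[d] for d in DRIVERS])
    coef, *_ = np.linalg.lstsq(X, anoms["t"], rcond=None)
    return dict(zip(("intercept",) + DRIVERS, coef))


def jump_contributions(coefs, prev, now):
    """Contribution of each driver: coefficient times year-to-year change."""
    return {d: coefs[d] * (now[d] - prev[d]) for d in DRIVERS}


# Synthetic, noise-free 43-year example with known coefficients.
rng = np.random.default_rng(1)
true_beta = {"tcwv": 0.05, "sw_down": 0.02, "nino34": 0.10, "sm": -0.03}
anoms = {d: rng.normal(size=43) for d in DRIVERS}
anoms["t"] = sum(true_beta[d] * anoms[d] for d in DRIVERS)
coefs = fit_grid_cell(anoms)

prev = {d: 0.0 for d in DRIVERS}
now = {"tcwv": 1.0, "sw_down": 2.0, "nino34": 1.5, "sm": -0.5}
contrib = jump_contributions(coefs, prev, now)
total_explained = sum(contrib.values())
```

The difference between an observed jump and `total_explained` would correspond to the part of the jump that the four drivers do not account for.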