Introduction

Hail (frozen hydrometeors originating from convection exceeding 5 mm in any dimension) is a common phenomenon found in many regions of the world1. Most frequent across the Great Plains of North America, damaging hail occurs an average of 158 days per year in the United States2, which translates to roughly $10B of insured losses annually before accounting for agricultural losses. Losses due to hail and other severe convective storms have quintupled in the United States since 20083, raising questions from scientists, industry, and members of the public about the roles of a warming climate and changing socioeconomic exposure4. Despite this interest, the potential impacts of anthropogenic climate change on hailstorm frequency and intensity are relatively unknown, owing to the coarse horizontal grid spacing (≈100 km) of current general circulation models (GCMs) relative to the scale of hail-producing thunderstorms (≈1–10 km). Understanding how hailstorms may change in the future is critical for estimating societal impacts for a variety of stakeholders.

Most research focusing on future hail potential has used an environmental proxy approach from GCMs, whereby an ingredients-based approach5 is used to assess hail occurrence in future climate regimes (i.e., the likelihood of future hail events is implicitly determined by changes in the fundamental variables known to be associated with hailstorms). The consensus from this work suggests that a warming climate will increase the number of days with favorable severe convective storm ingredients, driven primarily by updraft strength associated with increases in convective instability6,7,8,9. While these works are generalizable to severe convective storms broadly, they lack the ability to project changes in hail characteristics specifically due to the large variability in atmospheric environments that support hailstorms10,11,12,13,14. A notable exception to this generality was a study15 using environmental profiles from three regional climate models (RCMs) as input to a 1-D hail growth model16 to predict maximum surface hailstone sizes over North America during the 2041–2070 study period. A decrease in the frequency of small hail and an increase in the frequency of large hail were noted; the former was due to a greater melting depth, and the latter to increases in updraft strength. While these approaches are informative, they lack the ability to explicitly simulate hail-producing storms and their microphysical characteristics in future climates. Simply, hailstorm-supportive environments are an important, but not sufficient, consideration for hailstorm occurrence.

As an alternative to the implicit environmental approaches, a few recent studies have explored the use of a pseudo-global warming (PGW) approach as a means to explicitly simulate the impact of anthropogenic climate change on severe convective storms17,18,19. The PGW method applies a mean thermodynamic (and occasionally kinematic) delta from a suite of GCM output to the initial and lateral boundary state fields of reanalysis cases that are then fed into a convection-permitting numerical model. The most recent of these works20 analyzed historical hailstorm cases under control and PGW-modified conditions, finding stronger updrafts for all cases, but hailstone size distributions dependent on season and location in the atmospheric vertical profile (i.e., surface versus aloft). Emblematic of many PGW case studies, this work is also limited by a small sample size of seven events. While not comprehensive, these works do offer insight into a process-based understanding of potential changes and are relatively inexpensive from a computational perspective.

In the last decade, research into how climate change is affecting severe convective storms has increasingly employed a dynamical downscaling approach21,22. Dynamical downscaling here refers to the use of relatively coarse resolution model output incapable of resolving thunderstorms (e.g., GCM output fields) as initial and lateral boundary conditions to force convection-permitting (i.e., model horizontal grid spacing Δx ≤ 4 km23) simulations capable of explicitly examining meso-γ processes associated with hailstorms. This is similar to the limited area model approach used in day-to-day operational numerical weather prediction. This methodology has proven valuable for the assessment of historical and future convective storm populations24,25,26,27,28,29,30,31,32,33, but is computationally expensive (both in terms of processing and storage requirements) due to the horizontal grid spacing required to simulate without a cumulus parameterization scheme34. Overall, these works suggest notable changes in the frequency, spatial location, intensity, and variability of severe convective storms by the end of the 21st century that are seasonally and emissions pathway-dependent. Despite these valuable insights, most research using dynamical downscaling has focused on the aggregate response of severe convective storms to anthropogenic forcing and not specifically on the hazard of hail. In fact, a recent Nature Reviews article35 focusing on the effects of climate change on hailstorms suggests that, “more high-resolution numerical model simulations are required to better resolve hailstorm processes and to investigate expected future changes around the globe.”

To our knowledge, only two studies have explicitly investigated the future of hail across the U.S. using a dynamical downscaling framework at the thunderstorm scale. The first was a simulation using 1-km horizontal grid spacing over a limited domain covering Colorado during Jun–Aug and showed an increase in hail mass aloft but an increased melting level, which eliminated nearly all hail at the ground36. This work was limited by domain size and a select handful of event days during the warm season and is, thus, unable to characterize broad changes in hailstone populations. The other, and most parallel to the work herein, used dynamical downscaling of the Geophysical Fluid Dynamics Laboratory Climate Model version 3 output to determine future changes in hail size frequency31. This work examined hail occurrence by using simulated hourly maximum column-integrated graupel (GRPLmax; kg m−2), which represents a vertically integrated mixing ratio of the rimed ice category (hail occurrence was derived ad hoc from limited auxiliary simulations using correlation to HAILCAST37). Broad spatial increases in GRPLmax values correlated to hail ≥3.5 cm in diameter were found over the U.S. for all seasons, whereas increases in GRPLmax correlated to ≥5 cm diameter hail were confined to the central U.S. during boreal spring and summer. While correlation does exist with surface hail size, the column-integrated nature of GRPLmax introduces notable caveats when trying to capture surface hail size for a wide spectrum of environmental regimes, mainly related to the influences of melting.

We approach the climate change and hail question by evaluating convection-permitting (Δx = 3.75 km) dynamically downscaled regional climate simulations (WRF-BCC; see Methods)38 across the U.S. using GCM output from the National Center for Atmospheric Research (NCAR) Community Earth System Model39. We present explicit hail size and frequency projections calculated from the model’s inline spectral bulk microphysics scheme for historical (HIST; 1990–2005), mid-century (MID; 2040–2055), and end of century (END; 2085–2100) experiments. This work marks the first comprehensive investigation of explicitly simulated hail size and frequency using both intermediate and pessimistic anthropogenic emissions pathways.

Results

Comparison of hail day climatologies

We begin with a comparison of mean annual and seasonal WRF-BCC near-surface severe hail days from the HIST epoch with two independent hail climatologies derived from GridRad (Doppler radar) and observed reports (Supplementary Fig. A1). While a one-to-one comparison should be performed with caution due to the differences in approaches and thresholds (see Methods), the purpose of this analysis was to gauge whether WRF-BCC could replicate the aggregate climatological attributes of severe hail. Overall, WRF-BCC reasonably reproduces the frequency, magnitude, timing, and location of severe hail days, though not without bias. Mean annual severe hail days from HIST (Supplementary Fig. A1a) suggest the highest frequencies in the central High Plains, which is corroborated by GridRad (Supplementary Fig. A1b) and observed reports (Supplementary Fig. A1c). Two notable areas of bias were found. The first was located in the northern High Plains, where WRF-BCC simulated a higher number of JJA severe hail days than both GridRad and reports. The second was a low bias in most areas east of the Mississippi River, a majority of which occurred in MAM. These biases could be reduced by using regionally dependent diameter thresholds, but such a topic is extraneous to this discussion (e.g., the WRF-BCC high bias in the northern High Plains may be due to a lack of observed reports). More importantly, WRF-BCC is able to simulate the known seasonal progression of severe hail day frequency climatology (Supplementary Fig. A1d–g) as shown by reports (Supplementary Fig. A1l–o), with differences comparable to GridRad (Supplementary Fig. A1h–k).

Projections of hail size

Daily-maximum (1200–1200 UTC) values of near-surface and column-maximum hail diameters were extracted at each native WRF-BCC grid point in the analysis domain and binned in 0.5 cm increments for each climate epoch to analyze size distributions (Fig. 1). Statistically significant broad decreases were found for near-surface maximum hail diameters between 1.5 and 3.5 cm for all future epochs, whereas for sizes ≥4.5 cm, increases were noted (Fig. 1a). This inflection point between 4 and 4.5 cm suggests a varying response of near-surface hail diameter to climate change—one that favors increases in the largest hail sizes and a more sweeping decrease in frequency at smaller diameters. Relative changes for each projected epoch were sensitive to the emissions pathway and the selected time horizon (e.g., the largest frequency decreases and increases relative to HIST were in the END85 experiment).

Fig. 1: Projected hailstone size changes.

Binned a near-surface and b column max hailstone diameter frequency for each WRF-BCC future epoch expressed as percent change relative to HIST. Statistically significant differences are displayed along the abscissa for each bin (i.e., horizontal lines corresponding to each experiment).

Column-maximum hail size distribution changes were negligible for diameters ≤3.5 cm, but a robust increase (statistically significant at all sizes ≥4 cm for all epochs) occurred with the largest WRF-BCC hailstones (Fig. 1b). This includes a >25% increase in the frequency of ≥4.5 cm diameter hailstones for all future epochs and a >75% increase in the frequency of ≥5 cm diameter hailstones for the END experiments relative to HIST.
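The binning and relative-change calculation described in this section can be illustrated with a minimal sketch. It assumes the daily-maximum diameters for each epoch have already been flattened into 1-D arrays in centimeters and that raw bin counts are directly comparable because each epoch spans the same number of years; the function name and the 7.5 cm upper edge (the model's largest hail bin) are illustrative choices, not the processing code used for Fig. 1.

```python
import numpy as np

def binned_percent_change(hist_diams_cm, future_diams_cm, width=0.5, max_cm=7.5):
    """Percent change in per-bin counts of a future epoch relative to HIST."""
    # Bin edges at 0.0, 0.5, ..., 7.5 cm (0.5 cm increments, as in the text).
    edges = np.arange(0.0, max_cm + width, width)
    hist_counts, _ = np.histogram(hist_diams_cm, bins=edges)
    fut_counts, _ = np.histogram(future_diams_cm, bins=edges)
    # Bins with no HIST occurrences are left undefined (NaN) instead of dividing by zero.
    with np.errstate(divide="ignore", invalid="ignore"):
        pct = np.where(hist_counts > 0,
                       100.0 * (fut_counts - hist_counts) / hist_counts,
                       np.nan)
    return edges, pct
```

A negative value in a bin indicates fewer daily maxima of that size than in HIST, and a positive value indicates more.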

Spatial changes

Widespread decreases of 2–4 days per year in near-surface severe hail days are projected across a majority of the Great Plains (Fig. 2a–e). These changes are emissions pathway-dependent, with a greater magnitude of decrease found in the RCP 8.5 experiments. East of the Mississippi River, projected changes in near-surface severe hail days were marginally increasing for most regions, though not in a statistically significant manner (Supplementary Fig. A2). Looking at severe hail days through the lens of column-maximum hail diameter shows quite different results, with robust increases of ≥5 days over the Midwest, Ohio Valley, and Northeast U.S. for all experiments (Fig. 2f–j). Decreases in the southern Great Plains and along the Gulf Coast were spatially similar for all future experiments, but were most prevalent in RCP 8.5 simulations. Broadly speaking, the spatial projections in column-maximum severe hail days, perhaps unsurprisingly, mirror those found in previous work examining thunderstorm days from WRF-BCC32.

Fig. 2: Projected changes in mean annual severe hail days.

Mean annual HIST a near-surface (top row) and f column-maximum (bottom row) severe hail days. b–e and g–j represent the change in mean annual days for each experiment from the respective HIST values. Statistical significance and percent relative change are shown in Supplementary Fig. A2.

Seasonal distributions of near-surface and column-maximum severe hail days were also analyzed (Figs. 3 and 4, respectively). For near-surface severe hail days, DJF was characterized broadly by decreases in portions of Texas and Oklahoma and mixed potential for change along the eastern and northeastern gradient of the maximum frequency shown in HIST. The largest decreases (and increases) were found in the spring and summer, with MAM showing more variability in the projections dependent on emissions pathway and time period. JJA exhibited the most significant across-epoch decreases in mean near-surface severe hail days, with the greatest change projected in the southern, central, and northern High Plains. All four experiments, but especially MID85, also portrayed increases in JJA hail days for a majority of locations east of the Mississippi River. Changes were less notable in SON, but all experiments recorded a spatial mean decrease of <0.5 days.

Fig. 3: Projected changes in seasonal mean severe hail days.

HIST mean annual frequency of near-surface severe hail days for a DJF, b MAM, c JJA, and d SON. e–h, i–l, m–p, and q–t represent the change in mean annual near-surface severe hail days from the HIST period for the MID45, END45, MID85, and END85 experiments, respectively.

For projections of mean seasonal column-maximum severe hail days, DJF exhibited spatial changes similar (albeit with different magnitudes) to those for near-surface severe hail days (Fig. 4e, i, m, q). The greatest decreases were, again, found across the southern Great Plains, with changes similar for both RCPs and time epochs. The largest increases in DJF column-maximum severe hail days were found across the eastern third of the CONUS, especially for both END experiments. Changes in MAM days were more simulation-dependent, with increases projected in mean days for most areas in the Midwest and Mid-South (Fig. 4f, j, n, r). Simulated changes in the summer displayed a more latitudinal dipole structure, with decreases in mean JJA days across the Gulf Coast and southern United States, and the most consistent regional increases across the northeast United States. However, JJA did have the largest inter-experiment variability, which is showcased by focusing on individual states (Fig. 4g, k, o, s). Changes across the analysis domain for SON column-maximum severe hail days revealed an increasing pattern ranging from a spatial mean of three more days for MID45 to seven more days for END85 (Fig. 4h, l, p, t). As with the projected annual changes, the seasonal breakdown resembles that found for thunderstorm days from previous research using WRF-BCC32.

Fig. 4

As in Fig. 3, but for mean annual column max severe hail days.

Mechanisms for change

Explanations of the projected changes in hail size from WRF-BCC were explored by analyzing the underlying environmental mechanisms derived from the parent GCM. While we acknowledge that specific storm-scale details (e.g., updraft width, mesocyclone strength40) are important for determining maximum hail size, this discussion is focused on the broader environmental projections of melting level, 700–500 hPa lapse rate, parcel water vapor mixing ratio, deep-layer wind shear, and a composite of favorable atmospheric ingredients for significant hail (i.e., the significant hail parameter; SHIP).

Melting level has increased since 197941 and is projected to further increase across the domain regardless of the experiment (Supplementary Fig. A3). By MID, both emissions pathways suggest an annual mean increase of ≈300 m. Divergence in melting level height projections occurs toward the end of MID, with RCP 8.5 simulating an additional mean increase of 450 m by END, whereas additional increases in the RCP 4.5 scenario were <150 m. For near-surface hail size and occurrence, any such increases in melting level height are likely to have the greatest influence on smaller hailstones prone to significant melting during their fall trajectories. We posit that a majority of the future decreases found in WRF-BCC smaller near-surface hail diameters (i.e., ≤4 cm) are due to increases in melting level height (Fig. 1a). This hypothesis is strongly supported by a 25-year sample of hailpad observations from southern France42 and a 29-year sample of hailpad observations from northeast Italy43.

Another important aspect of vertical thermodynamic structure for hailstorms is the mid-level lapse rate (Γ), measured here between 700 and 500 hPa (°C km−1), which serves as a proxy for layer static stability. Lapse rates that approach dry adiabatic (i.e., dθ/dz ≈ 0) in this layer typically suggest the presence of an elevated mixed-layer (EML), which has been trending stronger and more frequent across the High Plains in recent decades44. This has opposing implications for hailstorms, as steeper lapse rates favor large hail, but air that is too warm and dry at the base of the EML can inhibit convection initiation and sustenance (i.e., convective inhibition; CIN). Annual mean days with steep (i.e., ≥7.5 °C km−1) mid-level lapse rates are projected to increase by MID across a large swath of the central United States (Supplementary Fig. A4a–e). The projected increase in days has greater dependence on the emissions pathway than on the time horizon. The Southeast, Mid-Atlantic, and Northeast U.S. portrayed either no change or slight decreases in steep lapse rate days depending on time horizon and emissions pathway. While regional change was evident, the domain-averaged annual mean mid-level lapse rate shows only a slight decreasing trend (Supplementary Fig. A4f). A mean domain-averaged increase of 3 J kg−1 of the most unstable (MU) CIN is projected by END (not shown).
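For readers less familiar with the quantity, the 700–500 hPa lapse rate and the steep-day criterion can be sketched as follows. The function and argument names are illustrative, and the inputs (temperatures in °C and geopotential heights in meters at the two pressure levels) are an assumed layout, not the GCM processing used here.

```python
def midlevel_lapse_rate(t700_c, t500_c, z700_m, z500_m):
    """700-500 hPa lapse rate in deg C per km (positive when temperature falls with height)."""
    depth_km = (z500_m - z700_m) / 1000.0
    return -(t500_c - t700_c) / depth_km

def is_steep(gamma_c_per_km, threshold=7.5):
    """True when the lapse rate meets the >= 7.5 deg C per km criterion from the text."""
    return gamma_c_per_km >= threshold
```

With a hypothetical profile of 8 °C at 3100 m (700 hPa) and −14 °C at 5800 m (500 hPa), the layer cools by 22 °C over 2.7 km, which exceeds the steep threshold.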

Next, we analyze water vapor mixing ratio (w; g kg−1), which is the primary source of energy (via latent heat release) for hailstorms. We examine w for a MU parcel as this measurement is also used in the calculation of SHIP. As expected due to increasing temperatures, widespread increases in w are noted, with both emissions scenarios averaging close to a 0.6 g kg−1 increase by MID (Supplementary Fig. A5a–e). Divergence also occurs for w projections by the end of the MID period, with nearly a 2 g kg−1 increase expected by the END period relative to HIST for the RCP 8.5 pathway (Supplementary Fig. A5f). One area of interest is in the central High Plains, where projected increases in w display a relative minimum. This suggests rising aridity given the expected increases in local temperatures, which is a relatively well-studied aspect of U.S. regional climate45,46,47. A related analysis (not shown) to diagnose changes in potential energy for convection using MU convective available potential energy showed similar results to w.

Sufficient deep-layer wind shear is necessary for organized convection to form, favoring morphologies that promote hail formation48. We examine this important ingredient by using the HIST climatology and future projections of days with deep-layer (0–6 km) bulk wind shear ≥18 m s−1 following previous work49. These can be thought of as adequate deep-layer wind shear days for organized convection, such as supercells33. In general, a median grid point decrease between 10 and 15 days was projected for MID45, MID85, and END45 (Supplementary Fig. A6a–d). END85 was a notable departure from the other three experiments, with a median decrease across the domain of ≈21 days (Supplementary Fig. A6e). MID85 was unique in that a more regional increase in days was found across the western Great Lakes and mid-upper Mississippi Valley (Supplementary Fig. A6c). The greater inter-experiment variability for deep-layer bulk shear is not surprising given its sensitivity to the synoptic circulation patterns. Overall, the broader picture foretells a reduction in mean annual deep-layer wind shear across the domain by 1–2 m s−1 by end of century (Supplementary Fig. A6f).
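A minimal sketch of the shear-day count follows, assuming bulk shear is taken as the magnitude of the vector wind difference between the surface and 6 km and that one representative value per day is available. How that daily value is chosen is not specified above, so it is left to the caller; all names are illustrative.

```python
import math

def bulk_shear(u_sfc, v_sfc, u_6km, v_6km):
    """Magnitude (m/s) of the vector wind difference across the 0-6 km layer."""
    return math.hypot(u_6km - u_sfc, v_6km - v_sfc)

def count_shear_days(daily_shear, threshold=18.0):
    """Number of days whose 0-6 km bulk shear meets the >= 18 m/s criterion."""
    return sum(1 for s in daily_shear if s >= threshold)
```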

We conclude by examining the significant hail parameter (SHIP)50, a dimensionless composite parameter that measures the juxtaposition of hail-favorable environmental ingredients51,52. A HIST climatology of SHIP ≥1 days (value of 1 is ≈ the HIST 90th percentile) shows the well-known distribution (and biases) of significant hail environments across the U.S. (Fig. 5a). Projected changes in SHIP ≥1 days reveal the importance of examining all ingredients together (Fig. 5b–e). For example, increases in moisture, convective instability, and mid-level lapse rate are not sufficient to offset decreases in deep-layer vertical wind shear and increased melting levels in most areas across the High Plains. This leads to a regional 2–5 day decrease in SHIP ≥1 days in all four experiments. Conversely, in areas like the Mid-South, decreases in both mid-level lapse rate and deep-layer vertical wind shear are opposed by increases in water vapor mixing ratio and convective instability, leading to an increase in days. A seasonal breakdown (not shown) indicates that the largest decreases in SHIP ≥1 days occur during JJA, with the largest increases recorded in MAM. This result is consistent with expected summary changes in severe convective storm environments as determined by the implicit environmental approaches21,22,53,54,55,56 and helps explain the spatial distributions we find in WRF-BCC.

Fig. 5: Projected changes to severe hail environments.

WRF-BCC parent GCM representation of a HIST annual mean days with a SHIP value ≥1. b–e show the projected departure in days for each experiment. f is calculated using the domain-averaged annual accumulation of daily max SHIP. Years corresponding to WRF-BCC simulations are shaded in gray. Bold lines indicate the 10-year running mean.

Discussion

We find decreases in near-surface severe hail days over large swaths of the Great Plains, with most of the change projected during summer. These decreases were most notable for the end-of-century simulation under the most aggressive emissions pathway (RCP 8.5). Spatial changes in column-maximum severe hail days showed a more latitudinal response, with decreases along the Gulf Coast and Texas, and increases in the Mid-South, Midwest, and Northeast.

Study limitations include the use of only one pair of GCM/RCM members and a rather limited number of simulation years for each epoch due to computational expense. Ideally, a large ensemble of simulations should be examined with varying initial and lateral boundary conditions from a suite of GCMs forcing a multi-physics convection-permitting ensemble of RCMs for 30-year experiment periods. Finally, more realistic hail size climatologies (especially at the largest sizes) could be achieved by incorporating a more sophisticated microphysics scheme or physically-based hail forecast model (e.g., HAILCAST37).

We conclude by summarizing the expected impacts of climate change on hailstorms from our experiments using a simple illustration (Fig. 6). A warmer future climate leads to an increased water vapor mixing ratio over most regions and subsequent energy for thunderstorm updrafts. Mid-level lapse rates steepen in the most hail-prone areas of the United States, a feature that favors stronger updrafts; however, this can also inhibit the development, as well as sustenance, of deep convection if the base of the EML becomes too warm. If deep convection does develop, mean updraft speeds are greater, and the largest stones are more frequent. As hailstones fall toward the surface, a greater depth of air >0 °C promotes increased melting and distributional shifts that favor larger hailstones. This has significant implications for impacts on lives and property. Finally, the authors remind readers that, regardless of the changes in hail size and frequency due to climatic shifts, we should expect to see continued increases in the societal impacts and losses associated with hailstorms due to increases in the footprint of the human-built environment4,57.

Fig. 6: Hailstorm changes.

Schematic summary of a current and b future hailstorms. Red and blue arrows indicate the thunderstorm’s updraft and downdraft, respectively. In b, a warmer climate leads to (1) increased vapor mixing ratio, which serves as energy for thunderstorms. This leads to stronger updrafts (2), on average, and more large hailstones aloft. As hailstones begin their downward trajectory, they encounter a melting level height (3) that is expected to increase by at least 500 m by the end of the century. These processes ultimately result in fewer small hailstones reaching the surface, with a favored distribution toward larger hail sizes. A thunderstorm producing 5 cm hail near Roswell, New Mexico on 7 June 2014 is shown in the background photograph.

Methods

RCM experiments

Convection-permitting (Δx = 3.75 km; 51 vertical η levels; no cumulus parameterization) historical and future RCM simulations were created using version 4.1.2 of the Advanced Research core of the Weather Research and Forecasting (WRF) Model58. Initial and lateral boundary conditions of atmospheric and oceanic state variables for WRF originate from 6-hourly interval output from the Community Earth System Model (CESM59) provided by NCAR that participated as a GCM in phase 5 of the Coupled Model Intercomparison Project (CMIP560). Herein, we use a version of these GCM data that are regridded and bias-corrected using 1981–2005 ERA-Interim reanalysis61 following previous work62. Bias correction of GCM data prior to input into an RCM improves overall simulation performance, especially for extremes63,64,65,66, and represented a critical element in the authors’ choice of GCM data for input into the RCM.

RCM output were generated for one historical (HIST; 1990–2005), two mid-century (MID; 2040–2055), and two end of century (END; 2085–2100) 15-year periods for a domain encompassing the conterminous U.S. For the MID and END epochs, Representative Concentration Pathway (RCP67) 8.5 and 4.5 were chosen to represent a pessimistic and intermediate anthropogenic emissions scenario, respectively. For each experiment epoch, 15 simulations (representing each year in the respective period, for a total of 75 simulations) were continuously integrated across an entire hydrologic year (1 Oct–30 Sep), with a new initialization occurring every 1 Oct. This represents a middle ground approach between more frequent (i.e., daily) reinitialization28,31 and a continuous integration of the convection-permitting RCM over the 15-year period29 from previous work. In general, less frequent initialization is preferred in RCM applications to recreate conditions that require hydrologic memory68,69 due to their development over multiple days or months70. Our approach aimed to balance this consideration with the equally important issue of computational resources/efficiency (e.g., processing multiple years from a period in parallel). We direct readers to previous work that contains additional details of the RCM configuration, including verification results of the HIST period against observations38. These simulations are generically referred to as WRF-BCC (WRF-Bias-Corrected CESM) throughout this study. The five simulation experiments are labeled as HIST, MID45, MID85, END45, and END85.
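The experiment design can be enumerated in a short sketch. It assumes each epoch's first run initializes on 1 Oct of the epoch's first year and its last run ends on 30 Sep of the epoch's final year (for example, 1 Oct 1990 through 30 Sep 2005 for HIST); the dictionary layout and function name are illustrative only.

```python
from datetime import date

# Epoch bounds and experiment labels as given in the text.
EPOCHS = {"HIST": (1990, 2005), "MID": (2040, 2055), "END": (2085, 2100)}
EXPERIMENTS = {
    "HIST": ("HIST", None),
    "MID45": ("MID", "RCP4.5"),
    "MID85": ("MID", "RCP8.5"),
    "END45": ("END", "RCP4.5"),
    "END85": ("END", "RCP8.5"),
}

def hydrologic_years(start_year, end_year):
    """(initialization, end) dates for each 1 Oct-30 Sep run in an epoch."""
    # Assumption: the first run starts on 1 Oct of start_year.
    return [(date(y, 10, 1), date(y + 1, 9, 30)) for y in range(start_year, end_year)]

# One entry per simulation: 5 experiments x 15 hydrologic years.
runs = [
    (name, init, end)
    for name, (epoch, _rcp) in EXPERIMENTS.items()
    for init, end in hydrologic_years(*EPOCHS[epoch])
]
```

Because each run is independent after its 1 Oct initialization, entries in `runs` could in principle be processed in parallel, which is the efficiency consideration noted above.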

Representation of hail in WRF-BCC

To assess changes in hail, we examined WRF-BCC output of the lowest model level maximum diameter hail diagnostic (hail_maxk1) and the column-maximum diameter hail diagnostic (hail_max2d) calculated at run-time from the Thompson microphysics scheme71,72. Hourly maximum values of hail_maxk1 and hail_max2d were captured at the top of each simulation hour for analysis, which represent the maximum value over the last hour of numerical integration (20 s interval). To ensure simulated hail values were originating from updrafts associated with convection, hail_maxk1 and hail_max2d were masked if the gridpoint did not also record a maximum vertical velocity value of at least 6 m s−1 over the same hour. We also restrict all analyses to areas east of the U.S. Rocky Mountains at WRF-BCC grid points with elevations ≤1830 m (≈6000 ft).
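The masking and daily aggregation steps can be sketched as below. This is a minimal illustration under stated assumptions: the arrays are hypothetical in-memory fields with shape (time, y, x), masked values are set to zero, the updraft criterion is treated as "at least" 6 m s−1, and the hourly series is assumed to begin at the first hour of a 1200–1200 UTC window. The geographic restriction to areas east of the Rocky Mountains is not represented.

```python
import numpy as np

W_MIN = 6.0        # m/s hourly-maximum updraft threshold from the text
ELEV_MAX = 1830.0  # m elevation limit from the text

def mask_hail(hail, w_max, elevation_m):
    """Zero out hail diameters lacking a convective updraft or on high terrain.

    hail, w_max: (time, y, x) hourly maxima; elevation_m: (y, x).
    """
    convective = w_max >= W_MIN               # assumption: threshold is inclusive
    low_terrain = elevation_m <= ELEV_MAX     # broadcasts over the time axis
    return np.where(convective & low_terrain, hail, 0.0)

def daily_max(hourly, hours_per_day=24):
    """Collapse hourly fields to daily maxima; trailing partial days are dropped."""
    n_days = hourly.shape[0] // hours_per_day
    trimmed = hourly[: n_days * hours_per_day]
    return trimmed.reshape(n_days, hours_per_day, *hourly.shape[1:]).max(axis=1)
```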

For this work, WRF simulated the maximum hail diameter using a modified gamma distribution of the Thompson scheme’s graupel class:

$$N(D)={N}_{o}{D}^{\mu }{e}^{-\lambda D} \tag{1}$$

where N(D) is the number concentration of hailstones of diameter D. The intercept (No) and slope (λ) are predicted by representation of the underlying physical mechanisms causing increases or decreases in the hailstone sizes (e.g., melting, accretion). The shape parameter is held constant (μ = 1) and a spherical hailstone shape is assumed. Due to the continuous and potentially infinite nature of the gamma equation, hailstone diameters for this version of WRF are constrained to 50 bins related to physically observable hail sizes73. The maximum hailstone diameter is set to the upper limit bin size where the number concentration of hailstones last reaches a prescribed minimum threshold of 5 × 10−4 m−3, which translates to roughly one hailstone of that size per 160 m2. This threshold is corroborated by in-situ, high-density observations of hail74. The largest of the 50 spectral bulk bin sizes in WRF is constrained to 7.5 cm, which intrinsically caps the maximum size of WRF-BCC hail diameters. Thus, it should not be expected that simulated maximum hail diameters approach the largest values occasionally observed (e.g., 10+ cm75). Future simulations of this type may benefit, especially with respect to the largest hailstone sizes, from a double-moment hail-class microphysics scheme or an inline 1-D physically-based hail size forecasting module (e.g., HAILCAST37), both of which were determined to be too computationally expensive during generation of WRF-BCC.
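The diagnostic logic described above can be illustrated with a simplified sketch. It is not the Thompson scheme code: the linear bin spacing, the use of N(D)ΔD as the per-bin concentration, and the function name are all assumptions made for illustration. Only the 50 bins, the 7.5 cm cap, μ = 1, and the 5 × 10−4 m−3 threshold come from the text.

```python
import math

N_BINS = 50      # number of diameter bins (from the text)
D_MAX_M = 0.075  # largest bin, 7.5 cm (from the text)
THRESH = 5e-4    # minimum number concentration, m^-3 (from the text)
MU = 1.0         # fixed shape parameter (from the text)

def max_hail_diameter(n0, lam, mu=MU):
    """Largest bin diameter (m) whose concentration reaches THRESH, else 0.0.

    n0: intercept parameter; lam: slope parameter (m^-1).
    Assumption: bins are linearly spaced up to D_MAX_M.
    """
    width = D_MAX_M / N_BINS
    # Scan from the largest bin downward and stop at the first one that qualifies.
    for k in range(N_BINS, 0, -1):
        d = k * width
        conc = n0 * d**mu * math.exp(-lam * d) * width  # N(D) * dD for this bin
        if conc >= THRESH:
            return d
    return 0.0
```

A smaller slope parameter (a flatter distribution) pushes the qualifying bin toward larger diameters, while no choice of parameters can return more than 7.5 cm, which is the intrinsic cap noted above.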

Comparison to observations

It is challenging to fully assess the ability of WRF-BCC to capture the observed climatology of hail in the U.S. due to known issues with the official U.S. hail-reporting database76 and caveats associated with remotely sensing hail from active and passive remote sensing platforms77,78. While a comprehensive verification—akin to that performed for WRF-BCC temperature and precipitation38—is particularly difficult for hail, we attempt this important exercise by comparing WRF-BCC hail_maxk1 to observed reports and Doppler-radar detections of hail.

For observed hail reports, we follow previous work79 and use an archive maintained by NOAA’s National Weather Service Storm Prediction Center. The database includes reports of “severe” hail, which is defined in the U.S. as hail with a 2.54 cm or larger maximum dimension. Doppler-radar detections of hail stem from GridRad v3.180, which is a high-resolution, 3-D gridded radar dataset covering much of the contiguous U.S. derived from the National Weather Service’s Next Generation Weather Radar (NEXRAD) system. Specifically, we use GridRad’s 95th percentile maximum estimated size of hail variable (MESH95).

1995–2017 (i.e., temporal coverage of GridRad v3.1) hail reports ≥2.54 cm (1 in), GridRad MESH95 detections calibrated to the “severe” threshold (≥6.33 cm; 2.49 in)78, and WRF-BCC hail_maxk1 ≥3.18 cm (1.25 in) from the HIST epoch were resampled in space and time to create gridded climatologies of 1200–1200 UTC hail days78 on a similar ≈1° × 1° latitude-longitude grid for comparison. We say similar grid because there are some slight differences in the WRF-BCC grid (both GridRad and reports were aggregated to an exact 1° × 1° latitude-longitude grid) due to its native 3.75 km horizontal grid spacing using a Lambert conformal conic projection. We used a 30× upscale regrid on the WRF-BCC output for comparison, resulting in a 112.5 km (30 × 3.75 km) grid spacing, which is slightly larger than 1° × 1° across our domain. We chose a 1° × 1° grid for this analysis due to previous work that compared GridRad MESH95 to hail reports78, but the overall results were not sensitive to grid size (only the expected changes to magnitude values). HIST hail_maxk1 ≥3.18 cm was chosen as a threshold for defining “severe” hail days from WRF-BCC because it demonstrated the lowest gridpoint-to-closest-gridpoint root-mean-square error of those thresholds tested (1.91, 2.54, 3.18, and 3.81 cm; 0.75, 1.0, 1.25, and 1.5 in) against severe (≥2.54 cm) hail days from reports. This is similar to the calibration analysis and threshold chosen for GridRad “severe” detections (MESH95 ≥6.33 cm) in previous work78.
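The upscaling and threshold selection can be sketched as follows. The sketch makes several simplifying assumptions that are not confirmed by the text: a coarse cell is counted as a hail day when any native grid point inside it meets the threshold, day counts are totals instead of annual means, and the report climatology is already on the same coarse grid (the analysis above instead pairs each grid point with its closest counterpart). Function names are illustrative.

```python
import numpy as np

def upscale_hail_days(daily_max_cm, threshold_cm, factor=30):
    """Count hail days per coarse cell from (day, y, x) native daily maxima."""
    n_day, ny, nx = daily_max_cm.shape
    ny_c, nx_c = ny // factor, nx // factor
    # Drop edge rows/columns that do not fill a complete factor x factor block.
    trimmed = daily_max_cm[:, : ny_c * factor, : nx_c * factor]
    blocks = trimmed.reshape(n_day, ny_c, factor, nx_c, factor)
    # Assumption: one qualifying native point makes the coarse cell a hail day.
    hit = (blocks >= threshold_cm).any(axis=(2, 4))
    return hit.sum(axis=0)

def best_threshold(daily_max_cm, report_days,
                   candidates=(1.91, 2.54, 3.18, 3.81), factor=30):
    """Return the candidate threshold with the lowest RMSE against report days."""
    rmse = {}
    for c in candidates:
        model_days = upscale_hail_days(daily_max_cm, c, factor)
        rmse[c] = float(np.sqrt(np.mean((model_days - report_days) ** 2)))
    return min(rmse, key=rmse.get), rmse
```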

Significance testing

Where indicated throughout the manuscript, statistical significance was calculated for annual distributions using a Mann–Whitney U test for the medians at the 95% confidence level (p values ≤ 0.05). For Supplementary Fig. A2, additional restriction of the p values was performed by implementing a field significance false discovery rate of α = 0.1.
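A minimal sketch of the false discovery rate restriction follows. It assumes per-grid-point p values have already been computed (for instance with a Mann–Whitney U test such as `scipy.stats.mannwhitneyu`) and uses the standard Benjamini–Hochberg criterion; whether the analysis above used exactly this variant is an assumption, and the function names are illustrative.

```python
def fdr_threshold(p_values, alpha=0.1):
    """Largest sorted p value p_(i) with p_(i) <= (i / N) * alpha, or None if none qualify."""
    ranked = sorted(p_values)
    n = len(ranked)
    passing = [p for i, p in enumerate(ranked, start=1) if p <= alpha * i / n]
    return max(passing) if passing else None

def field_significant(p_values, alpha=0.1, local_alpha=0.05):
    """Flag points that pass both the local test and the FDR restriction."""
    cutoff = fdr_threshold(p_values, alpha)
    if cutoff is None:
        return [False] * len(p_values)
    cutoff = min(cutoff, local_alpha)
    return [p <= cutoff for p in p_values]
```

The restriction guards against the many false positives expected when thousands of grid points are tested independently at the local 0.05 level.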