Background & Summary

The troposphere, the layer of the atmosphere closest to the Earth’s surface, contains about 75% of the atmosphere’s mass and over 90% of its water vapor. When an electromagnetic signal traverses the troposphere, its propagation speed changes and its path bends. These effects, accumulated along the slant propagation path, are collectively known as tropospheric delay. Tropospheric delay significantly affects GNSS navigation and positioning, remote sensing, satellite altimetry and related techniques, and thus hinders the advancement of high-precision Earth observation services1,2. It is typically divided into two components: the hydrostatic delay and the wet delay caused by water vapor. Water vapor is a highly dynamic constituent of the atmosphere whose content and distribution vary markedly over time, which makes the tropospheric wet delay difficult to characterize accurately3. Tropospheric water vapor affects the transmittance of visible and near-infrared bands and attenuates and delays microwave signals, which significantly impacts the scattering and absorption of radar signals4. Similarly, in GNSS, the troposphere delays electromagnetic signals and thereby degrades positioning accuracy5. Because the composition of the troposphere changes with altitude, its effects vary significantly between vertical layers. Accurately describing the vertical distribution of tropospheric components, particularly zenith water vapor, is crucial for improving atmospheric models and GNSS positioning accuracy. Research has shown that zenith tropospheric delay and water vapor content generally follow a negative exponential distribution with altitude6,7. The scale height concept provides a useful quantitative measure for describing this distribution. 
A comprehensive analysis of scale height aids in understanding the structure and variability of the tropospheric atmosphere, facilitating the development of more precise tropospheric delay models. This study explores the use of scale height to describe the vertical structure of tropospheric parameters. We assess the feasibility of applying this method globally and publish a comprehensive dataset on the scale heights of key tropospheric parameters.

In the troposphere, six key parameters play a vital role in Earth observation, particularly for applications like GNSS meteorology. These parameters are the tropospheric zenith total delay (ZTD) and its two components, the zenith hydrostatic delay (ZHD) and zenith wet delay (ZWD), together with water vapor density (WVD), precipitable water vapor (PWV) and the weighted mean temperature (Tm). Table 1 provides detailed information about these six parameters. The vertical distribution of these six parameters generally follows a negative exponential pattern, which makes scale height a natural way to describe their vertical structure.

Table 1 Detailed information of the selection parameters of the dataset.

With the development of high-precision Earth observation systems, ZTD has become a key parameter of interest to researchers across various disciplines, including Global Navigation Satellite Systems (GNSS), microwave remote sensing and radar detection. Accurate estimation and modeling of ZTD is essential to mitigate the effects of tropospheric delay and improve the accuracy of GNSS positioning and navigation solutions8. To this end, many tropospheric delay correction models have been established, including the Hopfield, Saastamoinen and Black models, which require measured meteorological parameters9,10,11, and empirical models such as the GZTD, IGGtrop and GPT series12,13,14. Global gridded products that provide ZTD, ZHD and other parameters, such as TUW-VMF3 and GFZ-VMF3, are also available. Li et al. conducted a global spatiotemporal assessment of these two products, from the Vienna University of Technology (TU Wien) and the GeoForschungsZentrum Potsdam (GFZ), respectively15. Nevertheless, most existing models and products focus on the two-dimensional horizontal plane and often overlook how parameters such as ZTD vary with elevation. Wang et al. refined the vertical model of ZTD using numerical weather models16. Taking altitude-related correction into account, Zhao et al. proposed a high-precision ZTD model17. By establishing a four-layer ZTD scale height model, Zhang et al. significantly improved the convergence speed of precision single point positioning18. Considering the elevation difference of ZTD, Zhao et al. proposed a high-precision ZTD interpolation method19. In contemporary positioning and modeling methodologies, the vertical distribution characteristics of parameters such as ZTD and ZHD are increasingly recognized as critical. Consequently, it is essential to develop a dataset that describes the scale height of these key parameters.

Water vapor is one of the most significant and challenging parameters affecting ZTD and high-precision monitoring of water vapor is also a key focus of GNSS meteorology20. As the vertical distribution of water vapor generally follows a negative exponential pattern, the scale height of water vapor serves as a valuable metric for characterizing its vertical structure. In the field of GNSS meteorology, the study of scale heights for key variables such as water vapor density, PWV and Tm is equally critical. Using the water vapor density scale height as a vertical constraint can improve the precision of GNSS water vapor tomography21,22. PWV plays an important role in extreme precipitation weather warning, and high-precision GNSS-PWV inversion is helpful for accurate forecasting, and also plays a role in calibration of large-scale remote sensing PWV data23. Ding et al. developed an empirical model for PWV vertical adjustment24. Tm, a key conversion factor in the GNSS-PWV inversion, plays a pivotal role, and further investigation into its vertical distribution can yield more accurate PWV products25,26. Yang et al. established a refined empirical model for Tm using error compensation techniques27. Relevant research indicates that incorporating the Tm scale height can significantly improve modeling accuracy28, and high-precision Tm scale height data can lead to more accurate GNSS-PWV retrieval results29. Both water vapor density and PWV scale heights offer valuable insights into the vertical distribution of water vapor, which is closely tied to the turbulent structure of the atmosphere. By examining this vertical distribution, researchers can better understand the interaction between water vapor and atmospheric turbulence, thereby gaining deeper insights into the transport and transformation mechanisms of water vapor within the atmospheric boundary layer30.

Scale height plays a critical role in various fields, and numerous studies have confirmed its significance in enhancing model accuracy. As an indicator representing the vertical distribution characteristics of parameters, scale height also offers new perspectives for advancing tropospheric detection techniques and atmospheric modeling. However, parametric scale height products remain limited, and our understanding of scale height is still inadequate. In this study, we developed a six-parameter scale height dataset, ERA5-SH, derived from the profile values of fundamental meteorological parameters provided by ERA5. This product includes six key parameters relevant to Earth observation and employs rigorous data screening techniques to ensure its accuracy. The dataset’s accuracy was validated using data from over 400 active sounding sites worldwide. Additionally, the characteristics of ERA5-SH are analyzed in detail, with an example provided demonstrating how ZTDSH can enhance the accuracy of spatial interpolation.

Methods

Data acquisition

ERA5

The ERA5 dataset, produced by the European Centre for Medium-Range Weather Forecasts (ECMWF), is a comprehensive global meteorological dataset offering reanalysis of atmospheric and surface variables spanning from 1979 to the present. Leveraging sophisticated numerical models, data assimilation techniques, and information from diverse observational sources, ERA5 re-simulates and evaluates historical meteorological conditions31. The scale height product derived from ERA5 reanalysis is resolved on a global 360 × 181 (longitude × latitude) grid with a spatial resolution of 1° × 1° and a temporal resolution of 1 hour. This product utilizes profile data for temperature, geopotential, relative humidity, and specific humidity across 37 pressure levels ranging from 1000 hPa to 1 hPa.

It is essential to recognize that the geopotential provided by ERA5 is referenced to mean sea level (MSL) and adjusted for gravitational variations. However, vertical coordinates typically used in GNSS and related fields represent ellipsoidal heights relative to a reference ellipsoid. Therefore, it becomes necessary to convert geopotential heights to ellipsoidal heights for further processing. The geopotential height hd is first calculated using the formula32:

$${h}_{d}=\frac{Z}{{g}_{n}}$$
(1)

where Z is the geopotential and gn = 9.80665 m/s² is the standard gravity constant. The orthometric height horth is then derived by considering the radius of curvature of the meridian and the gravitational acceleration at the specific latitude. Finally, the ellipsoidal height hel is obtained by adding the geoid undulation, which can be calculated using the Earth Gravitational Model 2008 (EGM2008).

To obtain meteorological parameters at the surface, the ERA5 pressure-level data must be adjusted vertically. Specifically, the vertical correction accounts for the elevation difference between the ground and the isobaric surface heights in the ERA5 dataset. Linear interpolation is used to estimate temperature and relative humidity at ground level. For pressure, assuming the ground lies between the (k-1)-th and k-th pressure levels, the interpolation is performed according to the following formula33:

$${P}_{{ground}}={P}_{k-1}\cdot {\exp }\left(\frac{{ln}{P}_{k}-{ln}{P}_{k-1}}{{H}_{k}-{H}_{k-1}}\cdot \left({H}_{{ground}}-{H}_{k-1}\right)\right)$$
(2)

where Pground, Pk-1 and Pk denote the pressure at the ground, the (k-1)-th and the k-th pressure levels, respectively. Hground, Hk-1 and Hk are the corresponding ellipsoidal heights.
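As a minimal sketch of Eq. (2), the function below interpolates pressure between two bracketing levels under the assumption that ln(P) varies linearly with height. The function name, argument names and sample values are illustrative only and do not come from the dataset.

```python
import math

def pressure_at_height(h_ground, h_km1, p_km1, h_k, p_k):
    """Eq. (2): interpolate pressure assuming ln(P) is linear in height.

    Heights share one unit (e.g. m); the result has the unit of p_km1 and p_k.
    """
    slope = (math.log(p_k) - math.log(p_km1)) / (h_k - h_km1)
    return p_km1 * math.exp(slope * (h_ground - h_km1))

# Illustrative values: ground at 450 m between levels at 110 m and 760 m.
p_ground = pressure_at_height(450.0, 110.0, 1000.0, 760.0, 925.0)
```

By construction the function returns the level pressures exactly when the ground height coincides with either bracketing level.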

If the surface ellipsoid height is lower than the lowest pressure level, the extrapolation is performed using the following formula:

$${P}_{{ground}}={P}_{0}\cdot {\exp }\left(-{g}_{n}\times \frac{0.028965}{8.3143{T}_{v}}\times \left({H}_{{ground}}-{H}_{0}\right)\right)$$
(3)

where Pground and P0 denote the pressure at the ground and at the lowest pressure level, respectively. Hground and H0 are the ellipsoidal heights of the ground and of the lowest pressure level, respectively. Tv is the virtual temperature, which can be calculated as follows:

$${T}_{v}=T\cdot \left(1+0.6077q\right)$$
(4)

where T is the temperature and q represents the specific humidity.
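Eqs. (3) and (4) can be sketched together as follows. The constants are those printed in Eq. (3); the function names are hypothetical, and the choice of which temperature and humidity to pass (for example, those of the lowest pressure level) is an assumption left to the caller.

```python
import math

G_N = 9.80665      # standard gravity, m/s^2
M_DRY = 0.028965   # molar mass of dry air, kg/mol (constant in Eq. 3)
R_GAS = 8.3143     # universal gas constant, J/(mol K) (constant in Eq. 3)

def virtual_temperature(t_kelvin, q):
    """Eq. (4): virtual temperature from temperature (K) and specific humidity (kg/kg)."""
    return t_kelvin * (1.0 + 0.6077 * q)

def extrapolate_pressure(h_ground, h0, p0, t_kelvin, q):
    """Eq. (3): extrapolate pressure from the lowest level (h0, p0) to h_ground (heights in m)."""
    tv = virtual_temperature(t_kelvin, q)
    return p0 * math.exp(-G_N * M_DRY / (R_GAS * tv) * (h_ground - h0))
```

When the ground lies below the lowest level, `h_ground - h0` is negative and the extrapolated pressure exceeds `p0`, as expected.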

Radiosonde

The scale height product was validated using daily data obtained from 818 sounding stations worldwide, sourced from the University of Wyoming Weather Data website (http://weather.uwyo.edu/upperair/sounding.html). These datasets encompass meteorological parameters such as temperature, pressure, and relative humidity, spanning from the Earth’s surface up to an altitude of around 30 km. It is important to note that the radiosonde data are collected at two distinct times, namely 12:00 UTC and 00:00 UTC.

Sounding data provide high vertical resolution and detection accuracy, making them valuable for atmospheric studies. However, some stations suffer from limited time resolution, significant data gaps, and insufficient detection heights, which can hinder their effective use. To overcome these problems, rigorous quality control measures were applied to ensure the reliability and consistency of the sounding data, and 587 of the 818 stations were finally selected for validation. These measures follow four principles:

  1. The altitude of the final valid record in the radiosonde data must be no less than 10 km to ensure sufficient vertical coverage.

  2. The number of valid observation levels in the radiosonde data should be at least 20 to provide adequate vertical detail.

  3. The vertical spacing between two consecutive altitude layers must not exceed 2 km to maintain a smooth vertical profile.

  4. The pressure differential between any two successive levels should not exceed 200 hPa to avoid abrupt changes and ensure data consistency.
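The four screening rules above can be sketched as a single check on one sounding. This is an illustrative helper, not the authors' code; it assumes the levels are already sorted from the surface upward, with heights in metres and pressures in hPa.

```python
def passes_quality_control(heights_m, pressures_hpa):
    """Apply the four screening rules to one radiosonde profile (levels sorted bottom-up)."""
    n = len(heights_m)
    if n != len(pressures_hpa) or n < 20:
        return False                      # rule 2: at least 20 valid levels
    if heights_m[-1] < 10_000.0:
        return False                      # rule 1: top record at or above 10 km
    for i in range(1, n):
        if heights_m[i] - heights_m[i - 1] > 2_000.0:
            return False                  # rule 3: vertical gap of at most 2 km
        if abs(pressures_hpa[i - 1] - pressures_hpa[i]) > 200.0:
            return False                  # rule 4: pressure step of at most 200 hPa
    return True
```

A station-level decision would additionally need a rule for how many soundings must pass, which the text does not specify.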

Production process of ERA5-SH

The six scale height datasets derived from ERA5 are produced and validated according to the flowchart shown in Fig. 1. First, the geopotential data from ERA5 are converted into ellipsoidal heights, yielding the ellipsoidal height of each pressure level. Then, the meteorological parameters (temperature, specific humidity, etc.) at each pressure level are interpolated or extrapolated to the ellipsoidal height of the surface, giving meteorological parameter profiles that start from the surface. Subsequently, these profiles are used in numerical computations to obtain the profiles of the six research parameters (ZTD, ZWD, ZHD, WVD, PWV, Tm). Finally, the six scale heights are determined by least squares fitting.

Fig. 1 Flowchart showing the production and validation of the ERA5-SH dataset.

To validate the accuracy of the produced ERA5-SH, the meteorological profiles from 587 sounding stations worldwide in 2022 were selected as measured data to conduct the assessment. Initially, stringent quality control measures were applied to the radiosonde data, eliminating stations with subpar data quality. Utilizing the meteorological parameter profiles provided by the remaining stations, akin to the production process of ERA5-SH data, the scale height of parameters at the sounding stations was calculated. As the ERA5-SH data is grid-based, bilinear interpolation was performed to obtain the scale height data at the sonde station. Subsequently, the accuracy of the data was assessed by comparing the interpolated ERA5-SH data with the calculated data at the sounding site.
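The bilinear interpolation step can be sketched as below for a single time slice. The layout follows the grid described for the product (longitude starting at 0°, latitude running from 90°N to 90°S, 1° spacing), but the nested-list representation, the zero-based indexing and the function name are assumptions made for illustration.

```python
import math

def bilinear_at_station(grid, lon_deg, lat_deg):
    """Interpolate a 360 x 181 grid to a station.

    grid[i][j] holds the value at longitude i degrees east (i = 0..359)
    and latitude 90 - j degrees (j = 0..180).
    """
    lon = lon_deg % 360.0
    i0 = int(math.floor(lon)) % 360
    i1 = (i0 + 1) % 360                 # wrap across the 359/0 degree seam
    fx = lon - math.floor(lon)
    y = 90.0 - lat_deg                  # row coordinate, 0 at 90 degrees north
    j0 = min(int(math.floor(y)), 179)
    j1 = j0 + 1
    fy = y - j0
    north = (1.0 - fx) * grid[i0][j0] + fx * grid[i1][j0]
    south = (1.0 - fx) * grid[i0][j1] + fx * grid[i1][j1]
    return (1.0 - fy) * north + fy * south
```

Bilinear interpolation reproduces any field that is linear in longitude and latitude exactly, which gives a quick sanity check for an implementation.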

Numerical calculation for the layered tropospheric parameters

In this section, we will outline the calculation methodology for six parameters and acquire the profile data of these parameters utilizing the ERA5 and sounding data.

The ZTD consists of two parts: ZHD and ZWD34:

$${ZTD}={ZHD}+{ZWD}$$
(5)

The ZWD can be expressed as an integral of the wet refractive index (Nw) in the vertical direction:

$${ZWD}={\int }_{{h}_{0}}^{\infty }{N}_{w}{dh}={\int }_{{h}_{0}}^{\infty }\left(\frac{{k}_{2}^{{\prime} }e}{T}+\frac{{k}_{3}e}{{T}^{2}}\right){dh}$$
(6)

where \({k}_{2}^{{\prime} }\approx 71.2952\) and \({k}_{3}\approx 64.79\) are the coefficients of Nw, T is the temperature in kelvin, and e is the water vapor pressure, which can be calculated with the modified Magnus formula32:

$$\left\{\begin{array}{c}e=6.1078\times 1{0}^{\frac{7.5{T}_{c}}{237.3+{T}_{c}}}\times \frac{{rh}}{100},{T}_{c} > 0\\ e=6.1078\times 1{0}^{\frac{9.5{T}_{c}}{265.5+{T}_{c}}}\times \frac{{rh}}{100},{T}_{c}\le 0\end{array}\right.$$
(7)

where rh represents relative humidity and Tc is temperature, measured in degrees Celsius.
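A minimal sketch of Eq. (7) follows. The function name is hypothetical, and the result is taken to be in hPa, consistent with the leading coefficient 6.1078 being the saturation vapor pressure in hPa at 0 °C.

```python
def vapor_pressure_hpa(t_celsius, rh_percent):
    """Eq. (7): water vapor pressure (hPa) from temperature (deg C) and relative humidity (%)."""
    if t_celsius > 0.0:
        a, b = 7.5, 237.3      # branch for Tc > 0
    else:
        a, b = 9.5, 265.5      # branch for Tc <= 0
    saturation = 6.1078 * 10.0 ** (a * t_celsius / (b + t_celsius))
    return saturation * rh_percent / 100.0
```

Both branches give the same value at 0 °C, so the piecewise definition is continuous there.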

Similarly, the ZHD can be expressed as an integral of the hydrostatic refractive index in the vertical direction:

$${ZHD}={\int }_{{h}_{0}}^{\infty }{N}_{h}{dh}={\int }_{{h}_{0}}^{\infty }\frac{{k}_{1}(p-e)}{T}{dh}$$
(8)

where \({k}_{1}\approx 77.689\) is the coefficient of Nh and p denotes pressure.
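To obtain a layered delay profile rather than a single surface value, the integrals in Eqs. (6) and (8) can be evaluated from every level upward. The sketch below does this with the trapezoidal rule for any refractivity profile. It assumes the refractivity is expressed in the usual N-units (parts per million), hence the explicit factor of 1e-6, and it neglects the contribution above the top level; both points, like the function name, are assumptions not spelled out in the text.

```python
def delay_profile(heights_m, refractivity_ppm):
    """Zenith delay (m) from each level up to the top level, by trapezoidal integration.

    heights_m must be increasing; refractivity_ppm holds N_w or N_h at the same levels.
    """
    n = len(heights_m)
    delays = [0.0] * n                       # delay above the top level is neglected
    for i in range(n - 2, -1, -1):           # accumulate downward from the top
        dh = heights_m[i + 1] - heights_m[i]
        layer = 0.5 * (refractivity_ppm[i] + refractivity_ppm[i + 1]) * dh
        delays[i] = delays[i + 1] + 1e-6 * layer
    return delays
```

The first element of the returned list is the surface value, and the whole list is the profile to which a scale height can later be fitted.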

The WVD can be calculated according to the following formula:

$${WVD}=\frac{e}{{R}_{v}T}$$
(9)

where \({R}_{v}=461.5\,J/\left({kg}\cdot K\right)\) is the specific gas constant of water vapor.

The PWV can be calculated accurately by the following formula based on the stratified meteorological data:

$${PWV}=\frac{1}{g}{\int }_{{h}_{0}}^{\infty }\frac{q}{{\rho }_{w}}{dp}=\frac{1}{{\rho }_{w}\,g}{\sum }_{i=1}^{n-1}\left(\frac{{q}_{i}+{q}_{i+1}}{2}\varDelta {p}_{i}\right)$$
(10)

where \({\rho }_{w}\approx 0.999\times {10}^{3}\,{kg}/{m}^{3}\) is the density of liquid water, \(g\approx 9.80665\,m/{s}^{2}\) is the gravitational acceleration, i denotes the i-th pressure level and n is the total number of levels. qi and Δpi represent the specific humidity at the i-th pressure level and the pressure difference between the i-th and (i+1)-th levels, respectively. The specific humidity can be calculated by the following formula:

$$q=\frac{0.622e}{p-0.378e}$$
(11)
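The discrete form of Eq. (10), together with Eq. (11), can be sketched as follows. The sketch assumes SI inputs (pressure in Pa, specific humidity in kg/kg, liquid water density of about 999 kg/m³) and converts the result to millimetres; the function names and unit choices are illustrative.

```python
RHO_W = 999.0    # density of liquid water, kg/m^3 (SI value assumed here)
G = 9.80665      # gravitational acceleration, m/s^2

def specific_humidity(e_hpa, p_hpa):
    """Eq. (11): specific humidity (kg/kg) from vapor pressure and total pressure (same unit)."""
    return 0.622 * e_hpa / (p_hpa - 0.378 * e_hpa)

def pwv_mm(pressures_pa, q):
    """Eq. (10): precipitable water vapor (mm) by trapezoidal summation over pressure levels."""
    total = 0.0
    for i in range(len(q) - 1):
        dp = abs(pressures_pa[i] - pressures_pa[i + 1])   # layer pressure difference, Pa
        total += 0.5 * (q[i] + q[i + 1]) * dp
    return 1000.0 * total / (RHO_W * G)                   # metres of water to mm
```

Starting the summation at a higher level instead of the surface yields the PWV above that level, which is how a layered PWV profile can be built.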

The Tm can be calculated accurately by the following formula based on the stratified meteorological data35:

$${T}_{m}=\frac{{\int }_{{h}_{0}}^{\infty }\frac{e}{T}{dh}}{{\int }_{{h}_{0}}^{\infty }\frac{e}{{T}^{2}}{dh}}=\frac{{\sum }_{i=1}^{n-1}(\frac{{e}_{i}}{{T}_{i}}+\frac{{e}_{i+1}}{{T}_{i+1}})\frac{\varDelta {h}_{i}}{2}}{{\sum }_{i=1}^{n-1}(\frac{{e}_{i}}{{{T}_{i}}^{2}}+\frac{{e}_{i+1}}{{T}_{i+1}^{2}})\frac{\varDelta {h}_{i}}{2}}$$
(12)

where Δhi denotes the thickness of the layer between the i-th and (i+1)-th pressure levels.
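The discrete form of Eq. (12) translates directly into a short loop. This is a sketch with a hypothetical function name; it assumes heights increase with level index, and any consistent pressure unit can be used for e because the unit cancels in the ratio.

```python
def weighted_mean_temperature(heights_m, e, t_kelvin):
    """Eq. (12): weighted mean temperature (K) by trapezoidal summation over layers."""
    numerator = 0.0
    denominator = 0.0
    for i in range(len(heights_m) - 1):
        dh = heights_m[i + 1] - heights_m[i]
        numerator += (e[i] / t_kelvin[i] + e[i + 1] / t_kelvin[i + 1]) * dh / 2.0
        denominator += (e[i] / t_kelvin[i] ** 2 + e[i + 1] / t_kelvin[i + 1] ** 2) * dh / 2.0
    return numerator / denominator
```

For an isothermal profile the result equals the air temperature regardless of the humidity profile, which is a convenient check.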

Scale height fitting

The vertical distribution of each parameter follows a negative exponential function:

$${pv}=p{v}_{s}\,{\exp }\left(-\frac{h-{h}_{s}}{{SH}}\right)$$
(13)

where SH represents the scale height, \(p{v}_{s}\) is the surface parameter value, \({pv}\) is the parameter value at height h and hs is the ellipsoidal height of the ground.

In the preceding numerical calculation, the profile data for each parameter has been acquired, enabling the determination of the scale height (SH) through exponential least square fitting utilizing the Levenberg-Marquardt method.
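As a rough illustration of the fitting step, the sketch below estimates SH from a profile using Eq. (13). Note that it swaps in a linearized least-squares fit in log space for the nonlinear Levenberg-Marquardt fit described above, so it is a simplification and not the production method; on noisy profiles the two can give different results. It also assumes the first level is the surface and that all parameter values are positive.

```python
import math

def fit_scale_height(heights, values):
    """Estimate SH in pv = pv_s * exp(-(h - h_s) / SH) by log-linear least squares.

    The first level is taken as the surface (h_s, pv_s). SH is returned in the unit of heights.
    """
    h_s, pv_s = heights[0], values[0]
    sxx = 0.0
    sxy = 0.0
    for h, v in zip(heights[1:], values[1:]):
        x = h - h_s
        y = math.log(v / pv_s)       # equals -x / SH for an exact exponential profile
        sxx += x * x
        sxy += x * y
    slope = sxy / sxx                # least-squares slope of y = slope * x through the origin
    return -1.0 / slope
```

Such a log-space estimate could also serve as an initial guess for a nonlinear fit.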

Periodic fitting of time series

Scale height time series with annual and semi-annual cycles are modeled with trigonometric functions as follows36:

$${SH}={a}_{0}+{a}_{1}{\cos }\left(2\pi \frac{{doy}}{365.25}\right)+{a}_{2}{\sin }\left(2\pi \frac{{doy}}{365.25}\right)+{a}_{3}{\cos }\left(4\pi \frac{{doy}}{365.25}\right)+{a}_{4}{\sin }\left(4\pi \frac{{doy}}{365.25}\right)$$
(14)

where \({a}_{0},{a}_{1},{a}_{2},{a}_{3},{a}_{4}\) are the model coefficients.
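Because Eq. (14) is linear in its coefficients, they can be estimated by ordinary least squares. The sketch below builds the normal equations and solves them with Gaussian elimination using only the standard library. The estimator and function names are assumptions, since the text does not state how the coefficients are obtained.

```python
import math

def design_row(doy):
    """Basis functions of Eq. (14) for one day of year."""
    w = 2.0 * math.pi * doy / 365.25
    return [1.0, math.cos(w), math.sin(w), math.cos(2.0 * w), math.sin(2.0 * w)]

def fit_periodic(doys, sh_values):
    """Return [a0, a1, a2, a3, a4] of Eq. (14) by ordinary least squares."""
    m = 5
    aug = [[0.0] * (m + 1) for _ in range(m)]      # augmented normal equations [A^T A | A^T y]
    for doy, y in zip(doys, sh_values):
        row = design_row(doy)
        for r in range(m):
            for c in range(m):
                aug[r][c] += row[r] * row[c]
            aug[r][m] += row[r] * y
    for col in range(m):                           # forward elimination with partial pivoting
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    coeffs = [0.0] * m
    for r in range(m - 1, -1, -1):                 # back substitution
        known = sum(aug[r][c] * coeffs[c] for c in range(r + 1, m))
        coeffs[r] = (aug[r][m] - known) / aug[r][r]
    return coeffs
```

The annual amplitude then follows as the square root of a1² + a2², and the semi-annual amplitude as the square root of a3² + a4².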

Accuracy evaluation indicators

In this study, we selected four indicators, namely RMSE (Root Mean Square Error), RRMSE (Relative Root Mean Square Error), Bias and R2 (coefficient of determination), to evaluate the accuracy of the ERA5-SH. These can be calculated using the following formulas:

$${RMSE}=\sqrt{\frac{1}{n}{\sum }_{i=1}^{n}{(S{H}_{i}-S{H}_{i}^{R})}^{2}}$$
(15)
$${RRMSE}=\frac{\sqrt{\frac{1}{n}{\sum }_{i=1}^{n}{(S{H}_{i}-S{H}_{i}^{R})}^{2}}}{\frac{1}{n}{\sum }_{i=1}^{n}\left|S{H}_{i}^{R}\right|}$$
(16)
$${Bias}=\frac{1}{n}{\sum }_{i=1}^{n}(S{H}_{i}-S{H}_{i}^{R})$$
(17)
$${R}^{2}=\frac{{SSR}}{{SST}}=1-\frac{{SSE}}{{SST}}=1-\frac{{\sum }_{i=1}^{n}{(S{H}_{i}-S{H}_{i}^{R})}^{2}}{{\sum }_{i=1}^{n}{(S{H}_{i}-{mean}({SH}))}^{2}}$$
(18)

where SH and SHR represent the ERA5-SH value and the reference value, respectively; i is the sample index and n is the total number of samples.
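The four indicators can be sketched in a few lines using their conventional definitions: RMSE as the square root of the mean squared difference, RRMSE as RMSE divided by the mean absolute reference value, Bias as the mean difference, and R2 as 1 − SSE/SST with SST taken about the mean of SH. The function name is hypothetical.

```python
import math

def evaluation_metrics(sh, sh_ref):
    """Return (RMSE, RRMSE, Bias, R2) for ERA5-SH values against reference values."""
    n = len(sh)
    diffs = [a - b for a, b in zip(sh, sh_ref)]
    sse = sum(d * d for d in diffs)                   # sum of squared errors
    rmse = math.sqrt(sse / n)
    rrmse = rmse / (sum(abs(b) for b in sh_ref) / n)  # relative to mean |reference|
    bias = sum(diffs) / n
    mean_sh = sum(sh) / n
    sst = sum((a - mean_sh) ** 2 for a in sh)         # total sum of squares
    r2 = 1.0 - sse / sst
    return rmse, rrmse, bias, r2
```

RRMSE is a fraction here; multiply by 100 to express it as a percentage.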

Data Records

The ERA5-SH dataset is divided into two parts, which can be accessed via https://doi.org/10.5281/zenodo.14676025 (ZTDSH, ZHDSH, ZWDSH)37 and https://doi.org/10.5281/zenodo.14679394 (PWVSH, WVSH, and TmSH)38. Data for each parameter are stored annually in a .mat file, which contains a structure named after the corresponding parameter. The files are compressed using linear quantization: the structure includes two fields, “Scale” and “Offset,” for data decompression, and an int16-type field named “Data” that stores the data. “Data” is a three-dimensional matrix whose dimensions represent longitude (starting from 0°), latitude (from 90°N to 90°S), and time (hourly, starting from 00:00 on January 1st), respectively. Additionally, a field named “Max_error” gives the maximum compression error. Each .mat file is approximately 1 GB in size.
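As a hedged illustration of how the stored array might be addressed and decoded, the sketch below maps a grid node and time to array indices and reverses the linear quantization. The decoding formula value = Data × Scale + Offset is an assumption based on the common convention and is not stated in the text, so it should be checked against the dataset documentation; the function names are hypothetical, and the indices are zero-based (add 1 for MATLAB).

```python
from datetime import datetime

def grid_indices(lon_deg, lat_deg, when):
    """Zero-based (longitude, latitude, time) indices of the nearest 1-degree node and hour."""
    i_lon = int(round(lon_deg)) % 360            # longitude starts at 0 degrees
    i_lat = int(round(90.0 - lat_deg))           # latitude runs from 90N (0) to 90S (180)
    year_start = datetime(when.year, 1, 1, 0)
    i_time = int((when - year_start).total_seconds() // 3600)   # hours since 1 Jan 00:00
    return i_lon, i_lat, i_time

def dequantize(raw, scale, offset):
    """ASSUMPTION: linear dequantization as raw * scale + offset."""
    return raw * scale + offset

# Example: node at 120E, 30N on 2 January 2022, 03:00.
idx = grid_indices(120.0, 30.0, datetime(2022, 1, 2, 3))
```

The “Max_error” field can then be used as a bound on the difference between the decoded and the original values.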

Technical Validation

Accuracy assessment

The coefficient of determination R2 reflects the goodness of the exponential fit used to obtain each scale height. Figure 2 illustrates the mean R2 of the six scale heights from 2013 to 2022. Each scale height effectively reflects the vertical distribution of the corresponding tropospheric parameter in most regions, with R2 above 0.95 overall. The R2 values for PWVSH and ZWDSH show similar global distribution patterns, particularly in the transitional zones between continents and oceans. Indeed, the magnitude of ZWD is closely linked to PWV, a relationship that is extensively used in GNSS meteorology. For variables related to water vapor, the coefficients of determination are generally lower in continent-ocean transition regions. This can be attributed to the differences in temperature and humidity between land and sea. Because of the difference in specific heat capacity between land and water, the land heats up more quickly than the sea during the day. This rapid heating causes warm, humid air to rise and form convection, which in turn influences the vertical structure of water vapor. In addition, the coefficients of determination of ZHDSH vary markedly with latitude, with lower values at lower latitudes. The coefficients of determination of ZHDSH, ZTDSH and WVSH are lower in the Antarctic region, which is closely related to the high altitude and cold climate of the region. At the same time, the spatial distribution of the coefficients of determination of WVSH and TmSH in the Antarctic region is noticeably unstable. 
It should be noted that, unlike the other parameters, which are defined as integrals from a specified height to the top of the troposphere, water vapor density is not an integral quantity, so its vertical distribution is more susceptible to atmospheric activity, resulting in a relatively low coefficient of determination. Nevertheless, its mean coefficient of determination still exceeds 0.85, indicating that WVSH can reflect the vertical structure of water vapor density to a certain extent. Meanwhile, in comparison with the water vapor-related parameters, the scale height of ZHD is more stable, with a coefficient of determination above 0.998.

Fig. 2 The mean value of the R2 of the exponential fitting for the six parameters from 2013 to 2022.

Figure 3 presents the R2 box plots for the six parameters from 2013 to 2022, alongside histograms of all data over the ten-year period. Overall, the six parameters have consistently demonstrated a high level of goodness of fit over the ten years. Even for WVSH, which has the lowest fit among the parameters, the minimum R2 exceeds 0.8, while both the mean and median values are above 0.95, indicating high stability and reliability. The box plot reveals that ZHDSH does not exhibit any outliers over the past ten years, as ZHD is a variable independent of water vapor, and its vertical structure remains relatively stable across both time and space. In contrast, the other parameters related to water vapor display more outliers in their coefficients of determination over the ten-year period. This can be attributed to the rapid spatial and temporal variability of water vapor in the troposphere, leading to instability in the water vapor structure over time and space. As shown in the histogram, the distribution of R2 for all variables, except for ZHDSH, follows a normal distribution. Additionally, the histograms of variables directly related to water vapor (PWV, WVD, and ZWD) exhibit higher similarity, with the distributions of determination coefficients for PWVSH and ZWDSH being particularly similar. This finding corroborates the conclusions presented in Fig. 2. The mean R2 of the six parameters exceed 0.95, and their standard deviations are less than 0.02, indicating excellent goodness of fit and stability.

Fig. 3 Box plot of the R2 for the six parameters from 2013 to 2022 and histogram of all the coefficients of determination over the decade.

Following the quality control principles mentioned in Section 3, 409 high-quality radiosonde stations were selected to validate the ERA5-SH dataset. The scale heights of the six tropospheric parameters computed at each radiosonde station are taken as reference values, and the ERA5-SH values at the station are obtained by bilinear interpolation from the four nearest grid points. The accuracy is assessed by calculating the RMSE, RRMSE, Bias, and R2 against the reference values. Figure 4 shows the global distribution of the resulting RMSE, and it can be seen that the ERA5-SH dataset achieves high precision on a global scale. Among the analyzed parameters, PWVSH and ZWDSH show stable and high accuracy globally, with RMSE values below 0.2 km at most stations. The accuracy of WVSH varies with latitude. High-latitude stations tend to have larger RMSE values, exceeding 0.6 km at certain locations. In contrast, low-latitude stations demonstrate lower RMSE values, generally within 0.2 km. In addition, TmSH exhibits high accuracy on a global scale, with most sites maintaining RMSE values within 7 km, whereas TmSH values typically range between 30 and 50 km. It should be noted that ZHDSH and ZTDSH show significantly lower accuracy in Asia, primarily due to the insufficient sounding altitude of stations in this region. For reliable ZHD and ZTD estimates, a sounding altitude of over 20 km is typically required, whereas the water vapor-related variables (PWV, ZWD, WVD, Tm) can be accurately determined with a sounding altitude of approximately 15 km. Outside of this region, ZTDSH and ZHDSH also demonstrate high accuracy, with RMSE values below 0.5 km at most stations, particularly for values around 7 km.

Fig. 4 The RMSE distribution of the six parameters verified by sounding data.

The maximum, minimum as well as the mean value of the RMSE, RRMSE, bias and correlation coefficient (R) for the six scale heights are counted and listed in Table 2. The maximum and minimum RMSE values for PWVSH are 0.719 km and 0.122 km, respectively, indicating a significant disparity that suggests some stations have poor verification results. As shown in Fig. 4, these stations are primarily located in areas with extreme climates, such as high altitudes. This trend is not exclusive to PWVSH; similar patterns are observed for other variables as well. Despite some test stations exhibiting poor verification results, the mean values for the four indices of PWVSH are 0.243 km, 13.194%, 0.185 km, and 0.919, respectively, demonstrating a generally high overall verification accuracy. For both PWVSH and ZWDSH, the average correlation coefficient exceeds 0.9, indicating the highest verification accuracy among the parameters. Conversely, the minimum correlation coefficients for ZTDSH and ZHDSH are less than 0, with mean values below 0.6, attributed to the previously mentioned insufficient sounding altitudes. Nevertheless, the maximum correlation coefficient for these parameters exceeds 0.85, suggesting that high accuracy can still be achieved at stations unaffected by sounding height limitations. For RRMSE, the mean values for WVSH and TmSH are below 10%, with maximum values under 20%, indicating that most stations exhibit high accuracy. Additionally, the minimum correlation coefficients for WVSH and TmSH are 0.542 and 0.742, respectively, reflecting their high verification accuracy. In general, a comprehensive analysis of the four precision indices across the six parameters demonstrates consistently high verification accuracy.

Table 2 Statistical table of maximum, minimum and mean values of RMSE, RRMSE, Bias and R of six parameters.

Furthermore, six radiosonde stations from different regions were selected to provide more comparative details: Station 3808 for ZTDSH, Station 70200 for ZHDSH, Station 45004 for ZWDSH, Station 83378 for PWVSH, Station 96441 for WVSH, and Station 52818 for TmSH. Figure 5 displays the geographical locations of these stations, along with the time series of the corresponding parameters and scatter density plots. It can be seen that the scale heights derived from radiosonde data show strong agreement with the ERA5-SH dataset, exhibiting high similarity in both overall trends and detailed variations. For PWVSH, ZWDSH, and TmSH, the verification results are particularly robust. A comparison of the time series reveals that the ERA5-SH results align closely with those from the radiosonde, both in terms of values and trends. The scatter density plots also demonstrate a strong correlation, with the R2 for the linear fits exceeding 0.9. For WVSH, since WVD is not an integral value, the verification results are relatively less stable. However, despite differences in some details, the overall consistency remains high, with a linear fit R2 greater than 0.86. In the case of ZHDSH and ZTDSH, due to the previously mentioned insufficient sounding height, there is a systematic bias between the ERA5-SH and radiosonde results. Nevertheless, the trends of the two remain highly consistent, and the scatter plots still display a strong linear correlation.

Fig. 5 Time series comparison of ERA5-SH and radiosonde results, along with scatter density plots for the six parameters at the selected stations.

Characteristics of ERA5-SH

Based on the above description, the ERA5-SH dataset demonstrates high precision and stability, as shown by the high goodness of the exponential fits and the validation against external radiosonde data. This section provides an in-depth analysis of the product’s characteristics, focusing on both spatial and temporal distribution patterns of the six parameters. The analysis examines these patterns from both a mean perspective and at specific moments or locations, offering a comprehensive understanding of the scale height features. Finally, an example is presented that highlights the use of the ZTDSH product to enhance the precision of spatial interpolation, especially in areas with significant elevation changes, demonstrating the practical application of the dataset in improving Earth observation accuracy.

Figure 6 illustrates the spatial distribution of ERA5-SH at 00:00 UTC on January 1, 2013. It is evident that, except for ZHD, a parameter unrelated to atmospheric water vapor, the scale heights of the remaining parameters exhibit a vortex structure influenced by atmospheric dynamics. Moreover, the scale height at the periphery of the vortex tends to be relatively elevated. Upon further examination of the long-term spatial distribution map, a pronounced variability in the scale height of water vapor-related parameters is observed, displaying a distinct large-scale periodicity. In contrast, the scale height of ZHD remains relatively stable and undergoes minimal short-term fluctuations. Since ZTD is defined as the sum of ZHD and ZWD, its scale height is strongly influenced by both ZHDSH and ZWDSH. As a result, ZTDSH shows significant spatial variation with latitude (driven by ZHDSH) and exhibits a vortex structure similar to that of ZWDSH.

Fig. 6 Spatial distribution map of scale height at 00:00 UTC on January 1, 2013.

The temporal variation characteristics of scale height are further explored. To account for latitudinal differences, the globe was divided into six latitude regions: R1 (60°N-90°N), R2 (30°N-60°N), R3 (0°-30°N), R4 (0°-30°S), R5 (30°S-60°S), and R6 (60°S-90°S). Figure 7 illustrates the mean scale heights of the six parameters for each month in 2022 across these latitude regions. It is evident that scale heights exhibit significant differences between the Northern and Southern Hemispheres, with opposing trends over time. For instance, in the R3 region, scale heights for PWVSH and ZWDSH gradually increased from February to August, reaching peak values of 2.20 km and 2.29 km, respectively, in August. Conversely, in the R4 region, located on the opposite side of the equator, an opposing trend was observed, with minimum values of 1.59 km and 1.65 km occurring in August. Notably, extreme values of scale heights were often recorded in both hemispheres during July and August, indicating seasonal variability. ZTDSH and ZHDSH also displayed significant latitudinal differences. Due to the substantial influence of ZHD on ZTD, both parameters exhibited similar spatiotemporal distribution characteristics, with scale height in the R3 and R4 regions significantly higher than those in other high-latitude areas. Additionally, extreme climate conditions in the polar regions often result in extreme scale height values. Notable examples include WVSH in January and February in the R1 region, and TmSH in February and March in the R6 region.

Fig. 7. Temporal distribution of scale heights for six variables across six latitude regions.

To examine the mean characteristics of scale height, Fig. 8 presents the mean scale heights of the six parameters over the period from 2013 to 2022. The mean distribution exhibits pronounced geographical disparities. The scale heights of the water vapor-related parameters (e.g., PWVSH, ZWDSH) are higher near the equator and lower over the oceanic regions flanking both sides of it. In contrast, ZHDSH, which is decoupled from water vapor, has a relatively regular spatial distribution, with elevated values at low latitudes and diminished values at high latitudes. The lower scale height values observed at land-sea boundaries are primarily attributed to the temperature differences between land and sea. Scale height also varies with elevation: in high-elevation regions such as the Qinghai-Tibet Plateau and Antarctica it tends to be smaller, decreasing overall as elevation increases.

Fig. 8. Mean scale heights of the six parameters from 2013 to 2022.

Figure 9 shows the time series (gray dots) and box plots of ERA5-SH at the grid point (120°E, 30°N), together with summary statistics such as the mean and median of each series. The scale heights of the parameters directly linked to water vapor (PWV, WVD, ZWD) generally fall within the range of 0.8 to 4 km, with mean values around 2.0 km. Because ZHD is the primary component of ZTD, the scale height of ZTD is very similar to that of ZHD, and both are mainly distributed between 7.0 km and 8.0 km. Tm is governed by temperature and decreases approximately linearly with height in the troposphere, resulting in a relatively large scale height, mainly distributed between 38.0 km and 80.0 km.
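
A first-order argument suggests why a nearly linear decrease of Tm with height corresponds to a large scale height. The symbols below (surface value T_{m,0}, lapse rate γ, scale height H) and the numerical values are illustrative assumptions, not quantities taken from the dataset.

```latex
% Exponential model expanded for heights small compared with H:
T_m(h) = T_{m,0}\, e^{-h/H} \approx T_{m,0}\left(1 - \frac{h}{H}\right), \qquad h \ll H .
% Matching the slope of a linear profile T_m(h) = T_{m,0} - \gamma h gives
H \approx \frac{T_{m,0}}{\gamma} .
% Illustrative values: T_{m,0} = 280\ \mathrm{K},\ \gamma = 5\ \mathrm{K\,km^{-1}}
H \approx \frac{280\ \mathrm{K}}{5\ \mathrm{K\,km^{-1}}} = 56\ \mathrm{km} .
```

Under these assumed values the result falls inside the 38.0 km to 80.0 km range reported above; smaller lapse rates push H toward the upper end of that range.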

Fig. 9. Time series and statistics of the six parameter scale heights at (120°E, 30°N) from 2013 to 2022.

Because the scale height time series exhibit annual and semi-annual periodicity, formula (22) is used to fit them, and the results are shown by the red lines in Fig. 9. ZHDSH displays strong annual and semi-annual cycles because it is not influenced by changes in water vapor. The remaining variables, which are associated with water vapor, fluctuate significantly but still show periodic behavior overall. The periodic characteristics vary across locations. For instance, at times and locations with low water vapor content, ZTDSH fluctuates less and displays pronounced periodicity.
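
A fit of this kind can be sketched with ordinary least squares. The model below, a constant plus annual and semi-annual cosine and sine terms with a 365.25-day year, is a common form assumed here for illustration; it may differ in detail from formula (22), and the function names are hypothetical.

```python
import numpy as np

YEAR_DAYS = 365.25  # assumed length of the annual period, in days

def design_matrix(t_days):
    """Columns: constant, annual cos/sin, semi-annual cos/sin."""
    t = np.asarray(t_days, dtype=float)
    w = 2.0 * np.pi * t / YEAR_DAYS
    return np.column_stack(
        [np.ones_like(t), np.cos(w), np.sin(w), np.cos(2.0 * w), np.sin(2.0 * w)]
    )

def fit_harmonics(t_days, sh):
    """Least-squares coefficients [a0, a1, b1, a2, b2] of the assumed model."""
    coef, *_ = np.linalg.lstsq(
        design_matrix(t_days), np.asarray(sh, dtype=float), rcond=None
    )
    return coef

def evaluate(coef, t_days):
    """Fitted scale height at the given times (days since the reference epoch)."""
    return design_matrix(t_days) @ coef

def amplitudes(coef):
    """Annual and semi-annual amplitudes derived from the coefficients."""
    return float(np.hypot(coef[1], coef[2])), float(np.hypot(coef[3], coef[4]))
```

Comparing the two amplitudes with the scatter of the residuals gives a simple indication of how strongly periodic a given series is, which is the contrast drawn above between ZHDSH and the water vapor-related parameters.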

Here, we present an example of height correction using ZTDSH data, which improves the accuracy of grid-to-site interpolation, particularly in regions with significant elevation changes. Because ZTD varies strongly with elevation, interpolating it from grid points to a target position requires a height adjustment. With ZTDSH data, the ZTD at each grid node can be adjusted to the target elevation using the elevation difference between the target location and that node. The ZTD data referenced in this study are derived from a screened ZTD dataset compiled by the Karlsruhe Institute of Technology (KIT) team in 2020, which includes 91,088,258 screened ZTD values from 12,552 GNSS stations39. We select these GNSS stations as target locations and perform bilinear interpolation using the four nearest grid nodes surrounding each station. First, the elevation difference between each of the four grid nodes and the target position is calculated. Next, the ZTD at each node is adjusted to the target elevation according to the ZTDSH at that node. Finally, bilinear interpolation is performed on the four corrected ZTD values, which now lie on the same elevation plane, to derive the ZTD at the target position.
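
The three steps above can be sketched as follows. The sketch assumes the negative exponential height dependence ZTD(h_t) = ZTD(h_g) · exp(−(h_t − h_g)/ZTDSH), with heights in metres and scale height in kilometres; the function names, argument order, and node ordering are assumptions made for illustration and are not taken from the dataset's tools.

```python
import numpy as np

def height_correct(ztd, sh_km, h_grid_m, h_target_m):
    """Shift ZTD from each grid-node height to the target height.

    Assumes an exponential profile with scale height sh_km (km);
    heights are in metres, ZTD keeps whatever unit it is given in.
    """
    dh_km = (np.asarray(h_target_m, float) - np.asarray(h_grid_m, float)) / 1000.0
    return np.asarray(ztd, float) * np.exp(-dh_km / np.asarray(sh_km, float))

def bilinear(values, lon, lat, lon0, lon1, lat0, lat1):
    """Bilinear interpolation inside one grid cell.

    values are ordered as [(lon0, lat0), (lon1, lat0), (lon0, lat1), (lon1, lat1)].
    """
    tx = (lon - lon0) / (lon1 - lon0)
    ty = (lat - lat0) / (lat1 - lat0)
    v00, v10, v01, v11 = values
    return ((1 - tx) * (1 - ty) * v00 + tx * (1 - ty) * v10
            + (1 - tx) * ty * v01 + tx * ty * v11)

def interpolate_ztd(ztd, sh_km, h_grid_m, h_target_m,
                    lon, lat, lon0, lon1, lat0, lat1):
    """Height-correct the four node values, then interpolate horizontally."""
    corrected = height_correct(ztd, sh_km, h_grid_m, h_target_m)
    return float(bilinear(corrected, lon, lat, lon0, lon1, lat0, lat1))
```

Correcting each node with its own scale height before the horizontal step keeps the four values on a common elevation plane, so the bilinear weights act only on horizontal differences.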

Figure 10 compares the interpolation results for the 12,552 GNSS stations worldwide. The left panel displays the results without accounting for elevation differences, while the right panel shows the results incorporating ZTDSH data. With ZTDSH data, the overall RMSE of the interpolation decreased from 50.27 mm to 18.40 mm, with the largest gains in regions with substantial elevation changes. This underscores the need for elevation correction of ZTD in areas with strong elevation gradients, especially at land-sea interfaces, where the uncorrected RMSE can exceed 50 cm. With ZTDSH correction, the RMSE in such areas can be reduced to less than 5 cm, an improvement in interpolation accuracy of over 90%.

Fig. 10. Comparison of RMSE results under two interpolation strategies.

Figure 11 gives a detailed breakdown of the RMSE and bias at each station before and after elevation correction. Without elevation correction, the average RMSE of the interpolated ZTD is 5.02 cm. With elevation correction using ZTDSH, the average RMSE falls to 1.84 cm, a significant improvement in accuracy. The stations are divided into two groups, high altitude (>400 m) and low altitude (<400 m), and the accuracy improvement is more pronounced at high altitudes.
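
The per-station statistics and the altitude grouping can be sketched as below. The sign convention for bias (interpolated minus reference), the handling of stations at exactly 400 m, and all function names are assumptions introduced for illustration.

```python
import numpy as np

def rmse_and_bias(interp, ref):
    """RMSE and bias of interpolated ZTD against reference ZTD at one station.

    Bias is taken as mean(interpolated - reference), an assumed convention.
    """
    d = np.asarray(interp, float) - np.asarray(ref, float)
    return float(np.sqrt(np.mean(d ** 2))), float(np.mean(d))

def mean_by_altitude(station_values, station_heights_m, threshold_m=400.0):
    """Average a per-station statistic for high and low altitude groups.

    Stations exactly at the threshold are placed in the low group here,
    since the text only defines > 400 m and < 400 m.
    """
    v = np.asarray(station_values, float)
    h = np.asarray(station_heights_m, float)
    high = h > threshold_m
    return {"high": float(np.mean(v[high])), "low": float(np.mean(v[~high]))}

def relative_improvement(before, after):
    """Fractional reduction of an error measure, e.g. RMSE."""
    return 1.0 - after / before
```

For the averages quoted above, `relative_improvement(5.02, 1.84)` evaluates to about 0.63, that is, a reduction of roughly 63% in the average RMSE.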

Fig. 11. RMSE and bias of stations at different altitudes under two interpolation strategies.