Background & Summary

Groundwater is the foundation of the hydrological cycle and the largest source of unfrozen freshwater on Earth, and it plays an essential role in sustaining ecosystem balance and supporting human survival1. Globally, groundwater serves as a crucial water source for agricultural, industrial, and domestic uses, and it is particularly indispensable in arid and semi-arid regions. The increasing demand for groundwater is driven by several factors, including the growing global population, expansion of irrigated agricultural areas, and rapid economic development2,3. This increased demand has resulted in a range of environmental and social challenges, including ground subsidence and deteriorating water quality. Groundwater level fluctuations serve as key indicators of groundwater movement and overall conditions. Long-term monitoring of these levels enables the study of the formation, distribution, and movement patterns of groundwater within a given region4,5, thereby providing a scientific basis for the evaluation and rational utilization of groundwater resources, as well as for industrial and agricultural development and residential planning. Therefore, it is imperative to monitor and analyze changes in groundwater levels to manage limited groundwater resources in an equitable, sustainable, and economically prudent manner.

Principal data on groundwater levels worldwide are derived from two sources: monitoring wells and Gravity Recovery and Climate Experiment (GRACE) satellites. In particular, GRACE satellites offer a novel approach to groundwater monitoring6,7. The measurement of changes in the Earth’s gravity field enables the observation of changes in groundwater levels on a global scale. Nevertheless, the low spatial resolution of GRACE data, typically a few hundred kilometers, substantially constrains its direct applicability to local or small-scale regional water resource management8. In comparison, groundwater level data obtained from observation wells provide information with a high degree of accuracy. The International Groundwater Resources Assessment Centre, through its Global Groundwater Level Information System, facilitates the integration of observation well data on a global scale, thereby providing a platform for the sharing of data and the exchange of information9,10. The database contains time-series data on groundwater levels collected from monitoring stations worldwide. The data are sourced from various providers, cover a range of time scales, and are collected and processed using various methods. However, the distribution and update frequency of these datasets are inconsistent, particularly in regions where data are scarce. Furthermore, the density and maintenance quality of the monitoring networks may not meet the requirements of research and management activities. In China, the China Ecosystem Research Network groundwater level dataset (2005–2014) is a widely used and valuable resource for groundwater level observations. The frequency of observations ranges from 1 to 10 days, providing a detailed record of groundwater levels over the specified period across a range of typical ecosystems in China, albeit with relatively low data density11. The remaining datasets are primarily concentrated on smaller watersheds, such as the Black River and Huludou Basins. In addition, these datasets cover relatively short time spans and are updated infrequently12,13. Overall, there are considerable discrepancies between the available global groundwater-level datasets, with Chinese regional data exhibiting a notable lack of accuracy and temporal continuity. Thus, the available data are inadequate for related research.

In this study, we aimed to address the lack of data on groundwater levels in China and the lack of temporal continuity in regional observations. To this end, we utilized the China Geological Environmental Monitoring Groundwater Level Yearbook (2005–2022), which is a comprehensive compilation produced by the China Academy of Geological Environmental Monitoring (Wu, 2007). We constructed a comprehensive database of monitoring data from underground well observation sites in China through the long-term collation of time-series data. This resulted in a database comprising 11,911 sites, 4,275,506 observation records, and an average of five observations per site per month.

Methods

Presentation of the groundwater monitoring data

This dataset originates from the “China Geological Environment Monitoring Groundwater Level Yearbook (2005–2022)” compiled by the China Geological Environmental Monitoring Institute. The yearbook records detailed continuous groundwater-level monitoring data across the country from 2005 to 2022, covering 11,911 groundwater observation wells and 4,275,506 observation records. The data include information such as the ID of each monitoring site, groundwater depth or groundwater level, groundwater type (shallow water or confined water), and site location. The data are stored with a daily resolution, with groundwater depth or groundwater level measured in meters (m). From 2005 to 2017, the 1956 Yellow Sea Vertical Datum was used, and from 2018 to 2022, the 1985 National Vertical Datum was adopted, with a difference of 0.029 m between the two. All data were converted and unified to the 1985 National Vertical Datum for consistency.
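The datum unification described above can be illustrated with a short sketch. It assumes that the 0.029 m offset is applied as H(1985) = H(1956) − 0.029 m, that only elevation-type water levels (not depths below ground) need converting, and that the year of a record identifies its datum; the function name `to_1985_datum` is hypothetical and not taken from the original processing chain.

```python
# Offset between the two vertical datums as stated in the text (metres).
DATUM_OFFSET_M = 0.029


def to_1985_datum(level_m: float, year: int) -> float:
    """Express a groundwater level (elevation, m) in the 1985 National Vertical Datum.

    Assumptions (not confirmed by the source): records up to 2017 use the
    1956 Yellow Sea datum and are shifted by H_1985 = H_1956 - 0.029 m;
    records from 2018 onward are already in the 1985 datum.
    """
    if year <= 2017:
        return level_m - DATUM_OFFSET_M
    return level_m


print(round(to_1985_datum(35.000, 2010), 3))  # 34.971
print(round(to_1985_datum(35.000, 2020), 3))  # 35.0
```

Depth-to-water values are measured relative to the ground surface, so under these assumptions they would be left unchanged.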

The monitoring sites are mainly concentrated in the North China Plain, Songhua River Basin, and other areas, with a focus on regions with high water resource utilization and fragile ecological environments. Since 2018, China has established a complete groundwater monitoring network, with the number of groundwater monitoring sites increasing from 1,200 to over 10,000, significantly enhancing the number and distribution density of monitoring points, especially in remote northwestern and southwestern regions (Fig. 1a and b). From 2005 to 2017, the number of effective monitoring stations exceeded 800 in each year, whereas from 2018 to 2022 it exceeded 8,000 in each year (Fig. 1c and d). This change significantly improved the spatial coverage and data accuracy of groundwater-level monitoring, providing more abundant real-time data support for the scientific management and protection of groundwater resources.

Fig. 1

Spatial distribution of groundwater wells. a Spatial distribution of groundwater monitoring stations from 2005 to 2017; b Spatial distribution of groundwater monitoring stations from 2018 to 2022; c Number of effective stations per year from 2005 to 2017; d Number of effective stations per year from 2018 to 2022.

Time setting

In the actual monitoring process, due to limitations in equipment or other factors, some monitoring stations were unable to collect daily data, resulting in incomplete daily records for certain stations. However, most stations maintained a monitoring frequency of at least five times per month. To ensure data continuity and comparability, we calculated the monthly average of the monitoring data for each station, obtaining monthly groundwater level data for each station. This approach effectively resolved the gaps caused by missing data and ensured the stability and reliability of the data, which is beneficial for subsequent analysis and comparison.

$${\bar{x}}_{i}=\frac{1}{{n}_{i}}\mathop{\sum }\limits_{j=1}^{{n}_{i}}{x}_{{ij}}$$
(1)

where \({\bar{x}}_{i}\) is the monthly average value for the \(i\)th month, \({x}_{ij}\) is the groundwater level on the \(j\)th observation day of the \(i\)th month, and \({n}_{i}\) is the number of days with observations in the \(i\)th month. Aggregating the daily records to monthly values in this manner improves the usability of the data and facilitates the identification of long-term groundwater trends.
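A minimal sketch of the monthly averaging in Eq. (1) is given below. It uses only the Python standard library and assumes that each record is a (station ID, date, level) tuple and that missing days are simply absent, so that the divisor is the number of readings actually available in the month. The function name `monthly_means` and the sample values are illustrative only.

```python
from collections import defaultdict
from datetime import date
from statistics import fmean


def monthly_means(records):
    """Average the available daily readings per (station, year, month).

    records: iterable of (station_id, date, level_m); missing days are absent.
    Returns {(station_id, year, month): mean level in metres}.
    """
    groups = defaultdict(list)
    for station_id, day, level_m in records:
        groups[(station_id, day.year, day.month)].append(level_m)
    # The divisor in Eq. (1) is len(values): the number of readings in the month.
    return {key: fmean(values) for key, values in groups.items()}


# Hypothetical station with three readings in January and one in February.
sample = [
    ("W001", date(2020, 1, 5), 12.0),
    ("W001", date(2020, 1, 15), 12.5),
    ("W001", date(2020, 1, 25), 13.0),
    ("W001", date(2020, 2, 5), 11.0),
]
result = monthly_means(sample)
print(result[("W001", 2020, 1)])  # 12.5
print(result[("W001", 2020, 2)])  # 11.0
```

Months without any reading do not appear in the result, so gaps longer than one month remain visible rather than being silently filled.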

Inverse distance weighting method

The conventional method for monitoring groundwater levels involves recording data at specific stations. Where groundwater levels are not directly monitored, the inverse distance weighting (IDW) method can be employed to estimate these values14. IDW is a common geospatial interpolation technique: it calculates the distance between the point to be estimated and each observed point, and weights each observation by the inverse of that distance to infer the value at the unobserved location. The following formulas are used in this process15,16:

$$\hat{Z}\left({s}_{0}\right)=\mathop{\sum }\limits_{i=1}^{N}{\lambda }_{i}Z\left({s}_{i}\right)$$
(2)
$${\lambda }_{i}=\frac{{d}_{i0}^{-p}}{\mathop{\sum }\limits_{j=1}^{N}{d}_{j0}^{-p}}$$
(3)
$$\mathop{\sum }\limits_{i=1}^{N}{\lambda }_{i}=1$$
(4)

Here, \(\hat{Z}\left({s}_{0}\right)\) is the interpolated value at the target point \({s}_{0}\), \(Z\left({s}_{i}\right)\) is the monitored value at sampling point \({s}_{i}\), \({\lambda }_{i}\) is the weight assigned to that sampling point, \({d}_{i0}\) is the distance between \({s}_{i}\) and \({s}_{0}\), \(p\) is the power parameter, and \(N\) is the number of surrounding sampling points involved in the interpolation. In this study, the interpolation was implemented in Python. Considering the distribution characteristics of the site data, the power was set to 3 and the smoothing factor was set to 0.6 so that the interpolation results retain sufficient local detail and avoid the loss of important information due to excessive smoothing. To further improve the accuracy of the interpolation results, the 10 nearest sampling points were selected for each interpolated location, which makes full use of the information from the surrounding stations. With these parameter settings, groundwater levels can be estimated between widely dispersed monitoring stations.
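The weighting scheme of Eqs. (2)–(4) with the stated parameters (power 3, smoothing factor 0.6, 10 nearest neighbors) can be sketched as follows. This is a dependency-free illustration rather than the code used to build the dataset. In particular, the source does not state how the smoothing factor enters the weights; the sketch assumes the common form d_eff = sqrt(d² + smoothing²), and the coordinate units are likewise an assumption. The function name `idw_estimate` is hypothetical.

```python
import math


def idw_estimate(target, stations, power=3.0, smoothing=0.6, k=10):
    """Estimate a value at `target` (x, y) from `stations` [(x, y, value), ...].

    Assumption: smoothing enters as d_eff = sqrt(d^2 + smoothing^2), which keeps
    the weights finite at zero distance; the source does not give the exact form.
    """
    if not stations:
        raise ValueError("at least one station is required")
    tx, ty = target
    # Brute-force k-nearest-neighbour search: fine for a sketch, slow for full grids.
    nearest = sorted(
        ((math.hypot(x - tx, y - ty), value) for x, y, value in stations),
        key=lambda pair: pair[0],
    )[:k]
    # Without smoothing, a target that coincides with a station takes its value.
    if smoothing == 0.0 and nearest[0][0] == 0.0:
        return nearest[0][1]
    weights = [math.sqrt(d * d + smoothing * smoothing) ** (-power) for d, _ in nearest]
    total = sum(weights)
    # Eq. (3): normalised weights sum to 1; Eq. (2): weighted sum of observations.
    return sum(w / total * value for w, (_, value) in zip(weights, nearest))


# Two hypothetical stations; the midpoint is equidistant from both.
demo_stations = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0)]
print(idw_estimate((5.0, 0.0), demo_stations))  # 15.0
```

A power of 3 makes the weights decay quickly with distance, so the nearest stations dominate each estimate; for a national 1 km grid, a spatial index (for example a k-d tree) would replace the brute-force neighbor search.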

Data Records

The monthly groundwater level dataset for China at 1 km spatial resolution from 2005 to 2022 is openly accessible through the National Tibetan Plateau Scientific Data Center and can be downloaded through the following link (https://doi.org/10.11888/Terre.tpdc.301342)17. The dataset is based on the China Geological Environment Monitoring Groundwater Level Yearbook (2005–2022), from which monthly groundwater levels at a spatial resolution of 1 km were generated through data cleaning, monthly mean synthesis, and IDW interpolation. The data use a UTM projection and provide detailed information on the monthly spatial distribution of groundwater levels, which is suitable for groundwater resource management, ecological and environmental research, and other fields.

Technical Validation

Display of interpolation results

The monitoring data of the groundwater stations, recorded at different temporal frequencies, were aggregated and interpolated to obtain nationwide monthly groundwater level grid data (Fig. 2). Owing to substantial changes in the number and spatial distribution of monitoring stations in 2018, we selected data from January 2005 and January 2020 for comparison. The results show that the IDW interpolation method clearly reveals the spatial distribution of groundwater levels in China, and that the interpolation improved considerably as the number of stations increased (Fig. 2a1,b1). The smooth transition from the low-water-level areas, shown in red, to the high-water-level areas, shown in blue, reflects the variation in groundwater levels across geographical regions. The low-water-level areas were mainly distributed in the eastern coastal areas, whereas the high-water-level areas were concentrated in the southwestern region.

Fig. 2

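Interpolation results. a1 and b1 Interpolated groundwater levels for January 2005 and January 2020, respectively; a2 and a3 frequency distributions of the interpolated and original data for January 2005, respectively; b2 and b3 frequency distributions of the interpolated and original data for January 2020, respectively.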

The plausibility of the data distribution was verified by comparing the statistical histograms of the original monitoring station data with those of the interpolation results. The frequency distribution of the interpolation results for 2005 and 2020 (Fig. 2a2,b2) showed a trend consistent with that of the original data (Fig. 2a3,b3), indicating that the interpolation results could effectively capture the actual changes in groundwater levels. In the original data, there were 898 and 9,499 effective monitoring points for 2005 and 2020, respectively. After interpolation, the data volume increased markedly to 27,026,352, effectively filling the spatial gaps between the monitoring points and making the results more comprehensive. Although the data volume increased substantially after interpolation, the mean and standard deviation of the interpolated data differed only slightly from those of the original data: the mean of the 2005 original data was 1056.86 m with a standard deviation of 1006.65 m, and the mean of the 2020 original data was 1117.76 m with a standard deviation of 1056.95 m. Additionally, the Kolmogorov-Smirnov (K-S) test between the interpolated data and the original data yielded a K-S statistic of 0.354, indicating a statistically significant difference in their distributions. This difference may be attributed to the smoothing effect introduced by the interpolation method and the assumptions of the model during the estimation process. Nonetheless, the interpolated data provide valuable spatial information for further groundwater level analysis, especially in areas with sparse monitoring points, and have significant practical value.
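For readers who wish to reproduce this kind of distribution comparison, the two-sample K-S statistic can be computed as sketched below. The sketch is a plain-Python illustration of the statistic itself (the largest gap between the two empirical cumulative distribution functions); it does not compute a p-value, and the sample values are made up. The function name `ks_statistic` is hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical station values versus hypothetical grid-cell values.
print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
print(ks_statistic([1, 2, 3, 4], [1, 2, 3, 4]))  # 0.0
```

Because the significance of a given statistic depends on the sample sizes, a comparison involving millions of grid cells will flag even modest distributional differences as statistically significant.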

We evaluated the spatial autocorrelation of the interpolated groundwater levels using Moran’s index (Fig. 3) to verify the continuity of the interpolated spatial data. In Fig. 3, “High-High” denotes spatial clustering of high groundwater levels, that is, areas with high values that are surrounded by other high-value areas and therefore show spatial continuity. Such an analysis reveals whether the data are spatially clustered, namely whether high-value regions are associated with other high-value regions. The results indicated that, in both 2005 and 2020, the groundwater level data showed a strong positive spatial autocorrelation. The clustering of high-value (high-water-level) and low-value (low-water-level) areas indicated that the spatial distribution of the groundwater level was consistent and regular. This regularity supports extending the station data to a larger area through interpolation, so that the overall statistical features more comprehensively reflect the spatial distribution of groundwater levels within China and provide a solid data foundation for subsequent water resource management and analysis.
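As an illustration of the statistic behind this analysis, the sketch below computes the global Moran's I with binary distance-threshold weights. The weighting scheme, the threshold, and the function name `morans_i` are assumptions made for the example; the source does not specify them, and the "High-High" labels in Fig. 3 suggest that a local (cluster-type) variant was also used, which this sketch does not cover.

```python
import math


def morans_i(points, values, threshold):
    """Global Moran's I with binary weights: w_ij = 1 if 0 < dist(i, j) <= threshold."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    num = 0.0
    w_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= threshold:
                num += dev[i] * dev[j]
                w_sum += 1.0
    if w_sum == 0.0 or denom == 0.0:
        raise ValueError("no neighbouring pairs or no variance in the values")
    return (n / w_sum) * (num / denom)


# Four hypothetical cells on a line; only adjacent cells are neighbours.
line = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(round(morans_i(line, [1, 1, 5, 5], threshold=1.0), 3))  # 0.333 (clustered)
print(round(morans_i(line, [1, 5, 1, 5], threshold=1.0), 3))  # -1.0 (alternating)
```

Positive values indicate that similar values lie next to each other, as reported for the interpolated grids, whereas negative values indicate a checkerboard-like pattern.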

Fig. 3

Results of spatial correlation evaluation.

To further quantify the differences in the time-series changes before and after interpolation, the time-series curves of the original monitoring station data were compared with those of the interpolated raster data, and the correlation between them was calculated (Fig. 4). The results showed that the site and raster data exhibited similar trend characteristics, with a coefficient of determination (R²) of 0.82. Spatially, the strength of the correlation was closely associated with the site density. This result indicates that the interpolated data could effectively capture the temporal trend in groundwater level changes, verifying the reliability of the interpolation method for time-series data processing.

Fig. 4

Groundwater time series results.

Overall, the interpolation results not only accurately reproduced the actual spatial distribution characteristics of groundwater levels, but also demonstrated high interpolation accuracy and spatial resolution, providing reliable data support for further research on the dynamic changes in groundwater resources in China.

Evaluation of interpolation accuracy

To verify the accuracy of the groundwater level restoration, the study area was divided into three zones (Zone I, Zone II, and Zone III) based on administrative divisions and the density of monitoring stations. Zone I covers most of eastern China, with a relatively dense distribution of monitoring stations, whereas Zones II and III are located in the southwestern and northwestern regions, respectively, with relatively sparse stations. Comparing the interpolation results with the original data in each zone allows the influence of station density on the interpolation results to be evaluated. The measured water level was fitted against the restored water level to verify the accuracy of the interpolation results, as shown in Fig. 5. The regression coefficients and R² values of the three zones were close to 1, indicating a strong linear relationship between the reconstructed and measured water levels in all three zones. This finding indicates that the method reproduces the measured groundwater levels well, reflecting its adaptability to the monitoring data. Although the study area was vast and the range of groundwater levels was large (−100 to 4,500 m), the root-mean-square error values of the three zones were within an acceptable range. This indicates that the groundwater data processed through IDW had good reliability under different geographical and environmental conditions. Therefore, the interpolated groundwater data were considered reasonable and reliable, which is of great importance for studies that require two-dimensional gridded groundwater data. The results of this study provide strong support for water resource management, environmental protection, and related scientific research.

Fig. 5

Validation results of the groundwater table grid data. Zone I covers most of eastern China, with a relatively dense distribution of monitoring stations, whereas Zones II and III are located in the southwestern and northwestern regions, respectively, with a relatively sparse number of stations.

We additionally tested the accuracy of the groundwater levels in the North China Plain, as shown in Fig. 6. This region is important for agriculture, its irrigation water depends largely on groundwater extraction, and it has a dense network of monitoring stations. Consequently, the region has been studied extensively, which provides comparable data for testing. The maximum difference between the restored groundwater level and the actual monitored value was only 2.78 m. Most (89.09%) of the water level errors were within 0.03 m; the standard deviation was 0.392 m, the R² value reached 0.99, the root-mean-square error was 0.398 m, and the mean-square error was 0.1588 m². These results show that the dataset reproduces the monitoring data well and is scientifically reasonable. This accuracy also depends partly on the dense distribution of sites; the interpolation results require further refinement in areas where monitoring data are sparse, and monitoring should therefore be enhanced in key areas of groundwater research.
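The accuracy measures reported above can be computed from paired station and grid values as in the following sketch. It assumes that R² is defined as 1 − SS_res/SS_tot (the source may instead have used the squared correlation of a fitted regression line), and the input values and the function name `validation_metrics` are hypothetical.

```python
import math


def validation_metrics(observed, estimated, tolerance=0.03):
    """Compare estimated with observed levels (both in metres).

    Assumption: R^2 = 1 - SS_res / SS_tot; the source does not state its definition.
    """
    if not observed or len(observed) != len(estimated):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    errors = [e - o for o, e in zip(observed, estimated)]
    ss_res = sum(err * err for err in errors)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0.0:
        raise ValueError("observed values have no variance")
    mse = ss_res / n
    return {
        "max_abs_error": max(abs(err) for err in errors),
        "mse": mse,
        "rmse": math.sqrt(mse),
        "r2": 1.0 - ss_res / ss_tot,
        # Fraction of points whose absolute error is within `tolerance` metres.
        "share_within_tolerance": sum(abs(err) <= tolerance for err in errors) / n,
    }


# Hypothetical station readings and the corresponding grid-cell estimates.
metrics = validation_metrics([10.0, 20.0, 30.0, 40.0], [10.01, 20.0, 29.5, 40.02])
print(metrics["share_within_tolerance"])  # 0.75
```

Reporting the share of errors within a tolerance alongside the RMSE is informative because a few large errors can dominate the RMSE even when most points agree closely.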

Fig. 6

Accuracy validation of groundwater grid data in the North China Plain.