Table 2. Overview of preprocessing steps for the groundwater time series with the number and percentage of discarded/flagged time series and the flags created per step.
From: A Global-Scale Time Series Dataset for Groundwater Studies within the Earth System
Preprocessing step | Percentage and (number) of discarded time series | Percentage and (number) of flagged time series | Created data flag |
|---|---|---|---|
1. Discarding time series with fewer than 2 records | 8.81% (22,136) | 0 | / |
2. Reconciling records to ensure that only one groundwater reference point elevation is provided per time series | 0.53% (1,342) | 0 | / |
3. Removing numeric placeholders for missing records | 0 | 0 | / |
4. Temporal aggregation of time series to either daily, monthly, or yearly resolution | 1.86% (4,680) | 1 - 85% YS (174,337), 9% MS (18,670), 6% d (11,285) 2 - 44% not-NA (90,757) | 1 - interval [d,MS,YS] 2 - aggregated_from_n_values [number] |
5. Capping the gap fraction and gap length: a) gap fraction, b) gap length | a) 0.64% (1,617) b) 2.42% (6,094) | a) 100% not-NA | a) gap_fraction [number] |
6. Flagging negative depth to groundwater records | 0 | 96% No (196,747), 0.8% Some (1,617), 0.4% All (911) | negative_signs_wtd [‘No’, ‘Some’, ‘All’] |
7. Flagging autocorrelation | 0 | 16% True (33,142) | autocorrelation [True/False] |
8. Flagging outliers and change points with DBSCAN | 0 | 6% True (14,524) | outliers_change_points [True/False] |
9. Flagging uninterrupted sequences of the exact same water table as potential measurement error | 0 | 1 - 11% True (22,623) 2 - 100% not-NA | 1 - Attribute table: plateaus [True/False] 2 - Time series table: plateaus [plateau length in time steps] |
10. Calculating Mann-Kendall trend direction and Sen’s slope | 0 | 1 - 66% no trend (134,157), 22% decreasing (45,506), 12% increasing (24,629) 2 - 34% not-NA (70,135) | 1 - trend_direction [‘no trend’, ‘decreasing’, ‘increasing’] 2 - trend_slope [m/year] |
11. Adding further data flags | 0 | 100% not-NA | starting_date [date], ending_date [date], length_years [number], aggregated_from_n_values_median [number], groundwater_mean_m [number], groundwater_median_m [number] |
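
Several of the flags in Table 2 can be illustrated with a short Python sketch. This is not the dataset's actual pipeline: the function names, the use of `None` for missing time steps on a regular grid, the significance level `alpha=0.05`, and the omission of a tie correction in the Mann-Kendall variance are all assumptions made for illustration. The sketch covers steps 5a (gap fraction), 6 (negative depth to groundwater), 9 (plateau length per time step), and 10 (Mann-Kendall trend direction and Sen's slope).

```python
import math
from itertools import combinations
from statistics import median


def negative_signs_flag(depths):
    """Return 'No', 'Some', or 'All' depending on how many records are negative.

    Assumes `depths` contains no missing values.
    """
    negatives = sum(1 for d in depths if d < 0)
    if negatives == 0:
        return "No"
    return "All" if negatives == len(depths) else "Some"


def gap_fraction(values):
    """Share of missing (None) time steps on a regular time grid."""
    return sum(v is None for v in values) / len(values)


def plateau_lengths(values):
    """For each time step, the length of the run of identical values it belongs to."""
    lengths = []
    start = 0
    for i in range(1, len(values) + 1):
        # Close the current run at the end of the series or when the value changes.
        if i == len(values) or values[i] != values[start]:
            lengths.extend([i - start] * (i - start))
            start = i
    return lengths


def mann_kendall(values, alpha=0.05):
    """Trend direction (Mann-Kendall, no tie correction) and Sen's slope per time step.

    `alpha` is an assumed significance level, not a value taken from the table.
    """
    n = len(values)
    pairs = list(combinations(range(n), 2))
    # S statistic: sum of the signs of all pairwise differences.
    s = sum((values[j] > values[i]) - (values[j] < values[i]) for i, j in pairs)
    var_s = n * (n - 1) * (2 * n + 5) / 18
    # Continuity-corrected normal approximation.
    z = 0.0 if s == 0 else (s - math.copysign(1, s)) / math.sqrt(var_s)
    p = 1 - math.erf(abs(z) / math.sqrt(2))  # two-sided p-value
    # Sen's slope: median of all pairwise slopes.
    slope = median((values[j] - values[i]) / (j - i) for i, j in pairs)
    if p >= alpha:
        return "no trend", slope
    return ("increasing" if s > 0 else "decreasing"), slope
```

For a yearly series, the slope returned by `mann_kendall` is in value units per time step, which corresponds to the m/year unit of `trend_slope` only under the assumption of one record per year.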