Table 2. Overview of preprocessing steps for the groundwater time series, with the number and percentage of discarded/flagged time series and the flags created per step.

From: A Global-Scale Time Series Dataset for Groundwater Studies within the Earth System

| Preprocessing step | Percentage and (number) of discarded time series | Percentage and (number) of flagged time series | Created data flag |
| --- | --- | --- | --- |
| 1. Discarding time series with fewer than 2 records | 8.81% (22,136) | 0 | / |
| 2. Reconciling records to ensure that only one groundwater reference point elevation is provided per time series | 0.53% (1,342) | 0 | / |
| 3. Removing numeric placeholders for missing records | 0 | 0 | / |
| 4. Temporal aggregation of time series to either daily, monthly, or yearly resolution | 1.86% (4,680) | 1 - 85% YS (174,337), 9% MS (18,670), 6% d (11,285); 2 - 44% not-NA (90,757) | 1 - interval [d, MS, YS]; 2 - aggregated_from_n_values [number] |
| 5. Capping the gap fraction and gap length: a) gap fraction, b) gap length | a) 0.64% (1,617); b) 2.42% (6,094) | a) 100% not-NA | a) gap_fraction [number] |
| 6. Flagging negative depth to groundwater records | 0 | 96% No (196,747), 0.8% Some (1,617), 0.4% All (911) | negative_signs_wtd [‘No’, ‘Some’, ‘All’] |
| 7. Flagging autocorrelation | 0 | 16% True (33,142) | autocorrelation [True/False] |
| 8. Flagging outliers and change points with DBSCAN | 0 | 6% True (14,524) | outliers_change_points [True/False] |
| 9. Flagging uninterrupted sequences of the exact same water table as potential measurement error | 0 | 1 - 11% True (22,623); 2 - 100% not-NA | 1 - Attribute table: plateaus [True/False]; 2 - Time series table: plateaus [plateau length in time steps] |
| 10. Calculating Mann-Kendall trend direction and Sen’s slope | 0 | 1 - 66% no trend (134,157), 22% decreasing (45,506), 12% increasing (24,629); 2 - 34% not-NA (70,135) | 1 - trend_direction [‘no trend’, ‘decreasing’, ‘increasing’]; 2 - trend_slope [m/year] |
| 11. Adding further data flags | 0 | 100% not-NA | starting_date [date], ending_date [date], length_years [number], aggregated_from_n_values_median [number], groundwater_mean_m [number], groundwater_median_m [number] |

  1. All data flags are listed in Table 6.
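
To make the flag definitions in steps 5, 6, 9 and 10 more concrete, the following is a minimal Python sketch of how such flags could be computed for a single, regularly spaced series in which missing records are stored as `None`. It is an illustration under stated assumptions, not the dataset authors' implementation: the function names are hypothetical, the 0.05 significance level is an assumed default, and the Mann-Kendall test is the plain textbook variant without tie or autocorrelation correction. The slope is returned per time step, which corresponds to m/year only for a yearly series in metres.

```python
import math
from itertools import combinations
from statistics import median


def gap_fraction(values):
    """Share of missing (None) time steps in a regularly spaced series."""
    return sum(v is None for v in values) / len(values)


def negative_signs_wtd(values):
    """Return 'No', 'Some' or 'All' depending on how many records are negative."""
    observed = [v for v in values if v is not None]
    n_negative = sum(v < 0 for v in observed)
    if n_negative == 0:
        return "No"
    return "All" if n_negative == len(observed) else "Some"


def plateau_lengths(values):
    """Per time step, the length of the run of identical values it belongs to.

    Missing records get length 0 so that gaps are not counted as plateaus.
    """
    lengths = []
    start = 0
    for i in range(1, len(values) + 1):
        # Close the current run at the end of the series or when the value changes.
        if i == len(values) or values[i] != values[start]:
            run = i - start
            lengths.extend([0 if values[start] is None else run] * run)
            start = i
    return lengths


def mann_kendall(values, alpha=0.05):
    """Textbook Mann-Kendall direction and Sen's slope (assumed alpha, no tie correction)."""
    obs = [(t, v) for t, v in enumerate(values) if v is not None]
    n = len(obs)
    if n < 2:
        return "no trend", None
    pairs = list(combinations(obs, 2))
    # S statistic: sum of the signs of all pairwise differences.
    s = sum((v2 > v1) - (v2 < v1) for (_, v1), (_, v2) in pairs)
    var_s = n * (n - 1) * (2 * n + 5) / 18
    # Continuity-corrected standard normal test statistic.
    z = 0.0 if s == 0 else (s - math.copysign(1, s)) / math.sqrt(var_s)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    # Sen's slope: median of all pairwise slopes, in value units per time step.
    slope = median((v2 - v1) / (t2 - t1) for (t1, v1), (t2, v2) in pairs)
    if p_value < alpha:
        return ("increasing" if s > 0 else "decreasing"), slope
    return "no trend", slope
```

For example, `mann_kendall([float(i) for i in range(1, 11)])` classifies a strictly rising ten-step series as increasing. Whether a given plateau length or trend should lead to a flag depends on thresholds that this table does not state, so the sketch only returns the raw quantities.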