Table 2 Quality control tests and associated flag names in table GLORIA_qc_flags.csv.

From: GLORIA - A globally representative hyperspectral in situ dataset for optical sensing of water quality

Flag name Number of cases

Description and method

Noisy_red 40

High-frequency variability, potentially instrument noise, near the red end: spectra were standardized to zero mean and unit standard deviation. A 4th-order polynomial was fitted over the interval 750–900 nm. Spectra with a root-mean-square error (RMSE) >0.2 were flagged. This threshold was determined by visual inspection of the distribution of RMSEs with respect to spectral shapes.
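The red-end noise test can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names `polyfit_rmse` and `is_noisy_red` are hypothetical, and two details the table does not state are assumed here, namely that the standardization uses the whole supplied spectrum and the population standard deviation.

```python
import numpy as np
from numpy.polynomial import Polynomial


def polyfit_rmse(wavelengths, rrs, lo, hi, degree=4):
    """RMSE of a polynomial fit to a standardized spectrum within [lo, hi] nm."""
    wl = np.asarray(wavelengths, dtype=float)
    y = np.asarray(rrs, dtype=float)
    # Assumption: standardize over the full spectrum (zero mean, unit std).
    z = (y - y.mean()) / y.std()
    window = (wl >= lo) & (wl <= hi)
    # Polynomial.fit rescales the x domain internally, which keeps a
    # 4th-order fit numerically stable at wavelengths of several hundred nm.
    fit = Polynomial.fit(wl[window], z[window], degree)
    residuals = z[window] - fit(wl[window])
    return float(np.sqrt(np.mean(residuals ** 2)))


def is_noisy_red(wavelengths, rrs, threshold=0.2):
    """True when the 750-900 nm fit RMSE exceeds the table's threshold."""
    return polyfit_rmse(wavelengths, rrs, 750.0, 900.0) > threshold
```

Under the same assumptions, calling `polyfit_rmse(wavelengths, rrs, 350.0, 400.0)` and comparing against 0.15 would correspond to the Noisy_blue test in this table.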

Noisy_blue 15

High-frequency variability, potentially instrument noise, near the blue end: spectra were standardized to zero mean and unit standard deviation. A 4th-order polynomial was fitted over the interval 350–400 nm. Spectra with a root-mean-square error (RMSE) >0.15 were flagged. This threshold was determined by visual inspection of the distribution of RMSEs with respect to spectral shapes.

Baseline_shift 164

Spectra shifted up are those where the minimum Rrs is at least 60% of the spectrum's median Rrs. This percentage corresponds approximately to 1.5 times the interquartile range above the upper quartile of the baseline-percent distribution of the entire GLORIA dataset.
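As a rough sketch of the upward-shift criterion, the baseline percent can be computed as the minimum Rrs relative to the median Rrs and compared with the 60% threshold. The function names are hypothetical, and several choices are assumptions not given in the table: the threshold is treated as a lower bound ("at least 60%"), spectra with a non-positive median are skipped, and the quartiles use the default method of Python's `statistics.quantiles`.

```python
from statistics import median, quantiles


def baseline_percent(rrs):
    """Minimum Rrs as a percentage of the median Rrs (None if median <= 0)."""
    med = median(rrs)
    if med <= 0:
        # Assumption: the ratio is not meaningful for a non-positive median.
        return None
    return 100.0 * min(rrs) / med


def is_shifted_up(rrs, threshold_percent=60.0):
    """True when the baseline percent reaches the threshold."""
    pct = baseline_percent(rrs)
    return pct is not None and pct >= threshold_percent


def upper_outlier_bound(values):
    """Upper quartile plus 1.5 times the interquartile range."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 + 1.5 * (q3 - q1)
```

Applying `upper_outlier_bound` to the baseline percents of a whole dataset is one way to derive a threshold of the kind described; the table reports that this comes to approximately 60% for GLORIA.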

Spectra shifted down are those with at least 20 negative values and either:

• a negative linear slope in the interval 765–900 nm of <−8.75 × 10⁻⁷ sr⁻¹ nm⁻¹ (the slope threshold was determined as the bound of the lower quartile) and >50% negative Rrs values in this spectral region; or

• >70% negative Rrs values in the interval 765–900 nm; or

• at least 20 negative Rrs values in the interval 350–450 nm.
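The downward-shift rule combines a count of negative values with three alternative conditions, which can be sketched as below. This is an illustrative reading of the criteria rather than the authors' implementation: `linear_slope` and `is_shifted_down` are hypothetical names, an ordinary least-squares slope is assumed, and the interval bounds are treated as inclusive.

```python
def linear_slope(x, y):
    """Ordinary least-squares slope of y against x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def is_shifted_down(wavelengths, rrs, slope_threshold=-8.75e-7):
    """Apply the 'shifted down' criteria to one spectrum (Rrs in sr^-1)."""
    pairs = list(zip(wavelengths, rrs))
    # Precondition: at least 20 negative values anywhere in the spectrum.
    if sum(1 for _, r in pairs if r < 0) < 20:
        return False

    nir = [(w, r) for w, r in pairs if 765 <= w <= 900]
    blue = [(w, r) for w, r in pairs if 350 <= w <= 450]

    nir_negative = sum(1 for _, r in nir if r < 0)
    nir_fraction = nir_negative / len(nir) if nir else 0.0
    nir_slope = (
        linear_slope([w for w, _ in nir], [r for _, r in nir])
        if len(nir) >= 2
        else 0.0
    )

    # (a) steep negative slope and more than half of the values negative
    steep_and_negative = nir_slope < slope_threshold and nir_fraction > 0.5
    # (b) more than 70% of the values in 765-900 nm negative
    mostly_negative = nir_fraction > 0.7
    # (c) at least 20 negative values in 350-450 nm
    blue_negative = sum(1 for _, r in blue if r < 0) >= 20

    return steep_and_negative or mostly_negative or blue_negative
```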

Oxygen_signal 1311

Spectra where Oxygen_peak_height >0.1 (Table 3). This threshold was determined using visual inspection of the distribution of peak heights with respect to spectral shapes.

Negative_uv_slope 139

Negative slopes in the ultraviolet to blue end: The spectra were standardized to zero mean and unit standard deviation. A straight line was fitted over the interval 350–420 nm and spectra with slopes <−0.005 were flagged. This threshold was determined using visual inspection of the distribution of slope values.
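A minimal sketch of the ultraviolet slope test is given below. The names `standardized_slope` and `has_negative_uv_slope` are hypothetical, and as before it is an assumption that the standardization covers the whole supplied spectrum and uses the population standard deviation. The slope is then in standardized units per nanometre.

```python
from statistics import fmean, pstdev


def standardized_slope(wavelengths, rrs, lo=350.0, hi=420.0):
    """Least-squares slope of the standardized spectrum within [lo, hi] nm."""
    mean = fmean(rrs)
    std = pstdev(rrs)
    # Assumption: standardize over the full spectrum before windowing.
    z = [(r - mean) / std for r in rrs]
    window = [(w, zi) for w, zi in zip(wavelengths, z) if lo <= w <= hi]
    xs = [w for w, _ in window]
    ys = [zi for _, zi in window]
    mean_x = fmean(xs)
    mean_y = fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def has_negative_uv_slope(wavelengths, rrs, threshold=-0.005):
    """True when the 350-420 nm slope falls below the table's threshold."""
    return standardized_slope(wavelengths, rrs) < threshold
```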

QWIP_fail 278

Spectra failing a statistical quality control metric based on Apparent_visible_wavelength (Table 3). Spectra were flagged when the absolute value of the QWIP score exceeded 0.2.

Suspect 226

Spectra identified during expert elicitation as potentially fraught with measurement problems.

Flagged 1779

A value of 1 in this column indicates the presence of at least one flag from the tests described in this table.
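The summary column can be illustrated with a short sketch that combines the individual flags. The column names are taken from this table, but the row layout is an assumption: the exact structure of GLORIA_qc_flags.csv is not described here, so the rows below are hypothetical in-memory examples, and the helper name `flagged` is likewise illustrative.

```python
# Flag names as listed in this table; the CSV layout itself is assumed.
FLAG_COLUMNS = [
    "Noisy_red",
    "Noisy_blue",
    "Baseline_shift",
    "Oxygen_signal",
    "Negative_uv_slope",
    "QWIP_fail",
    "Suspect",
]


def flagged(row):
    """Return 1 if any individual flag in the row equals 1, else 0."""
    # int() lets the helper accept both numbers and CSV strings such as "1".
    return int(any(int(row[name]) == 1 for name in FLAG_COLUMNS))


# Hypothetical rows for illustration only.
clean_row = {name: 0 for name in FLAG_COLUMNS}
oxygen_row = {**clean_row, "Oxygen_signal": "1"}
print(flagged(clean_row), flagged(oxygen_row))
```

The individual counts in this table sum to 2173, which is more than the 1779 flagged spectra, so some spectra carry more than one flag.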