Table 2. Summary of our data harmonization process to produce the final harmonized dataset, SNAPD.

From: Harmonized nitrogen and phosphorus concentrations in the Mississippi/Atchafalaya River Basin from 1980 to 2018

| Harmonization step | Details | Observations affected |
| --- | --- | --- |
| Step 0: Pre-harmonization | Raw data | 9,217,921 |
| Step 1: Organization name | Standardized organization names in instances where there were varied spellings. | 568,644 |
| Step 2: Unique water monitoring sites | Flagged or combined coordinates and Monitoring Location Identifiers (MLIs) where possible such that each water monitoring site was defined as the unique combination of an MLI and coordinate pair. | 54,478 (multiple coordinates); 965,724 (multiple MLIs) |
| Step 3: Medium | If the sample was taken in any medium besides water, dropped. | 163,356 |
| Step 4: Date | If an observation was missing a date, dropped. | 1,640 |
| Step 5: Chemical form | If the chemical form of the observation could not be determined, dropped. | 1,026,757 |
| Step 6: Concentration value | If the concentration value was negative, nonsensical (e.g., text instead of a number), or missing and the observation was not indicated to be a non-detect, dropped. | 194,579 |
| Step 7: Concentration units | If concentration units were missing or if they could not be converted to mg/L, dropped. | 20,222 |
| Step 8: Detection text/codes | If the detection code/text indicated that concentration was not detected due to contamination or other quality control reasons, dropped. | 39,868 |
| Step 9: Sample fraction | If sample fraction was ambiguous or missing, dropped. | 340,239 |
| Step 10: Activity type | If the activity type indicated that the sample was part of a quality control check, dropped. | 384,273 |
| Step 11: Result type | If the result type indicated that the concentration value was estimated, dropped. | 130,054 |
| Step 12: Conversions | Converted nutrients to elemental form (as P or as N) and converted concentration units to mg/L, where possible. | all |
| Step 13: Nutrient renaming | Renamed nutrients to incorporate their sample fraction (e.g., nitrogen mixed forms unfiltered, ammonia filtered) to ensure comparability of observations. | all |
| Step 14: Detection limit approximation | If a detection limit was not provided for a non-detect observation in the raw data, approximated the detection limit (see section on Non-detects, detection codes, and detection limits). | 68,533 |
| Step 15: Non-detect handling | If an observation was indicated as non-detected, imputed concentration value using detection limits (see section on Imputing concentration for non-detects). | 1,241,315 |
| | If a nutrient-site-year had 80% or more non-detected observations, flagged observations and left concentration as N/A. | 612,918 |
| Step 16: Outlier flagging | If a given nutrient’s concentration value was above the 99th or below the 1st percentile, flagged as a potential outlier. | 131,021 |
| Step 17: Duplicates | If there were duplicates from multiple concentrations reported for the same site, nutrient, sample fraction, detection status, and date, averaged concentration and indicated the number of observations in the daily average. Note that this also includes time duplicates (see section on Duplicates). | 3,191,771 |
| | If there were duplicates due to differently named organizations reporting the same record, chose one organization and assigned to duplicate records. | 142,952 |
| | If there were duplicates due to a site measuring both detected and non-detected concentrations on the same date for the same nutrient, averaged concentration and flagged that the average includes an imputed value. | 134,848 |
| Step 18: Nutrients and sample fraction combination | If nutrient sample fractions could be combined to create a more common nutrient (e.g., total phosphorus vs. particulate phosphorus), combined observations where possible (see section on Combining nutrients and sample fractions). | 352 (added as new observations) |
| Step 19: Data quality | For a given sample, if the filtered nutrient concentration was greater than or equal to the unfiltered nutrient concentration, dropped. | 100,050 |

  1. The number of observations affected by each harmonization step is indicated. Observations may be counted more than once as there may have been more than one harmonization step that affected a given record.
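The note on double counting can be illustrated with a minimal Python sketch. It is not the authors' code: the field names (`medium`, `date`, `value`, `units`), the sample records, and the simplified rules are all assumptions made for illustration. In particular, the Step 6 rule here ignores the non-detect exception and the Step 7 rule only checks for missing units. Each rule is evaluated against every record independently, which is one way a single record could be counted under more than one step.

```python
# Hypothetical records; the field names are assumptions, not the SNAPD schema.
records = [
    {"medium": "Water", "date": "2001-05-14", "value": 0.42, "units": "mg/L"},
    {"medium": "Sediment", "date": None, "value": 1.10, "units": "mg/L"},
    {"medium": "Water", "date": "2003-07-02", "value": -0.5, "units": None},
    {"medium": "Water", "date": "2010-01-20", "value": 2.3, "units": "mg/L"},
]

# Simplified stand-ins for a few drop steps in the table.
# Each rule returns True when the record would be dropped by that step.
DROP_RULES = {
    "Step 3: Medium": lambda r: r["medium"] != "Water",
    "Step 4: Date": lambda r: r["date"] is None,
    "Step 6: Concentration value": lambda r: r["value"] is None or r["value"] < 0,
    "Step 7: Concentration units": lambda r: r["units"] is None,
}


def tally_affected(records, rules):
    """Count, per rule, how many records it matches (rules evaluated independently)."""
    return {name: sum(1 for r in records if rule(r)) for name, rule in rules.items()}


def apply_drops(records, rules):
    """Keep only the records that no rule matches."""
    return [r for r in records if not any(rule(r) for rule in rules.values())]


affected = tally_affected(records, DROP_RULES)
kept = apply_drops(records, DROP_RULES)
```

In this toy input, two records are dropped but the per-step counts sum to four, because each dropped record matches two rules. This is why the "Observations affected" column should not be expected to sum to the difference between the raw and final record counts.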
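The conversion in Step 12 can also be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the form labels (`"as NO3"`, `"as PO4"`, and so on), the unit strings, and the function name `to_elemental_mg_per_l` are hypothetical. The only facts relied on are standard atomic masses and metric unit prefixes. A value that cannot be converted returns `None`, mirroring the "could not be converted to mg/L" condition in Step 7.

```python
# Standard atomic masses in g/mol, rounded to three decimals.
N, P, O, H = 14.007, 30.974, 15.999, 1.008

# Mass fraction of N or P in each reported chemical form (labels are hypothetical).
ELEMENT_FRACTION = {
    "as N": 1.0,
    "as P": 1.0,
    "as NO3": N / (N + 3 * O),
    "as NO2": N / (N + 2 * O),
    "as NH4": N / (N + 4 * H),
    "as PO4": P / (P + 4 * O),
}

# Multipliers that turn a reported unit into mg/L (unit strings are hypothetical).
UNIT_TO_MG_PER_L = {"mg/l": 1.0, "ug/l": 1e-3, "g/l": 1e3, "ng/l": 1e-6}


def to_elemental_mg_per_l(value, units, form):
    """Return the concentration in mg/L as N or as P, or None if not convertible."""
    factor = UNIT_TO_MG_PER_L.get(units.strip().lower()) if units else None
    fraction = ELEMENT_FRACTION.get(form)
    if factor is None or fraction is None:
        return None
    return value * factor * fraction


nitrate_as_n = to_elemental_mg_per_l(10.0, "mg/L", "as NO3")   # about 2.26 mg/L as N
phosphorus = to_elemental_mg_per_l(500.0, "ug/L", "as P")      # about 0.5 mg/L as P
unconvertible = to_elemental_mg_per_l(1.0, "mg/kg", "as N")    # None: unit not mappable
```

For example, 10 mg/L reported as nitrate corresponds to roughly 2.26 mg/L as N, because nitrogen makes up about 22.6% of the mass of NO3.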