Fig. 2 | Scientific Data

Fig. 2

From: National-scale remotely sensed lake trophic state from 1984 through 2020

Flowchart of the data harmonization, modeling, and prediction steps of the LTS-US dataset pipeline. File-folder shapes denote intermediary data products, and rectangles denote intermediary models. Data aggregation combines data from the U.S. EPA’s National Lakes Assessment, HydroLAKES, and LimnoSat-US to create a single, harmonized dataset of in situ lake trophic states with paired remotely sensed surface reflectances. Model training steps create multinomial logistic regression, multilayer perceptron, and extreme gradient boosted regression tree models. Each fitted model is then applied to the entire LimnoSat-US dataset, yielding national-scale predictions for each modeling method. The probabilistic predictions are then averaged to create ensemble predictions of lake trophic state. Quality control steps (described in “Technical Validation”) use both the ensemble and individual model predictions to assess model performance. Each of these four components corresponds to a piece of the overall data production pipeline: data aggregation functions are described in “1_aggregate”; model training functions are described in “2_train”; national-scale prediction functions are described in “3_predict”; quality control procedures are described in “4_qc”. The flowchart was designed with the “DiagrammeR” package (ref. 93).
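
The ensemble step in the caption, averaging the probabilistic predictions of the three models, can be illustrated with a minimal sketch. This is not the pipeline's own code: the function name, the class labels, and the probability values below are hypothetical and chosen only for illustration, and an unweighted mean is assumed.

```python
# Minimal sketch of averaging class probabilities across models.
# Assumption: an unweighted mean over models; all names and numbers are made up.

# Hypothetical class labels (placeholders, not taken from the source).
CLASSES = ["class_a", "class_b", "class_c"]


def ensemble_average(model_probs):
    """Average per-class probabilities across models.

    model_probs: list of models, each a list of rows (one row per lake
    observation), each row a list of class probabilities.
    """
    n_models = len(model_probs)
    return [
        # zip(*rows) groups the same class across models for one observation
        [sum(vals) / n_models for vals in zip(*rows)]
        # zip(*model_probs) groups the same observation across models
        for rows in zip(*model_probs)
    ]


def most_likely(row):
    """Return the label of the class with the highest averaged probability."""
    best = max(range(len(row)), key=row.__getitem__)
    return CLASSES[best]


# Made-up probabilities for two lake observations from three fitted models.
p_logistic = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
p_perceptron = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
p_boosted = [[0.7, 0.2, 0.1], [0.3, 0.1, 0.6]]

ensemble = ensemble_average([p_logistic, p_perceptron, p_boosted])
labels = [most_likely(row) for row in ensemble]
```

Because each model's row sums to one, the averaged row also sums to one, so the ensemble output can still be read as a probability distribution over classes.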
