Extended Data Fig. 10: Prediction of nitrate dynamics from community structure (see SI).
From: Functional regimes define soil microbiome response to environmental change

(a–b) Candidate variables for feature selection to estimate x(0) and C(0) from sequencing and WSOC (water-soluble organic carbon) measurements. (a) Scatter plots of \(\widetilde{x}(0)\) against (1) total absolute abundance (total sequencing reads; first panel), (2) summed absolute abundance of ASVs significantly enriched in CHL− relative to CHL+ (second panel), (3) summed abundance of ASVs significantly enriched in CHL− relative to T0 (third panel), and (4) summed abundance of all ASVs carrying Nar and Nap genes in CHL+ samples, inferred via PICRUSt2 (ref. 28) (last panel). (b) Scatter plots of \(\widetilde{C}(0)\) against WSOC measurements in CHL+ or CHL− samples at T0 or T9 (endpoint), as indicated in the panel titles. In both (a) and (b), colors indicate the perturbed pH at the endpoint; note the log scales. (c–e) Models for predicting rA and rA/rC from the estimates \(\widetilde{x}(0)\), \(\widetilde{C}(0)\), x(0) and C(0). Each panel shows estimates of (1) \({r}_{A}=\widetilde{x}(0)/x(0)\), where \(\widetilde{x}(0)\) comes from the nitrate dynamics and x(0) from the variables identified in (a); and (2) \({r}_{A}/{r}_{C}=\widetilde{C}(0)/C(0)\), again with \(\widetilde{C}(0)\) from the nitrate dynamics and C(0) from the WSOC measurements in (b). (c) Model assuming fixed values of rA and rC. The top two panels show the distributions of lg(rA) and lg(rA/rC), computed as \(\lg(\widetilde{x}(0)/x(0))\) and \(\lg(\widetilde{C}(0)/C(0))\); dashed lines and shading mark the mean and 0.5 standard deviation. The bottom two panels show the learned \(\widetilde{x}(0)\) and \(\widetilde{C}(0)\) (from nitrate dynamics) and the learning uncertainty (0.5 standard deviation) as purple curves and shaded regions; x-axis values are the Nar+Nap gene abundances and WSOC data from (a, b). (d) The pH-dependent rA and rA/rC model (see SI). The y-axes are lg(rA) and lg(rA/rC), computed as \(\lg(\widetilde{x}(0)/x(0))\) and \(\lg(\widetilde{C}(0)/C(0))\).
Orange curves and shading represent the learned functions lg(rA) = f1(pH) and lg(rA/rC) = f2(pH) (lines) and the learning uncertainty (shaded regions). (e) The null pH model, which predicts \(\widetilde{x}(0)\) and \(\widetilde{C}(0)\) directly as functions of pH without estimating rA or rA/rC. Yellow curves and shading represent the learned pH-dependent functions \(\lg(\widetilde{x}(0))\) and \(\lg(\widetilde{C}(0))\) and the learning uncertainty. The rightmost column of each row was used as the estimate of x(0) and C(0). (f–g) Out-of-sample prediction of nitrate dynamics from sequencing and WSOC measurements. (f) Example fits from our effective 1-biomass consumer-resource model (Fig. 3; top row) and predictions from the three models in (c–e) (bottom three rows) in each of the three functional regimes (columns). The pH-dependent model (orange, third row) gives the best predictions, close to the fit of the 1-biomass consumer-resource model (blue, first row). (g) Fitting and prediction errors on the training dataset (left column) and the test dataset (right column). The pH-dependent model (orange) has the smallest errors of the three prediction models.
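To make the relationship between models (c–e) concrete, the sketch below works on the log10 scale, where the definition \({r}_{A}=\widetilde{x}(0)/x(0)\) becomes lg x̃(0) = lg rA + lg x(0); the same code applies to lg C̃(0) = lg(rA/rC) + lg C(0) with WSOC as the proxy. It is a minimal illustration under stated assumptions, not the method in the SI: a low-degree polynomial in pH stands in for whatever regression the SI uses to learn f1(pH) and the null-model function, the function names and DEGREE are hypothetical, and the data are synthetic.

```python
import numpy as np

# Hypothetical stand-in: a polynomial in pH replaces the SI's learning method.
DEGREE = 2


def fit_fixed(lg_learned, lg_proxy):
    """Model (c): lg(r_A) is a constant; return its mean and standard deviation."""
    lg_ratio = lg_learned - lg_proxy          # lg(x~(0) / x(0))
    return lg_ratio.mean(), lg_ratio.std()


def predict_fixed(mean, lg_proxy):
    """lg(x~(0)) = lg(r_A) + lg(x(0)), with lg(r_A) held at its mean."""
    return mean + lg_proxy


def fit_ph(ph, lg_learned, lg_proxy, degree=DEGREE):
    """Model (d): lg(r_A) = f1(pH), fitted here as a polynomial (assumption)."""
    return np.polyfit(ph, lg_learned - lg_proxy, degree)


def predict_ph(coef, ph, lg_proxy):
    """lg(x~(0)) = f1(pH) + lg(x(0))."""
    return np.polyval(coef, ph) + lg_proxy


def fit_null(ph, lg_learned, degree=DEGREE):
    """Model (e): lg(x~(0)) as a function of pH alone, ignoring the proxy."""
    return np.polyfit(ph, lg_learned, degree)


def predict_null(coef, ph):
    return np.polyval(coef, ph)


def rmse(pred, obs):
    """Root-mean-square error on the log10 scale."""
    return float(np.sqrt(np.mean((pred - obs) ** 2)))


# Synthetic illustration only (not data from the study).
n = 12
ph = np.linspace(4.0, 9.0, n)
lg_proxy = 5.0 + 0.5 * np.sin(1.7 * np.arange(n))   # e.g. lg(Nar+Nap abundance)
lg_learned = -2.0 + 0.3 * (ph - 7.0) + lg_proxy     # true lg(r_A) is linear in pH

test = np.arange(n) % 3 == 0                         # hold out every third sample
train = ~test

mean, sd = fit_fixed(lg_learned[train], lg_proxy[train])
band = (mean - 0.5 * sd, mean + 0.5 * sd)            # mean +/- 0.5 SD of lg(r_A)
coef_ph = fit_ph(ph[train], lg_learned[train], lg_proxy[train])
coef_null = fit_null(ph[train], lg_learned[train])

errors = {
    "fixed": rmse(predict_fixed(mean, lg_proxy[test]), lg_learned[test]),
    "pH-dependent": rmse(predict_ph(coef_ph, ph[test], lg_proxy[test]), lg_learned[test]),
    "null": rmse(predict_null(coef_null, ph[test]), lg_learned[test]),
}
print("lg(r_A) band:", band)
print("test RMSE:", errors)
```

Because lg(rA) is constructed here to vary linearly with pH, the pH-dependent model recovers the held-out values essentially exactly while the fixed and null models do not; this only mirrors the qualitative ranking described for (g) and says nothing about the actual data.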