Table 3 Cross validation of RFs and stepwise GLMs for prokaryotic CWMs by repeated split sampling

From: Cross-continental soil prokaryotic traits driven by precipitation regime and land cover

 

| Trait | Random forest | Stepwise GLM | Ensemble |
| --- | --- | --- | --- |
| Cell diameter | 0.65 ± 0.11 | 0.70 ± 0.11 | 0.72 ± 0.10 |
| Cell length | 0.45 ± 0.15 | 0.60 ± 0.10 | 0.58 ± 0.12 |
| Minimal doubling time | 0.79 ± 0.068 | 0.78 ± 0.097 | 0.80 ± 0.067 |
| Genome size | 0.66 ± 0.12 | 0.70 ± 0.10 | 0.70 ± 0.10 |
| RRN | 0.70 ± 0.10 | 0.67 ± 0.10 | 0.74 ± 0.082 |
| Optimum pH | 0.61 ± 0.15 | 0.68 ± 0.12 | 0.67 ± 0.12 |
| Optimum temperature | 0.72 ± 0.092 | 0.76 ± 0.13 | 0.76 ± 0.11 |
| Oxygen preference | 0.83 ± 0.066 | 0.79 ± 0.080 | 0.82 ± 0.066 |
| Sporulation | 0.75 ± 0.089 | 0.69 ± 0.13 | 0.75 ± 0.10 |
| Motility | 0.74 ± 0.10 | 0.68 ± 0.11 | 0.73 ± 0.089 |
| Salinity preference | 0.57 ± 0.12 | 0.52 ± 0.16 | 0.58 ± 0.12 |

Values are Pearson correlation coefficients between observed and predicted CWMs (mean ± standard deviation over 200 split-sampling runs). For each run, the dataset was split into a training set and a test set at a ratio of 70:30. Ensemble denotes a combined model of the stepwise GLM and the RF model, with predictions weighted by the cross-validation results for the individual models.
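The repeated split-sampling procedure described in the table note can be illustrated with a minimal Python sketch. It is not the authors' code. The two models (`fit_linear`, `fit_knn`) are simple stand-ins for the stepwise GLM and the RF, the data are synthetic, and all function names are hypothetical. The note does not specify how the ensemble weights were derived, so weights proportional to each model's mean cross-validated correlation are assumed here.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fit_linear(xs, ys):
    """Stand-in for the GLM: least-squares line on a single predictor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_knn(xs, ys, k=5):
    """Stand-in for the RF: mean response of the k nearest training points."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return statistics.fmean([p[1] for p in nearest])

    return predict


def split_sampling(xs, ys, fitters, runs=200, test_frac=0.3, seed=0):
    """Repeat a random 70:30 train/test split; return (mean, sd) of r per model."""
    rng = random.Random(seed)
    n_test = round(len(xs) * test_frac)
    scores = {name: [] for name in fitters}
    for _ in range(runs):
        idx = list(range(len(xs)))
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        observed = [ys[i] for i in test]
        for name, fit in fitters.items():
            model = fit([xs[i] for i in train], [ys[i] for i in train])
            predicted = [model(xs[i]) for i in test]
            scores[name].append(pearson(observed, predicted))
    return {name: (statistics.fmean(r), statistics.stdev(r))
            for name, r in scores.items()}


# Synthetic data (illustrative only): one predictor, noisy linear response.
data_rng = random.Random(1)
xs = [data_rng.uniform(0.0, 10.0) for _ in range(100)]
ys = [2.0 * x + data_rng.gauss(0.0, 2.0) for x in xs]

fitters = {"linear": fit_linear, "knn": fit_knn}
results = split_sampling(xs, ys, fitters)

# ASSUMPTION: ensemble weights proportional to each model's mean r.
total = sum(mean for mean, _ in results.values())
weights = {name: mean / total for name, (mean, _) in results.items()}

# Ensemble prediction: weighted average of models refitted on all data.
models = {name: fit(xs, ys) for name, fit in fitters.items()}


def ensemble_predict(x):
    return sum(weights[name] * models[name](x) for name in models)


for name, (mean, sd) in results.items():
    print(f"{name}: r = {mean:.2f} ± {sd:.2f}")
```

Each entry of `results` is a mean and standard deviation of the per-run correlations, the same kind of summary the table reports. In this sketch the ensemble is only constructed, not itself cross-validated; scoring it as in the Ensemble column would require evaluating `ensemble_predict` on held-out data as well.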