Extended Data Fig. 2: Examination of estimated tree species richness using three cross-validation approaches. | Nature Ecology & Evolution

From: Co-limitation towards lower latitudes shapes global forest diversity gradients

(Left column) In randomized cross-validation (RCV), three imputation models, namely random forests (RF, top), multiple regression with ordinary least squares (OLS, centre), and XGBoost (XGB, bottom), were trained for each continent on a random subsample comprising 90% of the training data from that continent; the remaining 10% of the training data served as the testing set. This process was repeated 20 times, sampling with replacement, to examine the accuracy of the estimated tree species richness values for each sample plot. (Centre column) In spatial cross-validation (SCV), all sample data from one ecoregion were reserved for testing the three imputation models, which were trained on the remaining samples from the same continent. This process was repeated until every forested ecoregion in the world had been tested. (Right column) For post-sample validation (PSV), we collated a new sample dataset from 22,131 forest sample plots (Phase II data) and used it as the testing set to evaluate the accuracy of the predictive models trained for each continent on the Phase I data. Scatter plots show observed (vertical axis) vs. predicted (horizontal axis) values of tree species richness per hectare, from which we calculated the mean absolute error (MAE), the root-mean-squared error (RMSE), and the coefficient of determination (R²) of the trendlines (red) fitted between the predicted and observed values.
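
The accuracy measures and the spatial hold-out scheme described in the caption can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `accuracy_metrics` and `spatial_folds` and the dictionary keys `continent` and `ecoregion` are hypothetical, and R² is assumed here to be that of a simple least-squares trendline of observed on predicted values, which equals the squared Pearson correlation.

```python
import math
from collections import defaultdict


def accuracy_metrics(observed, predicted):
    """Return MAE, RMSE and the R² of a linear trendline (observed vs. predicted)."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    errors = [o - p for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    # Assumption: for a simple linear trendline, R² is the squared Pearson correlation.
    r2 = (cov * cov) / (var_o * var_p) if var_o > 0 and var_p > 0 else float("nan")
    return {"MAE": mae, "RMSE": rmse, "R2": r2}


def spatial_folds(plots):
    """Yield (ecoregion, train, test) triples.

    Each fold holds out every plot of one ecoregion for testing and keeps the
    remaining plots of the same continent for training. The keys "continent"
    and "ecoregion" are hypothetical field names.
    """
    by_continent = defaultdict(list)
    for plot in plots:
        by_continent[plot["continent"]].append(plot)
    for continent_plots in by_continent.values():
        for eco in sorted({p["ecoregion"] for p in continent_plots}):
            test = [p for p in continent_plots if p["ecoregion"] == eco]
            train = [p for p in continent_plots if p["ecoregion"] != eco]
            yield eco, train, test
```

For example, `accuracy_metrics([10, 20, 30, 40], [12, 18, 33, 37])` gives an MAE of 2.5 species per hectare. Randomized cross-validation would replace `spatial_folds` with repeated random 90%/10% splits within each continent, and post-sample validation would apply `accuracy_metrics` directly to predictions for an independent dataset.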
