Table 4 Impact of synthetic data generation (top models with selected sample sizes and SOC ranges) on the skewness and kurtosis of the training data.

From: Enhancing soil organic carbon estimation with generative AI and Nix color sensor

| Dataset | Skewness | Kurtosis |
| --- | --- | --- |
| Training data | 2.80 | 8.85 |
| Training + GMM (5000 samples, 3–7%) | −0.61 | 0.75 |
| Training + KNN (4000 samples, 3–9%) | −0.38 | −0.40 |
| Training + bootstrap (1000 samples, 3–4%) | 0.76 | 6.59 |
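
For readers who want to reproduce this kind of comparison, the following is a minimal sketch of how skewness and kurtosis can be computed before and after synthetic samples are appended. It rests on several assumptions: the table does not state which estimator was used, so the sketch uses simple population (biased) moments; the negative kurtosis reported for the KNN row suggests the excess-kurtosis convention (normal distribution = 0), which the sketch follows; and the SOC values and function names below are hypothetical, chosen only for illustration, not taken from the study.

```python
from statistics import fmean


def central_moment(values, order):
    """Population central moment of the given order."""
    mean = fmean(values)
    return fmean([(v - mean) ** order for v in values])


def skewness(values):
    """Moment-based (biased) skewness: m3 / m2^1.5."""
    m2 = central_moment(values, 2)
    m3 = central_moment(values, 3)
    return m3 / m2 ** 1.5


def excess_kurtosis(values):
    """Moment-based (biased) excess kurtosis: m4 / m2^2 - 3."""
    m2 = central_moment(values, 2)
    m4 = central_moment(values, 4)
    return m4 / m2 ** 2 - 3.0


if __name__ == "__main__":
    # Hypothetical SOC percentages: mostly low values with one high outlier.
    observed = [1.0, 1.2, 1.5, 1.1, 0.9, 1.3, 2.0, 6.5]
    # Hypothetical synthetic samples drawn from an under-represented SOC range.
    synthetic = [3.2, 3.8, 4.5, 5.1, 5.9, 6.4]

    for label, data in [("observed", observed),
                        ("observed + synthetic", observed + synthetic)]:
        print(f"{label}: skewness={skewness(data):.2f}, "
              f"excess kurtosis={excess_kurtosis(data):.2f}")
```

Bias-corrected estimators, such as those pandas uses by default, give somewhat different values on small samples, so exact agreement with the table should not be expected without knowing the authors' choice.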