Fig. 5: Loss as a function of sample size and model complexity.
From: Approaches for handling high-dimensional cluster expansions of ionic systems

Green (red) shows the mean and standard deviation of the training (testing) loss over 50 cross-validation trials, each using 80% of the 428 structures for training and 20% for testing. a Learning curve for SGL as a function of sample size, with the root mean squared error (RMSE) per primitive cell as the loss. The chosen hyperparameters are α = 0.056 and λ = 0.5. b RMSE as a function of model complexity, from a model that includes only the first pair-cluster orbit to one that includes all geometric clusters up to quintuplets. Every model includes the Ewald energy and all features in orbits up to the indicated orbit number. The number of significant features selected for each model is shown in yellow; the number of ECI grows to over 150 in the last model.
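The repeated 80/20 cross-validation behind these error bars can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code. Ordinary least squares (`np.linalg.lstsq`) stands in for the SGL fit. The helper names (`fit_least_squares`, `rmse_per_prim_cell`, `repeated_holdout`) are hypothetical. The feature matrix and energies are synthetic placeholders, assumed to be already normalized per primitive cell.

```python
import numpy as np


def fit_least_squares(X, y):
    # Hypothetical stand-in for the SGL fit: plain least squares, no regularization.
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs


def rmse_per_prim_cell(y_true, y_pred):
    # Assumes energies are already expressed per primitive cell.
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def repeated_holdout(X, y, n_trials=50, train_frac=0.8, fit=fit_least_squares, seed=0):
    """Random train/test splits; returns (train mean, train std, test mean, test std)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(round(train_frac * n))  # 342 of 428 structures for an 80% split
    train_losses, test_losses = [], []
    for _ in range(n_trials):
        perm = rng.permutation(n)
        tr, te = perm[:n_train], perm[n_train:]
        coefs = fit(X[tr], y[tr])
        train_losses.append(rmse_per_prim_cell(y[tr], X[tr] @ coefs))
        test_losses.append(rmse_per_prim_cell(y[te], X[te] @ coefs))
    return (float(np.mean(train_losses)), float(np.std(train_losses)),
            float(np.mean(test_losses)), float(np.std(test_losses)))


# Synthetic, hypothetical data: 428 structures, 20 features (e.g. Ewald term plus cluster correlations).
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(428, 20))
true_eci = data_rng.normal(size=20)
y = X @ true_eci + 0.01 * data_rng.normal(size=428)

stats = repeated_holdout(X, y)
print("train RMSE %.4f +/- %.4f, test RMSE %.4f +/- %.4f" % stats)
```

A learning curve like the one in panel a could be traced by subsampling the structures to each target size before calling `repeated_holdout`. A complexity scan like the one in panel b could be traced by passing progressively more feature columns, one orbit at a time, while always keeping the Ewald column. Both extensions are assumptions about the procedure, not a description of it.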