Fig. 4
From: Interpretable machine learning models classify minerals via spectroscopy

The k-fold cross-validation procedure. A) The data are divided into 10 equally sized bins, and one bin is held out as test data. B) The wavenumber domain from 300 to 1300 cm−1 is divided into ranges of width 100 and 200 cm−1, starting at both 300 and 350 cm−1, so that the first four ranges are 300 to 400, 300 to 500, 350 to 450, and 350 to 550 cm−1. Two features are defined for each range: peak position and peak height. C) A chi-squared value is calculated for each feature, and the 25 features with the highest values are carried forward to the next step. D) Each pair of the 25 features is used to train a Gaussian Process model, and its F1 score is calculated. The pair that produces the highest F1 score is chosen. E) Models are trained with different techniques on the two chosen features, and the model with the highest F1 score is taken as the final result. The process is then repeated from step A with a different bin as the test data, until each bin has been used as test data exactly once.
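Steps A, B, and D can be illustrated with a minimal Python sketch. It is not the authors' code. The function names `spectral_windows` and `fold_splits` are hypothetical, and two points the caption does not specify are assumed here: windows of a given width tile the domain without overlap from each starting point, and any window extending past 1300 cm−1 is dropped. The fold assignment is a plain interleaving with no shuffling, which is also an assumption.

```python
from itertools import combinations


def spectral_windows(lo=300, hi=1300, widths=(100, 200), offsets=(0, 50)):
    """Return (start, end) wavenumber windows in cm^-1 (step B).

    Assumption: windows of one width tile the domain without overlap,
    and a window that would extend past `hi` is discarded.
    """
    windows = []
    for offset in offsets:
        for width in widths:
            start = lo + offset
            while start + width <= hi:
                windows.append((start, start + width))
                start += width
    return windows


def fold_splits(n_samples, k=10):
    """Yield (train_indices, test_indices), using each bin as test once (step A).

    Assumption: samples are assigned to bins by interleaving, without shuffling.
    """
    indices = list(range(n_samples))
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


windows = spectral_windows()

# Two features per window, as in the caption: peak position and peak height.
features = [(w, kind) for w in windows for kind in ("peak_position", "peak_height")]

# Step D evaluates every unordered pair of the 25 selected features.
n_pairs = len(list(combinations(range(25), 2)))
```

Sorting `windows` by start and then end gives the four ranges named in the caption first. Choosing 25 features means step D compares 300 feature pairs in each fold. The total number of windows depends on the tiling assumption above and may differ from the authors' setup.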