Fig. 3: Evaluation of the impact of the initial set of 625 GVOG features on random forest prediction of taxonomy.

a Prediction accuracy scores with increasing numbers of features at the order level (top) and the family level (bottom). The vertical lines indicate the number of GVOGs used in TIGTOG’s final models. b Permutation importance of the 15 most important features in the classification model at the order level (top) and the family level (bottom). Features are shown in decreasing order of their impact on accuracy when their values were randomly permuted. Permutation importance testing was repeated 10 times, and mean values are denoted by green triangles.
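The permutation importance in panel b can be illustrated with a minimal, dependency-free Python sketch. It assumes a fitted classifier exposed as a plain function `model(row)` and scores each feature by the mean drop in accuracy over `n_repeats` random shuffles of that feature's column (10 here, matching the caption). The function names, the toy two-feature data and the threshold model are hypothetical illustrations, not TIGTOG's implementation.

```python
import random
from statistics import mean


def accuracy(model, X, y):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean accuracy drop per feature when that feature's column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving all other features intact.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(mean(drops))
    return importances


# Hypothetical toy example: the "model" uses only feature 0.
def threshold_model(row):
    return "A" if row[0] > 0 else "B"


X = [[1, 5], [2, 3], [-1, 4], [-2, 6], [3, 1], [-3, 2]]
y = ["A", "A", "B", "B", "A", "B"]
importances = permutation_importance(threshold_model, X, y, n_repeats=10)
print(importances)  # feature 1 is ignored by the model, so its importance is 0.0
```

Because the toy model ignores feature 1, shuffling it never changes accuracy, while shuffling feature 0 typically does. scikit-learn provides the same procedure as `sklearn.inspection.permutation_importance` with an `n_repeats` parameter; whether TIGTOG used it is not stated here.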