Fig. 5: Fusing classical machine learning with CNN.

a F1 scores (y-axis) obtained when final test set predictions are made from a weighted average of the class probabilities assigned by the SVM and CNN models. The weight given to the CNN probabilities relative to the SVM probabilities is shown on the x-axis (i.e., 1 means only CNN probabilities are considered, 0 means only SVM probabilities are considered, and 0.5 amounts to an unweighted average of the two). The thick solid red line represents the mean F1 score across 20 repeats of cross-validation. The shaded area with thin outer lines corresponds to the standard error of the mean (SEM) across these repeats. b F1 scores (y-axis) as a function of hyperparameter choice for the stacked model (purple lines and dots), simulating the best-case stacking result. The solid purple line represents mean performance over 20 repeats of cross-validation, with shaded areas corresponding to SEM over these repeats and dots corresponding to results from individual repeats. Performance of the tuned lower-level CNN model is shown for comparison (orange). The thicker dotted line represents mean CNN performance, with shaded areas corresponding to SEM and the thinner, darker solid lines corresponding to results from each repeat. c Violin plots showing CNN performance over 20 repeats of the cross-validation scheme (last or bottom violin in each row) relative to the best-case stacked model performance (first or top violin in each row). Violin colors correspond to different performance measures, which are additionally separated into rows by horizontal lines. See Fig. 3 for information represented by the violin plots.
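The weighted averaging in panel a can be illustrated with a minimal sketch. It assumes each model outputs an array of class probabilities of shape (n_samples, n_classes); the function names `fuse_probabilities` and `fused_predictions` and the toy probabilities are hypothetical and are not taken from the original analysis code.

```python
import numpy as np


def fuse_probabilities(p_cnn, p_svm, w_cnn):
    """Weighted average of two (n_samples, n_classes) probability arrays.

    w_cnn is the weight on the CNN probabilities (x-axis of panel a):
    1 uses only the CNN, 0 uses only the SVM, 0.5 is a plain average.
    """
    if not 0.0 <= w_cnn <= 1.0:
        raise ValueError("w_cnn must lie in [0, 1]")
    p_cnn = np.asarray(p_cnn, dtype=float)
    p_svm = np.asarray(p_svm, dtype=float)
    if p_cnn.shape != p_svm.shape:
        raise ValueError("probability arrays must have the same shape")
    return w_cnn * p_cnn + (1.0 - w_cnn) * p_svm


def fused_predictions(p_cnn, p_svm, w_cnn):
    """Predicted class index per sample from the fused probabilities."""
    return fuse_probabilities(p_cnn, p_svm, w_cnn).argmax(axis=1)


# Toy probabilities for three samples and two classes (illustrative only).
p_cnn = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]])
p_svm = np.array([[0.3, 0.7], [0.7, 0.3], [0.1, 0.9]])

# Sweep the CNN weight, as on the x-axis of panel a.
for w in (0.0, 0.5, 1.0):
    print(w, fused_predictions(p_cnn, p_svm, w))
```

In such a sketch, an F1 score would be computed from the fused predictions at each weight and then averaged across cross-validation repeats to obtain a curve like the one in panel a.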