Fig. 4: Model performance by ML architecture type across 10 stratified train-test runs (with nested grid-search cross-validation in each run).

Boxplots showing the median, upper and lower quartiles, and outliers of (a) balanced accuracy, (b) area under the precision-recall curve (AUPRC), (c) false positive rate (FPR), and (d) false negative rate (FNR). Asterisks (*) indicate group pairs with a statistically significant difference in model performance.
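
For readers less familiar with the metrics in panels a, c, and d, the following is a minimal sketch of how they relate to the counts of a single confusion matrix. It assumes a binary classification task with both classes present in the test split; the function name `rates_from_counts` and the example counts are hypothetical and are not taken from the study. AUPRC (panel b) is omitted because it depends on ranked prediction scores rather than on a single set of counts.

```python
def rates_from_counts(tp, fp, tn, fn):
    """Return (balanced_accuracy, fpr, fnr) for one binary confusion matrix.

    Assumes both classes are present, so neither denominator is zero.
    """
    tpr = tp / (tp + fn)  # true positive rate (sensitivity, recall)
    tnr = tn / (tn + fp)  # true negative rate (specificity)
    fpr = fp / (fp + tn)  # false positive rate = 1 - specificity
    fnr = fn / (fn + tp)  # false negative rate = 1 - sensitivity
    balanced_accuracy = (tpr + tnr) / 2  # mean of per-class recall
    return balanced_accuracy, fpr, fnr


# Hypothetical counts from one test split (illustrative only, not study data).
bal_acc, fpr, fnr = rates_from_counts(tp=40, fp=10, tn=90, fn=10)
print(round(bal_acc, 2), round(fpr, 2), round(fnr, 2))  # 0.85 0.1 0.2
```

Under these definitions, balanced accuracy equals 1 − (FPR + FNR) / 2, so panels c and d show which error type accounts for any shortfall seen in panel a.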