Fig. 1: Comparison of performance metrics for four ANNs: three MIL-based models (MIL with attention (MIL-attention), MIL with max pooling (MIL-max), and MIL with mean pooling (MIL-mean)) and the baseline SELU CNN (baseline SELU).

A Four ANNs were evaluated on a test set of histologic sections of basal cell carcinomas (BCC, n = 97) and normal skin (non-BCC, n = 35) to identify tumorous lesions. Each ANN was retrained 100 times, and the resulting models were compared with regard to area under the ROC curve (AUC), accuracy, and F1-score (the harmonic mean of precision and recall, which is less sensitive to class imbalance than accuracy). Shown are the median (lines), interquartile range (bars), most extreme non-outlier data points (whiskers), and outliers (points). B Receiver operating characteristic (ROC) curves of the median-performing model (out of 100 retrained models) for each MIL-based and baseline method, calculated on the same test set of histologic sections of basal cell carcinomas (BCC, n = 97) and normal skin (non-BCC, n = 35). *p < 0.05; MIL multiple instance learning, ROC receiver operating characteristic, SELU scaled exponential linear unit.