Fig. 2: Quantitative evaluation of the classification models at patch- and WSI-level.

a Confusion matrices of the best-performing CNN models for patch-level classification, showing both raw counts and normalized values, for Catania (top) and AIDA (bottom). Because the AIDA dataset has a single dysplasia class instead of separate high-grade and low-grade dysplasia classes, the ground truth and the predictions are mapped to the dysplasia class. b ROC curves of the CNN models for patch-level classification (Catania), WSI-level classification (Catania) and image-level classification (publicly available data). In the last panel, the predictions are aggregated to match the differing annotation schemes of the publicly available datasets.
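
The class mapping and the two matrix views described in panel a can be illustrated with a minimal sketch. It is not the authors' code: the function names, the `normal` label and the toy labels are hypothetical, and row-wise normalization (per true class) is an assumption, since the caption does not state how the values are normalized.

```python
# Hypothetical mapping: only the three dysplasia class names come from the caption.
MERGE = {
    "high-grade dysplasia": "dysplasia",
    "low-grade dysplasia": "dysplasia",
}


def merge_dysplasia(labels):
    """Map high-grade and low-grade dysplasia to a single dysplasia class."""
    return [MERGE.get(label, label) for label in labels]


def confusion_matrices(y_true, y_pred, classes):
    """Return (raw counts, row-normalized values); rows are true classes."""
    index = {name: i for i, name in enumerate(classes)}
    n = len(classes)
    raw = [[0] * n for _ in range(n)]
    for true_label, pred_label in zip(y_true, y_pred):
        raw[index[true_label]][index[pred_label]] += 1

    # Assumption: each row is divided by its total (per-true-class normalization).
    normalized = []
    for row in raw:
        total = sum(row)
        normalized.append([count / total if total else 0.0 for count in row])
    return raw, normalized


# Toy labels for illustration only; "normal" is a placeholder class name.
classes = ["dysplasia", "normal"]
y_true = merge_dysplasia(["dysplasia", "dysplasia", "normal", "dysplasia"])
y_pred = merge_dysplasia(
    ["high-grade dysplasia", "low-grade dysplasia", "normal", "normal"]
)

raw, normalized = confusion_matrices(y_true, y_pred, classes)
print(raw)
print(normalized)
```

Merging the labels before counting means that a model trained with separate high-grade and low-grade outputs can still be scored against a dataset annotated with a single dysplasia class.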