Table 4. Performance of the CNN on the five-class patch-level classification task. Performance is measured with Cohen’s κ-score, reported as the mean ± standard deviation over the models trained in the k-fold cross-validation. Results are shown for CNNs trained with the automatically generated weak labels (SKET labels) and with the manually created weak labels (GT labels).

From: Unleashing the potential of digital pathology data by training computer-aided diagnosis models without human annotations

Performance at patch level

Dataset    κ-score (SKET labels)    κ-score (GT labels)
Catania    0.432 ± 0.027            0.413 ± 0.029
AIDA       0.482 ± 0.018            0.475 ± 0.008
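
As a rough illustration of how a "mean ± standard deviation" κ-score over cross-validation folds can be computed, the sketch below implements unweighted Cohen's κ and aggregates it across folds. It is not the authors' evaluation code. The function names `cohen_kappa` and `summarize_folds`, the toy labels, the unweighted form of κ, and the use of the sample standard deviation are all assumptions made for this example.

```python
from collections import Counter
from statistics import mean, stdev


def cohen_kappa(y_true, y_pred):
    """Unweighted Cohen's kappa between two equal-length label sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(y_true)
    # Observed agreement: fraction of patches on which the two labelings agree.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: computed from the marginal class counts of each labeling.
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Kappa is undefined when both labelings use one identical class.
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)


def summarize_folds(fold_results):
    """Mean and sample standard deviation of kappa over (y_true, y_pred) folds.

    Assumption: one trained model per fold, each scored on its own test labels.
    """
    scores = [cohen_kappa(t, p) for t, p in fold_results]
    return mean(scores), stdev(scores)


if __name__ == "__main__":
    # Hypothetical toy folds (two classes only, for brevity), not data from the paper.
    folds = [
        ([0, 0, 1, 1], [0, 0, 1, 1]),  # perfect agreement
        ([0, 0, 1, 1], [0, 1, 0, 1]),  # chance-level agreement
        ([0, 0, 1, 1], [0, 0, 1, 0]),  # partial agreement
    ]
    avg, sd = summarize_folds(folds)
    print(f"{avg:.3f} ± {sd:.3f}")
```

Because κ subtracts the agreement expected by chance from the observed agreement, values around 0.4 to 0.5 indicate agreement well above chance on a five-class task even when raw accuracy is modest.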