Table 1 Overview of the dataset composition.

	Class
Source	Cancer	High-grade dysplasia	Low-grade dysplasia	Hyperplastic polyp	Normal	Total images
Training dataset: automatic weak labels (SKET)
Catania	422	464	630	251	462	1704
Radboudumc	189	119	434	493	1000	2065
Total	611	583	1064	744	1462	3769
Training dataset: manual weak labels (ground truth)
Catania	379	454	529	181	438	1704
Radboudumc	188	94	453	428	1048	2065
Total	567	548	982	609	1486	3769
Private testing datasets
Catania	52	44	54	23	79	227
Radboudumc	50	23	92	62	219	423
Total	102	67	146	85	298	650
Public testing datasets
GlaS³⁶	91	0	0	42		133
CRC³⁷	69	0	0	71		140
UNITOPATHO³¹,³² (sections)	0	1370	5804	545	950	8669
UNITOPATHO³¹,³² (WSI)	0	46	184	41	21	292
TCGA-COAD³³	50	0	0	0	0	50
Xu³⁸	355	0	0	0	362	717
AIDA³⁴	31	4		1	65	101
IMP-CRC³⁵	268		547	271		1086
Total						11888

The dataset comprises colon images and reports from two digital pathology workflows (Catania and Radboudumc) and from publicly available datasets. It is split into a training partition (upper and central parts of the table) and a testing partition (lower part). The training partition is shown twice: once with the weak labels automatically extracted by SKET (upper part) and once with the manually annotated weak labels that serve as ground truth (central part). It includes data from Catania and Radboudumc and is used to train the CNN with 10-fold cross-validation, so that performance after training with automatically extracted labels can be compared with performance after training with manually created labels. The testing partition includes private data from Catania and Radboudumc as well as data from public datasets. Some public datasets are labeled with classes that differ from those used in this work; in such cases, their classes are mapped to the five considered here via aggregation. Because the task is a multilabel classification problem, an image can carry more than one class, so the per-class counts in a row can differ from the total number of images. Furthermore, SKET weak labels can include mislabeled samples, so the per-class counts differ between the automatic and the manually created weak labels, whereas the total number of images is the same.
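
The multilabel remark can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the authors' pipeline: the toy image identifiers and the helper name `label_surplus` are hypothetical, while the per-class counts and image totals are copied from the SKET training rows of Table 1.

```python
from collections import Counter

# Hypothetical toy annotations: in a multilabel setting one image
# may carry several class labels at once.
toy_labels = {
    "img_1": {"Cancer", "High-grade dysplasia"},
    "img_2": {"Normal"},
    "img_3": {"Low-grade dysplasia", "Hyperplastic polyp"},
}
# Per-class counts, as they would appear in one row of the table.
toy_counts = Counter(label for labels in toy_labels.values() for label in labels)
toy_assignments = sum(toy_counts.values())  # label assignments, not images

# Per-class counts copied from Table 1 (training set, SKET weak labels).
# Column order: Cancer, High-grade dysplasia, Low-grade dysplasia,
# Hyperplastic polyp, Normal.
sket_counts = {
    "Catania": [422, 464, 630, 251, 462],
    "Radboudumc": [189, 119, 434, 493, 1000],
}
total_images = {"Catania": 1704, "Radboudumc": 2065}


def label_surplus(counts, n_images):
    """Label assignments in excess of one label per image (hypothetical helper)."""
    return sum(counts) - n_images


surplus = {
    source: label_surplus(counts, total_images[source])
    for source, counts in sket_counts.items()
}
# Column-wise sums across the two sources, to compare with the "Total" row.
column_totals = [sum(column) for column in zip(*sket_counts.values())]

print(len(toy_labels), toy_assignments)
print(surplus)
print(column_totals)
```

In the toy case three images yield five label assignments, which is the same effect that makes each training row sum to more than its image total. The column-wise sums, in contrast, match the "Total" row of the SKET block.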
