Table 4. Summary statistics for the two datasets analyzed in this paper and their proposed corrected versions.

From: Investigating the Quality of DermaMNIST and Fitzpatrick17k Dermatological Image Datasets

| Dataset | Brief Description | #Images | #Diagnoses | #Training Images | #Validation Images | #Testing Images |
|---|---|---|---|---|---|---|
| DermaMNIST | The original DermaMNIST dataset. | 10,015 | 7 | 7,007 | 1,003 | 2,005 |
| DermaMNIST-C | The “corrected” version of DermaMNIST, without any data leakage. | 10,015 | 7 | 8,215 | 573 | 1,227 |
| DermaMNIST-E | The “extended” version of DermaMNIST, without any data leakage and with more images. | 11,719 | 7 | 10,015 | 193 | 1,511 |
| Fitzpatrick17k | The original Fitzpatrick17k dataset. | 16,577 | 114 | 12,751 | 3,826 | 0 |
| Fitzpatrick17k-C | The “cleaned” version of Fitzpatrick17k, with standardized train-valid-test splits after removing duplicates and erroneous images. | 11,394 | 114 | 7,975 | 1,139 | 2,280 |

  1. For both Fitzpatrick17k and Fitzpatrick17k-C, the partitions correspond to the experiment titled “Random”, both in Table 2 and in Groh et al.20.
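
As a quick sanity check on the split sizes reported in Table 4, the following minimal Python sketch verifies that the training, validation, and testing counts for each dataset add up to the reported image total, and computes each partition's share. The counts are transcribed from the table; the names `TABLE_4` and `split_fractions` are illustrative only and do not come from the paper or its released code.

```python
# Counts transcribed from Table 4: (total, train, validation, test).
# TABLE_4 and split_fractions are hypothetical names used for illustration.
TABLE_4 = {
    "DermaMNIST": (10015, 7007, 1003, 2005),
    "DermaMNIST-C": (10015, 8215, 573, 1227),
    "DermaMNIST-E": (11719, 10015, 193, 1511),
    "Fitzpatrick17k": (16577, 12751, 3826, 0),
    "Fitzpatrick17k-C": (11394, 7975, 1139, 2280),
}


def split_fractions(total, train, valid, test):
    """Return the (train, valid, test) shares of `total`, rounded to 3 places.

    Raises ValueError if the three partitions do not sum to `total`.
    """
    if train + valid + test != total:
        raise ValueError(
            f"partitions sum to {train + valid + test}, expected {total}"
        )
    return tuple(round(n / total, 3) for n in (train, valid, test))


if __name__ == "__main__":
    for name, row in TABLE_4.items():
        print(name, split_fractions(*row))
```

Every row passes the sum check. The original DermaMNIST and Fitzpatrick17k-C partitions both come out at roughly 70/10/20 for train/validation/test, while the original Fitzpatrick17k row has no testing partition.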