Fig. 1
From: Generalizable deep neural networks for image quality classification of cervical images

Overview of the dataset and model optimization strategy. We used a collated multi-device, multi-geography dataset, labelled "SEED" (orange panel), for model training and selection, and subsequently validated the performance of our chosen best-performing model on an external dataset, labelled "EXT" (blue panel), comprising images from a new device and new geographies (see Table 1 and METHODS for detailed descriptions and a breakdown of the datasets by ground truth). We split the "SEED" dataset 10% : 1% : 79% : 10% into train : validation : Test 1 ("Model Selection Set") : Test 2 ("Internal Validation"), and then investigated the combinations of model design choices listed in the bottom table on the train and validation sets. The models were ranked by classification performance on the "Model Selection Set", captured by the metrics highlighted in the center green panel. The "Internal Validation" set was then used to confirm the ranked order of the models obtained on the "Model Selection Set". Finally, we validated the performance of our top model on "EXT", conducting both an external validation and an interrater study (see METHODS). CE: cross entropy; QWK: quadratic weighted kappa; MSE: mean squared error; AUROC: area under the receiver operating characteristic curve.
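
The split proportions and ranking metrics named in the caption can be illustrated with a minimal Python sketch. This is not the authors' code: the stratified splitting via scikit-learn, the three-class quality labels, and all variable names are assumptions made for illustration only.

```python
# Hypothetical sketch of the caption's split and ranking metrics (not the authors' pipeline).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import cohen_kappa_score, log_loss, mean_squared_error, roc_auc_score

rng = np.random.default_rng(0)
n = 10_000
X = rng.normal(size=(n, 8))        # placeholder image features (assumed)
y = rng.integers(0, 3, size=n)     # three assumed quality classes

# Split "SEED" 10% : 1% : 79% : 10% into train / validation / Test 1 / Test 2,
# stratified by ground-truth label (stratification is an assumption).
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=0.10, stratify=y, random_state=0)
X_val, X_rest, y_val, y_rest = train_test_split(
    X_rest, y_rest, train_size=0.01 / 0.90, stratify=y_rest, random_state=0)
X_test1, X_test2, y_test1, y_test2 = train_test_split(
    X_rest, y_rest, train_size=0.79 / 0.89, stratify=y_rest, random_state=0)

# Metrics used to rank candidate models on the "Model Selection Set" (Test 1);
# random probabilities stand in for model outputs here.
proba = rng.dirichlet(np.ones(3), size=len(y_test1))
pred = proba.argmax(axis=1)

ce = log_loss(y_test1, proba)                                    # cross entropy
qwk = cohen_kappa_score(y_test1, pred, weights="quadratic")      # quadratic weighted kappa
mse = mean_squared_error(y_test1, pred)                          # MSE on class indices
auroc = roc_auc_score(y_test1, proba, multi_class="ovr")         # one-vs-rest AUROC

print(f"CE={ce:.3f}  QWK={qwk:.3f}  MSE={mse:.3f}  AUROC={auroc:.3f}")
```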