Fig. 1: Model performance compared with the published baseline using the filtered training set.

From: Optimizing skin disease diagnosis: harnessing online community data with contrastive learning and clustering techniques

a, b Top-k diagnosis accuracy and ROC curve of our model. We pre-trained the model on unannotated images collected from the Internet and then fine-tuned it on the full coarsely labeled training set. Top-1 diagnostic accuracy on the test set increased from 42.05% to 45.05%, and the AUC of the ROC curve increased from 0.859 to 0.872. After filtering potentially noisy labels using validation images, performance improved as the number of validation images increased; with 50 validation images per category, top-1 accuracy reached 49.64%. c Boxplot of model performance across three trials using different subsets of validation images. Boxes show the median and interquartile range; whiskers extend to the farthest data points. ANOVA showed that our model's performance was significantly better than the baseline, and that the different validation sets used for filtering did not produce statistically significant differences (p = 0.77). d The top-k diagnostic accuracy improvement saturates once the number of validation images reaches 50 per category, suggesting that 50 validation images per category are sufficient for the filtering process. e The number of images retained after filtering, averaged over three trials, varied little with the number of validation images, especially once it reached 50 per category, indicating that the estimated center of each cluster stabilizes as more validation images are used.
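The filtering step described above can be sketched as follows: estimate one cluster center per category from validation-image embeddings, then keep a coarsely labeled training image only if its nearest center agrees with its label. This is a minimal illustration, not the paper's implementation; the embedding source, the normalization, and names such as `estimate_centers` and `filter_noisy` are assumptions.

```python
# Minimal sketch of validation-based noisy-label filtering (assumptions:
# embeddings are L2-normalized vectors from the pre-trained encoder;
# all function and variable names here are hypothetical).
import numpy as np

def estimate_centers(val_embeddings, val_labels, num_classes):
    """Estimate one cluster center per category from validation embeddings."""
    dim = val_embeddings.shape[1]
    centers = np.zeros((num_classes, dim))
    for c in range(num_classes):
        members = val_embeddings[val_labels == c]
        centers[c] = members.mean(axis=0)
    # Re-normalize so cosine similarity reduces to a dot product.
    return centers / np.linalg.norm(centers, axis=1, keepdims=True)

def filter_noisy(train_embeddings, train_labels, centers):
    """Keep a training image only if its nearest center matches its coarse label."""
    sims = train_embeddings @ centers.T   # cosine similarities, shape (N, C)
    nearest = sims.argmax(axis=1)         # closest category center per image
    return nearest == train_labels        # boolean mask: True = keep
```

Under this sketch, increasing the number of validation images per category makes each estimated center more stable, which is consistent with the saturation seen in panels d and e.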
