Fig. 3: Overview of our XAI’s performance.
From: Dermatologist-like explainable AI enhances trust and confidence in diagnosing melanoma

a Ratio of mean Grad-CAM pixel activation value inside the lesion to that outside the lesion (P < 0.0001, two-sided Wilcoxon signed-rank test, n = 196 images). Higher values are better, as they indicate greater attention on regions within the lesion than on regions outside the lesion. Four data points for the baseline and 19 data points for the XAI have values above 300 and have been omitted to visualise the data more clearly. b We calculated the difference in output scores before and after obscuring the important pixels of the images (n = 200 images per threshold). Since we used a threshold on the Grad-CAM heatmaps, we calculated faithfulness values for each threshold ranging from 5 to 95. The stars represent the threshold used in our study and the values of faithfulness at this threshold. The transparent bands represent the 95% bootstrap confidence intervals. c Overlap in ontological explanations between clinician pairs for the same image compared to the overlap in ontological explanations between clinicians and our XAI. The whiskers are positioned close to zero and one, and the median lines are positioned close to zero, making them unnoticeable. Each value is shifted by a random number between −0.02 and 0.02 on the y-axis so that the points can be seen more clearly. The between-clinician category consists of n = 5165 clinician-pairs, whereas the clinician-XAI category comprises n = 1089 images. d Region of interest (ROI) overlap between clinicians and our XAI compared to that of the baseline (P < 0.0001, two-sided paired t test, n = 1120 images). For all boxplots, the horizontal line on each box denotes the median value and the white dot denotes the mean. The lower and upper box limits denote the 1st and 3rd quartiles, respectively, and the whiskers extend from the box to 1.5 times the interquartile range. Source data are provided as a Source Data file.
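
The quantities in panels a and b can be illustrated with a minimal sketch. This is not the authors' code: the function names, the reading of the threshold as a percentile of the heatmap, the `fill_value` used to obscure pixels, and the `score_fn` callable standing in for the classifier are all assumptions made for illustration.

```python
import numpy as np


def inside_outside_ratio(heatmap, lesion_mask):
    """Panel a (sketch): mean activation inside the lesion / mean activation outside."""
    heatmap = np.asarray(heatmap, dtype=float)
    lesion_mask = np.asarray(lesion_mask, dtype=bool)
    inside = heatmap[lesion_mask].mean()
    outside = heatmap[~lesion_mask].mean()
    # The ratio grows very large when activation outside the lesion is near zero.
    return inside / outside


def faithfulness(score_fn, image, heatmap, threshold_pct, fill_value=0.0):
    """Panel b (sketch): score before minus score after obscuring important pixels.

    Assumption: a pixel counts as important when its activation is at or above
    the `threshold_pct`-th percentile of the heatmap; the caption does not say
    how the threshold is defined.
    """
    image = np.asarray(image, dtype=float)
    heatmap = np.asarray(heatmap, dtype=float)
    cutoff = np.percentile(heatmap, threshold_pct)
    important = heatmap >= cutoff
    obscured = image.copy()
    obscured[important] = fill_value  # hypothetical obscuring strategy
    return score_fn(image) - score_fn(obscured)
```

Under these assumptions, sweeping `threshold_pct` from 5 to 95 and averaging `faithfulness` over images would give one curve per model, comparable in spirit to panel b.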