Fig. 8: Performance comparison of Confidence Probability, Patch Similarity, and pSim in assigning true-positive model output labels for cardiomegaly, pleural effusion, pulmonary edema, pneumonia, and atelectasis. | Nature Communications

From: Accurate auto-labeling of chest X-ray images based on quantitative similarity to an explainable AI model

We compared the true positive capture rate (TPCR) for each of the five clinical output labels using three confidence-level metrics: confidence probability alone (reflecting the global probability distribution of the output labels), patch similarity alone (reflecting the focal spatial localization of the output labels), and pSim (the harmonic mean of confidence probability and patch similarity, as per Fig. 1). The two model output labels with high inter-rater agreement of imaging findings per Fig. 7, (a) cardiomegaly and (b) pleural effusion, show good agreement among the three metrics, with high TPCRs for each. For the two output labels with lower inter-rater agreement per Fig. 7, (c) pulmonary edema and (d) pneumonia, pSim performance significantly exceeds that of patch similarity for both, and that of confidence probability for pneumonia but not for pulmonary edema. This difference is likely attributable to the fact that patch similarity is more sensitive for the detection of focal, regional imaging findings (e.g., as seen with the clinical diagnosis of pneumonia), whereas confidence probability is more sensitive for the detection of global findings (e.g., as seen with the clinical diagnosis of pulmonary edema). The results for (e) atelectasis, typically a more focal than global finding on CXR, may be similarly explained.
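
The caption describes pSim as the harmonic mean of confidence probability and patch similarity. A minimal sketch of that combination follows, assuming both inputs are scores in [0, 1] and that the standard two-value harmonic mean is meant; the function name `psim` and the zero-handling rule are illustrative assumptions, not the article's implementation.

```python
import math


def psim(confidence: float, patch_similarity: float) -> float:
    """Harmonic mean of two scores assumed to lie in [0, 1].

    Hypothetical helper: the name, the range check, and returning 0.0
    when both scores are zero are assumptions made for illustration.
    """
    for value in (confidence, patch_similarity):
        if not (0.0 <= value <= 1.0) or math.isnan(value):
            raise ValueError("scores must lie in [0, 1]")
    total = confidence + patch_similarity
    if total == 0.0:
        # The harmonic mean is undefined at 0/0; treat it as 0 here.
        return 0.0
    return 2.0 * confidence * patch_similarity / total


# A high confidence probability paired with a low patch similarity
# yields a value pulled toward the lower of the two scores.
print(round(psim(0.9, 0.3), 2))  # 0.45
```

Because the harmonic mean is dominated by the smaller input, a label scores highly on this combined metric only when both the global and the focal measures are high, which is consistent with the caption's observation that the combined metric can differ from either measure alone.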
