Extended Data Fig. 7: Test set saliency method pipeline localization performance using an ensemble model vs. using the top performing single checkpoint for each pathology.
From: Benchmarking saliency methods for chest X-ray interpretation

For each pathology, we highlight in bold the model (ensemble or single checkpoint) with the higher metric, and we underline it if the difference is statistically significant (based on a 95% bootstrap confidence interval).
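
The significance check described above can be illustrated with a minimal sketch. It assumes a paired bootstrap over per-image localization scores, with the difference treated as significant when the 95% interval of the mean difference excludes zero. The function names, the number of resamples, and the simple sorted-index percentile are illustrative assumptions, not the authors' documented procedure.

```python
import random


def bootstrap_diff_ci(scores_a, scores_b, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(scores_a) - mean(scores_b).

    scores_a and scores_b are hypothetical per-image metric values
    for two models evaluated on the same test images (paired).
    """
    if len(scores_a) != len(scores_b) or len(scores_a) == 0:
        raise ValueError("score lists must be non-empty and the same length")

    rng = random.Random(seed)
    n = len(scores_a)
    # Paired design: resample the per-image differences together.
    per_image_diff = [a - b for a, b in zip(scores_a, scores_b)]

    boot_means = []
    for _ in range(n_boot):
        sample = rng.choices(per_image_diff, k=n)  # sample with replacement
        boot_means.append(sum(sample) / n)
    boot_means.sort()

    # Simple percentile lookup by sorted index (an approximation).
    lo = boot_means[int((alpha / 2) * (n_boot - 1))]
    hi = boot_means[int((1 - alpha / 2) * (n_boot - 1))]
    return lo, hi


def is_significant(lo, hi):
    """The difference is significant if the interval excludes zero."""
    return lo > 0 or hi < 0
```

For example, `is_significant(*bootstrap_diff_ci(ensemble_scores, single_scores))` would decide whether the bolded entry for a pathology is also underlined, where both arguments are hypothetical lists of per-image scores.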