Fig. 5: Evaluation of the Pneumonia Model on attending physician notes from MIMIC (2001-12).

Error bands show 95% confidence intervals for estimates of the mean, obtained by bootstrapping (n = 100). a Receiver operating characteristic (ROC) curve for the Pneumonia Model tested on 790 bootstrapped attending physician notes from MIMIC (2001-12) that were regex-captured for pneumonia. b SHapley Additive exPlanations (SHAP) values for the 15 words with the greatest impact on the Pneumonia Model’s output probabilities. c Calibration of the probabilities produced by the Pneumonia Model when applied to 790 bootstrapped attending physician notes from MIMIC (2001-12). d Confusion matrix comparing pneumonia labels assigned by the critical care physician (ground truth) against Pneumonia Model adjudications at a 50% probability threshold, for MIMIC (2001-12) attending physician notes. The counts sum to 790.
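
As an illustration of the quantities in panels a and d, the following minimal sketch computes a bootstrapped AUC with a 95% interval (100 resamples) and a confusion matrix at a 50% threshold. It is not the authors' code. The function names, the synthetic labels and probabilities, the use of a percentile interval, and the convention that a probability of exactly 0.5 counts as positive are all assumptions made for illustration.

```python
import random


def auc(labels, probs):
    """ROC AUC via the rank-sum (Mann-Whitney) identity; ties get average ranks."""
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    ranks = [0.0] * len(probs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and probs[order[j + 1]] == probs[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc(labels, probs, n_boot=100, seed=0):
    """Mean AUC over n_boot resamples with a percentile 95% interval (assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # sample notes with replacement
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip degenerate resamples containing a single class
        aucs.append(auc(ys, [probs[i] for i in idx]))
    aucs.sort()
    lower = aucs[int(0.025 * n_boot)]
    upper = aucs[min(n_boot - 1, int(0.975 * n_boot))]
    return sum(aucs) / n_boot, lower, upper


def confusion_matrix(labels, probs, threshold=0.5):
    """Counts of TP, FP, TN, FN when probs >= threshold is called positive."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, p in zip(labels, probs):
        predicted = 1 if p >= threshold else 0
        if predicted == 1:
            counts["tp" if y == 1 else "fp"] += 1
        else:
            counts["fn" if y == 1 else "tn"] += 1
    return counts


# Synthetic stand-in for 790 labelled notes (hypothetical data, not MIMIC).
_rng = random.Random(42)
demo_labels = [1 if _rng.random() < 0.5 else 0 for _ in range(790)]
demo_probs = [min(1.0, max(0.0, _rng.gauss(0.7 if y else 0.3, 0.2))) for y in demo_labels]

mean_auc, ci_low, ci_high = bootstrap_auc(demo_labels, demo_probs, n_boot=100)
demo_cm = confusion_matrix(demo_labels, demo_probs, threshold=0.5)
```

Because every note falls into exactly one cell, the four counts in `demo_cm` always sum to the number of notes, which is the check the caption points out for panel d.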