Figure 3
From: EHR foundation models improve robustness in the presence of temporal distribution shift

The impact of temporal distribution shift on the performance (AUROC, AUPRCC, and ACE) of logistic regression models trained on count-based representations (count-LR). Shaded regions indicate time windows in which performance in out-of-distribution years (2013–2021) is worse (red) or better (green) than performance in the in-distribution year group (2009–2012). A larger red shaded region indicates more degradation relative to the model’s in-distribution performance. Oracle models were trained and evaluated on each of the out-of-distribution years. Error bars indicate 95% confidence intervals obtained from 1000 bootstrap iterations. AUROC: area under the receiver operating characteristic curve; AUPRCC: calibrated area under the precision-recall curve; ACE: absolute calibration error; LOS: length of stay; ICU: intensive care unit.
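
The caption does not say which bootstrap variant produced the error bars. As a minimal sketch, assuming a percentile bootstrap over resampled patients, a 95% interval for AUROC could be computed roughly as follows. The function names `auroc` and `bootstrap_ci` are hypothetical and are not taken from the paper's code.

```python
import random

def auroc(y_true, y_score):
    """AUROC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(y_true, y_score, metric=auroc, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval (an assumed variant, see lead-in)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        # Resample indices with replacement, keeping labels and scores paired.
        idx = [rng.randrange(n) for _ in range(n)]
        labels = [y_true[i] for i in idx]
        if len(set(labels)) < 2:
            continue  # metric is undefined when a resample has one class only
        scores = [y_score[i] for i in idx]
        stats.append(metric(labels, scores))
    stats.sort()
    # Approximate percentile positions, clamped to valid indices.
    lo_idx = max(0, int((alpha / 2) * n_boot))
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot) - 1)
    return stats[lo_idx], stats[hi_idx]
```

Calling `bootstrap_ci(labels, scores)` once per evaluation year would give one lower and one upper bound per point on the curve. The same helper accepts any metric with the signature `metric(y_true, y_score)`, so a calibration metric could be swapped in under the same assumption.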