Figure 4
From: EHR foundation models improve robustness in the presence of temporal distribution shift

Performance of transformer-based CLMBR-LR and count-LR in the in-distribution (ID) year group, and its decay (shaded regions) in the out-of-distribution (OOD) year groups. A larger shaded region indicates greater performance degradation. Results for GRU-based CLMBR-LR are available in the Supplementary GRU Experiment online. Error bars indicate 95% confidence intervals obtained from 1000 bootstrap iterations. Raw performance scores and the change in OOD performance relative to ID are provided in Supplementary Tables S2 and S3 online, respectively. AUROC Area under the receiver operating characteristic curve; AUPRCC Calibrated area under the precision-recall curve; ACE Absolute calibration error; LOS Length of stay; ICU Intensive care unit; CLMBR Clinical language model-based representation; LR Logistic regression.
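The error bars come from bootstrap resampling of the evaluation set. The following is a minimal sketch of a 95% confidence interval for AUROC from 1000 bootstrap iterations. It assumes that examples are resampled with replacement and that the percentile method is used. The caption does not state the bootstrap variant or the resampling unit. The helpers `auroc` and `bootstrap_ci` are hypothetical. They are written in pure Python so the example is self-contained.

```python
import random
from typing import List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(random positive scores above random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(
    labels: Sequence[int],
    scores: Sequence[float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Point estimate plus percentile-bootstrap (1 - alpha) CI (assumed method)."""
    point = auroc(labels, scores)  # raises early if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    stats: List[float] = []
    while len(stats) < n_boot:
        # Resample example indices with replacement (assumed resampling unit).
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if 0 < sum(y) < n:  # skip resamples containing a single class
            stats.append(auroc(y, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[round(n_boot * alpha / 2)]           # ~2.5th percentile
    hi = stats[round(n_boot * (1 - alpha / 2)) - 1]  # ~97.5th percentile
    return point, lo, hi
```

Calling `bootstrap_ci(y_true, y_score)` returns the point estimate followed by the lower and upper bounds. The pairwise AUROC is quadratic in the number of examples. A rank-based or library implementation would be used at EHR scale. The same loop would apply to AUPRC or a calibration error by swapping in a different metric function.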