Table 2. Comparison of Source Data Model Performance, Estimated External Validation Performance, and Observed External Validation Performance on 13 Datasets and 5 Modalities
Source Dataset | PSource | PDABIS | PEst | Ext. Dataset | PExt | Δ(PSource, PExt) | Δ(PEst, PExt) | Shuffled Ext. |
|---|---|---|---|---|---|---|---|---|
CXR | 0.85 [0.85-0.85] | 0.63 | 0.73 [0.72-0.73] | CXP | 0.73 [0.73-0.74] | 0.12 | 0.00 | 0.50 |
CXR | 0.85 [0.85-0.85] | 0.63 | 0.73 [0.72-0.73] | NIH | 0.76 [0.76-0.77] | 0.09 | -0.03 | 0.51 |
CXP | 0.79 [0.79-0.79] | 0.57 | 0.72 [0.72-0.73] | CXR | 0.77 [0.77-0.77] | 0.02 | -0.05 | 0.45 |
CXP | 0.79 [0.79-0.79] | 0.57 | 0.72 [0.72-0.73] | NIH | 0.76 [0.76-0.77] | 0.03 | -0.04 | 0.50 |
CXR+CXP | 0.82 [0.82-0.82] | 0.61 | 0.72 [0.71-0.72] | NIH | 0.77 [0.77-0.78] | 0.05 | -0.05 | 0.51 |
CXR+NIH | 0.85 [0.84-0.85] | 0.68 | 0.67 [0.66-0.67] | CXP | 0.69 [0.69-0.69] | 0.16 | -0.02 | 0.50 |
COVID-Ext | 0.99 [0.98-0.99] | 0.80 | 0.68 [0.67-0.69] | COVID-Int | 0.64 [0.63-0.65] | 0.36 | 0.04 | 0.53 |
ILD-Diag | 0.95 [0.93-0.96] | 0.85 | 0.60 [0.58-0.62] | ILD-Plan | 0.66 [0.59-0.73] | 0.29 | -0.06 | 0.52 |
PTB-XL ECG | 0.90 [0.89-0.91] | 0.88 | 0.52 [0.52-0.53] | LUDB | 0.70 [0.62-0.79] | 0.21 | -0.18 | 0.60 |
ICHBHI | 0.97 [0.90-1.00] | 0.91 | 0.57 [0.50-0.67] | JUST | 0.60 [0.38-0.81] | 0.37 | -0.03 | 0.51 |
MIMIC-III | 0.72 [0.68-0.76] | 0.63 | 0.59 [0.54-0.63] | EHR-Int | 0.58 [0.56-0.60] | 0.14 | 0.01 | 0.51 |
Average: | 0.87 | 0.73 | 0.64 | | 0.68 | 0.20 | -0.04 | 0.52 |
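
The two Δ columns can be read as signed differences: the first argument minus the second. This convention is inferred from the reported values, not stated in the table. The sketch below recomputes both gaps for three rows copied from Table 2. The names `ROWS`, `delta`, and `gaps` are hypothetical and used only for illustration. Because the inputs are the rounded two-decimal values printed in the table, a recomputed gap can differ from the reported one by 0.01 in some rows.

```python
# Three rows copied from Table 2: (source, external, P_source, P_est, P_ext).
ROWS = [
    ("CXR", "CXP", 0.85, 0.73, 0.73),
    ("CXR", "NIH", 0.85, 0.73, 0.76),
    ("MIMIC-III", "EHR-Int", 0.72, 0.59, 0.58),
]


def delta(a, b):
    """Signed gap a - b, rounded to two decimals.

    The sign convention (first minus second) is an assumption inferred
    from the values reported in the table.
    """
    return round(a - b, 2)


def gaps(rows):
    """Return (source, external, delta_source_ext, delta_est_ext) per row."""
    return [
        (src, ext, delta(p_source, p_ext), delta(p_est, p_ext))
        for src, ext, p_source, p_est, p_ext in rows
    ]


if __name__ == "__main__":
    for src, ext, d_source, d_est in gaps(ROWS):
        print(f"{src} -> {ext}: source gap {d_source:+.2f}, estimate gap {d_est:+.2f}")
```

A positive source gap means the model scored higher on its source data than on the external dataset. An estimate gap near zero means the estimated performance tracked the observed external performance closely.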