Fig. 3: Performance comparison of LNM prediction models.

Area under the receiver operating characteristic curve (AUROC) of CTMIL models and baseline methods for LNM prediction on the training set (A, E, I), validation set (B, F, J), and two independent test cohorts (C, D, G, H, K, L). Patch_baseline, AUROC computed from raw patch-level predictions. AvgPool_baseline, per-slide AUROC obtained by averaging patch-level predicted probabilities. Prop_baseline, per-slide AUROC obtained by calculating the proportion of positively predicted patches per slide. AMIL, attention-based multiple instance learning. CTMIL, customized transformer-based multiple instance learning. IRV2_*, models using features extracted by the InceptionResNetV2 backbone. RN50_*, models using features extracted by the ResNet50 backbone. UNI_*, models using features extracted by the UNI backbone. Ensemble_CTMIL, average of the slide-level predictions from the three CTMIL variants.
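
The slide-level aggregations named in this caption (AvgPool_baseline, Prop_baseline, and the Ensemble_CTMIL average) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 0.5 cutoff used to call a patch positive, and the toy inputs are all assumptions made for illustration, and the AUROC is computed with a simple pairwise comparison rather than any particular library.

```python
from collections import defaultdict


def slide_scores(patch_probs, slide_ids, threshold=0.5):
    """Aggregate patch probabilities into two slide-level scores.

    threshold is an assumed cutoff for a "positively predicted" patch;
    the caption does not state the value used.
    """
    grouped = defaultdict(list)
    for sid, p in zip(slide_ids, patch_probs):
        grouped[sid].append(p)
    # AvgPool-style score: mean patch probability per slide.
    avg_pool = {sid: sum(ps) / len(ps) for sid, ps in grouped.items()}
    # Prop-style score: fraction of patches at or above the threshold.
    prop = {
        sid: sum(p >= threshold for p in ps) / len(ps)
        for sid, ps in grouped.items()
    }
    return avg_pool, prop


def auroc(labels, scores):
    """AUROC as the chance a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ensemble_average(*per_model_scores):
    """Average slide-level scores across models, slide by slide."""
    slides = per_model_scores[0].keys()
    n_models = len(per_model_scores)
    return {sid: sum(m[sid] for m in per_model_scores) / n_models for sid in slides}


# Toy example: two slides with three patches each (made-up numbers).
probs = [0.9, 0.7, 0.2, 0.1, 0.4, 0.6]
ids = ["s1", "s1", "s1", "s2", "s2", "s2"]
avg_pool, prop = slide_scores(probs, ids)
slide_labels = {"s1": 1, "s2": 0}
order = sorted(slide_labels)
avg_auc = auroc([slide_labels[s] for s in order], [avg_pool[s] for s in order])
```

A patch-level AUROC in the sense of Patch_baseline would instead pass the raw patch probabilities to `auroc` directly, with each patch carrying a label (here assumed to be inherited from its slide).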