Extended Data Fig. 2: Ablation studies of the 5-fold cross-validation on the training set (n = 3,208).
From: Large-scale pancreatic cancer detection via non-contrast CT and deep learning

a, nnUNet vs. the PANDA Stage-2 network (multi-task CNN) for lesion detection; PANDA achieved a significantly higher AUC (P = 0.00022). At the same (desired) specificity of 99.0%, PANDA Stage-2 exceeded nnUNet in sensitivity by 4.9 percentage points (95.2% vs. 90.3%; marked by the red dotted line). b, Multi-task CNN baseline (same as the PANDA Stage-2 network, with an nnUNet backbone and classification head) vs. PANDA Stage-3 (dual-path transformer) for differential diagnosis; PANDA achieved significant improvements in both accuracy (Acc.) and balanced accuracy (Bal. acc.). The AUCs of the AI model and nnUNet were compared using the DeLong test. Two-sided permutation tests were used to assess the differences in accuracy and balanced accuracy.
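
For readers unfamiliar with the permutation test mentioned for panel b, the following is a minimal sketch of a two-sided test on the difference in accuracy between two models. It assumes a paired design (both models scored on the same cases) and a sign-flip permutation scheme; the caption does not specify either, and the function name `paired_permutation_test` and its parameters are hypothetical, not taken from the study's code.

```python
import random


def paired_permutation_test(correct_a, correct_b, n_perm=10000, seed=0):
    """Two-sided paired permutation test on the difference in accuracy.

    correct_a, correct_b: sequences of 0/1 flags, one per case, marking
    whether each model classified that case correctly (same case order).
    Assumption: a paired design; the source does not state this.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both models must be scored on the same cases")
    n = len(correct_a)
    rng = random.Random(seed)

    # Per-case difference in correctness; its mean is the accuracy gap.
    diffs = [a - b for a, b in zip(correct_a, correct_b)]
    observed = sum(diffs) / n

    extreme = 0
    for _ in range(n_perm):
        # Under the null hypothesis the model labels are exchangeable, so
        # swapping them on a case just flips the sign of its difference.
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        if abs(permuted) >= abs(observed) - 1e-12:
            extreme += 1

    # Add-one correction keeps the estimated P value strictly above zero.
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value


if __name__ == "__main__":
    # Toy data for illustration only, not results from the study.
    model_a = [1] * 40
    model_b = [1] * 20 + [0] * 20
    gap, p = paired_permutation_test(model_a, model_b, n_perm=2000, seed=1)
    print(f"accuracy gap = {gap:.3f}, P = {p:.4f}")
```

The sign-flip shortcut applies to plain accuracy only. For balanced accuracy, one would instead swap the two models' predictions case by case and recompute the per-class recalls on each permutation.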