Table 2 Comparison of NCLS and radiologists

Sets	Group	SEN	SPE	PPV	NPV	F1	AUC	P value
Internal test set	NCLS	87.50 (68.74–100.0)	93.44 (89.73–96.70)	53.85 (34.45–72.41)	98.84 (97.02–100.0)	66.67 (48.26–81.25)	98.16 (95.89–99.65)	–
	Mid-level radiologists	81.25 (74.69–86.73)	98.66 (98.05–99.11)	84.12 (77.74–89.26)	98.37 (97.71–98.87)	82.66 (78.09–86.85)	89.95 (86.90–92.75)	0.0009
	Junior radiologists (w/o AI)	87.50 (80.97–92.42)	85.12 (83.31–86.81)	33.96 (29.15–39.03)	98.73 (98.00–99.25)	48.93 (43.76–54.29)	86.31 (83.09–88.91)	0.0019
	Junior radiologists (w. AI)	97.22 (93.04–99.24)	89.37 (87.79–90.82)	44.44 (38.87–50.12)	99.73 (99.31–99.93)	62.95 (55.90–66.53)	93.30 (91.67–94.69)	0.0097*
External test set	NCLS	96.15 (86.94–100.0)	92.73 (89.88–95.30)	51.02 (36.36–65.91)	99.67 (98.99–100.0)	66.67 (52.17–77.78)	94.44 (89.42–97.31)	–
	Mid-level radiologists	71.68 (66.08–76.83)	97.55 (96.99–98.03)	69.73 (64.13–74.93)	97.76 (97.23–98.22)	70.69 (66.67–74.60)	84.61 (82.95–87.21)	0.0004
	Junior radiologists (w/o AI)	71.79 (65.56–77.46)	92.36 (91.34–93.29)	42.53 (37.60–47.57)	97.56 (97.02–98.18)	53.42 (48.47–57.62)	82.08 (79.22–85.04)	0.0019
	Junior radiologists (w. AI)	91.03 (86.61–94.36)	91.25 (90.17–92.24)	45.03 (40.49–49.64)	99.23 (98.83–99.52)	62.20 (55.67–64.35)	91.14 (89.17–92.86)	0.0039*

All metrics are presented as percentages (95% CI). Values for the junior- and mid-level-radiologist groups represent the mean across individual readers. The junior radiologist group was further divided into two subgroups: with AI assistance and without AI assistance. Statistical significance was evaluated solely with a paired, one-sided Wilcoxon signed-rank test on the reader-level ΔAUC. P values without * indicate the comparison between the NCLS model and each radiologist group. P values marked with * refer to the comparison between junior radiologists with and without AI assistance. No adjustment for multiple comparisons was applied. Source data are provided as a Source data file.
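
As an illustration of the test named in the footnote, the following is a minimal sketch of an exact, paired, one-sided Wilcoxon signed-rank test on per-reader AUC differences. The function name `wilcoxon_one_sided_exact` and the six reader AUC values are hypothetical and are not taken from the study; the source does not state how the test was implemented. A library routine such as `scipy.stats.wilcoxon` with `alternative="greater"` could be used instead.

```python
from itertools import product


def wilcoxon_one_sided_exact(x, y):
    """Exact one-sided p-value for H1: paired differences x - y tend to be > 0."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, averaging ranks across ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1

    # Test statistic: sum of ranks belonging to positive differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)

    # Under H0 every sign pattern is equally likely; enumerate all 2**n of them
    # (feasible only for small reader panels).
    hits = sum(
        1
        for signs in product((0, 1), repeat=n)
        if sum(r for r, s in zip(ranks, signs) if s) >= w_plus
    )
    return w_plus, hits / 2 ** n


# Hypothetical per-reader AUCs (illustration only, not study data).
auc_with_ai = [0.93, 0.91, 0.94, 0.92, 0.95, 0.90]
auc_without_ai = [0.86, 0.88, 0.85, 0.87, 0.84, 0.89]

w_plus, p_value = wilcoxon_one_sided_exact(auc_with_ai, auc_without_ai)
print(w_plus, p_value)
```

With every one of the six hypothetical readers improving, only one of the 2^6 sign patterns is as extreme as the observed one, so the smallest attainable one-sided P value for this panel size is 1/64.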
SEN sensitivity, SPE specificity, PPV positive predictive value, NPV negative predictive value, F1 F1-score.
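
For readers who want to see how the abbreviated metrics relate to one another, the sketch below computes them from the four confusion-matrix counts using their standard definitions. The function name `diagnostic_metrics` and the counts in the example are assumptions: the source does not report raw counts, and the values used here were chosen only because they are arithmetically consistent with the point estimates in the NCLS internal-test row. AUC is omitted because it requires per-case scores rather than counts.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return SEN, SPE, PPV, NPV and F1 as percentages rounded to two decimals."""
    sen = tp / (tp + fn)  # sensitivity: true positives among all diseased cases
    spe = tn / (tn + fp)  # specificity: true negatives among all healthy cases
    ppv = tp / (tp + fp)  # positive predictive value
    npv = tn / (tn + fn)  # negative predictive value
    f1 = 2 * ppv * sen / (ppv + sen)  # harmonic mean of PPV and SEN
    values = {"SEN": sen, "SPE": spe, "PPV": ppv, "NPV": npv, "F1": f1}
    return {name: round(100 * v, 2) for name, v in values.items()}


# Hypothetical counts (not reported in the source).
metrics = diagnostic_metrics(tp=14, fp=12, tn=171, fn=2)
print(metrics)
# {'SEN': 87.5, 'SPE': 93.44, 'PPV': 53.85, 'NPV': 98.84, 'F1': 66.67}
```

Because the reader-group rows are means across individual readers, their F1 values need not equal the harmonic mean of the tabulated group-level PPV and SEN.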
