Table 2 Comparison of NCLS and radiologists

From: Deep learning approach for screening neonatal cerebral lesions on ultrasound in China

| Sets | Group | SEN | SPE | PPV | NPV | F1 | AUC | P value |
|---|---|---|---|---|---|---|---|---|
| Internal test set | NCLS | 87.50 (68.74–100.0) | 93.44 (89.73–96.70) | 53.85 (34.45–72.41) | 98.84 (97.02–100.0) | 66.67 (48.26–81.25) | 98.16 (95.89–99.65) | |
| | Mid-level radiologists | 81.25 (74.69–86.73) | 98.66 (98.05–99.11) | 84.12 (77.74–89.26) | 98.37 (97.71–98.87) | 82.66 (78.09–86.85) | 89.95 (86.90–92.75) | 0.0009 |
| | Junior radiologists (w/o AI) | 87.50 (80.97–92.42) | 85.12 (83.31–86.81) | 33.96 (29.15–39.03) | 98.73 (98.00–99.25) | 48.93 (43.76–54.29) | 86.31 (83.09–88.91) | 0.0019 |
| | Junior radiologists (w. AI) | 97.22 (93.04–99.24) | 89.37 (87.79–90.82) | 44.44 (38.87–50.12) | 99.73 (99.31–99.93) | 62.95 (55.90–66.53) | 93.30 (91.67–94.69) | 0.0097* |
| External test set | NCLS | 96.15 (86.94–100.0) | 92.73 (89.88–95.30) | 51.02 (36.36–65.91) | 99.67 (98.99–100.0) | 66.67 (52.17–77.78) | 94.44 (89.42–97.31) | |
| | Mid-level radiologists | 71.68 (66.08–76.83) | 97.55 (96.99–98.03) | 69.73 (64.13–74.93) | 97.76 (97.23–98.22) | 70.69 (66.67–74.60) | 84.61 (82.95–87.21) | 0.0004 |
| | Junior radiologists (w/o AI) | 71.79 (65.56–77.46) | 92.36 (91.34–93.29) | 42.53 (37.60–47.57) | 97.56 (97.02–98.18) | 53.42 (48.47–57.62) | 82.08 (79.22–85.04) | 0.0019 |
| | Junior radiologists (w. AI) | 91.03 (86.61–94.36) | 91.25 (90.17–92.24) | 45.03 (40.49–49.64) | 99.23 (98.83–99.52) | 62.20 (55.67–64.35) | 91.14 (89.17–92.86) | 0.0039* |

  1. All metrics are presented as percentages (95% CI). Values for the junior- and mid-level-radiologist groups represent the mean across individual readers. The junior radiologist group was further divided into two subgroups: with AI assistance and without AI assistance. Statistical significance was evaluated solely with a paired, one-sided Wilcoxon signed-rank test on the reader-level ΔAUC. P values without * indicate the comparison between the NCLS model and each radiologist group. P values marked with * refer to the comparison between junior radiologists with and without AI assistance. No adjustment for multiple comparisons was applied. Source data are provided as a Source data file.
  2. SEN sensitivity, SPE specificity, PPV positive predictive value, NPV negative predictive value, F1 F1-score, AUC area under the receiver operating characteristic curve.
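
The test named in footnote 1 (a paired, one-sided Wilcoxon signed-rank test on reader-level ΔAUC) can be illustrated with a minimal sketch. Several things here are assumptions and not taken from the source: the exact enumeration of sign assignments (the table does not say whether an exact or approximate P value was used), the direction of the alternative (median ΔAUC > 0), the dropping of zero differences, the function names `rank_abs` and `wilcoxon_one_sided`, and the ΔAUC values, which are purely hypothetical and do not reproduce any P value in the table.

```python
from itertools import product


def rank_abs(diffs):
    """Return 1-based ranks of |d|, giving tied values their average rank."""
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_one_sided(diffs):
    """Exact one-sided signed-rank test (assumed alternative: median diff > 0).

    Returns (W+, P), where W+ is the sum of ranks of the positive differences
    and P is the share of all 2^n sign assignments with a rank sum >= W+.
    """
    d = [x for x in diffs if x != 0]  # assumption: zero differences are dropped
    ranks = rank_abs(d)
    w_obs = sum(r for r, x in zip(ranks, d) if x > 0)
    count = 0
    for signs in product((0, 1), repeat=len(d)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if w >= w_obs:
            count += 1
    return w_obs, count / 2 ** len(d)


# Hypothetical per-reader ΔAUC in percentage points (not data from the study).
delta_auc = [8, 7, 9, 6, -1, 10, 5, 4]
w_plus, p_value = wilcoxon_one_sided(delta_auc)
print(w_plus, p_value)  # 35.0 0.0078125
```

With eight hypothetical readers, only two of the 256 sign assignments give a rank sum of at least 35, hence P = 2/256. Enumeration is practical only for small reader panels; for larger samples a normal approximation is the usual substitute.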