Table 1 The performance of ENDOANGEL-ED, the sole DL model, and endoscopists in image and video tests.

From: Explainable artificial intelligence incorporated with domain knowledge diagnosing early gastric neoplasms under white light endoscopy

 

| Test | Group | Accuracy (95% CI) | Sensitivity (95% CI) | Specificity (95% CI) | PPV (95% CI) | NPV (95% CI) |
| --- | --- | --- | --- | --- | --- | --- |
| Internal image test | ENDOANGEL-ED | 86.61% (83.08–89.50%) | 85.22% (77.60–90.56%) | 87.11% (82.98–90.35%) | 70.50% (62.45–77.45%) | 94.22% (90.94–96.36%) |
| Internal image test | Sole DL model | 80.37% (76.37–83.84%)** | 90.43% (83.68–94.57%) | 76.73% (71.78–81.04%)*** | 58.43% (51.09–65.42%) | 95.69% (92.45–97.58%) |
| External image test | ENDOANGEL-ED | 77.17% (73.01–80.85%) | 87.30% (80.36–92.03%) | 73.08% (67.90–77.70%) | 55.84% (48.86–62.60%) | 93.44% (89.61–95.92%) |
| External image test | Sole DL model | 74.89% (70.62–78.72%) | 92.86% (86.99–96.20%) | 67.63% (62.25–72.58%) | 57.64% (50.76–64.23%) | 96.17% (92.88–97.97%) |
| Internal video test | ENDOANGEL-ED | 81.10% (73.42–86.96%) | 85.45% (73.83–92.44%) | 77.78% (66.91–85.83%) | 74.60% (62.66–83.72%) | 87.50% (77.23–93.53%) |
| Internal video test | Sole DL model | 71.65% (63.27–78.76%) | 89.09% (78.17–94.90%) | 58.33% (46.80–69.01%)* | 62.03% (51.01–71.94%) | 87.50% (75.30–94.14%) |
| External video test | ENDOANGEL-ED | 88.24% (79.69–93.49%) | 97.06% (85.09–99.48%) | 82.35% (69.74–90.43%) | 78.57% (64.06–88.29%) | 97.67% (87.93–99.59%) |
| External video test | Sole DL model | 84.71% (75.58–90.84%) | 94.12% (80.91–98.37%) | 78.43% (65.37–87.51%) | 74.42% (59.76–85.07%) | 95.24% (84.21–98.69%) |
| External video test | All endoscopists (n = 46) | 78.49% (76.03–80.95%)***^^^ | 86.45% (84.22–88.67%)***^^^ | 73.19% (68.34–78.03%)**^ | 70.95% (67.20–74.70%)***^^ | 89.45% (88.18–90.73%)***^^^ |
| External video test | Novices (n = 21) | 78.77% (75.65–81.89%)***^^ | 85.58% (81.94–89.22%)***^^^ | 74.23% (67.97–80.49%)*^ | 70.96% (65.92–76.00%)**^ | 88.95% (86.85–91.06%)***^^^ |
| External video test | Seniors (n = 14) | 79.24% (75.17–83.31%)*** | 86.35% (81.90–90.79%)***^^ | 74.51% (66.09–82.94%) | 71.56% (64.79–78.33%) | 89.67% (87.13–92.21%)***^^ |
| External video test | Experts (n = 11) | 77.01% (68.85–85.16%) | 88.24% (83.91–92.57%)***^ | 69.52% (54.15–84.90%) | 70.16% (58.69–81.63%) | 90.12% (87.62–92.63%)***^^ |
| Consecutive video test | ENDOANGEL-ED | 79.76% (69.96–86.96%) | 88.24% (65.67–96.71%) | 77.61% (66.29–85.93%) | 50.00% (33.15–66.85%) | 96.30% (87.47–98.98%) |
| Consecutive video test | Sole DL model | 70.24% (59.75–78.96%) | 82.35% (58.97–93.81%) | 67.16% (55.25–77.21%) | 38.89% (24.79–55.14%) | 93.75% (83.16–97.85%) |

  1. The McNemar test was used to compare accuracy, sensitivity, and specificity between ENDOANGEL-ED and the sole DL model. The χ² test was used to compare PPV and NPV between ENDOANGEL-ED and the sole DL model. The performance metrics of endoscopists at each experience level were compared with those of ENDOANGEL-ED and of the sole DL model using the Mann–Whitney U test.
  2. DL, deep learning; CI, confidence interval; PPV, positive predictive value; NPV, negative predictive value.
  3. *Significant difference between the target group and ENDOANGEL-ED. *, p < 0.05; **, p < 0.01; ***, p < 0.001.
  4. ^Significant difference between the target group and the sole DL model. ^, p < 0.05; ^^, p < 0.01; ^^^, p < 0.001.
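
The following is a minimal sketch of how the five metrics and a paired McNemar comparison could be computed from raw counts. It is not the authors' code. Several things here are assumptions: the confusion-matrix counts (TP = 33, FP = 9, FN = 1, TN = 42) are inferred from the ENDOANGEL-ED external video test percentages and are not reported in the table; the Wilson score interval is assumed because it gives bounds close to the reported ones, but the source does not name the CI method; the exact (binomial) form of the McNemar test is one common variant, and the discordant-pair counts passed to it are purely hypothetical. The function names are illustrative.

```python
from math import comb, sqrt

Z95 = 1.959964  # two-sided 95% standard normal quantile


def wilson_ci(successes, total, z=Z95):
    """Wilson score interval for a proportion (assumed CI method)."""
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


def diagnostic_metrics(tp, fp, fn, tn):
    """Return {metric: (estimate, (low, high))} from a 2x2 confusion matrix."""
    ratios = {
        "accuracy": (tp + tn, tp + fp + fn + tn),
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
        "ppv": (tp, tp + fp),
        "npv": (tn, tn + fn),
    }
    return {name: (k / n, wilson_ci(k, n)) for name, (k, n) in ratios.items()}


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from discordant pair counts.

    b: cases model A classified correctly and model B did not; c: the reverse.
    """
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Counts inferred from the external video test percentages (assumption).
metrics = diagnostic_metrics(tp=33, fp=9, fn=1, tn=42)
for name, (est, (low, high)) in metrics.items():
    print(f"{name}: {est:.2%} ({low:.2%}-{high:.2%})")

# Hypothetical discordant counts, for illustration only.
print(f"McNemar p = {mcnemar_exact(b=12, c=3):.4f}")
```

The McNemar test uses only the cases on which the two models disagree, which is why it suits paired comparisons on the same test set. PPV and NPV have different denominators for each model, so they are not paired in the same way, and the footnote reports a χ² test for them instead.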