Table 3 Performance of endoscopists with and without ENDOANGEL-ED assistance.

From: Explainable artificial intelligence incorporated with domain knowledge diagnosing early gastric neoplasms under white light endoscopy

 

| Group | Accuracy (95% CI) | Sensitivity (95% CI) | Specificity (95% CI) | PPV (95% CI) | NPV (95% CI) |
| --- | --- | --- | --- | --- | --- |
| ENDOANGEL-ED | 81.10% (73.42–86.96%) | 85.45% (73.83–92.44%) | 77.78% (66.91–85.83%) | 74.60% (62.66–83.72%) | 87.50% (77.23–93.53%) |
| Sole DL model | 71.65% (63.27–78.76%) | 89.09% (78.17–94.90%) | 58.33% (46.80–69.01%)* | 62.03% (51.01–71.94%) | 87.50% (75.30–94.14%) |
| Without AI assistance | | | | | |
| All endoscopists (n = 31) | 70.61% (67.59–73.63%)***### | 75.95% (72.89–79.02%)***^^^## | 66.44% (60.53–72.36%)***^^^## | 65.29% (61.48–69.10%)***^^### | 78.20% (75.88–80.51%)***^^^### |
| Novices (n = 21) | 67.15% (63.68–70.63%)***^### | 76.28% (72.64–79.92%)***^^^## | 60.19% (53.48–66.89%)***^### | 60.59% (56.93–64.24%)***### | 76.64% (73.67–79.62%)***^^^### |
| Seniors (n = 7) | 76.72% (74.62–78.81%)***^^^### | 73.51% (63.38–83.64%)*^## | 79.17% (68.47–89.86%)^^## | 74.69% (67.00–82.37%)^^## | 80.50% (75.65–85.34%)*^# |
| Experts (n = 3) | 80.58% (70.92–90.23%) | 79.39% (72.49–86.30%)*^# | 80.56% (62.30–98.81%)^ | 76.25% (59.42–93.09%)^ | 83.69% (82.11–85.28%)*^# |
| With AI assistance | | | | | |
| All endoscopists (n = 31) | 79.63% (77.40–81.86%)^^^ | 82.11% (79.24–84.98%)*^^^ | 77.73% (72.63–82.84%)*^^^ | 75.50% (72.39–78.61%)*^^^ | 85.56% (84.13–86.99%) |
| Novices (n = 21) | 78.14% (75.13–81.31%)**^^^ | 84.15% (81.10–87.21%)^^^ | 73.55% (66.82–80.28%)^^^ | 72.34% (68.78–75.90%)^^^ | 86.32% (84.74–87.90%) |
| Seniors (n = 7) | 82.23% (80.13–84.32%)^^^ | 75.06% (66.73–83.40%)**^^ | 87.70% (83.41–91.99%)**^^ | 82.88% (78.69–87.07%)***^^ | 82.53% (78.36–86.69%)*^ |
| Experts (n = 3) | 83.99% (71.57–96.41%)^ | 84.24% (81.65–86.84%)^ | 83.80% (63.88–100.00%)*^ | 80.40% (58.93–100.00%)*^ | 87.35% (83.09–91.61%) |

  1. The McNemar test was used to compare the accuracy, sensitivity, and specificity of ENDOANGEL-ED and the sole DL model. The χ2 test was used to compare the PPV and NPV of ENDOANGEL-ED and the sole DL model. The Mann–Whitney U test was used to compare the performance metrics of each level of endoscopists with those of ENDOANGEL-ED and of the sole DL model.
  2. DL, deep learning; CI, confidence interval; PPV, positive predictive value; NPV, negative predictive value.
  3. *Significant difference between the target group and ENDOANGEL-ED. *, p < 0.05; **, p < 0.01; ***, p < 0.001.
  4. ^Significant difference between the target group and the sole DL model. ^, p < 0.05; ^^, p < 0.01; ^^^, p < 0.001.
  5. #Significant difference between the AI-assisted and non-AI-assisted groups. #, p < 0.05; ##, p < 0.01; ###, p < 0.001.
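
As a rough illustration of how the first two rows and the χ2 comparison in note 1 fit together, the sketch below recomputes the five metrics from 2x2 counts. Several things here are assumptions, not statements from the table: the counts (`ENDOANGEL_ED`, `SOLE_DL`) are back-calculated from the reported percentages, the Wilson score interval is used because it appears to reproduce the reported 95% CIs for these two rows to within rounding, and the χ2 statistic is computed without a continuity correction. The function names are hypothetical.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def wilson_ci(successes, total, z=Z_95):
    """Wilson score interval for a binomial proportion, as (low, high)."""
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


def diagnostic_metrics(tp, fn, tn, fp):
    """Return {metric: (successes, total)} for a 2x2 confusion matrix."""
    return {
        "accuracy": (tp + tn, tp + fn + tn + fp),
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
        "ppv": (tp, tp + fp),
        "npv": (tn, tn + fn),
    }


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p_value). With 1 degree of freedom the upper-tail
    probability equals erfc(sqrt(statistic / 2)).
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, math.erfc(math.sqrt(stat / 2))


# ASSUMPTION: counts inferred from the reported percentages
# (127 cases: 55 positive, 72 negative); they are not stated in the table.
ENDOANGEL_ED = dict(tp=47, fn=8, tn=56, fp=16)
SOLE_DL = dict(tp=49, fn=6, tn=42, fp=30)

for name, counts in (("ENDOANGEL-ED", ENDOANGEL_ED), ("Sole DL model", SOLE_DL)):
    for metric, (k, n) in diagnostic_metrics(**counts).items():
        low, high = wilson_ci(k, n)
        print(f"{name:14s} {metric:12s} {k / n:6.2%} ({low:.2%}-{high:.2%})")

# PPV comparison: rows are the two models, columns are true vs false positives.
ppv_stat, ppv_p = chi2_2x2(47, 16, 49, 30)
print(f"PPV chi-square = {ppv_stat:.2f}, p = {ppv_p:.3f}")
```

Under these assumed counts the PPV comparison gives a p value above 0.05, which is consistent with the sole DL model's PPV carrying no asterisk in the table. The McNemar comparisons cannot be reconstructed this way because they depend on per-case paired outcomes that the table does not report.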