Table 7. Performance metrics by bootstrap: ConvNeXt_Tiny trained with consensus labels and with individual annotators' labels.

From: GastroHUN an Endoscopy Dataset of Complete Systematic Screening Protocol for the Stomach

| Strategy | Training label | Macro precision | Macro recall | Macro F1-score | Weighted precision | Weighted recall | Weighted F1-score |
|---|---|---|---|---|---|---|---|
| Consensus | All | 87.58 ± 0.25 | 86.92 ± 0.27 | 86.79 ± 0.26 | 88.10 ± 0.24 | 87.28 ± 0.24 | 87.25 ± 0.25 |
| Consensus | Triple | 86.21 ± 0.27 | 85.15 ± 0.28 | 84.97 ± 0.28 | 86.66 ± 0.25 | 85.51 ± 0.27 | 85.41 ± 0.27 |
| Consensus | FG | 88.11 ± 0.21 | 87.09 ± 0.21 | 87.05 ± 0.21 | 88.27 ± 0.19 | 87.43 ± 0.20 | 87.36 ± 0.20 |
| Consensus | G | 86.42 ± 0.27 | 85.92 ± 0.27 | 85.66 ± 0.27 | 87.27 ± 0.23 | 86.40 ± 0.25 | 86.38 ± 0.25 |
| Consensus | FG1 - G1 | 86.67 ± 0.27 | 85.49 ± 0.28 | 85.44 ± 0.28 | 86.85 ± 0.25 | 85.97 ± 0.27 | 85.87 ± 0.27 |
| Consensus | FG1 - G2 | 86.92 ± 0.23 | 86.03 ± 0.24 | 85.94 ± 0.24 | 87.24 ± 0.22 | 86.37 ± 0.23 | 86.35 ± 0.23 |
| Consensus | FG2 - G1 | 87.16 ± 0.28 | 86.14 ± 0.28 | 86.07 ± 0.28 | 87.89 ± 0.24 | 87.03 ± 0.26 | 86.93 ± 0.26 |
| Consensus | FG2 - G2 | 85.61 ± 0.26 | 85.53 ± 0.27 | 85.11 ± 0.27 | 86.76 ± 0.26 | 85.99 ± 0.27 | 85.96 ± 0.27 |
| Annotator | FG1 | 84.07 ± 0.27 | 82.88 ± 0.31 | 82.86 ± 0.30 | 84.75 ± 0.26 | 83.56 ± 0.27 | 83.63 ± 0.27 |
| Annotator | FG2 | 85.37 ± 0.26 | 84.88 ± 0.27 | 84.33 ± 0.28 | 86.50 ± 0.24 | 85.27 ± 0.26 | 85.19 ± 0.26 |
| Annotator | G1 | 85.64 ± 0.23 | 84.91 ± 0.23 | 84.82 ± 0.23 | 86.24 ± 0.22 | 85.39 ± 0.23 | 85.40 ± 0.23 |
| Annotator | G2 | 84.21 ± 0.27 | 83.94 ± 0.26 | 83.53 ± 0.27 | 85.16 ± 0.23 | 84.07 ± 0.25 | 84.09 ± 0.25 |
  1. Macro and weighted metrics (precision, recall, F1-score) shown with 95% confidence intervals. “FG” refers to Fellow Gastroenterologists (Team A), and “G” to Gastroenterologists (Team B).
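
As an illustration of how "mean ± interval" figures of this kind can be obtained, the following is a minimal sketch of bootstrapped macro and weighted precision, recall and F1-score. It is not the authors' evaluation code. The table does not state the number of bootstrap replicates or how the interval is derived, so the 1,000 replicates, the fixed seed and the normal-approximation half-width (1.96 times the standard deviation of the bootstrap estimates) are assumptions, and the function names `per_class_scores`, `averaged_scores` and `bootstrap_summary` are hypothetical.

```python
import random
from statistics import mean, stdev


def per_class_scores(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1, support)}; empty denominators give 0.0."""
    scores = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[c] = (precision, recall, f1, tp + fn)
    return scores


def averaged_scores(y_true, y_pred, labels):
    """Macro (unweighted) and weighted (by class support) averages of P, R, F1."""
    per_class = list(per_class_scores(y_true, y_pred, labels).values())
    total = sum(s[3] for s in per_class)  # number of samples, if labels cover y_true
    macro = tuple(mean([s[i] for s in per_class]) for i in range(3))
    weighted = tuple(sum(s[i] * s[3] for s in per_class) / total for i in range(3))
    return {"macro": macro, "weighted": weighted}


def bootstrap_summary(y_true, y_pred, labels, n_boot=1000, seed=0):
    """Resample the test set with replacement and summarise each metric.

    Returns {average: {metric: (mean, half_width)}} as fractions in [0, 1].
    ASSUMPTION: half_width = 1.96 * std of the bootstrap estimates.
    """
    rng = random.Random(seed)
    n = len(y_true)
    draws = {"macro": [], "weighted": []}
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample indices with replacement
        scores = averaged_scores(
            [y_true[i] for i in idx], [y_pred[i] for i in idx], labels
        )
        for avg in draws:
            draws[avg].append(scores[avg])
    names = ("precision", "recall", "f1")
    return {
        avg: {
            name: (mean([r[i] for r in rows]), 1.96 * stdev([r[i] for r in rows]))
            for i, name in enumerate(names)
        }
        for avg, rows in draws.items()
    }
```

Calling `bootstrap_summary(y_true, y_pred, labels)` on a model's test-set labels and predictions returns fractions; multiplying by 100 gives percentages in the style of the table. The macro average treats every class equally, whereas the weighted average scales each class by its support, which is why the two column groups differ when classes are imbalanced.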