Table 7 Performance metrics by bootstrap: ConvNeXt_Tiny with consensus labels and individual annotators.
From: GastroHUN an Endoscopy Dataset of Complete Systematic Screening Protocol for the Stomach
| Strategy | Training label | Macro precision | Macro recall | Macro F1-score | Weighted precision | Weighted recall | Weighted F1-score |
|---|---|---|---|---|---|---|---|
| Consensus | All | 87.58 ± 0.25 | 86.92 ± 0.27 | 86.79 ± 0.26 | 88.10 ± 0.24 | 87.28 ± 0.24 | 87.25 ± 0.25 |
| Consensus | Triple | 86.21 ± 0.27 | 85.15 ± 0.28 | 84.97 ± 0.28 | 86.66 ± 0.25 | 85.51 ± 0.27 | 85.41 ± 0.27 |
| Consensus | FG | 88.11 ± 0.21 | 87.09 ± 0.21 | 87.05 ± 0.21 | 88.27 ± 0.19 | 87.43 ± 0.20 | 87.36 ± 0.20 |
| Consensus | G | 86.42 ± 0.27 | 85.92 ± 0.27 | 85.66 ± 0.27 | 87.27 ± 0.23 | 86.40 ± 0.25 | 86.38 ± 0.25 |
| Consensus | FG1 - G1 | 86.67 ± 0.27 | 85.49 ± 0.28 | 85.44 ± 0.28 | 86.85 ± 0.25 | 85.97 ± 0.27 | 85.87 ± 0.27 |
| Consensus | FG1 - G2 | 86.92 ± 0.23 | 86.03 ± 0.24 | 85.94 ± 0.24 | 87.24 ± 0.22 | 86.37 ± 0.23 | 86.35 ± 0.23 |
| Consensus | FG2 - G1 | 87.16 ± 0.28 | 86.14 ± 0.28 | 86.07 ± 0.28 | 87.89 ± 0.24 | 87.03 ± 0.26 | 86.93 ± 0.26 |
| Consensus | FG2 - G2 | 85.61 ± 0.26 | 85.53 ± 0.27 | 85.11 ± 0.27 | 86.76 ± 0.26 | 85.99 ± 0.27 | 85.96 ± 0.27 |
| Annotator | FG1 | 84.07 ± 0.27 | 82.88 ± 0.31 | 82.86 ± 0.30 | 84.75 ± 0.26 | 83.56 ± 0.27 | 83.63 ± 0.27 |
| Annotator | FG2 | 85.37 ± 0.26 | 84.88 ± 0.27 | 84.33 ± 0.28 | 86.50 ± 0.24 | 85.27 ± 0.26 | 85.19 ± 0.26 |
| Annotator | G1 | 85.64 ± 0.23 | 84.91 ± 0.23 | 84.82 ± 0.23 | 86.24 ± 0.22 | 85.39 ± 0.23 | 85.40 ± 0.23 |
| Annotator | G2 | 84.21 ± 0.27 | 83.94 ± 0.26 | 83.53 ± 0.27 | 85.16 ± 0.23 | 84.07 ± 0.25 | 84.09 ± 0.25 |
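
For readers unfamiliar with the two averaging schemes in the table, the following is a minimal sketch of how macro and weighted F1-scores could be estimated by bootstrap resampling of a test set. It is illustrative only and is not the authors' evaluation code. The function names, the number of resamples, and the choice to report mean ± standard deviation are all assumptions; the table does not state what the ± term denotes.

```python
import random
import statistics
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return {label: (f1, support)} for every label seen in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    out = {}
    for c in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[c] + fp[c] + fn[c]
        f1 = 2 * tp[c] / denom if denom else 0.0
        out[c] = (f1, tp[c] + fn[c])  # support = number of true samples of c
    return out


def macro_and_weighted_f1(y_true, y_pred):
    """Macro: unweighted mean over classes. Weighted: mean weighted by support."""
    stats = f1_per_class(y_true, y_pred)
    macro = sum(f1 for f1, _ in stats.values()) / len(stats)
    total = sum(s for _, s in stats.values())
    weighted = sum(f1 * s for f1, s in stats.values()) / total
    return macro, weighted


def bootstrap_f1(y_true, y_pred, n_resamples=1000, seed=0):
    """Hypothetical helper: resample (true, predicted) pairs with replacement
    and summarise each metric as (mean, standard deviation)."""
    rng = random.Random(seed)
    n = len(y_true)
    macros, weighteds = [], []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        m, w = macro_and_weighted_f1(
            [y_true[i] for i in idx], [y_pred[i] for i in idx]
        )
        macros.append(m)
        weighteds.append(w)
    return {
        "macro": (statistics.mean(macros), statistics.stdev(macros)),
        "weighted": (statistics.mean(weighteds), statistics.stdev(weighteds)),
    }
```

The macro average treats every class equally, while the weighted average scales each class by its number of test samples, which is why the two sets of columns can differ when classes are imbalanced.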