Table 7. Bootstrap performance metrics for ConvNeXt_Tiny trained on consensus labels and on individual annotators' labels.

Strategy	Training label	Macro precision	Macro recall	Macro F1-score	Weighted precision	Weighted recall	Weighted F1-score
Consensus	All	87.58 ± 0.25	86.92 ± 0.27	86.79 ± 0.26	88.10 ± 0.24	87.28 ± 0.24	87.25 ± 0.25
	Triple	86.21 ± 0.27	85.15 ± 0.28	84.97 ± 0.28	86.66 ± 0.25	85.51 ± 0.27	85.41 ± 0.27
	FG	88.11 ± 0.21	87.09 ± 0.21	87.05 ± 0.21	88.27 ± 0.19	87.43 ± 0.20	87.36 ± 0.20
	G	86.42 ± 0.27	85.92 ± 0.27	85.66 ± 0.27	87.27 ± 0.23	86.40 ± 0.25	86.38 ± 0.25
	FG1 - G1	86.67 ± 0.27	85.49 ± 0.28	85.44 ± 0.28	86.85 ± 0.25	85.97 ± 0.27	85.87 ± 0.27
	FG1 - G2	86.92 ± 0.23	86.03 ± 0.24	85.94 ± 0.24	87.24 ± 0.22	86.37 ± 0.23	86.35 ± 0.23
	FG2 - G1	87.16 ± 0.28	86.14 ± 0.28	86.07 ± 0.28	87.89 ± 0.24	87.03 ± 0.26	86.93 ± 0.26
	FG2 - G2	85.61 ± 0.26	85.53 ± 0.27	85.11 ± 0.27	86.76 ± 0.26	85.99 ± 0.27	85.96 ± 0.27
Annotator	FG1	84.07 ± 0.27	82.88 ± 0.31	82.86 ± 0.30	84.75 ± 0.26	83.56 ± 0.27	83.63 ± 0.27
	FG2	85.37 ± 0.26	84.88 ± 0.27	84.33 ± 0.28	86.50 ± 0.24	85.27 ± 0.26	85.19 ± 0.26
	G1	85.64 ± 0.23	84.91 ± 0.23	84.82 ± 0.23	86.24 ± 0.22	85.39 ± 0.23	85.40 ± 0.23
	G2	84.21 ± 0.27	83.94 ± 0.26	83.53 ± 0.27	85.16 ± 0.23	84.07 ± 0.25	84.09 ± 0.25

Macro and weighted metrics (precision, recall, F1-score) are shown with 95% confidence intervals. “FG” refers to Fellow Gastroenterologists (Team A), and “G” to Gastroenterologists (Team B).
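For readers who want to reproduce figures of this kind, the following is a minimal sketch, not the authors' pipeline. It computes macro averages (unweighted mean over classes) and weighted averages (mean over classes weighted by class support) of precision, recall, and F1-score, then estimates a 95% interval by resampling the test set with replacement. The function names `prf_scores` and `bootstrap_ci`, the number of resamples, and the normal-approximation half-width (1.96 times the bootstrap standard deviation) are all assumptions; the table does not state which bootstrap procedure was used.

```python
import random
import statistics


def prf_scores(y_true, y_pred):
    """Return macro and weighted [precision, recall, F1] as percentages."""
    labels = sorted(set(y_true) | set(y_pred))
    n = len(y_true)
    totals = {"macro": [0.0, 0.0, 0.0], "weighted": [0.0, 0.0, 0.0]}
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        support = tp + fn  # number of true instances of class c
        for i, value in enumerate((prec, rec, f1)):
            totals["macro"][i] += value / len(labels)
            totals["weighted"][i] += value * support / n
    return {k: [100 * v for v in vals] for k, vals in totals.items()}


def bootstrap_ci(y_true, y_pred, n_boot=1000, seed=0):
    """Return (mean, half-width) per metric from bootstrap resamples.

    Assumption: half-width = 1.96 * standard deviation of the replicates.
    """
    rng = random.Random(seed)
    n = len(y_true)
    replicates = {"macro": [], "weighted": []}
    for _ in range(n_boot):
        # Resample test-set indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        scores = prf_scores([y_true[i] for i in idx],
                            [y_pred[i] for i in idx])
        for k in replicates:
            replicates[k].append(scores[k])
    result = {}
    for k, rows in replicates.items():
        result[k] = [
            (statistics.mean(col), 1.96 * statistics.stdev(col))
            for col in zip(*rows)  # one column per metric
        ]
    return result


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the study.
    truth = [0, 0, 0, 1, 1, 2, 2, 2, 2, 1]
    preds = [0, 0, 1, 1, 1, 2, 2, 0, 2, 1]
    for average, metrics in bootstrap_ci(truth, preds, n_boot=200).items():
        for name, (mean, half) in zip(("precision", "recall", "f1"), metrics):
            print(f"{average} {name}: {mean:.2f} ± {half:.2f}")
```

Weighted recall computed this way equals overall accuracy, which is one reason the weighted columns can differ from the macro columns when classes are imbalanced.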
