Table 4 Inter-reader reliability of the models and radiologists

From: Deep learning models in classifying primary bone tumors and bone infections based on radiographs

Inter-reader reliability between the ensemble model and radiologists

Fleiss κ (95% CI): 0.501 (0.463–0.538)

Cohen κ (95% CI):

| | Expert 1 | CSTC | Expert 2 | CSTC | Expert 3 | CSTC |
| --- | --- | --- | --- | --- | --- | --- |
| Ensemble model | 0.299 (0.265–0.333) | + + | 0.493 (0.456–0.531) | + + + | 0.456 (0.419–0.493) | + + + |

| | Expert 4 | CSTC | Expert 5 | CSTC | Expert 6 | CSTC |
| --- | --- | --- | --- | --- | --- | --- |
| Ensemble model | 0.356 (0.321–0.392) | + + | 0.570 (0.532–0.607) | + + + | 0.596 (0.560–0.633) | + + + |

Inter-reader reliability among radiologists

Fleiss κ (95% CI): 0.401 (0.364–0.438)

Cohen κ (95% CI):

| EG1 | CSTC | EG2 | CSTC | EG3 | CSTC |
| --- | --- | --- | --- | --- | --- |
| 0.267 (0.234–0.300) | + + | 0.295 (0.261–0.329) | + + | 0.581 (0.544–0.618) | + + + |

Inter-reader reliability among models

Fleiss κ (95% CI): 0.800 (0.770–0.830)

Cohen κ (95% CI):

| | E3 | CSTC | E4 | CSTC | ViT | CSTC | SWIN | CSTC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Ensemble model | 0.805 (0.775–0.835) | + + + + + | 0.793 (0.763–0.823) | + + + + | 0.783 (0.752–0.814) | + + + + | 0.908 (0.886–0.930) | + + + + + |

  1. EG expert group, CSTC consistency, E3 EfficientNet B3, E4 EfficientNet B4, ViT vision transformer, SWIN Swin transformer, CI confidence interval.
  2. Note: EG1 = expert 1 + expert 2 (junior radiologist group); EG2 = expert 3 + expert 4 (medium-seniority group); EG3 = expert 5 + expert 6 (senior radiologist group).
  3. CSTC evaluation (consistency evaluation):
  4. 0 < Fleiss κ, Cohen κ ≤ 0.2: low consistency, “+”.
  5. 0.2 < Fleiss κ, Cohen κ ≤ 0.4: general consistency, “+ +”.
  6. 0.4 < Fleiss κ, Cohen κ ≤ 0.6: moderate consistency, “+ + +”.
  7. 0.6 < Fleiss κ, Cohen κ ≤ 0.8: high consistency, “+ + + +”.
  8. 0.8 < Fleiss κ, Cohen κ ≤ 1.0: extremely high consistency, “+ + + + +”.
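The cut-offs in footnotes 4–8 are a simple step function from a κ estimate to a CSTC grade. As a rough illustration of how such agreement statistics might be reproduced, the sketch below computes Cohen's κ for a reader pair and Fleiss' κ for a reader panel with scikit-learn and statsmodels, then maps each estimate to its grade. The simulated reader labels and the `cstc_grade` helper are hypothetical stand-ins, not the authors' code, and the resampling needed for the 95% CIs in the table is omitted.

```python
# Minimal sketch (assumed workflow, not the study's code): Cohen's and Fleiss'
# kappa for categorical reads, plus the CSTC grading rule from the footnotes.
import numpy as np
from sklearn.metrics import cohen_kappa_score
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa


def cstc_grade(kappa: float) -> str:
    """Map a kappa value in (0, 1] to the '+' grade defined in footnotes 4-8."""
    for upper, grade in [(0.2, "+"), (0.4, "+ +"), (0.6, "+ + +"),
                         (0.8, "+ + + +"), (1.0, "+ + + + +")]:
        if kappa <= upper:
            return grade
    return "+ + + + +"


# Hypothetical reads for 50 radiographs with three diagnostic categories
# (e.g. 0 = benign tumor, 1 = malignant tumor, 2 = infection).
rng = np.random.default_rng(0)
reader_a = rng.integers(0, 3, size=50)
reader_b = np.where(rng.random(50) < 0.7, reader_a, rng.integers(0, 3, size=50))
reader_c = np.where(rng.random(50) < 0.6, reader_a, rng.integers(0, 3, size=50))

# Pairwise agreement between two readers (Cohen's kappa).
pair_kappa = cohen_kappa_score(reader_a, reader_b)
print(f"Cohen κ = {pair_kappa:.3f}, CSTC {cstc_grade(pair_kappa)}")

# Panel agreement across all readers (Fleiss' kappa): rows are cases,
# columns are per-category counts produced by aggregate_raters.
reads = np.column_stack([reader_a, reader_b, reader_c])
counts, _ = aggregate_raters(reads)
panel_kappa = fleiss_kappa(counts, method="fleiss")
print(f"Fleiss κ = {panel_kappa:.3f}, CSTC {cstc_grade(panel_kappa)}")
```

Confidence intervals like those reported in the table are typically obtained by bootstrapping over cases and taking the 2.5th and 97.5th percentiles of the resampled κ values.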