Table 3 Comparison of performance metrics for the different models and for human performance. Statistical significance was assessed using a Student’s t-test comparing each AI model against human performance. Significance levels: * \(p<0.05\), ** \(p<0.01\), *** \(p<0.001\). All values are percentages, reported as mean ± standard deviation. The highest value for each metric is bold and underlined, and the second-highest is bold only.

Metric	BEiT	ResNet	ViT	Human
AUC	\(\underline{\textbf{84.3} \pm \textbf{2.0}}\)***	\(\textbf{77.0} \pm \textbf{3.4}\)***	71.8 ± 3.1***	54.6 ± 3.6
Accuracy	\(\underline{\textbf{68.1} \pm \textbf{2.3}}\)***	\(\textbf{58.7} \pm \textbf{1.9}\)*	52.9 ± 3.1	38.0 ± 6.4
F1 Score	\(\underline{\textbf{67.9} \pm \textbf{2.0}}\)***	\(\textbf{58.4} \pm \textbf{1.9}\)**	51.6 ± 3.9**	36.8 ± 5.3
Precision	\(\underline{\textbf{69.6} \pm \textbf{1.8}}\)***	\(\textbf{60.5} \pm \textbf{2.8}\)***	55.4 ± 4.4***	38.2 ± 4.2
TPR	\(\underline{\textbf{68.4} \pm \textbf{2.5}}\)***	\(\textbf{58.4} \pm \textbf{1.8}\)**	52.6 ± 3.4**	37.6 ± 5.1
FPR	\(\underline{\textbf{84.0} \pm \textbf{1.2}}\)***	\(\textbf{79.2} \pm \textbf{0.9}\)**	76.4 ± 1.6*	68.9 ± 2.7
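
The significance markers in the table come from a Student’s t-test of each model against human performance. As a rough illustration only, the sketch below computes a pooled-variance two-sample t statistic from summary statistics (mean, standard deviation, sample size) and maps a p-value to the asterisk notation used in the caption. The number of observations per group is not stated in this table, so `N_RUNS = 5` is a purely hypothetical placeholder, and the unpaired, equal-variance form of the test is likewise an assumption. The function names are illustrative and not taken from the source.

```python
import math


def pooled_t_statistic(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Two-sample Student's t statistic (equal variances assumed),
    computed from summary statistics. Returns (t, degrees_of_freedom)."""
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    # Standard error of the difference between the two means.
    se = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean_a - mean_b) / se, df


def significance_stars(p_value):
    """Map a p-value to the asterisk notation of the table caption."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""


# ASSUMPTION: the per-group sample size is not given in the table;
# 5 is a hypothetical value chosen only to make the example runnable.
N_RUNS = 5

# AUC row: BEiT (84.3 ± 2.0) versus human (54.6 ± 3.6).
t_stat, df = pooled_t_statistic(84.3, 2.0, N_RUNS, 54.6, 3.6, N_RUNS)
print(f"t = {t_stat:.2f}, df = {df}")
```

The p-value then follows from the t distribution with `df` degrees of freedom, and `significance_stars` converts it to the marker shown next to each entry. With different sample sizes or a paired design the statistic changes, so these numbers should not be read as a reproduction of the reported significance levels.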
