Table 3. Performance metrics of the evaluated models compared with human performance. Statistical significance was assessed with a Student's t-test comparing each AI model against human performance. Significance levels: *\(p<0.05\), **\(p<0.01\), ***\(p<0.001\). All values are percentages, reported as mean ± standard deviation. The highest value for each metric is bold and underlined; the second-highest is bold only.
| Metric | BEiT | ResNet | ViT | Human |
|---|---|---|---|---|
| AUC | \(\underline{\textbf{84.3} \pm \textbf{2.0}}\)*** | \(\textbf{77.0} \pm \textbf{3.4}\)*** | 71.8 ± 3.1*** | 54.6 ± 3.6 |
| Accuracy | \(\underline{\textbf{68.1} \pm \textbf{2.3}}\)*** | \(\textbf{58.7} \pm \textbf{1.9}\)* | 52.9 ± 3.1 | 38.0 ± 6.4 |
| F1 Score | \(\underline{\textbf{67.9} \pm \textbf{2.0}}\)*** | \(\textbf{58.4} \pm \textbf{1.9}\)** | 51.6 ± 3.9** | 36.8 ± 5.3 |
| Precision | \(\underline{\textbf{69.6} \pm \textbf{1.8}}\)*** | \(\textbf{60.5} \pm \textbf{2.8}\)*** | 55.4 ± 4.4*** | 38.2 ± 4.2 |
| TPR | \(\underline{\textbf{68.4} \pm \textbf{2.5}}\)*** | \(\textbf{58.4} \pm \textbf{1.8}\)** | 52.6 ± 3.4** | 37.6 ± 5.1 |
| FPR | \(\underline{\textbf{84.0} \pm \textbf{1.2}}\)*** | \(\textbf{79.2} \pm \textbf{0.9}\)** | 76.4 ± 1.6* | 68.9 ± 2.7 |
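
The significance stars in Table 3 can be illustrated with a minimal sketch of a two-sample Student's t-test computed from summary statistics. The table does not report the number of samples behind each mean, so the sample size `n = 5` in the usage lines is purely hypothetical. The equal-variance (pooled) form is also an assumption about which variant of the test was used. The function names are illustrative and not taken from the original work.

```python
import math


def pooled_t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """Student's two-sample t statistic (equal variances assumed) and its degrees of freedom."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_err = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / std_err, df


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value, integrating the density over [0, |t|] with Simpson's rule.

    steps must be even. math.gamma overflows for very large df, so this
    sketch is only meant for small samples.
    """
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1.0 - 2.0 * area)


def stars(p):
    """Map a p-value to the significance markers defined in the caption."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


if __name__ == "__main__":
    # AUC row: BEiT (84.3 ± 2.0) vs. Human (54.6 ± 3.6); n = 5 per group is hypothetical.
    t, df = pooled_t_statistic(84.3, 2.0, 5, 54.6, 3.6, 5)
    print(f"t = {t:.2f}, df = {df}, marker = {stars(two_sided_p(t, df))!r}")
```

Because the p-value depends on the sample size, the markers this sketch produces will match the table only if the hypothetical `n` happens to match the actual number of runs or readers.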