Table 2 The classification performance metrics (Precision, Recall, Accuracy, and F1 score) were compared between nine human evaluators and EfficientNet-B6, the most effective AI model for each metric.

From: Artificial intelligence based sonographic differentiation between skull fractures and normal sutures in young children

  1. The AI results were obtained using the same collection of images assessed by the human raters to ensure a fair comparison. Cell shading ranges from white (0.5) to gray (1.0) to represent the metric values. All group comparisons revealed statistically significant differences, as determined by paired-samples Wilcoxon signed-rank tests.
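For readers unfamiliar with the four metrics named in the caption, the following is a minimal sketch of their standard definitions for a binary classifier (here, fracture versus normal suture). The function name `classification_metrics` and the example counts are hypothetical and chosen only for illustration; they are not taken from the study and do not reproduce its evaluation pipeline.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, accuracy, and F1 from confusion-matrix counts.

    tp, fp, fn, tn are the numbers of true positives, false positives,
    false negatives, and true negatives. Undefined ratios (zero
    denominators) are reported as 0.0 by convention in this sketch.
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts for illustration only; NOT data from the study.
example = classification_metrics(tp=8, fp=2, fn=2, tn=8)
for name, value in example.items():
    print(f"{name}: {value:.2f}")
```

Computing these values once per rater and once for the model on the same images would yield paired observations of the kind that a paired-samples test such as the Wilcoxon signed-rank test compares.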