Fig. 4: Statistical analysis for the case study tasks. | Nature Communications

From: TacticAI: an AI assistant for football tactics

In task 1, we tested for a statistical difference between the real corner kick samples and the synthetic ones generated by TacticAI from two aspects: (A.1) the distributions of their assigned ratings, and (A.2) the corresponding histograms of the rating values. Analogously, in task 2 (receiver prediction), we tracked (B.1) the distributions of the top-3 accuracy of receiver prediction using those samples, and (B.2) the corresponding histogram of the mean rating per sample. No statistically significant difference in the means was observed in either case (A.1: z = −0.34, p > 0.05; B.1: z = 0.97, p > 0.05). Additionally, we observed a statistically significant difference between the ratings of different raters on receiver prediction, with three clear clusters emerging (C). Specifically, Raters A and E gave similar ratings (z = 0.66, p > 0.05), and Raters B and D also rated similarly (z = −1.84, p > 0.05), while Rater C responded differently from all other raters. This suggests a good level of variety among the human raters in their perceptions of corner kicks. In task 3 (identifying similar corners retrieved in terms of salient strategic setups), there were no significant differences among the distributions of the ratings by different raters (D), suggesting a high level of agreement on the usefulness of TacticAI’s capability of retrieving similar corners (F(1,4) = 1.01, p > 0.1). Finally, in task 4, we compared the ratings of TacticAI’s strategic refinements across the human raters (E) and found that the raters also agreed on the general effectiveness of the refinements recommended by TacticAI (F(1,4) = 0.45, p > 0.05). Note that the violin plots used in B.1 and C–E model a continuous probability distribution and hence assign nonzero probabilities to values outside the allowed ranges. We only label y-axis ticks for the possible set of ratings.
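For readers who want to see how a z statistic and p-value of the kind quoted in the caption can be obtained, the following is a minimal sketch of a generic two-sample z-test on means with unpooled variances. The caption does not state which variant of the test was used, so this is an assumption rather than the authors' procedure. The function name `two_sample_z_test` and the rating lists are hypothetical placeholders, not data from the study.

```python
import math
from statistics import mean, variance


def two_sample_z_test(a, b):
    """Return (z, two-sided p) for the difference in means of samples a and b.

    Assumption: a large-sample z-test with unpooled sample variances.
    The caption does not specify the exact test, so this is illustrative only.
    """
    # Standard error of the difference between the two sample means.
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = (mean(a) - mean(b)) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)),
    # which equals erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical ratings on a 1-5 scale (placeholders, not study data).
real_ratings = [3, 4, 2, 5, 3, 4, 3, 2, 4, 3]
synthetic_ratings = [3, 3, 4, 4, 2, 5, 3, 3, 4, 2]

z, p = two_sample_z_test(real_ratings, synthetic_ratings)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A result with p above the chosen threshold (0.05 in the caption) would be read as no statistically significant difference in the means. With only a handful of ratings per group, a t-test would normally be the more appropriate choice than a z-test.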