Figure 7

Heatmap of the intraclass correlation coefficient (ICC) between human observers and the deep learning model. Interobserver reliability among three human observers (R1: junior resident; R2: spine fellow; R3: senior surgeon), the deep learning model (AI), and the ground truth values (Gt) was compared using the ICC. The heatmap presents the ICC values as a data matrix, in which the colouring gives an overview of the numeric ICC differences for each radiographic parameter. Hierarchical cluster analysis was used to arrange the heatmap into a hierarchy of clusters. The deep learning model (AI) matched the reliability of the human observers in 15 of the 18 parameters.
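
For readers unfamiliar with the statistic, the following is a minimal sketch of how a single ICC value for one pair (or group) of raters could be computed. It assumes the two-way random-effects, absolute-agreement, single-measure form, often written ICC(2,1); the caption does not state which ICC form was used, so this choice, the function name `icc_2_1`, and the input layout (rows are cases, columns are raters) are illustrative assumptions only.

```python
import numpy as np


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    `ratings` is an (n_cases, n_raters) array. The ICC form is an assumption;
    the figure caption does not specify which variant was used.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)  # mean per case
    col_means = x.mean(axis=0)  # mean per rater

    # Mean squares from a two-way ANOVA without replication
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)  # between cases
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)  # between raters
    resid = x - row_means[:, None] - col_means[None, :] + grand
    mse = np.sum(resid ** 2) / ((n - 1) * (k - 1))        # residual error

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

Applying such a function to each pair among R1, R2, R3, AI, and Gt for one radiographic parameter would yield one cell of a matrix like the one shown in the heatmap.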