Fig. 6: Reliability analysis via intraclass correlation coefficient (ICC).
From: Intricacies of human–AI interaction in dynamic decision-making for precision oncology

Bar plots a and b compare McGraw and Wong's ICC between unassisted and AI-assisted decisions for NSCLC and HCC, respectively. ICC values, 95% confidence intervals, and p-values (one-sided F-test, \(H_{\alpha}: \mathrm{ICC} > 0\)) for NSCLC and HCC are presented in Table 1. We applied a two-way random-effects model to calculate four types of ICC for an \(n\times k\) data structure, where \(n\) and \(k\) are the numbers of patients and evaluators, respectively, both chosen randomly from a larger pool of patients and evaluators (NSCLC: \(n=8\), \(k=9\); HCC: \(n=9\), \(k=8\)). ICC type Consistency (C) measures the symmetric differences between the decisions of the \(k\) evaluators, whereas ICC type Absolute Agreement (A) measures the absolute differences. ICC unit Single rater corresponds to using the decision of a single evaluator as the basis for measurement, and ICC unit Average corresponds to using the average decision of all evaluators.
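As an illustration of how the four ICC variants relate to one another, the following is a minimal sketch based on the standard McGraw and Wong two-way formulas, computed from the mean squares of an \(n\times k\) ratings matrix. It is not the analysis code used for this figure. The function name `icc_two_way` and the toy ratings are hypothetical, and the sketch omits the confidence intervals and F-test p-values reported in Table 1.

```python
import numpy as np

def icc_two_way(ratings):
    """Two-way ICCs (McGraw and Wong) for an n x k matrix.

    Rows are subjects (patients), columns are raters (evaluators).
    Hypothetical helper for illustration only.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()

    # Sums of squares for subjects (rows), raters (columns), and residual
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-subject mean square
    msc = ss_cols / (k - 1)              # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square

    return {
        # Consistency ignores systematic offsets between raters
        "ICC(C,1)": (msr - mse) / (msr + (k - 1) * mse),
        "ICC(C,k)": (msr - mse) / msr,
        # Absolute agreement penalizes rater offsets through msc
        "ICC(A,1)": (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n),
        "ICC(A,k)": (msr - mse) / (msr + (msc - mse) / n),
    }

# Toy data (made up): three raters who differ only by a constant offset
toy = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
for name, value in icc_two_way(toy).items():
    print(f"{name}: {value:.3f}")
```

In this toy case the raters rank the subjects identically but disagree by a fixed offset, so the Consistency values reach 1 while the Absolute Agreement values are lower, which is the distinction between types C and A described in the caption.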