Table 1 Inter-annotator agreement across skin tone scales

From: Validity of two subjective skin tone scales and its implications on healthcare model fairness

Analysis                                         Annotator pair  Fitzpatrick                Monk
Primary inter-annotator agreement measure
  Intraclass correlation coefficient (ICC[2,k])  All annotators  0.66 (95% CI [0.02–0.87])  0.64 (95% CI [0.02–0.85])
Secondary inter-annotator agreement measures
  Weighted Cohen’s Kappa                         1 vs. 2         0.63                       0.64
                                                 1 vs. 3         0.39                       0.36
                                                 2 vs. 3         0.29                       0.30
  Kendall’s W                                    All annotators  0.90                       0.85
  Krippendorff’s Alpha                           All annotators  0.41                       0.41

  1. Fitzpatrick (I–VI) and Monk (1–10) are the two skin tone scales used in this study. The table reports inter-rater reliability metrics comparing annotators’ ratings on the two scales: Weighted Cohen’s Kappa reflects pairwise agreement, Kendall’s W evaluates relative rankings across all annotators, Krippendorff’s Alpha measures ordinal agreement, and ICC[2,k] provides a measure of consistency across all annotators.
  2. ICC, intraclass correlation coefficient; CI, confidence interval.
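
The Weighted Cohen’s Kappa values above penalize near-misses on the ordinal scale less than distant disagreements. A minimal pure-Python sketch of the metric, applied to hypothetical ratings on a Monk-like 1–10 scale (the annotator ratings below are illustrative and are not the study’s data):

```python
from collections import Counter

def weighted_kappa(r1, r2, categories, weights="linear"):
    """Weighted Cohen's kappa for two annotators rating on an ordinal scale.

    weights: "none" (plain kappa), "linear" (|i-j|), or "quadratic" ((i-j)**2).
    """
    n = len(r1)
    idx = {c: i for i, c in enumerate(categories)}

    def w(i, j):
        if weights == "none":
            return float(i != j)
        d = abs(i - j)
        return float(d if weights == "linear" else d * d)

    # Observed weighted disagreement, averaged over the n rated items.
    observed = sum(w(idx[a], idx[b]) for a, b in zip(r1, r2)) / n
    # Chance-expected weighted disagreement from each annotator's marginals.
    c1, c2 = Counter(r1), Counter(r2)
    expected = sum(
        w(idx[a], idx[b]) * c1[a] * c2[b] / (n * n)
        for a in categories
        for b in categories
    )
    return 1.0 - observed / expected

# Hypothetical ratings on a Monk-like 1-10 scale (not the paper's data).
monk_scale = list(range(1, 11))
annotator_1 = [1, 2, 3, 5, 7, 8, 9, 10, 4, 6]
annotator_2 = [1, 3, 3, 5, 6, 8, 10, 10, 4, 6]

print(round(weighted_kappa(annotator_1, annotator_2, monk_scale, "none"), 2))    # → 0.67
print(round(weighted_kappa(annotator_1, annotator_2, monk_scale, "linear"), 2))  # → 0.91
```

Because every disagreement in this toy example is off by a single scale step, the linear-weighted kappa (0.91) is markedly higher than the unweighted kappa (0.67); weighting is what makes the statistic sensitive to the ordinal structure of the Fitzpatrick and Monk scales.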