Fig. 8

Cohen’s kappa values across 10 independent training/testing splits for each model trained on augmented data. All models show substantial agreement between predicted and reference labels (kappa > 0.75), with ResNet-101 varying least across splits. These values indicate that the models' predictions agree with the reference labels well beyond what chance alone would produce.
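
For readers unfamiliar with the metric, the following is a minimal sketch of how Cohen's kappa can be computed for one split, as kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function name `cohen_kappa` and the toy labels are illustrative assumptions, not the code or data behind this figure.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa between two equal-length label sequences (illustrative helper)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(y_true)

    # Observed agreement: fraction of samples where both labels match.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Chance agreement: sum over classes of the product of marginal frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)

    if p_e == 1.0:
        # Kappa is undefined when chance agreement is already perfect.
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)


# Toy labels for a single hypothetical split (not data from the study).
toy_true = [0, 0, 1, 1, 2, 2]
toy_pred = [0, 0, 1, 2, 2, 2]
print(round(cohen_kappa(toy_true, toy_pred), 3))  # 0.75
```

In this toy case five of six labels match (p_o = 5/6) and chance agreement is p_e = 1/3, so kappa is 0.75 even though raw accuracy is about 0.83. Repeating the calculation on each of the 10 splits would give the per-model distribution summarized in the figure.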