Table 13 Statistical analysis of model performance across RGB, NIR, and multimodal variants.
From: TomatoRipen-MMT: transformer-based RGB and NIR spectral fusion for tomato maturity grading
| Model / variant | Modality | Accuracy mean (%) | Accuracy SD | Accuracy 95% CI | mIoU mean (%) | mIoU SD | mIoU 95% CI | Statistical test applied | p-value | Effect size | Significance |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ResNet-50 | RGB | 78.4 | 2.1 | [77.6, 79.8] | 58.2 | 2.4 | [57.1, 59.3] | ANOVA vs group | < 0.001 | η² = 0.77 | Yes |
| U-Net (RGB) | RGB | – | – | – | 67.3 | 1.8 | [66.6, 68.2] | ANOVA | < 0.001 | η² = 0.77 | Yes |
| ViT-B/16 | RGB | 81.3 | 2.0 | [80.2, 82.1] | 69.2 | 1.6 | [68.5, 69.9] | ANOVA | < 0.001 | η² = 0.77 | Yes |
| Swin Transformer (RGB) | RGB | 82.1 | 1.9 | [81.3, 83.2] | 71.4 | 1.7 | [70.5, 72.1] | Tukey HSD vs ResNet-50 | 0.0003 | d = 1.31 | Yes |
| NIR-MLP | NIR | 81.0 | 2.3 | [80.1, 82.4] | – | – | – | t-test vs Swin-NIR | < 0.001 | d = 0.98 | Yes |
| NIR U-Net | NIR | – | – | – | 68.5 | 1.8 | [67.8, 69.3] | ANOVA | < 0.001 | η² = 0.77 | Yes |
| Swin-NIR | NIR | 87.6 | 2.0 | [86.8, 88.7] | 68.5 | 1.7 | [67.8, 69.1] | t-test vs NIR-MLP | < 0.001 | d = 0.98 | Yes |
| Early Fusion (A3) | RGB + NIR | 89.0 | 1.6 | [88.3, 89.6] | 74.2 | 1.5 | [73.6, 75.0] | ANOVA | < 0.01 | η² = 0.71 | Yes |
| Late Fusion (A4) | RGB + NIR | 90.2 | 1.5 | [89.5, 90.9] | 78.1 | 1.4 | [77.4, 78.7] | Tukey HSD vs Early Fusion | 0.009 | d = 0.72 | Yes |
| Cross-Attention (A5) | RGB + NIR | 92.5 | 1.4 | [91.9, 93.1] | 80.4 | 1.3 | [79.7, 80.9] | Repeated-measures ANOVA | 0.002 | η² = 0.71 | Yes |
| TomatoRipen-MMT (Proposed, A6) | RGB + NIR | 94.8 | 1.2 | [94.3, 95.4] | 82.6 | 1.1 | [82.1, 83.1] | t-test vs best baseline | < 0.0001 | d = 2.14 | Highly significant |
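The summary statistics in Table 13 (mean, SD, 95% CI, Cohen's d, η²) can be computed from per-run scores along the following lines. This is a minimal Python sketch, not the authors' analysis code, and it rests on stated assumptions: each model is evaluated over several independent runs (the table does not give the number of runs), the 95% CI is t-based, Cohen's d uses the pooled SD of two independent groups, and η² is SS_between / SS_total from a one-way ANOVA. The names `mean_sd_ci95`, `cohens_d`, `eta_squared`, and `T_CRIT_95` are hypothetical. If runs are paired (e.g. the same cross-validation folds, as a repeated-measures ANOVA implies), an effect size computed on per-fold differences would be more appropriate than the pooled-SD d shown here.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom (n - 1).
# Only a few run counts are tabulated here (hypothetical choice: 5, 10, 20, 30 runs).
T_CRIT_95 = {4: 2.776, 9: 2.262, 19: 2.093, 29: 2.045}


def mean_sd_ci95(scores):
    """Return the mean, sample SD, and t-based 95% CI of per-run scores."""
    n = len(scores)
    if n - 1 not in T_CRIT_95:
        raise ValueError(f"no tabulated t critical value for df={n - 1}")
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample SD with an (n - 1) denominator
    half_width = T_CRIT_95[n - 1] * sd / math.sqrt(n)
    return mean, sd, (mean - half_width, mean + half_width)


def cohens_d(a, b):
    """Cohen's d for two independent groups, standardized by the pooled SD."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled_sd = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd


def eta_squared(groups):
    """eta^2 = SS_between / SS_total for a one-way ANOVA over the given groups."""
    all_scores = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_scores)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    return ss_between / ss_total
```

Because only a handful of critical values are tabulated, `mean_sd_ci95` raises an error for other run counts; if SciPy is available, `scipy.stats.t.ppf(0.975, df)` gives the exact critical value for any `df`. Note that a t-based interval is symmetric about the mean by construction.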