Table 10 Statistical comparison of the top three transformer models using McNemar’s test, Holm correction, and effect sizes.
From: Classifying human vs. AI text with machine learning and explainable transformer models

| Model pair | Raw p-value | Holm-corrected p-value | Cohen’s g | ΔF1 | Significant |
|---|---|---|---|---|---|
| XLM-RoBERTa vs. BERT | 0.0184 | 0.0368 | 0.0056 | +0.0051 | ✅ |
| XLM-RoBERTa vs. RoBERTa | 0.0195 | 0.0368 | 0.0050 | −0.0046 | ✅ |
| BERT vs. RoBERTa | 2.99 × 10⁻⁶ | 8.99 × 10⁻⁶ | 0.0106 | −0.0097 | ✅ |
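The three statistics in the table (a pairwise McNemar test on the shared test set, Holm step-down correction across the three comparisons, and Cohen’s g as the McNemar effect size) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors’ code: the helper names `mcnemar_cc_p`, `cohens_g`, and `holm_adjust` are hypothetical, the table does not report the discordant counts b and c, and it does not say whether the exact or the continuity-corrected chi-square form of McNemar’s test was used (this sketch assumes the latter).

```python
import math

# Hypothetical helpers illustrating the statistics named in Table 10.
# b, c are discordant counts on a shared test set (not reported in the table):
#   b: samples model A classifies correctly and model B misclassifies
#   c: samples model B classifies correctly and model A misclassifies


def mcnemar_cc_p(b: int, c: int) -> float:
    """McNemar's test with continuity correction (chi-square, 1 df). Assumed variant."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    stat = max(abs(b - c) - 1, 0) ** 2 / n
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2))


def cohens_g(b: int, c: int) -> float:
    """Cohen's g: how far the larger discordant share lies above 0.5."""
    n = b + c
    if n == 0:
        return 0.0
    return max(b, c) / n - 0.5


def holm_adjust(pvals: list[float]) -> list[float]:
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Multiply the k-th smallest p-value by (m - k), cap at 1,
        # and keep adjusted values non-decreasing in sorted order.
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max
    return adjusted


if __name__ == "__main__":
    # Raw p-values from Table 10, in row order.
    raw = [0.0184, 0.0195, 2.99e-6]
    print([f"{p:.3g}" for p in holm_adjust(raw)])
    # → ['0.0368', '0.0368', '8.97e-06']

    # Made-up discordant counts, for illustration only.
    print(mcnemar_cc_p(30, 10), cohens_g(30, 10))
```

Applied to the table’s raw p-values, `holm_adjust` reproduces the repeated 0.0368: the smallest p-value is multiplied by 3, the next (0.0184) by 2, and the largest (0.0195) by 1, after which the monotonicity step raises 0.0195 to 0.0368. The last value comes out as 8.97 × 10⁻⁶ rather than the tabulated 8.99 × 10⁻⁶, presumably because the raw p-value shown in the table is itself rounded.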