Table 10 Statistical comparison of the top three transformer models using McNemar’s test, Holm correction, and effect sizes.

From: Classifying human vs. AI text with machine learning and explainable transformer models

| Model Pair | p-value | Holm-corrected p | Cohen’s g | ΔF1 | Significant |
|---|---|---|---|---|---|
| XLM-RoBERTa vs. BERT | 0.0184 | 0.0368 | 0.0056 | +0.0051 | |
| XLM-RoBERTa vs. RoBERTa | 0.0195 | 0.0368 | 0.0050 | −0.0046 | |
| BERT vs. RoBERTa | 2.99 × 10⁻⁶ | 8.99 × 10⁻⁶ | 0.0106 | −0.0097 | |
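
The Holm-corrected column can be reproduced from the raw p-values with the standard Holm step-down procedure. The sketch below is an illustration only, not the authors' code: the function name `holm_adjust` and the list layout are assumptions made here, and the only inputs are the three rounded p-values reported in the table. With rounded inputs the third adjusted value comes out near 8.97 × 10⁻⁶ rather than the reported 8.99 × 10⁻⁶, presumably because the published raw p-value was itself rounded.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order.

    Hypothetical helper written for illustration; not taken from the paper.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by m - k + 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted p never drops below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Raw p-values as reported in the table (rounded), in row order:
# XLM-RoBERTa vs. BERT, XLM-RoBERTa vs. RoBERTa, BERT vs. RoBERTa.
raw = [0.0184, 0.0195, 2.99e-6]
adjusted = holm_adjust(raw)

for p, p_adj in zip(raw, adjusted):
    print(f"raw p = {p:.3g}, Holm-adjusted p = {p_adj:.3g}")
```

The monotonicity step explains why the first two rows share the same corrected value: 0.0195 × 1 is smaller than 0.0184 × 2 = 0.0368, so the larger value is carried forward.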