Table 7 Post-hoc analysis summary across datasets.

From: M-estimation activation functions for high-performance extreme learning machine ensemble classification

| Dataset | Significant Difference (KW Test)? | Significant Pairs (Post-hoc)? | Were Proposed Models Statistically Better? | Non-significant but Outperformed Cases |
| --- | --- | --- | --- | --- |
| Satimage | Yes (p < 0.00001) | Yes (≥ 79 pairs) | Yes; several proposed models (e.g., P4, P3, P10) were significantly better than traditional models (e.g., RAF, Tan-Sig, Sine, Sigmoid) | Some pairs, such as P1 vs. P6, showed better performance without reaching significance |
| Email | Yes (p < 0.00001) | Yes (≥ 83 pairs) | Yes; Proposed 10, 11, 5, and 7 showed superiority over several traditional models | Proposed 10 vs. Proposed 4 showed better scores but lacked significance |
| Breast | Yes (p < 0.00001) | Yes (≥ 66 pairs) | Yes; Proposed 10, 3, and 11 were consistently better than the others | Variants such as P2 and P4 performed well but not always significantly |
| IRIS | Yes (p < 0.00001) | Yes (≥ 30 pairs) | Yes; Proposed 11 was better than the others and performed on par with RAF, Sigmoid, and Tan-Sig | Most models performed comparably; differences are small because IRIS is a simple classification task, and even the base classifiers matched the ensemble models |
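To illustrate the kind of analysis summarized above, the sketch below runs a Kruskal-Wallis omnibus test over the accuracy distributions of several activation-function models and follows it with Bonferroni-corrected pairwise Mann-Whitney U tests as a simple post-hoc step. The score arrays, model names (`P10`, `P4`, `Sigmoid`, `Tan-Sig`), and the choice of Mann-Whitney U with Bonferroni correction are illustrative assumptions, not the exact procedure or data used in the study.

```python
# Minimal sketch: Kruskal-Wallis test followed by a pairwise post-hoc step.
# Scores and model names are hypothetical placeholders; the Bonferroni-corrected
# Mann-Whitney U comparison stands in for whatever post-hoc test the study used.
from itertools import combinations

import numpy as np
from scipy import stats

# Hypothetical per-run accuracy scores for a few activation-function models.
rng = np.random.default_rng(0)
scores = {
    "P10":     rng.normal(0.92, 0.01, 30),
    "P4":      rng.normal(0.91, 0.01, 30),
    "Sigmoid": rng.normal(0.88, 0.02, 30),
    "Tan-Sig": rng.normal(0.87, 0.02, 30),
}

# Omnibus test: do the accuracy distributions differ across models?
h_stat, p_value = stats.kruskal(*scores.values())
print(f"Kruskal-Wallis: H = {h_stat:.3f}, p = {p_value:.2e}")

# Post-hoc step: pairwise Mann-Whitney U tests with Bonferroni correction.
pairs = list(combinations(scores, 2))
alpha = 0.05 / len(pairs)  # Bonferroni-adjusted significance threshold
for a, b in pairs:
    _, p = stats.mannwhitneyu(scores[a], scores[b], alternative="two-sided")
    verdict = "significant" if p < alpha else "not significant"
    print(f"{a} vs. {b}: p = {p:.2e} ({verdict})")
```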