Fig. 15
From: A practical evaluation of AutoML tools for binary, multiclass, and multilabel classification

Post-hoc p-value distributions for each scenario, showing pairwise significance between frameworks for \(F_1\) score and training time. Colors encode the significance level: green for strong, yellow for moderate, and red for weak. The red dashed line marks the \(\alpha\) threshold.
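
As a rough illustration of the color coding described in the caption, the following minimal sketch maps pairwise p-values to color bands. The caption does not state the value of \(\alpha\) or the cutoffs between the bands, so `ALPHA = 0.05`, the `strong_factor` cutoff, the framework names, and the p-values are all hypothetical placeholders and are not taken from the paper.

```python
# Hypothetical sketch: bin pairwise post-hoc p-values into color bands.
# All thresholds and data below are assumptions for illustration only.

ALPHA = 0.05  # assumed significance threshold; not stated in the caption


def significance_band(p_value, alpha=ALPHA, strong_factor=0.2):
    """Return a color band for a p-value.

    Assumed rule (not from the paper):
      green  -> p < alpha * strong_factor   (strong significance)
      yellow -> p < alpha                   (moderate significance)
      red    -> otherwise                   (weak or no significance)
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must lie in [0, 1]")
    if p_value < alpha * strong_factor:
        return "green"
    if p_value < alpha:
        return "yellow"
    return "red"


# Made-up p-values for three placeholder frameworks A, B, and C.
pairwise_p = {
    ("A", "B"): 0.001,
    ("A", "C"): 0.03,
    ("B", "C"): 0.4,
}

# Color band for each framework pair.
bands = {pair: significance_band(p) for pair, p in pairwise_p.items()}

for pair, band in bands.items():
    print(pair, pairwise_p[pair], band)
```

Under these assumed cutoffs, a pair is flagged as significant when its p-value falls below `ALPHA`, which corresponds to the side of the dashed line that the caption associates with the threshold.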