Table 2 Optimal number of features, corresponding feature subsets and model performances for six ML models obtained by exhaustive screening

From: Machine learning for phase prediction of high entropy carbide ceramics from imbalanced data

| Algorithm | Feature number | Best feature subset | Best AUC value |
| --- | --- | --- | --- |
| SVM.rbf | 6 | \(\sigma_{VEC}\), \(\sigma_{\chi_{\text{p}}}\), \(\Delta S_{\text{conf}}\), \(\overline{r_{\text{Me}}}\), \(\sigma_{r_{\text{Me}}}\), \(\overline{z^{*}}\) | 88.58% |
| SVM.linear | 8 | \(\sigma_{\chi_{\text{p}}}\), \(\sigma_{\chi_{\text{m}}}\), \(\Delta S_{\text{conf}}\), \(\overline{r_{\text{Me}}}\), \(\sigma_{r_{\text{Me}}}\), \(\sigma_{Z^{*}}\) | 84.28% |
| SVM.poly | 6 | \(\sigma_{VEC}\), \(\sigma_{\chi_{\text{p}}}\), \(\sigma_{\chi_{\text{m}}}\), \(\Delta S_{\text{conf}}\), \(\bar{l}\), \(\overline{r_{\text{Me}}}\), \(\sigma_{r_{\text{Me}}}\), \(\overline{z^{*}}\) | 88.46% |
| RF | 5 | \(\sigma_{VEC}\), \(\overline{\chi_{\text{p}}}\), \(\sigma_{I_{1}}\), \(\overline{z^{*}}\), Λ | 90.22% |
| XGB | 6 | \(\overline{r_{\text{Me}}}\), Λ, \(\overline{\chi_{\text{p}}}\), \(\sigma_{I_{1}}\), \(\sigma_{VEC}\), \(\overline{z^{*}}\) | 89.92% |
| LOG | 7 | \(\sigma_{VEC}\), \(\sigma_{\chi_{\text{p}}}\), \(\sigma_{\chi_{\text{m}}}\), \(\Delta S_{\text{conf}}\), \(\overline{r_{\text{Me}}}\), \(\sigma_{r_{\text{Me}}}\), \(\sigma_{I_{1}}\) | 83.15% |
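
The exhaustive screening named in the caption can be illustrated with a minimal sketch: score every feature subset of every size, keep the best subset per size, then pick the size whose best subset scores highest. This is not the authors' code. The function names (`exhaustive_screen`, `optimal_size`, `toy_score`) and the plain-text feature labels are hypothetical, and `toy_score` is only a stand-in for whatever performance metric is used; in a real run it would presumably be replaced by a cross-validated AUC of the chosen model trained on that subset.

```python
from itertools import combinations
from typing import Callable, Dict, Sequence, Tuple

Subset = Tuple[str, ...]


def exhaustive_screen(
    features: Sequence[str],
    score_fn: Callable[[Subset], float],
) -> Dict[int, Tuple[Subset, float]]:
    """For each subset size k, return the best-scoring subset and its score."""
    best_per_size: Dict[int, Tuple[Subset, float]] = {}
    for k in range(1, len(features) + 1):
        # Enumerate every subset of size k: no heuristics, no pruning.
        scored = ((subset, score_fn(subset)) for subset in combinations(features, k))
        best_per_size[k] = max(scored, key=lambda pair: pair[1])
    return best_per_size


def optimal_size(
    best_per_size: Dict[int, Tuple[Subset, float]],
) -> Tuple[int, Subset, float]:
    """Pick the subset size whose best subset has the highest score."""
    k = max(best_per_size, key=lambda size: best_per_size[size][1])
    subset, score = best_per_size[k]
    return k, subset, score


def toy_score(subset: Subset) -> float:
    """Hypothetical stand-in for a cross-validated AUC (illustration only).

    Rewards two arbitrarily chosen "useful" features and slightly
    penalises every additional feature.
    """
    useful = {"sigma_VEC", "mean_z_star"}
    return 0.5 + 0.2 * len(useful & set(subset)) - 0.01 * len(subset)


# Hypothetical plain-text labels for a handful of descriptors.
features = ["sigma_VEC", "mean_chi_p", "sigma_I1", "mean_z_star"]
k, subset, score = optimal_size(exhaustive_screen(features, toy_score))
print(k, subset, round(score, 2))
```

The number of subsets grows as 2^n - 1 for n candidate features, so this brute-force approach is only practical for a modest descriptor pool.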