Table 4 Feature selection process driven by performance of XGBoost on different feature sets.

From: Prediction and characterization of human ageing-related proteins by using machine learning

| short description of the feature set | number of features | depth of trees | number of trees | number of predictions | AUC (average) | AUC (std dev) |
|---|---|---|---|---|---|---|
| GO w/o ancestors, with ageing GOs | 16820 | 6 | 20 | 20 | 0.8787 | 0.0061 |
| GO w/o ancestors | 16800 | 6 | 20 | 20 | 0.8729 | 0.0050 |
| GO | 21000 | 6 | 20 | 20 | 0.9086 | 0.0049 |
| GO XGBoost one pass filter | 373 | 6 | 20 | 20 | 0.9187 | 0.0042 |
| GO XGBoost two pass filter | 65 | 6 | 20 | 20 | 0.9219 | 0.0033 |
| GO XGBoost two pass filter UniNet, CoExp | 79 | 6 | 20 | 20 | 0.9294 | 0.0034 |
| GO XGBoost two pass filter, UniNet | 78 | 6 | 20 | 20 | 0.9293 | 0.0036 |
| GO XGBoost two pass filter, degree | 66 | 6 | 20 | 20 | 0.9283 | 0.0027 |
| GO XGBoost two pass filter, ageing_n | 66 | 6 | 20 | 20 | 0.9314 | 0.0029 |
| GO XGBoost three pass filter, ageing_n | 32 | 1 | 50 | 20 | 0.9322 | 0.0011 |

  1. Performance of different feature sets, listed from weakest to strongest, compared by the classification performance of 20 predictions each. The default setting for Gene Ontology (GO) features is “without ageing GOs but with GO ancestors”; rows that deviate from this default are marked. For each feature set (row), we list the number of features, the depth and number of trees in the model, and the average and standard deviation of the AUC values generated by 20 predictions with 5-fold cross-validation. “UniNet” denotes the set of network features (including degree, ageing_n, and the remaining network features); “CoExp” denotes the co-expression feature.
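
The evaluation protocol described in this note (20 predictions, each a 5-fold cross-validation, summarised by the average and standard deviation of AUC) could be sketched roughly as follows. This is a minimal standard-library illustration, not the authors' code. The names `auc`, `repeated_cv_auc`, and the `fit_predict` callback are hypothetical; in practice the callback would wrap an XGBoost model configured with the tree depth and tree count listed in the table. The sketch also assumes unstratified folds, one AUC per prediction computed over the pooled out-of-fold scores, and the sample standard deviation; the source does not specify these details.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]
# Hypothetical callback: train on (X_train, y_train), return one score per row of X_test.
FitPredict = Callable[[List[Row], List[int], List[Row]], List[float]]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    # Count correctly ordered positive/negative pairs; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def repeated_cv_auc(
    X: Sequence[Row],
    y: Sequence[int],
    fit_predict: FitPredict,
    n_predictions: int = 20,  # "number of predictions" column
    n_folds: int = 5,         # 5-fold cross-validation
    seed: int = 0,
) -> Tuple[float, float]:
    """Return (average AUC, std dev of AUC) over repeated k-fold cross-validation."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_predictions):
        idx = list(range(len(y)))
        rng.shuffle(idx)  # a fresh random split for every prediction
        folds = [idx[k::n_folds] for k in range(n_folds)]
        out_of_fold = [0.0] * len(y)
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in idx if i not in held_out]
            preds = fit_predict(
                [X[i] for i in train_idx],
                [y[i] for i in train_idx],
                [X[i] for i in test_idx],
            )
            for i, p in zip(test_idx, preds):
                out_of_fold[i] = p
        # One AUC per prediction, computed over all out-of-fold scores.
        aucs.append(auc(y, out_of_fold))
    return statistics.mean(aucs), statistics.stdev(aucs)
```

Calling `repeated_cv_auc(X, y, fit_predict)` once per feature set would yield one (average, std dev) pair per table row, which is how the rows can be compared on equal footing.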