Table 1 A simple model, produced by tree boosting (XGBoost), to classify human proteins as ageing-related or non-ageing-related.

From: Prediction and characterization of human ageing-related proteins by using machine learning

| feature ID | description of the feature | category | score | relative frequency in ageing/non-ageing |
| --- | --- | --- | --- | --- |
| ageing_n_0 | number of ageing-related neighbours = 0 | Net | −2.896 | 38.8/92.1 |
| ageing_n_1 | number of ageing-related neighbours = 1 | Net | −2.275 | 15.8/5.6 |
| ageing_n_2 | number of ageing-related neighbours = 2 | Net | −1.168 | 15.1/1.4 |
| ageing_n_3_4 | number of ageing-related neighbours = 3,4 | Net | −0.744 | 12.8/0.6 |
| GO:0043567 | regulation of insulin-like growth factor receptor signaling pathway | BP | 1.327 | 2.6/0.1 |
| GO:0006979 | response to oxidative stress | BP | 0.9 | 21.7/1.4 |
| GO:0003684 | damaged DNA binding | MF | 0.837 | 8.6/0.2 |
| GO:0009987 | cellular process | BP | 0.805 | 99.3/70.0 |
| GO:0005576 | extracellular region | CC | 0.636 | 21.7/8.8 |
| GO:0065008 | regulation of biological quality | BP | 0.563 | 60.2/14.9 |
| GO:0051276 | chromosome organization | BP | 0.515 | 14.5/1.6 |
| GO:0032502 | developmental process | BP | 0.497 | 69.4/22.5 |
| GO:0043066 | negative regulation of apoptotic process | BP | 0.474 | 32.9/3.5 |
| GO:0009628 | response to abiotic stimulus | BP | 0.441 | 38.2/4.4 |
| GO:0007169 | transmembrane receptor protein tyrosine kinase signaling pathway | BP | 0.413 | 19.1/2.1 |
| GO:0010332 | response to gamma radiation | BP | 0.411 | 8.6/0.1 |
| GO:0019838 | growth factor binding | MF | 0.405 | 5.3/0.4 |
| GO:0040008 | regulation of growth | BP | 0.398 | 22.0/2.8 |
| GO:0044710 | single-organism metabolic process | BP | 0.388 | 42.1/15.4 |
| GO:0031325 | positive regulation of cellular metabolic proc | BP | 0.331 | 64.8/12.8 |
| GO:0050896 | response to stimulus | BP | 0.288 | 77.3/22.8 |
| GO:0031667 | response to nutrient levels | BP | 0.285 | 16.8/1.5 |
| GO:0005515 | protein binding | MF | 0.271 | 75.7/24.4 |
| GO:2000377 | regulation of reactive oxygen species metabolic process | BP | 0.259 | 13.8/0.6 |
| GO:0051716 | cellular response to stimulus | BP | 0.257 | 62.2/11.1 |
| GO:0005654 | nucleoplasm | CC | 0.235 | 49.7/14.1 |
| GO:0080135 | regulation of cellular response to stress | BP | 0.225 | 27.3/2.6 |
| GO:0048511 | rhythmic process | BP | 0.224 | 15.1/1.2 |
| GO:0044427 | chromosomal part | CC | 0.197 | 24.0/3.4 |
| ageing_n_5+ | number of ageing-related neighbours ≥ 5 | Net | 0.192 | 17.4/0.2 |
| GO:0003682 | chromatin binding | MF | 0.171 | 17.1/2.1 |
| GO:0006974 | cellular response to DNA damage stimulus | BP | 0.167 | 27.6/3.1 |
| GO:0097159 | organic cyclic compound binding | MF | 0.166 | 62.8/28.8 |
| GO:0005739 | mitochondrion | CC | 0.16 | 20.4/6.1 |
| GO:0019899 | enzyme binding | MF | 0.128 | 39.8/6.8 |
| GO:0009894 | regulation of catabolic process | BP | 0.125 | 25.7/3.4 |

  1. Features are listed by ID and description. Feature category can take the values “Net” (Network), “MF” (Molecular Function), “CC” (Cellular Component), or “BP” (Biological Process). The table contains only binary (true or false) features. For each protein, the predicted ageing relevance can be computed as follows: for each row of the table, check whether the feature is true for the protein, then add up the scores of the features that are true. The larger the final sum, the more important the role the protein is predicted to play in the human ageing process. For example, suppose that a protein has 3 ageing-related neighbours and its UniProt record contains only two GO terms, “response to oxidative stress” and “regulation of growth”. Then the predicted ageing relevance of that protein is −0.744 + 0.9 + 0.398 = 0.554. Predicted scores produced by this summation method are presented in the “Table1_pred” column of Supplementary Table S1. Scores obtained by summation are not necessarily bounded by 1. The actual output of XGBoost, which we used in the rest of the paper, was normalized to take values in [0, 1]. In fact, we use the average of the normalized predicted values made by several models (see the Methods). The relative frequency of features in the ageing-related and the non-ageing-related sets of proteins, a value independent of our particular model, is displayed in the last column.
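
The summation rule in the footnote can be illustrated with a minimal Python sketch. It is not the authors' code: the names `TABLE1_SCORES`, `neighbour_feature`, and `summed_score` are hypothetical, and only a few rows of Table 1 are copied in, so any GO term not listed in the dictionary contributes nothing. The sketch also assumes that exactly one neighbour-count feature is true for each protein.

```python
# Scores copied from a subset of the rows of Table 1 (illustrative only;
# extend the dictionary with the remaining rows to reproduce the full model).
TABLE1_SCORES = {
    "ageing_n_0": -2.896,
    "ageing_n_1": -2.275,
    "ageing_n_2": -1.168,
    "ageing_n_3_4": -0.744,
    "ageing_n_5+": 0.192,
    "GO:0006979": 0.9,    # response to oxidative stress
    "GO:0040008": 0.398,  # regulation of growth
}


def neighbour_feature(n_ageing_neighbours: int) -> str:
    """Map a neighbour count to the single network feature that is true for it."""
    if n_ageing_neighbours < 0:
        raise ValueError("neighbour count must be non-negative")
    if n_ageing_neighbours <= 2:
        return f"ageing_n_{n_ageing_neighbours}"
    if n_ageing_neighbours <= 4:
        return "ageing_n_3_4"
    return "ageing_n_5+"


def summed_score(n_ageing_neighbours, go_terms, scores=TABLE1_SCORES):
    """Add up the scores of all features that are true for the protein."""
    features = {neighbour_feature(n_ageing_neighbours)} | set(go_terms)
    # Features absent from the score table contribute 0 to the sum.
    return sum(scores.get(feature, 0.0) for feature in sorted(features))


# Worked example from the footnote: 3 ageing-related neighbours and two GO terms.
example = summed_score(3, ["GO:0006979", "GO:0040008"])
print(round(example, 3))  # 0.554
```

The result is the unnormalized sum described in the footnote, not the normalized [0, 1] output of XGBoost used in the rest of the paper.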