Table 3 Summary of predictive performance of the best ML models for classification tasks.

From: Explainable AI reveals changes in skin microbiome composition linked to phenotypic differences

| Phenotype | Dataset | Model | F1 score per class (test) | F1 score (test) | F1 score (train) | Precision per class (test) | Recall per class (test) | Mean F1-score (CV) | Std F1-score (CV) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Menopausal status | 62 first samples | LightGBM | [0.86, 0.95] | 0.92 | 0.98 | [1.0, 0.9] | [0.75, 1.0] | 0.93 | 0.06 |
| Menopausal status | 1200 samples blocked by individual | XGBoost | [0.89, 0.75] | 0.85 | 1 | [0.89, 0.75] | [0.89, 0.75] | 0.82 | 0.07 |
| Smoking status | 62 first samples | XGBoost | [0.89, 0.75] | 0.85 | 0.98 | [0.89, 0.75] | [0.89, 0.75] | 0.72 | 0.12 |
| Smoking status | 1200 samples blocked by individual | LightGBM | [0.88, 0.10] | 0.74 | 1.0 | [0.82, 0.21] | [0.94, 0.07] | 0.93 | 0.08 |

  1. Three ML models (RF, LightGBM, XGBoost) were evaluated on different subsets of the Canada cohort (62 samples taken from the subjects at the first time point, and 1200 time series samples blocked by individual). When applied to the time series samples, the ML models were tuned and trained blocking by individual, i.e., samples from the same subject never appear in both the training and test datasets. The table reports the F1-score, precision and recall per class as computed on the test dataset, and the weighted average F1-score on the test and training datasets and in cross-validation. The table reports the performance scores of the best fine-tuned model per dataset and phenotype, while Supplementary Table 2 shows the full list.
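
As an illustration of the two ideas in the note above (blocking by individual, and per-class versus weighted average scores), here is a minimal pure-Python sketch. It is not the authors' pipeline: the function names `blocked_split`, `per_class_scores` and `weighted_f1` are hypothetical, and the toy labels are an assumption. The test-set class sizes of 4 and 9 are not stated in the table; they were chosen only because, with a single misclassified sample, they happen to give values that round to the first row of the table.

```python
import random


def blocked_split(subject_ids, test_fraction=0.2, seed=0):
    """Split sample indices so each subject lands entirely in train or in test."""
    subjects = sorted(set(subject_ids))
    random.Random(seed).shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train_idx = [i for i, s in enumerate(subject_ids) if s not in test_subjects]
    test_idx = [i for i, s in enumerate(subject_ids) if s in test_subjects]
    return train_idx, test_idx


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)} for each label in y_true."""
    scores = {}
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        support = sum(t == label for t in y_true)
        precision = tp / predicted if predicted else 0.0
        recall = tp / support
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        scores[label] = (precision, recall, f1, support)
    return scores


def weighted_f1(scores):
    """Average the per-class F1 scores, weighting each class by its support."""
    total = sum(s[3] for s in scores.values())
    return sum(s[2] * s[3] for s in scores.values()) / total


# Hypothetical time series data: 6 subjects with 3 samples each.
subject_ids = [s for s in "ABCDEF" for _ in range(3)]
train_idx, test_idx = blocked_split(subject_ids, test_fraction=0.33)
train_subjects = {subject_ids[i] for i in train_idx}
test_subjects = {subject_ids[i] for i in test_idx}
print("shared subjects:", train_subjects & test_subjects)

# Assumed toy test set: 4 samples of class 0, 9 of class 1, one class-0 error.
y_true = [0] * 4 + [1] * 9
y_pred = [0, 0, 0, 1] + [1] * 9
scores = per_class_scores(y_true, y_pred)
for label, (precision, recall, f1, support) in scores.items():
    print(label, round(precision, 2), round(recall, 2), round(f1, 2), support)
print("weighted F1:", round(weighted_f1(scores), 2))
```

Splitting on subject identifiers rather than on individual samples is what prevents repeated measurements of one person from leaking between the training and test sets. In practice, a library routine such as scikit-learn's `GroupKFold` or `f1_score(..., average="weighted")` would normally replace these hand-written helpers.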