Figure 2
From: Discovery of novel CSF biomarkers to predict progression in dementia using machine learning

Performance evaluation of different RF classifiers. Four RF classifiers were trained and evaluated on the independent test set. The Olink + age model was trained on all protein measurements and age; the Olink model was trained on the 810 protein measurements only; the age model was trained on age only; and the Random model was trained on the 810 protein measurements with shuffled labels. The four models were evaluated on four metrics: ROC–AUC, F1 score (F1), accuracy (ACC), and balanced accuracy (BACC). (a) Receiver operating characteristic (ROC) and (b) precision-recall (PR) curves show the performance of the RF classifiers. The models using all protein expression values achieve the highest ROC–AUC and PR-AUC, 0.82 and 0.86, respectively. (c) Evaluation metrics for the four models. Stratified train/test splits were performed with 10 different random seeds. (d) The upper bar plot shows the 20 most important features for predicting the rate of decline based on the Olink model. The lower bar plot shows the correlation of each feature with progression. Most of the biomarkers are negatively correlated with progression; that is, lower abundance of these proteins in the CSF is associated with faster decline in the MMSE trajectory. Protein relative abundance distributions of all selected biomarkers are shown in Fig. S1. The feature importance and correlation analysis for the Olink + age model are displayed in Fig. S2.
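For readers unfamiliar with the four metrics reported in panel (c), the sketch below writes out their standard definitions in plain Python. It is an illustration only, not the authors' code: it assumes a binary label (for example 1 for faster decline and 0 for slower decline), and the labels, scores, and 0.5 threshold are hypothetical toy values that do not come from the study.

```python
# Toy illustration of the metrics named in the caption (ROC-AUC, F1, ACC, BACC).
# All data below are hypothetical; nothing here reproduces the study's results.

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(y_true, y_pred):
    """ACC: fraction of samples classified correctly."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)

def balanced_accuracy(y_true, y_pred):
    """BACC: mean of sensitivity and specificity, robust to class imbalance."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2

def f1_score(y_true, y_pred):
    """F1: harmonic mean of precision and recall for the positive class."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return 2 * tp / (2 * tp + fp + fn)

def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a random positive sample is scored
    higher than a random negative sample (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and predicted probabilities for eight test samples.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

print("ACC  =", round(accuracy(y_true, y_pred), 3))
print("BACC =", round(balanced_accuracy(y_true, y_pred), 3))
print("F1   =", round(f1_score(y_true, y_pred), 3))
print("AUC  =", round(roc_auc(y_true, scores), 3))
```

ACC, BACC, and F1 depend on the chosen decision threshold, whereas ROC–AUC is computed from the ranking of the scores alone; with imbalanced classes, ACC and BACC can differ noticeably, which is one reason to report both.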