Fig. 4: Using the baseline proteome to predict critical illness development.

a Log(hazard ratio) ± the standard error for time to development of critical disease using protein abundance at hospital admission. The 50 proteins with the highest significance levels are shown and ordered from high to low log(hazard ratio). n = 249 samples/individuals. b Kaplan-Meier curves showing the association between protein abundance at hospital admission and time to intubation or death. Protein abundance below the median was considered low and abundance above the median was considered high. The bands denote confidence intervals. The proteins with the two highest and two lowest log(hazard ratio) values are shown. n = 249 samples/individuals. c Dot plot of pathways involving proteins whose abundance at hospital admission was associated with time to intubation or death, identified by pathfindR. The 30 most significant clusters are represented by the most significant pathway term of each cluster. Dot size indicates the number of proteins and colour represents fold enrichment. Terms are ordered by significance. BIO = Biocarta, KEGG = Kyoto Encyclopedia of Genes and Genomes, PID = Pathway Interaction Database, RCT = Reactome, WP = WikiPathways. d Clinical and protein abundance data at hospital admission were used to predict critical illness development using Least Absolute Shrinkage and Selection Operator (LASSO) regression. The training cohort consisted of all participants except those who were patients at the Amsterdam UMC, location VUmc hospital (VUMC). The model was trained by performing five-fold cross-validation and tested in the validation cohort. e Feature importance of proteins and clinical variables based on the LASSO models using protein data (green), clinical data (orange) or both (blue). The performance of the models in the training cohorts is shown in receiver operating characteristic (ROC) curves. The mean area under the curve (AUC) and standard error are given.
f Performance of the models on the validation cohort using protein data (green), clinical data (orange) or both (blue). The area under the curve (AUC) is given. Source data are provided in the supplementary Source Data file. Full protein names, estimates and p values are provided in Supplementary Data 5 (a, b) and 12 (c).
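
The median split and Kaplan-Meier estimate described for panel b can be illustrated with a minimal sketch. This is not the authors' analysis code: the function names `split_by_median` and `kaplan_meier` are hypothetical, the numbers are toy values rather than study data, and the handling of values exactly at the median is an assumption, since the caption does not specify it. Confidence bands are not computed here.

```python
from statistics import median


def split_by_median(abundance):
    """Label each sample 'low' or 'high' relative to the median abundance.

    Assumption: samples exactly at the median are left unlabelled (None),
    because the caption does not say how ties were handled.
    """
    cutoff = median(abundance)
    return ["low" if a < cutoff else "high" if a > cutoff else None
            for a in abundance]


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times:  follow-up time per individual
    events: 1 if the outcome (intubation or death) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_leaving
    return curve


# Toy data (hypothetical, not from the study).
abundance = [0.2, 1.5, 0.9, 2.1, 0.4, 1.8]
times = [2, 3, 3, 5, 8, 9]
events = [1, 1, 0, 1, 0, 1]

groups = split_by_median(abundance)
for label in ("low", "high"):
    idx = [k for k, g in enumerate(groups) if g == label]
    curve = kaplan_meier([times[k] for k in idx], [events[k] for k in idx])
    print(label, [(t, round(s, 3)) for t, s in curve])
```

Each group's curve is estimated separately, which is what allows the two curves in a panel to be compared visually; a formal comparison would need an additional test or model that this sketch does not include.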