Fig. 7: Risk factors for composite outcome, infection and treatment.
From: Machine learning can identify newly diagnosed patients with CLL at high risk of infection

a Top 10 risk factors for the composite outcome. b Top 2 risk factors specific to infection prior to CLL treatment. c Top 2 risk factors specific to CLL treatment prior to infection. The mean absolute SHAP contribution indicates the magnitude by which a given feature affects the probabilistic output of CLL-TIM. It was calculated using 3720 Danish CLL patients (i.e. the training, validation and test cohorts) and averaged over CLL-TIM's 28 base-learners. Specificity to infection in b was calculated as the mean difference between the feature's SHAP values for patients whose first event was an infection and those whose first event was CLL treatment; the converse was used to calculate specificity to CLL treatment in c. Mean differences in b and c were significant (p < 0.005, one-tailed Mann–Whitney U test). Risk factors identified in a–c were also confirmed using multiple univariate tests (Supplementary Data 2, 3). For box-and-whisker plots, whiskers show the 95% confidence interval, the white square is the mean, the centre line is the median, the box bounds are the interquartile range and black dots are outliers. When a patient had missing data for a given feature, the SHAP contribution was zero.
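The following sketch shows how the quantities in this caption could be computed. It is a minimal, hypothetical illustration and not the authors' code. It assumes that SHAP values are stored as `shap_per_learner[k][i][j]` (base-learner k, patient i, feature j). It also assumes that the mean |SHAP| is computed per base-learner and then averaged across learners, and that first events are labelled with the hypothetical strings `"infection"` and `"treatment"`. The one-tailed Mann–Whitney U test uses a normal approximation with tie and continuity correction. All function names are illustrative.

```python
import math
from statistics import mean


def mean_abs_shap(shap_per_learner):
    """Mean |SHAP| per feature, computed per base-learner, then averaged over learners.

    Assumed layout (hypothetical): shap_per_learner[k][i][j] is the SHAP value of
    feature j for patient i under base-learner k. Missing data is assumed to be
    already encoded as a SHAP value of 0.
    """
    n_features = len(shap_per_learner[0][0])
    per_learner = [
        [mean(abs(row[j]) for row in learner) for j in range(n_features)]
        for learner in shap_per_learner
    ]
    return [mean(lrn[j] for lrn in per_learner) for j in range(n_features)]


def midranks(values):
    """1-based ranks; tied values receive the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_greater(x, y):
    """One-tailed Mann-Whitney U test, H1: values in x tend to exceed those in y.

    Normal approximation with tie correction and continuity correction.
    Returns (U statistic for x, one-tailed p-value). Both groups must be non-empty.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # Tie correction for the variance of U
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t**3 - t for t in counts.values()) / (n * (n - 1))
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term))
    z = (u - n1 * n2 / 2 - 0.5) / sigma  # 0.5 = continuity correction
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability
    return u, p


def infection_specificity(shap_feature, first_event):
    """Mean SHAP difference (infection-first minus treatment-first) for one feature,
    together with the one-tailed p-value. Swap the labels for treatment specificity."""
    inf = [s for s, e in zip(shap_feature, first_event) if e == "infection"]
    trt = [s for s, e in zip(shap_feature, first_event) if e == "treatment"]
    _, p = mann_whitney_greater(inf, trt)
    return mean(inf) - mean(trt), p
```

In this sketch, a feature whose SHAP values are systematically higher in infection-first patients yields a positive difference and a small p-value. A library routine such as `scipy.stats.mannwhitneyu(x, y, alternative="greater")` could replace the hand-written test.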