Fig. 3: CLL-TIM benchmarking on internal test cohort.
From: Machine learning can identify newly diagnosed patients with CLL at high risk of infection

a Kaplan–Meier graphs of infection-free, CLL treatment-free survival for the CLL-TIM high-confidence (HC) predicted high-risk group (yellow curve with 95% confidence intervals) and low-risk group (blue curve with 95% confidence intervals) on a subset of the internal test cohort (BENCH-I, n = 288, of which n = 145 had high-confidence predictions). Patients in BENCH-I have a full CLL-IPI and a full 2-year follow-up. The p-value is from a log-rank test.

b Cumulative incidence plots for the CLL-TIM HC predicted high-risk and low-risk groups, with CLL treatment (dark green), infection (bright green) and death (grey) as first events on BENCH-I.

c Two-year outcome PR-AUC (Precision-Recall Area-Under-Curve) for CLL-TIM and CLL-IPI on BENCH-I. To allow for an equitable comparison, CLL-TIM HC (i.e. with removal of uncertain predictions) was benchmarked against two additional versions of the CLL-IPI score: CLL-IPI with removal of patients in the intermediate-risk category (CLL-IPI NI_4+), and CLL-IPI with removal of patients in the intermediate- and high-risk categories (CLL-IPI NIH_7+). For box-and-whisker plots, whiskers are 95% confidence intervals generated using 5000 bootstrapped datasets sampled from each respective cohort (see Methods), the white square is the mean, the centre line is the median, the bounds of the box are the interquartile range and black dots are outliers. We performed model comparison using a one-tailed Mann–Whitney U-test on the mean difference of the PR-AUC over the bootstrapped datasets. *** indicates p < 0.0005.

d Average missing feature rate for patients in the internal test cohort. Shaded distributions: blue, high-confidence low-risk predictions; grey, low-confidence predictions; gold, high-confidence high-risk predictions. The missing feature rate is the percentage of CLL-TIM’s 228 features that were missing for the given patient. Data are shown for CLL-TIM’s predictions on the internal Danish test cohort (n = 646).
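
As an illustration of the bootstrapped PR-AUC intervals described for panel c, the following is a minimal sketch and not the authors' implementation. It assumes that PR-AUC is computed as step-wise average precision and that the 95% interval is taken as the 2.5th to 97.5th percentile of the bootstrap distribution; the paper's Methods may define either differently. The function names (`average_precision`, `bootstrap_pr_auc`, `percentile_interval`) and the toy labels and scores are hypothetical.

```python
import random


def average_precision(y_true, y_score):
    """Step-wise PR-AUC: mean precision at the rank of each positive.

    Assumption: tied scores are ordered arbitrarily (no tie correction).
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("PR-AUC is undefined without positive labels")
    # Rank patients from highest to lowest predicted risk.
    order = sorted(range(len(y_true)), key=lambda i: y_score[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


def bootstrap_pr_auc(y_true, y_score, n_boot=5000, seed=0):
    """PR-AUC on n_boot datasets resampled with replacement."""
    if sum(y_true) == 0:
        raise ValueError("PR-AUC is undefined without positive labels")
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        labels = [y_true[i] for i in idx]
        if sum(labels) == 0:
            continue  # resample contains no events; draw again
        scores = [y_score[i] for i in idx]
        stats.append(average_precision(labels, scores))
    return stats


def percentile_interval(values, lower=2.5, upper=97.5):
    """Percentile interval (assumed here to correspond to the whiskers)."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return (ordered[round(lower / 100 * last)],
            ordered[round(upper / 100 * last)])


if __name__ == "__main__":
    # Hypothetical toy data: 1 = event within 2 years, 0 = no event.
    labels = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
    scores = [0.9, 0.2, 0.7, 0.4, 0.5, 0.1, 0.8, 0.3, 0.6, 0.35]
    boot = bootstrap_pr_auc(labels, scores, n_boot=1000, seed=1)
    print(percentile_interval(boot))
```

The two bootstrap distributions obtained this way for two models could then be compared with a one-tailed Mann–Whitney U-test, as the caption describes; that step is omitted here to keep the sketch free of external dependencies.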