Fig. 1: Setup for developing and benchmarking machine learning (ML) models to predict daily ischemic stroke admissions and their performance. | npj Digital Medicine


From: Machine learning-based forecasting of daily acute ischemic stroke admissions using weather data


a Six years (2015–2020; n = 2190 days) constituted the training set, wherein 5 × 5-fold, time-stratified, nested cross-validation was performed to optimize the hyperparameters of the benchmarked ML models. The optimized models were then applied to the hold-out test set (2021; n = 365 days) in a regression setting. The investigated ML models (horizontal facet panels) included well-established statistical models, such as Poisson regression (baseline) and boosted generalized additive models (GAM), as well as shallow ML algorithms, such as support vector regression (SVR), random forest (RF), and extreme gradient boosting (XGB). For each year (vertical facet panels), the daily numbers of observed (blue lines) and ML-predicted (red lines) acute ischemic stroke (AIS) cases were smoothed over a two-week period. b Combination plot showing the histogram (top) of the observed number of AIS admissions in the test set (2021), alongside the mean absolute error (MAE) of each ML model (center, blue shades). XGB outperformed all other models, achieving near-zero MAE within the 2–4 cases range. c Box and violin plots of residuals (predicted − observed) with mean (\(\bar{X}\)) and median values, along with the corresponding p-values (signif. in black) of pairwise Wilcoxon signed-rank tests (Ntests = 10) after Holm correction. XGB showed the narrowest distribution around 0 (\(\bar{X}\) = 0.28, median = 0). Although SVR had the lowest \(\bar{X}\) = 0.09 (median = 0.17), it produced a broader range of predictions, resulting in significantly lower MAE compared to Poisson (\(\bar{X}\) = median = 0.39, p = 2.2 × 10−16), GAM (\(\bar{X}\) = 0.23, median = 0.5, p = 1.1 × 10−8), RF (\(\bar{X}\) = 0.29, median = 0.49, p = 3.4 × 10−6), and XGB (p = 1.1 × 10−6). RF showed residuals similarly narrow to those of XGB (p = 0.24). Only XGB effectively learned the quantized prediction space of patient counts, while only Poisson predicted very rare days with >5 admissions (Supplementary Fig. 1c, d).
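The nested cross-validation scheme described in panel a can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the feature matrix, the random-forest hyperparameter grid, and the Poisson-distributed counts are all hypothetical stand-ins, and plain contiguous 5-fold splits (no shuffling) serve only as a rough proxy for the paper's time-stratified folds, whose exact construction is not specified in this caption.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

rng = np.random.default_rng(0)
# Hypothetical stand-in data: 2190 training days (2015-2020) with a few
# weather-derived features, and daily AIS admission counts drawn from a
# Poisson distribution for illustration only.
X_train = rng.normal(size=(2190, 5))
y_train = rng.poisson(lam=3.0, size=2190)
X_test = rng.normal(size=(365, 5))   # hold-out year (2021)
y_test = rng.poisson(lam=3.0, size=365)

# 5 x 5-fold nested CV: the inner loop tunes hyperparameters, the outer
# loop estimates generalization error of the tuned model. Contiguous
# (unshuffled) folds stand in for the paper's time-stratified splits.
inner_cv = KFold(n_splits=5, shuffle=False)
outer_cv = KFold(n_splits=5, shuffle=False)

param_grid = {"n_estimators": [25, 50], "max_depth": [4, None]}  # hypothetical grid
tuned_rf = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_mean_absolute_error",
    cv=inner_cv,
)

# One MAE estimate per outer fold, each obtained with inner-loop tuning.
nested_mae = -cross_val_score(
    tuned_rf, X_train, y_train,
    scoring="neg_mean_absolute_error", cv=outer_cv,
)
print(f"nested CV MAE: {nested_mae.mean():.2f}")

# Refit on the full training years, then score once on the hold-out year,
# mirroring the train (2015-2020) / test (2021) split in the figure.
tuned_rf.fit(X_train, y_train)
test_mae = mean_absolute_error(y_test, tuned_rf.predict(X_test))
print(f"hold-out (2021) MAE: {test_mae:.2f}")
```

The same scaffold applies to the other benchmarked models (SVR, XGB, GAM, Poisson regression); only the estimator and its parameter grid change.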
