Table 3 Performance Comparison of RF and BERT + LSTM Models for the ITU Test Set

From: AI assisted prediction of unplanned intensive care admissions using natural language processing in elective neurosurgery

 

Each cell gives mean (CI).

| Group | RF: Prec ↑ | RF: Recall ↑ | RF: F1 ↑ | RF: FN ↓ | BERT + LSTM: Prec ↑ | BERT + LSTM: Recall ↑ | BERT + LSTM: F1 ↑ | BERT + LSTM: FN ↓ |
|---|---|---|---|---|---|---|---|---|
| Ward | 0.88 (0.84–0.92) | 1.00 (1.00–1.00) | 0.94 (0.91–0.96) | - | 0.84 (0.80–0.89) | 1.00 (1.00–1.00) | 0.92 (0.89–0.94) | - |
| ITU | 1.00 (1.00–1.00) | 0.87 (0.82–0.91) | 0.93 (0.90–0.95) | 0.13 (0.09–0.18) | 1.00 (1.00–1.00) | 0.82 (0.77–0.87) | 0.90 (0.87–0.93) | 0.18 (0.13–0.23) |
| Planned | 1.00 (1.00–1.00) | 0.85 (0.80–0.91) | 0.92 (0.88–0.95) | 0.15 (0.09–0.21) | 1.00 (1.00–1.00) | 0.81 (0.74–0.87) | 0.89 (0.85–0.93) | 0.19 (0.13–0.26) |
| Unplanned | 1.00 (1.00–1.00) | 0.89 (0.82–0.96) | 0.94 (0.90–0.98) | 0.11 (0.04–0.18) | 1.00 (1.00–1.00) | 0.86 (0.77–0.93) | 0.92 (0.87–0.96) | 0.14 (0.07–0.23) |

  1. Results are averaged over 500 runs for the RF model and 5 runs for the BERT + LSTM model. We report precision, recall, F1-score, and False Negative (FN) ratio for each patient group, with a breakdown of the ITU group into planned and unplanned admissions. To estimate the mean and 95% confidence intervals, we employed bootstrapping by resampling the test examples with replacement 1000 times.
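
The bootstrapping procedure described in the note can be illustrated with a minimal sketch. It assumes a percentile interval and a binary ITU-versus-ward labelling, neither of which the note specifies, and the function names `recall_and_fn_ratio` and `bootstrap_recall_ci` and the toy labels are hypothetical rather than taken from the paper. It resamples the test examples with replacement 1000 times and reports the mean and 95% interval of recall; the FN ratio is computed as 1 − recall, which matches the paired Recall and FN values in the table.

```python
import numpy as np


def recall_and_fn_ratio(y_true, y_pred, positive=1):
    """Return (recall, FN ratio) for the positive class; NaN if it is absent."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    is_pos = y_true == positive
    n_pos = int(is_pos.sum())
    if n_pos == 0:
        return float("nan"), float("nan")
    recall = float(np.sum(is_pos & (y_pred == positive)) / n_pos)
    return recall, 1.0 - recall


def bootstrap_recall_ci(y_true, y_pred, n_resamples=1000, alpha=0.05, seed=0):
    """Mean and percentile CI of recall over bootstrap resamples of the test set.

    Assumption: a percentile interval; the source does not say which
    bootstrap interval was used.
    """
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n = len(y_true)
    recalls = np.empty(n_resamples)
    for b in range(n_resamples):
        # Draw n test examples with replacement.
        idx = rng.integers(0, n, size=n)
        recalls[b], _ = recall_and_fn_ratio(y_true[idx], y_pred[idx])
    mean = float(np.nanmean(recalls))
    lo, hi = np.nanpercentile(recalls, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return mean, float(lo), float(hi)


# Hypothetical toy labels (1 = ITU, 0 = ward); not data from the paper.
toy_true = np.array([1] * 50 + [0] * 50)
toy_pred = np.array([1] * 40 + [0] * 10 + [0] * 50)  # 10 missed ITU cases
mean, lo, hi = bootstrap_recall_ci(toy_true, toy_pred)
print(f"recall: {mean:.2f} ({lo:.2f}-{hi:.2f})")
```

Precision and F1 intervals can be obtained the same way by swapping the metric computed inside the loop.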