Table 3. Summary of Performance Metrics Across Studies

From: Natural language processing techniques to detect delirium in hospitalized patients from clinical notes: a systematic review

| Study | NLP method | Sensitivity | Specificity | PPV | NPV | F1-score | AUROC/AUC-PR | Validation |
|---|---|---|---|---|---|---|---|---|
| Chen et al.30 | Keyword frequency detection | 61.8%¹ | 85.4%¹ | NR | NR | NR | 0.76 (0.69–0.81)² | Internal: test-retest reliability |
| Amjad et al.24 | Dictionary-based (D-NLP-pos) | 82.8% | 94.1% | 92.6% | 86.0% | 87.5% | 0.917 | Internal: 80/20 split |
| | Dictionary + negation (D-NLP-pos+neg) | 89.4% | 89.4% | 88.3% | 90.4% | 88.8% | 0.914 | |
| | All-tokens (T-NLP-pos) | 84.2% | 63.5% | 67.3% | 81.8% | 74.8% | 0.826 | |
| Fu et al.29 | NLP-CAM (rule-based) | 91.9% | 100% | NR | NR | NR³ | NR | Internal: split sample |
| | NLP-mCAM (rule-based) | 82.7% | 91.3% | NR | NR | NR³ | NR | |
| Amonoo et al.27 | ClinicalRegex NLP | NR | NR | NR | NR | NR | NR | Inter-rater: κ = 0.90 |
| St. Sauver et al.42 | Rule-based NLP | 64% (56–72%) | 84% (74–93%) | NR | NR | NR | NR | Manual review (n = 200) |
| Ge et al.31 | Transformer model | 99.1%⁴ / 98.5%⁵ | NR | 98.6%⁴ / 99.1%⁵ | NR | 97.8%⁶ / 91.8%⁷ | 0.984⁸ | External: LTM dataset |
| Shao et al.32 | LDA topic modeling | 48.1% | NR | 45.5% | NR | 46.8% | NR | Internal: n = 100 |
| | ICD-2 method | 61.2% | NR | 75.7% | NR | 67.7% | NR | |
| | Keyword search | 28.5% | NR | 98.4% | NR | 44.2% | NR | |
| Chen et al.25 | GatorTron (transformer) | 81.19%⁹ / 88.23%¹⁰ | NR | 79.93%⁹ / 86.96%¹⁰ | NR | 80.55%⁹ / 87.59%¹⁰ | NR | Internal: 381/55/110 split |
| Veeranki et al.33 | Random Forest (ML) | NR | NR | NR | NR | NR | 0.8804–0.8857¹¹ | 10-fold CV |
| Pagali et al.28 | NLP-CAM algorithm | 80%¹² | NR | NR | NR | NR | NR | Manual chart review |
| Young et al.43 | NLP-Dx-BD (rule-based) | NR¹³ | NR¹³ | NR | NR | NR | NR | Comparison with CAM-ICU |
| Young et al.35 | NLP-Dx-BD (rule-based) | NR¹³ | NR¹³ | NR | NR | NR | NR | None |
| Mikalsen et al.26 | Elastic net + anchors | NR | NR | NR | NR | NR | 0.962–0.964¹⁴ | Bootstrap CI |

Abbreviations: AUC-PR Area Under the Curve - Precision-Recall, AUROC Area Under the Receiver Operating Characteristic Curve, CAM Confusion Assessment Method, CAM-ICU Confusion Assessment Method for the Intensive Care Unit, CI Confidence Interval, CV Cross-Validation, D-NLP Dictionary-based Natural Language Processing, F1-Score Harmonic mean of precision and recall, ICD International Classification of Diseases, LDA Latent Dirichlet Allocation, LTM Long-Term Memory, ML Machine Learning, NLP Natural Language Processing, NLP-CAM Natural Language Processing - Confusion Assessment Method, NLP-Dx-BD Natural Language Processing Diagnosis - Behavioral Disturbance, NPV Negative Predictive Value, NR Not Reported, PPV Positive Predictive Value, T-NLP Token-based Natural Language Processing.
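The threshold-based columns in Table 3 (sensitivity, specificity, PPV, NPV) all derive from a 2×2 confusion matrix of NLP output against a reference standard, and the note defines the F1-score as the harmonic mean of precision (PPV) and recall (sensitivity). The Python below is a minimal sketch of that arithmetic, not code from any included study: the function names and the confusion-matrix counts are hypothetical. It also recomputes F1 from the reported PPV and sensitivity for the rows that report all three values for the same model; rows whose footnotes may pair values from different settings (e.g. Ge et al.) are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    sensitivity: float  # recall: TP / (TP + FN)
    specificity: float  # TN / (TN + FP)
    ppv: float          # precision: TP / (TP + FP)
    npv: float          # TN / (TN + FN)
    f1: float           # harmonic mean of PPV and sensitivity


def f1_from_ppv_sens(ppv: float, sensitivity: float) -> float:
    """Harmonic mean of precision (PPV) and recall (sensitivity)."""
    if ppv + sensitivity == 0:
        return 0.0
    return 2 * ppv * sensitivity / (ppv + sensitivity)


def metrics_from_counts(tp: int, fp: int, tn: int, fn: int) -> Metrics:
    """Derive the table's threshold metrics from hypothetical 2x2 counts.

    Assumes every denominator is non-zero (illustration only).
    """
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return Metrics(sens, spec, ppv, npv, f1_from_ppv_sens(ppv, sens))


# (PPV, sensitivity, reported F1) in percent, copied from Table 3.
REPORTED = {
    "Amjad D-NLP-pos": (92.6, 82.8, 87.5),
    "Amjad D-NLP-pos+neg": (88.3, 89.4, 88.8),
    "Amjad T-NLP-pos": (67.3, 84.2, 74.8),
    "Shao LDA topic modeling": (45.5, 48.1, 46.8),
    "Shao ICD-2 method": (75.7, 61.2, 67.7),
    "Shao keyword search": (98.4, 28.5, 44.2),
    "Chen GatorTron (footnote 9)": (79.93, 81.19, 80.55),
    "Chen GatorTron (footnote 10)": (86.96, 88.23, 87.59),
}

if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(metrics_from_counts(tp=80, fp=10, tn=90, fn=20))
    for name, (ppv, sens, f1) in REPORTED.items():
        recomputed = 100 * f1_from_ppv_sens(ppv / 100, sens / 100)
        print(f"{name}: reported {f1:.2f}, recomputed {recomputed:.2f}")
```

The recomputed F1 values agree with the reported ones to within about 0.1 percentage point, which is consistent with PPV and sensitivity themselves being rounded in the table; D-NLP-pos, for instance, recomputes to about 87.4% against a reported 87.5%.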