Table 2 Performance of deep learning NLP models to characterize statin nonuse from unstructured clinical notes in persons with ASCVD.

From: Using deep learning-based natural language processing to identify reasons for statin nonuse in patients with atherosclerotic cardiovascular disease

| Task | Dataset | Precision* | Recall* | F1 score* | AUC* |
| --- | --- | --- | --- | --- | --- |
| Binary classification of statin use | 10-fold cross-validation (N = 1,393) | 0.88 (0.86–0.90) | 0.82 (0.77–0.87) | 0.85 (0.83–0.87) | 0.94 (0.93–0.95) |
| | Test set (N = 349) | 0.87 (0.82–0.91) | 0.82 (0.76–0.88) | 0.84 (0.81–0.88) | 0.94 (0.93–0.96) |
| Two-step classifier* for statin nonuse reasons | 10-fold cross-validation (N = 800) | 0.63 (0.59–0.65) | 0.62 (0.54–0.72) | 0.62 (0.59–0.64) | 0.84 (0.81–0.85) |
| | Test set (N = 200) | 0.68 (0.63–0.75) | 0.69 (0.60–0.79) | 0.68 (0.62–0.75) | 0.88 (0.86–0.91) |
| Multilabel classification of statin nonuse reasons (simple multilabel model) | 10-fold cross-validation (N = 800) | 0.60 (0.58–0.64) | 0.61 (0.56–0.66) | 0.59 (0.56–0.63) | 0.85 (0.83–0.87) |
| | Test set (N = 200) | 0.64 (0.61–0.70) | 0.66 (0.60–0.73) | 0.64 (0.58–0.71) | 0.86 (0.82–0.89) |

  1. *The two-step classifier represents the predicted probabilities of multiple classifiers (each reason for statin nonuse versus others) reconciled by a Random Forest.
  2. ASCVD atherosclerotic cardiovascular disease, NLP natural language processing.
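
Footnote 1 describes the two-step classifier only briefly, so a small sketch of the general pattern may help. It is not the authors' implementation. It assumes scikit-learn, uses `LogisticRegression` as a stand-in for the deep learning models, runs on synthetic two-dimensional data rather than clinical notes, and treats each note as having a single reason label. The names `fit_two_step` and `predict_two_step` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict


def fit_two_step(X, y, n_classes, seed=0):
    """Step 1: one-vs-rest probability models. Step 2: a Random Forest over their outputs."""
    step1_models = []
    prob_columns = []
    for k in range(n_classes):
        target = (y == k).astype(int)  # reason k versus all other reasons
        clf = LogisticRegression(max_iter=1000)  # assumption: stand-in for a deep model
        # Out-of-fold probabilities, so step 2 is not trained on leaked predictions.
        oof = cross_val_predict(clf, X, target, cv=5, method="predict_proba")[:, 1]
        prob_columns.append(oof)
        step1_models.append(clf.fit(X, target))
    meta_X = np.column_stack(prob_columns)
    # Step 2: the Random Forest reconciles the per-reason probabilities into one label.
    forest = RandomForestClassifier(n_estimators=200, random_state=seed)
    forest.fit(meta_X, y)
    return step1_models, forest


def predict_two_step(step1_models, forest, X):
    """Stack the step-1 probabilities and let the forest pick the final class."""
    meta_X = np.column_stack([m.predict_proba(X)[:, 1] for m in step1_models])
    return forest.predict(meta_X)


# Synthetic demo: three well-separated clusters standing in for three nonuse reasons.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
y = np.repeat(np.arange(3), 60)
X = centers[y] + rng.normal(size=(180, 2))

step1_models, forest = fit_two_step(X, y, n_classes=3)
pred = predict_two_step(step1_models, forest, X)
train_accuracy = float((pred == y).mean())
```

Training the forest on out-of-fold probabilities is a design choice of this sketch; the footnote does not say how the step-1 outputs were produced for the second step.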