Table 1 A comparison of traditional NLP models (top) vs pretrained LLMs (bottom)

From: The foundational capabilities of large language models in predicting postoperative risks using clinical notes

Each cell gives the mean, followed by the CI in parentheses.

| Model | Death in 30 days AUROC | Death in 30 days AUPRC | PE AUROC | PE AUPRC | Pneumonia AUROC | Pneumonia AUPRC | DVT AUROC | DVT AUPRC | AKI AUROC | AKI AUPRC | Delirium AUROC | Delirium AUPRC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| cbow | 0.528 (0.409, 0.648) | 0.023 (0.015, 0.031) | 0.506 (0.418, 0.593) | 0.004 (0.002, 0.006) | 0.526 (0.384, 0.668) | 0.009 (0.001, 0.016) | 0.524 (0.457, 0.59) | 0.006 (0.005, 0.007) | 0.56 (0.488, 0.632) | 0.156 (0.125, 0.187) | 0.501 (0.464, 0.539) | 0.474 (0.441, 0.507) |
| Doc2vec | 0.479 (0.348, 0.611) | 0.021 (0.012, 0.03) | 0.517 (0.466, 0.567) | 0.004 (0.004, 0.004) | 0.495 (0.347, 0.643) | 0.006 (0.003, 0.01) | 0.531 (0.443, 0.619) | 0.007 (0.004, 0.01) | 0.523 (0.421, 0.624) | 0.146 (0.092, 0.199) | 0.484 (0.439, 0.53) | 0.466 (0.417, 0.514) |
| fastText | 0.725 (0.67, 0.781) | 0.05 (0.04, 0.06) | 0.652 (0.602, 0.701) | 0.007 (0.005, 0.01) | 0.696 (0.643, 0.749) | 0.016 (0.008, 0.024) | 0.694 (0.642, 0.746) | 0.014 (0.011, 0.017) | 0.726 (0.702, 0.75) | 0.273 (0.239, 0.307) | 0.565 (0.541, 0.589) | 0.533 (0.513, 0.554) |
| GloVe | 0.818 (0.807, 0.83) | 0.128 (0.118, 0.139) | 0.664 (0.628, 0.701) | 0.01 (0.007, 0.013) | 0.765 (0.732, 0.799) | 0.04 (0.017, 0.063) | 0.723 (0.7, 0.745) | 0.019 (0.013, 0.024) | 0.81 (0.805, 0.815) | 0.441 (0.43, 0.451) | 0.666 (0.652, 0.681) | 0.636 (0.613, 0.66) |
| bioClinicalBERT | 0.85 (0.84, 0.861) | 0.156 (0.138, 0.173) | 0.683 (0.621, 0.745) | 0.008 (0.006, 0.011) | 0.809 (0.785, 0.833) | 0.043 (0.027, 0.059) | 0.76 (0.723, 0.796) | 0.02 (0.014, 0.027) | 0.83 (0.828, 0.831) | 0.469 (0.457, 0.48) | 0.68 (0.663, 0.697) | 0.653 (0.626, 0.68) |
| bioGPT | 0.862 (0.851, 0.872) | 0.161 (0.141, 0.182) | 0.711 (0.679, 0.743) | 0.011 (0.005, 0.017) | 0.818 (0.8, 0.837) | 0.047 (0.037, 0.058) | 0.773 (0.734, 0.813) | 0.024 (0.016, 0.032) | 0.835 (0.833, 0.838) | 0.478 (0.465, 0.492) | 0.691 (0.672, 0.71) | 0.664 (0.638, 0.69) |
| ClinicalBERT | 0.855 (0.842, 0.867) | 0.155 (0.137, 0.173) | 0.717 (0.691, 0.743) | 0.013 (0.009, 0.017) | 0.806 (0.784, 0.827) | 0.04 (0.024, 0.056) | 0.764 (0.73, 0.799) | 0.022 (0.015, 0.03) | 0.83 (0.827, 0.833) | 0.469 (0.458, 0.48) | 0.686 (0.671, 0.702) | 0.66 (0.634, 0.686) |

  1. The results are presented as the mean and 95% confidence interval (CI) across all 5 folds. The best-performing models are underlined. As the results show, the pretrained LLMs consistently outperform traditional word embeddings.
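
As a rough illustration of how a "mean and 95% CI across 5 folds" entry could be produced, the sketch below summarizes five per-fold scores with a Student's t interval. This is an assumption: the table does not state which interval construction was used, and the function name `mean_and_ci` and the fold scores are hypothetical, not taken from the paper.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% critical value of Student's t with 4 degrees of freedom
# (5 folds - 1). Assumption: the source does not say which interval it used.
T_CRIT_DF4 = 2.776


def mean_and_ci(fold_scores):
    """Return (mean, lower, upper) for exactly five per-fold scores."""
    if len(fold_scores) != 5:
        raise ValueError("expected one score per fold (5 folds)")
    m = mean(fold_scores)
    # Half-width = t critical value times the standard error of the mean.
    half_width = T_CRIT_DF4 * stdev(fold_scores) / sqrt(len(fold_scores))
    return m, m - half_width, m + half_width


# Hypothetical per-fold AUROC values, not taken from the paper.
fold_aurocs = [0.84, 0.85, 0.86, 0.85, 0.85]
m, lo, hi = mean_and_ci(fold_aurocs)
print(f"Mean: {m:.3f}  CI: ({lo:.3f}, {hi:.3f})")
```

With only five folds the t critical value (2.776) is noticeably larger than the normal value (1.96), so a normal-approximation interval would be narrower. Which of the two the table reflects is not stated.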