Table 1 Micro and macro accuracy (%) of our model on (a) CASI abbreviations, (b) i2b2 generated by hand labeling, (c) i2b2 generated by RS, and (d) MIMIC-III generated by RS.

From: Automatically disambiguating medical acronyms with ontology-aware deep learning

 

| Sampling method | (a) CASI accuracy (hand-labeled) | | (b) i2b2 accuracy (hand-labeled) | | (c) i2b2 accuracy (RS) | | (d) MIMIC-III accuracy (RS) | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | Macro | Micro | Macro | Micro | Macro | Micro | Macro | Micro |
| Control | 0.672 | 0.673 | 0.702 | 0.682 | 0.869 | 0.850 | 0.948 | 0.917 |
| Control + global | 0.686* | 0.687 | 0.738 | 0.745 | 0.877* | 0.862 | 0.955* | 0.929 |
| SWR | 0.705* | 0.708 | 0.701 | 0.680 | 0.864 | 0.834 | 0.948 | 0.914 |
| SWR + global | 0.715* | 0.712 | 0.701 | 0.677 | 0.873* | 0.850 | 0.956* | 0.931 |
| Relatives | 0.813* | 0.806 | 0.833* | 0.795 | 0.873 | 0.827 | 0.945 | 0.910 |
| Relatives + global | 0.825** | 0.820 | 0.855** | 0.816 | 0.886** | 0.842 | 0.954** | 0.925 |
| Relatives + global + HP | 0.841*** | 0.834 | 0.859 | 0.825 | 0.889*** | 0.848 | 0.961*** | 0.935 |
| Clinical BERT | 0.648 | 0.643 | 0.602 | 0.591 | 0.824 | 0.788 | 0.917 | 0.871 |
| Clinical BERT + Relatives | 0.721**** | 0.717 | 0.690**** | 0.699 | – | – | – | – |

  1. *p < 0.05 (one-sided Wilcoxon signed-rank test compared with Control model).
  2. **p < 0.02 (one-sided Wilcoxon signed-rank test compared with Relatives model).
  3. ***p < 0.01 (one-sided Wilcoxon signed-rank test compared with Relatives + global model).
  4. ****p < 0.03 (one-sided Wilcoxon signed-rank test compared with Clinical BERT model).
  5. We sample training data with replacement (SWR) and augment it with related medical concepts (Relatives). We report results for models that incorporate the ontology during pretraining (HP) and the global context of the note (global). Bolded values indicate the best-performing model for each column. We have omitted running Clinical BERT + Relatives on the RS datasets due to computational constraints.
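
For readers comparing the Macro and Micro columns: macro accuracy averages the per-acronym accuracies so every acronym counts equally, while micro accuracy pools all test samples so frequent acronyms dominate. Below is a minimal sketch of the two aggregates using hypothetical per-acronym counts; the paper's actual evaluation code is not reproduced here.

```python
import numpy as np

# Hypothetical per-acronym results: (number correct, number of test contexts).
# In the disambiguation setting, each acronym has its own set of test contexts.
per_acronym = {
    "RA": (45, 60),    # e.g. 45 of 60 contexts disambiguated correctly
    "PA": (30, 40),
    "MS": (70, 100),
}

# Macro accuracy: mean of per-acronym accuracies (each acronym weighted equally).
macro = np.mean([correct / total for correct, total in per_acronym.values()])

# Micro accuracy: pooled over all contexts (each test sample weighted equally).
micro = sum(c for c, _ in per_acronym.values()) / sum(t for _, t in per_acronym.values())

print(f"macro accuracy: {macro:.3f}")  # 0.733 for the toy counts above
print(f"micro accuracy: {micro:.3f}")  # 0.725 for the toy counts above
```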
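The significance markers in footnotes 1–4 come from one-sided Wilcoxon signed-rank tests that pair two models' scores. A minimal sketch of such a comparison with `scipy.stats.wilcoxon`, using hypothetical paired per-acronym accuracies (the pairing assumes both models are evaluated on the same acronyms):

```python
from scipy.stats import wilcoxon

# Hypothetical per-acronym accuracies for two models, paired by acronym.
control_acc   = [0.61, 0.70, 0.55, 0.80, 0.66, 0.72, 0.58, 0.75]
relatives_acc = [0.78, 0.81, 0.70, 0.86, 0.79, 0.83, 0.74, 0.88]

# alternative="greater" makes the test one-sided, matching the footnotes:
# it tests whether relatives_acc is shifted above control_acc.
stat, p_value = wilcoxon(relatives_acc, control_acc, alternative="greater")
print(f"Wilcoxon statistic = {stat}, one-sided p = {p_value:.4f}")
```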