Table 2 Comparison of GatorTron with existing biomedical and clinical transformer models for semantic textual similarity, natural language inference, and question answering.

From: A large language model for electronic health records

| Transformer | Semantic textual similarity: 2019 n2c2 [66] (Pearson correlation) | Natural language inference: MedNLI [71] (Accuracy) | Question answering: emrQA medication [77] (F1 score) | emrQA medication (Exact Match) | Question answering: emrQA relation [77] (F1 score) | emrQA relation (Exact Match) |
|---|---|---|---|---|---|---|
| BioBERT | 0.8744 | 0.8050 | 0.6997 | 0.2475 | 0.9262 | 0.8361 |
| ClinicalBERT | 0.8787 | 0.8270 | 0.6905 | 0.2406 | 0.9306 | 0.8533 |
| BioMegatron | 0.8806 | 0.8390 | 0.7231 | 0.2882 | 0.9405 | 0.879 |
| GatorTron-base (1/4 data) | 0.8675 | 0.8643 | 0.7281 | 0.2952 | 0.9390 | 0.8579 |
| GatorTron-base | 0.8810 | 0.8670 | 0.7181 | 0.2978 | 0.9543 | 0.9029 |
| GatorTron-medium | **0.8903** | 0.8720 | 0.7354 | 0.3018 | 0.9677 | 0.9243 |
| GatorTron-large | 0.8896 | **0.9020** | **0.7408** | **0.3155** | **0.9719** | **0.9310** |

  1. The best evaluation scores are presented in bold.