Table 3 Comparison of GatorTronS with existing transformer-based LLMs for clinical concept extraction and medical relation extraction.

From: A study of generative large language model for medical research and healthcare

 

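| Transformer | Clinical concept extraction, 2010 i2b2 [20]: Precision | Clinical concept extraction, 2010 i2b2 [20]: Recall | Clinical concept extraction, 2010 i2b2 [20]: F1 score | Clinical concept extraction, 2012 i2b2 [21]: Precision | Clinical concept extraction, 2012 i2b2 [21]: Recall | Clinical concept extraction, 2012 i2b2 [21]: F1 score | Clinical concept extraction, 2018 n2c2 [22]: Precision | Clinical concept extraction, 2018 n2c2 [22]: Recall | Clinical concept extraction, 2018 n2c2 [22]: F1 score | Medical relation extraction, 2018 n2c2 [22]: Precision | Medical relation extraction, 2018 n2c2 [22]: Recall | Medical relation extraction, 2018 n2c2 [22]: F1 score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ClinicalBERT | NA | NA | 0.878 | NA | NA | 0.789 | 0.859 | 0.883 | 0.871 | 0.968 | 0.941 | 0.954 |
| GatorTron, 90B | 0.875 | 0.904 | 0.889 | 0.764 | 0.822 | 0.792 | 0.876 | 0.904 | 0.890 | 0.972 | 0.948 | 0.960 |
| GatorTronS, 1B | 0.874 | 0.907 | 0.890 | 0.753 | 0.812 | 0.781 | 0.871 | 0.892 | 0.882 | 0.971 | 0.945 | 0.958 |
| GatorTronS, 5B | 0.879 | 0.909 | 0.894 | 0.777 | 0.823 | 0.799 | **0.899** | 0.903 | **0.901** | 0.974 | 0.949 | **0.962** |
| GatorTronS, 10B | 0.882 | **0.911** | 0.896 | 0.765 | 0.823 | 0.793 | 0.887 | 0.904 | 0.895 | 0.974 | **0.950** | **0.962** |
| GatorTronS, 20B | **0.889** | **0.911** | **0.899** | **0.784** | **0.836** | **0.809** | 0.892 | **0.907** | 0.900 | **0.975** | 0.947 | 0.961 |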

B: billion words of text. Clinical concepts in the 2010 i2b2 and 2012 i2b2 challenges: problems, treatments, lab tests; clinical concepts in the 2018 n2c2 challenge: drugs, adverse events, and drug-related attributes (e.g., dose). Medical relation in the 2018 n2c2 challenge: drug-induced adverse events. Best evaluation scores are bolded. NA: scores not reported.
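
The F1 scores reported here can be sanity-checked against the precision and recall values, assuming F1 is the standard harmonic mean of the two (the table itself does not state the formula or the averaging scheme). The sketch below recomputes F1 for three rows copied from the table; the function name `f1_score` and the tolerance are illustrative choices, not part of the source.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from rows of the table
rows = {
    "GatorTron, 90B / 2010 i2b2": (0.875, 0.904, 0.889),
    "GatorTronS, 20B / 2012 i2b2": (0.784, 0.836, 0.809),
    "ClinicalBERT / 2018 n2c2 concepts": (0.859, 0.883, 0.871),
}

for name, (p, r, reported) in rows.items():
    recomputed = f1_score(p, r)
    # Inputs are rounded to three decimals, so compare with a small tolerance.
    agrees = abs(recomputed - reported) < 0.001
    print(f"{name}: recomputed {recomputed:.3f}, reported {reported:.3f}, agrees={agrees}")
```

Because the published precision and recall values are already rounded, a recomputed F1 can differ from the reported one in the last digit; a mismatch of that size does not by itself indicate an error in the table.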