Table 1 Statistics of the datasets used in development and validation: the private Mayo Clinic dataset and the publicly available MIMIC-III and MIMIC-IV datasets

From: Weakly supervised language models for automated extraction of critical findings from radiology reports

| Attributes / Dataset | Mayo Clinic | MIMIC-III | MIMIC-IV |
| --- | --- | --- | --- |
| Training dataset size | 15,000 | – | – |
| Test dataset size | 80 | 123 | 5000 |
| Avg. report length | 476.3 | 53.5 | 199.3 |
| Modality | Frequency (% reports) | | |
| CT-Scan | 6345 (41.4) | 73 (60.9) | 1490 (29.8) |
| XR | 4127 (26.9) | 35 (28.4) | 1296 (25.9) |
| MR | 3278 (21.4) | 9 (7.3) | 579 (11.6) |
| Ultrasound | 1507 (9.8) | 2 (1.7) | 599 (12.0) |
| Other | 73 (0.48) | 4 (3.2) | 1036 (20.7) |
| Anatomy | Frequency (% reports) | | |
| Chest | 1905 (12.4) | 45 (36.6) | 1834 (36.7) |
| Head | 4770 (31.1) | 30 (24.4) | 1278 (25.6) |
| Neck | 2835 (18.5) | 15 (1.6) | 427 (8.5) |
| Abdomen | 5820 (39.9) | 28 (31.7) | 634 (12.7) |
| Other | – | 4 (3.2) | 1024 (20.5) |
| Result statistics | Frequency (% reports) | | |
| Test reports with critical findings | 15 of 80 (18.7) | 50 of 123 (40.6) | – |
| Top 5 findings | Small bowel obstruction, Pulmonary embolism, Pleural effusion, Occlusion of Artery, Lesion in kidney | Small bowel obstruction, Ischemic bowel, Subarachnoid hemorrhage, Parenchymal hemorrhage, Subdural Hematoma | – |