Table 1 Overview of tasks in the DRAGON benchmark

From: The DRAGON benchmark for clinical NLP

| ID | Name | Task type | Metric | Number of development cases | Number of testing cases |
|---|---|---|---|---|---|
| T1 | Adhesion presence | SL Bin Clf | AUROC | 397 | 166 |
| T2 | Pulmonary nodule presence | SL Bin Clf | AUROC | 1000 | 200 |
| T3 | Kidney abnormality identification | SL Bin Clf | AUROC | 417 | 183 |
| T4 | Skin histopathology case selection | SL Bin Clf | AUROC | 531 | 225 |
| T5 | RECIST timeline | SL Bin Clf | AUROC | 278 | 119 |
| T6 | Histopathology cancer origin | SL Bin Clf | AUROC | 715 | 304 |
| T7 | Pulmonary nodule size presence | SL Bin Clf | AUROC | 348 | 66 |
| T8 | PDAC size presence | SL Bin Clf | AUROC | 418 | 179 |
| T9 | PDAC diagnosis | SL MC Clf | Unweighted Kappa | 1374 | 588 |
| T10 | Prostate radiology suspicious lesions | SL MC Clf | Linearly Weighted Kappa | 5111 | 2229 |
| T11 | Prostate histopathology significant cancers | SL MC Clf | Linearly Weighted Kappa | 2213 | 952 |
| T12 | Histopathology tissue type | SL MC Clf | Unweighted Kappa | 707 | 304 |
| T13 | Histopathology tissue origin | SL MC Clf | Unweighted Kappa | 718 | 297 |
| T14 | Entailment diagnostic sentences | SL MC Clf | Linearly Weighted Kappa | 12,627 | 1422 |
| T15 | Colon histopathology diagnosis | ML Bin Clf | Macro AUROC | 2748 | 1177 |
| T16 | RECIST lesion size presence | ML Bin Clf | AUROC | 278 | 119 |
| T17 | PDAC attributes | ML MC Clf | Unweighted Kappa | 418 | 179 |
| T18 | Hip Kellgren-Lawrence scoring | ML MC Clf | Unweighted Kappa | 4803 | 172 |
| T19 | Prostate volume measurement | SL Reg | RSMAPES (ε = 4 cm³) | 5138 | 2170 |
| T20 | Prostate specific antigen measurement | SL Reg | RSMAPES (ε = 0.4 ng/mL) | 4759 | 2046 |
| T21 | Prostate specific antigen density measurement | SL Reg | RSMAPES (ε = 0.04 ng/mL²) | 4700 | 2020 |
| T22 | PDAC size measurement | SL Reg | RSMAPES (ε = 4 mm) | 343 | 147 |
| T23 | Pulmonary nodule size measurement | SL Reg | RSMAPES (ε = 4 mm) | 186 | 32 |
| T24 | RECIST lesion size measurements | ML Reg | RSMAPES (ε = 4 mm) | 278 | 119 |
| T25 | Anonymization | SL NER | Macro F1 | 3078 | 1307 |
| T26 | Medical terminology recognition | SL NER | F1 | 175 | 75 |
| T27 | Prostate biopsy sampling | ML NER | Weighted F1 | 349 | 146 |
| T28 | Skin histopathology diagnosis | ML NER | Weighted F1 | 439 | 185 |

Abbreviations: AUROC, area under the receiver operating characteristic curve; SL, single-label; ML, multi-label; Bin, binary; MC, multi-class; Clf, classification; Reg, regression; NER, named entity recognition; RSMAPES, Robust Symmetric Mean Absolute Percentage Error Score; RECIST, response evaluation criteria in solid tumors; PDAC, pancreatic ductal adenocarcinoma.
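
The task-type codes in the table are built by combining the abbreviations defined in the footnote (for example, "SL Bin Clf"). As a reading aid, the following minimal Python sketch expands such a code into its full wording. The function name `expand_task_type` and the dictionary `ABBREVIATIONS` are illustrative and not part of the benchmark. Only the abbreviation meanings are taken from the footnote.

```python
# Abbreviation meanings as given in the table footnote.
# The names ABBREVIATIONS and expand_task_type are hypothetical helpers.
ABBREVIATIONS = {
    "SL": "single-label",
    "ML": "multi-label",
    "Bin": "binary",
    "MC": "multi-class",
    "Clf": "classification",
    "Reg": "regression",
    "NER": "named entity recognition",
}


def expand_task_type(task_type: str) -> str:
    """Expand a task-type code token by token; unknown tokens are kept as-is."""
    return " ".join(ABBREVIATIONS.get(token, token) for token in task_type.split())


if __name__ == "__main__":
    for code in ("SL Bin Clf", "SL MC Clf", "ML Reg", "ML NER"):
        print(f"{code} -> {expand_task_type(code)}")
```

With this mapping, "SL Bin Clf" (the task type of T1 to T8) reads as "single-label binary classification", and "ML NER" (T27 and T28) reads as "multi-label named entity recognition".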