Table 1 Comparable corpora for doctor-patient note generation from conversations.

From: Aci-bench: a Novel Ambient Clinical Intelligence Dataset for Benchmarking Automatic Visit Note Generation

| dataset | description | src-len (tok/turns) | target-len (tok/sent) | size | open |
| --- | --- | --- | --- | --- | --- |
| MTS-dialogue [13] | dialogue-note snippets where conversations are created using clinical note sections | 142/9 | 48/3 | 1701 | Y |
| primock57 [14] | role-played dialogue-note pairs | 1489/97 | 161/23 | 57 | Y |
| aci-bench [this work] | role-played dialogue-note pairs | 1302/55 | 490/49 | 207 | Y |
| 3M Health [9] | dialogue-note pairs where notes are created using conversations | −/− | −/− (hpi only) | 1342 | N |
| Abridge [8] | dialogue-note pairs where notes are created using conversations | 1500/− | −/27 | 6862 | N |
| Augmedix [11] | real clinical dialogue-note pairs | −/175 | −/47 | 500 | N |
| emr.ai [6] | real clinical dictation-note pairs | 616/1 | 550/− | 9875 | N |
| Nuance [7] | real clinical dialogue-note pairs | 972/− | 452/−¹ | 802k | N |

The majority of datasets are proprietary and cannot be shared for community evaluation (src-len = source/transcript length, target-len = target/note length, − = unreported).
¹ The authors model sections of the note separately, and the numbers of sources and note sections differ. Here we approximate the average note length by summing the average section lengths, and the average source length by averaging the source lengths across sections.
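
The approximation described in the note above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' code: the function name `approximate_lengths`, the section names, and every number in `example` are hypothetical placeholders and are not values from any of the listed datasets.

```python
from statistics import mean


def approximate_lengths(section_stats):
    """Approximate whole-note statistics from per-section averages.

    section_stats maps a section name to a pair
    (average source length in tokens, average note-section length in tokens).
    """
    # Average note length: sum of the average lengths of the sections.
    note_len = sum(note for _, note in section_stats.values())
    # Average source length: mean of the per-section source averages.
    source_len = mean(src for src, _ in section_stats.values())
    return source_len, note_len


# Hypothetical placeholder values, for illustration only.
example = {
    "section_a": (900, 100),
    "section_b": (1100, 200),
    "section_c": (1000, 150),
}
source_len, note_len = approximate_lengths(example)
```

Summing the section averages assumes each note contains every section; where sections are optional, the sum would overestimate the typical note length.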