Table 1 Comparable corpora for doctor-patient note generation from conversations.

dataset	description	src-len (tokens/turns)	target-len (tokens/sentences)	size	open
MTS-dialogue¹³	dialogue-note snippets where conversations are created using clinical note sections	142/9	48/3	1701	Y
primock57¹⁴	role-played dialogue-note pairs	1489/97	161/23	57	Y
aci-bench [this work]	role-played dialogue-note pairs	1302/55	490/49	207	Y
3M Health⁹	dialogue-note pairs where notes are created using conversations	−/−	−/− (hpi only)	1342	N
Abridge⁸	dialogue-note pairs where notes are created using conversations	1500/−	−/27	6862	N
Augmedix¹¹	real clinical dialogue-note pairs	−/175	−/47	500	N
emr.ai⁶	real clinical dictation-note pairs	616/1	550/−	9875	N
Nuance⁷	real clinical dialogue-note pairs	972/−	452/−¹	802k	N

Most of these datasets are proprietary and cannot be shared for community evaluation. src-len = source (transcript) length; target-len = target (note) length; − = unreported.
¹The authors model the note sections differently, and the numbers of sources and note sections differ. We therefore approximate the average note length by summing the average section lengths, and the average source length by averaging the source lengths reported for the different sections.
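A minimal sketch of the approximation described in footnote ¹, assuming each note section is reported with its own average source length and average section (target) length. The section names and token counts below are hypothetical placeholders, not figures from the Nuance corpus, and the helper names are illustrative only.

```python
from statistics import mean

# Hypothetical per-section averages in tokens (illustrative placeholders,
# NOT values reported for any corpus in Table 1).
sections = {
    "hpi": {"avg_src_len": 900.0, "avg_tgt_len": 200.0},
    "exam": {"avg_src_len": 1000.0, "avg_tgt_len": 150.0},
    "plan": {"avg_src_len": 1100.0, "avg_tgt_len": 100.0},
}


def approx_note_length(secs):
    """Approximate the full-note length as the sum of average section lengths."""
    return sum(s["avg_tgt_len"] for s in secs.values())


def approx_source_length(secs):
    """Approximate the source length as the mean of per-section source averages."""
    return mean(s["avg_src_len"] for s in secs.values())


print(approx_note_length(sections))    # 450.0
print(approx_source_length(sections))  # 1000.0
```

Summing is used for the note because the sections together form one note, whereas averaging is used for the source because each section is generated from its own (largely overlapping) source.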
