Table 12 Model performance on different test set splits: comparison between aci dialogues with ASR and ASRcorr transcripts.

From: Aci-bench: a Novel Ambient Clinical Intelligence Dataset for Benchmarking Automatic Visit Note Generation

Test set  Bart Fine-tuning  Test Split  ROUGE-1  ROUGE-2  ROUGE-L  medcon
1         train             ASR         54.03    24.19    48.09    45.97
1         +trainASRcorr     ASR         54.10    24.47    47.92    47.17
1         train             ASRcorr     54.04    24.35    48.19    46.00
1         +trainASRcorr     ASRcorr     54.04    24.50    47.85    47.36
2         train             ASR         51.62    23.05    46.01    45.31
2         +trainASRcorr     ASR         53.14    24.47    47.43    45.34
2         train             ASRcorr     51.56    23.13    45.97    45.98
2         +trainASRcorr     ASRcorr     53.44    24.51    47.66    45.47
3         train             ASR         53.07    23.53    47.73    45.27
3         +trainASRcorr     ASR         52.53    23.14    47.06    46.38
3         train             ASRcorr     53.13    23.48    47.63    45.52
3         +trainASRcorr     ASRcorr     52.38    23.00    46.87    45.78

  1. The model fine-tuned on the train set is BART + FTSAMSum (Division), fine-tuned for 10 epochs on the original train set, as in the baseline methods. The train + trainASRcorr model (listed as +trainASRcorr in the table) is the same BART + FTSAMSum (Division) model fine-tuned for 3 more epochs on the aci with ASRcorr split of the train set.
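
One way to read the table is to subtract the scores of the train model from those of the +trainASRcorr model for the same test set and test split. The following is a minimal sketch of that arithmetic, with the scores copied from the table; the names SCORES, METRICS, and delta are illustrative and do not come from the paper.

```python
# Scores copied from Table 12.
# Key: (test set, fine-tuning data, test split)
# Value: (ROUGE-1, ROUGE-2, ROUGE-L, medcon)
SCORES = {
    (1, "train", "ASR"): (54.03, 24.19, 48.09, 45.97),
    (1, "+trainASRcorr", "ASR"): (54.10, 24.47, 47.92, 47.17),
    (1, "train", "ASRcorr"): (54.04, 24.35, 48.19, 46.00),
    (1, "+trainASRcorr", "ASRcorr"): (54.04, 24.50, 47.85, 47.36),
    (2, "train", "ASR"): (51.62, 23.05, 46.01, 45.31),
    (2, "+trainASRcorr", "ASR"): (53.14, 24.47, 47.43, 45.34),
    (2, "train", "ASRcorr"): (51.56, 23.13, 45.97, 45.98),
    (2, "+trainASRcorr", "ASRcorr"): (53.44, 24.51, 47.66, 45.47),
    (3, "train", "ASR"): (53.07, 23.53, 47.73, 45.27),
    (3, "+trainASRcorr", "ASR"): (52.53, 23.14, 47.06, 46.38),
    (3, "train", "ASRcorr"): (53.13, 23.48, 47.63, 45.52),
    (3, "+trainASRcorr", "ASRcorr"): (52.38, 23.00, 46.87, 45.78),
}

METRICS = ("ROUGE-1", "ROUGE-2", "ROUGE-L", "medcon")


def delta(test_set, split):
    """Return (+trainASRcorr score) minus (train score) for each metric."""
    base = SCORES[(test_set, "train", split)]
    extra = SCORES[(test_set, "+trainASRcorr", split)]
    return {m: round(e - b, 2) for m, b, e in zip(METRICS, base, extra)}


# Print the per-metric change for every test set and test split.
for test_set in (1, 2, 3):
    for split in ("ASR", "ASRcorr"):
        print(test_set, split, delta(test_set, split))
```

A positive value means the model with the 3 additional epochs scored higher on that metric. The differences are simple subtractions of the reported scores and say nothing about statistical significance.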