Table 11 Model performance on the different test set splits, comparing virtscribe dialogues with ASR transcripts against those with human transcripts.

Test set	BART fine-tuning	Test split	ROUGE-1	ROUGE-2	ROUGE-L	medcon
1	train	ASR	48.61	18.94	41.74	42.63
	+train_ASR	ASR	49.70	19.96	43.82	41.96
	train	human	48.28	20.09	43.98	46.13
	+train_ASR	human	48.50	19.52	43.59	42.85
2	train	ASR	51.29	21.31	43.76	45.21
	+train_ASR	ASR	50.42	21.30	44.68	43.71
	train	human	50.11	20.80	44.44	43.35
	+train_ASR	human	48.44	20.47	43.68	44.28
3	train	ASR	50.41	20.01	43.79	49.91
	+train_ASR	ASR	49.22	19.72	43.19	44.18
	train	human	50.86	19.50	44.59	45.48
	+train_ASR	human	47.42	18.42	42.67	44.72

The model fine-tuned on the train set is BART + FT_SAMSum (Division), fine-tuned for 10 epochs on the original train set, as in the baseline methods. The train + train_ASR model (listed as "+train_ASR" in the table) is the BART + FT_SAMSum (Division) model fine-tuned for 3 more epochs on the virtscribe-with-ASR split of the train set.
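
As a reading aid, the effect of the extra ASR fine-tuning can be summarized by averaging, over the three test sets, the difference between the +train_ASR score and the train score for each test split. The sketch below is illustrative only and is not part of the original evaluation: the values are copied from Table 11, and the names ROWS, METRICS, and mean_delta are hypothetical helpers introduced here.

```python
from statistics import mean

# Rows copied from Table 11:
# (test_set, fine_tuning, test_split, ROUGE-1, ROUGE-2, ROUGE-L, medcon)
ROWS = [
    (1, "train",      "ASR",   48.61, 18.94, 41.74, 42.63),
    (1, "+train_ASR", "ASR",   49.70, 19.96, 43.82, 41.96),
    (1, "train",      "human", 48.28, 20.09, 43.98, 46.13),
    (1, "+train_ASR", "human", 48.50, 19.52, 43.59, 42.85),
    (2, "train",      "ASR",   51.29, 21.31, 43.76, 45.21),
    (2, "+train_ASR", "ASR",   50.42, 21.30, 44.68, 43.71),
    (2, "train",      "human", 50.11, 20.80, 44.44, 43.35),
    (2, "+train_ASR", "human", 48.44, 20.47, 43.68, 44.28),
    (3, "train",      "ASR",   50.41, 20.01, 43.79, 49.91),
    (3, "+train_ASR", "ASR",   49.22, 19.72, 43.19, 44.18),
    (3, "train",      "human", 50.86, 19.50, 44.59, 45.48),
    (3, "+train_ASR", "human", 47.42, 18.42, 42.67, 44.72),
]

# Column index of each metric inside a row tuple.
METRICS = {"ROUGE-1": 3, "ROUGE-2": 4, "ROUGE-L": 5, "medcon": 6}


def mean_delta(metric, split):
    """Mean of (+train_ASR score minus train score) over the three test sets."""
    col = METRICS[metric]
    base = {r[0]: r[col] for r in ROWS if r[1] == "train" and r[2] == split}
    extra = {r[0]: r[col] for r in ROWS if r[1] == "+train_ASR" and r[2] == split}
    return mean(extra[k] - base[k] for k in base)


if __name__ == "__main__":
    for split in ("ASR", "human"):
        for metric in METRICS:
            print(f"{split:5s} {metric:8s} {mean_delta(metric, split):+.2f}")
```

A negative value means that, averaged over the three test sets, the additional 3 epochs on the ASR split lowered that metric for the given test split. With only three test sets per cell, these averages are descriptive and say nothing about statistical significance.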
