Table 11 Model performance on the different test set splits, comparing virtscribe dialogues transcribed by ASR with those transcribed by humans.
Test set | BART fine-tuning | Test split | ROUGE-1 | ROUGE-2 | ROUGE-L | medcon |
---|---|---|---|---|---|---|
1 | train | ASR | 48.61 | 18.94 | 41.74 | 42.63 |
1 | +trainASR | ASR | 49.70 | 19.96 | 43.82 | 41.96 |
1 | train | human | 48.28 | 20.09 | 43.98 | 46.13 |
1 | +trainASR | human | 48.50 | 19.52 | 43.59 | 42.85 |
2 | train | ASR | 51.29 | 21.31 | 43.76 | 45.21 |
2 | +trainASR | ASR | 50.42 | 21.30 | 44.68 | 43.71 |
2 | train | human | 50.11 | 20.80 | 44.44 | 43.35 |
2 | +trainASR | human | 48.44 | 20.47 | 43.68 | 44.28 |
3 | train | ASR | 50.41 | 20.01 | 43.79 | 49.91 |
3 | +trainASR | ASR | 49.22 | 19.72 | 43.19 | 44.18 |
3 | train | human | 50.86 | 19.50 | 44.59 | 45.48 |
3 | +trainASR | human | 47.42 | 18.42 | 42.67 | 44.72 |