Table 2 Qualitative evaluation and average sentence lengths on the CRIS data (test-gen-mhr) and the MIMIC-III data (test-gen-mimic). Models providing data closest to the original data according to all the scores are highlighted in bold.
From: Generation and evaluation of artificial mental health records for Natural Language Processing
Model | PPL | ROUGE-L↑ | BLEU↑ | TER↓ | ∼l |
---|---|---|---|---|---|
test-gen-mhr | | | | | |
genuine | − | − | − | − | 22.44 |
all | 7.24 | 0.76 | 40.88 | 0.39 | 17.84 |
top+meta | 15.57 | 0.58 | 25.10 | 0.59 | 15.02 |
one+meta | 37.46 | 0.40 | 10.29 | 0.80 | 10.63 |
key | − | 0.58 | 7.75 | 0.56 | 10.21 |
test-gen-mimic | | | | | |
genuine | − | − | − | − | 17.55 |
all | 3.22 | 0.81 | 53.45 | 0.31 | 14.5 |
top+meta | 5.14 | 0.68 | 37.28 | 0.49 | 12.39 |
one+meta | 9.75 | 0.47 | 16.66 | 0.74 | 9.72 |
key | − | 0.59 | 8.70 | 0.56 | 7.94 |