Table 4 Ablation experiment.
From: A hallucination detection and mitigation framework for faithful text summarization using LLMs
| Method | ROUGE-1 | ROUGE-2 | ROUGE-L | FactCC | BERTScore | BARTScore |
|---|---|---|---|---|---|---|
| CNN/Daily Mail | | | | | | |
| ChatGPT | 34.45 | 13.98 | 32.84 | 35.43 | 88.80 | -1.80 |
| One iteration | 36.50 | 13.49 | 26.76 | 36.00 | 89.59 | -1.70 |
| Iterations + No sort | 37.80 | 15.88 | 35.77 | 36.61 | 89.91 | -1.70 |
| PubMed | | | | | | |
| ChatGPT | 30.16 | 11.04 | 28.15 | 35.10 | 86.05 | -1.88 |
| One iteration | 30.98 | 11.33 | 28.75 | 37.78 | 88.14 | -1.76 |
| Iterations + No sort | 30.98 | 11.33 | 28.75 | 37.78 | 88.14 | -1.76 |
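To make the ablation easier to read, the sketch below computes how far each variant moves each metric relative to the plain ChatGPT baseline. It is a minimal illustration, not part of the paper's method: the scores are transcribed from Table 4, the names `TABLE4` and `deltas_vs_baseline` are hypothetical, and the differences are absolute points (variant minus baseline), not relative percentages. Since BARTScore is a log-likelihood, a positive delta there (a less negative score) also indicates an improvement.

```python
# Scores transcribed from Table 4 (hypothetical container name TABLE4).
# BARTScore is a log-likelihood, so values closer to zero are better.
METRICS = ["ROUGE-1", "ROUGE-2", "ROUGE-L", "FactCC", "BERTScore", "BARTScore"]

TABLE4 = {
    "CNN/Daily Mail": {
        "ChatGPT": [34.45, 13.98, 32.84, 35.43, 88.80, -1.80],
        "One iteration": [36.50, 13.49, 26.76, 36.00, 89.59, -1.70],
        "Iterations + No sort": [37.80, 15.88, 35.77, 36.61, 89.91, -1.70],
    },
    "PubMed": {
        "ChatGPT": [30.16, 11.04, 28.15, 35.10, 86.05, -1.88],
        "One iteration": [30.98, 11.33, 28.75, 37.78, 88.14, -1.76],
        "Iterations + No sort": [30.98, 11.33, 28.75, 37.78, 88.14, -1.76],
    },
}


def deltas_vs_baseline(dataset, baseline="ChatGPT"):
    """Return {variant: {metric: variant score minus baseline score}}.

    Hypothetical helper: differences are rounded to 2 decimal places to
    match the precision reported in the table.
    """
    rows = TABLE4[dataset]
    base = rows[baseline]
    return {
        variant: {m: round(v - b, 2) for m, v, b in zip(METRICS, scores, base)}
        for variant, scores in rows.items()
        if variant != baseline
    }


if __name__ == "__main__":
    for dataset in TABLE4:
        for variant, diffs in deltas_vs_baseline(dataset).items():
            print(dataset, "|", variant, "|", diffs)
```

For example, `deltas_vs_baseline("CNN/Daily Mail")["One iteration"]["ROUGE-L"]` evaluates to `-6.08`, which makes the ROUGE-L drop of the single-iteration variant on CNN/Daily Mail stand out, while the PubMed deltas for the two variants come out identical because their reported scores are identical.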