Table 3 Human evaluation results.
From: A hallucination detection and mitigation framework for faithful text summarization using LLMs
| Method | Coherence | Fluency | Relevance | Consistency | Conciseness | Overall |
|---|---|---|---|---|---|---|
| PEGASUS | 3.90 | 4.00 | 4.30 | 3.20 | 3.00 | 3.68 |
| BART | 3.80 | 4.10 | 4.00 | 3.70 | 3.70 | 3.74 |
| ChatGPT | 4.20 | 4.40 | 4.30 | 4.10 | 3.90 | 4.18 |
| Ours | 4.30 | 4.30 | 4.50 | 3.70 | 4.20 | 4.20 |
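
The table does not say how the Overall column is derived. A minimal sketch, assuming it is the unweighted mean of the five criterion scores (an assumption, not something the source confirms), recomputes that mean for each row and compares it with the reported value. The names `SCORES`, `mean_score`, and `check_overall` are hypothetical and used only for illustration.

```python
# Scores copied from Table 3, in the order: Coherence, Fluency, Relevance,
# Consistency, Conciseness. The second tuple element is the reported Overall.
SCORES = {
    "PEGASUS": ([3.90, 4.00, 4.30, 3.20, 3.00], 3.68),
    "BART":    ([3.80, 4.10, 4.00, 3.70, 3.70], 3.74),
    "ChatGPT": ([4.20, 4.40, 4.30, 4.10, 3.90], 4.18),
    "Ours":    ([4.30, 4.30, 4.50, 3.70, 4.20], 4.20),
}


def mean_score(values):
    """Unweighted mean of the criterion scores, rounded to two decimals."""
    return round(sum(values) / len(values), 2)


def check_overall(table, tol=0.005):
    """Map each method to (computed mean, reported Overall, whether they agree).

    ASSUMPTION: Overall is the plain average of the five criteria; the source
    table does not state the aggregation rule.
    """
    result = {}
    for method, (criteria, reported) in table.items():
        computed = mean_score(criteria)
        result[method] = (computed, reported, abs(computed - reported) < tol)
    return result


if __name__ == "__main__":
    for method, (computed, reported, ok) in check_overall(SCORES).items():
        status = "matches" if ok else "differs"
        print(f"{method}: computed {computed:.2f}, reported {reported:.2f} ({status})")
```

Under this assumption the PEGASUS, ChatGPT, and Ours rows reproduce their reported Overall values, while the BART row averages to 3.86 against a reported 3.74. Whether that reflects a different aggregation rule or a transcription error in one of the BART cells cannot be determined from the table alone.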