Table 3 Human evaluation results.

From: A hallucination detection and mitigation framework for faithful text summarization using LLMs

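| Method  | Coherence | Fluency | Relevance | Consistency | Conciseness | Overall |
|---------|-----------|---------|-----------|-------------|-------------|---------|
| PEGASUS | 3.90      | 4.00    | 4.30      | 3.20        | 3.00        | 3.68    |
| BART    | 3.80      | 4.10    | 4.00      | 3.70        | 3.70        | 3.74    |
| ChatGPT | 4.20      | 4.40    | 4.30      | 4.10        | 3.90        | 4.18    |
| Ours    | 4.30      | 4.30    | 4.50      | 3.70        | 4.20        | 4.20    |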
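
The source does not say how the Overall column of Table 3 is derived. A minimal sketch, assuming it is the unweighted mean of the five criteria (Coherence, Fluency, Relevance, Consistency, Conciseness), recomputes it from the reported scores. The names `CRITERIA_SCORES`, `REPORTED_OVERALL`, and `unweighted_mean` are illustrative and do not come from the paper.

```python
# Scores transcribed from Table 3, in column order:
# Coherence, Fluency, Relevance, Consistency, Conciseness.
CRITERIA_SCORES = {
    "PEGASUS": [3.90, 4.00, 4.30, 3.20, 3.00],
    "BART":    [3.80, 4.10, 4.00, 3.70, 3.70],
    "ChatGPT": [4.20, 4.40, 4.30, 4.10, 3.90],
    "Ours":    [4.30, 4.30, 4.50, 3.70, 4.20],
}

# Overall column exactly as reported in Table 3.
REPORTED_OVERALL = {"PEGASUS": 3.68, "BART": 3.74, "ChatGPT": 4.18, "Ours": 4.20}


def unweighted_mean(scores):
    """Assumed aggregation: plain average of the criteria, rounded to 2 decimals."""
    return round(sum(scores) / len(scores), 2)


computed = {method: unweighted_mean(s) for method, s in CRITERIA_SCORES.items()}

# Flag rows whose reported value differs from the assumed mean by more than
# half a rounding unit (0.005), i.e. more than rounding alone could explain.
mismatches = [
    method
    for method in computed
    if abs(computed[method] - REPORTED_OVERALL[method]) > 0.005
]

for method in computed:
    print(f"{method}: computed={computed[method]:.2f} reported={REPORTED_OVERALL[method]:.2f}")
print("rows that differ:", mismatches)
```

Under this assumption the PEGASUS, ChatGPT, and Ours rows reproduce the reported Overall values, while the BART row computes to 3.86 against a reported 3.74. Either the Overall score was aggregated differently or one entry in that row differs from what the authors used. The table alone does not settle which.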