Table 2 Qualitative evaluation of human-written vs. AI-generated discharge summaries.
From: Accurate discharge summary generation using fine tuned large language models with self evaluation
| Dimension | Human-Written (Mean ± SD) | AI-Generated (Best Model, Mean ± SD) | Fleiss’ κ (AI) |
|---|---|---|---|
| Accuracy | 4.8 ± 0.11 | 4.5 ± 0.19 | 0.81 |
| Completeness | 4.9 ± 0.13 | 4.6 ± 0.17 | 0.83 |
| Relevance & Clarity | 4.8 ± 0.15 | 4.4 ± 0.26 | 0.78 |
| Consistency | 4.7 ± 0.11 | 4.3 ± 0.20 | 0.80 |
| Utility | 4.8 ± 0.16 | 4.4 ± 0.18 | 0.82 |
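
The Fleiss’ κ column reports inter-rater agreement on the AI-generated summaries. The table does not say how κ was computed, so the following is only a minimal sketch of the standard Fleiss' kappa formula. The function name `fleiss_kappa`, the count-matrix layout, the three-rater setup, and the 5-point scale in the toy data are all assumptions for illustration, not details taken from the study.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a matrix of rating counts.

    counts[i][j] is the number of raters who assigned item i to category j.
    Every item must be rated by the same number of raters (at least 2).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("need at least two raters per item")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must have the same number of ratings")
    n_categories = len(counts[0])

    # p_j: share of all ratings that fell into category j
    total = n_items * n_raters
    p = [sum(row[j] for row in counts) / total for j in range(n_categories)]

    # P_i: fraction of rater pairs that agree on item i
    pair_norm = n_raters * (n_raters - 1)
    P = [(sum(c * c for c in row) - n_raters) / pair_norm for row in counts]

    P_bar = sum(P) / n_items          # mean observed agreement
    P_e = sum(pj * pj for pj in p)    # agreement expected by chance

    if P_e == 1.0:
        # All ratings fall in a single category: kappa is undefined.
        return float("nan")
    return (P_bar - P_e) / (1.0 - P_e)


# Hypothetical toy data (NOT from the study): 3 raters score 4 summaries
# on an assumed 5-point scale; columns are scores 1..5.
toy_counts = [
    [0, 0, 0, 0, 3],
    [0, 0, 0, 1, 2],
    [0, 0, 0, 3, 0],
    [0, 0, 1, 2, 0],
]
print(round(fleiss_kappa(toy_counts), 3))
```

Note that this formula treats score categories as unordered. If ratings were collected on an ordinal scale, a weighted agreement statistic could be a reasonable alternative, but the table gives no indication of which variant was used.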