Table 7. Report generation evaluation results
| Model | BLEU | METEOR | ROUGE-L | Average generation time (s) | Average generation length | Average inner-loop size | Average outer-loop size |
|---|---|---|---|---|---|---|---|
| Llama3 | 21.50 | 22.10 | 23.80 | 15.3 | 532 | - | - |
| ChatGLM-4 | 23.20 | 23.75 | 24.60 | 16.9 | 426 | - | - |
| GPT-4 | 26.40 | 26.90 | 27.50 | 14.8 | 584 | - | - |
| RAG-Llama3 | 22.84 | 22.91 | 24.12 | 17.8 | 577 | - | - |
| RAG-ChatGLM4 | 24.33 | 24.35 | 24.74 | 17.9 | 453 | - | - |
| RAG-GPT4 | 25.12 | 27.24 | 27.53 | 16.5 | 623 | - | - |
| Loop-RAG-Llama3 | 24.80 | 25.30 | 26.20 | 22.5 | 1038 | 3 | 4 |
| Loop-RAG-ChatGLM4 | 25.60 | 26.20 | 26.90 | 18.2 | 1242 | 3 | 5 |
| Loop-RAG-GPT4 | 27.10 | 27.60 | 28.30 | 21.4 | 896 | 4 | 6 |