Table 11 Ablation study results on the 7B model series
| Configuration | Tones | Rhymes | Antithesis | Length | Total |
|---|---|---|---|---|---|
| Qwen2.5-7B-Instruct (SFT only) | 75.93 | 61.48 | 89.88 | 94.33 | 76.22 |
| Qwen2.5-7B-Instruct (GRPO only) | 69.67 | 63.27 | 85.53 | 81.32 | 72.09 |
| Qwen2.5-7B-Instruct (SFT + GRPO) | 63.54 | 50.71 | 80.83 | 75.35 | 64.33 |
| Qwen2.5-7B-Instruct (GRPO + RAG) | 75.92 | 75.60 | 90.08 | 91.03 | 80.17 |
| Qwen2.5-7B-Instruct (SFT + RAG) | 66.64 | 69.61 | 83.45 | 77.23 | 71.95 |
| Qwen2.5-7B-Instruct (SFT + GRPO + RAG) | 62.61 | 74.68 | 81.39 | 75.30 | 71.26 |