Table 10 Ablation study results on the 32B model series
From: Large language models learning to write rhyming Tang poetry: A Xunzi Yayun R1 case study
| Configuration | Tones | Rhymes | Antithesis | Length | Total |
|---|---|---|---|---|---|
| Qwen2.5-32B-Instruct (SFT only) | 79.62 | 65.84 | 93.29 | 98.37 | 80.10 |
| Qwen2.5-32B-Instruct (GRPO only) | 79.74 | 72.38 | 94.38 | 99.22 | 82.41 |
| Xunzi-Yayun-R1-32B (SFT + GRPO) | 77.74 | 77.36 | 94.85 | 99.80 | 83.25 |
| Qwen2.5-32B-Instruct (GRPO + RAG) | 80.89 | 83.26 | 93.88 | 97.55 | 85.86 |
| Qwen2.5-32B-Instruct (SFT + RAG) | 76.81 | 87.86 | 94.69 | 99.77 | 86.00 |
| Xunzi-Yayun-R1-32B (SFT + GRPO + RAG) | 75.63 | 91.23 | 94.20 | 98.76 | 86.34 |
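
To make the effect of RAG in the table above easier to read, the following minimal sketch pairs each configuration with its RAG counterpart and prints the per-metric difference. The scores are copied from Table 10; the dictionary keys, `METRICS`, and the `rag_delta` helper are illustrative names, not part of the original study.

```python
# Scores copied from Table 10, ordered as in METRICS.
# Keys are shorthand labels for the training configurations (an assumption
# made for this sketch, not identifiers from the paper).
METRICS = ("tones", "rhymes", "antithesis", "length", "total")

SCORES = {
    "SFT": (79.62, 65.84, 93.29, 98.37, 80.10),
    "GRPO": (79.74, 72.38, 94.38, 99.22, 82.41),
    "SFT + GRPO": (77.74, 77.36, 94.85, 99.80, 83.25),
    "GRPO + RAG": (80.89, 83.26, 93.88, 97.55, 85.86),
    "SFT + RAG": (76.81, 87.86, 94.69, 99.77, 86.00),
    "SFT + GRPO + RAG": (75.63, 91.23, 94.20, 98.76, 86.34),
}


def rag_delta(base: str) -> dict[str, float]:
    """Return (score with RAG) minus (score without RAG) for each metric."""
    without_rag = SCORES[base]
    with_rag = SCORES[base + " + RAG"]
    return {
        metric: round(after - before, 2)
        for metric, before, after in zip(METRICS, without_rag, with_rag)
    }


if __name__ == "__main__":
    for base in ("SFT", "GRPO", "SFT + GRPO"):
        print(f"{base:<12}", rag_delta(base))
```

Read this way, adding RAG raises the Rhymes score in all three pairings (by 22.02, 10.88, and 13.87 points respectively), while the Tones score drops in two of the three (by 2.81 for SFT and 2.11 for SFT + GRPO) and rises by 1.15 for GRPO.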