Table 10 Ablation study results on the 32B model series

From: Large language models learning to write rhyming Tang poetry A Xunzi Yayun R1 case study

| Configuration | Tones | Rhymes | Antithesis | Length | Total |
| --- | --- | --- | --- | --- | --- |
| Qwen2.5-32B-Instruct (SFT only) | 79.62 | 65.84 | 93.29 | 98.37 | 80.10 |
| Qwen2.5-32B-Instruct (GRPO only) | 79.74 | 72.38 | 94.38 | 99.22 | 82.41 |
| Xunzi-Yayun-R1-32B (SFT + GRPO) | 77.74 | 77.36 | **94.85** | **99.80** | 83.25 |
| Qwen2.5-32B-Instruct (GRPO + RAG) | **80.89** | 83.26 | 93.88 | 97.55 | 85.86 |
| Qwen2.5-32B-Instruct (SFT + RAG) | 76.81 | 87.86 | 94.69 | 99.77 | 86.00 |
| Xunzi-Yayun-R1-32B (SFT + GRPO + RAG) | 75.63 | **91.23** | 94.20 | 98.76 | **86.34** |

  1. Bold values represent the best results for each metric.
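
The footnote's "best result for each metric" can be rechecked directly from the scores. The sketch below is illustrative only and not part of the original study: it assumes a higher score is better for every metric, and the names `METRICS`, `ROWS`, and `best_per_metric` are hypothetical helpers introduced here.

```python
# Scores transcribed from Table 10; values follow the column order in METRICS.
METRICS = ["Tones", "Rhymes", "Antithesis", "Length", "Total"]

ROWS = {
    "Qwen2.5-32B-Instruct (SFT only)": [79.62, 65.84, 93.29, 98.37, 80.10],
    "Qwen2.5-32B-Instruct (GRPO only)": [79.74, 72.38, 94.38, 99.22, 82.41],
    "Xunzi-Yayun-R1-32B (SFT + GRPO)": [77.74, 77.36, 94.85, 99.80, 83.25],
    "Qwen2.5-32B-Instruct (GRPO + RAG)": [80.89, 83.26, 93.88, 97.55, 85.86],
    "Qwen2.5-32B-Instruct (SFT + RAG)": [76.81, 87.86, 94.69, 99.77, 86.00],
    "Xunzi-Yayun-R1-32B (SFT + GRPO + RAG)": [75.63, 91.23, 94.20, 98.76, 86.34],
}


def best_per_metric(rows, metrics):
    """Return {metric: (configuration, score)} for the top score in each column.

    Assumption: higher is better for every metric.
    """
    best = {}
    for i, metric in enumerate(metrics):
        # Pick the configuration whose i-th score is the largest.
        config = max(rows, key=lambda name: rows[name][i])
        best[metric] = (config, rows[config][i])
    return best


if __name__ == "__main__":
    for metric, (config, score) in best_per_metric(ROWS, METRICS).items():
        print(f"{metric:<10} {score:6.2f}  {config}")
```

Under that assumption, no single configuration leads on every metric: the top Tones score belongs to a different row than the top Rhymes and Total scores.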