Table 11 Ablation study results on the 7B model series

From: Large language models learning to write rhyming Tang poetry: A Xunzi Yayun R1 case study

| Configuration | Tones | Rhymes | Antithesis | Length | Total |
|---|---|---|---|---|---|
| Qwen2.5-7B-Instruct (SFT only) | **75.93** | 61.48 | 89.88 | **94.33** | 76.22 |
| Qwen2.5-7B-Instruct (GRPO only) | 69.67 | 63.27 | 85.53 | 81.32 | 72.09 |
| Qwen2.5-7B-Instruct (SFT + GRPO) | 63.54 | 50.71 | 80.83 | 75.35 | 64.33 |
| Qwen2.5-7B-Instruct (GRPO + RAG) | 75.92 | **75.6** | **90.08** | 91.03 | **80.17** |
| Qwen2.5-7B-Instruct (SFT + RAG) | 66.64 | 69.61 | 83.45 | 77.23 | 71.95 |
| Qwen2.5-7B-Instruct (SFT + GRPO + RAG) | 62.61 | 74.68 | 81.39 | 75.3 | 71.26 |

1. Bold values represent the best results for each metric.