Fig. 3
From: Feedback-integrated prompt optimiser for problem formulation

Training performance progression for the CoT prompt using Qwen-2.5. The graph plots accuracy at each optimisation step for the training dataset (blue) and the feedback dataset (red). Accuracy is the percentage of problem formulations that yield the ground-truth solution. Compared with the Standard prompt, the CoT prompt improves more slowly and reaches a lower final accuracy.
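
The accuracy metric in the caption can be illustrated with a minimal sketch. It assumes each problem formulation can be passed to some solver that returns a numeric result, and that a result counts as correct when it matches the ground-truth value within a small tolerance. The names `formulation_accuracy`, `solve`, and `tol` are hypothetical and are not taken from the paper; treating a formulation whose solve fails as incorrect is also an assumption.

```python
from typing import Callable, Sequence, TypeVar

F = TypeVar("F")  # type of a problem formulation (hypothetical)


def formulation_accuracy(
    formulations: Sequence[F],
    ground_truths: Sequence[float],
    solve: Callable[[F], float],
    tol: float = 1e-6,
) -> float:
    """Return the percentage of formulations whose solution matches the ground truth."""
    if len(formulations) != len(ground_truths):
        raise ValueError("formulations and ground_truths must have the same length")
    if not formulations:
        return 0.0

    correct = 0
    for formulation, truth in zip(formulations, ground_truths):
        try:
            value = solve(formulation)
        except Exception:
            # Assumption: a formulation that cannot be solved counts as incorrect.
            continue
        if abs(value - truth) <= tol:
            correct += 1

    return 100.0 * correct / len(formulations)


if __name__ == "__main__":
    # Toy stand-in: each "formulation" is a callable that returns its own solution.
    toy_formulations = [lambda: 4.0, lambda: 10.0, lambda: 7.5, lambda: 2.0]
    toy_truths = [4.0, 10.0, 7.0, 2.0]
    print(formulation_accuracy(toy_formulations, toy_truths, solve=lambda f: f()))
```

Evaluating this function once per optimisation step, separately on the training and feedback datasets, would give the two curves of the kind shown in the figure.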