Fig. 8: Analysis of GPT-3.5 and GPT-4 response consistency as influenced by temperature, prompt design, and random seeds.

The number of correct answers across three runs is shown for GPT-3.5 (a, b) without and (c, d) with a fixed seed parameter, using (a, c) a simple raw prompt versus (b, d) an optimized engineered prompt.