Table 6 Response stability of different large language models.

Model	Proportion of effective responses
Bloom-7B1	98.6%
Qwen-7B-Chat-Int4	99.0%
Qwen-7B-Chat	99.1%
Deepseek-7B	99.6%
ChatGPT-3.5-turbo	99.7%
ChatGPT-4	99.8%
