Table 6 Response stability of different large language models.
| Model | Proportion of effective responses |
|---|---|
| Bloom-7B1 | 98.6% |
| Qwen-7B-Chat-Int4 | 99.0% |
| Qwen-7B-Chat | 99.1% |
| Deepseek-7B | 99.6% |
| ChatGPT-3.5-turbo | 99.7% |
| ChatGPT-4 | 99.8% |