Table 3. Performance of LLM-Chatbots in refining suboptimal responses with updated model iterations (English and Chinese prompts)
English Prompts | BARD | ChatGPT 3.5 | ChatGPT 4 |
---|---|---|---|
Number of Suboptimal Responses, n | 9 | 6 | 2 |
Temporal Improvement, n (%)a | 6 (66.7) | 4 (66.7) | 2 (100) |
Self-check, n (%)b | 7 (77.8) | 1 (16.7) | 2 (100) |

Chinese Prompts | ERNIE | ChatGPT 3.5 | ChatGPT 4 |
---|---|---|---|
Number of Suboptimal Responses, n | 12 | 9 | 11 |
Temporal Improvement, n (%)a | 11 (91.7) | 2 (22.2) | 6 (54.5) |
Self-check, n (%)b | 11 (91.7) | 1 (11.1) | 5 (45.5) |
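
The percentages in Table 3 appear to be each count divided by that chatbot's number of suboptimal responses. A minimal sketch of that arithmetic follows, assuming this denominator and rounding to one decimal place. The names `counts`, `pct`, and `percentages` are illustrative and do not come from the source.

```python
# Counts transcribed from Table 3, keyed by (prompt language, chatbot).
# Each value is (suboptimal responses, temporal improvement, self-check).
counts = {
    ("English", "BARD"): (9, 6, 7),
    ("English", "ChatGPT 3.5"): (6, 4, 1),
    ("English", "ChatGPT 4"): (2, 2, 2),
    ("Chinese", "ERNIE"): (12, 11, 11),
    ("Chinese", "ChatGPT 3.5"): (9, 2, 1),
    ("Chinese", "ChatGPT 4"): (11, 6, 5),
}


def pct(part: int, total: int) -> float:
    """Return part/total as a percentage rounded to one decimal place.

    Assumption: the denominator is the number of suboptimal responses.
    """
    return round(100 * part / total, 1)


# Map each (language, chatbot) pair to (temporal improvement %, self-check %).
percentages = {
    key: (pct(temporal, n), pct(self_check, n))
    for key, (n, temporal, self_check) in counts.items()
}

for (language, chatbot), (temporal_pct, self_check_pct) in percentages.items():
    print(f"{language:8s} {chatbot:12s} temporal={temporal_pct}% self-check={self_check_pct}%")
```

Recomputing the values this way gives a quick cross-check of the reported percentages against the raw counts.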