Table 2. Performance of international LLMs in answering questions. Data are correct/attempted (%).
| | ChatGPT-4o | Gemini 2.0 Pro | Grok3 | χ² | P |
|---|---|---|---|---|---|
| Total | 317/400 (79.3) | 326/400 (81.5) | 269/335 (80.3) | 0.642 | 0.733 |
| Units of questions | | | | | |
| Unit 1 | 80/100 (80.0) | 78/100 (78.0) | 83/100 (83.0) | 0.802 | 0.670 |
| Unit 2 | 80/100 (80.0) | 90/100 (90.0) | 82/97 (84.5) | 3.900 | 0.142 |
| Unit 3 | 77/100 (77.0) | 85/100 (85.0) | 77/100 (77.0) | 2.634 | 0.268 |
| Unit 4 | 69/100 (69.0) | 73/100 (73.0) | 27/38 (71.1) | 0.389 | 0.823 |
| Text questions/image questions | | | | | |
| Text questions | 260/335 (77.6) | 281/335 (83.9) | 269/335 (80.3) | 4.238 | 0.120 |
| Image questions | 46/65 (70.8) | 45/65 (69.2) | - | 0.037 | 0.848 |
| Types of questions | | | | | |
| A1 | 196/254 (77.2) | 209/254 (82.3) | 202/252 (80.2) | 2.089 | 0.352 |
| A2-4 | 58/78 (74.4) | 61/78 (78.2) | 27/33 (81.8) | 0.803 | 0.669 |
| B | 37/41 (90.2) | 39/41 (95.1) | 34/41 (82.9) | 3.269 | 0.195 |
| C | 15/27 (55.6) | 17/27 (63.0) | 4/9 (44.4) | 0.994 | 0.608 |
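The χ² and P columns are consistent with Pearson chi-square tests of independence on a models × (correct/incorrect) contingency table for each row. Below is a minimal pure-Python sketch of that calculation for the Total row; the counts are taken from the table, the function name `chi_square_3x2` is our own, and `exp(-x/2)` is used as the exact survival function for a chi-square distribution with df = 2, standing in for a statistics library.

```python
import math

# Counts from the "Total" row of Table 2: correct answers and questions attempted.
correct = {"ChatGPT-4o": 317, "Gemini 2.0 Pro": 326, "Grok3": 269}
attempted = {"ChatGPT-4o": 400, "Gemini 2.0 Pro": 400, "Grok3": 335}

def chi_square_3x2(correct, attempted):
    """Pearson chi-square statistic for a models x (correct, incorrect) table."""
    wrong = {m: attempted[m] - correct[m] for m in correct}
    n = sum(attempted.values())
    col_correct = sum(correct.values())
    col_wrong = sum(wrong.values())
    stat = 0.0
    for m in correct:
        for obs, col_total in ((correct[m], col_correct), (wrong[m], col_wrong)):
            expected = attempted[m] * col_total / n  # row total * column total / N
            stat += (obs - expected) ** 2 / expected
    return stat

stat = chi_square_3x2(correct, attempted)
# df = (rows - 1) * (cols - 1) = (3 - 1) * (2 - 1) = 2, so P = exp(-stat / 2).
p = math.exp(-stat / 2)
print(round(stat, 3), round(p, 3))
```

This reproduces the statistic 0.642 reported in the Total row; the resulting P value may differ slightly from the tabulated 0.733 depending on rounding and the statistical software used.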