Table 2. Performance of international LLMs in answering questions

	ChatGPT-4o	Gemini 2.0 Pro	Grok3	χ²	P
Total	317/400 (79.3)	326/400 (81.5)	269/335 (80.3)	0.642	0.733
Question units
Unit 1	80/100 (80.0)	78/100 (78.0)	83/100 (83.0)	0.802	0.670
Unit 2	80/100 (80.0)	90/100 (90.0)	82/97 (84.5)	3.900	0.142
Unit 3	77/100 (77.0)	85/100 (85.0)	77/100 (77.0)	2.634	0.268
Unit 4	69/100 (69.0)	73/100 (73.0)	27/38 (71.1)	0.389	0.823
Question format
Text questions	260/335 (77.6)	281/335 (83.9)	269/335 (80.3)	4.238	0.120
Image questions	46/65 (70.8)	45/65 (69.2)	-	0.037	0.848
Question types
A1	196/254 (77.2)	209/254 (82.3)	202/252 (80.2)	2.089	0.352
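A2-4	58/78 (74.4)	61/78 (78.2)	27/33 (81.8)	0.803	0.669
B	37/41 (90.2)	39/41 (95.1)	34/41 (82.9)	3.269	0.195
C	15/27 (55.6)	17/27 (63.0)	4/9 (44.4)	0.994	0.608
Note: values are number of correct answers/number of questions (%). χ², chi-square statistic; -, no data.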
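The χ² and P columns compare the proportion of correct answers across models. As a minimal sketch, assuming each row is a Pearson chi-square test of independence on a 2 × k table of correct vs. incorrect answers with no continuity correction (an assumption, though it reproduces, for example, the Unit 1, Unit 3, text-question and image-question rows), the statistic can be computed as below. The function names are hypothetical, and the P value uses closed-form chi-square tail probabilities that only cover the 1 and 2 degrees of freedom that occur in this table.

```python
import math


def pearson_chi2(correct, totals):
    """Pearson chi-square statistic for a 2 x k table of correct/incorrect counts."""
    overall_rate = sum(correct) / sum(totals)  # pooled proportion correct
    chi2 = 0.0
    for c, n in zip(correct, totals):
        expected_correct = n * overall_rate
        expected_wrong = n - expected_correct
        # Sum (observed - expected)^2 / expected over both cells of this column.
        chi2 += (c - expected_correct) ** 2 / expected_correct
        chi2 += ((n - c) - expected_wrong) ** 2 / expected_wrong
    return chi2


def chi2_p_value(chi2, df):
    """Upper-tail P value of the chi-square distribution for df = 1 or 2."""
    if df == 1:
        return math.erfc(math.sqrt(chi2 / 2))
    if df == 2:
        return math.exp(-chi2 / 2)
    raise ValueError("this sketch only handles df = 1 or 2")


def compare(correct, totals):
    """Return (chi2, P), each rounded to three decimals as in the table."""
    chi2 = pearson_chi2(correct, totals)
    df = len(correct) - 1  # (2 rows - 1) * (k models - 1)
    return round(chi2, 3), round(chi2_p_value(chi2, df), 3)


print(compare([80, 78, 83], [100, 100, 100]))  # Unit 1 → (0.802, 0.67)
print(compare([46, 45], [65, 65]))             # Image questions → (0.037, 0.848)
```

A library routine such as `scipy.stats.chi2_contingency` should give the same statistic for the three-model rows; for two-model (2 × 2) rows it applies Yates' continuity correction by default, so `correction=False` would be needed to match the uncorrected value in this sketch.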
