Communications Medicine

Table 1 Overview of response length from LLM-Chatbots to cardiovascular disease prevention queries

From: Large language model comparisons between English and Chinese query performance for cardiovascular prevention

English Response Length	BARD	ChatGPT 3.5	ChatGPT 4	P_{BARD vs. 3.5}	P_{BARD vs. 4}	P_{3.5 vs. 4}	P_ANOVA
Words, mean (SD)	209.08 (70.82)	165.01 (55.60)	213.28 (83.67)	<0.001	0.74	<0.001	<0.001

Chinese Response Length	ERNIE	ChatGPT 3.5	ChatGPT 4	P_{ERNIE vs. 3.5}	P_{ERNIE vs. 4}	P_{3.5 vs. 4}	P_ANOVA
Words, mean (SD)	299.68 (119.10)	320.44 (100.54)	405.73 (134.86)	0.25	<0.001	<0.001	<0.001

The p-values in the table represent the following pairwise comparisons: P_{BARD vs. 3.5}: Google Bard vs. ChatGPT-3.5. P_{BARD vs. 4}: Google Bard vs. ChatGPT-4. P_{ERNIE vs. 3.5}: ERNIE vs. ChatGPT-3.5. P_{ERNIE vs. 4}: ERNIE vs. ChatGPT-4. P_{3.5 vs. 4}: ChatGPT-3.5 vs. ChatGPT-4. P_ANOVA: P-value from the ANOVA test comparing all three models for the given query language (Google Bard, ChatGPT-3.5, and ChatGPT-4 for English; ERNIE, ChatGPT-3.5, and ChatGPT-4 for Chinese).
SD standard deviation.
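
As an illustration of how an ANOVA F statistic relates to the means and SDs reported in a row of the table, the following minimal Python sketch computes a one-way ANOVA F statistic from summary statistics alone. It assumes independent groups; the table does not state which ANOVA variant was used. The function name `anova_f_from_summary` and the per-group sample size `N_PER_GROUP` are hypothetical: the number of responses per model is not given in the table, so the printed F value is illustrative only and should not be read as a reproduction of the published P_ANOVA.

```python
from math import fsum


def anova_f_from_summary(means, sds, ns):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    means, sds, ns: per-group sample means, sample standard deviations,
    and sample sizes. Assumes independent groups.
    """
    k = len(means)
    n_total = sum(ns)
    # Grand mean is the size-weighted average of the group means.
    grand_mean = fsum(n * m for n, m in zip(ns, means)) / n_total
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = fsum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    # Within-group sum of squares, recovered from each group's sample variance.
    ss_within = fsum((n - 1) * sd ** 2 for n, sd in zip(ns, sds))
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# English response length row of Table 1 (BARD, ChatGPT 3.5, ChatGPT 4).
means = [209.08, 165.01, 213.28]
sds = [70.82, 55.60, 83.67]

# ASSUMPTION: hypothetical group size; the table does not report n.
N_PER_GROUP = 100

f_stat, df1, df2 = anova_f_from_summary(means, sds, [N_PER_GROUP] * 3)
print(f"F({df1}, {df2}) = {f_stat:.2f}")
```

Because the within-group degrees of freedom depend on the sample size, the resulting F statistic changes with the assumed `N_PER_GROUP`; the actual value requires the per-model response counts from the article.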
