Table 1. Overview of response lengths of LLM chatbots to cardiovascular disease prevention queries

From: Large language model comparisons between English and Chinese query performance for cardiovascular prevention

English Response Length

Measure             BARD              ChatGPT 3.5       ChatGPT 4         P (BARD vs. 3.5)   P (BARD vs. 4)   P (3.5 vs. 4)   P (ANOVA)
Words, mean (SD)    209.08 (70.82)    165.01 (55.60)    213.28 (83.67)    <0.001             0.74             <0.001          <0.001

Chinese Response Length

Measure             ERNIE             ChatGPT 3.5       ChatGPT 4         P (ERNIE vs. 3.5)  P (ERNIE vs. 4)  P (3.5 vs. 4)   P (ANOVA)
Words, mean (SD)    299.68 (119.10)   320.44 (100.54)   405.73 (134.86)   0.25               <0.001           <0.001          <0.001

  1. The P values in the table correspond to the following comparisons. BARD vs. 3.5: Google Bard versus ChatGPT-3.5. BARD vs. 4: Google Bard versus ChatGPT-4. ERNIE vs. 3.5: ERNIE versus ChatGPT-3.5. ERNIE vs. 4: ERNIE versus ChatGPT-4. 3.5 vs. 4: ChatGPT-3.5 versus ChatGPT-4. ANOVA: P value from the ANOVA comparing all three models in each language (Google Bard or ERNIE, ChatGPT-3.5, and ChatGPT-4).
  2. SD, standard deviation.
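The ANOVA footnote describes a single test across three models. As a minimal sketch of how such an F statistic can be recovered from summary statistics like those above, the Python below computes a one-way ANOVA from group means, SDs, and sizes. The per-group sample size N_PER_GROUP = 75 is a hypothetical placeholder, because the number of queries is not stated in this table, so the resulting F is illustrative only and is not expected to reproduce the reported P values. The names GroupSummary and one_way_anova_f are likewise hypothetical, and the pairwise P values come from tests not specified here, so they are not modeled.

```python
from dataclasses import dataclass


@dataclass
class GroupSummary:
    """Summary statistics for one model's responses (hypothetical helper type)."""
    name: str
    mean: float
    sd: float
    n: int


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA from summary statistics."""
    k = len(groups)
    total_n = sum(g.n for g in groups)
    # Grand mean weighted by group size.
    grand_mean = sum(g.n * g.mean for g in groups) / total_n
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(g.n * (g.mean - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares reconstructed from each group's SD.
    ss_within = sum((g.n - 1) * g.sd ** 2 for g in groups)
    df_between = k - 1
    df_within = total_n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical sample size per model; the table does not report it.
N_PER_GROUP = 75

english = [
    GroupSummary("BARD", 209.08, 70.82, N_PER_GROUP),
    GroupSummary("ChatGPT 3.5", 165.01, 55.60, N_PER_GROUP),
    GroupSummary("ChatGPT 4", 213.28, 83.67, N_PER_GROUP),
]

f_stat, df1, df2 = one_way_anova_f(english)
print(f"F({df1}, {df2}) = {f_stat:.2f}")
```

A P value could then be taken from the upper tail of the F distribution with (df1, df2) degrees of freedom, for example with scipy.stats.f.sf(f_stat, df1, df2) if SciPy is available.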