Table 3 Comparison of the accuracy of three prompting methods with ChatGPT on USMLE Step 1 sample questions, GPT-4-generated clinical (non-calculation) questions, and GPT-4-generated calculation questions. P values were calculated with the chi-square test.

From: Evaluating prompt engineering on GPT-3.5’s performance in USMLE-style medical calculations and clinical scenarios generated by GPT-4

| Question set | ChatGPT direct prompt | ChatGPT CoT prompt | ChatGPT modified CoT prompt | P value |
|---|---|---|---|---|
| USMLE Step 1 sample | 54/95 (61.7%) | 59/95 (62.8%) | 58/95 (57.4%) | 0.734 |
| GPT-4 clinical questions | 270/500 (54.0%) | 274/500 (54.8%) | 257/500 (51.4%) | 0.530 |
| GPT-4 calculation questions | 397/500 (79.4%) | 398/500 (79.6%) | 386/500 (77.2%) | 0.589 |
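Each P value compares the proportion of correct answers across the three prompts. The following is a minimal sketch of that computation. It assumes a Pearson chi-square test of independence on a 3 × 2 table (correct vs. incorrect per prompt) with no continuity correction; the caption does not state the exact variant, and the helper names `chi2_sf` and `chi_square_k_by_2` are hypothetical. With three prompts the test has (3 - 1)(2 - 1) = 2 degrees of freedom. For even degrees of freedom the upper-tail probability has a closed form (exp(-x/2) when df = 2), so no third-party library is needed.

```python
import math

# (correct, total) for the direct, CoT, and modified CoT prompts,
# taken from the two GPT-4 rows of Table 3.
TABLE_ROWS = {
    "GPT-4 clinical questions": [(270, 500), (274, 500), (257, 500)],
    "GPT-4 calculation questions": [(397, 500), (398, 500), (386, 500)],
}


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable.

    Hypothetical helper: only even, positive df are supported, using the
    closed form exp(-x/2) * sum_{i < df/2} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles even, positive df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def chi_square_k_by_2(groups):
    """Pearson chi-square test (no continuity correction) for a k x 2 table.

    Each group is (correct, total). Returns (statistic, df, p_value).
    """
    grand_correct = sum(c for c, _ in groups)
    grand_total = sum(n for _, n in groups)
    pooled = grand_correct / grand_total  # overall proportion correct
    stat = 0.0
    for correct, total in groups:
        expected_correct = total * pooled
        expected_incorrect = total * (1 - pooled)
        stat += (correct - expected_correct) ** 2 / expected_correct
        stat += ((total - correct) - expected_incorrect) ** 2 / expected_incorrect
    df = len(groups) - 1  # (k - 1) * (2 - 1)
    return stat, df, chi2_sf(stat, df)


for name, groups in TABLE_ROWS.items():
    stat, df, p = chi_square_k_by_2(groups)
    accuracies = ", ".join(f"{100 * c / n:.1f}%" for c, n in groups)
    print(f"{name}: {accuracies}; chi2 = {stat:.3f}, df = {df}, P = {p:.3f}")
```

Applied to the two GPT-4 rows, this sketch reproduces the reported P values (0.530 and 0.589) to three decimals. An alternative is `scipy.stats.chi2_contingency` run on the same 3 × 2 counts. Its Yates correction is applied only when there is one degree of freedom, so it should give the same results here.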