Table 2. Error counts (mean, with standard deviation in parentheses) of the LLMs’ responses on the 1050 RD exam questions.

From: Evaluation of LLMs accuracy and consistency in the registered dietitian exam through prompt engineering and knowledge retrieval

| Benchmark | Prompt | GPT-4o | Claude 3.5 S. | Gemini 1.5 P. |
|---|---|---|---|---|
| RD Exam | Zero Shot | 84.8 (2.93) | 104.6 (1.02) | 96.8 (1.17) |
| | Chain of Thought | 59.6 (1.85) | 80.6 (2.87) | 117.4 (6.62) |
| | Chain of Thought w. Self Consistency | 58.0 (2.28) | 77.0 (1.67) | 104.8 (4.12) |
| | Retrieval Augmented Prompting | 75.8 (2.86) | 113.2 (1.94) | 108.6 (1.20) |
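
The error counts in Table 2 can be read as accuracies by dividing by the 1050 questions stated in the caption. The following is a minimal sketch of that arithmetic, not code from the paper: the names `TOTAL_QUESTIONS`, `MEAN_ERRORS`, `accuracy_pct`, and `best_prompt` are hypothetical, and only the mean error counts are copied from the table (standard deviations are omitted).

```python
# Illustrative sketch: convert Table 2 mean error counts into accuracy (%).
# All names here are assumptions made for this example.

TOTAL_QUESTIONS = 1050  # number of RD exam questions, per the table caption

# Mean error counts copied from Table 2 (standard deviations omitted).
MEAN_ERRORS = {
    "Zero Shot": {
        "GPT-4o": 84.8, "Claude 3.5 S.": 104.6, "Gemini 1.5 P.": 96.8,
    },
    "Chain of Thought": {
        "GPT-4o": 59.6, "Claude 3.5 S.": 80.6, "Gemini 1.5 P.": 117.4,
    },
    "Chain of Thought w. Self Consistency": {
        "GPT-4o": 58.0, "Claude 3.5 S.": 77.0, "Gemini 1.5 P.": 104.8,
    },
    "Retrieval Augmented Prompting": {
        "GPT-4o": 75.8, "Claude 3.5 S.": 113.2, "Gemini 1.5 P.": 108.6,
    },
}


def accuracy_pct(mean_errors, total=TOTAL_QUESTIONS):
    """Return accuracy in percent for a given mean error count."""
    return 100.0 * (1.0 - mean_errors / total)


def best_prompt(model):
    """Return the prompt with the lowest mean error count for a model."""
    return min(MEAN_ERRORS, key=lambda prompt: MEAN_ERRORS[prompt][model])


if __name__ == "__main__":
    for prompt, by_model in MEAN_ERRORS.items():
        for model, errors in by_model.items():
            print(f"{prompt} | {model}: {accuracy_pct(errors):.2f}%")
```

Calling `best_prompt("GPT-4o")` picks the row with the smallest mean error count for that model. The comparison uses means only, so it says nothing about whether the differences between prompts are statistically meaningful.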