Table 2 Performance comparison stratified by question difficulty for the sleep professional examination

From: A personal health large language model for sleep and fitness coaching

Difficulty        Count   Expert   Gemini Ultra 1.0   PH-LLM
Easy (90–100%)    214     90%      94%                95%
Medium (75–90%)   204     81%      78%                80%
Hard (0–75%)      211     53%      55%                57%

  1. Performance of PH-LLM is compared to that of Gemini Ultra 1.0 and human experts. Questions were classified as ‘Easy’, ‘Medium’ or ‘Hard’ based on the reported percentage range of human test takers who answered the corresponding questions correctly. Bold values indicate the highest performance.
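
The difficulty binning described in the note can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `classify_difficulty` is hypothetical, and because the table does not say how values falling exactly on a range edge (75% or 90%) were assigned, the sketch assumes they go to the easier bin.

```python
def classify_difficulty(percent_correct: float) -> str:
    """Map the percentage of human test takers who answered a question
    correctly to a difficulty label, using the ranges given in the table."""
    if not 0 <= percent_correct <= 100:
        raise ValueError("percent_correct must be between 0 and 100")
    # Assumption: a value exactly on a boundary (75 or 90) is placed in the
    # easier bin; the source does not state how such ties were resolved.
    if percent_correct >= 90:
        return "Easy"
    if percent_correct >= 75:
        return "Medium"
    return "Hard"


if __name__ == "__main__":
    # Illustrative inputs only; these are not values from the examination data.
    for pct in (95.0, 80.0, 40.0):
        print(pct, classify_difficulty(pct))
```

Under the opposite boundary convention, the two `>=` comparisons would become `>`.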