Fig. 1: Domain-level success rates by platform and disease type.
From: Evaluating large language models for pharmacotherapy simulations: a mixed-methods study

Proportion of sessions meeting domain-specific pass/fail criteria: Clinical Accuracy & Safety and Clinical Reasoning Fidelity required all subdomains ≥4.0; Instructional Design Quality required all subdomains >3.0 with mean >4.0. A Overall success across the three domains. B Comparison across LLM platforms and overall success. C Comparison by disease type. Error bars = standard error.
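
The pass/fail rules in the caption can be sketched as below. This is only an illustration, not the study's analysis code. It assumes the first threshold reads "at least 4.0", that each session has a list of subdomain scores per domain, and that the error bars use the usual binomial standard error of a proportion, which the caption does not state. All function names are hypothetical.

```python
from math import sqrt


def passes_all_at_least(scores, threshold=4.0):
    """Rule for Clinical Accuracy & Safety and Clinical Reasoning Fidelity:
    every subdomain score must be at least the threshold (assumed >= 4.0)."""
    if not scores:
        raise ValueError("scores must not be empty")
    return all(s >= threshold for s in scores)


def passes_instructional_design(scores, floor=3.0, mean_floor=4.0):
    """Rule for Instructional Design Quality: every subdomain score must be
    strictly above the floor AND the mean must be strictly above mean_floor."""
    if not scores:
        raise ValueError("scores must not be empty")
    mean = sum(scores) / len(scores)
    return all(s > floor for s in scores) and mean > mean_floor


def success_rate(flags):
    """Proportion of sessions that passed, with a binomial standard error.
    The SE formula sqrt(p * (1 - p) / n) is an assumption, not from the source."""
    n = len(flags)
    if n == 0:
        raise ValueError("flags must not be empty")
    p = sum(flags) / n
    return p, sqrt(p * (1 - p) / n)
```

For example, `passes_instructional_design([3.5, 4.5, 4.5])` passes because every score exceeds 3.0 and the mean exceeds 4.0, whereas `[3.0, 5.0, 5.0]` fails because 3.0 is not strictly greater than 3.0.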