Fig. 1: Diagnostic accuracy of LLMs by case (OCD) and control (other psychiatric disorders).

Overall LLM performance: case (N = 49/51) and control (N = 21/21). ChatGPT-4 and Gemini Pro were each evaluated on 16 OCD vignettes, and Llama 3 on 19 OCD vignettes. All LLMs shared the same control group, comprising seven psychiatric disorders (major depressive disorder, generalized anxiety disorder, post-traumatic stress disorder, unipolar or bipolar depression, depression among adolescents, social anxiety disorder, and panic disorder).