Extended Data Fig. 2: Error pattern distributions across the five core PsychBench tasks. | Nature Machine Intelligence

From: A domain-adapted large language model to support clinicians in psychiatric clinical practice

Bar plots summarize the major categories of model errors for each task. In Task 1 (clinical information summarization), most errors arose from onset-pattern misjudgment. Tasks 2 and 3 (diagnosis and differential diagnosis) were dominated by inaccuracies in associated-symptom assessment. In Task 4 (medication recommendation), overly conservative treatment decisions accounted for the majority of errors. In Task 5 (long-term course management), limitations were primarily attributable to poor retention of remote and detailed information. Percentages represent the proportion of each error type within the task-specific error set.
