Table 2. Results of representative LLMs on the proposed PDCH dataset.

From: A Multimodal Depression Consultation Dataset of Speech and Text with HAMD-17 Assessments

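| Model | Modality | Precision | Recall | F1 |
|---|---|---|---|---|
| GPT4o-mini-audio-preview | audio | 0.383 | 0.375 | 0.379 |
| GPT4o-mini-audio-preview | text | 0.405 | 0.400 | 0.403 |
| GPT4o-mini-audio-preview | text+audio | 0.412 | 0.403 | 0.407 |
| Qwen2.5-Omni-7B | audio | 0.383 | 0.385 | 0.384 |
| Qwen2.5-Omni-7B | text | 0.428 | 0.430 | 0.429 |
| Qwen2.5-Omni-7B | text+audio | 0.431 | 0.433 | 0.432 |
| Qwen2-Audio-7B-Instruct | audio | 0.114 | 0.107 | 0.111 |
| Qwen2-Audio-7B-Instruct | text | 0.149 | 0.137 | 0.143 |
| Qwen2-Audio-7B-Instruct | text+audio | 0.130 | 0.122 | 0.126 |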

  1. Bold denotes the best performance.