Table 2. Results of representative LLMs on the proposed PDCH dataset.
From: A Multimodal Depression Consultation Dataset of Speech and Text with HAMD-17 Assessments
| Model | Modality | Precision | Recall | F1 |
|---|---|---|---|---|
| GPT4o-mini-audio-preview | audio | 0.383 | 0.375 | 0.379 |
| GPT4o-mini-audio-preview | text | 0.405 | 0.400 | 0.403 |
| GPT4o-mini-audio-preview | text+audio | 0.412 | 0.403 | 0.407 |
| Qwen2.5-Omni-7B | audio | 0.383 | 0.385 | 0.384 |
| Qwen2.5-Omni-7B | text | 0.428 | 0.430 | 0.429 |
| Qwen2.5-Omni-7B | text+audio | 0.431 | 0.433 | 0.432 |
| Qwen2-Audio-7B-Instruct | audio | 0.114 | 0.107 | 0.111 |
| Qwen2-Audio-7B-Instruct | text | 0.149 | 0.137 | 0.143 |
| Qwen2-Audio-7B-Instruct | text+audio | 0.130 | 0.122 | 0.126 |
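
The F1 column can be sanity-checked against the precision and recall columns. The following is a minimal sketch, assuming F1 is the harmonic mean of the reported precision and recall. The table does not state the averaging scheme, so differences in the last digit may occur. The names `f1_from_pr`, `ROWS`, and `max_f1_gap` are illustrative and do not come from the paper.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (model, modality, precision, recall, reported F1), copied from Table 2.
ROWS = [
    ("GPT4o-mini-audio-preview", "audio", 0.383, 0.375, 0.379),
    ("GPT4o-mini-audio-preview", "text", 0.405, 0.400, 0.403),
    ("GPT4o-mini-audio-preview", "text+audio", 0.412, 0.403, 0.407),
    ("Qwen2.5-Omni-7B", "audio", 0.383, 0.385, 0.384),
    ("Qwen2.5-Omni-7B", "text", 0.428, 0.430, 0.429),
    ("Qwen2.5-Omni-7B", "text+audio", 0.431, 0.433, 0.432),
    ("Qwen2-Audio-7B-Instruct", "audio", 0.114, 0.107, 0.111),
    ("Qwen2-Audio-7B-Instruct", "text", 0.149, 0.137, 0.143),
    ("Qwen2-Audio-7B-Instruct", "text+audio", 0.130, 0.122, 0.126),
]


def max_f1_gap(rows) -> float:
    """Largest absolute gap between recomputed and reported F1 across rows."""
    return max(abs(f1_from_pr(p, r) - f1) for _, _, p, r, f1 in rows)


if __name__ == "__main__":
    for model, modality, p, r, reported in ROWS:
        recomputed = f1_from_pr(p, r)
        print(f"{model:28s} {modality:11s} reported={reported:.3f} recomputed={recomputed:.3f}")
    print(f"max gap: {max_f1_gap(ROWS):.4f}")
```

Under this assumption, every recomputed value stays within roughly 0.001 of the reported F1, which is consistent with three-decimal rounding of the inputs.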