Table 3 The results of some representative LLMs on the MODMA dataset and our proposed dataset PDCH.

From: A Multimodal Depression Consultation Dataset of Speech and Text with HAMD-17 Assessments

| Models | Input | Datasets: MODMA | Datasets: PDCH (Ours) |
| --- | --- | --- | --- |
| GPT4o-mini-audio-preview | audio | 0.423 | 0.303 |
| Qwen2.5-Omni-7B | audio | 0.654 | 0.364 |
| Qwen2-Audio-7B-Instruct | audio | 0.346 | 0.061 |