Table 4. The performance of LLMs under imbalanced length of doctor-patient conversations.
From: A Multimodal Depression Consultation Dataset of Speech and Text with HAMD-17 Assessments
| Models | Imbalance | Training | Precision | Recall | F1 |
|---|---|---|---|---|---|
| Qwen2.5-7B-Instruct | Large | No | 0.427 | 0.430 | 0.429 |
| Qwen2.5-7B-Instruct | Large | Yes | 0.609 | 0.613 | 0.611 |
| Qwen2.5-7B-Instruct | Small | No | 0.387 | 0.388 | 0.387 |
| Qwen2.5-7B-Instruct | Small | Yes | 0.543 | 0.544 | 0.543 |
| LLaMA3.1-8B-Instruct | Large | No | 0.448 | 0.451 | 0.449 |
| LLaMA3.1-8B-Instruct | Large | Yes | 0.584 | 0.588 | 0.586 |
| LLaMA3.1-8B-Instruct | Small | No | 0.396 | 0.397 | 0.397 |
| LLaMA3.1-8B-Instruct | Small | Yes | 0.571 | 0.571 | 0.571 |