Table 1. Comparison of existing depression detection datasets.

Datasets	Language	Modality	#Inst.	#Persons	Length	Scale	Data Source
AVEC2013²⁰	German	video, audio	150	292	—	BDI-II	human-computer interaction
AVEC2014²¹	German	video, audio	300	84	274 min	BDI-II	human-computer interaction
DAIC-WoZ¹⁶	English	video, audio, text	189	193	2,756 min	PHQ-8	human-computer interaction
E-DAIC³⁵	English	video, audio, text	275	351	4,282 min	PHQ-8	human-computer interaction
BlackDog¹⁷	English	video, audio, text	60	60	—	DSM-IV	answering open-ended questions
Mundt³⁶	English	audio	35	35	—	HAMD-17, QIDS	automated telephone interface
MODMA¹⁹	Chinese	EEG, audio	53	53	431 min	PHQ-9	real-world clinical consultation
DepressionEmo²⁵	English	text	6,037	—	—	—	Reddit posts
WU3D²⁴	Chinese	text	—	30,000	—	—	Weibo posts
PDCH (Ours)	Chinese	audio, text	100	100	2,937 min	HAMD-17	real-world clinical consultation

“#Inst.” and “#Persons” denote the number of instances and participants, respectively. “Length” is the total duration of all audio recordings.
