Table 2. Results of comparative and ablation analysis of SpeechCARE components

Model | Validation AUC | Validation F1-score | Test AUC | Test F1-score |
|---|---|---|---|---|
Acoustic-only refinements | | | | |
mHuBERT (Base Model) | 84.03 ± 1.09 | 66.78 ± 2.27 | 84.07 ± 0.60 | 66.80 ± 1.25 |
mHuBERT + CLS Embedding | 84.92 ± 1.23 | 68.06 ± 1.83 | 84.99 ± 0.60 | 67.77 ± 1.06 |
mHuBERT + CLS Embedding + Segmentation | 84.55 ± 1.14 | 67.60 ± 1.56 | 84.85 ± 0.70 | 68.23 ± 1.11 |
Single-modality baselines and modality integration | | | | |
All Demographics (Age, Gender, Education) | 72.78 ± 0.79 | 55.82 ± 0.80 | 72.31 ± 0.71 | 55.70 ± 0.43 |
Voice (mHuBERT + CLS Embedding + Segmentation) | 84.55 ± 1.14 | 67.60 ± 1.56 | 84.85 ± 0.70 | 68.23 ± 1.11 |
Transcription (mGTE) | 81.26 ± 1.17 | 63.70 ± 1.26 | 85.00 ± 0.40 | 68.88 ± 0.78 |
Fusion-AGF: Voice + Transcription | 84.42 ± 1.95 | 67.57 ± 2.36 | 86.57 ± 0.45 | 70.51 ± 0.93 |
Fusion-AGF: Voice + Transcription + All demographics | 83.49 ± 0.96 | 66.14 ± 1.74 | 85.49 ± 0.69 | 68.42 ± 0.78 |
Fusion-AGF: Voice + Transcription + Education | 83.19 ± 2.09 | 66.02 ± 2.87 | 85.99 ± 0.68 | 69.20 ± 1.00 |
Fusion-AGF: Voice + Transcription + Gender | 84.02 ± 1.67 | 66.78 ± 2.18 | 86.35 ± 0.45 | 69.95 ± 0.68 |
Fusion-AGF: Voice + Transcription + Age | 84.97 ± 1.57 | 68.12 ± 2.69 | 86.83 ± 0.46 | 72.11 ± 0.44 |
Fusion strategies | | | | |
Intermediate Fusion | 85.07 ± 1.35 | 67.94 ± 2.09 | 86.28 ± 0.48 | 70.10 ± 0.78 |
Scaled Late Fusion | 85.19 ± 1.38 | 68.62 ± 2.77 | 86.21 ± 0.57 | 70.29 ± 1.13 |
Cross-Modal Attention (+Intermediate Fusion) | 85.49 ± 1.50 | 68.58 ± 2.05 | 86.61 ± 0.56 | 70.51 ± 0.71 |
Adaptive Gating Fusion (AGF) | 84.97 ± 1.57 | 68.12 ± 2.69 | 86.83 ± 0.46 | 72.11 ± 0.44 |
Noise reduction (audio preprocessing) | | | | |
SpeechCARE-AGF: Raw Audio | 84.25 ± 1.33 | 85.76 ± 0.80 | 85.76 ± 0.80 | 69.15 ± 1.01 |
SpeechCARE-AGF: CMGAN-Enhanced Audio | 83.22 ± 1.89 | 65.34 ± 2.26 | 85.80 ± 1.10 | 69.00 ± 1.12 |
SpeechCARE-AGF: Low-Pass Filtered Audio | 84.97 ± 1.57 | 68.12 ± 2.69 | 86.83 ± 0.46 | 72.11 ± 0.44 |