Table 2 Results of comparative and ablation analysis of SpeechCARE components

From: SpeechCARE: dynamic multimodal modeling for cognitive screening in diverse linguistic and speech task contexts

| Model | Validation AUC | Validation F1-score | Test AUC | Test F1-score |
| --- | --- | --- | --- | --- |
| Acoustic-only refinements | | | | |
| mHuBERT (Base Model) | 84.03 ± 1.09 | 66.78 ± 2.27 | 84.07 ± 0.60 | 66.80 ± 1.25 |
| mHuBERT + CLS Embedding | 84.92 ± 1.23 | 68.06 ± 1.83 | 84.99 ± 0.60 | 67.77 ± 1.06 |
| mHuBERT + CLS Embedding + Segmentation | 84.55 ± 1.14 | 67.60 ± 1.56 | 84.85 ± 0.70 | 68.23 ± 1.11 |
| Single-modality baselines and modalities integration | | | | |
| All Demographics (Age, Gender, Education) | 72.78 ± 0.79 | 55.82 ± 0.80 | 72.31 ± 0.71 | 55.70 ± 0.43 |
| Voice (mHuBERT + CLS Embedding + Segmentation) | 84.55 ± 1.14 | 67.60 ± 1.56 | 84.85 ± 0.70 | 68.23 ± 1.11 |
| Transcription (mGTE) | 81.26 ± 1.17 | 63.70 ± 1.26 | 85.00 ± 0.40 | 68.88 ± 0.78 |
| Fusion-AGF: Voice + Transcription | 84.42 ± 1.95 | 67.57 ± 2.36 | 86.57 ± 0.45 | 70.51 ± 0.93 |
| Fusion-AGF: Voice + Transcription + All demographics | 83.49 ± 0.96 | 66.14 ± 1.74 | 85.49 ± 0.69 | 68.42 ± 0.78 |
| Fusion-AGF: Voice + Transcription + Education | 83.19 ± 2.09 | 66.02 ± 2.87 | 85.99 ± 0.68 | 69.20 ± 1.00 |
| Fusion-AGF: Voice + Transcription + Gender | 84.02 ± 1.67 | 66.78 ± 2.18 | 86.35 ± 0.45 | 69.95 ± 0.68 |
| Fusion-AGF: Voice + Transcription + Age | 84.97 ± 1.57 | 68.12 ± 2.69 | 86.83 ± 0.46 | 72.11 ± 0.44 |
| Fusion strategies | | | | |
| Intermediate Fusion | 85.07 ± 1.35 | 67.94 ± 2.09 | 86.28 ± 0.48 | 70.10 ± 0.78 |
| Scaled Late Fusion | 85.19 ± 1.38 | 68.62 ± 2.77 | 86.21 ± 0.57 | 70.29 ± 1.13 |
| Cross-Modal Attention (+Intermediate Fusion) | 85.49 ± 1.50 | 68.58 ± 2.05 | 86.61 ± 0.56 | 70.51 ± 0.71 |
| Adaptive Gating Fusion (AGF) | 84.97 ± 1.57 | 68.12 ± 2.69 | 86.83 ± 0.46 | 72.11 ± 0.44 |
| Noise reduction (audio preprocessing) | | | | |
| SpeechCARE-AGF: Raw Audio | 84.25 ± 1.33 | 85.76 ± 0.80 | 85.76 ± 0.80 | 69.15 ± 1.01 |
| SpeechCARE-AGF: CMGAN-Enhanced Audio | 83.22 ± 1.89 | 65.34 ± 2.26 | 85.80 ± 1.10 | 69.00 ± 1.12 |
| SpeechCARE-AGF: Low-Pass Filtered Audio | 84.97 ± 1.57 | 68.12 ± 2.69 | 86.83 ± 0.46 | 72.11 ± 0.44 |

  1. The best F1-score of each category is presented in bold.