Table 3. Detailed thresholds and performance of each detector on the development set. The MaxDP strategy ensures high precision across all detectors, even though any single detector may miss detections (low recall).
From: Brain-inspired perception-decision machine for fake speech detection
| Domain | Detector ID | Model architecture | Threshold (\(\theta_{opt}\)) | Precision (%) | Recall (%) |
|---|---|---|---|---|---|
| Time-Sequence (SA) | SA_1 | SincNet + ResBlock | 0.4614 | 100.0 | 52.5 |
| Time-Sequence (SA) | SA_2 | SincNet + ResBlock | 0.1305 | 100.0 | 48.4 |
| Time-Sequence (SA) | SA_3 | SincNet + ResBlock | 0.2726 | 100.0 | 30.2 |
| Time-Sequence (SA) | SA_4 | SincNet + ResBlock | 0.4573 | 100.0 | 35.1 |
| Time-Sequence (SA) | SA_5 | SincNet + ResBlock | 0.2561 | 100.0 | 49.3 |
| Time-Sequence (SA) | SA_6 | SincNet + ResBlock | 0.9163 | 100.0 | 11.0 |
| Frequency (FA) | FA_1 | ResNet-18 | 0.9906 | 100.0 | 15.2 |
| Frequency (FA) | FA_2 | ResNet-18 | 0.8107 | 100.0 | 32.1 |
| Frequency (FA) | FA_3 | ResNet-18 | 0.9912 | 100.0 | 18.5 |
| Frequency (FA) | FA_4 | ResNet-18 | 0.9933 | 100.0 | 38.0 |
| Frequency (FA) | FA_5 | ResNet-18 | 0.8679 | 100.0 | 20.2 |
| Frequency (FA) | FA_6 | ResNet-18 | 0.9979 | 100.0 | 19.1 |
| Time-Freq (PA) | PA_1 | CNN + GRU | 0.9944 | 100.0 | 12.5 |
| Time-Freq (PA) | PA_2 | CNN + GRU | 0.9963 | 100.0 | 10.1 |
| Time-Freq (PA) | PA_3 | CNN + GRU | 0.9966 | 100.0 | 15.3 |
| Time-Freq (PA) | PA_4 | CNN + GRU | 0.9922 | 100.0 | 8.4 |
| Time-Freq (PA) | PA_5 | CNN + GRU | 0.9995 | 100.0 | 23.0 |
| Time-Freq (PA) | PA_6 | CNN + GRU | 0.9959 | 100.0 | 16.1 |
| Phoneme (WA) | WA_1 | XLSR + CNN | 0.9993 | 100.0 | 30.5 |
| Phoneme (WA) | WA_2 | XLSR + CNN | 0.3345 | 100.0 | 48.1 |
| Phoneme (WA) | WA_3 | XLSR + CNN | 0.9985 | 100.0 | 32.4 |
| Phoneme (WA) | WA_4 | XLSR + CNN | 0.1319 | 100.0 | 55.6 |
| Phoneme (WA) | WA_5 | XLSR + CNN | 0.4596 | 100.0 | 29.8 |
| Phoneme (WA) | WA_6 | XLSR + CNN | 0.9998 | 100.0 | 15.0 |
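
Every row of the table pairs a threshold with 100% precision and a comparatively low recall. This excerpt does not define how MaxDP derives \(\theta_{opt}\), so the following is only a minimal sketch of one way such a threshold could be chosen, under the assumptions that higher scores mean "more likely fake", that a sample is flagged when its score is at least the threshold, and that the goal is the lowest threshold with no false positives on the development set. The function name `pick_threshold_for_full_precision` and the toy scores are hypothetical and do not come from the paper.

```python
import numpy as np


def pick_threshold_for_full_precision(scores, labels):
    """Return (threshold, precision, recall) for the lowest threshold
    that produces no false positives, or None if no fake sample
    scores above every bona fide sample.

    Assumptions (not taken from the paper):
      - scores: higher means more likely fake
      - labels: 1 = fake, 0 = bona fide
      - a sample is flagged as fake when score >= threshold
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)

    genuine = scores[labels == 0]
    fake = scores[labels == 1]

    # A threshold strictly above the highest bona fide score flags no bona fide sample.
    max_genuine = genuine.max() if genuine.size else -np.inf

    # Use the smallest fake score above that level, which keeps recall as high as possible.
    candidates = fake[fake > max_genuine]
    if candidates.size == 0:
        return None
    threshold = candidates.min()

    flagged = scores >= threshold
    tp = np.sum(flagged & (labels == 1))
    fp = np.sum(flagged & (labels == 0))
    precision = tp / (tp + fp)
    recall = tp / fake.size
    return float(threshold), float(precision), float(recall)


# Toy development set: two bona fide and four fake utterances.
dev_scores = [0.1, 0.4, 0.35, 0.8, 0.9, 0.3]
dev_labels = [0, 0, 1, 1, 1, 1]
print(pick_threshold_for_full_precision(dev_scores, dev_labels))
# prints (0.8, 1.0, 0.5)
```

In the toy example, two fake samples score below the highest bona fide score and are missed, which is the same trade-off visible in the table: precision stays at 100% while recall drops.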