Table 5 Zero-shot transfer test on naturalistic speech mixtures spatialized with back-microphone HRTFs. Each cell shows the score for the metric, followed in parentheses by the difference from the score for the mixtures spatialized with the T-microphone HRTFs (as reported in Table 1), where difference = T-microphone score − back-microphone score.
| Input | Spatial cues | SI-SDRi | STOI | STOIi | PESQ | PESQi |
|---|---|---|---|---|---|---|
| One-channel | Latent | 7.15 (+0.09) | 0.77 (-0.01) | 0.14 (+0.01) | 2.29 (+0.06) | 0.46 (+0.00) |
| Two-channel, bilateral | Latent | 7.86 (+0.07) | 0.79 (+0.00) | 0.16 (+0.00) | 2.36 (+0.00) | 0.53 (+0.00) |
| Two-channel, bilateral, IPD | Latent & pre-computed | 9.19 (+0.00) | 0.82 (+0.00) | 0.19 (+0.00) | 2.59 (+0.00) | 0.75 (+0.00) |
| Two-channel, bilateral, ILD | Latent & pre-computed | 8.40 (-0.40) | 0.80 (-0.01) | 0.17 (+0.00) | 2.46 (-0.06) | 0.62 (-0.06) |
| Two-channel, bilateral, IPD, ILD | Latent & pre-computed | 8.63 (+0.11) | 0.81 (+0.00) | 0.18 (+0.01) | 2.52 (+0.03) | 0.69 (+0.02) |
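
As a reading aid, the sketch below shows how a cell of the table can be unpacked under the caption's definition (difference = T-microphone score minus back-microphone score). It assumes that the first number in a cell is the back-microphone score and that the parenthesized number is the difference, so the implied T-microphone score is their sum. The function names `parse_cell` and `t_mic_score` are hypothetical helpers, not part of the original work.

```python
import re

# Matches cells such as "7.15 (+0.09)" or "8.40 (-0.40)".
_CELL = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*\(([+-]?\d+(?:\.\d+)?)\)\s*$")


def parse_cell(cell: str) -> tuple[float, float]:
    """Split a table cell into (score, difference)."""
    match = _CELL.match(cell)
    if match is None:
        raise ValueError(f"unrecognized cell format: {cell!r}")
    return float(match.group(1)), float(match.group(2))


def t_mic_score(cell: str) -> float:
    """Implied T-microphone score, assuming the cell holds the
    back-microphone score and difference = T-mic - back-mic."""
    back_score, difference = parse_cell(cell)
    # Round to the two decimals used in the table.
    return round(back_score + difference, 2)


if __name__ == "__main__":
    # One-channel SI-SDRi cell: 7.15 + 0.09
    print(t_mic_score("7.15 (+0.09)"))
    # Two-channel, bilateral, ILD SI-SDRi cell: 8.40 - 0.40
    print(t_mic_score("8.40 (-0.40)"))
```

Because the published values are rounded to two decimals, scores recovered this way may differ from those in Table 1 by up to about 0.01.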