Table 7. Image fidelity metrics for the different ablation systems.
| # | System | Cond. scale | Average PSNR (dB) | Average SSIM |
|---|---|---|---|---|
| 1 | Only image guide | – | 16.74 | 0.4 |
| 2 | Full input | 8 | 14.37 | 0.27 |
| 3 | No image guide | 2 | 10.46 | 0.22 |
| 4 | No speaker embedding | 8 | 16.29 | 0.37 |
| 5 | No audio | 4 | 16.5 | 0.39 |
| 6 | Only audio | 2 | 10.57 | 0.22 |
| 7 | Speaker embedding only | 4 | 10.52 | 0.2 |
| 8 | No additional attributes | 4 | 15.71 | 0.34 |
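
For readers unfamiliar with the PSNR column, the following is a minimal sketch of how an average PSNR over image pairs could be computed. It is not the evaluation code behind the table: the function names `psnr` and `average_psnr` are hypothetical, images are assumed to be flattened sequences of pixel intensities, and the peak value of 255 assumes 8-bit images. SSIM is omitted because it requires windowed local statistics that are better taken from an image-processing library.

```python
import math


def psnr(reference, generated, peak=255.0):
    """Peak signal-to-noise ratio in dB between two flattened images.

    Assumption: both inputs are equal-length sequences of pixel
    intensities on the same scale, with `peak` as the maximum value.
    """
    if len(reference) != len(generated) or len(reference) == 0:
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all pixels.
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


def average_psnr(pairs, peak=255.0):
    """Mean PSNR over an iterable of (reference, generated) pairs."""
    scores = [psnr(ref, gen, peak) for ref, gen in pairs]
    return sum(scores) / len(scores)
```

Higher PSNR means the generated image is closer to the reference pixel by pixel, which is consistent with the systems that receive the image guide scoring highest in the table.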