Table 7 Image fidelity metrics for the different ablation systems.

From: Scalable multimodal approach for face generation and super-resolution using a conditional diffusion model

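| # | System | Cond. scale | Average PSNR (dB) | Average SSIM |
|---|--------|-------------|-------------------|--------------|
| 1 | Only image guide | | 16.74 | 0.4 |
| 2 | Full input | 8 | 14.37 | 0.27 |
| 3 | No image guide | 2 | 10.46 | 0.22 |
| 4 | No speaker embedding | 8 | 16.29 | 0.37 |
| 5 | No audio | 4 | 16.5 | 0.39 |
| 6 | Only audio | 2 | 10.57 | 0.22 |
| 7 | Speaker embedding only | 4 | 10.52 | 0.2 |
| 8 | No additional attributes | 4 | 15.71 | 0.34 |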

  1. Significant values are given in bold.
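
For readers who want to reproduce a figure like the Average PSNR column, the following is a minimal sketch of the standard PSNR definition, 10 * log10(MAX^2 / MSE), averaged over image pairs. It is not taken from the paper: the function names `psnr` and `average_psnr`, the flat pixel-sequence input, and the 8-bit peak value of 255 are all assumptions made for illustration, and the authors' evaluation code may differ (for example in colour handling or value range).

```python
import math


def psnr(reference, generated, max_value=255.0):
    """PSNR in dB between two equally sized flat sequences of pixel values.

    Assumption: pixels are on a 0..max_value scale (255 for 8-bit images).
    """
    if len(reference) != len(generated):
        raise ValueError("images must have the same number of pixels")
    if len(reference) == 0:
        raise ValueError("images must not be empty")
    # Mean squared error over all pixels.
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


def average_psnr(pairs, max_value=255.0):
    """Mean PSNR over an iterable of (reference, generated) pixel sequences."""
    scores = [psnr(ref, gen, max_value) for ref, gen in pairs]
    if not scores:
        raise ValueError("no image pairs given")
    return sum(scores) / len(scores)
```

Higher PSNR means the generated image is closer to the reference pixel by pixel. SSIM needs local window statistics and is better taken from an existing image-processing library than rewritten by hand.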