Table 5. Recall values for the different ablation systems.

From: Scalable multimodal approach for face generation and super-resolution using a conditional diffusion model

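| # | System                   | Cond. scale | R@1 (%) | R@3 (%) | R@5 (%) | R@10 (%) |
|---|--------------------------|-------------|---------|---------|---------|----------|
| - | Random selection         | -           | 0.85    | 2.50    | 4.20    | 8.50     |
| 1 | Only image guide         | -           | 6.80    | 11.10   | 16.90   | 22.90    |
| 2 | Full input               | 8           | 16.90   | 30.50   | 33.90   | 45.80    |
| 3 | No image guide           | 2           | 9.30    | 16.10   | 23.70   | 34.70    |
| 4 | No speaker embedding     | 8           | 9.30    | 13.60   | 16.10   | 24.60    |
| 5 | No audio                 | 4           | 9.30    | 15.25   | 25.40   | 31.40    |
| 6 | Only audio               | 2           | 7.60    | 14.40   | 17.80   | 28.00    |
| 7 | Speaker embedding only   | 4           | 8.50    | 13.60   | 17.80   | 29.60    |
| 8 | No additional attributes | 4           | 13.60   | 20.30   | 27.10   | 35.60    |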

  1. Significant values are given in bold.
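
The R@K columns can be read as Recall@K scores. This excerpt does not say how the paper computes them, so the sketch below rests on an assumption: a retrieval setting where each query has exactly one correct item in a gallery, and R@K is the percentage of queries whose correct item appears among the K highest-scoring candidates. The function name `recall_at_k` and the toy similarity matrix are hypothetical and serve only as illustration.

```python
def recall_at_k(similarity, k):
    """Percentage of queries whose true match is among the top-k candidates.

    Assumption: similarity[i][j] scores query i against gallery item j,
    and the true match of query i is gallery item i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Rank gallery indices by descending similarity to query i.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy 4x4 similarity matrix with made-up values (not data from the paper).
similarity = [
    [0.9, 0.1, 0.3, 0.2],  # query 0: true match ranked 1st
    [0.8, 0.5, 0.1, 0.2],  # query 1: true match ranked 2nd
    [0.7, 0.6, 0.2, 0.5],  # query 2: true match ranked 4th
    [0.1, 0.2, 0.3, 0.9],  # query 3: true match ranked 1st
]

for k in (1, 3):
    print(f"R@{k} = {recall_at_k(similarity, k):.2f}%")
```

Under the same assumption, choosing candidates uniformly at random gives an expected R@K of 100 · K / N for a gallery of N items, which is the kind of chance level a row such as "Random selection" would represent.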