Table 8. Performance of the different systems at different conditional scale values.
| System | Cond. scale | R@1 (%) | R@3 (%) | R@5 (%) | R@10 (%) | Gender accuracy (%) | Ethnicity accuracy (%) | Age group accuracy (%) | RMSE of age (years) | Average PSNR | Average SSIM |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Random selection | – | 0.85 | 2.50 | 4.20 | 8.50 | – | – | – | – | – | – |
| Only image guide | – | 6.80 | 11.10 | 16.90 | 22.90 | – | – | – | – | – | – |
| Full input | 0 | 5.10 | 7.60 | 14.40 | 20.30 | 71.20 | 46.60 | 42.40 | 7.55 | 16.72 | 0.4 |
| | 2 | 9.30 | 18.60 | 28.80 | 42.40 | 89.00 | 62.70 | 55.10 | 5.87 | 16.52 | 0.38 |
| | 4 | 7.60 | 22.00 | 28.00 | 39.00 | 91.50 | 64.40 | 58.50 | 5.05 | 15.57 | 0.34 |
| | 6 | 16.90 | 24.60 | 31.40 | 46.60 | 87.30 | 64.40 | 58.50 | 5.59 | 14.96 | 0.3 |
| | 8 | 16.90 | 30.50 | 33.90 | 45.80 | 90.70 | 65.30 | 57.60 | 5.62 | 14.37 | 0.27 |
| Speaker embedding only | 0 | 1.70 | 3.40 | 4.20 | 10.20 | 60.20 | 45.80 | 47.50 | 8.52 | 10.95 | 0.24 |
| | 2 | 8.50 | 12.70 | 17.80 | 29.70 | 89.00 | 46.60 | 43.20 | 6.93 | 11.11 | 0.23 |
| | 4 | 8.50 | 13.60 | 17.80 | 29.60 | 89.00 | 47.50 | 41.50 | 7.54 | 10.52 | 0.2 |
| | 6 | 6.80 | 15.30 | 21.20 | 31.40 | 85.60 | 54.20 | 48.30 | 8.31 | 10.38 | 0.18 |
| | 8 | 4.30 | 8.50 | 14.50 | 32.50 | 88.00 | 47.90 | 43.60 | 7.96 | 10.26 | 0.16 |