Table 8. Performance of the different systems at different conditional scale values.

System	Cond. scale	R@1 (%)	R@3 (%)	R@5 (%)	R@10 (%)	Gender accuracy (%)	Ethnicity accuracy (%)	Age group accuracy (%)	RMSE of age (years)	Average PSNR	Average SSIM
Random selection	–	0.85	2.50	4.20	8.50	–	–	–	–	–	–
Only image guide	–	6.80	11.10	16.90	22.90	–	–	–	–	–	–
Full input	0	5.10	7.60	14.40	20.30	71.20	46.60	42.40	7.55	16.72	0.4
	2	9.30	18.60	28.80	42.40	89.00	62.70	55.10	5.87	16.52	0.38
	4	7.60	22.00	28.00	39.00	91.50	64.40	58.50	5.05	15.57	0.34
	6	16.90	24.60	31.40	46.60	87.30	64.40	58.50	5.59	14.96	0.3
	8	16.90	30.50	33.90	45.80	90.70	65.30	57.60	5.62	14.37	0.27
Speaker embedding only	0	1.70	3.40	4.20	10.20	60.20	45.80	47.50	8.52	10.95	0.24
	2	8.50	12.70	17.80	29.70	89.00	46.60	43.20	6.93	11.11	0.23
	4	8.50	13.60	17.80	29.60	89.00	47.50	41.50	7.54	10.52	0.2
	6	6.80	15.30	21.20	31.40	85.60	54.20	48.30	8.31	10.38	0.18
	8	4.30	8.50	14.50	32.50	88.00	47.90	43.60	7.96	10.26	0.16
