Table 2 Comparison of text-to-video model performance across seven evaluation measures.

	Evaluation measure	BAGen (ours)	CogVideoX	HotShotXL	Pyramid Flow	WAN
CLIP	Pairwise CLIP	0.986 ± 0.015	0.912 ± 0.054	0.985 ± 0.010	0.964 ± 0.026	0.968 ± 0.027
CLIP	Text Alignment	0.313 ± 0.029	0.244 ± 0.043	0.265 ± 0.03	0.272 ± 0.038	0.253 ± 0.035
VBench	Subject Consistency	0.972 ± 0.029	0.937 ± 0.038	0.987 ± 0.007	0.947 ± 0.041	0.946 ± 0.072
VBench	Background Consistency	0.985 ± 0.013	0.928 ± 0.041	0.975 ± 0.011	0.968 ± 0.016	0.971 ± 0.025
VBench	Motion Smoothness	0.992 ± 0.006	0.986 ± 0.011	0.980 ± 0.014	0.995 ± 0.003	0.988 ± 0.008
VBench	Aesthetic Quality	0.779 ± 0.073	0.355 ± 0.113	0.646 ± 0.089	0.679 ± 0.046	0.654 ± 0.068
VBench	Imaging Quality	0.651 ± 0.102	0.551 ± 0.115	0.622 ± 0.185	0.575 ± 0.098	0.652 ± 0.078
