Table 2 Comparison with recent state-of-the-art text-to-image generation models

Method	CLIP-T ↑	LPIPS ↓	FID ↓
SDXL	0.224	0.011	67.784
DALL-E3	0.223	0.085	88.698
GLIDE	0.219	0.969	118.474
Taiyi	0.221	0.831	90.354
ControlNet	0.226	0.026	148.352
P+	0.019	0.021	116.453
T2I-Adapter	0.981	0.085	157.265
CCLAP	0.234	0.303	62.523
Tongyi Wanxiang	0.222	0.765	77.631
RAPHAEL	0.231	0.498	72.687
WenXin4.5 Turbo	0.247	0.684	72.846
LlamaGen-XL	0.229	0.807	97.541
OpenMAGVIT2	0.214	0.797	116.505
DALL.E Mini	0.229	0.734	77.733
Ours	0.334	0.438	61.544

Bold values indicate the best result for each evaluation metric among all compared methods. The results of our proposed model (Ours) are also shown in bold for clearer comparison.
