Table 1 Comparison of different generated images based on CLIP Score, LPIPS, FID, VTC, ASL, AP, and creativeness

From: A novel flexible identity-net with diffusion models for painting-style generation

 

| Method | CLIP Score \(\uparrow\) | LPIPS \(\downarrow\) | FID \(\downarrow\) | VTC \(\uparrow\) | ASL \(\uparrow\) | AP \(\uparrow\) | Creativeness \(\uparrow\) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| DALL-E 3 [67] | 0.6942 | 0.6727 | 3089 | 12.1% | 9.1% | 4.5% | 3.0% |
| Midjourney [68] | 0.7941 | 0.7584 | 2213 | 12.1% | 6.1% | 19.7% | 13.6% |
| Midjourney + reference [68] | 0.7133 | 0.7339 | 2313 | 7.6% | 3.0% | 12.1% | 10.6% |
| DreamWorks Diffusion [45] | 0.7776 | 0.7377 | 2726 | 4.6% | 7.6% | 1.5% | 7.6% |
| PuLID-FLUX [69] | 0.7853 | 0.7462 | 3022 | 3.0% | 6.1% | 9.1% | 9.1% |
| PDANet (Ours) | 0.8147 | 0.5519 | 2037 | 60.6% | 68.2% | 53.0% | 56.1% |

  1. VTC represents visual text consistency, ASL represents accurate style learning, and AP represents aesthetics preference.
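
The arrows in the column headers give the direction of each metric: \(\uparrow\) means higher is better and \(\downarrow\) means lower is better. As a minimal sketch of how to read the table under that convention, the Python snippet below transcribes the values as printed (citation markers dropped from the method names, percentages stored as plain numbers) and picks the best method per column. The names `HIGHER_IS_BETTER`, `TABLE`, and `best_per_metric` are illustrative and do not come from the paper.

```python
# Direction of each metric, taken from the arrows in the table header:
# True = higher is better, False = lower is better.
HIGHER_IS_BETTER = {
    "CLIP Score": True,
    "LPIPS": False,
    "FID": False,
    "VTC": True,
    "ASL": True,
    "AP": True,
    "Creativeness": True,
}

# Values transcribed from Table 1, in the same column order as above.
# VTC, ASL, AP and Creativeness are percentages.
TABLE = {
    "DALL-E 3": (0.6942, 0.6727, 3089, 12.1, 9.1, 4.5, 3.0),
    "Midjourney": (0.7941, 0.7584, 2213, 12.1, 6.1, 19.7, 13.6),
    "Midjourney + reference": (0.7133, 0.7339, 2313, 7.6, 3.0, 12.1, 10.6),
    "DreamWorks Diffusion": (0.7776, 0.7377, 2726, 4.6, 7.6, 1.5, 7.6),
    "PuLID-FLUX": (0.7853, 0.7462, 3022, 3.0, 6.1, 9.1, 9.1),
    "PDANet (Ours)": (0.8147, 0.5519, 2037, 60.6, 68.2, 53.0, 56.1),
}


def best_per_metric(table, higher_is_better):
    """Return {metric: method with the best value}, honoring each metric's direction."""
    best = {}
    for i, metric in enumerate(higher_is_better):
        pick = max if higher_is_better[metric] else min
        best[metric] = pick(table, key=lambda name: table[name][i])
    return best


best = best_per_metric(TABLE, HIGHER_IS_BETTER)
for metric, method in best.items():
    print(f"{metric}: {method}")
```

With the values as printed, PDANet is selected in every column, including the two lower-is-better metrics (LPIPS and FID).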