Table 2 An ablation study comparing various conditions injected into the PDANet (Ours) model

From: A novel flexible identity-net with diffusion models for painting-style generation

 

| Configuration | CLIP Score \(\uparrow\) | LPIPS \(\downarrow\) | FID \(\downarrow\) | VTC \(\uparrow\) | ASL \(\uparrow\) | AP \(\uparrow\) | Creativeness \(\uparrow\) |
|---|---|---|---|---|---|---|---|
| Baseline | 0.8288 | 0.5185 | 1015 | 3.0% | 3.0% | 4.5% | 7.6% |
| +identitynet | 0.8081 | 0.6229 | 1270 | 31.8% | 27.3% | 21.2% | 13.6% |
| +stylenet | 0.8506 | 0.4799 | 1976 | 3.0% | 4.6% | 10.6% | 19.7% |
| Ours | 0.8844 | 0.4763 | 774 | 62.2% | 65.1% | 63.7% | 59.1% |

  1. VTC represents visual text consistency, ASL represents accurate style learning, and AP represents aesthetics preference.
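
As a quick way to read the ablation, the values in Table 2 can be checked with a few lines of Python. The sketch below is illustrative only: the dictionary layout and the helper names (`column_total`, `relative_change`, `best_variant`) are assumptions introduced here, not part of the paper. It shows that each of the four percentage columns (VTC, ASL, AP, Creativeness) sums to 100%, which is consistent with reading them as shares of participant votes split across the four configurations (an interpretation, not something the table states), and it computes how far the full model moves each automatic metric relative to the baseline.

```python
# Values transcribed from Table 2. The data layout and function names are
# hypothetical helpers for illustration, not taken from the paper.
ROWS = {
    "Baseline":     {"clip": 0.8288, "lpips": 0.5185, "fid": 1015,
                     "vtc": 3.0, "asl": 3.0, "ap": 4.5, "creativeness": 7.6},
    "+identitynet": {"clip": 0.8081, "lpips": 0.6229, "fid": 1270,
                     "vtc": 31.8, "asl": 27.3, "ap": 21.2, "creativeness": 13.6},
    "+stylenet":    {"clip": 0.8506, "lpips": 0.4799, "fid": 1976,
                     "vtc": 3.0, "asl": 4.6, "ap": 10.6, "creativeness": 19.7},
    "Ours":         {"clip": 0.8844, "lpips": 0.4763, "fid": 774,
                     "vtc": 62.2, "asl": 65.1, "ap": 63.7, "creativeness": 59.1},
}

# Arrow direction from the table header: True means higher is better.
HIGHER_IS_BETTER = {
    "clip": True, "lpips": False, "fid": False,
    "vtc": True, "asl": True, "ap": True, "creativeness": True,
}

USER_STUDY = ("vtc", "asl", "ap", "creativeness")


def column_total(metric):
    """Sum one column over all configurations, rounded to one decimal."""
    return round(sum(row[metric] for row in ROWS.values()), 1)


def relative_change(metric, variant="Ours", reference="Baseline"):
    """Percent change of `variant` relative to `reference` for one metric."""
    ref = ROWS[reference][metric]
    return (ROWS[variant][metric] - ref) / ref * 100.0


def best_variant(metric):
    """Name of the configuration with the best value for `metric`."""
    pick = max if HIGHER_IS_BETTER[metric] else min
    return pick(ROWS, key=lambda name: ROWS[name][metric])


if __name__ == "__main__":
    for metric in USER_STUDY:
        print(f"{metric}: column total = {column_total(metric)}%")
    for metric in ("clip", "lpips", "fid"):
        print(f"{metric}: Ours vs Baseline = {relative_change(metric):+.2f}%")
    for metric in HIGHER_IS_BETTER:
        print(f"{metric}: best = {best_variant(metric)}")
```

Under these helpers, the full model is the best configuration on every column, while each single-condition variant degrades at least one automatic metric relative to the baseline.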