Table 10. Comparison of Different Vision-Language Models in Degradation Recognition Tasks.

From: Multimodal image fusion network with prior-guided dynamic degradation removal for extreme environment perception

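| Method      | Accuracy (%) | Need Additional Text | Params (M) |
|-------------|--------------|----------------------|------------|
| CLIP (Ours) | 89.3         | No                   | 86         |
| BLIP        | 85.7         | No                   | 109        |
| LLaVA       | 88.3         | Yes                  | 700        |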