Table 11 Human evaluation scores: relevance and fluency of captions generated by Baseline MLP, W2VV, and W2VV + Attention + Contrastive Loss.

Model	Relevance	Fluency
Baseline MLP	4.1	3.9
W2VV	4.5	4.3
W2VV + Attention + Contrastive Loss	4.6	4.4
