Table 11. Human evaluation scores for the relevance and fluency of captions generated by the Baseline MLP, W2VV, and W2VV with Attention and Contrastive Loss.

From: AraTraditions10k bridging cultures with a comprehensive dataset for enhanced cross lingual image annotation retrieval and tagging

Model                                  Relevance   Fluency
Baseline MLP                           4.1         3.9
W2VV                                   4.5         4.3
W2VV + Attention + Contrastive Loss    4.6         4.4
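The gaps between models in Table 11 are small, so it helps to compute them directly from the reported values. The following is a minimal Python sketch that recomputes each model's relevance and fluency difference relative to a chosen reference row. The names `SCORES` and `gains_over` are illustrative and do not come from the paper. The sketch uses only the published means and does not say whether any difference is statistically significant.

```python
# Human evaluation scores copied from Table 11: (Relevance, Fluency).
# SCORES and gains_over are hypothetical names used only for illustration.
SCORES = {
    "Baseline MLP": (4.1, 3.9),
    "W2VV": (4.5, 4.3),
    "W2VV + Attention + Contrastive Loss": (4.6, 4.4),
}


def gains_over(reference: str) -> dict[str, tuple[float, float]]:
    """Return (relevance, fluency) differences of every other model versus `reference`.

    Differences are rounded to two decimals to hide floating-point noise
    (e.g. 4.5 - 4.1 evaluates to 0.40000000000000036).
    """
    ref_rel, ref_flu = SCORES[reference]
    return {
        model: (round(rel - ref_rel, 2), round(flu - ref_flu, 2))
        for model, (rel, flu) in SCORES.items()
        if model != reference
    }


if __name__ == "__main__":
    for model, (d_rel, d_flu) in gains_over("Baseline MLP").items():
        print(f"{model}: relevance {d_rel:+.1f}, fluency {d_flu:+.1f}")
```

Relative to the Baseline MLP, W2VV gains 0.4 on both criteria. Adding attention and the contrastive loss contributes a further 0.1 on each.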