Table 2 Comparative summary of AraTraditions10k and related cross-lingual image annotation datasets
Dataset | Scale | Languages | Annotation quality | Cultural diversity | Strengths | Limitations |
|---|---|---|---|---|---|---|
COCO-CN | 20,342 images | Chinese, English | High (recommendation-assisted) | Moderate | Improved cross-lingual performance | Smaller scale, limited to Chinese and English |
Flickr30k-CN | Smaller scale | Chinese, English | Moderate (translated captions) | Limited | Facilitates Chinese image captioning | Limited size and diversity |
AIC-ICC | 240,000 images | Chinese | Varies (crowdsourced) | Low (biased towards human activities) | Large-scale, detailed annotations | Bias limits generalizability, variable annotation quality |
Multi30k | Moderate scale | English, German | High | Moderate | Multilingual, detailed annotations | Limited to European languages |
AI Challenger | 240,000 images | Chinese | High | Moderate | Extensive dataset, supports deep learning research | Focused on Chinese, may lack diverse cultural aspects |
AraTraditions10k | 10,000 images | Arabic, English | High (professional translation, recommendation-assisted) | High | Culturally rich, advanced annotation techniques, diverse visual content | Smaller scale than some datasets, offset by high quality and cultural relevance |