Table 8. Performance metric results.

Metric	Result
Caption Accuracy (BLEU ~ proxy)	92.5%
Inference Time (end-to-end)	4.5 s per image (avg., CPU)
Model Load Time (initial run)	~12 s
RAM Usage (peak during run)	~820 MB
TTS Response Latency	~0.5 s
Offline Functionality	Supported after initial model caching
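
Timing and memory figures of the kind reported in Table 8 can be collected with a small harness. The following is a minimal sketch, not the measurement procedure used for the table: `benchmark` and `caption_image` are hypothetical names, and `caption_image` is a placeholder for the real captioning pipeline. Note that `tracemalloc` only tracks memory allocated through Python, so a process-level peak RAM figure would need an OS-level tool instead.

```python
import time
import tracemalloc
from statistics import mean


def benchmark(fn, inputs):
    """Return (average seconds per call, peak traced bytes) for fn over inputs."""
    tracemalloc.start()
    durations = []
    for item in inputs:
        start = time.perf_counter()
        fn(item)
        durations.append(time.perf_counter() - start)
    # get_traced_memory() returns (current, peak) in bytes.
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return mean(durations), peak


def caption_image(path):
    # Hypothetical stand-in for the real image captioning pipeline.
    return f"caption for {path}"


avg_s, peak_bytes = benchmark(caption_image, ["a.jpg", "b.jpg", "c.jpg"])
print(f"avg {avg_s:.4f} s per image, peak {peak_bytes / 1e6:.2f} MB traced")
```

Model load time can be measured the same way by timing the first call separately from the subsequent ones, since the first run includes loading and caching.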
