Table 8. Performance metrics results.
| Metric | Result |
|---|---|
| Caption Accuracy (BLEU proxy) | 92.5% |
| Inference Time (end-to-end) | 4.5 s per image (avg., CPU) |
| Model Load Time (initial run) | ~12 s |
| RAM Usage (peak during run) | ~820 MB |
| TTS Response Latency | ~0.5 s |
| Offline Functionality | Supported after initial model caching |
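
As an illustration of how timing and memory figures like those in the table could be collected, the following is a minimal sketch, not the measurement procedure actually used for these results. The function `fake_caption_pipeline` is a hypothetical stand-in for the real image-to-caption-to-speech pipeline, and `profile_call` is an illustrative helper name. Note that `tracemalloc` tracks only memory allocated by Python itself, so it would understate the process-level RAM of a model backed by native libraries.

```python
import time
import tracemalloc


def profile_call(fn, *args, runs=5):
    """Return average wall-clock seconds and peak traced Python memory (MB)."""
    # Timing pass: run without tracemalloc, since tracing slows execution.
    start = time.perf_counter()
    for _ in range(runs):
        fn(*args)
    avg_seconds = (time.perf_counter() - start) / runs

    # Memory pass: a single traced run to capture the peak allocation.
    tracemalloc.start()
    fn(*args)
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    return {"avg_seconds": avg_seconds, "peak_mb": peak_bytes / 1e6}


def fake_caption_pipeline(n):
    # Hypothetical placeholder workload; replace with the real pipeline call.
    data = [i * i for i in range(n)]
    return sum(data)


stats = profile_call(fake_caption_pipeline, 100_000, runs=3)
print(f"avg latency: {stats['avg_seconds']:.4f} s, "
      f"peak traced memory: {stats['peak_mb']:.2f} MB")
```

Averaging over several runs smooths out one-off costs such as cache warm-up, which is why a first-run figure like model load time is better reported separately from per-image inference time.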