Table 4 Inference speed comparison of various VLMs across different deployment environments.
From: Multitasking vision language models for vehicle plate recognition with VehiclePaliGemma
| Method | Deployment | Inference time (s) |
|---|---|---|
| Moondream2 | Local | 0.09 |
| LLaVA-NeXT-7b | Local | 0.84 |
| VILA | Local | 0.35 |
| Gemini 1.5 Flash | API | 1.65 |
| GPT-4o-mini | API | 1.7 |
| LLaVA-NeXT-34b | Local | 10 |
| Gemini 1.5 Pro | API | 1.85 |
| GPT-4o | API | 1.6 |
| Llama 3.2 | Local | 0.42 |
| Claude 3.5 Sonnet | API | 1.8 |
| Fine-tuned PaliGemma | Local | 0.135 |
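To make the spread in Table 4 easier to read, the sketch below ranks the methods by reported time, converts each time to a throughput, and expresses it relative to fine-tuned PaliGemma. It assumes each value is the time to process one plate image sequentially (the table does not state this explicitly); the dictionary `LATENCY_S` and the helpers `rank_by_latency` and `throughput` are hypothetical names, and the numbers are transcribed from the table.

```python
# Reported inference times from Table 4: method -> (deployment, seconds).
# Assumption: each value is the time for a single image, processed sequentially.
LATENCY_S = {
    "Moondream2": ("Local", 0.09),
    "LLaVA-NeXT-7b": ("Local", 0.84),
    "VILA": ("Local", 0.35),
    "Gemini 1.5 Flash": ("API", 1.65),
    "GPT-4o-mini": ("API", 1.7),
    "LLaVA-NeXT-34b": ("Local", 10.0),
    "Gemini 1.5 Pro": ("API", 1.85),
    "GPT-4o": ("API", 1.6),
    "Llama 3.2": ("Local", 0.42),
    "Claude 3.5 Sonnet": ("API", 1.8),
    "Fine-tuned PaliGemma": ("Local", 0.135),
}


def rank_by_latency(table):
    """Return (method, deployment, seconds) tuples sorted fastest first."""
    rows = [(method, dep, sec) for method, (dep, sec) in table.items()]
    return sorted(rows, key=lambda row: row[2])


def throughput(seconds):
    """Images per second, under the one-image-per-call sequential assumption."""
    return 1.0 / seconds


if __name__ == "__main__":
    ref = LATENCY_S["Fine-tuned PaliGemma"][1]
    for method, dep, sec in rank_by_latency(LATENCY_S):
        print(f"{method:22s} {dep:6s} {sec:6.3f} s "
              f"{throughput(sec):6.2f} img/s {sec / ref:6.1f}x PaliGemma time")
```

Normalizing to fine-tuned PaliGemma makes the gap to the slowest local model easy to read off: LLaVA-NeXT-34b takes roughly 74 times as long per call under these assumptions.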