Table 4 Inference speed comparison of various VLMs across different deployment environments.

From: Multitasking vision language models for vehicle plate recognition with VehiclePaliGemma

| Method | Deployment | Speed (s) |
|---|---|---|
| Moondream2 | Local | 0.09 |
| LLaVA-NeXT-7b | Local | 0.84 |
| VILA | Local | 0.35 |
| Gemini 1.5 Flash | API | 1.65 |
| GPT-4o-mini | API | 1.7 |
| LLaVA-NeXT-34b | Local | 10 |
| Gemini 1.5 Pro | API | 1.85 |
| GPT-4o | API | 1.6 |
| Llama 3.2 | Local | 0.42 |
| Claude 3.5 Sonnet | API | 1.8 |
| Fine-tuned PaliGemma | Local | 0.135 |

Note: The table presents both locally deployed models and API-based models, with inference times measured in seconds; lower values indicate faster inference.
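As a quick way to compare the figures, the per-image latencies above can be inverted to approximate throughput (images per second). This is only a rough sketch: it ignores batching, warm-up, and network overhead for the API-based models, and the model names and values are taken directly from the table.

```python
# Per-image inference latency in seconds, as reported in Table 4.
latencies = {
    "Moondream2": 0.09,
    "Fine-tuned PaliGemma": 0.135,
    "VILA": 0.35,
    "Llama 3.2": 0.42,
    "LLaVA-NeXT-7b": 0.84,
    "GPT-4o": 1.6,
    "Gemini 1.5 Flash": 1.65,
    "GPT-4o-mini": 1.7,
    "Claude 3.5 Sonnet": 1.8,
    "Gemini 1.5 Pro": 1.85,
    "LLaVA-NeXT-34b": 10,
}

# Throughput = 1 / latency: lower latency means more images per second.
throughput = {model: 1.0 / t for model, t in latencies.items()}

for model, tput in sorted(throughput.items(), key=lambda kv: -kv[1]):
    print(f"{model:22s} {tput:6.2f} images/s")
```

For example, the fine-tuned PaliGemma at 0.135 s/image corresponds to roughly 7.4 images per second, versus about 0.54 images per second for Gemini 1.5 Pro.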