Table 1 Comparison of model characteristics and performance

From: Multimodal AI for Yuan Buddhist sculpture chronology and style

| Model | Parameter Size (approx.) | Context Length | Open Source | Characteristics |
| --- | --- | --- | --- | --- |
| ChronoStyleNet | ~7B (Qwen-VL base) | ~32k tokens (vision + text) | Fully open-source | Domain-specific, fine-tuned for Buddhist sculpture classification tasks. |
| GPT-4o (OpenAI) | Undisclosed | 128k tokens (text) | No; accessible only via API | Proprietary multimodal model, general-purpose with strong vision-language reasoning. |
| Claude 3.5 Sonnet (Anthropic) | Undisclosed | 200k+ tokens (text) | No; accessible only via API | High-performance generalist with long-context understanding and reasoning. |
| Gemini 1.5 Pro (Google) | Undisclosed | 1M tokens (text, experimental) | No; accessible only via API | Strong code, image, and document reasoning; limited academic access. |
| LLaMA 3.3 70B (Meta) | 70B | 8k–128k tokens | Open-source (Llama 3.3 Community License) | Open-source, large-scale general model, fine-tuning friendly. |
| Grok 3 Beta (xAI) | Undisclosed | Unknown | No; proprietary | Tesla-integrated, web-connected; limited documentation. |

  1. Model specifications are compiled from official documentation and release notes. ChronoStyleNet's base architecture is built on LLaVA-OneVision-Qwen2-7B (Qwen2-7B with multimodal extension), as released on Hugging Face by LMMs-Lab [50]. Data for GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro Experimental are sourced from OpenAI [52], Anthropic [53], and Google DeepMind [54], respectively. LLaMA 3.3 70B metadata refers to the LLaMA-3.3-Nemotron-70B-Select release from Meta [55]. Information on Grok 3 Beta is derived from official descriptions provided by xAI [56].