Table 1 Comparison of model characteristics and performance

From: Multimodal AI for Yuan Buddhist sculpture chronology and style

| Model | Parameter Size (approx.) | Context Length | Open Source | Characteristics |
| --- | --- | --- | --- | --- |
| ChronoStyleNet | ~7B (Qwen-VL base) | ~32k tokens (vision + text) | Fully open-source | Domain-specific, fine-tuned for Buddhist sculpture classification tasks. |
| GPT-4o (OpenAI) | Undisclosed | 128k tokens (text) | No; accessible only via API | Proprietary multimodal model, general-purpose with strong vision-language reasoning. |
| Claude 3.5 Sonnet (Anthropic) | Undisclosed | 200k+ tokens (text) | No; accessible only via API | High-performance generalist with long-context understanding and reasoning. |
| Gemini 1.5 Pro (Google) | Undisclosed | 1M tokens (text, experimental) | No; accessible only via API | Strong code, image, and document reasoning; limited academic access. |
| LLaMA 3.3 70B (Meta) | 70B | 8k–128k tokens | Open-source (Llama 3.3 Community License) | Open-source, large-scale general model, fine-tuning friendly. |
| Grok 3 Beta (xAI) | Undisclosed | Unknown | No; proprietary | Tesla-integrated, web-connected; limited documentation. |

  1. Model specifications are compiled from official documentation and release notes. ChronoStyleNet's base architecture is built on LLaVA-OneVision-Qwen2-7B (Qwen2-7B with multimodal extension), as released on Hugging Face by LMMs-Lab [50]. Data for GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro Experimental are sourced from OpenAI [52], Anthropic [53], and Google DeepMind [54], respectively. LLaMA 3.3 70B metadata refers to the LLaMA-3.3-Nemotron-70B-Select release from Meta [55]. Information on Grok 3 Beta is derived from official descriptions provided by xAI [56].