Fig. 6: Comparative evaluation of CSN and baseline multimodal models.
From: Multimodal AI for Yuan Buddhist sculpture chronology and style

a Heatmaps of evaluation scores for each model (CSN, GPT-4o, Claude 3.5, Gemini 1.5 Pro, LLaMA 3.3 70B, Grok Beta) across six dimensions: period accuracy, style granularity, detail richness, term consistency, cultural depth, and logic and coherence. Darker shades indicate higher scores.
b Sankey diagram of the dynastic categories that each model predicted for the Yuan-dynasty samples. Link widths represent the proportion of samples that each model classified into each dynasty category.
c Comparison of CSN’s performance on in-domain (n = 18) versus out-of-domain (n = 4) samples. Scores are reported as means ± SD across the six dimensions (*p < 0.05, **p < 0.01, ***p < 0.001; two-tailed t-test).
d Case-wise comparison of expert references, CSN interpretations, and baseline model outputs for three Yuan-dynasty Buddhist statues.
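The significance marks in panel c come from a two-tailed t-test on two groups of unequal size (n = 18 and n = 4). The following is a minimal sketch of how such a comparison and its star annotation could be computed, not the authors' analysis code. It assumes Welch's unequal-variance form of the test (the caption does not say which variant was used), the function names are hypothetical, the "ns" label for non-significant results is an added convention, and the score lists are placeholders rather than data from the paper.

```python
import math
from statistics import mean, stdev


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test (assumed variant)."""
    va = stdev(a) ** 2 / len(a)  # squared standard error of group a
    vb = stdev(b) ** 2 / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def t_pdf(x, df):
    """Probability density of Student's t distribution."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=20000):
    """Two-tailed p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1.0 - 2.0 * central_area)


def stars(p):
    """Map a p-value to the caption's thresholds; 'ns' is an added label."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"


if __name__ == "__main__":
    # Placeholder scores for ONE dimension; NOT values from the paper.
    in_domain = [4.5, 4.2, 4.8, 4.6, 4.4, 4.7, 4.3, 4.9, 4.5,
                 4.6, 4.1, 4.8, 4.4, 4.7, 4.5, 4.3, 4.6, 4.4]  # n = 18
    out_of_domain = [3.9, 4.1, 3.6, 4.0]                        # n = 4
    t, df = welch_t(in_domain, out_of_domain)
    p = two_tailed_p(t, df)
    print(f"in-domain: {mean(in_domain):.2f} ± {stdev(in_domain):.2f}")
    print(f"out-of-domain: {mean(out_of_domain):.2f} ± {stdev(out_of_domain):.2f}")
    print(f"t = {t:.2f}, df = {df:.1f}, p = {p:.4f} {stars(p)}")
```

With only four out-of-domain samples, the degrees of freedom are small and the test has limited power, so the stars are best read alongside the means and SDs. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same Welch test without the hand-rolled integration.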