Fig. 6
From: Arch-Eval benchmark for assessing Chinese architectural domain knowledge in large language models

Preliminary Experiment ① results: output stability test for Qwen-14B-Chat and GPT-3.5-turbo.