Fig. 4

Performance comparison of ChatGPT 4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro across multiple domains. Each bar chart compares the three models within a single domain, such as Concept, Clinical Features, or Diagnosis, showing their relative strengths and weaknesses. Scores range from 0 to 10. Asterisks mark statistical significance, up to “****” for P < 0.0001.