Fig. 3

Comparative performance scores of ChatGPT 4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro on various metrics. This bar chart displays the scores of three artificial intelligence models across five performance metrics: Relevance, Completeness, Accuracy, Safety, Readability and Simplicity. Scores range from 0 to 10 and are based on expert evaluations. Statistical significance is denoted with asterisks: “*” for P < 0.05, “**” for P < 0.01, “***” for P < 0.001, and “****” for P < 0.0001. Error bars represent the standard error.