Fig. 3 | Scientific Reports

From: Large language model evaluation in autoimmune disease clinical questions comparing ChatGPT 4o, Claude 3.5 Sonnet and Gemini 1.5 pro

Comparative performance scores of ChatGPT 4o, Claude 3.5 Sonnet and Gemini 1.5 Pro on five evaluation metrics. This bar chart displays the scores of the three artificial intelligence models across five performance metrics: Relevance, Completeness, Accuracy, Safety, and Readability and Simplicity. Scores range from 0 to 10 and are based on expert evaluations. Statistical significance is denoted by asterisks: “*” for P < 0.05, “**” for P < 0.01, “***” for P < 0.001, and “****” for P < 0.0001. Error bars represent the standard error.
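The asterisk convention and the error bars in this caption follow simple rules. The Python sketch below illustrates them under stated assumptions. The function names are hypothetical. The caption does not say how non-significant comparisons are labeled, so the sketch returns an empty string for them. It also assumes that "standard error" means the standard error of the mean, computed from the sample standard deviation. It does not reproduce the article's statistical tests, which are not described here.

```python
import math
import statistics


def significance_stars(p: float) -> str:
    """Map a P value to the asterisk labels listed in the caption.

    Thresholds follow the caption: * P < 0.05, ** P < 0.01,
    *** P < 0.001, **** P < 0.0001. Returning "" for P >= 0.05 is an
    assumption; the caption does not say how non-significant pairs are marked.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("P value must lie in [0, 1]")
    # Check the strictest threshold first so each P value gets the most asterisks it earns.
    for threshold, label in ((0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*")):
        if p < threshold:
            return label
    return ""


def standard_error(scores: list[float]) -> float:
    """Standard error of the mean (assumed meaning): sample SD divided by sqrt(n)."""
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    return statistics.stdev(scores) / math.sqrt(len(scores))
```

For example, `significance_stars(0.003)` returns `"**"` and `standard_error([6, 8])` returns `1.0`. These inputs are illustrative only and are not data from the study. Checking thresholds from strictest to loosest ensures that a P value such as 0.00005 receives “****” rather than stopping at “*”.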
