Fig. 5: Assessment of model-expert agreement and the quality of chatbot responses.

From: Multimodal machine learning enables AI chatbot to diagnose ophthalmic diseases and provide high-quality medical responses

a Comparison of diagnostic accuracy among IOMIDS (text + smartphone model), GPT4.0, Qwen, expert ophthalmologists, ophthalmology trainees, and unspecialized junior doctors. The dotted lines represent the mean performance of ophthalmologists at each experience level. b Heatmap of Kappa statistics quantifying agreement between diagnoses provided by the AI models and by ophthalmologists. c Kernel density plots of user satisfaction rated by researchers (red) and patients (blue) during the clinical evaluation. d Example of an interactive chat with IOMIDS (left) and quality evaluation of the chatbot response (right). On the left, the central box shows the patient's interaction with IOMIDS: entering the chief complaint, answering the system's questions step by step, uploading a standard smartphone-captured eye photo, and receiving the diagnosis and triage information. The chatbot response includes an explanation of the condition and guidance for further medical consultation. The surrounding boxes show a researcher's evaluation of six aspects of the chatbot response. The radar charts on the right illustrate the quality evaluation across these six aspects for responses generated by the text model (red) and the text + image model (blue). The axes for each aspect span different coordinate ranges because the rating scales vary. Asterisks indicate significant differences between the two models based on two-sided t-tests. ** P < 0.01, *** P < 0.001, **** P < 0.0001.
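For readers who want to see the style of analysis summarized in panels b and d, the minimal Python sketch below shows how Cohen's kappa (model-expert agreement) and a two-sided t-test (text model vs. text + image model) could be computed. The diagnosis labels, rating arrays, and the choice of an independent-samples test are illustrative assumptions, not the paper's actual data or code.

```python
# Illustrative sketch only: hypothetical data standing in for the study's
# diagnosis labels and quality ratings.
import numpy as np
from sklearn.metrics import cohen_kappa_score
from scipy.stats import ttest_ind

# Hypothetical per-case diagnoses from an AI model and an ophthalmologist (panel b)
ai_diagnoses = ["cataract", "keratitis", "cataract", "pterygium", "keratitis"]
expert_diagnoses = ["cataract", "keratitis", "conjunctivitis", "pterygium", "keratitis"]

# Cohen's kappa quantifies agreement beyond chance between the two raters
kappa = cohen_kappa_score(ai_diagnoses, expert_diagnoses)
print(f"Cohen's kappa: {kappa:.2f}")

# Hypothetical quality ratings for one aspect from the two chatbot variants (panel d)
text_model_scores = np.array([3, 4, 4, 5, 3, 4])
text_image_scores = np.array([4, 5, 5, 5, 4, 5])

# Two-sided independent-samples t-test, as referenced in the caption
t_stat, p_value = ttest_ind(text_model_scores, text_image_scores)
print(f"t = {t_stat:.2f}, two-sided p = {p_value:.3f}")
```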
