
Fig. 2: Ophthalmic dataset construction, LLM selection, and model fine-tuning.

From: A large language model digital patient system enhances ophthalmology history taking skills


Workflow for ophthalmic dataset construction, LLM benchmarking, and model fine-tuning. a The preprocessing workflow for the ophthalmic instruction training data. EHRs, clinical guidelines, ophthalmology textbooks, multiple-choice questions, and patient-doctor consultation dialogs were used to construct the ophthalmic dataset. After the data were processed into an instruction-output format, ophthalmologists sampled and verified their accuracy. b The source distribution of the ophthalmic instruction dataset. c The distribution of ocular diseases in the ophthalmic instruction dataset. d The workflow for comparing LLMs using the Chatbot Arena approach. The LLMs were given 50 instruction tasks in Chinese, covering ophthalmic knowledge, EHR information extraction, paraphrasing of doctors' questions, paraphrasing of patients' answers, and consultation-related queries. Two ophthalmologists compared the outputs of two anonymized LLMs, and the win rates between the models were then calculated. e Human evaluation comparing the Baichuan-13B-Chat model with other models; the win, tie, and loss rates of Baichuan-13B-Chat versus each model are shown. f BLEU and ROUGE-L performance on the test set across fine-tuning epochs; the model trained for 20 epochs was selected for the study. g Human evaluation comparing the fine-tuned Baichuan-13B-Chat model with other models. LLM large language model, BLEU Bilingual Evaluation Understudy, ROUGE-L Recall-Oriented Understudy for Gisting Evaluation - Longest Common Subsequence, EHR electronic health record.
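The pairwise comparison in panels d and e reduces to counting judgments across the 50 tasks. A minimal sketch of that win-rate calculation follows; the labels and counts below are invented placeholders for illustration, not the study's data.

```python
# Win/tie/loss rate calculation for a Chatbot Arena-style pairwise
# comparison (panels d, e). Each judgment is one evaluator's verdict
# for the candidate model versus an anonymized competitor on one task.
from collections import Counter

def win_rates(judgments):
    """Return the fraction of "win", "tie", and "loss" judgments."""
    counts = Counter(judgments)
    total = len(judgments)
    return {outcome: counts.get(outcome, 0) / total
            for outcome in ("win", "tie", "loss")}

# Hypothetical example: 50 tasks judged by two ophthalmologists
# (100 judgments in total); the numbers are placeholders.
labels = ["win"] * 56 + ["tie"] * 24 + ["loss"] * 20
print(win_rates(labels))  # {'win': 0.56, 'tie': 0.24, 'loss': 0.2}
```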
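Panel f tracks BLEU and ROUGE-L on the held-out test set at each fine-tuning checkpoint. The sketch below shows one way to compute these metrics with the sacrebleu and rouge-score packages; the example sentences are placeholders, and this setup is an assumption rather than the authors' exact evaluation tooling.

```python
# BLEU and ROUGE-L evaluation of model outputs against references
# (panel f). The sentences here are English placeholders; for the
# study's Chinese outputs one would pass tokenize="zh" to corpus_bleu.
import sacrebleu
from rouge_score import rouge_scorer

references = ["the patient reports blurred vision in the right eye"]
predictions = ["the patient reports blurry vision in the right eye"]

bleu = sacrebleu.corpus_bleu(predictions, [references])
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
rouge_l = scorer.score(references[0], predictions[0])["rougeL"].fmeasure

print(f"BLEU: {bleu.score:.2f}  ROUGE-L F1: {rouge_l:.3f}")
```

Scoring each epoch's checkpoint this way yields the curves in panel f, from which the best-performing epoch (20 in the study) is selected.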
