Fig. 2: Comparative performance of different LLMs on medical documentation tasks.

Automatic evaluation metrics comparing different model variants on cataract surgery documentation. Results are shown for three key document types: Admission Report, Surgery Record, and Discharge Summary. The evaluation includes ChatGLM2-6B [46], Baichuan-13B [36], Qwen-2 [20], Baichuan-13B-SFT, and Qwen2-7B-SFT models. Across all metrics (BERTScore [43], ROUGE-L [42], and BLEU [41]), the models show varying performance in generating accurate and relevant medical documentation.