Evaluation of the reasoning processes of medical LLMs in professional medicine remains underexplored. Here, the authors present MedR Bench, which evaluates LLMs’ medical reasoning across examination recommendation, diagnosis, and treatment. They find that models excel at diagnosis but struggle with examination recommendation and treatment.
- Pengcheng Qiu
- Chaoyi Wu
- Weidi Xie