Extended Data Fig. 2: Utilization of visual input and clinical context in multiple choice diagnostic questions. | Nature

From: A multimodal generative AI copilot for human pathology

On the multiple-choice diagnostic benchmarks (Combined, n = 105 questions; PathQABench-Private, n = 53; PathQABench-Public, n = 52), we investigated whether PathChat can effectively leverage both the unstructured clinical context, given as natural language, and the visual features in the image ROI, rather than deriving its answer from either input alone. In the context-only setting, the clinical context is provided to the model but the image is not (see Fig. 2a for an example multiple-choice question that contains the clinical context, the choices and the image). Conversely, in the image-only setting, the clinical context is withheld and the model is asked to infer the correct diagnosis from the possible choices based solely on the image. We observed that PathChat achieves its highest accuracy when both the clinical context and the image are provided. Error bars represent 95% confidence intervals, and the centers represent the computed accuracy.
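
The legend does not state how the 95% confidence intervals were obtained. As an illustration only, the sketch below computes an accuracy and a percentile-bootstrap interval from per-question outcomes. The choice of a nonparametric bootstrap, the function name `accuracy_with_ci`, its parameters, and the toy input are all assumptions made for this example and are not taken from the article.

```python
import random


def accuracy_with_ci(correct, n_boot=2000, alpha=0.05, seed=0):
    """Return (accuracy, lower, upper) for a list of 0/1 outcomes.

    Assumption: a percentile bootstrap over questions; the source
    legend does not specify the interval method.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(correct)
    acc = sum(correct) / n

    # Resample the questions with replacement and recompute accuracy.
    boot = []
    for _ in range(n_boot):
        sample = rng.choices(correct, k=n)
        boot.append(sum(sample) / n)
    boot.sort()

    # Take the empirical alpha/2 and 1 - alpha/2 quantiles.
    lower = boot[int((alpha / 2) * n_boot)]
    upper = boot[int((1 - alpha / 2) * n_boot) - 1]
    return acc, lower, upper


# Toy outcomes (hypothetical, not data from the article):
# 1 = question answered correctly, 0 = answered incorrectly.
toy_outcomes = [1] * 8 + [0] * 2
acc, lower, upper = accuracy_with_ci(toy_outcomes)
print(f"accuracy = {acc:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

Running the same function once per setting (context only, image only, both) on that setting's per-question outcomes would give the bar centers and error bars for a plot of this kind, under the stated assumptions.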
