Extended Data Fig. 5: Comparing model outputs on open-ended question answering, example 3.
From: A multimodal generative AI copilot for human pathology

An example question in PathQABench-Public regarding lung adenocarcinoma on which all four models performed poorly. None of the models accurately described the image or produced the correct diagnosis. Scale bar is 200 µm.