Extended Data Fig. 3: Radiology report generation. | Nature

Extended Data Fig. 3: Radiology report generation.

From: Merlin: a computed tomography vision–language foundation model and dataset

(a) To enable report generation, we extract the last-hidden-layer embeddings from Merlin and adjust their dimension with a projection layer. Because we generate the report section by section, we also embed a report section prompt. The resulting vision and language tokens are used as input to a language model, which generates the report section. (b) We compare the performance of our model against RadFM using four metrics, for each report section and for the full report. Data are shown as mean ± 95% CI; statistics were derived from the Merlin internal test set (n = 5,137 CT exams). (c) We provide a densely annotated example of human-generated and Merlin-generated reports. The report section headers are shown in bold in both reports. We include “uterus and ovaries” in green, as Merlin needs to deduce the correct pelvic anatomy. The icons in panel a were adapted from the Noun Project (https://thenounproject.com/) under a royalty-free licence.
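
The token flow described in panel a can be illustrated with a minimal shape-level sketch. It is not the authors' implementation: the function names (`project`, `build_lm_input`), the toy dimensions, the random weights, and the choice to place vision tokens before prompt tokens are all assumptions made for illustration, as is the reading that the projection maps embeddings to the language model's embedding width.

```python
import random


def project(embeddings, weight, bias):
    """Apply a linear projection to every embedding vector.

    `weight` holds one row per output dimension (out_dim x in_dim),
    so each input vector of length in_dim becomes a vector of length out_dim.
    """
    return [
        [sum(x * w for x, w in zip(vec, row)) + b for row, b in zip(weight, bias)]
        for vec in embeddings
    ]


def build_lm_input(vision_embeddings, prompt_embeddings, weight, bias):
    """Project vision embeddings, then concatenate them with prompt embeddings.

    Assumption: vision tokens come first, followed by the section-prompt tokens.
    The caption does not state the ordering.
    """
    vision_tokens = project(vision_embeddings, weight, bias)
    return vision_tokens + prompt_embeddings


# Toy sizes (hypothetical; the real dimensions are not given in the caption).
D_VISION, D_LM = 8, 4      # image-encoder width, language-model width
N_VISION, N_PROMPT = 3, 2  # number of vision tokens, number of prompt tokens

rng = random.Random(0)
vision_embeddings = [[rng.random() for _ in range(D_VISION)] for _ in range(N_VISION)]
prompt_embeddings = [[rng.random() for _ in range(D_LM)] for _ in range(N_PROMPT)]
weight = [[rng.random() for _ in range(D_VISION)] for _ in range(D_LM)]
bias = [0.0] * D_LM

lm_input = build_lm_input(vision_embeddings, prompt_embeddings, weight, bias)
print(len(lm_input), len(lm_input[0]))  # sequence length, token width
```

In this sketch the combined sequence would then be passed to a language model once per report section, with a different section prompt each time.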
