Fig. 4: Image-to-text semantic search. | Nature Medicine

From: Vision–language foundation model for echocardiogram interpretation
a, The query image is first embedded using EchoCLIP-R's image encoder. b, The similarities between this query embedding and the embeddings of all 21,484 unique text reports in the test set are then computed. c, The reports are ranked by their similarity to the query image embedding, and the report with the highest similarity is retrieved. d, Corresponding pairs of input frames and PromptCAM visualizations of the intracardiac devices indicated in the text report label (color intensity ranges from red for most important, through green for less important, to no color for not important).
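The retrieval procedure in panels a–c can be sketched as a simple nearest-neighbor search over embeddings. The sketch below is illustrative only and does not use the actual EchoCLIP-R model: `retrieve_report` is a hypothetical helper, and the toy embeddings stand in for the real image and report embeddings, which would come from the model's encoders.

```python
import numpy as np

def retrieve_report(query_embedding, report_embeddings):
    """Rank candidate report embeddings by cosine similarity to a
    query image embedding; return the best index and all scores."""
    # Normalize so the dot product equals cosine similarity
    q = query_embedding / np.linalg.norm(query_embedding)
    r = report_embeddings / np.linalg.norm(
        report_embeddings, axis=1, keepdims=True
    )
    sims = r @ q                # one similarity per report
    order = np.argsort(-sims)   # indices sorted by descending similarity
    return order[0], sims

# Toy example: 4 candidate report embeddings in a 3-d space
# (in the paper, there are 21,484 report embeddings in the test set)
reports = np.array([[0.0, 1.0, 0.0],
                    [1.0, 0.0, 0.0],
                    [0.7, 0.7, 0.0],
                    [0.0, 0.0, 1.0]])
query = np.array([0.9, 0.1, 0.0])

best, scores = retrieve_report(query, reports)
print(best)  # index of the most similar report
```

The ranking in panel c is the `argsort` over these similarity scores; the retrieved report is simply the top-ranked entry.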