Fig. 2: Qualitative phrase-grounding results for the given description phrases.

We visualize the association between vision and language on the MS-CXR dataset. The description phrases are shown in white text in the image column. The gold-standard annotations outlined by clinical experts are shown as dashed boxes.