Fig. 10: Overview of the A-VPT framework. | npj Digital Medicine

From: Anatomy-guided visual prompt tuning for cross-modal breast cancer understanding

Left: three input modalities, mammography (MG), ultrasound (US), and MRI, each paired with anatomy maps (glandular, fatty, ductal). Center: the Anatomy-Guided Prompt Generator pools the tissue maps into region embeddings, transforms them with MLPs, and produces anatomy-aware prompt tokens via a rank-\(r\) projection (<2% trainable parameters). These tokens interact with the visual tokens of a frozen ViT-B/16 encoder through the Prompt–Token Interaction Module (PIM), which combines token → prompt attention, prompt → token attention, and a gated residual fusion with a sigmoid gate \(\alpha(\cdot) = \sigma(\cdot)\). Bottom: cross-modal anatomical alignment via \(\mathcal{L}_{\mathrm{RCL}}\) and \(\mathcal{L}_{\mathrm{TXT}}\) harmonizes tissue semantics across MG, US, and MRI. Right: output heads for classification and segmentation.
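To make the data flow in the figure concrete, here is a minimal NumPy sketch of the prompt generator and the PIM. It is an illustration under stated assumptions, not the authors' implementation. All function names (`init_params`, `anatomy_prompts`, `attend`, `prompt_token_interaction`) are hypothetical, as are the following choices: masked average pooling of patch tokens under each tissue map, a two-layer ReLU MLP, the rank-\(r\) projection written as a low-rank factorization `down @ up`, one prompt token per tissue type, single-head attention with weights shared between the two directions, and gate inputs taken as the concatenation of each token with its attended context (the caption does not state the gate's arguments). The frozen ViT-B/16 is not modeled; random arrays stand in for its patch tokens.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(dim, hidden, rank, rng, scale=0.02):
    """Hypothetical trainable parameters (the frozen encoder is not included)."""
    return {
        "mlp_w1": rng.normal(0, scale, (dim, hidden)), "mlp_b1": np.zeros(hidden),
        "mlp_w2": rng.normal(0, scale, (hidden, dim)), "mlp_b2": np.zeros(dim),
        "down": rng.normal(0, scale, (dim, rank)),   # rank-r factor (dim -> r)
        "up": rng.normal(0, scale, (rank, dim)),     # rank-r factor (r -> dim)
        "wq": rng.normal(0, scale, (dim, dim)),
        "wk": rng.normal(0, scale, (dim, dim)),
        "wv": rng.normal(0, scale, (dim, dim)),
        "w_gate": rng.normal(0, scale, (2 * dim, dim)),
    }

def anatomy_prompts(tokens, maps, p, eps=1e-6):
    """tokens: (N, D) patch tokens; maps: (K, N) soft tissue maps -> (K, D) prompts."""
    # Masked average pooling: one region embedding per tissue map.
    region = maps @ tokens / (maps.sum(axis=1, keepdims=True) + eps)
    # Two-layer MLP with ReLU, applied to each region embedding.
    h = np.maximum(0.0, region @ p["mlp_w1"] + p["mlp_b1"])
    h = h @ p["mlp_w2"] + p["mlp_b2"]
    # Low-rank (rank-r) projection to prompt tokens.
    return h @ p["down"] @ p["up"]

def attend(q_in, kv_in, p):
    # Single-head scaled dot-product attention from q_in onto kv_in.
    d = q_in.shape[-1]
    q, k, v = q_in @ p["wq"], kv_in @ p["wk"], kv_in @ p["wv"]
    return softmax(q @ k.T / np.sqrt(d)) @ v

def prompt_token_interaction(tokens, prompts, p):
    tok_ctx = attend(tokens, prompts, p)       # token -> prompt attention
    new_prompts = attend(prompts, tokens, p)   # prompt -> token attention
    # Sigmoid gate in (0, 1), elementwise, controls the residual update.
    alpha = sigmoid(np.concatenate([tokens, tok_ctx], axis=-1) @ p["w_gate"])
    fused = tokens + alpha * tok_ctx
    return fused, new_prompts

rng = np.random.default_rng(0)
N, D, K = 16, 32, 3                  # toy sizes: 16 patches, 32-dim tokens, 3 tissues
tokens = rng.normal(size=(N, D))     # stand-in for frozen ViT patch tokens
maps = rng.random((K, N))            # stand-in for glandular/fatty/ductal maps
params = init_params(D, hidden=64, rank=4, rng=rng)
prompts = anatomy_prompts(tokens, maps, params)
fused, new_prompts = prompt_token_interaction(tokens, prompts, params)
print(prompts.shape, fused.shape, new_prompts.shape)  # (3, 32) (16, 32) (3, 32)
```

The gated residual means that when the gate saturates near 0 the frozen encoder's tokens pass through unchanged, so the prompts can only nudge the backbone representation rather than overwrite it.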