Fig. 4
From: Multimodal deep learning for cephalometric landmark detection and treatment prediction

Architecture of the multimodal feature extraction and fusion mechanism in DeepFuse. The figure shows the modality-specific encoders (left), the cross-modal alignment stage (center), and the attention-guided fusion module (right), whose output feeds the downstream task-specific decoders.
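
The caption names an attention-guided fusion step but does not give its formulation, so the following is only a minimal sketch of one common way such a step can work, not the DeepFuse implementation. It assumes each modality has already been encoded and aligned into a vector of the same dimension, and it weights the modalities by a softmax over scaled dot-product scores against a query vector. The function names, the modality keys, and the query vector are all hypothetical; in a trained model the query would be learned rather than supplied by hand.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(features, query):
    """Fuse per-modality feature vectors using attention weights.

    features: dict mapping a modality name to a list of floats. All vectors
              are assumed to share one dimension (already encoded and aligned).
    query:    list of floats of the same dimension (hypothetical stand-in for
              a learned query).
    Returns (fused_vector, weights_by_modality).
    """
    names = list(features)
    dim = len(query)
    for name in names:
        if len(features[name]) != dim:
            raise ValueError(f"modality {name!r} has wrong dimension")

    # Scaled dot-product score of each modality against the query.
    scale = math.sqrt(dim)
    scores = [
        sum(q * x for q, x in zip(query, features[name])) / scale
        for name in names
    ]
    weights = softmax(scores)

    # The fused vector is the attention-weighted sum of the modality vectors.
    fused = [
        sum(w * features[name][i] for w, name in zip(weights, names))
        for i in range(dim)
    ]
    return fused, dict(zip(names, weights))


if __name__ == "__main__":
    # Placeholder modality names and values, for illustration only.
    aligned = {
        "modality_a": [2.0, 0.0],
        "modality_b": [0.0, 1.0],
    }
    fused, weights = attention_fuse(aligned, query=[1.0, 0.0])
    print(weights)
    print(fused)
```

In this sketch the modality whose features correlate most strongly with the query receives the largest weight, and the weights always sum to one, so the fused vector stays within the span of the aligned inputs. A task-specific decoder would then consume the fused vector.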