Fig. 3
From: Semantic structure preservation for accurate multi-modal glioma diagnosis

Workflow of RFPMSS. The RFPMSS (Radiology and Pathology Multi-modal Semantic Space) model forwards the radiographs of the k-th patient study through the medical image transformers, fuses the representations of the different views with an attention mechanism, and exploits the information in radiology reports through report generation and study-report representation consistency reinforcement. Panel (a) illustrates the architecture of the medical image transformers. Panel (b) provides an overview of the entire workflow. In panel (c), \(v^{k}\) and \(t^{k}\) represent the visual and textual features of the k-th patient study, respectively, while \(\hat{c}_{1:T}^{k}\) and \(c_{1:T}^{k}\) denote the token-level predictions and the ground-truth tokens, respectively, for the k-th radiology report of length T.
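
The three steps named in the caption (attention-based view fusion, report generation, and study-report consistency) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the scaled dot-product attention with a single query vector, the mean token-level cross-entropy, and the cosine-based consistency term are all assumptions, and the function names `fuse_views`, `report_generation_loss`, and `consistency_loss` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_views(view_feats, query):
    """Fuse per-view feature vectors into one study-level vector v^k.

    Assumption: each view is scored by a scaled dot product with a
    query vector, and the views are averaged with softmax weights.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * f for q, f in zip(query, feat)) / scale
              for feat in view_feats]
    weights = softmax(scores)
    dim = len(view_feats[0])
    return [sum(w * feat[i] for w, feat in zip(weights, view_feats))
            for i in range(dim)]


def report_generation_loss(token_probs, target_ids):
    """Mean token-level cross-entropy over a report of length T.

    token_probs[t] is the predicted distribution over the vocabulary at
    step t (the prediction c-hat_t); target_ids[t] is the ground-truth
    token index c_t.
    """
    T = len(target_ids)
    return -sum(math.log(token_probs[t][target_ids[t]]) for t in range(T)) / T


def consistency_loss(v, t):
    """Assumed consistency term: 1 - cosine similarity between v^k and t^k."""
    dot = sum(a * b for a, b in zip(v, t))
    norm_v = math.sqrt(sum(a * a for a in v))
    norm_t = math.sqrt(sum(b * b for b in t))
    return 1.0 - dot / (norm_v * norm_t)


# Toy study with two views of dimension 2; a zero query gives equal weights.
v_k = fuse_views([[1.0, 0.0], [0.0, 1.0]], query=[0.0, 0.0])
t_k = [0.5, 0.5]  # stand-in textual feature for the same study
total = report_generation_loss([[0.5, 0.5], [0.25, 0.75]], [0, 1]) \
    + consistency_loss(v_k, t_k)
```

In this toy setting the two views receive equal attention weights, and the consistency term vanishes because the visual and textual features point in the same direction. How the two loss terms are actually weighted against each other is not stated in the caption, so the unweighted sum above is only a placeholder.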