Figure 4
From: Multimodal deep learning for midpalatal suture assessment in maxillary expansion

Schematic overview of the proposed DeepMSM architecture. Three imaging modalities (MPS-CBCT, MTM-CBCT, and CVM-LCR) are each passed through a shared ResNet-50 encoder pre-trained on the RadImageNet dataset\(^{27}\), while an MLP-based encoder processes the tabular features (age, gender, CVM stage, MTM stage). The resulting feature vectors are concatenated and fed into a classification head that outputs the MPS maturation stage \(\{A, B, C, D, E\}\).
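A minimal PyTorch sketch of the fusion pattern the caption describes (shared image encoder, tabular MLP, concatenation, classification head). The hidden-layer widths, the tabular MLP shape, and the head dimensions are illustrative assumptions rather than values reported in the paper, and the RadImageNet pre-trained weights are not loaded here:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50


class DeepMSM(nn.Module):
    """Sketch of the DeepMSM fusion pattern from Figure 4.

    Layer widths and the tabular MLP are illustrative assumptions,
    not values reported in the paper.
    """

    def __init__(self, num_tabular_features: int = 4, num_classes: int = 5):
        super().__init__()
        # Shared ResNet-50 image encoder. The paper initializes it from
        # RadImageNet; here we fall back to random initialization.
        backbone = resnet50(weights=None)
        backbone.fc = nn.Identity()      # drop the ImageNet classifier
        self.image_encoder = backbone    # shared across all three modalities

        # MLP encoder for the tabular inputs
        # (age, gender, CVM stage, MTM stage). Widths are assumptions.
        self.tabular_encoder = nn.Sequential(
            nn.Linear(num_tabular_features, 64),
            nn.ReLU(),
            nn.Linear(64, 64),
        )

        # Classification head over the concatenated feature vector:
        # three 2048-d image embeddings plus the 64-d tabular embedding.
        self.head = nn.Sequential(
            nn.Linear(3 * 2048 + 64, 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),  # MPS maturation stages A-E
        )

    def forward(self, mps_cbct, mtm_cbct, cvm_lcr, tabular):
        # Each imaging modality passes through the same (shared) encoder.
        feats = [self.image_encoder(x) for x in (mps_cbct, mtm_cbct, cvm_lcr)]
        feats.append(self.tabular_encoder(tabular))
        return self.head(torch.cat(feats, dim=1))


# Usage with dummy inputs: three image tensors and one tabular tensor.
model = DeepMSM()
logits = model(
    torch.randn(2, 3, 224, 224),  # MPS-CBCT
    torch.randn(2, 3, 224, 224),  # MTM-CBCT
    torch.randn(2, 3, 224, 224),  # CVM-LCR
    torch.randn(2, 4),            # age, gender, CVM stage, MTM stage
)
print(logits.shape)  # torch.Size([2, 5]), one logit per stage A-E
```

Sharing a single encoder across the three modalities, as the caption specifies, keeps the parameter count at one ResNet-50 regardless of how many image inputs are fused.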