Multimodal deep learning for midpalatal suture assessment in maxillary expansion

Cai, Jingwen; Wang, Zhenling; Wang, Han; Chen, Zhonghan; Yu, Qinqi; Lai, Zhichen; Xu, Linyu

doi:10.1038/s41598-025-23500-2

Download PDF

Article
Open access
Published: 12 November 2025

Multimodal deep learning for midpalatal suture assessment in maxillary expansion

Jingwen Cai^1,2^na1,
Zhenling Wang^1,2^na1,
Han Wang^1,2^na1,
Zhonghan Chen^1,2,
Qinqi Yu^1,2,
Zhichen Lai^3,4 &
…
Linyu Xu^1,2

Scientific Reports volume 15, Article number: 39723 (2025) Cite this article

1408 Accesses
Metrics details

Subjects

Abstract

Accurate midpalatal suture maturation assessment is critical for orthodontic treatment planning, yet current manual staging methods exhibit substantial inter-examiner variability (kappa values 0.3-0.8), compromising treatment decision reliability. This study developed and validated DeepMSM, an automated multimodal deep learning framework integrating cone-beam computed tomography with clinical indicators for standardized midpalatal suture staging. We retrospectively analyzed cone-beam computed tomography and lateral cephalometric radiographs from 200 orthodontic patients aged 7-36 years. The DeepMSM framework integrated multimodal images with clinical variables including age, gender, cervical vertebral maturation stage, and mandibular third molar stage using attention-based fusion strategies. DeepMSM achieved 93.75% accuracy and 93.81% F1-score, substantially outperforming single-modality approaches (47.50%-71.25% accuracy) and dual-modality models (73.75–81.25% accuracy). The system demonstrated excellent performance in distinguishing critical stages C and D with F1-scores of 92%-93%, representing the decision point between conventional expansion and surgical intervention. All clinical parameters showed significant correlations with midpalatal suture maturation (p<0.05). DeepMSM, a novel multimodal midpalatal suture maturation assessment system, achieved a high accuracy of 93.75%, demonstrating the potential to reduce diagnostic variability and improve treatment reliability. This automated framework particularly benefits less experienced clinicians in making critical treatment decisions for maxillary expansion therapy.

Multimodal deep learning for cephalometric landmark detection and treatment prediction

Article Open access 12 July 2025

Automated classification of midpalatal suture maturation stages from CBCTs using an end-to-end deep learning framework

Article Open access 29 May 2025

Midpalatal suture maturation staging using cone beam computed tomography in patients aged between 9 to 21 years

Article Open access 12 March 2022

Introduction

Transverse maxillary hypoplasia (TMH) represents a common orthodontic condition affecting 13% to 23% of children and up to 30% of adults¹. This condition is characterized by inadequate transverse maxillary dimensions, resulting in posterior crossbites, dental crowding, and compromised masticatory function^2,3. Beyond dental complications, TMH significantly impacts facial aesthetics and nasal airway patency, with airway narrowing increasing the risk of sleep-disordered breathing and substantially compromising patient quality of life^4,5.

Maxillary expansion represents the primary therapeutic intervention for TMH, utilizing lateral orthopedic forces to open the midpalatal suture (MPS) and stimulate new bone formation. The timing and method of expansion are critically dependent on accurate assessment of MPS maturation status^6,7. Patients with patent or partially fused sutures are ideal candidates for rapid palatal expansion (RPE), which can achieve 7–10 mm of maxillary expansion with minimal skeletal side effects during the mixed and early permanent dentition phases. Conversely, individuals with advanced suture maturation require alternative approaches such as mini-screw assisted expansion (MSE) or surgically-assisted rapid palatal expansion (SARPE)^8,9. This fundamental treatment dichotomy makes precise MPS staging essential for optimal therapeutic outcomes and patient safety^10,11,12.

Current clinical practice predominantly uses Angelieri’s five-stage classification system (stages A-E) to evaluate cone-beam computed tomography (CBCT) images¹³. While this system provides a standardized assessment framework, it suffers from significant limitations that compromise clinical decision-making. Inter-examiner reliability varies considerably, with reported weighted kappa values ranging from 0.3 to 0.8, indicating substantial inconsistency in diagnostic interpretation^14,15. These variations are particularly pronounced among less-experienced clinicians, especially when distinguishing between stages C and D—a differentiation of paramount clinical importance as stage C represents the optimal window for conventional expansion while stage D typically requires invasive surgical approaches. The subjective nature of manual staging, combined with time-intensive evaluation processes, further limits the standardization of care delivery across different clinical settings.

Recent advances in artificial intelligence have shown promise for automated MPS assessment, offering potential solutions to these diagnostic challenges. Gao et al.¹⁶ developed texture-based algorithms for analyzing suture ossification patterns, while Zhu et al.¹⁷ achieved 75.37% diagnostic accuracy using deep learning architectures. Tang et al.¹⁸ further explored advanced neural network approaches combining Vision Transformers with convolutional neural networks to improve generalization across different patient populations. However, current automated approaches have critical limitations preventing clinical implementation. The reported accuracy levels fall short of the diagnostic reliability required for confident treatment planning decisions. Most significantly, existing single-modality approaches exhibit insufficient precision in distinguishing stage C from adjacent stages B and D—the very distinction that is paramount for determining conventional versus surgical expansion approaches.

Substantial clinical evidence supports the integration of multiple diagnostic indicators for comprehensive MPS assessment. Patient demographics, particularly age and gender, significantly influence suture maturation patterns, with females typically exhibiting earlier fusion compared to males^19,20,21. Cervical vertebral maturation (CVM) demonstrates strong correlations with MPS development, where early CVM stages (CS1–CS2) correspond to patent sutures (stages A–B) and advanced CVM stages (CS5–CS6) align with complete suture fusion (stage E)^14,22,23. Similarly, mandibular third molar(MTM) calcification patterns show significant correlation with MPS maturation progression²⁴. However, no single clinical indicator adequately captures the complexity of suture maturation, necessitating comprehensive multimodal approaches that can integrate diverse yet complementary biological markers.

This study introduces DeepMSM, a novel multimodal deep learning framework for automated midpalatal suture assessment. The system systematically integrates CBCT images with routinely available clinical indicators, including patient demographics, CVM staging, and MTM development. By leveraging these complementary data sources through advanced artificial intelligence techniques, DeepMSM aims to overcome the limitations of current manual staging approaches while providing standardized, reproducible, and clinically reliable assessments across diverse clinical settings. The framework specifically enhances accuracy in the clinically critical C–D stage differentiation, potentially transforming treatment planning precision for maxillary expansion therapy and advancing the field of intelligent orthodontic diagnostics.

The main contributions of this study are as follows:

Novel Multimodal Clinical Assessment Framework: We present a comprehensive automated system that systematically integrates multiple clinical indicators routinely available in orthodontic practice–CBCT imaging, patient demographics, CVM, and MTM development–to achieve a high diagnostic accuracy (93.75%) in MPS staging, with particular strength in the clinically critical C-D stage differentiation.
Evidence-Based Validation of Clinical Indicators: Through comprehensive analysis of 200 patients, we establish the relative diagnostic value and optimal integration of multiple clinical parameters in MPS assessment, providing evidence-based guidance for clinical decision-making and advancing understanding of craniofacial maturation patterns.
Clinically Applicable Diagnostic Tool: Our system demonstrates consistent diagnostic performance that surpasses typical clinician accuracy while utilizing only routinely acquired clinical data, offering a practical solution for standardizing MPS evaluation across different clinical settings and practitioner experience levels, ultimately improving treatment planning reliability.

Methodology

Study design and participants

Study population

CBCT images and lateral cephalometric radiographs (LCR) images were collected from orthodontic patients seeking treatment between January 2020 and December 2024. All imaging studies were originally performed for routine clinical diagnostic purposes. This retrospective study was approved by the Ethics Committee of the Affiliated Stomatological Hospital of Fujian Medical University (approval number: [2024] FJMU Oral Ethics Review No. 94). All experiments were performed in accordance with relevant guidelines and regulations for human subjects research. Given the retrospective nature of this study using anonymized radiographic data originally acquired for routine clinical purposes, the Ethics Committee granted a waiver of informed consent in accordance with institutional guidelines.

Inclusion Criteria: Patients were included if they met the following criteria: (1) Age between 7 and 36 years at the time of imaging; (2) Availability of both CBCT and lateral cephalometric radiographs acquired within a 1-month interval; (3) No history of orthodontic treatment, orthognathic surgery, or maxillofacial trauma; (4) Absence of systemic diseases known to affect bone metabolism (e.g., osteoporosis, hyperparathyroidism, corticosteroid therapy); (5) No severe skeletal deformities, cleft lip and palate, or other craniofacial anomalies that could affect normal suture development.

Exclusion Criteria: Patients were excluded for: (1) Poor image quality preventing adequate visualization of the midpalatal suture region in CBCT images; (2) Inadequate visualization of C2–C4 cervical vertebrae in lateral cephalometric radiographs; (3) Absence of evaluable MTM; (4) Presence of severe dental pathology affecting MTM assessment, including extensive caries, periapical lesions, or significant root resorption.

Imaging acquisition protocol

All CBCT images were acquired using a standardized protocol with the following parameters: 120 kVp, 5 mAs, field of view 16 $\times$ 13 cm, and voxel size 0.25 mm. Patients were positioned according to manufacturer recommendations with the Frankfort horizontal plane parallel to the floor and the midsagittal plane perpendicular to the floor.

All LCR were obtained using conventional radiographic equipment with standardized patient positioning protocols. All radiographs were taken with consistent magnification factors and exposure parameters to ensure image quality and reproducibility.

The imaging protocol utilized clinically indicated lateral cephalograms and CBCT scans, with all modalities derived from standard diagnostic procedures without additional exposure, strictly following ALARA(As Low As Reasonably Achievable) principles and AAOMR guidelines⁴³ for radiation protection.

Dataset characteristics

The final study cohort comprised 200 patients (75 males, 125 females) with a mean age of $16.65 \pm 6.03$ years (range: 7–36 years). The dataset was randomly stratified into training (60%, n=120), validation (20%, n=40), and testing (20%, n=40) sets with balanced representation of MPS maturation stages across all subsets.

The dataset included three imaging modalities: (1) midpalatal suture regions from CBCT (MPS-CBCT), (2) mandibular third molar regions from CBCT (MTM-CBCT), and (3) cervical vertebrae regions from lateral cephalograms (CVM-LCR). Clinical variables included patient demographics (age, gender) and maturation staging assessments (CVM and MTM stages). The dataset was randomly partitioned at the patient level to ensure that all data modalities for a single patient belonged exclusively to either the training, validation, or testing set, thereby preventing any risk of information leakage.

Statistical analysis was performed using SPSS software (version 26.0, IBM Corp.) with significance level set at $p < 0.05$. Detailed statistical analyses can be found in Section.

Image analysis and clinical assessment

Image processing and region extraction

All CBCT DICOM datasets were processed using Ez3D Plus imaging software (Vatech, Korea) following standardized protocols. Three anatomical regions were systematically extracted for each patient:

Midpalatal Suture Region (MPS-CBCT): Coronal cross-SMctions were obtained at the palatal plane passing through the posterior nasal spine, visualizing the complete suture anatomy from anterior to posterior nasal spine for consistent anatomical orientation and morphological assessment.

Mandibular Third Molar Region (MTM-CBCT): Cross-SMctional images were extracted along the long axis of MTM. When bilateral mandibular third molars were present, the tooth with more advanced developmental stage was selected for analysis.

Cervical Vertebral Region (CVM-LCR): Lateral cephalometric radiographs (LCR) were processed to extract C2–C4 cervical vertebrae regions, emphasizing inferior border characteristics and concavity patterns essential for maturation staging.

All extracted images underwent standardized preprocessing, including resizing to 224 $\times$ 224 pixels and intensity normalization for consistent computational analysis and compatibility with ResNet-based visual backbones.

Clinical staging assessment

Clinical staging assessments was performed by two orthodontists (each with over 10 years of clinical experience) and one senior orthodontic specialist (over 30 years of experience), all possessing extensive expertise in craniofacial radiographic interpretation. The two orthodontists independently performed the staging assessments, with any discrepant cases being adjudicated by the senior specialist. Final staging determinations required consensus from at least two of the three evaluators.

MPS Maturation Staging (MPS Stage): As shown in Figure 1, MPS maturation was classified using Angelieri’s five-stage system¹³: Stage A (straight high-density sutural line), Stage B (scalloped appearance with occasional parallel lines), Stage C (two parallel scalloped lines with small separations), Stage D (fusion in posterior region with visible suture anteriorly), and Stage E (complete fusion with no visible suture).

Cervical Vertebral Maturation (CVM Stage): As shown in Figure 2, CVM staging followed Baccetti’s six-stage method²⁵, evaluating morphological characteristics of C2–C4 vertebral bodies, including concavity presence, depth, and extent at inferior borders, and overall vertebral shape modifications.

Mandibular Third Molar Development (MTM Stage): As shown in Figure 3, MTM development was assessed using Demirjian’s eight-stage classification system²⁶, evaluating crown formation, root development, mineralization patterns, and apical closure status from Stage A (initial crown mineralization) to Stage H (complete root formation with closed apex).

Design of DeepMSM

Overall architecture

As shown in Figure 4, the DeepMSM (Deep Learning for Midpalatal Suture Maturation) framework represents a novel multimodal artificial intelligence system specifically designed for automated MPS staging in clinical practice. The system architecture integrates multiple data sources routinely available in orthodontic practice: three imaging modalities (MPS-CBCT, MTM-CBCT, CVM-LCR) and tabular variables (patient age, gender, CVM stage, MTM stage).

The framework employs a hierarchical integration approach, utilizing deep convolutional neural networks for image feature extraction and multilayer perceptron networks for tabular data processing. These modality-specific features are subsequently integrated through attention-based fusion mechanisms to generate comprehensive multimodal representations for final classification.

Image feature extraction

Enhanced ResNet-50 Backbone:

Image features are extracted using ResNet-50 architectures²⁸ enhanced with Convolutional Block Attention Module (CBAM)²⁹ and initialized with RadImageNet pre-trained weights²⁷. Each imaging modality is processed through identical encoder networks:

$$\begin{aligned} \textbf{z}_{\alpha } = f_{\theta }(I_{\alpha }), \quad \alpha \in \{\textrm{mps}, \textrm{mtm}, \textrm{cvm}\} \end{aligned}$$

(1)

where $I_{\alpha }$ represents the input image for modality $\alpha$ and $\textbf{z}_{\alpha } \in \mathbb {R}^{128}$ is the extracted feature vector.

Attention Mechanism: The CBAM attention module²⁹ enhances feature representation through sequential channel and spatial attention:

$$\begin{aligned} M_c(F)&= \sigma \left( \textrm{MLP}(\textrm{AvgPool}(F)) + \textrm{MLP}(\textrm{MaxPool}(F)) \right) \end{aligned}$$

(2)

$$\begin{aligned} M_s(F)&= \sigma \left( \textrm{Conv}\left( [ \textrm{AvgPool}(F); \, \textrm{MaxPool}(F) ]\right) \right) \end{aligned}$$

(3)

$$\begin{aligned} F''&= M_s(M_c(F) \otimes F) \otimes (M_c(F) \otimes F) \end{aligned}$$

(4)

where $M_c$ and $M_s$ represent channel and spatial attention modules, $\otimes$ denotes element-wise multiplication, and $\sigma$ is the sigmoid activation function.

Tabular data processing

Tabular variables are processed through a unified MLP architecture with embedding layers for categorical variables:

$$\begin{aligned} \textbf{x}_{\textrm{tabular}}&= [\tilde{T}_{\textrm{age}}, \textrm{Embed}(T_{\textrm{gender}}), \textrm{Embed}(T_{\textrm{cvm}}), \textrm{Embed}(T_{\textrm{mtm}})] \end{aligned}$$

(5)

$$\begin{aligned} \textbf{z}_{\textrm{tabular}}&= \textrm{MLP}(\textbf{x}_{\textrm{tabular}}) \end{aligned}$$

(6)

where $\tilde{T}_{\textrm{age}} = \frac{T_{\textrm{age}} - \mu _{\textrm{age}}}{\sigma _{\textrm{age}}}$ represents normalized age, and $\textrm{Embed}(\cdot )$ denotes learnable embedding functions for categorical variables.

Multimodal fusion strategy

Hierarchical Attention-Based Fusion: Image features are first combined through attention-weighted fusion:

$$\begin{aligned} w_{\alpha }&= \frac{\exp (\textbf{v}^T \tanh (W_{\alpha } \textbf{z}_{\alpha }))}{\sum _{\beta } \exp (\textbf{v}^T \tanh (W_{\beta } \textbf{z}_{\beta }))} \end{aligned}$$

(7)

$$\begin{aligned} \textbf{z}_{\textrm{img}}&= \sum _{\alpha \in \{\textrm{mps}, \textrm{mtm}, \textrm{cvm}\}} w_{\alpha } \textbf{z}_{\alpha } \end{aligned}$$

(8)

Image and tabular features are then projected to a common space with gating mechanisms:

$$\begin{aligned} \hat{\textbf{z}}_{\textrm{img}}&= W_{\textrm{img}} \textbf{z}_{\textrm{img}} \odot \sigma (W_{g1} \textbf{z}_{\textrm{img}} + \textbf{b}_{g1}) \end{aligned}$$

(9)

$$\begin{aligned} \hat{\textbf{z}}_{\textrm{tabular}}&= W_{\textrm{tabular}} \textbf{z}_{\textrm{tabular}} \odot \sigma (W_{g2} \textbf{z}_{\textrm{tabular}} + \textbf{b}_{g2}) \end{aligned}$$

(10)

$$\begin{aligned} \textbf{z}_{\textrm{fused}}&= [\hat{\textbf{z}}_{\textrm{img}}, \hat{\textbf{z}}_{\textrm{tabular}}] \end{aligned}$$

(11)

where $\odot$ represents element-wise multiplication and $[\cdot , \cdot ]$ denotes concatenation.

Classification and training strategy

Classification Head: The fused representation is processed through a multi-layer classification network:

$$\begin{aligned} \textbf{h}^{(1)}&= \text {ReLU}(W^{(1)}\textbf{z}_{\textrm{fused}} + \textbf{b}^{(1)}) \end{aligned}$$

(12)

$$\begin{aligned} \textbf{h}^{(2)}&= \text {Dropout}(\text {ReLU}(W^{(2)}\textbf{h}^{(1)} + \textbf{b}^{(2)}), p=0.3) \end{aligned}$$

(13)

$$\begin{aligned} \textbf{z}_{\textrm{logits}}&= W^{(3)}\textbf{h}^{(2)} + \textbf{b}^{(3)} \end{aligned}$$

(14)

Class probabilities are computed using softmax: $p(y=c|\textbf{x}) = \frac{\exp (z_c)}{\sum _{c'=1}^5 \exp (z_{c'})}$ for MPS stages $c \in \{A, B, C, D, E\}$.

Progressive Training Strategy: A three-stage training approach was implemented to optimize model performance. In the first stage, image encoder components were trained while tabular data processing components remained frozen (learning rate: 0.001). The second stage integrated tabular data processing while maintaining frozen image features (learning rate: 0.0005). Finally, the third stage performed end-to-end fine-tuning of all system parameters (learning rate: 0.0001). This progressive approach ensures stable convergence and optimal utilization of the limited training dataset.

Weighted cross-entropy loss is employed to address class imbalance:

$$\begin{aligned} \mathscr {L} = -\sum _{i=1}^N w_{y_i} \log (p(y_i|\textbf{x}_i)) \end{aligned}$$

(15)

where $w_c = \frac{N}{5 \cdot N_c}$ with N representing the total number of samples, $N_c$ being the number of samples in stage c, and $c \in \{A, B, C, D, E\}$.

Tabular integration and interpretability

DeepMSM is designed for seamless integration into clinical workflows, utilizing only routinely acquired diagnostic data without requiring additional examinations or specialized equipment. The system provides interpretable outputs through attention visualization techniques, allowing clinicians to understand the basis for automated predictions and maintain clinical oversight of the diagnostic process.

Results

Study population characteristics

A total of 200 participants were enrolled in this study, comprising 75 males (mean age $15.12 \pm 4.85$ years) and 125 females (mean age $17.57 \pm 6.47$ years). The overall mean age was $16.65 \pm 6.03$ years, reflecting the study’s focus on adolescents and young adults undergoing orthodontic evaluation. The age and gender distribution across different developmental stages is presented in Table 1.

Table 1 Age and gender distribution of participants.

Full size table

The distribution shows a predominance of participants in the 12–17 age group (47%), followed by the 18–29 age group (34%), which is consistent with typical orthodontic patient demographics seeking maxillary expansion treatment.

Statistical analysis

MPS stage distribution patterns

Figure 5 illustrates the distribution of MPS stages across key clinical predictors, demonstrating systematic developmental relationships that support multimodal prediction approaches.

Age-related distribution revealed distinct patterns with early MPS stages (A and B) predominantly occurring in younger participants (Age <12: 83% stages A–B; Age 12–17: 61% stages A–B), while advanced stages (D and E) showed higher frequencies in older age groups (Age 18–29: 54% stages D–E; Age $\ge$30: 75% stages D–E).

CVM stage correlations demonstrated strong developmental correspondence, with early MPS stages (A–B) primarily aligning with early CVM stages (CS1–CS3: 78% stages A–B), and later MPS stages (D–E) correlating with advanced CVM stages (CS4–CS6: 69% stages D–E).

MTM stage correlations showed that MPS-A corresponded primarily to early MTM stages (A–C: 85%), while MPS-D and MPS-E were predominantly associated with advanced MTM stages (F–H: 71%).

Inter-variable correlation analysis

Spearman rank correlation analysis revealed significant relationships among all variables ($p < 0.05$), as shown in Figure 6. The strongest correlation was observed between age and MTM stage ($\rho = 0.74$), followed by strong correlations between MTM stage and MPS stage ($\rho = 0.63$), MTM stage and CVM stage ($\rho = 0.63$), age and CVM stage ($\rho = 0.54$), and CVM stage and MPS stage ($\rho = 0.52$). A moderate correlation was found between age and MPS stage ($\rho = 0.48$).

These correlations validate the biological rationale for multimodal integration while indicating that individual predictors provide complementary rather than redundant information for MPS stage determination.

Model performance comparison

Table 2 presents comprehensive performance metrics across all evaluated models, revealing clear advantages of multimodal integration. DeepMSM achieved the highest diagnostic performance with 93.75% accuracy and 93.81% F1-score, demonstrating superior capability compared to all other evaluated approaches.

Semi-multimodal models utilizing both CBCT and LCR achieved moderate performance, with ResNet50-SM²⁸ reaching 81.25% accuracy, EfficientNet-SM³⁰ achieving 75%, and ResNet18-SM²⁸ obtaining 73.75%. These dual-modality approaches demonstrated the complementary value of combining three-dimensional suture morphology with cervical vertebral maturation indicators, yet remained insufficient for tabular implementation.

Single-modality approaches showed considerable variation in performance across different data sources and architectures. Among CBCT-only models, ResNet50-SG²⁸ achieved the highest accuracy of 71.25%, followed by EfficientNet-SG³⁰ at 70% and ResNet18-SG²⁸ at 56.25%. The tabular data-only baseline model (MLP-SG) achieved only 47.50% accuracy, demonstrating that traditional clinical indicators alone are inadequate for reliable MPS staging. This limited performance of the MLP-SG baseline underscores the complexity of craniofacial maturation patterns that cannot be captured through demographic and developmental parameters without accompanying imaging data.

Manual assessment by orthodontic residents demonstrated substantial challenges, with accuracy rates of 40.11% and 30.36% respectively, highlighting the inherent difficulty of subjective MPS evaluation even with complete access to imaging and tabular information. The performance hierarchy clearly demonstrates the incremental value of multimodal data integration, with DeepMSM’s 93.75% accuracy representing a substantial 12.5% improvement over the best semi-multimodal approach and a 46.25% improvement compared to the tabular baseline (MLP-SG). Most significantly, the automated framework achieved more than double the accuracy of manual assessment, emphasizing the substantial clinical value of standardized, objective diagnostic approaches.

Table 2 Comparative model performance.

Full size table

Detailed performance analysis of DeepMSM

Stage-specific performance

Figure 7 presents the confusion matrix and detailed classification metrics, revealing stage-specific model performance patterns.

DeepMSM demonstrated excellent performance for extreme maturation stages, with Stage A achieving precision of 97%, recall of 95%, and F1-score of 96%, while Stage E reached precision of 96%, recall of 98%, and F1-score of 97%. These results reflect the distinct morphological characteristics of early and complete suture maturation stages, which facilitate reliable automated identification.

For the clinically critical intermediate stages, the system maintained strong diagnostic accuracy across all parameters. Stage B demonstrated precision of 92%, recall of 89%, and F1-score of 91%, while Stage C achieved precision of 91%, recall of 93%, and F1-score of 92%. Most importantly, Stage D showed precision of 94%, recall of 91%, and F1-score of 93%, confirming the model’s capability to accurately identify this crucial treatment decision threshold.

Analysis of misclassification patterns revealed that 87% of errors occurred between adjacent developmental stages, a finding that aligns with clinical expectations given the gradual nature of biological maturation processes. This error distribution is clinically acceptable, as even experienced orthodontists face inherent challenges in distinguishing transitional phases where morphological features overlap between consecutive maturation stages.

Clinical decision-making impact

The model’s ability to accurately distinguish between Stages C and D (92% and 93% F1-scores respectively) is particularly significant for clinical practice, as this differentiation directly determines treatment approach selection between conventional rapid maxillary expansion and surgically-assisted expansion protocols.

Model interpretability and clinical relevance

Model interpretability and validation

Gradient-weighted Class Activation Mapping (Grad-CAM) analysis was employed to visualize and validate the anatomical regions contributing to model decision-making processes (Figure 8). The analysis demonstrated that DeepMSM consistently focused on clinically relevant anatomical structures across all imaging modalities. For cervical vertebral images (CVM-LCR), the model primarily concentrated on C2–C4 vertebral body morphology, particularly the inferior border concavity patterns that are fundamental to established CVM staging criteria. In midpalatal suture images (MPS-CBCT), attention was concentrated on the posterior midpalatal suture regions and surrounding bone density variations, which correspond to the anatomical features used by clinicians for manual MPS assessment. For mandibular third molar images (MTM-CBCT), the model emphasized root development patterns and apical closure status, aligning with the key parameters in Demirjian’s classification system.

These activation patterns provide strong evidence that DeepMSM has learned to identify and utilize the same anatomical features that experienced orthodontists rely upon for manual assessment, supporting the model’s biological validity and clinical applicability.

Clinical validation insights

The systematic attention to anatomically relevant structures demonstrates that DeepMSM has learned clinically meaningful features rather than spurious correlations, supporting its potential for reliable clinical deployment and integration into standard orthodontic diagnostic workflows.

Discussion

This study introduces the DeepMSM framework, a multimodal deep learning system that addresses the critical challenge of accurately assessing midpalatal suture (MPS) maturation. By integrating CBCT, LCR, and clinical data, the model achieved a high accuracy of 93.75%, demonstrating strong performance in differentiating between the clinically pivotal Stages C and D (92% and 93% F1-scores, respectively). This capability directly informs the crucial decision between conventional rapid palatal expansion (RPE) and surgically-assisted rapid palatal expansion (SARPE), promising to enhance treatment planning in orthodontics.

Our results highlight the limitations of current assessment methods, which rely on subjective radiographic interpretation and suffer from significant inter-examiner variability, with reported agreement rates often below 80%^13,15,34. Manual assessments by orthodontic residents in our study yielded accuracies between 30.36% and 40.11%, a finding consistent with benchmarks reported by Chhatwani et al.³⁴, which showed a clear expertise gradient in MPS classification. DeepMSM substantially outperforms these manual baselines by leveraging a multimodal approach that synthesizes three-dimensional anatomical insights from CBCT with skeletal and dental maturation indicators. This integration provides a more comprehensive and objective assessment than any single modality alone. The model’s misclassifications were predominantly between adjacent stages (87%), which is clinically understandable given the gradual nature of biological maturation.

The clinical implications of DeepMSM are twofold. Primarily, it offers a standardized and reproducible tool that can reduce the risk of inappropriate treatment. Accurate identification of Stage C can prevent unnecessary surgeries, while precise Stage D identification helps avoid failed conventional expansion attempts. Secondly, the model serves as a valuable educational and trust-building instrument. Our Grad-CAM visualizations confirm that DeepMSM focuses on clinically relevant anatomical landmarks–such as C2-C4 vertebral morphology and posterior suture characteristics–which fosters clinician trust and acceptance. This transparency also provides an objective guide for trainees, helping to standardize diagnostic approaches and shorten the learning curve associated with this complex assessment.

Despite its promising performance, this study has several limitations that must be acknowledged. First, the single-center, retrospective design necessitates comprehensive external validation to confirm its generalizability across diverse populations, imaging protocols, and clinical settings. Second, the cross-sectional nature of the data limits insights into individual maturation trajectories, a gap that could be addressed by future longitudinal research. Third, our study’s age distribution (mainly 12–29 years) underrepresents older patients who often require MARPE or SARPE due to complete suture fusion^35,40,41, and future multi-center studies should include more mature cases. Finally, our use of 2D images, while computationally efficient, could be enhanced by full 3D architectures in future work.

Future research will focus on addressing these limitations through prospective, multi-center clinical trials to establish the real-world clinical utility of DeepMSM. Technologically, future iterations of the framework may incorporate advancements in AI to further enhance performance. Critically, the successful deployment of DeepMSM requires seamless and responsible integration into clinical workflows. While the system is designed for compatibility with existing PACS and EHR systems, its implementation must be guided by robust data governance and security practices. As recent literature on medical imaging AI emphasizes, addressing legal, ethical, and cybersecurity risks–such as patient consent, data minimization, and security–is paramount for the safe deployment of such technologies in clinical settings⁴².

In summary, DeepMSM represents significant progress toward objective, evidence-based orthodontic assessment. By providing standardized and accurate MPS staging, it serves as an essential decision support tool that complements clinical expertise, with the potential to significantly improve the reliability and safety of maxillary expansion therapy.

Conclusion

This study successfully developed and validated DeepMSM, a multimodal deep learning framework for automated midpalatal suture maturation assessment. The framework achieved 93.75% accuracy by integrating CBCT imaging, lateral cephalometric radiographs, and clinical indicators, substantially outperforming single-modality approaches (47.50%-81.25%) and manual assessment (30.36%-40.11%).

The framework’s ability to accurately distinguish between Stages C and D (92-93% F1-scores) directly addresses the critical clinical challenge in treatment selection between conventional and surgically-assisted maxillary expansion. Grad-CAM analysis confirmed that the model focuses on anatomically relevant structures, supporting its clinical validity.

Study limitations include single-center validation and the need for broader population testing. Future research should focus on multi-center validation and prospective clinical trials to evaluate real-world implementation. Additionally, the systematic exploration and integration of newer generation encoders represents an important direction for future work, as our modular framework design specifically supports such upgrades to benefit from advances in backbone architectures.

DeepMSM provides clinicians with a standardized, objective diagnostic tool that has the potential to improve treatment outcomes for patients requiring maxillary expansion therapy.

Data availability

The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request, subject to ethical approval and data protection regulations.

Code availability

The DeepMSM framework code is available from the corresponding authors upon reasonable request.

References

Ye, G. et al. Comparative evaluation of transverse width indices for diagnosing maxillary transverse deficiency. BMC Oral Health 24, 808 (2024).
Article PubMed PubMed Central Google Scholar
Andrucioli, M. C. D. & Matsumoto, M. A. N. Transverse maxillary deficiency: Treatment alternatives in face of early skeletal maturation. Dental Press J. Orthod. 25, 70–79 (2020).
Article PubMed PubMed Central Google Scholar
Betts, N. J. et al. Diagnosis and treatment of transverse maxillary deficiency. Int. J. Adult Orthodon. Orthognath. Surg. 10, 75–96 (1995).
CAS PubMed Google Scholar
Brunetto, D. P., Sant’Anna, E. F., Machado, A. W. & Moon, W. Non-surgical treatment of transverse deficiency in adults using microimplant-assisted rapid palatal expansion (MARPE). Dental Press J. Orthod. 22, 110–125 (2017).
Article PubMed PubMed Central Google Scholar
Vanarsdall, R. L. Jr. Transverse dimension and long-term stability. Semin. Orthod. 5, 171–180 (1999).
Article PubMed Google Scholar
Sawchuk, D. et al. Diagnostic methods for assessing maxillary skeletal and dental transverse deficiencies: A systematic review. Korean J. Orthod. 46, 331–342 (2016).
Article PubMed PubMed Central Google Scholar
Maraón-Vásquez, G. A. et al. Effect of treatment of transverse maxillary deficiency using rapid palatal expansion on oral health-related quality of life in children: complementary results for a controlled clinical trial. Clin. Oral Investig. 28, 520 (2024).
Google Scholar
Kılıç, N., Kiki, A. & Oktay, H. A comparison of dentoalveolar inclination treated by two palatal expanders. Eur. J. Orthod. 30, 67–72 (2008).
Article PubMed Google Scholar
Rungcharassaeng, K. et al. Factors affecting buccal bone changes of maxillary posterior teeth after rapid maxillary expansion. Am. J. Orthod. Dentofacial Orthop. 132(428), e1-428.e8 (2007).
Google Scholar
Garib, D. G. et al. Rapid maxillary expansion-tooth tissue-borne versus tooth-borne expanders: A computed tomography evaluation of dentoskeletal effects. Angle Orthod. 75, 548–557 (2005).
PubMed Google Scholar
Bell, W. H. & Epker, B. N. Surgical-orthodontic expansion of the maxilla. Am. J. Orthod. 70, 517–528 (1976).
Article CAS PubMed Google Scholar
Betts, N. J. et al. Diagnosis and treatment of transverse maxillary deficiency. Int. J. Adult Orthodon. Orthognath. Surg. 10, 75–96 (1995).
CAS PubMed Google Scholar
Angelieri, F. et al. Midpalatal suture maturation: Classification method for individual assessment before rapid maxillary expansion. Am. J. Orthod. Dentofacial Orthop. 144, 759–769 (2013).
Article PubMed PubMed Central Google Scholar
Angelieri, F., Franchi, L., Cevidanes, L. H. S. & McNamara, J. A. Jr. Diagnostic performance of skeletal maturity for the assessment of midpalatal suture maturation. Am. J. Orthod. Dentofacial Orthop. 148, 1010–1016 (2015).
Article PubMed Google Scholar
Barbosa, N. M. V. et al. Reliability and reproducibility of the method of assessment of midpalatal suture maturation: A tomographic study. Angle Orthod. 89, 71–77 (2019).
Article PubMed Google Scholar
Gao, L. et al. Midpalatal suture CBCT image quantitive characteristics analysis based on machine learning algorithm construction and optimization. Bioengineering 9, 316 (2022).
Article PubMed PubMed Central Google Scholar
Zhu, M. et al. Convolutional neural network-assisted diagnosis of midpalatal suture maturation stage in cone-beam computed tomography. J. Dent. 141, 104808 (2024).
Article PubMed Google Scholar
Tang, H. et al. Prediction of midpalatal suture maturation stage based on transfer learning and enhanced vision transformer. BMC Med. Inform. Decis. Mak. 24, 232 (2024).
Article PubMed PubMed Central Google Scholar
Jimenez-Valdivia, L. M. et al. Midpalatal suture maturation stage assessment in adolescents and young adults using cone-beam computed tomography. Prog. Orthod. 20, 38 (2019).
Article PubMed PubMed Central Google Scholar
Ferrillo, M., Daly, K., Pandis, N. & Fleming, P. S. The effect of vertical skeletal proportions, skeletal maturation, and age on midpalatal suture maturation: a CBCT-based study. Prog. Orthod. 25, 17 (2024).
Article Google Scholar
Cokuner, H. G., Atik, E. & Taner, T. Relationship between midpalatal suture maturation and age and maturation of cervical vertebrae: radiographic evaluation. Acta Odontol. Turc. 35, 79–85 (2018).
Google Scholar
Jang, H. I. et al. Relationship between maturation indices and morphology of the midpalatal suture obtained using cone-beam computed tomography images. Korean J. Orthod. 46, 345–355 (2016).
Article PubMed PubMed Central Google Scholar
Liu, H., Feng, L. & Wang, L. Diagnostic value of cervical vertebral maturation stages for midpalatal suture maturation assessment: a study in the Chinese population. BMC Oral Health 23, 495 (2023).
Article Google Scholar
Šefeldaitė, S. et al. Correlation between third molar mineralization and midpalatal suture maturity: A cone beam computed tomography study. Med. Sci. Monit. 29, e940539 (2023).
Article PubMed PubMed Central Google Scholar
Baccetti, T., Franchi, L. & McNamara, J. A. Jr. An improved version of the cervical vertebral maturation (CVM) method for the assessment of mandibular growth. Angle Orthod. 72, 316–323 (2002).
PubMed Google Scholar
Demirjian, A. Dental development: An index of physiological maturity. Union Med. Can. 109, 832–839 (1980).
CAS PubMed Google Scholar
Mei, X. et al. RadImageNet: An open radiologic deep learning research dataset for effective transfer learning. Radiol. Artif. Intell. 4, e210315 (2022).
Article PubMed PubMed Central Google Scholar
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 770–778 (2016).
Woo, S., Park, J., Lee, J. Y. & Kweon, I. S. CBAM: convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV) 3–19 (2018).
Tan, M. & Le, Q. Efficientnet: rethinking model scaling for convolutional neural networks. In International Conference on Machine Learning 6105–6113 (PMLR, 2019).
Baccetti, T., Franchi, L., Cameron, C. G. & McNamara, J. A. Jr. Treatment timing for rapid maxillary expansion. Angle Orthod. 71, 343–350 (2001).
CAS PubMed Google Scholar
Sayar, G. & Kılınç, D. D. Rapid maxillary expansion outcomes according to midpalatal suture maturation levels. Prog. Orthod. 20, 35 (2019).
Article Google Scholar
Kapetanović, A. et al. Efficacy of miniscrew-assisted rapid palatal expansion (MARPE) in late adolescents and adults: a systematic review and meta-analysis. Eur. J. Orthod. 43, 313–323 (2021).
Article PubMed PubMed Central Google Scholar
Chhatwani, S. et al. Performance of dental students, orthodontic residents, and orthodontists for classification of midpalatal suture maturation stages on cone-beam computed tomography scans-a preliminary study. BMC Oral Health 24, 373 (2024).
Article PubMed PubMed Central Google Scholar
Angelieri, F. et al. Cone beam computed tomography evaluation of midpalatal suture maturation in adults. Int. J. Oral Maxillofac. Surg. 46, 1562–1570 (2017).
Article Google Scholar
Mahdian, A. et al. Correlation assessment of cervical vertebrae maturation stage and mid-palatal suture maturation in an Iranian population. J. World Fed. Orthod. 9, 134–138 (2020).
Google Scholar
Estrada, J. T. et al. Correlation between cervical vertebrae maturation and midpalatal suture fusion in patients aged between 10 and 20 years: a cross-sectional and 3D study. Int. Orthod. 20, 100659 (2022).
Article PubMed Google Scholar
Kumar, S. et al. Skeletal maturation evaluation using mandibular second molar calcification stages. Angle Orthod. 82, 501–506 (2012).
Article PubMed Google Scholar
Chen, J. et al. Correlation between dental maturity and cervical vertebral maturity. Oral Surg. Oral Med. Oral Pathol. Oral Radiol. Endod. 110, 777–783 (2010).
Article PubMed Google Scholar
Colonna, A. et al. Association of the mid-palatal suture morphology to the age and to its density: A CBCT retrospective comparative observational study. Int. Orthod. 19(2), 235–242 (2021).
Article PubMed Google Scholar
Gonzálvez Moreno, A. M. et al. Cone Beam Computed Tomography evaluation of midpalatal suture maturation according to age and sex: A systematic review. Eur. J. Paediatr. Dent. 23(1), 44–50 (2022).
PubMed Google Scholar
Kováč, J. et al. AI-driven facial image analysis for early detection of rare diseases: Legal, ethical, forensic, and cybersecurity considerations. AI 5, 723–736 (2024).
Article Google Scholar
American Academy of Oral and Maxillofacial Radiology. Clinical recommendations regarding use of cone beam computed tomography in orthodontics. [corrected]. Position statement by the American Academy of Oral and Maxillofacial Radiology. Oral Surg Oral Med Oral Pathol Oral Radiol116(2), 238–257 (2013).

Download references

Funding

This work was supported by the National Natural Science Foundation of China (No. 82501193), the Project of Fujian Provincial Department of Finance (No. 2024CZZX04), the Startup Fund for Scientific Research, Fujian Medical University (No. 2024QH1163), and the Fujian Provincial Natural Science Foundation (Nos. 2025J01807 and 2025J01123719).

Author information

Jingwen Cai, Zhenling Wang, Han Wang have contributed equally to this work.

Authors and Affiliations

Fujian Key Laboratory of Oral Diseases, School and Hospital of Stomatology, Fujian Medical University, Fuzhou, 350025, China
Jingwen Cai, Zhenling Wang, Han Wang, Zhonghan Chen, Qinqi Yu & Linyu Xu
Orthodontics Department, School and Hospital of Stomatology, Fujian Medical University, Fuzhou, 350025, China
Jingwen Cai, Zhenling Wang, Han Wang, Zhonghan Chen, Qinqi Yu & Linyu Xu
College of Computer and Data Science, Fuzhou University, Fuzhou, 350108, China
Zhichen Lai
Department of Computer Science, Aalborg University, Aalborg, 9220, Denmark
Zhichen Lai

Authors

Jingwen Cai
View author publications
Search author on:PubMed Google Scholar
Zhenling Wang
View author publications
Search author on:PubMed Google Scholar
Han Wang
View author publications
Search author on:PubMed Google Scholar
Zhonghan Chen
View author publications
Search author on:PubMed Google Scholar
Qinqi Yu
View author publications
Search author on:PubMed Google Scholar
Zhichen Lai
View author publications
Search author on:PubMed Google Scholar
Linyu Xu
View author publications
Search author on:PubMed Google Scholar

Contributions

J.C., Z.W., and H.W. conceived the study and designed the methodology. J.C., Z.W., H.W., and Z.C. collected and preprocessed the data. Z.L. developed the deep learning algorithms and implemented the code. Z.C. and Q.Y. conducted the experiments and analyzed the results. Z.L. and L.X. supervised the project and provided critical guidance. All authors reviewed the manuscript.

Corresponding authors

Correspondence to Zhichen Lai or Linyu Xu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethics approval and consent to participate

This retrospective study was approved by the Ethics Committee of the Affiliated Stomatological Hospital of Fujian Medical University (approval number: [2024] FJMU Oral Ethics Review No. 94). All experiments were performed in accordance with relevant guidelines and regulations including the Declaration of Helsinki and institutional ethical standards for human subjects research. Given the retrospective nature of this study using anonymized radiographic data originally acquired for routine clinical diagnostic purposes, the Ethics Committee granted a waiver of informed consent in accordance with institutional guidelines and applicable regulations.

Consent for publication

Not applicable. This study used only anonymized radiographic data with no identifiable patient information or images.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Cite this article

Cai, J., Wang, Z., Wang, H. et al. Multimodal deep learning for midpalatal suture assessment in maxillary expansion. Sci Rep 15, 39723 (2025). https://doi.org/10.1038/s41598-025-23500-2

Download citation

Received: 02 June 2025
Accepted: 07 October 2025
Published: 12 November 2025
Version of record: 12 November 2025
DOI: https://doi.org/10.1038/s41598-025-23500-2