Abstract
Bimodal convolutional neural networks (CNNs) are frequently combined with patient information or multiple medical images to enhance diagnostic performance. However, technologies that integrate automatically generated clinical measurements with the images themselves are scarce. Hence, we developed a bimodal model that produces clinical measurements from radiographic images via an automatic algorithm for clinical measurement (aaCM) and integrates them with CNNs. In this multicenter research project, the diagnostic performance of the model was investigated with 813 radiographic hip images of infants at risk of developmental dysplasia of the hip (232 and 581 images of unstable and stable hips, respectively), with the ground truth defined by provocative examinations. The results indicated that the accuracy of aaCM was equal to or higher than that of specialists, and the bimodal model showed better diagnostic performance than LightGBM, XGBoost, SVM, and single-CNN models. aaCM can encode expert knowledge at a high level, and our proposed bimodal model performs better than state-of-the-art models.
Introduction
Developmental dysplasia of the hip (DDH) is one of the most common congenital abnormalities of the musculoskeletal system affecting infants, ranging from mild dysplasia to dislocated hips1. Provocative maneuvers have been widely used for screening unstable hips2,3,4,5. Therapeutic interventions should be performed on the unstable hips of infants under 6 months of age6, whereas infants with stable hips can be observed. Notably, a randomized study found that interventions for stable dysplasia did not affect acetabular growth7, making it crucial to identify unstable hips because therapeutic strategies differ according to hip instability. Imaging can also assist in diagnosing DDH through the following four parameters measured on the hip region: acetabular index, O-edge angle, Yamamuro A, and Yamamuro B8,9 (Fig. 1). Thus, these clinical measurements are considered substantial for detecting unstable hips.
Machine learning models have been introduced for DDH detection across multiple image modalities10,11,12,13. In general, physicians gain patient information from a medical interview and/or pathological review when making an image-based diagnosis, and such data have conventionally been obtained manually. Recently, many researchers have combined machine learning models on images with this patient information14,15,16,17,18,19, and the integrated models performed better than image-only models14,19,20. In another multi-modal approach, a few researchers combined series of radiographic images to detect fractures or anomalies with bimodal convolutional neural network (CNN) models21,22. In short, existing multi-modal models add manually prepared tabular data or an additional external image to a single image; technologies that integrate automatically generated clinical measurements derived from the images themselves remain scarce. We therefore hypothesized that an automatic algorithm for clinical measurements (aaCM) with high accuracy can enhance diagnostic performance.
Here, we developed a bimodal machine learning model that generates aaCM outputs and integrates them with radiographic images. The proposed model comprises two components: aaCM and a pure CNN. Regarding aaCM, traditional image processing has been applied to calculate fine-grained parameters of radiographic images pertaining to DDH, such as angles and distances23,24,25, while artificial intelligence and CNNs have also profoundly impacted image processing in the medical field26. Building on both, in this study a hybrid image-processing pipeline was developed for clinical measurements: a CNN was applied to detect the regions of interest (ROIs) in the radiographic images, and traditional image processing was then applied to detect bone contours, identify feature points, and generate the parameters.
This study was aimed at evaluating the diagnostic performance of the bimodal model on infantile hip images by integrating clinical measurements produced by aaCM.
Materials and methods
Model development
Datasets and ground truth
In this multicenter, retrospective study, we enrolled all infants at risk for DDH who visited the orthopedic department of our hospital and a related hospital between 2010 and 2020, based on the Japanese Pediatric Orthopaedic Association's guidelines27. Anterior–posterior X-rays were collected from infants aged 4–6 months. A total of 813 images were collected as Digital Imaging and Communications in Medicine (DICOM) data, with no data excluded. As a binary ground truth, three orthopedic surgeons with 22, 17, and 16 years of experience determined whether each infant had an unstable hip. An unstable hip was diagnosed if provocative examinations, such as the Barlow or Ortolani tests, were positive. Therapeutic intervention, such as brace treatment, was performed on infants with unstable hips. The three surgeons also established the ground truth for the four parameters in each image. Demographic data for the whole dataset showed the affected side (right or left) in infants with unstable hips (Table 1). Six-fold cross-validation stratified by the binary ground truth was adopted for the dataset28.
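The stratified six-fold split can be sketched as follows; this is a minimal illustration (the labels array and random seed are assumptions, not the authors' code), preserving the 232:581 class ratio in every fold.

```python
# Sketch of stratified six-fold cross-validation on the 813-image dataset.
# `labels` is a binary ground truth (1 = unstable hip); values are illustrative.
import numpy as np
from sklearn.model_selection import StratifiedKFold

labels = np.array([1] * 232 + [0] * 581)        # 232 unstable, 581 stable hips
images = np.arange(len(labels))                  # stand-ins for the 813 images

skf = StratifiedKFold(n_splits=6, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(skf.split(images, labels)):
    pos_rate = labels[test_idx].mean()           # class balance preserved per fold
    print(f"fold {fold}: {len(test_idx)} test images, positive rate {pos_rate:.2f}")
```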
Framework of the proposed models
The settings were prepared as follows: Setting A included pure CNNs; Setting B included aaCM followed by models for tabular data; Setting C was the integrated model (Fig. 2). In Setting A, unstable hips were directly predicted from the X-ray images. In Setting B, ROIs were extracted from the X-ray images, and then four clinical parameters per hip were generated. This process yielded eight clinical measurements per image, which were then used to predict unstable hips using models for tabular data. In Setting C, the features extracted from Setting A were concatenated with the clinical measurements obtained from aaCM, and the combined data were fed into a fully connected model for prediction.
Overview of end-to-end models. The settings were prepared as follows: Setting A as pure CNNs, Setting B as aaCM followed by models for tabular data, and Setting C as the bimodal model. In Setting A, unstable hips were directly predicted from the X-ray images. In Setting B, the regions of interest (ROIs) were extracted from the X-ray images, and then four clinical parameters per hip (acetabular index, O-edge angle, Yamamuro A, Yamamuro B) were generated. Finally, unstable hips were predicted using models for tabular data. In Setting C, the features obtained from Setting A were concatenated with the automatically produced clinical measurements. The combined data were fed into a fully connected model for prediction. aaCM: automatic algorithms of clinical measurements; CNNs: convolutional neural networks on images.
aaCM part
Definition of bone ROIs and hip ROIs
As training data, the right and left sides of the acetabulum, proximal femur, and ischium were annotated and validated by the three surgeons. As these ROIs were considered bone ROIs (bROIs), each image had six bROIs. Using the three ipsilateral bROIs, a rectangular hip ROI (hROI) was built as follows: its height was defined as the distance between the upmost and bottommost points of the bROIs, and its width as the distance between the lateralmost and innermost points.
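The hROI construction described above amounts to taking the union bounding box of the three ipsilateral bROIs. A hypothetical sketch with made-up pixel coordinates (boxes assumed as (x_min, y_min, x_max, y_max) tuples):

```python
# Build one hip ROI (hROI) as the bounding rectangle spanning the
# upmost/bottommost and lateralmost/innermost extremes of three bROIs.
def hroi_from_brois(brois):
    """Union bounding box of a list of (x0, y0, x1, y1) boxes."""
    x0 = min(b[0] for b in brois)
    y0 = min(b[1] for b in brois)
    x1 = max(b[2] for b in brois)
    y1 = max(b[3] for b in brois)
    return (x0, y0, x1, y1)

# e.g. hypothetical acetabulum, proximal femur, and ischium boxes for one hip
hroi = hroi_from_brois([(40, 30, 120, 90), (60, 80, 140, 160), (50, 100, 110, 150)])
print(hroi)  # → (40, 30, 140, 160)
```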
Component 1: ROI detection
YOLOv5 was used to detect and extract six distinct bROIs from each image, represented as rectangular regions29, with the S, M, and L model variants trained on the input images. To ensure the detection of all six bROIs in each image, we applied post-processing steps to remove any overlapping detections, guaranteeing the presence of exactly six unique bROIs per image. As augmentation methods, in addition to the basic techniques provided by the library29, we flipped the images and bROIs horizontally to account for the right and left sides.
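The overlap-removal step might look like the following sketch (illustrative, not the authors' implementation): keep only the highest-confidence detection per bROI class, so that exactly one box per class, i.e. six unique bROIs, remains.

```python
# Post-processing sketch: deduplicate YOLO detections so each of the six
# bROI classes keeps only its highest-confidence box.
def keep_six_brois(detections):
    """detections: list of (class_id, confidence, box); class_id in 0..5."""
    best = {}
    for cls, conf, box in detections:
        if cls not in best or conf > best[cls][0]:
            best[cls] = (conf, box)
    return {cls: box for cls, (conf, box) in sorted(best.items())}

dets = [(0, 0.91, (10, 10, 50, 50)),
        (0, 0.62, (12, 11, 52, 49)),   # overlapping duplicate of class 0
        (1, 0.88, (60, 10, 100, 50))]
unique = keep_six_brois(dets)
print(unique)                           # one box per detected class
```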
Component 2: yielding clinical measurements
Binarization with adaptive thresholding was applied to each bROI to transform the bone area into a blob30, while γ correction was adopted to adjust the image contrast. The local threshold was calculated at every individual point of the image with sliding-window image processing31; the threshold value was based on the intensity of each pixel and its neighborhood. The blob itself was detected by labeling processing, and the contour of the blob was described by a convex hull. Then, the feature points based on the bone contour were detected as follows (Fig. 3):
- P1: the bottommost point in the ROI containing the acetabulum.
- P2: the point lateral to P1 along a straight line.
- P3: the lateralmost point in the ROI containing the ischium.
- P4: the upmost point in the ROIs.
- P5: the point nearest to the midpoint between the innermost points in the ROIs containing the femur.
Automatic algorithm of the clinical parameters. For each bROI extracted by YOLOv5, binarization was applied to transform the bone area into a blob. The local threshold was calculated at every individual point of the image. Then, the blob was detected by labeling processing. The contour of the blob, colored red, was described by a convex hull. Next, the feature points based on the contour were detected as follows: P1: the bottommost point in the ROI containing the acetabulum; P2: the point lateral to P1 along a straight line; P3: the lateralmost point in the ROI containing the ischium; P4: the upmost point in the ROIs; P5: the point nearest to the midpoint between the innermost points in the ROIs containing the femur. bROI: bone region of interest; YOLO: You Only Look Once.
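The local-threshold binarization described above can be sketched with a naive numpy-only implementation; a real pipeline would use an optimized routine (e.g. OpenCV's adaptiveThreshold) together with γ correction, and the window size and offset here are illustrative.

```python
# Sliding-window adaptive thresholding: each pixel is compared against the
# mean of its local neighborhood, turning the brighter bone area into a blob.
import numpy as np

def adaptive_binarize(img, window=3, offset=0):
    h, w = img.shape
    pad = window // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    out = np.zeros_like(img, dtype=np.uint8)
    for y in range(h):
        for x in range(w):
            local_mean = padded[y:y + window, x:x + window].mean()
            out[y, x] = 1 if img[y, x] > local_mean + offset else 0
    return out

img = np.array([[10, 10, 10], [10, 200, 10], [10, 10, 10]])  # one bright "bone" pixel
binary = adaptive_binarize(img)
print(binary)
```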
The radiographic hip parameters were measured using P1–P5. The output was defined as the clinical measurements acetabular index, O-edge angle, Yamamuro A, and Yamamuro B.
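A hedged sketch of turning feature points into an angle parameter. The exact point-to-parameter mapping follows the authors' definitions (Fig. 3); here we only illustrate the geometry with hypothetical coordinates, taking the angle at P1 between the line P1–P2 and the line P1–P4 as a stand-in for an angular parameter such as the acetabular index.

```python
# Compute the angle (in degrees) at a vertex between two rays, from
# hypothetical feature-point coordinates in pixel units.
import math

def angle_deg(vertex, a, b):
    """Angle at `vertex` between rays vertex->a and vertex->b, in degrees."""
    v1 = (a[0] - vertex[0], a[1] - vertex[1])
    v2 = (b[0] - vertex[0], b[1] - vertex[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    return math.degrees(math.acos(dot / (math.hypot(*v1) * math.hypot(*v2))))

p1, p2, p4 = (100, 200), (160, 200), (160, 160)   # made-up points
print(round(angle_deg(p1, p2, p4), 2))             # → 33.69
```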
Setting A: pure CNNs
As widely investigated models, the EfficientNet B0, B4, and B8 models were examined, with their initial parameters ported from ImageNet-pretrained models32,33,34,35. A sigmoid function was used for activation, and binary cross-entropy loss was used to train the network36. Each source image was converted into an 8-bit grayscale image, and the resulting pixel values were normalized using the mean and standard deviation of the dataset. Image augmentations, including flipping, Gaussian noise, blurring, Contrast-Limited Adaptive Histogram Equalization (CLAHE), and saturation adjustment, were performed. Furthermore, the gradient-weighted class activation mapping (Grad-CAM) technique was applied to an image in each fold37.
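The grayscale conversion and normalization step can be sketched as follows; the dataset statistics here are illustrative stand-ins, not the real cohort values.

```python
# Preprocessing sketch: 8-bit grayscale, then normalization with the
# dataset mean and standard deviation (hypothetical values).
import numpy as np

def preprocess(image, dataset_mean, dataset_std):
    gray = np.clip(image, 0, 255).astype(np.uint8)         # 8-bit grayscale
    return (gray.astype(np.float32) - dataset_mean) / dataset_std

x = np.array([[0, 128], [255, 64]], dtype=np.float32)
z = preprocess(x, dataset_mean=111.75, dataset_std=94.0)    # illustrative stats
print(float(z.mean()))   # mean-centering drives the sample mean to ~0
```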
Setting B: aaCM followed by models for tabular data
Light Gradient-Boosting Machine (LightGBM), support vector machine (SVM), random forest (RF), and logistic regression (LR) models were applied and compared. LightGBM was trained38 using the outputs of aaCM; the maximum depth was three, the early-stopping round was 50, and the boosting round limit was 10,000. For the SVM model, we used the radial basis function (RBF) kernel, and the optimal values for the γ and C parameters were determined through a grid search.
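The grid search for the SVM's C and γ might look like the following sketch; the parameter grid and the toy data standing in for the eight aaCM measurements are assumptions.

```python
# RBF-kernel SVM with grid-searched C and gamma over toy tabular data.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 8))                    # stand-in for 8 clinical measurements
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)    # stand-in for unstable-hip labels

grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}   # illustrative grid
search = GridSearchCV(SVC(kernel="rbf"), grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```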
To compare the feature importance of the affected and contralateral sides in infants with unstable hips, the SHapley Additive exPlanations (SHAP) methodology was applied39,40 to the 232 positive cases (Table 1). SHAP values individually represent feature-importance estimates and their direction39,40; features with a positive sign contribute to the prediction. The color of each point in the plot represents the relative magnitude of the feature value, while the position indicates the SHAP value; higher SHAP values signify a greater positive impact on the outcome. Thus, the top-ranked parameters have higher SHAP values.
Setting C: integrated model
The features obtained from the convolutional layers of the CNNs were concatenated with the clinical measurements and used as input for the fully connected layer. Rather than caching pre-generated features, this architecture enables training and inference in a single shot by dynamically combining the CNN features with the additional inputs in an online setting.
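A minimal numpy sketch of this fusion; the feature dimension, the eight-measurement vector, and the random weights are illustrative (the real model learns the fully connected layer jointly with the CNN).

```python
# Setting C fusion sketch: concatenate CNN features with the eight aaCM
# measurements, then apply a fully connected layer with a sigmoid output.
import numpy as np

rng = np.random.default_rng(0)
cnn_features = rng.normal(size=1280)       # e.g. pooled CNN feature vector
measurements = rng.normal(size=8)          # 4 parameters x 2 hips from aaCM

fused = np.concatenate([cnn_features, measurements])    # shape (1288,)
W = rng.normal(size=(1, fused.size)) * 0.01             # fully connected weights
b = np.zeros(1)
logit = (W @ fused)[0] + b[0]
p = 1.0 / (1.0 + np.exp(-logit))                        # predicted probability
print(fused.shape, round(float(p), 4))
```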
Evaluation metrics
Six-fold cross-validation has been used to reach a majority decision41,42 and was used in this study to evaluate the outcomes. The model output undergoes a logistic transformation to constrain the predicted probabilities (p) between 0 and 1. The model performance was evaluated by comparing p with the ground truth labels at different thresholds using the following metrics:

Accuracy = (TP + TN) / (TP + FP + FN + TN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
Specificity = TN / (TN + FP)
F1 score = 2 × Precision × Recall / (Precision + Recall)

where TP, FP, FN, and TN are the abbreviations for true positive, false positive, false negative, and true negative, respectively.
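These metrics follow directly from the confusion counts; a minimal sketch with hypothetical counts:

```python
# Standard classification metrics from the four confusion-matrix counts.
def metrics(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, specificity, f1

acc, prec, rec, spec, f1 = metrics(tp=30, fp=10, fn=8, tn=90)  # hypothetical counts
print(round(acc, 3), round(prec, 3), round(rec, 3), round(spec, 3), round(f1, 3))
```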
The reported values are based on the Youden Index43, which maximizes the sum of recall and specificity. The receiver operating characteristic (ROC) curve plots recall against the false positive rate, while the precision–recall (PR) curve plots precision against recall. The areas under these curves (AUC) summarize the overall performance of the model.
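Threshold selection via the Youden Index can be sketched as a sweep over candidate thresholds, keeping the one that maximizes recall + specificity − 1 (the toy labels and probabilities below are illustrative):

```python
# Youden Index threshold sweep over unique predicted probabilities.
import numpy as np

def youden_threshold(y_true, p):
    best_t, best_j = 0.5, -1.0
    for t in np.unique(p):
        pred = (p >= t).astype(int)
        tp = np.sum((pred == 1) & (y_true == 1))
        fn = np.sum((pred == 0) & (y_true == 1))
        tn = np.sum((pred == 0) & (y_true == 0))
        fp = np.sum((pred == 1) & (y_true == 0))
        j = tp / (tp + fn) + tn / (tn + fp) - 1   # Youden's J statistic
        if j > best_j:
            best_t, best_j = t, j
    return best_t

y = np.array([0, 0, 0, 1, 1, 1])
p = np.array([0.1, 0.3, 0.45, 0.6, 0.7, 0.9])
print(youden_threshold(y, p))   # → 0.6 (perfectly separates the toy classes)
```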
The mean average precision (mAP) was calculated to evaluate the performance of ROI detection44. Mean absolute errors between the clinical measurements from aaCM and the two orthopedic specialists were calculated.
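The mean absolute error computation is straightforward; a sketch with hypothetical angle readings in degrees:

```python
# MAE between aaCM estimates and reference (specialist) measurements.
import numpy as np

aacm = np.array([24.0, 31.5, 28.0])          # hypothetical estimated acetabular indices
specialist = np.array([25.0, 30.0, 27.5])    # hypothetical reference measurements
mae = float(np.mean(np.abs(aacm - specialist)))
print(mae)   # → 1.0
```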
Several evaluation metrics were analyzed to compare the settings: accuracy, average AUPRC, AUROC, and F1 score for the test data in each fold41. These values were compared using a paired t-test with Bonferroni correction for multiple comparisons42.
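The statistical comparison can be sketched as follows; the per-fold scores and the number of pairwise comparisons are illustrative assumptions.

```python
# Paired t-test across the six folds with a Bonferroni correction.
import numpy as np
from scipy import stats

setting_a = np.array([0.80, 0.81, 0.79, 0.82, 0.80, 0.81])   # e.g. fold accuracies
setting_c = np.array([0.83, 0.84, 0.82, 0.85, 0.83, 0.82])

t_stat, p_raw = stats.ttest_rel(setting_a, setting_c)
n_comparisons = 3                      # e.g. A vs C, B vs C, A vs B
p_adj = min(p_raw * n_comparisons, 1.0)   # Bonferroni-adjusted p value
print(round(float(p_adj), 4))
```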
Ethics approval and consent to participate
This multicenter and retrospective study was approved by the local ethics committee of Hokkaido University hospital (018-0397) and by a participating hospital, and informed consent was obtained from every person prior to their inclusion in this study. The guidelines of human ethics according to the Declaration of Helsinki were followed.
Results
Performance of end-to-end models for unstable hips
The performance was evaluated among Settings A–C using the CNNs and models for tabular data (Table 5). After performing six-fold cross-validation using EfficientNet B4, the average accuracy of the models for predicting unstable hips was 0.809 (± 0.010) and 0.822 (± 0.024) for Settings A and B, respectively (Fig. 4a–c). The average area under the PR curve (AUPRC) was 0.733 (± 0.043) and 0.706 (± 0.042) for Settings A and B, respectively. The average area under the ROC (AUROC) was 0.840 (± 0.029) and 0.789 (± 0.037) for Settings A and B, respectively. The average F1 score was 0.690 (± 0.027) and 0.710 (± 0.039) for Settings A and B, respectively. Setting C was trained on all images simultaneously, obtaining an average accuracy of 0.832 (± 0.018), average AUPRC of 0.804 (± 0.053), average AUROC of 0.885 (± 0.022), and average F1 score of 0.738 (± 0.025). Thus, Setting C offered significantly better performance than Settings A and B with P ≤ 0.05 for accuracy, AUPRC, AUROC, and F1 score (Fig. 4a–c).
Model performance. (a) Distribution of the accuracy, AUPRC, AUROC, and F1 score of the model. EfficientNet B4 models were trained during six-fold cross-validation per group. The 10th, 50th (median), and 90th quantiles, as well as the minimum and maximum, are shown. P values were calculated using a paired t-test with Bonferroni correction for multiple comparisons; *P < 0.05, **P < 0.01, ***P < 0.001 compared with Setting C. (b) Precision–recall and receiver operating characteristic curves of Settings A, B, and C. The mean of the six-fold cross-validation is shown. AUPRC: the area under the precision–recall curve; AUROC: the area under the receiver operating characteristic curve.
The diagnostic performance was evaluated using EfficientNet B0, B4, and B8; the average AUROC in Setting C was highest at 0.885 (± 0.022) with EfficientNet B4. This was significantly higher than the average AUROC in Setting A of 0.848 (± 0.027), obtained with EfficientNet B8 (P = 0.02).
ROI detection
The right and left sides of the acetabulum, proximal femur, and ischium were annotated. As these ROIs were considered bROIs, each image had six bROIs, which were extracted in all images using the YOLOv5 models. The mean value of mAP50 for the hROIs was 0.978 (± 0.011) in YOLOv5 S, 0.987 (± 0.004) in YOLOv5 M, and 0.990 (± 0.003) in YOLOv5 L (Table 4). Using the three ipsilateral bROIs, a rectangular hip ROI (hROI) was built.
Accuracy of the estimated parameters by aaCM
The mean absolute error of the estimated acetabular index was 1.76° on the right and 1.91° on the left, smaller than previously reported values13,24 (Table 2). Moreover, the estimated acetabular index produced by aaCM had a significantly smaller error than that produced by the orthopedic specialists (Table 3).
SHAP values in estimated parameters by aaCM
The feature importance scores of the individual parameters in the 232 positive cases (Table 1) were compared between the affected and contralateral sides (Fig. 4). Parameters with higher SHAP values signify a greater positive impact on the outcome and are ranked higher. Thus, the three parameters on the affected side (Yamamuro A, O-edge angle, and acetabular index) had higher positive impacts than those on the contralateral side. Furthermore, the Grad-CAM heatmaps also revealed that the weighted region was consistent with the affected side in the positive case (Fig. 5b).
Feature importance and Grad-CAM. (a) SHAP values of each parameter in the images of the infants with unstable hips. The position of each point indicates the SHAP value; higher SHAP values signify a greater positive impact on the outcome, and the top-ranked parameters have higher SHAP values. The color of each point represents the relative magnitude of the feature value. (b) Representative images of positive and negative cases in which the Grad-CAM heat map was integrated with the hROIs. SHAP: SHapley Additive exPlanations; hROIs: hip regions of interest; Contr: contralateral.
Discussion
We developed a bimodal model in which X-ray images were integrated with automatically produced clinical measurements. The bimodal model showed an average AUPRC and AUROC of 0.804 (± 0.060) and 0.885 (± 0.022), respectively, thus outperforming the pure CNNs; for Setting A, we obtained an average AUPRC and AUROC of 0.733 (± 0.050) and 0.840 (± 0.029), respectively. Furthermore, the accuracy of aaCM was equal to or higher than that of the orthopedic specialists. Feature importance scores of the parameters on the affected side were also ranked higher than those on the contralateral side (Tables 4, 5).
The diagnostic performance of the end-to-end model was enhanced without any manual preparation of clinical data by physicians, owing to an ad hoc adoption of clinical insights. According to previous studies, patient demographic information enhances the diagnostic performance of medical-image models in orthopedic and other areas14,19,45. Patient information can also improve the performance of artificial intelligence on various modalities of medical images, including computed tomography, magnetic resonance imaging, and pathological images. Generally, such patient demographic information is prepared manually. In another multi-modal technique, radiographic images in different views (anterior, posterior, lateral, and oblique) were combined and used as input data for CNNs to detect fractures or anomalies21,22. Yoon et al. developed a bimodal CNN by combining grayscale images and scalograms of electrocardiograms; this model outperformed single-CNN models19. Other researchers developed bimodal models by adding manual tabular data or another image to a single image. Conversely, our model automatically created additional data from a single image. The clinical measurements are based on expert knowledge; therefore, Setting C might not have been established if the models had been built only by technicians without domain knowledge about DDH. Ad hoc adoptions like our model could contribute toward performance reinforcement independently of versatile machine learning (ML) techniques.
The estimated parameters produced by aaCM had a smaller absolute error than those of the orthopedic specialists or previous reports13,24. YOLOv5 was applied to detect ROIs in the radiographic images; traditional image processing was then applied to detect bone contours, identify feature points, and generate the parameters with high accuracy. Thus, the combination of traditional image processing and artificial intelligence had sufficient performance to measure the clinical parameters. Artificial intelligence and CNNs have profoundly impacted image processing in medical fields26. The accuracy of aaCM was attributable to the reliable detection using YOLOv5, whose robustness on radiographic images was also shown by previous literature related to DDH46. Individual feature importance was also evaluated by SHAP39,40, revealing that the parameters on the affected side were weighted more heavily than those on the contralateral side. This result was consistent with the Grad-CAM heatmaps, and Setting B exhibited moderate diagnostic performance.
Clinically, unstable hips have been conventionally diagnosed by provocative maneuvers2,3,4,5, which are widely used around the world as screening techniques. However, the reliability of this test depends on the skill and experience of the examiner, with iatrogenic effects possible from repeated examinations47; thus, a standardized image analysis method is desired to address this issue. Automated analysis of images could decrease the number of provocative maneuvers or shorten the waiting period for infants to get the correct diagnosis.
This study has some limitations. First, in this study, we used radiographic images. While ultrasound inspection is a representative modality used to evaluate DDH in clinics, radiographs are also accepted for four- to six-month-old infants. Some reports state that radiographs are preferred for this age48,49. Second, the ground truth for unstable hips is provocative maneuvers, considered a conventional method. Although those maneuvers are widely used for screening, dynamic or static ultrasounds might also be preferred for defining unstable hips. Third, our dataset was relatively small compared to that of a previous study13 because we focused on six-month-old infants, a critical age group for DDH.
In conclusion, we presented the bimodal model on infantile hip images integrated with clinical measurements. We demonstrated that the automatic algorithm of clinical measurements had high accuracy and reinforced the diagnostic performance.
Data availability
The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.
Code availability
The underlying code for this study is available from the corresponding author upon reasonable request.
References
Mureşan, S., Mărginean, M. O., Voidăzan, S., Vlasa, I. & Sîntean, I. Musculoskeletal ultrasound: a useful tool for diagnosis of hip developmental dysplasia: One single-center experience. Medicine 98, e14081. https://doi.org/10.1097/md.0000000000014081 (2019).
Cook, K. A. et al. Pavlik Harness initiation on Barlow positive hips: Can we wait?. J. Orthop. 16, 378–381. https://doi.org/10.1016/j.jor.2019.03.012 (2019).
Neal, D. et al. Comparison of Pavlik Harness treatment regimens for reduced but dislocatable (Barlow positive) hips in infantile DDH. J. Orthop. 16, 440–444. https://doi.org/10.1016/j.jor.2019.06.027 (2019).
Jackson, J. C., Runge, M. M. & Nye, N. S. Common questions about developmental dysplasia of the hip. Am. Fam. Phys. 90, 843–850 (2014).
Williams, N. Improving early detection of developmental dysplasia of the hip through general practitioner assessment and surveillance. Aust. J. Gener. Pract. 47, 619–623 (2018).
Agostiniani, R. et al. Recommendations for early diagnosis of Developmental Dysplasia of the Hip (DDH): Working group intersociety consensus document. Ital. J. Pediatr. 46, 150. https://doi.org/10.1186/s13052-020-00908-2 (2020).
Pollet, V. et al. Abduction treatment in stable hip dysplasia does not alter the acetabular growth: Results of a randomized clinical trial. Sci. Rep. 10, 9647. https://doi.org/10.1038/s41598-020-66634-1 (2020).
Narayanan, U. et al. Reliability of a new radiographic classification for developmental dysplasia of the hip. J. Pediatr. Orthop. 35, 478–484. https://doi.org/10.1097/bpo.0000000000000318 (2015).
Ohmori, T. et al. Radiographic prediction of the results of long-term treatment with the Pavlik harness for developmental dislocation of the hip. Acta Med. Okayama 63, 123–128. https://doi.org/10.18926/amo/31850 (2009).
Kinugasa, M. et al. Diagnosis of developmental dysplasia of the hip by ultrasound imaging using deep learning. J. Pediatr. Orthop. 43, e538–e544. https://doi.org/10.1097/bpo.0000000000002428 (2023).
Zhang, S. C. et al. Clinical application of artificial intelligence-assisted diagnosis using anteroposterior pelvic radiographs in children with developmental dysplasia of the hip. Bone Jt. J. 102-b, 1574–1581. https://doi.org/10.1302/0301-620x.102b11.Bjj-2020-0712.R2 (2020).
Fraiwan, M., Al-Kofahi, N., Ibnian, A. & Hanatleh, O. Detection of developmental dysplasia of the hip in X-ray images using deep transfer learning. BMC Med. Inform. Decis. Mak. 22, 216. https://doi.org/10.1186/s12911-022-01957-9 (2022).
Liu, C. et al. Misshapen pelvis landmark detection with local-global feature learning for diagnosing developmental dysplasia of the hip. IEEE Trans. Med. Imaging 39, 3944–3954. https://doi.org/10.1109/tmi.2020.3008382 (2020).
Kita, K. et al. Bimodal artificial intelligence using TabNet for differentiating spinal cord tumors-Integration of patient background information and images. iScience 26, 107900. https://doi.org/10.1016/j.isci.2023.107900 (2023).
Tiulpin, A. et al. Multimodal machine learning-based knee osteoarthritis progression prediction from plain radiographs and clinical data. Sci. Rep. 9, 20038. https://doi.org/10.1038/s41598-019-56527-3 (2019).
Joo, S. et al. Multimodal deep learning models for the prediction of pathologic response to neoadjuvant chemotherapy in breast cancer. Sci. Rep. 11, 18800. https://doi.org/10.1038/s41598-021-98408-8 (2021).
Han, X. et al. The value of longitudinal clinical data and paired CT scans in predicting the deterioration of COVID-19 revealed by an artificial intelligence system. iScience 25, 104227. https://doi.org/10.1016/j.isci.2022.104227 (2022).
Bozzo, A., Tsui, J. M. G., Bhatnagar, S. & Forsberg, J. Deep learning and multimodal artificial intelligence in orthopaedic surgery. J. Am. Acad. Orthop. Surg. 32, e523–e532. https://doi.org/10.5435/jaaos-d-23-00831 (2024).
Yoon, T. & Kang, D. Bimodal CNN for cardiovascular disease classification by co-training ECG grayscale images and scalograms. Sci. Rep. 13, 2937. https://doi.org/10.1038/s41598-023-30208-8 (2023).
Zheng, X. et al. Deep learning radiomics can predict axillary lymph node status in early-stage breast cancer. Nat. Commun. 11, 1236. https://doi.org/10.1038/s41467-020-15027-z (2020).
Choi, J. W. et al. Using a dual-input convolutional neural network for automated detection of pediatric supracondylar fracture on conventional radiography. Invest. Radiol. 55, 101–110. https://doi.org/10.1097/rli.0000000000000615 (2020).
Rayan, J. C., Reddy, N., Kan, J. H., Zhang, W. & Annapragada, A. Binomial classification of pediatric elbow fractures using a deep learning multiview approach emulating radiologist decision making. Radiol. Artif. Intell. 1, e180015. https://doi.org/10.1148/ryai.2019180015 (2019).
Yang, G. et al. A semi-automatic diagnosis of hip dysplasia on X-ray films. Front. Mol. Biosci. 7, 613878. https://doi.org/10.3389/fmolb.2020.613878 (2020).
Al-Bashir, A. K., Al-Abed, M., Abu Sharkh, F. M., Kordeya, M. N. & Rousan, F. M. Algorithm for automatic angles measurement and screening for Developmental Dysplasia of the Hip (DDH). Annu. Int. IEEE Eng. Med. Biol. Soc. 2015, 6386–6389. https://doi.org/10.1109/embc.2015.7319854 (2015).
Li, Q. et al. Auxiliary diagnosis of developmental dysplasia of the hip by automated detection of Sharp’s angle on standardized anteroposterior pelvic radiographs. Medicine 98, e18500. https://doi.org/10.1097/md.0000000000018500 (2019).
Pinto-Coelho, L. How artificial intelligence is shaping medical imaging technology: A survey of innovations and applications. Bioengineering https://doi.org/10.3390/bioengineering10121435 (2023).
Shimizu, T. et al. Validation of parameters recommended for secondary screening for developmental dysplasia of the hip in Japan. J. Orthop. Sci. https://doi.org/10.1016/j.jos.2023.06.013 (2023).
Jung, Y. & Hu, J. A K-fold averaging cross-validation procedure. J. Nonparametric Stat. 27, 167–179. https://doi.org/10.1080/10485252.2015.1010532 (2015).
Zhu, X., Lyu, S., Wang, X. & Zhao, Q. TPH-YOLOv5: Improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops 2778–2788 (2021).
Korzynska, A. et al. Validation of various adaptive threshold methods of segmentation applied to follicular lymphoma digital images stained with 3,3’-Diaminobenzidine&Haematoxylin. Diagn. Pathol. 8, 48. https://doi.org/10.1186/1746-1596-8-48 (2013).
Sezgin, M. & Sankur, B. Survey over image thresholding techniques and quantitative performance evaluation. J. Electron. Imaging 13, 146–165. https://doi.org/10.1117/1.1631315 (2004).
Tan, M. & Le, Q. EfficientNet: Rethinking model scaling for convolutional neural networks. In International Conference on Machine Learning 6105–6114 (PMLR, 2019).
Marques, G., Ferreras, A. & de la Torre-Diez, I. An ensemble-based approach for automated medical diagnosis of malaria using EfficientNet. Multimed. Tools Appl. 81, 28061–28078. https://doi.org/10.1007/s11042-022-12624-6 (2022).
Chen, X. et al. Application of EfficientNet-B0 and GRU-based deep learning on classifying the colposcopy diagnosis of precancerous cervical lesions. Cancer Med. 12, 8690–8699. https://doi.org/10.1002/cam4.5581 (2023).
Sharma, N. et al. EfficientNetB0 cum FPN based semantic segmentation of gastrointestinal tract organs in MRI scans. Diagnostics https://doi.org/10.3390/diagnostics13142399 (2023).
Tan, M. & Le, Q. V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv:1905.11946 (2019). https://ui.adsabs.harvard.edu/abs/2019arXiv190511946T
Selvaraju, R. R. et al. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In 2017 IEEE International Conference on Computer Vision (ICCV) 618–626 (2017).
Ke, G. et al. LightGBM: A highly efficient gradient boosting decision tree. In Proceedings of the 31st International Conference on Neural Information Processing Systems 3149–3157 (Curran Associates Inc., Long Beach, California, USA, 2017).
Rodríguez-Pérez, R. & Bajorath, J. Interpretation of machine learning models using shapley values: application to compound potency and multi-target activity predictions. J. Comput. Aided Mol. Des. 34, 1013–1026. https://doi.org/10.1007/s10822-020-00314-0 (2020).
Shapley, L. S. A value for n-person games. In Contributions to the Theory of Games II (Princeton University Press, New Jersey, 1953).
Foersch, S. et al. Multistain deep learning for prediction of prognosis and therapy response in colorectal cancer. Nat. Med. 29, 430–439. https://doi.org/10.1038/s41591-022-02134-1 (2023).
Moncada-Torres, A., van Maaren, M. C., Hendriks, M. P., Siesling, S. & Geleijnse, G. Explainable machine learning can outperform Cox regression predictions and provide insights in breast cancer survival. Sci. Rep. 11, 6968. https://doi.org/10.1038/s41598-021-86327-7 (2021).
Schisterman, E. F., Faraggi, D., Reiser, B. & Hu, J. Youden Index and the optimal threshold for markers with mass at zero. Stat. Med. 27, 297–315. https://doi.org/10.1002/sim.2993 (2008).
Lin, T.-Y. et al. Microsoft COCO: Common objects in context. In Computer Vision – ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part V 740–755 (Springer, 2014).
Vollmer, A. et al. Multimodal artificial intelligence-based pathogenomics improves survival prediction in oral squamous cell carcinoma. Sci. Rep. 14, 5687. https://doi.org/10.1038/s41598-024-56172-5 (2024).
Den, H., Ito, J. & Kokaze, A. Diagnostic accuracy of a deep learning model using YOLOv5 for detecting developmental dysplasia of the hip on radiography images. Sci. Rep. 13, 6693. https://doi.org/10.1038/s41598-023-33860-2 (2023).
Sewell, M. D. & Eastwood, D. M. Screening and treatment in developmental dysplasia of the hip-where do we go from here?. Int. Orthop. 35, 1359–1367. https://doi.org/10.1007/s00264-011-1257-z (2011).
Schaeffer, E., Lubicky, J. & Mulpuri, K. AAOS appropriate use criteria: The management of developmental dysplasia of the hip in infants up to 6 months of age: Intended for use by general pediatricians and referring physicians. J. Am. Acad. Orthop. Surg. 27, e364–e368. https://doi.org/10.5435/jaaos-d-18-00500 (2019).
Shaw, B. A. & Segal, L. S. Evaluation and referral for developmental dysplasia of the hip in infants. Pediatrics https://doi.org/10.1542/peds.2016-3107 (2016).
Author information
Authors and Affiliations
Contributions
H.S. designed the study and collected the data under the supervision of D.T. The models were developed by K.E. and H.K. D.T., T.S., and S.S. defined the positive cases. K.S. and T.O. measured the clinical parameters. H.S. wrote the manuscript, and K.E. illustrated the figures. N.I. and S.T. directed the entire study.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Shimizu, H., Enda, K., Koyano, H. et al. Bimodal machine learning model for unstable hips in infants: integration of radiographic images with automatically-generated clinical measurements. Sci Rep 14, 17826 (2024). https://doi.org/10.1038/s41598-024-68484-7