Abstract
We aimed to compare doctors with a convolutional neural network (CNN) that we trained using our own novel method for the Lateral Pillar Classification (LPC) of Legg–Calve–Perthes disease (LCPD). Artificial intelligence (AI) applications in medicine frequently require thousands of training images. Since we did not have enough real patient radiographs to train a CNN, we devised a novel way to obtain them: we trained the CNN model on data created by modifying normal hip radiographs. No real patient radiograph was ever used during the training phase. We tested the CNN model on 81 hips with LCPD. First, we assessed the interobserver reliability of the whole system and then the reliability of the CNN alone. Second, a consensus list was used to compare the results of 11 doctors with those of the CNN model. Percentage agreement and interobserver analysis revealed that the CNN had good reliability (ICC = 0.868). The CNN achieved a classification accuracy of 76.54% and outperformed 9 of the 11 doctors. The CNN trained with the aforementioned method can already provide better results than most of the doctors. In the future, as training data evolve and improve, we anticipate that AI will perform significantly better than physicians.
Introduction
LCPD is an idiopathic avascular necrosis of the femoral head in the pediatric population. Treatment choice mainly depends on predicting the prognosis of femoral head sphericity and the congruency of the hip joint. The indication for containment treatment is determined by prognostic prediction, and several prognostic classification systems have been proposed for this purpose. Herring et al.1 described the LPC in 1992 to predict the prognostic severity of disease at the fragmentation phase. According to the LPC, the height of the lateral pillar is normal in group A, at least 50% of the lateral pillar height is preserved in group B, and less than 50% is maintained in group C.
There are many studies in the literature on the interobserver reliability and prognostic value of the LPC. Many reported moderate to high interobserver reliability and high prognostic value2,3,4,5,6,7,8,9,10,11, but, as an opposing view, Agus et al.12 reported superior reliability for the Salter-Harris and Catterall classifications over the LPC, finding poor reliability in both intra- and inter-observer agreement. Liggieri et al.13 stated that the degree of interobserver concordance was far from ideal for the three classification systems, including the LPC, and advised complementary systems for staging. Some authors criticized this system for the change between the first and last classifications10,11,14. According to Price15, the LPC may only be predictive in retrospect, after the opportunity for containment has passed, and there is still a need for prospective indicators to guide treatment choice. Given these criticisms, the LPC may not suffice for staging every patient, and the prognosis may then differ from expectations. We anticipate that a CNN can be a sharper prospective indicator for both classification and prognosis.
There are studies in the literature in which AI-based systems are used to address proximal femur bone shape deformation in LCPD patients. Proximal femur bone shape deformity rates were determined using the patient's healthy femoral head in a 2021 study by Memis et al. using an image registration approach. Experiments on Magnetic Resonance Imaging (MRI) data from 13 patients with LCPD showed that the algorithm yields very promising results16; however, this technique is not applicable to patients with bilateral LCPD. In a subsequent 2021 study by the same research group, a method for rapidly and accurately extracting bilateral proximal femurs from bilateral hip joint MRI images using random subsampled points was presented17. Bugeja et al. presented a new automated three-dimensional (3D) pipeline for segmentation and measurement of cam volume, surface area, and height using MRI data from 97 patients with femoroacetabular impingement (FAI). Their method consists of two main parts: a 3D U-Net with focused shape modeling (FSM) to segment the proximal femur, and identification of the cam bone mass to obtain the patient's anatomical information18. In another study using the U-Net architecture, Memis et al. performed multiform proximal femur and femoral head bone segmentation from low-quality MRI sections19. These studies do not include any classification method; they provide only segmentation. In 2021, Memiş et al. used a Support Vector Machine (SVM), one of the most widely used Machine Learning (ML) algorithms, to classify the Waldenstrom stages of LCPD in MRI images and reported a 62% success rate20. These studies were all carried out on MRI scans; there is no research on the LPC that uses AI with X-ray images.
In another study on LCPD, the objective was not to stage the disease but to identify whether it was present, and the study was reported to be a great success21.
When the studies in the literature are examined, it is seen that they generally focus on proximal femur bone shape segmentation. The small number of classification studies is due to the low prevalence of LCPD and the resulting difficulty of obtaining the data required to train DL networks. Within the scope of this study, data that could represent LCPD patients were derived by modifying data from healthy individuals, allowing us to train a DL network that could assist orthopedists with the LPC. We aimed to show how a CNN performs in terms of both reliability and percentage agreement. The flowchart describing the training and testing processes is given in Fig. 1.
A flowchart of training and test progress (a) All normal and LCPD hips collected (b) Obtaining normal pediatric hip radiographs (328 images) that are representative of the real patient group in terms of age and gender distribution (c) Randomly selecting 130 images from the 328 normal images to create group A. (d, e) 114 images for group B and 84 images for group C modified by computer engineers under the guidance of six orthopedists (f) Augmenting the modified images to 1312 for training (520 in Class A, 456 in Class B, and 336 in Class C). (g) Designing the CNN’s architecture and saving the CNN model for the test process (h) Obtaining real patients’ (LCPD) hip X-ray images (81 images). (i) Labelling LCPD patients’ hip radiographs with a unanimous decision by 6 senior orthopedists (Class A: 29, Class B: 31, Class C: 21). (j) Applying the trained CNN model to test images and transferring the obtained classification results to the evaluation stage. (k) Testing the reliability of the proposed method and the decisions of the six doctors.
Then, three more orthopaedists and two radiologists were added to the study, and the results of a total of nine orthopaedists and two radiologists were compared to those of CNN in terms of their classification accuracy.
The following sections provide more information about the proposed methods and the study's results.
Materials and methods
LCPD dataset
The study was approved by the local ethics committee of the Istanbul University Faculty of Medicine (No. 15/01/2020-003) and conducted in compliance with the 1964 Helsinki Declaration. The Ethics Committee of the Istanbul University Faculty of Medicine waived the requirement for informed consent because all data were anonymized and de-identified. Clinical records of LCPD patients on follow-up at two institutions between 2008 and 2016 were reviewed, and 118 patient hip radiographs were retrieved. Radiographs were excluded if they were not in the fragmentation phase. Hip radiographs taken in the neutral position from a distance of 120 cm in the full anteroposterior plane were used. Eighty-one radiographs were available for this study (72 males and 9 females; ages 4–12 years; mean age 9.46; 45 right hips and 36 left hips). Only four males had bilateral LCPD. The time after onset was 4–6 weeks. First, six orthopaedists with more than five years of pediatric orthopaedic experience made the classification individually. Then, a definitive list (a consensus list) was made by the unanimous decision of these six orthopaedists. We used the classic LPC rather than the modified one because we had only four real patient images classified as group B/C by the doctors, which is not enough data to obtain a reliable result. Reliability tests were conducted using these six orthopaedists’ individual results and the CNN’s results. After the reliability tests, we added five more independent doctors, including two radiologists, for a percentage agreement test; they labeled the 81 images individually as class A, B, or C. Finally, we compared the classification success rates of all participants using the percentage agreement test. Since all of the doctors were already familiar with the LPC, none of them were given a tutorial at the start of this study.
The data set, described in Table 1, contains few samples due to the low prevalence of LCPD in the general population. A large number of images from various classes must be employed to train CNN architectures, and the quantity of radiographs was insufficient, so the authors devised a method for creating a training dataset. In this method, 328 radiographic images of healthy hips were obtained from patients between the ages of 4 and 12. 130 randomly chosen radiographs from the 328 images were labeled group A. Two computer engineers used Adobe Photoshop® software to modify the femoral heads of the remaining 198 images under the guidance of six orthopedists, producing 114 group B and 84 group C labeled images according to the LPC (Fig. 2). Since modified images were created for training, all of the real LCPD radiographs were reserved to test the generated CNN model.
An illustration of how to produce a class B femoral head (a) Normal hip radiograph with a blue line encircling the femoral head (b) The height of the lateral column of the normal femoral head was reduced up to 50%. The head has been converted to group B. (c) The newly created head is highlighted in red. (d) The highlighted head is segmented from the image and assigned as the final training data.
Data Augmentation
In our study, there were 328 images of the femoral head for training, but the number of training samples was not enough to obtain acceptable results. According to Wong et al.22, augmentation has a significant impact on improving performance and reducing overfitting issues. Training data was augmented by horizontal flipping, zooming, and shearing of every image. At the end of the process, the number of training images increased to 1312 (520 for group A, 456 for group B, and 336 for group C).
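As an illustration, the three augmentation operations can be sketched with plain NumPy array manipulation on grayscale arrays. This is a minimal sketch, not the exact pipeline used in the study (which relied on Keras utilities), and the zoom crop size and shear amount are assumptions for demonstration.

```python
import numpy as np

def augment(image):
    """Generate simple augmented variants of a 2D grayscale image array:
    horizontal flip, central zoom (crop), and a crude horizontal shear."""
    variants = []
    # Horizontal flip: mirror the image left-to-right.
    variants.append(image[:, ::-1])
    # Zoom: crop the central region (resizing back up is omitted here).
    h, w = image.shape
    dh, dw = h // 10, w // 10
    variants.append(image[dh:h - dh, dw:w - dw])
    # Shear: shift each row proportionally to its index (with wrap-around).
    sheared = np.empty_like(image)
    for i in range(h):
        sheared[i] = np.roll(image[i], i // 4)
    variants.append(sheared)
    return variants
```

In practice, a library routine would interpolate and pad rather than crop or wrap, but the geometric idea behind each transform is the same.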
Preprocessing
The femoral heads used in the testing and training processes were segmented from the real patient and modified hip radiographs. During this process, the determination of femoral head boundaries and the segmentation of these regions were performed under the supervision of six orthopedists (Fig. 3). After segmentation, other preprocessing approaches such as histogram equalization, sharpening, and morphological operations were applied to enhance the visual quality of the images. However, these preprocessing techniques had no significant impact on the CNN's performance, so they were not used after segmentation.
Deep learning (DL) and proposed CNN architecture
DL is an ML method that has recently attracted the attention of the medical sciences as well as many other fields. Unlike traditional ML methods, a DL algorithm learns features directly from the data via its multilayered structure23. To efficiently extract features from input data, a DL network computes its internal parameters in the forward pass and then iteratively adjusts them during backpropagation. These forward and backpropagation procedures improve the network's learning capacity and provide a means to learn complex patterns in computer vision and other domains24. DL can be investigated through four main network structures: Deep Reinforcement Learning, Deep Boltzmann Machines, Recurrent Neural Networks (RNNs), and CNNs. Systems that employ deep reinforcement learning interact with their surroundings and learn from them25. Deep Boltzmann Machines process data by learning a probability density function over the inputs26. RNNs are special in that they can operate on a series of vectors over time, which means they can relate earlier knowledge to the current input. Owing to several advantages, such as its similarity to the human visual processing system, the CNN is employed in this study. It is widely used for image classification and is also proficient in learning and extracting features from 2D data27.
CNNs are a common type of deep learning neural network that is used for image classification applications. The network is composed of layers, each of which has a specific function. The fundamental layers of a CNN consist of the following:
Convolutional layer Convolution operations are carried out by this layer on the input image to aid in the feature extraction process. Each of the filters used by the convolutional layer on top of the image is designed to detect a certain feature in the image.
Activation layer This layer applies an activation function to the output of the convolutional layer, which is often the rectified linear unit (ReLU) function in image classification. The ReLU function enhances the network's capacity to learn complicated features by introducing non-linearity into the system.
Pooling layer Pooling is used to minimize the spatial dimensions of the feature maps, which reduces the computational cost of the network and increases the robustness of the features to slight changes in an object's position within an image.
Fully connected layer All the neurons in one layer are linked to the neurons in the next layer by this layer. Its purpose is to learn non-linear combinations of features from the previous layer.
Output layer This layer generates the network's final output, which could be a class label or a collection of class probabilities for an image classification task.
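To make the roles of the convolutional, activation, and pooling layers concrete, their core operations can be sketched in a few lines of NumPy. This is an illustrative toy implementation (single channel, unpadded convolution), not the code used in the study.

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Unpadded 2D convolution (cross-correlation, as CNNs implement it):
    slide the kernel over the image and sum the elementwise products."""
    kh, kw = kernel.shape
    out = np.empty((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """ReLU activation: zero out negative responses."""
    return np.maximum(x, 0)

def max_pool(x, k=2):
    """k x k max pooling: keep the strongest response in each block."""
    h, w = x.shape[0] // k, x.shape[1] // k
    return x[:h * k, :w * k].reshape(h, k, w, k).max(axis=(1, 3))
```

Chaining `max_pool(relu(conv2d_valid(img, kernel)))` reproduces, in miniature, one convolution–activation–pooling block of the kind stacked six times in the proposed architecture.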
Training process
Figure 4 demonstrates the proposed neural network design, which consists of six convolutional layers, each followed by an activation and a pooling layer. After the activation layers have normalized the output of the convolutional layers, the pooling layers use a maximum pooling operation with a kernel size of 2 × 2 to reduce the dimension of the data. Employing 3 × 3 filters, the first three convolutional layers generate 128 feature maps, while the following three convolutional layers generate 256 feature maps. The architecture also has a flatten layer and two fully connected layers. The activation function for all activation layers and the first fully connected layer is the ReLU function, which also serves as a solution to the vanishing gradient problem28. Only the last fully connected layer uses the Softmax activation function, because the class of the input is determined in this layer. Between the first and second fully connected layers, randomly chosen neuron connections are dropped out at a ratio of 0.2 to avoid overfitting29. While creating the CNN model, an optimizer is implemented to minimize the error between the predicted output and the true output. Due to its capacity to manage the complexity of deep learning and its ability to adaptively adjust the learning rate for each parameter, the proposed network implements the Adaptive Moment Estimation (Adam) optimizer with a learning rate of 0.00130. Hyper-parameters such as the number of convolutional layers, activation functions, optimizer, dropout rate, learning rate, and batch size were determined by a grid search algorithm, and the hyper-parameters that gave the best results were used in the study. Determining hyper-parameters with the grid search algorithm is a widely used method in deep learning studies.
Determining the hyper-parameters algorithmically among various candidate values enables the deep neural network to achieve higher performance on different problems and contributes to building a model suited to the problem (Fig. 4).
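Since the padding scheme of the convolutions is not stated, the following sketch traces the spatial size of the feature maps through the six conv + pool blocks under the assumption of unpadded ('valid') 3 × 3 convolutions and 2 × 2 max pooling; with 'same' padding the intermediate sizes would differ.

```python
def trace_shapes(input_size=224, blocks=(128, 128, 128, 256, 256, 256),
                 kernel=3, pool=2):
    """Trace the spatial size of the feature maps through six conv+pool
    blocks (assumed 'valid' convolutions, 2x2 max pooling)."""
    size, shapes = input_size, []
    for filters in blocks:
        size = size - (kernel - 1)   # a valid convolution shrinks by kernel-1
        size = size // pool          # max pooling halves the dimension
        shapes.append((size, size, filters))
    return shapes
```

Under these assumptions a 224 × 224 input shrinks to a 1 × 1 × 256 map before the flatten layer, which keeps the fully connected layers small.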
Once the architecture of the network was established, images with dimensions of 224 × 224 × 3 were fed to the first convolutional layer in batches of 32, and the training process was started with the 1312 images labeled by six doctors. The number of epochs was set to 1000, a large value, to avoid underfitting, and an early stopping criterion was applied to halt training and avoid overfitting once the validation accuracy declined. Training terminated automatically between 416 and 482 epochs, each epoch took around 30 s, and the weights with the highest validation accuracy were recorded. The early stopping criterion was tuned over different patience values, and a patience of 50 was chosen to prevent overfitting. Radiographic images of real patients, referred to as “test images”, were never used in the training process; they were used only during testing.
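The early stopping rule described above can be sketched as a small helper. The function name and exact bookkeeping are illustrative assumptions (in Keras this is the `EarlyStopping` callback), while the patience value matches the one reported (50).

```python
def early_stopping_epoch(val_accuracies, patience=50):
    """Return the epoch at which training stops: when validation accuracy
    has not improved for `patience` consecutive epochs. The weights from
    the best epoch are the ones kept."""
    best, best_epoch = float("-inf"), 0
    for epoch, acc in enumerate(val_accuracies):
        if acc > best:
            best, best_epoch = acc, epoch
        elif epoch - best_epoch >= patience:
            return epoch  # stop here; best_epoch's weights are saved
    return len(val_accuracies) - 1  # never triggered: trained to the end
```

This is why training halted between epochs 416 and 482 despite the nominal limit of 1000 epochs.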
Evaluation of metrics
After the training of the model was completed, the network parameters were recorded, and the network was made ready for experimental tests. Various evaluation metrics, such as accuracy, precision, specificity, recall (sensitivity), and F1 score, were used to evaluate the performance of the trained CNN during the tests. Accuracy is the most common evaluation metric for CNN classification systems because it is easy to understand and interpret; however, when dealing with unbalanced data, the F1 score is an effective option for evaluating the classification model. We also report the other evaluation metrics to support our classification results. The accuracy of the system was 83.33% on validation and 76.54% on test images; the F1 score on test images was 77.13% according to the consensus list.
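For reference, these per-class metrics can be derived directly from a confusion matrix (rows = ground truth, columns = predictions) as follows; the 2 × 2 matrix in the test is hypothetical, not the study's actual matrix, which appears in Table 2.

```python
def accuracy(cm):
    """Overall accuracy: correct predictions (the diagonal) over all cases."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total

def class_metrics(cm, c):
    """Per-class metrics from a square confusion matrix `cm`
    (rows = ground truth, cols = predictions), for class index `c`."""
    total = sum(sum(row) for row in cm)
    tp = cm[c][c]                       # true positives for class c
    fn = sum(cm[c]) - tp                # class-c cases predicted elsewhere
    fp = sum(row[c] for row in cm) - tp # other cases predicted as class c
    tn = total - tp - fn - fp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)             # sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}
```

Computing the metrics per class, as in Table 2, makes any bias toward an over-represented class visible in a way a single accuracy figure cannot.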
At this stage, to evaluate the performance of the CNN against doctors by percentage agreement, three more orthopaedists and two radiologists who were completely unaware of the study participated and independently classified the test group. A success ranking was determined after the results were obtained.
For statistical analysis, Fleiss' kappa, weighted kappa, and the intraclass correlation coefficient (ICC) were computed using the SPSS software package (version 21, IBM, New York, USA). A p value < 0.05 was considered statistically significant31,32,33,34.
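Although the analysis was run in SPSS, Fleiss' kappa is straightforward to compute directly from a subjects × categories count table. A minimal sketch (plain Python, no library assumptions):

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table `counts` of shape (subjects x categories),
    where counts[i][j] is how many raters put subject i in category j."""
    n_sub = len(counts)
    n_rat = sum(counts[0])                      # raters per subject
    # Observed per-subject agreement P_i, averaged into P-bar.
    p_i = [(sum(c * c for c in row) - n_rat) / (n_rat * (n_rat - 1))
           for row in counts]
    p_bar = sum(p_i) / n_sub
    # Chance agreement P_e from the overall category proportions.
    p_j = [sum(row[j] for row in counts) / (n_sub * n_rat)
           for j in range(len(counts[0]))]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)
```

The statistic corrects raw agreement for the agreement expected by chance, which is why a table of perfectly consistent ratings yields a kappa of 1.0.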
Hardware and software used in the experimental studies
Experimental studies carried out within the scope of this work were executed on an AMD Athlon X940 Black Edition processor, 16 GB of DDR3 RAM, and an MSI GTX 1070 graphics card. Python 3.6.5 was used as the programming language; the Keras library was used to create the DL architecture, and Keras and NumPy were used for image preprocessing. Evaluation metric values were calculated by the authors using the accuracy, precision, specificity, sensitivity, and F1 score equations.
Ethics and consent to participate
The study protocol was approved by the institutional review board of the local ethics committee (No. 15-01-2020/003). No patient consent was required because of the retrospective nature of the study. Patient data were anonymized in the paper.
Results
According to the classification results, the confusion matrix shown in Table 2 was created. All accuracy, precision, sensitivity, and specificity results for every class and for the whole system were calculated from the information in the confusion matrix. In Table 2, the columns represent the classification results produced by the model, while the rows represent the ground truths of the data. The accuracy, precision, specificity, sensitivity, and F1 score metrics represent the values obtained for each class, and the values in the last column represent the overall result. The reason for using evaluation metrics other than accuracy is to evaluate the performance of the model in each class separately and to show that the model is not biased toward any one class.
The most successful model achieved an accuracy of 83.33% on validation, 76.54% on test images according to the consensus list, and approximately 99.5% on training data. The precision, specificity, sensitivity, and F1 score results are shown in Table 2.
The results of the six orthopaedists and the CNN were analyzed using the Fleiss kappa coefficient overall and the weighted kappa between individual observers to determine interobserver reliability.
The ICC values of each orthopaedist were above 0.8, indicating good or excellent reliability. The ICC of CNN was 0.868, which is one of the highest in the literature. The kappa value for interobserver reliability between CNN and orthopaedists was 0.433–0.607 (Table 3).
We found an overall Fleiss kappa of 0.532, which indicates moderate agreement among all the reviewers, including the CNN. The kappa scores for groups A, B, and C separately were 0.677, 0.354, and 0.565, respectively. The highest interobserver agreement was between DR1 and DR2, and between DR1 and the consensus. The CNN had the third highest agreement with the consensus among the six orthopaedists (Table 4).
To compare the CNN's success with that of the doctors, the rate at which images were correctly labeled by each doctor according to the consensus list was compared with the CNN's success rate. The CNN's classification success rate was greater than that of 9 out of 11 doctors. The CNN classified 62 of 81 (76.5%) images correctly: taken separately, 27/29 (93.1%) in group A, 18/31 (58.1%) in group B, and 17/21 (80.9%) in group C (Table 5).
The distribution of LPC obtained by each doctor and CNN network is given in detail in Fig. 5.
Discussion
In this study, the LPC was performed using DL algorithms. Controversial results regarding the interobserver reliability of the LPC have been reported in the literature; therefore, a DL algorithm that can assist orthopedists was first developed, and then reliability and performance tests were carried out. In this context, hip radiographs representing various stages of LCPD were created using the aforementioned method, and the CNN model was trained on these radiographs. The trained model was tested on 81 hip radiographs belonging to real patients, and the test results of eleven doctors were compared with those of the CNN model.
Several clinical studies of CNNs have been proposed in the medical sciences since 2012. The initial AI applications in medicine have usually focused on image recognition tasks, such as diabetic retinopathy, mammographic lesions, and skin cancer recognition. With varying degrees of success, CNN was used in orthopedics for recognition and segmentation tasks such as fractures of various bones, meniscal tears, knee cartilage lesions, anterior cruciate ligament tears, vertebral body localization and segmentation, bone age assessment, and degenerative spine interpretation35. While various DL models demonstrate expert-level performance even now, they also have the potential to exceed expert physicians in the future. Olczak et al.36 compared five openly available DL networks to identify fracture, laterality, body part, and exam view and published at least 90% accuracy for laterality, body part, and exam view and 83% accuracy for fracture identification. Chung et al.37 trained a deep CNN to detect and classify the proximal humerus fractures using 1891 images. They concluded that CNN showed superior performance over general physicians and orthopedists and similar performance to orthopedists specializing in the shoulder. In complex 3- and 4-part fractures, CNN showed superior performance. Helm et al.38 reviewed the literature and reported that AI helps orthopedic surgeons better serve their patients. Urakawa et al.39 reported in their study that AI was better than five orthopedists for detecting hip fractures. In their systematic review, Langerhuizen et al.40 reported that AI reflected near-perfect prediction for fracture detection in five studies. AI outperformed human examiners for the detection and classification of hip and proximal femur fractures. One study showed equivalent performance for wrist, hand, and ankle fractures.
The accuracy and precision of our CNN model were 76.54% and 76.91%, respectively, compared with the 62% accuracy and 52% precision of the support vector machine classifier applied to the Waldenstrom classification of LCPD20.
LPC has some advantages, including reproducibility, ease of application (requires only an AP radiograph), and high correlation with the outcome. According to many authors, LPC has the highest interclass reliability among the three classification systems6,7,8,9,10,11.
Recent studies demonstrated highly variable reliability; the ICC for LPC was reported as 0.388–0.59612, and as 0.70–0.8541, so the ICC of CNN in this study was one of the highest previously reported. CNN's weighted kappa was compatible with previously reported values of 0.7227, 0.71–0.7941, 0.65–0.7042, 0.56–0.7043, and 0.5109.
Many authors studied the prognostic value of this classification, and a high association was found with the final outcome2,3,4,5. Patients with Group A and Group B below eight years of age are treated symptomatically; patients with Group C and the Group B/C border above eight years of age need surgical containment. However, there is a debate at this point. Some authors thought that there were changes in the groups between the first and last classifications. Lappin et al.11 stated that 75% of group A and 30% of group B cases required upgrading within seven months of initial symptoms. Meurer et al.14 reported the need for upgrading in 30% of cases. Park et al.10 stated that 45% of cases needed upgrades between their initial and final grades. Thus, the initial and final grades of the hip can be different, and this difference may postpone the containment treatment to preserve the shape of the femoral head, especially in patients under eight years of age. Even Akgun et al.40 suggested the use of measurements instead of estimates, especially in borderline cases when LPC was to be used as the prognostic indicator. We predict that CNN will be a more accurate predictive indicator in terms of classification and predicting prognosis, especially in the future, as underlined by Price15.
Our study has some limitations. First, test and training data should never be mixed if the CNN's performance is to be assessed accurately. Similar studies in the literature used many real patient images: Chung et al.37 used 515 normal and 1376 fractured proximal humerus images, and Urakawa et al.39 used a total of 3346 hip images to train their CNN. Considering that the incidence of LCPD ranges from 0.4 to 29 per 100,000 children < 15 years of age44, the probability of finding data close to the numbers in those studies is almost negligible in a two-center study. Thus, we did not have enough labeled radiographs to use separately for both testing and training, and we devised the present method to overcome this problem: the images obtained by modifying normal hip radiographs were used only to train the CNN, while real patients' images were used only for testing. Second, our method only changed the height of the femoral head, particularly in the lateral pillar; the femoral head fragmentation seen in the natural course of LCPD was not present in our images derived from normal hip radiographs. Although we believe this disadvantage may have negatively affected our results, our CNN still performed better than 9 out of 11 doctors, which is very promising. We believe that if we had as many real patient radiographs as the images we produced for training, the performance of the CNN could surpass us all. In addition, our study demonstrates that it is possible to train a CNN and obtain good results using data produced by the aforementioned method, particularly in studies where patient-specific data are insufficient.
Conclusion
It is widely known that large quantities of data are needed during the training of a CNN to ensure successful outcomes. This research presents a novel method for training deep neural networks when data are limited, as with rare medical conditions like LCPD, by employing modified data during training. With this technique, in which only healthy data are used after modification, AI systems can be employed for many rare diseases. Moreover, this is the first study to perform the LPC on X-ray images of LCPD patients. The results indicate that the classification accuracy of the developed model is superior to that of most of the doctors who participated in the study. Since the modified data were constructed with a predetermined pattern structure, we predict that real pattern structures will yield superior results, as they will reflect not just the changes in head height but also the other structural alterations that occur in the femoral head during the disease.
Data availability
The data used during the current study are available from the corresponding author on reasonable request.
Abbreviations
- LCPD: Legg–Calve–Perthes disease
- LPC: Lateral pillar classification
- AI: Artificial intelligence
- CNN: Convolutional neural network
- DL: Deep learning
- ML: Machine learning
- ANN: Artificial neural network
- MRI: Magnetic resonance imaging
- ReLU: Rectified linear unit
- TP: True positive
- TN: True negative
- FP: False positive
- FN: False negative
- ICC: Intraclass correlation coefficient
References
Herring, J. A., Neustadt, J. B., Williams, J. J., Early, J. S. & Browne, R. H. The lateral pillar classification of Legg-Calvé-Perthes disease. J. Pediatric Orthop. 12, 143–150 (1992).
Wiig, O., Terjesen, T. & Svenningsen, S. Prognostic factors and outcome of treatment in Perthes’ disease: A prospective study of 368 patients with five-year follow-up. J. Bone Joint Surg. British 90, 1364–1371 (2008).
Herring, J. A., Kim, H. T. & Browne, R. Legg-Calvé-Perthes disease: Part II: Prospective multicenter study of the effect of treatment on outcome. JBJS 86, 2121–2134 (2004).
Ritterbusch, J. F., Shantharam, S. S. & Gelinas, C. Comparison of lateral pillar classification and Catterall classification of Legg-Calvé-Perthes’ disease. J. Pediatr. Orthop. 13, 200–202 (1993).
Huhnstock, S. et al. Radiographic classifications in Perthes disease: Interobserver agreement and association with femoral head sphericity at 5-year follow-up. Acta Orthop. 88, 522–529 (2017).
Mahadeva, D., Chong, M., Langton, D. J. & Turner, A. M. Reliability and reproducibility of classification systems for Legg-Calvé-Perthes disease: A systematic review of the literature. Acta Orthopædica Belgica 76, 48 (2010).
Sambandam, S. N., Gul, A., Shankar, R. & Goni, V. Reliability of radiological classifications used in Legg–Calve–Perthes disease. J. Pediatric Orthop. B 15, 267–270 (2006).
Kollitz, K. M. & Gee, A. O. Classifications in brief: The Herring lateral pillar classification for Legg-Calvé-Perthes disease. Clin. Orthop. Relat. Res. 471, 2068–2072 (2013).
Podeszwa, D. A., Stanitski, C. L., Stanitski, D. F., Woo, R. & Mendelow, M. J. The effect of pediatric orthopaedic experience on interobserver and intraobserver reliability of the Herring lateral pillar classification of Perthes disease. J. Pediatric Orthop. 20, 562–565 (2000).
Park, M. S., Chung, C. Y., Lee, K. M., Kim, T. W. & Sung, K. H. Reliability and stability of three common classifications for Legg-Calvé-Perthes disease. Clin. Orthop. Relat. Res. 470, 2376–2382 (2012).
Lappin, K., Kealey, D. & Cosgrove, A. Herring classification: How useful is the initial radiograph?. J. Pediatric Orthop. 22, 479–482 (2002).
Agus, H., Kalenderer, O., Eryanlmaz, G. & Ozcalabi, I. T. Intraobserver and interobserver reliability of Catterall, Herring, Salter-Thompson and Stulberg classification systems in Perthes disease. J. Pediatric Orthop. B 13, 166–169 (2004).
Liggieri, A. C., Tamanaha, M. J., Abechain, J. J., Ikeda, T. M. & Dobashi, E. T. Intra and interobserver concordance between the different classifications used in Legg-Calvé-Perthes disease. Revista Brasileira de Ortopedia 50, 680–685 (2015).
Meurer, A., Schwitalle, M., Humke, T., Rosendahl, T. & Heine, J. Vergleich der prognostischen Aussagefähigkeit der Catterall-und Herring-Klassifikation bei Patienten mit Morbus Perthes. Z. Orthop. Ihre Grenzgeb. 137, 168–172 (1999).
Price, C. T. The lateral pillar classification for Legg-Calve-Perthes disease. J. Pediatric Orthop. 27, 592–593 (2007).
Memiş, A., Varlı, S. & Bilgili, F. A novel approach for computerized quantitative image analysis of proximal femur bone shape deformities based on the hip joint symmetry. Artif. Intell. Med. 115, 102057 (2021).
Memiş, A., Varlı, S. & Bilgili, F. Fast and accurate registration of the proximal femurs in bilateral hip joint images by using the random sub-sample points. IRBM 43, 130–141 (2022).
Bugeja, J. M. et al. Automated volumetric and statistical shape assessment of cam-type morphology of the femoral head-neck region from 3D magnetic resonance images. arXiv preprint arXiv:2112.02723 (2021).
Memiş, A., Varlı, S. & Bilgili, F. Semantic segmentation of the multiform proximal femur and femoral head bones with the deep convolutional neural networks in low quality MRI sections acquired in different MRI protocols. Comput. Med. Imaging Graph. 81, 101715 (2021).
Memiş, A., Varlı, S. & Bilgili, F. Automatic classification of the Waldenstrom stages of Legg-Calve-Perthes disease from the 2D total proximal femur shape deformity. In 2021 International Conference on INnovations in Intelligent SysTems and Applications (INISTA) (pp. 1–6). IEEE (2021).
Davison, A. K., Cootes, T. F., Perry, D. C., Luo, W., Medical Student Annotation Collaborative, & Lindner, C. Perthes disease classification using shape and appearance modelling. In Computational Methods and Clinical Applications in Musculoskeletal Imaging: 6th International Workshop, MSKI 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 16, 2018, Revised Selected Papers 6 (pp. 86–98). (2018).
Wong, S. C., Gatt, A., Stamatescu, V. & McDonnell, M. D. Understanding data augmentation for classification: When to warp? In 2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA) (pp. 1–6). IEEE. https://doi.org/10.1109/DICTA.2016.7797091 (2016).
Chartrand, G. et al. Deep learning: A primer for radiologists. Radiographics 37, 2113–2131 (2017).
Liu, X. Y., Fang, Y., Yang, L., Li, Z., & Walid, A. High-performance tensor decompositions for compressing and accelerating deep neural networks. In Tensors for Data Processing (pp. 293–340). Academic Press (2022). https://doi.org/10.1016/B978-0-12-824447-0.00018-2
Arulkumaran, K., Deisenroth, M. P., Brundage, M. & Bharath, A. A. Deep reinforcement learning: A brief survey. IEEE Signal Process. Mag. 34, 26–38 (2017).
Srivastava, N. & Salakhutdinov, R. R. Multimodal learning with deep Boltzmann machines. Adv. Neural Inf. Process. Syst. 25 (2012).
Alom, M. Z. et al. A state-of-the-art survey on deep learning theory and architectures. Electronics 8, 292 (2019).
Ide, H., & Kurita, T. Improvement of learning for CNN with ReLU activation by sparse regularization. In 2017 International Joint Conference on Neural Networks (IJCNN) (pp. 2684–2691). IEEE (2017) https://doi.org/10.1109/IJCNN.2017.7966185
Baldi, P. & Sadowski, P. The dropout learning algorithm. Artif. Intell. 210, 78–122 (2014).
Şen, S. Y., & Özkurt, N. Convolutional neural network hyperparameter tuning with adam optimizer for ECG classification. In 2020 Innovations in Intelligent Systems and Applications Conference (ASYU) (pp. 1–6). IEEE (2020). https://doi.org/10.1109/ASYU50717.2020.9259896
Landis, J. R. & Koch, G. G. The measurement of observer agreement for categorical data. Biometrics 33, 159–174 (1977).
Cohen, J. Weighted kappa: nominal scale agreement provision for scaled disagreement or partial credit. Psychol. Bull. 70, 213 (1968).
Koo, T. K. & Li, M. Y. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J. Chiropr. Med. 15, 155–163 (2016).
Fleiss, J. L. Measuring nominal scale agreement among many raters. Psychol. Bull. 76, 378 (1971).
Chea, P. & Mandell, J. C. Current applications and future directions of deep learning in musculoskeletal radiology. Skeletal Radiol. 49, 183–197 (2020).
Olczak, J. et al. Artificial intelligence for analyzing orthopedic trauma radiographs. Acta Orthop. 88, 581–586 (2017).
Chung, S. W. et al. Automated detection and classification of the proximal humerus fracture by using deep learning algorithm. Acta Orthop. 89, 468–473. https://doi.org/10.1080/17453674.2018.1453714 (2018).
Helm, J. M. et al. Machine learning and artificial intelligence: Definitions, applications, and future directions. Curr. Rev. Musculoskelet. Med. 13, 69–76 (2020).
Urakawa, T. et al. Detecting intertrochanteric hip fractures with orthopedist-level accuracy using a deep convolutional neural network. Skeletal Radiol. 48, 239–244 (2019).
Langerhuizen, D. W. G. et al. What are the applications and limitations of artificial intelligence for fracture detection and classification in orthopaedic trauma imaging? A systematic review. Clin. Orthop. Relat. Res. 477, 2482–2491. https://doi.org/10.1097/CORR.0000000000000848 (2019).
Herring, J. A., Kim, H. T. & Browne, R. Legg-Calvé-Perthes disease: Part I: Classification of radiographs with use of the modified lateral pillar and Stulberg classifications. JBJS 86, 2103–2120 (2004).
Akgun, R. et al. The accuracy and reliability of estimation of lateral pillar height in determining the Herring grade in Legg-Calve-Perthes disease. J. Pediatr. Orthop. 24, 651–653 (2004).
Wiig, O., Terjesen, T. & Svenningsen, S. Inter-observer reliability of radiographic classifications and measurements in the assessment of Perthes’ disease. Acta Orthop. Scand. 73, 523–530 (2002).
Loder, R. T. & Skopelja, E. N. The epidemiology and demographics of Legg-Calvé-Perthes' disease. Int. Sch. Res. Not. https://doi.org/10.5402/2011/504393 (2011).
Funding
No financial support was received for this study.
Author information
Authors and Affiliations
Contributions
Concept – Z.S., Y.S., S.K., Y.A.K., M.T., T.S., A.S.A., F.B., C.S.; Design – Z.S., T.K., T.I.; Supervision – Z.S., M.T., S.K., T.S., F.B., C.S.; Materials – Z.S., Y.S., S.K., Y.A.K., A.S.A., F.B., C.S.; Data Collection and/or Processing – Z.S., Y.S., M.T., S.K., T.S., F.B., C.S.; Analysis and/or Interpretation – Z.S., Y.S., M.T., S.K., T.S., F.B., C.S.; Literature Review – Z.S., Y.S., M.T., S.K., T.S., F.B., C.S.; Writing – Z.S., Y.S., S.K., Y.A.K., M.T., T.S., A.S.A., F.B., C.S.; Critical Review – Z.S., Y.S., S.K., Y.A.K., M.T., T.S., A.S.A., F.B., C.S.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Soydan, Z., Saglam, Y., Key, S. et al. An AI based classifier model for lateral pillar classification of Legg–Calve–Perthes. Sci Rep 13, 6870 (2023). https://doi.org/10.1038/s41598-023-34176-x
This article is cited by
- Femoral head vascular status in early-stage Legg–Calvé–Perthes disease assessed by contrast-enhanced magnetic resonance imaging: comparison with the contralateral side. Orphanet Journal of Rare Diseases (2025)
- Deep learning-assisted screening and diagnosis of scoliosis: segmentation of bare-back images via an attention-enhanced convolutional neural network. Journal of Orthopaedic Surgery and Research (2025)
- Artificial intelligence and pediatric imaging data: ethical strategies for learning and collaboration. Pediatric Radiology (2025)
- Ethical guidance for reporting and evaluating claims of AI outperforming human doctors. npj Digital Medicine (2024)