Introduction

Clubfoot deformity is one of the most common congenital orthopaedic disorders1. Worldwide, approximately 175,000 children are born with unilateral or bilateral clubfoot every year2. A clubfoot is a three-dimensional deformity that involves the foot as well as the lower leg and consists of four main characteristics: equinus, cavus, varus of the heel and adductus of the forefoot. Left untreated, the clubfoot leads to deformity, functional disability and pain3. Nowadays, the Ponseti method is the gold standard for the initial treatment of the clubfoot4,5. Despite its good initial outcomes, children who undergo the Ponseti method still suffer from a relapse in 1.9% to 67.3% of cases6,7,8.

A relapse entails the recurrence of one or more aspects of the original deformities. It causes functional and pain-related problems. Relapsed clubfeet identified at an early stage can often be treated with non-invasive treatment methods9,10. Late recognition of a relapse often results in a surgical approach, which might adversely affect pain, cosmetics, and foot functionality later in life11. Therefore, early identification or preferably prevention of a relapse is essential to avoid the need for additional (surgical) interventions and improve treatment12,13. Unfortunately, early identification of a relapse can be difficult due to the diverse nature and timing of a relapse12,14. Clinical follow-up is currently used to monitor the status of a clubfoot, in which signs of relapse are often identified by the subjective assessment of a clinician using clinical tools such as the Clubfoot Assessment Protocol (CAP) or the Pirani/Sinclair score15,16,17. Hence, early and objective identification of a relapse is still a great challenge in a clinical setting.

Objective assessment of foot functionality and overall physical ability of the child over time may aid in the early identification of a relapse. Three-dimensional movement analysis can be used to quantify movement characteristics, such as joint kinematics during walking or other dynamic activities. Previous studies demonstrate multiple deviations at hip, knee, ankle and foot level in clubfoot patients compared to typically developing children18,19,20. Based on clinical observations, some of these deviations, such as decreased dorsiflexion and internal foot progression angle, are associated with a relapse12,14. Furthermore, previous research also shows multiple kinematic deviations in the gait pattern of children with relapsed clubfoot compared to non-relapsed clubfoot19,21,22,23,24. However, a relapse typically involves a combination of deformities that affects a child’s movement pattern across multiple joint levels, formed by a complex kinematic chain11. Although these complex interactions are captured during three-dimensional movement analysis, they cannot be objectively identified using conventional analysis methods (e.g. discrete statistics), as these evaluate each joint separately.

Machine learning algorithms can analyse complex nonlinear relationships, which are generally present in human movement patterns25. Therefore, machine learning could be used to find patterns in biomechanical data of a patient’s movement26, enabling the categorization of patients or their clinical status based on recognition of certain movement characteristics27,28,29,30,31. Several models, such as artificial neural networks (ANN), support vector machines (SVM) or logistic regression (LR), have been used to classify patients based on pathological gait patterns26. As far as we are aware, such a machine learning approach has not been used to assess whether or not a child has a relapsed clubfoot based on their movement pattern. Additionally, using explainable models may identify indicators of relapse in the biomechanical data which could be measured in a clinical setting to aid the detection of future relapse. Therefore, the primary aim of this study is to explore to what extent biomechanical data collected with three-dimensional movement analysis can be used to distinguish children with relapsed clubfoot from children with non-relapsed clubfoot and what gait features contribute to this classification.

We hypothesized that a model can be constructed with the potential to classify children with non-relapsed and relapsed clubfoot based on the kinematics of dynamic activities. Furthermore, we expected that the kinematics of more demanding activities would lead to more accurate classification. These activities are more demanding in terms of joint mobility, balance control, and muscle force generation. Therefore, children with relapsed clubfoot may exhibit more pronounced deviations during more challenging dynamic activities than during walking.

Results

Demographic characteristics

The study population consists of 14 children with relapsed clubfoot and 21 children with non-relapsed clubfoot. Except for three unknown cases due to the transfer of treatment from another institution, all patients underwent an Achilles tenotomy as part of their initial Ponseti treatment (Table 1). Muscle function and motion quality are reduced in relapsed clubfoot compared to non-relapsed clubfoot. No other differences in demographics are found between groups.

Table 1 Demographic characteristics of the relapsed and non-relapsed clubfoot groups (mean ± standard deviation, and count).

Model performance per activity based on movement cycle prediction

For each activity, the best-performing model was a logistic regression model. Table 2 summarizes the mean AUC test scores, sensitivity and specificity across the eight outer folds, based on model predictions for each individual movement cycle. Training a model using the kinematic movement patterns of toe walking results in the highest AUC score (0.81), followed by walking (0.71), heel walking (0.70), and running (0.62). Sensitivity scores, a model’s ability to correctly identify individual movement cycles of subjects with relapsed clubfoot, range from 0.45 to 0.54, with toe walking and walking showing the highest sensitivity. Model performance regarding specificity, a model’s ability to correctly identify individual movement cycles of subjects with non-relapsed clubfoot, is the highest for running (0.78) and toe walking (0.77).

Table 2 Mean and standard deviation of the test scores over all eight folds of the outer cross-validation loop based on individual movement cycles.

Subject classification based on the different activities

Figure 1 displays the confusion matrices for each activity for predictions made at the subject level (Fig. 1a, b, c and d). These predictions were obtained by aggregating individual movement cycle predictions through majority voting, resulting in sensitivity and specificity scores at subject level. False positive predictions, which means predicted relapsed but actual non-relapsed clubfoot, range from 9 to 15%, while false negative predictions, which means predicted non-relapsed but actual relapsed clubfoot, range from 17 to 27%. Making predictions based on toe walking movement pattern results in the fewest false positives (Fig. 1b) whereas using the walking movement pattern results in the fewest false negatives (Fig. 1a). This results in the best specificity (0.86) for toe walking and the best sensitivity (0.57) for walking.

Fig. 1 Confusion matrix, including sensitivity and specificity scores, at subject level per activity. True positives, true negatives, false positives and false negatives are given as percentages of the total population. (a) Walking, (b) toe walking, (c) heel walking, (d) running.

Figure 2 shows the confusion matrix for predictions made at the subject level aggregated across the four activities. Basing the subject’s classification on at least one predicted relapse over the four activities results in the fewest false negatives (9%), but also the most false positives (29%) (Fig. 2a). This yields a sensitivity of 0.79 and a specificity of 0.52. In 62% of the children classified as relapse, this prediction was based on at least two activities. Additional analysis showed that making a prediction based only on walking and toe walking activities yielded the best combination of sensitivity (0.71) and specificity (0.71) scores (Fig. 2b, and supplementary information Table S2).
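Because the confusion matrices are reported as percentages of the total population, the link to sensitivity and specificity is easy to lose. The following minimal sketch reconstructs both scores for the "at least one predicted relapse" rule from the group sizes (14 relapsed, 21 non-relapsed) and the reported false negative and false positive percentages. The whole-subject counts are inferred here by rounding the percentages and are not values taken from the study data; the function name is hypothetical.

```python
def rates_from_percentages(n_relapsed, n_non_relapsed, fn_pct, fp_pct):
    """Recover subject-level sensitivity and specificity from false negative
    and false positive rates given as percentages of the total population."""
    total = n_relapsed + n_non_relapsed
    # Assumption: rounded percentages map back to whole numbers of subjects.
    fn = round(fn_pct / 100 * total)
    fp = round(fp_pct / 100 * total)
    tp = n_relapsed - fn
    tn = n_non_relapsed - fp
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


sensitivity, specificity = rates_from_percentages(14, 21, fn_pct=9, fp_pct=29)
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
# prints "sensitivity=0.79, specificity=0.52"
```

Under these assumptions the rounded percentages are consistent with the reported sensitivity of 0.79 and specificity of 0.52.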

Fig. 2 Confusion matrix for predictions at subject level aggregated across activities, in percentages of the total population. (a) Classification based on at least one predicted relapse across the four activities, (b) classification based on at least one predicted relapse across two activities (walking and toe walking).

Feature occurrence

The features occurring with a nonzero coefficient in at least six out of the eight models of the outer CV loop were aggregated by joint and movement plane (supplementary information, Table S3). In all activities, features with a high number of occurrences are mainly related to pelvis, hip, and knee kinematics, except for the transverse plane kinematics of the forefoot relative to the tibia during walking.

Discussion

This study aimed to explore to what extent biomechanical data collected with three-dimensional movement analysis can be used to distinguish children with relapsed clubfoot from children with non-relapsed clubfoot. The results indicate that children with relapsed clubfoot can potentially be distinguished from children with non-relapsed clubfoot based on their kinematic movement patterns. Based on the classification of individual movement cycles, the best-performing model is based on the kinematic movement pattern of toe walking. At the subject level, the best sensitivity scores were reached when model predictions across four activities were combined into one overarching subject classification, whereas making a prediction based only on walking and toe walking activities yielded the best combination of sensitivity and specificity. Furthermore, a key finding relates to the kinematic variables that contributed to the classification of the children, specifically those involving the pelvis, hip, and knee joints.

The model’s performance on unseen data, based on individual movement cycles using AUC scores, is the highest for a model based on the kinematic movement pattern during toe walking, which follows our expectation that classification performs better when based on the kinematics of more demanding activities. The reduced calf muscle function observed in our relapse group likely leads to a deviated kinematic movement pattern, particularly during more demanding tasks such as toe walking, which require adequate muscle strength and balance. As the purpose is to classify subjects rather than separate movement cycles, classification at the subject level is more relevant than a model’s performance based on single movement cycles, as reported in Table 2. At the subject level, despite the high ability to correctly identify subjects with non-relapsed clubfoot (specificity) based on toe walking kinematics, the model’s ability to correctly identify subjects with relapsed clubfoot (sensitivity) is only 0.38. For the clinical purpose of early relapse detection, sensitivity is more critical than specificity. A missed relapse can result in delayed treatment, potentially worsening the condition of the clubfoot and leading to the need for a more invasive treatment9,11,12. The sensitivity results indicate that subject-level classification based on the kinematic movement pattern of walking would be preferable. However, approximately 40% of children with relapsed clubfoot would still be missed. A closer examination of our data showed that the classification of a subject varied between activities, demonstrating the complexity of a relapse. Children with relapsed clubfoot may exhibit similar kinematics compared with children with non-relapsed clubfoot in one activity, leading to a non-relapsed classification, but have impaired kinematics in another activity, resulting in a classification of relapse. The specific activities in which actual relapsed clubfeet were also classified as relapse appeared to vary between patients. This can probably be explained by the wide variety of involved components that can be present in a relapsed clubfoot12. Thus, subject-level classification may be more reliable if based on multiple activities rather than a single activity. In this study, analysing aggregated subject classifications across all activities resulted in a sensitivity of approximately 0.8. This underlines the need to include various activities, requiring different motor skills, in the evaluation of the status of a clubfoot32,33,34. Therefore, based on our results, we would recommend using a combination of walking and toe walking kinematics to classify children with or without relapse.

Contrary to our expectations, the model’s performance for unseen data based on the kinematic movement pattern of running was the poorest. Despite promising validation scores, the AUC for unseen data was only 0.62. This is possibly related to the complexity of the activity in relation to the age of our subjects, which is between 5 and 9 years, as the motor ability required for running only matures at the age of 6 years35. Based on their biological age, this would mean that in approximately 50% of the subjects the running pattern has not yet matured. This may have resulted in high variability in movement patterns that is unrelated to differences between children with relapsed and non-relapsed clubfoot, creating noise in our classification model.

A subsequent analysis highlighted the relevance of features related to pelvis, hip, and knee kinematics, underlining that a relapsed clubfoot not only affects movement patterns at foot level but affects the entire kinematic chain of the lower limb11. This suggests that kinematic deviations due to compensations, captured in features at the pelvis, hip and knee level, are more robust across subjects than foot-related features, which might be more variable. However, we should keep in mind that feature selection in the model might discard correlated, equally important features when both provide the same information for classification, potentially leading to overlooked features36. Nevertheless, features with high occurrences, e.g. pelvic obliquity and hip ab/adduction, which are also previously identified kinematic differences20, have the potential to be used as indicators for relapse in clinical practice. Future research should explore easily accessible instruments to measure these kinematics during movements.

A limitation of this study is that it cannot be ruled out that subjects with potential future relapsed clubfoot were included in the non-relapsed group. Despite the clear distinction made between relapsed and non-relapsed clubfoot at the time of inclusion, a relapsed clubfoot was detected during clinical follow-up six months to two years after the measurements in four subjects who were previously assigned to the non-relapsed group. For three of these subjects, post hoc analysis showed that they were falsely classified as relapse in model predictions based on the kinematics of two activities. The type of activity in which the subjects were classified as relapse differed between subjects. This might demonstrate the potential of using the classification model for early detection before signs of relapse are clinically detectable. It should be noted that this study focuses exclusively on kinematic movement patterns, while other known predictive factors for relapse, such as tissue maturation and brace compliance, were not taken into account. Furthermore, the sample size of this study is relatively small, which may impact the generalizability and robustness of our machine learning prediction model37,38. However, the primary aim of this study was exploratory, aiming to demonstrate the potential of machine learning classification in children with clubfoot, rather than to develop a generalizable model. Additionally, predictive performance might be improved by including additional factors, such as kinetics39,40. However, focusing on kinematics facilitates the transfer to clinical practice, as kinematics can be measured easily outside a laboratory environment.

In conclusion, the results demonstrate the potential of classifying subjects based on kinematic movement patterns. Moreover, the study highlights biomechanical features that should be considered during clinical follow-up of children with clubfoot. This might aid early identification and treatment of relapsed clubfoot, which is expected to prevent the necessity of surgical treatment in these young patients. Making a prediction based on a combination of (demanding) dynamic activities improves the sensitivity in distinguishing children with relapsed clubfoot from children with non-relapsed clubfoot. This finding underlines the existing variability in relapse characteristics and functional impairments within the relapse population. Furthermore, analysing the features revealed the importance of features associated with pelvis, hip, and knee kinematics. This supports the view that a relapsed clubfoot affects the entire kinematic chain, even though the pathology primarily impacts the foot and lower leg. For future application of machine learning classification in clinical practice, a larger subject population will be necessary to develop a generalizable and robust model.

Methods

Study population

In this study, a total of 35 children with Ponseti-treated idiopathic clubfoot were included. All children were between 5 and 9 years old. During consultation, the treating orthopaedic surgeon, who is specialized in the treatment of children with clubfoot, assigned each child to one of two pre-defined groups: children with relapsed clubfoot and children with non-relapsed clubfoot. This resulted in 14 children with relapsed clubfoot and 21 children with non-relapsed clubfoot. A relapsed clubfoot was defined as a recurrence of one or more of the original deformities of the clubfoot after initial successful correction, which needed additional treatment6. Additional treatment consisted of physiotherapy or surgical treatment (tibialis anterior tendon transfer, with or without casting or bracing). Patients were excluded if they: (I) had another disorder affecting the lower limb functioning (e.g., fractures, underlying syndrome, neurologic disorder), (II) were unable to follow instructions, (III) were obese, or (IV) received prior additional surgical treatment for relapse. Renewed Achilles tenotomy to correct residual equinus early in the bracing period was not considered additional treatment. After inclusion, specialized paediatric physiotherapists assessed each child’s functional status using the CAP version 1.1 to obtain insight into the functional characteristics of our population17.

This study was approved by the Medical Ethics Committee Máxima MC and the local review board of the Máxima MC (METC NL76757.015.21/W21.015). All research was performed in accordance with relevant guidelines and regulations. Informed consent was obtained from the legal guardians of all participants.

Three-dimensional movement analysis

Biomechanical data was collected by performing objective movement analysis using a wireless active 3D system, including four tripod cameras and 25 infrared markers (100 Hz, Charnwood Dynamics Ltd.), together with two force plates (1000 Hz, AMTI) that were integrated into a walkway. Markers were placed according to an extended Helen Hayes model combined with the Oxford Foot Model (OFM)41,42. The Oxford Foot Model was placed unilaterally for every patient, which comprised the affected side in unilateral patients and the most affected side in bilateral patients. Children were asked to perform five different dynamic tasks in a random order: walking, toe walking, heel walking, hopping, and running. These clinically relevant activities were selected based on the CAP17. Each activity was performed barefoot and at a self-selected speed.

Kinematic modelling was done using Visual 3D (2021, C-Motion Inc). The data was interpolated with a third-order polynomial and filtered using a Butterworth filter with a cut-off frequency of 6 Hz. Subsequently, data was analysed in MATLAB R2019b (The MathWorks Inc).

Data preprocessing

For each subject, five to ten consistent movement cycles (from initial contact to the next initial contact of the same leg) were selected based on marker visibility (Table 3). Initial contact and toe-off were determined based on sagittal velocity of the heel or toe markers43. Only unilateral data, of the leg on which the OFM was placed, was included. For each activity, we computed a set of kinematic features to characterize a movement pattern, including characteristics such as peak values and range of motion, across an entire movement cycle as well as within the stance and swing phases separately. This set of features, consisting of lower limb kinematics including the pelvis, hip, knee, ankle and foot, was based on previously reported movement characteristics in children with clubfoot19. Ankle joint kinematics were not included in the set of features, except for the activity of heel walking. Instead, parameters addressing the OFM kinematics were used, as OFM kinematics provide more detailed information regarding ankle and foot motion44. During heel walking, the foot segments of the OFM could not be defined due to poor visibility of the calcaneus markers and lateral forefoot markers. Therefore, for this activity, parameters addressing the ankle joint kinematics were included in the set of features. Table 3 gives an overview of the input data per activity.
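As an illustration of this kind of feature computation, the sketch below summarizes a single joint-angle curve by its maximum, minimum and range of motion over the full cycle, the stance phase and the swing phase. It is a minimal example and not the study's MATLAB implementation; the function name, the feature naming scheme and the sample knee flexion values are hypothetical.

```python
def cycle_features(angles, toe_off_index, name):
    """Summarize one joint-angle curve (degrees) over a movement cycle.

    `angles` runs from initial contact to the next initial contact of the
    same leg; samples before `toe_off_index` are treated as stance phase,
    the remaining samples as swing phase.
    """
    phases = {
        "cycle": angles,
        "stance": angles[:toe_off_index],
        "swing": angles[toe_off_index:],
    }
    features = {}
    for phase, values in phases.items():
        features[f"{name}_{phase}_max"] = max(values)
        features[f"{name}_{phase}_min"] = min(values)
        # Range of motion: difference between the extreme values.
        features[f"{name}_{phase}_rom"] = max(values) - min(values)
    return features


# Hypothetical knee flexion curve, toe-off at sample 6.
knee_flexion = [5.0, 12.0, 18.0, 10.0, 6.0, 30.0, 58.0, 40.0, 15.0, 4.0]
features = cycle_features(knee_flexion, toe_off_index=6, name="knee_flexion")
print(features["knee_flexion_stance_rom"])  # prints 25.0
```

Repeating this per joint and movement plane yields one feature vector per movement cycle, which is the unit of input to the classifier.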

For the activities of toe walking, heel walking and running, the number of subjects per group slightly deviated from the total study population (Table 3). For the activities of toe walking and heel walking, individual subjects were excluded due to data quality issues, resulting in group sizes of 13 relapsed clubfoot and 21 non-relapsed clubfoot in toe walking, and 13 relapsed and 20 non-relapsed clubfoot in heel walking. For running, a larger number of subjects were excluded because no complete running cycle was captured within the camera’s reach. Consequently, the group comprised 11 subjects with relapsed clubfoot and 15 subjects with non-relapsed clubfoot. Due to the limited hopping ability of children with a relapse, as previously reported20, we decided not to analyse the hopping data.

Table 3 Overview input data per activity.

Machine learning approach

Using kinematic features extracted from a movement cycle, we implemented a machine-learning approach to classify children according to their relapse status. The approach involved feature standardization, feature selection based on mutual information with the target, and a classifier model which was trained within each fold of cross-validation. We explored three classifiers: support vector machines, extreme gradient boosting (XGBoost), and logistic regression.

The approach was tuned and evaluated in a nested cross-validation (CV) strategy. The five-fold inner CV loop was used for hyperparameter tuning and model selection. To ensure sufficient data, we implemented an eight-fold outer CV loop which provides an estimate of the model’s performance on unseen data. In both inner and outer folds, the data was grouped at the subject level and stratified according to class (relapsed or non-relapsed) and age group (three age groups: 5-year-old, 6-to-7-year-old, and 8-to-9-year-old). Subjects were stratified according to age group to minimize the effect of differences in movement pattern due to age-related motor ability45.
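The sketch below illustrates the idea of grouping at the subject level while stratifying by class and age group. It is a simplified, standard-library stand-in for a library splitter and not the procedure used in the study: subjects are shuffled within each (class, age group) stratum and dealt out to folds in turn, so that all movement cycles of a subject end up in the same fold. The function name and the synthetic subject table are hypothetical.

```python
import random
from collections import defaultdict


def assign_folds(subjects, n_folds, seed=0):
    """Assign each subject to one fold, stratified by (label, age_group).

    `subjects` maps subject id -> (label, age_group). Every movement cycle
    of a subject inherits the subject's fold, so no subject contributes to
    both the training and the test data of a split.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for subject_id, stratum in subjects.items():
        strata[stratum].append(subject_id)

    fold_of = {}
    next_fold = 0
    for stratum in sorted(strata):
        members = sorted(strata[stratum])
        rng.shuffle(members)
        # Deal subjects out in turn; the counter carries over between
        # strata so that fold sizes stay balanced.
        for subject_id in members:
            fold_of[subject_id] = next_fold % n_folds
            next_fold += 1
    return fold_of


# Synthetic example: 14 relapsed and 21 non-relapsed subjects with
# made-up age groups, split into eight outer folds.
age_groups = ["5", "6-7", "8-9"]
subjects = {}
for i in range(35):
    label = "relapsed" if i < 14 else "non-relapsed"
    subjects[f"S{i:02d}"] = (label, age_groups[i % 3])

fold_of = assign_folds(subjects, n_folds=8)
```

The same assignment could be repeated on the training subjects of each outer fold to obtain the five inner folds used for tuning.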

The number of selected features and (the inverse of) the regularization strength were hyperparameters tuned in a grid search approach. For the XGBoost models, additional tuned hyperparameters were the maximum depth, the number of estimators, the minimum loss reduction for splits, the L2 regularization term, and the minimum sum of instance weights in a child node.

Hyperparameter tuning for the feature selector and classifier was performed using a grid search approach. The feature selection process considered a range of selected features from 10 to the full feature set. For logistic regression, L1 regularization was applied, with the inverse of the regularization strength (overall range over all explored models: 10⁻⁵ to 1000) as a hyperparameter. Similarly, in support vector machines the inverse of the regularization strength was tuned (10⁻¹⁹ to 10⁻⁶). For the XGBoost models, hyperparameters included the maximum tree depth (3–5), and the number of estimators (10–40).

We performed the training and evaluation of the approach in Python 3.10 using the packages scikit-learn and xgboost.

Model evaluation

We evaluated model performance at three levels, based on the classification of: (1) individual movement cycles, (2) subjects per activity and (3) subjects across all activities.

For the model’s performance based on the classification of individual movement cycles, we used the area under the receiver operating characteristic curve (AUC) as the main metric to tune and evaluate the models. In addition, we calculated sensitivity (true positive rate) and specificity (true negative rate) for a decision boundary at 0.5 and with relapse as the positive class.
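The three cycle-level metrics can be illustrated with a short standard-library sketch. The AUC is computed here through its rank interpretation (the probability that a randomly chosen relapse cycle receives a higher score than a randomly chosen non-relapse cycle), and sensitivity and specificity follow from thresholding at 0.5. The function names and the example scores are hypothetical, and treating a score of exactly 0.5 as relapse is an assumption.

```python
def auc_score(y_true, y_prob):
    """AUC as the fraction of (relapse, non-relapse) cycle pairs in which
    the relapse cycle has the higher score; ties count one half."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(y_true, y_prob, threshold=0.5):
    """True positive and true negative rate, with relapse (1) as positive."""
    # Assumption: a score equal to the threshold is labelled relapse.
    y_pred = [int(p >= threshold) for p in y_prob]
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical predicted relapse probabilities for eight movement cycles.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_prob = [0.9, 0.7, 0.4, 0.3, 0.6, 0.2, 0.2, 0.1]

print(auc_score(y_true, y_prob))                # prints 0.875
print(sensitivity_specificity(y_true, y_prob))  # prints (0.5, 0.75)
```

The example shows why the two views can diverge: the ranking of cycles is good (AUC 0.875), yet half of the relapse cycles fall below the fixed 0.5 boundary.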

Subject-level confusion matrices were calculated both for each activity individually and across all activities, providing further insights into the ability to correctly predict the relapse status of a child. For activity-specific confusion matrices, the predicted class of a subject was determined using a majority vote on the predictions of all movement cycles for that subject and activity. For the construction of a subject-level confusion matrix over all activities, a subject was considered a relapse when the majority vote for at least one activity indicated relapse. Additionally, the model’s performance based on the classification of subjects across different combinations of activities was evaluated. Analysing aggregated performance in addition to performance at activity level allows us to assess the added value of including multiple activities in the classification process.
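The two aggregation steps described above can be sketched as follows. This is a minimal illustration with hypothetical function names and made-up cycle predictions; how ties in the majority vote were resolved is not stated in the text, so counting a tie as non-relapse is an assumption.

```python
from collections import Counter


def majority_vote(cycle_predictions):
    """Subject-level label for one activity from per-cycle labels
    (1 = relapse, 0 = non-relapse)."""
    counts = Counter(cycle_predictions)
    # Assumption: a tie is resolved as non-relapse.
    return int(counts[1] > counts[0])


def classify_subject(predictions_by_activity, activities=None):
    """Label a subject as relapse if the majority vote of at least one
    selected activity indicates relapse."""
    selected = activities or list(predictions_by_activity)
    return int(any(
        majority_vote(predictions_by_activity[activity])
        for activity in selected
        if activity in predictions_by_activity  # activity may be missing
    ))


# Hypothetical per-cycle predictions for one subject.
subject = {
    "walking": [0, 0, 1, 0, 0],
    "toe walking": [1, 1, 1, 0, 1, 1],
    "heel walking": [0, 0, 0, 0, 0],
    "running": [0, 1, 0, 0, 0],
}

print(classify_subject(subject))                              # prints 1
print(classify_subject(subject, ["walking", "toe walking"]))  # prints 1
print(classify_subject(subject, ["walking", "heel walking"]))  # prints 0
```

Because a single positive activity is enough to label the subject as relapse, adding activities can only raise sensitivity and lower specificity, which matches the trade-off seen between the four-activity and two-activity combinations.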

To aid interpretation and transfer to clinical practice, we investigated which features were present in the logistic regression model with a nonzero coefficient in at least six out of the eight models in the outer CV loop. These features were then aggregated according to joints and movement planes to provide a meaningful translation for clinical interpretation. This translation could be valuable in clinical interpretation giving directions to clinicians regarding kinematic indicators of relapse.
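The occurrence criterion can be expressed compactly. The sketch below counts, for each feature, in how many of the eight outer-fold models it has a nonzero coefficient and keeps those reaching the threshold of six. The function name, the feature names and the coefficient values are hypothetical and serve only to show the counting rule.

```python
from collections import Counter


def stable_features(coefficients_per_fold, min_folds=6):
    """Return features with a nonzero coefficient in at least `min_folds`
    of the outer-fold models.

    `coefficients_per_fold` is a list with one dict per outer fold, mapping
    feature name -> fitted logistic regression coefficient.
    """
    counts = Counter(
        name
        for coefficients in coefficients_per_fold
        for name, value in coefficients.items()
        if value != 0.0
    )
    return sorted(name for name, n in counts.items() if n >= min_folds)


# Hypothetical coefficients from eight outer-fold models.
folds = [
    {
        "pelvic_obliquity_rom": 0.4,                        # nonzero in 8 folds
        "hip_abduction_max": -0.2 if i < 6 else 0.0,        # nonzero in 6 folds
        "knee_flexion_stance_rom": 0.1 if i < 3 else 0.0,   # nonzero in 3 folds
    }
    for i in range(8)
]

print(stable_features(folds))
# prints ['hip_abduction_max', 'pelvic_obliquity_rom']
```

With L1 regularization, coefficients of uninformative features are driven to exactly zero, so a nonzero coefficient in most folds indicates that the feature was selected consistently across different training subsets.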