Abstract
Conservative treatment for chronic non-specific low back pain (CLBP) includes lumbar extension traction (LET) to re-align lumbar lordosis (LL). This study explores the use of machine learning (ML) models to predict post-treatment outcomes in patients with CLBP undergoing LET and how these predictions can support clinical decision-making. We utilized a retrospective database of 431 consecutive patients with uncomplicated CLBP. Post-treatment variables predicted included LL, numerical rating scale (NRS) pain score, and Oswestry Disability Index (ODI). Input model variables included pre-treatment LL, sacral base angle (SBA), ratio of LL/SBA fit type, NRS, ODI, frequency, duration, LET compliance, and demographic variables of age and BMI. Initial variables were analyzed to predict post-treatment outcomes. Three ML models, Random Forest (RF), XGBoost, and Multilayer Perceptron (MLP), were employed to handle both continuous and categorical variables, and performance was evaluated for predictive accuracy. Factors affecting outcomes were identified using Shapley Additive Explanations (SHAP). Treatment was a multimodal spine rehabilitation program featuring LET applied 3–6 times per week for between 4 and 10 weeks, and follow-up was performed at the end of care. Improvements in LL, NRS, and ODI were −11.5° to −23.6°, 7.3/10 to 3.3/10, and 33.2% to 10.4%, respectively. Among the ML models, XGBoost demonstrated the highest predictive accuracy for lumbar lordotic angle (R² = 0.728) and pain score (R² = 0.648), while Random Forest slightly outperformed XGBoost for ODI (R² = 0.631 vs. 0.616). MLP performed poorly for ODI predictions (R² = 0.201), indicating difficulty in capturing functional disability patterns. SHAP analysis identified fit type, compliance, traction frequency, pre-treatment lumbar curve, and BMI as the most influential predictors.
These predictors offer actionable insight for clinical decision-making by allowing clinicians to stratify patients based on predicted responsiveness, tailor LET frequency and duration, and educate patients on the importance of compliance. This study demonstrates that ML models, particularly XGBoost and Random Forest, can effectively predict LET outcomes, supporting personalized treatment strategies for CLBP patients.
Introduction
Lumbar pain, particularly non-specific chronic low back pain (CLBP), remains one of the most prevalent musculoskeletal disorders globally, causing limitations in function and quality of life1. Conventional treatment strategies for CLBP include pharmacological management, physical therapy including exercise, manipulative and other manual therapies, and lumbar distraction or long-axis traction therapy2,3,4,5,6. Regarding spinal traction, it is interesting that recent international guidelines5 and consensus initiatives6 rather emphatically list distraction or long-axis traction as a procedure without evidence and not to be employed. Notably, there is a sub-type of lumbar spine traction which is not acknowledged or discussed by these guidelines5,6 and this is lumbar extension traction (LET) which is designed to increase the hypo-lordotic lumbar spine7,8,9,10,11.
According to data from two recent systematic reviews, in CLBP patients, one of the most common findings on examination is a reduced lumbar lordosis12,13. Problematically, classically applied spinal distraction traction is known to straighten the lumbar lordosis and while it might be effective in a subgroup of cases, this may not be the optimum for those with significant hypo-lordosis of the lumbar spine14,15. In 2002, the first clinical trial on LET to increase the lumbar lordosis was reported7. In the past decade, four more randomized trials using multimodal treatment programs have documented better patient outcomes in CLBP patients8,9 with and without radiating leg pain10,11 in the treatment groups receiving LET to increase the lumbar lordosis. The likely reason for the improvement in CLBP outcome measures in the group treated with LET is generally suggested to be an increase in lumbar curvature that results from the intervention though no study has conclusively proven this.
LET is a targeted therapy designed to address mechanical reduction of the lumbar lordosis by elongating the lumbar tissues (ligaments, muscles, discs) that are anterior to the axis of extension rotation created by the extension traction device. Load is applied for up to 20 min to cause visco-elastic creep deformation of the tissues, thereby gradually restoring the natural curve of the spine7,8,9. The primary indication for using this technique in CLBP patients is a clear reduction of the normal lumbar lordosis on spine radiography. Arguably, restoring lumbar lordosis is essential for redistributing mechanical loads across spinal segments, which can reduce the strain on intervertebral discs, facet joints, and surrounding musculature16.
Despite the demonstrated effectiveness of lumbar extension traction, patient responses vary significantly due to factors such as baseline lumbar curve, age, BMI, compliance with treatment, and the frequency and duration of the applied interventions10,17. Predicting which patients will respond favorably to any intervention, including lumbar extension traction, remains a clinical challenge. Machine learning (ML) offers a promising solution to this issue, as it can analyze multiple variables simultaneously, capturing complex patterns that traditional statistical models may overlook17,18,19,20,21. ML models, particularly those that handle both continuous and categorical data, can be trained to predict treatment outcomes based on pre-treatment metrics, enabling a personalized approach to lumbar extension traction including adapting such variables as frequency, duration (weeks), time per session, and angle of traction pull17,18,19,20,21.
This study addresses that gap by leveraging machine learning to develop a predictive model for LET outcomes in CLBP patients. Our approach considers key input variables such as pretreatment lumbar curve, sacral base angle (SBA), and frequency of traction sessions to forecast post-treatment outcomes (i.e., lumbar lordotic angle, pain score, and disability). The aim of this investigation is to identify the variables most strongly associated with successful spinal realignment and patient outcomes, thereby improving clinical decision-making and enabling clinicians to personalize LET therapy to the needs of each patient.
Materials and methods
Study design and data collection
This retrospective cohort analysis aimed to develop a machine learning model to predict post-treatment outcomes for patients with CLBP undergoing a conservative care protocol. The cohort included patients treated with LET therapy between 2010 and 2023. The study followed all ethical standards and received ethical approval from the institutional review boards (Cairo University and University of Sharjah, IRB No. P.T.REC/011/012581). The study protocols adhered to the Declaration of Helsinki, and all participants provided written informed consent. All eligible participants underwent a medical examination by an orthopedist to exclude other specific causes of back pain. Neither the inclusion nor exclusion criteria were modified during the study. A total of 431 patients met the inclusion criteria: diagnosed CLBP coupled with hypo-lordosis of the lumbar spine measured via radiography, treatment with a multi-modal care plan specifically using LET therapy, and complete pre- and post-treatment data. Patients with a history of lumbar surgery, spinal deformities, or other specific diagnoses (e.g., herniated discs) were excluded to maintain the study’s focus and integrity. Data were collected from electronic health records, including demographic, clinical, and treatment details.
Treatment interventions, frequency, and duration
All treatments and measurements were conducted by two experienced physiotherapists, each with over 15 years of professional expertise, ensuring consistency and reliability. All participants received standard physiotherapy care, reflecting current clinical practices, combined with LET. We employed a standard care program, applied in the same manner to each participant in this investigation for fidelity, and consistent with the existing literature8,9. We used a multimodal approach that integrates several techniques to address pain and improve function combined with extension traction to improve the lumbar lordosis and SBA. The program consisted of the following interventions:
1. TENS: the TENS frequency was set at 80 Hz, pulse width of 50 µs, intensity at the sensory threshold, modulation up to 50%, using symmetrical and rectangular biphasic waveform, optimized for analgesic effects22.
2. Infrared therapy: this was applied in the prone position, with the low back exposed. The lamp was positioned 50–75 cm away, with radiation striking at a right angle for maximum penetration. Each session lasted 15 min23.
3. Stretching exercises: these were targeted to the erector spinae and hamstring muscles. Each stretch was held for 30 s and repeated three times each session24.
4. Lumbar Extension Traction: this was performed according to the protocol from Harrison et al.7 employing a three-point bending technique. Each participant lay supine, and traction was applied as follows: (A) a posterior padded strap applied an anterior pull between the upper torso and pelvis at the apex of the patient’s lumbar curve reduction (part 1); (B) a second strap was placed below the femur heads and was used to constrain the upper thighs, allowing for forward pelvic rotation once the strap in part 1 was loaded (part 2); (C) a third strap was placed across the mid-lower torso, applying an anterior-to-posterior load that constrained the torso to the traction frame (part 3) to maintain torso alignment and ensure the extension load was mainly applied to the lumbo-pelvic region. Traction was applied starting at 3 min of sustained load per session and increasing by 1–2 min each session, pending patient tolerance, until 15–20 min per session was achieved. Patients were encouraged to use the maximum tolerable force per session once familiar with the procedure.
As a unique feature of our investigation, no standardized frequency and duration were imposed; instead, we allowed variations based on participant tolerance and ability and on the treating clinician’s recommendations for patient needs. Treatment duration was allowed to vary from a minimum of 4 to a maximum of 10 weeks. Likewise, treatment frequency varied between 3 and 6 sessions per week. Many external factors influenced frequency and duration, the most significant being the time patients could commit to their sessions, which made it challenging to precisely control these variables for each patient.
Model input variables
In the effort to predict patient outcomes at the end of treatment, we included a set of ten variables chosen for their potential impact on post-treatment outcomes. These included 7 initial examination findings: (1) age, (2) BMI, (3) initial lumbar lordotic angle (via standing upright lateral X-rays), (4) initial SBA on X-ray, (5) initial pain score using a numerical rating scale (NRS), (6) the patient’s initial Oswestry Disability Index (ODI), and (7) the relationship of the patient’s magnitude of lumbar lordosis relative to their SBA, or fit type. Fit type was classified into 3 sub-types, which are explained below under radiographic variables. Additionally, 3 treatment variables were included in the model: (8) traction frequency in sessions per week, (9) traction duration in total treatment weeks, and (10) compliance with the program, expressed as attended vs. recommended sessions. The model input variables are summarized in Table 1.
Assessment variables
The primary outcomes, crucial in assessing the effectiveness of LET therapy in patients with CLBP, were the post-treatment lumbar lordotic angle, the post-treatment NRS score, and the post-treatment ODI, each assessed at the end of the individual’s unique frequency and duration program as described above.
- Lumbar Lordosis: The pre-treatment and post-treatment lumbar lordotic angles (°) were measured via radiographic imaging at initial examination and a minimum of 1 day after the patient’s final treatment session. Radiographs were acquired in the upright neutral resting position with the patient’s right side up against the x-ray grid cabinet, their elbows slightly bent with their fingers resting in the ipsilateral clavicular fossae following standardized protocols25. The lumbar lordosis was calculated by measuring the posterior tangent angle between intersecting lines along the posterior vertebral body of L1 and posterior body of L5 vertebrae. The SBA was constructed by drawing a best fit line along the superior endplate of the S1 vertebra and measuring this relative to horizontal. This method has excellent reliability with small standard errors of measurement (SEM ≤ 2°)26.
The diagnosis of hypo-lordosis of the lumbar spine was made on initial examination for all included patients when both L1-L5 and SBA were ≤ 30°. This 30° value of lumbar lordosis is the approximate mean value of lordosis found in CLBP patients and the 30° value of the SBA is more than one standard deviation below normal values that are typically reported12,27. In determining our 3 fit types for the relationship of the lumbar lordosis to the SBA we used the data from Kobayashi et al.28 where a congruent fit of the lordosis vs. the SBA was found to be lordosis ≥ 0.8 SBA and this was our ‘normal fit type’. We defined two different types of abnormal fits where a ‘low fit’ is when the difference of lumbar lordosis minus SBA is > 20% but the SBA is less than 15°. In contrast, we defined a ‘high fit’ when the difference of lumbar lordosis minus SBA is > 20% but the SBA is ≥ 15°. Our choice of the 15° SBA separation for high vs. low fit was based on inspection that this value created approximately equal numbers of each abnormal fit type and the fact that SBA < 15° is a large reduction from normal.
- Pain score: The pre-treatment and post-treatment pain scores were assessed using the NRS, ranging from 0 (no pain) to 10 (worst pain) and treated as a continuous variable. The NRS is reliable and valid and has a minimal clinically important change (MCIC) of NRS ≥ 2 points29.
- ODI: The pre-treatment and post-treatment ODI are comprehensive percentage scores derived from a 10-section patient-reported questionnaire where the total is scored out of 50 points and multiplied by two to obtain a percentage of disability. The higher the score, the greater the disability. ODI scores have a reported MCIC that ranges between 6% and 12% for patient outcomes in the clinical setting (simple cases and expectation of outcomes), and a 30% improvement from the 1st to the 2nd score has also been recommended (25% on the 1st and 18% on the 2nd would be about a 30% net improvement)29. However, for clinical trials, more complicated cases, and new treatments, a more robust MCIC value of ≥ 24% is recommended30. Note that this 24% is not the percent change between the 1st and 2nd scores; it is the absolute difference obtained by subtracting the 2nd score from the 1st (40% on the 1st score and 16% on the 2nd would reach the 24% minimum)30. We use two MCIC values for clarity: the simple 12% and the more robust 24% for clinical trials and ‘new’ interventions.
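The ODI scoring and the absolute-difference MCIC check described above can be sketched as follows. This is an illustration only, not code from the study; the function names `odi_percent` and `meets_mcic` are hypothetical, and the thresholds are the 12% and 24% values quoted in the text.

```python
def odi_percent(raw_points: float) -> float:
    """Convert a raw ODI total (0-50 points) to a disability percentage."""
    return raw_points * 2.0


def meets_mcic(baseline_pct: float, followup_pct: float, threshold_pct: float) -> bool:
    """Check the absolute-difference MCIC: baseline minus follow-up, in percentage points."""
    return (baseline_pct - followup_pct) >= threshold_pct


# Example taken from the text: 40% at baseline and 16% at follow-up
# is a 24-point drop, which reaches the more robust 24% MCIC.
robust_example = meets_mcic(40.0, 16.0, threshold_pct=24.0)

# A 25% -> 18% change is a 7-point drop: below the simple 12% cutoff.
simple_example = meets_mcic(25.0, 18.0, threshold_pct=12.0)
```

Keeping the threshold as an argument makes it easy to report both MCIC definitions for the same patient.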
Exploratory correlation analysis
Prior to model training, we performed an exploratory correlation analysis to assess the relationships among continuous clinical variables and to detect potential multicollinearity. A Pearson correlation matrix was computed for all continuous input features, including age, BMI, baseline ODI, baseline NRS, lumbar lordosis angle, sacral base angle (SBA), and compliance score.
Correlation coefficients (r) were interpreted according to standard thresholds, with |r| ≥ 0.8 indicating high multicollinearity. Pairs exceeding this threshold were flagged for potential removal or feature transformation. The resulting lower-triangular correlation matrix is visualized in the Results section and informed subsequent model diagnostics and feature importance interpretation.
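A minimal sketch of this screening step is given below, assuming the continuous features are held as named columns of equal length. The helper names and the toy data (including the deliberately redundant `age_months` column) are hypothetical and are not taken from the study.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def flag_collinear_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every feature pair with |r| >= threshold."""
    flagged = []
    for name_a, name_b in combinations(columns, 2):
        r = pearson_r(columns[name_a], columns[name_b])
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical toy columns; age_months duplicates age and should be flagged.
toy_columns = {
    "age": [30, 35, 40, 45, 50],
    "bmi": [22.0, 27.0, 23.0, 28.0, 24.0],
    "age_months": [360, 420, 480, 540, 600],
}
flagged_pairs = flag_collinear_pairs(toy_columns)
```

Only the upper triangle of pairs is visited, which matches the information content of a lower-triangular correlation matrix.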
Machine learning model selection
To develop a robust predictive model for post-treatment outcomes in patients with CLBP, three machine learning algorithms were chosen for evaluation based on their ability to handle continuous and categorical data and their effectiveness in capturing complex relationships between input and outcome variables. Each model was evaluated for all outcomes. These three models are:
1. Random Forest Regressor: This tree-based ensemble model was selected for its ability to capture intricate interactions and non-linear relationships between input features. It reduces overfitting through bootstrap aggregation (bagging) and can provide robust predictions in noisy datasets.
2. XGBoost Regressor: An advanced gradient boosting technique, XGBoost iteratively builds decision trees to minimize prediction errors. It was chosen for its strong performance in regression tasks, capacity to handle sparse data, and ability to control overfitting through regularization.
3. Multilayer Perceptron (MLP) Regressor: A deep learning model based on a feedforward artificial neural network architecture, MLP was selected to explore non-linear relationships that might exist between the input variables and the post-treatment outcomes. The model’s multiple hidden layers allow it to capture complex patterns.
Model training
Before model training, the dataset was split into an 80:20 training/test set. Hyperparameter optimization was performed using stratified 5-fold cross-validation (CV) within the training set, and hyperparameters were selected based on average cross-validation performance. While cross-validation helps estimate model generalization, it does not inherently ensure robustness. Early stopping was used specifically to mitigate overfitting by halting training when validation loss plateaued, rather than to directly improve generalizability. The final tuned model was retrained on the full training set before evaluation on the held-out test set. In addition to CV, algorithm-specific regularization strategies were applied to further mitigate overfitting:
- For Random Forest, we tuned CV hyperparameters (the number of trees, maximum tree depth) and regularization hyperparameters (minimum samples per leaf to control model complexity and variance).
- For XGBoost, CV hyperparameters such as learning rate, number of boosting rounds, maximum depth, and L1/L2 regularization hyperparameters (alpha, lambda) were optimized. Early stopping was applied based on validation loss to halt training when generalization performance plateaued.
- For MLP, we tuned the number of hidden layers and neurons per layer, learning rate, and activation functions, along with the use of L2 regularization and early stopping to prevent overfitting.
Numerical variables (e.g., pretreatment lumbar curve, SBA) were standardized using a standard scaler, which removes the mean and scales values to unit variance to ensure feature comparability. Categorical variables such as fit type were transformed using one-hot encoding to preserve non-ordinal relationships in the data. We also explored and incorporated potential interaction terms (e.g., age × BMI, compliance × NRS score) as engineered features, allowing the models to capture latent synergies that may contribute to treatment response. These were evaluated during the CV phase and retained when they improved cross-validation performance. This comprehensive approach (combining stratified cross-validation, grid-based hyperparameter tuning, algorithm-level regularization, and feature engineering) ensured that the final models were trained with a high degree of rigor, minimizing overfitting while maximizing their predictive accuracy and generalizability.
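The splitting and preprocessing steps can be sketched in plain Python as below. This is a simplified stand-in for a library pipeline, not the study's code: the helper names, the random seed, the population-SD convention, and the toy values are all assumptions, and the split shown is a simple shuffle rather than a stratified one. The point illustrated is that scaling parameters are estimated on the training rows only and then reused on the test rows.

```python
import random
import statistics


def train_test_split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle row indices and return (train_indices, test_indices)."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def fit_standardizer(train_values):
    """Estimate mean and (population) SD from training values only."""
    return statistics.fmean(train_values), statistics.pstdev(train_values)


def standardize(values, mean, sd):
    """Remove the mean and scale to unit variance using fitted parameters."""
    return [(v - mean) / sd for v in values]


def one_hot(values, categories):
    """Encode each categorical value as a 0/1 indicator vector."""
    return [[1 if v == c else 0 for c in categories] for v in values]


# With n = 431 an 80:20 split gives 345 training and 86 test rows.
train_idx, test_idx = train_test_split_indices(431)

# Hypothetical pre-treatment lumbar curve values (degrees) for a few rows.
train_curve = [10.0, 12.0, 14.0, 16.0]
curve_mean, curve_sd = fit_standardizer(train_curve)
train_curve_z = standardize(train_curve, curve_mean, curve_sd)
test_curve_z = standardize([11.0, 15.0], curve_mean, curve_sd)

fit_type_encoded = one_hot(["low", "normal"], categories=["normal", "low", "high"])
```

Reusing `curve_mean` and `curve_sd` on the test rows avoids leaking test-set information into preprocessing.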
Evaluation metrics
To assess the performance of the machine learning models in predicting these three post-treatment outcomes standard regression metrics were employed:
- R-squared (R²): Measures the proportion of variance in the outcome explained by the model. R² values typically range from 0 to 1, with values closer to 1 indicating better predictive accuracy; on held-out data R² can be negative when a model predicts worse than the mean of the observed values.
- Root Mean Squared Error (RMSE): Reflects the average magnitude of the prediction error, expressed in the same units as the target variable. Lower RMSE values indicate more accurate predictions. The formula is:
\[\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}{\left({y}_{i}-{\widehat{y}}_{i}\right)}^{2}}\]
Where \({y}_{i}\) is the actual value, \({\widehat{y}}_{i}\) is the predicted value, and n is the number of observations.
- Mean Absolute Error (MAE): This represents the average absolute difference between predicted and actual values, providing a straightforward measure of prediction accuracy. Lower MAE values indicate better performance. The formula is:
\[\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|{y}_{i}-{\widehat{y}}_{i}\right|\]
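The three metrics can be computed directly from paired observed and predicted values. The sketch below uses the standard definitions; the function names and the small example vectors are illustrative only.

```python
import math


def r_squared(y_true, y_pred):
    """Proportion of variance explained: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error, in the units of the target."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


# Illustrative values: one prediction is off by 2 units, the rest are exact.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]
example_r2 = r_squared(observed, predicted)    # 1 - 4/5 = 0.2
example_rmse = rmse(observed, predicted)       # sqrt(4/4) = 1.0
example_mae = mae(observed, predicted)         # 2/4 = 0.5
```

The example shows why RMSE and MAE are reported together: a single large error raises RMSE (1.0) more than MAE (0.5).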
Sensitivity analysis
To assess the robustness of model performance under variability in clinical features and data composition, a comprehensive sensitivity analysis was conducted. We used both SHAP analysis and feature exclusion experiments to provide complementary perspectives on feature relevance. SHAP values quantified how input variables contributed to prediction logic, while feature exclusion assessed how the removal of a variable affected model performance (e.g., R², RMSE). These two methods were used independently and in parallel to validate key predictors and to distinguish between influential and indispensable variables. This aimed to evaluate how small perturbations in key predictors or their inclusion/exclusion influenced the predictive accuracy of each machine learning model.
Feature exclusion experiments
To investigate the relative dependence of each model on specific clinical predictors, we conducted controlled feature exclusion experiments. One feature at a time was removed from the input space, and each model (Random Forest, XGBoost, and MLP) was retrained from scratch using the same training-test split, preprocessing pipeline, and optimized hyperparameters. The following high-influence variables, identified through Shapley Additive Explanations (SHAP) values and domain knowledge, were targeted for exclusion. SHAP values quantify the extent to which each variable moves a model’s prediction from the base value to the actual prediction for an instance31. The variables targeted were:
- Age
- BMI
- Baseline ODI
- Baseline NRS
- Lumbar lordosis angle
- Compliance score
Performance degradation was evaluated by computing the change in R², RMSE, and MAE relative to the baseline model trained on the full feature set. This allowed us to quantify the marginal utility of each feature in contributing to model accuracy and robustness.
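The leave-one-feature-out loop can be sketched as follows. To keep the example self-contained, a 1-nearest-neighbour predictor is used as a stand-in for the Random Forest, XGBoost, and MLP models of the study; the function names, feature names, and toy data are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Proportion of variance explained: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def nn_predict(train_rows, train_y, test_rows, features):
    """Stand-in model: predict the target of the closest training row."""
    predictions = []
    for row in test_rows:
        dists = [sum((row[f] - tr[f]) ** 2 for f in features) for tr in train_rows]
        predictions.append(train_y[dists.index(min(dists))])
    return predictions


def ablation_deltas(train_rows, train_y, test_rows, test_y, features):
    """Refit without each feature in turn; return the change in test R² vs. the full model."""
    baseline = r_squared(test_y, nn_predict(train_rows, train_y, test_rows, features))
    deltas = {}
    for dropped in features:
        kept = [f for f in features if f != dropped]
        score = r_squared(test_y, nn_predict(train_rows, train_y, test_rows, kept))
        deltas[dropped] = score - baseline
    return deltas


# Toy data: the target depends only on "informative"; "constant" carries no signal.
train_rows = [{"informative": float(i), "constant": 1.0} for i in range(10)]
train_y = [2.0 * r["informative"] for r in train_rows]
test_rows = [{"informative": v, "constant": 1.0} for v in (0.2, 3.4, 6.1, 8.8)]
test_y = [2.0 * r["informative"] for r in test_rows]

deltas = ablation_deltas(train_rows, train_y, test_rows, test_y,
                         features=["informative", "constant"])
```

A strongly negative delta marks a feature the model cannot do without, while a delta near zero suggests the feature is redundant given the others.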
Gaussian noise injection
To simulate real-world clinical variability in data entry or measurement (e.g., manual goniometer errors or patient self-reporting variability), Gaussian noise was applied only to continuous and quasi-continuous variables (e.g., lumbar lordosis, SBA, and NRS). While the NRS is technically ordinal, it was treated as quasi-continuous based on common practice in pain modeling and was only perturbed during training to simulate reporting variability. No noise was injected into strictly ordinal variables such as ODI, and the test set remained unaltered to ensure interpretability.
Noise was sampled from a zero-mean Gaussian distribution with a standard deviation equal to 10% of the feature’s original standard deviation. Each model was retrained on the noise-perturbed dataset while the test set remained unchanged. This assessed model sensitivity to minor fluctuations in key biomechanical and patient-reported variables.
Feature exclusion and Gaussian noise injection were conducted as separate and non-overlapping sensitivity analyses. Feature exclusion (ablation) was performed first, with one feature removed per model run. Gaussian noise injection was subsequently performed independently on continuous variables, with the full feature set retained. At no point were both perturbation strategies applied simultaneously.
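The noise-injection step can be sketched as below, assuming one continuous training column at a time. The function name, the seed, the use of the sample standard deviation, and the toy lordosis values are assumptions made for illustration.

```python
import random
import statistics


def add_gaussian_noise(train_values, fraction=0.10, rng=None):
    """Add zero-mean Gaussian noise with SD = fraction x the column's own SD.

    Intended for training columns only; the test set is left untouched.
    """
    rng = rng or random.Random(0)
    noise_sd = fraction * statistics.stdev(train_values)
    return [v + rng.gauss(0.0, noise_sd) for v in train_values]


# Hypothetical training values for pre-treatment lumbar lordosis (degrees).
train_lordosis = [8.0, 10.5, 11.5, 13.0, 15.0]
noisy_lordosis = add_gaussian_noise(train_lordosis, rng=random.Random(42))
```

Scaling the noise to each feature's own spread keeps the perturbation comparable across variables measured in different units.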
Performance metrics & comparison
For both types of sensitivity tests, model performance was evaluated on the fixed test set using: R², RMSE, and MAE.
Results were benchmarked against the full model to assess relative performance deterioration, and summarized to identify features that strongly influenced prediction stability across different model classes. This analysis allowed us to simulate data uncertainty and evaluate input redundancy, providing insight into which clinical measures are most critical for reliable outcome prediction in lumbar extension traction therapy.
Robustness and bootstrapping
To evaluate the stability and generalizability of model performance beyond a single train-test split, a comprehensive bootstrapping analysis was conducted for each predictive model—Random Forest, XGBoost, and MLP—across the three clinical outcome variables. The aim was to assess the variance and reliability of model metrics under sampling uncertainty using a non-parametric resampling framework. A total of 100 bootstrap iterations were performed per model. In each iteration:
- The training set was re-sampled with replacement from the original data (n = 431).
- The full pipeline (including standardization, encoding, model training with optimal hyperparameters, and prediction) was executed from scratch to reflect real-world application stability.
For each iteration, we recorded the key regression performance metrics for R², RMSE and MAE.
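The resampling loop can be sketched as follows. In the study each iteration re-ran the full pipeline; here `metric_fn` is a hypothetical placeholder for that step, and the toy example simply scores resampled (observed, predicted) pairs so that the block stays self-contained.

```python
import random
import statistics


def bootstrap_metric(data, metric_fn, n_iterations=100, seed=0):
    """Resample `data` with replacement and summarize `metric_fn` as mean and SD."""
    rng = random.Random(seed)
    n = len(data)
    scores = []
    for _ in range(n_iterations):
        # Draw n rows with replacement, so some rows repeat and others are left out.
        sample = [data[rng.randrange(n)] for _ in range(n)]
        scores.append(metric_fn(sample))
    return statistics.fmean(scores), statistics.stdev(scores)


def mae_of_pairs(pairs):
    """Mean absolute error over (observed, predicted) pairs."""
    return sum(abs(obs - pred) for obs, pred in pairs) / len(pairs)


# Hypothetical (observed, predicted) post-treatment NRS pairs.
toy_pairs = [(3.0, 3.4), (4.0, 3.1), (2.0, 2.6), (5.0, 4.2), (3.0, 3.0)]
mean_mae, sd_mae = bootstrap_metric(toy_pairs, mae_of_pairs)
```

The spread of the scores across iterations, reported as mean ± SD, is what indicates how stable each model is under sampling uncertainty.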
Bland-Altman analysis
To further assess agreement between predicted and actual post-treatment values, we performed a Bland-Altman analysis for LLA, NRS, and ODI. This method plots the differences between predicted and observed scores against their mean and computes the mean bias and 95% limits of agreement (LoA = mean difference ± 1.96 × SD). This analysis is widely used to evaluate the agreement and systematic error between two quantitative measurement methods and was deemed appropriate for all three primary outcomes.
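The bias and limits of agreement follow directly from the paired differences. The sketch below assumes the sample standard deviation is used; the function name and the toy values are illustrative only.

```python
import statistics


def bland_altman(predicted, observed):
    """Return (bias, lower LoA, upper LoA) for predicted minus observed values."""
    diffs = [p - o for p, o in zip(predicted, observed)]
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Toy example: differences are +1, 0, -1, so bias = 0 and SD = 1.
bias, loa_lower, loa_upper = bland_altman([1.0, 2.0, 3.0], [0.0, 2.0, 4.0])
```

For plotting, each difference would be drawn against the mean of its predicted and observed value, with horizontal lines at the bias and the two limits.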
Percentage contribution analysis using SHAP
To complement SHAP summary plots and enhance global interpretability, we calculated the percentage contribution of each input variable to model predictions. This was done by aggregating the mean absolute SHAP values across all test instances and normalizing the values to sum to 100%. This approach allows a clearer clinical interpretation of model influence. Separate analyses were performed for each of the three outcome variables: post-treatment ODI, NRS, and lumbar lordotic angle (LLA).
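The normalization step can be sketched as below, assuming the SHAP values are already available as one row per test instance and one column per feature. The function name and the toy matrix are hypothetical; computing the SHAP values themselves is outside this sketch.

```python
def shap_percentage_contributions(shap_rows, feature_names):
    """Normalize mean absolute SHAP values so the contributions sum to 100%."""
    n_rows = len(shap_rows)
    mean_abs = [
        sum(abs(row[j]) for row in shap_rows) / n_rows
        for j in range(len(feature_names))
    ]
    total = sum(mean_abs)
    return {name: 100.0 * value / total for name, value in zip(feature_names, mean_abs)}


# Toy SHAP matrix: two test instances, two features.
toy_shap = [[1.0, -3.0], [-1.0, 3.0]]
contributions = shap_percentage_contributions(toy_shap, ["compliance", "baseline_odi"])
```

Taking absolute values before averaging means a feature that pushes predictions up for some patients and down for others still counts as influential.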
Results
Model input data
Our population included 431 (221 females) consecutive participants with CLBP with a mean age of 37.5 ± 8.1 years and a mean BMI of 24.7 ± 4.8. On initial radiographic examination the lumbar lordosis was 11.5° ± 5.5° and the SBA was 14.5° ± 4.5°, and the patients presented with an ODI score of 16.6 ± 8.1 out of 50 points and an NRS of 7.3 ± 1.1. Furthermore, the patients’ frequency of traction in sessions per week, duration of traction in weeks, compliance, and their initial fit type describing the relationship between their lumbar lordosis magnitude relative to their SBA are shown in Table 2. These initial 10 variables were used as model input data to develop a predictive model that could accurately explain the post-treatment variables of change / improvement in patients’ lumbar lordosis (23.6° ± 4.7°), NRS (3.31 ± 0.9), and ODI (5.2 ± 3 / 50), thus predicting their response to treatment (Table 2). When considering the MCIC of the NRS, 80% of the patients reached a post-treatment improvement of ≥ 2 points. Similarly, on the ODI we used two different MCIC values: (1) for the cutoff of ODI difference between the initial and follow-up (ODI baseline minus ODI follow-up) and (2) 53% of the patients achieved ≥ 24 percentage points (ODI baseline minus ODI follow-up) after treatment.
Correlation matrix
As part of the initial data exploration, we generated a correlation matrix (not provided) to identify potential multicollinearity and understand the linear relationships between clinical and demographic variables. Moderate correlations were observed between pre-treatment lumbar curve and SBA (r = 0.61), and between BMI and age (r = 0.48), guiding our initial feature selection and transformation strategy. No feature pair exceeded the critical correlation threshold (|r| ≥ 0.8), indicating that the input variables were sufficiently independent. This supports the assumption that the model inputs contribute unique information, enabling more stable and interpretable predictive modeling.
Model performance and evaluation
The evaluation metrics (R², RMSE, MAE) provided in Table 3 show that the models performed reasonably well predicting post-treatment outcomes. Although all three models achieved positive R² values, which indicates that they explained some of the variance in the outcomes, the differences in RMSE and MAE suggest that the models varied in their predictive accuracy. Random Forest and XGBoost performed similarly across all outcomes, particularly for the post-treatment lumbar lordotic angle, where XGBoost achieved a slightly higher R² (0.728) than Random Forest (0.696). For the post-treatment NRS, XGBoost slightly outperformed Random Forest with an R² of 0.648 compared to 0.618. Both models performed well in capturing the variability in pain outcomes. MLP, however, showed lower performance, particularly for post-treatment ODI, where the R² was significantly lower (0.201) compared to Random Forest (0.631) and XGBoost (0.616).
Bland-Altman analysis
We performed a Bland-Altman analysis to assess the agreement between predicted and observed post-treatment Oswestry Disability Index (ODI) scores. The mean difference (bias) between predicted and actual ODI scores was approximately 0.06, indicating minimal systematic error. The 95% limits of agreement (LoA), calculated as ± 1.96 standard deviations from the mean difference, ranged from − 3.72 to + 3.84. Importantly, more than 95% of observations fell within these limits, demonstrating strong consistency between predicted and true values. Additionally, the distribution of residuals did not indicate any heteroscedastic trend — that is, the prediction errors did not systematically increase or decrease with the severity of the patient’s baseline condition.
Feature contribution summary via SHAP percentage analysis
Figures 1, 2 and 3 illustrate the normalized percentage contributions of each input feature to the model’s predictions for the three primary clinical outcomes. The ODI Prediction (Fig. 1) shows that the baseline ODI (27.4%) and Compliance (22.1%) were the dominant predictors of post-treatment disability. These were followed by Lumbar Curve (15.7%) and Sacral Base Angle (11.4%), indicating the critical role of spinal structure and adherence in disability outcomes. Fit Type (4.7%), Frequency of Traction (4.1%), and Duration of Traction (3.7%) also contributed meaningfully, underscoring the influence of intervention parameters. Age (3.0%), BMI (2.4%), and Baseline NRS (2.4%) had comparatively lower impact, suggesting limited predictive power from demographic and pain intensity alone.
The NRS prediction (Fig. 2) indicated that baseline NRS (31.2%) was the most influential predictor of post-treatment pain, followed by Compliance Score (19.8%) and BMI (15.6%), highlighting the importance of symptom severity, treatment adherence, and physiological status. Fit Type (6.4%), Traction Frequency (5.5%), and Traction Duration (5.2%) came next, emphasizing the role of structural alignment and treatment dosage. Sacral Base Angle (5.1%) and Lumbar Curve (4.4%) were moderate contributors. Baseline ODI (3.8%) and Age (3.0%) had the lowest relative importance in predicting pain outcomes.
LLA Prediction (Fig. 3) identified that structural predictors dominated this model. Lumbar curve (29.3%) and SBA (24.6%) were the top contributors, followed by Compliance (15.2%), highlighting treatment engagement as a key factor. Fit Type (10.6%) and Traction Frequency (7.1%) further supported the biomechanical influence. Traction Duration (5.9%) and BMI (3.6%) also contributed, while Age (2.2%), Baseline ODI (1.1%), and Baseline NRS (0.4%) played minor roles.
SHAP-based percentage contribution of input features to random forest prediction of post-treatment ODI.
SHAP-based percentage contribution of input features to random forest prediction of post-treatment NRS.
SHAP-based percentage contribution of input features to random forest prediction of post-treatment lumbar lordotic angle (LLA).
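One common way to obtain normalized percentage contributions of this kind is to average the absolute SHAP value of each feature across patients and divide by the sum over all features. The sketch below assumes that convention; the source does not state the exact normalization used. The function name `shap_percentages`, the feature names, and the SHAP values are hypothetical. In practice the per-patient SHAP matrix would come from an explainer applied to the fitted model.

```python
def shap_percentages(shap_rows, feature_names):
    """Convert per-patient SHAP values into normalized percentage contributions.

    shap_rows: list of rows, one per patient, each with one SHAP value per feature.
    """
    if not shap_rows:
        raise ValueError("shap_rows must contain at least one row")
    n_rows = len(shap_rows)
    # Mean absolute SHAP value per feature (assumed importance measure).
    mean_abs = [
        sum(abs(row[j]) for row in shap_rows) / n_rows
        for j in range(len(feature_names))
    ]
    total = sum(mean_abs)
    if total == 0:
        raise ValueError("all SHAP values are zero; percentages are undefined")
    return {name: 100.0 * m / total for name, m in zip(feature_names, mean_abs)}


# Hypothetical SHAP values for three features and three patients.
features = ["baseline_odi", "compliance", "lumbar_curve"]
toy_shap = [
    [2.0, -1.0, 0.5],
    [-1.0, 1.0, 1.5],
    [3.0, -1.0, -1.0],
]

percentages = shap_percentages(toy_shap, features)
for name, pct in sorted(percentages.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {pct:.1f}%")
```

Because absolute values are used, the percentages describe the magnitude of each feature's influence and not its direction.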
Sensitivity analysis results
Results from the univariate feature exclusion analysis (Table 4) revealed that MLP was most sensitive to the removal of BMI (ΔMSE = 0.1371), while RF was least affected. In the Gaussian noise injection tests, lumbar curve and SBA had the greatest effect on prediction error for all models, confirming the SHAP-derived importance scores. These findings validate the robustness of tree-based models to feature perturbation and emphasize the need to ensure accurate measurement of biomechanical variables in practice.
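The Gaussian noise injection test can be illustrated as follows. This is a minimal sketch under stated assumptions: noise is scaled to a fraction of each feature's standard deviation (`noise_scale`, a hypothetical parameter), the change in mean squared error (ΔMSE) is used as the sensitivity measure, and a toy linear function stands in for the trained RF, XGBoost, or MLP models. None of the values are study data.

```python
import random
import statistics


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def noise_sensitivity(predict, X, y, noise_scale=0.1, seed=0):
    """Return the increase in MSE when Gaussian noise is added to each feature in turn."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    deltas = []
    for j in range(len(X[0])):
        # Scale the noise to the spread of the feature being perturbed.
        sd = statistics.pstdev([row[j] for row in X])
        noisy_X = []
        for row in X:
            noisy = list(row)
            noisy[j] += rng.gauss(0.0, noise_scale * sd)
            noisy_X.append(noisy)
        deltas.append(mse(y, [predict(row) for row in noisy_X]) - baseline)
    return deltas


def toy_predict(row):
    """Hypothetical stand-in model: feature 0 matters far more than feature 1."""
    return 2.0 * row[0] + 0.1 * row[1]


data_rng = random.Random(42)
X = [[data_rng.uniform(0.0, 10.0), data_rng.uniform(0.0, 10.0)] for _ in range(50)]
y = [toy_predict(row) for row in X]

deltas = noise_sensitivity(toy_predict, X, y)
print(deltas)
```

Features whose perturbation produces the largest ΔMSE are those the model relies on most, which is why agreement with the SHAP ranking is a useful cross-check. Feature exclusion differs in that the model is retrained without the feature before the error is compared.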
Robustness and bootstrapping results
Performance metrics (R², RMSE, and MAE) were aggregated across the 100 bootstraps to compute the mean ± standard deviation, offering insight into each model’s robustness and variability under distributional shifts in training data. The results are summarized in Table 5.
The Random Forest model exhibited the most consistent performance, with a narrow spread in R² scores and low standard deviations in RMSE and MAE. This reflects its inherent ensemble robustness and ability to reduce overfitting through bootstrapped aggregation of decision trees. XGBoost showed slightly more variance than Random Forest, yet still maintained competitive and stable performance across resamples. The use of boosting may contribute to a moderate sensitivity to training data variability, especially in the presence of minor data noise or outliers. In contrast, the MLP model displayed substantial variability, particularly in the R² scores, which ranged widely across bootstraps. This indicates a higher susceptibility to initialization sensitivity, model overfitting, or insufficient regularization under small sample shifts—common characteristics in fully connected deep neural networks trained on tabular clinical data.
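The bootstrap procedure summarized above can be sketched as follows: the training set is resampled with replacement, the model is refit, and a metric is recorded on a fixed test set; the mean ± standard deviation over resamples is then reported. This sketch is illustrative only. A one-variable least-squares line stands in for the RF, XGBoost, and MLP models, only RMSE is tracked, and the data, split sizes, and function names are assumptions.

```python
import math
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return slope, my - slope * mx


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def bootstrap_rmse(train, test, n_boot=100, seed=0):
    """Refit on bootstrap resamples of `train`; return mean, SD, and all test RMSEs."""
    rng = random.Random(seed)
    test_x, test_y = zip(*test)
    scores = []
    for _ in range(n_boot):
        # Resample the training pairs with replacement, keeping the original size.
        sample = [rng.choice(train) for _ in range(len(train))]
        xs, ys = zip(*sample)
        slope, intercept = fit_line(xs, ys)
        preds = [slope * x + intercept for x in test_x]
        scores.append(rmse(test_y, preds))
    return statistics.fmean(scores), statistics.stdev(scores), scores


# Hypothetical (predictor, outcome) pairs; not study data.
data_rng = random.Random(1)
pairs = []
for _ in range(60):
    x = data_rng.uniform(0.0, 30.0)
    pairs.append((x, 0.5 * x + data_rng.gauss(0.0, 1.0)))
train, test = pairs[:45], pairs[45:]

mean_rmse, sd_rmse, scores = bootstrap_rmse(train, test)
print(f"RMSE = {mean_rmse:.3f} ± {sd_rmse:.3f} over {len(scores)} bootstraps")
```

A small standard deviation across resamples corresponds to the stable behaviour described for Random Forest, whereas a wide spread corresponds to the variability described for the MLP.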
Discussion
Chronic low back pain remains one of the leading causes of disability worldwide, significantly impacting patients’ function and health-related quality of life1,32,33. Problematically, CLBP treatment regimens vary considerably and have limited or only short-term efficacy32. For this reason, CLBP treatment strategies remain a high-priority research avenue within various scientific disciplines33, and the need is particularly urgent in conservative care disciplines34. One under-reported conservative care method for treating CLBP in patients with concomitant hypo-lordosis of the lumbar spine is LET therapy. This intervention has been evaluated in only a few clinical trials7,8,9,10,11, which have been summarized in a systematic review35. Despite preliminary evidence of efficacy, predicting which patients respond most favorably to extension traction remains a clinical challenge. Accordingly, this study aimed to fill this gap by developing predictive ML models using a large consecutive case database that incorporates key patient biomechanical and demographic predictors to forecast treatment outcomes for LET therapy in patients with CLBP. The key findings and their clinical significance are discussed below.
According to the systematic review by Oakley et al.35, the clinical trials on LET demonstrated an improvement in lumbar lordosis of 7–11° over 10–12 weeks and 30–36 sessions. In the current investigation we identified a similar mean lordosis correction of 12° over the course of 31 sessions (average frequency multiplied by duration in Table 1). Concerning NRS pain outcomes, the systematic review reported mean post-treatment improvements of between 2 and 4 points35. Our current investigation found a mean 4-point reduction on the NRS, which we consider consistent with the trials given that our population had a greater initial NRS score. Lastly, the LET trials that reported the ODI found a raw improvement of between 10.6% and 12.6%35, whereas in the current investigation we report a mean raw change in the ODI of 22.8%, approximately twice the average of the clinical trial data. This larger improvement in the ODI is difficult to explain, but possibilities include: (1) the increased frequency of care in the current study (mean 4.1 sessions per week vs. 3 in the trials), which is consistent with some previous studies36,37; (2) differences in the type of CLBP population between the trials and the current study; (3) examiner experience; and (4) the fact that only 2 trials are available for comparison with our ODI data35. We found that a high percentage (80%) of patients achieved the MCIC of 2 points on the NRS29. Similarly, we applied two different cutoff values for the MCIC of the ODI: (1) using a raw change in the ODI of ≥ 12 percentage points, 95% of the treated patients achieved the MCIC; and (2) 53% of the patients achieved an improvement of ≥ 24 percentage points.
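The responder proportions reported above follow from a simple threshold on each patient's raw change score. As a minimal sketch, the function below counts the fraction of patients whose score fell by at least a given MCIC threshold (2 points for the NRS; 12 or 24 percentage points for the ODI, as stated in the text). The function name `mcic_rate` and all patient values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def mcic_rate(pre, post, threshold):
    """Fraction of patients whose score decreased by at least `threshold` points."""
    if len(pre) != len(post) or not pre:
        raise ValueError("pre and post must be non-empty and of equal length")
    responders = sum((before - after) >= threshold for before, after in zip(pre, post))
    return responders / len(pre)


# Hypothetical scores for five patients (not study data).
pre_nrs = [8, 7, 6, 9, 7]
post_nrs = [3, 6, 4, 4, 3]      # reductions: 5, 1, 2, 5, 4
pre_odi = [40, 30, 36, 28, 34]
post_odi = [10, 20, 12, 14, 8]  # reductions: 30, 10, 24, 14, 26

nrs_rate = mcic_rate(pre_nrs, post_nrs, threshold=2)
odi_rate_12 = mcic_rate(pre_odi, post_odi, threshold=12)
odi_rate_24 = mcic_rate(pre_odi, post_odi, threshold=24)

print(f"NRS MCIC (>= 2 points): {nrs_rate:.0%}")
print(f"ODI MCIC (>= 12 points): {odi_rate_12:.0%}")
print(f"ODI MCIC (>= 24 points): {odi_rate_24:.0%}")
```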
Considering the three ML models we tested on our database, XGBoost was the strongest overall performer in predicting post-treatment outcomes. It achieved the highest R² values for lumbar lordotic angle (0.728) and NRS (0.648), suggesting that it explained a greater proportion of the variance in these two outcomes than the other models, and its R² for the ODI (0.616) was only marginally below that of Random Forest (0.631). These findings highlight XGBoost’s ability to capture the complex relationships between mechanical and demographic predictors and their associated treatment outcomes.
When predicting lumbar lordotic angle, both XGBoost and Random Forest performed well, with XGBoost slightly outperforming Random Forest (R² of 0.728 vs. 0.696). This is clinically relevant, as improvements in lumbar lordosis are a key goal of LET therapy7,8,9,10,11,35. In predicting pain score, XGBoost and MLP demonstrated similar performance (R² values of 0.648 and 0.643, respectively), with Random Forest following closely behind (R² of 0.618). Given the subjective nature of pain, even small improvements in predictive accuracy are clinically meaningful, as they can help inform individualized treatment plans. The superior performance of XGBoost in predicting pain-related outcomes is further underscored by its lower RMSE (0.525) and MAE (0.265), indicating higher precision and accuracy in predicting the post-treatment NRS. With regard to the ODI, Random Forest outperformed XGBoost by a small margin (R² of 0.631 vs. 0.616), while MLP showed considerably lower performance (R² of 0.201). Given that the ODI is an important measure of functional recovery29,30, the stronger performance of Random Forest and XGBoost suggests that these models may be more suitable for predicting functional outcomes associated with LET therapy.
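For readers less familiar with the three metrics compared above, the sketch below computes R², RMSE, and MAE from paired observed and predicted values using their standard definitions. The function name `regression_metrics` and the example scores are hypothetical. The second call illustrates a useful reference point: a model that always predicts the mean of the observed values has an R² of 0, so R² expresses how much better a model does than that baseline.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE, and MAE for paired observed and predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are identical")
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }


# Hypothetical post-treatment NRS scores (not study data).
observed = [2.0, 3.0, 4.0, 3.0, 5.0, 1.0]
predicted = [2.5, 3.0, 3.5, 3.0, 4.0, 2.0]

model_metrics = regression_metrics(observed, predicted)

# Baseline: always predict the mean of the observed values.
mean_value = sum(observed) / len(observed)
mean_only = regression_metrics(observed, [mean_value] * len(observed))

print(model_metrics)
print(mean_only)
```

RMSE penalizes large errors more heavily than MAE, so a gap between the two (as in the reported 0.525 vs. 0.265 for XGBoost on NRS) is consistent with a minority of patients having comparatively large prediction errors.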
SHAP analysis was employed to enhance the interpretability of the ML models and identify the key features that most strongly influenced the prediction of post-treatment outcomes31. Considering only the two most accurate ML models, Random Forest and XGBoost, the SHAP summary plots revealed that the following variables were the most influential in predicting the lumbar lordosis outcome: (1) pre-treatment lumbar curve, (2) fit type, (3) patient age, and (4) compliance with the program. Secondarily, frequency of traction, BMI, and SBA showed moderate predictive value (Fig. 1). Clinically, these results align with the known mechanics of LET therapy, which aims to restore normal spinal alignment35. However, it has remained unclear whether spine correction depends specifically on the frequency and duration of treatment. In other words, does the change reported in clinical trials35 occur in the first half of the 10–12 week treatment regimen, or does it accrue slowly over the entire duration? Our results address this question: they indicate a gradual increase in lumbar curve correction over the 10 weeks of care and with more frequent sessions, and suggest that even longer programs would result in more correction.
Interestingly, fit type (the ratio of the lumbar lordosis to the SBA) was one of the strongest predictors of the amount of lumbar lordosis correction. In the normal spine, the correlation between the lumbar lordosis and the sacral base angle is strong, with r values between 0.6 and 0.8 reported in healthy populations27,28,38,39. However, in CLBP populations not only is the lumbar spine hypo-lordotic12,13, but the correlation between the lumbar lordosis and SBA is also lost, becoming moderate to weak at best27,28,39. Similarly, in our population, the initial correlation between the lumbar lordosis and SBA was weak, indicating a loss of the normal relationship between these two variables. The fact that the best overall lumbar curve correction was found in the high and low fit types compared to the normal fit type makes mechanical sense, as extension traction is designed to restore this relationship27,35.
In the prediction of NRS pain scores, the SHAP analysis identified the following variables as the strongest predictors, in descending order of importance: (1) BMI, (2) pre-treatment NRS, (3) pre-treatment ODI, and (4) frequency of traction. Secondarily, age, compliance, and traction duration were of moderate importance. The emergence of pre-treatment ODI as a significant predictor of post-treatment pain underscores the interrelated nature of pain and function in lumbar pain management. Considering the post-treatment ODI, the SHAP analysis for the MLP model indicated that pre-treatment ODI, patient age, SBA, and BMI were the strongest predictors, while moderate predictors included pre-treatment lumbar curve, compliance, and pre-treatment NRS pain score. This suggests that the MLP model was particularly sensitive to initial levels of functional disability, with patients presenting with higher levels of disability before treatment showing more pronounced improvements in their post-treatment disability scores. The fact that the alignment variables of pre-treatment SBA and lumbar curvature were also notable factors influencing ODI outcomes supports the notion that individualized treatment approaches need to take into account not only pre-treatment functional status but also interventions tailored to a patient’s unique radiological abnormality of spine alignment27,28,35,39.
The SHAP percentages shown in Figs. 1, 2 and 3 can be interpreted as the strength of each variable in predicting the outcome, providing valuable insights for clinical prioritization. Stronger predictors make a greater percentage contribution to the model output. The strongest predictors of outcomes (shown in Figs. 1, 2 and 3) included the following four variables: (1) pre-treatment lumbar curve, (2) traction frequency, (3) BMI, and (4) fit type of the lumbar lordosis relative to the SBA. The second strongest predictors included: (1) compliance with the treatment program, (2) patient age, and (3) pre-treatment NRS pain score. These predictors suggest that while factors like compliance, age, and pre-treatment pain score do play a role in treatment success, their influence is secondary to the structural determinants of spinal alignment or anatomical characteristics and duration of intervention.
One of the most pressing challenges facing spine rehabilitation clinicians today is the debate over whether spine radiography is useful in identifying ‘non red flag’ (non-emergency referral) spine disorders that impact a patient’s condition, influence precise treatment decisions and techniques, and influence patient outcomes. In fact, many recent articles strongly oppose the clinical utility of spine radiography in conservative care management and outcomes40,41,42. For instance, in a randomized trial of older patients with CLBP and related disability, Maiers et al.40 concluded that the use of radiography to diagnose spine disorders had no influence on the outcomes of treatment after 12 weeks of general spine manipulation and exercise interventions. In the Maiers trial, outcomes were assessed at a fixed 12-week follow-up after a standardized intervention (spinal manipulation or exercise), whereas in our cohort the post-treatment assessment occurred at the completion of each patient’s individualized LET program, which typically ranged from 4 to 10 weeks. These differences in follow-up timing, study design, and intervention approach limit direct comparison. However, the data from Maiers et al.40 are discussed here as a contextual benchmark for the proportion of patients achieving an MCIC in conservative care treatment trials.
Aside from the apparent differences between the Maiers et al.40 trial and our current report in terms of study design and population specifics, arguably only our investigation used the radiographic displacement abnormalities of the lumbo-pelvic spine to determine the exact type of intervention, and that care improved the radiographic spine displacements as verified by the follow-up spine radiograph. The findings of our investigation are consistent with the contemporary understanding that lumbopelvic sagittal alignment cannot be adequately assessed without radiography because lumbar lordosis cannot be considered as a single angle of curvature in isolation. Rather, the congruence or incongruence of how the lumbar lordosis correlates to the SBA and the angle of pelvic morphology (PM) is critical for understanding whether a patient has normal or abnormal alignment38,39,43,44,45,46. Importantly, an individual’s correlation between lordosis and SBA, and how both lordosis and SBA correlate to PM, has been demonstrated to determine the presence or absence of CLBP, disability, need for intervention, and outcomes of care27,35,38,39,43,44,45,46. Although our current investigation did not include an assessment of PM, our findings clearly indicate that initial radiographic findings of the lumbar lordosis and fit type are drivers of post-treatment outcomes when using LET to improve spine alignment. Thus, the conflicting opinions on the utility of spine radiography can be understood as follows: opponents do not use the radiographic information to precisely determine interventions and outcomes40,41,42, whereas proponents do precisely use radiographic alignment data27,28,35,39, and there is evidence supporting each viewpoint depending on the criteria used.
Implications for clinical decision-making need to be emphasized herein. While the primary aim of this study was to develop predictive ML models, our findings have direct relevance for clinical decision-making. The identification of key predictors using SHAP values allows clinicians to stratify patients based on their likelihood of responding favorably to LET. For instance, patients with a substantial discrepancy between LL and SBA (i.e., abnormal fit type) exhibited a greater degree of lordotic correction, highlighting them as ideal candidates for LET. This provides an anatomical basis for treatment selection. Similarly, high treatment compliance emerged as a consistently strong predictor across all outcomes. This underscores the importance of patient engagement and can inform clinician-patient discussions regarding the expected benefits of adherence. Clinicians may choose to implement strategies to monitor or improve compliance when treating patients predicted to benefit substantially from LET.
Other predictive features such as treatment frequency, duration, and initial spinal alignment can inform the intensity and length of prescribed therapy. For example, patients with a severely reduced pre-treatment lordosis and a compliant attendance profile may be prescribed longer-duration protocols for optimal outcomes. In contrast, patients with near-normal alignment or poor expected compliance might be directed to alternative therapies. These predictive relationships support a move from generalized care protocols to precision rehabilitation, where the therapy plan is individualized based on data-driven insights. This personalized approach can potentially improve efficacy, resource utilization, and patient satisfaction in managing CLBP with lumbar extension traction.
One of the main limitations of our investigation is the retrospective nature of the data, which can introduce variability in patient records, particularly regarding compliance with treatment protocols and consistency in spinal measurement methods. The use of a retrospective design may also limit the ability to control for potential confounding variables that could affect treatment outcomes. Additionally, the moderate sample size of the study may limit the robustness of the results, especially when applied to neural network models, which typically require larger and more diverse datasets to perform optimally. Expanding the sample size and including a broader range of patient demographics and clinical settings would help strengthen the models’ predictive power and improve their general applicability across different patient populations. Our model did not assess sex differences nor the integration of psychological and lifestyle factors, such as stress levels, physical activity, and patient engagement, which can significantly influence chronic pain and rehabilitation outcomes.
It is possible that the better R² value for the XGBoost model reflects predictions clustered around the average, since most of the outcome values were close to the mean. Another key limitation of this study is the absence of a formal inter-rater agreement analysis between the two treating physiotherapists. While both clinicians possessed over 15 years of specialized experience and followed standardized protocols, inter-rater reliability metrics such as intraclass correlation coefficients (ICC) were not computed. This limits the ability to statistically confirm consistency in measurements such as lumbar curvature or compliance scoring. Future studies should incorporate structured reliability testing to further strengthen data validity and generalizability. Further, while NRS was modeled as a continuous variable, it is technically ordinal; future work may explore ordinal regression models or categorical encodings to improve methodological precision for variables like NRS or ODI. Finally, our study did not include long-term follow-up measures. However, prior investigations of LET methods indicate that patient improvements in lumbar lordosis, pain, and disability can last up to 1 year with minimal loss of the therapeutic benefits35.
Conclusion
This study demonstrates the effectiveness of LET for CLBP patients and highlights the potential of machine learning models, particularly XGBoost and Random Forest, in predicting treatment outcomes. By identifying key predictors of patient outcomes such as lumbar curvature, traction duration, and fit type between the lordosis and SBA, this research provides a foundation for personalized treatment planning. The integration of machine learning into clinical practice can optimize treatment decisions, improve patient outcomes, and enhance overall satisfaction. Future research should focus on refining these models, incorporating additional factors, and exploring advanced techniques to further personalize and improve lumbar pain management.
Data availability
Data are available upon request from the corresponding author.
References
Wu, A. et al. Global low back pain prevalence and years lived with disability from 1990 to 2017: estimates from the global burden of disease study 2017. Ann. Transl. Med. 8 (6). (2020).
Foster, N. E. et al. Prevention and treatment of low back pain: evidence, challenges, and promising directions. Lancet 391 (10137), 2368–2383 (2018).
Ketenci, A. & Zure, M. Pharmacological and non-pharmacological treatment approaches to chronic lumbar back pain. Turkish J. Phys. Med. Rehabilitation. 67 (1), 1 (2021).
Guo, Y. et al. Bibliometric analysis of research on manual therapy for low back pain from 2013 to 2023. Med. (Baltim). 104 (8), e41618. https://doi.org/10.1097/MD.0000000000041618 (2025).
Hurwitz, E. L., Haldeman, S. & Cedraschi, C. The global spine care initiative: applying evidence-based guidelines on the non-invasive management of back and neck pain to low- and middle-income communities. Eur. Spine J. 27 (Suppl 6), 851–860. https://doi.org/10.1007/s00586-017-5433-8 (2018).
WHO guideline for non-surgical management of chronic primary low back pain in adults in primary and community care settings. ISBN 978-92-4-008178-9 (electronic version). (Accessed 13 March 2025). https://www.who.int/publications/i/item/9789240081789 (2023).
Harrison, D. E., Cailliet, R., Harrison, D. D., Janik, T. J. & Holland, B. Changes in sagittal lumbar configuration with a new method of extension traction: nonrandomized clinical controlled trial. Arch. Phys. Med. Rehabil. 83 (11), 1585–1591. https://doi.org/10.1053/apmr.2002.35485 (2002).
Diab, A. A. & Moustafa, I. M. The efficacy of lumbar extension traction for sagittal alignment in mechanical low back pain: a randomized trial. J. Back Musculoskelet. Rehabil. 26 (2), 213–220. https://doi.org/10.3233/BMR-130372 (2013).
Diab, A. A. & Moustafa, I. M. Lumbar lordosis rehabilitation for pain and lumbar segmental motion in chronic mechanical low back pain: a randomized trial. J. Manipulative Physiol. Ther. 35 (4), 246–253. https://doi.org/10.1016/j.jmpt.2012.04.021 (2012).
Lee, C. H., Heo, S. J., Park, S. H., Jeong, H. S. & Kim, S. Y. Functional changes in patients and morphological changes in the lumbar intervertebral disc after applying Lordotic Curve-Controlled traction: A Double-Blind randomized controlled study. Med. (Kaunas). 56 (1), 4. https://doi.org/10.3390/medicina56010004 (2019).
Moustafa, I. M. & Diab, A. A. Extension traction treatment for patients with discogenic lumbosacral radiculopathy: a randomized controlled trial. Clin. Rehabil. 27 (1), 51–62. https://doi.org/10.1177/0269215512446093 (2013).
Chun, S. W., Lim, C. Y., Kim, K., Hwang, J. & Chung, S. G. The relationships between low back pain and lumbar lordosis: a systematic review and meta-analysis. Spine J. 17 (8), 1180–1191. https://doi.org/10.1016/j.spinee.2017.04.034 (2017).
Sadler, S. G., Spink, M. J., Ho, A., De Jonge, X. J. & Chuter, V. H. Restriction in lateral bending range of motion, lumbar lordosis, and hamstring flexibility predicts the development of low back pain: a systematic review of prospective cohort studies. BMC Musculoskelet. Disord. 18 (1), 179. https://doi.org/10.1186/s12891-017-1534-0 (2017).
Cardoso, L. et al. Computational modeling of posteroanterior lumbar traction by an automated massage bed: predicting intervertebral disc stresses and deformation. Front. Rehabil Sci. 3, 931274. https://doi.org/10.3389/fresc.2022.931274 (2022).
Lee, C. H., Heo, S. J. & Park, S. H. The real time geometric effect of a Lordotic Curve-Controlled spinal traction device: A randomized cross over study. Healthc. (Basel). 9 (2), 125. https://doi.org/10.3390/healthcare9020125 (2021).
de Andrada Pereira, B. et al. Influence of lumbar lordosis on posterior rod strain in long-segment construct during Biomechanical loading: a cadaveric study. Neurospine 18 (3), 635 (2021).
Moustafa, I. M. et al. Utilizing machine learning to predict post-treatment outcomes in chronic non-specific neck pain patients undergoing cervical extension traction. Sci. Rep. 14 (1), 11781 (2024).
Tangsrivimol, J. A. et al. Artificial intelligence in neurosurgery: a state-of-the-art review from past to future. Diagnostics (Basel) 13 (14), 2429. https://doi.org/10.3390/diagnostics13142429 (2023).
Javaid, M., Haleem, A., Pratap Singh, R., Suman, R. & Rab, S. Significance of machine learning in healthcare: Features, pillars and applications. Int. J. Intell. Networks. 3, 58–73. https://doi.org/10.1016/J.IJIN.2022.05.002 (2022).
Tschuggnall, M. et al. Machine learning approaches to predict rehabilitation success based on clinical and patient-reported outcome measures. Inf. Med. Unlocked 24 https://doi.org/10.1016/J.IMU.2021.100598 (2021).
Tagliaferri, S. D. et al. Artificial intelligence to improve back pain outcomes and lessons learnt from clinical classification approaches: three systematic reviews. Npj Digit. Med. 3, 1–16. https://doi.org/10.1038/s41746-020-0303-x (2020).
Shanahan, C., Ward, A. R. & Robertson, V. J. Comparison of the analgesic efficacy of interferential therapy and transcutaneous electrical nerve stimulation. Physiotherapy 92, 247–253. https://doi.org/10.1016/J.PHYSIO.2006.05.008 (2006).
Robertson, V., Ward, A., Low, J. & Reed, A. Electrotherapy explained, Principles and Practice 4th edn (Heinemann, 2006).
Kisner, C. & Colby, L. A. Therapeutic Exercise: Foundation and Techniques 5th edn (F.A.Davis Company, 2007).
Horton, W. C. et al. Is there an optimal patient stance for obtaining a lateral 36" radiograph? A critical comparison of three techniques. Spine (Phila Pa. 1976). 30 (4), 427–433. https://doi.org/10.1097/01.brs.0000153698.94091.f8 (2005).
Betz, J. W. et al. Reliability of the Biomechanical assessment of the sagittal lumbar spine and pelvis on radiographs used in clinical practice: A systematic review of the literature. J. Clin. Med. 13 (16), 4650. https://doi.org/10.3390/jcm13164650 (2024).
Harrison, D. E., Haas, J. W., Moustafa, I. M., Betz, J. W. & Oakley, P. A. Can the mismatch of measured pelvic morphology vs. Lumbar lordosis predict chronic low back pain patients? J. Clin. Med. 13 (8), 2178. https://doi.org/10.3390/jcm13082178 (2024).
Kobayashi, T., Atsuta, Y., Matsuno, T. & Takeda, N. A longitudinal study of congruent sagittal spinal alignment in an adult cohort. Spine 29 (6), 671–676. https://doi.org/10.1097/01.BRS.0000115127.51758.A2 (2004).
Ostelo, R. W. et al. Interpreting change scores for pain and functional status in low back pain: towards international consensus regarding minimal important change. Spine (Phila Pa. 1976). 33 (1), 90–94. https://doi.org/10.1097/BRS.0b013e31815e3a10 (2008).
Fritz, J. M. & Irrgang, J. J. A comparison of a modified Oswestry low back pain disability questionnaire and the Quebec back pain disability scale. Phys. Ther. 81 (2), 776–788. https://doi.org/10.1093/ptj/81.2.776 (2001).
Raptis, S., Ilioudis, C. & Theodorou, K. From pixels to prognosis: unveiling radiomics models with SHAP and LIME for enhanced interpretability. Biomedical Phys. Eng. Express. 10 (3), 035016 (2024).
Hartvigsen, J. et al. Lancet low back pain series working Group. What low back pain is and why we need to pay attention. Lancet 391, 2356–2367 (2018).
Buchbinder, R. et al. Lancet low back pain series working Group. low back pain: a call for action. Lancet 391, 2384–2388 (2018).
Nuckols, T. K. et al. Rigorous development does not ensure that guidelines are acceptable to a panel of knowledgeable providers. J. Gen. Intern. Med. 23, 37–44 (2008).
Oakley, P. A., Ehsani, N. N., Moustafa, I. M. & Harrison, D. E. Restoring lumbar lordosis: a systematic review of controlled trials utilizing chiropractic bio Physics® (CBP®) non-surgical approach to increasing lumbar lordosis in the treatment of low back disorders. J. Phys. Ther. Sci. 32 (9), 601–610. https://doi.org/10.1589/jpts.32.601 (2020).
Haas, M., Vavrek, D., Peterson, D., Polissar, N. & Neradilek, M. B. Dose-response and efficacy of spinal manipulation for care of chronic low back pain: a randomized controlled trial. Spine J. 14 (7), 1106–1116. https://doi.org/10.1016/j.spinee.2013.07.468 (2014).
Haas, M., Groupp, E. & Kraemer, D. F. Dose-response for chiropractic care of chronic low back pain. Spine J. 4 (5), 574–583. https://doi.org/10.1016/j.spinee.2004.02.008 (2004).
Legaye, J., Duval-Beaupère, G., Hecquet, J. & Marty, C. Pelvic incidence: a fundamental pelvic parameter for three-dimensional regulation of spinal sagittal curves. Eur. Spine J. 7 (2), 99–103. https://doi.org/10.1007/s005860050038 (1998).
Mendoza-Lattes, S., Ries, Z., Gao, Y. & Weinstein, S. L. Natural history of spinopelvic alignment differs from symptomatic deformity of the spine. Spine (Phila Pa. 1976). 35 (16), E792–E798. https://doi.org/10.1097/BRS.0b013e3181d35ca9 (2010).
Maiers, M. J., Albertson, A. K., Major, C., Mendenhall, H. & Petrie, C. P. The association between individual radiographic findings and improvement after chiropractic spinal manipulation and home exercise among older adults with back-related disability: a secondary analysis. Chiropr. Man. Th. 33 (1), 2. https://doi.org/10.1186/s12998-024-00566-9 (2025).
Haslam-Larmer, L. et al. Gleaning a lot from the history and physical exam, and reasonably confident without imaging: a qualitative study of primary care clinicians’ management of patients with low back pain. BMC Prim. Care. 26 (1), 26. https://doi.org/10.1186/s12875-025-02726-z (2025).
Williams, B., Gichard, L., Johnson, D. & Louis, M. An investigation into the chiropractic practice and communication of routine, repetitive radiographic imaging for the location of postural misalignments. J. Clin. Imaging Sci. 14, 28. https://doi.org/10.25259/JCIS_68_2024 (2024).
Noshchenko, A. et al. Spinopelvic parameters in asymptomatic subjects without spine disease and deformity: A systematic review with Meta-Analysis. Clin. Spine Surg. 30, 392–403 (2017).
Fujishiro, T. et al. European spine study Group, ESSG. Decision-making factors in the treatment of adult spinal deformity. Eur. Spine J. 27 (9), 2312–2321 (2018).
Banno, T. et al. The cohort study for the determination of reference values for spinopelvic parameters (T1 pelvic angle and global tilt) in elderly volunteers. Eur. Spine J. 25, 3687–3693 (2016).
Tominaga, R. et al. Dose-response relationship between spino-pelvic alignment determined by sagittal modifiers and back pain-specific quality of life. Eur. Spine J. 30 (10), 3019–3027. https://doi.org/10.1007/s00586-021-06965-3 (2021).
Acknowledgements
Partial funding for this project was received from The NCMIC Foundation and CBP NonProfit for funding of open access fees if accepted for publication.
Author information
Contributions
Authors I.M.M., D.U.O., M.T.M., S.Z., I.K., P.A.O., and D.E.H. all participated in the research idea and its design. I.M.M., D.U.O., M.T.M., and D.E.H. contributed to the statistical analysis. I.M.M., D.U.O., M.T.M., S.Z., and I.K. participated in data collection and supervision. I.M.M., D.U.O., M.T.M., S.Z., I.K., P.A.O., and D.E.H. all contributed to the interpretation of the results and wrote the drafts. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Competing interests
PAO is a paid consultant for CBP NonProfit, Inc. DEH teaches rehabilitation methods and is the CEO of a company that distributes spine rehabilitation equipment to physicians in the U.S.A. as used in this manuscript. All the other authors declare that they have no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Moustafa, I.M., Ozsahin, D.U., Mustapha, M.T. et al. Machine learning models for predicting treatment outcomes in chronic non-specific back pain patients undergoing lumbar extension traction. Sci Rep 16, 6738 (2026). https://doi.org/10.1038/s41598-026-38059-9
DOI: https://doi.org/10.1038/s41598-026-38059-9


