Introduction

Knee osteoarthritis (KOA), a complex musculoskeletal disease characterized by joint pain and limited mobility, affects 37% of persons over 60 years old worldwide1. Depressive symptoms are a common comorbidity of KOA, with surveys indicating a prevalence of around 20%2,3,4. Mounting evidence suggests that KOA combined with depressive symptoms is associated with worse pain symptoms, greater functional decline, and poorer disease prognosis5,6. This situation complicates disease management for KOA patients and exacerbates the health-related burden3,5,7,8,9,10,11,12,13,14,15,16.

Despite their high incidence and risk, depressive symptoms in KOA patients are underdiagnosed and undertreated, with less than 10% of affected individuals receiving effective treatment17. Given the adverse health outcomes and increased financial burden that depressive symptoms impose on KOA patients, many studies consider strengthened screening for depressive symptoms in KOA patients to be beneficial18,19,20. However, the latest cost-benefit analysis found that screening all OA patients for depressive symptoms did not significantly reduce treatment costs, calling its cost-effectiveness into question21. Compared with large-scale universal screening, identifying the high-risk population with a risk prediction model and then targeting assessment and management of depressive symptoms at that population may be a more cost-effective option.

Recently, there has been growing attention to the issue of depressive symptoms in patients with KOA, and many studies have explored the risk factors for depressive symptoms in this population. Identifying these risk factors could enable earlier diagnosis and treatment targeting susceptible populations to improve clinical outcomes. Based on existing research, the risk factors for KOA-related depression can be broadly categorized into sociodemographic factors, KOA-related symptoms, and other health conditions. Sociodemographic factors form the foundation of predictive models, with variables including gender22,23,24, age23,25, marital status23, education level25,26, income23,25, and living alone status27,28 being closely linked to depressive symptoms. Among them, gender and age have been shown to influence the incidence of depressive symptoms22,23,24,25, while socioeconomic status and living conditions (such as education level and living alone status) are also recognized as significant influencing factors25,26,27,28. Furthermore, KOA-related symptom factors, including KOA duration18, pain23,29, walking speed30, and the time taken for the Five-Times-Sit-to-Stand Test (FTSST)31, which are thought to directly reflect disease severity and physical function, are also significantly associated with depressive symptoms. Studies have shown that severe pain and functional limitations often increase the risk of depressive symptoms23,29,32. Additionally, other health condition factors such as comorbidities23, ability to perform activities of daily living (ADL)25,32, self-reported health status33, history of falls33,34, sleep problems35, smoking and alcohol consumption status6,36, and body mass index (BMI)25 are also significantly correlated with depressive symptoms. The deterioration of these health conditions is often accompanied by an increase in depressive moods23,37,38.
Depressive symptoms result from the interaction of multiple factors; no single factor alone can fully explain their complexity. Therefore, constructing a comprehensive predictive model that incorporates multiple factors is essential.

However, a validated and reliable multi-factorial model is still lacking. While some studies19,20,39 have explored the predictive effect of Kellgren-Lawrence (KL) grading on future depressive symptom risk in KOA patients, the heterogeneity of the results suggests a need for more in-depth research. Additionally, one study attempted to develop a prediction model based on 122 KOA patients with depressive symptoms, but it failed to include key risk factors such as sociodemographics, raising concerns about its stability and reliability due to the insufficient sample size and lack of external validation18. Another recent study tried to predict depressive symptoms using machine learning (ML) methods, but the prevalence of depressive symptoms in the study sample was much lower than in previous literature, calling the representativeness of the model into question, and the patients in the study were predominantly from white ethnic backgrounds40.

By focusing on a representative cohort of middle-aged and elderly adults in China, this study aimed to develop a multi-factorial model for predicting depressive symptoms in patients with KOA. While the data originate from a Chinese population, the findings could contribute to earlier diagnosis and targeted treatment strategies for depressive symptoms in KOA patients, ultimately improving clinical outcomes for susceptible populations globally. Furthermore, the application of ML methods in developing such models provides a framework that can be adapted and validated across diverse settings, thereby enhancing their potential utility beyond China. Given the growing prevalence of KOA and associated comorbidities worldwide, addressing these gaps will facilitate a more comprehensive understanding of depressive symptoms in KOA patients on a global scale, fostering advancements in personalized healthcare.

Methods

Data sources and study population

This study used data from the China Health and Retirement Longitudinal Survey (CHARLS) database for model development and the Osteoarthritis Initiative (OAI) for external validation. Both CHARLS and OAI are multicenter, longitudinal, prospective cohorts. CHARLS, based in China, includes data from 17,000 middle-aged and elderly people, encompassing key predictors of depressive symptoms in KOA patients41. For external validation, data were drawn from the U.S.-based OAI cohort, which focuses on knee health and includes data from 4,796 middle-aged and older patients. We analyzed baseline to 4-year follow-up data from both cohorts. Participants aged ≥ 45 years, diagnosed with KOA, and without depressive symptoms at baseline were included, while those with incomplete KOA or depression data at baseline or follow-up, follow-up durations < 1 year, or missing > 50% of variables were excluded. Ultimately, 496 CHARLS participants contributed to model development and 1,115 OAI participants were included for validation (Fig. 1).

Fig. 1

Formation process of the modeling and validation cohorts.

Outcome variable and assessment tool

The primary outcome in our study was the development of depressive symptoms at the 4-year follow-up in each database. In the CHARLS cohort, depressive symptoms were assessed using the 10-item CES-D scale, with a score ≥ 12 indicating depressive symptoms42. The OAI cohort used the 20-item CES-D scale, with a score ≥ 16 as indicative of depressive symptoms43. Both the 10-item and 20-item CES-D versions have demonstrated good reliability and validity in numerous studies, showing similar effectiveness in identifying depressive symptoms44,45.
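The outcome definition above amounts to a simple threshold rule. The sketch below (in Python, although the study's analyses were run in R) encodes the two cutoffs; the function name and scale labels are illustrative.

```python
# Cutoffs from the study: CES-D-10 (CHARLS) >= 12, CES-D-20 (OAI) >= 16.
# Function name and scale labels are illustrative, not from the paper.
def has_depressive_symptoms(score, scale):
    cutoffs = {"cesd10": 12, "cesd20": 16}
    if scale not in cutoffs:
        raise ValueError("unknown scale: %s" % scale)
    return score >= cutoffs[scale]
```

For example, a CHARLS participant scoring 11 on the CES-D-10 would be classified as outcome-negative, while an OAI participant scoring 16 on the CES-D-20 would be outcome-positive.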

Predictor variables and assessment tools

During the selection of potential predictor variables, we first conducted a literature review to identify variables shown to be important in predicting the outcome and included these literature-supported variables as candidate predictors6,22,23,24,25,27,28,29,32,33,34,35,36,46,47. Eighteen variables from the CHARLS database were included as potential predictors in this study. Specifically, these comprise six sociodemographic factors (gender, age, education level, marital status, income, living alone or not), four KOA-related symptoms (duration of KOA, pain intensity, walking speed, FTSST), and eight other health condition factors (self-reported health status, difficulties with ADL/IADL, comorbidities, history of falls, frequency of sleep problems, smoking status, alcohol consumption status, BMI).

Comorbidities were stratified by number into no comorbidities, one comorbidity, and two or more comorbidities. Sleep problems were classified into three groups according to the frequency of self-reported sleep disturbances in the past week: rarely (< 1 day), sometimes (1–4 days), and always (5–7 days). BMI was categorized according to WHO criteria: normal or underweight (< 25.0 kg/m²) and overweight or obese (≥ 25.0 kg/m²)48. Pain severity was classified using the numeric pain rating scale into mild (≤ 3 points), moderate (4–6 points), and severe (≥ 7 points). Walking speed and the FTSST were both assessed using cutoffs from the 2019 Asian Working Group for Sarcopenia49: walking speed was dichotomized at ≥ 1.0 m/s versus < 1.0 m/s, while FTSST was categorized as ≤ 12 s versus > 12 s.
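These categorizations can be expressed as a small coding function. The following Python sketch applies the cutoffs above; the data preparation in the study was done in R, and the argument and key names here are assumptions.

```python
# Illustrative coding of the study's predictor categorizations;
# argument names and dictionary keys are assumptions.
def categorize(bmi, pain, walk_speed, ftsst_s, sleep_days, n_comorbid):
    return {
        "bmi": "overweight/obese" if bmi >= 25.0 else "normal/underweight",
        "pain": "mild" if pain <= 3 else ("moderate" if pain <= 6 else "severe"),
        "walk_speed": "slow (<1.0 m/s)" if walk_speed < 1.0 else "normal",
        "ftsst": "prolonged (>12 s)" if ftsst_s > 12 else "normal",
        "sleep": ("rarely" if sleep_days < 1
                  else "sometimes" if sleep_days <= 4 else "always"),
        "comorbidities": min(n_comorbid, 2),  # 0, 1, or 2 ("two or more")
    }
```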

Selection of predictive variables

Ideally, a model should predict the outcome with high accuracy using a minimal number of variables. In this study, the Least Absolute Shrinkage and Selection Operator (LASSO) regression algorithm was applied to identify the most efficient input variables. LASSO is a linear regression method that uses L1 regularization: it minimizes the residual sum of squares subject to the constraint that the sum of the absolute values of the regression coefficients is less than a constant, which shrinks some regression coefficients exactly to 0 and yields a more parsimonious model; it is a biased estimator well suited to data with complex collinearity. Compared with conventional regression, the LASSO algorithm helps prevent overfitting and produces a more interpretable, compact, and accurate model50. In this study, 10-fold cross-validation was used to determine the parameter λ. All variables were evaluated, and those with non-zero LASSO regression coefficients were selected as input variables. The final 11 variables were chosen as inputs to develop the model.
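The variable-screening step can be sketched as follows. The study's analyses were implemented in R (Supplementary Table S1), most likely with a glmnet-style binomial LASSO; this Python/scikit-learn sketch uses `LassoCV` with squared-error loss on synthetic data as a stand-in, so both the data and the loss function are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Synthetic stand-in data: 500 subjects, 10 candidate predictors,
# of which only the first three truly drive the binary outcome.
rng = np.random.default_rng(0)
n, p = 500, 10
X = rng.normal(size=(n, p))
logit = 1.5 * X[:, 0] + 1.0 * X[:, 1] - 1.0 * X[:, 2]
y = (logit + rng.normal(size=n) > 0).astype(float)

# Choose lambda (sklearn's `alpha`) by 10-fold cross-validation;
# predictors whose coefficients are shrunk exactly to zero drop out.
lasso = LassoCV(cv=10, random_state=0).fit(X, y)
selected = np.flatnonzero(lasso.coef_)
```

Note that `LassoCV` returns the λ minimizing cross-validated error; the paper's more conservative λ1se choice would be applied on top of the cross-validation curve.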

Model development

To develop the model for depressive symptom prediction in KOA patients, four ML-based methods were investigated: logistic regression, decision tree, random forest, and artificial neural network (ANN). For details on their implementation and configurable parameters in the R language, refer to Supplementary Table S1. To train the supervised classifiers, the CHARLS dataset was randomly divided in a 70%:30% ratio into a training set (70%) and a testing set (30%).
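The 70/30 partition can be sketched in Python with scikit-learn (the study itself used R). With a 30% test fraction, a 496-patient sample reproduces the 347/149 split reported in the Results; the stratification and placeholder data below are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data sized like the CHARLS modeling sample:
# 496 patients, 11 selected predictors, binary outcome label.
rng = np.random.default_rng(0)
X = rng.normal(size=(496, 11))
y = rng.integers(0, 2, size=496)

# 70% training / 30% testing; stratification (an assumption, not
# stated in the text) preserves the outcome balance in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)
```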

Logistic regression

Logistic regression is widely used in the construction of various risk prediction models due to its simplicity and efficiency51. It is a multiple regression method for analyzing the relationship between a dependent variable and its influencing factors52.

Decision tree

A decision tree is a predictive model expressed as a tree-like structure. Generally, a decision tree contains a root node, several internal nodes, and several leaf nodes; the root and internal nodes each represent a feature or attribute, and each leaf node represents a category53. This study uses the classification and regression tree (CART) algorithm to construct the decision tree.
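A CART-style classifier with Gini-based pruning can be sketched as follows. This Python/scikit-learn example stands in for the R implementation used in the study; the synthetic data and parameter values are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic illustrative data: 300 samples, 5 features, outcome
# driven by the first two features.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# CART with Gini impurity; `ccp_alpha` applies cost-complexity
# pruning, analogous to the Gini-based pruning described above.
tree = DecisionTreeClassifier(criterion="gini", ccp_alpha=0.01,
                              random_state=1).fit(X, y)
acc = tree.score(X, y)
```

Increasing `ccp_alpha` prunes more aggressively, trading training accuracy for a simpler, less overfit tree.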

Random forest

Random forest is an ensemble algorithm composed of decision trees. Bootstrap samples are drawn from the original data by sampling with replacement, and a random subset of features is selected as input to grow each random decision tree; repeating this process many times produces the forest. When predicting, every decision tree in the forest casts a vote, and the final output category is determined by the mode of the individual trees' outputs54. Compared with a single decision tree, the random forest can learn interactions between features, has good noise resistance, and performs stably55.
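The bootstrap-and-vote scheme above can be sketched with scikit-learn (a Python stand-in for the study's R code); the interaction-driven synthetic data below illustrates the kind of structure a forest handles better than a single tree.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic data with a feature interaction (x0 * x1 > 0) that a
# single axis-aligned tree captures poorly.
rng = np.random.default_rng(2)
X = rng.normal(size=(400, 11))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

# Bootstrap-sampled trees voting by majority; n_estimators and
# max_features are the two tuning parameters discussed in the paper
# (the values here are illustrative).
forest = RandomForestClassifier(n_estimators=220, max_features=11,
                                random_state=2).fit(X, y)
```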

Artificial neural network

An artificial neural network is an information processing system that simulates the structure and function of biological neural networks56, comprising an input layer, hidden layers, and an output layer. Generally, the number of input-layer neurons corresponds to the number of features, and the number of output-layer neurons matches the number of categories. The number of hidden layers and their neurons is harder to determine and can be optimized through tuning57. The ANN model handles nonlinear data well and has the advantages of strong memory and self-learning ability, but it also has the disadvantages of a black-box nature and poor interpretability54.
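A feed-forward network of this shape can be sketched with scikit-learn's `MLPClassifier` (a Python stand-in for the R implementation); the single hidden layer of 12 neurons with L2 weight decay mirrors the structure ultimately tuned in this study, and the synthetic data are illustrative.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Illustrative data: 300 samples, 11 features (matching the model's
# 11 inputs), binary outcome.
rng = np.random.default_rng(3)
X = rng.normal(size=(300, 11))
y = (X[:, 0] - X[:, 1] > 0).astype(int)

# One hidden layer of 12 neurons; `alpha` is the L2 weight-decay
# regularization parameter.
ann = MLPClassifier(hidden_layer_sizes=(12,), alpha=0.1,
                    max_iter=2000, random_state=3).fit(X, y)
proba = ann.predict_proba(X)
```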

Model performance

The model’s overall performance was evaluated on an independent CHARLS testing set and externally validated using the OAI dataset to ensure generalizability.

The performance of the models was quantified in terms of discriminative performance, calibration, and clinical utility. Discriminative performance was estimated by the area under the receiver operating characteristic curve (AUC), with a higher AUC indicating better discrimination53. To evaluate calibration, a probabilistic calibration curve was used: the closer the curve is to the diagonal (intercept 0, slope 1), the better the calibration. Clinical utility was evaluated with decision curve analysis (DCA), which identifies the model with the greatest net benefit. DCA meets the practical needs of clinical decision-making and integrates the preferences of patients or decision-makers into the analysis, focusing on the benefits offered by models across different threshold probabilities58,59,60. Furthermore, to enhance model interpretability, the most important predictive features were identified from the best-performing model on external validation.
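The net-benefit quantity that DCA plots has a simple closed form. The Python sketch below (function name illustrative) computes it at one threshold probability, treating everyone with predicted risk at or above the threshold as "treat".

```python
def net_benefit(y_true, y_prob, pt):
    """Net benefit at threshold probability pt, as used in DCA:
    NB = TP/n - (FP/n) * pt / (1 - pt)."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 0)
    return tp / n - (fp / n) * pt / (1 - pt)
```

Sweeping `pt` across a range of thresholds and plotting `net_benefit` for each model, alongside the "treat all" and "treat none" strategies, produces the decision curves compared in the Results.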

Ethics approval

Since the CHARLS and OAI cohorts are openly accessible, the Medical Ethics Board Committee of Peking University granted the study an exemption from review.

Results

Participant characteristics

The initial CHARLS dataset included 17,708 patients. After excluding 17,212 ineligible patients, 496 were retained for modeling, with 347 (70%) in the training set and 149 (30%) in the testing set. The OAI dataset began with 4,796 patients; after excluding 3,681 patients, a final sample of 1,115 was obtained for external validation (Fig. 1). The baseline characteristics of participants in the CHARLS and OAI datasets are detailed in Supplementary Table S2 and Table S3, respectively.

A comparison of baseline characteristics between the CHARLS modeling dataset and the OAI validation dataset (Table 1) revealed that KOA patients in the OAI cohort were older (P < 0.001), had higher educational attainment (high school or above, P < 0.001), and a lower proportion with spouses (P < 0.001). More OAI patients lived alone (P < 0.001), fewer reported poor health (P < 0.001) or ADLs/IADLs difficulties (P < 0.001), and fewer had comorbidities (P < 0.001), though a greater proportion reported falls (P < 0.001). OAI participants also had lower rates of frequent sleep problems (> 5 days/week, P < 0.001), higher rates of smoking and alcohol use (P < 0.001), lower prevalence of long-duration KOA (> 5 years, P < 0.001), and a greater proportion had a walking speed < 1.0 m/s (P < 0.001) and took > 12 s to complete the FTSST (P < 0.001). No significant differences in gender distribution or pain severity were observed between datasets (P > 0.05).

Table 1 Comparison of baseline characteristics between CHARLS and OAI.

Feature selection

Among 18 candidate variables (Table 1), 11 were selected by LASSO regression: gender, education level, income, self-reported health status, difficulties with ADLs/IADLs, history of falls, frequency of sleep problems, smoking status, BMI, pain intensity, and duration of FTSST. The dynamic process of LASSO variable screening is shown in Fig. 2. Each curve in Fig. 2a represents the coefficient trajectory of one variable. As Log(λ) increases, each coefficient gradually shrinks toward 0; the later a coefficient reaches 0, the more important the variable. In this study, variable 18 (frequency of sleep problems) and variable 9 (difficulties with ADLs/IADLs) were the last to be compressed to 0. For each number of variables/Log(λ), Fig. 2b shows the mean and 95% confidence interval of the regression model deviance under 10-fold cross-validation. Deviance measures the deviation of the developed model from the ideal (perfectly fitting) model; the smaller the deviance, the better the goodness of fit. The two dotted lines in Fig. 2b mark two special λ values (λmin and λ1se), and λ values between them are all considered appropriate. Specifically, λmin is the value with the smallest deviance, and λ1se is the value one standard error above λmin. The λ1se model was finally selected by jointly considering accuracy and simplicity. The LASSO regression coefficient of each variable at λ1se is shown in Supplementary Table S4.

Fig. 2

LASSO regression screening variable dynamic process diagram.

(a) trajectory (b) the deviation confidence interval.

Model development

Logistic regression

The parameters of the logistic regression model are shown in Table 2. Specifically, the model expression is:

$$\begin{aligned} \ln\left(\frac{p}{1-p}\right) ={} & -1.548 + 0.380 \times \text{gender} - 0.338 \times \text{education level} \\ & + 0.268 \times \text{difficulty with ADLs/IADLs} + 0.214 \times \text{frequency of sleep problems} \\ & + 0.118 \times \text{pain intensity} + 0.349 \times \text{smoking status} \\ & + 0.135 \times \text{self-reported health status} + 0.167 \times \text{duration of FTSST} \\ & + 0.223 \times \text{history of falls} - 0.169 \times \text{income} - 0.200 \times \text{BMI}. \end{aligned}$$
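A fitted logistic model of this form can be applied directly by converting the linear predictor (log-odds) to a probability. The Python sketch below transcribes the coefficients from the equation above; the numeric coding of each predictor (0/1 indicators or ordinal category levels) is an assumption following the categorizations in the Methods.

```python
import math

# Intercept and coefficients transcribed from the fitted model above;
# predictor coding (0/1 or ordinal levels) is an assumption.
INTERCEPT = -1.548
COEF = {
    "gender": 0.380, "education_level": -0.338,
    "adl_iadl_difficulty": 0.268, "sleep_problems": 0.214,
    "pain_intensity": 0.118, "smoking_status": 0.349,
    "self_reported_health": 0.135, "ftsst_duration": 0.167,
    "history_of_falls": 0.223, "income": -0.169, "bmi": -0.200,
}

def predicted_probability(x):
    """Convert the linear predictor (log-odds) to a risk probability."""
    logit = INTERCEPT + sum(COEF[k] * x.get(k, 0) for k in COEF)
    return 1.0 / (1.0 + math.exp(-logit))
```

For instance, with all predictors coded 0 the baseline risk is 1 / (1 + e^1.548), roughly 0.175, and raising any positive-coefficient predictor increases the predicted risk.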

In the training stage, the logistic regression model using all 11 variables showed a sensitivity of 0.924, a specificity of 0.163, an accuracy of 0.624 (95% CI 0.614–0.634), and an AUC of 0.607 (95% CI 0.595–0.619).

Table 2 Parameters of logistic regression model.

Decision tree

The decision tree model was developed from the selected variables and pruned according to the Gini coefficient to reduce complexity and overfitting. The decision tree formed at the minimum Gini coefficient is shown in Fig. 3. The first major split separates patients with sleep problems (sometimes or always) from those without. Fall history forms the second level: among patients who had fallen, those with ADLs/IADLs difficulty, per capita household income below or above the median level, mild to moderate pain, and poor self-rated health had a higher risk of depressive symptoms; among patients who had not fallen, those with BMI < 25.0 kg/m², female gender, smoking, or per capita household income below the median had a higher risk of depressive symptoms. Using 10-fold cross-validation on the training set to evaluate performance, the overall accuracy of the decision tree model was 0.656, and the sensitivity, specificity, and AUC were 0.245, 0.923, and 0.607 (95% CI 0.596–0.619), respectively.

Fig. 3

Decision tree model of risk of KOA with depressive symptoms.

Random forest model

An optimal random forest model was developed by tuning two important parameters: the number of decision trees and the number of features. As shown in Fig. 4, the overall misjudgment rate was lowest (0.135) when the number of decision trees was 220. Figure 5 shows the accuracy of the model for different numbers of extracted features with the number of trees fixed at 220; accuracy was highest with 11 features. 10-fold cross-validation on the training set yielded a high AUC of 0.939 (95% CI 0.934–0.945), with accuracy, sensitivity, and specificity of 0.874, 0.917, and 0.808, respectively.

Fig. 4

Misjudgment rate of random forest model with different numbers of decision trees.

Fig. 5

Accuracy of random forest model with different numbers of selected features.

Artificial neural network

An ANN model with an "11-12-1" structure was developed in this study: "11" is the number of input neurons (the 11 input variables), "12" is the number of hidden-layer neurons, and "1" is the number of output neurons, indicating whether depressive symptoms occur. Figure 6 shows the accuracy of the model for different numbers of hidden-layer neurons and weight decay parameters; accuracy was highest with 12 hidden neurons and a weight decay parameter of 0.1. In the training stage, the ANN model showed good discriminatory performance, with an AUC of 0.877 (95% CI 0.870–0.884) and accuracy, sensitivity, and specificity of 0.803, 0.877, and 0.689, respectively.

Fig. 6

Accuracy of the ANN model with different numbers of hidden-layer neurons and weight decay parameters.

Model performance

Discriminatory power

The performance of the four models was evaluated on the testing set. As shown in Fig. 7, the AUC of the logistic regression model is 0.622 (95% CI 0.603–0.641), showing a certain degree of discriminative power. The AUC of the decision tree is 0.611 (95% CI 0.593–0.630), slightly lower than that of logistic regression. The ANN model shows good discriminatory performance with an AUC of 0.868 (95% CI 0.857–0.879), higher than logistic regression and the decision tree (P < 0.001). The AUC of the random forest is 0.928 (95% CI 0.920–0.937), the best discriminatory performance (P < 0.001).

In addition, the accuracy, sensitivity, and specificity of the models were evaluated (Table 3). Among the four models, the random forest shows the highest accuracy (0.856), followed by the ANN (0.786), the decision tree (0.654), and the logistic regression (0.627). Logistic regression has the highest sensitivity (0.927), while the random forest's sensitivity also exceeds 0.9 (0.904). In terms of specificity, the decision tree is the highest (0.922), followed by the random forest (0.786).

Fig. 7

ROC curve of models.

Table 3 Model performance in CHARLS testing set.

Calibration & clinical utility

Calibration was evaluated with the probability calibration curve, which showed good calibration for all models (Fig. 8). The closer the curve is to the diagonal, the better the model's calibration, i.e., the closer the predicted probabilities are to the observed outcomes. The calibration curves of the ANN and decision tree were closest to the diagonal, followed by the random forest and the logistic regression.

Fig. 8

Probabilistic calibration curves of models.

In addition, DCA was used to evaluate clinical utility. As shown in Fig. 9, when the threshold probability was in the range 0.2–0.9, the clinical utility of the random forest was the highest; in the ranges 0–0.2 and 0.9–1.0, the ANN was the highest, while the logistic regression and decision tree were consistently lower. Considering real clinical practice, the threshold probability for intervening on depressive symptoms in KOA patients is most likely to fall within 0.2–0.9, in which range decision-making with the random forest model yields a higher net benefit than the others.

Fig. 9

Decision curve of models.

Through comprehensive evaluation of discriminative performance, calibration, and clinical utility among the four models, the random forest model was optimal on the internal testing set. The optimal model was then externally validated on the OAI dataset, initially showing limited discrimination with an AUC of 0.539 (95% CI 0.528–0.550). After adjusting the random forest parameters by setting the number of decision trees to 140 and the number of randomly selected features to 9, the AUC improved markedly to 0.877 (95% CI 0.864–0.889), demonstrating strong discrimination and calibration.

Feature importance ranking of predictive variables

The feature importance of the optimal model was ranked according to each variable’s impact on prediction accuracy. The results showed that pain severity was the most significant predictor, followed by the duration of FTSST and sleep problems. Other key features included smoking status, fall history, gender, ADL/IADL difficulty, income, BMI, and education level.

Discussion

Principal results

Based on a representative cohort in China, we developed four ML models for depressive symptoms in KOA patients from a variety of easily accessible potential predictors, including sociodemographics, KOA symptom-related data, and general health condition data. The developed ML models achieved clinically useful identification of individuals at high risk of depressive symptoms in KOA, with the random forest model performing best. To our knowledge, this is the first study in China to use ML methodologies to predict depressive symptoms in patients with KOA while comprehensively considering demographic, KOA symptom-related, and general health condition factors. It is also the first study to use an ANN to predict the risk of depressive symptoms in KOA.

In this study, we used routinely available demographic and clinical data to develop the model and identified only 11 key predictive features through the LASSO method, which increased the simplicity and practicability of the model compared with previous studies. LASSO was used to comprehensively screen variables across a broad range of sociodemographic, KOA symptom-related, and general health condition factors, retaining 11 predictive features: gender, education level, per capita household income, self-rated health status, ADLs/IADLs difficulty, fall history, sleep problems, smoking status, BMI, pain degree, and duration of FTSST. Compared with the least squares estimation of traditional regression, the LASSO method can effectively mitigate overfitting61. In addition, it yields a more parsimonious model with higher prediction accuracy at the cost of some estimation bias, which is especially suitable for machine learning models: feeding too many uninformative variables into an ML model greatly increases its complexity, hinders algorithm convergence, and increases computation time62,63. The excellent performance of the LASSO method in variable screening and model stability has been verified in many disease prediction fields, such as cancer, cardiovascular disease, and perinatal health64,65,66,67.

With the development of ML algorithms, ML methods are widely used in the field of disease prediction. Based on CHARLS data from 2011 to 2015, this study used logistic regression, decision tree, random forest, and ANN to construct the risk prediction model. The developed models achieved clinically acceptable discrimination between depressed and non-depressed individuals, with the random forest model demonstrating the highest predictive performance. In recent years, the random forest method has been widely used for classification problems in digital health technology and plays an important role in augmenting clinical diagnosis33. Our findings demonstrate the potential of this method for complex classification problems such as KOA combined with depressive symptoms.

Comparison with prior work

At the broadest level, while many studies have examined the association of KOA with depressive symptoms, few have applied ML methods to predict the risk. Sayre et al.18 applied logistic regression to develop a prediction model based on longitudinal cohort data, with clinically acceptable performance (AUC = 0.742). However, the model was developed with conventional statistical methods, had a small sample, and lacked external validation, so it is doubtful whether its predictive performance is stable and replicable in other KOA patients. Nowinka et al.40 applied six ML prediction models to predict depressive symptoms in patients with KOA, but the prevalence of depressive symptoms in their sample was much lower than in previous literature, calling the representativeness of the model into question, and their patients were predominantly from white ethnic backgrounds. To our knowledge, this is the first study in China to use ML methodologies to predict depressive symptoms in patients with KOA, and the first study to use an ANN to predict this risk.

There are several strengths in this study. First, a sizable number of participants from a random, nationwide sample were included, giving the models relatively high accuracy and representativeness. Second, the robustness and generalizability of the developed models were reinforced by both internal and external validation. Lastly, only 11 crucial features were included, all easily accessible, demonstrating the simplicity of our approach and the ease of widespread application in primary health care.

Limitations

As with all studies, the present study has some limitations. For the random forest model, one limitation is that interpretation is constrained by the ensemble, black-box nature of the algorithm. Another is that applying the model requires practitioners to have some programming ability, which limits its application to a certain extent. In future studies, the risk prediction model constructed here could be deployed as a web tool or embedded in medical-care information systems to improve convenience of use. In addition, although the utility of the model was analyzed by decision curve analysis, its cost-effectiveness needs further analysis through more clinical studies.

Conclusions

In conclusion, focusing on the identification of depressive symptoms in KOA, this study proposes a model for predicting the risk of depressive symptoms in KOA patients. The model is developed from various easily accessible potential predictors, such as demographic information, KOA symptom-related data, and general health status data, achieving an external-validation AUC of 0.877. Compared with previous methods, this model demonstrates outstanding performance. Notably, this is the first study employing an ANN to predict the risk of depressive symptoms in KOA patients. As the first multi-factor, externally validated ML model based on longitudinal cohort data in China, it can aid healthcare professionals in early identification of depressive symptom risk among KOA patients, thereby optimizing personalized preventive strategies in healthcare.