Abstract
This study retrospectively collected clinical data from 480 patients with connective tissue diseases (CTDs) at Nanjing First Hospital between August 2019 and December 2023 to develop and validate a multi-classification machine learning (ML) model for assessing depression risk. Addressing the limitations of traditional assessment tools, six ML models were constructed using univariate analysis and the LASSO algorithm, with the categorical boosting (Catboost) model emerging as the best performer, demonstrating strong predictive ability across different depression severity levels (none_F1 = 0.879, mild_F1 = 0.627, moderate and severe_F1 = 0.588). Additionally, the study provided an interpretation of the best-performing model using SHAP and developed a user-friendly R Shiny application (https://macnomogram.shinyapps.io/Catboost/) to facilitate clinical use. The findings suggest that the Catboost model represents a significant advancement in assessing depression risk among CTD patients, highlighting the potential of ML in enhancing mental health management for this patient population.
Introduction
Depression is highly prevalent worldwide, ranks as the second leading cause of disability, and is a major risk factor for suicide1,2. Connective tissue diseases (CTDs), including systemic lupus erythematosus (SLE), Sjögren’s syndrome (SS), rheumatoid arthritis (RA), and others, are autoimmune conditions that lead to both physical and emotional disturbances, notably depressive symptoms, thereby reducing quality of life3,4,5,6.
The underlying causes of depression in CTD patients are complex, potentially involving autoantibodies, genetic factors, inflammatory cytokines, and medication side effects such as glucocorticoids7. Evidence suggests that inflammation and specific autoantibodies in conditions like SLE significantly correlate with major depression8,9. The cytokine hypothesis supports this, proposing that inflammation induced by various stressors plays a key role in depression10.
However, traditional tools for assessing depression, like the Patient Health Questionnaire-9 (PHQ-9), Hamilton Depression Rating Scale (HDRS), and Beck Depression Inventory (BDI), though widely used, have limitations. They demand considerable time and clinical expertise, underscoring the need for improved depression assessment methods in CTD patients11,12,13. Amidst these challenges, machine learning (ML) is emerging as a promising tool in the medical field, particularly in psychiatric risk assessment. Yet, there is a research gap specifically concerning depression risk in CTD patients. Most existing studies rely on simple statistical models or descriptive analyses, which do not fully utilize the available data to uncover patterns related to depression risk14,15,16,17,18,19.
This study aims to develop an advanced ML-based multi-classification model for assessing depression risk in CTD patients. By systematically analyzing comprehensive data including clinical, laboratory, and psychological metrics, we intend to identify critical predictors of depression risk. Advanced ML algorithms will then be used to create a predictive model that accurately assesses this risk, enhancing mental health management and treatment strategies for this patient group.
Materials and methods
Study design
The study protocol strictly follows the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) reporting guidelines20. A detailed workflow chart of the study is presented in Fig. 1.
Participants
This study was conducted from August 1, 2019 to December 1, 2023 in the inpatient unit of the Rheumatology and Immunology Department of Nanjing First Hospital. All subjects had a confirmed diagnosis of a CTD, including SLE, SS, RA, systemic sclerosis (SSc), dermatomyositis (DM), polymyositis (PM) and mixed connective tissue disease (MCTD). The inclusion criteria were: (1) Chinese citizens aged ≥ 18 years, (2) voluntary participation in the survey and (3) ability to complete the PHQ-9 questionnaire independently. The exclusion criteria were: (1) history of severe mental illness, (2) prior use of medication to treat mental disorders, (3) pregnancy or lactation, (4) history of malignant tumor and (5) disability caused by diseases other than CTDs.
Data collection
We collected comprehensive patient data in a structured format, including demographics (age, disease duration, education, sleep duration, gender, marital status), comorbidity history (including other CTDs, thyroid diseases, diabetes, chronic diseases, interstitial lung disease (ILD), pulmonary arterial hypertension (PAH), infection, and fibromyalgia), medication use (corticosteroids (expressed as prednisone equivalent), biological agents, classical immunosuppressants (i.e. cyclophosphamide, mycophenolate mofetil, cyclosporine A, tacrolimus and azathioprine), and conventional synthetic disease-modifying anti-rheumatic drugs (csDMARDs) (i.e. methotrexate, leflunomide, hydroxychloroquine, sulfasalazine and iguratimod)), and laboratory test results (WBC count, hemoglobin, platelet count, neutrophil count, lymphocyte count, autoantibodies, inflammatory cytokines and immunological parameters). Pain was assessed with the Visual Analog Scale (VAS), on which a score of 0 indicates no pain and a score of 10 indicates the most severe, intolerable pain. Disease activity of SLE was assessed with the Systemic Lupus Erythematosus Disease Activity Index 2000 (SLEDAI-2000) and the Systemic Lupus Erythematosus Disease Activity Score (SLEDAS), and disease activity of RA was assessed with the Disease Activity Score 28-CRP (DAS28-CRP).
Depression assessment
We employed the PHQ-9 to assess depression severity in accordance with established diagnostic criteria. The PHQ-9 is a well-validated and reliable tool for measuring symptoms of major depressive episodes over the past two weeks21. It comprises nine items, each scored from 0 to 3 points, resulting in a total score ranging from 0 to 27 points. Depression severity was categorized based on standard PHQ-9 score cutoffs: 0–4 points = no depression, 5–9 points = mild depression, and 10 or greater points = moderate or severe depression. In this study, we formulated depression prediction as a three-class classification task: no depression (PHQ-9 score 0–4), mild depression (PHQ-9 score 5–9), and moderate or severe depression (PHQ-9 score ≥ 10).
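The scoring and cutoffs described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the class label strings are assumptions introduced here, not identifiers from the study's code.

```python
from typing import Sequence


def phq9_total(item_scores: Sequence[int]) -> int:
    """Sum the nine PHQ-9 items, each scored 0-3 (total range 0-27)."""
    if len(item_scores) != 9:
        raise ValueError("PHQ-9 has exactly nine items")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each item must be scored 0, 1, 2, or 3")
    return sum(item_scores)


def depression_class(total: int) -> str:
    """Map a PHQ-9 total to the three classes used as the prediction target.

    Label strings are hypothetical; the cutoffs are the ones stated in the text.
    """
    if not 0 <= total <= 27:
        raise ValueError("PHQ-9 total must lie between 0 and 27")
    if total <= 4:
        return "none"
    if total <= 9:
        return "mild"
    return "moderate_or_severe"
```

For example, a patient answering 1 on every item has a total of 9 and falls into the mild class, while one additional point on any item moves the patient into the moderate or severe class.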
Data preprocessing
The raw data underwent comprehensive preprocessing to address missing values, data leakage, and feature standardization. Samples with more than 20% missing values and features with more than 25% missing values were excluded. To prevent data leakage, the dataset was split into training (80%) and test (20%) sets using the holdout method. The training set was used for feature selection, model training, and hyperparameter tuning. Hyperparameters were tuned on the training set by grid search with 5-fold cross-validation, selecting the combination that maximized model accuracy. The test set was used only to verify the generalization of the proposed models.
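A minimal sketch of the holdout split and the 5-fold partition of the training set is given below, using only the Python standard library. The function names, the fixed seed, and the interleaved fold assignment are assumptions made for illustration; the study does not report how the random split or the folds were generated.

```python
import random
from typing import List, Tuple


def holdout_split(n_samples: int, test_fraction: float = 0.2,
                  seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly assign sample indices to training and test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary choice
    n_test = round(n_samples * test_fraction)
    test_idx = sorted(indices[:n_test])
    train_idx = sorted(indices[n_test:])
    return train_idx, test_idx


def k_folds(indices: List[int], k: int = 5) -> List[List[int]]:
    """Partition indices into k near-equal validation folds.

    Each fold serves once as the validation set while the remaining
    folds are used for fitting during the grid search.
    """
    return [indices[i::k] for i in range(k)]


train_idx, test_idx = holdout_split(480)
folds = k_folds(train_idx, k=5)
```

With 480 patients this yields 384 training and 96 test indices. The test indices are never touched during feature selection or tuning, which is what keeps the final evaluation free of leakage.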
Feature selection
A two-step feature selection process was employed. First, univariate analysis identified statistically significant features (p < 0.05) potentially associated with depression. Subsequently, the LASSO algorithm with hyperparameter lambda (λ) was applied to the training set to further refine the feature subset. LASSO shrinks the coefficients of irrelevant features to zero, resulting in the selection of features with non-zero coefficients22.
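For reference, the L1-penalized estimate that LASSO computes can be written in a generic form for a three-class outcome. The loss function used in the study is not stated, so the multinomial log-likelihood below is an assumption; n is the number of training samples, k the number of candidate features, and β_{c0} a class-specific intercept.

```latex
\hat{\beta}(\lambda) = \arg\min_{\beta}
  \left\{ -\frac{1}{n} \sum_{i=1}^{n} \log p_{y_i}(x_i; \beta)
          + \lambda \sum_{c=1}^{3} \sum_{j=1}^{k} \lvert \beta_{cj} \rvert \right\},
\qquad
p_c(x; \beta) = \frac{\exp\left(\beta_{c0} + x^{\top} \beta_c\right)}
                     {\sum_{c'=1}^{3} \exp\left(\beta_{c'0} + x^{\top} \beta_{c'}\right)}
```

Larger values of λ push more coefficients exactly to zero, so the retained feature subset shrinks as λ grows.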
Data imputation and normalization
Missing values were imputed using K-Nearest Neighbors (KNN) for continuous variables and mode imputation for categorical variables in both the training and test sets. Additionally, continuous features were normalized using Z-score standardization, while multi-categorical features were one-hot encoded to create dummy variables, prior to LASSO-based feature selection.
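The standardization and encoding steps can be illustrated as follows. This sketch assumes that the mean and standard deviation are estimated on the training set and then reused for the test set, which is a common way to avoid leakage but is not stated explicitly in the text. The function names and the category labels are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def fit_zscore(train_values: Sequence[float]) -> Tuple[float, float]:
    """Estimate mean and (population) standard deviation on training data."""
    mu = mean(train_values)
    sigma = pstdev(train_values)
    if sigma == 0:
        raise ValueError("constant feature cannot be standardized")
    return mu, sigma


def apply_zscore(values: Sequence[float], mu: float, sigma: float) -> List[float]:
    """Standardize values with previously fitted statistics."""
    return [(v - mu) / sigma for v in values]


def one_hot(value: str, categories: Sequence[str]) -> List[int]:
    """Encode one categorical value as a list of 0/1 dummy variables."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]


# Toy numbers for illustration only, not study data.
mu, sigma = fit_zscore([2.0, 4.0, 6.0])
train_z = apply_zscore([2.0, 4.0, 6.0], mu, sigma)
test_z = apply_zscore([4.0], mu, sigma)
dummy = one_hot("married", ["single", "married", "widowed"])
```

After the transformation the training feature has mean 0 and standard deviation 1, while test values are expressed on the same scale without contributing to the fitted statistics.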
ML models
Following feature selection, we developed and evaluated six multi-classification ML models to predict depression severity: logistic regression (LR), support vector machine (SVM), random forest classifier (RFC), light gradient boosting machine (LGBM), categorical boosting (Catboost), and artificial neural network (ANN). All models incorporated 10-fold cross-validation during training for thorough training and evaluation. When training each model, we addressed the class imbalance problem by adjusting the class weight parameter. For LR, SVM, RFC, LGBM, and Catboost, the algorithms internally select the strategy (one-vs-one or one-vs-rest) for handling multi-classification tasks based on the data characteristics. To prevent overfitting in the ANN model, we introduced dropout layers and early stopping during training. All models were implemented using the scikit-learn (version 1.1.2), lightgbm (version 3.3.2), Catboost (version 1.1.1), and Keras (version 2.9.0) libraries in Python (version 3.9.12).
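One common way to set class weights is the inverse-frequency ("balanced") heuristic, sketched below. The study states only that the class weight parameter was adjusted, so the use of this particular heuristic is an assumption. The counts are the training-cohort class sizes reported in the Results; the function name and label strings are hypothetical.

```python
from typing import Dict


def balanced_class_weights(counts: Dict[str, int]) -> Dict[str, float]:
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


# Class sizes of the training cohort as reported in the Results section.
train_counts = {"none": 185, "mild": 121, "moderate_or_severe": 78}
weights = balanced_class_weights(train_counts)
```

Under this heuristic the rarest class (moderate or severe) receives the largest weight, so misclassifying those patients costs more during training than misclassifying patients without depression.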
Model evaluation
The hold-out test set was used exclusively to evaluate the generalizability of the developed models. Because of the inherent characteristics of the multi-classification task, some common binary-classification evaluation metrics (e.g., the area under the curve (AUC) and F1-score) are not directly applicable to multi-class problems. Therefore, we employed a comprehensive suite of evaluation metrics to assess multi-classification model performance. This included: (1) Confusion Matrix: Visualized the distribution of true and predicted depression classifications across categories (no, mild, moderate/severe); (2) Multi-class ROC Curves: Illustrated the trade-off between sensitivity and specificity for each depression class; (3) Macro-average AUC: Evaluated overall model performance by averaging AUC scores across all depression categories; (4) Kappa Statistic: Assessed model agreement with the true classifications, accounting for class imbalance; (5) Average Precision, Recall, and F1-score. CTD patients experience varying levels of depression in real-world scenarios, which leads to class imbalance, and it is important to recognize that not all prediction levels should be treated equally in terms of their predictive value. In a clinical setting, the ideal approach would be to develop a predictive model that is more effective at identifying depression-positive cases, that is, mild, moderate, and severe cases.
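The per-class F1-score and the kappa statistic can both be derived from a confusion matrix, as the following sketch shows. The 3×3 matrix is invented toy data for illustration and does not reproduce any result from this study; the function names are likewise assumptions.

```python
from typing import List


def per_class_f1(cm: List[List[int]]) -> List[float]:
    """One-vs-rest F1 for each class; rows are true labels, columns predictions."""
    k = len(cm)
    scores = []
    for c in range(k):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(k)) - tp  # predicted c, truly another class
        fn = sum(cm[c]) - tp                       # truly c, predicted another class
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def cohen_kappa(cm: List[List[int]]) -> float:
    """Agreement between predictions and truth, corrected for chance agreement."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    observed = sum(cm[i][i] for i in range(k)) / n
    expected = sum(
        sum(cm[i]) * sum(cm[r][i] for r in range(k)) for i in range(k)
    ) / n ** 2
    return (observed - expected) / (1 - expected)


# Toy confusion matrix (none, mild, moderate/severe); not study data.
toy_cm = [[8, 1, 1],
          [2, 5, 3],
          [0, 2, 8]]
f1_none, f1_mild, f1_mod_severe = per_class_f1(toy_cm)
kappa = cohen_kappa(toy_cm)
macro_f1 = (f1_none + f1_mild + f1_mod_severe) / 3
```

Reporting F1 separately for each class, rather than only the macro average, makes it visible when a model performs well on the majority "none" class but poorly on the depression-positive classes.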
Model interpretation
The SHapley Additive exPlanations (SHAP) technique, which quantifies each variable’s influence on the model output, was applied to gain insight into the best-performing model23. We used SHAP to evaluate the contribution of each feature in the best-performing model to the predictions on the test set. In the three-class classification task, the model’s objective is to accurately assign samples to one of the three categories. Consequently, we calculated the SHAP values for each category. For each sample, the SHAP value is represented as a matrix of size (1, k, 3), where each matrix element corresponds to the contribution of a feature to a specific category (with k representing the number of features in the model). We generated separate SHAP bar and dot plots for each category using the test set data. Furthermore, to visualize the distribution of key features across different depression severities in a sample of patients, we generated a grouped-clustered heatmap for ten randomly selected CTD patients using the “pheatmap” package in R (version 4.2.1). Finally, for enhanced clinical usability, the best-performing multi-classification model was deployed as an R Shiny application, facilitating user-friendly model interaction.
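The per-class feature ranking shown in a SHAP bar plot is the mean absolute SHAP value of each feature over all samples. A minimal sketch is given below, assuming the SHAP values have already been computed and are stored as a nested list indexed by sample, feature, and class. The numbers are invented toy values, and the function and label names are hypothetical.

```python
from typing import Dict, List, Sequence


def mean_abs_shap(shap_values: List[List[List[float]]],
                  feature_names: Sequence[str],
                  class_names: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """Average |SHAP| per feature, separately for each class.

    shap_values[i][j][c] is the contribution of feature j to class c
    for sample i, i.e. an array of shape (n_samples, k, 3).
    """
    n = len(shap_values)
    return {
        cname: {
            fname: sum(abs(sample[j][c]) for sample in shap_values) / n
            for j, fname in enumerate(feature_names)
        }
        for c, cname in enumerate(class_names)
    }


def rank_features(importance: Dict[str, float]) -> List[str]:
    """Order features from most to least influential."""
    return sorted(importance, key=importance.get, reverse=True)


# Toy SHAP values: 2 samples x 2 features x 3 classes (not study data).
toy_shap = [
    [[0.2, -0.1, -0.1], [-0.4, 0.1, 0.3]],
    [[-0.6, 0.3, 0.3], [0.0, -0.1, 0.1]],
]
importance = mean_abs_shap(toy_shap, ["fatigue", "sleep_duration"],
                           ["none", "mild", "moderate_or_severe"])
ranking_none = rank_features(importance["none"])
```

Because the ranking is computed per class, a feature can be highly influential for one depression category and less so for another, which is why separate plots are produced for each category.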
Statistical analysis
This study aimed to compare the differences in demographic, clinical, and laboratory characteristics across three levels of depression. The initial step involved evaluating the normality of continuous variables using the Shapiro-Wilk test. Continuous variables that followed a normal distribution were presented as mean ± standard deviation (SD) and compared using one-way analysis of variance (ANOVA). Those that did not follow a normal distribution were presented as median (interquartile range, IQR) and compared using the Kruskal-Wallis test. Categorical variables were compared using either the Chi-square (χ2) or Fisher’s exact test, depending on their distribution and sample size. A p-value of less than 0.05 was considered statistically significant for all tests. The entire statistical analysis was performed using the “compareGroups” package in R version 4.2.1.
Results
Participants’ characteristics
This study recruited a total of 500 patients, of whom 480 with confirmed diagnoses were deemed eligible. The median age of the participants was 58.5 years (IQR: 48.8–68 years), and 87.9% were female. These 480 patients with CTDs were randomly divided into two groups: the training cohort (n = 384) and the test cohort (n = 96). Comparative analysis of baseline characteristics between the two cohorts indicated a general balance, as detailed in Supplementary Table S1.
Depression, the outcome of this study, was categorized into three levels: no depression, mild depression, and moderate or severe depression. Within the training cohort, the prevalence rates for these categories were 48.2% (n = 185/384), 31.5% (n = 121/384), and 20.3% (n = 78/384), respectively. Correspondingly, in the test cohort, the prevalence rates were 59.4% (n = 57/96), 24.0% (n = 23/96), and 16.7% (n = 16/96), respectively.
Feature selection
In this research, forty-six potential variables related to depression in patients with CTDs were initially considered. Twenty-three of these variables had missing values, and the six variables with more than 25% missing values were excluded from the analysis (Supplementary Table S2). Univariate analysis (Supplementary Table S3) subsequently identified eleven variables significantly associated with the depression outcome: sleep duration, presence of other CTDs, ILD, PAH, infection, fibromyalgia, fatigue, use of classical immunosuppressants, platelet count, lymphocyte count, and anti-Ro52 antibodies.
These identified eleven variables were further analyzed using the LASSO algorithm to discern the optimal subset. The LASSO algorithm ultimately determined that five variables had a significant association with the multi-classification depression outcome: fatigue, anti-Ro52 antibodies, sleep duration, platelet count, and lymphocyte count.
Model performance
The aforementioned five variables were used to construct the following six multi-classification ML models: LR, SVM, RFC, LGBM, Catboost, and ANN. The optimal hyperparameters for each model are documented in Supplementary Table S5. Supplementary Table S4 and Figure S1 show the performance metrics of these models within the training cohort. Figure S2 shows the confusion matrix of each multi-class model on the training cohort. Notably, all models except LR and ANN demonstrated superior performance, ranking in the top four across various metrics, including mild_F1 and moderate and severe_F1. These models have the potential to correctly detect more positive cases of CTD-related depression.
Furthermore, the validity of the proposed models was assessed using the test cohort. The performance of the six models within the test cohort is detailed in Table 1 and illustrated in Fig. 2. The confusion matrices of the six multi-classification ML models in the test cohort are shown in Fig. 3. Among these, the Catboost model exhibited the highest mild_F1 and moderate and severe_F1. These results highlight the robust performance of the Catboost model and its effectiveness in classifying depression levels in patients with CTDs.
Fig. 2. ROC curves of the six multi-classification ML models in the test cohort. (a) LR; (b) SVM; (c) RFC; (d) LGBM; (e) Catboost; (f) ANN. Notes: ROC, receiver operating characteristic; ML, machine learning; LR, logistic regression; SVM, support vector machine; RFC, random forest classifier; LGBM, light gradient boosting machine; Catboost, categorical boosting; ANN, artificial neural network.
Fig. 3. Confusion matrices of the six multi-classification ML models in the test cohort. (a) LR; (b) SVM; (c) RFC; (d) LGBM; (e) Catboost; (f) ANN. Notes: ML, machine learning; LR, logistic regression; SVM, support vector machine; RFC, random forest classifier; LGBM, light gradient boosting machine; Catboost, categorical boosting; ANN, artificial neural network.
Model interpretation
The Catboost model was explained through the application of the SHAP algorithm, which calculated the average absolute SHAP values to quantify the impact of each variable.
Figure 4a–c illustrates the feature importance rankings and the distribution of each variable’s impact on the Catboost model output for the none, mild, and moderate and severe depression categories, respectively. Notably, the analysis highlights that higher levels of fatigue, presence of anti-Ro52 antibodies, reduced sleep duration, lower platelet counts, and decreased lymphocyte counts were associated with increased SHAP values. This suggests these factors correlate with an elevated risk of depression, as indicated by the model. Additionally, Fig. 4d provides a comparative visualization of the contributions of these five critical variables across a subset of ten randomly selected patients from the test cohort.
Fig. 4. SHAP summary plots explaining the five variables contributing to the Catboost model. (a) Bar and dot charts displaying the feature importance rankings and the distribution of each variable’s influence on the Catboost outputs for the “none” category; (b) the same charts for the “mild” category; (c) the same charts for the “moderate and severe” category; (d) grouped-clustered heatmap comparing the five variables among ten randomly selected CTD patients from the test cohort.
Model deployment
To enhance the practical applicability of our proposed Catboost model in clinical settings, the model was deployed to the cloud, making it accessible to healthcare professionals for real-time depression risk assessment in patients with CTDs. The model is hosted on a web-based platform available at https://macnomogram.shinyapps.io/Catboost/. The user interface of this online tool is illustrated in Figure S3, designed to provide a user-friendly experience for clinicians and researchers alike.
Discussion
This study introduces the Catboost model as a predictive tool for assessing the risk of depression, demonstrating superior performance over competing models across multiple metrics such as AUC, accuracy, precision, recall, F1 score, and kappa coefficient. The model’s robustness in parsing complex medical data indicates its potential for accurate depression risk prediction.
Unlike traditional tools such as the PHQ-9 and BDI, the Catboost model integrates and analyzes a broad range of clinical and laboratory data, automating the analysis process. This circumvents the need for manual scoring by clinicians, facilitating quicker, more objective assessments. The development of a web-based calculator enhances the model’s clinical utility, providing healthcare professionals with a readily accessible tool.
This research contributes to the literature on ML applications in refining depression risk assessment strategies, expanding upon previous studies15,16. It acknowledges depression’s multifactorial nature24, which underscores the need for comprehensive approaches to understanding and managing this condition. The Catboost model marks a notable advancement in applying ML to mental health, offering a nuanced and efficient method for identifying at-risk individuals25,26.
The relationship between fatigue and depression is intricate, influenced by various biological and psychological factors. Recent research indicates that neuroinflammation plays a central role in these concurrent conditions27, with further studies linking immune dysregulation in multiple sclerosis patients to both conditions27. Tarasiuk et al. propose that inflammation, oxidative stress, and neurodegeneration contribute to the convergence of fatigue and depression, suggesting comprehensive treatment strategies focusing on inflammatory processes and brain arousal systems to prevent progression from fatigue to depression28.
In recent years, anti-Ro52 has been found in a variety of CTDs and has attracted widespread attention from rheumatologists. A large number of studies have confirmed that anti-Ro52 is associated with a higher frequency of ILD in CTD patients29. Interestingly, patients with ILD are more prone to depression30. Further studies have explored the role of anti-Ro52 antibodies in psychiatric symptoms among CTD patients, identifying them as potential biomarkers for heightened risk of depression and anxiety9. Xu et al. investigated their prognostic value in specific autoimmune conditions, linking these antibodies to more severe disease trajectories and potential psychiatric symptom development31.
Recent studies demonstrate a complex and bidirectional relationship between sleep duration and depression, showing that both insufficient and excessive sleep can increase the risk of depressive disorders. Bender et al. identified a U-shaped correlation between sleep duration and depression risk, revealing that individuals who sleep less than seven hours or more than nine hours nightly face a heightened risk of depression32. This observation is supported by Geoffroy et al., who also found a U-shaped link between sleep duration and mental health outcomes. This reinforces the importance of promoting adequate sleep as a preventative strategy against psychiatric conditions33.
Recent research is illuminating the complex connections between blood cell counts (specifically platelets and lymphocytes) and depression, positing that systemic inflammation may mediate this relationship. Zhu et al. reported that elevated levels of inflammatory markers, such as the monocyte-to-lymphocyte ratio (MLR), are linked to depression34. This link to systemic inflammation suggests these markers could serve as potential biomarkers for depression, bolstering the theory that inflammation plays a critical role in the pathophysiology of the condition. Additionally, the relevance of another inflammatory marker, the platelet-to-lymphocyte ratio (PLR), has been explored by Gasparyan et al. and Danese et al.35,36. Furthermore, Gansner et al. discussed the effects of plateletpheresis, a common procedure in blood donation, on the immune system, highlighting its potential implications for inflammatory processes37. This body of research could lead to the identification of novel biomarkers or therapeutic targets for managing depression.
This study not only builds upon our previous research identifying specific antibodies as depression risk factors in CTD patients9, but also integrates these findings into a predictive ML model. Rather than defining depressive outcomes as a simple yes or no, our multi-classification model opens up the possibility of more precise risk stratification of patients (no depression, mild depression, moderate and severe depression), allowing clinicians to target interventions more precisely based on the different predictions for each patient. In addition, this model also offers clinicians new insights for enhancing mental health management and optimizing treatment strategies. Future studies will aim to improve feature extraction and selection methods and expand the model’s integration into broader medical information systems to support automated risk assessments and early intervention.
Our study also has some limitations, including potential biases from the single-center data collection and the dependence of ML model performance on data quality and feature selection. The exclusion of variables with excessive missing data, such as certain cytokines, might have affected the model’s performance38. In addition, there are many other risk factors for depression, including dysfunctional cognitions, stressful life events and circumstances, parental depression, and interpersonal dysfunction39. These variables were not included in the study due to resource constraints, challenges in data collection and measurement, and interactions and complexities among variables.
Conclusions
This study introduces a novel multi-classification ML-based model, employing the Catboost algorithm, for the assessment of depression risk in patients with CTDs. Through rigorous analysis of multidimensional patient data, including clinical information, laboratory test results, and psychological assessments, we identified key variables substantially associated with depression risk. The Catboost model outperformed existing assessment tools and other ML models in accuracy and efficiency. The use of SHAP for model interpretation further elucidates the contributions of specific variables to depression risk, enhancing the model’s clinical relevance. Deployed as an R Shiny application, the model offers a practical tool for healthcare professionals, facilitating early identification and management of depression in CTD patients. Despite these promising results, future research should focus on expanding the model’s generalizability through multi-center studies and integrating additional biomarkers to refine its predictive capability. This study demonstrates the potential of advanced ML techniques in transforming the approach to mental health assessment within the medical field, particularly for patients with complex autoimmune diseases.
Data availability
Data presented in this study were obtained from patients admitted to the Department of Rheumatology and Immunology, Nanjing First Hospital, who consented to participate in the study. Owing to data protection rules, we are not allowed to share personal-level data. The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
References
Lim, G. Y. et al. Prevalence of depression in the community from 30 countries between 1994 and 2014. Sci. Rep. 8, 2861. https://doi.org/10.1038/s41598-018-21243-x (2018).
World Health Organization. Suicide worldwide in 2019: Global health estimates. iv, 28 (World Health Organization, 2021).
Kósa, F. et al. High risk of depression, anxiety, and an unfavorable complex comorbidity profile is associated with SLE: A nationwide patient-level study. Arthritis Res. Ther. 24, 116. https://doi.org/10.1186/s13075-022-02799-6 (2022).
Shen, C. C., Yang, A. C., Kuo, B. I. & Tsai, S. J. Risk of psychiatric disorders following primary sjögren syndrome: A nationwide population-based retrospective cohort study. J. Rheumatol. 42, 1203–1208. https://doi.org/10.3899/jrheum.141361 (2015).
Costa, T., Rushton, S. P., Watson, S. & Ng, W. F. Depression in Sjögren’s syndrome mediates the relationship between pain, fatigue, sleepiness, and overall quality of life. Rheumatol. Immunol. Res. 4, 78–89. https://doi.org/10.2478/rir-2023-0012 (2023).
Didier, K. et al. Autoantibodies associated with connective tissue diseases: What meaning for clinicians? Front. Immunol. 9, 541. https://doi.org/10.3389/fimmu.2018.00541 (2018).
Ayres, A. et al. Cognitive performance in patients with myasthenia gravis: An association with glucocorticosteroid use and depression. Dement. Neuropsychol. 14, 315–323. https://doi.org/10.1590/1980-57642020dn14-030013 (2020).
Leng, Q. et al. Anti-ribosomal P protein antibodies and insomnia correlate with depression and anxiety in patients suffering from systemic lupus erythematosus. Heliyon 9, e15463. https://doi.org/10.1016/j.heliyon.2023.e15463 (2023).
Yang, L. et al. Association of anti-Ro52 antibody with depression and anxiety in patients with connective tissue diseases: An observational, single-centre, cross-sectional study. Clin. Exp. Rheumatol. https://doi.org/10.55563/clinexprheumatol/be9n92 (2023).
Grygiel-Górniak, B., Limphaibool, N. & Puszczewicz, M. Cytokine secretion and the risk of depression development in patients with connective tissue diseases. Psychiatry Clin. Neurosci. 73, 302–316. https://doi.org/10.1111/pcn.12826 (2019).
Fusar-Poli, P. et al. Prevention of psychosis: Advances in detection, prognosis, and intervention. JAMA Psychiatry. 77, 755–765. https://doi.org/10.1001/jamapsychiatry.2019.4779 (2020).
Ho, R. C., Mak, K. K., Chua, A. N., Ho, C. S. & Mak, A. The effect of severity of depressive disorder on economic burden in a university hospital in Singapore. Expert Rev. Pharmacoecon. Outcomes Res. 13, 549–559. https://doi.org/10.1586/14737167.2013.815409 (2013).
Acharya, U. R. et al. Automated EEG-based screening of depression using deep convolutional neural network. Comput. Methods Programs Biomed. 161, 103–113. https://doi.org/10.1016/j.cmpb.2018.04.012 (2018).
Nickson, D., Meyer, C., Walasek, L. & Toro, C. Prediction and diagnosis of depression using machine learning with electronic health records data: A systematic review. BMC Med. Inf. Decis. Mak. 23, 271. https://doi.org/10.1186/s12911-023-02341-x (2023).
Hossain, M. M., Asadullah, M., Hossain, M. A. & Amin, M. S. Prediction of depression using machine learning tools taking consideration of oversampling. Malays. J. Public. Health Med. 22, 244–253. https://doi.org/10.37268/mjphm/22/2/art.1564 (2022).
Muzafar, D., Khan, F. & Qayoom, M. Machine learning algorithms for depression detection and their comparison. https://doi.org/10.48550/arXiv.2301.03222 (2023).
Dong, C. et al. SVM-based model combining patients’ reported outcomes and lymphocyte phenotypes of depression in systemic lupus erythematosus. Biomolecules 13. https://doi.org/10.3390/biom13050723 (2023).
Jiang, W., Wang, X., Tao, D. & Zhao, X. Identification of common genetic characteristics of rheumatoid arthritis and major depressive disorder by bioinformatics analysis and machine learning. Front. Immunol. 14, 1183115. https://doi.org/10.3389/fimmu.2023.1183115 (2023).
Liao, J. et al. A cross-sectional study on the association of anxiety and depression with the disease activity of systemic lupus erythematosus. BMC Psychiatry. 22, 591. https://doi.org/10.1186/s12888-022-04236-z (2022).
Collins, G. S., Reitsma, J. B., Altman, D. G. & Moons, K. G. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): The TRIPOD statement. BMJ 350, g7594. https://doi.org/10.1136/bmj.g7594 (2015).
Costantini, L. et al. Screening for depression in primary care with patient health questionnaire-9 (PHQ-9): A systematic review. J. Affect. Disord. 279, 473–483. https://doi.org/10.1016/j.jad.2020.09.131 (2021).
Sun, H. et al. Multi-classification model incorporating radiomics and clinic-radiological features for predicting invasiveness and differentiation of pulmonary adenocarcinoma nodules. Biomed. Eng. Online. 22, 112. https://doi.org/10.1186/s12938-023-01180-1 (2023).
Mosch, B., Hagena, V., Herpertz, S. & Diers, M. Brain morphometric changes in fibromyalgia and the impact of psychometric and clinical factors: A volumetric and diffusion-tensor imaging study. Arthritis Res. Ther. 25, 81. https://doi.org/10.1186/s13075-023-03064-0 (2023).
Prak, R. F. et al. Fatigue in primary Sjögren’s syndrome is associated with an objective decline in physical performance, pain and depression. Clin. Exp. Rheumatol. 40, 2318–2328. https://doi.org/10.55563/clinexprheumatol/70s6cs (2022).
Beurel, E., Toups, M. & Nemeroff, C. B. The bidirectional relationship of depression and inflammation: Double trouble. Neuron 107, 234–256. https://doi.org/10.1016/j.neuron.2020.06.002 (2020).
Huang, M. et al. High rates of depression anxiety and suicidal ideation among inpatients in general hospital in China. Int. J. Psychiatry Clin. Pract. 23, 99–105. https://doi.org/10.1080/13651501.2018.1539179 (2019).
Lee, C. H. & Giuliani, F. The role of inflammation in depression and fatigue. Front. Immunol. 10, 1696. https://doi.org/10.3389/fimmu.2019.01696 (2019).
Tarasiuk, J. et al. Co-occurrence of fatigue and depression in people with multiple sclerosis: A mini-review. Front. Neurol. 12, 817256. https://doi.org/10.3389/fneur.2021.817256 (2021).
Nayebirad, S. et al. Association of anti-Ro52 autoantibody with interstitial lung disease in autoimmune diseases: A systematic review and meta-analysis. BMJ Open Respir. Res. 10. https://doi.org/10.1136/bmjresp-2023-002076 (2023).
Shen, Q. et al. Pain is a common problem in patients with ILD. Respir. Res. 21, 297. https://doi.org/10.1186/s12931-020-01564-0 (2020).
Xu, A. et al. Prognostic values of anti-Ro52 antibodies in anti-MDA5-positive clinically amyopathic dermatomyositis associated with interstitial lung disease. Rheumatology (Oxford) 60, 3343–3351. https://doi.org/10.1093/rheumatology/keaa786 (2021).
Bender, A. M., Babins-Wagner, R. & Laughton, A. 1086 Non-linear associations between depression and sleep duration in an international sample of 16,997 respondents. Sleep 43, A413. https://doi.org/10.1093/sleep/zsaa056.1081 (2020).
Geoffroy, P. A., Tebeka, S., Blanco, C., Dubertret, C. & Le Strat, Y. Shorter and longer durations of sleep are associated with an increased twelve-month prevalence of psychiatric and substance use disorders: Findings from a nationally representative survey of US adults (NESARC-III). J. Psychiatr Res. 124, 34–41. https://doi.org/10.1016/j.jpsychires.2020.02.018 (2020).
Zhu, X. et al. Neutrophil/lymphocyte, platelet/lymphocyte, monocyte/lymphocyte ratios and systemic immune-inflammation index in patients with depression. Bratisl Lek Listy 124, 471–474. https://doi.org/10.4149/bll_2023_072 (2023).
Gasparyan, A. Y., Ayvazyan, L., Mukanova, U., Yessirkepov, M. & Kitas, G. D. The platelet-to-lymphocyte ratio as an inflammatory marker in rheumatic diseases. Ann. Lab. Med. 39, 345–357. https://doi.org/10.3343/alm.2019.39.4.345 (2019).
Danese, E., Montagnana, M., Favaloro, E. J. & Lippi, G. Drug-induced thrombocytopenia: Mechanisms and laboratory diagnostics. Semin Thromb. Hemost. 46, 264–274. https://doi.org/10.1055/s-0039-1697930 (2020).
Gansner, J. M. et al. Plateletpheresis-associated lymphopenia in frequent platelet donors. Blood 133, 605–614. https://doi.org/10.1182/blood-2018-09-873125 (2019).
Strawbridge, R., Young, A. H. & Cleare, A. J. Biomarkers for depression: Recent insights, current challenges and future prospects. Neuropsychiatr Dis. Treat. 13, 1245–1262. https://doi.org/10.2147/ndt.S114542 (2017).
Hammen, C. Risk factors for depression: An autobiographical review. Annu. Rev. Clin. Psychol. 14, 1–28. https://doi.org/10.1146/annurev-clinpsy-050817-084811 (2018).
Funding
This study was supported by the National Natural Science Foundation of China [82173899].
Author information
Contributions
The study was designed by J.J.Z., D.L.S. and L.L.Y. The machine learning model analysis was performed by Y.Z.J. and W.L. D.L.S., L.L.Y. and X.Q.W. provided medical expertise in the interpretation of model outputs. X.Q.W. was involved in the data preparation. The statistical analysis was conducted by Y.Q.Y. and Y.L.T. The manuscript was written by K.Z.H., L.L.Y., Y.Z.J. and D.L.S. K.Z.H. polished the manuscript, which was critically reviewed by all authors. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethics approval and consent to participate
The retrospective study was conducted in accordance with the principles of the Declaration of Helsinki and approved by the Ethics Committee of Nanjing First Hospital. The ethics approval number is KY20240603-KS-02. Due to the retrospective nature of the study and the anonymity of the data, the requirement for informed consent was waived by the Ethics Committee of Nanjing First Hospital. All survey participants were informed that their participation was voluntary.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Yang, L., Jin, Y., Lu, W. et al. Application of machine learning in depression risk prediction for connective tissue diseases. Sci Rep 15, 1706 (2025). https://doi.org/10.1038/s41598-025-85890-7
DOI: https://doi.org/10.1038/s41598-025-85890-7