Application of machine learning in depression risk prediction for connective tissue diseases

Yang, Leilei; Jin, Yuzhan; Lu, Wei; Wang, Xiaoqin; Yan, Yuqing; Tong, Yulan; Su, Dinglei; Huang, Kaizong; Zou, Jianjun

doi:10.1038/s41598-025-85890-7

Article
Open access
Published: 11 January 2025

Leilei Yang¹^na1,
Yuzhan Jin^2,3^na1,
Wei Lu^2,3^na1,
Xiaoqin Wang¹,
Yuqing Yan^2,3,
Yulan Tong^2,3,
Dinglei Su¹,
Kaizong Huang^3,4 &
…
Jianjun Zou^3,4

Scientific Reports volume 15, Article number: 1706 (2025)

Abstract

This study retrospectively collected clinical data from 480 patients with connective tissue diseases (CTDs) at Nanjing First Hospital between August 2019 and December 2023 to develop and validate a multi-classification machine learning (ML) model for assessing depression risk. To address the limitations of traditional assessment tools, six ML models were constructed on features selected by univariate analysis and the LASSO algorithm. The categorical boosting (Catboost) model was the best performer, demonstrating strong predictive ability across different depression severity levels (none_F1 = 0.879, mild_F1 = 0.627, moderate and severe_F1 = 0.588). Additionally, the study provided an interpretation of the best-performing model using SHAP and developed a user-friendly R Shiny application (https://macnomogram.shinyapps.io/Catboost/) to facilitate clinical use. The findings suggest that the Catboost model represents a significant advancement in assessing depression risk among CTD patients, highlighting the potential of ML in enhancing mental health management for this patient population.

Introduction

Depression is highly prevalent worldwide, ranks as the second leading cause of disability, and is a major risk factor for suicide^1,2. Connective tissue diseases (CTDs), including systemic lupus erythematosus (SLE), Sjögren’s syndrome (SS), rheumatoid arthritis (RA), and others, are autoimmune conditions that lead to both physical and emotional disturbances, notably depressive symptoms, thereby reducing quality of life^3,4,5,6.

The underlying causes of depression in CTD patients are complex, potentially involving autoantibodies, genetic factors, inflammatory cytokines, and side effects of medications such as glucocorticoids⁷. Evidence suggests that inflammation and specific autoantibodies in conditions like SLE significantly correlate with major depression^8,9. The cytokine hypothesis supports this, proposing that inflammation induced by various stressors plays a key role in depression¹⁰.

However, traditional tools for assessing depression, like the Patient Health Questionnaire-9 (PHQ-9), Hamilton Depression Rating Scale (HDRS), and Beck Depression Inventory (BDI), though widely used, have limitations. They demand considerable time and clinical expertise, underscoring the need for improved depression assessment methods in CTD patients^11,12,13. Amidst these challenges, machine learning (ML) is emerging as a promising tool in the medical field, particularly in psychiatric risk assessment. Yet, there is a research gap specifically concerning depression risk in CTD patients. Most existing studies rely on simple statistical models or descriptive analyses, which do not fully utilize the available data to uncover patterns related to depression risk^14,15,16,17,18,19.

This study aims to develop an advanced ML-based multi-classification model for assessing depression risk in CTD patients. By systematically analyzing comprehensive data including clinical, laboratory, and psychological metrics, we intend to identify critical predictors of depression risk. Advanced ML algorithms will then be used to create a predictive model that accurately assesses this risk, enhancing mental health management and treatment strategies for this patient group.

Materials and methods

Study design

The study protocol strictly follows the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) reporting guidelines²⁰. A detailed workflow chart of the study is presented in Fig. 1.

Participants

This study was conducted from August 1, 2019 to December 1, 2023 in the inpatient unit of the Department of Rheumatology and Immunology of Nanjing First Hospital. All participants had a definite diagnosis of a CTD, including SLE, SS, RA, systemic sclerosis (SSc), dermatomyositis (DM), polymyositis (PM), and mixed connective tissue disease (MCTD). The inclusion criteria were: (1) Chinese citizens aged ≥ 18 years, (2) voluntary participation in the survey, and (3) ability to complete the PHQ-9 questionnaire independently. The exclusion criteria were: (1) a history of severe mental illness, (2) prior use of medication to treat mental disorders, (3) pregnancy or lactation, (4) a history of malignant tumor, and (5) disability caused by diseases other than CTDs.

Data collection

We collected comprehensive patient data in a structured format, including demographics (age, disease duration, education, sleep duration, gender, marital status), comorbidity history (including other CTDs, thyroid diseases, diabetes, chronic diseases, interstitial lung disease (ILD), pulmonary arterial hypertension (PAH), infection, and fibromyalgia), medication use (corticosteroids (represented as prednisone equivalent), biological agents, classical immunosuppressants (i.e. cyclophosphamide, mycophenolate mofetil, cyclosporine A, tacrolimus and azathioprine), and conventional synthetic disease-modifying anti-rheumatic drugs (csDMARDs) (i.e. methotrexate, leflunomide, hydroxychloroquine, sulfasalazine and iguratimod)), and laboratory test results (WBC count, hemoglobin, platelet count, neutrophil count, lymphocyte count, autoantibodies, inflammatory cytokines and immunological parameters). Pain was assessed with the Visual Analog Scale (VAS), on which a score of 0 indicates no pain and a score of 10 indicates the most severe, intolerable pain. Disease activity of SLE was assessed with the Systemic Lupus Erythematosus Disease Activity Index 2000 (SLEDAI-2000) and the Systemic Lupus Erythematosus Disease Activity Score (SLEDAS), and disease activity of RA was assessed with the Disease Activity Score (DAS) 28-CRP.

Depression assessment

We employed the PHQ-9 to assess depression severity in accordance with established diagnostic criteria. The PHQ-9 is a well-validated and reliable tool for measuring symptoms of major depressive episodes over the past two weeks²¹. It comprises nine items, each scored from 0 to 3 points, resulting in a total score ranging from 0 to 27 points. Depression severity was categorized based on standard PHQ-9 score cutoffs: 0–4 points = no depression, 5–9 points = mild depression, and 10 or greater points = moderate or severe depression. In this study, we formulated depression prediction as a three-class classification task: no depression (PHQ-9 score 0–4), mild depression (PHQ-9 score 5–9), and moderate or severe depression (PHQ-9 score ≥ 10).
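The cutoffs above can be written as a small labelling function. This is a minimal sketch, not the authors' code; the function name `phq9_class` and the class label strings are hypothetical and chosen only for illustration.

```python
def phq9_class(item_scores):
    """Map nine PHQ-9 item scores (each 0-3) to a total and one of three classes."""
    if len(item_scores) != 9 or any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("PHQ-9 needs nine item scores, each 0, 1, 2 or 3")
    total = sum(item_scores)              # total score ranges from 0 to 27
    if total <= 4:
        return total, "none"              # 0-4: no depression
    if total <= 9:
        return total, "mild"              # 5-9: mild depression
    return total, "moderate_or_severe"    # >= 10: moderate or severe depression


print(phq9_class([1, 1, 1, 1, 1, 0, 0, 0, 0]))   # (5, 'mild')
```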

Data preprocessing

The raw data underwent comprehensive preprocessing to address missing values, data leakage, and feature standardization. Samples with more than 20% missing values and features with more than 25% missing values were excluded. To prevent data leakage, the dataset was split into training (80%) and test (20%) sets using the holdout method. The training set was used for feature selection, model training, and hyperparameter tuning. Hyperparameters were chosen on the training set by grid search with 5-fold cross-validation, selecting the configuration that maximized accuracy. The test set was used only to verify the generalization of the proposed models.
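The holdout split and the cross-validation folds used for tuning can be sketched with the standard library alone. This is an illustration under stated assumptions: the seed, the unstratified shuffle, and the helper names `holdout_split` and `k_folds` are hypothetical, and the study's actual splitting code is not given in the text.

```python
import random


def holdout_split(n_samples, test_fraction=0.2, seed=0):
    """Randomly split sample indices into training and test index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)      # seeded only for reproducibility
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_folds(indices, k=5):
    """Partition training indices into k disjoint validation folds."""
    return [indices[i::k] for i in range(k)]


train_idx, test_idx = holdout_split(480)      # 480 eligible patients
folds = k_folds(train_idx, k=5)               # folds for grid-search tuning
print(len(train_idx), len(test_idx))          # 384 96
print([len(f) for f in folds])                # [77, 77, 77, 77, 76]
```

Because the test indices never enter `folds`, every tuning decision is made on training data only, which is the leakage safeguard the paragraph describes.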

Feature selection

A two-step feature selection process was employed. First, univariate analysis identified statistically significant features (p < 0.05) potentially associated with depression. Subsequently, the LASSO algorithm with hyperparameter lambda (λ) was applied to the training set to further refine the feature subset. LASSO shrinks the coefficients of irrelevant features to zero, resulting in the selection of features with non-zero coefficients²².
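How an L1 penalty drives coefficients exactly to zero can be seen in the textbook special case of orthonormal predictors, where the LASSO estimate is the soft-thresholded least-squares coefficient. The sketch below illustrates only that mechanism; it is not the multi-class LASSO fit used in the study, and the feature names and coefficient values are made up.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


# Hypothetical unpenalized coefficients for three illustrative features.
ols_coefs = {"feature_a": 0.50, "feature_b": -0.12, "feature_c": 0.05}
lam = 0.15                                    # illustrative penalty strength

lasso_coefs = {name: soft_threshold(b, lam) for name, b in ols_coefs.items()}
selected = [name for name, b in lasso_coefs.items() if b != 0.0]
print(selected)                               # ['feature_a']
```

A larger λ zeroes more coefficients, so the size of the selected subset is controlled by the choice of λ.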

Data imputation and normalization

Missing values were imputed using K-Nearest Neighbors (KNN) for continuous variables and mode imputation for categorical variables in both the training and test sets. Additionally, continuous features were normalized using Z-score standardization, while multi-categorical features were one-hot encoded to create dummy variables, prior to LASSO-based feature selection.
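The standardization, mode imputation, and one-hot encoding steps can be sketched as follows. This is a minimal illustration, not the study's pipeline: KNN imputation is omitted, all helper names are hypothetical, and fitting the statistics on the training data before applying them to the test data is an assumption made here to stay consistent with the leakage safeguard described earlier.

```python
from collections import Counter
from statistics import mean, pstdev


def fit_zscore(train_values):
    """Learn mean and population SD from observed (non-missing) training values."""
    observed = [v for v in train_values if v is not None]
    return mean(observed), pstdev(observed)


def apply_zscore(values, mu, sigma):
    """Standardize with previously fitted statistics; missing entries stay None."""
    return [None if v is None else (v - mu) / sigma for v in values]


def mode_impute(values):
    """Replace missing categorical entries (None) with the most frequent level."""
    mode = Counter(v for v in values if v is not None).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


def one_hot(values, categories):
    """Encode each value as a 0/1 indicator vector over the given categories."""
    return [[1 if v == c else 0 for c in categories] for v in values]


mu, sigma = fit_zscore([1.0, 2.0, None, 3.0])       # toy training column
print(apply_zscore([2.0, 4.0], mu, sigma))           # toy test column
print(one_hot(mode_impute(["a", None, "a", "b"]), ["a", "b"]))
```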

ML models

Following feature selection, we developed and evaluated six multi-classification ML models to predict depression severity: logistic regression (LR), support vector machine (SVM), random forest classifier (RFC), light gradient boosting machine (LGBM), categorical boosting (Catboost), and artificial neural network (ANN). All models incorporated 10-fold cross-validation during training. When training each model, we addressed class imbalance by adjusting the class-weight parameter. For LR, SVM, RFC, LGBM, and Catboost, the algorithms internally select the strategy (one-vs-one or one-vs-rest) for handling multi-classification tasks based on the data characteristics. To prevent overfitting in the ANN model, we introduced dropout layers and early stopping during training. All models were implemented using the scikit-learn (version 1.1.2), lightgbm (version 3.3.2), Catboost (version 1.1.1), and Keras (version 2.9.0) libraries in Python (version 3.9.12).
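The text does not report the class weights that were used. One common choice, shown here purely as an assumption, is the "balanced" heuristic that weights each class by n_samples / (n_classes × class count), so that rarer classes count more in the loss. The counts below are the training-cohort class sizes reported in the Results; the function name and label strings are hypothetical.

```python
def balanced_class_weights(class_counts):
    """Weight each class inversely to its frequency ('balanced' heuristic)."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {label: n_samples / (n_classes * count)
            for label, count in class_counts.items()}


# Training-cohort class sizes as reported in the Results (n = 384).
weights = balanced_class_weights(
    {"none": 185, "mild": 121, "moderate_or_severe": 78}
)
for label, weight in weights.items():
    print(f"{label}: {weight:.3f}")
```

With these weights every class contributes the same total weight (384 / 3 = 128), so the minority moderate-or-severe class is no longer dominated by the majority class during training.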

Model evaluation

The hold-out test set was used exclusively to evaluate the generalizability of the developed models. Some common binary-classification evaluation metrics (e.g., the area under the curve (AUC) and F1-score) are not directly applicable to multi-class problems. Therefore, we employed a comprehensive suite of evaluation metrics to assess multi-classification model performance. This included: (1) Confusion Matrix: Visualized the distribution of true and predicted depression classifications across categories (no, mild, moderate/severe); (2) Multi-class ROC Curves: Illustrated the trade-off between sensitivity and specificity for each depression class; (3) Macro-average AUC: Evaluated overall model performance by averaging AUC scores across all depression categories; (4) Kappa Statistic: Assessed model agreement with the true classifications, accounting for class imbalance; (5) Average Precision, Recall, and F1-score. Because CTD patients experience varying levels of depression in real-world scenarios, the classes are imbalanced, and not all prediction levels should be treated as equally valuable. In a clinical setting, the ideal predictive model would be more effective at identifying depression-positive cases, that is, mild, moderate, and severe cases.
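The per-class precision, recall, and F1-score and the kappa statistic can all be derived from the confusion matrix. The sketch below uses a made-up 3 × 3 matrix (rows are true classes, columns are predicted classes) purely for illustration; the numbers are not results from this study.

```python
def per_class_metrics(cm):
    """Return (precision, recall, F1) per class; cm[i][j] = true i, predicted j."""
    k = len(cm)
    metrics = []
    for i in range(k):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                          # true i, predicted other
        fp = sum(cm[r][i] for r in range(k)) - tp     # predicted i, true other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append((precision, recall, f1))
    return metrics


def cohen_kappa(cm):
    """Agreement between true and predicted labels, corrected for chance."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    observed = sum(cm[i][i] for i in range(k)) / n
    expected = sum(
        sum(cm[i]) * sum(cm[r][i] for r in range(k)) for i in range(k)
    ) / n ** 2
    return (observed - expected) / (1 - expected)


toy_cm = [[8, 1, 1],     # illustrative counts only
          [2, 6, 2],
          [1, 1, 8]]
f1_scores = [f1 for _, _, f1 in per_class_metrics(toy_cm)]
macro_f1 = sum(f1_scores) / len(f1_scores)
print([round(f, 3) for f in f1_scores], round(macro_f1, 3))
print(round(cohen_kappa(toy_cm), 3))                  # 0.6
```

Reporting F1 per class, as the study does with none_F1, mild_F1, and moderate and severe_F1, keeps a weak minority class from being hidden behind a strong majority class.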

Model interpretation

The SHapley Additive exPlanations (SHAP) technique, which quantifies each variable’s influence on the model output, was applied to gain insight into the best-performing model²³. We used SHAP to evaluate the contribution of each feature in the best-performing model to the prediction outcomes on the test set. In the three-class classification task, the model’s objective is to assign each sample to one of the three categories. Consequently, we calculated the SHAP values for each category. For each sample, the SHAP values form an array of size (1, k, 3), where each element corresponds to the contribution of a feature to a specific category (with k representing the number of features in the model). We generated separate SHAP bar and dot plots for each category using the test set data. Furthermore, to visualize the distribution of key features across different depression severities in a sample of patients, we generated a grouped-clustered heatmap for ten randomly selected CTD patients using the “pheatmap” package in R (version 4.2.1). Finally, for enhanced clinical usability, the best-performing multi-classification model was deployed as an R Shiny application, facilitating user-friendly model interaction.
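The per-category feature importance shown in a SHAP bar plot is the mean absolute SHAP value of each feature over the samples, computed separately for each class. The sketch below shows that aggregation on a nested list shaped (n samples, k features, 3 classes); the values are made up, and the function name `mean_abs_shap` is hypothetical rather than part of the SHAP library.

```python
def mean_abs_shap(shap_values):
    """Return importance[c][j]: mean |SHAP| of feature j for class c.

    shap_values[i][j][c] is the contribution of feature j to class c
    for sample i.
    """
    n = len(shap_values)
    k = len(shap_values[0])
    n_classes = len(shap_values[0][0])
    return [
        [sum(abs(shap_values[i][j][c]) for i in range(n)) / n for j in range(k)]
        for c in range(n_classes)
    ]


# Two samples, two features, three classes (illustrative values only).
toy_shap = [
    [[0.2, -0.1, -0.1], [0.0, 0.3, -0.3]],
    [[-0.4, 0.1, 0.3], [0.2, -0.1, -0.1]],
]
importance = mean_abs_shap(toy_shap)
for c, row in enumerate(importance):
    print(c, [round(v, 3) for v in row])
```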

Statistical analysis

This study aimed to compare the differences in demographic, clinical, and laboratory characteristics across three levels of depression. The initial step involved evaluating the normality of continuous variables using the Shapiro-Wilk test. Continuous variables that followed a normal distribution were presented as mean ± standard deviation (SD) and were analyzed using one-way analysis of variance (ANOVA). In contrast, those not following a normal distribution were presented as median (interquartile range, IQR) and assessed with the Kruskal-Wallis test. Categorical variables were compared using either the Chi-square (χ²) or Fisher’s exact test, depending on their distribution and sample size. A p-value of less than 0.05 was considered statistically significant for all tests. The entire statistical analysis was performed using the “compareGroups” package in R version 4.2.1.

Results

Participants’ characteristics

This study recruited a total of 500 patients, of whom 480 with confirmed diagnoses were deemed eligible. The median age of the participants was 58.5 years (IQR: 48.8–68 years), and 87.9% were female. These 480 patients with CTDs were randomly divided into two groups: the training cohort (n = 384) and the test cohort (n = 96). Comparative analysis of baseline characteristics between the two cohorts indicated a general balance, as detailed in Supplementary Table S1.

Depression, the outcome of this study, was categorized into three levels: no depression, mild depression, and moderate or severe depression. Within the training cohort, the prevalence rates for these categories were 48.2% (n = 185/384), 31.5% (n = 121/384), and 20.3% (n = 78/384), respectively. Correspondingly, in the test cohort, the prevalence rates were 59.4% (n = 57/96), 24.0% (n = 23/96), and 16.7% (n = 16/96), respectively.

Feature selection

In this research, forty-six potential variables related to depression in patients with CTDs were initially considered. Of these, twenty-three variables had missing values, and six variables with more than 25% missing values were excluded from the analysis (Supplementary Table S2). Univariate analysis (Supplementary Table S3) subsequently identified eleven variables significantly associated with the depression outcome: sleep duration, presence of other CTDs, ILD, PAH, infection, fibromyalgia, fatigue, use of classical immunosuppressants, platelet count, lymphocyte count, and anti-Ro52 antibodies.

These identified eleven variables were further analyzed using the LASSO algorithm to discern the optimal subset. The LASSO algorithm ultimately determined that five variables had a significant association with the multi-classification depression outcome: fatigue, anti-Ro52 antibodies, sleep duration, platelet count, and lymphocyte count.

Model performance

The aforementioned five variables were used to construct the following six different multi-classification ML models: LR, SVM, RFC, LGBM, Catboost, and ANN. The optimal hyperparameters for each model are documented in Supplementary Table S5. Supplementary Table S4 and Figure S1 show the performance metrics of these models within the training cohort. Figure S2 shows the confusion matrix of each multi-class model on the training cohort. Notably, all models except LR and ANN ranked in the top four across various metrics, including mild_F1 and moderate and severe_F1. These models have the potential to correctly detect more positive cases of CTD-related depression.

Furthermore, the validity of the proposed models was assessed using the test cohort. The performance of the six models within the test cohort is detailed in Table 1 and illustrated in Fig. 2. The confusion matrices of the six multi-classification ML models in the test cohort are shown in Fig. 3. Among these, the Catboost model exhibited the highest mild_F1 and moderate and severe_F1. These results highlight the robust performance of the Catboost model and its effectiveness in classifying depression levels in patients with CTDs.

Table 1 The performance of the following six multi-classification ML models in the test cohort (n = 96).

Model interpretation

The Catboost model was explained through the application of the SHAP algorithm, which calculated the average absolute SHAP values to quantify the impact of each variable.

Figure 4c illustrates the feature importance and the impact of each variable on the model output for the Catboost model across the three depression categories (none, mild, and moderate or severe). Notably, the analysis highlights that higher levels of fatigue, presence of anti-Ro52 antibodies, reduced sleep duration, lower platelet counts, and decreased lymphocyte counts were associated with increased SHAP values. This suggests these factors correlate with an elevated risk of depression, as indicated by the model. Additionally, Fig. 4d provides a comparative visualization of the contributions of these five critical variables across a subset of ten randomly selected patients from the test cohort.

Model deployment

To enhance the practical applicability of our proposed Catboost model in clinical settings, the model was deployed to the cloud, making it accessible to healthcare professionals for real-time depression risk assessment in patients with CTDs. The model is hosted on a web-based platform available at https://macnomogram.shinyapps.io/Catboost/. The user interface of this online tool is illustrated in Figure S3, designed to provide a user-friendly experience for clinicians and researchers alike.

Discussion

This study introduces the Catboost model as a predictive tool for assessing the risk of depression, demonstrating superior performance over competing models across multiple metrics such as AUC, accuracy, precision, recall, F1 score, and kappa coefficient. The model’s robustness in parsing complex medical data reveals its potential for accurate depression risk predictions.

Unlike traditional tools such as the PHQ-9 and BDI, the Catboost model integrates and analyzes a broad range of clinical and laboratory data, automating the analysis process. This circumvents the need for manual scoring by clinicians, facilitating quicker, more objective assessments. The development of a web-based calculator enhances the model’s clinical utility, providing healthcare professionals with a readily accessible tool.

This research contributes to the literature on ML applications in refining depression risk assessment strategies, expanding upon previous studies^15,16. It acknowledges depression’s multifactorial nature²⁴, which underscores the need for comprehensive approaches to understanding and managing this condition. The Catboost model marks a notable advancement in applying ML to mental health, offering a nuanced and efficient method for identifying at-risk individuals^25,26.

The relationship between fatigue and depression is intricate, influenced by various biological and psychological factors. Recent research indicates that neuroinflammation plays a central role in these concurrent conditions²⁷, with further studies linking immune dysregulation in multiple sclerosis patients to both conditions²⁷. Tarasiuk et al. propose that inflammation, oxidative stress, and neurodegeneration contribute to the convergence of fatigue and depression, suggesting comprehensive treatment strategies focusing on inflammatory processes and brain arousal systems to prevent progression from fatigue to depression²⁸.

In recent years, anti-Ro52 has been found in a variety of CTDs and has attracted widespread attention from rheumatologists. Many studies have confirmed that anti-Ro52 is associated with a higher frequency of ILD in CTD patients²⁹. Interestingly, patients with ILD are more prone to depression³⁰. Further studies have explored the role of anti-Ro52 antibodies in psychiatric symptoms among CTD patients, identifying them as potential biomarkers for heightened risk of depression and anxiety⁹. Xu et al. investigated their prognostic value in specific autoimmune conditions, linking these antibodies to more severe disease trajectories and potential psychiatric symptom development³¹.

Recent studies highlight the complex and bidirectional relationship between sleep duration and depression, showing that both insufficient and excessive sleep can increase the risk of depressive disorders. Bender et al. identified a U-shaped correlation between sleep duration and depression risk, revealing that individuals who sleep less than seven hours or more than nine hours nightly face a heightened risk of depression³². This observation is supported by Geoffroy et al., who also found a U-shaped link between sleep duration and mental health outcomes. This reinforces the importance of promoting adequate sleep as a preventative strategy against psychiatric conditions³³.

Recent research is illuminating the complex connections between blood cell counts, specifically platelets and lymphocytes, and depression, positing that systemic inflammation may mediate this relationship. Zhu et al. have reported findings that link elevated levels of inflammatory markers, such as the monocyte-to-lymphocyte ratio (MLR), to individuals with depression³⁴. This link to systemic inflammation suggests these markers could serve as potential biomarkers for depression, bolstering the theory that inflammation plays a critical role in the pathophysiology of the condition. Additionally, the relevance of another inflammatory marker, the platelet-to-lymphocyte ratio (PLR), has been explored by researchers like Gasparyan et al. and Danese et al.^35,36. Furthermore, Gansner et al. discussed the effects of plateletpheresis, a common procedure in blood donation, on the immune system, highlighting its potential implications for inflammatory processes³⁷. This body of research could lead to the identification of novel biomarkers or therapeutic targets for managing depression.

This study not only builds upon our previous research identifying specific antibodies as depression risk factors in CTD patients⁹, but also integrates these findings into a predictive ML model. Rather than defining depressive outcomes as a simple yes or no, our multi-classification model opens up the possibility of more precise risk stratification of patients (no depression, mild depression, moderate and severe depression), allowing clinicians to target interventions more precisely based on the different predictions for each patient. In addition, this model also offers clinicians new insights for enhancing mental health management and optimizing treatment strategies. Future studies will aim to improve feature extraction and selection methods and expand the model’s integration into broader medical information systems to support automated risk assessments and early intervention.

Our study also has some limitations, including potential biases from the single-center data collection and the dependence of ML model performance on data quality and feature selection. The exclusion of variables with excessive missing data, such as certain cytokines, might have affected the model’s performance³⁸. In addition, there are many risk factors for depression, including dysfunctional cognitions, stressful life events and circumstances, parental depression, and interpersonal dysfunction³⁹. These variables were not included in the study due to resource constraints, challenges in data collection and measurement, and interactions and complexities among variables.

Conclusions

This study introduces a novel multi-classification ML-based model, employing the Catboost algorithm, for the assessment of depression risk in patients with CTDs. Through rigorous analysis of multidimensional patient data, including clinical information, laboratory test results, and psychological assessments, we identified key variables substantially associated with depression risk. The Catboost model outperformed existing assessment tools and other ML models in accuracy and efficiency. The utilization of SHAP for model interpretation further elucidates the contributions of specific variables to depression risk, enhancing the model’s clinical relevance. Deployed as an R Shiny application, the model offers a practical tool for healthcare professionals, facilitating early identification and management of depression in CTD patients. Despite these promising results, future research should focus on expanding the model’s generalizability through multi-center studies and integrating additional biomarkers to refine its predictive capability. This study demonstrates the potential of advanced ML techniques in transforming the approach to mental health assessment within the medical field, particularly for patients with complex autoimmune diseases.

Data availability

Data presented in this study were obtained from patients admitted to the Department of Rheumatology and Immunology, Nanjing First Hospital, who consented to participate in the study. Owing to data protection rules, we are not allowed to share personal-level data. The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

References

Lim, G. Y. et al. Prevalence of depression in the community from 30 countries between 1994 and 2014. Sci. Rep. 8, 2861. https://doi.org/10.1038/s41598-018-21243-x (2018).
World Health Organization. Suicide worldwide in 2019: Global health estimates. iv, 28 (World Health Organization, 2021).
Kósa, F. et al. High risk of depression, anxiety, and an unfavorable complex comorbidity profile is associated with SLE: A nationwide patient-level study. Arthritis Res. Ther. 24, 116. https://doi.org/10.1186/s13075-022-02799-6 (2022).
Shen, C. C., Yang, A. C., Kuo, B. I. & Tsai, S. J. Risk of psychiatric disorders following primary sjögren syndrome: A nationwide population-based retrospective cohort study. J. Rheumatol. 42, 1203–1208. https://doi.org/10.3899/jrheum.141361 (2015).
Costa, T., Rushton, S. P., Watson, S. & Ng, W. F. Depression in Sjögren’s syndrome mediates the relationship between pain, fatigue, sleepiness, and overall quality of life. Rheumatol. Immunol. Res. 4, 78–89. https://doi.org/10.2478/rir-2023-0012 (2023).
Didier, K. et al. Autoantibodies associated with connective tissue diseases: What meaning for clinicians? Front. Immunol. 9, 541. https://doi.org/10.3389/fimmu.2018.00541 (2018).
Ayres, A. et al. Cognitive performance in patients with myasthenia gravis: An association with glucocorticosteroid use and depression. Dement. Neuropsychol. 14, 315–323. https://doi.org/10.1590/1980-57642020dn14-030013 (2020).
Leng, Q. et al. Anti-ribosomal P protein antibodies and insomnia correlate with depression and anxiety in patients suffering from systemic lupus erythematosus. Heliyon 9, e15463. https://doi.org/10.1016/j.heliyon.2023.e15463 (2023).
Yang, L. et al. Association of anti-Ro52 antibody with depression and anxiety in patients with connective tissue diseases: An observational, single-centre, cross-sectional study. Clin. Exp. Rheumatol. https://doi.org/10.55563/clinexprheumatol/be9n92 (2023).
Grygiel-Górniak, B., Limphaibool, N. & Puszczewicz, M. Cytokine secretion and the risk of depression development in patients with connective tissue diseases. Psychiatry Clin. Neurosci. 73, 302–316. https://doi.org/10.1111/pcn.12826 (2019).
Fusar-Poli, P. et al. Prevention of psychosis: Advances in detection, prognosis, and intervention. JAMA Psychiatry. 77, 755–765. https://doi.org/10.1001/jamapsychiatry.2019.4779 (2020).
Ho, R. C., Mak, K. K., Chua, A. N., Ho, C. S. & Mak, A. The effect of severity of depressive disorder on economic burden in a university hospital in Singapore. Expert Rev. Pharmacoecon. Outcomes Res. 13, 549–559. https://doi.org/10.1586/14737167.2013.815409 (2013).
Acharya, U. R. et al. Automated EEG-based screening of depression using deep convolutional neural network. Comput. Methods Programs Biomed. 161, 103–113. https://doi.org/10.1016/j.cmpb.2018.04.012 (2018).
Nickson, D., Meyer, C., Walasek, L. & Toro, C. Prediction and diagnosis of depression using machine learning with electronic health records data: A systematic review. BMC Med. Inf. Decis. Mak. 23, 271. https://doi.org/10.1186/s12911-023-02341-x (2023).
Hossain, M. M., Asadullah, M., Hossain, M. A. & Amin, M. S. Prediction of depression using machine learning tools taking consideration of oversampling. Malays. J. Public. Health Med. 22, 244–253. https://doi.org/10.37268/mjphm/22/2/art.1564 (2022).
Muzafar, D., Khan, F. & Qayoom, M. Machine learning algorithms for depression detection and their comparison. https://doi.org/10.48550/arXiv.2301.03222 (2023).
Dong, C. et al. SVM-based model combining patients’ reported outcomes and lymphocyte phenotypes of depression in systemic lupus erythematosus. Biomolecules 13. https://doi.org/10.3390/biom13050723 (2023).
Jiang, W., Wang, X., Tao, D. & Zhao, X. Identification of common genetic characteristics of rheumatoid arthritis and major depressive disorder by bioinformatics analysis and machine learning. Front. Immunol. 14, 1183115. https://doi.org/10.3389/fimmu.2023.1183115 (2023).
Liao, J. et al. A cross-sectional study on the association of anxiety and depression with the disease activity of systemic lupus erythematosus. BMC Psychiatry. 22, 591. https://doi.org/10.1186/s12888-022-04236-z (2022).
Collins, G. S., Reitsma, J. B., Altman, D. G. & Moons, K. G. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): The TRIPOD statement. BMJ 350, g7594. https://doi.org/10.1136/bmj.g7594 (2015).
Costantini, L. et al. Screening for depression in primary care with patient health questionnaire-9 (PHQ-9): A systematic review. J. Affect. Disord. 279, 473–483. https://doi.org/10.1016/j.jad.2020.09.131 (2021).
Sun, H. et al. Multi-classification model incorporating radiomics and clinic-radiological features for predicting invasiveness and differentiation of pulmonary adenocarcinoma nodules. Biomed. Eng. Online. 22, 112. https://doi.org/10.1186/s12938-023-01180-1 (2023).
Mosch, B., Hagena, V., Herpertz, S. & Diers, M. Brain morphometric changes in fibromyalgia and the impact of psychometric and clinical factors: A volumetric and diffusion-tensor imaging study. Arthritis Res. Ther. 25, 81. https://doi.org/10.1186/s13075-023-03064-0 (2023).
Prak, R. F. et al. Fatigue in primary Sjögren’s syndrome is associated with an objective decline in physical performance, pain and depression. Clin. Exp. Rheumatol. 40, 2318–2328. https://doi.org/10.55563/clinexprheumatol/70s6cs (2022).
Beurel, E., Toups, M. & Nemeroff, C. B. The bidirectional relationship of depression and inflammation: Double trouble. Neuron 107, 234–256. https://doi.org/10.1016/j.neuron.2020.06.002 (2020).
Huang, M. et al. High rates of depression anxiety and suicidal ideation among inpatients in general hospital in China. Int. J. Psychiatry Clin. Pract. 23, 99–105. https://doi.org/10.1080/13651501.2018.1539179 (2019).
Lee, C. H. & Giuliani, F. The role of inflammation in depression and fatigue. Front. Immunol. 10, 1696. https://doi.org/10.3389/fimmu.2019.01696 (2019).
Tarasiuk, J. et al. Co-occurrence of fatigue and depression in people with multiple sclerosis: A mini-review. Front. Neurol. 12, 817256. https://doi.org/10.3389/fneur.2021.817256 (2021).
Nayebirad, S. et al. Association of anti-Ro52 autoantibody with interstitial lung disease in autoimmune diseases: A systematic review and meta-analysis. BMJ Open Respir. Res. 10. https://doi.org/10.1136/bmjresp-2023-002076 (2023).
Shen, Q. et al. Pain is a common problem in patients with ILD. Respir Res. 21, 297. https://doi.org/10.1186/s12931-020-01564-0 (2020).
Xu, A. et al. Prognostic values of anti-Ro52 antibodies in anti-MDA5-positive clinically amyopathic dermatomyositis associated with interstitial lung disease. Rheumatol. (Oxford). 60, 3343–3351. https://doi.org/10.1093/rheumatology/keaa786 (2021).
Bender, A. M., Babins-Wagner, R. & Laughton, A. 1086 Non-linear associations between depression and sleep duration in an international sample of 16,997 respondents. Sleep 43, A413. https://doi.org/10.1093/sleep/zsaa056.1081 (2020).
Geoffroy, P. A., Tebeka, S., Blanco, C., Dubertret, C. & Le Strat, Y. Shorter and longer durations of sleep are associated with an increased twelve-month prevalence of psychiatric and substance use disorders: Findings from a nationally representative survey of US adults (NESARC-III). J. Psychiatr Res. 124, 34–41. https://doi.org/10.1016/j.jpsychires.2020.02.018 (2020).
Zhu, X. et al. Neutrophil/lymphocyte, platelet/lymphocyte, monocyte/lymphocyte ratios and systemic immune-inflammation index in patients with depression. Bratisl Lek Listy 124, 471–474. https://doi.org/10.4149/bll_2023_072 (2023).
Gasparyan, A. Y., Ayvazyan, L., Mukanova, U., Yessirkepov, M. & Kitas, G. D. The platelet-to-lymphocyte ratio as an inflammatory marker in rheumatic diseases. Ann. Lab. Med. 39, 345–357. https://doi.org/10.3343/alm.2019.39.4.345 (2019).
Danese, E., Montagnana, M., Favaloro, E. J. & Lippi, G. Drug-induced thrombocytopenia: Mechanisms and laboratory diagnostics. Semin Thromb. Hemost. 46, 264–274. https://doi.org/10.1055/s-0039-1697930 (2020).
Gansner, J. M. et al. Plateletpheresis-associated lymphopenia in frequent platelet donors. Blood 133, 605–614. https://doi.org/10.1182/blood-2018-09-873125 (2019).
Strawbridge, R., Young, A. H. & Cleare, A. J. Biomarkers for depression: Recent insights, current challenges and future prospects. Neuropsychiatr Dis. Treat. 13, 1245–1262. https://doi.org/10.2147/ndt.S114542 (2017).
Hammen, C. Risk factors for depression: An autobiographical review. Annu. Rev. Clin. Psychol. 14, 1–28. https://doi.org/10.1146/annurev-clinpsy-050817-084811 (2018).

Funding

This study was supported by the National Natural Science Foundation of China (grant no. 82173899).

Author information

Leilei Yang, Yuzhan Jin and Wei Lu contributed equally to this work.

Authors and Affiliations

Department of Rheumatology and Immunology, Nanjing First Hospital, Nanjing Medical University, Nanjing, China
Leilei Yang, Xiaoqin Wang & Dinglei Su
School of Basic Medicine and Clinical Pharmacy, China Pharmaceutical University, Nanjing, China
Yuzhan Jin, Wei Lu, Yuqing Yan & Yulan Tong
Department of Clinical Pharmacology, Nanjing First Hospital, Nanjing Medical University, Nanjing, China
Yuzhan Jin, Wei Lu, Yuqing Yan, Yulan Tong, Kaizong Huang & Jianjun Zou
Department of Pharmacy, Nanjing First Hospital, China Pharmaceutical University, Nanjing, China
Kaizong Huang & Jianjun Zou

Contributions

The study was designed by J.J.Z., D.L.S. and L.L.Y. The machine learning model analysis was performed by Y.Z.J. and W.L. Medical expertise in the interpretation of model outputs was provided by D.L.S., L.L.Y. and X.Q.W. X.Q.W. was involved in data preparation. The statistical analysis was conducted by Y.Q.Y. and Y.L.T. The manuscript was written by K.Z.H., L.L.Y., Y.Z.J. and D.L.S. K.Z.H. polished the manuscript, which was critically reviewed by all authors. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Dinglei Su, Kaizong Huang or Jianjun Zou.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethics approval and consent to participate

The retrospective study was conducted in accordance with the principles of the Declaration of Helsinki and approved by the Ethics Committee of Nanjing First Hospital. The ethics approval number is KY20240603-KS-02. Due to the retrospective nature of the study and the anonymity of the data, the requirement for informed consent was waived by the Ethics Committee of Nanjing First Hospital. All survey participants were informed that their participation was voluntary.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Yang, L., Jin, Y., Lu, W. et al. Application of machine learning in depression risk prediction for connective tissue diseases. Sci Rep 15, 1706 (2025). https://doi.org/10.1038/s41598-025-85890-7

Received: 01 October 2024
Accepted: 07 January 2025
Published: 11 January 2025
DOI: https://doi.org/10.1038/s41598-025-85890-7
