Introduction

Depression is highly prevalent worldwide, ranks as the second leading cause of disability, and is a major risk factor for suicide1,2. Connective tissue diseases (CTDs), including systemic lupus erythematosus (SLE), Sjögren’s syndrome (SS), rheumatoid arthritis (RA), and others, are autoimmune conditions that cause both physical and emotional disturbances, notably depressive symptoms, thereby reducing quality of life3,4,5,6.

The causes of depression in CTD patients are complex and may involve autoantibodies, genetic factors, inflammatory cytokines, and side effects of medications such as glucocorticoids7. Evidence suggests that inflammation and specific autoantibodies in conditions such as SLE are significantly correlated with major depression8,9. This is consistent with the cytokine hypothesis, which proposes that inflammation induced by various stressors plays a key role in depression10.

However, traditional tools for assessing depression, such as the Patient Health Questionnaire-9 (PHQ-9), the Hamilton Depression Rating Scale (HDRS), and the Beck Depression Inventory (BDI), have limitations despite their wide use: they demand considerable time and clinical expertise, which underscores the need for improved depression assessment methods in CTD patients11,12,13. Machine learning (ML) is emerging as a promising tool in medicine, particularly for psychiatric risk assessment. Yet research specifically addressing depression risk in CTD patients remains scarce. Most existing studies rely on simple statistical models or descriptive analyses, which do not fully exploit the available data to uncover patterns related to depression risk14,15,16,17,18,19.

This study aims to develop an advanced ML-based multi-classification model for assessing depression risk in CTD patients. By systematically analyzing comprehensive data including clinical, laboratory, and psychological metrics, we intend to identify critical predictors of depression risk. Advanced ML algorithms will then be used to create a predictive model that accurately assesses this risk, enhancing mental health management and treatment strategies for this patient group.

Materials and methods

Study design

This study is reported in accordance with the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) guidelines20. A detailed workflow of the study is presented in Fig. 1.

Fig. 1

Overview of the study workflow.

Participants

This study was conducted from August 1, 2019 to December 1, 2023 in the inpatient ward of the Department of Rheumatology and Immunology, Nanjing First Hospital. All participants had a confirmed diagnosis of a CTD, including SLE, SS, RA, systemic sclerosis (SSc), dermatomyositis (DM), polymyositis (PM), and mixed connective tissue disease (MCTD). The inclusion criteria were: (1) Chinese citizens aged ≥ 18 years, (2) voluntary participation in the survey, and (3) ability to complete the PHQ-9 questionnaire independently. The exclusion criteria were: (1) a history of severe mental illness, (2) prior use of medications to treat mental disorders, (3) pregnancy or lactation, (4) a history of malignant tumor, and (5) disability caused by diseases other than CTDs.

Data collection

We collected comprehensive patient data in a structured format, including demographics (age, disease duration, education, sleep duration, gender, marital status), comorbidity history (other CTDs, thyroid diseases, diabetes, chronic diseases, interstitial lung disease (ILD), pulmonary arterial hypertension (PAH), infection, and fibromyalgia), medication use (corticosteroids, expressed as prednisone equivalent; biological agents; classical immunosuppressants, i.e., cyclophosphamide, mycophenolate mofetil, cyclosporine A, tacrolimus, and azathioprine; and conventional synthetic disease-modifying anti-rheumatic drugs (csDMARDs), i.e., methotrexate, leflunomide, hydroxychloroquine, sulfasalazine, and iguratimod), and laboratory test results (WBC count, hemoglobin, platelet count, neutrophil count, lymphocyte count, autoantibodies, inflammatory cytokines, and immunological parameters). Pain was assessed with a Visual Analog Scale (VAS), on which 0 indicates no pain and 10 indicates the most severe, intolerable pain. Disease activity was assessed with the Systemic Lupus Erythematosus Disease Activity Index 2000 (SLEDAI-2000) and the Systemic Lupus Erythematosus Disease Activity Score (SLEDAS) for SLE, and with the 28-joint Disease Activity Score based on C-reactive protein (DAS28-CRP) for RA.
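Corticosteroid exposure is reported as prednisone equivalent, but the conversion table is not given. The sketch below assumes the commonly cited anti-inflammatory equivalences (5 mg prednisone ≈ 5 mg prednisolone ≈ 4 mg methylprednisolone ≈ 20 mg hydrocortisone ≈ 0.75 mg dexamethasone); the function name is hypothetical and this is not necessarily the table the authors used.

```python
# ASSUMPTION: commonly cited equivalent doses (mg), each equal to 5 mg prednisone.
EQUIVALENT_DOSE_MG = {
    "prednisone": 5.0,
    "prednisolone": 5.0,
    "methylprednisolone": 4.0,
    "hydrocortisone": 20.0,
    "dexamethasone": 0.75,
}


def prednisone_equivalent(drug: str, daily_dose_mg: float) -> float:
    """Convert a daily glucocorticoid dose (hypothetical helper) to prednisone-equivalent mg."""
    key = drug.strip().lower()
    if key not in EQUIVALENT_DOSE_MG:
        raise ValueError(f"no equivalence factor for {drug!r}")
    if daily_dose_mg < 0:
        raise ValueError("dose must be non-negative")
    # Scale the dose by how many mg of prednisone one mg of this drug corresponds to.
    return daily_dose_mg * 5.0 / EQUIVALENT_DOSE_MG[key]


print(prednisone_equivalent("methylprednisolone", 8))  # 10.0
print(prednisone_equivalent("dexamethasone", 1.5))     # 10.0
```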

Depression assessment

We used the PHQ-9 to assess depression severity. The PHQ-9 is a well-validated, reliable tool for measuring symptoms of major depressive episodes over the past two weeks21. It comprises nine items, each scored from 0 to 3, giving a total score of 0 to 27. Using standard PHQ-9 cutoffs, we formulated depression prediction as a three-class classification task: no depression (PHQ-9 score 0–4), mild depression (PHQ-9 score 5–9), and moderate or severe depression (PHQ-9 score ≥ 10).
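The scoring and the three-class outcome described above can be sketched as follows. This is an illustration of the stated cutoffs only; the function names and the example answers are hypothetical.

```python
from typing import Sequence

# Class labels for the three-class formulation (index 0, 1, 2).
CLASSES = ("none", "mild", "moderate_or_severe")


def phq9_total(items: Sequence[int]) -> int:
    """Sum the nine PHQ-9 item scores (each 0-3), giving a total of 0-27."""
    if len(items) != 9:
        raise ValueError("PHQ-9 has exactly nine items")
    if any(score not in (0, 1, 2, 3) for score in items):
        raise ValueError("each item is scored 0, 1, 2 or 3")
    return sum(items)


def phq9_class(total: int) -> int:
    """Map a total to a class: 0 = none (0-4), 1 = mild (5-9), 2 = moderate/severe (>= 10)."""
    if not 0 <= total <= 27:
        raise ValueError("PHQ-9 total must lie in 0-27")
    if total <= 4:
        return 0
    if total <= 9:
        return 1
    return 2


answers = [1, 1, 0, 2, 1, 0, 1, 0, 0]  # hypothetical patient
total = phq9_total(answers)
print(total, CLASSES[phq9_class(total)])  # 6 mild
```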

Data preprocessing

The raw data underwent comprehensive preprocessing to handle missing values, prevent data leakage, and standardize features. Samples with more than 20% missing values and features with more than 25% missing values were excluded. To prevent data leakage, the dataset was split into training (80%) and test (20%) sets using the hold-out method. The training set was used for feature selection, model training, and hyperparameter tuning; hyperparameters were selected to maximize accuracy by grid search with 5-fold cross-validation on the training set. The test set was used only to assess the generalization of the proposed models.
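A minimal sketch of the missingness filter, the 80/20 hold-out split, and accuracy-based grid search with 5-fold cross-validation, using pandas and scikit-learn on synthetic data. The order of the two filters, the helper name `drop_sparse`, the logistic-regression estimator, and its parameter grid are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split


def drop_sparse(df: pd.DataFrame, max_row_missing: float = 0.20,
                max_col_missing: float = 0.25) -> pd.DataFrame:
    """Drop features, then samples, whose fraction of missing values exceeds a threshold."""
    # ASSUMPTION: features are filtered before samples; the text does not state the order.
    df = df.loc[:, df.isna().mean(axis=0) <= max_col_missing]
    return df.loc[df.isna().mean(axis=1) <= max_row_missing]


# Synthetic stand-in data (hypothetical): 200 patients, 6 features, 3 outcome classes.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 6)), columns=[f"x{i}" for i in range(6)])
X.loc[rng.random(200) < 0.4, "x5"] = np.nan  # about 40% missing, so x5 is dropped
y = pd.Series(rng.integers(0, 3, size=200))

X = drop_sparse(X)
y = y.loc[X.index]

# Hold-out split: 80% training, 20% test; the test set is held back for final evaluation.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Hyperparameters chosen by accuracy in 5-fold cross-validation on the training set only.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    scoring="accuracy",
    cv=5,
)
search.fit(X_train, y_train)
print(list(X.columns), X_train.shape, X_test.shape, search.best_params_)
```

The split here is not stratified, which matches the different class proportions later reported for the two cohorts; passing `stratify=y` to `train_test_split` would be the alternative.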

Feature selection

A two-step feature selection process was used. First, univariate analysis identified features significantly associated with depression (p < 0.05). Second, the least absolute shrinkage and selection operator (LASSO), with penalty parameter lambda (λ), was applied to the training set to refine this feature subset further. LASSO shrinks the coefficients of less informative features to exactly zero, and the features with non-zero coefficients are retained22.
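The two steps can be sketched as below. Assumptions: only continuous features are screened here (with a Kruskal-Wallis test; categorical features would use a chi-square test instead), and the LASSO step for the three-class outcome is implemented as an L1-penalized multinomial logistic regression from scikit-learn, with λ chosen by cross-validation. The paper does not state which LASSO implementation it used, so this is one plausible reading, not the authors' code.

```python
import numpy as np
from scipy.stats import kruskal
from sklearn.linear_model import LogisticRegressionCV
from sklearn.preprocessing import StandardScaler


def univariate_filter(X, y, alpha=0.05):
    """Step 1: keep columns whose Kruskal-Wallis p-value across outcome classes is < alpha."""
    keep = []
    for j in range(X.shape[1]):
        groups = [X[y == c, j] for c in np.unique(y)]
        if kruskal(*groups).pvalue < alpha:
            keep.append(j)
    return keep


def lasso_select(X, y, keep):
    """Step 2: L1-penalized multinomial fit; keep features with any non-zero class coefficient."""
    Xs = StandardScaler().fit_transform(X[:, keep])  # z-scores, so penalties are comparable
    model = LogisticRegressionCV(Cs=10, cv=5, penalty="l1", solver="saga",
                                 max_iter=5000, random_state=0)
    model.fit(Xs, y)
    nonzero = np.any(np.abs(model.coef_) > 1e-8, axis=0)  # coef_ has shape (n_classes, n_kept)
    return [keep[i] for i in np.flatnonzero(nonzero)]


# Synthetic example (hypothetical): two informative and four noise features.
rng = np.random.default_rng(1)
n = 300
y = rng.integers(0, 3, size=n)
signal = y[:, None] + rng.normal(scale=0.8, size=(n, 2))
noise = rng.normal(size=(n, 4))
X = np.hstack([signal, noise])

kept = univariate_filter(X, y)
selected = lasso_select(X, y, kept)
print(kept, selected)
```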

Data imputation and normalization

Missing values were imputed using K-Nearest Neighbors (KNN) for continuous variables and mode imputation for categorical variables in both the training and test sets. Additionally, continuous features were normalized using Z-score standardization, while multi-categorical features were one-hot encoded to create dummy variables, prior to LASSO-based feature selection.
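A minimal scikit-learn sketch of this step: KNN imputation followed by z-scoring for continuous columns, and mode imputation followed by one-hot encoding for a multi-category column. The column names, the toy values, and `n_neighbors=5` are hypothetical. Fitting the transformer on the training set and only applying it to the test set is one way to keep test-set statistics out of training; the text does not say exactly how this was handled.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

continuous = ["age", "sleep_hours", "platelets"]  # hypothetical column names
categorical = ["marital_status"]                  # hypothetical multi-category feature

preprocess = ColumnTransformer([
    # KNN imputation, then Z-score standardization, for continuous features.
    ("cont", Pipeline([("knn", KNNImputer(n_neighbors=5)),
                       ("z", StandardScaler())]), continuous),
    # Mode (most frequent) imputation, then dummy variables, for categorical features.
    ("cat", Pipeline([("mode", SimpleImputer(strategy="most_frequent")),
                      ("onehot", OneHotEncoder(handle_unknown="ignore"))]), categorical),
])

train = pd.DataFrame({
    "age": [45, 60, np.nan, 52, 70, 38],
    "sleep_hours": [7, 5, 6, np.nan, 8, 6.5],
    "platelets": [210, 180, 250, 199, np.nan, 230],
    "marital_status": ["married", "single", "married", np.nan, "widowed", "married"],
})
test = train.iloc[:2].copy()

Xt_train = preprocess.fit_transform(train)  # imputation and scaling statistics from training data
Xt_test = preprocess.transform(test)        # reused unchanged on the test data
print(Xt_train.shape, Xt_test.shape)        # (6, 6) (2, 6)
```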

ML models

Following feature selection, we developed and evaluated six multi-classification ML models to predict depression severity: logistic regression (LR), support vector machine (SVM), random forest classifier (RFC), light gradient boosting machine (LGBM), categorical boosting (Catboost), and artificial neural network (ANN). Ten-fold cross-validation was incorporated into the training of all models. Class imbalance was addressed by adjusting the class-weight parameter of each model. LR, SVM, RFC, LGBM, and Catboost each handled the three classes with the multi-class strategy built into its library (e.g., one-vs-one or one-vs-rest decomposition). To prevent overfitting in the ANN model, we used dropout layers and early stopping during training. All models were implemented in Python (version 3.9.12) using the scikit-learn (version 1.1.2), lightgbm (version 3.3.2), Catboost (version 1.1.1), and Keras (version 2.9.0) libraries.
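The text does not specify how the class weights were set. One common choice is the "balanced" heuristic documented by scikit-learn, w_c = n_samples / (n_classes × n_c), which up-weights rare classes. The sketch below applies it to the training-cohort class sizes reported in the Results (185 none, 121 mild, 78 moderate or severe); using this heuristic is an assumption.

```python
from collections import Counter


def balanced_class_weights(labels):
    """w_c = n_samples / (n_classes * n_c), the 'balanced' heuristic."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in sorted(counts.items())}


# Training-cohort class sizes: 0 = none, 1 = mild, 2 = moderate or severe.
y_train = [0] * 185 + [1] * 121 + [2] * 78
weights = balanced_class_weights(y_train)
print({c: round(w, 3) for c, w in weights.items()})
# {0: 0.692, 1: 1.058, 2: 1.641}
```

In scikit-learn, `LogisticRegression`, `SVC`, and `RandomForestClassifier` accept `class_weight="balanced"`; LightGBM's `LGBMClassifier` accepts the same argument, and CatBoost offers a comparable `auto_class_weights="Balanced"` option.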

Model evaluation

The hold-out test set was used exclusively to evaluate the generalizability of the developed models. Because some common binary-classification metrics (e.g., the area under the curve (AUC) and F1-score) do not apply directly to multi-class problems, we used a suite of multi-class evaluation metrics: (1) confusion matrix, visualizing the distribution of true and predicted classes across the categories (no, mild, moderate/severe depression); (2) multi-class ROC curves, illustrating the trade-off between sensitivity and specificity for each depression class; (3) macro-average AUC, evaluating overall performance by averaging the AUC across all depression categories; (4) kappa statistic, measuring agreement between predicted and true classifications beyond that expected by chance; and (5) average precision, recall, and F1-score. In real-world settings, CTD patients experience different levels of depression, which leads to class imbalance, and not all prediction levels carry equal clinical value. Clinically, the ideal model is one that is more effective at identifying depression-positive cases, i.e., mild, moderate, and severe depression.
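To make the multi-class metrics concrete, the following standard-library sketch computes a confusion matrix, macro-averaged precision, recall, and F1, Cohen's kappa, and macro-averaged one-vs-rest AUC on a hypothetical six-patient example. The labels and predicted probabilities are invented for illustration.

```python
from itertools import product


def confusion_matrix(y_true, y_pred, k=3):
    """Rows are true classes, columns are predicted classes."""
    cm = [[0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def macro_prf(cm):
    """Unweighted means of per-class precision, recall, and F1."""
    k = len(cm)
    ps, rs, fs = [], [], []
    for c in range(k):
        tp = cm[c][c]
        predicted = sum(cm[r][c] for r in range(k))
        actual = sum(cm[c])
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        ps.append(p)
        rs.append(r)
        fs.append(2 * p * r / (p + r) if p + r else 0.0)
    return sum(ps) / k, sum(rs) / k, sum(fs) / k


def kappa(cm):
    """Cohen's kappa: observed agreement corrected for agreement expected by chance."""
    k = len(cm)
    n = sum(map(sum, cm))
    observed = sum(cm[c][c] for c in range(k)) / n
    expected = sum(sum(cm[c]) * sum(cm[r][c] for r in range(k)) for c in range(k)) / n ** 2
    return (observed - expected) / (1 - expected)


def macro_ovr_auc(y_true, scores, k=3):
    """Mean of one-vs-rest AUCs; each AUC = P(positive outscores negative), ties count 1/2."""
    aucs = []
    for c in range(k):
        pos = [s[c] for s, t in zip(scores, y_true) if t == c]
        neg = [s[c] for s, t in zip(scores, y_true) if t != c]
        wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a, b in product(pos, neg))
        aucs.append(wins / (len(pos) * len(neg)))
    return sum(aucs) / k


y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 2, 2]
scores = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.4, 0.5, 0.1],
          [0.2, 0.6, 0.2], [0.1, 0.4, 0.5], [0.1, 0.2, 0.7]]

cm = confusion_matrix(y_true, y_pred)
precision, recall, f1 = macro_prf(cm)
print(cm)  # [[2, 1, 0], [0, 1, 1], [0, 0, 1]]
print(round(f1, 3), round(kappa(cm), 3), round(macro_ovr_auc(y_true, scores), 3))  # 0.656 0.5 0.958
```

The same quantities are available in scikit-learn as `f1_score(..., average="macro")`, `cohen_kappa_score`, and `roc_auc_score(..., multi_class="ovr", average="macro")`.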

Model interpretation

We applied the SHapley Additive exPlanations (SHAP) technique23 to the best-performing model to quantify the contribution of each feature to its predictions on the test set. Because the model assigns each sample to one of three categories, SHAP values were calculated separately for each category: for each sample, the SHAP values form an array of shape (1, k, 3), where k is the number of features in the model and each element is the contribution of one feature to one category. We generated separate SHAP bar and dot plots for each category using the test set. To visualize the distribution of key features across depression severities in a sample of patients, we also generated a grouped-clustered heatmap for ten randomly selected CTD patients using the “pheatmap” package in R (version 4.2.1). Finally, to enhance clinical usability, the best-performing multi-classification model was deployed as an R Shiny application for user-friendly interaction.
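The per-category bar plots rank features by mean absolute SHAP value. The NumPy sketch below shows that aggregation for an (n, k, 3) array, i.e. one (1, k, 3) slice per test sample. The SHAP values here are random placeholders, not model output, and the feature labels are hypothetical identifiers. Depending on the shap version, multi-class explanations may come back as a list of per-class (n, k) arrays or as a single (n, k, 3) array; stacking on the last axis converts the former into the latter.

```python
import numpy as np

# HYPOTHETICAL inputs: random placeholders standing in for real SHAP output.
features = ["fatigue", "anti_Ro52", "sleep_duration", "platelet_count", "lymphocyte_count"]
classes = ["none", "mild", "moderate_or_severe"]
rng = np.random.default_rng(0)

# One (n, k) array per class, stacked into the (n, k, 3) layout.
per_class_list = [rng.normal(size=(96, len(features))) for _ in classes]
shap_values = np.stack(per_class_list, axis=-1)


def mean_abs_shap(values: np.ndarray) -> np.ndarray:
    """Global importance: mean |SHAP| over samples, shape (k, n_classes)."""
    return np.abs(values).mean(axis=0)


importance = mean_abs_shap(shap_values)
for j, cls in enumerate(classes):
    # Rank features from most to least important for this category.
    ranking = [features[i] for i in np.argsort(importance[:, j])[::-1]]
    print(cls, ranking)
```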

Statistical analysis

This study compared demographic, clinical, and laboratory characteristics across the three levels of depression. The normality of continuous variables was first evaluated with the Shapiro-Wilk test. Normally distributed continuous variables were presented as mean ± standard deviation (SD) and compared with one-way analysis of variance (ANOVA), whereas non-normally distributed variables were presented as median (interquartile range, IQR) and compared with the Kruskal-Wallis test. Categorical variables were compared with the Chi-square (χ2) test or Fisher’s exact test, depending on their distribution and sample size. A p-value of less than 0.05 was considered statistically significant for all tests. All statistical analyses were performed with the “compareGroups” package in R version 4.2.1.
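The test-selection logic described above can be sketched in Python with SciPy (the study itself used R's compareGroups, whose exact decision rules may differ). Assumptions: normality is checked on the pooled values of each variable, ANOVA is used for normally distributed variables, and sparse contingency tables (any expected count below 5) are only flagged for an exact test rather than tested here. The helper names and example data are hypothetical.

```python
import numpy as np
from scipy.stats import chi2_contingency, f_oneway, kruskal, shapiro


def compare_continuous(groups, alpha=0.05):
    """Shapiro-Wilk on the pooled variable chooses one-way ANOVA or Kruskal-Wallis."""
    pooled = np.concatenate(groups)
    if shapiro(pooled).pvalue >= alpha:
        return "ANOVA", f_oneway(*groups).pvalue
    return "Kruskal-Wallis", kruskal(*groups).pvalue


def compare_categorical(table):
    """Chi-square test on a (category x depression level) contingency table."""
    chi2, p, dof, expected = chi2_contingency(table)
    needs_exact = bool((expected < 5).any())  # sparse cells suggest Fisher's exact test instead
    return p, needs_exact


rng = np.random.default_rng(0)
groups = [rng.normal(50, 10, 60), rng.normal(55, 10, 60), rng.normal(60, 10, 60)]
name, p_cont = compare_continuous(groups)
p_cat, needs_exact = compare_categorical(np.array([[30, 20, 10], [20, 25, 15]]))
print(name, p_cont, p_cat, needs_exact)
```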

Results

Participants’ characteristics

This study recruited 500 patients, of whom 480 with confirmed diagnoses were eligible. The median age was 58.5 years (IQR: 48.8–68 years), and most participants were female (87.9%). These 480 patients were randomly divided into a training cohort (n = 384) and a test cohort (n = 96). Baseline characteristics were generally balanced between the two cohorts (Supplementary Table S1).

Depression, the outcome of this study, was categorized into three levels: no depression, mild depression, and moderate or severe depression. In the training cohort, the prevalences of these categories were 48.2% (n = 185/384), 31.5% (n = 121/384), and 20.3% (n = 78/384), respectively; in the test cohort, they were 59.4% (n = 57/96), 24.0% (n = 23/96), and 16.7% (n = 16/96), respectively.
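Each prevalence is the class count divided by the cohort size, rounded to one decimal place. The short check below recomputes the percentages from the counts reported above.

```python
counts = {
    "training": {"none": 185, "mild": 121, "moderate_or_severe": 78},
    "test": {"none": 57, "mild": 23, "moderate_or_severe": 16},
}


def prevalence_pct(cohort):
    """Return the cohort size and each class's share as a percentage (1 decimal)."""
    total = sum(cohort.values())
    return total, {k: round(100 * v / total, 1) for k, v in cohort.items()}


for name, cohort in counts.items():
    print(name, *prevalence_pct(cohort))
```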

Feature selection

Forty-six candidate variables related to depression in patients with CTDs were initially considered. Twenty-three of them had missing values, and six variables with more than 25% missing values were excluded from the analysis (Supplementary Table S2). Univariate analysis (Supplementary Table S3) then identified eleven variables significantly associated with the depression outcome: sleep duration, presence of other CTDs, ILD, PAH, infection, fibromyalgia, fatigue, use of classical immunosuppressants, platelet count, lymphocyte count, and anti-Ro52 antibodies.

These eleven variables were then entered into the LASSO algorithm to identify the optimal subset. LASSO retained five variables with non-zero coefficients for the multi-class depression outcome: fatigue, anti-Ro52 antibodies, sleep duration, platelet count, and lymphocyte count.

Model performance

These five variables were used to construct six multi-classification ML models: LR, SVM, RFC, LGBM, Catboost, and ANN. The optimal hyperparameters of each model are listed in Supplementary Table S5. Supplementary Table S4 and Figure S1 show the performance metrics of these models in the training cohort, and Figure S2 shows the confusion matrix of each model in the training cohort. Notably, all models other than LR and ANN ranked in the top four on several metrics, including the F1-scores for the mild class (mild_F1) and the moderate-or-severe class (moderate and severe_F1), suggesting that these models can detect more positive cases of CTD-related depression.

We then assessed the generalizability of the models in the test cohort. The performance of the six models in the test cohort is detailed in Table 1 and illustrated in Fig. 2, and their confusion matrices are shown in Fig. 3. Among them, the Catboost model achieved the highest F1-scores for both the mild and the moderate-or-severe classes, indicating robust performance in classifying depression levels in patients with CTDs.

Table 1 Performance of the six multi-classification ML models in the test cohort (n = 96).
Fig. 2

ROC curves of the six multi-classification ML models in the test cohort. (a) LR; (b) SVM; (c) RFC; (d) LGBM; (e) Catboost; (f) ANN. Notes: ROC, receiver operating characteristic; ML, machine learning; LR, logistic regression; SVM, support vector machine; RFC, random forest classifier; LGBM, light gradient boosting machine; Catboost, categorical boosting; ANN, artificial neural network.

Fig. 3

Confusion matrices of the six multi-classification ML models in the test cohort. (a) LR; (b) SVM; (c) RFC; (d) LGBM; (e) Catboost; (f) ANN. Notes: ML, machine learning; LR, logistic regression; SVM, support vector machine; RFC, random forest classifier; LGBM, light gradient boosting machine; Catboost, categorical boosting; ANN, artificial neural network.

Model interpretation

The Catboost model was interpreted with the SHAP algorithm, using mean absolute SHAP values to quantify the impact of each variable.

Fig. 4a–c show the feature importance rankings and the distribution of each variable’s impact on the Catboost model output for the no depression, mild depression, and moderate or severe depression categories, respectively. Higher levels of fatigue, presence of anti-Ro52 antibodies, shorter sleep duration, lower platelet counts, and lower lymphocyte counts were associated with increased SHAP values, suggesting that, in the model, these factors are associated with an elevated risk of depression. Fig. 4d compares these five variables across ten randomly selected patients from the test cohort.

Fig. 4

SHAP summary plots explaining the contributions of the five variables to the Catboost model. (a) Bar and dot charts showing the feature importance rankings and the distribution of each variable’s influence on the Catboost output for the “none” category; (b) bar and dot charts showing the feature importance rankings and the distribution of each variable’s influence on the Catboost output for the “mild” category; (c) bar and dot charts showing the feature importance rankings and the distribution of each variable’s influence on the Catboost output for the “moderate and severe” category; (d) grouped-clustered heatmap comparing the five variables among ten randomly selected CTD patients from the test cohort.

Model deployment

To enhance the practical applicability of the Catboost model in clinical settings, we deployed it to the cloud, where healthcare professionals can use it for real-time depression risk assessment in patients with CTDs. The model is hosted on a web-based platform at https://macnomogram.shinyapps.io/Catboost/. The user interface of this online tool, designed to be user-friendly for clinicians and researchers alike, is shown in Figure S3.

Discussion

This study introduces the Catboost model as a predictive tool for assessing the risk of depression; it outperformed competing models across multiple metrics such as AUC, accuracy, precision, recall, F1-score, and kappa coefficient. Its robustness in handling complex medical data shows its potential for accurate depression risk prediction.

Unlike traditional tools such as the PHQ-9 and BDI, the Catboost model integrates and analyzes clinical and laboratory data automatically. This circumvents the need for manual scoring by clinicians, enabling quicker and more objective assessments. The web-based calculator further enhances the model’s clinical utility by giving healthcare professionals a readily accessible tool.

This research adds to the literature on ML applications for refining depression risk assessment, expanding upon previous studies15,16. It acknowledges the multifactorial nature of depression24, which underscores the need for comprehensive approaches to understanding and managing this condition. The Catboost model represents a notable advance in applying ML to mental health, offering a nuanced and efficient method for identifying at-risk individuals25,26.

The relationship between fatigue and depression is intricate and influenced by various biological and psychological factors. Recent research indicates that neuroinflammation plays a central role in these co-occurring conditions27, and further studies have linked immune dysregulation in patients with multiple sclerosis to both27. Tarasiuk et al. proposed that inflammation, oxidative stress, and neurodegeneration contribute to the convergence of fatigue and depression, and suggested comprehensive treatment strategies targeting inflammatory processes and brain arousal systems to prevent progression from fatigue to depression28.

In recent years, anti-Ro52 antibodies have been found in a variety of CTDs and have attracted wide attention from rheumatologists. Many studies have shown that anti-Ro52 is associated with a higher frequency of ILD in CTD patients29, and patients with ILD are more prone to depression30. Further studies have explored the role of anti-Ro52 antibodies in psychiatric symptoms among CTD patients, identifying them as potential biomarkers of a heightened risk of depression and anxiety9. Xu et al. investigated their prognostic significance in specific autoimmune conditions, linking these antibodies to more severe disease trajectories and potential development of psychiatric symptoms31.

Recent studies highlight the complex, bidirectional relationship between sleep duration and depression, showing that both insufficient and excessive sleep can increase the risk of depressive disorders. Bender et al. identified a U-shaped association between sleep duration and depression risk: individuals who sleep less than seven or more than nine hours per night face a heightened risk of depression32. Geoffroy et al. likewise found a U-shaped association between sleep duration and mental health outcomes, reinforcing the importance of promoting adequate sleep as a preventive strategy against psychiatric conditions33.

Recent research is illuminating the complex connections between blood cell counts, specifically platelets and lymphocytes, and depression, suggesting that systemic inflammation may mediate this relationship. Zhu et al. linked elevated inflammatory markers, such as the monocyte-to-lymphocyte ratio (MLR), to depression34. This link to systemic inflammation suggests that these markers could serve as biomarkers for depression, supporting the theory that inflammation plays a critical role in its pathophysiology. The relevance of another inflammatory marker, the platelet-to-lymphocyte ratio (PLR), has also been explored by Gasparyan et al. and Danese et al.35,36. Furthermore, Gansner et al. discussed the effects of plateletpheresis, a common blood donation procedure, on the immune system, highlighting its potential implications for inflammatory processes37. This body of research may lead to the identification of novel biomarkers or therapeutic targets for managing depression.
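The PLR and MLR discussed above are simple ratios of absolute cell counts; with all counts in 10^9/L, the ratios are unitless. The sketch below computes them. Note that the model in this study used platelet and lymphocyte counts directly, not these ratios; the helper name is hypothetical.

```python
def blood_cell_ratios(platelets, lymphocytes, monocytes=None):
    """PLR = platelets / lymphocytes; MLR = monocytes / lymphocytes (counts in 10^9/L)."""
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    ratios = {"PLR": platelets / lymphocytes}
    if monocytes is not None:
        ratios["MLR"] = monocytes / lymphocytes
    return ratios


print(blood_cell_ratios(platelets=200, lymphocytes=2.0, monocytes=0.5))
# {'PLR': 100.0, 'MLR': 0.25}
```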

This study builds on our previous work identifying specific antibodies as risk factors for depression in CTD patients9 and integrates these findings into a predictive ML model. Rather than treating depression as a binary outcome, our multi-classification model enables finer risk stratification (no depression, mild depression, moderate or severe depression), allowing clinicians to tailor interventions to each patient’s predicted category. The model also offers clinicians new insights for improving mental health management and optimizing treatment strategies. Future studies will aim to improve feature extraction and selection methods and to integrate the model into broader medical information systems to support automated risk assessment and early intervention.

Our study has several limitations. The data were collected at a single center, which may introduce bias, and ML model performance depends on data quality and feature selection. The exclusion of variables with excessive missing data, such as certain cytokines, may have affected model performance38. In addition, depression has many other risk factors, including dysfunctional cognitions, stressful life events and circumstances, parental depression, and interpersonal dysfunction39; these variables were not included because of resource constraints, difficulties in data collection and measurement, and the complex interactions among them.

Conclusions

This study introduces a novel ML-based multi-classification model, built with the Catboost algorithm, for assessing depression risk in patients with CTDs. Through analysis of multidimensional patient data, including clinical information, laboratory test results, and psychological assessments, we identified key variables substantially associated with depression risk. The Catboost model outperformed the other ML models evaluated. SHAP-based interpretation further elucidates the contributions of specific variables to depression risk, enhancing the model’s clinical relevance. Deployed as an R Shiny application, the model offers a practical tool for healthcare professionals, facilitating early identification and management of depression in CTD patients. Despite these promising results, future research should expand the model’s generalizability through multi-center studies and integrate additional biomarkers to refine its predictive capability. This study demonstrates the potential of advanced ML techniques to transform mental health assessment in medicine, particularly for patients with complex autoimmune diseases.