Main

Depression, a pervasive mood disorder characterized by persistent low mood, diminished interest or pleasure in activities, and impaired self-esteem, poses a major global public health challenge1. Studies published in 2024 indicate that the prevalence of depression has increased significantly2,3. In addition, a large-scale US study in 2023 found that 22.1% of adults experienced symptoms of depression4. In China, the lifetime prevalence of major depressive disorder has been estimated at 6% (ref. 5), with recent data reporting weighted prevalence rates of 2.5% for major depressive disorder and 14.7% for subthreshold depressive symptoms6. These figures underscore the substantial public health burden of depression in the Chinese population as well. The onset and manifestation of depression vary widely, reflecting genetic predisposition, environmental influences and epigenetic modifications5. This heterogeneity, together with barriers such as personal stigma and reluctance to seek screening, leads to under-recognition of depression and adds to its public health burden6,7,8.

Advances in health check-up technology, along with increased public awareness of and concern for health, have markedly improved the accessibility of health-related data. Innovations such as advanced imaging techniques and automated laboratory systems9,10,11 enable efficient follow-up of health indicators and disease progression through longitudinal data. Previous studies have used adverse childhood experiences12 and biomarkers such as C-reactive protein13,14 to track depression trajectories. Blood cell biomarkers are of particular interest because they are easy to collect and may help identify individuals at risk. A comprehensive meta-analysis pooling 104 studies with more than 1,000 patients found elevated white blood cell, neutrophil and monocyte counts in depressed individuals, accompanied by changes in ratios such as the neutrophil-to-lymphocyte ratio (NLR), CD4/CD8 and T helper 17/T regulatory ratio15. Conversely, red blood cell biomarkers such as hematocrit, hemoglobin and mean corpuscular volume tend to decrease in individuals with depression, while red blood cell distribution width (RDW) increases16. Nevertheless, much of this research has focused on post-onset analyses rather than on trajectories in at-risk individuals before symptom emergence, and large-scale, multicenter studies are lacking, especially in East Asian populations17,18.

Given this context, we hypothesized that blood cell trajectories are associated with the risk of developing depression in an East Asian population. We used a large-scale longitudinal health check-up cohort of ~700,000 individuals with more than 10 years of follow-up19. Poisson models were applied to investigate the association between temporal changes in hematological biomarkers (including overall counts and subtypes of red and white blood cells) and the risk of developing depressive symptoms during follow-up. We further explored trajectories of specific hematological biomarkers before the onset of depressive symptoms. We aim to deepen understanding of the relationship between blood cell biomarkers and depression risk and to provide new perspectives for the early detection of and intervention in depression.

Results

Baseline characteristics

This study analyzed data from the WHALE (West China Hospital Alliance Longitudinal Epidemiology Wellness) Study20, which is a longitudinal cohort involving 685,163 participants undergoing regular health check-ups between 2010 and 2023. Baseline characteristics are detailed in Table 1.

Table 1 Baseline characteristics of participants by disease status

The Self-rating Depression Scale (SDS)21 was incorporated into the WHALE Study in 2014 and was completed by 96,596 participants. After quality control, 96,555 individuals were retained. We then excluded those who screened positive for marked depressive symptoms at their initial health check-up (n = 9,062), giving a final sample of 87,493 individuals. During follow-up, 25.2% of participants (n = 22,048) were lost to follow-up, yielding a follow-up rate of 74.8%. Baseline characteristics were assessed at the first health check-up, before any participant screened positive for depressive symptoms, and were stratified by depressive symptom status. In total, 0.7% of participants (n = 650) screened positive for depressive symptoms during follow-up (2014–2023). This rate reflects incident cases only, because individuals who had already screened positive at baseline (n = 9,062, approximately 9.4%) were excluded to focus on new onset. This exclusion, together with the generally healthy nature of the health check-up population, contributed to the relatively low observed incidence.
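The percentages in this paragraph can be reproduced from the reported counts with simple arithmetic. The short Python sketch below uses only numbers stated in this section; treating 87,493 as the denominator for both the loss-to-follow-up and incidence figures is our reading of the text, and it matches the reported percentages.

```python
# Counts reported in the Results section.
completed_sds = 96_596
retained_after_qc = 96_555
positive_at_baseline = 9_062
lost_to_follow_up = 22_048
incident_cases = 650

excluded_in_qc = completed_sds - retained_after_qc        # removed at quality control
final_sample = retained_after_qc - positive_at_baseline   # analytic sample
follow_up_rate = 1 - lost_to_follow_up / final_sample
baseline_positive_share = positive_at_baseline / retained_after_qc
incidence = incident_cases / final_sample

print(excluded_in_qc, final_sample)                            # 41 87493
print(f"follow-up rate: {follow_up_rate:.1%}")                 # 74.8%
print(f"positive at baseline: {baseline_positive_share:.1%}")  # 9.4%
print(f"incidence: {incidence:.1%}")                           # 0.7%
```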

Follow-up for depression outcomes began in 2014, when the SDS was incorporated into the WHALE Study, and continued until 2023. Hematological biomarkers were measured at each regular health check-up over the study period (Fig. 1). Female participants accounted for 45.9% (n = 40,134), and the average age was 43.47 years. Non-smokers made up 70.6% (n = 60,916) of the cohort and non-drinkers 53.3% (n = 46,593); 7.4% (n = 6,499) had a history of hypertension, and 3.2% (n = 2,840) had diabetes. In addition, 32 hematological biomarkers were evaluated at baseline.

Fig. 1: Study population flow diagram.

Flow diagram illustrating participant selection from the WHALE Study between 2010 and 2023. Steps include data screening and preparation, analysis of depression risk and sensitivity, and trajectory analysis of longitudinal biomarkers. Exclusion criteria and final sample sizes are indicated at each stage. FDR, false discovery rate-corrected P value. Credit: erythrocyte and lymphocyte icons, graphicsrf, Vecteezy.com.

Associations between hematological biomarkers and depression

In Poisson regression analyses, we fitted an unadjusted model and three sequentially adjusted models to examine associations between hematological biomarkers and depression. Robust standard errors clustered by individual were used to account for repeated measurements. As shown in Table 2 and Fig. 2, several white and red blood cell-related biomarkers were significantly associated with depression. CD4 percentage, CD8 percentage and the CD4/CD8 ratio showed the strongest associations with depression risk, whereas no significant associations were observed for the neutrophil-, monocyte- and platelet-to-lymphocyte ratios or for platelet biomarkers.

Fig. 2: Associations between blood cell parameters and depression risk.

Estimates (risk ratios (RRs) or odds ratios (ORs) with 95% confidence intervals (CIs)) of associations between multiple blood cell indicators and depression risk were obtained using Poisson regression, bivariable logistic regression and multivariable ordinal logistic regression models. Error bars represent 95% CIs. Two-sided tests were performed, and multiple comparisons were corrected using the FDR method. FDR-corrected P values are given. Sample size: n = 87,493 participants. a, FDR < 0.05 in Poisson regression models (RR); b, FDR < 0.05 in bivariable logistic regression models (OR); c, FDR < 0.05 in multivariable ordinal logistic regression models (OR). NA, non-convergence of the models during estimation.
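The caption states that P values were FDR-corrected but does not name the procedure. Assuming the widely used Benjamini-Hochberg step-up method, adjusted P values can be computed as in the minimal sketch below; the function name is illustrative and not taken from the study's code.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))  # approximately [0.04, 0.0533, 0.0533, 0.2]
```

An adjusted value below 0.05 then corresponds to the "FDR < 0.05" flags used in the figure.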

Table 2 RRs and 95% CIs for blood cell parameters associated with depression

For white blood cell-related biomarkers, after adjustment for age, sex, body mass index (BMI), smoking status, drinking status, diabetes and hypertension (model 3, fully adjusted model), depression was significantly associated with a lower CD8 percentage (adjusted risk ratio (aRR) 0.829, 95% CI 0.722–0.951, FDR 0.023) and with higher CD4 count (aRR 1.148, 95% CI 1.023–1.289, FDR 0.042), CD4 percentage (aRR 1.209, 95% CI 1.069–1.368, FDR 0.012) and CD4/CD8 ratio (aRR 1.165, 95% CI 1.061–1.279, FDR 0.012). For red blood cell-related biomarkers, the fully adjusted model showed a significant association only with higher RDW-SD (red blood cell distribution width expressed as a standard deviation) (aRR 1.140, 95% CI 1.059–1.226, FDR 0.004). Similar signals were identified in the other three models.
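Because Poisson coefficients are estimated on the log scale, each aRR equals exp(β) and its 95% CI is approximately exp(β ± 1.96 × SE). Inverting these relations recovers β, its standard error and an unadjusted two-sided Wald P value from a reported estimate, as in the sketch below. This is an approximation that assumes a symmetric Wald interval on the log scale; the helper name is hypothetical, and the resulting P value is not FDR-corrected.

```python
import math


def log_scale_from_ci(rr, lower, upper, z=1.959964):
    """Approximate the log-scale estimate, its SE and a two-sided Wald P value
    from a ratio estimate and its 95% CI (assumes a symmetric log-scale CI)."""
    beta = math.log(rr)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    z_stat = beta / se
    # Two-sided P value from the standard normal distribution.
    p = math.erfc(abs(z_stat) / math.sqrt(2))
    return beta, se, p


# CD4 count, fully adjusted model: aRR 1.148 (95% CI 1.023-1.289).
beta, se, p = log_scale_from_ci(1.148, 1.023, 1.289)
print(f"beta={beta:.3f}, SE={se:.3f}, unadjusted P={p:.3f}")
```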

Sex-stratified Poisson models showed different patterns in male and female participants. No significant associations were observed in male participants, whereas in female participants RDW-SD was significantly associated with depression in all models (aRR 1.171, 95% CI 1.063–1.290, FDR 0.013). No other hematological biomarker was significantly associated with depression in either sex (Supplementary Table 1).

Mixed-effects models supported the RDW-SD finding (adjusted odds ratio (aOR) 1.111, 95% CI 1.023–1.206, FDR 0.040). They also identified significant associations for mean corpuscular volume (aOR 1.195, 95% CI 1.088–1.314, FDR 0.002), basophil percentage (aOR 1.106, 95% CI 1.042–1.174, FDR 0.006), monocyte percentage (aOR 1.170, 95% CI 1.080–1.268, FDR 0.002) and monocyte-to-lymphocyte ratio (MLR) (aOR 1.109, 95% CI 1.028–1.197, FDR 0.016). In addition, lower red blood cell count was associated with higher depression risk (aOR 0.810, 95% CI 0.725–0.906, FDR 0.002; Supplementary Table 2).

Sensitivity analysis

We then performed binary and ordinal logistic regression analyses as sensitivity analyses using the same model specifications (Supplementary Table 3). For the ordinal analyses, the 87,493 participants were divided into non-case (n = 69,292, SDS score < 50), mild (n = 17,551, 50 ≤ SDS score ≤ 59), moderate (n = 585, 60 ≤ SDS score ≤ 69) and severe (n = 65, SDS score ≥ 70) subgroups.

For white blood cell-related biomarkers, the binary logistic regression results were directionally consistent with the Poisson regression associations, supporting the robustness of our findings. Furthermore, in ordinal logistic regression, higher white blood cell count, neutrophil count and monocyte count were significantly associated with depression in model 1 (adjusted for age and sex; all aOR > 1, all FDR < 0.05).

For red blood cell-related biomarkers, higher mean corpuscular hemoglobin was additionally associated with depression, and both mean corpuscular hemoglobin concentration and hemoglobin remained significant after adjustment for age and sex (all aOR > 1, all FDR < 0.05).

Ordinal logistic regression also revealed negative associations of platelet count and platelet-to-lymphocyte ratio with depression (all aOR < 1, all FDR < 0.05), together with a significant positive association of MLR with depression (aOR > 1, FDR < 0.05); neither pattern was observed in the other analyses.

Trajectory analysis

We applied latent class mixed models22,23 to construct trajectories of hematological biomarkers, assuming that the population may contain latent subgroups with distinct longitudinal patterns. To ensure sufficient SDS data for modeling, we included individuals with at least two SDS records, giving 13,787 individuals, including all 650 depression cases. After the model selection procedures detailed in Methods, we identified optimal trajectory models for 12 biomarkers in the total sample, 13 in female participants and 9 in male participants. The fitting process and trajectory results for the hematological biomarkers are provided in Supplementary Tables 4–6 and Supplementary Figs. 1–30. On average, each participant had 2.4 measurements over a follow-up period of 10 years (2014–2023); the number of assessments per participant ranged from two to eight (median 2, interquartile range (IQR) 2–3), and the mean interval between consecutive assessments was approximately 1.8 years.
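Follow-up summaries of this kind (measurements per participant, median and IQR, mean interval between consecutive assessments) can be computed from a long table of assessment dates. The sketch below assumes a hypothetical input of (participant ID, date) pairs and uses the "inclusive" quantile method from Python's `statistics` module for the IQR; other quantile definitions can give slightly different quartiles.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, median, quantiles


def follow_up_summary(records):
    """records: iterable of (participant_id, assessment_date) pairs."""
    visits = defaultdict(list)
    for pid, d in records:
        visits[pid].append(d)
    # Keep participants with at least two SDS assessments, as in the trajectory sample.
    visits = {pid: sorted(ds) for pid, ds in visits.items() if len(ds) >= 2}
    counts = [len(ds) for ds in visits.values()]
    # Gaps between consecutive assessments, pooled over all participants.
    gaps_years = [
        (later - earlier).days / 365.25
        for ds in visits.values()
        for earlier, later in zip(ds, ds[1:])
    ]
    q1, _, q3 = quantiles(counts, n=4, method="inclusive")
    return {
        "participants": len(visits),
        "mean_visits": mean(counts),
        "median_visits": median(counts),
        "iqr_visits": (q1, q3),
        "mean_interval_years": mean(gaps_years),
    }


toy = [("a", date(2014, 6, 1)), ("a", date(2016, 6, 1)),
       ("b", date(2015, 1, 1)), ("b", date(2016, 1, 1)), ("b", date(2018, 1, 1)),
       ("c", date(2020, 1, 1))]  # "c" has a single visit and is dropped
print(follow_up_summary(toy))
```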

In the total population, trajectory association analysis revealed two significant biomarkers associated with depression. Individuals exhibiting a U-shaped trajectory of the MLR had a higher likelihood of depression compared with those with a stable trajectory. Similarly, for the NLR, an inverted-N-shaped trajectory was associated with an increased risk of depression relative to the stable group (Table 3 and Fig. 3).

Fig. 3: Trajectories before depression.

Trajectories were identified using latent class mixed models, with time on the x axis representing follow-up duration (years) from baseline. a, Predicted trajectories for MLR in all participants. b, Predicted trajectories for NLR in all participants. c, Predicted trajectories for MLR in male participants. d, Predicted trajectories for LymC in male participants.

Table 3 Logistic regression analyses of associations between trajectories of blood cell biomarkers and depression risk

Because sex may modify depression risk24, we then conducted sex-stratified latent class trajectory analyses. In male participants, a U-shaped MLR trajectory was positively associated with depression risk, and an N-shaped trajectory of lymphocyte count (LymC) was also associated with a higher probability of depression than a stable trajectory. By contrast, no significant associations between hematological trajectories and depression were observed in female participants (Table 3 and Fig. 3).

Discussion

Based on data from the WHALE Study, which is considered population-based and nationally representative owing to its large and diverse participant pool19, we investigated the relationship between a range of hematological biomarkers and the risk of depression. Among the blood cell types analyzed, white blood cell-related biomarkers showed the most notable associations with depression. Specifically, elevated CD4 count and percentage, a higher CD4/CD8 ratio and higher RDW-SD were associated with increased risk of depression, whereas a higher CD8 percentage was associated with lower risk. Sex-specific analyses revealed that RDW-SD was positively associated with depression in female participants. In addition, trajectory modeling identified dynamic patterns of MLR and NLR associated with depression risk in the total population, and of MLR and LymC in male participants, suggesting that dynamic changes in inflammatory markers may play a role in the etiology of depression. These results highlight the complex role of hematological biomarkers in depression and suggest distinct biological mechanisms in men and women.

The incidence of new-onset depressive symptoms in our study (0.7%) was relatively low. This is consistent with national estimates of major depressive disorder in China, which are generally lower than global averages owing to factors such as cultural stigma, somatization of symptoms and diagnostic discrepancies25. Several factors may contribute. First, individuals in health check-up cohorts are generally healthier and less likely to report depressive symptoms; previous studies have shown that people in such populations tend to under-report mental health problems because of either lack of awareness or social stigma26. Second, cultural factors in China may contribute to under-reporting: depression is often under-recognized because of traditional beliefs that prioritize physical over psychological health and a reluctance to acknowledge mental health problems27. Third, to reduce the risk of reverse causation and ensure a clear temporal sequence between biomarker measurement and depression onset, we excluded individuals who screened positive for depressive symptoms at their first SDS assessment, which also lowered the observed incidence.

The relationship between white blood cell-related biomarkers and depression appears to be complex, varying across contexts and populations. First, we found a positive association between elevated CD4 levels and depression. This is consistent with animal studies by Fan et al.28, which provided biological evidence that high levels of CD4+ T cells contribute to depression and other stress-induced mood disorders. That work suggested that CD4+ T-cell-driven metabolic changes, such as mitochondrial fission and purine synthesis, play an important role in the pathophysiology of depression, in line with our observations29. By contrast, Donna et al.30 observed a negative correlation in patients with HIV/AIDS, suggesting that immunocompromise may alter this relationship. This contrast highlights the importance of considering immune competence when interpreting CD4-related findings.

Second, consistent with our results, reduced levels of CD8+ T cells have been associated with the onset of depression in other studies31. Although their overall numbers are decreased, dysfunction of the remaining CD8+ T cells, for example excessive production of pro-inflammatory cytokines such as TNF-α and IFN-γ, may sustain neuroinflammation, impair neurogenesis and cause cognitive dysfunction, thereby promoting the development of depression. These observations support our findings and underscore the role of immune-mediated inflammation in depression. Third, Zhou et al.32 found that the CD4/CD8 ratio was higher in non-severe than in severe depression, indicating that depression severity may differentially influence this ratio. This finding aligns with our observations and suggests that the CD4/CD8 ratio could serve as a potential marker of depression severity in clinical settings. Importantly, our findings reflect the state of these biomarkers before the emergence of elevated depressive symptoms, because the Poisson models incorporated time-updated exposures and assessed depression outcomes prospectively at follow-up. This distinguishes them from studies of immune changes after symptom onset.

For red blood cell-related biomarkers, we observed a positive association between higher RDW-SD and depression. Multiple studies16,18,33,34 have reported that patients with depression, including those clinically diagnosed with unipolar depression, tend to have higher RDW than non-depressed individuals. Higher RDW has also been reported to predict depression in various groups. For example, a prospective cohort study35 of 438 patients with intracerebral hemorrhage found that those with elevated RDW at admission had a significantly higher risk of developing depression 3 months after the hemorrhage. This indicates that RDW could help identify patients at risk of depression after acute medical events.

Our sex-stratified Poisson regression analyses revealed different associations between RDW-SD and depression in male and female participants. After adjustment for confounders, RDW-SD was significantly and positively associated with depression in female participants but not in male participants. This contrasts with a population-based study in northeastern Iran36, which reported significantly higher RDW values in male participants with severe depression and a less marked relationship among women. The discrepancies could reflect racial and geographic differences37, highlighting the importance of considering these factors when interpreting RDW as a biomarker for depression. The variability across studies underscores the need for further research into these associations and their underlying mechanisms in diverse populations.

Trajectory-based findings from our study further underscore the dynamic and nonlinear nature of immune involvement in depression. An inverted-N-shaped trajectory of NLR and a U-shaped trajectory of MLR were significantly associated with elevated depression risk in the overall sample, and in male participants a U-shaped MLR trajectory and an N-shaped LymC trajectory also emerged as significant predictors. These results suggest that not only static elevations but also fluctuations in immune cell distributions (particularly those reflecting periods of transient inflammation or immunosuppression) may contribute to depression pathogenesis. This aligns with the framework proposed by Miller and Raison, who argued that depression reflects less a state of persistent inflammation than immune dysregulation marked by unstable immune set points over time38. Similarly, Dantzer et al. emphasized that peripheral immune oscillations may interfere with central nervous system homeostasis and contribute to affective disturbances39.

The specific trajectory patterns we identified may reflect underlying cycles of immune activation and suppression. For example, U- or N-shaped trends in MLR and LymC may indicate an adaptive immune response initially mobilized under stress, followed by compensatory exhaustion and eventual reactivation. This is consistent with previous work demonstrating that T-cell dynamics, particularly involving CD4+ and CD8+ subsets, influence hippocampal neurogenesis, neuroinflammation and ultimately emotional regulation40. Moreover, these trajectory-based associations extend beyond the limitations of cross-sectional immune markers by capturing temporal shifts that precede the onset of depressive symptoms, offering a novel perspective on immunological vulnerability in depression. Such dynamic biomarkers may serve as more sensitive predictors than static measures, especially in early identification and preventive strategies41.

Future research should aim to elucidate the mechanisms underlying these associations, for example through longitudinal studies42 and experimental models that can test causality. These results also underscore the potential utility of trajectory analysis for identifying biomarkers of depression: by capturing changes in hematological biomarkers over time, it provides a more nuanced understanding of how physiological changes relate to mental health outcomes43. Such insights could inform targeted interventions and personalized treatment strategies for depression tailored to different demographic groups.

Our study has several strengths. First, we used a large, population-based sample, larger than those in many previous studies, which provides robust statistical power and adds evidence from an East Asian population, addressing the limited population representativeness of prior research in this field. Second, hematological biomarkers were measured with objective, standardized instruments, ensuring high accuracy and consistency across measurements. Third, we conducted multiple sensitivity analyses, including sex-stratified analyses, which supported the robustness of our findings.

This study also has limitations. First, depressive symptoms were assessed with a self-report instrument rather than widely accepted clinical diagnostic methods such as structured interviews; to partly offset this, we applied several statistical models with adjustment for multiple confounders and used standardized biomarker data. Second, despite adjustment for several potential confounders, residual confounding (for example, by socioeconomic factors), measurement error and other complex interactions cannot be entirely ruled out; we attempted to mitigate this through comprehensive sensitivity analyses and plan to refine our methods in future studies. Third, the follow-up rate was 74.8%. Although our hospital's large catchment area and strong participant adherence helped maintain a large sample, selection bias due to loss to follow-up cannot be excluded, and we plan to refine follow-up strategies in future studies to improve generalizability.

Conclusion

Our study highlights the significance of hematological biomarkers and their trajectories for understanding depression risk. We identified significant associations between depressive symptoms and white blood cell-related biomarkers, including the count, percentage and ratio of CD4 and CD8, and red blood cell-related biomarkers, including RDW-SD. Sex-stratified analyses showed that RDW-SD was associated with depression in female participants, and that U-shaped MLR and N-shaped LymC trajectories were associated with depression in male participants. Our findings contribute to understanding the complex interplay between hematological factors and depression and suggest that long-term monitoring of these biomarkers could help identify individuals at higher risk and inform personalized intervention strategies.

Methods

Study design, setting and participants

The WHALE Study is a large-scale, prospective health check-up initiative established in 2010 at West China Hospital of Sichuan University19. In its first release, the cohort included 478,898 participants, who underwent comprehensive health check-ups at a main center and four sub-centers (Wuhou, Wenjiang, Tianfu and Shangjin). The second data release builds on this foundation with further data processing and quality control to ensure higher reliability and robustness. The cohort now includes data from 685,163 participants, spanning from 2010 to 2023 (Chinese Clinical Trial Registry (http://www.chictr.org.cn/), identifier: ChiCTR2200066950). This structured, multicenter approach ensures comprehensive data collection and enhances the dataset’s capacity for longitudinal research across a diverse population.

All participants completed at least one general health check-up, encompassing vital signs, body measurements (height, weight, BMI, blood pressure and so on) and laboratory tests (routine blood, routine urine and so on). Starting on 27 May 2014, a depression self-assessment questionnaire was incorporated into the check-up protocol. A total of 96,596 participants provided depression self-assessment scores; 41 were excluded because they were under 18 years old or had erroneous assessment scores (Fig. 1). Data were accessed through the Big Data Platform19 of West China Hospital, which integrates all electronic medical records.

Exposure variables

Since 2010, we have collected comprehensive health check-up data, including detailed hematological biomarkers: 8 red blood cell-related parameters (red blood cell count, RDW-SD, RDW-CV, hematocrit, mean corpuscular hemoglobin concentration, mean corpuscular hemoglobin, mean corpuscular volume, hemoglobin), 11 white blood cell-related parameters (white blood cell count, neutrophil count, neutrophil percentage, LymC, lymphocyte percentage, basophil count, basophil percentage, eosinophil count, eosinophil percentage, monocyte count, monocyte percentage), 7 T-lymphocyte-related biomarkers (CD3 count, CD3 percentage, CD4 count, CD4 percentage, CD8 count, CD8 percentage and CD4/CD8 ratio), platelet count, erythrocyte sedimentation rate and the erythrocyte sedimentation rate equation K value. Although the etiology of depression remains elusive, elevated levels of pro-inflammatory markers have been reported in individuals with depression. We therefore also examined three ratios reflecting inflammatory and immune responses: MLR, NLR and platelet-to-lymphocyte ratio.
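These three ratios are conventionally computed by dividing the monocyte, neutrophil or platelet count by the lymphocyte count. The text does not spell out the formulas, so the sketch below assumes these standard definitions and absolute counts in the same units (for example 10^9/L); the function name is illustrative.

```python
def inflammatory_ratios(neutrophils, monocytes, platelets, lymphocytes):
    """Compute NLR, MLR and PLR from absolute counts in the same units.

    Returns None for each ratio when the lymphocyte count is zero or missing,
    so that one bad record does not stop a batch computation.
    """
    if not lymphocytes:
        return {"NLR": None, "MLR": None, "PLR": None}
    return {
        "NLR": neutrophils / lymphocytes,
        "MLR": monocytes / lymphocytes,
        "PLR": platelets / lymphocytes,
    }


print(inflammatory_ratios(neutrophils=3.6, monocytes=0.4, platelets=220, lymphocytes=2.0))
# {'NLR': 1.8, 'MLR': 0.2, 'PLR': 110.0}
```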

Data collection continued until 2023, with many participants undergoing multiple health assessments, providing a rich dataset of repeated measurements. Hematology analytes, including red blood cell count, white blood cell count, platelet count and hemoglobin, were determined using the XE-2100 and XE-5000 systems (Sysmex), with further details available in the study by Lin et al. in BMC Medicine19.

Outcome variable

Since 2014, our study has employed the SDS to screen for and assess the severity of depression among health check-up participants. Developed by Zung in 1965, the SDS is a widely recognized tool in psychological research and clinical practice, noted for its reliability and validity. This scale consists of 20 items, evenly split between positive and negative statements, with each item rated on a 4-point scale on the basis of the participant’s feelings over the past week21.
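Scoring is not detailed here, so the sketch below assumes Zung's standard procedure: the ten positively worded items (items 2, 5, 6, 11, 12, 14, 16, 17, 18 and 20 in the original item order) are reverse-scored, the 20 ratings are summed to a raw score of 20 to 80, and the raw score is multiplied by 1.25 to give an index score of 25 to 100, truncated here to an integer. That the cut-offs used in this study (50, 60 and 70) apply to this index score is also an assumption.

```python
# Positively worded items, reverse-scored under Zung's original item order (assumed).
REVERSED_ITEMS = {2, 5, 6, 11, 12, 14, 16, 17, 18, 20}


def sds_index(responses):
    """responses: 20 item ratings (1-4) in questionnaire order."""
    if len(responses) != 20 or any(r not in (1, 2, 3, 4) for r in responses):
        raise ValueError("expected 20 ratings between 1 and 4")
    raw = sum(5 - r if item in REVERSED_ITEMS else r
              for item, r in enumerate(responses, start=1))
    # Raw score spans 20-80; multiplying by 1.25 rescales it to 25-100.
    return int(raw * 1.25)


print(sds_index([2] * 20))  # 62
```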

In our assessments, we adopted a dual approach to SDS score interpretation. For binary classification in the primary analyses (for example, Poisson regression and trajectory analysis), we defined depressive symptoms as an SDS score of 60 or higher, in accordance with commonly used thresholds for identifying clinically meaningful symptoms of moderate to severe depression. This threshold favors specificity and reduces false positives in incidence estimation. For ordinal logistic regression, we categorized depressive symptom severity into four levels on the basis of standard SDS interpretation: <50, no symptoms; 50–59, mild; 60–69, moderate; and ≥70, severe21. This classification allowed us to explore the relationship between biomarkers and varying levels of depressive symptom severity. The date of depressive symptom identification was defined as the date the scale was recorded.
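The two outcome definitions can be expressed as one mapping from an SDS score to a binary indicator and a four-level severity category. This sketch assumes integer index scores, so that 59 is mild, 60 to 69 moderate and 70 or above severe; the function name is hypothetical.

```python
def classify_sds(index_score):
    """Map an integer SDS index score to (binary outcome, severity level)."""
    if index_score < 50:
        severity = "none"
    elif index_score < 60:
        severity = "mild"
    elif index_score < 70:
        severity = "moderate"
    else:
        severity = "severe"
    # Binary outcome used for the Poisson and trajectory analyses: score >= 60.
    return index_score >= 60, severity


for score in (45, 55, 62, 74):
    print(score, classify_sds(score))
```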

In the Poisson regression, mixed-effects and binary logistic regression analyses, the outcome variable was binary, indicating the presence or absence of depressive symptoms. In the ordinal logistic regression, the outcome was depressive symptom severity, categorized as none, mild, moderate or severe. For the trajectory analysis, participants were first grouped into latent classes on the basis of longitudinal patterns of biomarker levels, and subsequent analyses examined the association between trajectory group membership and binary depression status.

Ascertainment of covariates

Depression risk is influenced by a range of demographic, lifestyle and health-related factors44,45. On the basis of previous literature and biological plausibility, we considered age, sex, smoking status, alcohol consumption, BMI, hypertension and diabetes as covariates in our primary and sensitivity analyses. All covariate information was collected during the health check-up using standardized questionnaires. Age was treated as a continuous variable (in years), and sex was coded as binary (female, male). Smoking status and alcohol consumption were categorized as current use (yes/no). BMI was calculated as weight in kilograms divided by height in meters squared (kg/m²) and included as a continuous variable. Hypertension and diabetes were defined on the basis of physician diagnosis reported in the health record or self-reported history during the check-up, both coded as binary variables (yes/no).

Statistical analysis

The health check-up data included repeated measurements of SDS scores and other examination biomarkers for participants from 2014 to 2023. We first described baseline characteristics stratified by whether participants screened positive for depressive symptoms, presenting continuous variables as medians and IQRs (25th–75th percentiles) and categorical variables as counts (n) and percentages (%).

To model the incidence of depressive symptoms over time, we conducted longitudinal analyses using multivariable Poisson regression models with a log link function. The occurrence of depressive symptoms at the subsequent visit served as the outcome. To account for varying time intervals between health check-ups, we included the logarithm of the days between visits as an offset term. This approach is commonly used in epidemiological studies to estimate incidence rates when follow-up durations vary46,47,48. To account for within-subject correlation due to repeated measures, we applied robust (sandwich) standard errors clustered by participant identifier, which adjust the variance estimates without altering point estimates49,50,51. The models included hematological biomarkers as the primary exposures of interest and were adjusted for relevant covariates. Only participants with at least two SDS assessments were included in these analyses. The model was specified as:

$$\log \left({\lambda }_{i}\right)={\beta }_{0}+{\beta }_{1}\times {\mathrm{Biomarker}}_{i}+\mathop{\sum }\limits_{k}{\beta }_{k}\times {\mathrm{Covariate}}_{{ik}}+\log \left({\mathrm{Days}}_{i}\right)$$

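where \({\lambda }_{i}\) denotes the expected number of incident depressive-symptom events for participant i over the interval between check-ups, so that \({\lambda }_{i}/{\mathrm{Days}}_{i}\) is the incidence rate and \(\exp ({\beta }_{1})\) is the incidence rate ratio per unit increase in the biomarker. The biomarker and covariates are modeled as fixed effects, and \(\log ({\mathrm{Days}}_{i})\) enters as an offset (a term whose coefficient is fixed at 1) to account for variation in follow-up time between health check-ups.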
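To make the offset and the clustered variance concrete, here is a minimal sketch of a Poisson model with a single biomarker, an intercept and a log(days) offset, fitted by Newton-Raphson, with a participant-clustered sandwich variance. It is not the study's code (the analyses were run in R): covariates are omitted for brevity, the function names are hypothetical, and no small-sample correction is applied to the sandwich estimator, unlike some software defaults.

```python
import math
from collections import defaultdict


def _score_info(b0, b1, y, x, days):
    """Expected counts, score and Fisher information for log E[y] = b0 + b1*x + log(days)."""
    mu = [d * math.exp(b0 + b1 * xi) for xi, d in zip(x, days)]  # exp(eta + log(days))
    score = (sum(yi - mi for yi, mi in zip(y, mu)),
             sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu)))
    info = (sum(mu),
            sum(xi * mi for xi, mi in zip(x, mu)),
            sum(xi * xi * mi for xi, mi in zip(x, mu)))
    return mu, score, info


def _inv_sym2(a, b, c):
    """Inverse of the symmetric 2x2 matrix [[a, b], [b, c]], returned as (a', b', c')."""
    det = a * c - b * b
    return c / det, -b / det, a / det


def fit_poisson_offset(y, x, days, cluster, max_iter=50, tol=1e-10, z=1.96):
    """Poisson regression with a log(days) offset and participant-clustered sandwich SE."""
    b0, b1 = math.log(sum(y) / sum(days)), 0.0  # start at the crude overall rate
    for _ in range(max_iter):
        _, (u0, u1), info = _score_info(b0, b1, y, x, days)
        p, q, r = _inv_sym2(*info)
        step0, step1 = p * u0 + q * u1, q * u0 + r * u1  # Newton step: I^-1 U
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break

    mu, _, info = _score_info(b0, b1, y, x, days)
    p, q, r = _inv_sym2(*info)  # "bread": inverse Fisher information

    # "Meat": sum over participants of the outer product of their summed score contributions
    per_cluster = defaultdict(lambda: [0.0, 0.0])
    for ci, xi, yi, mi in zip(cluster, x, y, mu):
        per_cluster[ci][0] += yi - mi
        per_cluster[ci][1] += xi * (yi - mi)
    m00 = sum(s0 * s0 for s0, _ in per_cluster.values())
    m01 = sum(s0 * s1 for s0, s1 in per_cluster.values())
    m11 = sum(s1 * s1 for _, s1 in per_cluster.values())

    # Sandwich variance of b1: [q, r] M [q, r]^T; point estimates are unchanged
    se_b1 = math.sqrt(q * q * m00 + 2 * q * r * m01 + r * r * m11)
    return {"b0": b0, "b1": b1, "irr": math.exp(b1), "se_b1": se_b1,
            "ci95": (math.exp(b1 - z * se_b1), math.exp(b1 + z * se_b1))}
```

With a binary exposure, the fitted incidence rate ratio equals the ratio of events per person-day in the two groups, which is a convenient check on the offset handling.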

We assessed model adequacy using the Pearson χ² statistic divided by the residual degrees of freedom (the dispersion statistic). Dispersion values were close to 1.0 for most models, indicating no substantial overdispersion (Supplementary Table 7).
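The dispersion statistic is simple to compute from observed outcomes and fitted means; a short sketch (the function name is hypothetical):

```python
def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square divided by residual degrees of freedom (n - p).

    Values near 1 are consistent with the Poisson variance assumption;
    values well above 1 suggest overdispersion.
    """
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)
```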

We constructed an unadjusted model and three adjusted models to account for covariates. Model 1 adjusted for baseline age and sex, model 2 also included smoking status and alcohol consumption, and model 3 further incorporated BMI, hypertension and diabetes.
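The nested adjustment sets can be written down once and reused for every biomarker. The sketch below builds R-style formula strings; the variable names are hypothetical, and for the Poisson models an offset term such as `offset(log(days))` would be appended.

```python
# Hypothetical variable names; each set contains the previous one, as described above.
ADJUSTMENT_SETS = {
    "unadjusted": [],
    "model1": ["age", "sex"],
    "model2": ["age", "sex", "smoking", "alcohol"],
    "model3": ["age", "sex", "smoking", "alcohol", "bmi", "hypertension", "diabetes"],
}


def build_formulas(biomarker, outcome="depressive_symptoms"):
    """One R-style formula string per adjustment level for a single biomarker."""
    return {name: " + ".join([f"{outcome} ~ {biomarker}"] + covs)
            for name, covs in ADJUSTMENT_SETS.items()}
```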

Participants were considered to have repeated measurements if they had completed at least two SDS-based assessments. Individuals with only a single SDS evaluation were excluded from rate-based and trajectory analyses to avoid bias from disproportionate weighting (for example, a participant who screened positive at their only visit would contribute a 100% event rate). On average, included participants had 2.4 SDS assessments over the 10-year follow-up period.
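The inclusion rule and the mean number of assessments can be sketched as follows, assuming visits are stored as `(participant_id, visit_date, sds_score)` tuples with `None` for a missing SDS score (a hypothetical layout):

```python
from collections import Counter


def filter_repeated(visits, min_assessments=2):
    """Keep SDS-assessed visits from participants with >= min_assessments assessments.

    Returns the retained visits and the mean number of assessments per retained participant.
    """
    counts = Counter(pid for pid, _, sds in visits if sds is not None)
    kept = [v for v in visits if v[2] is not None and counts[v[0]] >= min_assessments]
    kept_ids = {v[0] for v in kept}
    mean_n = sum(counts[p] for p in kept_ids) / len(kept_ids) if kept_ids else 0.0
    return kept, mean_n
```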

Given previous evidence of sex-specific differences in hematological biomarker responses to depression24, we conducted sex-stratified analyses to explore potential differences. To further account for repeated measures within individuals over time, we fitted a generalized linear mixed-effects model with a logit link function. The binary outcome was the presence or absence of depressive symptoms at each health check-up visit, and a random intercept for each participant (defined by the participant identifier) was included to model within-participant correlation across repeated assessments. The model can be expressed as:

$$\mathrm{logit}\left[P\left({Y}_{{ij}}=1\right)\right]={\beta }_{0}+{\beta }_{1}\times {\mathrm{Biomarker}}_{{ij}}+\mathop{\sum }\limits_{k=2}^{K}{\beta }_{k}\times {\mathrm{Covariate}}_{k,{ij}}+{u}_{i}$$

where \({Y}_{{ij}}\) indicates depressive symptoms for individual i at visit j, \({\beta }_{0}\) is the intercept, \({\beta }_{1}\) represents the effect of the biomarker, \({\beta }_{k}\) are the covariate coefficients and \({u}_{i}\sim N\left(0,{\sigma }_{u}^{2}\right)\) is the participant-level random intercept. This modeling approach allows for the estimation of population-level fixed effects while accounting for participant-level variability in baseline risk.
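The role of the random intercept can be illustrated numerically. In this model the coefficients act on the participant-specific log-odds, so the probability for a given participant depends on \({u}_{i}\); averaging over \({u}_{i}\sim N(0,{\sigma }_{u}^{2})\) gives a marginal probability that is pulled toward 0.5. The sketch below (hypothetical function names, simple trapezoidal integration) is for illustration only and is not how the model was estimated.

```python
import math


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def conditional_prob(eta, u):
    """P(Y_ij = 1 | u_i): fixed-effect linear predictor eta plus the random intercept u."""
    return expit(eta + u)


def marginal_prob(eta, sigma_u, n_grid=2001, half_width=8.0):
    """Average of P(Y = 1 | u) over u ~ N(0, sigma_u^2), by trapezoidal integration."""
    if sigma_u == 0:
        return expit(eta)
    lo = -half_width * sigma_u
    h = 2 * half_width * sigma_u / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        u = lo + k * h
        w = 0.5 if k in (0, n_grid - 1) else 1.0  # trapezoid end-point weights
        dens = math.exp(-0.5 * (u / sigma_u) ** 2) / (sigma_u * math.sqrt(2 * math.pi))
        total += w * conditional_prob(eta, u) * dens
    return total * h
```

For example, with a linear predictor of 2.0 and \({\sigma }_{u}=1.5\), the marginal probability lies between 0.5 and the conditional probability for a participant with \({u}_{i}=0\).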

To further assess the robustness of our findings, we conducted additional logistic regression and ordinal logistic regression analyses, in which depressive symptoms were modeled as either a binary or an ordinal outcome (mild, moderate or severe). For the ordinal analysis, participants with no depressive symptoms (SDS < 50) served as the reference category. These analyses allowed us to examine whether biomarker–depression associations remained consistent across time. Results are reported as ORs with 95% CIs.
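A sketch of the outcome coding and of how a proportional-odds (ordinal logistic) model turns a linear predictor into category probabilities. Only the SDS < 50 threshold comes from the text; the 60 and 70 cut-offs for moderate and severe are the conventional Zung SDS bands and are an assumption here. The sign convention, logit P(Y ≤ k) = θ_k − η, follows R's `MASS::polr`.

```python
import math


def sds_category(index_score):
    """Ordinal SDS category: 0 none (reference), 1 mild, 2 moderate, 3 severe.
    Cut-offs 60 and 70 are assumed conventional bands; only < 50 is given in the text."""
    if index_score < 50:
        return 0
    if index_score < 60:
        return 1
    if index_score < 70:
        return 2
    return 3


def category_probs(eta, thresholds):
    """Proportional-odds model: logit P(Y <= k) = theta_k - eta, thresholds increasing.
    Returns probabilities for categories 0..len(thresholds)."""
    cum = [1.0 / (1.0 + math.exp(-(t - eta))) for t in thresholds] + [1.0]
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]
```

A larger linear predictor (for example, a higher biomarker value with a positive coefficient) shifts probability mass toward the more severe categories, and exp(β) is the common OR across all cut-points.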

To identify potential trajectory patterns of hematological biomarkers, we included participants with two or more health check-up records and analyzed the 32 biomarkers significantly associated with depression onset. Latent class mixed modeling was then used to model the longitudinal change in each biomarker over time and to identify trajectory classes22,23,52. Linear, quadratic and cubic functions of time were tested to determine the optimal functional form. Random effects included a random intercept and a random slope for time, allowing for individual-level heterogeneity. The optimal number of latent classes (up to five) was selected on the basis of the Bayesian information criterion, average posterior classification probability (≥0.7) and minimum class size (≥2% of the sample), following the Guidelines for Reporting on Latent Trajectory Studies53.
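One plausible way to combine the three class-selection criteria is to discard candidate solutions that fail either adequacy threshold and then take the lowest BIC among the rest. The exact combination rule used in the study is not stated, so this is an assumption, as are the dictionary keys describing each fitted model.

```python
import math


def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: -2 logL + k log(N)."""
    return -2.0 * log_lik + n_params * math.log(n_obs)


def select_n_classes(fits, min_avg_pp=0.7, min_class_prop=0.02):
    """Lowest-BIC solution among those meeting both adequacy criteria.

    Each fit is a dict with 'g' (number of classes), 'bic', 'avg_pp' (mean posterior
    probability per class) and 'class_props' (class proportions).
    """
    eligible = [f for f in fits
                if min(f["avg_pp"]) >= min_avg_pp
                and min(f["class_props"]) >= min_class_prop]
    if not eligible:
        return None
    return min(eligible, key=lambda f: f["bic"])["g"]
```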

Latent class mixed modeling was chosen because it can flexibly model individual biomarkers with interpretable class structures, accommodates within-class variability and provides clinically meaningful insights while remaining computationally feasible. Sex was examined as an effect modifier, on the hypothesis that sex differences might modulate the relationship between hematological biomarker trajectories and depression onset24. Finally, we used binary logistic regression to examine the associations between depression and the different trajectory patterns of hematological biomarkers (Fig. 1).
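For intuition, the unadjusted case has a closed form: with a single binary predictor (membership in one trajectory class versus the reference class), the logistic-regression OR is ad/bc and its Wald 95% CI is the Woolf interval. The sketch below also shows the ratio of sex-specific ORs, which equals the exponentiated exposure-by-sex interaction in a saturated logistic model. Function names and cell labels are hypothetical; the adjusted models require an iterative fit instead.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """OR with Woolf (log-scale Wald) CI from a 2x2 table.

    a = class members with depression, b = class members without,
    c = reference-class members with depression, d = reference-class members without.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: OR undefined without a continuity correction")
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, math.exp(math.log(or_) - z * se), math.exp(math.log(or_) + z * se)


def ratio_of_ors(table_a, table_b):
    """Ratio of stratum-specific ORs (e.g. women vs men); equals exp(interaction term)
    in a saturated logistic model with exposure, sex and exposure x sex."""
    return odds_ratio_ci(*table_a)[0] / odds_ratio_ci(*table_b)[0]
```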

For the regression analyses, a separate model was constructed for each hematological biomarker, allowing independent assessment of its association with depressive symptoms. In all regression-based analyses, biomarker associations were evaluated using four models (unadjusted and models 1–3 with increasing covariate adjustment). All analyses were conducted in R (version 4.2.0) using the glm2, tableone and lcmm packages. P values < 0.05 were considered significant. All P values were adjusted for the false discovery rate (FDR) using the Benjamini–Hochberg procedure, applied separately within each analysis (Poisson regression, logistic regression and mixed-effects models) and within each level of model adjustment (that is, the unadjusted model and models 1–3).
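The Benjamini-Hochberg adjustment and its application within separate families can be sketched as follows. `bh_adjust` follows the same definition as R's `p.adjust(method = "BH")`; the record keys used to define families (`analysis`, `model`, `p`) are hypothetical.

```python
from collections import defaultdict


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted P values: running minimum of p * m / rank, capped at 1."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)  # largest P first
    adjusted = [0.0] * m
    running_min = 1.0
    for rank_from_top, i in enumerate(order):
        rank = m - rank_from_top  # ascending rank of pvals[i] (1 = smallest)
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def adjust_within_families(results):
    """Adjust separately within each (analysis, adjustment level) family; adds 'p_fdr'."""
    families = defaultdict(list)
    for r in results:
        families[(r["analysis"], r["model"])].append(r)
    for members in families.values():
        for r, q in zip(members, bh_adjust([r["p"] for r in members])):
            r["p_fdr"] = q
    return results
```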

Ethics and inclusion statement

The study was approved by the Ethics Committee of West China Hospital of Sichuan University (Protocol No. 2023-245), and the requirement for informed consent was waived. The reporting of this observational study follows the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.