Introduction

Major depressive disorder (MDD) is one of the most prevalent and disabling psychiatric conditions globally. Clinically, it is characterized by anhedonia, pervasive low mood, suicidal ideation, cognitive dysfunction, and a high burden of neuropsychiatric comorbidities [1], often accompanied by painful physical symptoms (PPS) [2] that further exacerbate functional decline and reduce quality of life. MDD results in years lived with disability and has become one of the leading causes of premature mortality [3]. Approximately 300 million people worldwide are affected by MDD, and its incidence is increasing annually, resulting in significant socioeconomic burdens [4]. Despite its prevalence, the pathophysiological mechanisms underlying MDD remain incompletely understood, posing tremendous hurdles to the development of effective treatments [5].

In recent decades, advancements in genomics and transcriptomics have opened new avenues for understanding the neuroimmunological underpinnings of this complex disorder. Genomic studies, particularly genome-wide association studies (GWAS) [6], have identified numerous genetic variants associated with an increased risk of MDD. These studies have shed light on the polygenic nature of MDD, implicating a wide array of genes involved in neurological pathways. However, despite these findings, the relationship between these genes, their expression profiles, and functional validation through immunophenotyping remains inconsistent. This gap highlights the need to identify reliable biomarkers for MDD.

Transcriptomic analyses have provided valuable insights into gene expression changes occurring in both the brain [7] and immune cells [8] of individuals with MDD, highlighting potential molecular mechanisms underlying the disorder. While the brain transcriptome is critical for understanding MDD, its study is limited by the availability of postmortem data, which restricts insights into its dynamic nature during the course of the disorder. Thus, analyzing transcriptomic data from both the brain and immune cells of MDD patients, alongside genome-wide bioinformatic analyses, is essential [9] for advancing our understanding of MDD neuroimmunology.

Neuroimmunology, which explores interactions between the nervous and immune systems, is an emerging research area that has accumulated evidence from both MDD patients and animal studies [10]. This evidence reveals a bidirectional interaction between the two systems that influences an individual’s stress response and contributes significantly to susceptibility to mood disorders [10, 11]. Emerging evidence also indicates that neuroimmune processes play a pivotal role in MDD, with immune dysregulation and inflammation recognized as central components of its pathogenesis. This has led to the development of the neuroimmune network model of depression [12].

This study employs an integrative systems approach to characterize an immunological network within the leukocytes of MDD patients, including the stress-induced expression of PAX6. For the first time, we demonstrate that PAX6, which GWAS had previously suggested to be associated with MDD, is linked to MDD at the transcriptional, protein, and functional levels. This is evidenced by the stress-induced expression of PAX6 in peripheral blood leukocytes (PBLs) from both human MDD patients and mice subjected to chronic restraint stress (CRS). Our findings suggest that altered leukocyte expression of PAX6, a transcription factor traditionally linked to neurodevelopment and brain function, reflects a broader connection between the brain and the peripheral immune system. By exploring these interactions, our study uncovers potential biomarkers and novel therapeutic pathways for MDD intervention.

Results

Clusters of neuroimmune biological processes in MDD transcriptomes

To identify the shared differentially expressed genes (DEGs) and biological processes (BPs) across multiple transcriptome datasets from MDD patients (demographics, diagnosis, and treatment information available in Supplementary Table 1), we performed an integrative analysis. This approach aimed to uncover shared molecular mechanisms between PBLs and the anterior cingulate cortex (ACC), a brain area that plays a critical role in MDD by regulating emotional processing, cognitive control, and stress responses [13]. To achieve this, we leveraged publicly available primary datasets from the Gene Expression Omnibus (GEO) repository. Extended Data Fig. 1 provides an overview of primary datasets containing the total number of transcriptomes (3114 individuals: 1877 MDD patients and 1237 controls) along with a detailed bioinformatics workflow used for analysis.

We performed a differential expression analysis on transcriptomic datasets that met our eligibility criteria (see Materials and Methods). Volcano plots (Fig. 1a) illustrate the significant DEGs for each primary dataset, with red and blue indicating up-regulation and down-regulation, respectively. While only a few DEGs overlapped between at least two datasets, and no common DEGs were found across all datasets, we identified numerous convergent BPs involved in MDD pathogenesis (Fig. 1b). Specifically, we identified 26 BPs common to all datasets and 76 BPs shared among three datasets [14,15,16]. These processes include cellular response to cytokine stimulus, second messenger-mediated signaling, sensory organ development, regulation of gene expression, and generation of neurons (Fig. 1c). Notably, these BPs are enriched by an interconnected network of immune- and nervous system-related DEGs derived from PBLs and ACC samples (Fig. 1d).
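The overlap logic described above (terms common to all datasets versus terms shared by a subset) can be illustrated with a minimal sketch. The dataset labels and term lists below are made-up placeholders, and `shared_terms` is a hypothetical helper, not the workflow used in the study.

```python
from collections import Counter


def shared_terms(term_sets, min_datasets):
    """Return the terms that occur in at least `min_datasets` of the sets."""
    counts = Counter(term for terms in term_sets.values() for term in set(terms))
    return {term for term, n in counts.items() if n >= min_datasets}


# Toy enrichment results keyed by hypothetical dataset labels (not real data).
enriched_bps = {
    "blood_A": {"generation of neurons",
                "cellular response to cytokine stimulus",
                "regulation of gene expression"},
    "blood_B": {"generation of neurons",
                "cellular response to cytokine stimulus"},
    "acc_A": {"generation of neurons",
              "sensory organ development",
              "cellular response to cytokine stimulus"},
    "acc_B": {"generation of neurons",
              "sensory organ development"},
}

# Terms enriched in every dataset, and terms enriched in three or more.
common_to_all = shared_terms(enriched_bps, min_datasets=len(enriched_bps))
in_three_or_more = shared_terms(enriched_bps, min_datasets=3)
```

The same helper applies to DEG lists: with real data, an empty `common_to_all` for genes alongside a non-empty one for BPs would mirror the pattern reported here.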

Fig. 1: Characterization of Common Differentially Expressed Genes (DEGs) and Biological Processes Across Blood and Brain (Cingulate Cortex) Transcriptomic Datasets from MDD Patients.

a Volcano plots show upregulated (red) and downregulated (blue) DEGs across transcriptomes from the following datasets: Oh, H. et al. (2022), Ramaker, R.C. et al. (2017), Trang, T.L. et al. (2018), and Cathomas, F. et al. (2022). b Venn diagrams illustrate the overlap of DEGs and biological processes (BPs), emphasizing the commonalities among the studies. c Dot plots display the 26 common BPs enriched by DEGs from each dataset. d A network view of these BPs highlights neuroimmune interconnections across related biological processes.

The absence of overlapping DEGs across all datasets prompted us to assess enriched biological processes (BPs) by performing gene enrichment analysis on the DEGs from each dataset individually. We hypothesized that, despite variations in individual gene expression, there might be a consistent pattern in gene ontology across datasets. Visualization of the data using uniform manifold approximation and projection (UMAP) for dimensionality reduction revealed distinct clusters associated with neuroimmune processes across the four datasets (Extended Data Fig. 2a). Similarly, by leveraging the rrvgo package to interpret lists of gene ontology (GO) terms, we obtained an integrated perspective on various neuroimmune BPs through a semantic similarity matrix (Extended Data Fig. 2b). This correlogram suggests a shared yet intricate interplay of neuroimmune mechanisms underlying MDD.

Additionally, we performed gene annotation to explore the relationships between genes and BPs, assigning functional roles to each gene. This analysis revealed that DEGs across the studies regulate key similar neurological BPs, including neuron generation and differentiation, nervous system development, synapse organization and assembly, sensory perception of smell, glial cell differentiation, oligodendrocyte differentiation, and the regulation of nervous system processes. The immune-related BPs involved DEGs in B and T cell activation, cytokine-related processes (e.g., IL-1, IL-2, IL-6, IL-8, IL-17, and interferons [IFNs]), as well as leukocyte chemotaxis and phagocytosis (Extended Data Fig. 2c). Thus, while different genes may be dysregulated in various MDD cohorts, they frequently converge on similar biological pathways and processes, underscoring the complex and multifactorial nature of MDD.

Overlapping genes associated with depression in genome-wide and transcriptome datasets

To enhance the robustness, accuracy, and generalizability of our findings, ultimately leading to a deeper and more reliable understanding of the biological basis of depression and highlighting potential biomarkers, we integrated findings from the most extensive worldwide GWAS meta-analysis (Als, T. D. et al., 2023 [6]). This study included over 1.3 million individuals, with 371,184 diagnosed with depression. Furthermore, we incorporated the most extensive worldwide transcriptomic meta-analysis findings by Wittenberg, G.M. et al., 2020 [8], which included PBL datasets of 1754 MDD cases and 1145 healthy controls (HC). Wittenberg, G.M. and colleagues quantitatively reviewed 10 published whole-genome transcriptional datasets from PBL samples in case-control studies of MDD, focusing on studies developed before 2018.

We performed a consensus analysis involving the 411 genes significantly associated with depression (identified by Als, T.D. et al., 2023 [6]), transcriptome datasets from the ACC (Oh, H. et al., 2022 [17], and Ramaker, R.C. et al., 2017 [16]), bulk RNA-seq datasets from blood samples (Cathomas, F. et al., 2022 [15], and Trang, T.L. et al., 2018 [14]), and the transcriptomic meta-analysis by Wittenberg, G.M. et al., 2020 [8], which found 343 genes with a false discovery rate <5%. This approach identified 31 genes shared between genomic and transcriptomic studies (Fig. 2a-c), further visualized in a gene-BPs network (Fig. 2d), indicating their potential involvement in common biological processes related to MDD. Six genes from the GWAS meta-analysis (OR2B2, CD40, CHML, FES, SAMD5, and SERPING1) were also identified as DEGs in the ACC datasets (Oh, H. et al., 2022 [17]; Ramaker, R.C. et al., 2017 [16]) (Fig. 2a). Additionally, 25 genes from the GWAS meta-analysis were identified as DEGs in the blood transcriptomes. Specifically, 17 genes overlapped with the Trang, T.L. et al., 2018 [14] dataset (Fig. 2b; ACVR1B, ARIH2, FLRT1, HARS2, MLEC, MST1, NEGR1, NRD1, PAX6, POGZ, PPP6C, RBM4, SHISA9, SORCS3, STK32A, TCF4, and USP19), two genes overlapped with the Cathomas, F. et al., 2022 [15] dataset (Fig. 2b; HIST1H3I and KCNJ13), and six genes (Fig. 2c; BTG3, CTNNA3, NUP43, PPP3CC, RERE, and SHANK2) were found in the Wittenberg, G.M. et al., 2020 [8] transcriptomic meta-analysis. These results underscore the potential involvement of these shared genes in the molecular mechanisms underlying MDD and suggest specific gene relationships that may be critical in the disorder’s pathology.
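As a quick consistency check on the counts reported above, the overlapping genes can be tallied per source. The gene symbols are copied from the text; the dictionary keys are informal labels chosen here for illustration.

```python
# Genes reported as overlapping with the GWAS meta-analysis, grouped by the
# transcriptomic source in which they were differentially expressed.
reported_overlaps = {
    "ACC (Oh 2022; Ramaker 2017)": [
        "OR2B2", "CD40", "CHML", "FES", "SAMD5", "SERPING1",
    ],
    "Blood (Trang 2018)": [
        "ACVR1B", "ARIH2", "FLRT1", "HARS2", "MLEC", "MST1", "NEGR1", "NRD1",
        "PAX6", "POGZ", "PPP6C", "RBM4", "SHISA9", "SORCS3", "STK32A", "TCF4",
        "USP19",
    ],
    "Blood (Cathomas 2022)": ["HIST1H3I", "KCNJ13"],
    "Blood meta-analysis (Wittenberg 2020)": [
        "BTG3", "CTNNA3", "NUP43", "PPP3CC", "RERE", "SHANK2",
    ],
}

all_genes = [gene for genes in reported_overlaps.values() for gene in genes]
total_shared = len(all_genes)          # all shared genes
unique_shared = len(set(all_genes))    # equals total if no gene is listed twice
blood_shared = sum(
    len(genes) for source, genes in reported_overlaps.items()
    if source.startswith("Blood")
)
acc_shared = len(reported_overlaps["ACC (Oh 2022; Ramaker 2017)"])
```

The tally gives 6 ACC genes plus 25 blood genes (17 + 2 + 6), for 31 shared genes with no symbol repeated across sources.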

Fig. 2: Overlapping Genes Associated with Depression in Genome-Wide and Transcriptome Datasets.

a-c Venn diagrams display the number of shared genes identified across each primary dataset. d The 31 shared genes are shown in a gene-biological processes (BPs) network. Yellow and green nodes represent DEGs from blood and cingulate cortex datasets, respectively. Gray nodes represent BPs, and edges denote gene-BP associations.

Altered neuroimmune gene correlations define the stratification of major depressive disorder

Next, we evaluated the ability of neuroimmune gene expression to differentiate MDD patients from healthy controls (HC), a finding that could have important implications for personalized medicine. Principal Component Analysis (PCA) revealed that the overlapping DEGs between the GWAS meta-analysis (Als, T. D. et al., 2023 [6]) and the ACC dataset (Ramaker, R.C. et al., 2017 [16]) stratified MDD patients from HC (Extended Data Fig. 3). Additionally, PCA with the DEGs in PBMCs from Trang, T.L. et al., 2018 [14] (Fig. 3a, b), which exhibited the highest number of overlapping DEGs (n = 17) with the GWAS meta-analysis (Als, T. D. et al., 2023 [6]), effectively stratified MDD patients from HC.

Fig. 3: Stratification and Relationships Between Shared Genes Across the Genomic and Transcriptomic Primary MDD Studies.

a The bar plot displays the percentage of explained variance for each principal component (PC) dimension from a Principal Component Analysis (PCA) with spectral decomposition. b The PCA plot illustrates the stratification of MDD patients and healthy controls based on DEGs from the dataset of Trang, T.L. et al. (2018). Genes with positive correlations align on the same side of the plot, while negatively correlated genes point in the opposite direction. Small circles represent concentration ellipses around the mean points of each group. The accompanying histograms depict the density distribution of samples (individuals) for each group. c Heatmaps obtained from exploratory factor analysis of the 17 shared genes with negative and positive loadings are visualized for MDD patients and healthy controls from the Trang, T.L. et al. dataset. d Correlograms show the topological correlation pattern among shared genes. The color scale bar represents the range of Spearman’s rank correlation coefficient for controls (left side of the graph) and MDD patients (right side). The size of the gene names is proportional to the number of associations they have with other genes in the network.

We then conducted further analyses to elucidate the relationship patterns among the identified shared genes. For this purpose, we used the dataset from Trang, T.L. et al., 2018 [14], which contains the highest number of shared DEGs with the GWAS meta-analysis. Exploratory Factor Analysis (EFA) revealed an overall increase in the factor loadings of DEGs in the MDD group (Fig. 3c). Likewise, correlation analysis based on Spearman’s rank correlation coefficient showed that, while several correlations were maintained, the topology of the correlation patterns among DEGs changed in MDD patients (Fig. 3d). For instance, PAX6 and MST1 had more connections with other genes in MDD patients than in HC. However, we also observed a few reduced associations, including that between SORCS3 and MLEC. This analysis underscores the complexity of the molecular interactions in MDD and highlights specific gene relationships that may play critical roles in the disorder’s pathology. Hence, these results indicate an interplay between DEGs in MDD through changes in systemic relationships.
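The idea of comparing correlation topology between groups can be sketched as follows: compute Spearman's rank correlation for each gene pair within a group and count, per gene, how many pairs exceed a cutoff. This is an illustrative sketch only; the gene labels, expression values, and the 0.6 threshold are assumptions, not values from the study.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation (inputs must not be constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def degree_by_gene(expr, threshold=0.6):
    """Count, per gene, the partners with |rho| >= threshold (assumed cutoff)."""
    degree = {gene: 0 for gene in expr}
    for a, b in combinations(expr, 2):
        if abs(spearman(expr[a], expr[b])) >= threshold:
            degree[a] += 1
            degree[b] += 1
    return degree


# Toy expression values for three hypothetical genes in two groups.
controls = {"GENE_A": [1, 2, 3, 4, 5], "GENE_B": [2, 1, 4, 3, 5], "GENE_C": [5, 1, 4, 2, 3]}
patients = {"GENE_A": [1, 2, 3, 4, 5], "GENE_B": [1, 2, 3, 5, 4], "GENE_C": [2, 1, 3, 4, 5]}

control_degree = degree_by_gene(controls)
patient_degree = degree_by_gene(patients)
```

In this toy example each gene gains connections in the patient group, which is the kind of per-gene change in connectivity that the gene-name sizes in a correlogram summarize.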

Probability of developing MDD based on gene expression

To elucidate the biological aspects of DEGs in MDD from a systems biology perspective, we utilized a multivariate analysis of variance (MANOVA) to quantify the relative effects of gene expression in MDD patients compared with healthy controls. Using the dataset from Trang, T.L. et al., 2018 [14], the relative effect analysis identified a subset of DEGs, including SORCS3, PPP6C, NEGR1, and PAX6 (Fig. 4a). These genes stood out due to their non-overlapping confidence intervals, suggesting a more pronounced and distinct role in MDD pathophysiology. To guard against spurious significance due to multiple comparisons, we applied a false discovery rate (FDR) correction. NEGR1, PAX6, PPP6C, and SORCS3 remained significant DEGs in the dataset from Trang, T.L. et al., 2018 [14] after FDR correction (Extended Data Fig. 4). However, none of these four DEGs showed significant expression changes in ACC samples (Extended Data Fig. 5).
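For readers unfamiliar with FDR control, the following minimal sketch computes Benjamini-Hochberg adjusted p-values. The text does not state which FDR procedure was used, so Benjamini-Hochberg is an assumption here, and the p-values are arbitrary illustrative numbers rather than results from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Arbitrary example p-values for four hypothetical tests.
raw_p = [0.01, 0.04, 0.03, 0.5]
adj_p = benjamini_hochberg(raw_p)
significant = [p <= 0.05 for p in adj_p]
```

With these inputs only the first test stays below an adjusted threshold of 0.05, which shows how a nominally significant result (for example, a raw p-value of 0.04) may not survive correction.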

Fig. 4: The Probability of Developing MDD Based on the Expression of NEGR1, PAX6, PPP6C, and SORCS3 Genes.

The relative effect of neuroimmune genes in MDD. a The graphic shows the relative effects (calculated using the MANOVA test) of DEGs present in PBMCs (the dataset from Trang, T.L. et al., 2018) of MDD patients versus healthy controls. The circle size indicates the probabilistic measure (relative effect size). Shadows show confidence intervals. Turquoise and dark purple dots represent healthy individuals and MDD patients, respectively. b The heat map displays the enriched biological processes (BPs), molecular functions (MFs), and synapse-related processes based on the SynGO 2024 database for the four selected genes. The graphic at the bottom shows the number of enriched processes. c Scatter plots display the binomial logistic regression results for the expression of each gene, NEGR1, PAX6, PPP6C, and SORCS3, across the MDD group compared to healthy controls (dataset: Trang, T.L. et al., 2018). The MDD groups are represented in black for “yes” (1 = presence of MDD) and green for “no” (0 = healthy controls).

Notably, enrichment analysis of the biological processes, molecular functions (MFs), and synapse-related processes associated with these four DEGs, based on the SynGO 2024 database, also suggested functional roles in neurobiological pathways in PBLs, supporting their value as indicators of MDD risk (Fig. 4b). Among the potential neuroimmune roles ascribed to these four DEGs, NEGR1 and SORCS3 are implicated in various synapse-related processes (Fig. 4b). Specifically, SORCS3 is involved in neuropeptide receptor activity, while NEGR1 participates in synaptic organization and function. PPP6C, in contrast, is associated with protein dephosphorylation, the negative regulation of defense response, and several cell-cycle-associated processes. PAX6, a transcription factor, plays a crucial role in neuron fate commitment, the positive regulation of transcription regulatory region DNA binding, the regulation of neurogenesis, glucose homeostasis, and visual perception. The diverse functional roles of these DEGs suggest that they may be not only predictive biomarkers for MDD but also key molecules in the complex molecular networks that underlie the disorder’s pathophysiology.

Considering the above findings, we proceeded to evaluate the predictive potential of these DEGs for MDD risk. Through logistic regression analysis, we identified that only PAX6 and SORCS3 exhibited significant associations with the likelihood of developing MDD (Fig. 4c and Extended Data Fig. 6). These results underscore the potential of PAX6 and SORCS3 as reliable biomarkers, providing further evidence of their critical involvement in the neuroimmune mechanisms underlying MDD. Moreover, the distinct functional roles of these DEGs within neurobiological and synaptic processes, coupled with their statistical significance in a prediction model, suggest their potential utility in early diagnostics for MDD management.
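To make the modelling step concrete, the sketch below fits a single-predictor binomial logistic regression by gradient descent, mapping one gene's expression to the probability of the "MDD" label. The expression values, labels, learning rate, and iteration count are illustrative assumptions, and the sketch does not compute the significance tests reported in the study.

```python
from math import exp


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def fit_logistic(x, y, lr=0.1, n_iter=5000):
    """Fit P(y=1 | x) = sigmoid(b0 + b1 * x) by batch gradient descent."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            err = sigmoid(b0 + b1 * xi) - yi  # gradient of the log-loss
            g0 += err
            g1 += err * xi
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


# Toy standardized expression values and labels (1 = MDD, 0 = healthy control).
expression = [-1.5, -1.0, -0.5, 0.0, 0.2, 0.5, 1.0, 1.5]
has_mdd = [0, 0, 1, 0, 1, 0, 1, 1]

intercept, slope = fit_logistic(expression, has_mdd)
prob_low = sigmoid(intercept + slope * -1.5)
prob_high = sigmoid(intercept + slope * 1.5)
```

A positive `slope` means the modelled probability of MDD rises with expression, which is the relationship a scatter plot of fitted probabilities against expression would display.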

Diseasome of NEGR1, PAX6, PPP6C, and SORCS3

To gain a deeper understanding of the implications of NEGR1, PAX6, PPP6C, and SORCS3 in MDD, we conducted a comprehensive diseasome analysis. The diseasome conceptual framework maps the genetic and molecular connections between diseases, providing a systems view of how various disorders are interlinked through shared genetic factors. This approach revealed the gene-disease associations (GDAs) for the NEGR1, PAX6, PPP6C, and SORCS3 genes. PAX6 emerged as the gene most associated with different diseases, followed by NEGR1, SORCS3, and PPP6C (Extended Data Fig. 7a). The heatmap (Extended Data Fig. 7b) specifically highlights diseases related to at least two genes, revealing potential shared genetic factors among various mood and mental disorders and their comorbidities, such as intellectual disability, obesity, diabetes, and malignancies. Additionally, single nucleotide polymorphisms (SNPs) in these genes have been characterized and associated with MDD-related symptoms, as reported in the GWAS catalog (Extended Data Figs 7c, d).

Consistent differential expression of PAX6 across blood and amygdala in MDD

To expand our understanding of neuroimmune interactions between the blood and the central nervous system (CNS), we examined whether NEGR1, PAX6, PPP6C, and SORCS3 genes exhibit differential expression in CNS regions beyond the ACC. We aimed to identify common DEGs across these regions and compare them with the meta-analysis results of Als, T.D. et al. 2023 [6]. Transcriptomic data for the amygdala, which is involved in emotional regulation, stress response, and mood processing, were obtained from the primary dataset of Labonté B. et al. 2017 [18], while data for additional CNS regions were extracted from Li, J.Z. et al. 2013 [19].

We identified varying numbers of overlapping DEGs across different CNS regions, with the amygdala showing the highest number of DEGs (Fig. 5a). Notably, PAX6 was identified among the DEGs in the amygdala (Fig. 5b). In contrast, NEGR1, PPP6C, and SORCS3 did not exhibit significant differential expression in other CNS regions. This suggests that the expression of these genes may be region-specific within the CNS or that other factors, such as the local tissue environment or disease states, could influence their regulation. Nonetheless, several other DEGs identified in our study overlapped with those reported in the Als, T.D. et al., 2023 [6] meta-analysis, highlighting critical genes implicated in both CNS and peripheral blood transcriptomes of MDD patients. For instance, genes such as PAX6, ACVR1B, POGZ, and FLRT1 were detected in the blood transcriptome dataset from Trang, T.L. et al. 2018. Additionally, KCNJ13 was identified as a DEG in Cathomas, F. et al. 2022, while CTNNA3 emerged as a DEG in the Wittenberg, G.M. et al. 2020 [8] meta-analysis.

Fig. 5: Differentially Expressed Genes Overlapping Across Various Central Nervous System Regions with Als, T.D. et al., 2023 Meta-analysis.

a The figure shows the names of different CNS regions along with the number of overlapping DEGs. Transcriptome data for the amygdala were obtained from the primary dataset by Labonté, B. et al. (2017), while data for other CNS regions were sourced from Li J.Z. et al. (2013). b The network illustrates the DEGs overlapping with those identified in the Als, T.D. et al. 2023 meta-analysis. The PAX6, ACVR1B, POGZ, and FLRT1 genes are also present in the blood transcriptome data from Trang, T.L. et al. (2018). Additionally, KCNJ13 is identified as a DEG in the study by Cathomas, F. et al. 2022, and CTNNA3 is found as a DEG in the meta-analysis by Wittenberg, G.M. et al. 2020.

Taken together, the shared DEGs, particularly PAX6, point to potential biomarkers, providing a more integrative perspective on the systemic and neurobiological processes involved in MDD.

Immunomodulatory effects of chronic restraint stress on PAX6 expression and immune cell populations in mice

The findings above indicate that PAX6 functions as a critical molecular predictor of MDD. To further explore its role, we assessed stress-induced PAX6 protein expression in mice. Using flow cytometric analysis, we assessed immune cell populations and PAX6 expression in whole blood from mice subjected to chronic restraint stress (CRS) for 7 or 18 days, a well-established model for inducing anhedonic behaviors in rodents [20]. Across both the 7- and 18-day paradigms, our analysis identified four distinct immune cell clusters (Fig. 6a-c; Extended Data Fig. 8 [gating strategy] and Extended Data Fig. 9 [flow cytometric results following 18 days of CRS]), characterized by specific markers (CD45, TCRβ, CD3, CD4, CD8, CD19, CD11b, and Ly6C). At 7 days post-stress, Cluster 4, defined by CD11b and Ly6C expression, exhibited a significant upregulation of PAX6 (Fig. 6d), suggesting its involvement in stress-mediated immune responses. Notably, we recently demonstrated that depressive-like behaviors have not yet emerged at this time point [20]. Furthermore, the increase in PAX6 expression in CD11b+Ly6C+ cells correlated with a higher frequency of these cells, indicating an expansion of inflammatory myeloid populations, including monocytes and neutrophils. In this model, depressive-like behaviors become evident at 18 days post-stress [20]. Interestingly, at this later stage, we observed that PAX6 expression in myeloid cells (Cluster 1) returned to baseline levels (Extended Data Fig. 9d), suggesting a transient role in early stress responses.

Fig. 6: Immunophenotyping of Immune Cells in C57BL/6J Mice Subjected to Chronic Restraint Stress (CRS).

a-c Flow cytometry analysis identifies four distinct immune cell clusters based on the expression of CD45, TCRβ, CD3, CD4, CD19, CD11b, and Ly6C. d A detailed analysis of immune cell clusters reveals upregulation of PAX6 expression in immune cells following 7 days of CRS. The figure demonstrates the immunomodulatory effects of CRS, highlighting an increase in PAX6 expression in Cluster 4, which consists of a higher number of myeloid cell populations (CD11b and Ly6C).

Taken together, these findings indicate that PAX6 upregulation in inflammatory myeloid cells may represent an early immune response to stress, potentially preceding the onset of depressive behaviors. The subsequent normalization of PAX6 expression at 18 days post-stress raises the possibility that its early activation contributes to immune modulation rather than sustained immune dysfunction. These insights highlight PAX6 as a potential biomarker for stress-related immune alterations, warranting further investigation into its functional role in MDD.

Discussion

To our knowledge, this work represents the first dataset-based analysis integrating genomics, transcriptomics, and immunophenotyping to validate leukocyte-expressed PAX6 as a critical predictor of MDD. A key strength of our study is the enhanced robustness and generalizability achieved through the integration of the most extensive published GWAS meta-analysis worldwide [6], encompassing over 1.3 million individuals, and the most comprehensive PBL transcriptomic meta-analysis by Wittenberg, G.M. et al. [8]. Broadly, our findings align with a recent integrative analysis of transcriptional datasets using the RobustRankAggreg package (RRA) by Zhong et al. [21]. These authors investigated the correlation between immune-associated DEGs in PBMC and ACC datasets. They also found more robust DEGs in the blood than in the ACC. However, as described here, the 25 overlapping genes found in PBLs form a network of classical molecules associated with the nervous system. These DEGs stratify MDD and healthy control groups in PCA, with more pronounced separation between groups using PBL-derived DEGs than those from the ACC.

To gain new insights into MDD pathophysiology, correlation analysis identified altered connections between the common genomic and transcriptomic DEGs in PBLs. For instance, the increased connections of PAX6 and MST1 with other genes may implicate these as central hubs in MDD pathology, potentially influencing multiple downstream biological processes. PAX6, known for its role in neural development [22], and MST1, involved in apoptosis and immune response [23], indicate new neuroimmune pathways that could be critical in the development and progression of MDD. Similarly, the reduced associations between SORCS3, which is involved in neuronal development and synaptic function [24, 47], and MLEC, a promoter of M1 to M2 macrophage polarization [25], suggest potential dysregulation or shifts in molecular interactions that merit further investigation. These findings suggest a dynamic relationship between the immune and nervous systems that extends beyond the known “crosstalk” between both systems, involving the release and diffusion of neurotransmitters from nervous tissue to regulate immune cells through leukocyte-surface receptors [26,27,28,29]. Consistent with this, immune cells can also synthesize and release neurotransmitters [30, 31], including acetylcholine (ACh) [32,33,34], serotonin [35,36,37], and glutamate [38] that modulate cell activation, acting as autocrine or paracrine modulators.

Growing evidence supports the association of NEGR1, PAX6, PPP6C, and SORCS3 genes with MDD. For example, previous research has implicated PAX6 in neurodevelopmental processes and visual perception [39, 40], which are relevant to the cognitive and sensory symptoms observed in MDD [41, 42]. Moreover, the involvement of PPP6C in protein dephosphorylation and cell-cycle processes [43] aligns with existing literature suggesting dysregulation of cell-cycle signaling [44] in MDD. Additionally, the involvement of NEGR1 and SORCS3 in synaptic function and neuropeptide receptor activity [24, 45, 46], alongside their associations with other mental disorders [47], suggests that synaptic dysregulation could be a shared mechanism.

Utilizing a diseasome analysis, we mapped the GDAs for NEGR1, PAX6, PPP6C, and SORCS3, revealing their potential roles in a spectrum of human disorders. This approach provides a valuable systems-level perspective on the pathophysiology of MDD. The shared genetic associations of NEGR1, PAX6, PPP6C, and SORCS3 with other diseases underscore the complex interplay between genetic factors in MDD and other conditions. For instance, PAX6, a key regulator of neurodevelopment, has been implicated in cognitive function and intellectual disability [39, 40]. Its dysregulation in MDD may contribute to the cognitive deficits frequently observed in these patients. Additionally, the association between PAX6 expression in immune cells and its upregulation under stress conditions supports the hypothesis that neurodevelopmental genes play a role in immune-mediated pathways relevant to MDD. Similarly, NEGR1 and SORCS3, which are involved in synaptic function and neuropeptide receptor activity [46, 48], have been linked to both mood disorders and metabolic conditions such as obesity and diabetes [47, 49].

This raises the possibility that dysregulation of these genes contributes not only to the emotional and cognitive symptoms of MDD but also to the high prevalence of metabolic disturbances in affected patients. This aligns with evidence suggesting that chronic stress and altered neuroimmune signaling can induce metabolic changes, reinforcing the bidirectional relationship between obesity and depression [4, 50]. Furthermore, PPP6C, which plays a role in cell-cycle regulation and immune modulation, has been associated with malignancies [43]. This underscores the systemic impact of immune dysregulation in MDD, extending beyond neuroinflammation to systemic inflammatory conditions. Such findings may help explain the high comorbidity between MDD and metabolic syndrome, autoimmune diseases, and certain cancers. By integrating these GDAs into a neuroimmune framework, our findings highlight that genetic factors influencing both neuronal and immune function contribute to the diverse symptomatology and comorbidities observed in MDD patients [51]. This overlap emphasizes the need for an integrative approach to understanding MDD, considering its unique and shared pathophysiological aspects [10].

Consistent with the interconnection between the nervous and immune systems, our findings in a mouse model of CRS suggested that, in addition to its role in neurogenesis, PAX6 may also be implicated in the immune response, particularly under conditions of psychological stress. We utilized a well-established CRS mouse model that mimics key physiological and molecular aspects of depression. However, while this model effectively captures stress-related biological changes, it has inherent limitations in fully replicating the complex symptomatology of MDD in humans. Our findings provide novel insights into the dynamic role of PAX6 in stress-induced immune responses and its potential relevance to MDD pathophysiology. The early upregulation of PAX6 in CD11b+Ly6C+ myeloid cells following seven days of chronic restraint stress suggests that it may serve as an early marker of stress-induced immune activation, preceding the onset of depressive-like behaviors [23].

This transient expression pattern, with PAX6 levels returning to baseline by 18 days post-stress, indicates that its role may be more closely associated with an acute immune adaptation to stress rather than a sustained pro-inflammatory state linked to depression. The observed expansion of inflammatory myeloid cells at seven days post-stress, coupled with the subsequent normalization of both cell frequency and PAX6 expression at 18 days, suggests that early immune activation may play a protective or compensatory role in stress resilience. These results support the growing evidence that neuroimmune interactions are critical in the early phases of stress adaptation and may help distinguish stress-responsive immune changes from those that contribute to the onset and progression of MDD [52]. Research indicates that chronic stress can lead to physiological and hormonal adaptations, resulting in neurobiological consequences and behavioral impairments akin to those observed in psychiatric disorders. Additionally, disruptions in neuroimmune mechanisms have been implicated in depression, underscoring the importance of understanding these interactions in the context of MDD pathophysiology [10]. Future studies should explore whether PAX6-driven immune responses are mechanistically involved in modulating stress resilience or vulnerability.

However, our study has limitations. By employing an exploratory, large-scale multi-omics integration approach, we prioritized robust molecular signatures in MDD rather than establishing causal relationships through mechanistic experiments. While this discovery-driven strategy lacks direct validation, it provides a crucial foundation for future targeted studies. By consistently identifying DEGs across datasets and linking them to neuroimmune pathways, we highlight key candidates such as PAX6 for functional investigations, including gene perturbation and pathway analyses. This systems-level framework strengthens empirical evidence to guide future mechanistic research and enhance translational potential.

Another limitation of our study is the clinical heterogeneity across the datasets included in our integrative analyses. MDD encompasses a spectrum of clinical phenotypes, ranging from first-episode to chronic and treatment-resistant forms, and is often accompanied by comorbid psychiatric and somatic conditions. These distinctions are not trivial, as they may influence the immune and transcriptomic landscapes under investigation. For instance, the dataset by Cathomas et al. (2022), which was included in the meta-analysis by Wittenberg et al. (2020), focuses specifically on patients with treatment-resistant depression (TRD), a population that may exhibit more pronounced immune dysregulation compared to non-TRD individuals. Moreover, variation in treatment status, symptom severity, duration of illness, and comorbidity profiles likely introduces biological variability that could confound direct comparisons or obscure subtype-specific molecular signatures. Future studies with harmonized clinical phenotyping and stratified subgroup analyses will be essential to disentangle how specific clinical profiles modulate neuroimmune interactions in MDD. This represents a key direction for the field and a necessary step toward personalized neuroimmunological biomarkers and interventions.

In conclusion, we not only identified a set of DEGs with pronounced effects on MDD risk and potential as biomarkers but also delved deeper to understand their systemic implications within the context of disease networks. Understanding how these genes contribute to the neuroimmune processes in MDD could unveil novel insights into the disorder’s etiology. By elucidating the molecular underpinnings of MDD, we can move closer to developing personalized treatment strategies that address the specific biological mechanisms underlying each patient’s condition. Future studies are warranted to functionally validate the roles of these genes and their interactions in MDD pathogenesis. Additionally, future research should focus on validating these findings in larger, independent cohorts and exploring the functional implications of these altered gene relationships. This knowledge can pave the way for developing novel diagnostic tools that explore these altered molecular pathways.

Online methods

Extended Data information, including demographic characteristics of MDD patients and the input/output transcriptomic data used to generate the Figures using R programming language, is available in Supplementary Tables.

Data curation

Genomic dataset

To identify genetic changes associated with MDD, we used as a reference the genome-wide set of depression-associated genes recently published by Als, T.D. et al. (2023) [6]. This association was determined using the Multi-Marker Analysis of GenoMic Annotation (MAGMA) tool, as implemented in the Functional Mapping and Annotation of Genome-Wide Association Studies (FUMA) platform v.1.3.6a [53, 54]. This approach identified 411 genes significantly associated with depression after Bonferroni correction for the number of genes tested.

Transcriptomic datasets

We searched for eligible transcriptome datasets published in the online library PubMed (https://pubmed.ncbi.nlm.nih.gov/) and Gene Expression Omnibus (GEO) [55] indexed under the terms “major depressive disorder” and “human.” We found 30 studies reporting whole-genome transcriptional data.

Studies were subsequently excluded based on the following criteria: 1) studies available only in original articles and not in databases (n = 6), 2) studies using snRNA-Seq (single-nucleus RNA sequencing) (n = 1), 3) studies addressing other categories of depression besides MDD (n = 1), 4) datasets from neurons, astrocytes, and immortalized B lymphocyte cell lines (n = 11), and 5) datasets with fewer than 50 significant DEGs (n = 1). This resulted in 10 eligible datasets published in 10 different articles, which were accessible for reanalysis.

Notably, our analysis initially focused on ACC tissue due to the availability of RNA sequencing data, while the amygdala data relied on microarrays. To broaden our understanding of MDD, we incorporated additional brain regions despite technological differences. Two of these transcriptome datasets were derived from the ACC of patients with depression and healthy controls (Oh, H. et al. [2022] - GSE193417 [17] and Ramaker, R.C. et al. [2017] - GSE80655 [16]) from bulk RNA-seq. Another dataset included six brain regions, namely the Anterior Insula (aINS), Cingulate Gyrus 25 (Cg25), Dorsolateral Prefrontal Cortex (DLPFC), Nucleus Accumbens (Nac), Orbitofrontal Cortex (OFC), and Subiculum (Sub), from patients with depression and healthy controls (Labonté et al. [2017] - GSE102556) from bulk RNA-seq [18]. Additionally, a microarray dataset covered the amygdala of patients with depression and healthy controls (Li J. Z. et al. [2013] - GSE45642) [19], and two further datasets provided bulk RNA-seq transcriptome data from blood samples of MDD patients and healthy controls (Cathomas, F. et al. [2022] [15] - GSE18855 and Trang T.L. et al. [2018] [14] - data available at GitHub: https://github.com/insilico/DepressionGeneModules).

We also incorporated the meta-significant genes identified by Wittenberg, G.M. et al. (2020) [8], who evaluated four datasets from different sequencing methods, to enhance the robustness of our analyses using bulk RNA-seq and microarray data.

Differential expression and functional enrichment analyses

The read counts of each dataset were transformed into log2 counts per million (CPM), and DEGs between the groups (MDD patients versus healthy controls) were identified using the DESeq2 pipeline in R (v. 4.3.2) [56], with an adjusted p-value < 0.05, ensuring that only the most robust and statistically significant expression changes were considered in the analysis.
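
The log2 CPM transformation mentioned above can be illustrated with a minimal Python sketch. It is not the authors' R pipeline: the function name log2_cpm, the toy counts, and the prior_count offset (used here only to avoid taking the logarithm of zero) are assumptions introduced for illustration.

```python
import math


def log2_cpm(counts, prior_count=0.5):
    """Convert one sample's raw read counts to log2 counts per million.

    counts: dict mapping gene symbol -> raw read count.
    prior_count: assumed small offset that avoids log2(0); the paper does
    not state which offset, if any, was applied.
    """
    library_size = sum(counts.values())
    if library_size <= 0:
        raise ValueError("sample has no reads")
    return {
        gene: math.log2((n + prior_count) / library_size * 1_000_000)
        for gene, n in counts.items()
    }


# Hypothetical toy sample (values are not taken from any dataset in the study).
sample = {"PAX6": 250, "NEGR1": 750, "SORCS3": 0}
for gene, value in log2_cpm(sample).items():
    print(f"{gene}: {value:.2f}")
```

Scaling by library size makes samples with different sequencing depths comparable before any between-group comparison.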

Functional enrichment analysis of biological processes (BPs) was performed using gene ontology analysis. For this purpose, we utilized the 31 genes common to both genomic and transcriptomic data associated with depression. BPs were identified using the clusterProfiler package [57] in R. Enriched BPs were defined as significant based on an adjusted p-value < 0.05. The significance and magnitude of biological term enrichment are visualized in dot plots generated with clusterProfiler. Furthermore, we utilized network graphs to illustrate interactions between enriched terms and DEGs, employing the ggnet2 package [58] in R (described below). This approach provides a comprehensive view of complex biological relationships, enhancing our understanding of the biological processes underlying depression.

To visualize neuroimmune interactions, we filtered significant gene enrichment associations (p-value < 0.05) and categorized enriched BP into “neuro,” “immune,” or “other.” Subsequently, we created a network object to display interactions (edges) between BPs and genes (nodes) using ggnet2 [58] in R.
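
The filtering and categorization step can be sketched as follows. This is an illustrative Python outline rather than the R code used in the study: the keyword lists, the function names categorize_bp and build_edges, and the toy enrichment records are all assumptions, since the paper does not state how terms were assigned to each category.

```python
# Assumed keyword lists; the actual assignment rules are not given in the paper.
NEURO_KEYWORDS = ("neuro", "synap", "axon", "nervous")
IMMUNE_KEYWORDS = ("immune", "leukocyte", "cytokine", "inflammat")


def categorize_bp(term):
    """Label a biological process term as 'neuro', 'immune', or 'other'."""
    name = term.lower()
    if any(keyword in name for keyword in NEURO_KEYWORDS):
        return "neuro"
    if any(keyword in name for keyword in IMMUNE_KEYWORDS):
        return "immune"
    return "other"


def build_edges(enrichment, alpha=0.05):
    """Keep significant terms and return (term, gene, category) edges.

    enrichment: list of (bp_term, p_value, genes) tuples.
    """
    edges = []
    for term, p_value, genes in enrichment:
        if p_value < alpha:
            category = categorize_bp(term)
            edges.extend((term, gene, category) for gene in genes)
    return edges


# Hypothetical enrichment records, for illustration only.
toy_enrichment = [
    ("axon guidance", 0.01, ["PAX6", "NEGR1"]),
    ("cytokine production", 0.20, ["PPP6C"]),
    ("leukocyte migration", 0.03, ["PPP6C"]),
]
for edge in build_edges(toy_enrichment):
    print(edge)
```

The resulting edge list is the kind of input a network plotting tool can draw, with the category available for coloring nodes.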

Integrated transcriptomic consensus strategy and cross-dataset convergence analysis

To address the heterogeneity inherent in transcriptomic datasets from different sources and platforms, we employed a consensus-based integration strategy designed to identify robust neuroimmune signals across tissues and cohorts (Extended Data Fig. 1). This methodology builds upon a framework previously established by our group [59].

We curated a set of publicly available transcriptomic datasets from individuals diagnosed with MDD and matched healthy controls, derived from both bulk RNA-seq and microarray platforms. These included transcriptomes from PBLs and CNS regions, sourced from the Gene Expression Omnibus (GEO) and associated repositories.

Each dataset underwent differential gene expression analysis independently using the DESeq2 pipeline. To overcome the limited overlap of DEGs across studies, we focused on the integration of BPs. GO enrichment analyses were performed on DEGs from each dataset, followed by semantic similarity-based clustering of GO terms using the rrvgo package. This allowed us to detect shared neuroimmune biological themes even in the absence of gene-level concordance.

To visualize and validate cross-dataset convergence, we performed Uniform Manifold Approximation and Projection (UMAP) dimensionality reduction and network analysis of the enriched terms. Additionally, we conducted meta-analytic integration with the largest GWAS of depression to date (Als et al., 2023) and the most comprehensive transcriptomic meta-analysis of PBLs (Wittenberg et al., 2020), identifying a core set of 31 shared genes implicated in MDD.

This integrative systems framework enabled us to uncover consistent neuroimmune signatures across independent datasets (described below), tissues, and analytical methods, enhancing the robustness and translational relevance of our findings.

Analysis of neuroimmune clusters

To evaluate the neuroimmune clusters formed by BPs, we utilized the web tool “appyters,” integrated with EnrichR [60]. This approach identifies enriched BPs and organizes them into clusters on a scatter plot, facilitating visualization of how the terms relate to one another. Terms are projected into two dimensions using Uniform Manifold Approximation and Projection (UMAP) for dimensionality reduction and grouped into clusters using the Leiden algorithm.

Principal component analysis

Principal Component Analysis (PCA) [61] with spectral decomposition [62] was used to assess the ability of transcripts to differentiate between patients with MDD and healthy controls. This analysis was performed using the prcomp and princomp functions (provided by the base R stats package) together with the factoextra package [63,64,65] in R. PCA decomposes the dataset into principal components, which helps visualize the variance within the data and evaluate how well the genes distinguish between the groups.
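
To make the decomposition concrete, the following Python sketch extracts the first principal component from a small matrix. It deliberately swaps in power iteration on the covariance matrix as a simple stand-in for the spectral decomposition performed by the R functions; the function names and toy data are assumptions, and the sketch does not handle the edge case in which the starting vector is orthogonal to the leading component.

```python
import math


def center(data):
    """Subtract the column mean from every value (rows = samples)."""
    n = len(data)
    means = [sum(col) / n for col in zip(*data)]
    return [[v - m for v, m in zip(row, means)] for row in data]


def covariance(centered):
    """Sample covariance matrix of already-centered data."""
    n, p = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def first_component(cov, n_iter=500):
    """Leading eigenvector and eigenvalue via power iteration (assumed method)."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)
    )
    return v, eigenvalue


# Hypothetical expression values for two genes across four samples.
toy = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
cov = covariance(center(toy))
loadings, variance = first_component(cov)
explained = variance / sum(cov[i][i] for i in range(len(cov)))
print(loadings, round(explained, 3))
```

The share of total variance captured by a component (the eigenvalue divided by the trace of the covariance matrix) is what indicates how much group structure a PCA plot can display.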

Correlation analysis among DEGs

We performed multivariate correlation analysis among DEGs within each group (controls and MDD) to identify and visualize their relationships, as previously described [66]. Correlation plots were created with the corrgram package [67, 68] in R. The psych package [69] was employed for calculating correlations and conducting additional statistical analyses, while inlmisc [70] was used for data manipulation and presentation. Additionally, correlations among DEGs were visualized in circular network formats using the qgraph package [71] in R.
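
As an illustration of the per-group correlation step, the Python sketch below computes a gene-by-gene correlation matrix for one group. It assumes Pearson correlation, which the paper does not specify, and the function names and expression values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(expression):
    """expression: dict gene -> list of values in the same sample order."""
    genes = list(expression)
    return {
        g1: {g2: pearson(expression[g1], expression[g2]) for g2 in genes}
        for g1 in genes
    }


# Hypothetical expression values for one group (for example, controls).
controls = {
    "PAX6": [5.1, 5.4, 4.9, 5.8],
    "NEGR1": [7.2, 7.5, 7.0, 7.9],
    "SORCS3": [3.3, 3.0, 3.6, 2.8],
}
for gene, row in correlation_matrix(controls).items():
    print(gene, {other: round(r, 2) for other, r in row.items()})
```

Running the same function separately on the control and MDD groups gives two matrices whose differences can then be inspected visually.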

Exploratory factor analysis (EFA)

A critical aspect of Exploratory Factor Analysis (EFA) is uncovering underlying structures in real-world problems [72]. Specifically, R-mode methods of EFA aim to investigate the relationship between variables. The Dandelion Plot is an innovative method for visualizing EFA in R [72], providing a more adequate representation of factors.

Relative effect of gene expression on MDD

The relative effect of NEGR1, PAX6, PPP6C, and SORCS3 gene expression on the MDD phenotype was evaluated using Multivariate Analysis of Variance (MANOVA). Statistical analysis was conducted using the R packages npmv [73] and reshape [74], as previously described [59]. This approach tests whether groups differ significantly across the dependent variables and is particularly useful when multiple dependent variables are suspected to be correlated. Additionally, we computed confidence intervals for the relative effects using bootstrap techniques involving 1000 resamples. The data obtained were visualized using ggplot2 [75].
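
The idea of a relative effect with a bootstrap confidence interval can be sketched in Python as shown below. This is an assumption-laden illustration, not the npmv implementation: it uses the standard rank-based estimator (mean mid-rank of a group minus 0.5, divided by the total sample size), a percentile bootstrap that resamples within each group, and hypothetical expression values.

```python
import random


def midranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def relative_effects(groups):
    """groups: dict label -> list of values; returns label -> relative effect."""
    labels = list(groups)
    pooled = [v for label in labels for v in groups[label]]
    ranks = midranks(pooled)
    effects, start = {}, 0
    for label in labels:
        n = len(groups[label])
        mean_rank = sum(ranks[start:start + n]) / n
        effects[label] = (mean_rank - 0.5) / len(pooled)
        start += n
    return effects


def bootstrap_ci(groups, label, n_resamples=1000, level=0.95, seed=0):
    """Percentile bootstrap interval, resampling within each group (assumed)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        resampled = {g: rng.choices(v, k=len(v)) for g, v in groups.items()}
        estimates.append(relative_effects(resampled)[label])
    estimates.sort()
    lower = estimates[int((1 - level) / 2 * n_resamples)]
    upper = estimates[int((1 + level) / 2 * n_resamples) - 1]
    return lower, upper


# Hypothetical expression values for a single gene.
toy = {"control": [4.1, 4.8, 5.0, 5.3, 4.4], "mdd": [5.1, 5.9, 4.9, 6.2, 5.6]}
print(relative_effects(toy))
print(bootstrap_ci(toy, "mdd"))
```

A relative effect above 0.5 for a group means its values tend to be larger than a value drawn at random from the pooled sample.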

False discovery rate analysis

False Discovery Rate (FDR) analysis [76] was used to compare gene expression between diagnoses, using the packages dplyr [77], ggplot2 [75], rstatix [78], and ggpubr [79]. Initially, we grouped the data by gene and performed t-tests to compare mean expression between diagnoses, adjusting p-values using the FDR method. The results were visualized as boxplots generated with ggplot2 [75] in R, highlighting significant differences in expression between diagnoses (adjusted p-value < 0.05).
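
The p-value adjustment can be illustrated with a short Python sketch of the Benjamini-Hochberg procedure, which is what the "fdr" option of R's p.adjust computes. The function name and the example p-values are hypothetical; the sketch is not the authors' code.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical per-gene p-values from group-wise t-tests.
raw = {"NEGR1": 0.01, "PAX6": 0.04, "PPP6C": 0.03, "SORCS3": 0.005}
adjusted = benjamini_hochberg(list(raw.values()))
for gene, q in zip(raw, adjusted):
    print(f"{gene}: adjusted p = {q:.3f}, significant = {q < 0.05}")
```

Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values in the same order as the raw ones.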

Binomial logistic regression

We employed binomial logistic regression [80] to analyze the relationship between the expression of the four genes with FDR < 0.05 (NEGR1, PAX6, PPP6C, and SORCS3), used as predictor variables, and the binary outcome (MDD versus control). The model was fitted with the glm function from base R. Statistical significance was determined using a p-value < 0.05.
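
A minimal Python sketch of a binomial logistic model is given below for illustration. It fits the coefficients by plain gradient ascent on the log-likelihood, a simpler technique swapped in for the iteratively reweighted least squares used by R's glm, and it does not compute p-values. The function names, learning rate, iteration count, and toy data are assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, row):
    """Probability of the positive class; weights[0] is the intercept."""
    return sigmoid(weights[0] + sum(w * x for w, x in zip(weights[1:], row)))


def fit_logistic(X, y, learning_rate=0.1, n_iter=5000):
    """Fit intercept and slopes by gradient ascent (assumed settings)."""
    weights = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(n_iter):
        gradient = [0.0] * len(weights)
        for row, label in zip(X, y):
            error = label - predict_proba(weights, row)
            gradient[0] += error
            for j, x in enumerate(row):
                gradient[j + 1] += error * x
        weights = [w + learning_rate * g / n for w, g in zip(weights, gradient)]
    return weights


# Hypothetical single-gene expression values; 1 = MDD, 0 = control.
X = [[0.0], [1.0], [2.0], [3.0], [1.5], [1.6]]
y = [0, 0, 1, 1, 1, 0]
weights = fit_logistic(X, y)
print("intercept and slope:", [round(w, 3) for w in weights])
print("P(MDD | expression = 3.0):", round(predict_proba(weights, [3.0]), 3))
```

A positive slope indicates that higher expression is associated with higher odds of the MDD label in the toy data; with four genes, each row would simply hold four values.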

NEGR1, PAX6, PPP6C, and SORCS3 association with MDD and diseasome analyses

To further evaluate the association of NEGR1, PAX6, PPP6C, and SORCS3 with the MDD phenotype, we utilized data from the GWAS Catalog. We filtered the SNPs related to these genes based on their p-values and risk scores. The genomic locations of the genes and their associated SNPs were represented using a circos plot generated with the R circlize package [81]. Additionally, we visualized the significance, risk, and SNP characteristics using R ggplot2 [82] plots to provide detailed insights into their roles in MDD.

Moreover, we assessed the association of NEGR1, PAX6, PPP6C, and SORCS3 with MDD and comorbidities by evaluating the diseasome of these four genes using DisGeNET [83] through EnrichR [60]. This approach integrates information on genes and their associations with various genetic diseases, disorders, and syndromes, compiling data from scientific literature and genetic variant databases. The diseasome network was created and visualized using the “ggnet2” [84] package in R, linking genes to diseases, disorders, and syndromes; nodes represent genes and their associated conditions, and edges represent the relationships between them.

Stress-induced expression of PAX6

Mice

All animal experiments were conducted in accordance with institutional guidelines and regulations, and were approved by the Brigham and Women’s Hospital Institutional Animal Care and Use Committee (IACUC protocol no. 2022N000120).

Eight-week-old male C57BL/6J mice were purchased from the Jackson Laboratory (Strain #000664) and housed in a conventional specific pathogen-free facility at the Hale Building for Transformative Medicine, Brigham and Women’s Hospital, Harvard Medical School. Mice were group-housed under a standard light cycle (12 h light/dark) at 20-23 °C and humidity (~50%) with ad libitum access to water and food.

Chronic restraint stress (CRS) model

This method is widely utilized to induce psychological stress in rodents [20]. It is frequently employed in studies examining the effects of stress on behavior, physiology, and molecular biology, including models of depression and anxiety [85]. Thirty to forty air holes were drilled using a 1/16” drill bit into 50-mL conical tubes (Falcon). Tubes were washed with 70% ethanol before every use. Mice were placed into tubes and positioned horizontally in a tube rack for 6 h each day for 7 or 18 consecutive days. Stress sessions began daily between 9:00 and 10:00 AM. After restraint, mice were returned to their cages, and the tubes were washed with soap, water, and 70% ethanol.

Flow cytometric analysis

Blood was extracted from the aorta using a 21 G needle (Becton Dickinson, #305165) and placed in a blood collection tube containing the anticoagulant heparin (Becton Dickinson, #365965) on ice. Briefly, blood was then resuspended in 1 mL of ACK lysing buffer (Thermo Fisher Scientific) for 5 min. Cells were centrifuged at 1600 rpm for 5 min, and the supernatants were discarded. The cells were stained for flow cytometric analyses using the following dyes/antibodies: fixable viability dye Zombie Aqua (1:1000; BioLegend; cat.# 423101) to exclude dead cells, BUV395-anti-CD45 (BD Biosciences; 1:200; cat.# 564616), BUV737-anti-CD3 (BD Biosciences; 1:200; cat.# 612771), PE-Cy7-anti-TCRb (Biolegend; 1:200; cat.# 109222), BUV496-anti-CD4 (BD Biosciences; 1:300; cat.# 612952), BV711-anti-CD8 (Biolegend; 1:300; cat.# 100748), AF700-anti-CD19 (Thermo Fisher Scientific; 1:300; cat.# 56-0193-82), APC-anti-CD11b (Biolegend; 1:100; cat.# 101212), BV785-anti-Ly6C (Biolegend; 1:100; cat.# 128041) and AF488-anti-PAX6 (Bioss USA; 1:100; cat.# bs-11204R). Surface markers were stained for 25 min at 4°C in Mg2+ and Ca2+ free HBSS with 2% FCS, 0.4% EDTA (0.5 M), and 2.5% HEPES (1 M). The cells were then fixed in Cytoperm/Cytofix (eBioscience) and permeabilized with Perm/Wash Buffer (Thermo Fisher Scientific). Flow cytometric acquisition was performed on an AURORA (Cytek) instrument using DIVA software (BD Biosciences), and the data were analyzed with FlowJo software version 10.1 (TreeStar Inc).

Data from 31,000 cells per mouse were processed using R 4.4.0 and RStudio 2024.04.2. The raw data underwent rigorous pre-processing, including compensation with the flowCore library, standardized logicle transformation (w = 0.5, t = 1,000,000, m = 4.5), and z-score normalization to ensure consistency. Cells were concatenated based on a 10-color panel: viability dye (Zombie Aqua), CD3, CD4, CD8, TCRβ, CD11b, CD19, Ly6C, CD45, and PAX6, and categorized into Control and Stressed groups. Differential analysis was performed using FlowSOM, focusing on lineage-defining markers (CD19, CD3, CD4, CD8, and CD11b) and identifying four meta-clusters. Initial cluster delineation was guided by manual gating in Kaluza Software with Radar plots. UMAP provided a low-dimensional visualization of marker density and population distribution across groups and clusters.
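
The z-score normalization step can be illustrated with the short Python sketch below. It assumes that normalization is applied to each marker separately and uses the population standard deviation; neither detail is stated in the text, and the function name and intensity values are hypothetical.

```python
import statistics


def zscore(values):
    """Rescale one marker's values to mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD is an assumption here
    if sd == 0:
        raise ValueError("marker has no variation across cells")
    return [(v - mean) / sd for v in values]


# Hypothetical transformed intensities for two markers across five cells.
cells = {
    "CD11b": [1.2, 3.4, 2.8, 0.9, 3.1],
    "Ly6C": [0.4, 2.9, 2.2, 0.3, 2.7],
}
normalized = {marker: zscore(values) for marker, values in cells.items()}
for marker, values in normalized.items():
    print(marker, [round(v, 2) for v in values])
```

Putting every marker on the same scale keeps markers with large intensity ranges from dominating distance-based clustering.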