Introduction

Alzheimer’s disease (AD) is a prevalent age-related neurodegenerative disorder characterized by progressive memory loss, cognitive decline, and impaired daily functioning, severely impacting quality of life1,2. Its pathogenesis involves the interplay of multiple pathways, such as amyloid-β (Aβ) accumulation, tau pathology, neuroinflammation, oxidative stress, and genetic predisposition, yet the core driver remains elusive3,4,5,6,7,8. Due to this multifactorial etiology, most current therapies provide only symptomatic relief without altering disease progression. The global burden of AD continues to rise, imposing profound personal, social, and economic consequences9. East Asia bears the highest absolute burden due to rapid population aging10,11, with the total cost of AD in China projected to increase from US $248.71 billion in 2020 to US $1.89 trillion in 2050. Worldwide, the costs of dementia were estimated at US $957.56 billion in 2015 and are expected to reach US $9.12 trillion by 2050¹². In 2018, Alzheimer’s Disease International estimated that ~50 million people were living with dementia globally, a number projected to triple by 2050124. Despite extensive research, the pathogenesis of AD remains contentious, necessitating further investigation into its mechanisms, the identification of novel biomarkers, and the improvement of diagnostic models to enhance intervention and treatment strategies.

Neuroinflammation is a recognized hallmark of AD and other neurodegenerative disorders13. The blood-brain barrier (BBB), choroid plexus, and meninges facilitate communication between the brain and the peripheral immune system14,15,16. Among these, the BBB is crucial for protecting neurons from systemic influences and maintaining central nervous system homeostasis. In AD patients, BBB integrity is compromised, leading to increased capillary permeability, degeneration of barrier-associated cells, and infiltration of circulating leukocytes and erythrocytes into the brain parenchyma14. Although the immune system’s involvement in AD pathogenesis is established, the precise role of immune cells and their genetic components in AD diagnosis remains unclear. Thus, identifying novel biomarkers and developing a more accurate diagnostic framework for AD are critical for early detection and improved patient outcomes.

Bilirubin, a byproduct of hemoglobin metabolism, possesses antioxidant properties but also exhibits immunomodulatory and neurotoxic effects17. Elevated serum bilirubin levels are observed in AD patients18. Under normal physiological conditions, bilirubin does not cause neurotoxicity; however, significant increases in bilirubin levels, as seen in pathological jaundice, can lead to neurotoxicity due to BBB disruption19,20. Clinical evidence indicates that neonatal hyperbilirubinemia increases the risk of bilirubin-induced neurological dysfunction21, with severe cases potentially resulting in irreversible neurological damage22. Bilirubin encephalopathy refers to acute and chronic neurological dysfunctions associated with severe hyperbilirubinemia. Previous studies suggest that bilirubin-induced neurotoxicity may involve neuronal necrosis, tau protein hyperphosphorylation, increased Aβ production, and neuroinflammation23,24. Nonetheless, the molecular basis of bilirubin-induced neurodegeneration remains incompletely understood.

In this study, we applied integrated machine learning analyses to identify novel biomarkers associated with bilirubin-induced AD-like pathology in both neonatal and adult models. Our findings provide new insights into the pathophysiological functions of bilirubin and suggest potential diagnostic and therapeutic targets for AD.

Results

Pathological jaundice regulates multiple signaling pathways in the brain

Deletion of the UGT1A1 gene, essential for bilirubin metabolism, results in severe hyperbilirubinemia25. UGT1A1-deficient mice typically die within seven days of birth and exhibit pronounced signs of hyperbilirubinemia, such as jaundice, within 12 to 36 h postnatally26. Severe hyperbilirubinemia leads to bilirubin deposition in brain tissue, causing kernicterus. To investigate the effects of pathological jaundice on brain regulation, we sequenced various brain tissues from three-day-old suckling mice. Fig. S1A shows the differential gene expression across these tissues.

Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses revealed significant differences in bilirubin regulation across the cerebral cortex, brainstem, cerebellum, hippocampus, and midbrain (Fig. 1A and B). GO-biological process (GO-BP) analysis indicated that the cerebral cortex and brainstem were associated with secondary metabolic processes, steroid metabolic processes, and chromosome segregation. Multiple immune-related pathways were enriched in the cerebellum and midbrain, whereas hippocampal tissues were enriched in mitochondrial processes and mitochondria-associated apoptosis. GO-cellular component (GO-CC) and GO-molecular function (GO-MF) analyses showed enrichment in various extracellular matrix-related pathways, such as collagen content and structural components conferring compressive resistance.

Fig. 1

Functional enrichment analysis of differential genes in brain. (A, B) GO and KEGG analyses across different brain tissues.

Although KEGG results varied among tissues, several infection-related pathways were enriched, including herpes simplex virus 1 infection, malaria, and human T-cell leukemia virus 1 infection. This implies that bilirubin may be involved in triggering brain inflammation. Additionally, pathways associated with neurodegenerative diseases, such as AD and Parkinson’s disease, were enriched in hippocampal tissues, suggesting that bilirubin may induce Alzheimer’s-like lesions by affecting hippocampal function.

Overall, our gene sequencing and assays in UGT1A1-deficient mice suggest that pathological jaundice regulates multiple signaling pathways in the brain.

Construction of weighted gene co-expression networks and identification of pathological jaundice-related modules

To identify module eigengenes (MEs) associated with pathological jaundice, we performed weighted gene co-expression network analysis (WGCNA) on both our sequencing dataset and the GSE28146 dataset. The soft-threshold power was set to 12 for our dataset (Fig. 2A) and 7 for GSE28146 (scale-free R² = 0.85; Fig. S1B). In our dataset, three modules were strongly correlated with pathological jaundice: bisque4 (COR = 0.57, P = 0.001), turquoise (COR = 0.75, P = 3e-6), and ivory (COR = 0.52, P = 0.004) (Fig. 2B). In GSE28146, three modules were highly correlated with AD: saddlebrown, lightpink3, and turquoise (Fig. S1C). Scatterplots further confirmed strong correlations between gene significance (GS) and module membership (MM), including bisque4 (COR = 0.63, P = 6.5e-5), turquoise (COR = 0.83, P = 1e-200), and ivory (COR = 0.81, P = 4e-14) in our dataset (Fig. 2C), as well as saddlebrown (COR = 0.55, P = 2.7e-15), lightpink3 (COR = 0.47, P = 0.00044), and turquoise (COR = 0.52, P = 2.1e-92) in GSE28146 (Fig. S1D).
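The soft-thresholding step can be illustrated with a minimal sketch. It is not the authors' WGCNA code: it assumes an unsigned network (the text does not state the network type), and the toy expression values and function names are hypothetical. Each pairwise adjacency is the absolute Pearson correlation raised to the soft-threshold power, so a high power such as 12 keeps strong co-expression and drives weak correlations towards zero.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def soft_threshold_adjacency(expr, power):
    """Unsigned adjacency a_ij = |cor(i, j)| ** power.

    `expr` maps gene name -> list of expression values across samples.
    """
    genes = list(expr)
    adj = {}
    for gi in genes:
        for gj in genes:
            if gi == gj:
                adj[(gi, gj)] = 1.0
            else:
                adj[(gi, gj)] = abs(pearson(expr[gi], expr[gj])) ** power
    return adj

# Toy data (hypothetical values): three genes measured in five samples.
toy = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.1, 5.9, 8.2, 9.8],  # nearly proportional to geneA
    "geneC": [3.0, 1.0, 4.0, 1.0, 5.0],  # only weakly related to geneA
}
adj12 = soft_threshold_adjacency(toy, power=12)
```

In this toy case the strongly co-expressed pair keeps an adjacency close to 1, while the weakly correlated pair is suppressed to nearly 0, which is the behaviour the power is chosen to produce.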

Fig. 2

WGCNA identifies gene modules associated with pathological jaundice. (A) Scale-free network construction with soft-threshold power = 12 (scale-free R² = 0.85). (B) Module–trait heatmap showing correlations between gene modules, KO mice, and WT controls. Each cell displays the correlation coefficient and p-value. (C) Scatter plots of bisque4, turquoise, and ivory modules, showing strong positive correlations with KO mice. (D, E) KEGG enrichment analyses of genes from the bisque4, turquoise, and ivory modules.

To investigate the regulatory mechanisms implicated in pathological jaundice and AD, we performed KEGG enrichment analysis on the gene modules from our sequencing dataset and GSE28146. The KEGG results for the modules strongly associated with pathological jaundice resembled those for the modules highly linked to AD. Autophagy, mitophagy, the cyclic adenosine monophosphate (cAMP) signaling pathway, and pathways of neurodegeneration, among other signaling pathways, were significantly enriched in both datasets (Fig. 2D and S1E). Further examination of the neurodegenerative disease and nervous system categories showed substantial enrichment of multiple pathways relevant to neurodegenerative diseases (Fig. 2E), mirroring the findings in GSE28146 (Fig. S1F).

To delve deeper into the signaling pathways implicated in AD, we performed differential analysis on GSE5281, GSE37263, and hippocampal tissue datasets from GSE36980. Genes meeting the criteria of |logFC| > 0.5 and P-value < 0.05 were selected for KEGG pathway analysis. The results revealed significant enrichment of pathways linked to neurodegenerative disorders, including the pathways of neurodegeneration, within the context of AD (Fig. S1G).
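The selection rule above can be written as a simple filter. The sketch below is illustrative only: the rows are hypothetical, the function name is an assumption, and the P-value is taken as reported in the text, which does not state whether it is adjusted for multiple testing.

```python
def select_degs(rows, lfc_cutoff=0.5, p_cutoff=0.05):
    """Return gene names with |logFC| > lfc_cutoff and P < p_cutoff.

    `rows` is an iterable of (gene, logFC, p_value) tuples.
    """
    return [gene for gene, lfc, p in rows
            if abs(lfc) > lfc_cutoff and p < p_cutoff]

# Hypothetical (gene, logFC, P-value) rows, for illustration only.
table = [
    ("gene1", 1.20, 0.001),    # passes both cutoffs
    ("gene2", -0.80, 0.030),   # passes: down-regulated genes count too
    ("gene3", 0.30, 0.0001),   # fails: effect size too small
    ("gene4", 0.90, 0.200),    # fails: not significant
]
degs = select_degs(table)
```

Using the absolute value of logFC keeps both up- and down-regulated genes, which is what the |logFC| criterion implies.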

These findings further suggest that pathological jaundice may contribute to the pathogenesis of neurological impairments by activating various pathways associated with neurodegenerative diseases.

Screening of hub genes for pathological jaundice-induced nerve damage

First, a combined gene set was generated by merging the AD, Huntington’s disease, Parkinson’s disease, and Pathways of Neurodegeneration gene sets. This combined set was intersected with the key module genes from the WGCNA, yielding 84 shared genes (Fig. 3A). The Random Forest (RF) algorithm was then used to rank the relative importance of these 84 genes, and the top 10 genes by %IncMSE and IncNodePurity were retained (Fig. 3B). The support vector machine-recursive feature elimination (SVM-RFE) algorithm was also applied to the 84 candidate genes and identified five core genes (Fig. 3C). Receiver operating characteristic (ROC) analysis was then conducted to select the top 10 genes by area under the curve (AUC). Intersecting the RF, SVM-RFE, and ROC results identified BBC3 and MAP3K10 as common genes (Fig. 3D). Expression of BBC3 and MAP3K10 was elevated in the jaundice group compared with the control group (Fig. 3E). ROC analysis gave AUC values of 0.948 for BBC3 and 0.952 for MAP3K10, indicating excellent diagnostic performance in pathological jaundice (Fig. 3F). This suggests that pathological jaundice might contribute to nerve damage by modulating the expression of BBC3 and MAP3K10, potentially triggering neurodegenerative conditions.
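For readers less familiar with the AUC, the sketch below computes it for a single gene directly from its rank interpretation: the probability that a randomly chosen case has a higher expression value than a randomly chosen control, with ties counted as one half. The expression values and variable names are hypothetical and are not taken from the study data.

```python
def auc(scores_pos, scores_neg):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    Ties contribute 0.5, which matches the Mann-Whitney formulation.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical expression of one gene in four cases and four controls.
jaundice = [5.1, 6.3, 4.8, 7.0]
control = [3.9, 4.9, 4.2, 3.5]
value = auc(jaundice, control)  # 15 of 16 pairs are ordered correctly
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation, so values near 0.95 indicate that almost every case-control pair is ordered correctly by that gene alone.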

Fig. 3

Identification of potential diagnostic biomarkers. (A) Venn diagram of candidate variables screened by KEGG gene sets and module genes. (B) Selection of diagnostic markers using the RF algorithm. (C) Selection of diagnostic markers using the SVM-RFE algorithm. (D) Venn diagram showing the overlap of variables identified by ROC, RF, and SVM-RFE analyses. (E, F) Expression levels and ROC curves of BBC3 and MAP3K10. (G) Expression levels and ROC curves of BBC3 and MAP3K10 in the GSE15222 dataset. (H) KEGG pathway enrichment of candidate genes. (I) Correlation analysis between BBC3, MAP3K10, and enriched pathways.

We validated the expression of these genes in an additional AD-related dataset. Expression of BBC3 and MAP3K10 was significantly elevated in AD patients from the GSE15222 dataset compared with the control group (Fig. 3G), and the AUC values indicate that BBC3 and MAP3K10 also have some diagnostic value in AD. Gene set variation analysis (GSVA) identified several pathways with differential activity, which are shown in a heatmap. The Notch signaling pathway was significantly upregulated in both the hyperbilirubinemia group and AD patients (Fig. 3H). We also performed a correlation analysis between these pathways and the genes BBC3 and MAP3K10 (Fig. 3I). In both the hyperbilirubinemia model and the GSE15222 dataset, the expression of MAP3K10 showed a significant positive correlation with the Notch signaling pathway. Previous studies have demonstrated that excessive activation of the Notch signaling pathway is closely linked to amyloid-beta production and the onset of AD27–29.

In conclusion, our findings suggest a potential link between pathological jaundice and the initiation of neurological damage through the modulation of the Notch signaling pathway.

Bilirubin contributes to AD-like lesions in the brain by inducing neuroinflammation

Free bilirubin has been implicated in compromising BBB integrity by impairing glutathione function and increasing endothelial nitric oxide synthase activity via cytokine release19,20. Consequently, severe hyperbilirubinemia in adults may allow bilirubin infiltration into the brain. To investigate its neurological impact, we established a lateral ventricle injection model, which revealed activation of multiple immune-related signaling pathways in brain tissues following bilirubin exposure (Fig. 4A). Furthermore, KEGG enrichment analysis highlighted the significant enrichment of signaling pathways, such as the tumor necrosis factor (TNF) signaling pathway, viral protein interaction with cytokine and cytokine receptor, and cytokine-cytokine receptor interactions in several tissues following bilirubin exposure (Fig. 4B). These pathways are recognized as fundamental components of the immune system, suggesting a potential role for bilirubin in neuroinflammatory responses.

Fig. 4

Functional enrichment analysis of differentially expressed genes (DEGs) in the lateral ventricle injection model and AD. (A, B) GO and KEGG enrichment analyses of DEGs across different brain regions. (C) KEGG enrichment analysis of DEGs from datasets GSE29378, GSE53697, and GSE122063. (D) GSEA of datasets GSE29378 and GSE122063.

To extend these findings, we analyzed datasets GSE29378, GSE53697, and GSE122063 using the limma package (|logFC| > 0.5, P < 0.05). Consistent with our model, KEGG enrichment analysis showed significant enrichment of TNF signaling and cytokine-related pathways in AD patients (Fig. 4C). Gene set enrichment analysis (GSEA) also confirmed activation of immune and innate immune pathways in AD (Fig. 4D).

Collectively, these findings suggest a potential involvement of bilirubin in the onset and progression of AD through the induction of neuroinflammation.

Construction of weighted gene co-expression networks and identification of AD-related modules

Next, we investigated the disparities in immune cell infiltration between individuals with AD and controls using single-sample gene set enrichment analysis (ssGSEA). We analyzed the distribution of 28 immune cells in the GSE122063 dataset (Fig. 5A). Subsequently, we utilized WGCNA to identify relevant modules potentially regulating AD and immune cells, with the soft threshold set at 6 (scale-free R² = 0.85) (Fig. 5B). The heatmap in Fig. 5C displays the associations between these modules and related traits. Among these identified modules, grey60 (COR = 0.44, P = 8e-6), pink (COR = 0.52, P = 5e-8), tan (COR = 0.46, P = 2e-6), saddlebrown (COR = 0.53, P = 2e-8), and lightcyan (COR = 0.58, P = 5e-10) exhibited a pronounced positive correlation with AD. Further examination through scatterplot analysis revealed significant positive correlations between GS and MM for the grey60 module (COR = 0.63, P = 3.5e-25), pink module (COR = 0.61, P = 6.4e-78), tan module (COR = 0.61, P = 2e-41), saddlebrown module (COR = 0.53, P = 1.7e-8), and lightcyan module (COR = 0.77, P = 2.2e-46) (Fig. 5D).

Fig. 5

WGCNA identifies AD- and immune cell–related gene modules. (A) Proportions of 28 immune cell types in control and AD groups. (B) Scale-free network construction with soft-threshold power = 6 (scale-free R² = 0.85). (C) Module–trait heatmap showing correlations between gene modules, AD, and immune cells. (D) Scatter plots of grey60, pink, tan, saddlebrown, and lightcyan modules, showing strong positive correlations with AD. (E, F) GO and KEGG enrichment analyses of module genes. (G) Correlation between GS and MM in activated dendritic cells, effector memory CD8⁺ T cells, myeloid-derived suppressor cells, natural killer T cells, neutrophils, and plasmacytoid dendritic cells.

Subsequent GO and KEGG enrichment analyses conducted on genes within these modules indicated significant enrichment in leukocyte proliferation, adhesion, and major histocompatibility complex processes for GO analysis (Fig. 5E). Similarly, the KEGG pathway analysis showcased significant enrichment in signaling pathways, such as cytokine-cytokine receptor interaction, viral protein interaction with cytokine and cytokine receptor, and TNF signaling pathway (Fig. 5F). These findings aligned with the outcomes from the lateral ventricle injection model we established.

Upon scrutinizing heatmaps illustrating the associations between modules and their respective traits, we discerned significant positive correlations of the grey60, pink, tan, saddlebrown, and lightcyan modules with various immune cells concurrently. Figure 5G depicts the correlation between GS and MM concerning activated dendritic cells, effector memory CD8+ T cells, myeloid-derived suppressor cells, natural killer T cells, neutrophils, and plasmacytoid dendritic cells. Consequently, we postulate that these immune cells might play a role in neuroinflammatory processes and potentially trigger AD.

Identification of core immune cells in AD patients

To identify the key immune cells implicated in neuroinflammation, we utilized least absolute shrinkage and selection operator (LASSO), RF, and SVM-RFE algorithms for screening. Following a 10-fold cross-validation, the LASSO algorithm pinpointed 5 immune cells (Fig. 6A). The Random Forest algorithm subsequently assessed the relative importance of the 28 immune cells, selecting the top five based on %IncMSE and IncNodePurity for further scrutiny (Fig. 6B). Additionally, the SVM-RFE algorithm identified four immune cells, visually represented in Fig. 6C. Subsequent ROC analysis of these 28 immune cells led to the selection of the top 5 immune cells based on the AUC for more detailed investigation. By integrating findings from LASSO, Random Forest, SVM-RFE, and ROC analyses, we identified three core immune cells (Fig. 6D). Noteworthy were the AUC values for effector memory CD8+ T cells (0.873), effector memory CD4+ T cells (0.798), and immature B cells (0.872) (Fig. 6E).
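The consensus step described here, keeping only features selected by every method, can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the feature names (`cell_a` to `cell_d`), the importance scores, and the helper names `top_k` and `consensus` are all hypothetical.

```python
def top_k(scores, k):
    """Names of the k highest-scoring features in a name -> score mapping."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:k]]

def consensus(*selections):
    """Sorted list of features present in every selection."""
    sets = [set(s) for s in selections]
    return sorted(set.intersection(*sets))

# Hypothetical random-forest importances for four features.
inc_mse = {"cell_a": 9.0, "cell_b": 7.5, "cell_c": 1.2, "cell_d": 6.1}
node_purity = {"cell_a": 3.3, "cell_b": 0.4, "cell_c": 2.9, "cell_d": 3.0}

# Keep features ranked in the top 3 by both importance measures.
rf_pick = consensus(top_k(inc_mse, 3), top_k(node_purity, 3))

# Hypothetical selections from the other three methods.
lasso_pick = ["cell_a", "cell_d", "cell_b"]
svm_pick = ["cell_d", "cell_a"]
roc_pick = ["cell_a", "cell_c", "cell_d"]

core = consensus(lasso_pick, rf_pick, svm_pick, roc_pick)
```

Requiring agreement across methods is conservative: a feature that one algorithm ranks highly by chance is dropped unless the others also select it.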

Fig. 6

Identification of core immune cells in AD patients. (A) Screening of diagnostic immune cells using the LASSO logistic regression algorithm. (B) Selection of diagnostic immune cells using the RF algorithm. (C) Diagnostic immune cells identified by the SVM-RFE algorithm. (D) Venn diagram showing overlap of immune cells identified by LASSO, RF, ROC, and SVM-RFE analyses. (E) ROC curves of effector memory CD8⁺ T cells, effector memory CD4⁺ T cells, and immature B cells.

Combining these results with the WGCNA findings, we identified effector memory CD8+ T cells as the central immune cells involved in neuroinflammation.

Screening hub genes by machine learning

We screened genes meeting the criteria of |logFC| ≥ 1 and P-value < 0.05, identifying 76 common genes by intersecting them with genes from the relevant module (Fig. 7A). Subsequently, LASSO, RF, and SVM-RFE methods were employed to further pinpoint core genes. LASSO regression identified a total of 12 genes (Fig. 7B). Fig. 7C displays the Random Forest screening results, with genes co-occurring in the top 10 rankings of %IncMSE and IncNodePurity selected for detailed analysis. The SVM-RFE algorithm highlighted 57 key genes (Fig. 7D, left), with the top 10 genes ranked by importance (Fig. 7D, right). Integration of results from the LASSO, Random Forest, and SVM-RFE analyses led to the identification of four core genes: complement component 4A (C4A), fibronectin type III and SPRY domain containing 2 (FSD2), human leukocyte antigen-DR beta 4 chain (HLA-DRB4), and Fc fragment of IgG binding protein (FCGBP) (Fig. 7E). Analysis of their expression revealed significantly higher levels in AD compared to the normal group (Fig. 7F). Additionally, the areas under the ROC curves for C4A, FSD2, HLA-DRB4, and FCGBP were 0.951, 0.848, 0.826, and 0.91, respectively (Fig. 8A), suggesting their potential as valuable biomarkers.

Fig. 7

Establishment of diagnostic models and identification of potential biomarkers. (A) Venn diagram of candidate variables identified from DEGs and module genes. (B) Screening of diagnostic markers using the LASSO logistic regression algorithm. (C) Selection of diagnostic markers using the RF algorithm. (D) Selection of diagnostic markers using the SVM-RFE algorithm. (E) Venn diagram showing the overlap of markers identified by LASSO, RF, and SVM-RFE. (F) Expression levels of C4A, FSD2, HLA-DRB4, and FCGBP.

Fig. 8

Construction and validation of a nomogram model for AD diagnosis. (A) ROC curves of C4A, FSD2, HLA-DRB4, and FCGBP. (B) Nomogram integrating diagnostic markers for AD; each variable corresponds to a score, and the total score is obtained by summing across variables. (C) ROC curve demonstrating the diagnostic performance of the nomogram. (D) Calibration curve assessing the predictive accuracy of the nomogram. (E) Decision curve analysis (DCA) evaluating the clinical utility of the nomogram. (F) Expression levels and ROC curves of C4A, FSD2, HLA-DRB4, and FCGBP in the GSE33000 dataset.

Utilizing the ‘rms’ R package, we constructed a nomogram for AD diagnosis based on the core genes C4A, FSD2, HLA-DRB4, and FCGBP (Fig. 8B). Subsequent ROC curve analysis of the nomogram yielded an AUC of 0.9987 (Fig. 8C), while the calibration curve indicated minimal deviation between actual AD risk and predicted risk, signifying the nomogram’s high predictive accuracy (Fig. 8D). Results from decision curve analysis (DCA) demonstrated the superior clinical benefits provided by the nomogram (Fig. 8E). In the validation set GSE33000, the expression levels of C4A, HLA-DRB4, and FCGBP were significantly elevated compared to the control group, and ROC curve analysis also demonstrated excellent diagnostic value (Fig. 8F), further reinforcing their potential as diagnostic markers.
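The quantity plotted in a decision curve is the net benefit at each threshold probability. The sketch below uses the standard definition, net benefit = TP/N − (FP/N) × pt/(1 − pt), and compares a model with the "treat all" strategy. Whether the authors' DCA software uses exactly this form is an assumption, and the labels and predicted probabilities are hypothetical.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of calling positive every sample with prob >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Net benefit of calling every sample positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Hypothetical labels (1 = AD, 0 = control) and model probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.90, 0.80, 0.35, 0.40, 0.20, 0.10, 0.05, 0.70]

model_nb = net_benefit(y_true, y_prob, threshold=0.3)
all_nb = net_benefit_treat_all(y_true, threshold=0.3)
```

A model is considered useful at a given threshold when its net benefit exceeds both the treat-all value and zero, the net benefit of treating no one.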

Additionally, we performed a correlation analysis examining the relationship between immune cells and the genes C4A, FSD2, HLA-DRB4, and FCGBP (Fig. S3). This analysis revealed a significant positive correlation between effector memory CD8+ T cells and C4A, FSD2, HLA-DRB4, and FCGBP, with C4A exhibiting the strongest correlation (COR = 0.76, P < 0.001), followed by FSD2 (COR = 0.54, P < 0.001) and FCGBP (COR = 0.54, P < 0.001).

Discussion

Hyperbilirubinemia is a commonly observed condition in newborns, typically physiological and resolving without intervention. However, significantly elevated bilirubin levels pose a risk of severe brain damage, leading to short- and long-term neurodevelopmental impairments and the potential development of acute or chronic bilirubin encephalopathy, known as kernicterus19,30. Kernicterus is characterized by the accumulation of bilirubin in the brain, though the precise pathogenesis and mechanisms of bilirubin-induced neuronal damage remain unclear.

In this study, we developed a model of neonatal pathological jaundice using UGT1A1 knockout mice. Our bioinformatics analysis revealed a significant resemblance between nerve injury induced by pathological jaundice and neurodegenerative diseases. We employed the random forest algorithm, SVM-RFE, and ROC analysis to identify key genes associated with pathological jaundice-induced neurological damage, pinpointing BBC3 and MAP3K10 as crucial genes. The diagnostic accuracy of these biomarkers was evaluated using ROC curves.

Our data from neurodegenerative diseases showed a significant increase in the expression of BBC3 and MAP3K10 in patients with AD. BBC3 is a direct activator of BAX, playing a vital role in both p53-dependent and p53-independent apoptosis31,32,33. Additionally, BBC3 regulates endoplasmic reticulum (ER) stress-induced neuronal apoptosis and is implicated in apoptosis regulation in the developing brain34,35,36. Multiple studies have shown that BBC3 is a key regulator of oxidative stress-induced BAX activation and neuronal apoptosis, suggesting that BBC3 may be a critical therapeutic target for neuroprotection37,38. BBC3 deficiency significantly protects neurons from the ER stress-induced apoptosis associated with neurodegenerative diseases39,40. MAP3K10, a member of the mitogen-activated protein kinase (MAPK) family, regulates the MAPK signaling pathway, including the c-Jun N-terminal kinase (JNK)/p38 MAPK and ERK pathways. The MAPK pathway has been shown to regulate Notch signaling41. Dysregulation of neuronal cell signaling can lead to neurodegeneration and cognitive decline42,43. Pathway analysis results indicate that the Notch signaling pathway is activated in both hyperbilirubinemia models and AD patients. Notch signaling plays a critical role in neurogenesis during both embryonic and adult brain development, and is closely associated with synaptic remodeling, memory, and learning44,45,46,47. In AD patients, both Notch-1 and its target genes show significantly elevated expression, while Notch-1 signaling also interacts with amyloid beta production48. These findings collectively suggest a significant correlation between aberrant Notch signaling and Alzheimer’s disease.

Our findings suggest that pathological jaundice may disrupt the Notch signaling pathway by influencing BBC3 and MAP3K10 expression, potentially leading to neuronal cell apoptosis and contributing to the development of kernicterus.

Maintaining BBB integrity is crucial to prevent the entry of macromolecules, including bilirubin, into the brain, thereby maintaining brain homeostasis49. Free bilirubin can compromise BBB integrity by interfering with glutathione19,20. Conditions such as autoimmune hemolytic anemia, drug-induced states, and glucose-6-phosphate dehydrogenase deficiency typically result in unconjugated hyperbilirubinemia in adults50.

In our investigation of the potential pathogenesis of AD, we analyzed various AD-related sequencing data, revealing significant activation of signaling pathways such as neuroactive ligand-receptor interactions, cytokine-cytokine receptor interactions, and viral protein interactions with cytokines and cytokine receptors in AD patients. GSEA indicated an upregulation of the immune system, suggesting that neuroinflammation may play a significant role in AD induction. We simulated adult bilirubin exposure by injecting bilirubin into the lateral ventricle. Bioinformatics analysis demonstrated that bilirubin treatment markedly activated signaling pathways, including cytokine-cytokine receptor interaction and viral protein interaction with cytokine and cytokine receptor. Bilirubin-induced neuroinflammation has been reported in multiple studies51,52,53,54. In rat models of bilirubin encephalopathy, bilirubin excessively activates microglia, leading to upregulated expression of MHCII, CCL2, IL-6, TNF-α, and IL-1β, along with downregulated IL-10 levels, thereby triggering inflammation in the hippocampus52,55,56. Consequently, adult bilirubin exposure may contribute to the initiation of neuroinflammation.

Numerous studies have shown the role of neuroinflammation in AD initiation and progression, and increased levels of inflammatory mediators have been documented in the cerebral tissues of AD patients57. Because bilirubin exposure activated similar inflammatory pathways in our model, bilirubin may contribute to the pathogenesis of neurodegenerative diseases, specifically AD.

Neuroinflammation is a key feature of several neurodegenerative diseases, particularly in the pathogenesis and progression of AD58. It is typically initiated by microglial activation59,60. As the resident immune cells of the central nervous system (CNS), microglia coordinate innate immune responses within the brain and interact with peripheral immunity through the BBB. Under physiological conditions, the BBB protects the CNS by regulating the passage of neurotoxins and serum factors via specialized tight junctions and transport proteins61. In AD, BBB disruption has been reported in multiple regions, including the hippocampus, gray matter, and white matter62,63,64,65, and may occur at early stages of disease progression62,66,67. Evidence of plasma extravasation, peripheral macrophage infiltration, and neutrophil infiltration in AD brain tissue further supports a role for immune infiltration in disease development68,69,70,71. Analysis of the GSE122063 data indicates that effector memory CD8+ T cells, effector memory CD4+ T cells, and immature B cells may contribute to neuroinflammation. Together with the WGCNA results, this suggests that effector memory CD8+ T cells are significant immune cells involved in AD. Other studies have also demonstrated that effector memory CD8+ T cells participate in AD through their proinflammatory and cytotoxic functions72.

We employed various machine learning techniques to identify key genes associated with neuroinflammation. C4A, FSD2, HLA-DRB4, and FCGBP were identified as hub genes. The analysis of ROC curves indicated that these genes could serve as novel potential diagnostic markers for neuroinflammatory AD. Based on these genes, we developed a diagnostic model for AD, which demonstrated good accuracy through validation with calibration curves and decision curves.

C4A, a component of the classical complement pathway, is a marker of inflammation and is associated with an increased risk of schizophrenia. Overexpression of C4A promotes excessive synaptic loss and behavioral changes in mice73. C4A expression is elevated in AD patients and may be related to disease pathogenesis74,75. The FCGBP gene encodes an IgG Fc-binding protein crucial for immune and inflammatory processes, including the regulation of immune protection and inflammation in the gut76. Abnormalities in gut microbes may trigger mucosal immune activation, leading to neuroinflammation and neurodegeneration77,78. FCGBP might play a crucial role in both intestinal and brain inflammation, potentially contributing to neurodegenerative processes79. No studies have yet reported on HLA-DRB4 and FSD2 in AD, so further experimental confirmation of their functions is necessary. Nonetheless, our analysis suggests that these key genes may serve as potential diagnostic markers for AD.

While this study provides important insights, several limitations should be noted. First, the inclusion of multiple tissue types (cerebral cortex, hippocampus, midbrain, cerebellum, and brainstem) may introduce bias. Future studies integrating larger numbers of samples from the same tissue type will improve detection power and robustness. Second, as our analyses were primarily bioinformatics-based, further validation using clinical data and experimental models is required to confirm these findings. Finally, although autophagy and neuroinflammation in AD have been widely studied, the six genes highlighted here remain relatively unexplored in the context of AD, and their functional roles warrant further experimental investigation.

Materials and methods

Graphical abstract

To analyze the hub genes of AD, we designed a flowchart. The workflow of the analysis is shown in Fig. 9.

Fig. 9

The main workflow of the study.

Animals and injections

UGT1A1-/- mice were generated by mating heterozygous UGT1A1+/- mice. Littermate wild-type (WT) mice served as controls (n = 3 per group). On postnatal day 3, UGT1A1-/- and WT mice were euthanized, and brains were immediately harvested into ice-cold PBS (4 °C). The hippocampus, cerebral cortex, midbrain, brainstem, and cerebellum were dissected, transferred to RNAlater, and stored at − 80 °C. Mice were maintained under a 12 h light/dark cycle with ad libitum food and water. All procedures followed the ARRIVE guidelines and the regulations of the Guangzhou Medical University Animal Experimentation Committee, with ethical approval (NO. GY2022-020).

Male C57BL/6J mice (6 weeks old; Jiangsu GemPharmatech Biotechnology Co., Ltd.) were randomly assigned to two groups (n = 3 per group). Bilirubin was administered by stereotaxic injection into the lateral ventricle. Mice were anesthetized with 10% chloral hydrate (5 mL/g) and secured in a stereotaxic apparatus. Coordinates were AP = − 0.5 mm, ML = − 1.0 mm, DV = − 2.5 mm. Based on cerebrospinal fluid volume in mice (~ 35 µL)80, 560 nL of bilirubin solution (50 µM) was injected at a rate of 4 nL/s, with the needle left in place for 10 min post-injection. Bilirubin was prepared as described previously24: bilirubin (100 mg; Sigma-Aldrich, USA) was dissolved in 0.5 M NaOH (1 mL), diluted in ddH₂O to 10 mg/mL, and pH-adjusted to 8.5 with 0.5 M HCl. For injections, the solution was further diluted to 3.175 mM.
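The injection volume and the working concentration can be cross-checked with a short dilution calculation. The sketch below assumes that the 50 µM figure is the intended final concentration once 560 nL of the 3.175 mM working solution has mixed with ~35 µL of cerebrospinal fluid. That reading, the bilirubin molar mass (584.66 g/mol), and the function names are assumptions rather than statements from the protocol.

```python
# Assumption: molar mass of bilirubin (C33H36N4O6), not stated in the text.
BILIRUBIN_MW_G_PER_MOL = 584.66


def stock_concentration_mm(mg_per_ml, mw=BILIRUBIN_MW_G_PER_MOL):
    """Convert a mass concentration in mg/mL (equal to g/L) to mmol/L."""
    return mg_per_ml / mw * 1000.0


def final_concentration_um(working_mm, inject_nl, csf_ul):
    """Concentration (µM) after the injected volume mixes with the CSF.

    mmol/L * nL gives pmol; pmol / µL gives µmol/L.
    """
    amount_pmol = working_mm * inject_nl
    total_volume_ul = csf_ul + inject_nl / 1000.0
    return amount_pmol / total_volume_ul


stock_mm = stock_concentration_mm(10.0)          # 10 mg/mL stock
final_um = final_concentration_um(3.175, 560.0, 35.0)
print(f"stock: {stock_mm:.2f} mM, final in CSF: {final_um:.1f} µM")
```

Under these assumptions, the 10 mg/mL stock is roughly 17 mM, and the injected dose gives about 50 µM in the combined volume, which is consistent with the values reported above.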

Twelve hours post-injection, mice were anesthetized with isoflurane and perfused transcardially with pre-cooled PBS until blood-free. Brains were rapidly removed, and the hippocampus, cerebral cortex, midbrain, brainstem, and cerebellum were dissected on ice, placed in RNAlater, and stored at − 80 °C.

RNA extraction and library preparation

Total RNA was extracted from tissue samples (20–50 mg) using 1 mL TRIzol reagent (Invitrogen™, cat. no. 15596018) followed by mechanical homogenization at low temperature with a Tissuelyser-24. For RNA purification, 0.2 mL chloroform was added per 1 mL TRIzol-lysate, vortexed for 15 s, incubated at room temperature for 5 min, and centrifuged at 12,000 × g for 15 min at 4 °C. The aqueous phase was transferred to a new tube, and RNA was precipitated with isopropanol. For cell samples, RNA was extracted directly using the phenol–chloroform method.

mRNA was captured and purified using Epi™ mRNA Capture Beads (Epibiotek, cat. no. R2020-96), and libraries were constructed with the Epi™ mRNA Library Fast Kit (Epibiotek, cat. no. R1810). Final libraries were purified with Epi™ DNA Clean Beads (Epibiotek, cat. no. R1809). Library quality was assessed on a Bioptic Qsep100 Analyzer to confirm fragment size distribution within the expected range.

RNA-seq

RNA-seq libraries were prepared using the VAHTS Stranded mRNA-seq Library Prep Kit for Illumina V2 (Vazyme Biotech, NR612-02) following the manufacturer’s instructions. Sequencing reads were aligned to the mouse Ensembl genome (GRCm38) using HISAT2 (v2.1.0) with the parameter --rna-strandness RF. Read counts mapped to the genome were quantified using featureCounts (v1.6.3). Differential gene expression analysis was performed with the DESeq2 R package.
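The alignment and counting steps can be sketched as command lines assembled in Python. Only the `--rna-strandness RF` setting comes from the description above. Paired-end input (implied by the `RF` value), the reverse-stranded counting option `-s 2`, the thread count, and all file names are illustrative assumptions, not the settings actually used.

```python
import shlex


def hisat2_command(index_prefix, reads_1, reads_2, sam_out, threads=4):
    """Build a HISAT2 call for paired-end, reverse-stranded reads."""
    return [
        "hisat2",
        "-p", str(threads),
        "--rna-strandness", "RF",   # the only parameter given in the text
        "-x", index_prefix,
        "-1", reads_1,
        "-2", reads_2,
        "-S", sam_out,
    ]


def featurecounts_command(annotation_gtf, alignments, counts_out, threads=4):
    """Build a featureCounts call; -p and -s 2 are assumptions."""
    return [
        "featureCounts",
        "-T", str(threads),
        "-p",                       # count fragments for paired-end data
        "-s", "2",                  # reverse-stranded, matching RF libraries
        "-a", annotation_gtf,
        "-o", counts_out,
        alignments,
    ]


# Hypothetical file names, for illustration only.
align = hisat2_command("GRCm38_index", "s1_R1.fq.gz", "s1_R2.fq.gz", "s1.sam")
count = featurecounts_command("GRCm38.gtf", "s1.sam", "s1_counts.txt")
print(shlex.join(align))
print(shlex.join(count))
```

The lists can be passed to `subprocess.run` once the tools and reference files are in place.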

Data downloads and pre-processing

We extracted the gene expression profiles of AD (GSE29378, GSE53697, GSE122063, GSE33000, GSE28146, GSE37263, GSE5281 and GSE36980) from the GEO database using the GEOquery package (v2.70.0)81.

GSE series    AD     Control
GSE29378      31     32
GSE53697      9      8
GSE122063     56     44
GSE33000      310    157
GSE28146      8      22
GSE37263      8      8
GSE5281       87     74
GSE36980      8      10

Gene co-expression analysis

Weighted gene co-expression network analysis (WGCNA) was performed using the WGCNA R package (v1.72-5) to identify gene modules, determine hub genes, and assess module–phenotype relationships82. The optimal soft-threshold power was selected using the pickSoftThreshold function, and the most relevant modules associated with pathological jaundice, AD, and immune cell populations were identified for downstream analysis.
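The soft-threshold step can be illustrated with a minimal sketch. It assumes an unsigned network, in which the adjacency between two genes is the absolute correlation raised to the chosen power, and it selects the lowest power whose scale-free topology fit R² exceeds a cutoff. The cutoff of 0.85 and the fit values are assumptions for illustration, since neither is reported here.

```python
def adjacency(correlation, power):
    """Unsigned WGCNA-style adjacency: |r| raised to the soft-threshold power."""
    return abs(correlation) ** power


def pick_power(fit_table, r2_cut=0.85):
    """Return the lowest power whose scale-free fit R^2 exceeds r2_cut.

    fit_table is an iterable of (power, r_squared) pairs.
    Returns None when no power reaches the cutoff.
    """
    for power, r_squared in sorted(fit_table):
        if r_squared > r2_cut:
            return power
    return None


# Hypothetical fit statistics, not taken from the study.
example_fits = [(1, 0.21), (2, 0.48), (4, 0.79), (6, 0.88), (8, 0.93)]
chosen = pick_power(example_fits)
print("chosen power:", chosen)
print("adjacency for r = -0.7:", adjacency(-0.7, chosen))
```

Raising correlations to a power suppresses weak correlations more strongly than strong ones, which is why a higher power yields a sparser, more scale-free network.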

Screening of DEGs

To further assess gene expression differences between groups, differential expression analysis was performed using the limma package83. Genes with P < 0.05 and |log₂FC| > 0.5 (or > 1, where specified) were considered differentially expressed. Results were visualized using volcano plots.
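The thresholds above amount to a simple filter on the limma output. The following sketch applies them to a few hypothetical genes; the gene names and statistics are invented for illustration, and the default cutoff of 0.5 can be swapped for 1 where the stricter threshold applies.

```python
from typing import NamedTuple


class GeneStat(NamedTuple):
    gene: str
    log2fc: float
    p_value: float


def classify(stat, p_cut=0.05, lfc_cut=0.5):
    """Label a gene as 'up', 'down', or 'ns' (not significant)."""
    if stat.p_value < p_cut and abs(stat.log2fc) > lfc_cut:
        return "up" if stat.log2fc > 0 else "down"
    return "ns"


# Hypothetical results, not taken from the study.
results = [
    GeneStat("GENE_A", 1.30, 0.001),
    GeneStat("GENE_B", -0.72, 0.020),
    GeneStat("GENE_C", 0.30, 0.004),   # significant P but small fold change
    GeneStat("GENE_D", 2.10, 0.200),   # large fold change but P >= 0.05
]

for stat in results:
    print(stat.gene, classify(stat))
```

The same labels are what a volcano plot typically encodes as point colors.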

Functional enrichment analysis

GO and KEGG enrichment analyses of DEGs and module genes were performed using the clusterProfiler R package (v4.10.0)84. GO analysis was conducted for biological processes (BP), molecular functions (MF), and cellular components (CC), while KEGG analysis was used to identify enriched pathways. P < 0.05 was considered statistically significant.
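Over-representation analyses of this kind are commonly based on a one-sided hypergeometric test: given a background of N genes, of which K are annotated to a term, the P value is the probability of seeing at least k annotated genes in a query list of n genes. The sketch below implements that test directly. It is an illustration of the statistic, not the clusterProfiler code, and the example counts are hypothetical.

```python
from math import comb


def enrichment_p_value(background, term_size, query_size, overlap):
    """P(X >= overlap) for X ~ Hypergeometric(background, term_size, query_size)."""
    upper = min(term_size, query_size)
    total = comb(background, query_size)
    tail = sum(
        comb(term_size, i) * comb(background - term_size, query_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20,000 background genes, a 200-gene term,
# a 300-gene query list, and 12 genes shared between them.
p = enrichment_p_value(20000, 200, 300, 12)
print(f"P = {p:.3g}")
```

Here the expected overlap by chance is 300 × 200 / 20,000 = 3 genes, so an observed overlap of 12 yields a small P value.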

GSVA

GSVA is a nonparametric, unsupervised method used to assess gene set enrichment in microarray or RNA-seq data85. KEGG gene sets were obtained from the Molecular Signatures Database. Enrichment scores were calculated using the GSVA and GSEABase packages86. Differential pathway activity between the disease and control groups was identified using the limma package. Pathways with P < 0.05 were considered significant and ranked by |log₂FC|.

Evaluation of subtype distribution among immune-infiltrated cells

We assessed the immunological characteristics of each sample using ssGSEA, an extension of the GSEA method widely applied in immune infiltration studies87. The GSVA, GSEABase, and limma packages were used to estimate the abundance of 28 immune cell types in each tissue85,88 based on immune cell gene sets derived from published literature89. Pearson’s correlation analysis was then applied to evaluate the associations between hub genes and immune cell populations.
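The correlation step can be sketched as follows: for each hub gene, the expression values across samples are correlated with the ssGSEA score of one immune cell type across the same samples. The values below are hypothetical and serve only to show the calculation.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-sample values, not taken from the study.
gene_expression = [5.1, 6.3, 5.8, 7.2, 6.9]
immune_score = [0.12, 0.25, 0.18, 0.33, 0.29]
print(f"r = {pearson_r(gene_expression, immune_score):.3f}")
```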

Machine learning screening of hub immune cells and hub genes

Three machine learning algorithms, random forest (RF)90, LASSO91, and SVM-RFE92,93, were applied to identify key immune cells and candidate genes associated with bilirubin and AD diagnosis. RF is an ensemble method capable of handling numerous input variables while evaluating their relative importance. LASSO is a penalized regression approach that shrinks uninformative coefficients to zero, making it well suited for high-dimensional data. SVM-RFE trains a support vector machine and recursively eliminates the lowest-ranked features to reduce overfitting. Genes consistently selected across all three algorithms were defined as hub genes.
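The final selection rule, keeping only genes chosen by all three algorithms, is a set intersection. A minimal sketch with placeholder gene names (hypothetical, not the study's results):

```python
def consensus_features(*selections):
    """Return the features present in every selection, sorted by name."""
    if not selections:
        return []
    common = set(selections[0])
    for selected in selections[1:]:
        common &= set(selected)
    return sorted(common)


# Placeholder feature lists from each algorithm.
rf_genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_E"]
lasso_genes = ["GENE_A", "GENE_C", "GENE_D", "GENE_E"]
svm_rfe_genes = ["GENE_A", "GENE_C", "GENE_E", "GENE_F"]

hub_genes = consensus_features(rf_genes, lasso_genes, svm_rfe_genes)
print(hub_genes)
```

Requiring agreement across methods is a conservative choice: it trades sensitivity for a smaller list of candidates that does not depend on any single algorithm.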

ROC curves of hub genes

A logistic regression model of the hub genes was constructed using the glm function in R, and ROC curves were plotted using the pROC package (v1.18.5). The ROC curve was used to assess whether the model has diagnostic value, and the AUC was used to quantify predictive accuracy.
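To make the two quantities concrete, the sketch below shows how a fitted logistic model turns gene expression into a predicted probability, and how the AUC can be computed as the fraction of case-control pairs in which the case receives the higher score (ties count as one half). The coefficients and scores are hypothetical, and this is an illustration rather than the pROC implementation.

```python
import math


def predict_probability(intercept, coefficients, values):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-linear))


def roc_auc(scores, labels):
    """AUC as the probability that a case (1) outranks a control (0)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("both classes are required")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical predicted probabilities and true labels (1 = AD, 0 = control).
scores = [0.9, 0.4, 0.6, 0.1]
labels = [1, 1, 0, 0]
print("AUC:", roc_auc(scores, labels))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of cases and controls.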

Nomogram construction and diagnostic effectiveness assessment

A diagnostic model for predicting AD incidence was constructed using logistic regression and visualized as a nomogram94. Model performance was evaluated by calibration curves (accuracy) and decision curve analysis (clinical utility). Finally, biomarker expression and model performance were validated in an independent dataset.
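Decision curve analysis summarizes clinical utility as net benefit at a threshold probability p_t: the true-positive rate among all subjects minus the false-positive rate weighted by the odds p_t / (1 − p_t). The sketch below computes this standard quantity for a model and for the "treat all" reference strategy. The predicted probabilities are hypothetical, and the function names are chosen here for illustration.

```python
def net_benefit(probabilities, labels, threshold):
    """Net benefit = TP/n - FP/n * threshold / (1 - threshold)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    true_pos = sum(1 for p, y in zip(probabilities, labels)
                   if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in zip(probabilities, labels)
                    if p >= threshold and y == 0)
    return true_pos / n - false_pos / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference strategy: classify every subject as positive."""
    return net_benefit([1.0] * len(labels), labels, threshold)


# Hypothetical predicted probabilities and true labels (1 = AD, 0 = control).
probs = [0.9, 0.8, 0.3, 0.1]
truth = [1, 1, 0, 0]
for pt in (0.2, 0.5):
    print(pt, net_benefit(probs, truth, pt), net_benefit_treat_all(truth, pt))
```

A model is considered useful over the range of thresholds where its net benefit exceeds both the "treat all" curve and zero, the net benefit of treating no one.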

Statistics

All statistical analyses were performed in R (v4.3.2). Differences between two groups were assessed using either the Wilcoxon rank-sum test or Student’s t-test. Correlations between variables were evaluated using Pearson’s correlation coefficient. A two-tailed p < 0.05 was considered statistically significant.