Introduction

Atopic dermatitis (AD) is a common chronic inflammatory skin disease, with a prevalence of approximately 15–20% in children and 2–10% in adults worldwide1,2,3. AD is characterized by sensitive and dry skin, marked pruritus, and recurrent chronic eczematous lesions, which severely affect the sleep, psychological well-being and even social performance of patients4,5.

The pathogenesis of AD is complex and multifactorial, involving genetic predisposition, skin barrier dysfunction, skin microbial abnormalities, and immune dysfunction6. The primary immunological mechanism in AD is an imbalance between T helper (Th)1 and Th2 responses, which results in abnormal cytokine secretion. Skin barrier dysfunction facilitates transcutaneous penetration of microbes and allergens, leading to innate immune activation7. The resulting pro-inflammatory cytokines activate Th2 cells, which promote cytokine-associated inflammation (interleukin (IL)-4, IL-5, IL-13), eosinophilia, and immunoglobulin E (IgE) production8. IL-4 and IL-13 play essential roles in the inflammatory process by exacerbating epidermal barrier dysfunction and amplifying sensory nerve signals mediated by IL-31, a pruritic cytokine9. Dendritic cells express co-stimulatory molecules that promote the polarization of naive CD4+ T cells into Th cells, further aggravating the occurrence and progression of AD10. However, the immune mechanisms underlying AD are still poorly understood. Therefore, investigating molecules and pathways related to immune infiltration can provide valuable insights for clinical prevention and targeted therapy of AD.

In this study, we aimed to use bioinformatics to identify novel diagnostic and therapeutic biomarkers for AD by analyzing datasets from the Gene Expression Omnibus (GEO) database, and to predict the related microRNAs (miRNAs) and transcription factors (TFs). Furthermore, we aimed to determine the molecules and pathways related to the pathogenesis of AD and to characterize immune cell infiltration using the CIBERSORT method. The goal of this study is to provide a novel research direction for the management of AD.

Materials and methods

Data acquisition

The gene expression datasets used in this study were obtained from the GEO database. Datasets were included if they met the following criteria: (1) high-throughput sequencing or array-based transcriptome profiling had been conducted; (2) untreated (or non-drug-treated) lesional skin tissues from AD patients and healthy skin tissues from healthy controls were available; and (3) the dataset contained more than 15 samples. Four datasets (GSE32924, 8 healthy control tissues and 13 AD patient tissues; GSE107361, 11 healthy control tissues and 20 AD patient tissues; GSE121212, 37 healthy control tissues and 26 AD patient tissues; and GSE230200, 8 healthy control tissues and 10 AD patient tissues)11,12,13,14 were used to screen differentially expressed genes (DEGs), and two datasets (GSE130588, 20 healthy control tissues and 51 AD patient tissues; and GSE16161, 9 healthy control tissues and 9 AD patient tissues)15,16 were used as validation datasets.

Data preprocessing and identification of DEGs

We used GEO2R to compare gene expression profiles between lesional skin tissues of AD patients and healthy skin tissues of healthy controls and thereby identify the DEGs. DEGs were selected using the criteria |log2FC| > 1.0 and P value < 0.05. Subsequently, volcano plots, heatmaps and principal component analysis (PCA) plots were generated with OmicStudio tools to show the separation between the groups in each dataset.
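
The selection rule above can be illustrated with a minimal Python sketch. It is not the GEO2R implementation: it only applies the two stated thresholds to a table of per-gene statistics that is assumed to have been exported already. The `GeneStat` record, the `select_degs` helper and the example values are hypothetical and serve illustration only.

```python
from typing import NamedTuple


class GeneStat(NamedTuple):
    """One row of a differential expression table (hypothetical layout)."""
    gene: str
    log2fc: float
    p_value: float


def select_degs(rows, lfc_cutoff=1.0, p_cutoff=0.05):
    """Split rows into up- and downregulated DEGs using the stated thresholds."""
    up, down = [], []
    for row in rows:
        # A gene must satisfy both |log2FC| > cutoff and P value < cutoff.
        if row.p_value >= p_cutoff or abs(row.log2fc) <= lfc_cutoff:
            continue
        (up if row.log2fc > 0 else down).append(row.gene)
    return up, down


# Made-up values for illustration; not taken from any dataset in this study.
example = [
    GeneStat("GENE_A", 2.3, 0.001),
    GeneStat("GENE_B", -1.4, 0.020),
    GeneStat("GENE_C", 0.8, 0.0005),  # effect size too small
    GeneStat("GENE_D", 1.9, 0.200),   # not significant
]
up_genes, down_genes = select_degs(example)
```

In this toy table only GENE_A (upregulated) and GENE_B (downregulated) pass both thresholds.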

Functional enrichment analysis

Using a Venn diagram, DEGs that overlapped across the datasets were identified as the common DEGs (co-DEGs). The co-DEGs were then subjected to functional enrichment analysis. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses of the co-DEGs were conducted using Metascape (version 3.5.20240101)17. Enriched terms were required to include at least 3 genes, and P value < 0.01 was considered significant.
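
The Venn-diagram overlap amounts to a set intersection across the per-dataset DEG lists. The sketch below shows that step in Python under simple assumptions: gene symbols are already harmonized across platforms, and the direction of regulation is ignored. The `common_degs` helper and the placeholder gene names are hypothetical.

```python
def common_degs(degs_by_dataset):
    """Return the genes that appear in every dataset's DEG list."""
    gene_sets = [set(genes) for genes in degs_by_dataset.values()]
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets)


# Placeholder gene lists keyed by the four discovery datasets.
degs_by_dataset = {
    "GSE32924": ["GENE_A", "GENE_B", "GENE_C"],
    "GSE107361": ["GENE_A", "GENE_C", "GENE_D"],
    "GSE121212": ["GENE_A", "GENE_C", "GENE_E"],
    "GSE230200": ["GENE_C", "GENE_A", "GENE_F"],
}
co_degs = sorted(common_degs(degs_by_dataset))
```

Here only GENE_A and GENE_C are shared by all four lists and would count as co-DEGs.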

Protein–protein interaction (PPI) network and module analysis

The Search Tool for the Retrieval of Interacting Genes (STRING) (version 12.0) predicts PPIs, and the co-DEGs were uploaded to STRING to construct a PPI network. Network nodes represented proteins, with different node colors indicating proteins with different functions. Interactions with a confidence score over 0.4 were retained. Cytoscape (version 3.9.1) was used to visualize the PPI network, and the Molecular Complex Detection (MCODE) plug-in of Cytoscape was used to screen modules of the whole network, with degree cutoff = 2, node score cutoff = 0.2, k-core = 2, and maximal depth = 100.
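
Two of the settings above, the confidence cutoff and the k-core value, can be made concrete with a small Python sketch. It is not MCODE, which additionally scores vertices and grows clusters from seed nodes. It only filters scored edges and prunes the network to its 2-core. The edge list, the protein labels and both helper functions are hypothetical.

```python
from collections import defaultdict


def build_network(edges, min_score=0.4):
    """Keep edges whose score exceeds the cutoff; return an adjacency map."""
    adjacency = defaultdict(set)
    for a, b, score in edges:
        if score > min_score and a != b:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def k_core(adjacency, k=2):
    """Iteratively remove nodes that have fewer than k neighbours."""
    core = {node: set(nbrs) for node, nbrs in adjacency.items()}
    changed = True
    while changed:
        changed = False
        for node in [n for n, nbrs in core.items() if len(nbrs) < k]:
            # Detach the node from its remaining neighbours, then drop it.
            for nbr in core.pop(node):
                core[nbr].discard(node)
            changed = True
    return core


# Made-up scored interactions (protein, protein, confidence score).
edges = [
    ("P1", "P2", 0.9),
    ("P2", "P3", 0.8),
    ("P1", "P3", 0.7),
    ("P3", "P4", 0.5),
    ("P4", "P5", 0.3),  # below the cutoff, so discarded
]
network = build_network(edges)
core_nodes = sorted(k_core(network, k=2))
```

In this toy network P5 never enters the graph, P4 is pruned because it has a single neighbour, and the triangle P1, P2, P3 remains as the 2-core.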

Identification and validation of hub genes

The hub genes were identified using the cytoHubba plug-in of Cytoscape. Common hub genes were defined as those shared by eight algorithms (Degree, MNC, MCC, Stress, Radiality, Closeness, BottleNeck, and EPC) and were visualized using the R package “UpSet”. Moreover, we constructed a co-expression and function network of these common hub genes via GeneMANIA. The expression of these genes was verified in the validation datasets, and their diagnostic performance was assessed by receiver operating characteristic (ROC) curve analysis using the R package “pROC”. Genes with an area under the curve (AUC) > 0.70 were considered to have diagnostic value. P value < 0.05 was considered significant.
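
The AUC criterion can be illustrated with a short Python sketch. It is a stand-in for the pROC analysis, not a reproduction of it: the AUC is computed from its rank interpretation, namely the probability that a randomly chosen AD sample has higher expression than a randomly chosen control, with ties counted as one half. The sketch assumes that higher expression indicates AD. The `roc_auc` helper and the expression values are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random case outscores a random control."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("both classes are required")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5  # ties contribute half a win
    return wins / (len(cases) * len(controls))


# Made-up expression values: 1 = AD sample, 0 = healthy control.
expression = [5.1, 6.3, 7.0, 4.2, 4.8, 6.5]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(expression, labels)
has_diagnostic_value = auc > 0.70  # threshold stated in the text
```

For these values seven of the nine case-control pairs are ordered correctly, so the AUC is 7/9 and the hypothetical gene would pass the 0.70 threshold.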

Prediction of miRNAs

We used NetworkAnalyst 3.0, an online visualization tool that predicts gene–miRNA connections, to map the hub genes to their matching miRNAs. Comprehensive experimentally validated gene–miRNA interaction data were collected from TarBase. For each hub gene, miRNAs were identified within a minimal network. Cytoscape was then used to build a gene regulatory network, and miRNAs with a connectivity degree of at least 3 were selected.
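
The degree filter can be sketched in Python as follows. The sketch assumes that the gene–miRNA interactions have already been exported as (miRNA, gene) pairs, and it takes the degree of a miRNA to be the number of distinct hub genes it targets. The `mirnas_by_degree` helper and all identifiers in the example are hypothetical placeholders.

```python
from collections import defaultdict


def mirnas_by_degree(interactions, min_degree=3):
    """Return miRNAs that target at least `min_degree` distinct genes."""
    targets = defaultdict(set)
    for mirna, gene in interactions:
        targets[mirna].add(gene)  # a set ignores duplicated pairs
    return {
        mirna: sorted(genes)
        for mirna, genes in targets.items()
        if len(genes) >= min_degree
    }


# Placeholder interaction pairs; not taken from TarBase.
interactions = [
    ("mirna_1", "GENE_A"),
    ("mirna_1", "GENE_B"),
    ("mirna_1", "GENE_C"),
    ("mirna_1", "GENE_A"),  # duplicate record, counted once
    ("mirna_2", "GENE_A"),
    ("mirna_2", "GENE_B"),
    ("mirna_3", "GENE_C"),
]
candidates = mirnas_by_degree(interactions)
```

Only mirna_1 reaches a degree of 3 in this toy example, so it would be the single candidate retained.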

Prediction of TFs

NetworkAnalyst was also used to investigate the interactions between the hub genes and TFs and to evaluate the influence of the TFs on the expression and functional pathways of the hub genes. TF targets were derived from the JASPAR database. TFs of the hub genes were predicted within a minimal network, and Cytoscape was used to display the hub genes and TFs. The correlation between these TFs and the hub genes was assessed using Pearson correlation analysis.
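
For reference, the Pearson coefficient used here is the covariance of two expression vectors divided by the product of their standard deviations. The Python sketch below computes it from that definition for paired per-sample values. The `pearson_r` helper and the numbers are hypothetical, and no significance test is included.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_x * ss_y)


# Made-up per-sample expression of a TF and one of its target genes.
tf_expression = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_expression = [10.0, 8.0, 6.0, 4.0, 2.0]
r = pearson_r(tf_expression, gene_expression)
```

The two toy vectors are perfectly anti-correlated, so `r` is -1 up to floating-point rounding. A negative value of this kind is what a negative TF–gene correlation refers to.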

Evaluation of immune cell infiltration and correlation analysis between hub genes and immune cells

Immune cell proportions were estimated using CIBERSORT, which takes normalized gene expression data from AD patients and healthy controls together with the reference signature gene set (LM22). Bar graphs, box plots and correlation plots were generated using the OmicStudio tools. The correlation between the identified hub genes and the infiltrating immune cells was assessed using Pearson correlation analysis, and the relationships were visualized using the R packages “ggplot2” and “stats”.

Statistical analysis

R software (version 4.1.3) (https://www.r-project.org/) was used for the analysis of data and OriginPro 2022 software (version 9.9.0) (https://www.originlab.com/) was used for the generation of images.

Results

Information of expression profiling data

Based on the inclusion criteria, four datasets (GSE32924, GSE107361, GSE121212, and GSE230200) were finally selected and their gene expression profiles were acquired from the GEO database. With the criteria of P value < 0.05 and |log2FC| > 1.0, the following DEGs were obtained: 1739 DEGs from GSE32924, with 782 upregulated genes and 957 downregulated genes; 2440 DEGs from GSE107361, with 1177 upregulated genes and 1263 downregulated genes; 1848 DEGs from GSE121212, with 1031 upregulated genes and 817 downregulated genes; and 1631 DEGs from GSE230200, with 879 upregulated genes and 752 downregulated genes. The volcano plots of the DEGs are shown in Fig. 1A, and the top 25 significantly upregulated and downregulated genes are shown as heatmaps in Fig. 1B. In addition, the PCA plots demonstrated the separation between AD and healthy tissues based on the DEGs (Fig. 1C). Therefore, the identification of co-DEGs may provide important data for the diagnosis of AD.

Fig. 1

Determination of the DEGs. (A) Volcano plots of the DEGs. The upregulated DEGs (red dots) and downregulated DEGs (blue dots) of each dataset were identified using the criteria of P value < 0.05 and |log2FC| > 1.0. (B) Expression heatmaps of the top 50 DEGs in each dataset, ranked by P value. (C) PCA plots of DEGs in each dataset. DEGs differentially expressed genes, PCA principal component analysis, HC healthy control, AD atopic dermatitis.

Identification of co-DEGs and functional enrichment analysis

To identify the DEGs shared by the four datasets, we intersected the DEG lists in a Venn diagram and obtained a total of 146 co-DEGs (Fig. 2A). GO and KEGG pathway enrichment analyses were used to investigate the biological functions and pathways associated with the co-DEGs. The GO annotation covers biological processes, cellular components and molecular functions. The most highly enriched cellular components were extracellular matrix and external encapsulating structure (Fig. 2B). With regard to biological processes and molecular functions, the co-DEGs were substantially enriched in cytokine-mediated signaling pathway and calcium ion binding, respectively (Fig. 2C,D). The KEGG enrichment analysis revealed that the co-DEGs were mainly associated with cytokine–cytokine receptor interaction, viral protein interaction with cytokine and cytokine receptor, JAK-STAT signaling pathway, and IL-17 signaling pathway (Fig. 2E). These data suggested that chemokines and cytokines are jointly involved in the development of AD.

Fig. 2

Determination of co-DEGs and functional enrichment analysis. (A) Venn diagram illustrating the co-DEGs screened from the intersection of the DEGs in four datasets. (B–D) GO functional analysis showing enrichment of co-DEGs in cellular component, biological process, and molecular function. (E) KEGG pathway enrichment analysis of co-DEGs. Co-DEGs common differentially expressed genes, GO gene ontology, KEGG kyoto encyclopedia of genes and genomes.

PPI network construction and module analysis

The PPI network of the co-DEGs was obtained from STRING and analyzed with Cytoscape; it contained 93 nodes and 292 interaction pairs (Fig. 3A). Four closely connected gene modules (module 1, score = 13.18; module 2, score = 7.00; module 3, score = 5.00; module 4, score = 4.00), comprising 34 nodes and 149 interaction pairs, were identified through the MCODE plug-in of Cytoscape (Fig. 3B–E). The GO enrichment analysis showed that these genes were predominantly related to cytokine-mediated signaling pathway, cellular response to cytokine stimulus, and innate immune response (Fig. 3F). The KEGG pathway enrichment analysis showed that they were mainly involved in cytokine–cytokine receptor interaction, viral protein interaction with cytokine and cytokine receptor, and chemokine signaling pathway (Fig. 3G). The results suggested the essential role of inflammatory and immunological responses in AD.

Fig. 3

PPI network, significant gene modules and enrichment analysis of the modular genes. (A) The PPI network of co-DEGs. (B–E) Four significant gene clustering modules. (F,G) GO and KEGG enrichment analysis of the modular genes. PPI protein–protein interaction, co-DEGs common differentially expressed genes, GO gene ontology, KEGG kyoto encyclopedia of genes and genomes.

Identification and validation of hub genes

Using the eight algorithms of the cytoHubba plug-in, we calculated the top 20 hub genes for each algorithm and identified eight common hub genes: CCL2, CCR7, GZMB, IL7R, CD274, IL10RA, IRF7, and CCL22 (Fig. 4A). Subsequently, we analyzed the networks and functions of these genes via GeneMANIA. These genes exhibited a complex PPI network with a co-expression of 75.48%, shared protein domains of 16.43%, and co-localization of 8.09% (Fig. 4B). GO analysis showed that these genes are mainly involved in response to cytokine, cytokine-mediated signaling pathway, and response to biotic stimulus (Fig. 4C). In addition, KEGG pathway analysis showed that they are mainly involved in cytokine–cytokine receptor interaction, viral protein interaction with cytokine and cytokine receptor, chemokine signaling pathway, and JAK-STAT signaling pathway (Fig. 4D). To verify the reliability of these genes, GSE130588 and GSE16161 were used to validate the common hub genes. The expression levels of seven hub genes (CCL2, CCR7, GZMB, IL7R, CD274, IRF7, and CCL22) were significantly upregulated in AD patients compared with healthy controls in the two validation datasets (Fig. 5A,B). These seven hub genes were further subjected to ROC analysis. All of them had AUC values > 0.70, suggesting that they have diagnostic value in AD (Figs. 6 and 7). Among these genes, CCR7 exhibited the best specificity and sensitivity for the diagnosis of AD in the GSE130588 dataset (Fig. 6B), while CD274, IRF7, and CCL22 had the highest specificity and sensitivity for AD diagnosis in the GSE16161 dataset (Fig. 7E–G).

Fig. 4

Determination and co-expression network of common hub genes. (A) Venn diagram showing the common hub genes screened out by the eight algorithms. (B) Common hub genes and their co-expression genes were analyzed via GeneMANIA. (C,D) GO and KEGG enrichment analysis of the common hub genes. The outermost circle on the right represents terms, and the inner circle on the right represents the significant P value of the corresponding pathway of the gene. GO gene ontology, KEGG kyoto encyclopedia of genes and genomes.

Fig. 5

The relative expression levels of hub genes were validated using (A) GSE130588 and (B) GSE16161. HC healthy control, AD atopic dermatitis.

Fig. 6

The diagnostic effectiveness of hub genes ((A) CCL2, (B) CCR7, (C) GZMB, (D) IL7R, (E) CD274, (F) IRF7, and (G) CCL22) was validated using GSE130588.

Fig. 7

The diagnostic effectiveness of hub genes ((A) CCL2, (B) CCR7, (C) GZMB, (D) IL7R, (E) CD274, (F) IRF7, and (G) CCL22) was validated using GSE16161.

Prediction of miRNAs and TFs

To gain insight into how miRNAs and TFs relate to the hub genes during transcriptional repression or abrogation of protein translation, we used Cytoscape to build gene regulatory networks. Based on the TarBase database, we found that eight miRNA molecules may regulate the expression of hub genes. The hub genes and their associated regulatory miRNAs are shown in Fig. 8A. The seven candidate miRNAs (degree ≥ 3) are listed in Table 1, and hsa-mir-24-3p, hsa-mir-146a-5p, and hsa-let-7a-5p were found to be more closely associated with AD. We further predicted the TFs targeting the identified hub genes using the JASPAR database. A total of 17 nodes and 28 interaction pairs were obtained in the TF-target gene network. FOXC1 was predicted to regulate CCL2, CCR7, GZMB, and CD274 (Fig. 8B). Subsequently, we performed a correlation analysis between FOXC1 and its target hub genes. The results indicated that the expression level of FOXC1 was negatively correlated with those of CCL2, GZMB, and CD274 (Fig. 8C). Therefore, these key miRNAs and TFs may be important for understanding the physiopathological processes and molecular mechanisms of AD.

Fig. 8

Regulatory networks of key miRNAs and TFs with hub genes. (A) miRNA regulatory network. miRNAs are marked in yellow, and the hub genes are marked in red. (B) TF regulatory network. TFs are marked in yellow, and the hub genes are marked in red. (C) Correlation analysis between FOXC1 and CCL2, GZMB, and CD274. miRNAs microRNAs, TFs transcription factors.

Table 1 Candidate miRNAs (Degree ≥ 3) targeting hub genes in AD.

Analysis of immune cell infiltration

To compare immune cell infiltration between healthy controls and AD patients, we used CIBERSORT. The composition of 22 immune cell types in each sample is shown in Fig. 9A. Compared with healthy controls, AD patients had more naive CD4 T cells, activated memory CD4 T cells, gamma delta T cells, resting NK cells, resting dendritic cells, and activated dendritic cells, but fewer naive B cells, activated NK cells, M0 macrophages, and resting mast cells (Fig. 9B). Correlation analysis among the 22 immune cell types indicated that activated memory CD4 T cells were significantly positively correlated with activated dendritic cells (R = 0.55), and that resting mast cells were significantly negatively correlated with activated dendritic cells (R = −0.68) (Fig. 9C). These findings suggested that specific activated inflammatory cells contribute to the pathogenesis of AD.

Fig. 9

Immune cell infiltration analysis. (A) Relative fraction of 22 sub-populations of immune cells. (B) The differences of 22 sub-populations of immune cells between the HC and AD samples. (C) Correlation between 22 sub-populations of immune cells. HC healthy control, AD atopic dermatitis. *P value < 0.05, **P value < 0.01, ***P value < 0.001, and ****P value < 0.0001.

Relationship between identified hub genes and immune cells

To explore the relationship between the hub genes and the 22 types of immune cells, we conducted a correlation analysis to reveal their interactions and potential synergistic effects. As shown in Fig. 10, the expression levels of CCL2, CCR7, GZMB, CD274, and CCL22 were significantly positively correlated with the number of activated dendritic cells (R > 0.6), with CCL22 exhibiting the strongest correlation (R = 0.84). Additionally, the expression levels of CCL2, GZMB, CD274, and CCL22 were significantly negatively correlated with the number of resting mast cells (R < −0.6). These results may provide a more detailed understanding of AD pathogenesis.

Fig. 10

Relationship between hub genes ((A) CCL2, (B) CCR7, (C) GZMB, (D) IL7R, (E) CD274, (F) IRF7, and (G) CCL22) and immune cells.

Discussion

AD is a chronic skin disease characterized by inflammation and immune dysregulation18. Therefore, it is crucial to identify key molecules involved in the pathogenesis of AD as potential therapeutic targets. In this study, four GEO datasets were used for DEG screening, and 146 co-DEGs were obtained. Functional enrichment analyses revealed that these co-DEGs were primarily enriched in cytokine-mediated signaling pathway, cytokine–cytokine receptor interaction, JAK-STAT signaling pathway, and IL-17 signaling pathway, suggesting that inflammation and immune-related dysfunction play important roles in the pathogenesis of AD.

Skin barrier dysfunction, immune abnormalities, and skin flora imbalance are important factors in the pathogenesis of AD19. Upon stimulation by external allergens and microorganisms, keratinocytes can release a large number of chemokines and cytokines, subsequently activating dendritic cells and/or type 2 innate lymphoid cells to trigger skin inflammation20. IL-4 and IL-13 bind to IL-4Rα receptors to activate transmembrane signals through the JAK-STAT signaling pathway, which affects immune response, barrier dysfunction, pruritus and microbiota21, and regulates cell growth, differentiation and apoptosis22. Activated Th2 cells activate downstream B cells, leading to a class switch that produces IgE and triggers the release of inflammatory mediators, amplifying the type 2 immune response21. Th2 inflammatory cytokines can inhibit the expression of barrier-related proteins in keratinocytes, further disrupting skin barrier function23. In the chronic phase of AD, mixed inflammatory infiltrates of Th1, Th17, and Th22 cells can also be seen in the skin lesions24. Our GO and KEGG analyses suggested that chronic inflammation combined with immune dysfunction is a crucial factor in the occurrence and development of AD.

STRING and Cytoscape were used to screen common hub genes, and we identified seven hub genes (CCL2, CCR7, GZMB, IL7R, CD274, IRF7, and CCL22) that exhibited high diagnostic sensitivity and specificity, as validated through external datasets and ROC curve analysis. The results indicated that these hub genes could serve as potential biomarkers for AD in humans.

CCL2 and CCL22 are both chemokines involved in immunoregulatory and inflammatory processes. Studies have demonstrated that CCL2 levels are significantly higher in lesional skin than in nonlesional skin in children with AD25, and that CCL22 levels are elevated in AD patients compared with healthy individuals26. CCR7, a chemokine receptor, has been detected on dendritic cells as well as other cell types. A dendritic cell population expressing CCR7 is found in AD lesions but not in healthy control skin27. Furthermore, CCR7 has been implicated in the migration of neutrophils from the skin to skin-draining lymph nodes via lymphatic vessels28. Because AD is a complex disease involving multiple immune cells, it can be speculated that CCR7 mediates the migration of dendritic cells and neutrophils to skin-draining lymph nodes in AD and triggers subsequent immune responses29. GZMB, a serine protease with cytotoxic and immunomodulatory functions, is significantly increased in the blood plasma of AD patients30. Correlation analysis showed that GZMB was positively correlated with serum GRP levels and markers of dermatitis severity in patients with AD, potentially reflecting the severity of dermatitis and pruritus in these patients31. IL7R encodes a component of a heterodimeric receptor complex that plays a role in AD32, and IL7R variants can modulate the impact of TSLP variants, which are associated with more persistent AD33. CD274 plays a critical role in Th1 and Th17 immune responses, while CD273 is involved in the Th2 immune response34. The inflammatory pattern of chronic AD patients shows a transformation from Th2/Th22 to Th1/Th17, manifested as a trend of down-regulation of Th2 cytokines and up-regulation of Th1 cytokines35. Our results were consistent with previous findings, but further studies are needed to investigate the role of CD274 in AD pathogenesis. IRF7 plays a vital role in inflammation and immune response. He et al. demonstrated that IRF7 expression is elevated in type 2 lymphoid cells during allergic inflammation, but its absence leads to a reduction36.

We further explored the relationships between the hub genes and the predicted miRNAs and TFs by constructing interaction networks to identify the crucial nodes represented by the hub genes. This comprehensive bioinformatics approach has been demonstrated to be reliable in a variety of diseases37. In the TF–gene regulatory network, FOXC1 emerged as a notable TF in AD and was negatively correlated with the hub genes CCL2, GZMB, and CD274. Additionally, we investigated immune cell infiltration and the relationships between the identified hub genes and immune cells in AD. Our results indicated that the CCL22 expression level was positively correlated with the number of activated dendritic cells in AD patients.

Dendritic cells, being the most specialized antigen-presenting cells, are activated by foreign allergens and microorganisms, leading to the production of type 2 cytokines and the development of AD38. Both epidermal Langerhans cells and dermal dendritic cells play crucial roles in the development of AD. CCL22 produced by both cell types is a key factor in recruiting Th2 cells and maintaining their immune response39. By blocking IL-4 signal transduction, the upregulation of CCL22 induced by IL-4 can be inhibited, thus potentially slowing down the pathological progression of AD40. In the future, drug development for AD may need to take into account the regulatory role of dendritic cells in mediating the differentiation of different Th cells.

However, this study still had several limitations. The clinical sample size of some datasets was relatively small, necessitating additional studies with larger samples and more methods. The relevant clinical information in the datasets was not complete, with missing data on disease duration, severity, and skin lesion location. In future work, we aim to conduct more relevant in vivo and in vitro experiments and compare our findings with those of similar inflammatory diseases, in order to further validate the role of these hub genes in AD and elucidate their underlying mechanisms.

In summary, we identified the co-DEGs in AD and conducted GO and KEGG enrichment analyses, as well as a PPI network analysis. Our study has not only screened potential miRNAs and TFs that may regulate specific hub genes, but has also characterized the immune cell infiltration and explored the relationships between the identified hub genes and immune cells in AD. These results offer new insights into the immune, cellular, and molecular mechanisms underlying the pathogenesis of AD.