Introduction

Acute myocardial infarction (AMI) is a leading contributor to global mortality1. By 2030, the incidence of AMI is expected to reach 75 million cases2. The high prevalence and mortality of AMI call for improved strategies for early diagnosis and effective treatment3,4. Current therapeutic approaches include pharmacological and interventional strategies. However, diagnostic difficulties and suboptimal treatment outcomes persist, underscoring the need for innovative biomarkers and diagnostic techniques3,5.

The molecular mechanisms underlying AMI involve several pathological processes including myocardial ischemia, apoptosis, and inflammatory responses1,3. Ischemic conditions trigger a series of cellular processes that ultimately culminate in cardiomyocyte death and contribute to the overall damage observed in AMI. The initiation of pro-inflammatory signaling cascades, especially the NF-κB, MAPK, and PI3K/Akt pathways, is essential for orchestrating these biological responses2,6. These pathways are implicated in the regulation of apoptosis and inflammation, further complicating the disease process, and present potential targets for therapeutic interventions.

Abnormalities in glucose metabolism are intricately linked to several diseases including diabetes, metabolic syndrome, and cardiovascular disease. These conditions often coexist and exacerbate patient morbidity and mortality7. Diabetes significantly increases the risk of cardiovascular complications8. Metabolic syndrome, another closely related condition, encompasses a cluster of metabolic disorders, including obesity, dyslipidemia, and hypertension, which further contribute to cardiovascular diseases9,10. The aforementioned research underscores the need to understand the fundamental mechanisms associated with AMI.

Under normal conditions, the circadian rhythm coordinates the physiological functions of the cardiovascular system, maintaining the dynamic balance of indicators such as blood pressure, heart rate, vascular tension, and blood coagulability11,12,13. However, when the circadian rhythm is disrupted, the tolerance of the cardiovascular system to ischemia and hypoxia is significantly reduced, and the risk of AMI is markedly increased14,15. Consequently, a more comprehensive exploration of the interplay between glucose metabolism, circadian rhythm, and AMI may not only clarify the underlying mechanisms of myocardial infarction but also suggest novel therapeutic targets for prevention and treatment strategies grounded in an understanding of circadian rhythms.

This study employed a comprehensive approach that integrates batch effect removal, differential expression analysis, enrichment analysis, and machine learning model construction to investigate AMI. The advantage of this methodology lies in its ability to consolidate information from multiple datasets, thereby enhancing the reliability of the results, whereas the incorporation of machine learning techniques significantly improves the accuracy of the diagnostic models. The main goal of this study was to identify the essential genes linked to AMI and develop a robust diagnostic framework that can enhance early identification and tailored therapeutic approaches.

Results

Technology roadmap

The technology roadmap is illustrated in Fig. 1.

Fig. 1
Technology roadmap. DEGs, Differentially Expressed Genes; GMRGs, Glucose Metabolism-Related Genes; CRRGs, Circadian Rhythm-Related Genes; GMCRRDEGs, Glucose Metabolism and Circadian Rhythm-Related Differentially Expressed Genes; GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; LASSO, Least Absolute Shrinkage and Selection Operator; ssGSEA, Single-Sample Gene Set Enrichment Analysis.

Merging of AMI data sets

We downloaded two datasets from the GEO database (https://www.ncbi.nlm.nih.gov/geo/)18: GSE66360 (ref. 16; 49 AMI vs. 50 control samples, blood tissue, GPL570 microarray, Homo sapiens, PMID: 21737128) and GSE48060 (ref. 17; 31 AMI vs. 21 control samples, blood tissue, GPL570 microarray, Homo sapiens, PMID: 25684211). The two datasets were first merged into a single combined dataset, and the R package sva was used to mitigate batch effects. Distribution boxplots (Fig. 2A,B) were then used to compare the expression values of the combined datasets before and after batch-effect removal, and principal component analysis (PCA) plots (Fig. 2C,D) were used to examine the low-dimensional distribution of the samples before and after removal. Both the boxplots and the PCA plots indicate that batch effects in the combined AMI dataset were substantially reduced by this procedure.
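The idea behind batch correction can be illustrated with a deliberately simplified sketch. The study used the R package sva; the Python code below is not that method. It only centers each batch on the overall mean for a single gene, the function name center_batches is hypothetical, and the expression values are invented for illustration.

```python
from statistics import mean


def center_batches(expr, batches):
    """Remove per-batch mean shifts for one gene.

    expr    -- expression values for a single gene, one per sample
    batches -- batch label for each sample (same length as expr)

    Each batch is shifted so that its mean equals the overall mean.
    This is a simplified stand-in, not the procedure implemented in sva.
    """
    overall = mean(expr)
    batch_means = {
        b: mean([v for v, lab in zip(expr, batches) if lab == b])
        for b in set(batches)
    }
    return [v - batch_means[lab] + overall for v, lab in zip(expr, batches)]


# Invented values: the second dataset sits about 2 units higher than the first.
expr = [5.0, 5.2, 4.8, 7.1, 6.9, 7.0]
batches = ["GSE66360"] * 3 + ["GSE48060"] * 3
corrected = center_batches(expr, batches)
```

After centering, both batches share the same mean, which is the kind of shift the before/after boxplots in Fig. 2A,B are meant to reveal.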

Fig. 2
Batch effect removal. (A) Boxplot of the expression distribution of the combined GEO datasets before batch-effect removal. (B) Boxplot of the expression distribution of the combined GEO datasets after batch-effect removal. (C) PCA plot of the combined GEO datasets before batch-effect removal. (D) PCA plot of the combined GEO datasets after batch-effect removal. PCA, principal component analysis; AMI, acute myocardial infarction. The AMI datasets GSE66360 and GSE48060 are shown in blue and yellow, respectively.

AMI-related glucose metabolism and circadian rhythm-related differentially expressed genes (GMCRRDEGs)

Differential expression analysis of the combined datasets identified 689 differentially expressed genes (DEGs) that satisfied the criteria of |logFC| > 0.5 and p < 0.05. Of these, 256 genes were upregulated (logFC > 0.5 and p < 0.05) and 433 genes were downregulated (logFC < -0.5 and p < 0.05). A volcano plot was constructed from the results of this differential analysis (Fig. 3A).
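The thresholds above amount to a simple three-way rule. The sketch below applies that rule in Python; the function name classify_gene and the example rows are hypothetical, and the values are invented rather than taken from the study.

```python
def classify_gene(log_fc, p_value, lfc_cut=0.5, p_cut=0.05):
    """Label a gene as 'up', 'down', or 'ns' (not significant).

    Mirrors the stated criteria: |logFC| > 0.5 and p < 0.05.
    """
    if p_value < p_cut and log_fc > lfc_cut:
        return "up"
    if p_value < p_cut and log_fc < -lfc_cut:
        return "down"
    return "ns"


# Invented (gene, logFC, p-value) rows for illustration only.
rows = [
    ("GENE_A", 1.20, 0.001),
    ("GENE_B", -0.80, 0.010),
    ("GENE_C", 0.30, 0.001),   # significant p-value but small fold change
    ("GENE_D", 0.90, 0.200),   # large fold change but p-value too high
]
labels = {gene: classify_gene(lfc, p) for gene, lfc, p in rows}
```

Counting the "up" and "down" labels over a full results table would give the two totals reported above.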

To identify glucose metabolism and circadian rhythm-related differentially expressed genes (GMCRRDEGs), all DEGs were intersected with the glucose metabolism-related genes (GMRGs) and circadian rhythm-related genes (CRRGs), and the overlap was displayed as a Venn diagram (Fig. 3B). Twelve GMCRRDEGs were identified: NAMPT, JUN, IL1B, PHACTR1, TKT, ADRB2, CD36, EPAS1, EDN1, TNF, FOS, and HP. Based on this intersection, we assessed the differences in GMCRRDEG expression between sample groups in the combined datasets. The R package pheatmap was used to draw a heatmap of the results (Fig. 3C).
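The intersection step is a three-way set overlap. A minimal sketch, assuming each gene list is available as a collection of gene symbols; the function name intersect_gene_sets is hypothetical and the placeholder symbols are invented.

```python
def intersect_gene_sets(degs, gmrgs, crrgs):
    """Return the sorted gene symbols present in all three lists."""
    return sorted(set(degs) & set(gmrgs) & set(crrgs))


# Placeholder symbols only; these are not the study's gene lists.
degs = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
gmrgs = {"GENE_A", "GENE_B", "GENE_E"}
crrgs = {"GENE_A", "GENE_B", "GENE_D", "GENE_F"}
shared = intersect_gene_sets(degs, gmrgs, crrgs)
```

Only symbols present in all three inputs survive, which corresponds to the central region of the Venn diagram in Fig. 3B.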

The chromosomal locations of the 12 GMCRRDEGs were mapped using the R package RCircos (Fig. 3D). Chromosome mapping showed that chromosome 6 carried the largest share of the GMCRRDEGs, including EDN1, PHACTR1, and TNF.

Fig. 3
Differential gene expression analysis. (A) Volcano plot of the differential expression analysis between the AMI group and the Control group in the combined GEO datasets. (B) Venn diagram of the overlap among DEGs, GMRGs, and CRRGs in the integrated GEO datasets. (C) Heat map of GMCRRDEGs in the integrated GEO datasets. (D) Chromosomal mapping of GMCRRDEGs. DEGs, differentially expressed genes; GMRGs, glucose metabolism-related genes; CRRGs, circadian rhythm-related genes; GMCRRDEGs, glucose metabolism and circadian rhythm-related differentially expressed genes. Blue is the Control group and yellow is the AMI group. In the heat map, red represents high expression and blue represents low expression.

Expression differences and correlation analysis of glucose metabolism and circadian rhythm-related differentially expressed genes (GMCRRDEGs) among the groups of the integrated GEO dataset

A comparison of the 12 GMCRRDEGs between the AMI and Control groups in the combined datasets revealed distinct expression patterns (Fig. 4A). All 12 genes (HP, FOS, TNF, EDN1, EPAS1, CD36, ADRB2, TKT, PHACTR1, IL1B, JUN, and NAMPT) were significantly differentially expressed between the two groups (p < 0.001, indicated by triple asterisks), indicating clear separation of the groups at the expression level.

The ROC curves (Fig. 4B–E) showed that FOS expression had low accuracy for distinguishing the Control and AMI groups, with an area under the curve (AUC) between 0.5 and 0.7. The expression values of HP, TNF, EDN1, EPAS1, CD36, ADRB2, TKT, PHACTR1, IL1B, JUN, and NAMPT showed moderate accuracy for distinguishing the two groups (0.7 < AUC < 0.9).
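For a single gene, the AUC equals the probability that a randomly chosen AMI sample has a higher expression value than a randomly chosen control sample. The sketch below computes it by direct pairwise comparison and maps it onto the accuracy bands used above. The function names auc and accuracy_band are hypothetical, the expression values are invented, and the handling of values falling exactly on 0.7 or 0.9 is an assumption because the text leaves those boundaries open.

```python
def auc(case_values, control_values):
    """AUC by pairwise comparison: P(case > control), ties count one half."""
    wins = 0.0
    for c in case_values:
        for k in control_values:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


def accuracy_band(auc_value):
    """Map an AUC (assumed >= 0.5) onto the bands used in the text."""
    if auc_value > 0.9:
        return "high"
    if auc_value > 0.7:
        return "moderate"
    return "low"


# Invented expression values for one gene.
cases = [2.0, 3.0, 4.0]
controls = [1.0, 2.5, 3.5]
gene_auc = auc(cases, controls)
band = accuracy_band(gene_auc)
```

For a gene that is lower in AMI than in controls, this orientation gives an AUC below 0.5; swapping the two arguments (or using 1 minus the value) restores the conventional reading.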

We also calculated the pairwise correlations of the 12 GMCRRDEGs in the combined datasets and displayed them as a correlation heat map (Fig. 4F). Most GMCRRDEGs were positively correlated, and NAMPT and IL1B were significantly positively correlated (r = 0.774). Finally, ggplot2 in R was used to produce scatter plots of the four most strongly correlated gene pairs (Fig. 4G–J).
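The text does not state which correlation coefficient was used; the sketch below assumes Pearson's r, which is a common default. The function name pearson_r is hypothetical and the paired values are invented.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Invented expression values for two genes across five samples.
gene_1 = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_2 = [2.1, 3.9, 6.2, 8.0, 9.8]
r = pearson_r(gene_1, gene_2)
```

Applying the function to every pair of the 12 genes would fill a 12 x 12 matrix of the kind shown as a heat map in Fig. 4F.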

Fig. 4
Differential expression and correlation analysis. (A) Group comparison of GMCRRDEGs in the AMI group and the Control group in the combined GEO datasets. (B–E) ROC curves of HP, FOS and TNF (B); EDN1, EPAS1 and CD36 (C); ADRB2, TKT and PHACTR1 (D); IL1B, JUN, and NAMPT (E) in the combined GEO datasets. (F) Correlation heat map of GMCRRDEGs in the integrated GEO datasets. (G) Scatter plot of the correlation between NAMPT and IL1B. (H) Scatter plot of the correlation between NAMPT and FOS. (I) Scatter plot of the correlation between NAMPT and PHACTR1. (J) Scatter plot of the correlation between IL1B and FOS. AMI, acute myocardial infarction; GMCRRDEGs, glucose metabolism and circadian rhythm-related differentially expressed genes. In the correlation heat map, red is a positive correlation, blue is a negative correlation, and the depth of color represents the strength of the correlation.

Functional and pathway enrichment analysis

Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses of the 12 GMCRRDEGs revealed significant enrichment in several terms and pathways (Supplementary Table S1). For biological process (BP), the genes were mainly involved in nitric oxide homeostasis, specifically (1) regulation of nitric oxide biosynthesis, (2) regulation of nitric oxide metabolism, and (3) cellular responses to peptides; enrichment was also observed in redox-sensitive macromolecular interactions. For cellular component (CC), the enriched terms included endocytic vesicles, specific granules, endocytic vesicle membranes, haptoglobin-hemoglobin complexes, and clathrin-coated vesicles. For molecular function (MF), the enriched terms included cytokine activity, R-SMAD binding, SMAD binding, RNA polymerase II-specific transcription factor binding, and amyloid-beta binding. In the KEGG analysis, the genes were significantly enriched in the TNF signaling pathway, fluid shear stress and atherosclerosis, pertussis, leishmaniasis, and lipid metabolism and atherosclerosis pathways. The enrichment results are displayed as bubble plots (Fig. 5A).

Network diagrams of the enrichment analysis were also drawn (Fig. 5B–E). Lines connect each enriched term to its associated genes, and larger nodes indicate terms that contain more genes.
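Over-representation analyses of this kind are commonly based on a hypergeometric test. Whether the software used in this study computes the p-value exactly this way is an assumption, and the function name enrichment_p_value and the example counts below are hypothetical.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """Hypergeometric upper-tail p-value, P(X >= k).

    k -- genes in the query list annotated to the term
    n -- size of the query list
    K -- genes in the background annotated to the term
    N -- size of the background
    """
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Invented counts: 3 of 12 query genes fall in a term that covers
# 100 of 20000 background genes.
p = enrichment_p_value(k=3, n=12, K=100, N=20000)
```

In practice the raw p-values for all tested terms would also be adjusted for multiple testing before thresholds such as those in the Fig. 5 caption are applied.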

Fig. 5
GO and KEGG enrichment analysis. (A) Bubble plot of the GO and KEGG enrichment results for GMCRRDEGs; GO terms and KEGG terms are shown on the abscissa. (B–E) GO and KEGG network diagrams: BP (B), CC (C), MF (D) and KEGG (E). The light-yellow nodes represent terms, the light-blue nodes represent molecules, and the lines represent the relationship between terms and molecules. GMCRRDEGs, glucose metabolism and circadian rhythm-related differentially expressed genes; GO, gene ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; BP, biological process; CC, cellular component; MF, molecular function. In the bubble plot, bubble size represents the number of genes and bubble color represents the p-value: the redder the color, the smaller the p-value, and the bluer the color, the larger the p-value. The screening criteria for GO and KEGG enrichment analysis were p-value < 0.05 and FDR value (q-value) < 0.25.

Establishment of diagnostic model for AMI

To evaluate the diagnostic value of the 12 GMCRRDEGs in AMI, a logistic regression model was fitted using the 12 GMCRRDEGs as variables, and the results were displayed as a forest plot (Fig. 6A). All 12 GMCRRDEGs (NAMPT, EDN1, EPAS1, JUN, IL1B, ADRB2, CD36, HP, TNF, TKT, FOS, and PHACTR1) were statistically significant in the logistic regression model (p < 0.05). Next, the SVM-RFE algorithm was applied to the 12 GMCRRDEGs with 5-fold cross-validation, and the average rank of each gene was calculated to determine the number of genes giving the lowest error rate (Fig. 6B) and the highest accuracy (Fig. 6C). The support vector machine (SVM) model reached its highest accuracy with six genes, so the six genes with the best average rank were retained for subsequent analyses (Fig. 6D): JUN, EPAS1, IL1B, ADRB2, FOS, and CD36. These six genes were then entered into a least absolute shrinkage and selection operator (LASSO) regression analysis to build the diagnostic model for AMI, which was visualized with the LASSO regression model diagram (Fig. 6E) and the LASSO variable trajectory diagram (Fig. 6F). The LASSO regression model included all six GMCRRDEGs, which were designated key genes: JUN, EPAS1, IL1B, ADRB2, FOS, and CD36.
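The rank-averaging step can be sketched independently of the SVM itself. The code below assumes that each cross-validation fold has already produced an ordered list of genes (best first) and only shows how those per-fold rankings could be averaged and the top genes kept. The function names average_ranks and top_genes, the tie-break by gene symbol, and the fold rankings are all hypothetical and invented for illustration; no SVM is trained here.

```python
def average_ranks(fold_rankings):
    """Average the 1-based rank of each gene across folds."""
    totals = {}
    for ranking in fold_rankings:
        for position, gene in enumerate(ranking, start=1):
            totals[gene] = totals.get(gene, 0) + position
    return {gene: total / len(fold_rankings) for gene, total in totals.items()}


def top_genes(fold_rankings, k):
    """Return the k genes with the lowest (best) average rank."""
    avg = average_ranks(fold_rankings)
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(avg, key=lambda gene: (avg[gene], gene))[:k]


# Invented rankings from three folds over four placeholder genes.
folds = [
    ["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
    ["GENE_B", "GENE_A", "GENE_C", "GENE_D"],
    ["GENE_A", "GENE_C", "GENE_B", "GENE_D"],
]
selected = top_genes(folds, k=2)
```

In the study the cut-off k was chosen where the cross-validated accuracy peaked (six genes); here k is simply passed in.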

Fig. 6
Diagnostic model of AMI. (A) Forest plot of the 12 GMCRRDEGs included in the logistic regression model. (B,C) Number of genes giving the lowest error rate (B) and the highest accuracy (C) according to the SVM-RFE algorithm. (D) Lollipop plot of the average importance ranking of the six genes selected by the SVM-RFE algorithm. (E,F) Diagnostic model plot (E) and variable trajectory plot (F) of the LASSO regression model. AMI, acute myocardial infarction; SVM, support vector machine; LASSO, least absolute shrinkage and selection operator.

Validation of the diagnostic model for AMI

To further assess the AMI diagnostic model, a nomogram was constructed from the key genes to illustrate their respective contributions (Fig. 7A). In the nomogram, the expression level of IL1B contributed markedly more to the diagnostic model than the other variables, whereas the expression level of CD36 contributed less than the other variables.

The accuracy and discrimination of the AMI diagnostic model were assessed with a calibration curve, which compares the actual and predicted probabilities (Fig. 7B). The calibration line (dotted) deviated slightly from the diagonal that represents the ideal model but remained close to the fitted line. Decision curve analysis (DCA) was used to assess the clinical utility of the diagnostic model built from the key genes in the combined datasets (Fig. 7C). Within a certain range of threshold probabilities, the model curve stayed consistently above both the "all positive" and "all negative" reference lines, indicating a greater net benefit. Additionally, the ROC curve (Fig. 7D) showed that the RiskScore achieved high diagnostic accuracy (AUC > 0.9) in the combined datasets.
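The quantity plotted in a decision curve is the net benefit at a given threshold probability. The sketch below uses the standard definition, net benefit = TP/n − (FP/n) × pt/(1 − pt), together with the "treat all" reference line; the "treat none" line is zero by definition. The function names, the rule that a sample is called positive when its predicted probability is at or above the threshold, and the labels and probabilities are assumptions made for illustration.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of calling samples positive when y_prob >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    weight = threshold / (1.0 - threshold)
    return tp / n - (fp / n) * weight


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of labelling every sample positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Invented labels (1 = AMI, 0 = control) and predicted probabilities.
y_true = [1, 1, 0, 0]
y_prob = [0.9, 0.6, 0.4, 0.2]
nb_model = net_benefit(y_true, y_prob, threshold=0.5)
nb_all = net_benefit_treat_all(y_true, threshold=0.5)
```

A model is considered useful over the range of thresholds where its net benefit exceeds both reference strategies.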

Patients diagnosed with AMI were classified into two distinct groups, High-Risk and Low-Risk, using the median risk score derived from the diagnostic model as the classification criterion. The risk score was calculated using the following formula:

$$\mathrm{RiskScore} = \mathrm{JUN} \times (0.7828) + \mathrm{EPAS1} \times (0.9825) + \mathrm{IL1B} \times (0.5379) + \mathrm{ADRB2} \times (-0.7919) + \mathrm{FOS} \times (-0.1603) + \mathrm{CD36} \times (0.4531)$$
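The formula and the median split can be written out as a short sketch. The coefficients are those given in the formula; the function names risk_score and split_by_median are hypothetical, the scores in the example are invented, and assigning samples exactly at the median to the Low-Risk group is an assumption because the text does not specify it.

```python
from statistics import median

# Coefficients taken from the RiskScore formula.
COEFFICIENTS = {
    "JUN": 0.7828,
    "EPAS1": 0.9825,
    "IL1B": 0.5379,
    "ADRB2": -0.7919,
    "FOS": -0.1603,
    "CD36": 0.4531,
}


def risk_score(expression):
    """Weighted sum of the six key-gene expression values."""
    return sum(COEFFICIENTS[gene] * expression[gene] for gene in COEFFICIENTS)


def split_by_median(scores):
    """Label each score High-Risk if it is above the median, else Low-Risk."""
    cut = median(scores)
    return ["High-Risk" if s > cut else "Low-Risk" for s in scores]


# Invented example: a sample in which every key gene has expression 1.0.
unit_sample = {gene: 1.0 for gene in COEFFICIENTS}
unit_score = risk_score(unit_sample)

# Invented scores for four samples.
groups = split_by_median([0.1, 0.5, 0.9, 1.3])
```

Because the ADRB2 and FOS coefficients are negative, higher expression of those two genes lowers the score, while the other four genes raise it.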

Comparison of the six key genes between the high-risk and low-risk groups of the AMI cohort (Fig. 7E) revealed distinct expression patterns. The level of statistical significance differed across genes: ADRB2 was significant at p < 0.05, EPAS1 at p < 0.01, and JUN, IL1B, FOS, and CD36 at p < 0.001. Notably, the higher expression of JUN, IL1B, FOS, and CD36 in high-risk subjects suggests a potential mechanistic convergence in AMI pathogenesis. These transcriptional disparities were quantified through risk-group comparative analysis, with effect sizes proportional to the significance levels.

Functional similarity (Friends) analysis was used to identify the genes with the greatest influence on the biological processes of AMI (Fig. 7F). FOS emerged as the most important gene in AMI, being closest to the critical threshold (cutoff value = 0.7).

Fig. 7
Diagnostic and validation analysis of AMI. (A) Nomogram of the key genes in the AMI diagnostic model based on the combined GEO datasets. (B,C) Calibration curve (B) and decision curve analysis (DCA) plot (C) of the AMI diagnostic model based on the key genes in the integrated GEO datasets. (D) ROC curve of the RiskScore in the combined GEO datasets. (E) Group comparison plot of the key genes in the High-Risk and Low-Risk groups of the AMI group. (F) Raincloud plot of the Friends analysis of the key genes. DCA, decision curve analysis; AMI, acute myocardial infarction. * represents p value < 0.05, statistically significant; *** represents p value < 0.001, highly statistically significant. In the group comparison plot, blue represents the Low-Risk group and yellow represents the High-Risk group.

Immune infiltration analysis using the ssGSEA algorithm based on high and low logistic risk scores

The group comparison plot (Fig. 8A) showed that 17 immune cell types differed significantly between the low- and high-risk groups (p < 0.05): activated CD8 T cells, activated dendritic cells, central memory CD8 T cells, eosinophils, gamma delta T cells, immature dendritic cells, macrophages, mast cells, MDSC, memory B cells, monocytes, natural killer cells, natural killer T cells, neutrophils, plasmacytoid dendritic cells, regulatory T cells, and T follicular helper cells. The correlations among the infiltration abundances of these 17 immune cell types in the AMI samples were then displayed as correlation heat maps (Fig. 8B,C). In the low-risk AMI samples, most immune cells were strongly positively correlated, with mast cells and neutrophils showing the strongest significant positive correlation (r = 0.726, p < 0.05) (Fig. 8B). In the high-risk group, most immune cell types were also strongly positively correlated, and activated dendritic cells and macrophages showed the strongest significant positive correlation (r = 0.836, p < 0.05) (Fig. 8C). Finally, the correlations between the key genes and immune cell infiltration abundance were displayed as correlation bubble plots (Fig. 8D,E). In the low-risk group, many immune cells were strongly correlated with the key genes, with CD36 and gamma delta T cells showing the strongest significant positive correlation (r = 0.672, p < 0.05) (Fig. 8D). In the high-risk group, many immune cells also showed strong positive correlations with the key genes, the most notable being between FOS and mast cells (r = 0.77, p < 0.05) (Fig. 8E).

Fig. 8
Immune infiltration analysis by the ssGSEA algorithm. (A) Group comparison plot of immune cells in the Low-Risk and High-Risk groups in the combined GEO datasets. (B,C) Correlation analysis of immune cell infiltration abundance in the Low-Risk (B) and High-Risk (C) groups of AMI samples. (D,E) Bubble plots of the correlation between immune cell infiltration abundance and the key genes in the Low-Risk (D) and High-Risk (E) groups of AMI samples. ssGSEA, single-sample gene set enrichment analysis; AMI, acute myocardial infarction. ns represents p value ≥ 0.05, no statistical significance; * represents p value < 0.05, statistically significant; ** represents p value < 0.01, highly statistically significant; *** represents p value < 0.001, extremely statistically significant. An absolute correlation coefficient (r value) between 0.5 and 0.8 indicates a moderate correlation. Low-Risk group (blue) and High-Risk group (yellow). Red is a positive correlation, blue is a negative correlation, and the depth of the color represents the strength of the correlation.

Discussion

During the study design phase, we systematically searched major public databases such as GEO and ArrayExpress, applying strict selection criteria aligned with the core objectives of our research. We included two GEO datasets, GSE66360 and GSE48060, which met the following criteria: they were derived from human samples, specifically blood tissue, and both were generated on the GPL570 platform. Combined, they provide more than 150 samples. Furthermore, we implemented rigorous batch-effect removal and normalization steps. As shown in Fig. 2, the batch effects were largely removed, which enhances the overall statistical power and reliability of our results. Finally, the combined datasets were annotated and subjected to standardization and normalization using the R package limma.

Through a comprehensive analysis of the combined datasets, we identified 12 DEGs linked to glucose metabolism and circadian rhythms, including notable genes such as NAMPT and IL1B. These results underscore the intricate interactions between metabolic mechanisms and circadian regulation in AMI. Moreover, our study employed robust methodologies, including batch-effect removal and differential gene expression analysis, which enhanced the reliability of our results.

During an AMI, cardiac myocytes undergo ischemia and hypoxia, causing a drastic shift in energy metabolism from aerobic oxidation to anaerobic glycolysis. This transition reduces the efficiency of glucose utilization and leads to the accumulation of lactate, which significantly impairs myocardial contractile function. Numerous clinical and preclinical studies have shown that patients with AMI often experience stress-induced hyperglycemia, and elevated blood glucose levels are closely linked to worse prognoses19. In the pathological context of AMI, the interplay between glucose metabolism and circadian rhythm genes holds significant biological relevance. For example, EPAS1, a hypoxia-inducible factor, enhances the conversion of glucose to lactate in ischemic cardiac tissue by binding to the promoter regions of critical glycolytic enzymes, thereby sustaining energy production. Meanwhile, IL1B promotes apoptosis in pancreatic β-cells through activation of the NF-κB pathway while concurrently inhibiting the phosphorylation of insulin receptor substrates, which diminishes the ability of myocardial cells to take up glucose20.

Analysis of the key genes associated with AMI revealed that JUN, IL1B, and FOS were notably upregulated in the AMI group. JUN is a widely recognized transcription factor that plays an essential role in cellular responses to stress and inflammation21, and its elevated expression suggests its potential involvement in the pathological processes of myocardial injury and repair following AMI. Upregulation of JUN may indicate an adaptive response to ischemic stress, contributing to the regulation of genes involved in cell survival and inflammation. IL1B, which encodes interleukin-1 beta, is a pro-inflammatory cytokine implicated in various cardiovascular diseases22. Its significant increase in the AMI cohort underscores its role in mediating inflammatory responses that can exacerbate myocardial damage and influence the healing process23. These findings align with existing literature that highlights the contribution of IL1B to myocardial ischemia and subsequent remodeling24, suggesting that targeting IL1B could be a therapeutic strategy for managing AMI. FOS, another critical component of the AP-1 transcription factor complex25, is involved in cell proliferation and differentiation. The observed upregulation of FOS in patients with AMI indicates its potential role in the inflammatory response and cellular signaling pathways activated during myocardial infarction. The significant upregulation of these genes in AMI highlights their potential as therapeutic targets and biomarkers for disease progression and underscores the need for further investigation into their roles in myocardial ischemia and recovery processes.

GO and KEGG enrichment analyses indicated that nitric oxide (NO) synthesis, cellular stress response, and TNF signaling pathway play crucial roles in AMI. NO is vital in various physiological processes, including vasodilation and the immune response, and its dysregulation is linked to the pathogenesis of cardiovascular diseases26. NO significantly contributes to inflammatory responses and vascular function during AMI27; excessive production can lead to oxidative stress and myocardial damage, whereas insufficient levels may affect vasodilation and blood flow restoration28. The involvement of the TNF signaling pathway in inflammation and apoptosis further emphasizes its importance in AMI, as elevated TNF levels can exacerbate myocardial injury and influence the healing process29. Our findings suggest that the interplay between glucose metabolism and circadian rhythms may contribute to the inflammatory response and cellular stress mechanisms observed in AMI. Understanding the molecular mechanisms underlying these pathways can provide insights into developing novel intervention strategies aimed at enhancing cardiac protection during ischemic events, thereby addressing critical issues in cardiovascular research and clinical practice30,31.

The immune infiltration analysis revealed significant differences in the abundance of 17 immune cell types between the low- and high-risk AMI groups, together with a strong correlation between mast cells and neutrophils in the low-risk cohort. The prominence of mast cells and neutrophils aligns with previous research indicating that these cells are crucial mediators of inflammation and tissue repair during cardiac events32. These data suggest that these immune cells are not merely a byproduct of myocardial injury but may actively participate in disease progression, potentially influencing outcomes and recovery. By examining the relationship between immune cell infiltration and AMI risk, this study contributes to the growing body of literature emphasizing the importance of the immune system in cardiovascular diseases, paving the way for novel interventions that target immune pathways to mitigate myocardial damage and enhance recovery33.

In this study, a risk-scoring system based on GMCRRDEGs demonstrated excellent performance in the diagnosis of AMI. The ROC curve showed that the area under the curve (AUC) of the risk score (RiskScore) was greater than 0.9, indicating a high diagnostic accuracy. This provides a rapid and effective method for assessing patient risk levels in clinical practice and helps optimize individualized management strategies. The risk scoring system can be further expanded and optimized in the following ways in practical applications. First, combining the risk scoring system with traditional clinical indicators (such as electrocardiograms and troponin levels) may further improve prediction accuracy. Although traditional clinical indicators are valuable for the diagnosis of AMI, they often reflect only the local characteristics of the disease34. In contrast, gene expression data can reveal the underlying mechanisms of a disease at the molecular level. Combining these approaches may provide a more comprehensive risk assessment. Future studies can conduct multicenter clinical trials to verify the efficacy of the risk-scoring system in combination with traditional clinical indicators to provide a more precise tool for clinical diagnosis. Second, further optimization of the algorithms to enhance model robustness is an important direction for future research. The current risk-scoring system is based on the LASSO regression and SVM-RFE algorithms, which showed good performance in this study. However, there may still be some limitations in dealing with complex and variable clinical data. For example, the introduction of deep learning algorithms, such as neural networks or convolutional neural networks, can better capture nonlinear features and complex relationships in gene expression data. Additionally, increasing the sample size and data diversity to train and validate the model further is a key step in enhancing its robustness.

In this study, six key genes (JUN, EPAS1, IL1B, ADRB2, FOS, and CD36) were identified using the SVM-RFE algorithm, and the LASSO regression model based on these genes performed well. This model provides an efficient method for the precise prediction of AMI risk and lays the foundation for the development of future clinical decision-support systems. However, to further enhance the practicality and reliability of the model, several aspects need to be explored and optimized. First, validating the model’s stability in a larger independent cohort is a crucial step for future research. Although the current study achieved good results with an integrated dataset, the limitations of the sample size and data sources may have affected the generalizability of the model. Therefore, this model must be validated in multiple independent clinical cohorts to assess its stability and accuracy in different populations and settings. Moreover, through multicenter collaboration, collecting more gene expression data from patients with AMI can further optimize model parameters and improve prediction performance. Second, combining other molecular markers or imaging features to enhance the predictive power is also an important direction for future research. In addition to gene expression data, proteomic, metabolomic, and imaging examinations (such as coronary angiography and echocardiography) provide important information for the diagnosis and risk assessment of AMI. Integrating these multi-omics data and imaging features with the current gene-based model may help to construct a more comprehensive and accurate predictive model, providing stronger support for clinical decision-making.

Limitations

This study has several limitations. First, it relied primarily on publicly available databases, which may not fully capture the complexity of AMI. Second, the lack of wet-lab experiments prevents experimental verification of our findings, and the comparatively small sample sizes may have weakened the robustness of the results. Third, integrating multiple datasets may leave residual batch effects that introduce variability and confound the analysis. Future studies should incorporate clinical samples to validate these results, and we plan to conduct wet-lab experiments to further confirm our findings. Ultimately, this research opens new avenues for innovative diagnostic and therapeutic approaches targeting these alterations in patients with AMI.

Conclusion

This study identified 12 differentially expressed genes related to glucose metabolism and circadian rhythms by integrating GEO datasets and applying various bioinformatic methods to successfully construct an efficient and highly accurate (AUC > 0.9) diagnostic model for AMI. These findings not only provide new insights into the pathogenesis of AMI but also lay the foundation for the future development of novel diagnostic and therapeutic strategies. Future work will focus on expanding the sample size, multi-center validation, and exploring more potential molecular markers to achieve broader clinical applications.

Materials and methods

Data download

The two datasets (GSE66360 and GSE48060) were downloaded from the NCBI GEO database using the R package GEOquery35. As detailed in the Introduction and summarized in Table 1, both datasets were generated on the GPL570 platform and derived from whole-blood samples of Homo sapiens. GSE66360 comprises 49 AMI cases and 50 controls; GSE48060 comprises 31 AMI cases and 21 controls. All available AMI and control samples were included in the present study.

Table 1 GEO microarray chip information.

GEO, Gene Expression Omnibus; AMI, acute myocardial infarction.

The GeneCards database36 offers extensive insight into human genes. Using the search term “Glucose Metabolism,” we retained only genes classified as “Protein Coding” with a “Relevance score” > 2, which yielded 955 GMRGs. An additional 26 GMRGs were obtained from the published literature37,38 by searching PubMed with the keyword “Glucose Metabolism.” After merging the two lists and removing duplicates, a total of 973 GMRGs were identified. Detailed information is provided in supplementary Table S239.

Similarly, we searched with the keyword “circadian rhythm” and retained only genes classified as “Protein Coding” with a relevance score exceeding 2, which yielded 701 CRRGs. An additional 300 CRRGs were obtained from the published literature40 by searching PubMed with the keyword “circadian rhythm.” After merging the two lists and removing duplicates, a total of 836 CRRGs were obtained. Detailed information is provided in supplementary Table S3.

The R package sva41 was utilized to remove batch effects from the GSE66360 and GSE48060 datasets, resulting in the formation of integrated datasets. The aggregated datasets included 80 samples originating from the AMI group and 71 samples from the Control group. Subsequently, the combined datasets were normalized using the R package limma42, which involved the annotation of probes and various standardization processes. The expression matrices, both before and after eliminating batch effects, underwent PCA43. This analytical approach was employed to convert the data into a reduced-dimensional format, thereby enabling the visualization of these characteristics in either two- or three-dimensional graphical representations. In the subsequent analysis, the samples from the combined datasets served as the test set and the samples from the GSE48060 dataset functioned as the validation set.

Differentially expressed gene analysis

Based on the sample grouping of the combined datasets, the samples were categorized into AMI and Control groups. Differential analysis of gene expression between the two groups was performed on the combined datasets using the R package limma (Version 3.58.1). DEGs were defined as genes with an absolute log fold change (|logFC|) > 0.5 and a p-value < 0.05. Genes with logFC > 0.5 and p < 0.05 were classified as upregulated DEGs, whereas those with logFC < -0.5 and p < 0.05 were classified as downregulated DEGs. The results of the differential analysis were visualized as a volcano plot using the R package ggplot2 (Version 3.4.4).
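The thresholding rule described above can be illustrated with a minimal Python sketch. It is not the limma workflow used in this study; the gene names and values are hypothetical placeholders, and the function name `classify_deg` is introduced here for illustration only.

```python
def classify_deg(log_fc, p_value, fc_cutoff=0.5, p_cutoff=0.05):
    """Label one gene as 'up', 'down', or 'ns' (not significant).

    Cutoffs follow the criteria stated in the text:
    |logFC| > 0.5 and p < 0.05.
    """
    if p_value < p_cutoff and log_fc > fc_cutoff:
        return "up"
    if p_value < p_cutoff and log_fc < -fc_cutoff:
        return "down"
    return "ns"


# Hypothetical differential-analysis output: gene -> (logFC, p-value).
results = {
    "GENE_A": (1.20, 0.001),   # large positive change, significant
    "GENE_B": (-0.80, 0.010),  # large negative change, significant
    "GENE_C": (0.30, 0.001),   # significant, but change too small
    "GENE_D": (0.90, 0.200),   # large change, but not significant
}

labels = {gene: classify_deg(fc, p) for gene, (fc, p) in results.items()}
for gene, label in labels.items():
    print(gene, label)
```

Both conditions must hold at once, so a gene with a large fold change but p ≥ 0.05 is not counted as a DEG.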

To obtain GMCRRDEGs associated with AMI, all DEGs were intersected with the GMRGs and CRRGs, and the overlap was displayed in a Venn diagram. A heat map was generated for visualization using the R package pheatmap (version 1.0.12). Finally, the chromosomal locations of the GMCRRDEGs were visualized by chromosome mapping using the R package RCircos44 (Version 1.2.2).
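The overlap step amounts to a three-way set intersection. The following sketch shows the idea in Python; the gene symbols are hypothetical placeholders and the function name is introduced here for illustration, so it should not be read as the study's actual gene lists or code.

```python
def intersect_gene_sets(degs, gmrgs, crrgs):
    """Return the genes present in all three lists, sorted by name."""
    return sorted(set(degs) & set(gmrgs) & set(crrgs))


# Hypothetical gene lists (placeholders, not the study's data).
degs = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
gmrgs = ["GENE_B", "GENE_C", "GENE_E"]
crrgs = ["GENE_C", "GENE_B", "GENE_F"]

gmcrrdegs = intersect_gene_sets(degs, gmrgs, crrgs)
print(gmcrrdegs)
```

Converting each list to a set also removes any duplicate symbols before the overlap is taken.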

Differential expression of glucose metabolism and circadian rhythm-related genes (GMCRRDEGs) in the integrated GEO dataset, correlation analysis and ROC curve analysis

To explore variations in GMCRRDEG expression between the two groups in the combined datasets, a group comparison plot was created based on GMCRRDEG expression levels. ROC curves for the GMCRRDEGs were plotted using the R package pROC (Version 1.18.5), which was also used to compute the AUC. The AUC was used to evaluate the diagnostic performance of GMCRRDEG expression for the occurrence of AMI. AUC values typically range from 0.5 to 1, and values closer to 1 indicate better diagnostic ability. Specifically, an AUC between 0.5 and 0.7 indicates low accuracy, an AUC between 0.7 and 0.9 indicates moderate accuracy, and an AUC greater than 0.9 indicates high accuracy.
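For readers unfamiliar with the AUC, the sketch below computes it in Python as the probability that a randomly chosen case has a higher value than a randomly chosen control (ties count as one half), and then maps it to the accuracy bands given above. This is an illustrative stand-in for pROC, not the package's implementation. The expression values are hypothetical, the sketch assumes higher values in cases (for a downregulated gene one would use 1 minus the result), and the handling of values exactly at 0.7 or 0.9 is an assumption.

```python
def roc_auc(case_values, control_values):
    """Rank-based AUC: P(case > control) + 0.5 * P(case == control)."""
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


def accuracy_band(auc):
    """Map an AUC to the bands used in the text (boundary handling assumed)."""
    if auc > 0.9:
        return "high"
    if auc >= 0.7:
        return "moderate"
    return "low"


# Hypothetical expression values for one gene.
ami = [2.1, 3.0, 3.5, 4.2]
control = [1.0, 2.5, 1.8, 2.0]

auc = roc_auc(ami, control)
print(auc, accuracy_band(auc))
```

In this toy example 15 of the 16 case-control pairs are ordered correctly, giving an AUC of 0.9375.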

To investigate the relationships among GMCRRDEGs, Spearman’s correlation was used to assess the correlation between GMCRRDEG expression levels within the combined datasets. The results were displayed as correlation heat maps generated with the R package pheatmap (version 1.0.12). The top four most correlated GMCRRDEGs were then selected, and correlation scatter plots were generated using the R package ggplot2 (Version 3.4.4). Correlation coefficients (r values) with absolute values below 0.3 indicated weak or no correlation, those between 0.3 and 0.5 suggested a weak correlation, values from 0.5 to 0.8 denoted a moderate correlation, and values above 0.8 represented a strong correlation.
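As an illustration of the correlation measure, the following Python sketch computes Spearman's coefficient as the Pearson correlation of average ranks and labels its strength with the thresholds listed above. The input vectors are hypothetical, the function names are introduced here for illustration, and the assignment of values exactly at 0.3, 0.5, or 0.8 is an assumption.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def strength(r):
    """Label |r| using the thresholds in the text (boundaries assumed)."""
    a = abs(r)
    if a < 0.3:
        return "weak or none"
    if a < 0.5:
        return "weak"
    if a <= 0.8:
        return "moderate"
    return "strong"


# Hypothetical expression vectors for two genes across five samples.
gene_x = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_y = [1.0, 3.0, 2.0, 5.0, 4.0]
rho = spearman(gene_x, gene_y)
print(round(rho, 3))
```

Because the coefficient is computed on ranks, it is unaffected by monotonic transformations of the expression values, such as log scaling.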

Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis

GO analysis45 is a widely used approach for large-scale functional enrichment studies and encompasses three main categories: Biological Process (BP), Cellular Component (CC), and Molecular Function (MF). KEGG46,47 is a prominent database that compiles data on genomes, biological pathways, diseases, and drugs. We conducted GO and KEGG enrichment analyses of GMCRRDEGs using the R package clusterProfiler48. Entries with a p-value < 0.05 and a false discovery rate (FDR, commonly known as the q-value) < 0.25 were considered statistically significant.
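Over-representation analyses of this kind are commonly based on a hypergeometric test. The Python sketch below shows that calculation for a single term under this assumption; it is not the clusterProfiler code, it omits multiple-testing correction, and all counts are hypothetical.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """Upper-tail hypergeometric probability P(X >= k).

    N: number of background genes
    K: background genes annotated to the term
    n: genes in the query list (e.g. the GMCRRDEGs)
    k: query genes annotated to the term
    """
    total = comb(N, n)
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Hypothetical counts: 10 background genes, 4 annotated to the term,
# 3 genes in the query list, all 3 annotated to the term.
p = enrichment_p_value(k=3, n=3, K=4, N=10)
print(round(p, 4))
```

A small p-value means that the observed overlap between the query list and the term is unlikely under random sampling from the background.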

Construction of diagnostic model for AMI

To develop AMI diagnostic models using the combined datasets, GMCRRDEGs were analyzed using logistic regression. Logistic regression was employed to investigate the relationship between independent and dependent variables when the latter was categorized as a binary variable. The GMCRRDEGs were evaluated using a significance threshold of p < 0.05, which led to the development of a logistic regression model. Subsequently, the collective expression of the GMCRRDEGs included in the logistic regression analysis was illustrated using a forest plot.

Subsequently, using the GMCRRDEGs included in the logistic regression model, the support vector machine recursive feature elimination (SVM-RFE)49 algorithm, implemented with the R package e1071 (Version 1.7-14), was applied to identify candidate biomarkers. SVM-RFE uses a support vector machine (SVM) as the underlying classifier and selects the most important features by recursively eliminating those that contribute least to the classification.

Finally, a LASSO regression analysis was performed using the R package glmnet50 (Version 4.1-8) on the features selected by the SVM-RFE algorithm, with set.seed(500) and family="binomial". LASSO regression, an extension of linear regression, reduces overfitting and improves generalization by adding a penalty term, namely the product of a parameter (lambda) and the sum of the absolute values of the coefficients. Diagnostic model diagrams and variable trajectory diagrams were used to depict the outcomes of the LASSO regression analysis. The GMCRRDEGs retained in the LASSO model constituted the diagnostic model for AMI and were identified as key genes. Subsequently, the LASSO RiskScore was determined using the risk coefficients obtained from the LASSO regression analysis. The risk scores were calculated using the following formula:

$$\mathrm{RiskScore}=\sum_{i}\mathrm{Coefficient}\left(\mathrm{gene}_{i}\right)\times \mathrm{mRNA\ Expression}\left(\mathrm{gene}_{i}\right)$$
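The formula above is a weighted sum of expression values, which can be sketched in Python as follows. The gene symbols are the six key genes named in this study, but the coefficients and expression values are hypothetical placeholders chosen only to make the arithmetic easy to follow. They are not the fitted LASSO coefficients.

```python
def risk_score(coefficients, expression):
    """Sum of coefficient * expression over the genes in the model."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


# HYPOTHETICAL coefficients (not the values estimated in this study).
coefficients = {
    "JUN": 0.5, "EPAS1": 0.25, "IL1B": 0.5,
    "ADRB2": -0.25, "FOS": 0.25, "CD36": 0.5,
}

# HYPOTHETICAL expression values for one sample.
sample_expression = {
    "JUN": 8.0, "EPAS1": 4.0, "IL1B": 6.0,
    "ADRB2": 4.0, "FOS": 8.0, "CD36": 10.0,
}

score = risk_score(coefficients, sample_expression)
print(score)  # 4 + 1 + 3 - 1 + 2 + 5 = 14.0
```

Genes with a negative coefficient lower the score as their expression increases, whereas genes with a positive coefficient raise it.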

Validation of the diagnostic model for AMI

A logistic nomogram51 is a graphical representation in a Cartesian coordinate system that uses a set of nonintersecting line segments to illustrate the functional relationships among several independent variables. To create a nomogram representing the relationships among the key genes, the R package rms (Version 6.7-1) was used, based on the results of the logistic regression analysis. The R package ggDCA52 (version 1.1) was used to generate decision curve analysis (DCA) plots based on the key genes identified in the combined datasets.

Moreover, the AMI group was divided into high- and low-risk groups according to the median risk score derived from the AMI diagnostic model. A group comparison plot was generated to explore the differences in key gene expression between the high- and low-risk AMI groups. Finally, the R package pROC53 (Version 1.18.5) was used to plot the ROC curves of the model genes and calculate the AUC values in order to evaluate the diagnostic performance of key gene expression for the occurrence of AMI. AUC values derived from the ROC curve generally fall within the range of 0.5 to 1, and a value approaching 1 reflects superior diagnostic efficacy. Specifically, AUC values from 0.5 to 0.7 indicate low diagnostic accuracy, those between 0.7 and 0.9 suggest moderate diagnostic accuracy, and values exceeding 0.9 denote high diagnostic accuracy.
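The median-based grouping can be sketched in Python as shown below. The sample identifiers and scores are hypothetical, and assigning samples whose score equals the median to the low-risk group is an assumption, since the text does not specify how ties are handled.

```python
from statistics import median


def split_by_median(scores):
    """Assign each sample to 'high' or 'low' risk around the median score.

    Samples exactly at the median go to 'low' (an assumed convention).
    """
    cutoff = median(scores.values())
    return {
        sample: ("high" if value > cutoff else "low")
        for sample, value in scores.items()
    }


# Hypothetical risk scores for four AMI samples.
scores = {"S1": 1.0, "S2": 2.0, "S3": 3.0, "S4": 4.0}
groups = split_by_median(scores)
print(groups)
```

With an even number of samples and no ties, this split yields two groups of equal size.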

The semantic comparison of GO annotations provides a quantitative framework for evaluating the similarities between genes and genomes, making it an essential element of many bioinformatics analyses. The R package GOSemSim54 (Version 2.28.0) was used to assess the functional correlations among key genes. Additionally, functional correlations among these key genes were analyzed by evaluating their functional similarities (Friends analysis).

Immune infiltration analysis of the high- and low-risk groups of key genes by the ssGSEA algorithm

Single-sample gene set enrichment analysis (ssGSEA)55 is used to evaluate the enrichment of specific gene sets in individual biological samples, and here it was used to assess the extent to which various immune cell populations were present in each sample. First, the immune cell types were annotated, including activated CD8 T cells, activated dendritic cells, gamma delta T cells, natural killer cells, and regulatory T cells. ssGSEA then generated enrichment scores reflecting the relative abundance of each infiltrating immune cell type in each sample. These scores were compiled into an immune cell infiltration matrix for the AMI samples of the combined datasets. This approach allows a detailed analysis of immune cell distribution and activity within individual samples, providing insights into the immune landscape of the studied tissues or conditions. The AMI cohort was divided into high- and low-risk groups based on the median risk score obtained from the AMI diagnostic model. Group comparison plots drawn with the R package ggplot2 (Version 3.4.4) were used to show the differences in immune cell infiltration between the low- and high-risk groups among the AMI samples of the combined datasets. Immune cells that differed significantly between the two groups were then selected for further analysis. The Spearman correlation coefficient was used to evaluate the relationships among these immune cell types, and the R package pheatmap was used to construct a heatmap of the correlation results. Finally, to illustrate the correlations between key genes and immune cells, a correlation bubble plot was generated using the R package ggplot2 (version 3.4.4).

Statistical analysis

Statistical analyses were performed using the R programming language (v4.2.2). Normally distributed continuous variables were compared between two groups using the independent Student’s t-test, whereas non-normally distributed variables were compared using the Mann-Whitney U test (Wilcoxon rank-sum test). Comparisons among three or more groups were conducted using the Kruskal-Wallis test. Correlations between molecular markers were quantified using Spearman’s rank correlation coefficients. All statistical tests were two-tailed, and p < 0.05 was considered statistically significant unless stated otherwise.