Introduction

The rates of incidence and mortality associated with cardiovascular diseases are increasing, with acute myocardial infarction (AMI) being the most lethal and presenting considerable health risks to patients1,2. AMI signifies a severe progression of ischemia and hypoxia in myocardial tissue resulting from the blockage of coronary arteries; this condition may lead to localized or extensive damage and necrosis of myocardial cells, causing serious complications, including cardiogenic shock, heart failure, and cardiac arrest3,4. Furthermore, some cases of AMI may exhibit no symptoms during their initial stages. Consequently, accurate and timely diagnosis, along with effective reperfusion therapies such as thrombolysis or percutaneous coronary intervention (PCI), is vital in reducing the extent of AMI and enhancing patient outcomes5,6. At present, the identification of AMI typically relies on alterations in cardiac biomarkers. Standard biomarkers used in clinical settings consist of cardiac troponin T, cardiac troponin I, creatine kinase-MB (CK-MB), and myoglobin7. Nevertheless, these biomarkers are primarily released from necrotic cardiomyocytes within 2–4 h after the onset of AMI, and their concentrations can also be elevated in patients suffering from chronic kidney disease, heart failure, sepsis, and thyroid disorders, particularly among the elderly8. Hence, it is essential to discover additional, more specific biomarkers for the diagnosis of AMI.

Invasive coronary angiography is considered the “gold standard” for the detection of acute myocardial infarction (AMI). Nonetheless, this procedure comes with significant costs and may carry risks for patients9. Due to advancements in gene chip technology and transcriptome sequencing techniques, an increasing number of gene chip applications and bioinformatics analyses are being adopted in cardiovascular research and clinical practice, aiding in the discovery of new biomarkers for the early diagnosis and prognosis of various diseases10,11,12. Conventional methods typically concentrate on a limited number of genes or proteins; in contrast, bioinformatics allows for the evaluation of complex biological systems as cohesive entities. Its advantages in fields such as disease forecasting, personalized medicine, and drug development are becoming more widely acknowledged in clinical environments13,14. For example, Chen et al. identified PRF1 and TBX21 as novel biomarkers for diagnosis as well as potential therapeutic targets through the analysis of microarray expression profiles in patients with AMI15. Similarly, Kiliszek et al. employed a microarray technique to showcase that during ST-segment elevation myocardial infarction (STEMI), numerous genes show altered expression patterns, including those associated with various pathways related to platelet functionality, lipid and glucose metabolism, and the stability of atherosclerotic plaques16. Additionally, Liu et al. conducted a weighted gene co-expression network analysis (WGCNA) on GSE4648 and highlighted the diagnostic potential of ten hub genes, including Socs3, Hspa1b, Atf3, Il1b, Cxcl1, Selp, Ptgs2, Cxcl2, S100a8, and Myd88, using both bioinformatics and laboratory methods17.

In recent years, the issues of obesity and being overweight have generated increasing concerns. The prevalence of these conditions is on the rise and has been linked to type II diabetes mellitus, metabolic syndrome, various cancers, hypertension, and cardiovascular diseases within the general population18,19,20. However, the connection between being overweight or obese and acute myocardial infarction (AMI) remains a topic of debate. Mehta et al. indicated that obese individuals experiencing AMI have a lower risk of mortality compared to those with a normal body mass index (BMI)21. Conversely, Yusuf et al. found that abdominal obesity elevates the risk of AMI across different ages and genders in all regions22. Nonetheless, there has been a limited number of studies that have examined the influence of obesity-related genes (ORGs) on the diagnosis, risk assessment, and prognosis of AMI thus far.

In this study, we aimed to integrate multiple microarray datasets containing AMI and control samples from the Gene Expression Omnibus (GEO) database. The main objective was to identify potential ORG-associated biomarkers for the diagnosis of AMI by combining bioinformatics analysis and machine learning methods.

Materials and methods

Data acquisition, processing and identification of differentially expressed genes

We conducted a comprehensive screening of the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/) using the search terms “AMI” or “acute myocardial infarction,” “array” type, and “Homo sapiens.” The inclusion criteria required that each dataset contain at least three patients and three controls, and that gene symbols and Entrez IDs be available in the annotated platform (GPL) files. Ultimately, five datasets were selected (Table S1). The datasets GSE48060, GSE60993, GSE66360, and GSE97320, encompassing 90 AMI samples and 81 control samples, were merged into a single dataset, which served as the training cohort for the analysis of differentially expressed genes (DEGs) and the development of a diagnostic model. Additionally, GSE59867, which includes 111 AMI samples and 46 control samples, was selected for independent validation. To remove batch effects, the ComBat function from the “sva” package was employed23, and principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) were used to verify their removal. The “limma” package in R was applied for background correction, inter-array normalization, and analysis of differential expression between AMI and control samples. A false discovery rate-adjusted p-value < 0.05 and |log2(fold change)| > 0.8 were set as the criteria for identifying DEGs. We identified 1556 obesity-related genes (ORGs) from the GeneCards database (https://www.genecards.org/) using the term “obesity” with a relevance score ≥ 5 as the screening criterion. Finally, the differentially expressed obesity-related genes (DE-ORGs) were derived by intersecting the DEGs with the ORGs.
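The DEG thresholds and the intersection with the ORG list can be illustrated with a minimal Python sketch. This is not the authors' R/limma code: the gene names, fold changes, adjusted p-values, and the helper `select_degs` are all hypothetical, and only the two cutoffs (adjusted p < 0.05 and |log2FC| > 0.8) come from the text.

```python
# Toy differential-expression table: gene -> (log2 fold change, FDR-adjusted p-value).
# Every gene name and number here is invented for illustration only.
toy_results = {
    "GENE_A": (1.20, 0.001),
    "GENE_B": (-0.95, 0.020),
    "GENE_C": (0.50, 0.001),   # significant, but effect size below the cutoff
    "GENE_D": (1.50, 0.200),   # large effect, but not significant
    "GENE_E": (0.85, 0.049),
}

# Hypothetical stand-in for the obesity-related gene (ORG) list.
toy_orgs = {"GENE_A", "GENE_C", "GENE_E", "GENE_Z"}

ADJ_P_CUTOFF = 0.05   # threshold stated in the text
LOG2FC_CUTOFF = 0.8   # threshold stated in the text


def select_degs(results, adj_p_cutoff=ADJ_P_CUTOFF, log2fc_cutoff=LOG2FC_CUTOFF):
    """Return genes passing both the adjusted p-value and |log2FC| cutoffs."""
    return {
        gene
        for gene, (log2fc, adj_p) in results.items()
        if adj_p < adj_p_cutoff and abs(log2fc) > log2fc_cutoff
    }


degs = select_degs(toy_results)
de_orgs = degs & toy_orgs  # DE-ORGs are the DEGs that are also ORGs

print(sorted(degs))
print(sorted(de_orgs))
```

Both filters must hold at the same time, so a gene with a large fold change but a non-significant adjusted p-value is excluded, as is a significant gene with a small fold change.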

Screening of candidate diagnostic biomarkers

To discover genes with diagnostic capabilities, three machine learning techniques were employed to forecast disease status. The least absolute shrinkage and selection operator (LASSO) is a regression analysis technique that incorporates regularization to enhance predictive accuracy24,25. The LASSO methodology was implemented with the “glmnet” package in R to detect genes significantly linked to the differentiation between AMI and normal samples. Support vector machine recursive feature elimination (SVM-RFE) was employed to identify the features with the greatest discriminative ability26. In the case of random forest, we assessed the error rates for 1 to 500 trees, with the optimal number of trees identified as the count yielding the lowest error rate while maintaining the best stability27. The selection of candidate genes was based on the combined outcomes of the three machine learning methods.

Function analysis and protein-protein interaction network of DEGs

In order to enhance our understanding of the biological functions associated with the chosen DEGs, we conducted an analysis utilizing Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways through the “clusterProfiler” package. Concurrently, we developed a protein-protein interaction (PPI) network by employing the STRING database (http://string-db.org), applying a minimum interaction score set to a medium confidence level of 0.4 as our selection criterion and excluding isolated nodes from the network.

Construction of obesity-related genes model

The expression levels of the candidate genes were obtained from both the training and validation cohorts, and the feature value for each sample was computed using the following formula: \(\text{Feature value} = \beta_{0} + \beta_{1} X_{1} + \beta_{2} X_{2} + \dots + \beta_{n} X_{n}\). In this equation, β0 denotes the intercept, βi is the binary logistic regression coefficient for the i-th gene, Xi is the expression measurement of the i-th gene, and n is the total number of genes included in the regression model. To assess the diagnostic efficacy of the feature genes, receiver operating characteristic (ROC) curves were generated using the “pROC” package.

Immune cells infiltration, immune function and correlation analysis

The immune gene dataset and its corresponding annotations were obtained from the ImmPort database (https://www.immport.org/). To assess the infiltration proportions of 22 different immune cell types across all samples in the training cohort, the CIBERSORT algorithm was employed. The immune gene-annotated pathways were evaluated in each sample using the ssGSEA algorithm. Subsequently, based on the median feature value, samples within the training cohort were categorized into low-risk and high-risk groups. The Wilcoxon test was used to compare immune cell types and immune functions between these two risk groups, and the results were visualized with the “ggplot2” package in R28. To determine the relationships between diagnostic genes and the differing immune cells or functions, Spearman correlation analysis was performed. Furthermore, the genes distinguishing the low-risk from the high-risk groups underwent KEGG pathway enrichment analysis via the gene set enrichment analysis (GSEA) method29. Multiple-testing corrections were applied to the immune-cell and immune-function comparisons, as well as to the GSEA results. An adjusted p-value < 0.05 was considered significant.
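The median-based risk grouping and the Spearman correlation described above can be sketched in plain Python. This is an illustrative sketch, not the authors' R workflow: the input values are invented, the function names are hypothetical, and assigning samples equal to the median to the high-risk group is an assumption (the text does not state how ties at the median were handled).

```python
from statistics import median


def split_by_median(feature_values):
    """Label each sample 'high' or 'low' relative to the median feature value.

    Assumption: samples equal to the median are placed in the high-risk group.
    """
    cut = median(feature_values)
    return ["high" if value >= cut else "low" for value in feature_values]


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented example data for five samples.
toy_feature_values = [-2.0, -0.5, 0.3, 1.1, 2.4]
toy_gene_expression = [5.1, 6.0, 6.4, 7.2, 8.0]
toy_cell_fraction = [0.10, 0.15, 0.12, 0.30, 0.28]

groups = split_by_median(toy_feature_values)
rho = spearman(toy_gene_expression, toy_cell_fraction)
print(groups, round(rho, 3))
```

Because Spearman correlation works on ranks, it captures monotonic relationships between gene expression and cell fractions without assuming linearity.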

Real-time qPCR

Blood samples from five patients diagnosed with AMI and five control subjects were obtained from Huaihe Hospital at Henan University in China. The exclusion criteria were the presence of cancer, autoimmune disease, serious infectious disease, advanced liver or kidney failure, hematological disease, and a prior history of cardiovascular disease (Table S2). Prior to sample collection, informed consent was obtained from all participants, both patients and healthy individuals. The study protocols involving human blood were carried out in accordance with the principles outlined in the Declaration of Helsinki and were approved by the Medical School’s Ethics Committee at Henan University, China (HUSOM-2018-282). Peripheral blood mononuclear cells (PBMCs) were isolated using a PBMC separation solution. Total RNA was extracted with Trizol reagent (Takara, Dalian, China) according to the manufacturer’s guidelines. The extracted RNA was then reverse transcribed (RR036A, Takara) into complementary DNA (cDNA). Real-time PCR was performed with TB Green™ Premix Ex Taq™ (RR420A, Takara) on the ABI Prism 7900 System, using the following protocol: denaturation for 10 s at 95 °C, annealing for 20 s at 60 °C, and extension for 30 s at 72 °C, for a total of 40 cycles30. Gene expression levels were analyzed using the \(2^{-\Delta\Delta Ct}\) method, with GAPDH as the endogenous control. The primer sequences are detailed in Supplementary Table S3.
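The relative quantification step can be illustrated with a short Python sketch of the \(2^{-\Delta\Delta Ct}\) calculation. The Ct values below are invented, the function names are hypothetical, and using the mean control ΔCt as the calibrator is a common convention assumed here rather than a detail stated in the text.

```python
from statistics import mean


def delta_ct(ct_target, ct_reference):
    """Delta Ct: target-gene Ct minus reference-gene (e.g. GAPDH) Ct."""
    return ct_target - ct_reference


def fold_changes(sample_delta_cts, calibrator_delta_ct):
    """Relative expression 2^(-ddCt) of each sample versus the calibrator."""
    return [2 ** -(dct - calibrator_delta_ct) for dct in sample_delta_cts]


# Invented (target Ct, reference Ct) pairs; not data from the study.
control_pairs = [(26.0, 18.0), (26.5, 18.5), (25.5, 17.5)]
ami_pairs = [(24.0, 18.0), (25.0, 18.0)]

control_dct = [delta_ct(target, ref) for target, ref in control_pairs]
ami_dct = [delta_ct(target, ref) for target, ref in ami_pairs]

# Assumption: the calibrator is the mean delta Ct of the control group.
calibrator = mean(control_dct)
ami_fold = fold_changes(ami_dct, calibrator)
print(ami_fold)
```

A ΔCt that is two cycles lower than the calibrator corresponds to a fourfold higher relative expression, since each cycle represents an approximate doubling of product.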

Statistical analysis

Statistical analyses for the bioinformatics section of this research were performed using R (version 4.4.2). To assess the diagnostic effectiveness of the biomarkers and the diagnostic model, ROC curve analysis was applied. The correlation between the expression levels of model genes and the presence of infiltrating immune cells was evaluated utilizing Spearman’s correlation coefficient. All statistical tests were two-sided, and a p-value of less than 0.05 was considered statistically significant.
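For readers less familiar with ROC analysis, the AUC can be understood as the probability that a randomly chosen AMI sample receives a higher score than a randomly chosen control, with ties counted as one half. The Python sketch below computes the AUC in this pairwise way. It is an illustration only: the study used the pROC package in R, and the scores and the function name `roc_auc` are hypothetical.

```python
def roc_auc(case_scores, control_scores):
    """AUC as the fraction of case/control pairs ranked correctly (ties = 0.5)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Invented model scores; not data from the study.
toy_ami_scores = [2.1, 0.4, 3.3, 1.2]
toy_control_scores = [-1.0, 0.4, -0.3, 1.5]

auc = roc_auc(toy_ami_scores, toy_control_scores)
print(auc)
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.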

Results

Screening of DEGs in AMI

Following the standardization of the original data from the four data sets, t-SNE and PCA mappings were created, demonstrating that batch differences were effectively eliminated and the data remained stable (Fig. 1A-D). Next, the DEGs within the training cohort were analyzed utilizing the limma package after addressing batch effects. In comparison to the control group, 157 genes showed significant expression changes in the AMI patients group, featuring 142 genes up-regulated and 15 down-regulated. These DEGs were then represented in both a volcano plot and a cluster heatmap (Fig. 1E, F).

Fig. 1

Normalization of the dataset and analysis of differential gene expression. (A-D) Normalization and batch effect correction in four microarray datasets GSE48060, GSE60993, GSE66360, and GSE97320. (E, F) Volcano plot of differentially expressed genes (DEGs) after screening (adjusted p < 0.05 and |log2FC| > 0.8) (E) and cluster heatmap of DEGs (F) between acute myocardial infarction (AMI) and control samples using the merged dataset (AMI = 90, control = 81) derived from GSE48060, GSE60993, GSE66360, and GSE97320.

Identification of differentially expressed obesity-related genes and enrichment analysis

Intersecting the 157 DEGs with the genes associated with human obesity identified 18 differentially expressed obesity-related genes (DE-ORGs): IL1RN, SERPINA1, TLR2, NFKBIA, PYGL, IL1B, MMP9, DGAT2, TLR4, NLRP3, ITLN1, CEBPB, CD163, ALDH2, STEAP4, IRS2, SLC7A7, and PTGS2 (Fig. 2A). The heatmap indicated that these 18 genes formed a cluster characterized by elevated expression in AMI samples and lower expression in control samples in the training dataset (Fig. 2B). To further explore the pathophysiological roles of these DE-ORGs, enrichment analyses were conducted using clusterProfiler, involving GO, KEGG, and Disease Ontology (DO). The GO analysis showed that the DE-ORGs were predominantly implicated in the response to lipopolysaccharide, inflammatory responses, and membrane rafts (Fig. 2C). KEGG analysis revealed that these DE-ORGs participated in various signaling pathways, such as lipid and atherosclerosis, the IL-17 signaling pathway, and the TNF signaling pathway. They were also involved in the NF-kappa B signaling pathway and the Toll-like receptor signaling pathway (Fig. 2D). Additionally, the DO analysis indicated that the 18 DE-ORGs were primarily enriched in conditions such as pancreatitis, fatty liver disease, and lipid storage disease (Fig. 2E). Considering the strong association between pancreatitis, lipid metabolism disorders, and cardiovascular disease31,32,33, these findings suggest a link between DE-ORGs and AMI and indicate that DE-ORGs are primarily involved in inflammatory responses and lipid storage.

Fig. 2

Differentially expressed obesity-related genes (DE-ORGs) and functional enrichment. (A) Identification of 18 DE-ORGs by overlapping the 157 differentially expressed genes with 1556 obesity-related genes. (B) Expression heatmap of 18 DE-ORGs in AMI patients and controls. (C) GO enrichment analysis of DE-ORGs. (D) KEGG enrichment analysis of DE-ORGs. (E) DO enrichment analysis of DE-ORGs.

Identification of diagnostic feature biomarkers by machine learning

Three machine learning algorithms were used to identify diagnostic signature biomarkers among the 18 DE-ORGs mentioned above. By applying LASSO analysis, we identified 9 feature genes: IL1RN, SERPINA1, TLR2, NFKBIA, IL1B, MMP9, ITLN1, ALDH2, and PTGS2 (Fig. 3A, B). Support vector machine (SVM) is a supervised machine learning method widely used for classification and regression tasks. To mitigate the risk of overfitting, a recursive feature elimination (RFE) algorithm was employed to extract the most relevant genes from the merged training cohort, and SVM-RFE was used to identify the features with the highest discriminative power. The SVM-RFE algorithm indicated that the model’s prediction error was minimized when n = 16 (Fig. 3C, D). Consequently, we identified sixteen feature genes: IL1B, ITLN1, NFKBIA, PTGS2, MMP9, TLR2, IL1RN, ALDH2, SLC7A7, CEBPB, TLR4, IRS2, CD163, PYGL, NLRP3, and DGAT2. Random forest analysis revealed that the model reached a stable state at ntree = 500 (Fig. 3E). We then selected genes with importance scores exceeding 5, yielding eight genes: MMP9, SERPINA1, IL1RN, TLR2, IRS2, NFKBIA, ITLN1, and DGAT2 (Fig. 3F). Finally, we intersected the results of the three machine learning algorithms (Fig. 3G), which identified five key genes: IL1RN, TLR2, NFKBIA, MMP9, and ITLN1. The distribution of these genes among the differentially expressed genes is depicted in Fig. 3H. Protein-protein interaction analysis suggested that these candidate genes form an interaction network centered on IL1RN (Fig. 3I).
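The final selection step is a plain set intersection, which can be checked directly from the three gene lists reported above. The Python sketch below uses only those reported lists; the variable names are illustrative, and the code does not reproduce the LASSO, SVM-RFE, or random forest fitting itself.

```python
# Gene lists as reported in the text for each method.
lasso_genes = {
    "IL1RN", "SERPINA1", "TLR2", "NFKBIA", "IL1B",
    "MMP9", "ITLN1", "ALDH2", "PTGS2",
}
svm_rfe_genes = {
    "IL1B", "ITLN1", "NFKBIA", "PTGS2", "MMP9", "TLR2", "IL1RN", "ALDH2",
    "SLC7A7", "CEBPB", "TLR4", "IRS2", "CD163", "PYGL", "NLRP3", "DGAT2",
}
random_forest_genes = {
    "MMP9", "SERPINA1", "IL1RN", "TLR2", "IRS2", "NFKBIA", "ITLN1", "DGAT2",
}

# Candidate genes are those selected by all three methods.
candidate_genes = lasso_genes & svm_rfe_genes & random_forest_genes
print(sorted(candidate_genes))
# → ['IL1RN', 'ITLN1', 'MMP9', 'NFKBIA', 'TLR2']
```

Requiring agreement across all three methods is a conservative choice: SERPINA1, for example, was selected by LASSO and random forest but is excluded because SVM-RFE did not retain it.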

Fig. 3

Identification of diagnostic signature genes. (A, B) The variation curve of regression coefficient (A) and root mean square (RMS) error (B) as a function of Log (λ) in LASSO regression. (C, D) The results of the support vector machine-recursive feature elimination (SVM-RFE) algorithm, with the line chart showing the number of genes corresponding to the lowest error rate (C) and the highest accuracy (D) for AMI. (E) Identification of the AMI-specific genes using the random forest approach, illustrating the impact of the number of decision trees on the error rate; the x-axis represents the number of decision trees, while the y-axis denotes the error rate. (F) The most important genes selected by random forest, with the x-axis indicating the importance index and the y-axis listing the genes. (G) Venn diagram of five candidate genes. (H) Distribution of five candidate genes in a volcano plot of differentially expressed genes (DEGs) between AMI and control samples. (I) Protein-protein interaction network based on candidate genes.

Obesity-related genes model for diagnosis of AMI and validation

The five identified genes were used to develop a diagnostic model with a binary logistic regression algorithm in the training cohort. The feature value was calculated using the formula: feature value = -23.3899 + 1.3338 * IL1RN + 0.4214 * TLR2 + 0.6228 * NFKBIA + 0.3462 * MMP9 + 0.6269 * ITLN1. The ROC curve of the diagnostic model for AMI was used to assess its diagnostic performance. The area under the curve (AUC) for this model was 0.924 in the training dataset and 0.825 in the validation dataset, reflecting a high level of diagnostic accuracy (Fig. 4A, B). Subsequently, a nomogram was developed with the “rms” package to predict the occurrence of AMI, incorporating IL1RN, TLR2, NFKBIA, MMP9, and ITLN1. The “Points” represent the individual scores of the five key DE-ORGs, while the “Total Points” represent their cumulative score (Fig. 4C). The AUC was used to evaluate the nomogram’s predictive performance, and the nomogram showed higher predictive accuracy than each of the five DE-ORGs individually (Fig. 4D and Supplementary Figure S1A-E). The calibration curve revealed a minimal discrepancy between the actual and predicted incidences of AMI (Fig. 4E). Additionally, decision curve analysis (DCA) showed that the diagnostic model performed well, yielding significant net benefits (Fig. 4F). Lastly, we assessed the expression levels of IL1RN, TLR2, NFKBIA, MMP9, and ITLN1 in both AMI patients and controls using the validation dataset GSE59867. Significant differences were observed in the expression of IL1RN, TLR2, NFKBIA, and MMP9 between the AMI and control cohorts, whereas ITLN1 did not differ significantly (Fig. 4G). 
RT-qPCR was conducted to further validate the expression levels of the diagnostic biomarkers, and the findings confirmed that the expression levels of IL1RN, TLR2, NFKBIA, MMP9, and ITLN1 were in agreement with those of the training sets (Fig. 4H and Supplementary Figure S1F-J).
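The reported model can be evaluated for a single sample with a few lines of Python. The intercept and coefficients are those given in the text; everything else is an assumption for illustration: the expression values are invented, the function names are hypothetical, and converting the feature value to a probability with the logistic function is the standard logistic regression link rather than a step the text describes explicitly.

```python
import math

# Intercept and coefficients as reported for the five-gene model.
INTERCEPT = -23.3899
COEFFICIENTS = {
    "IL1RN": 1.3338,
    "TLR2": 0.4214,
    "NFKBIA": 0.6228,
    "MMP9": 0.3462,
    "ITLN1": 0.6269,
}


def feature_value(expression):
    """Linear predictor: intercept plus coefficient-weighted expression."""
    return INTERCEPT + sum(
        coef * expression[gene] for gene, coef in COEFFICIENTS.items()
    )


def predicted_probability(value):
    """Standard logistic link (assumed): maps a feature value to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-value))


# Hypothetical normalized expression values for one sample; not study data.
toy_sample = {"IL1RN": 8.0, "TLR2": 9.0, "NFKBIA": 10.0, "MMP9": 9.0, "ITLN1": 5.0}

toy_value = feature_value(toy_sample)
toy_probability = predicted_probability(toy_value)
print(round(toy_value, 4), round(toy_probability, 3))
```

A feature value of 0 corresponds to a predicted probability of 0.5 under this link, and all five coefficients are positive, so higher expression of any of the genes increases the feature value.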

Fig. 4

Development and validation of a diagnostic model for predicting AMI. (A, B) ROC curves for the diagnostic model in both the training cohort (A) and the validation cohort (B). (C) Nomogram designed for predicting AMI within the training cohort. (D) The ROC curves of the diagnostic model, revealing AUC of 0.933 for the training set and 0.882 for the testing set. (E, F) Calibration curve (E) and DCA curve (F) of the nomogram model in the training cohort. (G) Validation of the expression of diagnostic biomarkers using the GSE59867 dataset (AMI = 111, control = 46). (H) Diagnostic biomarkers validated in peripheral blood samples from AMI patients (n = 5) and controls (n = 5) through RT-qPCR.

Correlation analysis between immune cells and high-risk and low-risk populations

The samples in the training dataset were categorized into low-risk (n = 85) and high-risk (n = 86) groups based on the median feature value obtained from the diagnostic model. Six immune cell subsets exhibited distinct infiltration patterns between these two groups: resting memory CD4 T cells, gamma delta T cells, resting NK cells, monocytes, activated mast cells, and neutrophils (Fig. 5A). The heatmap depicting differential immune cell infiltration was generated using the “pheatmap” package in R (Supplementary Figure S2A). The relationship between immune cells and the genes of the diagnostic model was assessed in each risk group. IL1RN, TLR2, NFKBIA, and MMP9 showed a significant positive correlation with neutrophils in both risk groups. Additionally, IL1RN, TLR2, and NFKBIA were positively correlated with activated mast cells, while IL1RN, TLR2, NFKBIA, and MMP9 were negatively correlated with resting memory CD4 T cells and gamma delta T cells in the high-risk group (Supplementary Figure S2B, C). Moreover, thirteen immune functions differed significantly between the two groups: CCR, CD8+ T cells, cytolytic activity, macrophages, MHC class I, neutrophils, NK cells, parainflammation, T cell co-inhibition, T cell co-stimulation, Th1 cells, Th2 cells, and tumor-infiltrating lymphocytes (TIL) (Fig. 5B). The correlation between diagnostic marker genes and immune function categories was analyzed using the Spearman method. In the high-risk group, IL1RN and TLR2 showed a significant positive correlation with macrophages and immature dendritic cells (iDCs), and a significant negative correlation with T cell co-stimulation, Th2 cells, Th1 cells, helper T cells, checkpoints, and type II interferon (IFN) responses (Fig. 5C, D). 
Additionally, GSEA revealed that the high-risk group showed significant enrichment in pathways such as chemokine signaling, Fc gamma R-mediated phagocytosis, Leishmania infection, and Toll-like receptor signaling, while ribosome enrichment was most pronounced in the low-risk group (Fig. 5E). This observation implies that these immune functions could play important roles at key stages in the pathophysiology of AMI.

Fig. 5

Correlation of immune cells and immune functions with high- and low-risk groups. (A, B) Differential analysis of immune cell infiltration (A) and 29 immune functions (B) across these groups. (C, D) Heatmap of the correlation between diagnostic genes and 29 immune functions in the low-risk group (C) and the high-risk group (D). (E) GSEA results for the pathways in the high-risk group. ** p < 0.01, *** p < 0.001; ns, not significant.

Discussion

AMI represents the most severe form of coronary artery disease, resulting in millions of fatalities each year across both developed and developing nations34,35. AMI progresses rapidly, and treatment is often delayed. Even with advancements in reperfusion techniques and pharmacological therapies, AMI continues to pose a significant challenge to global health, affecting over 7 million individuals annually36. In industrialized nations, obesity has emerged as a critical health concern, with its prevalence escalating among both adults and children37. This issue goes beyond aesthetics; it considerably heightens the likelihood of serious health complications, such as AMI22,38,39. The link between obesity and coronary artery disease (CAD) largely arises from atherosclerosis, which is intensified by excess fat, especially in the abdominal region40. The American Heart Association recognizes obesity as a significant modifiable risk factor for CAD. In a population-based study, being overweight or obese was independently associated with the early onset of AMI41,42. Moreover, various long-term longitudinal studies have confirmed that obesity is an independent risk factor for coronary atherosclerosis43,44. Nevertheless, few reports have focused on the role of obesity-related genes in the diagnosis and risk assessment of AMI. To address this gap, the present study established a new diagnostic signature based on DE-ORGs using datasets obtained from the GEO database. Through comprehensive analysis of transcriptomic data alongside clinical data, this signature demonstrated promising discriminatory efficacy in both the training and validation cohorts. It showed high accuracy in predicting AMI, with AUC values ranging from 0.747 to 0.962. Furthermore, this signature is correlated with the infiltration of immune cells and varying patterns of immune function. 
These findings provide new perspectives that enrich the discourse on obesity-related genes and cardiovascular disease (CVD), suggesting valuable clinical applications for the early detection and risk classification of AMI.

In this study, we developed a robust diagnostic scoring system composed of five specific genes: IL1RN, TLR2, NFKBIA, MMP9, and ITLN1. Previous studies have revealed certain associations between these genes and the development and pathogenesis of cardiovascular disease (CVD). For example, IL1RN (interleukin 1 receptor antagonist) is essential in modulating inflammatory responses by suppressing the activities of interleukin 1 (IL-1), a significant pro-inflammatory cytokine45,46. An analysis comparing gene expression levels in 92 patients with AMI and 57 control participants demonstrated that IL1RN, along with IL1B and AQP9, exhibited significant dysregulation in AMI. Functional enrichment analysis associated these genes with immune-related pathways, such as leukocyte migration and cytokine production. The research proposed that the FSTL3-miR-330-3p-IL1B/IL1RN axis could represent novel RNA regulatory pathways involved in the progression of AMI. These results underscored IL1RN’s dual function in inflammation and tissue repair, indicating its potential as a diagnostic or therapeutic target47. TLR2 (toll-like receptor 2), a well-conserved member of the TLR family, has been linked to the regulation of ventricular remodeling following AMI48. Notable cardioprotective benefits, including the preservation of cardiac function and the reduction of ischemia-reperfusion injury, were reported in a mouse model of AMI after the inhibition of TLR248,49. NFKBIA, referred to as NF-kappa B inhibitor alpha, is a protein that plays a crucial role in the regulation of the NF-kappa B signaling pathway, which is essential for controlling the expression of genes involved in inflammation, immune response, and cell survival50. Chronic inflammation is a key characteristic observed in conditions like atherosclerosis, heart failure, and myocardial infarction51,52. By modulating this pathway, NFKBIA can affect the onset and progression of these diseases. 
MMP9, known as Matrix Metalloproteinase-9, functions as an enzyme that is integral to the degradation of the extracellular matrix. This enzyme plays a significant role in various physiological and pathological processes, encompassing tissue remodeling and inflammation. Its function is closely linked to numerous diseases, particularly those associated with cardiovascular conditions53,54. Increased levels of MMP9 are commonly identified in individuals diagnosed with atherosclerosis55, wherein plaque accumulation occurs in the arteries, resulting in narrowing and diminished blood flow. This plaque accumulation is aggravated by inflammatory processes within the arterial walls, while MMP9 further promotes the degradation of the extracellular matrix in these regions, potentially leading to plaque instability and a heightened risk of rupture53. In the case of myocardial infarction, or a heart attack, serum levels of MMP9 rise markedly within 6 h of onset, peaking between 24 and 48 h thereafter. There is a significant correlation between MMP9 levels and the extent of myocardial infarction as well as the reduction in left ventricular ejection fraction. In clinical settings, the simultaneous assessment of MMP9 and cardiac troponin I (cTnI) enhances the diagnostic sensitivity for early acute coronary syndrome (ACS) (< 4 h) from 72% to 89%56,57. ITLN1 (also referred to as “omentin”) is a highly prevalent mRNA and protein found in visceral adipose tissue and has been linked to the pathophysiology of obesity as well as other metabolically associated diseases, being characterized as a “novel adipokine”58,59. In their research utilizing the ApoE −/− mouse model, Lin et al. demonstrated that administering ITLN1 could elevate collagen levels in the coronary vascular plaques of these mice, diminish the size of necrotic cores, and prevent plaque rupture. 
This suggests that ITLN1 plays a role in modulating macrophage function, impeding the release of inflammatory factors and apoptosis, while also enhancing the stability of atherosclerotic plaques via the αvβ3 and αvβ5 integrin receptors60. In a separate study, Bai et al. reported that plasma concentrations of ITLN1 in individuals with CAD were markedly lower compared to healthy controls (61.21 ± 10.21 ng/dL vs. 95.22 ± 12.21 µg/L, p < 0.0001), and a negative correlation was found between ITLN1 levels and the severity of CAD61. Nevertheless, our findings revealed that the mRNA expression of ITLN1 in patients with AMI was significantly elevated compared to the control group. This discrepancy with the existing literature may relate to differences in sample sizes or individual patient variations, which require further investigation. Thus far, the aforementioned findings partially correspond with the outcomes of the present study, wherein TLR2 and MMP9 were found to be upregulated in AMI. Notably, the abnormal expression of IL1RN, TLR2, NFKBIA, MMP9, and ITLN1 is strongly associated with clinical outcomes in CVD patients. Additionally, this novel diagnostic signature consisting of five DE-ORGs demonstrated excellent predictive capability in both the training cohort (AUC = 0.924) and the external validation cohort (AUC = 0.825), offering valuable insights for the prompt diagnosis of AMI. Huang et al. created a logistic regression diagnostic model for AMI, achieving AUC values of 0.794 in the training set and 0.745 in the testing set62. Meanwhile, Chen et al. established a random forest diagnostic model for AMI, with AUC values of 0.855 (training set) and 0.731 (testing set)63. Compared with these prior studies, our diagnostic model achieved higher AUC values in both the training and validation cohorts, suggesting that it may be more robust and more broadly applicable. 
This may provide significant insights into new molecular subtypes and improve early diagnostic assessments of AMI, ultimately facilitating tailored treatment and management approaches for future patients.

However, it is important to recognize the constraints of our research. First, the original case numbers in each dataset were somewhat limited, prompting us to utilize multiple datasets. Second, because the study was retrospective and certain critical clinical data were missing in the GEO datasets, comparing the diagnostic model’s predictive value with that of traditional biomarkers in diagnosing AMI is not feasible. Third, verification of the expression levels of IL1RN, TLR2, NFKBIA, MMP9, and ITLN1 should be conducted using RT-qPCR or Western blot in datasets that include larger sample sizes. Lastly, this investigation relied on bioinformatics analysis, highlighting the necessity for larger, prospective clinical validation and comparison with established cardiac biomarkers, such as troponins and CK-MB, to evaluate in vivo outcomes.

Conclusion

Using three machine learning algorithms (LASSO, RF, and SVM-RFE), we identified five genes associated with obesity: IL1RN, TLR2, NFKBIA, MMP9, and ITLN1. These genes may serve as potential therapeutic targets and biomarkers for AMI. More importantly, the present study created a diagnostic model for early AMI detection based on these five genes, thereby providing fresh insights into the underlying mechanisms of AMI and suggesting a direction for future research.