Introduction

Diabetes mellitus (DM) encompasses a group of metabolic disorders characterized by chronic hyperglycemia and long-term disruptions in carbohydrate, fat, and protein metabolism caused by various factors1,2. The two most common types of DM are type 1 diabetes (T1D), resulting from absolute insulin deficiency due to pancreatic β-cell destruction, and type 2 diabetes (T2D), characterized by insulin resistance and inadequate insulin secretion3,4. T2D is the predominant form, accounting for 90–95% of all diabetes cases5. It is associated with a range of complications, including cardiovascular disease, retinopathy, nephropathy, neuropathy, and foot ulcers that may lead to amputation, which significantly impair patients’ quality of life and life expectancy2,6. The prevalence of T2D continues to rise, making it a growing global public health concern that places substantial burdens on human health and socioeconomic systems4. Therefore, new insights into the etiology, diagnosis, and prognosis of diabetes are urgently needed.

The current gold standard for diagnosing T2D is the measurement of venous plasma glucose levels1. However, research indicates that diabetes-related metabolic changes may occur decades before the onset of hyperglycemia. During this pre-diabetic phase, although blood glucose levels remain below the diagnostic threshold, insulin resistance, β-cell dysfunction, and other metabolic abnormalities are already present. This stage is linked to an increased risk of early kidney disease, chronic kidney disease, small-fiber neuropathy, diabetic retinopathy, and macrovascular disease5,7. Conventional treatments for T2D include dietary management, exercise therapy, oral antidiabetic medications, and insulin injections. However, these strategies neither cure diabetes nor fully prevent its complications4. Thus, early diagnosis and timely intervention are essential for delaying diabetes progression and minimizing complication risks, facilitating clinical decision-making, and benefiting high-risk individuals. Nonetheless, critical diagnostic genes and effective therapeutic targets remain to be identified.

Over the past few decades, diabetes has attracted significant research interest. For instance, Sun et al. demonstrated that exosomes can function not only as biomarkers for the early diagnosis of diabetes but also as potential therapeutic agents for managing diabetes and its complications8. Additionally, several studies have identified associations between specific metabolites and T2D. For example, mannose has been linked to T2D, while fatty acids, proinsulin, inflammatory markers (such as glycoprotein acetylation, interleukin-1 receptor antagonist, and hs-CRP), and amino acids have been associated with an increased risk of developing the disease9.

Machine learning algorithms have been applied repeatedly to diabetes and related diseases. In a search for diagnostic markers of diabetic nephropathy, Ming Zhong et al. combined the least absolute shrinkage and selection operator (LASSO) regression algorithm, the support vector machine-recursive feature elimination (SVM-RFE) algorithm, and the random forest (RF) algorithm, and obtained four genes as diagnostic markers10. The LASSO logistic regression and SVM-RFE algorithms were also applied to identify potential diagnostic markers of diabetic peripheral neuropathy11. Machine learning algorithms have likewise been used to study other diseases such as COPD12, AD13, SLE14, and glioma15. Together, these studies suggest that machine learning algorithms are useful and accurate tools for biomarker discovery.

In this study, to uncover the potential mechanisms and biomarkers of T2D, bulk-seq and single-cell seq data from the Gene Expression Omnibus (GEO) database were analyzed. A key gene, SLC2A2, was identified as a novel biomarker of T2D. Functional enrichment analyses and immune cell infiltration analysis were performed to explore the molecular mechanisms of T2D. Finally, in vivo studies provided preliminary validation of the bioinformatic results. Our study provides novel insights by combining differential analysis with machine learning (LASSO and SVM-RFE) to robustly identify SLC2A2 as a key T2D gene. This integrated approach revealed SLC2A2’s dual role in beta cell dysfunction (via cAMP/insulin resistance pathways) and immune modulation (correlating with CD4+ memory T cells), offering new mechanistic understanding of T2D.

Materials and methods

Data collection and download

Type 2 diabetic and non-diabetic gene expression datasets (GSE76894, GSE76895, and GSE86468) were downloaded from the GEO database (https://www.ncbi.nlm.nih.gov/geo/). The GSE76894 dataset, containing 19 T2D and 84 non-diabetic (ND) islet samples, was generated on the GPL570 platform (Affymetrix Human Genome U133 Plus 2.0 Array). The GSE76895 dataset contains 36 T2D, 32 ND, 15 impaired glucose tolerance (IGT), and 20 type 3c diabetes (T3cD) cases. The GSE86468 dataset includes 9 T2D and 15 ND islet samples. We used GSE76895 as the training cohort, while GSE76894 and GSE86468 were used as validation cohorts to confirm the analysis results. Using the “limma” package, a linear model was fitted with the “lmFit” function and differential expression was assessed with the “eBayes” function. Differentially expressed genes (DEGs) in the GSE76895 data were identified with cut-off criteria of |fold change| ≥ 1.5 and adjusted P value < 0.05. To select key genes among the DEGs, the LASSO algorithm was implemented using the “glmnet” R package, and the SVM-RFE algorithm was implemented using the “e1071”, “kernlab”, and “caret” packages15.
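The DEG cut-off described above can be illustrated with a minimal sketch. It assumes that the fold changes are available on the log2 scale (as limma reports them), so that |fold change| ≥ 1.5 corresponds to |log2FC| ≥ log2(1.5), and that the adjusted P values follow the Benjamini-Hochberg procedure; neither detail is stated in the text. The function names are hypothetical, and the sketch does not reproduce the limma model itself.

```python
import math

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: BH is the adjustment method; the text only says "adjusted P value".
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_degs(genes, log2_fc, p_values, fc_cutoff=1.5, alpha=0.05):
    """Keep genes with |fold change| >= fc_cutoff and adjusted p < alpha."""
    adjusted = benjamini_hochberg(p_values)
    threshold = math.log2(fc_cutoff)  # linear fold change expressed on the log2 scale
    return [
        gene
        for gene, lfc, q in zip(genes, log2_fc, adjusted)
        if abs(lfc) >= threshold and q < alpha
    ]
```

Called with parallel lists of gene names, log2 fold changes, and raw P values, `select_degs` returns the genes that pass both thresholds, in input order.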

Functional enrichment analysis

Using the “clusterProfiler” package in RStudio, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG)16,17 enrichment analyses were performed to further understand the biological significance of the DEGs. Statistical significance was defined as a P value < 0.05. The “GOplot” package was then used to visualize the results of the enrichment analysis. In addition, gene set enrichment analysis (GSEA) was performed with a criterion of adjusted P value < 0.05. The top three GSEA terms of each group were visualized with the “enrichplot” package.

Feature selection by LASSO and SVM-RFE algorithms

Two machine-learning algorithms (LASSO and SVM-RFE) were used to screen characteristic genes. LASSO is a regression method that performs regularization and variable selection; it was applied with the R package “glmnet”18 to find genes significantly associated with sample group. For the LASSO analysis, we used α = 1, 10-fold cross-validation, and λ selected by the minimum criterion. We performed SVM-Recursive Feature Elimination (SVM-RFE) using the “caret” package in R with a radial basis function kernel, evaluating feature subsets of sizes 2, 4, 6, and 8, and then from 10 to 40 features in increments of 3. The analysis employed 10-fold cross-validation (method = “cv”) with AUC as the evaluation metric, using default parameters for gamma (1/(data dimension)) and cost (C = 1). Feature importance was calculated by averaging variable importance across all cross-validation folds, and the optimal feature subset was the one with the maximum mean AUC. The random seed was set to 123 to ensure reproducibility of the feature selection process16.
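The subset-size grid and the "maximum mean AUC" selection rule can be sketched as follows. This is only an illustration of the bookkeeping around SVM-RFE, not of the SVM fitting itself; the function names are hypothetical, and breaking ties in favor of the smaller subset is an assumption that the text does not specify.

```python
def subset_sizes():
    """Candidate feature-subset sizes: 2, 4, 6, 8, then 10 to 40 in steps of 3."""
    return [2, 4, 6, 8] + list(range(10, 41, 3))

def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels are 1 (e.g. T2D) or 0 (e.g. ND); tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

def best_subset_size(fold_aucs):
    """Pick the subset size with the highest mean cross-validated AUC.

    fold_aucs maps subset size -> list of per-fold AUC values.
    Assumption: ties are broken in favor of the smaller subset.
    """
    means = {size: sum(v) / len(v) for size, v in fold_aucs.items()}
    return min(means, key=lambda size: (-means[size], size))
```

With this grid, 19 is one of the candidate sizes, so a result such as "maximal AUC with 19 features" is a value the procedure can return.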

For further analysis, we retained the genes selected by both LASSO and SVM-RFE. To evaluate their value as candidate diagnostic biomarkers, we compared their expression levels between T2D and ND samples.
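Taking the overlap of the two selections is a plain set intersection. The sketch below keeps the order of the first list so that the output is deterministic; the function name and the example gene lists in the usage note are hypothetical and are not the study's actual selections.

```python
def overlapping_genes(lasso_genes, svm_rfe_genes):
    """Return genes present in both lists, in the order they appear in lasso_genes."""
    svm_set = set(svm_rfe_genes)
    seen = set()
    overlap = []
    for gene in lasso_genes:
        # Skip duplicates so each overlapping gene is reported once.
        if gene in svm_set and gene not in seen:
            seen.add(gene)
            overlap.append(gene)
    return overlap
```

For example, `overlapping_genes(["GENE_A", "GENE_B", "GENE_C"], ["GENE_C", "GENE_A", "GENE_D"])` gives `["GENE_A", "GENE_C"]`.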

Evaluation of immune cell infiltration

CIBERSORT is a widely used deconvolution algorithm that relies on a signature matrix of 547 genes. It applies linear support vector regression, a machine learning method, to deconvolve the bulk gene expression matrix while reducing the influence of noise. The infiltration of 22 immune cell types was quantified in each T2D and ND gene expression profile17 using the “CIBERSORT” package in RStudio. Further analysis was limited to samples with a CIBERSORT output of P < 0.05. Spearman correlation coefficients between the candidate diagnostic biomarkers and the significantly changed immune cells were calculated and visualized using the R packages “reshape2” and “ggExtra”.
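The Spearman correlation used here is the Pearson correlation of the ranked values, with tied values given their average rank. A minimal standard-library sketch, with hypothetical function names, could look like this; in the setting above, the two inputs would be a gene's expression and an immune cell fraction across the same samples.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    return pearson(average_ranks(x), average_ranks(y))
```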

Single-cell transcriptome data processing and analyzing

Single-cell RNA sequencing (scRNA-seq) data from T2D and ND donors were downloaded from ArrayExpress (https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-5061/). Single-cell cDNA libraries were generated using the Smart-seq2 protocol. The scRNA-seq data contain 4 T2D and 6 ND samples19. The data were processed with the “Seurat” package (http://satijalab.org/seurat/). Cells were retained if they met the criteria nFeature_RNA > 500 and < 7000, percent.mt < 10%, and nCount_RNA > 1000, which removes doublets and dead cells. The filtered gene-barcode matrices were normalized with the “NormalizeData” function using the “LogNormalize” method. The top 2000 highly variable genes were identified with the “FindVariableFeatures” function using the “vst” method, and the data were centered and scaled using “ScaleData”. Based on these 2000 highly variable genes, we performed principal component analysis (PCA) and used the Harmony package to remove batch effects and perform dimensionality reduction. The “FindNeighbors”, “FindClusters”, and “RunTSNE” functions from Seurat were then used to cluster the cells and display the clusters on 2D maps generated by t-distributed stochastic neighbor embedding (t-SNE), which yielded 12 primary cell clusters. Expression levels of diagnostic biomarkers in different cell types are shown as bubble plots. Finally, the “VlnPlot” function of the “Seurat” package was used to show the expression level of the key gene in the scRNA-seq data.
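The quality-control thresholds and the normalization step can be written out explicitly. The sketch below mirrors the stated cut-offs and the usual definition of log-normalization (counts divided by the cell total, multiplied by a scale factor, then log1p-transformed). The scale factor of 10,000 is Seurat's default and is assumed here because the text does not state it; the function names are hypothetical and this is not the Seurat implementation.

```python
import math

def passes_qc(n_feature, n_count, percent_mt):
    """Apply the stated cell filters: 500 < nFeature_RNA < 7000,
    nCount_RNA > 1000, and percent.mt < 10."""
    return 500 < n_feature < 7000 and n_count > 1000 and percent_mt < 10

def log_normalize(counts, scale_factor=10_000):
    """Log-normalize one cell's raw counts.

    Each count is divided by the cell's total count, multiplied by
    scale_factor (assumed default: 10,000), and transformed with log1p.
    """
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [math.log1p(c / total * scale_factor) for c in counts]
```

A cell with 2,500 detected genes, 50,000 counts, and 3.2% mitochondrial reads passes the filter, whereas the same cell with 12% mitochondrial reads does not.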

T2D mouse model

C57BL/6J wild-type mice, aged 6–7 weeks, were randomly assigned to two groups: (1) a control group receiving a standard diet (ND group) and (2) a T2D group subjected to a high-fat diet and treated with streptozotocin. The T2D group was fed a high-fat diet (#D12492) for a period of 4 weeks before receiving streptozotocin injections (30 mg/kg, Sigma-Aldrich) for three consecutive days to induce T2D. Mice with blood glucose levels above 16.7 mmol/L, as determined by a blood glucose meter (LifeScan), were identified as having T2D. All animal experiments were approved by the Institutional Animal Care and Use Committee of Fudan University and were consistent with the governmental regulations of China for the care and use of animals. All methods are reported in accordance with ARRIVE guidelines.

Intraperitoneal insulin tolerance tests and intraperitoneal glucose tolerance tests

Tail vein blood glucose levels in mice were measured using a blood glucose meter (LifeScan) following a fasting period of 12 h. For the intraperitoneal glucose tolerance test (IPGTT), mice received an intraperitoneal injection of glucose (2 g/kg body weight) after the same fasting interval, and tail vein blood samples were taken at 0, 30, 60, 90, and 120 min post-injection. Similarly, the intraperitoneal insulin tolerance test (IPITT) was conducted by administering an intraperitoneal insulin injection (0.75 U/kg body weight) to the fasted mice, with blood samples collected from the tail vein at 0, 15, 30, 60, and 90 min after the injection.

Real-time quantitative PCR

Total RNA was purified from cells or tissues with the RNAiso Plus Reagent (Catalog#9109, TaKaRa), after which complementary DNA (cDNA) was synthesized using the PrimeScript™ RT Reagent Kit with gDNA Eraser (Catalog#RR047A, TaKaRa). Real-time quantitative PCR was performed on a ViiA 7 Real-Time PCR System (Applied Biosystems) using QuantiNova SYBR® Green (Catalog#208056, QIAGEN) and primers specific to the target genes. The primer sequences were as follows. Actin: forward 5′-CGCCACCAGTTCGCCATGGA-3′, reverse 5′-TACAGCCCGGGGAGCATCGT-3′. SLC2A2: forward 5′-TCAGAGCGAGTCACAAACCT-3′, reverse 5′-TCTGCCTGGCCAATCCTTTA-3′.
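The text reports relative expression but does not state how it was computed from the qPCR data. A common choice is the 2^(-ΔΔCt) method, sketched below purely as an assumption: the target gene (here SLC2A2) is normalized to a reference gene (here Actin) and then to the mean of the control group. The function and parameter names are hypothetical.

```python
def relative_expression(ct_target, ct_reference,
                        ct_target_ctrl_mean, ct_reference_ctrl_mean):
    """Relative expression by the 2^(-ddCt) method (assumed, not stated in the text).

    ct_target / ct_reference: Ct values of the target and reference gene in one sample.
    ct_*_ctrl_mean: mean Ct values of the same genes in the control group.
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_control = ct_target_ctrl_mean - ct_reference_ctrl_mean
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2 ** (-delta_delta_ct)
```

Under this convention a control-like sample scores 1.0, and a sample whose target Ct is two cycles higher at the same reference Ct scores 0.25, that is, a four-fold lower expression.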

Statistics

The data were analyzed using the R programming language (version 4.1.1) and are expressed as mean ± standard deviation (SD). To assess the statistical significance between the two groups, an unpaired Student’s t-test or the nonparametric Mann-Whitney U test was employed. A p-value of less than 0.05 was considered indicative of statistical significance.
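As a rough illustration of the summary statistics and two-group tests named above, the following standard-library sketch computes the mean ± SD, the pooled-variance Student's t statistic with its degrees of freedom, and the Mann-Whitney U statistic. It stops short of P values, which require a t or U distribution that the Python standard library does not provide; the function names are hypothetical and this is not the R code used in the study.

```python
import math
import statistics

def mean_sd(values):
    """Return (mean, sample standard deviation) for reporting as mean ± SD."""
    return statistics.mean(values), statistics.stdev(values)

def student_t(a, b):
    """Unpaired Student's t statistic with pooled variance, plus degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    standard_error = math.sqrt(pooled_var * (1 / na + 1 / nb))
    t = (statistics.mean(a) - statistics.mean(b)) / standard_error
    return t, na + nb - 2

def mann_whitney_u(a, b):
    """Mann-Whitney U statistic (the smaller of the two U values); ties count 0.5."""
    u_a = sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x in a
        for y in b
    )
    return min(u_a, len(a) * len(b) - u_a)
```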

Results

Screening of DEGs in the T2D and ND islet samples

The workflow of this study is shown in Fig. 1. After data normalization (Supplementary Figs. 1 and 2), differential expression analysis between T2D and ND islet samples in the GSE76895 dataset was performed using the “limma” package. Of the 112 identified DEGs, 67 were significantly upregulated and 45 were significantly downregulated; all DEGs are displayed in the heatmap and volcano plot (Fig. 2A, B and Supplementary Table 1).

Fig. 1

Flow chart of the study.

Fig. 2

Differential analysis and functional enrichment analysis between T2D and ND islet samples in the GSE76895 dataset. (A) Heatmap of all DEGs. Gene expression trends in different tissues are represented by different colors. (B) Volcano plot of DEGs constructed using the fold change values and adjusted P values; red dots represent up-regulated genes, gray dots represent nonsignificant genes, and blue dots represent down-regulated genes. (C) Bubble diagram of the GO analysis of DEGs. (D) Bubble diagram of the KEGG16,17 analysis of DEGs. Bubble color is determined by p-value and bubble size by gene count.

Biological functions of DEGs

GO and KEGG analyses were carried out with the “clusterProfiler” package to determine the biological functions of the DEGs. GO analysis includes Biological Process (BP), Cellular Component (CC), and Molecular Function (MF) analyses. In the BP analysis, DEGs were primarily involved in regulation of membrane potential and reactive oxygen species metabolic process. In the CC analysis, DEGs were primarily associated with membrane and clathrin-sculpted vesicle. In the MF analysis, DEGs were primarily associated with monoatomic ion channel activity, gated channel activity, and aldo-keto reductase (NADP) activity (Fig. 2C). KEGG analysis showed that these DEGs were mainly enriched in hormone signaling, the cAMP signaling pathway, the calcium signaling pathway, and insulin resistance (Fig. 2D).

Moreover, GSEA results showed that cytokine-cytokine receptor interaction, complement and coagulation cascades, and ECM-receptor interaction were up-regulated in T2D samples (Fig. 3A), whereas RNA polymerase, ribosome biogenesis in eukaryotes, and oxidative phosphorylation were mainly down-regulated in T2D samples (Fig. 3B). The functional enrichment analyses in GSE76894 displayed similar results (Supplementary Figs. 3–4).

Fig. 3

GSEA results in the GSE76895 dataset. (A) GSEA enrichment results in T2D islet samples. (B) GSEA enrichment results in ND islet samples.

Nine genes were identified as the biomarkers by LASSO and SVM-RFE

Two algorithms were used to screen biomarkers in the GSE76895 dataset. In Fig. 4A (LASSO analysis), the x-axis represents log(λ), where λ is the regularization parameter controlling feature sparsity, and the y-axis shows standardized coefficient magnitudes. The left vertical dotted line marks the selected optimal λ (minimum deviance criterion), which yielded 21 candidate genes (Supplementary Table 2). In Fig. 4B (SVM-RFE analysis), the x-axis displays the number of retained features and the y-axis the AUC. The curve shows that the maximal AUC (0.957) was achieved with 19 features (Supplementary Tables 3, 4). A Venn diagram then identified 9 features selected by both algorithms (Fig. 4C). The expression levels of these 9 genes differed significantly between T2D and ND islet samples (Fig. 4D).

The GSE76894 dataset was used as a validation cohort to assess the above results and the expression levels of the 9 candidate biomarkers. Compared with ND islet samples, T2D islet samples exhibited significantly lower expression levels of ASCL2 (p value = 2.9e-06), ARG2 (p value = 1.3e-07), SLC2A2 (p value = 1.6e-06), OPRD1 (p value = 3.2e-06), and HS6ST2 (p value = 0.0094) (Fig. 5). Because PCOLCE2 and PRIMA1 were not found in this validation cohort, they are not shown.

The GSE86468 dataset was also used to verify the above results. As shown in Fig. 6, PCOLCE2 (p value = 0.034) and SLC2A2 (p value = 0.013) differed significantly between T2D and ND islet samples. Across the three datasets, SLC2A2 was the only gene that differed significantly between T2D and ND islet samples in every cohort. Therefore, SLC2A2 was considered a biomarker for T2D.

Fig. 4

Screening candidate biomarkers from DEGs between the T2D and ND islet samples in GSE76895. (A) 21 genes were screened by the LASSO logistic regression algorithm. (B) 19 genes were screened by the SVM-RFE algorithm. (C) Venn diagram of 9 features including PCOLCE2, ASCL2, ARG2, PRIMA1, SLC2A2, OPRD1, SLC26A4, HS6ST2, and CYS1 intersected by LASSO and SVM-RFE algorithms. (D) The expression level of candidate markers PCOLCE2, ASCL2, ARG2, PRIMA1, SLC2A2, OPRD1, SLC26A4, HS6ST2, and CYS1 in GSE76895 datasets between the T2D and ND islet samples.

Fig. 5

Validating candidate biomarkers in GSE76894. (A–G) The expression level of candidate markers ASCL2, ARG2, SLC2A2, OPRD1, SLC26A4, HS6ST2, and CYS1 in the GSE76894 dataset between the T2D and ND islet samples. ASCL2, ARG2, SLC2A2, OPRD1, and HS6ST2 were the significant genes among the seven genes.

Fig. 6

Validating candidate biomarkers in GSE86468. (A–I) The expression level of candidate markers PCOLCE2, ASCL2, ARG2, PRIMA1, SLC2A2, OPRD1, SLC26A4, HS6ST2, and CYS1 in the GSE86468 dataset between the T2D and ND islet samples. PCOLCE2 (A) and SLC2A2 (E) were the significant genes among the nine genes.

Analysis of immune cell infiltration showed significant changes in immune cells between T2D and ND islet samples

Using the CIBERSORT algorithm, we first calculated the proportions of infiltrating immune cells in the T2D and ND islet samples (Fig. 7A). We then calculated the correlations among the 22 types of infiltrating immune cells and SLC2A2 (Fig. 7B). Activated mast cells correlated positively with resting NK cells but negatively with resting mast cells and activated NK cells, whereas resting mast cells correlated positively with activated NK cells. Neutrophils correlated negatively with monocytes. Notably, SLC2A2 correlated positively with resting CD4+ memory T cells.

Fig. 7

Evaluation and visualization of immune cell infiltration in T2D and ND islet samples. (A) The proportion of infiltrating immune cells in the T2D and ND islet samples. (B) Correlation matrix of the 22 infiltrating immune cell types in the T2D and ND islet samples. Blue and red represent positive and negative correlations, respectively. Darker color indicates stronger correlation.

Expression level of SLC2A2 in single-cell transcriptome data

The scRNA-seq data underwent quality control, normalization, scaling, highly variable gene selection, and clustering (Fig. 8A-B). Based on the 2000 highly variable genes, 12 clusters were obtained. Based on the expression of marker genes (Supplementary Fig. 2C), these clusters were assigned to 10 cell types: gamma cells, delta cells, stellate cells, beta cells, alpha cells, acinar cells, immune cells, endothelial cells, ductal cells, and other cells (Fig. 8C). Of the 10 cell types, SLC2A2 showed the highest expression in beta cells, the most important cell type in the islet (Fig. 8D). In beta cells, SLC2A2 was significantly down-regulated in the T2D group (Fig. 8E).

Fig. 8

SLC2A2 expression analysis in the single-cell transcriptome data. (A, B) Violin plots of the single-cell transcriptome data before (A) and after (B) quality control. (C) The t-SNE plot of the 10 identified main cell types. (D) Bubble plot showing the expression of HHATL and SLC2A2 in different cell types. Bubble color is determined by average expression and bubble size by percent expressed. (E) The expression level of SLC2A2 in beta cells in the control and T2D groups.

Validation of key genes

SLC2A2 was identified as the key gene in this study, so an in vivo study was performed to validate its expression level. After confirming the successful establishment of the T2D model (Fig. 9A–D), qPCR analysis was performed to measure the expression level of SLC2A2. The results showed that SLC2A2 was down-regulated in the T2D group (Fig. 9E).

Fig. 9

Validation of SLC2A2 in mouse T2D model. (A) Body weight of the ND and T2D group. (B) Fasting blood glucose levels of the ND and T2D group. (C) IPGTT of the ND and T2D group. (D) IPITT of the ND and T2D group. (E) The relative expression level of SLC2A2.

Discussion

Type 2 diabetes (T2D) is a chronic metabolic disorder characterized by insulin resistance and relative insulin deficiency, leading to hyperglycemia20. It represents the most prevalent form of diabetes, accounting for approximately 90–95% of all diagnosed cases globally21. The pathophysiology of T2D is complex and multifactorial, involving genetic predisposition, environmental factors, and lifestyle choices22. Key risk factors include obesity, physical inactivity, and advancing age, which contribute to the disruption of glucose homeostasis23. Over the past few decades, the rising prevalence of T2D has been linked to lifestyle changes, particularly the increase in sedentary behavior and unhealthy dietary patterns24. If left unmanaged, T2D can lead to serious complications, including cardiovascular diseases, neuropathy, nephropathy, and retinopathy, significantly impairing the quality of life, and increasing healthcare costs25.

In this study, one biomarker of T2D was identified: SLC2A2, which encodes glucose transporter 2 (GLUT2). GLUT2 is widely expressed in the liver, intestinal epithelial cells, and pancreatic β-cells, where it functions as a key mediator of transmembrane glucose transport26,27. Dysregulated expression or impaired function of GLUT2 has been strongly linked to impaired fasting glucose and type 2 diabetes, contributing to an increased risk of developing diabetes28,29,30. Whole-genome bisulfite sequencing of human pancreatic islets identified 457 genes exhibiting differentially methylated regions (DMRs) in T2D islets, including NR4A3, PARK2, PID1, SLC2A2, and SOCS231. Notably, SLC2A2 has been further implicated as a potential therapeutic target for T2D through genome-wide association studies, suggesting its utility in tailored therapy design32. Mechanistically, SLC2A2 dysfunction contributes to impaired insulin secretion in pancreatic islets, a key pathological feature of T2D33. These studies provide multi-faceted evidence supporting SLC2A2 as a potential biomarker for T2D and support the reliability of our findings. Among the nine candidate genes identified in our study, several, including SLC2A2, have been previously linked to type 2 diabetes and glycemic traits in genome-wide association studies (GWAS)34. In contrast, others such as PRIMA1 and CYS1 have not shown prior associations with T2D, highlighting their potential as novel biomarkers. These findings demonstrate the strength of our integrated approach in uncovering both established and underexplored genetic contributors to T2D.

Our findings suggest that resting CD4+ memory T cells may contribute significantly to the pathogenesis of T2D, showing a positive correlation with SLC2A2 expression. These cells, which are essential for adaptive immunity, maintain long-term immunological memory through a quiescent state (characterized by low proliferation and metabolic activity) while retaining rapid responsiveness to antigen re-exposure. Notably, dysregulation of CD4+ memory T cell homeostasis has been implicated in various immune-related disorders, including autoimmune diseases, chronic infections, and impaired vaccine responses35. Intriguingly, recent studies have revealed distinct alterations in immune cell populations in T2D patients, with elevated levels of eosinophils, naïve CD4+ T cells, and regulatory T cells (Tregs), but reduced frequencies of resting CD4+ memory T cells and monocytes in pancreatic islets36. Furthermore, accumulating evidence highlights the critical regulatory role of immune-related genes and immune cell subsets (e.g., resting CD4+ memory T cells) in the onset and progression of DM, potentially offering novel insights for immune-targeted therapeutic strategies37,38.

In this study, a significant decrease in SLC2A2 expression was observed in beta cells of the T2D model, which is consistent with beta cell dysfunction. This differs from previous reports that transient downregulation of SLC2A2 may protect beta cells. However, persistent downregulation of SLC2A2 may reduce the glucose-sensing ability of beta cells, thereby exacerbating insulin secretion defects and leading to beta cell dedifferentiation39. Therefore, the protective effect of transient SLC2A2 downregulation on β cells may be a compensatory mechanism, whereas persistent downregulation has a different, pathological significance. Further studies are needed to explore the effect of dynamic changes in SLC2A2 expression on T2D.

Recent studies exploring the pathogenesis of diabetes have revealed potential links between several signaling pathways and type 2 diabetes. The Hedgehog signaling pathway, activated by three ligands (Sonic hedgehog, Desert hedgehog, and Indian hedgehog), plays a critical role in various physiological processes, including embryonic development, tissue regeneration, and adult tissue homeostasis40. Historically recognized as a major oncogenic pathway associated with several human malignancies, it has primarily been studied in the context of cancer research41. Emerging evidence suggests that the Hedgehog signaling pathway can be activated under hyperglycemic conditions, functioning as a regulatory axis for lipid metabolism and insulin signaling42. Yao et al. demonstrated its involvement in insulin resistance by inhibiting PPAR-γ, highlighting the therapeutic potential of targeting the Hedgehog pathway for treating metabolic disorders characterized by insulin resistance43. Moreover, the Hedgehog signaling pathway has been shown to induce metabolic reprogramming toward a Warburg-like glycolytic state via the cilia-dependent Smo-Ca2+-Ampk axis. Activation of this axis promotes insulin-independent glucose uptake in muscle and brown adipose tissue, suggesting that rationally designed Hedgehog pathway modulators could offer innovative therapeutic approaches for obesity and diabetes. Additionally, multiple studies have linked this pathway to the development of diabetic complications, including diabetic nephropathy, retinopathy, neuropathy, and cardiovascular diseases40.

Overall, this study bridges gaps in current T2D biomarker research by linking β-cell specificity, immune correlation, and multi-cohort validation into a unified framework.

Several limitations should be acknowledged. First, while preliminary in vivo experiments were performed, a deeper exploration of the underlying mechanisms is still needed. Second, extending this research to pre-diabetic populations would strengthen the clinical applicability of the findings.

Conclusion

SLC2A2 was identified as a key gene of T2D through differential analysis, LASSO, and SVM-RFE. Notably, the enrichment of key biological processes such as the cAMP signaling pathway, insulin resistance, cytokine-cytokine receptor interaction, ECM-receptor interaction, and oxidative phosphorylation underscores the multifactorial nature of T2D, in which diverse molecular pathways converge to drive disease progression. Moreover, SLC2A2 correlated positively with resting CD4+ memory T cells. Further analysis showed that SLC2A2 is expressed mainly in beta cells. Finally, in vivo studies confirmed the reduced expression of SLC2A2 in T2D. This study provides a new perspective for the diagnosis and pharmaceutical development of T2D.