Introduction

Type 2 diabetes (T2D) is a multifaceted metabolic disorder marked by impaired insulin secretion or insulin resistance, leading to chronic hyperglycemia and disruptions in carbohydrate, lipid, and protein metabolism1. Sarcopenia (SA) is defined by the progressive decline in skeletal muscle mass, strength, and function2. The condition is more prevalent among individuals with chronic diseases3, with SA affecting 7% to 29.3% of older adults with T2D4,5, a rate two to three times higher than that observed in elderly individuals with normal blood glucose levels6,7. Additionally, 54.7% of T2D patients aged over 70 are at risk for SA8. Common manifestations in T2D patients with SA include muscle wasting, delayed nerve conduction, diminished maximal contraction strength in skeletal muscles, and a reduced contraction velocity. These alterations lead to physical limitations, such as restricted limb mobility, decreased physical agility, impaired balance, lowered metabolic rate, and an increased susceptibility to falls and fractures.

T2D and SA share multiple overlapping pathophysiological mechanisms, with metabolic dysregulation serving as a common underlying factor. Insulin resistance promotes muscle catabolism by diminishing insulin signaling, which aggravates muscle loss and strength decline9. Blood glucose fluctuations independently contribute to SA, driving reductions in muscle mass, grip strength, and walking speed9. Blood glucose fluctuations are independently associated with SA, which facilitates the reductions in muscle mass, grip strength, and walking speeds9. In T2D patients, factors such as poor glycemic control, prolonged disease duration, and the onset of additional systemic complications predict the onset of SA. Chronic inflammation and oxidative stress are integral to the pathogenesis of both conditions. Sustained hyperglycemia in T2D patients perpetuates a state of chronic inflammation, marked by increased secretion of inflammatory mediators, which heightens the risk of progressive muscle mass loss in the limbs10. Thus, the inflammatory microenvironment plays a significant role in the development of T2D-associated SA11. Additionally, molecules such as insulin, insulin-like growth factor, advanced glycation end-products (AGEs), and myostatin modulate muscle metabolism and function in both T2D and SA.

Existing research has significantly advanced our understanding of the distinct pathological mechanisms of T2D and SA. However, few studies address the cross-disease genes and their functional roles in both conditions. Current investigations primarily focus on gene expression within individual diseases, lacking a comprehensive approach to identify and characterize the shared molecular underpinnings of T2D and SA. This study aims to map the genetic overlap between T2D and SA using weighted gene co-expression network analysis (WGCNA) and protein–protein interaction (PPI) network analysis. Transcriptomic data from various publicly available databases were integrated to identify shared differentially expressed genes (DEGs) between T2D and SA. Network analysis was employed to identify key gene modules and central genes linked to both diseases. These findings highlight common molecular mechanisms between T2D and SA, offering potential targets for future research and clinical applications. This study enhances our understanding of the pathological processes underlying T2D and SA and suggests new avenues for therapeutic development.

Materials and methods

Expression profiling data screening

The following T2D- and SA-associated gene expression datasets were retrieved from the NCBI GEO database (http://www.ncbi.nlm.nih.gov/geo/) on July 6, 2024. Upon downloading the sequencing data, the datasets were normalized to remove genes with zero expression values. T2D-related datasets include: (1) GSE1873212, which contains 118 human muscle tissue samples, of which 92 were selected for analysis, consisting of 45 T2D patient samples and 47 control samples. The platform used for analysis was the Affymetrix Gene Chip Human Genome U133 Plus 2.0 Array. This dataset primarily addresses T2D disease; (2) GSE20229513,14: Out of 111 human muscle tissue samples, 38 baseline pre-prescription samples were chosen for analysis, including 21 T2D and 17 control samples. The Illumina HiSeq2500 (Homo sapiens) platform was employed for detection. This dataset serves as a validation set for T2D-related studies. SA-related datasets include: (1) GSE22615115: This dataset comprises 60 muscle tissue samples, with 40 selected for analysis, including 20 normal control samples and 20 sarcopenia patient samples. The Illumina HiSeq2500 (Homo sapiens) platform was utilized for detection. This dataset primarily addresses SA disease; (2) GSE142816: This dataset consists of 22 muscle tissue samples, including 10 control samples and 12 samples from myasthenia gravis patients. The Affymetrix Human Genome U133A Array was used for detection. This dataset serves as a validation set for SA-related studies.

Screening of significantly differentially expressed genes

DEGs between groups in the T2D and SA datasets were identified using the limma package (Version 3.34.7) in R (version 4.3.1)17 (https://bioconductor.org/packages/release/bioc/html/limma.html). A false discovery rate (FDR) of < 0.05 and fold change > 1.2 were used as criteria for filtering significant DEGs. Genes that met these criteria were considered significantly differentially expressed in both diseases. The expression profiles of the overlapping DEGs across disease groups were visualized through two-way hierarchical clustering, and a heatmap was generated using the pheatmap package (https://cran.r-project.org/package=pheatmap)18 in R version 4.3.119, providing a comprehensive overview of shared gene expression patterns.

WGCNA algorithm screens modules for significant correlation of disease states

WGCNA was employed to identify modules significantly associated with disease states. This approach constructs co-expression networks to reveal disease-related modules20 and potential therapeutic targets (https://cran.r-project.org/web/packages/WGCNA/index.html)21. Expression profiles from the combined T2D- and SA-related datasets were analyzed using the WGCNA package (version 1.61) in R (version 4.3.1). Modules were defined with a minimum size threshold of 100 genes, and a cut height threshold of 0.995 was applied for merging. Comparisons of the identified modules between T2D and SA revealed overlapping DEGs, which were integrated with previously identified DEGs to obtain a list of candidate genes common to both diseases. Functional annotation of these candidate genes was conducted through Gene Ontology (GO) and KEGG pathway enrichment analyses using DAVID (version 6.8) (https://david.ncifcrf.gov/)22,23, with a significance threshold of p < 0.05.

Construction of candidate gene PPI networks and identification of Hub genes

To examine PPIs among candidate genes, the STRING database (Version 11.0, http://string-db.org/)24 was utilized. The interaction network was visualized using Cytoscape (Version 3.9.0, http://www.cytoscape.org/)25. Hub genes within the network were determined using four topological analysis algorithms—MCC, MNC, Degree, and EPC—via the cytoHubba plugin (Version 0.1) in Cytoscape26. Genes consistently identified as significant across all four algorithms were retained as the final hub genes linked to both T2D and SA, serving as key molecular targets common to these conditions.

Construction of a disease diagnostic model based on optimal gene combinations

The expression levels of hub genes in the T2D-related dataset GSE18732 and SA-related dataset GSE226151 were analyzed using one-way logistic regression with the rms package (version 6.3–0, https://cran.r-project.org/web/packages/rms/index.html)27 in the R version 4.3.1. Genes yielding p-values < 0.05 in both datasets were retained for further investigation. Subsequently, the most significant gene combinations for both diseases were identified using the LASSO (Least Absolute Shrinkage and Selection Operator) algorithm in the lars package (version 1.2, https://cran.r-project.org/web/packages/lars/index.html)28 within R version 4.3.1. Based on these combinations, separate diagnostic classifiers for T2D and SA were developed using the Support Vector Machine (SVM) method with a Sigmoid kernel and tenfold cross-validation, utilizing the e1071 package (version 1.6–8, https://cran.r-project.org/web/packages/e1071)29. Model performance was assessed via receiver operating characteristic (ROC) curve analysis, performed with the pROC package (version 1.12.1, https://cran.r-project.org/web/packages/pROC/index.html)30 in R version 4.3.1. The optimal gene combinations for T2D and SA were extracted for diagnostic model construction and subsequently validated using the independent datasets GSE202295 for T2D and GSE1428 for SA. Gene expression levels and diagnostic efficacy were evaluated within these validation datasets to confirm model reliability and applicability.

Genetic correlation analysis of dual-disease associations

The optimal gene combinations for constructing disease classification models were analyzed to identify overlapping genes between T2D and SA. The relevance of these shared feature genes was then assessed for their discriminative capacity across both conditions. miRNAs associated with T2D and SA were extracted from the HMDDv4 database (http://www.cuilab.cn/hmdd)31. miRNAs common to both diseases were identified, and their associated target genes were predicted using the miRWalk3.0 database32.

Regulatory transcription factors (TFs) and their corresponding target gene relationships were retrieved from the TRRUST database (https://www.grnpedia.org/trrust/)33. Furthermore, disease-related factors influencing both T2D and SA were sourced from the 2024 update of the Comparative Toxicogenomics Database (http://ctd.mdibl.org/)34, which provided associations between genes and pharmacological molecules. The shared factors between both diseases and their corresponding genes were identified. By integrating miRNAs, TFs, and drug associations, a regulatory network for the key feature genes associated with both diseases was constructed and visualized using Cytoscape (Version 3.9.0).

Immune correlation analysis

The cellular immune microenvironment was assessed to determine immune cell proportions in diseased tissues. Immune infiltration analysis, conducted using CIBERSORT, quantified immune cell type distributions in samples from the T2D-associated GSE18732 and SA-associated GSE226151 datasets. Differences in immune cell distribution between the two diseases were evaluated using the Wilcoxon test in R (Version 3.6.1). Correlations between feature genes linked to each disease and differentially distributed immune cell types were also calculated.

Animals

Ten-week-old male homozygous diabetic db/db mice and nondiabetic littermate db/m mice were procured from Hangzhou Ziyuan Experimental Animal Technology Co., LTD. The db/db mice model was employed to study comorbid T2D and SA. Mice were maintained in a controlled environment with a 12‐h light/dark cycle and acclimatized for seven days prior to experimentation. Following acclimation, the mice were divided into two groups: db/m (n = 6) and db/db (n = 6). All animal experiments were approved by the Animal Experimental and Ethics Committee of the Second Affiliated Hospital of Anhui Medical University (NO: LLSC20220413) and performed in compliance with the guidelines of the Committee of Animal Care and Use of the Second Affiliated Hospital of Anhui Medical University and ARRIVE guidelines. Upon completion of the study, blood samples were collected, and muscle tissues were harvested for subsequent analysis. Muscle tissues were either fixed in 4% phosphate-buffered formaldehyde for histological evaluation or snap-frozen for molecular analysis. Mice were euthanized at the study’s conclusion using isoflurane.

Histologic evaluation

Muscle tissues were fixed in 4% paraformaldehyde, embedded in paraffin, sectioned into 5 μm slices, and stained with hematoxylin and eosin (H&E) for histological analysis. Muscle injury was assessed by a blinded pathologist using a previously established protocol35. Cross-sectional images were captured with an Olympus BX41 microscope, and muscle fiber cross-sectional areas were measured using Image-Pro Plus 6.0 software.

Quantitative real-time PCR

Total RNA was extracted from muscle tissues using Trizol reagent (Thermo Fisher Scientific, Waltham, USA). Reverse transcription was performed with the HyperScript™ III 1st Strand cDNA Synthesis Kit (NovaBio, Shanghai, China). Real-time PCR primers used were as follows: Bdh1: sense 5′‐AGGCTGTGACTCTGGATTTGGG‐3′, antisense 5′‐CTGGATGGTTCTCAGTCGGTCA‐3′; Fgf9: sense 5′‐ATAAACATGTGGACACCGGAAG‐3′, antisense 5′‐ATAGTTCAGGTACTTTGTCAGGGTC‐3′; Ldha: sense 5′‐AGTTGTTGGGGTTGGTGCTG‐3′, antisense 5′‐TTGGTGTTTTAAGGAAGAGGCTG‐3′; Gapdh: sense 5′‐AGTGGCAAAGTGGAGATT‐3′, antisense 5′‐GTGGAGTCATACTGGAACA‐3′. Real-time PCR was conducted using the S6 Universal SYBR qPCR Mix (NovaBio, Shanghai, China), and gene expression changes were quantified using the comparative CT (2−ΔΔCt) method, with Gapdh as the normalization control.

Western blotting

Total protein was extracted from homogenized soleus muscle samples using RIPA lysis buffer containing protease inhibitors. Protein concentrations were determined by BCA assay prior to analysis. Protein samples were separated on 10% SDS polyacrylamide gels and transferred to PVDF membranes (Millipore, Billerica, MA, USA). The membranes were blocked with 5% bovine serum albumin for 1 h at room temperature and incubated overnight at 4 °C with primary antibodies against BDH1 (Proteintech, cat number: 15417-1-AP, 1:1000), FGF9 (Affinity Biosciences, cat number: DF9532, 1:1000), and LDHA (Affinity Biosciences, cat number: DF6280, 1:1000), or GAPDH (Affinity Biosciences, cat number: AF7021, 1:5000). Following washes, the blots were incubated for 1 h at room temperature with an HRP-conjugated secondary antibody and developed using enhanced chemiluminescence reagents (Millipore).

Statistical analysis

Data are expressed as mean ± SEM. Comparisons between two groups were made using an unpaired t-test, while one-way ANOVA followed by Tukey’s test was used for multiple-group comparisons. Statistical significance was set at P < 0.05. Graphs and figures were generated with GraphPad Prism 8.0.

Results

Identification of intersecting DEGs between T2D and SA

Gene expression data were obtained from the T2D-associated GSE18732 and SA-associated GSE226151 datasets. Differentially expressed genes (DEGs) were identified using the limma package in the T2D (45 samples vs. 47 controls) and SA (20 samples vs. 20 controls) datasets, resulting in 330 and 1,054 DEGs, respectively. Volcano plots illustrating the DEG distributions were presented in Fig. 1A. Analysis of the two DEG datasets revealed 50 common genes (Fig. 1B). The expression data for these 50 genes were then extracted from both the GSE18732 and GSE226151 datasets. Expression patterns across different sample types were visualized in a heatmap in Fig. 1C.

Fig. 1
figure 1

Identification and comparison of DEGs in T2D- and SA-related datasets. (A) Volcano plots display the distribution of DEGs based on effect size (log2FC) and statistical significance (− log10FDR) for T2D vs. CTRL and SA vs. CTRL comparisons. Red and blue markers indicate significantly upregulated and downregulated DEGs, respectively. The horizontal dashed line represents the FDR threshold of 0.05, while vertical dashed lines define the ± 1.2-fold change cutoff. (B) Venn diagram depicting the overlap of DEGs between T2D vs. CTRL and SA vs. CTRL groups. (C) Heatmap showing expression patterns of shared DEGs across T2D- and SA-related datasets. T2D, type 2 diabetes; SA, sarcopenia; CTRL, control; DEG, differentially expressed gene; FDR, false discovery rate; FC, fold change.

Identification of disease-associated modules using WGCNA

Subsequent analysis of the T2D-related dataset GSE18732 and the SA-related dataset GSE226151 was conducted to explore the expression patterns of all identified genes. As illustrated in Fig. 2A, B, E, and F, the power parameter values were set to 4 and 8, corresponding to the point at which the square of the correlation coefficient first reached 0.9. At these values, the resulting co-expression networks exhibited a mean node connectivity of 1, reflecting the small-world network property. Dissimilarity coefficients between nodes were calculated, followed by hierarchical clustering to produce dendrograms. The minimum gene count per module was established at 100, with a pruning height of cutHeight = 0.995. As shown in Fig. 2C and G, this analysis identified 8 modules in the T2D-related dataset and 10 modules in the SA-related dataset. The correlation between each identified module and the sample phenotype (disease/control) was then computed (Fig. 2D and H). Modules with an absolute correlation coefficient greater than 0.3 with the disease phenotype were retained as relevant. In the T2D-related dataset, the blue and turquoise modules were identified, containing a total of 2,203 genes. In the SA-related dataset, the black, brown, and turquoise modules were retained, comprising 2,020 genes in total.

Fig. 2
figure 2

Selection of network parameters, module identification, and functional enrichment analysis in T2D- and SA-related datasets. (A) Scale-free topology model fit for T2D samples, depicting the relationship between the soft-thresholding power (x-axis) and the squared correlation coefficient of log(k) vs. log(p(k)) (y-axis). The red line marks the threshold at which the correlation coefficient square reaches 0.9. (B) Mean gene connectivity at various power settings in T2D samples, with the red line indicating the point where average node connectivity equals 1, corresponding to the selected power in panel (A). (C) Gene dendrogram for T2D samples, with modules identified by distinct colors based on hierarchical clustering. (D) Module-trait heatmap in T2D, quantifying correlations between each module and the phenotype (disease/control). (E) Scale-free topology model fit for SA samples, showing the squared correlation coefficient of log(k) vs. log(p(k)) across increasing power values. The red line indicates the 0.9 threshold used for power selection. (F) Mean gene connectivity across power settings in SA, with the red line denoting the point where average node connectivity equals 1, as determined in panel (E). (G) Gene dendrogram for SA samples, with modules labeled by color to represent distinct co-expression clusters. (H) Heatmap of module-trait correlations in SA, indicating the degree of association between individual modules and disease phenotype. (I) Intersection plot of genes within phenotype-associated modules and previously identified DEGs across T2D and SA, highlighting shared candidates. (J) Functional enrichment results for 30 core candidate genes, illustrating significantly enriched GO biological processes and KEGG pathways. T2D, type 2 diabetes; SA, sarcopenia.

The genes within the two modules identified in the T2D dataset and the three modules identified in the SA dataset were subsequently compared with the DEGs significantly expressed in both diseases. This comparison revealed four overlapping sections: T2D–turquoise, SA–turquoise, T2D–blue, SA–black, and SA–brown modules, involving a total of 30 genes (Fig. 2I). These 30 key candidate genes associated with both diseases were further analyzed for GO functional and KEGG pathway enrichment using DAVID36,37,38. The enrichment analysis identified 15 significantly enriched GO biological processes and eight KEGG pathways (Fig. 2J). The analysis of the T2D- and SA-related datasets led to the identification of disease-associated gene modules and 30 key candidate DEGs in both diseases. These genes exhibited notable functional enrichment in GO biological processes and KEGG pathways, implicating their potential roles in the progression of T2D and SA.

Construction of PPI network and identification of hub genes

  1. (1)

    Construction of the PPI network

PPIs among the 30 candidate genes identified in Step 2 were extracted from the STRING database, retaining only those interactions with a confidence score above 0.4. This threshold resulted in a network comprising 38 interaction pairs, which was subsequently used to construct the PPI network (Fig. 3A).

Fig. 3
figure 3

Visualization of the PPI network of candidate genes and identification of hub genes. (A) Interaction network illustrating the relationships among 30 candidate genes. Each node represents a protein labeled with its corresponding gene name. Red nodes indicate proteins upregulated in both T2D and SA, whereas blue nodes represent those consistently downregulated across the two conditions. (B) Venn diagram comparing the top 50 hub genes identified by four distinct topological algorithms: MCC, MNC, DEGREE, and EPC. The overlapping regions highlight genes concurrently selected by multiple algorithms, each represented by a specific color. T2D, type 2 diabetes; SA, sarcopenia; MCC, maximal clique centrality; MNC, maximum neighborhood component; EPC, edge percolated component.

  1. (2)

    Identification of hub genes using topological analysis of the PPI network

Hub genes were determined from the PPI network using the cytoHubba plugin in Cytoscape 3.9.0, based on four topological analysis algorithms: MCC, MNC, DEGREE, and EPC. The top 15 genes from each algorithm, ranked by score, were selected as candidate hub genes. A comparison of these top 15 genes across all four algorithms (Fig. 3B) revealed 14 intersecting genes: LDHA, EHHADH, ALDH6A1, PTPRC, BDH1, CKB, FGF9, TSLP, MLXIPL, CHST15, CDKN2C, ARL4C, HYAL1, and C5AR1. These 14 genes were identified as hub genes associated with both T2D and SA (Supplementary Table 1). They represent key candidates for investigating shared pathogenic mechanisms in both diseases.

Construction of risk assessment models for T2D and SA

  1. (1)

    Selection of hub genes

Univariate logistic regression was performed on the expression profiles of the 14 candidate hub genes identified in Step 3 using the T2D-related GSE18732 and SA-related GSE226151 datasets. In the T2D dataset, 13 genes and in the SA dataset, 10 genes were identified with p-values less than 0.05 (Fig. 4A, C, Supplementary Table 2). Subsequently, LASSO algorithm analysis was applied to each dataset to refine the gene sets for each disease, with the parameter selection depicted in Fig. 4B, D. This analysis identified seven hub genes for T2D (CDKN2C, BDH1, FGF9, C5AR1, LDHA, TSLP, and ARL4C) and six hub genes for SA (BDH1, FGF9, EHHADH, CKB, LDHA, and PTPRC) (Supplementary Table 3).

Fig. 4
figure 4

Development and validation of risk assessment models for T2D and SA. (A) Forest plot depicting the outcomes of univariate logistic regression in T2D samples. The vertical line at x = 1 denotes the null hypothesis (hazard ratio = 1). Each horizontal line segment represents the 95% confidence interval for an individual effect estimate, while square markers reflect study weights, with larger markers indicating greater statistical influence. (B) LASSO regression plot illustrating parameter tuning for optimal gene selection in T2D. The selected penalty parameter corresponds to log(λ) = –2.71. The coefficient path plot tracks regularization paths for all predictors, with each colored line representing an individual variable, annotated by its identifier. The y-axis shows coefficient magnitudes, while the lower x-axis presents log(λ) values and the upper x-axis displays the count of non-zero coefficients retained at each λ. The vertical dashed line indicates the optimal λ yielding the minimum cross-validated MSE. (C) Forest plot presenting univariate logistic regression results for SA samples. (D) LASSO plot demonstrating parameter optimization for gene set selection in SA, with the optimal log(λ) value identified as –3.12. E. ROC curve and gene expression profile of the seven hub genes in the T2D training dataset. The x-axis indicates the false positive rate (1-specificity), and the y-axis represents the true positive rate (sensitivity). (F) ROC curve and gene expression profile of the six hub genes in the SA training dataset. (G) ROC curve and gene expression profile of the seven hub genes in the T2D validation dataset. (H) ROC curve and gene expression profile of the six hub genes in the SA validation dataset. *p < 0.05; **p < 0.01; ***p < 0.005. T2D, type 2 diabetes; SA, sarcopenia.

  1. (2)

    Construction and validation of risk assessment models for T2D and SA

Expression levels of the seven hub genes for T2D and six hub genes for SA were first visualized (Fig. 4E,F). Individual risk classification models were then constructed based on gene sets from the T2D-related GSE18732 dataset and the SA-related GSE226151 dataset. Model performance was evaluated using ROC curve analysis (Fig. 4E,F). As shown in Fig. 4E, the T2D risk model achieved high predictive accuracy on the GSE18732 dataset (AUC = 0.944). Significant differences in the expression of the seven selected genes between T2D patients and controls (P < 0.005) indicated a strong association of these genes with T2D pathogenesis. Similarly, the SA model on the GSE226151 dataset demonstrated high predictive accuracy (AUC = 0.940; Fig. 4F), reflecting robust performance in distinguishing SA patients from controls. Expression differences in the six selected genes between SA patients and controls (P < 0.005) suggested their potential involvement in SA pathogenesis.

The hub genes were further validated in the T2D-related GSE1428 and SA-related GSE202295 datasets. Expression levels and classification performance were consistent across these validation datasets (Fig. 4G,H). The expression patterns of the seven T2D hub genes in GSE1428 mirrored those in the training dataset GSE18732. Similarly, the six SA hub genes displayed expression patterns in GSE202295 that were consistent with those observed in GSE226151 (Supplementary Table 4).

Univariate logistic regression and LASSO-selected hub genes showed consistent expression patterns across both training and validation datasets. The resulting risk assessment models for T2D and SA demonstrated excellent performance, confirming their reliability as diagnostic biomarkers.

Identification and analysis of cross-disease hub genes

  1. (1)

    Selection of hub genes related to both T2D and SA

A comparison of the seven T2D-related hub genes with the six SA-related hub genes identified three common genes: BDH1, FGF9, and LDHA. The expression profiles of these genes were consistent across both diseases: BDH1 and FGF9 were significantly downregulated in diseased tissues, while LDHA was significantly upregulated. The ability of these genes to distinguish each disease from control was assessed for both conditions (Fig. 5A,B). Thus, BDH1, FGF9, and LDHA were established as key hub genes associated with both T2D and SA.

Fig. 5
figure 5

Regulatory network of cross-disease genes in T2D and SA. (A) ROC curves evaluating the diagnostic performance of the three overlapping genes in the T2D dataset. (B) ROC curves evaluating the diagnostic performance of the three overlapping genes in the SA dataset. (C) Integrated regulatory network constructed from the three shared hub genes. Green squares denote miRNAs, orange diamonds indicate transcriptional regulators, and yellow triangles represent disease-associated factors. T2D, type 2 diabetes; SA, sarcopenia.

  1. (2)

    Network construction of cross-disease hub genes in T2D and SA

Disease-associated miRNAs were retrieved from the HMDD v4 database, followed by extraction of those specifically linked to T2D and SA, resulting in 201 and 32 miRNAs, respectively. The intersection yielded 23 miRNAs implicated in both conditions. Predicted target genes of these shared miRNAs were obtained using the miRWalk 3.0 database, retaining only those regulating the three hub genes shared between T2D and SA. This filtering produced a final miRNA–mRNA interaction network comprising 28 pairs.

Transcription factor–gene interaction data were sourced from the TRRUST database. Interactions involving the three shared hub genes were selected, identifying 12 transcription factors with co-regulatory activity across all three genes.

Disease-specific associations for the three hub genes were initially queried in the CTD database, with results presented in Table 1. Further searches using “Type 2 diabetes” and “Sarcopenia” as keywords identified overlapping disease-related agents, including drugs and environmental exposures. Comparative analysis revealed six shared factors—bisphenol A, perfluorooctane sulfonic acid, atrazine, folic acid, zinc, and particulate matter—associated with both diseases. Gene–chemical interaction data were then extracted, and 14 relevant pairs involving these shared factors and the three hub genes were selected for integrative analysis.

Table 1 CTD data on the three hub genes associated with both T2D and SA.

Finally, an integrated regulatory network was constructed to delineate the interactions between the three cross-disease hub genes (BDH1, FGF9, and LDHA) and their upstream regulators, including miRNAs, transcription factors, and associated compounds. This network (Fig. 5C) provides additional insight into the multilayered regulatory architecture involving BDH1, FGF9, and LDHA in the context of T2D and SA pathogenesis.

Assessment of cell proportion and gene expression correlations in T2D and SA

Immune cell composition in the T2D-associated GSE18732 dataset and the SA-related GSE226151 dataset was quantified using CIBERSORT. Comparative analysis of immune cell distributions between disease and control groups within each dataset revealed distinct alterations (Fig. 6A,B). In T2D, significant differences were observed in resting CD4+ memory T cells and regulatory T cells (Tregs), while in SA, activated mast cells and neutrophils exhibited pronounced shifts. Subsequently, the expression levels of three hub genes shared across both conditions were analyzed in relation to immune cell types exhibiting differential abundance (Fig. 6C). BDH1 expression demonstrated significant associations with Tregs (Fig. 6D), mast cells (Fig. 6E), and neutrophils (Fig. 6F). LDHA expression also correlated with mast cell proportions (Fig. 6G).

Fig. 6
figure 6

Immune correlation analysis of cross-disease hub genes in T2D and SA. (A) Immune cell composition within the T2D dataset. (B) Immune cell composition within the SA dataset. (C) Correlation analysis between each of the three shared hub genes and differentially abundant immune cell populations in both datasets. (DF) Scatter plots depicting the association between BDH1 expression and Treg cells (D), mast cells (E), and neutrophils (F). (G) Scatter plot showing the association between LDHA expression and mast cells. *p < 0.05; **p < 0.01; ***p < 0.005. T2D, type 2 diabetes; SA, sarcopenia.

Core biomarkers validation in diabetes‐induced muscle atrophy of db/db mice

To evaluate glycemic alterations, tail vein blood glucose levels were measured in db/db mice, revealing a marked elevation compared to db/m controls (Fig. 7A). Histological analysis of soleus muscle following fixation, sectioning, and HE staining showed a significant reduction in cross-sectional area in db/db mice relative to age-matched db/m mice (Fig. 7B and C). In addition, mRNA and protein expression levels of BDH1, FGF9, and LDHA in the soleus muscle differed significantly between db/db and db/m groups (Fig. 7D and E). These results indicate that BDH1, FGF9, and LDHA may serve as key biomarkers implicated in the pathophysiology of both T2D and SA.

Fig. 7
figure 7

Validation of core biomarkers in diabetes-induced muscle atrophy using db/db mice. (A) Blood glucose levels measured in each group (n = 6). (B) Representative hematoxylin–eosin (H&E) stained images of muscle tissues (scale bars: 100 μm and 50 μm) (n = 6). (C) Quantification of the mean cross-sectional area of soleus muscles. (D) Relative mRNA expression levels of Bdh1, Fgf9, and Ldha in soleus muscles. (E) Corresponding protein expression levels of BDH1, FGF9, and LDHA in soleus muscles. *p < 0.05; **p < 0.01; ***P < 0.001, *comparison between the db/db group and control db/m group.

Discussion

T2D and SA exhibit mutual risk amplification, yet the molecular basis underlying their interaction remains unclear. In this study, 30 candidate genes associated with both conditions were identified, displaying significant enrichment in GO biological processes and KEGG pathways. Among them, 14 were classified as hub genes due to their strong associations with both T2D and SA. Further analysis identified three cross-disease hub genes—BDH1, FGF9, and LDHA—through comparison with disease-specific hubs. The diagnostic genes and molecular mechanisms uncovered in this analysis provide a preliminary framework for advancing research into diagnostic biomarkers and therapeutic strategies targeting T2D and SA. Additionally, experimental validation using a T2D with SA animal model confirmed consistent expression patterns of these genes with the transcriptomic analysis.

Notably, 3-hydroxybutyrate dehydrogenase 1 (BDH1), a mitochondrial enzyme localized to the inner mitochondrial membrane, plays a central role in sustaining energy metabolism and homeostatic balance. Evidence indicates that BDH1 overexpression or β-hydroxybutyrate (βOHB) supplementation mitigates cellular injury. Activation of the BDH1-dependent βOHB metabolic cascade enhances flux through the βOHB–acetoacetate (AcAc)–succinate–fumarate pathway and concurrently activates NRF2, thereby enhancing the cellular antioxidant response. In diabetic kidney disease (DKD) models, elevated renal BDH1 expression reversed pathological features such as fibrosis, inflammation, and apoptosis, supporting its therapeutic relevance in DKD management39. In diabetic cardiomyopathy (DCM), BDH1 activity is significantly downregulated; however, treatment with the SGLT2 inhibitor empagliflozin substantially restored BDH1 function, improving ketone body metabolism and offering a potential intervention pathway for DCM40. The primary role of BDH1 involves catalyzing the reversible interconversion between βOHB and AcAc, a key reaction within the ketone body metabolic cycle. Under conditions of increased energy demand or glucose deprivation, such as during starvation, BDH1 expression is significantly elevated to promote βOHB oxidation, thereby providing an alternative energy source for muscle cells41. This shift reduces dependence on limited glycogen reserves, conserving them for sustained or high-intensity physical exertion. Additionally, BDH1 contributes to maintaining ketone homeostasis, ensuring a continuous energy supply to muscle tissue across diverse metabolic states. By supporting immediate energetic demands, BDH1 helps preserve muscle mass and functional capacity42, suggesting its therapeutic potential in mitigating muscle atrophy and enhancing muscle regeneration. Evidence also indicates that ketogenic diets may support improvements in muscle mass and strength43. In SA, disruptions in energy metabolism frequently impair protein synthesis and accelerate proteolysis, enhancing muscle wasting. Modulating BDH1 activity or supplementing exogenous βOHB may reestablish ketone balance, attenuate metabolic stress, and protect muscle structure and function.

Mitochondria play a central role in sustaining cellular energy homeostasis, and their impairment is closely associated with the development of T2D and SA. In SA models, dysregulation of BDH1 activity may disrupt ketone metabolism and mitochondrial energy output, thereby affecting muscle integrity and regenerative capacity. Cui et al. identified BDH1 as a potential mediator linking mitochondrial dysfunction to SA progression through integrated bioinformatics and experimental analyses44. Similarly, BDH1 has been implicated in T2D-related mitochondrial impairment, as demonstrated in the study by Kritika Sharma45. Mitochondrial functionality remains essential for preserving cellular stability, particularly in muscle tissue. Consequently, aberrant BDH1 expression or function may underlie the concurrent pathophysiology of T2D and SA.

Fibroblast growth factor 9 (FGF9), a polypeptide growth factor, plays a regulatory role in multiple cellular processes, including proliferation, differentiation, migration, and survival. In pancreatic β-cells, FGF9 activates the PI3K/Akt signaling cascade, thereby reducing apoptosis and enhancing pancreatic function. Additionally, FGF9 modulates hepatic lipid metabolism, attenuates lipid accumulation, and restores metabolic homeostasis, contributing to improved insulin sensitivity and mitigation of hepatic steatosis46. Its anti-inflammatory capacity is evidenced by the promotion of M2 macrophage polarization, which helps limit adverse cardiac remodeling following myocardial infarction in diabetic models47. FGF9 signaling is also essential for cardiomyocyte development; myocardial FGF9 deficiency has been linked to ventricular hypoplasia48. In the context of SA, FGF9 exhibits distinct functional properties. Individuals with SA typically show reduced myoblast numbers and impaired proliferative and differentiation potential, which impairs muscle regeneration. FGF9 suppresses myogenic differentiation while promoting myoblast proliferation, supporting muscle repair and regenerative processes49. Huang et al. demonstrated FGF9 expression in C2C12 myoblasts, myotubes, and mouse SOL and EDL muscles, where it inhibited myogenin expression, thereby impeding terminal differentiation and favoring proliferative expansion of C2C12 myoblasts50. Identified as a shared gene between T2D and SA, FGF9 may serve as a molecular link between the two conditions. This association highlights a promising avenue for therapeutic development targeting overlapping pathophysiological mechanisms in T2D with comorbid SA.

Lactate dehydrogenase A (LDHA), a specific subunit of lactate dehydrogenase (LDH), exhibits predominant expression in human pancreatic α-cells, with minimal presence in β-cells51. Limited LDHA expression in β-cells serves to restrict excessive insulin secretion in response to elevated circulating pyruvate or lactate during intense physical activity or stress, thereby reducing the risk of hyperinsulinemia and hypoglycemia52. LDHA expression undergoes dynamic modulation throughout pancreatic islet development; its repression in β-cells intensifies from the fetal stage to adulthood, inversely correlating with β-cell maturation and the enhancement of insulin secretory function53. In contrast, α-cells maintain significantly higher LDHA expression in adulthood. High-glucose conditions further enhance LDHA mRNA and protein levels in α-cells, accompanied by increased lactate release54. These observations emphasize LDHA’s regulatory role in islet cell lineage specification and its involvement in modulating metabolic stress responses.

However, LDHA expression is significantly elevated in the diabetic state, particularly within pancreatic β-cells of individuals with T2D, where it is strongly linked to impaired insulin secretion and diminished glucose oxidation capacity55. Similar expression patterns have been consistently observed in pancreatic β-cells across multiple diabetic rat models56,57,58, as well as in pancreatic tissues from T2D patients. Experimental studies have shown that administration of the LDHA inhibitor oxamate in obese diabetic db/db mice results in notable improvements in metabolic parameters, including reductions in fasting blood glucose and enhancements in insulin sensitivity and secretion. Additionally, oxamate treatment contributes to the restoration of normal islet morphology59, supporting the therapeutic relevance of LDHA inhibition in diabetes management. LDHA also plays a key role in maintaining muscle integrity60. As a central glycolytic enzyme, its downregulation impairs ATP synthesis in skeletal muscle, with a significant impact on energy metabolism in type 2 muscle fibers61. In contrast, research by Zhang et al. utilizing a chicken muscle antibody array demonstrated that increased LDHA expression suppressed myoblast proliferation and differentiation. This suppression occurs through enhanced glycolytic flux and tricarboxylic acid cycle activity, accompanied by inhibition of oxidative phosphorylation in the electron transport chain, collectively supporting energy metabolism while modulating muscle development and regeneration62. Sanchez et al. further reported that LDHA inhibitors may preferentially enhance insulin secretion without directly affecting glucagon release, presenting a refined therapeutic strategy for T2D. Elevated LDHA activity contributes to excessive lactate accumulation, driving lactylation modifications that exacerbate insulin resistance and disturb the balance between muscle protein synthesis and degradation. These disruptions ultimately impair muscle mass and function63. Precision targeting of LDHA activity represents a viable strategy to improve both pancreatic performance and muscle integrity in T2D, thereby contributing to more favorable metabolic outcomes.

Therapeutic investigations focusing on BDH1-modulating natural extracts, FGF9-based biologics, and isoform-selective LDHA inhibitors demonstrate considerable potential for revolutionizing diagnostic and therapeutic approaches in metabolic disorders. A recent study demonstrated that Nanoselenium-conjugated Siraitia grosvenorii (NSG) modulates the upregulation of ketone body biosynthesis genes (particularly BDH1), offering a potential intervention strategy against endocrine disruptor-induced obesity64. Complementary research revealed that apigenin and chrysin exhibit hypocholesterolemic and ketogenic properties in mouse embryonic fibroblasts through transcriptional upregulation of Bdh1 and Slc16a665. In biotechnological developments, high-yield fibroblast growth factor 9 (FGF9) production strains were successfully isolated via emulsion-based cultivation of Corynebacterium glutamicum, with subsequent protein secretion analysis showing a threefold increase in FGF9 output compared to parental strains66. Regarding therapeutic development, systematic evaluation of existing lactate dehydrogenase (LDH) inhibitors has identified promising isoform-selectivity profiles67 and preclinical performance68, suggesting strong translational potential for these biomarker-targeted compounds.

Immune correlation analysis revealed that elevated BDH1 expression was positively associated with increased Treg and mast cell infiltration, while exhibiting an inverse relationship with neutrophil abundance. LDHA expression also demonstrated a negative correlation with mast cell activation. Tregs modulate immune responses via direct cell–cell contact and secretion of anti-inflammatory cytokines. In murine models of obesity and T2D, a marked reduction in Treg populations within visceral adipose tissue has been observed69. Induction of Tregs mitigated inflammatory responses, attenuated autoimmunity, improved insulin sensitivity, and reduced end-organ damage70. Moreover, Tregs accumulate at sites of muscle injury and contribute to muscle homeostasis and regeneration through IL-6 receptor α–mediated signaling71. Mast cells release proinflammatory mediators that can aggravate pancreatic injury and impair islet function72, yet also participate in skeletal muscle remodeling by promoting myogenic proliferation and differentiation via cytokine and growth factor signaling in coordination with other immune cells73. In contrast, neutrophils impair pancreatic β-cell function through the release of proinflammatory factors, exacerbating insulin resistance and advancing T2D pathology74. Additionally, neutrophil-derived superoxide contributes to muscle membrane damage, and deficiencies in neutrophil recruitment can hinder muscle regeneration and accelerate SA progression75. Overall, disruption of immune cell homeostasis plays a central role in driving disease onset and progression.

Despite extensive research on the association between T2D and SA, limited evidence exists regarding shared core biomarkers. The current study contributes to understanding potential molecular mechanisms underlying T2D with SA and predicts relevant signaling pathways and immune cell profiles. However, several limitations must be acknowledged. The analysis is based on two datasets comprising 66 T2D and 64 control samples, and 32 SA and 30 control samples, respectively. This sample size may constrain the reliability of WGCNA, PPI network construction, and immune infiltration analysis. Additionally, the datasets are generated using different platforms, including microarray, high-throughput sequencing, and gene chip technologies, which may introduce variability and confound cross-sectional comparisons.

Conclusion

In summary, this study, integrating bioinformatics analysis with experimental validation, identified BDH1, FGF9, and LDHA as diagnostic biomarkers linked to both T2D and SA. The intersecting genes exhibited significant enrichment in multiple biological processes and signaling pathways. Cross-disease hub genes showed marked correlations with distinct immune cell distributions, suggesting immunological involvement in the shared disease mechanisms. These biomarkers hold potential for improving early diagnostic precision and informing therapeutic strategies, with implications for optimizing clinical outcomes and patient quality of life in T2D complicated by SA. The results contribute to a deeper understanding of the overlapping pathogenic mechanisms and offer prospective targets for integrated diagnostic and treatment frameworks.