Introduction

Lung cancer is one of the most common cancers worldwide, with the highest incidence and mortality rates among all cancers1,2. Non-small cell lung cancer (NSCLC), accounting for approximately 85% of all lung cancer cases, represents the predominant pathological type3. Lung adenocarcinoma (LUAD) is the most frequent histological subtype of NSCLC, comprising around 40% of newly diagnosed lung cancer cases4. The complex biological characteristics and high degree of heterogeneity of LUAD pose significant challenges for its treatment and prognostic prediction. Currently, clinical treatment strategies include surgical resection, chemotherapy, radiotherapy, molecular targeted therapy5, and immunotherapy6. Although these approaches have markedly improved the prognosis of LUAD patients7, issues such as drug resistance, adverse side effects, and the lack of clear patient selection criteria still limit the overall clinical benefits8.

The prognosis of LUAD patients remains generally poor, with an overall 5-year survival rate of only 10%–15%9. Studies have shown that tumor heterogeneity is a key factor affecting treatment efficacy and prognosis10, manifested across multiple levels including genomic alterations, individual genetic variability11, and dynamic changes within the tumor microenvironment12. While existing prognostic evaluation systems, such as the Tumor-Node-Metastasis (TNM) staging system and molecular subtyping13, provide important guidance for clinical decision-making, they still have significant limitations in predictive accuracy, individualized application, and dynamic monitoring of the tumor microenvironment. Thus, there is an urgent need to develop more precise and comprehensive prognostic assessment systems.

Mitochondria, as central organelles for cellular energy metabolism, play essential roles in a variety of critical biological processes, including Ca²⁺ signaling, reactive oxygen species (ROS) homeostasis, regulation of apoptosis and cell proliferation, and lipid transport14. Recent studies have revealed that mitochondrial dysfunction is closely associated with the initiation and progression of multiple cancer types15. For example, enhanced mitochondrial biogenesis has been shown to promote tumorigenesis by altering glycolytic metabolism in gastric and colorectal cancers16; heterogeneity of mitochondrial oxidative phosphorylation (OXPHOS) function has been linked to immunotherapy resistance in triple-negative breast cancer17; and in LUAD, KRAS mutations can increase mitochondrial DNA copy number, promoting tumor progression through enhanced oxidative phosphorylation18, while targeting mitochondrial transcription elongation factor (TEFM) has been shown to effectively inhibit tumor growth19. Moreover, mitochondria-mediated programmed cell death signatures have been implicated in prognostic prediction and immunotherapy response evaluation in LUAD20. These findings provide novel insights for the development of mitochondria-targeted therapeutic strategies. However, the impact of mitochondrial gene expression patterns on LUAD progression and prognosis remains poorly understood, limiting their clinical application.

To better predict the prognosis of LUAD, this study conducted a molecular classification of patients based on mitochondrial-related gene expression profiles. Utilizing publicly available datasets, LUAD patients were stratified into distinct subtypes according to the expression patterns of these genes. We systematically evaluated the differences among these subtypes in terms of gene expression, metabolic pathways, immune microenvironment, and clinical outcomes, and further conducted detailed investigations of cellular heterogeneity at single-cell resolution. Additionally, to gain insights into potential therapeutic responses, we assessed the sensitivity of various chemotherapeutic agents across these subtypes. Collectively, the molecular classification system established in this study may provide a novel theoretical basis for precision therapy in LUAD.

Materials and methods

Data acquisition and processing

A cohort comprising a total of 515 LUAD patients was enrolled in this investigation. Their RNA sequencing data, somatic mutation profiles, and corresponding clinical data were procured from The Cancer Genome Atlas (TCGA, https://portal.gdc.cancer.gov/), accessed via the TCGAbiolinks21 package (version 2.30.4) in the R programming environment. Clinical and transcriptomic data from an additional LUAD dataset, GSE3121022, were retrieved from the Gene Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/gds/) database using the GEOquery23 package (version 2.70.0) in the R programming environment. A total of 1,136 mitochondria-related genes, encoding proteins with strong support for mitochondrial localization, were obtained from the MitoCarta 3.024 database (https://www.broadinstitute.org/mitocarta/mitocarta30-inventory-mammalian-mitochondrial-proteins-and-pathways). Six gene sets related to mitochondrial function, five gene sets related to inflammatory pathways, and eight gene sets associated with immune evasion were obtained from the Molecular Signatures Database (http://www.gsea-msigdb.org/gsea/msigdb/index.jsp)25.

Univariate Cox regression analysis was performed on a total of 59,423 genes derived from TCGA-LUAD tumor samples to identify genes significantly associated with prognosis. Based on a selection criterion of p < 0.01, 5,567 genes were identified. By intersecting these genes with 1,136 mitochondria-related genes, 259 prognostic mitochondria-related genes (PMGs) were further determined. The gene symbols of the PMGs were subsequently converted to ENTREZ IDs using the clusterProfiler26 R package (version 4.10.1), and Gene Ontology (GO) enrichment analysis was conducted using the enrichGO function. The enrichment results were visualized with the ggplot227 package (version 3.5.1).
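The screening step above can be sketched in Python as a one-covariate Cox model fitted by Newton–Raphson on the partial likelihood (Breslow handling of tied event times), with a two-sided Wald p-value, followed by intersection with the mitochondrial gene set. This is a minimal illustration under stated assumptions, not the R implementation used in the study; the function names and the dict-of-lists expression layout are hypothetical.

```python
import math


def _cox_score_info(beta, times, events, x):
    """Score and observed information of the one-covariate Cox partial
    log-likelihood at `beta` (Breslow convention for tied event times)."""
    score, info = 0.0, 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # censored subjects only appear in risk sets
        risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i]
        w = [math.exp(beta * x[j]) for j in risk_set]
        s0 = sum(w)
        mean_x = sum(wj * x[j] for wj, j in zip(w, risk_set)) / s0
        mean_x2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk_set)) / s0
        score += x[i] - mean_x
        info += mean_x2 - mean_x ** 2  # weighted variance of x in the risk set
    return score, info


def univariate_cox(times, events, x, max_iter=50, tol=1e-10):
    """Return (beta, hazard ratio, two-sided Wald p-value) for one gene."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = _cox_score_info(beta, times, events, x)
        if info < 1e-12:  # constant expression carries no information
            return 0.0, 1.0, 1.0
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    _, info = _cox_score_info(beta, times, events, x)
    z = beta * math.sqrt(info)  # beta / SE, with SE = 1 / sqrt(information)
    return beta, math.exp(beta), math.erfc(abs(z) / math.sqrt(2.0))


def prognostic_mito_genes(expr, times, events, mito_genes, alpha=0.01):
    """Genes with univariate Cox p < alpha, intersected with the mitochondrial set.

    expr: hypothetical dict gene -> expression values aligned with times/events.
    """
    prognostic = {g for g, v in expr.items()
                  if univariate_cox(times, events, v)[2] < alpha}
    return prognostic & set(mito_genes)
```

A positive beta (hazard ratio above 1) means higher expression is associated with shorter survival. Production tools such as R's `coxph` add safeguards (step halving, Efron ties by default) that this sketch omits.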

Mitochondrial-related subtypes analysis by consensus clustering

Based on the expression profiles of the 259 PMGs, consensus clustering28 (CC) was performed on the TCGA-LUAD gene expression matrix using the ConsensusClusterPlus28 R package (version 1.66.0). To assess the stability of the resulting LUAD subtypes, 80% of the samples were resampled in each iteration, and resampling and clustering were repeated for 1,000 iterations to ensure the robustness and reproducibility of the findings. The optimal number of clusters was determined to be k = 2. Accordingly, the TCGA-LUAD cohort was divided into two distinct PMG-related subtypes, designated C1 and C2. Principal component analysis29 (PCA) was performed to evaluate the separation between the identified clusters, and the results were visualized using the ggplot2 package (version 3.5.1). Kaplan–Meier survival curves were generated using the survival30 R package (version 3.5.8), and differences in survival between clusters were assessed by the log-rank test, with p < 0.05 considered statistically significant.
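The resampling logic of consensus clustering can be illustrated with a small pure-Python sketch. In each iteration 80% of samples are drawn and clustered, and every pair drawn together is tallied as co-clustered or not; a pair's consensus value is the fraction of co-draws in which it shared a cluster. As an assumption, plain k-means with farthest-first initialization stands in for the base clustering algorithm (ConsensusClusterPlus defaults to hierarchical clustering, and the algorithm used here is not stated); all helper names are hypothetical.

```python
import random


def _sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, n_iter=100):
    """Lloyd k-means with deterministic farthest-first initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # add the point farthest from its nearest existing centroid
        centroids.append(max(points, key=lambda p: min(_sq_dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: _sq_dist(p, centroids[c])) for p in points]
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # keep the old centroid if a cluster becomes empty
            new_centroids.append([sum(col) / len(members) for col in zip(*members)]
                                 if members else centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels


def consensus_matrix(samples, k, n_resamples=1000, frac=0.8, seed=0):
    """M[i][j] = fraction of resamples containing both i and j that co-cluster them."""
    rng = random.Random(seed)
    n = len(samples)
    together = [[0] * n for _ in range(n)]
    drawn = [[0] * n for _ in range(n)]
    for _ in range(n_resamples):
        idx = rng.sample(range(n), int(round(frac * n)))
        labels = kmeans([samples[i] for i in idx], k)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                drawn[i][j] += 1
                together[i][j] += labels[a] == labels[b]
    return [[together[i][j] / drawn[i][j] if drawn[i][j] else 0.0
             for j in range(n)] for i in range(n)]
```

Values near 1 or 0 indicate stable assignments; the CDF of these values across k is what the CDF curve summarises when choosing k.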

Gene expression and pathway differences in PMGs-related subtypes

The GSVA31 package (version 1.50.5) was utilized to calculate the enrichment scores of PMGs-related subtypes based on the Hallmark32 gene sets. The results were visualized using the pheatmap package (version 1.0.12). To evaluate the mitochondrial functional differences between PMGs-related subtypes, particular attention was given to six key mitochondrial function-associated pathways, including Fatty Acid β-Oxidation, Mitochondrial Biogenesis, Mitochondrial Protein Degradation, Mitophagy, Oxidative Phosphorylation (OXPHOS), and the Tricarboxylic Acid (TCA) cycle. The differences in GSVA scores for these metabolic pathways between subtypes were visualized using the ggplot2 package (version 3.5.1), with a statistical significance threshold set at p < 0.05.

Evaluation of immune microenvironment heterogeneity in PMGs-related subtypes

The estimate33 R package (version 1.0.13) was employed to evaluate the immune scores, stromal scores, and ESTIMATE scores of PMGs-related subtypes, as well as to infer tumor purity for each sample. The CIBERSORT34 algorithm (https://cibersortx.stanford.edu/) was utilized to estimate the relative proportions of 22 common immune cell types within the PMGs-related subtypes. Statistical significance of differences between groups was assessed using the Wilcoxon rank-sum test.
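As a worked illustration of how tumor purity relates to the ESTIMATE score, the sketch below applies the cosine transformation published with the ESTIMATE method (Yoshihara et al., 2013), where the ESTIMATE score is the sum of the stromal and immune scores. To our understanding the estimate package applies this conversion only for Affymetrix input (platform = "affymetrix"), so using it on RNA-seq scores is an extrapolation; the function names are hypothetical.

```python
import math


def estimate_score(stromal, immune):
    """ESTIMATE score = stromal score + immune score."""
    return stromal + immune


def estimate_tumor_purity(score):
    """Tumour purity from an ESTIMATE score (Affymetrix-derived formula)."""
    return math.cos(0.6049872018 + 0.0001467884 * score)
```

Because the argument of the cosine grows with the score, samples with higher stromal and immune content receive lower purity estimates, which matches the inverse relationship between ESTIMATE score and purity reported for C1 and C2.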

Establishment of prognostic models for lung adenocarcinoma based on DEGs

Differentially expressed genes (DEGs) between the two PMG-related subtypes were analyzed using the limma35 R package (version 3.58.1), with statistical thresholds set at |logFC| > 1 and p < 0.05. The results were visualized using the pheatmap package (version 1.0.12). A univariate Cox proportional hazards regression model was applied to evaluate the association between DEG expression levels and patient survival outcomes. To further reduce the number of candidate variables for model construction, Least Absolute Shrinkage and Selection Operator (LASSO) regression analysis was performed using the glmnet36 package (version 4.1.8). Subsequently, key genes selected through LASSO regression were subjected to stepwise multivariate Cox proportional hazards regression analysis. A prognostic risk score model was then constructed based on the regression coefficients, with the risk score formula defined as follows: \(\text{Risk score} = \beta_{1} \times \text{Gene}_{1} + \beta_{2} \times \text{Gene}_{2} + \beta_{3} \times \text{Gene}_{3} + \cdots + \beta_{n} \times \text{Gene}_{n}\), where \(\beta_{i}\) is the multivariate Cox regression coefficient of the \(i\)-th gene and \(\text{Gene}_{i}\) is its expression level.

Patients were stratified into high-risk and low-risk groups based on the median value of the risk score. Kaplan–Meier (K–M) survival analysis and receiver operating characteristic (ROC) curve analysis were employed to evaluate the predictive performance of the prognostic model. Additionally, a nomogram was constructed using the rms R package (version 6.8.0) to assess the potential clinical utility of the risk score as an independent prognostic indicator.
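For readers less familiar with these estimators, the product-limit (Kaplan–Meier) curve and the two-group log-rank test used to compare risk groups can be written in a few lines of Python. This is an illustrative sketch, not the survival package's code; groups are assumed to be coded 0/1 (for example, low- and high-risk), and the p-value uses the one-degree-of-freedom identity P(χ² > c) = erfc(√(c/2)).

```python
import math


def kaplan_meier(times, events):
    """Product-limit estimate: list of (event time, S(t)) pairs."""
    surv, curve = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve


def log_rank_test(times, events, groups):
    """Two-group log-rank test; groups are 0/1 labels. Returns (chi2, p)."""
    o_minus_e, variance = 0.0, 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for ti in times if ti >= t)                          # at risk overall
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == 1)
        d = sum(1 for ti, ei in zip(times, events) if ti == t and ei)  # deaths at t
        d1 = sum(1 for ti, ei, g in zip(times, events, groups)
                 if ti == t and ei and g == 1)
        o_minus_e += d1 - d * n1 / n           # observed minus expected in group 1
        if n > 1:                              # hypergeometric variance term
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square, 1 df
```

For example, `kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])` returns `[(1, 0.75), (3, 0.375), (4, 0.0)]`: the censored subject at time 2 leaves the risk set without lowering the curve.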

Drug sensitivity analysis and genomic alteration analysis

The oncoPredict37 R package (version 0.2) was utilized to predict the sensitivity of patients with different PMG-related subtypes to commonly used therapeutic agents in LUAD. Somatic mutation data for the TCGA-LUAD cohort were downloaded using the TCGAmutations R package (version 0.4.0), retaining only samples with complete mutation information for subsequent analyses. Single nucleotide variant (SNV) data were analyzed and visualized using the maftools38 R package (version 2.18.0).

Investigating cellular type differences among mitochondria-related subtypes at single-cell resolution

Single-cell RNA sequencing (scRNA-seq) datasets from GSE11757039, GSE13190740, and GSE14807141 were selected for further analysis. The scDblFinder42 R package (version 1.16.0) was employed to identify and remove doublets. Cells with fewer than 500 UMI counts or with more than 15% of UMI counts derived from mitochondrial genes were further filtered out. Batch effects across different datasets were corrected using the harmony R package (version 1.2.1). The top 3,000 highly variable genes were identified using the FindVariableFeatures function in the Seurat43 package (version 4.4.0) and used as features for subsequent dimensionality reduction and clustering. Dimensionality reduction was performed based on the first 30 principal components, and clustering was conducted with a resolution parameter of 0.5. Cluster-specific marker genes were identified using the FindAllMarkers function with logfc.threshold = 0.25 and min.pct = 0.25. Clusters were annotated by integrating information from the original dataset publications and known classical marker genes for cell types.
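The two per-cell thresholds above can be expressed as a short filter. This sketch covers only the UMI and mitochondrial-fraction criteria (doublet removal and Harmony correction are not shown); it assumes mitochondrial genes are identified by the human "MT-" symbol prefix, and the function names and dict-based cell layout are hypothetical.

```python
def passes_qc(umi_counts, gene_names, min_umi=500, max_mito_pct=15.0):
    """Keep a cell with >= min_umi total UMIs and <= max_mito_pct % mitochondrial UMIs.

    umi_counts: per-gene UMI counts for one cell, aligned with gene_names.
    Mitochondrial genes are assumed to carry the human "MT-" symbol prefix.
    """
    total = sum(umi_counts)
    if total < min_umi:
        return False
    mito = sum(c for c, g in zip(umi_counts, gene_names) if g.upper().startswith("MT-"))
    return 100.0 * mito / total <= max_mito_pct


def filter_cells(cells, gene_names, **thresholds):
    """cells: hypothetical dict barcode -> count vector; returns surviving barcodes."""
    return [bc for bc, counts in cells.items()
            if passes_qc(counts, gene_names, **thresholds)]
```

A cell with 1,000 UMIs of which 100 map to MT- genes (10%) is kept; one with 20% mitochondrial UMIs, or with only 410 UMIs in total, is removed.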

A total of 37 untreated primary tumor samples were selected for subsequent in-depth analysis. The utilized scRNA-seq data were not accompanied by bulk RNA-seq profiles. According to the method described by Zhou et al.44, an approximate bulk RNA-seq expression profile for each single-cell sample was generated by averaging the gene expression values across all cells within each sample. Subsequently, the TCGA-LUAD dataset was randomly divided into a training set and a validation set at a 7:3 ratio. An XGBoost45-based supervised classification model was constructed using the training set to classify single-cell samples based on PMG expression profiles. The model’s performance was validated using the validation set, with the area under the curve (AUC) employed as the evaluation metric. Finally, bar plots were generated using the ggplot2 R package (version 3.5.1) to compare differences in cell type distributions between the C1 and C2 subtypes in the single-cell samples.
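The pseudo-bulk construction and the 7:3 split described above can be sketched as follows. This is a minimal illustration, assuming expression is stored per cell as a sparse gene-to-value dict (missing genes count as zero) and that the input scale matches the one used for training; the XGBoost classifier itself would be trained with the xgboost library and is not reproduced here. Function names and the seed are hypothetical.

```python
import random
from collections import defaultdict


def pseudo_bulk(cell_expression, cell_sample):
    """Average each gene over all cells of a sample (approximate bulk profile).

    cell_expression: dict cell_id -> {gene: value}; genes missing from a
    cell's dict are treated as zero, as in sparse count matrices.
    cell_sample: dict cell_id -> sample_id.
    """
    sums = defaultdict(lambda: defaultdict(float))
    n_cells = defaultdict(int)
    for cell, expr in cell_expression.items():
        sample = cell_sample[cell]
        n_cells[sample] += 1
        for gene, value in expr.items():
            sums[sample][gene] += value
    # divide by the number of cells in the sample, not by the cells expressing the gene
    return {s: {g: v / n_cells[s] for g, v in sums[s].items()} for s in n_cells}


def split_70_30(sample_ids, train_frac=0.7, seed=42):
    """Random train/validation split of sample IDs (7:3 by default)."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    cut = int(round(train_frac * len(ids)))
    return ids[:cut], ids[cut:]
```

Dividing by all cells in the sample (rather than by the cells in which a gene is detected) is what makes the profile an average over the whole sample, analogous to bulk sequencing of the same tissue.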

Assessment of gene set scoring at the single-cell level

Enrichment scores of immune evasion signaling pathways in single-cell samples were evaluated using the “AUCell46,” “UCell47,” “singscore48,” and “ssgsea49” methods implemented in the irGSEA50 software package (https://github.com/chuiqin/irGSEA/) (version 3.3.2). The robust rank aggregation (RRA) algorithm was applied to integrate the results from the different methods and to identify immune evasion pathways that were consistently and significantly enriched across all approaches.
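The idea behind RRA (Kolde et al., 2012) is to ask how unlikely an item's best normalized ranks are if ranks were random. The sketch below computes, for each pathway, the minimum over k of the probability that the k-th smallest of n uniform ranks is at most the observed one, using the identity P(U₍ₖ₎ ≤ x) = P(Binomial(n, x) ≥ k). As an assumption, the final step multiplies by n as a simple Bonferroni-style adjustment; implementations differ in this correction. Function names and the ranked-list input format are hypothetical.

```python
from math import comb


def order_stat_cdf(x, k, n):
    """P(k-th smallest of n Uniform(0,1) draws <= x) = P(Binomial(n, x) >= k)."""
    return sum(comb(n, j) * x ** j * (1 - x) ** (n - j) for j in range(k, n + 1))


def rra_score(normalized_ranks):
    """RRA score of one item; smaller means consistently ranked near the top."""
    r = sorted(normalized_ranks)
    n = len(r)
    rho = min(order_stat_cdf(r[k - 1], k, n) for k in range(1, n + 1))
    return min(1.0, rho * n)  # Bonferroni-style adjustment over n order statistics


def aggregate(ranked_lists):
    """ranked_lists: lists of pathway names, best first (one list per method)."""
    items = set().union(*ranked_lists)
    scores = {}
    for item in items:
        # rank / list length gives a normalized rank in (0, 1]
        ranks = [(lst.index(item) + 1) / len(lst) for lst in ranked_lists if item in lst]
        scores[item] = rra_score(ranks)
    return sorted(scores.items(), key=lambda kv: kv[1])
```

A pathway ranked first by all four scoring methods (out of four candidates) gets a score of 4 × 0.25⁴ ≈ 0.016, whereas a pathway with mixed ranks scores close to 1.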

Cell–cell communication analysis

Intercellular communication networks were constructed using the CellChat51 R package (version 2.1.0) (https://github.com/sqjin/CellChat) to explore potential interactions among various cell types. After creating the CellChat object, the CellChatDB.human database was selected to infer communication probabilities and specific receptor-ligand interactions within the defined cell populations.

Statistical analysis

All of the statistical analyses and data visualizations were performed in the R programming environment (R version 4.3.1). The Wilcoxon rank-sum test was employed for comparative analysis between two groups. The K–M method was employed to draw survival curves. p-value < 0.05 was considered statistically significant. Unless stated otherwise, all parameter settings for the R packages utilized in this study were set to their default values.
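The Wilcoxon rank-sum test used for two-group comparisons can be sketched with midranks for ties and a normal approximation. The same U statistic, divided by n₁n₂, equals the ROC AUC reported for the classifiers (the probability that a randomly chosen positive scores higher than a randomly chosen negative, counting ties as one half). This is an illustration, not R's `wilcox.test`, which by default uses exact p-values for small untied samples and a continuity correction, so p-values can differ slightly; the function name is hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected).

    Returns (U for x, p-value, AUC = P(X > Y) + 0.5 * P(X == Y)).
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2.0 + 1.0  # midrank of 1-based positions i+1..j+1
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    n1, n2 = len(x), len(y)
    n = n1 + n2
    r1 = sum(r for r, (_, grp) in zip(ranks, pooled) if grp == 0)
    u1 = r1 - n1 * (n1 + 1) / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u1 - n1 * n2 / 2.0) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2.0)), u1 / (n1 * n2)
```

For instance, if every positive sample scores above every negative one, U equals n₁n₂ and the AUC is 1.0.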

Results

The expression patterns of mitochondrial-related genes identify two distinct LUAD subtypes

To stratify patients with lung adenocarcinoma (LUAD), we initially conducted a genome-wide univariate Cox regression analysis, applying a stringent threshold of p < 0.01, and identified 5,567 genes significantly associated with overall survival (OS). Concurrently, 1,136 mitochondria-related genes were obtained from the MitoCarta 3.0 database (Supplementary Table 1), referred to as mitochondrial genes (MGs). By intersecting these two gene sets (Fig. 1A), a total of 259 genes were identified that were both associated with mitochondrial function and prognostically relevant in LUAD (Supplementary Table 2), hereafter termed prognostic mitochondrial genes (PMGs). These PMGs possess dual functional and prognostic significance, thus providing a molecular foundation for the classification of mitochondria-associated subtypes in LUAD. Gene Ontology (GO) enrichment analysis revealed that the PMGs were predominantly involved in biological processes related to mitochondrial function and energy metabolism (Fig. 1B).

Fig. 1

Two subtypes of LUAD based on PMGs. (A) Screening process of PMGs, excluding 5308 genes associated only with prognosis and 877 mitochondrial genes without significant prognostic value; (B) GO enrichment analysis of PMGs; (C) Consensus matrix heatmap based on PMGs in the TCGA-LUAD cohort; (D) Cumulative distribution function (CDF) curve based on PMGs in the TCGA-LUAD cohort; (E) PCA visualization of LUAD molecular subtypes based on PMGs, with ellipses representing the distribution ranges of different subtypes in low-dimensional space; (F) Kaplan–Meier survival analysis of the identified subtypes; (G) Heatmap of GSVA scores for PMG-associated subtypes based on Hallmark gene sets, with each square representing a single sample; (H) Boxplots illustrating the GSVA score distributions of six key mitochondrial-related pathways between C1 and C2 subtypes; (I) Violin plots showing the comparison of ESTIMATE scores between PMG-associated subtypes; (J) Violin plots depicting the comparison of tumor purity between PMG-associated subtypes; (K) Boxplots illustrating differences in immune cell infiltration levels between PMG-associated subtypes.

Based on the expression profiles of the PMGs, consensus clustering analysis was performed on the TCGA-LUAD cohort to uncover potential molecular subtypes. When the number of clusters was set to k = 2, the consensus matrix heatmap exhibited distinct and well-defined boundaries (Fig. 1C), indicating high intra-cluster similarity and substantial inter-cluster heterogeneity. Furthermore, the cumulative distribution function (CDF) curve plateaued at k = 2 (Fig. 1D), suggesting that increasing the number of clusters beyond two did not markedly improve clustering stability. Therefore, k = 2 was selected as the optimal solution, and the TCGA-LUAD cohort was stratified into two molecular subtypes, designated C1 and C2. Principal component analysis (PCA) supported the clustering results, showing a clear separation between the two subtypes in two-dimensional space (Fig. 1E) and underscoring their distinct gene expression patterns. Kaplan–Meier survival analysis showed that patients in the C1 subtype had significantly poorer OS than those in the C2 subtype (p = 0.0011), indicating that this classification is prognostically relevant (Fig. 1F).

To elucidate differences in pathway activity between the two subtypes, we performed Gene Set Variation Analysis (GSVA) based on the Hallmark gene sets. Among the 50 pathways evaluated, 33 displayed significant differences in activity scores between C1 and C2 (p < 0.05) (Fig. 1G). The majority of these pathways, particularly those involved in metabolic processes, were more highly activated in the C1 subtype, suggesting an enhanced metabolic phenotype. In contrast, only four pathways, primarily immune-related ones such as allograft rejection and IL6/JAK/STAT3 signaling, were more activated in the C2 subtype.

Further, to specifically investigate mitochondrial functional differences between the subtypes, we examined GSVA scores of six key mitochondrial pathways, including fatty acid β-oxidation, mitochondrial biogenesis, mitochondrial protein degradation, mitophagy, oxidative phosphorylation, and the tricarboxylic acid (TCA) cycle. Across all assessed pathways, C1 exhibited significantly higher GSVA scores compared to C2 (Fig. 1H), reflecting a global upregulation of mitochondrial metabolic activity.

The tumor immune microenvironment was also compared between the two subtypes using the ESTIMATE algorithm. The C1 subtype demonstrated lower stromal, immune, and ESTIMATE scores, but higher tumor purity relative to the C2 subtype (Fig. 1I,J). Furthermore, immune cell infiltration was assessed using the CIBERSORT algorithm, revealing that C1 tumors harbored higher levels of T follicular helper cells, M0 macrophages, M2 macrophages, and activated mast cells. Conversely, plasma cells, resting memory CD4+ T cells, resting dendritic cells, and resting mast cells were more abundant in the C2 subtype (Fig. 1K). These findings collectively suggest that mitochondrial gene expression patterns are intricately associated with both metabolic reprogramming and immune landscape remodeling in LUAD.

Construction and validation of a prognostic model based on DEGs associated with PMGs-related subtypes

To identify robust biomarkers for prognostic evaluation, we first analyzed differentially expressed genes (DEGs) between the two subtypes. A total of 932 DEGs were identified, of which 446 were upregulated and 486 were downregulated in the C1 subtype relative to C2 (Fig. 2A). To further screen for candidate genes associated with the prognosis of LUAD patients, univariate Cox regression analysis was performed. Based on gene expression profiles and survival data from 515 patients in the TCGA-LUAD cohort, 190 genes were initially identified as significantly associated with prognosis (p < 0.05) (Supplementary Table 3). To refine these candidates, least absolute shrinkage and selection operator (LASSO) regression analysis was conducted. The feature selection process and cross-validation results of the LASSO model are presented in Fig. 2B,C, respectively. Based on the changes in variable coefficients with increasing regularization parameter (λ), 16 key genes were selected for subsequent multivariate Cox regression analysis (Supplementary Table 4).

Fig. 2

Construction of a prognostic model based on DEGs among PMG-related subtypes. (A) Differentially expressed genes (DEGs) between the C1 and C2 subtypes; (B) LASSO feature selection (coefficient profiles of candidate genes); (C) Cross-validation for selecting the LASSO regularization parameter (λ); (D) Expression of the eight model genes in the high- and low-risk groups; (E) Forest plot of the multivariate Cox regression analysis; (F) Time-dependent ROC curves of the risk score in the TCGA-LUAD cohort; (G) Kaplan–Meier survival analysis of the high- and low-risk groups in the TCGA-LUAD cohort; (H) Nomogram incorporating sex and risk score; (I) Calibration plot of the nomogram; (J) Distribution of C1 and C2 patients between the high- and low-risk groups; (K) Predictive performance of the risk score in the GSE31210 cohort; (L) Kaplan–Meier survival analysis of the high- and low-risk groups in the GSE31210 cohort.

Multivariate Cox regression analysis was then performed to validate the prognostic value of these 16 genes and to construct the final prognostic model. Ultimately, eight genes were identified as independent prognostic factors. As shown in Fig. 2D, these genes exhibited significantly different expression levels between the high- and low-risk groups. The forest plot further confirmed that all eight genes had statistically significant hazard ratios (HRs) (global p = 2.33 × 10⁻¹²) (Fig. 2E). Based on these eight independent prognostic factors, we constructed a prognostic risk score model, with the formula as follows: \(\text{Risk score} = 0.26573 \times \text{PLK1} + 0.11685 \times \text{LYPD3} + 0.11713 \times \text{FOSL1} + 0.53113 \times \text{DLGAP5} - 0.32573 \times \text{IKZF3} + 0.17318 \times \text{C4orf48} - 0.55832 \times \text{CENPW} - 0.21994 \times \text{TRIP13}\).
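Applying the eight-gene model to a patient is a weighted sum of expression values, followed by a median split of the cohort into high- and low-risk groups. The sketch below uses the coefficients reported above; it assumes expression values are on the same normalized scale used to fit the model, and it assigns patients exactly at the median to the low-risk group (the text does not state how ties were handled). Function names are hypothetical.

```python
import statistics

# Multivariate Cox coefficients of the eight-gene model, as reported in the text.
COEFFICIENTS = {
    "PLK1": 0.26573,
    "LYPD3": 0.11685,
    "FOSL1": 0.11713,
    "DLGAP5": 0.53113,
    "IKZF3": -0.32573,
    "C4orf48": 0.17318,
    "CENPW": -0.55832,
    "TRIP13": -0.21994,
}


def risk_score(expression, coefficients=COEFFICIENTS):
    """Linear predictor: sum of coefficient * expression over the model genes."""
    return sum(beta * expression[gene] for gene, beta in coefficients.items())


def stratify_by_median(scores):
    """Label each patient 'high' if the score exceeds the cohort median, else 'low'."""
    cutoff = statistics.median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}
```

With every gene at an expression value of 1.0 the score reduces to the sum of the coefficients (about 0.10003); genes with negative coefficients (IKZF3, CENPW, TRIP13) lower the score as their expression rises.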

Time-dependent ROC curve analysis (Fig. 2F) demonstrated that the model had good predictive accuracy for overall survival (OS), with an area under the curve (AUC) of approximately 0.7. Kaplan–Meier survival analysis revealed that patients in the high-risk group had significantly shorter OS compared to those in the low-risk group (p < 0.001) (Fig. 2G), indicating the model’s ability to effectively stratify patients by risk level. A nomogram incorporating sex and risk score was subsequently developed to further evaluate the independent prognostic value of the risk score (Fig. 2H). The calibration plot showed excellent agreement between predicted and observed OS (Fig. 2I).

Further statistical analysis showed that the majority of patients with the poor-prognosis C1 subtype were classified into the high-risk group (198/308), whereas patients with the favorable-prognosis C2 subtype were predominantly distributed in the low-risk group (145/206), supporting the accuracy of the risk model in distinguishing patients with different prognostic risks (Fig. 2J). Finally, the robustness of the model was validated using the independent GSE31210 cohort. Consistent with the findings in the TCGA-LUAD cohort, the model maintained good predictive performance (Fig. 2K), and Kaplan–Meier survival analysis confirmed a significant OS difference between high- and low-risk groups, with shorter survival observed in the high-risk group (p = 0.012) (Fig. 2L). Together, these results suggest that our risk score model has strong clinical potential as a robust prognostic tool for LUAD patients.

Drug sensitivity differences and TMB characteristics between PMGs subtypes

To investigate the differences in drug sensitivity between the two LUAD subtypes, we used the oncoPredict package to estimate the half-maximal inhibitory concentration (IC50) values for several chemotherapeutic and targeted agents commonly used in LUAD treatment. The results showed that Cisplatin, Carboplatin, Gemcitabine, Irinotecan, and Vinorelbine exhibited higher IC50 values in the C1 subtype compared to the C2 subtype, whereas Osimertinib displayed a significantly lower IC50 value in the C1 subtype (Fig. 3A). To further explore the genomic alterations that might contribute to differential therapeutic responses, we generated waterfall plots of somatic mutations for the C1 (Fig. 3B) and C2 (Fig. 3C) subtypes using the “maftools” package. Interestingly, the C2 subtype exhibited a higher overall mutation rate and a broader spectrum of gene mutations. Tumor mutational burden (TMB) analysis revealed that the C1 subtype had a higher TMB (Fig. 3D,E), which may partially explain its increased sensitivity to Osimertinib.

Fig. 3

Comparison of genomic alterations between the two PMGs-related subtypes. (A) Differences in sensitivity to commonly used LUAD therapeutic agents between PMG subtypes; (B) Waterfall plot of somatic mutations in the C1 subtype; (C) Waterfall plot of somatic mutations in the C2 subtype; (D) Distribution of tumor mutational burden (TMB) in the C1 subtype; (E) Distribution of tumor mutational burden (TMB) in the C2 subtype.

Single-cell insights into cellular diversity of C1 and C2 subtypes

To further explore differences in cell composition between the two LUAD subtypes, we analyzed 37 primary tumor samples from three publicly available single-cell RNA sequencing datasets: GSE131907, GSE148071, and GSE117570. After quality control, a total of 64,667 cells were retained for subsequent analyses. As shown in Fig. 4A,B, batch effect correction effectively mitigated technical variation among datasets. Clustering at a resolution of 0.5, followed by annotation with classical marker genes, identified 14 major cell types: T lymphocytes, monocytes, B lymphocytes, macrophages, dendritic cells, plasma cells, mast cells, natural killer (NK) cells, alveolar cells, epithelial cells, endothelial cells, fibroblasts, oligodendrocytes, and malignant cells (Fig. 4C). Figure 4D displays the top five highly expressed genes in each cell type.

Fig. 4

Cellular heterogeneity and classification of PMG subtypes based on single-cell RNA-seq data. (A) Visualization of batch effects in the raw single-cell data; (B) Batch effect removal after correction with the Harmony algorithm; (C) The t-SNE plot displaying 14 major cell types; (D) Heatmap of differentially expressed genes across integrated single-cell samples; (E) ROC curve for the validation dataset using the classification prediction model; (F) GSVA analysis of pathway enrichment; (G) Comparison of the proportions of different cell types between C1 and C2 subtypes.

Because the single-cell RNA-seq data lacked matched bulk RNA-seq data, we simulated a bulk expression profile for each single-cell sample by averaging gene expression levels across all of its cells, following the method proposed by Zhou et al. Using the C1/C2 labels assigned in the TCGA-LUAD cohort, we then constructed an XGBoost classifier that links mitochondrial gene expression patterns to subtype labels in order to classify the single-cell samples. The TCGA-LUAD cohort was randomly divided at a 7:3 ratio, with the larger portion used for training and the smaller for validation. Trained on the expression profiles of the 259 PMGs, the classifier achieved an AUC of 1.0 on the training set and retained high accuracy on the validation set, with an AUC of 0.93 (Fig. 4E), indicating good performance in PMG-based sample classification. Using this classifier, we assigned 17 single-cell samples to the C1 subtype and 20 to the C2 subtype. Gene set variation analysis (GSVA) revealed distinct pathway activation patterns between the C1 and C2 subtypes (Fig. 4F). Although fewer significantly enriched pathways were detected in the single-cell data than in the bulk sequencing data, the pathways enriched in both datasets showed highly consistent activity trends, supporting the reliability of the classification model. The distribution of cell types between the C1 and C2 subtypes is illustrated in Fig. 4G. Compared with C2, the C1 subtype exhibited higher proportions of macrophages and malignant tumor cells, whereas C2 showed higher proportions of T and B lymphocytes. This observation raises the possibility that macrophages in the C1 subtype do not exert their anti-tumor effects but instead participate in tumor-promoting processes.

Heterogeneous impact of macrophages on prognosis across PMGs-related subtypes

Given the higher abundance of malignant tumor cells and macrophages in the C1 subtype, along with the significantly poorer prognosis associated with C1, we hypothesized that macrophages might play a critical role in tumor progression among patients with this subtype. GO enrichment analysis revealed that DEGs in macrophages were predominantly enriched in pathways related to energy metabolism and biosynthesis (Fig. 5A). To further elucidate the role of macrophages in LUAD prognosis, particularly in the context of immune evasion, we conducted a detailed gene set enrichment analysis of eight key immune escape signaling pathways. As shown in Fig. 5B, we integrated the results from four different gene set enrichment methods to comprehensively assess the activity of macrophages between the C1 and C2 subtypes. The analysis demonstrated that macrophages in the C1 subtype exhibited significantly upregulated activity in the NF-κB signaling pathway, while those in the C2 subtype displayed relatively suppressed activity, suggesting that the NF-κB pathway may play a central role in immune evasion within the C1 subtype.

Fig. 5

Functional characterization and cell–cell communication analysis of C1 subtype macrophages. (A) GO enrichment analysis of macrophages in the C1 subtype; (B) Enrichment scores of C1 subtype macrophages across immune escape pathways; (C) Heatmap of differences in the number of ligand–receptor pairs between different cell types; (D) Heatmap of differences in the probabilities of ligand–receptor pairs between different cell types; (E) Ligand–receptor interaction analysis from macrophages (sender) to tumor cells (receiver); (F) Ligand–receptor interaction analysis from tumor cells (sender) to macrophages (receiver).

Using CellChat for cell–cell communication analysis, we identified frequent and complex interactions between macrophages and malignant tumor cells. Heatmaps depicting the number (Fig. 5C) and the probability (Fig. 5D) of ligand–receptor interactions between these two cell types revealed extensive intercellular communication. In the signaling from macrophages to tumor cells, the PPIA-BSG and LGALS9-P4HB ligand–receptor pairs were significantly enriched (Fig. 5E); conversely, in the signaling from tumor cells to macrophages, the PPIA-BSG, MIF-(CD74 + CD44) complex, MT-RNR2-FPRL2, and MDK-NCL interactions were frequently observed (Fig. 5F).

Discussion

Lung adenocarcinoma (LUAD) is a complex multifactorial disease. Despite significant advances in targeted therapies and immunotherapies in recent years2, heterogeneous treatment responses and poor prognosis remain major challenges10. The heterogeneity of LUAD at clinical, behavioral, cellular, and molecular levels is a major factor influencing patient outcomes10. However, current LUAD classification methods are based primarily on histopathological features, which often fail to capture the tumor's molecular characteristics and biological functions, thereby limiting their accuracy in predicting patient prognosis.

Mitochondria, as the primary sites for cellular energy production, are critically involved in biological processes closely related to cancer initiation and progression, including anabolic metabolism and oxidative stress14. Although the role of mitochondrial function in cancer progression has been studied across several cancer types15, its specific mechanisms and contributions in LUAD remain insufficiently explored.

In this study, we identified two potential LUAD subtypes based on the expression profiles of key mitochondrial genes and systematically characterized their distinct features. The C1 subtype exhibited significantly enhanced mitochondrial functional activity and was associated with poorer survival, higher tumor purity, and a more immunosuppressive microenvironment. Rather than conferring a survival benefit, the heightened mitochondrial activity in C1 correlated with shorter survival times. This may reflect an increased dependence of tumor cells on energy metabolism to sustain tumor growth and metastasis. Furthermore, the combination of higher tumor purity and a markedly immunosuppressive microenvironment in the C1 subtype likely provides a niche conducive to tumor cell survival and dissemination. Enhanced mitochondrial function may not only facilitate the metabolic adaptation of tumor cells but also exacerbate immune suppression by modulating immune cell functions, ultimately leading to poor patient outcomes.

Notably, M2 macrophage infiltration was higher in the C1 subtype than in C2. Macrophages exhibit remarkable plasticity and can polarize into distinct functional phenotypes. Among these, M2 macrophages, characterized by anti-inflammatory, tissue-repair, and immunosuppressive functions, play a critical role in tumor progression. Substantial evidence suggests that mitochondria play a pivotal role in regulating macrophage polarization52. Willenborg et al. demonstrated that mitochondrial metabolism first drives the pro-inflammatory and angiogenic functions of M1 macrophages through an early glycolysis-mtROS-HIF1α axis and subsequently promotes M2 macrophage survival and tissue-remodeling functions through late-stage OXPHOS hormesis. This metabolic reprogramming orchestrates the temporal transition of macrophages from the M1 to the M2 phenotype53. These findings suggest that mitochondrial metabolism may facilitate LUAD progression through two complementary mechanisms: directly supporting the energy demands and proliferative capacity of tumor cells, and remodeling the immune microenvironment (particularly by inducing M2 macrophage polarization) to help tumors evade immune surveillance.

In contrast, the C2 subtype exhibited lower mitochondrial functional activity, better survival, lower tumor purity, and higher levels of immune cell infiltration. The C2 subtype also showed a trend toward enrichment of certain pathways related to inflammatory responses and immune regulation. However, the activity of five classical inflammation-related pathways (JAK-STAT, NF-κB, inflammasome, cGAS-STING, and MAPK) did not differ significantly between the C1 and C2 subtypes. These findings suggest that the C2 subtype may represent a group of samples characterized by weaker mitochondrial function and lacking the metabolic features of the C1 subtype.

To assess the clinical relevance of mitochondria-related subtyping, we constructed a prognostic model based on eight differentially expressed genes (DEGs) between the subtypes: PLK1, LYPD3, FOSL1, DLGAP5, IKZF3, C4orf48, CENPW, and TRIP13. Several of these genes are critically involved in tumorigenesis and tumor progression: PLK1 regulates cell cycle progression, and its inhibitors have advanced to phase III clinical trials54; LYPD3 is a potential antibody–drug conjugate (ADC) target implicated in immune evasion, and its elevated expression is significantly associated with poor prognosis in LUAD55; FOSL1 regulates tumor progression in solid malignancies by modulating epithelial-mesenchymal transition (EMT)56; DLGAP5 acts as a master cell cycle regulator that drives LUAD progression via PLK1 upregulation, supporting an oncogenic role57; CENPW, a gene implicated in nucleosome assembly, has been identified as an oncogene associated with poor prognosis in hepatocellular carcinoma58 and breast cancer59, although its role in LUAD progression remains to be elucidated; IKZF3 influences lymphocyte differentiation and is a potential therapeutic target in breast cancer60; TRIP13 participates in DNA damage repair, and its overexpression is associated with poor prognosis in LUAD61; and C4orf48 promotes renal fibrosis through RNA-binding mechanisms, whereas its functions in tumorigenesis and cancer progression remain poorly characterized62. Collectively, these genes participate in tumorigenesis, immune regulation, and cell cycle control. While some have been established as potential therapeutic targets or prognostic biomarkers, the molecular mechanisms and clinical implications of others in LUAD remain to be fully elucidated.
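In LASSO-Cox signatures of this kind, the risk score is typically the linear predictor: the sum of each gene's expression multiplied by its fitted coefficient, with patients split into high- and low-risk groups at a cutoff such as the median. The sketch below illustrates only that arithmetic. The coefficients are arbitrary placeholders (the fitted values are not reported in this section), and the median cutoff is an assumption.

```python
import numpy as np

GENES = ["PLK1", "LYPD3", "FOSL1", "DLGAP5", "IKZF3", "C4orf48", "CENPW", "TRIP13"]
# Hypothetical placeholder coefficients: signs and magnitudes are arbitrary,
# NOT the fitted LASSO-Cox values from this study.
COEFS = np.array([0.20, 0.15, 0.10, 0.05, -0.25, 0.08, 0.12, 0.18])


def risk_scores(expr: np.ndarray) -> np.ndarray:
    """Linear predictor for a samples x genes matrix ordered as GENES."""
    if expr.shape[1] != len(GENES):
        raise ValueError("expected one column per signature gene")
    return expr @ COEFS


def stratify(scores: np.ndarray) -> np.ndarray:
    """Label samples above the median score as high risk (assumed cutoff)."""
    return np.where(scores > np.median(scores), "high", "low")


expr = np.array([[1.0] * 8, [0.0] * 8, [2.0] * 8, [0.5] * 8])
scores = risk_scores(expr)
groups = stratify(scores)
print(list(zip(scores.round(3), groups)))
```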

Our prognostic model achieved an area under the curve (AUC) of 0.7, indicating moderate predictive performance. This is lower than the AUCs reported for deep learning-based approaches such as SpaRx63 (AUC = 0.938) and DrugFormer64 (AUC = 0.975), although these values were obtained on different datasets and tasks, so the comparison is only indicative. The gap may partly reflect the inability of the Lasso-Cox regression model used here to capture complex nonlinear relationships. Future studies could improve predictive accuracy by leveraging more advanced deep learning models or integrated predictive frameworks, and extensive validation in larger cohorts and clinical settings will be necessary.
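For a binary outcome, the AUC equals the probability that a randomly chosen patient who experienced the event has a higher risk score than one who did not (the Mann-Whitney formulation). The sketch below assumes events have already been dichotomized at a fixed time point and ignores censoring; time-dependent ROC methods used for survival models handle censored patients explicitly, so this illustrates the quantity only and does not reproduce the reported 0.7.

```python
import numpy as np


def auc_mann_whitney(scores, events) -> float:
    """AUC as P(score of an event case > score of a non-event case).

    Ties count as one half. Censoring is ignored: `events` must already be
    a binary outcome at a fixed time point (an assumption of this sketch).
    """
    scores = np.asarray(scores, dtype=float)
    events = np.asarray(events, dtype=bool)
    pos, neg = scores[events], scores[~events]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one event and one non-event")
    # Compare every event case with every non-event case
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))


print(auc_mann_whitney([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]))  # perfect ranking: 1.0
print(auc_mann_whitney([0.9, 0.1, 0.8, 0.2], [1, 1, 0, 0]))  # chance level: 0.5
```

On this scale, an AUC of 0.7 means the model ranks a random event/non-event pair correctly about 70% of the time.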

To explore subtype-specific therapeutic strategies for LUAD, we systematically assessed the IC50 values of six commonly used antitumor drugs: cisplatin, a platinum-based chemotherapeutic agent that exerts cytotoxic effects by forming DNA cross-links and is widely used in first-line combination chemotherapy for advanced NSCLC; carboplatin, a second-generation platinum drug with a similar mechanism of action but a more favorable toxicity profile, commonly used in combination chemotherapy for NSCLC; gemcitabine, an antimetabolite that inhibits DNA synthesis and is a core component of standard NSCLC chemotherapy regimens; irinotecan, a topoisomerase I inhibitor used primarily in second-line therapy for small cell lung cancer (SCLC) and NSCLC; vinorelbine, a vinca alkaloid that inhibits microtubule polymerization to exert anti-mitotic effects and is used for palliative treatment of NSCLC; and osimertinib, a third-generation EGFR tyrosine kinase inhibitor that specifically targets EGFR mutations and is a first-line treatment for EGFR-mutated NSCLC. Osimertinib was the only drug to which the C1 subtype was significantly more sensitive (IC50 42% lower than in the C2 subtype, p < 0.001), whereas the remaining drugs showed better efficacy in the C2 subtype. To investigate the molecular basis of this difference, we analyzed genomic features. Somatic mutation profiling showed that KRAS mutation frequency was similar between the two subtypes and that EGFR was not among the top 30 mutated genes.

Because these mutation profiles did not explain the difference, we turned to TMB analysis, which revealed significant differences between the subtypes. Although TMB was positively correlated with mutation count in both subtypes, as expected from its definition, TMB values in the C1 subtype were more tightly distributed and significantly higher than those in C2. The C2 subtype showed a broader range of TMB values, with 23.5% of samples falling below the clinical reference threshold of 0.5. These data suggest that the higher TMB in the C1 subtype may enhance sensitivity to osimertinib by increasing genomic instability, whereas the relatively lower TMB in C2 may explain its better response to conventional chemotherapeutic agents. These findings provide biological insight into differences in treatment response between LUAD molecular subtypes.
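TMB is conventionally expressed as the number of somatic (usually nonsynonymous) mutations per megabase of sequenced territory. A minimal sketch of the calculation, and of the fraction of samples falling below the 0.5 threshold mentioned above, follows. The 38 Mb capture size is an assumed value and should be replaced with the size appropriate to the sequencing assay; the mutation counts are toy data.

```python
import numpy as np

# Assumed capture size in megabases; replace with the value for the actual assay.
CAPTURE_SIZE_MB = 38.0


def tmb(mutation_counts, capture_size_mb: float = CAPTURE_SIZE_MB) -> np.ndarray:
    """Tumor mutational burden: somatic mutations per megabase."""
    return np.asarray(mutation_counts, dtype=float) / capture_size_mb


def fraction_below(values, threshold: float = 0.5) -> float:
    """Share of samples whose TMB falls strictly below `threshold`."""
    return float((np.asarray(values, dtype=float) < threshold).mean())


counts = [19, 38, 76, 10]  # toy nonsynonymous mutation counts per sample
values = tmb(counts)
print(values.round(3), fraction_below(values))
```

Because TMB is a count divided by a constant, its correlation with mutation count within a cohort sequenced on one platform is exact by construction.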

We extended the subtyping strategy to single-cell resolution using the XGBoost algorithm, enabling a more detailed comparison of cellular composition between subtypes. Because the single-cell datasets lacked matched bulk RNA-seq profiles, we constructed pseudo-bulk expression profiles by averaging the gene expression values of all cells within each sample44. Although this arithmetic averaging strategy is computationally simple and stable and reflects the overall expression characteristics of a cell population, it has two important limitations: first, it may obscure the heterogeneity of specific cellular subpopulations; second, it does not adequately account for the effect of cell-type composition on the expression profile, an aspect in which existing, more sophisticated pseudo-bulk methods perform better65.
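The arithmetic-averaging approach described above can be sketched as follows, assuming a cells × genes expression matrix and one sample label per cell (both hypothetical inputs). Because every cell is weighted equally, a sample dominated by one cell type yields a profile dominated by that cell type, which is the composition effect noted as the second limitation.

```python
import numpy as np


def pseudo_bulk_mean(expr: np.ndarray, sample_ids):
    """Average a cells x genes matrix within each sample.

    Every cell has equal weight, so abundant cell types dominate the
    resulting profile. Returns the sorted sample labels and a
    samples x genes matrix.
    """
    sample_ids = np.asarray(sample_ids)
    samples = np.unique(sample_ids)
    profiles = np.vstack([expr[sample_ids == s].mean(axis=0) for s in samples])
    return samples, profiles


expr = np.array([
    [1.0, 0.0],    # S1, cell 1
    [3.0, 2.0],    # S1, cell 2
    [10.0, 10.0],  # S2, only cell
])
samples, profiles = pseudo_bulk_mean(expr, ["S1", "S1", "S2"])
print(samples, profiles)
```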

In the C1 subtype, the proportions of macrophages and malignant tumor cells were significantly higher than in the C2 subtype, whereas the proportions of other immune cell types were markedly lower. This suggests that abundant macrophages may act together with tumor cells to suppress anti-tumor immune responses, thereby enhancing tumor cell proliferation and invasion. Moreover, tumor-associated macrophages (TAMs) may sustain chronic inflammatory responses through mitochondria-associated pathways, promoting tumor progression and immune evasion66. This is consistent with the unfavorable prognosis of the C1 subtype. We therefore propose that the heightened mitochondrial metabolic activity in C1 may be linked to macrophage-mediated immunosuppression and tumor-promoting effects, a link further supported by the elevated infiltration of M2-polarized macrophages in this subtype.
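The compositional comparison can be sketched as computing, for each sample, the fraction of cells assigned to each cell type and then averaging those fractions within each subtype. The labels below are toy data and the helper names are hypothetical; in practice, subtype calls would come from the XGBoost classifier and cell types from annotation, and a statistical test (not shown) would be needed to call differences significant.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def cell_type_proportions(cell_types: Sequence[str],
                          sample_ids: Sequence[str]) -> Dict[Tuple[str, str], float]:
    """Fraction of each sample's cells assigned to each cell type."""
    totals = Counter(sample_ids)
    counts = Counter(zip(sample_ids, cell_types))
    return {(s, ct): n / totals[s] for (s, ct), n in counts.items()}


def mean_proportion(props, sample_subtype: Dict[str, str],
                    cell_type: str, subtype: str) -> float:
    """Average proportion of `cell_type` across samples of one subtype.

    Samples with no cells of that type contribute 0.
    """
    samples = [s for s, st in sample_subtype.items() if st == subtype]
    return sum(props.get((s, cell_type), 0.0) for s in samples) / len(samples)


cell_types = ["Macrophage", "Tumor", "Macrophage", "T cell",
              "Tumor", "T cell", "T cell", "Macrophage"]
sample_ids = ["S1", "S1", "S1", "S1", "S2", "S2", "S2", "S2"]
sample_subtype = {"S1": "C1", "S2": "C2"}  # hypothetical subtype calls

props = cell_type_proportions(cell_types, sample_ids)
for subtype in ("C1", "C2"):
    print(subtype, mean_proportion(props, sample_subtype, "Macrophage", subtype))
```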

Cell–cell communication analyses further indicated that interactions between macrophages and malignant tumor cells were closely associated with activation of the NF-κB signaling pathway. This suggests that macrophages may promote tumor survival and exert immunosuppressive functions through NF-κB activation, thereby facilitating tumor immune evasion. Reactive oxygen species (ROS) are critical signaling molecules for NF-κB activation, and mitochondria are one of their primary sources67. Mitochondria-derived ROS can activate NF-κB signaling in macrophages, leading to secretion of both pro-inflammatory and immunosuppressive cytokines and thus shaping a tumor-promoting microenvironment68. Mitochondrial regulation of macrophage activation states may therefore play a pivotal role in shaping the tumor microenvironment and may be a key factor influencing LUAD progression and patient prognosis.

Ligand–receptor interaction analysis offered further insight into the regulatory mechanisms of the tumor microenvironment in the C1 subtype. Notably, the interaction between galectin-9 (LGALS9) and prolyl 4-hydroxylase subunit beta (P4HB) was significantly enriched in macrophage-to-tumor cell signaling and may mediate T cell suppression via NF-κB pathway activation69, consistent with the significantly lower T cell infiltration we observed in the C1 subtype compared with C2. Moreover, binding of cyclophilin A (PPIA) to basigin (BSG) may facilitate tumor invasion and recruit M0 macrophages, consistent with the elevated proportion of M0 macrophages in the C1 subtype70. Importantly, the interaction between macrophage migration inhibitory factor (MIF) and CD74 was more prominent in the context of TP53 mutations, in line with the higher tumor mutational burden (TMB) and TP53 mutation rates observed in the C1 subtype71. Additionally, binding of midkine (MDK) to nucleolin (NCL) may inhibit wild-type EGFR signaling, potentially contributing to the increased sensitivity of C1 tumors to osimertinib.

Zhou et al. previously classified 1,076 TCGA-BRCA samples into two subtypes based on the expression of 29 mitophagy-related genes. Their cluster 1 showed high expression of mitophagy-related genes and poorer survival, consistent with our finding that the C1 subtype, which has a worse prognosis, exhibited significantly higher mitophagy pathway activity than C2. Interestingly, Zhou et al. also reported that triple-negative breast cancer (TNBC) and metastatic breast cancer were more prevalent in cluster 2, which may be attributable to excessive ROS levels reducing mitophagy gene expression in this cluster72. Wang et al. classified samples based on the expression of nuclear-encoded mitochondrial genes (NMGs), identifying three subtypes associated with immune cell infiltration patterns, and explored their potential involvement in cuproptosis, a recently discovered form of cell death with anti-cancer properties73.

In LUAD, previous studies have predominantly focused on the roles of specific genes in regulating mitochondrial function. In contrast, our study systematically established, for the first time, a molecular subtyping framework for LUAD based on mitochondria-related gene signatures through multi-omics analyses, and further elucidated the interplay between mitochondrial function and the tumor immune microenvironment, offering new insights for precision diagnosis and individualized treatment. Nonetheless, several limitations should be acknowledged. First, our study is based solely on bioinformatic analyses and lacks validation in clinical tissue specimens. Second, although the single-cell analysis included 64,667 cells from 37 samples, the relatively small number of samples may limit a comprehensive assessment of population heterogeneity. Future studies should validate these findings in larger cohorts.