Abstract
Lung adenocarcinoma (LUAD) is a common histological subtype of lung cancer, but its prognosis remains poor. Recent studies have suggested that liquid-liquid phase separation-related genes (LRGs) can significantly predict prognosis in several tumor types, including low-grade gliomas. Identifying LRGs associated with prognosis in LUAD could therefore have clinical value for predicting patient outcomes. Data were obtained from public databases (TCGA, GEO, and DrLLPS). Differentially expressed LRGs (DE-LRGs) were identified by differential expression analysis followed by intersection with the LRG set. Univariate Cox regression and the Least Absolute Shrinkage and Selection Operator (Lasso) method were used to shortlist prognostic genes, and a multivariate Cox regression model was fitted to build a prognostic risk model. Tumor samples were stratified into high- and low-risk groups based on the median risk score. Independent prognostic analysis and nomogram construction were performed in combination with clinical characteristics, and the immune characteristics of the two risk groups were compared. Additionally, single-cell RNA sequencing (scRNA-seq) data were used to identify cell clusters and annotate known cell types. A total of 389 DE-LRGs were identified, and 7 prognostic genes were selected to construct the risk model. Patients in the high-risk group had lower survival rates, and the nomogram showed high predictive accuracy. Clinical characteristics, immune status, and drug sensitivity differed significantly between the high- and low-risk groups. In the scRNA-seq data, 8 distinct cell types were annotated, and the prognostic genes GRIA1 and BCAN were more highly expressed in fibroblasts and mast cells, respectively. In summary, seven prognostic genes were identified, and the resulting prognostic model accurately predicted survival outcomes in LUAD patients. This model may inform prognostic assessment and personalized treatment of LUAD patients.
Introduction
Lung cancer is the most common malignancy worldwide, broadly categorized into small-cell lung cancer (SCLC) and non-small-cell lung cancer (NSCLC). Among NSCLC subtypes, lung adenocarcinoma (LUAD) is the most prevalent, followed by squamous cell lung cancer1. The treatment of advanced LUAD is typically based on chemotherapy. However, over the past two decades, the discovery of predictive biomarkers has opened up new therapeutic opportunities, including targeted therapies and immunotherapies, for advanced NSCLC patients. These therapies, especially immune checkpoint inhibitors (ICIs) and epidermal growth factor receptor tyrosine kinase inhibitors (EGFR TKIs), have significantly improved patient prognosis compared to conventional chemotherapy2,3,4. Despite these advances, LUAD is characterized by difficulties in early detection, frequent metastasis, poor response to radiotherapy, and low rates of surgical resection in advanced stages. Furthermore, the rise of drug resistance and the side effects of treatment remain critical challenges5,6. These factors underscore the urgent need for further research into the molecular mechanisms of LUAD and the identification of novel biomarkers to improve treatment strategies for advanced LUAD.
Membrane-less organelles (MLOs) can exhibit liquid-like properties through a physicochemical process known as liquid-liquid phase separation (LLPS), which is essential for various biological activities, such as those occurring in processing bodies (P-bodies), stress granules, and nuclear speckles7. In 2009, Hyman's team demonstrated phase separation by showing that P granules in the nematode Caenorhabditis elegans behave like liquid droplets. This study showed that cells can concentrate specific intracellular molecules through phase separation. LLPS enables the dynamic exchange of components between membrane-less organelles and their surroundings, helping to maintain a stable internal environment8. As research into LLPS mechanisms deepens, its involvement in various cellular functions, such as signal transduction, gene expression regulation, and stress responses, has become increasingly clear9. In recent years, studies have also linked LLPS to tumor development and progression. Prognostic signatures based on LLPS-related genes (LRGs) have been shown to significantly predict the prognosis of ovarian cancer, gastrointestinal malignancies, and low-grade gliomas; in these studies, high-risk patients generally had much lower survival rates than low-risk patients10,11,12,13. However, it remains unclear whether or how LRGs influence the prognosis and immune microenvironment of LUAD patients. Understanding the role of LRGs as potential prognostic biomarkers for LUAD could therefore contribute to more accurate LUAD classification and the development of personalized treatment strategies.
In this study, we aimed to identify LLPS-related prognostic genes in LUAD through a series of bioinformatics analyses of publicly available LUAD datasets. We then explored the prognostic value and potential mechanisms of these genes through enrichment analysis, immune correlation analysis, regulatory network construction, drug sensitivity analysis, and other methods. Additionally, we validated the expression of these prognostic genes using single-cell sequencing and RT-qPCR. Compared with studies of LLPS in other cancer types, this study integrated immune characteristics, clinical data, and drug sensitivity to provide references for personalized treatment14,15. Moreover, single-cell RNA sequencing was used to explore cellular heterogeneity within the tumor microenvironment. Several analytical methods, including Lasso regression and Cox regression models, were employed to construct a prognostic risk model. This study provides a relatively comprehensive foundation for understanding the role of LRGs in LUAD and potential insights for improving the treatment and prognosis of LUAD patients.
Methods
Data sources
Gene expression matrices of lung adenocarcinoma (LUAD) and normal lung tissue samples were downloaded from The Cancer Genome Atlas (TCGA) database (https://portal.gdc.cancer.gov/). Patients without follow-up time or with unknown survival status were excluded. Although these samples were removed, unrecorded confounders such as treatment history and comorbidities could not be accounted for because of limitations of the raw data. Clinical and survival data for LUAD were obtained from the University of California Santa Cruz (UCSC) Xena website (https://xena.ucsc.edu/). Data from 511 LUAD samples (tumor group) and 59 normal samples (normal group) formed the training set (TCGA-LUAD), which was used for the subsequent differential gene identification and prognostic model construction (Supplementary Table 1). The validation set (GSE31210) was obtained from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/); it was generated on the GPL570 platform and included 226 LUAD patient samples with survival data (tumor group) (Supplementary Table 2). GSE31210 was used to validate the prognostic model. A total of 3,611 liquid-liquid phase separation-related genes (LRGs) were obtained from the Data Resource of Liquid-Liquid Phase Separation (DrLLPS) database (http://llps.biocuckoo.cn/)16. The GSE131907 dataset was obtained from GEO for single-cell analysis; it was generated on the GPL16791 platform and contains data from 11 tumor tissue samples (tumor group) and 11 distant normal lung samples (normal group). Datasets were selected mainly for high-quality data sources, comprehensive sample information, and large sample size.
Specifically, the high-quality data sources included not only gene expression data but also rich clinical survival data, which were essential for constructing the LUAD prognostic model. Second, comprehensive sample information helped characterize differences in gene expression across clinical backgrounds and their impact on the prognosis of LUAD patients. Third, the large sample size reduced the risk of overfitting and improved the generalizability and accuracy of the model. Finally, the datasets supported the study of LRGs in LUAD prognosis, providing potential biomarkers and therapeutic targets for early diagnosis, personalized treatment, and prognostic assessment of LUAD.
Identification of differentially expressed genes (DEGs)
The R package “DESeq2” (v 1.38.0) was used to perform differential expression analysis between tumor and normal samples in TCGA-LUAD (|log2 fold change (FC)| > 1.5 and adjusted P value (adj. P) < 0.05)17. The R packages “ggplot2” (v 3.4.4) and “ComplexHeatmap” (v 2.15.1) were used to visualize the DEGs18,19. To obtain differentially expressed LRGs (DE-LRGs), the DEGs were intersected with the LRGs using the R package “ggvenn” (v 0.1.9), and the resulting DE-LRGs were treated as candidate genes and displayed in a Venn diagram20.
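The DEG thresholds and the intersection step can be sketched in Python as below. This is a minimal illustration, not the authors' pipeline (which used DESeq2 and ggvenn in R). It assumes a hypothetical table mapping each gene to its log2FC and adjusted P value, and the gene names and values in the demo are made up.

```python
def select_degs(results, lfc_cutoff=1.5, padj_cutoff=0.05):
    """Split genes passing |log2FC| > lfc_cutoff and adj. P < padj_cutoff into up/down sets.

    results: dict mapping gene -> (log2FC, adjusted P); adjusted P may be None,
    as DESeq2 reports NA for genes removed by independent filtering.
    """
    up, down = set(), set()
    for gene, (log2fc, padj) in results.items():
        if padj is None or padj >= padj_cutoff:
            continue  # not significant (or no adjusted P available)
        if log2fc > lfc_cutoff:
            up.add(gene)
        elif log2fc < -lfc_cutoff:
            down.add(gene)
    return up, down


def intersect_with_lrgs(results, lrgs, **cutoffs):
    """Return DE-LRGs: genes that are both DEGs and members of the LRG list."""
    up, down = select_degs(results, **cutoffs)
    return (up | down) & set(lrgs)


# Toy example with hypothetical genes and values (not study data)
toy = {
    "GENE_A": (3.2, 1e-20),   # strongly up, significant
    "GENE_B": (-2.1, 1e-6),   # strongly down, significant
    "GENE_C": (0.1, 0.90),    # unchanged
    "GENE_D": (1.6, 0.20),    # large change, not significant
    "GENE_E": (2.0, None),    # filtered by DESeq2 (NA adjusted P)
}
toy_lrgs = ["GENE_A", "GENE_B", "GENE_D", "GENE_F"]
print(sorted(intersect_with_lrgs(toy, toy_lrgs)))  # ['GENE_A', 'GENE_B']
```

Note that both cut-offs are strict inequalities, matching the "> 1.5" and "< 0.05" wording of the thresholds.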
Enrichment analysis of DE-LRGs and construction of the protein-protein interaction (PPI) network
To explore the functional pathways involving DE-LRGs, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were performed with the R package “clusterProfiler” (v 4.7.1.003) (adj. P < 0.05)21. To further understand DE-LRG interactions at the protein level, the DE-LRGs were uploaded to the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database (https://cn.string-db.org/) to build a PPI network (interaction score > 0.9). After isolated nodes were removed, the network was visualized with Cytoscape (v 3.9.1)22.
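The network-filtering step and a simple hub ranking can be sketched as follows. This is an illustration under assumptions: edges are given as (protein, protein, score) tuples with scores on STRING's 0-1 scale, and hubs are ranked by degree (number of neighbours), which is one common choice; the section does not state which centrality measure was used in Cytoscape. The protein names and scores are hypothetical.

```python
from collections import defaultdict


def build_network(edges, min_score=0.9):
    """Keep undirected edges with score > min_score; return an adjacency map."""
    adj = defaultdict(set)
    for a, b, score in edges:
        if score > min_score and a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def summarize(genes, edges, min_score=0.9):
    """Split input genes into connected and isolated nodes and count edges."""
    adj = build_network(edges, min_score)
    connected = [g for g in genes if g in adj]
    isolated = [g for g in genes if g not in adj]
    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2  # each edge counted twice
    return adj, connected, isolated, n_edges


def top_hubs(adj, k=5):
    """Rank nodes by degree (ties broken alphabetically) and return the top k."""
    return sorted(adj, key=lambda g: (-len(adj[g]), g))[:k]


# Hypothetical proteins and scores, for illustration only
edges = [("P1", "P2", 0.95), ("P1", "P3", 0.92), ("P2", "P3", 0.50), ("P4", "P5", 0.30)]
adj, connected, isolated, n_edges = summarize(["P1", "P2", "P3", "P4", "P5"], edges)
print(connected, isolated, n_edges, top_hubs(adj, k=2))
```

Using sets for neighbours means duplicate or reversed edge entries (A-B and B-A) are counted once.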
Screening for prognostic genes and constructing prognostic models
To assess the value of DE-LRGs in predicting overall survival (OS) in patients with LUAD, univariate Cox regression analysis was first performed with the R package “survival” (v 3.5-3) to shortlist genes associated with prognosis (P < 0.001)23. Next, Least Absolute Shrinkage and Selection Operator (Lasso) regression was performed with the R package “glmnet” (v 4.1-4) to reduce overfitting, and candidate genes were further selected at the optimal lambda determined by 10-fold cross-validation (family = “cox”, nfolds = 10)24. Finally, the genes retained by Lasso regression were subjected to the proportional hazards (PH) assumption test, and those satisfying the assumption were used as prognostic genes for model construction.
In addition, the expression of the prognostic genes in clinical samples was examined by reverse transcription-quantitative PCR (RT-qPCR). The study was approved by the ethics committee of the First Affiliated Hospital of Anhui Medical University, Hefei, China, before the experiments were conducted, and was performed in accordance with the Declaration of Helsinki. Five LUAD tissue samples and 5 distant normal lung samples were collected at the First Affiliated Hospital of Anhui Medical University from donors who gave informed consent; tissue collection followed the guidelines laid out by the GEO Ethics, Law, and Policy Group. Total RNA was extracted with TRIzol reagent (Ambion, 15596018CN, USA), and cDNA was synthesized with the SweScript First Strand cDNA Synthesis Kit (Servicebio, G3333-50, China). GAPDH served as the internal reference gene, and relative expression of the prognostic genes was calculated with the 2^−ΔΔCt method. Primer sequences are listed in Supplementary Table 3. Data were analyzed and visualized with GraphPad Prism (v 8.0), and differences between the two groups were assessed with the t-test; P < 0.05 was considered statistically significant.
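The 2^−ΔΔCt calculation can be written out as a short Python sketch. It assumes the calibrator is the mean ΔCt (target Ct minus GAPDH Ct) of the normal samples. This is a common convention, but the text does not state it, and the Ct values in the example are hypothetical.

```python
from statistics import mean


def delta_ct(ct_target, ct_reference):
    """ΔCt = Ct(target gene) - Ct(reference gene, e.g. GAPDH)."""
    return ct_target - ct_reference


def relative_expression(samples, calibrators):
    """Return 2^-ΔΔCt for each sample relative to the mean ΔCt of the calibrators.

    samples, calibrators: lists of (ct_target, ct_reference) pairs.
    """
    calibrator_dct = mean(delta_ct(t, r) for t, r in calibrators)
    return [2 ** -(delta_ct(t, r) - calibrator_dct) for t, r in samples]


# Hypothetical Ct values: normal tissue as calibrator, one tumor sample
normal = [(25.0, 18.0), (26.0, 19.0)]  # ΔCt = 7 in both
tumor = [(23.0, 18.0)]                 # ΔCt = 5, so ΔΔCt = -2
print(relative_expression(tumor, normal))   # [4.0]
print(relative_expression(normal, normal))  # [1.0, 1.0]
```

A lower Ct means more template, so a tumor ΔCt two cycles below the calibrator corresponds to a four-fold higher relative expression. The per-sample values would then be compared between groups with the t-test described above.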
Evaluation and validation of the prognostic model
In TCGA-LUAD, a risk score was calculated for each LUAD sample from the expression levels of the prognostic genes and their multivariate Cox regression coefficients:
$$\text{risk score}_i = \sum_{j=1}^{n} \text{Exp}_{ij} \times \text{Coef}_j$$
where $\text{Exp}_{ij}$ is the expression level of prognostic gene $j$ in sample $i$, $\text{Coef}_j$ is its multivariate Cox coefficient, and $n$ is the number of prognostic genes. Gene expression values were entered into the model as continuous variables without categorization. Samples were assigned to high- and low-risk groups according to the median risk score in TCGA-LUAD. The median cut-off is widely used in survival analyses because it divides the samples into two equally sized groups, facilitates comparison of survival between them, and is easy to interpret and reproduce. To evaluate the prognostic model, risk curves and survival status distributions were plotted, and Kaplan-Meier (KM) curves for the two groups were drawn with the R packages “ggplot2” (v 3.4.4) and “survminer” (v 0.4.9) to assess differences in survival25. In addition, receiver operating characteristic (ROC) curves were plotted for the TCGA-LUAD tumor samples with the R package “survivalROC” (v 1.0.3.1)26, and the area under the curve (AUC) was used to assess prediction accuracy (AUC > 0.6 was considered acceptable). Heatmaps were created with the R package “ComplexHeatmap” (v 2.15.1) to show the expression patterns of the prognostic genes in the two groups. To validate the risk model, risk scores for GSE31210 samples were calculated with the same formula, and the samples were divided into two groups at the median risk score. KM curves were then plotted with “survminer” (v 0.4.9) to compare survival between the groups, and ROC curves were drawn with “survivalROC” (v 1.0.3.1).
To further examine whether these 7 genes differ between high-risk and low-risk patients, their expression levels were compared between the two groups in the training and validation sets and displayed as boxplots.
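The risk score formula, the median split, and the Kaplan-Meier estimate behind the survival curves can be sketched in plain Python. The data structures, gene names, and coefficients below are hypothetical (the study's coefficients are in Table 1). The sketch assigns samples whose score equals the median to the low-risk group. That is an assumption, since tie handling is not stated, but for an odd number of samples, such as the 511 TCGA-LUAD tumors, it yields 255 high-risk and 256 low-risk samples, consistent with the split reported in the Results.

```python
from statistics import median


def risk_scores(expression, coefficients):
    """risk score_i = sum_j Exp_ij * Coef_j.

    expression: list of dicts (gene -> expression) for each sample.
    coefficients: dict gene -> multivariate Cox coefficient.
    """
    return [sum(coef * sample[gene] for gene, coef in coefficients.items())
            for sample in expression]


def median_split(scores):
    """Label samples 'high' if their score exceeds the median, else 'low'."""
    cutoff = median(scores)
    return cutoff, ["high" if s > cutoff else "low" for s in scores]


def kaplan_meier(times, events):
    """Kaplan-Meier estimate: list of (time, survival) at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival, curve, i = 1.0, [], 0
    while i < len(data):
        t, deaths, leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censorings both leave the risk set
    return curve


# Hypothetical two-gene model and three samples
coefs = {"GENE_1": 0.5, "GENE_2": -1.0}
samples = [{"GENE_1": 2.0, "GENE_2": 1.0},
           {"GENE_1": 4.0, "GENE_2": 0.5},
           {"GENE_1": 0.0, "GENE_2": 0.0}]
scores = risk_scores(samples, coefs)
print(scores, median_split(scores))
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

In practice one Kaplan-Meier curve would be computed per risk group; the log-rank test and time-dependent ROC used in the study are not reproduced here.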
Construction and evaluation of independent prognostic models
To assess the association between risk scores and clinical indicators, an independent prognostic analysis was performed. Univariate Cox regression analysis and the PH assumption test were performed for 7 variables (risk score, age, gender, TNM stage, T stage, N stage, and M stage) with the R package “survival” (v 3.5-3). Variables associated with prognosis (P < 0.05) that passed the PH assumption test (P > 0.05) were entered into multivariate Cox regression analysis, and variables with P < 0.05 in the multivariate forest plot were considered independent prognostic factors. To further validate the predictive ability of the independent prognostic factors in TCGA-LUAD, ROC curves were plotted with the R package “survivalROC”, and AUC values were used to evaluate performance (AUC > 0.6). Finally, principal component analysis (PCA) based on the prognostic genes was visualized with the R package “scatterplot3d” (v 0.3-44) to examine the distribution of samples27.
Nomogram construction and assessment
To quantitatively predict survival probability in LUAD, a nomogram based on the independent prognostic factors was constructed for the TCGA-LUAD tumor samples with the R package “rms” (v 6.5-0)28. Each independent prognostic factor corresponds to a score (Points), and the sum of these scores (Total Points) is used to predict the survival probability of a LUAD patient. Calibration curves, also constructed with “rms” (v 6.5-0) in the TCGA-LUAD tumor samples, were used to assess agreement between predicted and observed probabilities; the reference line represents perfect agreement, so the closer the curve lies to this line, the more reliable the prediction. Finally, decision curve analysis (DCA) curves were plotted with the R package “ggDCA” (v 1.2)29.
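The Points and Total Points logic of a Cox nomogram can be sketched as follows. This follows a common construction, in which the predictor with the widest range of β·x spans 0-100 points and the other predictors are scaled proportionally. It is a simplified illustration rather than the rms implementation: the variable names, coefficients, and ranges are hypothetical, and the survival prediction assumes a baseline survival S0(t) defined at all covariates equal to zero.

```python
import math


def make_points_fn(coefs, ranges, max_points=100.0):
    """Return a function mapping covariate values to nomogram points.

    coefs: dict variable -> Cox coefficient (beta).
    ranges: dict variable -> (lowest, highest) value shown on the nomogram axis.
    """
    spans = {v: abs(coefs[v]) * (hi - lo) for v, (lo, hi) in ranges.items()}
    scale = max_points / max(spans.values())  # widest predictor gets max_points

    def points(values):
        total = 0.0
        for v, beta in coefs.items():
            lo, hi = ranges[v]
            floor = min(beta * lo, beta * hi)  # value of beta*x that earns 0 points
            total += scale * (beta * values[v] - floor)
        return total

    return points


def predicted_survival(baseline_survival, coefs, values):
    """S(t | x) = S0(t) ** exp(sum_k beta_k * x_k), with S0 at covariates = 0 (assumption)."""
    linear_predictor = sum(beta * values[v] for v, beta in coefs.items())
    return baseline_survival ** math.exp(linear_predictor)


# Hypothetical model with a risk score and an ordinal stage variable
coefs = {"risk_score": 1.0, "stage": 0.5}
ranges = {"risk_score": (-1.0, 1.0), "stage": (1, 4)}
points = make_points_fn(coefs, ranges)
print(points({"risk_score": 1.0, "stage": 4}))  # total points for the worst profile
print(predicted_survival(0.9, coefs, {"risk_score": 0.0, "stage": 0}))
```

Because Total Points is a linear rescaling of the Cox linear predictor, reading survival off the Total Points axis is equivalent to evaluating S0(t) raised to exp of that predictor.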
Mutation analysis
To understand differences in mutations between the risk groups, mutation data for the two groups were analyzed with the R package “maftools” (v 2.14.0), and the 20 most frequently mutated genes were visualized30. Tumor mutation burden (TMB) values were then calculated from the somatic mutation data of each LUAD sample, and the association between risk scores and TMB values was examined. The Wilcoxon rank-sum test in the R package “rstatix” (v 0.7.2) (https://CRAN.R-project.org/package=rstatix) was used to compare TMB values across clinical characteristics (patient age). Because smoking is associated with marked epidemiological, pathological, molecular, and clinical differences between smokers and non-smokers31, somatic mutation data of LUAD patients were also downloaded from TCGA and analyzed separately for smokers and non-smokers with “maftools” (v 2.14.0), and the 20 most frequently mutated genes in each group were visualized30. The difference in TMB values between smokers and non-smokers was assessed with the Wilcoxon rank-sum test in “rstatix” (v 0.7.2).
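TMB and per-gene mutation frequencies can be computed from MAF records roughly as below. This is a simplified sketch, not the maftools implementation. It assumes each record is a dict with the standard MAF columns Tumor_Sample_Barcode, Hugo_Symbol, and Variant_Classification, counts a typical set of non-synonymous classifications, and divides by an assumed capture size of 38 Mb; the effective exome size used in the study is not stated. The records in the demo are hypothetical.

```python
from collections import Counter, defaultdict

# A typical set of non-synonymous MAF Variant_Classification values (assumption)
NON_SYNONYMOUS = {
    "Missense_Mutation", "Nonsense_Mutation", "Frame_Shift_Del", "Frame_Shift_Ins",
    "In_Frame_Del", "In_Frame_Ins", "Splice_Site", "Translation_Start_Site",
    "Nonstop_Mutation",
}


def tmb_per_sample(records, capture_mb=38.0):
    """Non-synonymous mutations per megabase for every sample in the MAF."""
    samples = {r["Tumor_Sample_Barcode"] for r in records}
    counts = Counter(r["Tumor_Sample_Barcode"] for r in records
                     if r["Variant_Classification"] in NON_SYNONYMOUS)
    return {s: counts[s] / capture_mb for s in samples}  # 0 if only silent mutations


def top_mutated_genes(records, k=20):
    """Genes ranked by the fraction of samples carrying a non-synonymous mutation."""
    samples = {r["Tumor_Sample_Barcode"] for r in records}
    carriers = defaultdict(set)
    for r in records:
        if r["Variant_Classification"] in NON_SYNONYMOUS:
            carriers[r["Hugo_Symbol"]].add(r["Tumor_Sample_Barcode"])
    ranked = sorted(carriers.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(gene, len(s) / len(samples)) for gene, s in ranked[:k]]


# Hypothetical records
maf = [
    {"Tumor_Sample_Barcode": "S1", "Hugo_Symbol": "GENE_X", "Variant_Classification": "Missense_Mutation"},
    {"Tumor_Sample_Barcode": "S1", "Hugo_Symbol": "GENE_Y", "Variant_Classification": "Silent"},
    {"Tumor_Sample_Barcode": "S2", "Hugo_Symbol": "GENE_X", "Variant_Classification": "Nonsense_Mutation"},
    {"Tumor_Sample_Barcode": "S2", "Hugo_Symbol": "GENE_Z", "Variant_Classification": "Frame_Shift_Del"},
    {"Tumor_Sample_Barcode": "S3", "Hugo_Symbol": "GENE_Y", "Variant_Classification": "Silent"},
]
print(tmb_per_sample(maf))
print(top_mutated_genes(maf, k=2))
```

Mutation frequency is computed per sample (a gene mutated twice in one tumor counts once), which is how "most frequently mutated genes" is usually read in oncoplots.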
Gene set enrichment analysis (GSEA)
To further investigate differences in gene function and underlying biological mechanisms between the risk groups, DEGs between the high- and low-risk groups in the training set were identified with the R package “DESeq2” (v 1.38.0), and genes were ranked by log2FC from largest to smallest. The hallmark gene set collection “h.all.v2023.2.Hs.symbols.gmt” from the Molecular Signatures Database (MSigDB) (https://www.gsea-msigdb.org/gsea/msigdb) was used as the reference gene set. GSEA was then performed in TCGA-LUAD with the R package “clusterProfiler” (v 4.7.1.003) to identify signaling pathways associated with risk scores (adj. P < 0.05), and the top 5 significantly enriched pathways were visualized with the R package “enrichplot” (v 1.24.2)32.
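The core of GSEA is the weighted running-sum enrichment score computed over the ranked gene list. The sketch below implements that statistic (weight p = 1) for a single gene set. It is illustrative only: clusterProfiler's GSEA additionally normalizes the score and estimates significance by permutation (for example through the fgsea package), which is not reproduced here. The genes and log2FC values are hypothetical.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """Weighted Kolmogorov-Smirnov-like enrichment score (ES).

    ranked: list of (gene, log2FC) sorted from largest to smallest log2FC.
    gene_set: set of gene symbols (e.g. one hallmark gene set).
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_miss = len(ranked) - sum(hits)
    hit_norm = sum(abs(stat) ** weight for (_, stat), hit in zip(ranked, hits) if hit)
    if hit_norm == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the list and leave some genes outside it")
    running, es = 0.0, 0.0
    for (_, stat), hit in zip(ranked, hits):
        # step up by the gene's weighted statistic on a hit, down by 1/n_miss otherwise
        running += abs(stat) ** weight / hit_norm if hit else -1.0 / n_miss
        if abs(running) > abs(es):
            es = running  # keep the maximum deviation from zero
    return es


ranked = [("G1", 3.0), ("G2", 2.0), ("G3", 1.0), ("G4", -1.0), ("G5", -2.0)]
print(enrichment_score(ranked, {"G1", "G2"}))  # set at the top: positive ES
print(enrichment_score(ranked, {"G4", "G5"}))  # set at the bottom: negative ES
```

A positive ES means the gene set is concentrated among genes up-regulated in the high-risk group (top of the list), and a negative ES means it is concentrated at the bottom.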
Immune infiltration analysis
To assess differences in immune status between the two groups in TCGA-LUAD, the CIBERSORT algorithm implemented in the R package “IOBR” (v 0.99.9) was used to estimate immune cell infiltration and derive a score for each immune cell type33,34. Differences in the infiltration of 22 immune cell types between the two groups were then analyzed with the Wilcoxon rank-sum test in the R package “rstatix” (v 0.7.2)35. To validate the robustness of the cell type deconvolution results, data from the xCell database (http://xcell.ucsf.edu/) were also used, and differences in 64 immune cell types between the high-risk and low-risk groups were analyzed with the same test. Finally, Spearman correlations among the differential immune cells, and between the prognostic genes and the differential immune cells, were calculated and visualized as correlation heatmaps with the R package “corrplot” (v 0.92)36.
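The Spearman correlation used for the gene-immune cell heatmaps is the Pearson correlation of ranks, with tied values given their average rank. A minimal Python sketch (not the R implementation; the inputs are hypothetical) is:

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks (undefined for constant input)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical gene expression and immune cell fractions for five samples
gene = [2.1, 3.4, 1.0, 5.2, 4.4]
cell = [0.10, 0.25, 0.05, 0.40, 0.30]
print(spearman(gene, cell))
```

Working on ranks makes the coefficient insensitive to the very different scales of expression values and deconvolution scores and robust to outliers.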
Immune checkpoint analysis
Immune checkpoints are regulatory molecules, expressed mainly by immune cells, that modulate immune responses and help prevent autoimmunity. To compare immune checkpoint expression between the two groups, 48 immune checkpoints were obtained from the literature37, and their expression levels were compared between the high- and low-risk groups in TCGA-LUAD (P < 0.05).
Drug sensitivity analysis
Chemotherapy is a common clinical treatment for malignant tumors. To evaluate the potential response of LUAD to anticancer drugs, the R package “pRRophetic” (v 0.5) was used to estimate the half-maximal inhibitory concentration (IC50) of 138 drugs for each LUAD tumor sample in the training set38; lower IC50 values indicate higher sensitivity. IC50 values of the 138 drugs were compared between the two groups, and correlations between the prognostic genes and the differential drugs were analyzed. To validate these results, IC50 values of the 138 drugs were also estimated for each tumor sample in the validation set with “pRRophetic” (v 0.5)38, and differences between the high-risk and low-risk groups for each drug were compared with the Wilcoxon test.
Building molecular regulatory networks
To show the regulatory relationships between the prognostic genes and transcription factors (TFs), the NetworkAnalyst platform (https://www.networkanalyst.ca/) was used to predict TFs of the prognostic genes, and the results were visualized as a network.
Single-cell data analysis
The 10x Genomics single-cell transcriptome data from GSE131907 were converted into Seurat objects with the R package “Seurat” (v 4.3.0)39. Genes detected in fewer than 3 cells and cells with fewer than 200 detected genes were first filtered out. Quality control criteria then required the number of genes detected per cell (nFeature_RNA) to be greater than 500 and less than 4,000, the total counts per cell (nCount_RNA) to be less than 4,000, and the percentage of mitochondrial reads (percent.mt) to be less than 5%. The data were normalized with the “NormalizeData” function, and the “FindVariableFeatures” function was used to select the top 2,000 highly variable genes for downstream analysis. The data were scaled with “ScaleData”, and a scree plot was drawn with “ElbowPlot”; appropriate principal components (PCs) were selected for subsequent analysis based on their contribution to the variance. Finally, the “JackStraw” function was used to calculate P values for genes in each PC, and the “ScoreJackStraw” function was used to quantify the significance of each PC; PCs enriched in genes with low P values were considered more statistically significant.
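The per-cell QC thresholds and Seurat's default LogNormalize transformation (counts divided by the cell's total, multiplied by a scale factor of 10,000, then natural-log transformed with log1p) can be sketched in Python. The sketch assumes counts are held as hypothetical per-cell dicts and that mitochondrial genes are identified by the human "MT-" prefix. It is not the Seurat code.

```python
import math


def qc_metrics(counts):
    """nFeature_RNA, nCount_RNA and percent.mt for one cell (gene -> UMI count)."""
    n_feature = sum(1 for c in counts.values() if c > 0)
    n_count = sum(counts.values())
    mt = sum(c for gene, c in counts.items() if gene.startswith("MT-"))  # human prefix (assumption)
    percent_mt = 100.0 * mt / n_count if n_count else 0.0
    return n_feature, n_count, percent_mt


def passes_qc(counts, min_feature=500, max_feature=4000, max_count=4000, max_mt=5.0):
    """Thresholds as stated in the text: 500 < nFeature < 4000, nCount < 4000, percent.mt < 5."""
    n_feature, n_count, percent_mt = qc_metrics(counts)
    return (min_feature < n_feature < max_feature
            and n_count < max_count
            and percent_mt < max_mt)


def log_normalize(counts, scale_factor=10000.0):
    """LogNormalize: log1p(count / total counts of the cell * scale_factor)."""
    total = sum(counts.values())
    return {gene: math.log1p(c / total * scale_factor) for gene, c in counts.items()}


# Hypothetical cells built programmatically
cells = {
    "good": {f"G{i}": 1 for i in range(600)},
    "too_few_genes": {f"G{i}": 1 for i in range(100)},
    "high_mt": {**{f"G{i}": 1 for i in range(570)}, **{f"MT-{i}": 1 for i in range(30)}},
    "too_many_counts": {f"G{i}": 10 for i in range(600)},
}
kept = [name for name, counts in cells.items() if passes_qc(counts)]
print(kept)  # ['good']
```

Normalizing each cell by its own total removes differences in sequencing depth between cells before highly variable genes are selected.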
Cell clustering analysis
Cells were clustered with the standard Seurat workflow, with the clustering resolution set to 0.1, and visualized by Uniform Manifold Approximation and Projection (UMAP). Clusters were annotated using marker genes reported in the literature, and the expression of these marker genes was plotted with the R package “ggplot2” (v 3.4.4) to check the accuracy of the annotation18,40. Finally, the “VlnPlot” and “DotPlot” functions in “Seurat” (v 4.3.0) were used to plot the expression of the prognostic genes in the tumor and normal groups and across cell clusters.
Cellular communication
To characterize interactions between cell types, cell-cell communication analysis was performed on the annotated cell types with the R package “CellChat” (v 1.6.1)41. After CellChat objects were created, the human ligand-receptor interaction database (CellChatDB.human) was imported and the data were pre-processed; cellular communication networks were then inferred and visualized as network diagrams.
GSEA of prognostic genes in scRNA-seq
To further explore the functions of the prognostic genes, GSEA was performed for each gene. First, tumor samples were divided into high- and low-expression groups using the median expression of each prognostic gene in TCGA-LUAD as the cut-off. Next, the “FindMarkers” function in “Seurat” (v 4.3.0) was used to identify genes that differed significantly between the two groups, and genes were ranked by log2FC from largest to smallest. Finally, with “c2.cp.kegg_medicus.v2023.2.Hs.symbols.gmt” from MSigDB as the reference gene set, GSEA was performed for each prognostic gene with the R package “clusterProfiler” (v 4.7.1.003) (adj. P < 0.05).
Statistical analysis
Bioinformatics analyses were performed in R (v 4.2.2). The Wilcoxon rank-sum test was used to compare differences between two groups, and P < 0.05 was considered statistically significant. Network diagrams were drawn with Cytoscape (v 3.9.1).
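For reference, the two-sided Wilcoxon rank-sum (Mann-Whitney U) test can be sketched with a normal approximation that includes tie and continuity corrections. This is similar to what R's wilcox.test does when samples are large or contain ties; for small samples without ties R computes an exact P value instead, which this sketch does not. The sample values are hypothetical.

```python
import math


def _average_ranks(values):
    """1-based ranks with ties assigned their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def wilcoxon_rank_sum(x, y):
    """Return (W, two-sided P) where W = rank sum of x minus n_x(n_x+1)/2."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    ranks = _average_ranks(combined)
    w = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    # tie correction: sum of (t^3 - t) over groups of tied values
    tie_counts = {}
    for v in combined:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return w, 1.0  # all values identical
    z = max(abs(w - mu) - 0.5, 0.0) / math.sqrt(var)  # continuity correction
    return w, math.erfc(z / math.sqrt(2))


print(wilcoxon_rank_sum([1, 2, 3, 4, 5], [6, 7, 8, 9, 10]))
```

The test is rank-based, so it does not assume normally distributed values such as TMB, immune cell scores, or IC50 estimates.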
Results
Identification of DE-LRGs
After threshold screening, a total of 3,411 DEGs were identified between tumor and normal samples, of which 2,305 were up-regulated and 1,106 were down-regulated (Fig. 1A–B, Supplementary Table 4). Intersecting the 3,411 DEGs with the 3,611 LRGs yielded 389 DE-LRGs for further analysis (Fig. 1C). DE-LRGs were significantly enriched in 682 GO terms. These comprised 481 biological process terms, significantly enriched in mitosis, nuclear division, mitotic cell cycle phase transition, and spindle-related processes; 129 cellular component terms, significantly enriched in spindle, mitotic spindle, neuron to neuron synapse, condensed chromosome, and others; and 72 molecular function terms, significantly enriched in structural constituent of cytoskeleton, alditol:NADP+ 1-oxidoreductase activity, alcohol dehydrogenase (NADP+) activity, microtubule binding, and others (Fig. 1D). Enrichment results were ranked by adjusted P value in ascending order. In KEGG analysis, DE-LRGs were significantly enriched in 10 pathways, including cell cycle, amphetamine addiction, cytoskeleton in muscle cells, oocyte meiosis, and long-term potentiation (Fig. 1E)42,43,44. In the PPI analysis, 295 of the 389 DE-LRGs had no interactions above the score threshold and were removed as isolated nodes; the resulting network contained 94 proteins and 469 edges. The five proteins with the most interactions were CDK1, KIF11, CCNA2, KIF2C, and CCNB1 (Fig. 1F).
Identification of DE-LRGs. (A) Volcano plot of differentially expressed genes between LUAD and normal samples from TCGA. Yellow dots represent up-regulated genes, green dots represent down-regulated genes, and gray dots represent genes with no significant difference. (B) Heatmap of the top 10 up-regulated and down-regulated differentially expressed genes between LUAD and normal samples from TCGA; yellow represents high expression and blue represents low expression. (C) Venn diagram of the 389 intersecting genes between DEGs and LRGs. (D) GO enrichment analysis of DE-LRGs. (E) KEGG enrichment analysis of DE-LRGs. (F) PPI network of DE-LRGs.
Screening of prognostic genes and construction of risk models
Univariate Cox regression analysis identified 36 candidate genes (P < 0.001), of which 3 had hazard ratios (HR) less than 1 and 33 had HR greater than 1 (Fig. 2A), indicating that most of these DE-LRGs were risk factors associated with poorer OS (HR > 1). Lasso regression analysis (lambda.min = 0.02257414) then selected 7 prognostic genes (Fig. 2B–C): CDC25C, BCAN, GPRIN1, ANLN, IGF2BP1, KRT6A, and GRIA1. All 7 genes passed the PH assumption test and met the requirements of multivariate Cox regression analysis (Supplementary Table 5), and were therefore used to construct the risk score model. The expression of the prognostic genes was further examined in clinical samples by RT-qPCR. GRIA1 was the only gene significantly down-regulated in LUAD (P < 0.0001), whereas CDC25C, GPRIN1, BCAN, IGF2BP1, and ANLN were significantly up-regulated in LUAD (P < 0.05); KRT6A did not reach significance (Fig. 2D). These results suggest that the prognostic genes may have diagnostic value and may play important roles in LUAD.
Screening of prognostic genes and construction of risk models. (A) Forest plot of univariate Cox regression analysis in DE-LRGs. (B–C) CDC25C, GPRIN1, GRIA1, BCAN, IGF2BP1, ANLN and KRT6A were identified through LASSO analysis. (D) The expression of prognostic genes in clinical samples. * p < 0.05, *** p < 0.001, **** p < 0.0001, ns: p > 0.05.
Assessment and validation of prognostic risk models
Risk scores were calculated for the 511 tumor samples in TCGA-LUAD (the seven prognostic genes and their coefficients are listed in Table 1). Patients were divided into high- and low-risk groups at the median risk score (-0.05597733). The high-risk group contained 255 samples (135 alive and 120 deceased), and the low-risk group contained 256 samples (191 alive and 65 deceased) (Fig. 3A, Supplementary Table 6), indicating lower survival in the high-risk group. KM curves showed a significant difference in survival between the two groups (P < 0.0001), with lower survival in the high-risk group (Fig. 3B). The AUC values for 1-, 3-, and 5-year survival were 0.71, 0.71, and 0.65, respectively (Fig. 3C), suggesting that the model had reasonable predictive accuracy. Six prognostic genes (BCAN, CDC25C, GPRIN1, ANLN, IGF2BP1, and KRT6A) were more highly expressed in the high-risk group, and one (GRIA1) was more highly expressed in the low-risk group (Fig. 3D). To further assess the model, samples in GSE31210 were divided into risk groups at the median risk score (0.7376919). The high-risk group contained 113 samples (87 alive and 26 deceased), and the low-risk group contained 113 samples (104 alive and 9 deceased), again indicating lower survival in the high-risk group (Fig. 3E, Supplementary Table 7). KM curves showed a significant difference in survival between the risk groups (P = 0.001) (Fig. 3F), and all AUC values were greater than 0.6 (Fig. 3G). The expression patterns of the seven prognostic genes in the two risk groups were similar to those observed in TCGA-LUAD (Fig. 3H).
In addition, boxplots showed that BCAN expression did not differ significantly between the high- and low-risk groups in GSE31210, whereas ANLN, CDC25C, GPRIN1, IGF2BP1, and KRT6A were significantly more highly expressed in the high-risk group and GRIA1 was significantly more highly expressed in the low-risk group (Fig. 3I). These trends were consistent with the expression of the prognostic genes between the two groups in TCGA-LUAD (Fig. 3J). In summary, the validation set results were consistent with those of TCGA-LUAD, supporting the predictive accuracy and validity of the risk score model.
Assessment and validation of prognostic risk models. (A) Survival curve for high and low risk groups from the TCGA-LUAD. (B) K–M survival analysis between high-risk and low-risk groups from the TCGA-LUAD. (C) ROC curves for 1/3/5 years of the TCGA-LUAD. (D) Heat map between high-risk and low-risk groups from the TCGA-LUAD. The pink represented high expression, and the blue represented low expression. (E) Survival curve for high and low risk groups from the dataset of GSE31210. (F) K–M survival analysis between high-risk and low-risk groups from the dataset of GSE31210. (G) ROC curves for 1/3/5 years of the dataset of GSE31210. (H) Heat map between high-risk and low-risk groups from the dataset of GSE31210. The pink represented high expression, and the blue represented low expression. (I) The expression of 7 prognostic genes in high and low risk groups for dataset GSE31210. (J) The expression of 7 prognostic genes in high and low risk groups for TCGA-LUAD. ** p < 0.01, **** p < 0.0001, ns: p > 0.05.
Construction and assessment of independent prognostic factors and a nomogram
We integrated clinical variables to build a model for predicting OS in patients with LUAD. In univariate Cox regression, five variables were associated with OS and passed the proportional hazards (PH) assumption test (Fig. 4A). After multivariate Cox regression, PH assumption testing, and P-value screening, risk score, T stage, and N stage were identified as independent prognostic factors (Fig. 4B). These independent prognostic factors showed favorable predictive ability (AUC > 0.6) (Fig. 4C–E). Furthermore, principal component analysis (PCA) of the seven prognostic genes showed that samples clustered well within their risk groups, clearly separating the two groups (Fig. 4F). To facilitate clinical application, we constructed a nomogram from the independent prognostic factors; for the TCGA-LUAD tumor samples, the 1-, 3-, and 5-year survival probabilities of LUAD patients were estimated from the total points (Fig. 4G). The calibration curves nearly overlapped the reference line (Fig. 4H), demonstrating the high accuracy of the nomogram in predicting OS in LUAD. In addition, decision curve analysis (DCA) indicated that the nomogram could provide clinical net benefit (Fig. 4I).
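A Cox-based nomogram turns each patient's covariates into total points, which are a linear rescaling of the model's linear predictor; predicted survival at time t is then S0(t) raised to the power exp(linear predictor), where S0(t) is the baseline survival. Decision curve analysis summarizes clinical usefulness as net benefit, NB = TP/n - FP/n × pt/(1 - pt), at a threshold probability pt. The sketch below illustrates both formulas; the coefficients for risk score, T stage, and N stage, the numeric stage coding, and the baseline survival values are hypothetical placeholders, not the fitted model behind Fig. 4.

```python
import math

# HYPOTHETICAL Cox coefficients and baseline survival, for illustration only.
# T and N stage are coded as plain numbers here for simplicity (an assumption).
COEFS = {"risk_score": 0.9, "T_stage": 0.3, "N_stage": 0.5}
BASELINE_SURVIVAL = {1: 0.95, 3: 0.80, 5: 0.70}  # S0(t) at 1, 3, and 5 years


def linear_predictor(patient, coefs=COEFS):
    """Sum of coefficient * covariate value (the quantity a nomogram's total points encode)."""
    return sum(coefs[name] * value for name, value in patient.items())


def survival_probability(baseline_survival, lp):
    """Cox model prediction: S(t | x) = S0(t) ** exp(lp)."""
    return baseline_survival ** math.exp(lp)


def net_benefit(predicted_risk, outcome, threshold):
    """Decision-curve net benefit when patients with predicted risk >= threshold are treated."""
    n = len(outcome)
    tp = sum(1 for r, y in zip(predicted_risk, outcome) if r >= threshold and y)
    fp = sum(1 for r, y in zip(predicted_risk, outcome) if r >= threshold and not y)
    return tp / n - fp / n * threshold / (1 - threshold)


if __name__ == "__main__":
    patient = {"risk_score": 0.4, "T_stage": 2, "N_stage": 1}  # made-up covariates
    lp = linear_predictor(patient)
    for year, s0 in BASELINE_SURVIVAL.items():
        print(year, round(survival_probability(s0, lp), 3))
```

A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds and comparing it with the "treat all" and "treat none" strategies.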
Construction and assessment of independent prognostic factors and a nomogram. (A–B) Univariate and multivariate Cox independent prognostic analyses. (C–E) ROC curves showing the favorable predictive ability of the independent prognostic factors (AUC > 0.6). (F) PCA of the seven prognostic genes, showing good clustering of samples within the risk groups. (G) Nomogram based on the independent prognostic factors; the 1-, 3-, and 5-year survival probabilities of LUAD patients are estimated from the total points. (H) Calibration curves of the predicted probabilities. (I) DCA of the nomogram model.
Differences in TMB and GSEA results between the risk groups
Mutation frequencies differed between the risk groups: the top three mutated genes were TP53, TTN, and MUC16 in the high-risk group (Fig. 5A) and TP53, TTN, and RYR2 in the low-risk group (Fig. 5B), and most mutations were missense mutations. The high-risk group had a significantly higher TMB than the low-risk group (P = 0.0000001), and the risk score was positively correlated with TMB (R = 0.33, P < 0.0000001) (Fig. 5C). TMB was also associated with clinical characteristics: patients older than 60 years had significantly higher TMB than those younger than 60 years (Fig. 5D). When TMB was analyzed by smoking status, gene mutation frequencies differed between smokers and non-smokers; the top three mutated genes were TP53, TTN, and CSMD3 in smokers and TP53, TTN, and MUC16 in non-smokers (Fig. 5E–F), and most mutations were again missense mutations. Smoking status was also significantly associated with TMB, with smokers having significantly higher TMB than non-smokers (Fig. 5G).
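TMB is commonly expressed as the number of non-synonymous somatic mutations per megabase of sequenced territory, and "top mutated genes" are ranked by the number of samples carrying such a mutation. The sketch below illustrates both calculations on MAF-style records (sample, gene, variant classification). The set of counted variant classes and the 38 Mb capture size are assumptions, not the settings used in this study, and the records in any usage are hypothetical.

```python
from collections import Counter, defaultdict

# MAF Variant_Classification values commonly counted as non-synonymous (an assumption;
# the exact filter used in the study is not stated).
NONSYNONYMOUS = {
    "Missense_Mutation", "Nonsense_Mutation", "Frame_Shift_Del", "Frame_Shift_Ins",
    "In_Frame_Del", "In_Frame_Ins", "Splice_Site", "Translation_Start_Site", "Nonstop_Mutation",
}


def tmb_per_sample(mutations, samples, capture_mb=38.0):
    """TMB = non-synonymous mutation count / captured territory in megabases.

    mutations: iterable of (sample, gene, variant_classification).
    capture_mb: HYPOTHETICAL exome size; substitute the value for the actual assay.
    Samples with no qualifying mutation get TMB 0.
    """
    counts = Counter(s for s, _, vc in mutations if vc in NONSYNONYMOUS)
    return {s: counts.get(s, 0) / capture_mb for s in samples}


def top_mutated_genes(mutations, k=3):
    """Genes ranked by the number of samples carrying a non-synonymous mutation."""
    carriers = defaultdict(set)
    for sample, gene, vc in mutations:
        if vc in NONSYNONYMOUS:
            carriers[gene].add(sample)
    ranked = sorted(carriers, key=lambda g: (-len(carriers[g]), g))  # ties broken by name
    return ranked[:k]
```

Grouping the per-sample TMB values by risk group, age group, or smoking status then gives the distributions compared in Fig. 5C, 5D, and 5G.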
Differences in TMB and GSEA results between the risk groups. (A) Mutation frequencies of genes in the high-risk group; the top 3 mutated genes were TP53, TTN, and MUC16. (B) Mutation frequencies of genes in the low-risk group; the top 3 mutated genes were TP53, TTN, and RYR2. (C) TMB was significantly higher in the high-risk group than in the low-risk group (P = 0.0000001), and the risk score was positively correlated with TMB (R = 0.33, P < 0.0000001). (D) TMB was significantly associated with patient age: patients older than 60 years had significantly higher TMB than those younger than 60 years. **p < 0.01. (E–F) TMB analysis in the smoking and non-smoking groups. (G) Difference in TMB between the smoking and non-smoking groups. (H) Main pathways significantly associated with the risk score in GSEA: hallmark E2F targets, hallmark MYC targets V1, hallmark mTORC1 signaling, and hallmark epithelial-mesenchymal transition.
GSEA between the two risk groups showed that the main pathways significantly associated with the risk score were hallmark E2F targets, hallmark MYC targets V1, hallmark mTORC1 signaling, and hallmark epithelial-mesenchymal transition (Fig. 5H).
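GSEA scores a gene set by walking down a list of genes ranked by their association with the phenotype (here, the risk groups) and tracking a running sum that rises at gene-set members and falls at other genes; the enrichment score (ES) is the running sum's maximum deviation from zero. A minimal sketch of this weighted running-sum statistic with weight p = 1 follows. It omits the permutation-based significance, normalization (NES), and multiple-testing correction that GSEA software adds, and the choice of ranking metric is left to the caller as an assumption.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Weighted running-sum enrichment score (ES).

    ranked_genes: genes sorted by decreasing ranking metric; scores: that metric, same order.
    Returns the running-sum value with the largest absolute deviation from zero.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_miss = len(hits) - sum(hits)
    hit_weight = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if hit_weight == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list without covering all of it")
    running = best = 0.0
    for s, h in zip(scores, hits):
        # Members step up in proportion to their weight; non-members step down equally.
        running += abs(s) ** p / hit_weight if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best
```

A positive ES means the set's genes concentrate at the top of the ranking (for example, genes up-regulated in the high-risk group); a negative ES means they concentrate at the bottom.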
Differences and correlations in immune infiltration between the two risk groups
Figure 6A illustrates the distribution of 22 immune cell types in the two groups of samples. The levels of nine immune cell types differed markedly between the groups. Among these differential immune cells, M0 macrophages, M1 macrophages, activated memory CD4 T cells, and resting natural killer (NK) cells showed higher infiltration in the high-risk group, whereas resting memory CD4 T cells, resting mast cells, resting dendritic cells, monocytes, and memory B cells showed higher infiltration in the low-risk group (Fig. 6B). Among the differential immune cells, the strongest positive correlation was between monocytes and resting dendritic cells (cor = 0.41, P < 0.05), and the strongest negative correlation was between M0 macrophages and monocytes (cor = −0.42, P < 0.05) (Fig. 6C). Between prognostic genes and immune cells, the strongest positive correlation was between ANLN and activated memory CD4 T cells (cor = 0.47, P < 0.05), and the strongest negative correlation was between CDC25C and resting mast cells (cor = −0.38, P < 0.05) (Table 2, Fig. 6D). This suggests that the prognostic genes might alter the immune microenvironment of LUAD by affecting these differential immune cells. Using xCell, we found significant differences in the infiltration of 46 cell types between the high- and low-risk groups (P < 0.05), including adipocytes, B cells, and astrocytes (Supplementary Fig. 1A). Spearman correlation analysis of these differential cell types showed that Th1 cells had the strongest positive correlation with pro-B cells (cor = 0.63, P < 0.05), while HSCs had the strongest negative correlation with Th1 cells (cor = −0.62, P < 0.05) (Supplementary Fig. 1B).
Furthermore, ANLN, GPRIN1, IGF2BP1, KRT6A, and CDC25C were positively correlated with Th2 cells, Th1 cells, pro-B cells, and plasmacytoid dendritic cells (pDCs) and negatively correlated with NKT cells, myocytes, and HSCs, whereas GRIA1 showed the opposite pattern (Supplementary Fig. 1C).
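Spearman's correlation, used above for the xCell analysis, is the Pearson correlation computed on ranks, so it captures monotonic rather than strictly linear association between infiltration scores or between gene expression and infiltration. The standard-library sketch below assigns tied values their average rank and returns only the coefficient; P values and any multiple-testing correction are omitted, and whether the correlations in Fig. 6C–D used the same method is not stated in this section.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

For example, a perfectly monotonic but nonlinear relationship such as y = x² on positive x gives rho = 1, whereas Pearson's r would be below 1.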
Differences and correlations in immune infiltration between the two risk groups. (A) Distribution of 22 immune cell types in the two groups of samples. (B) Differences in immune cell infiltration between the high- and low-risk groups. ** p < 0.01, *** p < 0.001, **** p < 0.0001. (C) Correlations among the differential immune cells; yellow indicates positive and blue negative correlation. (D) Correlations between the prognostic genes and the differential immune cells; pink indicates positive and blue negative correlation. *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001. (E) Expression of immune checkpoints in the high- and low-risk groups. *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001. (F) IC50 values of 95 drugs differed significantly between the risk groups (P < 0.05); the three most significant drugs were A.443654, BI.2536, and CCT007093. (G) Correlations between the prognostic genes and the 95 drugs whose IC50 differed significantly (P < 0.05) between the risk groups. (H) Correlation analysis of 76 drugs with the prognostic genes. (I) The three most significant drugs between the high- and low-risk groups in the validation set (A.443654, CGP.082996, RO.3306). **** p < 0.0001. (J) TF–mRNA regulatory network of the seven prognostic genes and 35 TFs.
Immune checkpoints and drug sensitivity in the risk groups
In total, 31 immune checkpoints were differentially expressed between the two groups (P < 0.05): 8 (CD274, CD276, CD70, LAG3, TNFRSF18, TNFRSF9, TNFSF4, and TNFSF9) were more highly expressed in the high-risk group, and the other 23 were more highly expressed in the low-risk group (Fig. 6E). The IC50 values of 138 chemotherapeutic agents were estimated for each LUAD sample in TCGA-LUAD; 95 drugs differed significantly in IC50 between the risk groups (P < 0.05), and the three most significant were A.443654, BI.2536, and CCT007093 (Fig. 6F). Correlation analysis between the prognostic genes and these 95 drugs showed significant positive correlations between Parthenolide and GRIA1 (cor = 0.43804), Nutlin.3a and ANLN (cor = 0.45854), and AS601245 and ANLN (cor = 0.47382), and significant negative correlations between BI.D1870 and CDC25C (cor = −0.67097), JNK.Inhibitor.VIII and CDC25C (cor = −0.64425), and JNK.Inhibitor.VIII and ANLN (cor = −0.63337) (Fig. 6G). In the validation set, the IC50 values of the 138 anticancer drugs were also analyzed, and 76 drugs differed significantly between the high- and low-risk groups (P < 0.05) (Fig. 6H); the three most significant (A.443654, CGP.082996, and RO.3306) are shown in Fig. 6I. These results may help guide risk-stratified treatment of LUAD patients.
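This section does not name the test behind the P < 0.05 comparisons of checkpoint expression and estimated IC50 between risk groups. The Wilcoxon rank-sum (Mann–Whitney U) test is a common choice for such two-group comparisons and is assumed in the sketch below. It uses the large-sample normal approximation without tie or continuity correction, so exact P values from statistical packages will differ somewhat for small or heavily tied samples.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Wilcoxon rank-sum / Mann-Whitney U with a normal approximation (no tie correction).

    Returns (U statistic for x, two-sided P value).
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    return u, math.erfc(abs(z) / math.sqrt(2))
```

Applied drug by drug (or checkpoint by checkpoint) to the high- and low-risk IC50 values, this yields one P value per drug; with 138 drugs, a multiple-testing correction would normally be considered as well.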
TF–mRNA regulatory network of the seven prognostic genes and 35 TFs
The regulatory network visualized the transcriptional regulatory relationships of the seven prognostic genes. In total, 35 TFs were predicted for the seven prognostic genes: 12 TFs interacted with IGF2BP1, 10 with GPRIN1, 9 with GRIA1, 8 with ANLN, 7 with BCAN, 7 with CDC25C, and 6 with KRT6A. Among them, FOXC1 and HINFP interacted with several prognostic genes (Fig. 6J). These predictions may provide a basis for targeted regulation of prognostic gene expression.
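A TF–target network can be summarized by counting the distinct TFs predicted for each target gene and flagging TFs that regulate more than one target, such as FOXC1 and HINFP above. Because a shared TF is counted once for each of its targets, the per-gene counts (12 + 10 + 9 + 8 + 7 + 7 + 6 = 59) exceed the 35 distinct TFs. The sketch below illustrates this bookkeeping; the edges in the usage example use hypothetical placeholder names (TF_A, TF_B, TF_C), not predictions from this study.

```python
from collections import defaultdict


def summarize_tf_network(edges):
    """edges: iterable of (tf, target_gene) pairs from a TF-target prediction.

    Returns (number of distinct TFs per target gene, sorted TFs regulating more than one target).
    """
    tfs_per_gene = defaultdict(set)
    genes_per_tf = defaultdict(set)
    for tf, gene in edges:
        tfs_per_gene[gene].add(tf)
        genes_per_tf[tf].add(gene)
    counts = {gene: len(tfs) for gene, tfs in tfs_per_gene.items()}
    shared = sorted(tf for tf, genes in genes_per_tf.items() if len(genes) > 1)
    return counts, shared


if __name__ == "__main__":
    # Placeholder edges; TF_A, TF_B, and TF_C are HYPOTHETICAL names, not study predictions.
    edges = [("TF_A", "IGF2BP1"), ("TF_A", "GPRIN1"), ("TF_B", "IGF2BP1"), ("TF_C", "KRT6A")]
    print(summarize_tf_network(edges))
```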
Identification of tumor cell subtypes
Screening of GSE131907 yielded 88,114 cells and 25,498 genes (Supplementary Fig. 2). After quality control, 74,944 cells and 25,498 genes were retained (Supplementary Fig. 3). After normalization, the top 2,000 highly variable genes were selected for downstream analysis, and the top 10 are labeled in Supplementary Fig. 4. A scree plot of the variance explained by the top 50 principal components (PCs) levelled off after about 30 PCs, indicating that the true signal lies mainly in the first 30 or so PCs (Supplementary Fig. 5). A subsequent analysis likewise showed that the first 30 PCs were more statistically significant (Supplementary Fig. 6), so the first 30 PCs were used for dimensionality reduction. Cells were then clustered and visualized with UMAP, yielding 24 distinct cell clusters, and the cells of each sample were assigned to these clusters (Supplementary Figs. 7, 8). Subsequently, 31 marker genes were used to annotate the clusters (Table 3), identifying 8 cell types: epithelial cells, fibroblasts, endothelial cells, T lymphocytes, NK cells, B lymphocytes, myeloid cells, and mast cells (Fig. 7A). Bubble plots of the 31 marker genes across the 24 clusters and 8 cell types showed that each marker was specifically highly expressed in its annotated cell type (Fig. 7B, C), supporting the reliability of the annotation. Finally, the candidate gene GRIA1 was more highly expressed in fibroblasts, and BCAN was more highly expressed in mast cells (Fig. 7D).
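The normalization and highly variable gene (HVG) steps above are commonly performed with Seurat, whose default LogNormalize method divides each cell's counts by its total, multiplies by a scale factor of 10,000, and applies a natural log(1 + x) transform. The sketch below reproduces that normalization on a small dictionary-of-counts layout (an illustrative assumption; real data are sparse matrices). The HVG ranking by plain variance is a simplified stand-in for Seurat's default variance-stabilizing ("vst") selection, and the exact tool and settings used in this study are not stated here.

```python
import math
from statistics import pvariance


def log_normalize(counts, scale_factor=10_000):
    """counts: {cell: {gene: raw UMI count}} -> {cell: {gene: log1p(count / total * scale_factor)}}.

    Cells with zero total counts are skipped (they would normally be removed by quality control).
    """
    normalized = {}
    for cell, genes in counts.items():
        total = sum(genes.values())
        if total == 0:
            continue
        normalized[cell] = {g: math.log1p(c / total * scale_factor) for g, c in genes.items()}
    return normalized


def top_variable_genes(normalized, k):
    """Rank genes by variance of normalized expression across cells (a simplified HVG criterion)."""
    genes = sorted({g for cell in normalized.values() for g in cell})
    variance = {g: pvariance([cell.get(g, 0.0) for cell in normalized.values()]) for g in genes}
    return sorted(genes, key=lambda g: (-variance[g], g))[:k]
```

In the workflow described above, k would be 2,000, and the selected genes would then feed the PCA whose first 30 components were retained.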
Single-cell data analysis. (A) Annotation of the cell clusters with 31 marker genes, identifying 8 cell types (epithelial cells, fibroblasts, endothelial cells, T lymphocytes, NK cells, B lymphocytes, myeloid cells, and mast cells). (B) Bubble plot of the expression of the 31 marker genes across the 24 cell clusters. (C) Bubble plot showing specific high expression of the 31 marker genes in each annotated cell type. (D) The candidate gene GRIA1 was more highly expressed in fibroblasts, and BCAN was more highly expressed in mast cells. Red indicates high expression and blue low expression. (E–F) Cell–cell communication networks showing the number and strength of interactions; both were highest between epithelial cells and myeloid cells.
In addition, cell–cell communication networks among the eight cell types were inferred. Interactions were most numerous between epithelial cells and myeloid cells and between fibroblasts and myeloid cells, whereas interaction strength was greatest between epithelial cells and myeloid cells, epithelial cells and B lymphocytes, and fibroblasts and B lymphocytes. Overall, both the number and the strength of interactions were highest between epithelial cells and myeloid cells (Fig. 7E, F).
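Tools such as CellChat summarize inferred ligand–receptor communication between cell types as two matrices: the number of interactions and their total strength (summed communication probability) for each sender–receiver pair. The tool used here is not named in this section, so the sketch below only illustrates that aggregation; the probability threshold is a simplified stand-in for a significance filter, and all interaction records in the usage example are hypothetical.

```python
from collections import defaultdict


def aggregate_communication(interactions, min_probability=0.0):
    """Summarize ligand-receptor records into per-pair interaction number and strength.

    interactions: iterable of (sender, receiver, ligand_receptor_pair, probability).
    Records with probability <= min_probability are ignored (a stand-in for a significance filter).
    Returns ({(sender, receiver): count}, {(sender, receiver): summed probability}).
    """
    count = defaultdict(int)
    weight = defaultdict(float)
    for sender, receiver, _pair, probability in interactions:
        if probability > min_probability:
            count[(sender, receiver)] += 1
            weight[(sender, receiver)] += probability
    return dict(count), dict(weight)


if __name__ == "__main__":
    # HYPOTHETICAL records for illustration only.
    records = [
        ("Epithelial", "Myeloid", "L1-R1", 0.20),
        ("Epithelial", "Myeloid", "L2-R2", 0.10),
        ("Fibroblast", "Myeloid", "L1-R1", 0.05),
    ]
    count, weight = aggregate_communication(records)
    print(max(count, key=count.get), max(weight, key=weight.get))
```

Keeping count and strength separate matters because, as in Fig. 7E and F, the pairs with the most interactions are not necessarily those with the strongest ones.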
Based on the single-cell data, GSEA was used to infer the functional pathways in which the seven prognostic genes might be involved. The prognostic genes were mainly enriched in translation initiation, pathogen SARS CoV 2 Nsp1 to translation initiation, nt06530 PI3K signaling, presentation by MHC class II molecules, nt06515 Regulation of kinetochore-microtubule interactions, and other related pathways (Fig. 8A–G). This suggests that the prognostic genes may influence the course of LUAD through these pathways; if confirmed, these findings could provide new ideas for targeted treatment of LUAD.
GSEA of the prognostic genes in scRNA-seq data. (A–G) GSEA indicated that the seven prognostic genes were mainly enriched in translation initiation, pathogen SARS CoV 2 Nsp1 to translation initiation, nt06530 PI3K signaling, presentation by MHC class II molecules, nt06515 Regulation of kinetochore-microtubule interactions, and other related pathways.
Discussion
Lung cancer is one of the leading causes of cancer-related death worldwide. The incidence of LUAD remains high, and prognosis varies considerably among LUAD patients. LRGs have been shown to significantly predict the prognosis of low-grade tumors; however, whether and how LRGs influence the prognosis of LUAD patients remains unclear. The role of LLPS in cancer biology is complex and controversial. Some studies show that LLPS plays a critical regulatory role in cancer cells by coordinating cellular activities in space and time and that it supports normal cell function, while its dysregulation is considered a potential driver of cancer development45. Other studies suggest that LLPS may be primarily detrimental in cancer, with its dysregulation disrupting normal cellular functions and promoting tumor progression46. Still others argue that, despite this harmful potential, specifically targeting the formation of LLPS condensates might offer new therapeutic strategies for cancer47. These inconsistent findings likely reflect differences in study design, experimental models, or molecular context; future studies using new technologies and more precise experimental designs could resolve these discrepancies and further clarify the potential applications of LLPS in cancer. In this study, using data from the TCGA and GEO databases together with LRGs, we performed univariate Cox, LASSO, and multivariate Cox analyses to identify prognostic genes and establish a reliable prognostic model.
Univariate Cox regression and LASSO analysis identified seven LLPS-related prognostic genes: CDC25C, GPRIN1, GRIA1, BCAN, IGF2BP1, ANLN, and KRT6A. A risk model was constructed from the resulting risk score, and LUAD patients were divided into high- and low-risk groups at the median risk score. A previous study identified five LLPS-related genes (IGF2BP1, KRT6A, LDHA, PKP2, and PLK1) associated with LUAD prognosis, proposed a novel model that effectively predicted the prognosis of LUAD patients, and highlighted KRT6A as a potential key player in the LLPS-related malignant progression of LUAD48; IGF2BP1 and KRT6A were also identified in our study. In another study, a prognostic tool based on core genes including GRIA1, CRTAC1, MAGEA4, and MAPK4 effectively predicted the prognosis of LUAD patients and offered new perspectives on potential biomarkers49; GRIA1 was likewise among our prognostic genes. Going beyond previous studies, we examined the expression of the prognostic genes across cell types using single-cell data and additionally identified CDC25C, GPRIN1, BCAN, and ANLN as genes associated with the LLPS process in LUAD. This expands the set of potential therapeutic targets and provides additional resources for disease research.
CDC25C is a critical cell-cycle regulatory phosphatase that drives the transition from G2 phase to mitosis by activating cyclin-dependent kinase 1 (CDK1). In lung cancer, aberrant CDC25C expression has been implicated in tumor development and progression: overexpression can bypass essential cell-cycle checkpoints and promote the G2/M transition, resulting in uncontrolled cell proliferation and genomic instability. Poly(U)-binding splicing factor 60 (PUF60) acts upstream of the CDC25C pathway, and one study showed that PUF60 depletion inhibits the G2/M transition, cell proliferation, and tumor progression in LUAD cells50. Subsequent research found that high CDC25C protein expression in LUAD is associated with invasive histological features and poor outcomes, and that CDC25C expression is upregulated in LUAD compared with normal lung tissue51. CDC25C has therefore been identified as a prognostic biomarker for LUAD patients and a key prognostic marker for NSCLC. Studies have further shown that CDC25C regulates the cell cycle and key metabolic pathways, such as glycolysis and oxidative phosphorylation, thereby influencing the proliferation and progression of tumor cells52. In our study, CDC25C emerged from the LRG-based analysis as a prognostic gene for LUAD, supporting a link between LRGs and CDC25C and consistent with previous findings. Future research could validate the prognostic role of CDC25C in LUAD in larger clinical cohorts and explore in greater detail how CDC25C-mediated signaling contributes to the occurrence and progression of LUAD.
GRIA1, which encodes the alpha-amino-3-hydroxy-5-methyl-4-isoxazolepropionic acid (AMPA) receptor subunit 1, is traditionally known for its role in excitatory neurotransmission within the central nervous system. Recent studies, however, have implicated GRIA1 in the oncogenic processes of lung cancer. Aberrant GRIA1 expression has been observed in NSCLC tissues, where it may influence cellular proliferation, apoptosis, and migration through modulation of glutamatergic signaling pathways53. Elevated GRIA1 levels have been associated with poorer prognosis in LUAD patients, suggesting its potential as a prognostic biomarker. Additionally, GRIA1-mediated pathways might offer novel therapeutic targets, as inhibition of GRIA1 activity has shown promise in reducing tumor growth and metastasis in a preclinical model54. Moreover, recent studies have identified GRIA1 as a prognostic gene associated with overall survival in LUAD55. Understanding the mechanistic role of GRIA1 in lung cancer progression could thus provide valuable insights for developing targeted interventions aimed at improving patient outcomes.
GPRIN1 (G-protein-regulated inducer of neurite outgrowth 1) has been identified as a potential oncogene in lung cancer, particularly in NSCLC. Studies have demonstrated that GPRIN1 is overexpressed in NSCLC tissues compared with normal lung tissues and that this overexpression correlates with poorer overall survival. Functional assays indicate that GPRIN1 promotes NSCLC cell proliferation, migration, and invasion, potentially by facilitating epithelial-mesenchymal transition (EMT)56. In addition, through m6A modification, GPRIN1 may interact with various cancer-related signaling pathways (such as the cGMP-PKG and Rap1 signaling pathways) and cellular processes (such as the cell cycle and DNA damage response), thereby influencing the onset, progression, and prognosis of LUAD57. These findings suggest that GPRIN1 could serve as a prognostic biomarker and a therapeutic target in LUAD management.
Brevican (BCAN) is a chondroitin sulfate proteoglycan found predominantly in the central nervous system, but recent studies suggest that it may contribute to the progression of various cancers, including lung cancer. BCAN interacts with extracellular matrix components and thereby influences cell adhesion, migration, and invasion, processes critical to tumor metastasis. In lung cancer, particularly NSCLC, elevated BCAN expression has been associated with increased tumor cell proliferation and enhanced metastatic potential58. BCAN's roles in regulating the tumor microenvironment and in cell-cell interactions make it an interesting candidate diagnostic marker and therapeutic target. Moreover, in the context of lactate and hypoxic metabolism, BCAN may influence the biological behavior and prognosis of LUAD by regulating pathways associated with the cell cycle and metabolism, offering potential targets for personalized treatment59. Further studies are needed to fully elucidate BCAN's specific role in LUAD pathogenesis and to assess its clinical relevance.
Insulin-like growth factor 2 mRNA-binding protein 1 (IGF2BP1) is an RNA-binding protein that plays a critical role in post-transcriptional regulation by stabilizing mRNAs and promoting their translation. Recent studies have highlighted the significant involvement of IGF2BP1 in cancer biology, including lung cancer. In NSCLC, elevated IGF2BP1 expression has been associated with increased tumor cell proliferation, migration, and resistance to apoptosis, all of which contribute to cancer progression and metastasis60. IGF2BP1 promotes the stability and translation of mRNAs involved in cell-cycle regulation and metastasis, such as c-Myc and TGF-β. High levels of IGF2BP1 in lung cancer tissues have been linked to poor prognosis, suggesting that IGF2BP1 could serve as a valuable biomarker for LUAD61. Additionally, studies have shown that IGF2BP1 plays a crucial role in CDCA4-induced cell proliferation. It may influence the onset and progression of LUAD through its interaction with the PI3K/AKT signaling pathway. Targeting IGF2BP1-mediated pathways might therefore offer a potential therapeutic strategy to suppress tumor growth and metastasis in LUAD.
Anillin (ANLN) is an actin-binding protein involved in regulating cytokinesis and maintaining cell shape and integrity. In the context of lung cancer, particularly NSCLC, ANLN has emerged as a potential oncogene. Overexpression of ANLN has been linked to enhanced cell proliferation, migration, and invasion, which are hallmarks of cancer progression. Studies have shown that high ANLN expression correlates with poor prognosis in lung cancer patients, suggesting that ANLN plays a critical role in tumor aggressiveness and metastasis62. Mechanistically, ANLN interacts with the actin cytoskeleton and is involved in regulating key signaling pathways, including those related to cell cycle progression and EMT63. Targeting ANLN may thus provide a novel therapeutic approach for inhibiting tumor progression in LUAD. Given its association with poor survival outcomes, ANLN could also serve as a valuable biomarker for predicting disease prognosis in LUAD patients. Future studies could delve deeper into the specific mechanisms of ANLN in LUAD, particularly focusing on its relationship with the cell cycle and EMT signaling pathways.
Keratin 6A (KRT6A) is a type II keratin protein that plays a crucial role in maintaining the structural integrity of epithelial cells. In NSCLC, KRT6A has been identified as a potential biomarker associated with tumor progression. Elevated KRT6A expression has been observed in NSCLC tissues compared with normal lung tissues, suggesting its involvement in the malignant transformation of lung epithelial cells64. Functionally, KRT6A contributes to cellular processes such as proliferation, migration, and invasion, which are essential for cancer metastasis. Moreover, KRT6A expression has been linked to poor prognosis in LUAD patients, indicating its potential as a prognostic marker65. Research has found that KRT6A may regulate tumor progression in LUAD by influencing signaling pathways associated with FA metabolism66. These findings highlight the importance of KRT6A in LUAD pathogenesis and underscore its potential as a therapeutic target for strategies aimed at inhibiting tumor progression.
The immune system is closely linked to the onset and progression of LUAD. Our immune infiltration analysis revealed significant correlations among immune cells and showed that the prognostic genes were closely related to differential immune cell infiltration. Studies suggest that CDC25C regulates the tumor cell cycle and is associated with modulation of the immune microenvironment and the efficacy of immunotherapy67. Additionally, GPRIN1 expression is closely linked to the tumor immune microenvironment, immune cell infiltration, immune markers, and immune checkpoint regulation, and GPRIN1 may play a crucial role in immune evasion, immunotherapy response, and prognosis in kidney renal papillary cell carcinoma (KIRP) and LUAD68. In the study by Changrui Qian et al., GRIA1 was found to be closely associated with immune responses, immune checkpoint expression, and the activation of tumor signaling pathways in LUAD, indicating its potential role in the LUAD immune microenvironment and its possible utility as an immunotherapy target69. Furthermore, BCAN may significantly affect the immune microenvironment, prognosis, and treatment response in LUAD by regulating lactate metabolism, hypoxic responses, and immune cell infiltration59. Research by JinFeng Liu et al. suggests that IGF2BP1, through its regulation of immune cell infiltration, immune checkpoint expression, and TMB, could play a key role in tumor immune evasion, immunotherapy response, and prognosis70. Another study indicates that ANLN may influence immune cell infiltration and TMB, potentially playing an important role in the immune microenvironment and treatment responses in LUAD71. Additionally, KRT6A, as an early prognostic gene, may contribute to immune evasion, tumor development, and treatment responses in LUAD through its regulation of the immune microenvironment and hypoxic responses72.
These findings suggest that prognostic genes may influence immune cell infiltration and alter the immune microenvironment, subsequently affecting LUAD prognosis and therapeutic outcomes.
Next, based on the single-cell data, GSEA was used to infer functional pathways associated with the seven prognostic genes, which were mainly enriched in translation initiation, pathogen SARS CoV 2 Nsp1 to translation initiation, nt06530 PI3K signaling, presentation by MHC class II molecules, nt06515 Regulation of kinetochore-microtubule interactions, and other related pathways. These results provide a theoretical basis for further mechanistic studies.
This study has several limitations. First, the sample distribution across regions, ethnicities, and genders was uneven, which may limit the generalizability and applicability of the results; to reduce bias, future studies should include more diverse samples or use standardized procedures to balance regional and demographic distributions. Second, the bioinformatics data may be affected by batch effects and measurement errors, and the precision of the algorithms and the validity of the model assumptions could limit the accuracy of the analytical results; future studies can improve data quality and analytical precision through stringent quality control, removal of low-quality data, and the use of validated algorithms and multiple analytical approaches. Third, the clinical and baseline characteristics of patients differed between the training and validation cohorts, which may introduce validation bias; larger-scale and more standardized datasets are needed to confirm the consistency of the results and enhance the reliability of the findings. Finally, although Lasso and Cox regression were used in the data analysis, these models might overlook complex biological factors or nonlinear relationships; supplementary qualitative analyses could help address this limitation, and cell experiments or gene knockout studies could further validate the assumptions of the statistical models and support the findings.
Conclusion
This study identified seven prognosis-related genes in LUAD and established an effective risk model to predict patient survival probability. Significant differences were observed between the high- and low-risk groups in clinical features, immune status, and drug sensitivity. Single-cell RNA sequencing analysis further revealed elevated expression of the prognostic genes GRIA1 and BCAN in fibroblasts and mast cells, respectively. Moreover, RT-qPCR analysis demonstrated significantly increased expression of CDC25C, GPRIN1, GRIA1, BCAN, IGF2BP1, and ANLN in LUAD tissues. These findings suggest that these prognostic genes may have diagnostic value for LUAD and may play key roles in disease progression. This research provides valuable insights for the prognosis and personalized treatment of LUAD patients. However, the clinical applicability and effectiveness of the model still require validation in comprehensive clinical trials. Future studies should focus on clinical validation; explore the model's potential in targeted therapy, personalized treatment planning, and early detection; and investigate the molecular mechanisms of these genes in LUAD to further improve the prognostic model and its predictive ability.
Data availability
The TCGA data presented in this study are openly available in the GDC data portal (https://portal.gdc.cancer.gov/). GSE31210 (GPL570) and GSE131907 (GPL16791) were obtained from the GEO database (http://www.ncbi.nlm.nih.gov/geo/). Further information is available from the corresponding author upon request.
Abbreviations
- LLPS: Liquid–liquid phase separation
- LRGs: Liquid–liquid phase separation-related genes
- LUAD: Lung adenocarcinoma
- TCGA: The Cancer Genome Atlas
- GEO: Gene Expression Omnibus
- DEGs: Differentially expressed genes
- GO: Gene Ontology
- BP: Biological process
- CC: Cellular component
- MF: Molecular function
- PPI: Protein-protein interaction
- KEGG: Kyoto Encyclopedia of Genes and Genomes
- Cox: Cox proportional hazards model
- LASSO: Least absolute shrinkage and selection operator
- K–M: Kaplan–Meier
- OS: Overall survival
- ROS: Reactive oxygen species
- COR: Correlation coefficient
- TFs: Transcription factors
- TMB: Tumor mutation burden
- GSEA: Gene set enrichment analysis
- qRT-PCR: Quantitative reverse transcription polymerase chain reaction
- IC50: Half-maximal inhibitory concentration
- CDC25C: Cell division cycle 25C
- CDK1: Cyclin-dependent kinase 1
- PUF60: Poly(U)-binding splicing factor 60
- GRIA1: Glutamate receptor 1
- AMPA: Alpha-amino-3-hydroxy-5-methyl-4-isoxazolepropionic acid
- GPRIN1: G-protein-regulated inducer of neurite outgrowth 1
- EMT: Epithelial-mesenchymal transition
- BCAN: Brevican
- IGF2BP1: Insulin-like growth factor 2 mRNA-binding protein 1
- ANLN: Anillin
- KRT6A: Keratin 6A
References
Travis, W. D., Brambilla, E., Burke, A. P., Marx, A. & Nicholson, A. G. Introduction to the 2015 World Health Organization classification of tumors of the lung, pleura, thymus, and heart. J. Thorac. Oncol. 10(9), 1240–1242 (2015).
Herbst, R. S., Morgensztern, D. & Boshoff, C. The biology and management of non-small cell lung cancer. Nature 553(7689), 446–454 (2018).
Lemmon, M. A., Schlessinger, J. & Ferguson, K. M. The EGFR family: Not so prototypical receptor tyrosine kinases. Cold Spring Harb. Perspect. Biol. 6(4), a020768 (2014).
Topalian, S. L. et al. Safety, activity, and immune correlates of anti-PD-1 antibody in cancer. N. Engl. J. Med. 366(26), 2443–2454 (2012).
Wheeler, D. L., Dunn, E. F. & Harari, P. M. Understanding resistance to EGFR inhibitors-impact on future treatment strategies. Nat. Rev. Clin. Oncol. 7(9), 493–507 (2010).
Sharma, P., Hu-Lieskovan, S., Wargo, J. A. & Ribas, A. Primary, adaptive, and acquired resistance to cancer immunotherapy. Cell 168(4), 707–723 (2017).
Liu, Q. et al. Glycogen accumulation and phase separation drives liver tumor initiation. Cell 184(22), 5559–5576.e19 (2021).
Brangwynne, C. P. et al. Germline P granules are liquid droplets that localize by controlled dissolution/condensation. Science 324(5935), 1729–1732 (2009).
Lin, C. C. et al. Receptor tyrosine kinases regulate signal transduction through a liquid-liquid phase separated state. Mol. Cell 82(6), 1089–1106.e12 (2022).
Ahn, J. H. et al. Phase separation drives aberrant chromatin looping and cancer development. Nature 595(7868), 591–595 (2021).
Qiu, Y., Pan, M. & Chen, X. A liquid-liquid phase separation-related gene signature as prognostic biomarker for epithelial ovarian cancer. Front. Oncol. 11, 671892 (2021).
Zhang, Y. et al. Systematic analysis of molecular characterization and clinical relevance of liquid-liquid phase separation regulators in digestive system neoplasms. Front. Cell Dev. Biol. 9, 820174 (2021).
Zheng, J., Wu, Z., Qiu, Y., Wang, X. & Jiang, X. An integrative multi-omics analysis based on liquid-liquid phase separation delineates distinct subtypes of lower-grade glioma and identifies a prognostic signature. J. Transl. Med. 20(1), 55 (2022).
Tang, Q. et al. Liquid-liquid phase separation-related gene in gliomas: FABP5 is a potential prognostic marker. J. Gene Med. 25(10), e3517 (2023).
Wu, X. H. et al. Liquid-liquid phase separation-related genes associated with prognosis, tumor microenvironment characteristics, and tumor cell features in bladder cancer. Clin. Transl. Oncol. (2024).
Ning, W. et al. DrLLPS: A data resource of liquid–liquid phase separation in eukaryotes. Nucleic Acids Res. 48(D1), D288–D295 (2020).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15(12), 550 (2014).
Wang, J. et al. PROS1 shapes the immune-suppressive tumor microenvironment and predicts poor prognosis in glioma. Front. Immunol. 13 (2023).
Gu, Z. Complex heatmap visualization. iMeta 1(3) (2022).
Wu, X., Sui, Z., Zhang, H., Wang, Y. & Yu, Z. Integrated analysis of lncRNA-mediated ceRNA network in lung adenocarcinoma. Front. Oncol. 10 (2020).
Wu, T. et al. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. The Innovation 2(3), 100141 (2021).
Doncheva, N. T. et al. Cytoscape StringApp 2.0: Analysis and visualization of heterogeneous biological networks. J. Proteome Res. 22(2), 637–646 (2022).
In, J. & Lee, D. K. Survival analysis: Part II—applied clinical data analysis. Korean J. Anesthesiol. 76(1), 84–85 (2023).
Friedman, J., Hastie, T. & Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33(1), 1–22 (2010).
Shi, Y. et al. Crosstalk of ferroptosis regulators and tumor immunity in pancreatic adenocarcinoma: novel perspective to mRNA vaccines and personalized immunotherapy. Apoptosis 28(9–10), 1423–1435 (2023).
Zheng, Y. et al. Global characterization of immune infiltration in clear cell renal cell carcinoma. Onco Targets Ther. 14, 2085–2100 (2021).
Li, L. et al. Comparative efficacy of Honghua class injections for treating acute ischemic stroke: A Bayesian network meta-analysis of randomized controlled trials. Front. Pharmacol. 13 (2022).
Liu, T. T. et al. Identification of CDK2-related immune forecast model and ceRNA in lung adenocarcinoma, a pan-cancer analysis. Front. Cell Dev. Biol. 9, 682002 (2021).
Zhao, L. et al. Understanding decision curve analysis in clinical prediction model research. Postgrad. Med. J. 100(1185), 512–515 (2024).
Mayakonda, A., Lin, D. C., Assenov, Y., Plass, C. & Koeffler, H. P. Maftools: Efficient and comprehensive analysis of somatic variants in cancer. Genome Res. 28(11), 1747–1756 (2018).
Wipfli, H. & Samet, J. M. One hundred years in the making: The global tobacco epidemic. Annu. Rev. Public Health 37, 149–166 (2016).
Wang, L. et al. Cuproptosis related genes associated with Jab1 shapes tumor microenvironment and pharmacological profile in nasopharyngeal carcinoma. Front. Immunol. 13, 989286 (2022).
Zeng, D. et al. IOBR: Multi-omics immuno-oncology biological research to decode tumor microenvironment and signatures. Front. Immunol. 12, 687975 (2021).
Liu, J. et al. Liquid-liquid phase separation throws novel insights into treatment strategies for skin cutaneous melanoma. BMC Cancer 23(1), 388 (2023).
Kassambara, A. rstatix: Pipe-friendly framework for basic statistical tests. R package version 0.6.0 (2020).
Wang, L., Shen, J., Wang, Y. & Bi, J. Identification of fatty acid metabolism-based molecular subtypes and prognostic signature to predict immune landscape and guide clinical drug treatment in renal clear cell carcinoma. Int. Immunopharmacol. 116, 109735 (2023).
Wu, J. et al. A risk model developed based on tumor microenvironment predicts overall survival and associates with tumor immunity of patients with lung adenocarcinoma. Oncogene 40(26), 4413–4424 (2021).
Geeleher, P., Cox, N. & Huang, R. S. pRRophetic: An R package for prediction of clinical chemotherapeutic response from tumor gene expression levels. PLoS One 9(9), e107468 (2014).
Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184(13), 3573–3587.e29 (2021).
Kim, N. et al. Single-cell RNA sequencing demonstrates the molecular and cellular reprogramming of metastatic lung adenocarcinoma. Nat. Commun. 11(1), 2285 (2020).
Luo, J., Deng, M., Zhang, X. & Sun, X. ESICCC as a systematic computational framework for evaluation, selection, and integration of cell-cell communication inference methods. Genome Res. 33(10), 1788–1805 (2023).
Kanehisa, M. Toward understanding the origin and evolution of cellular organisms. Protein Sci. 28(11), 1947–1951 (2019).
Kanehisa, M., Furumichi, M., Sato, Y., Kawashima, M. & Ishiguro-Watanabe, M. KEGG for taxonomy-based analysis of pathways and genomes. Nucleic Acids Res. 51(D1), D587–D592 (2023).
Kanehisa, M. & Goto, S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28(1), 27–30 (2000).
Mehta, S. & Zhang, J. Liquid-liquid phase separation drives cellular function and dysfunction in cancer. Nat. Rev. Cancer 22(4), 239–252 (2022).
Liu, Z. et al. Liquid–liquid phase separation: roles and implications in future cancer treatment. Int. J. Biol. Sci. 19(13), 4139–4156 (2023).
Igelmann, S., Lessard, F. & Ferbeyre, G. Liquid-Liquid phase separation in cancer signaling, metabolism and anticancer therapy. Cancers (Basel) 14(7) (2022).
Wang, Q., Sun, N., Li, J., Huang, F. & Zhang, Z. Liquid-Liquid phase separation in the prognosis of lung adenocarcinoma: An integrated analysis. Curr. Cancer Drug Targets (2024).
Chen, Y., Huang, C. & Wei, W. Establishment of liquid-liquid phase separation-related prognostic model in lung adenocarcinoma and systematic analysis of its clinical significance. Int. J. Biol. Markers 3936155241310887 (2025).
Xu, N. et al. PUF60 promotes cell cycle and lung cancer progression by regulating alternative splicing of CDC25C. Cell Rep. 42(9), 113041 (2023).
Stern, E. et al. CDC25C protein expression correlates with tumor differentiation and clinical outcomes in lung adenocarcinoma. Biomedicines 11(2) (2023).
Liu, J., Xia, W., Xue, F. & Xu, C. Exploring a new signature for lung adenocarcinoma: analyzing cuproptosis-related genes through integrated single-cell and bulk RNA sequencing. Discov. Oncol. 15(1), 508 (2024).
Lotsberg, M. L. et al. AXL targeting abrogates autophagic flux and induces immunogenic cell death in drug-resistant cancer cells. J. Thorac. Oncol. 15(6), 973–999 (2020).
Rossi, U. G. & Rossi, G. A. Pulmonary anatomy, pathology and treatment: what will be the artificial intelligence and deep-learning role on them? Lung Cancer 159, 181–182 (2021).
Qiu, C. et al. Stratification of lung adenocarcinoma patients based on in silico and immunohistochemistry analyses of oxidative stress-related genes. Cancer Biother. Radiopharm. (2024).
Wang, Y. et al. Potential immune biomarker candidates and immune subtypes of lung adenocarcinoma for developing mRNA vaccines. Front. Immunol. 12, 755401 (2021).
Mao, W. et al. Comprehensive analysis of the Transcriptome-wide m6A methylome in lung adenocarcinoma by MeRIP sequencing. Front. Oncol. 12, 791332 (2022).
Zhu, X. et al. Prognostic roles of a novel basement membranes-related gene signature in lung adenocarcinoma. Front. Genet. 14, 1100560 (2023).
Sun, J. et al. Identification of a combined hypoxia and lactate metabolism prognostic signature in lung adenocarcinoma. BMC Pulm. Med. 24(1), 323 (2024).
Sun, M., Wang, L., Ge, L., Xu, D. & Zhang, R. IGF2BP1 facilitates non-small cell lung cancer progression by regulating the KIF2A-mediated Wnt/beta-catenin pathway. Funct. Integr. Genomics 24(1), 4 (2023).
Gong, L., Liu, Q., Jia, M. & Sun, X. Systematic analysis of IGF2BP family members in non-small-cell lung cancer. Hum. Genomics 18(1), 63 (2024).
Long, X., Zhou, W., Wang, Y. & Liu, S. Prognostic significance of ANLN in lung adenocarcinoma. Oncol. Lett. 16(2), 1835–1840 (2018).
Suzuki, C. et al. ANLN plays a critical role in human lung carcinogenesis through the activation of RHOA and by involvement in the phosphoinositide 3-kinase/AKT pathway. Cancer Res. 65(24), 11314–11325 (2005).
Zhou, J. et al. Identification of SRXN1 and KRT6A as key genes in smoking-related non-small-cell lung cancer through bioinformatics and functional analyses. Front. Oncol. 11, 810301 (2021).
Che, D. et al. KRT6A promotes lung cancer cell growth and invasion through MYC-regulated pentose phosphate pathway. Front. Cell Dev. Biol. 9, 694071 (2021).
Wang, Y. et al. Identification of fatty acid-related subtypes, the establishment of a prognostic signature, and immune infiltration characteristics in lung adenocarcinoma. Aging (Albany NY) 15(10), 4202–4235 (2023).
Zhang, W. et al. CDC25C as a predictive biomarker for immune checkpoint inhibitors in patients with lung adenocarcinoma. Front. Oncol. 12, 867788 (2022).
Zhou, Q. et al. A novel lncRNA-miRNA-mRNA competing endogenous RNA regulatory network in lung adenocarcinoma and kidney renal papillary cell carcinoma. Thorac. Cancer 12(19), 2526–2536 (2021).
Qian, C. et al. Vesicle-mediated transport-related genes are prognostic predictors and are associated with tumor immunity in lung adenocarcinoma. Front. Immunol. 13 (2022).
Liu, J., Li, Z., Cheang, I., Li, J. & Zhou, C. RNA-binding protein IGF2BP1 associated with prognosis and immunotherapy response in lung adenocarcinoma. Front. Genet. 13, 777399 (2022).
Ting, Z. et al. LncRNA CERS6-AS1 upregulates the expression of ANLN by sponging miR-424-5p to promote the progression and drug resistance of lung adenocarcinoma. Noncoding RNA Res. 9(1), 221–235 (2024).
Sun, K. et al. Regulation of early diagnosis and prognostic markers of lung adenocarcinoma in immunity and hypoxia. Sci. Rep. 13(1), 6459 (2023).
Acknowledgements
We thank Professor Renquan Zhang for his excellent technical support. We also thank the Department of Thoracic Surgery, the First Affiliated Hospital of Anhui Medical University (Hefei, Anhui, China), and all the reviewers for their hard work.
Funding
This study was supported by grants from the Natural Science Foundation of Anhui Medical Universities (KJ2019ZD22) and the Natural Science Foundation of Anhui Province (1808085QH271).
Author information
Contributions
H.L.W., Q.Z. and J.X.T. participated in the conception of the idea and the writing of the manuscript. X.C. conducted the data collection. Y.W.L. performed the data analysis. H.L.W. and R.Q.Z. wrote the article. All authors contributed to the article and approved the submitted version.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethical approval and consent to participate
Data were retrieved from the GEO database and other controlled-access databases. The use of lung adenocarcinoma tissues from patients who provided informed consent followed the guidelines laid out by the GEO Ethics, Law, and Policy Group.
Consent for publication
After reviewing the manuscript, all authors agreed to its publication in the current form.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Wang, H., Zhang, Q., Liu, Y. et al. Identification of prognostic genes associated with phase separation in lung adenocarcinoma and construction of prognostic models. Sci Rep 15, 34547 (2025). https://doi.org/10.1038/s41598-025-17884-4