Abstract
Cancer-associated fibroblasts (CAFs) play important roles in the progression of lung adenocarcinoma (LUAD). We characterized CAF subgroups via gene ontology, pseudotime, and cell communication analyses and explored their prognostic value in LUAD using digital cytometry, a machine learning-based deconvolution algorithm. Next, we constructed a prognostic model based on CAF subgroups. We also screened potential therapeutic target genes in LUAD and experimentally validated the proliferation, migration, and invasion phenotypes related to these target genes. We identified myofibroblastic CAFs (MyCAFs) and immune-related CAFs (ImmCAFs) as the major CAF subgroups in LUAD. Further, the deconvolution algorithm showed that MyCAFs have prognostic potential in LUAD, and via LASSO-COX regression, we obtained a MyCAFs-related prognostic model. We identified POSTN as a potential therapeutic target in LUAD. These findings serve as a foundation for further studies on CAFs.
Introduction
Lung cancer, with a 5-year survival rate of approximately 15%, is one of the deadliest cancers in the world1,2. It is classified as non-small cell lung cancer and small cell lung cancer. Specifically, lung adenocarcinoma (LUAD) is the most common histologic subtype of non-small cell lung cancer3, and its development is associated with genetic as well as environmental factors4. In recent years, the number of patients with LUAD has increased5. In addition to the elaboration of the theory of the “hallmarks of cancer”6, several studies have been conducted to explore cancer at the molecular level1, and biomarkers, such as KRAS have been identified7. Unfortunately, owing to the complexity of LUAD progression, most of its associated molecular mechanisms remain unclear. Further, early LUAD diagnosis and evaluation remain a major challenge for physicians. Thus, extensively exploring the molecular mechanisms underlying LUAD progression and identifying potential biomarkers are of important biological and clinical significance.
Cancer-associated fibroblasts (CAFs) play important roles in tumor invasion and metastasis and are important components of the tumor microenvironment8. Reportedly, CAFs have the ability to synthesize and secrete extracellular matrix and participate in multiple biological processes, such as angiogenesis, epithelial-mesenchymal transition, and extracellular matrix remodeling via cellular communication9. An increasing number of studies have shown that targeting CAFs can reduce their infiltration into the tumor microenvironment and exert a positive effect on the clinical treatment of cancer10,11,12. Some CAF subgroups have been described, e.g., myofibroblastic CAFs (MyCAFs) and inflammatory CAFs (InCAFs) in non-small cell lung cancer as well as pancreatic, breast, and colorectal cancers13,14,15,16. Vascular, cycling, and developmental CAFs in breast cancer have also been described14. However, owing to the highly heterogeneous characteristics of CAFs9, the roles of different CAF subgroups in different tumor types remain unknown. Moreover, studies on CAF subgroups in LUAD are limited. Therefore, further studies are necessary to clarify the characteristics of different CAFs subgroups in LUAD.
With the continuous advancement of sequencing technologies, single-cell RNA sequencing (scRNA-seq) has enabled the exploration of CAFs in LUAD. Relative to bulk RNA-seq, scRNA-seq enables the characterization of distinctive cellular subgroups and allows the exploration of cellular communication and cell trajectories17,18. Therefore, in this study, we analyzed scRNA-seq and bulk RNA-seq data from the perspective of CAFs with the aim of identifying new therapeutic targets in LUAD. We hope that our findings will provide new insights for future basic and clinical studies on LUAD therapy.
Methods and materials
Acquisition and organization of data sets
The scRNA-seq data used in this study were obtained from the GSE131907 dataset in the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/)19 derived from the GPL16791 platform. The GSE131907 dataset contains scRNA-seq data for 208,506 cells from 44 patients with LUAD. For this study, we retained a total of 88,144 cells. Using the “Seurat” R package20, we removed cells with fewer than 200 detected features (“min.features” = 200) and subsequently removed cells with mitochondrial gene expression levels above 20%. Using the NormalizeData and ScaleData functions, we normalized and scaled the quality-controlled scRNA-seq data (Supplementary Table 1). Bulk RNA-seq and somatic mutation data were obtained from the open data on LUAD in TCGA (https://cancergenome.nih.gov/). We converted fragments per kilobase of transcript per million mapped reads (FPKM) values in TCGA into transcripts per million (TPM). After removing duplicates, 493 samples were obtained for subsequent analyses. The data used for model validation were obtained from the GSE42127 dataset in the GEO database21 derived from the GPL6884 platform. We used the “limma” R package22 to normalize the validation set. The technical flow diagram for this study is shown in Fig. 1. No ethical review was required for conducting this study.
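The two preprocessing rules described above can be illustrated with a minimal Python sketch. It is not the Seurat-based pipeline used in the study; the function names and toy values are hypothetical, and the FPKM-to-TPM step assumes the common rescaling in which each sample's values are normalized to sum to one million.

```python
def passes_qc(n_features, mito_fraction, min_features=200, max_mito=0.20):
    """Keep a cell with at least `min_features` detected genes and a
    mitochondrial fraction that is not above `max_mito`."""
    return n_features >= min_features and mito_fraction <= max_mito


def fpkm_to_tpm(fpkm_values):
    """Rescale one sample's FPKM values so that they sum to one million."""
    total = sum(fpkm_values)
    if total <= 0:
        raise ValueError("FPKM values must have a positive sum")
    return [value / total * 1e6 for value in fpkm_values]


# Hypothetical toy cells: (detected features, mitochondrial fraction).
cells = [
    {"id": "cell_a", "n_features": 150, "mito": 0.05},   # too few features
    {"id": "cell_b", "n_features": 1200, "mito": 0.35},  # too much mitochondrial signal
    {"id": "cell_c", "n_features": 900, "mito": 0.08},   # passes both filters
]
kept = [cell["id"] for cell in cells if passes_qc(cell["n_features"], cell["mito"])]

# Hypothetical FPKM values for three genes in one bulk sample.
tpm = fpkm_to_tpm([10.0, 30.0, 60.0])
```

In this toy input only `cell_c` survives the filters, and the converted values keep their relative proportions while summing to one million.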
Identification of CAFs
Using the “FindVariableFeatures” function in the “Seurat” R package20, we obtained 2000 highly variable genes from the preprocessed scRNA-seq data. Next, we performed principal component analysis (PCA) for dimensionality reduction. The number of principal components (PCs) was set to 30, and the “FindNeighbors” and “FindClusters” functions were used for cell clustering. The obtained cell subgroups were represented using uniform manifold approximation and projection (UMAP)23. DCN, COL1A1, THY1, and COL1A2 are widely recognized as marker genes for fibroblasts19. Based on these marker genes, we identified fibroblasts. After setting the PCs from 1 to 10 and the resolution to 0.5, we applied the “FindNeighbors” and “FindClusters” functions to cluster the fibroblasts. The obtained fibroblast subgroups were represented using UMAP. The COL13A1+ matrix fibroblasts (marker genes: NPNT and TCF21), the smooth muscle cells (marker genes: MYH11 and ACTG2), and the COL14A1+ matrix fibroblasts (marker genes: PI16 and MFAP5) were annotated according to a previous report19. By comparing normal and tumor tissues, we obtained two major CAF subgroups in LUAD. POSTN and MMP11 are marker genes of MyCAFs16, so we labeled the CAF subgroup with high expression of these two genes as MyCAFs. Since IGHG4 and IGHG1 are associated with immune activities, we labeled the CAF subgroup with high expression of these two genes as immune-related CAFs (ImmCAFs).
Description of MyCAFs and ImmCAFs
Using Benjamini–Hochberg-adjusted P < 0.01 and absolute log2 fold change (log2FC) > 1 as thresholds and the “FindAllMarkers” function in the “Seurat” R package, we screened genes with high expression levels in MyCAFs and ImmCAFs20. Next, using the “clusterProfiler” R package24, we performed Gene Ontology (GO) analysis on the screened MyCAFs and ImmCAFs genes, and using the “monocle3” R package25,26,27, we explored the cellular trajectories between the MyCAFs and ImmCAFs. Furthermore, using the “CellChat” R package28, we characterized intercellular communication between the MyCAFs and ImmCAFs.
Digital cytometry analysis
To evaluate the prognostic values of MyCAFs and ImmCAFs in LUAD, we employed digital cytometry, a machine-learning algorithm. This deconvolution method is based on support vector regression (SVR) and estimates the cellular proportions of MyCAFs and ImmCAFs in each sample of bulk RNA-seq data. We used the expression profiles of seven fibroblast subgroups as reference sample matrices. Next, we generated a CIBERSORTx signature matrix using the CIBERSORTx online tool (https://cibersortx.stanford.edu/runcibersortx.php) with default settings29. To obtain the relative abundances of individual fibroblast subgroups in TCGA, we used the generated matrix to analyze TCGA RNA-seq data. Setting type = “nu-regression” and kernel = “linear” in the “SVM” function, we performed SVR analysis. We obtained the cell proportions of MyCAFs and ImmCAFs in each sample of TCGA RNA-seq data. Then, to distinguish between the high- and low-abundance groups, the median proportion of ImmCAFs or MyCAFs was set as the threshold. We retained 479 LUAD samples from patients with survival time above 30 days in TCGA and assessed the prognostic ability of the MyCAFs and ImmCAFs using the R packages “survminer” and “survival”.
Acquisition of MyCAFs-related DEGs
We obtained MyCAFs-related marker genes using the “FindAllMarkers” function with Benjamini–Hochberg-adjusted P < 0.01 and log2FC > 2 as thresholds. We examined the differences between normal and LUAD samples in TCGA and obtained differentially expressed genes (DEGs) using the “DESeq2” R package with Benjamini–Hochberg-adjusted P < 0.01 and absolute log2FC > 1 as thresholds. The MyCAFs-related DEGs used to construct the LUAD prognostic model were obtained by intersecting the DEGs with the MyCAFs-related marker genes.
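The thresholding and intersection steps can be sketched as follows. This is an illustrative Python sketch rather than the DESeq2 and Seurat workflow itself; the gene names and statistics are hypothetical placeholders, and `select_genes` is a helper introduced only for this example.

```python
def select_genes(table, padj_max, min_log2fc, use_abs):
    """Return genes whose adjusted P is below `padj_max` and whose log2FC
    (absolute value if `use_abs` is True) exceeds `min_log2fc`."""
    selected = set()
    for gene, (log2fc, padj) in table.items():
        effect = abs(log2fc) if use_abs else log2fc
        if padj < padj_max and effect > min_log2fc:
            selected.add(gene)
    return selected


# Hypothetical bulk tumor-versus-normal results: gene -> (log2FC, adjusted P).
bulk_results = {
    "GENE_A": (2.4, 1e-8),
    "GENE_B": (-1.6, 1e-5),
    "GENE_C": (0.4, 1e-9),   # fold change too small
    "GENE_D": (3.0, 0.2),    # not significant
}

# Hypothetical single-cell marker results for one cluster.
marker_results = {
    "GENE_A": (3.1, 1e-20),
    "GENE_B": (2.2, 1e-12),
    "GENE_C": (1.5, 1e-4),   # below the stricter log2FC > 2 threshold
    "GENE_E": (2.8, 1e-6),   # marker, but absent from the bulk DEGs
}

degs = select_genes(bulk_results, padj_max=0.01, min_log2fc=1.0, use_abs=True)
markers = select_genes(marker_results, padj_max=0.01, min_log2fc=2.0, use_abs=False)
cluster_related_degs = sorted(degs & markers)
```

The bulk filter uses the absolute fold change so that both up- and down-regulated genes are retained, whereas the marker filter keeps only genes enriched in the cluster.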
Construction and evaluation of the prognostic model
We retained 479 samples corresponding to patients with survival time > 30 days and analyzed the MyCAFs-related DEGs via LASSO-COX regression analysis using the “glmnet” R package. Setting “family” = “cox” and “maxit” = 10,000 in the “glmnet” function, we trained the model. Setting “nfolds” = 5 and “type.measure” = “C” in the “cv.glmnet” function, we performed 5-fold cross-validation and evaluated model performance using the C-index. We selected the number of model genes based on the condition that the error was within 1 standard error of the optimum. Thereafter, we obtained 13 prognostic model genes and determined their regression coefficients. Thus, we constructed a MyCAFs-related prognostic model. The formula used for the construction of the model was as follows:
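A LASSO-COX risk score is conventionally a coefficient-weighted sum of gene expression values. The general form below is given under that assumption; the symbols $\beta_i$ (coefficient of model gene $i$), $x_{ij}$ (expression of gene $i$ in patient $j$), and $n$ (number of model genes, 13 here) are notation introduced for illustration, and no fitted coefficient values are implied.

```latex
\mathrm{Risk\ score}_j = \sum_{i=1}^{n} \beta_i \, x_{ij}
```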
Further, to distinguish between high- and low-risk patients, we calculated the risk scores of the patients in the training set and used the median risk score as the cut-off. We also performed survival analysis for patients at different risk levels using the “survminer” R package. For the time-dependent ROC analysis, we used the “timeROC” R package. Next, to evaluate our prognostic model, we performed a multivariate COX analysis considering the age and sex of the patients as well as the pathological stage of the disease. The GSE42127 dataset was used as the validation set for the model. We calculated the risk scores in the validation set and categorized the patients into high-risk or low-risk groups according to the median. The Kaplan–Meier survival curves for the validation set were generated using the “survminer” R package, and the time-dependent ROC curves for the validation set were plotted using the “timeROC” R package.
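The scoring and grouping procedure can be sketched in a few lines of Python. The coefficients, gene names, and expression values are hypothetical, and assigning patients strictly above the median to the high-risk group is an assumption, since the handling of ties is not specified.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Coefficient-weighted sum of expression values over the model genes."""
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


def split_by_median(scores):
    """Label each patient 'high' if the score is strictly above the median,
    otherwise 'low' (tie handling is an assumption of this sketch)."""
    cutoff = median(scores.values())
    return {pid: ("high" if score > cutoff else "low") for pid, score in scores.items()}


# Hypothetical model with two genes.
coefficients = {"GENE_A": 0.5, "GENE_B": -0.25}

# Hypothetical expression profiles for four patients.
patients = {
    "P1": {"GENE_A": 2.0, "GENE_B": 4.0},  # 1.0 - 1.0 = 0.0
    "P2": {"GENE_A": 4.0, "GENE_B": 0.0},  # 2.0
    "P3": {"GENE_A": 2.0, "GENE_B": 0.0},  # 1.0
    "P4": {"GENE_A": 6.0, "GENE_B": 0.0},  # 3.0
}

scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
groups = split_by_median(scores)
```

With these toy values the median score is 1.5, so P2 and P4 fall into the high-risk group and P1 and P3 into the low-risk group.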
Tumor mutation burden (TMB) analysis
Using the formula, TMB = (total number of mutations)/(exon length), we calculated tumor mutation loads. Using the “maftools” R package, we presented the somatic mutation landscapes of the different risk score groups as waterfall plots. We also performed Pearson correlation analysis to explore the association between risk scores and somatic mutation status.
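As a worked instance of the TMB formula, the following Python sketch divides a mutation count by the exon length in megabases. The exon length of 38 Mb and the mutation count are assumptions chosen for illustration; the value used in the study is not stated.

```python
def tumor_mutation_burden(n_mutations, exon_length_mb):
    """TMB in mutations per megabase: total mutations / exon length (Mb)."""
    if exon_length_mb <= 0:
        raise ValueError("exon length must be positive")
    return n_mutations / exon_length_mb


# Hypothetical sample: 114 somatic mutations over an assumed 38 Mb of exons.
tmb = tumor_mutation_burden(114, 38.0)  # 3.0 mutations per Mb
```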
Protein–protein interaction (PPI) network analysis
PPI data for the model genes were obtained from the STRING database (https://string-db.org/)30. Based on the medium confidence score (0.400), we obtained PPI results for the model genes. Next, using the CytoNCA plug-in in Cytoscape 3.9.0 software, we calculated betweenness centrality (BC) topological parameters31 and plotted the PPI network of the hub model genes based on the magnitudes of the BC parameters.
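To make the betweenness parameter concrete, the sketch below computes unnormalized betweenness for a small undirected, unweighted network using Brandes' algorithm. It is a plain-Python illustration and not the CytoNCA implementation; the toy network is hypothetical, and halving the accumulated scores so that each unordered node pair is counted once is a convention assumed here.

```python
from collections import deque


def betweenness(adjacency):
    """Unnormalized betweenness for an undirected, unweighted graph given
    as {node: [neighbors]} (Brandes' algorithm)."""
    scores = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search that counts shortest paths from `source`.
        visit_order = []
        predecessors = {node: [] for node in adjacency}
        n_paths = {node: 0 for node in adjacency}
        distance = {node: -1 for node in adjacency}
        n_paths[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            node = queue.popleft()
            visit_order.append(node)
            for neighbor in adjacency[node]:
                if distance[neighbor] < 0:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
                if distance[neighbor] == distance[node] + 1:
                    n_paths[neighbor] += n_paths[node]
                    predecessors[neighbor].append(node)
        # Accumulate pair dependencies from the farthest nodes backwards.
        dependency = {node: 0.0 for node in adjacency}
        while visit_order:
            node = visit_order.pop()
            for pred in predecessors[node]:
                share = n_paths[pred] / n_paths[node]
                dependency[pred] += share * (1.0 + dependency[node])
            if node != source:
                scores[node] += dependency[node]
    # Each unordered pair was visited from both endpoints, so halve the totals.
    return {node: value / 2.0 for node, value in scores.items()}


# Hypothetical four-node network: G1 bridges G4 to the G2-G3 pair.
network = {
    "G1": ["G2", "G3", "G4"],
    "G2": ["G1", "G3"],
    "G3": ["G1", "G2"],
    "G4": ["G1"],
}
bc = betweenness(network)
```

In this toy network only G1 lies on shortest paths between other nodes (G2 to G4 and G3 to G4), so it is the only node with a nonzero score.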
Cell culture and transfection
LUAD cell lines (A549, H1299, SPC-A1, and H1650) and a normal epithelial cell line (BEAS-2B) purchased from the American Type Culture Collection (ATCC; Manassas, VA, USA) were cultured in Dulbecco’s Modified Eagle Medium (HyClone, Logan, UT, USA) or Eagle’s Minimum Essential Medium (Biosharp, Beijing, China) supplemented with 10% fetal bovine serum (HyClone) and 1% antibiotics. The cells were cultured in a humidified thermostat incubator containing 5% CO2 at 37 °C. POSTN was silenced in the A549 and H1299 cells via transfection using Si-POSTN-1 (target sequence: 5'-GTGACAGTATAACAGTAAA-3') and Si-POSTN-2 (target sequence: 5'-CCAGGAGTTTTCATTGGAA-3'), which were both acquired from RiboBio (Guangzhou, China). The transfection was performed using Lipofectamine 3000 Transfection Reagent (Invitrogen, Carlsbad, CA, USA). Finally, stably transfected cells were selected for subsequent experiments after transfection efficiency was examined at the protein level.
Real-time quantitative PCR (RT-qPCR) analysis
Total RNA was isolated using the Total RNA Extraction Reagent kit (Vazyme, Nanjing, China) and reverse-transcribed using the HiScript II Q RT SuperMix for qPCR (+gDNA wiper) kit (Vazyme). The cDNA was used as a template for RT-qPCR, which was performed on a real-time fluorescence quantitative PCR instrument (Bio-Rad, Hercules, CA, USA) using ChamQ SYBR qPCR Master Mix (Vazyme). The forward and reverse primer sequences for POSTN were 5′-AGGATGTGACGGTGACAGTATAA-3′ and 5′-CAGGTGCCAGCAAAGTGTATTC-3′, respectively. GAPDH, with forward and reverse primer sequences 5′-GGAGTCCACTGGCGTCTTCA-3′ and 5′-GTCATGAGTCCTTCCACGATACC-3′, respectively, was used as the internal reference. The gene expression levels were quantified using the 2^(−ΔΔCt) method.
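The relative quantification step can be illustrated with a short Python sketch in which the target Ct is first normalized to the reference gene and then compared with the control sample. The Ct values below are hypothetical and chosen only to give a round result.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of the target gene in a sample relative to a control,
    normalized to a reference gene: 2 ** -(delta_sample - delta_control)."""
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_sample - delta_control)


# Hypothetical Ct values: the target amplifies 3 cycles earlier in the
# sample than in the control after normalization to the reference gene.
fold_change = relative_expression(
    ct_target_sample=22.0, ct_ref_sample=18.0,
    ct_target_control=25.0, ct_ref_control=18.0,
)  # delta-delta Ct = -3, so the fold change is 2 ** 3 = 8.0
```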
Western blot analysis
Total cell protein was extracted using Leigh RIPA Cell Lysate (Thermo Fisher Scientific, Waltham, MA, USA). Thereafter, protein concentrations were determined using a BCA Protein Assay Kit (Solarbio, Beijing, China). The proteins were separated using sodium dodecyl sulfate–polyacrylamide gel electrophoresis and transferred onto polyvinylidene fluoride (PVDF) membranes, which were then blocked using 5% skimmed milk. Next, the blocking solution was discarded and the PVDF membranes were incubated overnight at 4 °C with primary antibodies (anti-POSTN, anti-PCNA, anti-KI67, and anti-GAPDH). The next day, the membranes were washed three times with TBST and further incubated with secondary antibodies. The secondary antibodies were horseradish peroxidase (HRP)-linked anti-rabbit IgG and HRP-linked anti-mouse IgG antibodies (Cell Signaling Technology, Beverly, MA, USA). Finally, to determine protein expression levels, we used a gel imaging system.
Cell counting kit-8 (CCK8) assay
The CCK-8 assay was performed using a CCK-8 solution (Vazyme). Briefly, A549 and H1299 cells were inoculated into 96-well plates (1 × 10^4 cells/well) and cultured in an incubator containing 5% CO2 at 37 °C. At 0, 24, 48, and 72 h, the 96-well plates were removed and the CCK-8 solution was added to each well, followed by further incubation for 2 h. Absorbance was then measured at 450 nm using a microplate reader (Bio-Rad, Hercules, CA, USA).
Wound healing assay
Equal numbers of pretreated A549 and H1299 cells were seeded into 6-well plates and grown until a monolayer was formed. Thereafter, the cell monolayer was scratched vertically using a sterile pipette tip. The cells were imaged immediately (0 h) and 24 h later using an inverted microscope (Jiangnan, Nanjing, China). Cell migration was defined as the difference in scratch width between 0 and 24 h.
Transwell assay
Transwell assays were performed to explore the migratory and invasive phenotypes of the LUAD cells using Transwell chambers purchased from Corning (Corning, NY, USA). In brief, POSTN-silenced and control cells were resuspended separately in serum-free medium and subsequently inoculated into the upper chamber of the Transwell. After 24 h of incubation, the non-migrating cells in the upper chamber were removed by wiping, and the cells that had migrated to the outer membrane of the chamber were fixed with paraformaldehyde for 20 min. Images were then acquired using a microscope (Nikon, Tokyo, Japan) after staining with 10% crystal violet staining solution (Servicebio, Wuhan, China). For the Transwell invasion assay, Matrigel (ABW, Shanghai, China) was added to the upper chamber prior to cell inoculation.
Statistical analysis
Statistical analyses were performed using R software version 4.1.2 (R Core Development Team, Vienna, Austria) and GraphPad Prism software version 10.0 (Graphpad Inc., San Diego, CA, USA). The Wilcoxon method was used to compare variables. Next, variance chi-square analysis was performed to analyze differences between multiple groups. The Kaplan–Meier method and log-rank test were employed for survival analysis. Further, LASSO-COX regression and multivariate Cox methods were used for model construction and evaluation, respectively. Pearson’s correlation analysis was performed to evaluate correlations between variables, and the Benjamini–Hochberg method was used to adjust P values. Statistical significance was set at P < 0.05.
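The Benjamini–Hochberg adjustment mentioned above can be sketched in plain Python as follows. The study performed this step in R; the function below is an independent illustration of the step-up procedure, and the input P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg-adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw P values from four tests.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adjusted]
```

Each raw P value is multiplied by the number of tests divided by its rank, and the running minimum keeps the adjusted values from decreasing as the raw values increase.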
Results
Acquisition of CAFs
From the GSE131907 dataset, we obtained scRNA-seq data for 88,144 cells from normal lung and LUAD tissues for subsequent analysis. After quality control, the top 2000 highly variable genes were used for dimensionality reduction and clustering. Specifically, clustering visualized on UMAP plots led to the identification of 33 cell subgroups (Fig. 2A), and based on the relevant literature19, the six predominant cell subgroups were defined as follows: epithelial, T/NK, B, myeloid, MAST, and endothelial cell subgroups (Fig. 2B; Supplementary Table 1). Further, owing to the high expression levels of fibroblast marker genes, such as DCN, COL1A1, THY1, and COL1A2, in subgroups 10 and 30, we defined these two groups as fibroblasts (Fig. 2C), and after dimensionality reduction and clustering of these two fibroblast groups, nine fibroblast subgroups were obtained (Fig. 2D). Given the high expression levels of inflammation-related genes, such as IL6 and CXCL2, in cluster 7, we defined this cluster as comprising InCAFs. Additionally, cluster 0 was defined as COL14A1+ matrix fibroblasts given the high expression levels of genes such as PI16 and MFAP519, and clusters 1 and 6 were defined as COL13A1+ matrix fibroblasts owing to the high expression levels of genes such as NPNT and TCF2119. Clusters 4 and 8 were defined as smooth muscle cells given the high expression levels of genes such as MYH11 and ACTG219. Furthermore, clusters 3 and 5 were mainly derived from the LUAD samples (Fig. 2E), and given their association with LUAD, they were defined as CAFs and were further explored. In one study, cells with high POSTN and MMP11 expression levels were defined as MyCAFs16. Therefore, given the high expression levels of POSTN and MMP11 in cluster 5 and of IGHG4 and IGHG1 (immune-related genes) in cluster 3, we defined these two clusters as MyCAFs and ImmCAFs, respectively (Fig. 2F).
CAF subgroups in LUAD. (A) UMAP plot showing the distribution of 33 cell subgroups based on all cell types obtained via dimensionality reduction and clustering. (B) UMAP plot showing the distribution of 8 cell subgroups after cell annotation. (C) Marker genes for cell annotation. (D) UMAP plot showing the distribution of nine fibroblast subgroups obtained via dimensionality reduction and clustering. (E) UMAP plot showing the distribution of seven fibroblast subgroups after cell annotation. (F) Marker genes for different fibroblast subgroups. CAFs cancer-associated fibroblasts, UMAP uniform manifold approximation and projection.
Description of MyCAFs and ImmCAFs
GO analysis to explore the characteristics of ImmCAFs and MyCAFs showed that ImmCAFs were associated with DNA-binding transcription factor, heparin, glycosaminoglycan, sulfur compound, immunoglobulin receptor, and antigen binding (Fig. 3A), while MyCAFs were associated with extracellular matrix structural constituents, collagen binding, glycosaminoglycan binding, extracellular matrix structural constituents that confer tensile strength, and serine-type endopeptidase activity (Fig. 3B). These observations indicated a stronger association of extracellular matrix remodeling with MyCAFs than with ImmCAFs. Next, we performed cell trajectory analysis between the ImmCAFs and MyCAFs. Given the greater number of MyCAFs than ImmCAFs in LUAD (Fig. 3C), we used ImmCAFs as the starting point of pseudotime for the cell trajectory analysis (Fig. 3D). We observed that as pseudotime increased, the ImmCAFs gradually progressed to MyCAFs. Moreover, the trajectory within the MyCAFs showed circular intersections owing to the absence of any obvious changes along pseudotime (Fig. 3E). The distribution of the top 10 genes (ASPN, C3, C7, COL10A1, FN1, IGF1, IGFBP4, MGP, MMP11, SFRP1) showed significant changes during the progression of ImmCAFs to MyCAFs, as shown in the UMAP plot in Fig. 3F. Additionally, the exploration of the roles of ImmCAFs and MyCAFs in cellular communication revealed that WNT pathway activity differed between ImmCAFs and MyCAFs. Specifically, in ImmCAFs, the WNT pathway was less important and mainly appeared as an influencer (Fig. 3G), while in MyCAFs, it played a more important role, acting as a sender, mediator, and influencer (Fig. 3H). Only the WNT3A pathway was found to be involved in cellular communication in ImmCAFs. Notably, the ImmCAFs only communicated with themselves and with epithelial cells (Fig. 3I). However, in MyCAFs, the WNT3A and WNT2 pathways were found to be involved in cellular communication, and the MyCAFs showed the ability to communicate with themselves as well as with B, T/NK, endothelial, epithelial, MAST, and myeloid cells (Fig. 3J). Thus, the WNT pathway was more active in MyCAFs, indicating that it may be more closely associated with LUAD deterioration.
Characterization of MyCAFs and ImmCAFs. (A) Results of GO enrichment analysis for ImmCAFs. A larger dot represents a higher number of enriched genes, and a redder dot represents a more significant relationship. (B) Results of GO enrichment analysis for MyCAFs. A larger dot represents a higher number of enriched genes, and a redder dot represents a more significant relationship. (C) Histogram showing differences in the proportion of ImmCAFs and MyCAFs between tumor and normal tissue samples. (D) UMAP plot showing the selection of ImmCAFs as the starting point for the cell trajectory analysis. ImmCAFs are indicated in purple and MyCAFs in orange. (E) Progression from ImmCAFs to MyCAFs along pseudotime in the cell trajectory analysis. (F) Top altered genes during the transformation of ImmCAFs to MyCAFs. (G) WNT pathway in ImmCAFs. (H) WNT pathway in MyCAFs. (I) Specific ligand activities of the WNT pathway in ImmCAFs. The WNT3A pathway was found to be involved in cellular communication in ImmCAFs. The ImmCAFs communicated with themselves and with epithelial cells. (J) Specific ligand activities of the WNT pathway in MyCAFs. The WNT3A and WNT2 pathways were found to be involved in cellular communication in MyCAFs. The MyCAFs communicated with themselves as well as with B, T/NK, endothelial, epithelial, MAST, and myeloid cells. GO gene ontology, CAFs cancer-associated fibroblasts, MyCAFs myofibroblastic cancer-associated fibroblasts, ImmCAFs immune-related cancer-associated fibroblasts, UMAP uniform manifold approximation and projection. P < 0.05 denotes statistically significant differences.
Exploring the prognostic capabilities of MyCAFs and ImmCAFs
Using digital cytometry, we explored whether ImmCAFs and MyCAFs have prognostic value in LUAD. Using the digital cytometry deconvolution machine learning algorithm, we obtained the relative abundances of the seven fibroblast subtypes (Supplementary Fig. 1A; Supplementary Table 2). Thereafter, we grouped the LUAD samples based on the median relative abundance of the ImmCAFs and determined the grouping threshold using a density distribution plot (Fig. 4A). By performing survival analysis, we observed no significant prognostic difference between the high- and low-ImmCAFs groups (P = 6.304e−01) (Fig. 4B). Thus, there may be little association between ImmCAFs and LUAD prognosis. Similarly, using the median relative abundance of MyCAFs, we grouped the LUAD samples and used a density distribution plot to determine the grouping threshold of MyCAFs (Fig. 4C). Survival analysis performed thereafter revealed a significant prognostic difference between the high- and low-MyCAFs groups (P = 2.206e−02) (Fig. 4D). Specifically, a high abundance of MyCAFs was associated with a poor LUAD prognosis. Thus, MyCAFs were used to construct the LUAD prognostic model.
Prognostic ability of MyCAFs and ImmCAFs explored via digital cytometry. (A) Grouping of LUAD samples based on the median relative abundance of ImmCAFs. (B) Exploration of the prognostic value of ImmCAFs via survival analysis. (C) Grouping of LUAD samples based on the median relative abundance of MyCAFs. (D) Exploration of the prognostic value of MyCAFs in LUAD via survival analysis. TCGA The Cancer Genome Atlas, LUAD lung adenocarcinoma, MyCAFs myofibroblastic cancer-associated fibroblasts, ImmCAFs immune-related cancer-associated fibroblasts. P < 0.05, statistically significant differences.
Prognostic model construction based on MyCAFs
Digital cytometry showed that MyCAFs were associated with a poor LUAD prognosis. Thus, we constructed a MyCAFs-based LUAD prognostic model. Differential expression analysis between normal and LUAD samples in TCGA, with Benjamini–Hochberg-adjusted P < 0.01 and absolute log2 fold change (log2FC) > 1 as thresholds, yielded 13,899 DEGs (Supplementary Table 3). We visualized the DEGs using a volcano plot (Fig. 5A), and using the “FindAllMarkers” function of the “Seurat” R package, we obtained 64 MyCAFs-related genes (Supplementary Table 4) with Benjamini–Hochberg-adjusted P < 0.01 and log2FC > 2 as thresholds. Then, by intersecting the 13,899 DEGs with the 64 MyCAFs-related genes, we identified 28 MyCAFs-related DEGs (Fig. 5B; Supplementary Table 5). Next, by performing LASSO-COX regression analysis of the 28 MyCAFs-related DEGs, we identified COL1A2, COL5A2, COMP, CTHRC1, DCN, HTRA3, IGFBP3, MFAP4, MYL9, POSTN, RARRES2, SERPINH1, and TIMP1 as the prognosis-related genes in the model (Fig. 5C,D). Using the expression levels and corresponding coefficients of these prognostic model genes, we constructed a prognostic model related to MyCAFs for LUAD. The specific model equations were as follows:
Construction of a prognostic model associated with MyCAFs. (A) The 13,899 DEGs between normal and LUAD samples in TCGA database based on Benjamini–Hochberg-adjusted P < 0.01 and absolute log2 fold change (log2FC) > 1. The 3193 up-regulated DEGs are indicated by red dots, and the 10,706 down-regulated DEGs are indicated by blue dots. (B) Venn diagram showing the 28 MyCAFs-related DEGs obtained by intersecting the 13,899 DEGs with the 64 MyCAFs-related genes. (C) LASSO coefficient curves for the 28 MyCAFs-related DEGs. (D) Model constructed using the largest λ value (λ = 13) of the mean square error within the standard error. LASSO least absolute shrinkage and selection operator, TCGA The Cancer Genome Atlas, LUAD lung adenocarcinoma, MyCAFs myofibroblastic cancer-associated fibroblasts, DEGs differentially expressed genes. P < 0.05 denotes statistically significant differences.
Assessment and validation of the LUAD prognostic model
LUAD data from the TCGA database were used as the training set. The patients were divided into high- and low-risk groups based on the median risk score (Fig. 6A). The distribution of survival and death among patients with different risk scores is shown in Fig. 6B. Further, survival curve analysis showed that the prognosis of patients in the high-risk group in the training set was significantly worse than that of the patients in the low-risk group (P = 3.373e−05) (Fig. 6C). Time-dependent ROC analysis showed that the 1-, 2-, and 3-year AUC values for patients in the training set were 0.715, 0.691, and 0.691, respectively (Fig. 6D). A multivariate COX analysis combining the risk score obtained using this model with clinical factors, such as age, sex, and pathologic stage, showed that the risk score had independent prognostic power (hazard ratio (HR) = 4.485; 95% confidence interval (CI) [2.858, 7.040], P < 0.001) (Fig. 6E). LUAD data from the GEO database were used as the validation set. Based on the median risk score, we divided the validation set into high- and low-risk groups (Supplementary Fig. 1C). We also obtained the distribution of survival and death in patients with different risk scores in the validation set (Supplementary Fig. 1D), and the survival curve showed a significantly worse prognosis for the high-risk patients than for the low-risk patients (P = 2.612e−02) (Fig. 6F). Additionally, time-dependent ROC analysis showed that the 1-, 2-, and 3-year AUC values for the validation set were 0.731, 0.792, and 0.726, respectively (Fig. 6G).
Assessment and validation of the prognostic model. (A) Training set divided into high- and low-risk groups based on the median risk score. (B) Distribution of survival and death for patients in the training set with different risk scores. (C) Kaplan–Meier survival curve in the training set showing a poorer prognosis for patients in the high-risk group (P = 3.373e−05). (D) TimeROC analysis results for the training set at 1 (AUC = 0.715), 2 (AUC = 0.691), and 3 (AUC = 0.691) years. (E) Multivariate COX regression analysis of the risk score together with age, sex, and pathologic stage. Our model had independent prognostic power (HR = 4.485; 95% CI [2.858, 7.040], P < 0.001). (F) Kaplan–Meier survival curve for the validation set showing a worse prognosis for the high-risk group than for the low-risk group (P = 2.612e−02). (G) TimeROC analysis results for the validation set at 1 (AUC = 0.731), 2 (AUC = 0.792), and 3 (AUC = 0.726) years. ROC receiver operating characteristic, AUC area under the curve, HR hazard ratio, CI confidence interval. P < 0.05 is statistically significant.
TMB and PPI analyses
TMB analysis was performed for high- and low-risk patients. Using a waterfall plot, we identified TP53, TTN, and MUC16 as the most commonly mutated genes, with missense mutations showing predominance. In the high-risk group, the mutation rates of TP53, TTN, and MUC16 were above 40% (i.e., 56, 56, and 44%, respectively; Fig. 7A), and the median TMB (3.65 mut/MB) for this group was higher than that obtained for the low-risk group (Fig. 7B). Additionally, in the low-risk group, the mutation rates of TP53, TTN, and MUC16 were all less than 40% (i.e., 39%, 36%, 34%, respectively; Fig. 7D), and the median TMB (2.92 mut/MB) was lower than that obtained for the high-risk group (Fig. 7C). Combining the risk scores and TMB for correlation analysis revealed that TMB was positively correlated with LUAD risk score (R = 0.16, P = 74e−5; Supplementary Fig. 1B).
TMB and PPI analyses. (A) Waterfall plot of the somatic mutation landscape for the patients in the high-risk group. (B) TMB status for patients in the high-risk group (median TMB = 3.65 mut/MB). (C) TMB status for patients in the low-risk group (median TMB = 2.92 mut/MB). (D) Waterfall plot of the somatic mutation landscape of patients in the low-risk group. (E) PPI network map. (F) PPI network map based on BC parameters. TMB tumor mutation burden, PPI the protein–protein interaction; mut/MB, number of mutations/exon Mb length detected, BC betweenness. P < 0.05 denotes statistically significant differences.
PPI network analysis was performed to explore the connections between the model genes. Thus, we identified associations between 10 model genes (POSTN, COL1A2, DCN, COL5A2, TIMP1, SERPINH1, COMP, MFAP4, CTHRC1, and IGFBP3). We identified 28 relevant acting edges (Fig. 7E) in these 10 genes. These acting edges were found to be primarily associated with collagen fibril organization, extracellular matrix organization, ossification, collagen binding, and structural molecule activity. Using the BC parameters in the CytoNCA plugin, we calculated the BC scores of the 10 genes (Fig. 7F; Supplementary Table 6). By plotting the PPI network map, we observed that the BC values for POSTN (12.50), COL1A2 (12.50), DCN (3.50), COL5A2 (2.50), TIMP1 (2.50), and SERPINH1 (0.50) were greater than 0. Specifically, POSTN and COL1A2 showed the largest BC values; therefore, we considered them as the hub genes of the PPI network.
POSTN in LUAD cell lines
Based on the preceding analyses, we identified POSTN and COL1A2 as core genes in our prognostic model. To screen potential biological targets in LUAD, we then experimentally validated the expression of POSTN in LUAD cell lines. RT-qPCR showed higher POSTN mRNA expression levels in LUAD cell lines (A549, H1299, SPC-A1, and H1650) than in the normal cell line (BEAS-2B) (Fig. 8A). Similarly, western blot analysis revealed a higher POSTN protein expression level in LUAD cell lines than in the normal cell line (Fig. 8B). To further investigate the effect of POSTN on LUAD cell proliferation, migration, and invasion, we knocked down POSTN in the A549 and H1299 cell lines (Fig. 8C,D). Si-POSTN-1, identified as the effective siRNA in this screen, was used for transfection in subsequent experiments.
POSTN expression in LUAD. (A) qPCR results showing that POSTN was highly expressed in LUAD cell lines (A549, H1299, SPC-A1, H1650). ****P < 0.0001. (B) Western blot analysis showing high POSTN expression levels in LUAD cell lines. ****P < 0.0001. (C) Screening to identify the effective Si-RNA in A549 cells. ****P < 0.0001. (D) Screening to identify the effective Si-RNA in H1299 cells. ****P < 0.0001.
POSTN affects the proliferative ability of LUAD cells
KI-67 and PCNA are cell proliferation markers. To explore the relationship between POSTN and cell proliferation, we measured the expression levels of KI-67 and PCNA in LUAD cells. KI-67 and PCNA expression levels were significantly reduced in LUAD cell lines after POSTN knockdown (Fig. 9A,B). The CCK8 assay also confirmed the relationship between POSTN and LUAD cell proliferation. Specifically, POSTN knockdown resulted in a significant decrease in the number of A549 cells after 24, 48, and 72 h (Fig. 9C) and significantly lowered the number of H1299 cells after 72 h (Fig. 9D). These results provided evidence that POSTN was involved in the proliferation of LUAD cell lines.
Effect of POSTN expression on LUAD cell proliferation. (A) Western blot analysis results showing that POSTN knockdown in A549 cells lowered KI-67 and PCNA expression. ****P < 0.0001. (B) Western blot analysis showing that POSTN knockdown lowered KI-67 and PCNA expression in H1299 cells. ***P < 0.001, ****P < 0.0001. (C) CCK8 assay results showing that POSTN knockdown resulted in a significant decrease in A549 cell counts at 24, 48, and 72 h. **P < 0.01, ***P < 0.001, ****P < 0.0001. (D) CCK8 assay results showing that POSTN knockdown resulted in a significant decrease in H1299 cell counts at 72 h. *P < 0.05.
POSTN affects migration and invasion
Wound-healing assays were performed to explore the effect of POSTN on the migration of LUAD cells. The results thus obtained revealed that POSTN knockdown shortened the migration distance of both A549 and H1299 cells (Fig. 10A,B). These observations indicated that POSTN was involved in cell migration of LUAD cell lines. Next, the results of Transwell assay revealed that LUAD cell migration and invasion were significantly reduced after POSTN knockdown in both A549 and H1299 cells (Fig. 10C,D). These results indicated that POSTN was involved in cell migration and invasion of LUAD cell lines.
Effect of POSTN expression on cell migration and invasion in LUAD. (A) Wound healing assay results showing that POSTN knockdown reduced A549 cell migration. ***P < 0.001. (B) Wound healing assay results showing that POSTN knockdown reduced H1299 cell migration. **P < 0.01. (C) Transwell assay results showing that POSTN knockdown reduced A549 cell migration and invasion. *P < 0.05, **P < 0.01. (D) Transwell assay results showing that POSTN knockdown reduced H1299 cell migration and invasion. *P < 0.05, **P < 0.01.
Discussion
LUAD is a malignant disease characterized by a high level of heterogeneity and a poor prognosis32. Current treatment options for LUAD include surgical resection, radiotherapy, and chemotherapy. However, these conventional treatments have limited efficacy owing to the emergence of tumor metastasis and drug resistance33. Studies on biomarkers have demonstrated the feasibility of targeted therapies for LUAD. Genes such as EGFR, ALK, and KRAS are generally recognized as important biomarkers in LUAD7,34. With the development of sequencing technology, an increasing number of important biomarkers have been identified using bioinformatics techniques35. RRM2 has been identified as a biomarker associated with ferroptosis and tumor immune infiltration36, and DARS2 has been identified as a biomarker associated with cuproptosis and tumor immune infiltration37. In recent years, researchers have begun identifying important relationships between fibroblasts and cancer. Specifically, CAFs are a subgroup of fibroblasts with important effects on tumorigenesis and cancer development. They also play important roles in extracellular matrix deposition and immune environment regulation and considerably affect cancer metabolism and drug resistance38. scRNA-seq technology makes it possible to label different CAF subgroups in LUAD. Therefore, exploring different CAF subgroups, constructing a CAF-related prognostic model, and screening for CAF-related biomarkers can promote LUAD diagnosis and treatment.
Using LUAD scRNA-seq data obtained from the GEO database, we labeled and obtained fibroblasts, and based on the relevant literature, we annotated fibroblast subgroups. Next, we identified two CAF subgroups that were predominantly expressed in LUAD tissues. In particular, one CAF subgroup showed greater activity in LUAD and high expression levels for MyCAFs-related genes, such as POSTN and MMP11. Thus, we labeled this subgroup as MyCAFs. The other subgroup showed high expression levels for immune-related genes, such as IGHG4 and IGHG1. Thus, we labeled the subgroup as ImmCAFs. Thereafter, functional annotation analysis showed that ImmCAFs were strongly associated with biological activities, such as immunoglobulin receptor and antigen binding; therefore, the ImmCAFs may be associated with abnormal immune activities in LUAD. Conversely, MyCAFs showed a strong association with extracellular matrix structural constituents and collagen binding; therefore, they possibly play an important role in extracellular matrix remodeling in LUAD. These findings validated the reliability of our annotation and enabled the exploration of the characteristics of these two CAF subgroups. Given that MyCAFs were more active in LUAD, we performed cell trajectory analysis using ImmCAFs as a starting point. Thus, we observed that the most significantly altered genes reflected transformation from ImmCAFs to MyCAFs.
To further clarify the characteristics of ImmCAFs and MyCAFs in cellular communication, we performed cellular communication analysis, which showed that WNT pathway activity was significantly different between ImmCAFs and MyCAFs. In MyCAFs, the WNT pathway was found to be involved in various cell-to-cell communication processes. Reportedly, the WNT pathway is closely associated with a variety of phenotypes, such as epithelial-mesenchymal transition and angiogenesis39,40. It has also been reported that chromobox 4 protein can influence LUAD development via the WNT/β-catenin pathway41. Junctional adhesion molecule-like protein has also been shown to influence LUAD progression via WNT/β-catenin42. Therefore, owing to the important role of the WNT pathway in LUAD development, further studies are necessary to explore the relationship between MyCAFs and the WNT pathway in LUAD.
scRNA-seq data do not include clinical information, so we incorporated bulk RNA-seq data into our study. A bulk RNA-seq profile is a mixture (convolution) of the expression profiles of the cell populations in a sample; deconvolution can therefore be applied to bulk RNA-seq data to estimate the proportion of each cell subgroup in each sample. Digital cytometry is a machine learning method based on this deconvolution principle. Using it, we obtained the proportions of MyCAFs and ImmCAFs in each bulk RNA-seq sample. Combining these proportions with the clinical data accompanying the bulk RNA-seq data, we found that ImmCAFs had no prognostic value in LUAD, whereas MyCAFs were a significant prognostic factor. We therefore focused model construction and data analysis on MyCAFs. We screened genes and constructed a MyCAFs-related prognostic model using bulk RNA-seq data. In brief, we obtained MyCAFs-related DEGs by intersecting DEGs in LUAD with MyCAFs-related genes. Using 28 MyCAFs-related DEGs identified via the LASSO-Cox regression analysis, we generated a MyCAFs-related prognostic model. This model showed a significant difference in prognosis between high- and low-risk patient groups in the training set (AUC values of 0.715, 0.691, and 0.691 for 1, 2, and 3 years, respectively). We also observed a significant difference in prognosis between high- and low-risk groups in the validation bulk RNA-seq set (AUC values of 0.731, 0.792, and 0.726 for 1, 2, and 3 years, respectively). Furthermore, TMB analysis using TCGA data revealed that the mutation rates of TP53 (56%), TTN (50%), and MUC16 (44%) were significantly higher in high-risk patients than in their low-risk counterparts. Therefore, our model could distinguish patients with different prognoses, which supports its potential clinical and practical significance.
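To make the stratification step concrete, the sketch below computes a risk score as the coefficient-weighted sum of gene expression values, which is the usual form of a LASSO-Cox signature, and splits patients into high- and low-risk groups at the median score. The gene names, coefficients, expression values, and the median cutoff are all hypothetical assumptions for illustration; they are not the fitted values or the cutoff of the model described here.

```python
from statistics import median

# Hypothetical coefficients for three placeholder genes (not fitted values).
COEFS = {"gene_1": 0.12, "gene_2": 0.08, "gene_3": -0.10}


def risk_score(expr, coefs=COEFS):
    """Linear risk score: sum of coefficient * expression over signature genes."""
    return sum(coef * expr[gene] for gene, coef in coefs.items())


def stratify(cohort, coefs=COEFS):
    """Score every patient and split the cohort at the median risk score.

    `cohort` maps a patient ID to a dict of gene -> expression value.
    Returns (scores, cutoff, groups); the median cutoff is an assumption.
    """
    scores = {pid: risk_score(expr, coefs) for pid, expr in cohort.items()}
    cutoff = median(scores.values())
    groups = {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}
    return scores, cutoff, groups


# Hypothetical expression values for four patients.
cohort = {
    "P1": {"gene_1": 5.0, "gene_2": 5.0, "gene_3": 2.0},
    "P2": {"gene_1": 1.0, "gene_2": 2.0, "gene_3": 6.0},
    "P3": {"gene_1": 3.0, "gene_2": 3.0, "gene_3": 3.0},
    "P4": {"gene_1": 4.0, "gene_2": 4.0, "gene_3": 1.0},
}

scores, cutoff, groups = stratify(cohort)
for pid in sorted(cohort):
    print(pid, round(scores[pid], 2), groups[pid])
```

In practice, the two groups produced this way would then be compared with survival analysis and time-dependent ROC curves, which are outside the scope of this sketch.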
Our prognostic model consisted of COL1A2, COL5A2, COMP, CTHRC1, DCN, HTRA3, IGFBP3, MFAP4, MYL9, POSTN, RARRES2, SERPINH1, and TIMP1, and PPI analysis identified POSTN and COL1A2 as the central genes. The effect of COL1A2 on LUAD progression has been previously investigated43. However, studies on the association between POSTN and LUAD are limited. Therefore, we designed experiments to explore the role of POSTN in LUAD cell lines. Specifically, we investigated the relationship between POSTN and the proliferation, migration, and invasion phenotypes of LUAD cells using A549 and H1299 cells with POSTN knockdown. Western blot analysis and the CCK8 assay showed that POSTN was involved in the proliferation of LUAD cell lines. Moreover, wound-healing and Transwell assays showed that POSTN was involved in the migration and invasion of LUAD cell lines. POSTN is regulated by various transcription factors, including Twist44,45. In addition, cytokines such as IL-4 and IL-13 affect POSTN expression46. One study showed that in renal cell carcinoma, POSTN promotes proliferation and epithelial-mesenchymal transition via the ILK/AKT/mTOR pathway47. It has also been reported that in colorectal cancer, POSTN promotes disease progression and metastasis via the AKT/PKB pathway48. Our results suggest that high POSTN expression is associated with LUAD progression.
In this study, we constructed a MyCAFs-related prognostic model for LUAD and identified POSTN as a potential therapeutic target gene in LUAD. The new MyCAFs-based prognostic model may contribute to the clinical diagnosis of LUAD. Importantly, the identification of POSTN as a potential therapeutic target in LUAD may promote the development of targeted therapies for this cancer type. Our findings not only support further basic research but may also be used to improve the early diagnosis and precise treatment of patients with LUAD. Our study also has some limitations. We did not validate POSTN in vivo or in patient-derived samples. Owing to space and experimental resource limitations, we did not perform parallel experiments on COL1A2. In the future, we will further explore the roles of POSTN and COL1A2 in LUAD using various experimental methods.
Data availability
The single-cell dataset GSE131907 used in this study was obtained from the GEO website (https://www.ncbi.nlm.nih.gov/geo/). The RNA-seq training set was obtained from the TCGA website (https://cancergenome.nih.gov/). The RNA-seq validation set, GSE42127, was obtained from the GEO website (https://www.ncbi.nlm.nih.gov/geo/). The code and data used in our manuscript are available on GitHub (https://github.com/JiaRui12345/LUAD_DATA).
Abbreviations
- CAFs: Cancer-associated fibroblasts
- ImmCAFs: Immune-related cancer-associated fibroblasts
- UMAP: Uniform manifold approximation and projection
- TCGA: The Cancer Genome Atlas
- LUAD: Lung adenocarcinoma
- MyCAFs: Myofibroblastic cancer-associated fibroblasts
- GEO: Gene Expression Omnibus
- LASSO: Least absolute shrinkage and selection operator
- ROC: Receiver operating characteristic
- GO: Gene ontology
- BC: Betweenness
- TMB: Tumor mutation burden
- InCAFs: Inflammatory CAFs
- PPI: Protein–protein interaction
- DEGs: Differentially expressed genes
- AUC: Area under the curve
- HR: Hazard ratio
- CI: Confidence interval
- SVR: Support vector regression
References
Imielinski, M. et al. Mapping the hallmarks of lung adenocarcinoma with massively parallel sequencing. Cell 150(6), 1107–1120. https://doi.org/10.1016/j.cell.2012.08.029 (2012).
UyBico, S. J. et al. Lung cancer staging essentials: The new TNM staging system and potential imaging pitfalls. Radiographics 30(5), 1163–1181. https://doi.org/10.1148/rg.305095166 (2010).
Thai, A. A., Solomon, B. J., Sequist, L. V., Gainor, J. F. & Heist, R. S. Lung cancer. Lancet 398(10299), 535–554. https://doi.org/10.1016/S0140-6736(21)00312-3 (2021).
Chen, Z., Fillmore, C. M., Hammerman, P. S., Kim, C. F. & Wong, K. K. Non-small-cell lung cancers: A heterogeneous set of diseases. Nat. Rev. Cancer 14(8), 535–546. https://doi.org/10.1038/nrc3775 (2014).
Barta, J. A., Powell, C. A. & Wisnivesky, J. P. Global epidemiology of lung cancer. Ann. Glob. Health 85(1), 8. https://doi.org/10.5334/aogh.2419 (2019).
Hanahan, D. & Weinberg, R. A. Hallmarks of cancer: The next generation. Cell 144(5), 646–674. https://doi.org/10.1016/j.cell.2011.02.013 (2011).
Sholl, L. M. Biomarkers in lung adenocarcinoma: A decade of progress. Arch. Pathol. Lab. Med. 139(4), 469–480. https://doi.org/10.5858/arpa.2014-0128-RA (2015).
Piersma, B., Hayward, M. K. & Weaver, V. M. Fibrosis and cancer: A strained relationship. Biochim. Biophys. Acta Rev. Cancer 1873(2), 188356. https://doi.org/10.1016/j.bbcan.2020.188356 (2020).
Lavie, D., Ben-Shmuel, A., Erez, N. & Scherz-Shouval, R. Cancer-associated fibroblasts in the single-cell era. Nat. Cancer 3(7), 793–807. https://doi.org/10.1038/s43018-022-00411-z (2022).
Chen, X. & Song, E. Turning foes to friends: Targeting cancer-associated fibroblasts. Nat. Rev. Drug Discov. 18(2), 99–115. https://doi.org/10.1038/s41573-018-0004-1 (2019).
Mhaidly, R. & Mechta-Grigoriou, F. Role of cancer-associated fibroblast subpopulations in immune infiltration, as a new means of treatment in cancer. Immunol. Rev. 302(1), 259–272. https://doi.org/10.1111/imr.12978 (2021).
Chen, Y., McAndrews, K. M. & Kalluri, R. Clinical and therapeutic relevance of cancer-associated fibroblasts. Nat. Rev. Clin. Oncol. 18(12), 792–804. https://doi.org/10.1038/s41571-021-00546-5 (2021).
Öhlund, D. et al. Distinct populations of inflammatory fibroblasts and myofibroblasts in pancreatic cancer. J. Exp. Med. 214(3), 579–596. https://doi.org/10.1084/jem.20162024 (2017).
Bartoschek, M. et al. Spatially and functionally distinct subclasses of breast cancer-associated fibroblasts revealed by single cell RNA sequencing. Nat. Commun. 9(1), 5150. https://doi.org/10.1038/s41467-018-07582-3 (2018).
Li, H. et al. Reference component analysis of single-cell transcriptomes elucidates cellular heterogeneity in human colorectal tumors. Nat. Genet. 49(5), 708–718. https://doi.org/10.1038/ng.3818 (2017).
Hanley, C. J. et al. Single-cell analysis reveals prognostic fibroblast subpopulations linked to molecular and immunological subtypes of lung cancer. Nat. Commun. 14(1), 387. https://doi.org/10.1038/s41467-023-35832-6 (2023).
Jovic, D. et al. Single-cell RNA sequencing technologies and applications: A brief overview. Clin. Transl. Med. 12(3), e694. https://doi.org/10.1002/ctm2.694 (2022).
Kolodziejczyk, A. A., Kim, J. K., Svensson, V., Marioni, J. C. & Teichmann, S. A. The technology and biology of single-cell RNA sequencing. Mol. Cell 58(4), 610–620. https://doi.org/10.1016/j.molcel.2015.04.005 (2015).
Kim, N. et al. Single-cell RNA sequencing demonstrates the molecular and cellular reprogramming of metastatic lung adenocarcinoma. Nat. Commun. 11(1), 2285. https://doi.org/10.1038/s41467-020-16164-1 (2020).
Satija, R., Farrell, J. A., Gennert, D., Schier, A. F. & Regev, A. Spatial reconstruction of single-cell gene expression data. Nat. Biotechnol. 33(5), 495–502. https://doi.org/10.1038/nbt.3192 (2015).
Tang, H. et al. A 12-gene set predicts survival benefits from adjuvant chemotherapy in non-small cell lung cancer patients. Clin. Cancer Res. 19(6), 1577–1586. https://doi.org/10.1158/1078-0432.CCR-12-2321 (2013).
Smyth, G. K. Limma: Linear models for microarray data. In Bioinformatics and Computational Biology Solutions Using R and Bioconductor (eds Gentleman, R. et al.) 397–420 (Springer, 2005).
Becht, E. et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat. Biotechnol. https://doi.org/10.1038/nbt.4314 (2018).
Wu, T. et al. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. Innovation (Camb) 2(3), 100141. https://doi.org/10.1016/j.xinn.2021.100141 (2021).
Cao, J. et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature 566(7745), 496–502. https://doi.org/10.1038/s41586-019-0969-x (2019).
Qiu, X. et al. Reversed graph embedding resolves complex single-cell trajectories. Nat. Methods 14(10), 979–982. https://doi.org/10.1038/nmeth.4402 (2017).
Trapnell, C. et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat. Biotechnol. 32(4), 381–386. https://doi.org/10.1038/nbt.2859 (2014).
Jin, S. et al. Inference and analysis of cell–cell communication using CellChat. Nat. Commun. 12(1), 1088. https://doi.org/10.1038/s41467-021-21246-9 (2021).
Newman, A. M. et al. Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat. Biotechnol. 37(7), 773–782. https://doi.org/10.1038/s41587-019-0114-2 (2019).
Szklarczyk, D. et al. STRING v10: Protein–protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 43(Database issue), D447–D452. https://doi.org/10.1093/nar/gku1003 (2015).
Tang, Y., Li, M., Wang, J., Pan, Y. & Wu, F. X. CytoNCA: A cytoscape plugin for centrality analysis and evaluation of protein interaction networks. Biosystems 127, 67–72. https://doi.org/10.1016/j.biosystems.2014.11.005 (2015).
Seguin, L., Durandy, M. & Feral, C. C. Lung Adenocarcinoma tumor origin: A guide for personalized medicine. Cancers (Basel) 14(7), 1759. https://doi.org/10.3390/cancers14071759 (2022).
Denisenko, T. V., Budkevich, I. N. & Zhivotovsky, B. Cell death-based treatment of lung adenocarcinoma. Cell Death Dis. 9(2), 117. https://doi.org/10.1038/s41419-017-0063-y (2018).
Saito, M. et al. Gene aberrations for precision medicine against lung adenocarcinoma. Cancer Sci. 107(6), 713–720. https://doi.org/10.1111/cas.12941 (2016).
Zou, J. & Wang, E. Cancer biomarker discovery for precision medicine: New progress. Curr. Med. Chem. 26(42), 7655–7671. https://doi.org/10.2174/0929867325666180718164712 (2019).
Tang, B. et al. Identification of critical ferroptosis regulators in lung adenocarcinoma that RRM2 facilitates tumor immune infiltration by inhibiting ferroptotic death. Clin. Immunol. 232, 108872. https://doi.org/10.1016/j.clim.2021.108872 (2021).
Liu, X. S. et al. DARS2 is a prognostic biomarker and correlated with immune infiltrates and cuproptosis in lung adenocarcinoma. Am. J. Cancer Res. 13(3), 818–834 (2023).
Kalluri, R. The biology and function of fibroblasts in cancer. Nat. Rev. Cancer 16(9), 582–598. https://doi.org/10.1038/nrc.2016.73 (2016).
Yang, S. et al. FOXP3 promotes tumor growth and metastasis by activating Wnt/β-catenin signaling pathway and EMT in non-small cell lung cancer. Mol. Cancer 16(1), 124. https://doi.org/10.1186/s12943-017-0700-1 (2017).
Shukla, S. et al. Cucurbitacin B inhibits the stemness and metastatic abilities of NSCLC via downregulation of canonical Wnt/β-catenin signaling axis. Sci. Rep. 6, 21860. https://doi.org/10.1038/srep21860 (2016).
Wang, Z. et al. Chromobox 4 facilitates tumorigenesis of lung adenocarcinoma through the Wnt/β-catenin pathway. Neoplasia 23(2), 222–233. https://doi.org/10.1016/j.neo.2020.12.005 (2021).
Wu, Q. et al. Junctional adhesion molecule-like protein promotes tumor progression via the Wnt/β-catenin signaling pathway in lung adenocarcinoma. J. Transl. Med. 20(1), 260. https://doi.org/10.1186/s12967-022-03457-w (2022).
Yu, D. H. et al. Analysis of the interaction network of hub miRNAs-hub genes, being involved in idiopathic pulmonary fibers and its emerging role in non-small cell lung cancer. Front. Genet. 11, 302. https://doi.org/10.3389/fgene.2020.00302 (2020).
Oshima, A. et al. A novel mechanism for the regulation of osteoblast differentiation: transcription of periostin, a member of the fasciclin I family, is regulated by the bHLH transcription factor, twist. J. Cell. Biochem. 86(4), 792–804. https://doi.org/10.1002/jcb.10272 (2002).
Dorafshan, S. et al. Periostin: Biology and function in cancer. Cancer Cell Int. 22(1), 315. https://doi.org/10.1186/s12935-022-02714-8 (2022).
Takayama, G. et al. Periostin: A novel component of subepithelial fibrosis of bronchial asthma downstream of IL-4 and IL-13 signals. J. Allergy Clin. Immunol. 118(1), 98–104. https://doi.org/10.1016/j.jaci.2006.02.046 (2006).
Jia, Y. Y., Yu, Y. & Li, H. J. POSTN promotes proliferation and epithelial-mesenchymal transition in renal cell carcinoma through ILK/AKT/mTOR pathway. J. Cancer. 12(14), 4183–4195. https://doi.org/10.7150/jca.51253 (2021).
Bao, S. et al. Periostin potently promotes metastatic growth of colon cancer by augmenting cell survival via the Akt/PKB pathway. Cancer Cell 5(4), 329–339. https://doi.org/10.1016/s1535-6108(04)00081-9 (2004).
Funding
This work was supported by the National Natural Science Foundation of China (Grant No. 81874442).
Author information
Contributions
Jiarui Zhao analyzed the data, designed the study and wrote the article; Chuanqing Jing performed the cell experiments; Rui Fan acquired the raw data and validated the model; Wei Zhang reviewed the whole article.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Zhao, J., Jing, C., Fan, R. et al. Prognostic model of lung adenocarcinoma from the perspective of cancer-associated fibroblasts using single-cell and bulk RNA-sequencing. Sci Rep 15, 21961 (2025). https://doi.org/10.1038/s41598-025-05058-1
DOI: https://doi.org/10.1038/s41598-025-05058-1












