Abstract
Lung adenocarcinoma (LUAD) is the most common type of lung cancer, accounting for approximately 35–40% of lung cancers, and the overall survival of patients with LUAD remains poor. B cells are important effector cells of adaptive immunity, and B-cell infiltration increases in various tumors. However, the role of B cells in LUAD is still largely unknown and needs to be clarified. GSE164983, GSE50081, GSE37745 and GSE30219 were obtained from the GEO database, and the TCGA-LUAD dataset was obtained from the TCGA database. Dimensionality reduction with UMAP and clustering were performed on single-cell RNA-sequencing (scRNA-seq) data to identify cell subgroups and obtain B-cell markers. The TCGA cohort was used to obtain differentially expressed genes (DEGs). B-cell-related differentially expressed genes (BRGs) were identified through the intersection of B-cell markers and DEGs. The LASSO method was used to identify characteristic genes of BRGs and construct a prognostic risk model. LUAD patients were divided into high-risk and low-risk groups based on risk scores, and the immune landscape of the two groups was evaluated. We also analyzed the differences in clinical characteristics, mutations, immunotherapy response, and drug sensitivity between the two groups. Thirty BRGs were obtained, and 6 characteristic genes were identified and used to construct a prognostic risk model that divided LUAD patients into a high-risk group and a low-risk group. Patients in the high-risk group had worse outcomes and shorter survival times, and patients with high TNM stage accounted for a greater proportion of the high-risk group. In addition, high-risk patients had a greater probability of mutation and a worse immunotherapy response. Finally, the high-risk and low-risk groups showed different drug sensitivity profiles.
The prognostic risk model built based on the BRGs had good predictive performance, providing a new perspective on the prognosis and immunotherapy of LUAD patients and a new reference for LUAD research.
Introduction
Lung cancer is a malignant tumor originating from the bronchial mucosa or lung glands that poses a great threat to human life. In recent years, the incidence and mortality of lung cancer have increased significantly1. Lung adenocarcinoma (LUAD), a subtype of non-small cell lung cancer (NSCLC), accounts for approximately 40% of all lung cancers and has a poor five-year survival rate2,3. The early symptoms of lung cancer are often recurrent, and a majority of patients may be in an advanced state when diagnosed, with tumor cell invasion and metastasis already present, which limits therapeutic options and results in a poor prognosis4. Over the past few decades, our understanding of the molecular pathogenesis of LUAD has improved significantly due to the rapid development of omics technology5,6. A series of omic data-derived signatures were generated to predict the clinical outcomes of LUAD patients7. Accordingly, more novel multigene signatures are valuable for predicting the outcome and recurrence of LUAD.
B cells are the main effector cells of humoral immunity, while Tfh cells are helper T cells that control the maturation and activation of B cells. The interaction of B cells, Tfh cells, and dendritic cells (DCs) is the basis of the adaptive immune response8. B cells have a variety of immune response functions. Tumor-infiltrating B lymphocytes (TIBs) can be observed in a variety of solid tumors9,10. Existing evidence shows that TIBs inhibit tumor progression by secreting immunoglobulins, promoting T-cell responses, and directly killing cancer cells11.
In recent years, the important role of the tumor microenvironment (TME) in tumor progression and treatment has emphasized the importance of identifying immune expression profiles and immune signatures in patients with different tumors12. TIBs are important infiltrating cells in the TME. In chronic lymphocytic leukemia, interfering with TIB receptors and B-cell-related CCR7 signaling can delay tumor progression13. TIBs and B-cell-related pathways also maintain the structure and function of tertiary lymphoid structures (TLSs). TLSs consist of T-cell- and B-cell-rich regions that are sites of differentiation of effector T cells and memory T and B cells14. TLSs are transient ectopic lymphoid aggregates whose structural organization and function are similar to those of secondary lymphoid organs15. Studies have shown that the presence of TLSs in the TME is associated with local antitumor immune responses and a positive patient prognosis16. TIBs and B-cell-related pathways play key roles in the formation of TLSs and the local immune response that occurs in TLSs17.
With the rapid development of bulk RNA sequencing (RNA-seq) and single-cell RNA sequencing (scRNA-seq), many new methods have been used to screen and identify key genes with effective predictive capabilities in disease diagnosis, treatment, and prognosis18,19,20. Therefore, this study combined RNA-seq and scRNA-seq to identify characteristic genes and immune expression profiles of LUAD patients. Based on B-cell-related characteristic genes, a prognostic risk prediction model for LUAD patients was constructed, and a variety of techniques were used to analyze the prognostic characteristics of LUAD patients. This study aimed to explore the impact of B-cell-related genes on the prognosis of LUAD patients. The detailed flow chart is shown in Fig. 1.
Materials and methods
Data acquisition and processing
The scRNA-seq data of GSE164983, which included 2 patients with lung adenocarcinoma, were downloaded from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/)21. First, the “Seurat” R package22 was used to convert the scRNA-seq data into Seurat objects and to exclude low-quality cells in R 4.4.0 (https://www.R-project.org/)23. The number of detected genes, the RNA count, the proportion of mitochondrial genes and the proportion of red blood cell genes were assessed using the PercentageFeatureSet function in the “Seurat” R package. Cells were retained if they expressed 300–10,000 genes and had UMI counts > 600, a mitochondrial proportion < 10% and a red blood cell proportion < 1%. Batch effects between the 2 samples were removed using the FindIntegrationAnchors function. The top 2000 variable genes were identified using the FindVariableFeatures function. Linear dimensionality reduction was conducted using principal component analysis (PCA), and 41 principal components were used for clustering at a resolution of 0.6. Two thousand genes were used for cell subpopulation identification, with nonlinear dimensionality reduction via t-distributed stochastic neighbor embedding (tSNE) and uniform manifold approximation and projection (UMAP). The “SingleR” R package24 was used for the annotation of different cell types. To identify marker genes in B cells, the FindAllMarkers function was used with |log2FC| > 1. LUAD transcriptome data, copy number variation (CNV) data and clinical information were obtained from The Cancer Genome Atlas (TCGA) database (https://portal.gdc.cancer.gov/)25. For the RNA-seq data, 541 tumor samples and 49 normal samples were ultimately included in the training set. The GSE50081, GSE30219 and GSE37745 datasets were downloaded from the GEO database as validation sets. Differentially expressed genes (DEGs) between tumor and normal samples were screened using the “limma” R package26 with a false discovery rate (FDR) < 0.05 and |log2FC| > 1.
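The cell-level quality control thresholds described above can be illustrated with a minimal sketch. It is written in Python rather than with the Seurat functions used in the study, the CellQC class and the example cells are hypothetical, and treating the 300–10,000 gene bounds as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    """Per-cell quality metrics (hypothetical container, not a Seurat object)."""
    barcode: str
    n_genes: int      # number of genes detected in the cell
    n_umi: int        # total UMI count
    pct_mito: float   # percent of counts from mitochondrial genes
    pct_rbc: float    # percent of counts from red blood cell genes


def passes_qc(cell: CellQC) -> bool:
    """Apply the thresholds quoted in the text; inclusive gene bounds are assumed."""
    return (
        300 <= cell.n_genes <= 10000
        and cell.n_umi > 600
        and cell.pct_mito < 10.0
        and cell.pct_rbc < 1.0
    )


# Made-up cells for illustration only.
cells = [
    CellQC("cell_a", 1500, 4000, 3.2, 0.1),   # passes every filter
    CellQC("cell_b", 250, 900, 2.0, 0.0),     # too few genes
    CellQC("cell_c", 2000, 5000, 14.5, 0.2),  # too much mitochondrial signal
    CellQC("cell_d", 1800, 500, 1.0, 0.0),    # too few UMIs
    CellQC("cell_e", 1200, 3000, 4.0, 2.5),   # red blood cell contamination
]
kept = [c.barcode for c in cells if passes_qc(c)]
print(kept)
```

Only cells that satisfy all four conditions are carried forward to integration and clustering.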
Construction and verification of the B-cell-related DEG (BRGs) signature
BRGs were screened for subsequent analysis by intersecting B-cell markers and DEGs. The “glmnet” R package27 was used to perform LASSO regression analysis on the BRGs and to construct a risk model. Cox multivariate regression analysis revealed 6 characteristic genes of BRGs and their regression coefficients (COEF). The calculation formula was as follows: risk score = Expression mRNA1 × COEF mRNA1 + Expression mRNA2 × COEF mRNA2 + … + Expression mRNAn × COEF mRNAn. Then, a risk score was calculated for each patient. Patients in the training set were divided into high-risk and low-risk groups based on the median risk score. K‒M survival analysis was performed, and a receiver operating characteristic (ROC) curve was constructed using the “pROC”28 and “timeROC”29 R packages. To verify the predictive ability of the risk model, we evaluated the prognosis, sensitivity, and specificity of this model in the TCGA cohort. Then, the model was verified in the validation set according to the risk scoring formula.
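As a worked illustration of the risk score formula and the median split, the sketch below uses the six coefficients reported in the Results. The expression values and patient identifiers are made up, and assigning scores equal to the median to the low-risk group is an assumption, since the text does not state how ties were handled.

```python
from statistics import median

# Coefficients as reported in the Results for the six characteristic genes.
COEF = {
    "GNG7": -0.0413,
    "HVCN1": -0.0019,
    "ID3": 0.0017,
    "PKIG": -0.0014,
    "RALGPS2": 0.0104,
    "SH3BP5": -0.0202,
}


def risk_score(expression: dict) -> float:
    """Sum of expression x coefficient over the characteristic genes."""
    return sum(COEF[gene] * expression[gene] for gene in COEF)


def split_by_median(scores: dict) -> dict:
    """Label each patient high or low risk relative to the cohort median.

    Scores equal to the median go to the low-risk group (an assumption).
    """
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical expression profiles, for illustration only.
baseline = {gene: 10.0 for gene in COEF}
patients = {
    "P1": dict(baseline),
    "P2": {**baseline, "GNG7": 0.0},      # loss of a protective gene
    "P3": {**baseline, "RALGPS2": 50.0},  # gain of a risk gene
    "P4": {gene: 20.0 for gene in COEF},
}
scores = {pid: risk_score(expr) for pid, expr in patients.items()}
groups = split_by_median(scores)
print(scores)
print(groups)
```

Because four of the six coefficients are negative, lower expression of GNG7, HVCN1, PKIG or SH3BP5 raises the score, which is why P2 lands in the high-risk group in this toy cohort.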
Independent prognostic analysis and nomogram construction
To determine whether 6 characteristic genes of BRGs could serve as independent predictive factors in LUAD patients, we performed univariate and multivariate Cox regression analyses using the “autoReg” R package30. A nomogram for clinical patients based on age, sex, TNM stage, smoking history, and risk score was created using the “rms” R package31.
Prognostic value analysis of the risk score
K‒M analysis was used to explore the prognostic value of the risk score based on age, sex, TNM stage, and smoking history.
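For readers unfamiliar with the K‒M estimator, a minimal sketch of the product-limit calculation follows. It is not the code used in the study; the follow-up times are made up, and patients censored at an event time are counted as still at risk at that time, which is the usual convention.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death.

    times  : follow-up time per patient
    events : 1 if the patient died at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


# Made-up follow-up data (time in years), for illustration only.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
for t, s in curve:
    print(t, round(s, 3))
```

Running the estimator separately on the high-risk and low-risk patients of a subgroup gives the two curves that a log-rank test then compares.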
Functional enrichment analysis
GO and KEGG pathway analyses were performed using the “clusterProfiler” R package32. The “GSVA” R package33 was used for GSVA analysis with “c2.cp.kegg_legacy.v2023.2.Hs.symbols” and “c5.go.v2023.2.Hs.symbols” to determine differences in enrichment pathways between different risk groups.
Simple nucleotide variation (SNV) analysis
The “maftools” R package34 was used to calculate SNVs from LUAD samples. We calculated the tumor mutation burden (TMB) for each LUAD patient and explored the relationship between the risk score and the TMB. K‒M analysis was used to explore the prognostic value of the TMB in LUAD patients.
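A minimal sketch of a per-patient TMB calculation is given below. It is not the maftools implementation: the set of variant classes counted as nonsynonymous and the 50 Mb capture size are assumptions chosen for illustration, and the mutation records are made up.

```python
from collections import Counter

# Variant classes counted toward TMB (an assumed, commonly used set).
NONSYNONYMOUS = {
    "Missense_Mutation", "Nonsense_Mutation", "Nonstop_Mutation",
    "Frame_Shift_Del", "Frame_Shift_Ins", "In_Frame_Del", "In_Frame_Ins",
    "Splice_Site", "Translation_Start_Site",
}
CAPTURE_SIZE_MB = 50.0  # assumed exome capture size in megabases


def tumor_mutation_burden(records, capture_size_mb=CAPTURE_SIZE_MB):
    """Return mutations per megabase for each sample.

    records : iterable of (sample_id, variant_classification) pairs
    """
    records = list(records)
    counts = Counter(s for s, vclass in records if vclass in NONSYNONYMOUS)
    samples = {s for s, _ in records}
    # Samples with only silent variants get a TMB of 0 rather than being dropped.
    return {s: counts[s] / capture_size_mb for s in samples}


# Made-up mutation records, for illustration only.
records = [
    ("P1", "Missense_Mutation"),
    ("P1", "Missense_Mutation"),
    ("P1", "Missense_Mutation"),
    ("P1", "Nonsense_Mutation"),
    ("P1", "Silent"),
    ("P2", "Silent"),
]
tmb = tumor_mutation_burden(records)
print(tmb)
```

The resulting per-patient values can then be compared between risk groups or dichotomized for survival analysis.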
Correlation analysis of characteristic genes of BRGs and immune infiltration
Seven algorithms, namely CIBERSORT, XCELL, TIMER, QUANTISEQ, MCPCOUNTER, EPIC and CIBERSORT-ABS, were used to evaluate the correlation between risk scores and tumor-infiltrating immune cells. We compared the expression levels of 19 common suppressive immune checkpoints between the low-risk and high-risk groups. In addition, the “estimate” R package35 was used to calculate the TME scores of the two groups of patients, including the stromal scores, immune scores, ESTIMATE scores and tumor purity scores.
Immune function analysis and immune escape analysis
The “GSEABase” R package36 was used to calculate immune function scores. Immunophenoscores (IPSs) were obtained from The Cancer Immunome Atlas database (https://tcia.at/home)37 to assess the response to immunotherapy in different groups of patients. In addition, TIDE scores were calculated using the Tumor Immune Dysfunction and Exclusion Database (http://tide.dfci.harvard.edu/login/)38 to evaluate immune escape in different groups of patients.
Construction of the regulatory network
Transcription factor (TF)-gene and miRNA-gene regulatory networks based on the characteristic genes were constructed with the NetworkAnalyst database39, and Cytoscape (version 3.8.2)40 was used for visualization.
Drug sensitivity analysis
Drug sensitivity analysis was based on the Genomics of Drug Sensitivity in Cancer database (GDSC, https://www.cancerrxgene.org/)41. We estimated the sensitivities of commonly used drugs in LUAD patients in different groups through the “OncoPredict” R package42.
Results
scRNA-seq analysis and identification of B-cell markers
First, we performed quality control on the scRNA-seq data as described previously (Fig. 2A and Supplementary materials 1A-C) and obtained a total of 19,263 cells. After downstream clustering, a total of 22 distinct cell subpopulations were identified. The “SingleR” R package was then used to annotate the clusters and visualize the cell types. Overall, we identified 8 major cell types in this step, including monocytes-macrophages, T cells, epithelial cells, endothelial cells, natural killer (NK) cells, B/plasma cells, fibroblasts, and neutrophils (Fig. 2B). Figure 2C shows marker gene expression in these cell subpopulations. We further analyzed and partially displayed the markers of the 8 cell types (Fig. 2D) and identified 125 markers of B cells.
Screening of DEGs and BRGs
First, we used the “limma” R package to screen the DEGs of LUAD patients in the TCGA cohort and obtained a total of 3378 DEGs, including 1441 downregulated genes and 1937 upregulated genes (Fig. 3B). The heatmap shows the top 50 upregulated and downregulated genes (Fig. 3A). To clarify the immune infiltration profile of LUAD patients, we used CIBERSORT to analyze the infiltration of 22 types of immune cells in LUAD patients. The results showed that the infiltration of 18 types of immune cells, including B cells, plasma cells, CD4+ memory T cells, and macrophages, changed significantly in LUAD patients (Fig. 3C). In particular, the infiltration of B cells and plasma cells increased significantly in LUAD patients. Therefore, we reasoned that B cells may play an important role in LUAD. To clarify this role, we intersected the B-cell markers and DEGs, and 30 BRGs were selected for further analysis (Fig. 3D).
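The screening logic, thresholding the differential expression statistics and then intersecting with the B-cell markers, can be sketched as follows. This is an illustration rather than the limma workflow itself; the gene names and statistics are placeholders, and only the cutoffs (FDR < 0.05, |log2FC| > 1) come from the Methods.

```python
def select_degs(stats, fdr_cutoff=0.05, lfc_cutoff=1.0):
    """Return (upregulated, downregulated) gene sets passing both cutoffs.

    stats : dict mapping gene -> (log2 fold change, FDR)
    """
    up = {g for g, (lfc, fdr) in stats.items() if fdr < fdr_cutoff and lfc > lfc_cutoff}
    down = {g for g, (lfc, fdr) in stats.items() if fdr < fdr_cutoff and lfc < -lfc_cutoff}
    return up, down


# Placeholder statistics for hypothetical genes (tumor versus normal).
stats = {
    "GENE_A": (2.3, 0.001),    # upregulated DEG
    "GENE_B": (-1.8, 0.010),   # downregulated DEG
    "GENE_C": (0.4, 0.001),    # significant but fold change too small
    "GENE_D": (3.1, 0.200),    # large fold change but not significant
    "GENE_E": (-2.5, 0.030),   # downregulated DEG
}
# Placeholder B-cell markers from the single-cell analysis.
b_cell_markers = {"GENE_B", "GENE_C", "GENE_E", "GENE_F"}

up, down = select_degs(stats)
degs = up | down
brgs = sorted(degs & b_cell_markers)
print(brgs)
```

A gene enters the BRG set only if it is both a B-cell marker and differentially expressed, so GENE_C (marker but not a DEG) and GENE_A (DEG but not a marker) are excluded in this toy example.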
Two groups of patients distinguished based on a prognostic risk model had different survival times
To identify the characteristic genes, LASSO regression was used to screen the BRGs and construct a prognostic risk prediction model. We identified a total of 6 characteristic genes, including G protein subunit gamma 7 (GNG7), voltage-gated hydrogen channel 1 (HVCN1), inhibitor of DNA binding 3 (ID3), cAMP-dependent protein kinase inhibitor gamma (PKIG), Ral GEF with PH domain and SH3 binding motif 2 (RALGPS2) and SH3 domain binding protein 5 (SH3BP5) (Fig. 4A). A prognostic risk model was constructed based on these 6 characteristic genes. The risk score calculation formula was risk score = (−0.0413) × GNG7 exp + (−0.0019) × HVCN1 exp + 0.0017 × ID3 exp + (−0.0014) × PKIG exp + 0.0104 × RALGPS2 exp + (−0.0202) × SH3BP5 exp. ID3 and RALGPS2 had positive coefficients and were therefore positively associated with the risk score, whereas GNG7 had the largest absolute coefficient. To assess the stability and reliability of the model, we calculated the risk score of each sample in the training set and validation set according to the risk score formula and divided LUAD patients into low-risk and high-risk groups based on the median risk score. As the risk score increased, patients in both cohorts showed shorter survival and increased mortality (Fig. 4B). There was a significant difference in survival between the high-risk and low-risk groups in the two cohorts, with high-risk patients showing worse survival (Fig. 4C). The ROC curve was used as a tool for evaluating model performance. In the training set, the areas under the curve (AUCs) for risk and 1-year, 3-year, and 5-year survival were > 0.5, indicating that the model had predictive value (Fig. 4D-F). In the GSE50081 dataset, patients also showed shorter survival and increased mortality as the risk score increased (Fig. 4G, H). The AUCs for risk and 1-year, 3-year, and 5-year survival were also all > 0.5 in the GSE50081 dataset, again indicating that the model had predictive value (Fig. 4I-K). In the GSE37745 and GSE30219 datasets, patients in the high-risk group also showed shorter survival times, supporting the ability of the model to predict patient prognosis (Supplementary materials 2A-J).
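To make the AUC threshold concrete, the sketch below computes the AUC as the probability that a randomly chosen patient who died has a higher risk score than a randomly chosen survivor, so 0.5 corresponds to chance and 1.0 to perfect separation. It ignores censoring, which time-dependent ROC methods account for, and the scores and outcomes are made up.

```python
def auc(scores, labels):
    """Rank-based AUC: fraction of (event, non-event) pairs ranked correctly.

    scores : risk score per patient
    labels : 1 if the event (death) occurred, 0 otherwise
    Tied scores count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up risk scores and outcomes, for illustration only.
partial = auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
perfect = auc([0.9, 0.8, 0.4, 0.3], [1, 1, 0, 0])
uninformative = auc([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])
print(partial, perfect, uninformative)
```

An AUC only slightly above 0.5 therefore indicates weak discrimination, so the reported AUC values themselves, not just the threshold, determine how well the model separates patients.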
Construction and evaluation of the prognostic risk model. (A) Regression coefficient path diagram, cross-validation curve and calibration curve. (B) Scatter plot showing survival status in the training set. (C) Survival curve for the training set. (D) ROC curve of the training set. (E) ROC curve of the risk score in the training set. (F) ROC curves for 1, 3 and 5 years in the training set. (G) Scatter plot showing survival status in the validation set. (H) Survival curve for the validation set. (I) ROC curve of the validation set. (J) ROC curve of the risk score in the validation set. (K) ROC curves for 1, 3 and 5 years in the validation set.
Expression of the characteristic genes differed between the high-risk and low-risk groups
The expression of 6 characteristic genes in the high- and low-risk groups was analyzed. The results showed that, except for ID3, there were significant differences in 5 characteristic genes between the high- and low-risk groups in the training set (Fig. 5A). GNG7, HVCN1, PKIG and SH3BP5 were significantly different between the high- and low-risk groups in the validation set (Fig. 5B). Importantly, the changes in the six characteristic genes in the training set and validation set were consistent with the model coefficients, indicating the accuracy of the model.
Nomogram based on risk score and clinical characteristics accurately predicted the prognosis of LUAD patients
To verify the clinical value of the prognostic risk model constructed from the BRGs, we used univariate regression analysis and multivariate regression analysis to evaluate the correlation between the risk score, clinical characteristics (age, sex, TNM stage, smoking history) and prognosis. Univariate regression analysis revealed that T stage, N stage and risk score were risk factors for LUAD patients (Fig. 6A). Multivariate regression analysis was conducted on these factors, and the results showed that N stage and risk score were independent risk factors (Fig. 6A, B). Next, we constructed a nomogram based on risk scores and clinical characteristics and clarified the role of the nomogram in prognosis (Fig. 6C). Univariate regression analysis revealed that the nomogram was also a risk factor for LUAD patients (Fig. 6D). ROC analysis revealed that the risk score, nomogram and clinical characteristics could accurately predict patient prognosis (Fig. 6E, F). We compared the risk score, nomogram and clinical characteristics and found that the T stage, risk score and nomogram had good predictive performance for 1-year, 3-year, and 5-year survival (Fig. 6G).
Moreover, to explore and compare the roles of the characteristic genes of BRGs under different clinical characteristics, we divided LUAD patients into subgroups and analyzed survival within each subgroup. Notably, the survival times of high-risk and low-risk patients were significantly different in every subgroup except the M1 subgroup (Supplementary materials 3A-F). The results showed that the prognostic risk model had good predictive performance in different situations.
Distributions of age, sex, TNM stage and smoking history were different in the high-risk and low-risk groups
We compared the distribution of clinical characteristics between the high-risk and low-risk groups, but no clear clustering was observed (Fig. 7A). Next, we calculated the proportions of patients with different clinical characteristics in the high-risk and low-risk groups and visualized the results (Fig. 7B-G). In the high-risk group, the proportions of males, patients < 65 years old, patients with T3-4, N1-3 or M1 disease, and patients with a smoking history of 3–5 years were greater. The risk scores of LUAD patients were analyzed to reveal the relationships between the risk score and clinical variables (Fig. 7H-M). The results suggested that men, patients with high TNM stage and long-term smokers have higher risk scores, implying that these types of patients have a worse prognosis.
Enriched functions differed significantly between the high-risk and low-risk groups
To clarify the role of BRGs in LUAD, 30 BRGs were used for functional enrichment analysis. GO analysis revealed that BRGs were mainly related to the differentiation and activation of B cells and lymphocytes (Fig. 8A). KEGG analysis revealed that BRGs were mainly involved in the hematopoietic cell lineage, B-cell receptor signaling and NF-κB signaling pathways (Fig. 8B). Next, we compared differentially enriched pathways between the high-risk and low-risk groups. The low-risk group was significantly enriched in the cell cycle and oocyte meiosis, and the high-risk group was significantly enriched in the tight junction and MAPK signaling pathways (Fig. 8C, D).
The probability of genetic mutation was higher in the high-risk group
We obtained the SNV data of LUAD patients from the TCGA database and visualized the mutation data in the high-risk and low-risk groups. The top 3 mutated genes in all LUAD patients were TP53 (49%), TTN (43%), and MUC16 (41%) (Fig. 8E). The top 3 mutated genes in the high-risk group were TP53 (58%), TTN (50%), and MUC16 (46%) (Fig. 8F). The top 3 mutated genes in the low-risk group were TP53 (41%), CSMD3 (38%), and MUC16 (35%) (Fig. 8G). These results suggested that patients in the high-risk group had a greater probability of mutations. Next, we calculated the TMB scores of the two groups (Fig. 8H). The high-risk group had significantly greater TMB scores than did the low-risk group. Finally, we evaluated the relationship between the TMB and survival (Fig. 8I). Low-risk, low-TMB patients had better survival.
Immune infiltration differed between the high-risk and low-risk groups, and patients in the high-risk group responded poorly to some immunotherapies
Using 7 algorithms (CIBERSORT, XCELL, TIMER, QUANTISEQ, MCPCOUNTER, CIBERSORT-ABS and EPIC), we studied the relationships between risk scores and immune cells. All the algorithm results showed significant differences in B-cell infiltration between the high- and low-risk groups (Fig. 9A, B). Then, we calculated the TME scores in the high-risk and low-risk groups to evaluate immune infiltration. The stromal score, immune score and ESTIMATE score in the low-risk group were significantly greater than those in the high-risk group, while the tumor purity score was significantly lower than that in the high-risk group, consistent with the better prognosis of patients in the low-risk group (Fig. 9C). We compared the response to immunotherapy between the high-risk and low-risk groups. Among the 19 common immune checkpoints, 7 immune checkpoints, namely, CD40LG, HHLA2, LGALS9, TNFSF18, LAIR1, TNFRSF18 and TNFRSF4, exhibited significant differences in expression between the two groups (Fig. 9D). We calculated TIDE scores to assess the potential for immune evasion in both groups. The median TIDE score was 0.03 in the high-risk group and 0.01 in the low-risk group. Patients in the high-risk group therefore tended to have a greater possibility of immune escape, although the difference was not statistically significant (p > 0.05) (Fig. 9E). Finally, we analyzed the immunotherapy response in both groups of patients. Patients in the low-risk group benefited more from treatment with CTLA-4 inhibitors, PD-1 inhibitors, and type 1 interferon (Fig. 9F, G).
Immune infiltration and immunotherapy analysis. (A) Immune cell infiltration in two groups. (B) 7 algorithms of immune infiltration. (C) TME in two groups. (D) immune checkpoint in two groups. (E) TIDE in two groups. (F) IPS score in two groups. (G) ssGSEA of immunotherapy analysis. *P < 0.05, **P < 0.01, ***P < 0.001.
Characteristic genes were regulated in a variety of ways, and the two groups of patients had different drug sensitivity profiles
We constructed TF-gene and miRNA-gene regulatory networks of the 6 characteristic genes via the NetworkAnalyst database (Fig. 10A, B). The results showed that the expression of characteristic genes was regulated by a variety of TFs and miRNAs, suggesting potential molecular mechanisms that interfere with characteristic genes. Afterwards, we used the GDSC database to perform drug sensitivity analysis in the high-risk and low-risk groups. Among the 198 drugs tested, 49 drugs showed significantly different sensitivity between the high-risk and low-risk groups (Fig. 10C). Representative sensitive drugs in the low-risk group included doramapimod, ribociclib, BMS-754807, SB505124 and PF-4708671 (Fig. 10D-H). Representative sensitive drugs in the high-risk group included BI-2536, PAK-5339 and venetoclax (Fig. 10I-K). The results suggested that the high-risk and low-risk groups may benefit from different treatment strategies.
Regulatory network and drug sensitivity. (A) TF-Genes regulatory network constructed by NetworkAnalyst database (https://www.networkanalyst.ca/). (B) miRNA-Genes regulatory network constructed by NetworkAnalyst database (https://www.networkanalyst.ca/). (C) drug sensitivity of 49 drugs in two groups. (D-H) sensitive drugs of low-risk group. (I-K) sensitive drugs of high-risk group. *P < 0.05, **P < 0.01, ***P < 0.001.
Discussion
At present, great progress has been made in the treatment of LUAD, which mainly includes resection, chemotherapy, radiotherapy, and targeted therapy43. However, the results are still unsatisfactory, and the overall survival time of patients is still grim. With the deepening of research on immune checkpoints, immunotherapy has become a new option for LUAD patients44. Considering the role of T cells in immunity, PD-1 inhibitors have been developed and are at the forefront of LUAD immunotherapy45. As the main effector cells of humoral immunity, B cells are also involved in the construction of the TME46. However, the immune infiltration characteristics and potential mechanisms of action of B cells in LUAD have not been studied. This study aimed to elucidate the impact of B cells on the clinical characteristics and prognosis of LUAD patients. Biomarkers have great potential for exploring the TME and immune characteristics, which traditional methods of tumor research cannot accurately reflect47. Thus, we combined RNA-seq and scRNA-seq to explore B-cell-related characteristic genes to explore the immune characteristics and prognostic factors of LUAD. Finally, we obtained 30 BRGs. We found that these BRGs not only were involved in the differentiation and activation of B cells, lymphocytes, and monocytes but were also related to immune deficiencies.
To prevent overfitting, we used LASSO regression analysis to screen 6 characteristic genes, namely, GNG7, HVCN1, ID3, PKIG, RALGPS2 and SH3BP5, as molecular markers. The expression of GNG7, an important gene that regulates cell proliferation and induces apoptosis, is significantly reduced in LUAD patients and is significantly negatively correlated with patient prognosis. Zheng et al. reported that GNG7 significantly inhibited the occurrence of transplanted tumors in mice and the proliferation and migration of LUAD cells in vitro by inhibiting the expression of E2 promoter binding factor 1 (E2F1)48. As a member of the ID protein family, ID3 is believed to be involved in the regulation of tumor-associated macrophages to intervene in tumor proliferation and invasion49. In a study on endogenous biomarkers, RALGPS2 was identified as an important biomarker for LUAD, and its expression in tumor tissues was significantly greater than that in adjacent tissues50. RALGPS2 regulated tumor occurrence and development via 3 ceRNA networks. By detecting SH3BP5 in the blood of 171 early-stage LUAD patients, Qiao et al. reported that a reduced degree of SH3BP5 methylation was significantly related to an increased risk of LUAD, and the degree of reduction increased with advanced stage, suggesting that SH3BP5 may be a diagnostic marker in early-stage LUAD patients51. HVCN1 is characterized as a key modulator of the B-cell receptor (BCR) signaling pathway, and hypofunction of HVCN1 could have a role in the treatment of cancers related to BCR signaling52. Hondares et al. confirmed that HVCN1 is highly expressed in B cells of tumor patients, where it promoted tumor proliferation and migration and enhanced BCR signaling. HVCN1S, one of the two subtypes of HVCN1, had stronger effects on promoting proliferation and migration53. The PKI gene family can inactivate PKA and terminate PKA-induced gene expression. The major subtype PKIG regulates osteoblast and adipocyte differentiation, and loss of PKIG promotes osteogenesis and reduces adipogenesis54. The expression of PKIG is reduced in lung cancer patients. The expression of PKIG is positively correlated with the infiltration of T cells and the expression of cytokines in lung cancer patients, such as CCL2, CXCL12, and CXCR4, suggesting that PKIG participates in the humoral immune response to regulate the progression of lung cancer55.
We evaluated the predictive performance of the prognostic risk model constructed based on characteristic genes through receiver operating characteristic (ROC) curves and nomograms56. With respect to different clinical characteristics, the areas under the curve (AUCs) were all > 0.5, indicating that the prognostic model could accurately evaluate patient outcomes. Survival analysis also verified this conclusion. We found that there were certain differences in the distributions of sex, age, TNM stage, and smoking history between the high-risk and low-risk groups, suggesting that there was a connection between risk score and the clinical characteristics of LUAD patients. The proportion of late-stage TNM patients in the high-risk group was greater than that in the low-risk group, indicating that the prognosis of these patients was worse.
Mutations in certain key genes are critical to tumorigenesis57. Therefore, we analyzed the mutation probability of various genes in two groups of LUAD patients. We found that patients in the high-risk group had a greater probability of having mutations, including mutations in TP53, TTN, and MUC16. Some studies have shown that the use of PD-1 inhibitors may produce better results in LUAD patients with TP53 mutations58. Notably, tumorigenesis and malignant transformation are usually the result of the accumulation of mutations in multiple genes, and a single gene is not enough to describe the overall mutation status of a tumor59. TMB refers to the cumulative number of somatic missense mutations and represents genomic instability60. Currently, TMB, as a new biomarker, is widely studied for its role in tumor prognosis. In a study of 151 LUAD patients, researchers found that EGFR mutations were most likely to occur in LUAD patients. TP53, KRAS, PIK3CA and other mutations also existed in LUAD patients. Patients with only EGFR mutations had a better prognosis than patients who harbored EGFR and other mutant driver genes61. Wang et al. suggested that the reason why patients with high TMB have a worse prognosis may be related to immune suppression and immune depletion pathways62. Our results also suggested that high TMB corresponds to poor prognosis in LUAD patients. Interestingly, patients with high TMB do not always show a worse prognosis. He et al. found that patients with high TMB had prolonged progression-free survival, which was attributed to higher immune cell infiltration and high expression of immune genes63. In addition, high TMB cannot be used as a biomarker in all types of solid tumors; in breast cancer and prostate cancer, for example, there was no significant correlation between TMB levels and immune cell infiltration in the tumors64. Therefore, TMB as a prognostic biomarker for LUAD patients still needs more verification.
Immunotherapy based on immune checkpoint inhibitors has become an integral part of various cancer treatment strategies and is being promoted as a first-line treatment for advanced unresectable tumors65. By evaluating immune checkpoint gene expression in different patients, tumor patients who are more likely to benefit from immune checkpoint blockade therapy can be selected to achieve individualization and precision66. Based on the risk score, we evaluated the expression of immune checkpoints in the two groups of patients and found that patients in the low-risk group had higher expression of 7 immune checkpoints, suggesting that they are more likely to benefit from immunotherapy. Similarly, we found that patients in the low-risk group were more sensitive to PD-1 inhibitors and CTLA-4 inhibitors. The TME is critical for tumorigenesis and progression67. Subsequently, we evaluated immune infiltration in the TME of the two groups of patients, and patients in the low-risk group had more abundant immune cell infiltration in the TME. Finally, we constructed a regulatory network of TF-genes and miRNA-genes and sensitive drug expression profiles of the two groups of patients to explore the potential molecular mechanisms regulating characteristic genes and personalized treatment strategies.
Although the B-cell characteristic gene signature we constructed performs well in characterizing the immune landscape and predicting patient prognosis, this study has several limitations. Our analysis was based entirely on public database data, which may cause the predictions to deviate from the actual clinical situation. No clinical samples were obtained in this study, so more data from LUAD patients need to be collected to validate the utility of the model and the accuracy of its immunotherapy predictions. A well-designed prospective study could confirm whether high-risk patients have shorter survival times, and this is the focus of our follow-up work. Moreover, single-cell sequencing of tumor samples could provide more information about B cells and other important cells in LUAD patients, which would help to elucidate the potential mechanisms underlying the poor prognosis of the high-risk group.
Conclusion
For the first time, we confirmed the role of B-cell characteristic genes as biomarkers for LUAD patients, and these genes may be new targets for LUAD treatment. In addition, the prognostic risk model we established based on B-cell characteristic genes could evaluate the prognosis of LUAD patients to a certain extent. This study provides a basis for the clinical identification of patient subgroups that may benefit from immunotherapy and personalized treatment and provides new ideas for in-depth research on LUAD.
Data availability
The datasets generated and/or analysed during the current study are available in the GEO repository (https://www.ncbi.nlm.nih.gov/geo/) and TCGA repository (https://portal.gdc.cancer.gov/).
References
Siegel, R. L., Miller, K. D., Wagle, N. S. & Jemal, A. Cancer statistics, 2023. CA Cancer J. Clin. 73 (1), 17–48. https://doi.org/10.3322/caac.21763 (2023).
Bade, B. C. & Dela Cruz, C. S. Lung Cancer 2020: Epidemiology, Etiology, and Prevention. Clin. Chest Med. 41 (1), 1–24. https://doi.org/10.1016/j.ccm.2019.10.001 (2020).
Mei, J. et al. ATF1 promotes the malignancy of lung adenocarcinoma cells by transcriptionally regulating ZNF143 expression. Acta Biochim. Biophys. Sin (Shanghai). 55 (5), 769–782. https://doi.org/10.3724/abbs.2023087 (2023).
Jiang, N. et al. HIF-1ɑ-regulated miR-1275 maintains stem cell-like phenotypes and promotes the progression of LUAD by simultaneously activating Wnt/β-catenin and notch signaling. Theranostics 10 (6), 2553–2570. https://doi.org/10.7150/thno.41120 (2020).
Sorin, M. et al. Single-cell spatial landscape of immunotherapy response reveals mechanisms of CXCL13 enhanced antitumor immunity. J. Immunother Cancer. 11 (2), e005545. https://doi.org/10.1136/jitc-2022-005545 (2023).
Kashima, Y. et al. Single-cell analyses reveal diverse mechanisms of resistance to EGFR tyrosine kinase inhibitors in Lung Cancer. Cancer Res. 81 (18), 4835–4848. https://doi.org/10.1158/0008-5472.CAN-20-2811 (2021).
Wang, Y. et al. Immune characteristics analysis and construction of a four-gene prognostic signature for lung adenocarcinoma based on estrogen reactivity. BMC Cancer. 23 (1), 1047. https://doi.org/10.1186/s12885-023-11415-y (2023).
Tokunaga, R. et al. B cell and B cell-related pathways for novel cancer treatments. Cancer Treat. Rev. 73, 10–19. https://doi.org/10.1016/j.ctrv.2018.12.001 (2019).
Patel, A. J., Richter, A., Drayson, M. T. & Middleton, G. W. The role of B lymphocytes in the immuno-biology of non-small-cell lung cancer. Cancer Immunol. Immunother. 69 (3), 325–342. https://doi.org/10.1007/s00262-019-02461-2 (2020).
Lu, Z. et al. Ten-eleven translocation-2 inactivation restrains IL-10-producing regulatory B cells to enable antitumor immunity in hepatocellular carcinoma. Hepatology 77 (3), 745–759. https://doi.org/10.1002/hep.32442 (2023).
Wang, S. S. et al. Tumor-infiltrating B cells: their role and application in anti-tumor immunity in lung cancer. Cell. Mol. Immunol. 16 (1), 6–18. https://doi.org/10.1038/s41423-018-0027-x (2019).
Bejarano, L., Jordāo, M. J. C. & Joyce, J. A. Therapeutic targeting of the Tumor Microenvironment. Cancer Discov. 11 (4), 933–959. https://doi.org/10.1158/2159-8290.CD-20-1808 (2021).
Chen, J. et al. ZAP-70 augments tonic B-cell receptor and CCR7 signaling in IGHV-unmutated chronic lymphocytic leukemia. Blood Adv. 8 (5), 1167–1178. https://doi.org/10.1182/bloodadvances.2022009557 (2024).
Helmink, B. A. et al. B cells and tertiary lymphoid structures promote immunotherapy response. Nature 577 (7791), 549–555. https://doi.org/10.1038/s41586-019-1922-8 (2020).
Schumacher, T. N. & Thommen, D. S. Tertiary lymphoid structures in cancer. Science 375 (6576), eabf9419. https://doi.org/10.1126/science.abf9419 (2022).
Fridman, W. H. et al. B cells and tertiary lymphoid structures as determinants of tumour immune contexture and clinical outcome. Nat. Rev. Clin. Oncol. 19 (7), 441–457. https://doi.org/10.1038/s41571-022-00619-z (2022).
Kobayashi, S. et al. TGF-β induces the differentiation of human CXCL13-producing CD4(+) T cells. Eur. J. Immunol. 46 (2), 360–371. https://doi.org/10.1002/eji.201546043 (2016).
Yu, L. et al. Characterization of cancer-related fibroblasts (CAF) in hepatocellular carcinoma and construction of CAF-based risk signature based on single-cell RNA-seq and bulk RNA-seq data. Front. Immunol. 13, 1009789. https://doi.org/10.3389/fimmu.2022.1009789 (2022).
Guo, S. et al. Integrated analysis of single-cell RNA-seq and bulk RNA-seq unravels T cell-related prognostic risk model and tumor immune microenvironment modulation in triple-negative breast cancer. Comput. Biol. Med. 161, 107066. https://doi.org/10.1016/j.compbiomed.2023.107066 (2023).
Wei, E. et al. Integration of scRNA-Seq and TCGA RNA-Seq to analyze the heterogeneity of HPV + and HPV- cervical Cancer Immune cells and establish molecular risk models. Front. Oncol. 12, 860900. https://doi.org/10.3389/fonc.2022.860900 (2022).
Barrett, T. et al. NCBI GEO: archive for functional genomics data sets–update. Nucleic Acids Res. 41 (Database issue), D991–D995. https://doi.org/10.1093/nar/gks1193 (2013).
Hao, Y. et al. Dictionary learning for integrative, multimodal and scalable single-cell analysis. Nat. Biotechnol. 42 (2), 293–304. https://doi.org/10.1038/s41587-023-01767-y (2024).
R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. (2024). https://www.R-project.org/
Aran, D. et al. Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage. Nat. Immunol. 20 (2), 163–172. https://doi.org/10.1038/s41590-018-0276-y (2019).
Wang, Z., Jensen, M. A. & Zenklusen, J. C. A practical guide to the Cancer Genome Atlas (TCGA). Methods Mol. Biol. 1418, 111–141. https://doi.org/10.1007/978-1-4939-3578-9_6 (2016).
Ritchie, M. E. et al. Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43 (7), e47. https://doi.org/10.1093/nar/gkv007 (2015).
Friedman, J., Hastie, T. & Tibshirani, R. Regularization paths for generalized Linear models via Coordinate Descent. J. Stat. Softw. 33 (1), 1–22 (2010).
Robin, X. et al. pROC: an open-source package for R and S + to analyze and compare ROC curves. BMC Bioinform. 12, 77. https://doi.org/10.1186/1471-2105-12-77 (2011).
Blanche, P., Dartigues, J. F. & Jacqmin-Gadda, H. Estimating and comparing time-dependent areas under receiver operating characteristic curves for censored event times with competing risks. Stat. Med. 32 (30), 5381–5397. https://doi.org/10.1002/sim.5958 (2013).
Moon, K. autoReg: Automatic Linear and Logistic Regression and Survival Analysis. R package version 0.3.3. (2023). https://CRAN.R-project.org/package=autoReg
Harrell, F. E. Jr. rms: Regression Modeling Strategies. R package version 6.8-0. (2024). https://CRAN.R-project.org/package=rms
Wu, T. et al. clusterProfiler 4.0: a universal enrichment tool for interpreting omics data. Innov. (Camb). 2 (3), 100141. https://doi.org/10.1016/j.xinn.2021.100141 (2021).
Hänzelmann, S., Castelo, R. & Guinney, J. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinform. 14, 7. https://doi.org/10.1186/1471-2105-14-7 (2013).
Mayakonda, A., Lin, D. C., Assenov, Y., Plass, C. & Koeffler, H. P. Maftools: efficient and comprehensive analysis of somatic variants in cancer. Genome Res. 28 (11), 1747–1756. https://doi.org/10.1101/gr.239244.118 (2018).
Yoshihara, K., Kim, H. & Verhaak, R. G. Estimate: Estimate of Stromal and Immune cells in malignant tumor tissues from expression data. R package version 1.0.13/r21. (2016). https://R-Forge.R-project.org/projects/estimate/
Morgan, M., Falcon, S. & Gentleman, R. GSEABase: Gene set enrichment data structures and methods. R package version 1.66.0. (2024).
Charoentong, P. et al. Pan-cancer immunogenomic analyses reveal genotype-immunophenotype relationships and predictors of response to checkpoint blockade. Cell. Rep. 18 (1), 248–262. https://doi.org/10.1016/j.celrep.2016.12.019 (2017).
Fu, J. et al. Large-scale public data reuse to model immunotherapy response and resistance. Genome Med. 12 (1), 21. https://doi.org/10.1186/s13073-020-0721-z (2020).
Zhou, G. et al. NetworkAnalyst 3.0: a visual analytics platform for comprehensive gene expression profiling and meta-analysis. Nucleic Acids Res. 47 (W1), W234–W241. https://doi.org/10.1093/nar/gkz240 (2019).
Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13 (11), 2498–2504. https://doi.org/10.1101/gr.1239303 (2003).
Yang, W. et al. Genomics of Drug Sensitivity in Cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells. Nucleic Acids Res. 41 (Database issue), D955–D961. https://doi.org/10.1093/nar/gks1111 (2013).
Maeser, D. & Gruener, R. oncoPredict: Drug Response Modeling and Biomarker Discovery. R package version 1.2. (2024). https://CRAN.R-project.org/package=oncoPredict
Seguin, L., Durandy, M. & Feral, C. C. Lung Adenocarcinoma Tumor Origin: a guide for Personalized Medicine. Cancers (Basel). 14 (7), 1759. https://doi.org/10.3390/cancers14071759 (2022).
Cai, R. et al. To be, or not to be: the dilemma of immunotherapy for non-small cell lung cancer harboring various driver mutations. J. Cancer Res. Clin. Oncol. 149 (12), 10027–10040. https://doi.org/10.1007/s00432-023-04919-4 (2023).
Palermo, B. et al. CD28/PD1 co-expression: dual impact on CD8 + T cells in peripheral blood and tumor tissue, and its significance in NSCLC patients’ survival and ICB response. J. Exp. Clin. Cancer Res. 42 (1), 287. https://doi.org/10.1186/s13046-023-02846-3 (2023).
Downs-Canner, S. M., Meier, J., Vincent, B. G. & Serody, J. S. B cell function in the Tumor Microenvironment. Annu. Rev. Immunol. 40, 169–193. https://doi.org/10.1146/annurev-immunol-101220-015603 (2022).
Yu, Y., Wang, Z., Zheng, Q. & Li, J. FAM72 serves as a biomarker of poor prognosis in human lung adenocarcinoma. Aging (Albany NY). 13 (6), 8155–8176. https://doi.org/10.18632/aging.202625 (2021).
Zheng, H., Tian, H., Yu, X., Ren, P. & Yang, Q. G protein gamma 7 suppresses progression of lung adenocarcinoma by inhibiting E2F transcription factor 1. Int. J. Biol. Macromol. 182, 858–865. https://doi.org/10.1016/j.ijbiomac.2021.04.082 (2021).
Zhang, X., Wu, L., Zhang, X. & Xu, Y. Identifying the tumor-associated macrophage of lung adenocarcinoma reveals immune landscape through omics data integration. Heliyon 10 (6), e27586. https://doi.org/10.1016/j.heliyon.2024.e27586 (2024).
Tan, J., Wang, W., Song, B., Song, Y. & Meng, Z. Integrative analysis of three Novel competing endogenous RNA biomarkers with a Prognostic Value in Lung Adenocarcinoma. Biomed. Res. Int. 2020, 2837906. https://doi.org/10.1155/2020/2837906 (2020).
Qiao, R. et al. Novel blood-based hypomethylation of SH3BP5 is associated with very early-stage lung adenocarcinoma. Genes Genomics. 44 (4), 445–453. https://doi.org/10.1007/s13258-021-01190-0 (2022).
Boyd, R. S., Dyer, M. J. & Cain, K. Proteomic analysis of B-cell malignancies. J. Proteom. 73 (10), 1804–1822. https://doi.org/10.1016/j.jprot.2010.03.010 (2010).
Hondares, E. et al. Enhanced activation of an amino-terminally truncated isoform of the voltage-gated proton channel HVCN1 enriched in malignant B cells. Proc. Natl. Acad. Sci. U S A. 111 (50), 18078–18083. https://doi.org/10.1073/pnas.1411390111 (2014).
Chen, X. et al. Protein kinase inhibitor γ reciprocally regulates osteoblast and adipocyte differentiation by downregulating leukemia inhibitory factor. Stem Cells. 31 (12), 2789–2799. https://doi.org/10.1002/stem.1524 (2013).
Liu, Q. et al. Exploration of the perturbation of PKIG in lung squamous cell carcinoma and the role in Tumor Microenvironment based on Bioinformatics Method. Zhongguo Fei Ai Za Zhi. 26 (7), 523–537. https://doi.org/10.3779/j.issn.1009-3419.2023.102.24 (2023).
Zhang, W. et al. Nomogram predicts risk and prognostic factors for bone metastasis of pancreatic Cancer: a Population-based analysis. Front. Endocrinol. (Lausanne). 12, 752176. https://doi.org/10.3389/fendo.2021.752176 (2022).
Chand, S., Mehta, V., Sharma, R. K., Anvikar, A. R. & Chander, H. Cancer informatics analysis indicates high CHAC2 associated with unfavorable prognosis in breast cancer. Front. Oncol. 12, 1058931. https://doi.org/10.3389/fonc.2022.1058931 (2022).
Sun, H. et al. Specific TP53 subtype as biomarker for immune checkpoint inhibitors in lung adenocarcinoma. EBioMedicine 60, 102990. https://doi.org/10.1016/j.ebiom.2020.102990 (2020).
Scalera, S. et al. KEAP1 and TP53 frame genomic, evolutionary, and immunologic subtypes of lung adenocarcinoma with different sensitivity to Immunotherapy. J. Thorac. Oncol. 16 (12), 2065–2077. https://doi.org/10.1016/j.jtho.2021.08.010 (2021).
Sha, D. et al. Tumor Mutational Burden as a predictive biomarker in solid tumors. Cancer Discov. 10 (12), 1808–1825. https://doi.org/10.1158/2159-8290.CD-20-0522 (2020).
Zhang, Y. et al. The co-mutation of EGFR and tumor-related genes leads to a worse prognosis and a higher level of tumor mutational burden in Chinese non-small cell lung cancer patients. J. Thorac. Dis. 14 (1), 185–193. https://doi.org/10.21037/jtd-21-1921 (2022).
Li, L. & Li, J. Correlation of tumor mutational burden with prognosis and immune infiltration in lung adenocarcinoma. Front. Oncol. 13, 1128785. https://doi.org/10.3389/fonc.2023.1128785 (2023).
He, Z. et al. LRP1B mutation is associated with tumor immune microenvironment and progression-free survival in lung adenocarcinoma treated with immune checkpoint inhibitors. Transl Lung Cancer Res. 12 (3), 510–529. https://doi.org/10.21037/tlcr-23-39 (2023).
McGrail, D. J. et al. High tumor mutation burden fails to predict immune checkpoint blockade response across all cancer types. Ann. Oncol. 32 (5), 661–672. https://doi.org/10.1016/j.annonc.2021.02.006 (2021).
Zhang, Y. & Zhang, Z. The history and advances in cancer immunotherapy: understanding the characteristics of tumor-infiltrating immune cells and their therapeutic implications. Cell. Mol. Immunol. 17 (8), 807–821. https://doi.org/10.1038/s41423-020-0488-6 (2020).
O’Donnell, J. S., Teng, M. W. L. & Smyth, M. J. Cancer immunoediting and resistance to T cell-based immunotherapy. Nat. Rev. Clin. Oncol. 16 (3), 151–167. https://doi.org/10.1038/s41571-018-0142-8 (2019).
de Visser, K. E. & Joyce, J. A. The evolving tumor microenvironment: from cancer initiation to metastatic outgrowth. Cancer Cell. 41 (3), 374–403. https://doi.org/10.1016/j.ccell.2023.02.016 (2023).
Funding
This research was funded by the National Administration of Traditional Chinese Medicine with the Chinese Medicine innovation team and talent support program (ZYYCXTD-C-202206); Henan Province Traditional Chinese Medicine “Double First-Class” Scientific Research Project (HSRP-DFCTCM-2023-3-09); Henan Provincial Clinical Medical Research Center (Henan Provincial Respiratory Disease Clinical Medical Research Center); the Science and Technology Development Fund, Macau SAR (0098/2021/A2 and 0048/2023/AFJ); and Chinese Medicine Guangdong Laboratory (HQCML-C-2024007).
Author information
Contributions
LSY designed the research plan and provided funding support. XKX completed data acquisition and analysis, constructed the disease risk model, and wrote the manuscript. HD, FZY, LYX, ZH and WQB revised the manuscript. LY and LSX completed data quality control. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Xu, K., Han, D., Fan, Z. et al. B-cell signatures characterize the immune landscape and predict LUAD prognosis via the integration of scRNA-seq and bulk RNA-seq. Sci Rep 15, 5453 (2025). https://doi.org/10.1038/s41598-025-89213-8
DOI: https://doi.org/10.1038/s41598-025-89213-8












