Introduction

Lung cancer is a malignant tumor originating from the bronchial mucosa or lung glands that poses a great threat to human life. In recent years, the incidence and mortality of lung cancer have increased significantly1. Lung adenocarcinoma (LUAD), a subtype of non-small cell lung cancer (NSCLC), accounts for approximately 40% of all lung cancers and has a poor five-year survival rate2,3. The early symptoms of lung cancer are often recurrent, and a majority of patients may be in an advanced state when diagnosed, with tumor cell invasion and metastasis already present, which limits therapeutic options and results in a poor prognosis4. Over the past few decades, our understanding of the molecular pathogenesis of LUAD has improved significantly due to the rapid development of omics technology5,6. A series of omic data-derived signatures were generated to predict the clinical outcomes of LUAD patients7. Accordingly, more novel multigene signatures are valuable for predicting the outcome and recurrence of LUAD.

B cells are the main effector cells of humoral immunity, while Tfh cells are helper T cells that control the maturation and activation of B cells. The interaction of B cells, Tfh cells, and dendritic cells (DCs) is the basis of the adaptive immune response8. B cells have a variety of immune response functions. Tumor-infiltrating B lymphocytes (TIBs) can be observed in a variety of solid tumors9,10. Existing evidence shows that TIBs inhibit tumor progression by secreting immunoglobulins, promoting T-cell responses, and directly killing cancer cells11.

In recent years, the important role of the tumor microenvironment (TME) in tumor progression and treatment has emphasized the importance of identifying immune expression profiles and immune signatures in patients with different tumors12. TIBs are important infiltrating cells in the TME. In chronic lymphocytic leukemia, interfering with TIB receptors and B-cell-related CCR7 signaling can delay tumor progression13. TIBs and B-cell-related pathways also maintain the structure and function of tertiary lymphoid structures (TLSs). TLSs consist of T-cell- and B-cell-rich regions that are sites of differentiation of effector T cells and memory T and B cells14. TLSs are transient ectopic lymphoid aggregates whose structural organization and function are similar to those of secondary lymphoid organs15. Studies have shown that the presence of TLSs in the TME is associated with local antitumor immune responses and a positive patient prognosis16. TIBs and B-cell-related pathways play key roles in the formation of TLSs and the local immune response that occurs in TLSs17.

With the rapid development of bulk RNA sequencing (RNA-seq) and single-cell RNA sequencing (scRNA-seq), numerous new approaches have been used to screen and identify key genes with effective predictive value in disease diagnosis, treatment, and prognosis18,19,20. Therefore, this study combined RNA-seq and scRNA-seq to identify characteristic genes and immune expression profiles of LUAD patients. Based on B-cell-related characteristic genes, a prognostic risk prediction model for LUAD patients was constructed, and a variety of techniques were used to analyze the prognostic characteristics of LUAD patients. This study aimed to explore the impact of B-cell-related genes on the prognosis of LUAD patients. The detailed flow chart is shown in Fig. 1.

Fig. 1

Flow chart.

Materials and methods

Data acquisition and processing

The scRNA-seq data of GSE164983, which included 2 patients with lung adenocarcinoma, were downloaded from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/)21. First, the “Seurat” R package22 was used to convert the scRNA-seq data into Seurat objects and to exclude low-quality cells in R software 4.4.0 (https://www.R-project.org/)23. The number of detected genes (RNA features) and the RNA count were assessed for each cell, and the proportions of mitochondrial and red blood cell gene expression were calculated using the PercentageFeatureSet function in the “Seurat” R package. Cells were retained if they expressed 300–10,000 genes and had a UMI count > 600, a mitochondrial proportion < 10% and a red blood cell proportion < 1%. Batch effects between the 2 samples were removed using the FindIntegrationAnchors function. The top 2000 variable genes were identified using the FindVariableFeatures function. Linear dimensional reduction was conducted using principal component analysis (PCA), and 41 principal components and a resolution of 0.6 were used for clustering. The 2000 variable genes were used for cell subpopulation identification, with t-distributed stochastic neighbor embedding (tSNE) and uniform manifold approximation and projection (UMAP) used for nonlinear dimensional reduction. The “SingleR” R package24 was used to annotate the cell types. To identify marker genes in B cells, the FindAllMarkers function was used with |log2FC| > 1.

LUAD transcriptome data, copy number variation (CNV) data and clinical information were obtained from The Cancer Genome Atlas (TCGA) database (https://portal.gdc.cancer.gov/)25. For the RNA-seq data, 541 tumor samples and 49 normal samples were ultimately included in the training set. The GSE50081, GSE30219 and GSE37745 datasets were downloaded from the GEO database as validation sets. Differentially expressed genes (DEGs) between tumor and normal samples were screened using the “limma” R package26 with a false discovery rate (FDR) < 0.05 and |log2FC| > 1.
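The cell-level quality-control thresholds described above can be illustrated with a minimal Python sketch. It is not the Seurat workflow used in this study: the `Cell` record and its field names are hypothetical, the per-cell percentages are assumed to have been computed beforehand, and treating the 300–10,000 gene range as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Per-cell QC metrics (hypothetical field names, for illustration only)."""
    barcode: str
    n_genes: int      # number of genes detected in the cell
    n_umi: int        # total UMI count
    pct_mito: float   # % of counts from mitochondrial genes
    pct_rbc: float    # % of counts from red blood cell genes


def passes_qc(cell: Cell) -> bool:
    """Apply the thresholds stated in the text (gene range assumed inclusive)."""
    return (
        300 <= cell.n_genes <= 10000
        and cell.n_umi > 600
        and cell.pct_mito < 10.0
        and cell.pct_rbc < 1.0
    )


def filter_cells(cells):
    """Keep only the cells that pass every QC threshold."""
    return [c for c in cells if passes_qc(c)]


# Toy cells with made-up metrics, one per failure mode.
cells = [
    Cell("AAAC", 1500, 4000, 3.2, 0.0),   # passes all thresholds
    Cell("AAAG", 250, 700, 2.0, 0.0),     # too few genes
    Cell("AACT", 2000, 5000, 15.0, 0.1),  # mitochondrial proportion too high
    Cell("AAGT", 1200, 3000, 4.0, 2.5),   # red blood cell proportion too high
]
kept = filter_cells(cells)
print([c.barcode for c in kept])
```

Keeping the thresholds in one predicate makes it easy to report how many cells each criterion removes.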

Construction and verification of the B-cell-related DEG (BRGs) signature

BRGs were identified by intersecting the B-cell markers with the DEGs and were used for subsequent analysis. The “glmnet” R package27 was used to perform LASSO regression analysis on the BRGs and to construct a risk model. Multivariate Cox regression analysis yielded 6 characteristic genes of the BRGs and their regression coefficients (COEF). The risk score was calculated as follows: risk score = Expression mRNA1 × COEF mRNA1 + Expression mRNA2 × COEF mRNA2 + … + Expression mRNAn × COEF mRNAn. A risk score was then calculated for each patient, and patients in the training set were divided into high-risk and low-risk groups based on the median risk score. K‒M survival analysis was performed, and receiver operating characteristic (ROC) curves were constructed using the “pROC”28 and “timeROC”29 R packages. To verify the predictive ability of the risk model, we evaluated its prognostic value, sensitivity, and specificity in the TCGA cohort and then applied the same risk scoring formula to the validation sets.
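The risk score formula and the median split can be sketched in a few lines of Python. This is an illustration only: the gene names, coefficients and expression values below are placeholders rather than study data, and assigning patients whose score equals the median to the low-risk group is an assumption, since the tie rule is not stated.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Weighted sum: sum over genes of expression * regression coefficient."""
    return sum(coefficients[g] * expression[g] for g in coefficients)


def split_by_median(scores):
    """Label each patient 'high' or 'low' relative to the median score.

    Scores equal to the median are labeled 'low' (an assumption).
    """
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Placeholder coefficients and expression values (not from the study).
coefs = {"geneA": 0.5, "geneB": -0.25}
patients = {
    "P1": {"geneA": 2.0, "geneB": 4.0},
    "P2": {"geneA": 4.0, "geneB": 0.0},
    "P3": {"geneA": 0.0, "geneB": 4.0},
    "P4": {"geneA": 8.0, "geneB": 4.0},
}
scores = {pid: risk_score(expr, coefs) for pid, expr in patients.items()}
groups = split_by_median(scores)
print(scores)
print(groups)
```

Applying the same coefficients unchanged to an independent cohort, as done for the validation sets, only requires calling `risk_score` with that cohort's expression values.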

Independent prognostic analysis and nomogram construction

To determine whether 6 characteristic genes of BRGs could serve as independent predictive factors in LUAD patients, we performed univariate and multivariate Cox regression analyses using the “autoReg” R package30. A nomogram for clinical patients based on age, sex, TNM stage, smoking history, and risk score was created using the “rms” R package31.

Prognostic value analysis of the risk score

K‒M analysis was used to explore the prognostic value of the risk score based on age, sex, TNM stage, and smoking history.

Functional enrichment analysis

GO and KEGG pathway analyses were performed using the “clusterProfiler” R package32. The “GSVA” R package33 was used for GSVA analysis with “c2.cp.kegg_legacy.v2023.2.Hs.symbols” and “c5.go.v2023.2.Hs.symbols” to determine differences in enrichment pathways between different risk groups.

Simple nucleotide variation (SNV) analysis

The “maftools” R package34 was used to calculate SNVs from LUAD samples. We calculated the tumor mutation burden (TMB) for each LUAD patient and explored the relationship between the risk score and the TMB. K‒M analysis was used to explore the prognostic value of the TMB in LUAD patients.
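As a rough illustration of how a per-sample TMB can be derived from mutation records, the sketch below counts non-silent mutations per sample and divides by the size of the sequenced region in megabases. It is not the maftools implementation: the set of variant classes counted and the 50 Mb capture size are assumptions chosen for illustration (neither is stated in the text), and the records are toy data.

```python
from collections import Counter

# MAF Variant_Classification values treated as non-silent here
# (an illustrative subset, not an exhaustive or authoritative list).
NON_SILENT = {
    "Missense_Mutation",
    "Nonsense_Mutation",
    "Frame_Shift_Del",
    "Frame_Shift_Ins",
    "Splice_Site",
}


def count_mutations(records):
    """Count non-silent mutations per sample.

    records: iterable of (sample_id, gene, variant_classification) tuples.
    Samples without any non-silent mutation do not appear in the result.
    """
    counts = Counter()
    for sample_id, _gene, classification in records:
        if classification in NON_SILENT:
            counts[sample_id] += 1
    return counts


def tumor_mutation_burden(n_mutations, capture_size_mb=50.0):
    """Mutations per megabase; capture_size_mb is an assumed value."""
    if capture_size_mb <= 0:
        raise ValueError("capture_size_mb must be positive")
    return n_mutations / capture_size_mb


# Toy records (made up for illustration).
records = [
    ("S1", "TP53", "Missense_Mutation"),
    ("S1", "TTN", "Nonsense_Mutation"),
    ("S1", "MUC16", "Silent"),
    ("S1", "CSMD3", "Frame_Shift_Del"),
    ("S2", "TP53", "Missense_Mutation"),
]
counts = count_mutations(records)
tmb = {sample: tumor_mutation_burden(n) for sample, n in counts.items()}
print(tmb)
```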

Correlation analysis of characteristic genes of BRGs and immune infiltration

Seven algorithms, namely CIBERSORT, XCELL, TIMER, QUANTISEQ, MCPCOUNTER, EPIC and CIBERSORT-ABS, were used to evaluate the correlation between risk scores and tumor-infiltrating immune cells. We compared the expression levels of 19 common suppressive immune checkpoints between the low-risk and high-risk groups. In addition, the “estimate” R package35 was used to calculate the TME scores of the two groups of patients, including the stromal scores, immune scores, ESTIMATE scores and tumor purity scores.

Immune function analysis and immune escape analysis

The “GSEABase” R package36 was used to calculate immune function scores. Immunophenoscores (IPS) were obtained from The Cancer Immunome Atlas database (https://tcia.at/home)37 to assess the response to immunotherapy in different groups of patients. In addition, TIDE scores were calculated using the Tumor Immune Dysfunction and Exclusion database (http://tide.dfci.harvard.edu/login/)38 to evaluate immune escape in different groups of patients.

Construction of the regulatory network

Transcription factor (TF)-gene and miRNA-gene regulatory networks based on the characteristic genes were constructed with the NetworkAnalyst database39, and Cytoscape (version 3.8.2)40 was used for visualization.

Drug sensitivity analysis

Drug sensitivity analysis was based on the Genomics of Drug Sensitivity in Cancer database (GDSC, https://www.cancerrxgene.org/)41. We estimated the sensitivities of LUAD patients in the different groups to commonly used drugs with the “oncoPredict” R package42.

Results

scRNA-seq analysis and identification of B-cell markers

First, we performed quality control on the scRNA-seq data as described above (Fig. 2A and Supplementary materials 1A-C) and obtained a total of 19,263 cells. After downstream clustering, a total of 22 distinct cell subpopulations were identified. The “SingleR” R package was then used to annotate the clusters and visualize the cell types. Overall, we identified 8 major cell types in this step, including monocytes-macrophages, T cells, epithelial cells, endothelial cells, natural killer (NK) cells, B/plasma cells, fibroblasts, and neutrophils (Fig. 2B). Figure 2C shows marker gene expression in these cell subpopulations. We further analyzed and partially displayed the markers of the 8 cell types (Fig. 2D) and identified 125 markers of B cells.

Fig. 2

ScRNA-seq analysis. (A) quality control by RNA features, RNA count, and proportions of mitochondrial and red blood cell genes. (B) cell annotation shown by UMAP. (C) cell markers of the 8 cell types. (D) top 10 marker genes of the 8 cell types.

Screening of DEGs and BRGs

First, we used the “limma” R package to screen the DEGs of LUAD patients in the TCGA cohort and obtained a total of 3378 DEGs, including 1441 downregulated genes and 1937 upregulated genes (Fig. 3B). The heatmap shows the top 50 upregulated and downregulated genes (Fig. 3A). To clarify the immune infiltration profile of LUAD patients, we used CIBERSORT to analyze the infiltration of 22 types of immune cells in LUAD patients. The results showed that the infiltration of 18 types of immune cells, including B cells, plasma cells, CD4+ memory T cells, and macrophages, changed significantly in LUAD patients (Fig. 3C). The infiltration of B cells and plasma cells increased significantly in LUAD patients, which suggested that B cells may play an important role in LUAD. To investigate this role, we intersected the B-cell markers with the DEGs, and 30 BRGs were selected for further analysis (Fig. 3D).
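The screening logic in this section (thresholding the differential expression results, then intersecting them with the B-cell markers) can be sketched as follows. The gene names and statistics are placeholders rather than study data, and the input is assumed to be a table of log2 fold changes and FDR values such as a differential expression tool would return.

```python
def select_degs(results, fdr_cutoff=0.05, lfc_cutoff=1.0):
    """Split genes into up- and downregulated DEGs.

    results: dict mapping gene -> (log2 fold change, FDR).
    A gene is a DEG when FDR < fdr_cutoff and |log2FC| > lfc_cutoff.
    """
    up = {g for g, (lfc, fdr) in results.items()
          if fdr < fdr_cutoff and lfc > lfc_cutoff}
    down = {g for g, (lfc, fdr) in results.items()
            if fdr < fdr_cutoff and lfc < -lfc_cutoff}
    return up, down


# Placeholder differential expression results (not from the study).
results = {
    "GENE_A": (2.1, 0.001),    # upregulated DEG
    "GENE_B": (-1.5, 0.01),    # downregulated DEG
    "GENE_C": (0.4, 0.0001),   # significant but fold change too small
    "GENE_D": (3.0, 0.2),      # large fold change but not significant
    "GENE_E": (-2.2, 0.03),    # downregulated DEG
}
b_cell_markers = {"GENE_B", "GENE_C", "GENE_E", "GENE_F"}

up, down = select_degs(results)
brgs = (up | down) & b_cell_markers  # B-cell-related DEGs
print(sorted(up), sorted(down), sorted(brgs))
```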

Fig. 3

Screening of DEGs and BRGs. (A) heatmap showing the top 50 upregulated and downregulated genes. (B) volcano plot showing DEGs. (C) immune cell infiltration between normal and tumor. (D) Venn showing 30 BRGs.

Two groups of patients distinguished based on a prognostic risk model had different survival times

To identify the characteristic genes, LASSO regression was used to screen the BRGs and to construct a prognostic risk prediction model. We identified a total of 6 characteristic genes: G protein subunit gamma 7 (GNG7), voltage-gated hydrogen channel 1 (HVCN1), inhibitor of DNA binding 3 (ID3), cAMP-dependent protein kinase inhibitor gamma (PKIG), Ral GEF with PH domain and SH3 binding motif 2 (RALGPS2) and SH3 domain binding protein 5 (SH3BP5) (Fig. 4A). A prognostic risk model was constructed based on these 6 characteristic genes. The risk score calculation formula was risk score = (−0.0413) × GNG7 exp + (−0.0019) × HVCN1 exp + 0.0017 × ID3 exp + (−0.0014) × PKIG exp + 0.0104 × RALGPS2 exp + (−0.0202) × SH3BP5 exp. ID3 and RALGPS2 had positive coefficients and therefore contributed positively to the risk score, and GNG7 had the largest absolute coefficient. To assess the stability and reliability of the model, we calculated the risk score of each sample in the training set and validation set according to the risk score formula and divided LUAD patients into low-risk and high-risk groups based on the median risk score. As the risk score increased in both cohorts, patients showed a survival disadvantage and increased mortality (Fig. 4B). There was a significant difference in survival between the high-risk and low-risk groups in the two cohorts, with high-risk patients showing worse survival (Fig. 4C). The ROC curve was used to evaluate model performance. In the training set, the areas under the curve (AUCs) for the risk score and for 1-year, 3-year, and 5-year survival were all > 0.5, indicating that the model had predictive value (Fig. 4D-F). In the GSE50081 dataset, patients also showed a survival disadvantage and increased mortality as the risk score increased (Fig. 4G, H). The AUCs for the risk score and for 1-year, 3-year, and 5-year survival were also all > 0.5 in the GSE50081 dataset, again indicating predictive value (Fig. 4I-K). In the GSE37745 and GSE30219 datasets, patients in the high-risk group also had shorter survival times, suggesting that the model can predict patient prognosis (Supplementary materials 2A-J).
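To make the AUC threshold concrete: the AUC equals the probability that a randomly chosen patient who experienced the event has a higher risk score than a randomly chosen patient who did not, so 0.5 corresponds to chance and 1.0 to perfect separation. The Python sketch below computes this pairwise definition on toy values. It is a simplification that ignores censoring and follow-up time, so it is not equivalent to the time-dependent ROC analysis used in this study.

```python
def auc(scores, labels):
    """Pairwise (Mann-Whitney) AUC for binary labels.

    scores: risk scores; labels: 1 for event, 0 for no event.
    Ties between an event and a non-event score count as 0.5.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): 2 events, 3 non-events.
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.2]
toy_labels = [1, 0, 1, 0, 0]
print(auc(toy_scores, toy_labels))  # 5 of 6 event/non-event pairs are ranked correctly
```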

Fig. 4

Construction and evaluation of the prognostic risk model. (A) regression coefficient path diagram, cross-validation curve and calibration curve. (B) scatter plot showing survival status in the training set. (C) survival curve for the training set. (D) ROC of the training set. (E) ROC of the risk score in the training set. (F) ROC at 1, 3 and 5 years in the training set. (G) scatter plot showing survival status in the validation set. (H) survival curve for the validation set. (I) ROC of the validation set. (J) ROC of the risk score in the validation set. (K) ROC at 1, 3 and 5 years in the validation set.

Characteristic gene expression differed between the high-risk and low-risk groups

The expression of 6 characteristic genes in the high- and low-risk groups was analyzed. The results showed that, except for ID3, there were significant differences in 5 characteristic genes between the high- and low-risk groups in the training set (Fig. 5A). GNG7, HVCN1, PKIG and SH3BP5 were significantly different between the high- and low-risk groups in the validation set (Fig. 5B). Importantly, the changes in the six characteristic genes in the training set and validation set were consistent with the model coefficients, indicating the accuracy of the model.

Fig. 5

Differential expression of characteristic genes. (A) differential expression of 6 characteristic genes in training set. (B) differential expression of 6 characteristic genes in validation set.

Nomogram based on risk score and clinical characteristics accurately predicted the prognosis of LUAD patients

To verify the clinical value of the prognostic risk model constructed from the BRGs, we used univariate and multivariate regression analyses to evaluate the correlation between the risk score, clinical characteristics (age, sex, TNM stage, smoking history) and prognosis. Univariate regression analysis revealed that T stage, N stage and risk score were risk factors for LUAD patients (Fig. 6A). Multivariate regression analysis was conducted on these factors, and the results showed that N stage and risk score were independent risk factors (Fig. 6A, B). Next, we constructed a nomogram based on risk scores and clinical characteristics and clarified the role of the nomogram in prognosis (Fig. 6C). Univariate regression analysis revealed that the nomogram was also a risk factor for LUAD patients (Fig. 6D). ROC analysis revealed that the risk score, nomogram and clinical characteristics could predict patient prognosis (Fig. 6E, F). We compared the risk score, nomogram and clinical characteristics and found that the T stage, risk score and nomogram had good predictive performance for 1-year, 3-year, and 5-year survival (Fig. 6G).

Moreover, in order to explore and compare the roles of characteristic genes of BRGs under different clinical characteristics, we divided LUAD patients into different subgroups and analyzed the survival of patients in different subgroups. It was worth noting that the survival times of patients in the high-risk and low-risk subgroups were significantly different, except for those in the M1 subgroup (Supplementary materials 3 A-F). The results showed that the prognostic risk model had good predictive performance in different situations.

Fig. 6

Construction of the nomogram. (A) univariate and multivariate regression analysis. (B) forest plot showing univariate regression analysis. (C) nomogram. (D) univariate and multivariate regression analysis. (E) ROC. (F) time ROC. (G) ROC at 1, 3 and 5 years.

Distributions of age, sex, TNM stage and smoking history were different in the high-risk and low-risk groups

We compared the distribution of clinical characteristics between the high-risk and low-risk groups, and the clustering effect was not pronounced (Fig. 7A). Next, we calculated the proportions of patients with different clinical characteristics in the high-risk and low-risk groups and visualized the results (Fig. 7B-G). The high-risk group contained greater proportions of males, patients < 65 years old, patients with T3-4, N1-3 or M1 disease, and patients with a smoking history of 3–5 years. The risk scores of LUAD patients were analyzed to reveal the relationships between the risk score and clinical variables (Fig. 7H-M). The results suggested that men, patients with high TNM stage and long-term smokers have higher risk scores, implying that these types of patients have a worse prognosis.

Fig. 7

Distribution of clinical characteristics in two groups. (A) heatmap showing distribution of clinical characteristics. (B-G) histogram frequency distribution diagram. (H-M) difference of distribution clinical characteristics. *P < 0.05, **P < 0.01, ***P < 0.001.

Enriched functions differed significantly between the high-risk and low-risk groups

To clarify the role of BRGs in LUAD, the 30 BRGs were used for functional enrichment analysis. GO analysis revealed that the BRGs were mainly related to the differentiation and activation of B cells and lymphocytes (Fig. 8A). KEGG analysis revealed that the BRGs were mainly involved in the hematopoietic cell lineage, B-cell receptor signaling and NF-κB signaling pathways (Fig. 8B). Next, we compared differentially enriched pathways between the high-risk and low-risk groups. The low-risk group was significantly enriched in the cell cycle and oocyte meiosis, and the high-risk group was significantly enriched in the tight junction and MAPK signaling pathways (Fig. 8C, D).

Fig. 8

Functional enrichment analysis and TMB analysis. (A) GO of BRGs. (B) KEGG of BRGs. (C) GO of GSVA. (D) KEGG of GSVA. (E-G) mutation of all LUAD patients, high-risk group and low-risk group. (H) TMB in two groups. (I) survival analysis based on risk score and TMB.

The probability of genetic mutation was higher in the high-risk group

We obtained the SNV data of LUAD patients from the TCGA database and visualized the mutation data in the high-risk and low-risk groups. The top 3 mutated genes in all LUAD patients were TP53 (49%), TTN (43%), and MUC16 (41%) (Fig. 8E). The top 3 mutated genes in the high-risk group were TP53 (58%), TTN (50%), and MUC16 (46%) (Fig. 8F). The top 3 mutated genes in the low-risk group were TP53 (41%), CSMD3 (38%), and MUC16 (35%) (Fig. 8G). These results suggested that patients in the high-risk group had a greater probability of mutations. Next, we calculated the TMB scores of the two groups (Fig. 8H). The high-risk group had significantly greater TMB scores than did the low-risk group. Finally, we evaluated the relationship between the TMB and survival (Fig. 8I). Low-risk, low-TMB patients had better survival.

Immune infiltration differed between the high-risk and low-risk groups, and patients in the high-risk group responded poorly to some immunotherapies

Using 7 algorithms (CIBERSORT, XCELL, TIMER, QUANTISEQ, MCPCOUNTER, CIBERSORT-ABS and EPIC), we studied the relationships between risk scores and immune cells. All the algorithm results showed significant differences in B-cell infiltration between the high- and low-risk groups (Fig. 9A, B). Then, we calculated the TME scores in the high-risk and low-risk groups to evaluate immune infiltration. The stromal score, immune score and ESTIMATE score in the low-risk group were significantly greater than those in the high-risk group, while the tumor purity score was significantly lower than that in the high-risk group, indicating that patients in the low-risk group had a better prognosis (Fig. 9C). We compared the response to immunotherapy between the high-risk and low-risk groups. Among the 19 common immune checkpoints, 7 immune checkpoints, namely, CD40LG, HHLA2, LGALS9, TNFSF18, LAIR1, TNFRSF18 and TNFRSF4, exhibited significant differences in expression between the two groups (Fig. 9D). We calculated TIDE scores to assess the potential for immune evasion in both groups. The median TIDE score was 0.03 in the high-risk group and 0.01 in the low-risk group. Patients in the high-risk group therefore tended to have a greater possibility of immune escape, although the difference was not statistically significant (P > 0.05) (Fig. 9E). Finally, we analyzed the immunotherapy response in both groups of patients. Patients in the low-risk group benefited more from treatment with CTLA-4 inhibitors, PD-1 inhibitors, and type 1 interferon (Fig. 9F, G).

Fig. 9

Immune infiltration and immunotherapy analysis. (A) Immune cell infiltration in two groups. (B) 7 algorithms of immune infiltration. (C) TME in two groups. (D) immune checkpoint in two groups. (E) TIDE in two groups. (F) IPS score in two groups. (G) ssGSEA of immunotherapy analysis. *P < 0.05, **P < 0.01, ***P < 0.001.

Characteristic genes were regulated in a variety of ways, and the two groups of patients had different drug sensitivity profiles

We constructed TF-gene and miRNA-gene regulatory networks of the 6 characteristic genes via the NetworkAnalyst database (Fig. 10A, B). The results showed that the expression of characteristic genes was regulated by a variety of TFs and miRNAs, suggesting potential molecular mechanisms that interfere with characteristic genes. Afterwards, we used the GDSC database to perform drug sensitivity analysis in the high-risk and low-risk groups. Among the 198 drugs tested, 49 drugs showed significantly different sensitivity between the high-risk and low-risk groups (Fig. 10C). Representative sensitive drugs in the low-risk group included doramapimod, ribociclib, BMS-754807, SB505124 and PF-4708671 (Fig. 10D-H). Representative sensitive drugs in the high-risk group included BI-2536, PAK-5339 and venetoclax (Fig. 10I-K). The results suggested that the high-risk and low-risk groups may benefit from different treatment strategies.

Fig. 10

Regulatory network and drug sensitivity. (A) TF-Genes regulatory network constructed by NetworkAnalyst database (https://www.networkanalyst.ca/). (B) miRNA-Genes regulatory network constructed by NetworkAnalyst database (https://www.networkanalyst.ca/). (C) drug sensitivity of 49 drugs in two groups. (D-H) sensitive drugs of low-risk group. (I-K) sensitive drugs of high-risk group. *P < 0.05, **P < 0.01, ***P < 0.001.

Discussion

At present, great progress has been made in the treatment of LUAD, which mainly includes resection, chemotherapy, radiotherapy, and targeted therapy43. However, the results are still unsatisfactory, and the overall survival of patients remains poor. With the deepening of research on immune checkpoints, immunotherapy has become a new option for LUAD patients44. Considering the role of T cells in immunity, PD-1 inhibitors have been developed and are at the forefront of LUAD immunotherapy45. As the main effector cells of humoral immunity, B cells are also involved in the construction of the TME46. However, the immune infiltration characteristics and potential mechanisms of action of B cells in LUAD have not been studied. This study aimed to elucidate the impact of B cells on the clinical characteristics and prognosis of LUAD patients. Biomarkers have great potential for exploring the TME and immune characteristics, which traditional methods of tumor research cannot accurately reflect47. Thus, we combined RNA-seq and scRNA-seq to identify B-cell-related characteristic genes and to explore the immune characteristics and prognostic factors of LUAD. Finally, we obtained 30 BRGs. We found that these BRGs not only were involved in the differentiation and activation of B cells, lymphocytes, and monocytes but were also related to immune deficiencies.

To prevent overfitting, we used LASSO regression analysis to screen 6 characteristic genes, namely, GNG7, HVCN1, ID3, PKIG, RALGPS2 and SH3BP5, as molecular markers. The expression of GNG7, an important gene that regulates cell proliferation and induces apoptosis, is significantly reduced in LUAD patients and is significantly negatively correlated with patient prognosis. Zheng et al. reported that GNG7 significantly inhibited the occurrence of transplanted tumors in mice and the proliferation and migration of LUAD cells in vitro by inhibiting the expression of E2 promoter binding factor 1 (E2F1)48. As a member of the ID protein family, ID3 is believed to be involved in the regulation of tumor-associated macrophages to intervene in tumor proliferation and invasion49. In a study on endogenous biomarkers, RALGPS2 was identified as an important biomarker for LUAD, and its expression in tumor tissues was significantly greater than that in adjacent tissues50. RALGPS2 regulated tumor occurrence and development via 3 ceRNA networks. By detecting SH3BP5 in the blood of 171 early-stage LUAD patients, Qiao et al. reported that a reduced degree of SH3BP5 methylation was significantly related to an increased risk of LUAD, and the degree of reduction increased with advanced stage, suggesting that SH3BP5 may be a diagnostic marker in early-stage LUAD patients51. HVCN1 has been characterized as a key modulator of the B-cell receptor (BCR) signaling pathway, and reducing HVCN1 function could have a role in the treatment of cancers related to BCR signaling52. Hondares et al. confirmed that HVCN1 is highly expressed in the B cells of tumor patients, where it promoted tumor proliferation and migration and enhanced BCR signaling. HVCN1S, one of the two isoforms of HVCN1, had stronger effects on promoting proliferation and migration53. The PKI gene family can inactivate PKA and terminate PKA-induced gene expression. The major subtype PKIG regulates osteoblast and adipocyte differentiation, and loss of PKIG promotes osteogenesis and reduces adipogenesis54. The expression of PKIG is reduced in lung cancer patients. The expression of PKIG is positively correlated with the infiltration of T cells and the expression of cytokines in lung cancer patients, such as CCL2, CXCL12, and CXCR4, suggesting that PKIG participates in humoral immune response to regulate the progression of lung cancer55.

We evaluated the predictive performance of the prognostic risk model constructed based on characteristic genes through receiver operating characteristic (ROC) curves and nomograms56. With respect to different clinical characteristics, the areas under the curve (AUCs) were all > 0.5, indicating that the prognostic model could accurately evaluate patient outcomes. Survival analysis also verified this conclusion. We found that there were certain differences in the distributions of sex, age, TNM stage, and smoking history between the high-risk and low-risk groups, suggesting that there was a connection between risk score and the clinical characteristics of LUAD patients. The proportion of late-stage TNM patients in the high-risk group was greater than that in the low-risk group, indicating that the prognosis of these patients was worse.

Mutations in certain key genes are critical to tumorigenesis57. Therefore, we analyzed the mutation probability of various genes in the two groups of LUAD patients. We found that patients in the high-risk group had a greater probability of having mutations, including mutations in TP53, TTN, and MUC16. Some studies have shown that the use of PD-1 inhibitors may produce better results in LUAD patients with TP53 mutations58. Notably, tumorigenesis and malignant transformation are usually the result of the accumulation of mutations in multiple genes, and a single gene is not enough to describe the overall mutation status of a tumor59. TMB refers to the cumulative number of somatic missense mutations and represents genomic instability60. Currently, TMB, as a new biomarker, is widely studied for its role in tumor prognosis. In a study of 151 LUAD patients, researchers found that EGFR mutations were most likely to occur in LUAD patients. TP53, KRAS, PIK3CA and other mutations also existed in LUAD patients. Patients with only EGFR mutations had a better prognosis than patients who harbored EGFR and other mutant driver genes61. Wang et al. suggested that the reason why patients with high TMB have a poorer prognosis may be related to immune suppression and immune depletion pathways62. Our results also suggested that high TMB corresponds to poor prognosis in LUAD patients. Interestingly, patients with high TMB do not always show a worse prognosis. He et al. found that patients with high TMB had prolonged progression-free survival, which was attributed to higher immune cell infiltration and high expression of immune genes63. In addition, high TMB cannot be used as a biomarker in all types of solid tumors, such as breast cancer and prostate cancer. There was no significant correlation between TMB levels and immune cell infiltration in the tumors of these patients64. Therefore, TMB as a prognostic biomarker for LUAD patients still needs more verification.

Immunotherapy based on immune checkpoint inhibitors has become an integral part of various cancer treatment strategies and is being promoted as a first-line treatment for advanced unresectable tumors65. By evaluating immune checkpoint gene expression in different patients, tumor patients who are more likely to benefit from immune checkpoint blockade therapy can be selected to achieve individualization and precision66. Based on the risk score, we evaluated the expression of immune checkpoints in the two groups of patients and found that patients in the low-risk group had higher expression of 7 immune checkpoints, suggesting that they are more likely to benefit from immunotherapy. Similarly, we found that patients in the low-risk group were more sensitive to PD-1 inhibitors and CTLA-4 inhibitors. The TME is critical for tumorigenesis and progression67. Subsequently, we evaluated immune infiltration in the TME of the two groups of patients, and patients in the low-risk group had more abundant immune cell infiltration in the TME. Finally, we constructed a regulatory network of TF-genes and miRNA-genes and sensitive drug expression profiles of the two groups of patients to explore the potential molecular mechanisms regulating characteristic genes and personalized treatment strategies.

Although the B-cell characteristic gene signature we constructed performed well in identifying the immune landscape and predicting patient prognosis, this study has some limitations that remain to be addressed. Our data analysis was based on public databases, which may cause the prediction results to deviate from the actual situation. More data from LUAD patients need to be collected to validate the utility of this model and the accuracy of immunotherapy predictions. Unfortunately, clinical samples were not obtained in this study. A well-designed prospective study could help confirm whether high-risk patients have worse survival times, which is the focus of our follow-up studies. Moreover, single-cell sequencing of tumor samples could provide more information about B cells and other important cells in LUAD patients, which would help to elucidate the potential mechanism underlying the poor prognosis of LUAD patients in the high-risk group.

Conclusion

For the first time, we confirmed the role of B-cell characteristic genes as biomarkers for LUAD patients, and these genes may be new targets for LUAD treatment. In addition, the prognostic risk model we established based on B-cell characteristic genes could evaluate the prognosis of LUAD patients to a certain extent. This study provides a basis for the clinical identification of patient subgroups that may benefit from immunotherapy and personalized treatment and provides new ideas for in-depth research on LUAD.