Introduction

Non-small cell lung cancer (NSCLC) accounts for approximately 85% of lung cancers and remains the leading cause of cancer-related death worldwide1. Surgical resection of the primary lesion is performed in approximately 30–80% of patients with NSCLC; however, 30–55% of these patients relapse and die of disease progression2,3. Patients with stage IIIA, IIIB, and IIIC NSCLC benefit from concurrent or sequential chemoradiotherapy, but the benefit is limited, with 5-year overall survival (OS) rates of 36%, 26%, and 13%, respectively4. Over the past decade, the therapeutic landscape of NSCLC has changed substantially, particularly with the advent of immunotherapy. Immune checkpoint inhibitors (ICIs) have produced unprecedented, durable survival in a subset of patients. The clinical development of ICIs, including anti-PD-1 and anti-PD-L1 antibodies, initially focused on monotherapy in the second-line setting but has since shifted toward combination regimens in the first-line setting and earlier integration of immunotherapy into treatment. Despite substantial efforts to optimize ICI use, the low response rates to anti-PD-1 therapy and the frequent emergence of resistance underscore the need for novel predictive biomarkers.

Tumor-infiltrating B cells have attracted growing attention in cancer research and are being explored as predictors of immunotherapy efficacy5,6,7. Their abundance and distribution vary spatiotemporally across cancer types, and they fulfill multiple roles, largely because of their ability to differentiate into antibody-producing plasma cells7. Characterizing the B cell repertoire and developmental stages across cancer types may therefore help improve immunotherapeutic responses. Tumor-infiltrating T and B lymphocytes are essential, cooperating components of the tumor microenvironment (TME)8,9,10. However, the transcriptional diversity of tumor-infiltrating B cells has been underappreciated, obscuring their complex roles in different cancers. Recent studies have highlighted the need to define B cell states and compositions across cancer types, revealing substantial adaptability in response to stress and tumor reactivity11,12.

POU domain class 2 transcription factor 2 (POU2F2), also known as octamer-binding protein 2 (OCT2), is a transcription factor of the POU domain family that is largely restricted to B cells and binds DNA through its POU domain13,14. Previous studies have shown that POU2F2 is predominantly expressed in B cells and B cell-lineage tumor cells, where it regulates genes involved in immunoglobulin (Ig) production, B cell proliferation, and B cell differentiation15,16,17,18. More recent work has shown that POU2F2 is also expressed in some solid tumors, including clear cell renal cell carcinoma, gastric cancer, and pancreatic cancer, where its expression is associated with patient prognosis19,20,21. However, the role of POU2F2 in lung cancer progression remains unclear.

In this study, we investigated the role of POU2F2+ B cells in survival and immune cell infiltration in NSCLC. We systematically characterized three genes associated with B cell development and evaluated their correlation with NSCLC prognosis. Multiplex immunohistochemistry (mIHC) of primary lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) tissues showed that POU2F2 is predominantly expressed in B cells and revealed that POU2F2-positive and POU2F2-negative B cells differ in their spatial proximity to PD-1+ CD8+ T cells. Our findings highlight the role of POU2F2+ B cells in lung cancer and provide a theoretical foundation for identifying immunotherapy biomarkers and developing new targeted drugs.

Materials and methods

Patients and samples

Our study included samples from 10 patients with NSCLC (5 LUAD and 5 LUSC), all of whom underwent curative resection at the First Affiliated Hospital of Nanjing Medical University (Supplementary Table 1). Surgical specimens were collected following a stringent standard operating procedure and archived as formalin-fixed, paraffin-embedded (FFPE) tissue blocks. The study was approved by, and conducted in accordance with the ethical standards of, the Ethics Committee of the First Affiliated Hospital of Nanjing Medical University, and all specimens were collected with informed consent from the patients.

Data processing for scRNA-seq data

Processed scRNA-seq datasets and author-supplied annotations were obtained from each study. Raw count data were analyzed in R (version 4.1.1) with the Seurat package (v4.0.4). For quality control, cells with fewer than 500 unique molecular identifiers (UMIs) or more than 10% mitochondrial gene content were removed from the preliminarily filtered data, and cells with more than 60,000 UMIs were excluded as potential doublets. Expression values were then normalized, and principal component analysis (PCA) was performed on the 2,000 most highly variable genes. Clusters were identified with a resolution parameter of 0.8, and cells were visualized using uniform manifold approximation and projection (UMAP), a nonlinear dimensionality reduction method. B cells and plasma cells were then extracted from the integrated dataset for further analysis.
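The three cell-level filters above can be expressed as a single per-cell predicate. The following is a minimal Python sketch, not the authors' Seurat code (in Seurat this step is usually done with `subset()` on per-cell metadata); it assumes a cell is discarded if it fails any one criterion, and the `CellQC` record and toy barcodes are hypothetical.

```python
from dataclasses import dataclass

# Thresholds taken from the Methods text above.
MIN_UMI = 500          # cells with fewer UMIs are removed
MAX_UMI = 60_000       # cells above this count are treated as potential doublets
MAX_MITO_FRAC = 0.10   # cells with more than 10% mitochondrial UMIs are removed


@dataclass
class CellQC:
    barcode: str      # hypothetical cell identifier
    n_umi: int        # total UMI count of the cell
    n_mito_umi: int   # UMIs assigned to mitochondrial (MT-) genes


def passes_qc(cell: CellQC) -> bool:
    """Return True if the cell passes all three filters."""
    if cell.n_umi < MIN_UMI or cell.n_umi > MAX_UMI:
        return False
    # n_umi >= MIN_UMI > 0 here, so the division is safe
    return cell.n_mito_umi / cell.n_umi <= MAX_MITO_FRAC


def filter_cells(cells):
    """Keep only the cells that pass QC, preserving input order."""
    return [c for c in cells if passes_qc(c)]


toy_cells = [
    CellQC("CELL_1", 3_000, 150),    # kept: 5% mitochondrial
    CellQC("CELL_2", 400, 10),       # removed: fewer than 500 UMIs
    CellQC("CELL_3", 5_000, 800),    # removed: 16% mitochondrial
    CellQC("CELL_4", 75_000, 900),   # removed: possible doublet
]
print([c.barcode for c in filter_cells(toy_cells)])  # ['CELL_1']
```

Treating the UMI ceiling as a doublet proxy is a coarse heuristic; dedicated doublet-detection tools model expression profiles rather than library size alone.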

B cell developmental trajectory

The developmental trajectory of B cells was inferred using Monocle2. Following the Monocle2 tutorial, we first used the “relative2abs” function to convert transcripts per million (TPM) values into estimated mRNA counts and then created a CellDataSet object with “expressionFamily = negbinomial.size()”. Differentially expressed genes (DEGs) across clusters were identified with the “differentialGeneTest” function, and genes with a q-value < 1e-5 were used to order cells in pseudotime. After the trajectories were constructed, genes that changed significantly along pseudotime were identified with the same “differentialGeneTest” function.

Differential gene expression analysis

The R package “edgeR” was used to identify DEGs between the immunotherapy-responsive and non-responsive groups. Genes with |log2(fold change)| > 1.5 and a false discovery rate (FDR) < 0.05 were selected as the most significant DEGs.
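For reference, |log2(fold change)| > 1.5 corresponds to a linear fold change greater than 2^1.5 ≈ 2.83 in either direction. A minimal Python sketch of the selection step is shown below, applied to a hypothetical edgeR-style table of (gene, logFC, FDR) rows; the gene names and values are placeholders, and this is not the authors' code.

```python
# Cutoffs from the Methods text: |log2FC| > 1.5 and FDR < 0.05.
LOG2FC_CUTOFF = 1.5
FDR_CUTOFF = 0.05


def select_degs(rows):
    """Split (gene, log2fc, fdr) rows into up- and down-regulated DEG lists."""
    up, down = [], []
    for gene, log2fc, fdr in rows:
        if fdr >= FDR_CUTOFF or abs(log2fc) <= LOG2FC_CUTOFF:
            continue  # fails at least one cutoff
        (up if log2fc > 0 else down).append(gene)
    return up, down


# Hypothetical results; gene names and values are placeholders, not study data.
toy_rows = [
    ("GENE_A", 2.1, 0.001),    # up-regulated DEG
    ("GENE_B", 1.2, 0.0001),   # fold change too small
    ("GENE_C", -1.8, 0.01),    # down-regulated DEG
    ("GENE_D", 3.0, 0.2),      # FDR too high
]
up, down = select_degs(toy_rows)
print(up, down)                       # ['GENE_A'] ['GENE_C']
print(round(2 ** LOG2FC_CUTOFF, 2))   # 2.83, the equivalent linear fold change
```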

Prognosis analysis of multiple genes by GEPIA2

Gene Expression Profiling Interactive Analysis 2 (GEPIA2) is a web-based platform that integrates transcriptomic profiles and survival data for multiple tumor types from The Cancer Genome Atlas (TCGA) and other databases. For prognosis analysis, the “Survival Map” module was selected, and the gene lists and cancer types of interest were entered. Overall survival was evaluated with a significance threshold of 0.05, no P-value adjustment, and a median group cutoff. The following B cell signature gene lists were defined: POU2F2+ B cell signature (POU2F2, CD19, CD79A, and CD79B), CD2+ B cell signature (CD2, CD19, CD79A, and CD79B), and CST7+ B cell signature (CST7, CD19, CD79A, and CD79B). These gene lists were used to generate the corresponding survival maps for further analysis.
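To make the multi-gene signatures and the median cutoff concrete, the sketch below scores each sample and splits a cohort into high and low groups. It assumes that a signature score is the mean log2(TPM + 1) of its genes and that samples strictly above the cohort median are labeled "high"; both are assumptions about how such a split is typically done, not a description of GEPIA2's internal code. The sample names and TPM values are hypothetical.

```python
import math
from statistics import mean, median

# Signature gene lists as defined in the Methods text.
SIGNATURES = {
    "POU2F2+ B": ["POU2F2", "CD19", "CD79A", "CD79B"],
    "CD2+ B": ["CD2", "CD19", "CD79A", "CD79B"],
    "CST7+ B": ["CST7", "CD19", "CD79A", "CD79B"],
}


def signature_score(tpm_by_gene, genes):
    """Assumed score: mean log2(TPM + 1) over the signature genes of one sample."""
    return mean(math.log2(tpm_by_gene[g] + 1) for g in genes)


def median_split(scores):
    """Label each sample 'high' (score above the cohort median) or 'low'."""
    cutoff = median(scores.values())
    return {sample: ("high" if s > cutoff else "low") for sample, s in scores.items()}


# Hypothetical cohort: every signature gene has the same toy TPM within a sample.
genes = SIGNATURES["POU2F2+ B"]
toy_tpm = {"S1": 0, "S2": 1, "S3": 3, "S4": 7}
scores = {s: signature_score({g: v for g in genes}, genes) for s, v in toy_tpm.items()}
print(scores)                # {'S1': 0.0, 'S2': 1.0, 'S3': 2.0, 'S4': 3.0}
print(median_split(scores))  # {'S1': 'low', 'S2': 'low', 'S3': 'high', 'S4': 'high'}
```

The resulting high/low labels are what a Kaplan-Meier comparison would then use as its grouping variable.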

Multiplex immunohistochemistry (mIHC) and spatial distance analysis

Multiplex staining of NSCLC tissues was performed with the PANO-4-plex IHC kit (Cat# 10293100100, Panovue) according to the manufacturer’s instructions. The primary antibodies were anti-PD-1 (Cat# 13684, CST), anti-CD8 (Cat# 85336, CST), anti-CD68 (Cat# 97778, CST), anti-CD206 (Cat# 91992, CST), anti-CD20 (Cat# 70168, CST), and anti-POU2F2 (Cat# 49509, CST). Primary antibody incubation was followed by incubation with horseradish peroxidase-conjugated secondary antibodies and tyramide signal amplification. After all antigens had been labeled, nuclei were counterstained with DAPI. Two four-marker panels were used, with the following Opal fluorophores: panel 1, CD8-Opal 480, PD-1-Opal 650, CD20-Opal 570, and POU2F2-Opal 520; panel 2, CD68-Opal 480, CD206-Opal 650, CD20-Opal 570, and POU2F2-Opal 520. Slides were scanned with a Vectra Polaris system, and quantitative and spatial analyses were performed with HALO software (v3.3.2541.202).
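The Methods do not state exactly how HALO computes spatial distance. A common definition is the Euclidean distance from each source cell's centroid to the nearest target cell's centroid. The following minimal Python sketch illustrates that nearest-neighbor approach under this assumption; the coordinates are hypothetical and given in micrometers.

```python
import math
from statistics import mean


def nearest_distance(cell, targets):
    """Euclidean distance from one cell centroid to the closest target centroid."""
    return min(math.dist(cell, t) for t in targets)


def nn_distances(sources, targets):
    """Nearest-neighbor distance from every source cell to the target population."""
    return [nearest_distance(c, targets) for c in sources]


# Hypothetical centroids (x, y) in micrometers, not study data.
pd1_cd8_t = [(0.0, 0.0), (100.0, 0.0)]
pou2f2_pos_b = [(0.0, 30.0), (100.0, 40.0)]
pou2f2_neg_b = [(0.0, 10.0), (95.0, 0.0)]

d_pos = nn_distances(pou2f2_pos_b, pd1_cd8_t)
d_neg = nn_distances(pou2f2_neg_b, pd1_cd8_t)
print(d_pos, mean(d_pos))  # [30.0, 40.0] 35.0
print(d_neg, mean(d_neg))  # [10.0, 5.0] 7.5
```

This brute-force search is O(n·m). For whole-slide cell counts, a spatial index such as a k-d tree (for example `scipy.spatial.cKDTree`) would be the usual choice. The per-cell distances of the POU2F2+ and POU2F2− groups can then be compared with a rank-based test.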

Pathway enrichment analysis

Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses22,23 were performed with the R package “clusterProfiler” and the “org.Hs.eg.db” annotation database, using Entrez Gene identifiers (“ENTREZID”). P-values were adjusted with the Benjamini-Hochberg method, and terms with an adjusted p-value < 0.05 were considered significant.
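To our understanding, clusterProfiler's over-representation analysis scores each term with a one-sided hypergeometric test and then applies the Benjamini-Hochberg step-up procedure across terms. The following minimal stdlib Python sketch shows both computations. It is not clusterProfiler's code, and the gene counts are toy values.

```python
from math import comb


def hypergeom_sf(k, n_universe, n_term, n_drawn):
    """P(X >= k): chance of at least k term genes among n_drawn genes sampled
    without replacement from a universe that contains n_term term genes."""
    upper = min(n_term, n_drawn)
    total = comb(n_universe, n_drawn)
    hits = sum(
        comb(n_term, i) * comb(n_universe - n_term, n_drawn - i)
        for i in range(k, upper + 1)
    )
    return hits / total


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # from the largest p-value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy term: 3 of 10 universe genes belong to it; both of 2 DEGs hit it.
print(hypergeom_sf(2, 10, 3, 2))             # 3/45, about 0.0667
print(bh_adjust([0.01, 0.04, 0.03, 0.20]))   # about [0.04, 0.0533, 0.0533, 0.2]
```

The running minimum enforces the monotonicity of BH-adjusted values, so a term can never receive a smaller adjusted p-value than a term with a smaller raw p-value.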

Immune infiltration analysis

CIBERSORT is a deconvolution algorithm based on linear support vector regression that estimates the relative abundance of immune cell subtypes in a sample from its bulk gene expression profile, such as RNA-seq data20. The CIBERSORT R package was used to estimate the proportions of 22 immune cell types in each sample of the datasets, and immune cell compositions were visualized with boxplots. Differences in immune cell proportions between groups were assessed with the Wilcoxon test, with P < 0.05 considered statistically significant.
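Deconvolution rests on a linear mixing model: a bulk profile m is approximated as S·f, where S is a signature matrix (genes × cell types) and f holds the cell-type weights. CIBERSORT fits f with ν-support vector regression. To show only the mixing model, the sketch below substitutes a simpler technique, non-negative least squares solved by projected gradient descent. It is therefore not CIBERSORT. The 3-gene, 2-cell-type signature matrix is hypothetical, and the weights are rescaled to sum to 1 so they can be read as relative proportions.

```python
def deconvolve_nnls(signature, mixture, n_iter=2000):
    """Estimate non-negative cell-type weights f with mixture ~= signature @ f,
    then rescale them to proportions that sum to 1.

    signature: list of gene rows, each holding one value per cell type.
    mixture:   list of bulk expression values, one per gene (same gene order).
    """
    n_genes, n_types = len(signature), len(signature[0])
    # A step of 1 / (2 * squared Frobenius norm) never exceeds 1 / Lipschitz
    # constant of the least-squares gradient, so the iteration is stable.
    frob_sq = sum(v * v for row in signature for v in row)
    step = 1.0 / (2.0 * frob_sq)
    f = [0.0] * n_types
    for _ in range(n_iter):
        residual = [
            sum(signature[g][k] * f[k] for k in range(n_types)) - mixture[g]
            for g in range(n_genes)
        ]
        grad = [
            2.0 * sum(signature[g][k] * residual[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step, then projection onto the non-negative orthant.
        f = [max(0.0, f[k] - step * grad[k]) for k in range(n_types)]
    total = sum(f)
    return [x / total for x in f] if total > 0 else f


# Hypothetical 3-gene x 2-cell-type signature and a bulk sample mixed 30:70.
toy_signature = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
toy_mixture = [3.0, 7.0, 5.0]
print([round(p, 3) for p in deconvolve_nnls(toy_signature, toy_mixture)])  # [0.3, 0.7]
```

Support vector regression is generally more robust than least squares to noise and to genes that fit the signature poorly, which is the main motivation for using it in CIBERSORT rather than a plain regression.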

Statistical analysis

Unpaired two-tailed Student’s t-tests were used to compare gene expression between the two groups. Unpaired two-tailed Wilcoxon rank-sum tests were used to evaluate the cell distribution differences between the two groups for scRNA-seq analysis. A P value of less than 0.05 was considered to indicate statistical significance.
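The Wilcoxon rank-sum test used above compares the two groups through the ranks of the pooled values. The following minimal Python sketch implements the large-sample normal approximation, with average ranks for ties and a tie-corrected variance, and omits the continuity correction. It is an illustration only, not the software used in the study. For small samples without ties, R's `wilcox.test` reports an exact p-value instead (0.1 for the toy example below), so the two approaches can differ noticeably.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation
    with tie-corrected variance and no continuity correction.
    Returns (U for x, z, p)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return u1, 0.0, 1.0  # every value tied: no evidence of a shift
    z = (u1 - n1 * n2 / 2) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, z, p


u, z, p = rank_sum_test([1, 2, 3], [4, 5, 6])
print(u, round(z, 3), round(p, 4))  # 0.0 -1.964 0.0495
```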

Fig. 1

Prognostic role of three hallmark genes (POU2F2, CD2, CST7) in NSCLC. (a) Venn diagram of differentially expressed genes in GSE139555, GSE127465, GSE126044, and GSE135222. (b) Kaplan-Meier survival curves illustrating the prognostic implications of POU2F2 in NSCLC. (c) Kaplan-Meier survival curves illustrating the prognostic implications of CD2 in NSCLC. (d) Kaplan-Meier survival curves illustrating the prognostic implications of CST7 in NSCLC. (e) Kaplan-Meier survival curves illustrating the prognostic implications of POU2F2+ B cells in NSCLC. (f) Kaplan-Meier survival curves illustrating the prognostic implications of CD2+ B cells in NSCLC. (g) Kaplan-Meier survival curves illustrating the prognostic implications of CST7+ B cells in NSCLC.

Results

Identification of three hallmark genes associated with B cell development, immunotherapy efficacy, and NSCLC prognosis

To identify key B cell-related biomarkers and their potential functions in NSCLC, we extracted B cell subsets from two NSCLC single-cell transcriptome datasets (GSE139555 and GSE127465) and performed pseudotime analysis (Supplementary Figs. 1 and 2) to derive lists of genes associated with B cell development (Supplementary Tables 2, 3). We also used two GEO bulk RNA expression datasets (GSE126044 and GSE135222) to identify DEGs between NSCLC patients who responded to immunotherapy and those who were resistant (Supplementary Tables 4, 5). Venn diagram analysis identified three genes (CD2, CST7, and POU2F2) common to all four GEO datasets (Fig. 1a). We next assessed the prognostic value of these hallmark genes in NSCLC by Kaplan-Meier (KM) survival analysis. POU2F2 expression was significantly associated with OS in both LUAD and LUSC (p = 0.0095 and p = 0.025, respectively), but in opposite directions: high POU2F2 expression was associated with better prognosis in LUAD and with poorer prognosis in LUSC (Fig. 1b). Tumors with high CD2 expression had significantly better OS than those with low expression in LUAD (p = 0.022), whereas CD2 expression was not associated with OS in LUSC (p = 0.59) (Fig. 1c). Similarly, CST7 expression was associated with OS in LUAD (p = 0.015) but not in LUSC (p = 0.73) (Fig. 1d). We then evaluated whether the POU2F2+ B cell signature predicted prognosis in NSCLC. The POU2F2+ B cell signature was significantly associated with OS in LUAD (p = 0.00085) but not in LUSC (p = 0.17) (Fig. 1e). The CD2+ (Fig. 1f) and CST7+ (Fig. 1g) B cell signatures were also associated with better prognosis in LUAD (p = 0.0035 and p = 0.0055, respectively).
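The survival comparisons above rely on Kaplan-Meier curves, and GEPIA2, to our understanding, reports log-rank p-values for them. The following self-contained Python sketch shows the Kaplan-Meier estimator and a two-group log-rank test. It is illustrative only, not GEPIA2's implementation, and the follow-up times, event indicators, and group labels are invented toy data.

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.
    times: follow-up times; events: 1 = death observed, 0 = censored.
    Returns [(time, survival)] at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving  # deaths and censored cases both leave the risk set
    return curve


def logrank_test(times, events, groups):
    """Two-group log-rank test (groups labeled 0/1). Returns (chi2, p), 1 df."""
    o_minus_e, var = 0.0, 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [g for x, g in zip(times, groups) if x >= t]
        n, n1 = len(at_risk), sum(at_risk)
        died = [g for x, e, g in zip(times, events, groups) if x == t and e]
        d, d1 = len(died), sum(died)
        o_minus_e += d1 - d * n1 / n          # observed minus expected, group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var if var > 0 else 0.0
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # chi-square (1 df) upper tail


km = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
print([(t, round(s, 3)) for t, s in km])  # [(1, 0.8), (2, 0.6), (3, 0.3)]
chi2, p = logrank_test([1, 2, 3, 4], [1, 1, 1, 1], [0, 1, 0, 1])
print(round(chi2, 4), round(p, 3))  # 0.6154 0.433
```

In the study's setting, the 0/1 group labels would come from the high/low expression or signature split rather than from toy values.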

POU2F2 is relatively highly expressed in B cells in the TME of NSCLC

We next examined the expression patterns of the three core genes, POU2F2, CD2, and CST7, across cell types in NSCLC. Analysis of NSCLC single-cell transcriptome datasets (GSE139555, GSE127465, E-MTAB-6149, and GSE117570) revealed that, unlike CD2 and CST7, POU2F2 is expressed predominantly in B cells (Fig. 2a, b and Supplementary Fig. 3a, b). POU2F2 was also detected in a subset of T cells and monocytes, with minimal expression in malignant cells, endothelial cells, and fibroblasts, whereas CD2 and CST7 were expressed mainly in T cells. To validate POU2F2 expression in B cells within NSCLC, we performed mIHC and confirmed colocalization of POU2F2 with the B cell marker CD20 in primary LUAD (Fig. 2c) and LUSC (Fig. 2d) lesions.

Next, we divided B cells from the single-cell dataset GSE139555 into POU2F2+ and POU2F2− groups according to POU2F2 expression (Fig. 2e). DEGs between the two B cell subsets were identified (Supplementary Table 6) and subjected to GO and KEGG enrichment analyses. Enriched GO terms included regulation of actin filaments, actin polymerization or depolymerization, and actin filament organization, as well as the viral life cycle and protein complex assembly (Fig. 2f). KEGG analysis showed that these DEGs were enriched mainly in regulation of the actin cytoskeleton, as well as in the Salmonella infection, Parkinson’s disease, prion disease, pathogenic Escherichia coli infection, and tight junction pathways (Fig. 2g).

Fig. 2

POU2F2 is highly expressed in tumor-infiltrating B cells. (a,b) UMAP plots showing cell types and the expression of POU2F2, CD2, and CST7 in LUAD scRNA-seq datasets (GSE139555, GSE127465). (c,d) mIHC images showing colocalization of POU2F2 and CD20 in LUAD (c) and LUSC (d). Scale bars: 100 μm and 20 μm. (e) Violin plot showing B cell groups defined by POU2F2 expression level. (f) GO enrichment analysis results for B cells with differential POU2F2 expression. (g) KEGG enrichment analysis results for B cells with differential POU2F2 expression.

The role of POU2F2+ B cells in immune cell infiltration of NSCLC

We characterized immune cell distribution in NSCLC tumors by mIHC, which allowed simultaneous visualization of four markers plus a DAPI nuclear counterstain in each FFPE tissue section. Representative images are shown in Fig. 3. In LUAD tumor tissues, POU2F2+ CD20+ B cells were located farther from PD-1+ CD8+ T cells than POU2F2− CD20+ B cells were (Fig. 3a), whereas in LUSC the distance to PD-1+ CD8+ T cells did not differ significantly between POU2F2+ and POU2F2− B cells (Fig. 3b). In addition, POU2F2+ CD20+ B cells were farther from CD206+ CD68+ macrophages in LUAD (Fig. 3c) but closer to them in LUSC (Fig. 3d). Analysis of TCGA-LUAD and TCGA-LUSC data further showed that, in LUAD, a high POU2F2+ B cell score was associated with higher proportions of CD8+ T cells and M1 macrophages and a lower proportion of M2 macrophages (Fig. 3e). In LUSC, by contrast, a high POU2F2+ B cell score was associated only with a higher proportion of M1 macrophages, with no significant differences in CD8+ T cell or M2 macrophage proportions (Fig. 3f).

Fig. 3

POU2F2+ B cells were associated with better immune infiltration in NSCLC. (a,b) mIHC staining and spatial distance calculations for CD20, POU2F2, CD8, and PD-1 in LUAD (a) and LUSC (b). (c,d) mIHC staining and spatial distance calculations for CD20, POU2F2, CD68, and CD206 in LUAD (c) and LUSC (d). (e,f) The relationship between POU2F2+ B cell score and immune cell infiltration in TCGA-LUAD (e) and TCGA-LUSC (f). ns not significant, *P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001.

Discussion

Recent studies indicate that B cells may serve as agents for next-generation immunotherapy, in particular through the production of antibodies against tumor-associated antigens that enhance the ability of phagocytes to present antigens to CD8+ T cells24,25,26. Spatially, B cells interact extensively with T cells within tertiary lymphoid structures (TLS), which correlate with improved survival in various cancers7,8,9,10 and with increased immune infiltration of the tumor microenvironment5,27,28. However, B cells can also promote tumors through cytokine release, immune complex formation, and engagement of immune checkpoint pathways. These dual beneficial and detrimental roles underscore the need for comprehensive, data-driven analyses of B cells in human cancers29,30,31. Proliferating B cells have been observed in approximately 35% of lung cancers, and their presence varies with stage and histological subtype, suggesting a critical role for B cells in lung tumor progression32,33,34. Some studies highlight the capacity of B cells to induce and maintain antitumor activity, whereas others attribute protumor functions to immunosuppressive B cell subtypes. In this study, we identified POU2F2 as a key gene associated with the B cell developmental trajectory; here, we discuss the role of POU2F2+ B cells in NSCLC and their potential clinical applications.

POU2F2 was originally considered a B cell-specific transcription factor that regulates B cell proliferation and differentiation by binding to immunoglobulin gene promoters35,36; it is expressed throughout the B cell lineage, and its expression increases upon cellular activation37. However, POU2F2 is not restricted to B cells and has also been detected in neurons, macrophages, and T cells38. Consistent with this, we found that POU2F2 was expressed in several immune cell types, including T cells and monocytes, in addition to its predominant expression in B cells. Although POU2F2 is best known for regulating B cell development, proliferation, and differentiation, its expression in T cells and monocytes points to a potential immunoregulatory role beyond B cell-mediated responses. The underlying mechanism has not been identified, but the combined effects of POU2F2 in these immune subsets may help shape the overall immune landscape of NSCLC. An abundance of POU2F2+ B cells, together with modulation of T cell and monocyte activity, may produce a more robust antitumor immune response, consistent with the association with improved survival that we observed in LUAD. In LUSC, where the immune microenvironment is typically more suppressive, the impact of POU2F2 may be less pronounced, highlighting the complexity of tumor-immune interactions across NSCLC subtypes. Further investigation of how POU2F2 regulates immune cell differentiation and function across these subsets will be essential to understand its full role in cancer immunology. At the same time, this broad immune expression suggests that developing a B cell-specific POU2F2-targeted therapy will be challenging.

Indeed, inter- and intratumor heterogeneity is a key contributor to poor prognosis and to differences in therapeutic response between LUAD and LUSC39. The complex physiological environment of the TME broadly affects the biological behavior of diverse immune cells, including their infiltration, migration, polarization, function, and metabolism40. A high proportion of CD8+ T cells and M1 macrophages suggests an active antitumor immune state, whereas a high proportion of M2 macrophages indicates a protumor state. As professional antigen-presenting cells (APCs), B cells can process and present major histocompatibility complex (MHC) class I and II epitopes to CD8+ and CD4+ T cells, enabling them to induce, shape, and amplify T cell responses41. In a study of metastatic ovarian tumors, tumor-infiltrating B cells secreted GM-CSF, IFNγ, IL-12p40, CXCL10, and IL-7, which could stimulate macrophages and induce their differentiation42. Although our study suggests that POU2F2+ B cells participate in regulating the tumor immune microenvironment, the precise mechanisms remain unclear.

Interestingly, in LUAD, POU2F2+ B cells were located farther from PD-1+ CD8+ T cells and CD206+ CD68+ macrophages than POU2F2− B cells were. This spatial distribution highlights the complex interplay within the TME. The greater distance from PD-1+ CD8+ T cells suggests that POU2F2+ B cells may modulate T cell activity indirectly, for example through cytokine or chemokine signaling, rather than through direct physical contact such as immune synapse formation or direct antigen presentation. By remaining farther away, POU2F2+ B cells might also avoid immunosuppressive mechanisms that could arise from direct interactions with PD-1+ CD8+ T cells, such as activation of inhibitory pathways (e.g., PD-1/PD-L1 signaling). This is consistent with the higher CD8+ T cell proportions we observed in patients with high POU2F2+ B cell scores. Moreover, the limited proximity to CD206+ CD68+ macrophages, which are associated with the immunosuppressive M2 phenotype, may reduce tumor-promoting interactions and shift the immune balance toward an antitumor state. These findings underscore the multifaceted role of POU2F2+ B cells in shaping the immune landscape of LUAD and suggest potential avenues for therapeutic intervention.

Several molecular mechanisms could underlie this spatial pattern. Notably, POU2F2 has been reported to regulate the expression and release of chemokines via pathways such as NF-κB signaling, potentially influencing the spatial localization of different immune cell subsets within the TME43. The involvement of POU2F2 in the transcriptional regulation of key immune-modulating genes, including those governing cytokine secretion and antigen presentation, warrants further investigation. Future studies should explore whether POU2F2 mediates immune cell recruitment and functional polarization by directly activating or repressing target genes in B cells and other immune subsets.

This study has several limitations that should be addressed in future investigations. First, although our data suggest a role for POU2F2+ B cells in modulating the immune landscape of LUAD, the underlying regulatory mechanisms remain poorly understood. Future work should determine whether POU2F2 exerts its immunomodulatory effects through direct cytokine signaling, transcriptional regulation, or cross-talk with other immune cells. Second, the mIHC cohort was small (10 patients), which may limit the generalizability of the findings; larger cohorts will be needed to validate our conclusions. Finally, the spatial distribution of POU2F2+ B cells suggests signaling interactions with other immune cells that warrant more in-depth exploration with spatial transcriptomics technologies.

In summary, this study suggests that POU2F2 may serve as a biomarker of B cell function and of prognosis in LUAD. POU2F2+ B cells also appear to promote immune cell infiltration in LUAD. Given the diverse functions of POU2F2, targeting POU2F2 in B cells may represent a promising therapeutic approach for LUAD that merits further investigation.