Introduction

Primary central nervous system lymphoma (PCNSL), a rare subtype of extranodal non-Hodgkin lymphoma, is associated with an unfavorable prognosis even when treated with standard high-dose methotrexate (HD-MTX)-based regimens [1, 2]. PCNSL is a unique lymphoma that differs from other tumors and is characterized as a genetically heterogeneous disorder marked by a variety of low-frequency mutations, somatic copy number alterations, and structural variants [3,4,5,6]. Thus, combining standard immunochemotherapy with promising novel agents that target specific pathways via different molecular clusters may improve patient prognosis [7, 8].

Pathologically, nearly 95% of PCNSLs are diffuse large B-cell lymphomas (DLBCLs). Recently, DLBCL has been categorized into various molecular clusters on the basis of genomic sequencing and RNA sequencing, including quartet categorization by L.M. Staudt’s team [9], quintet categorization by M.A. Shipp’s team [10], L.M. Staudt’s team’s refined septet method [11], and the LymphPlex classification proposed by W.L. Zhao’s team [12]. However, PCNSL has been proven to be a biological entity that is molecularly distinct from DLBCL [13], so these molecular clusters of DLBCL may not be applicable to PCNSL. Notably, the integration of genome-wide data from multiomic studies by A. Alentorn et al. [14] showed four molecular patterns of PCNSL mainly using gene expression data. These patterns have a distinctive prognostic impact, providing a basis for future clinical stratification and subtype-based targeted interventions. Owing to the molecular heterogeneity between various ethnic groups of PCNSL patients [3, 15], the applicability of the proposed classifications for PCNSL to other populations requires further confirmation. No effective biomarker or molecular classification scheme exists to tailor therapies for individual PCNSL patients. Furthermore, the issue of PCNSL molecular heterogeneity in Chinese patients has not been adequately addressed, primarily because the data have come from small, single-center studies of Chinese PCNSL patients [16, 17].

To address these issues, we conducted whole-exome sequencing (WES) on specimens obtained from 140 patients with newly diagnosed PCNSL, and divided into a discovery cohort and a validation cohort. Additionally, WES/Whole Genome Sequencing (WGS) data were obtained from 36 Chinese PCNSL patients at two other tertiary care cancer centers [17], serving as an independent external cohort. To enable clinical translation, we defined three molecular subtypes based solely on single-nucleotide variations (SNVs), each associated with distinct outcomes. This study delineates the genomic landscape of Chinese PCNSL and proposes a potential classification framework that may inform precision oncology.

Materials/Subjects and Methods

Sample collection and clinicopathological features of PCNSL patients

In this study, a total of 140 newly diagnosed PCNSL patients were retrospectively enrolled from January 2010 to December 2020 from three independent medical centers, namely, Huashan Hospital of Fudan University, Renji Hospital of Shanghai Jiao Tong University, and Shanghai Cancer Center of Fudan University. The study comprised two cohorts: the discovery cohort, which included 58 patients from Huashan Hospital of Fudan University, Shanghai; and the validation cohort (n = 82), which consisted of 25 patients from Huashan Hospital of Fudan University, 51 patients from Renji Hospital of Shanghai Jiao Tong University, and 6 patients from Shanghai Cancer Center of Fudan University. In the discovery cohort, paired blood samples were also obtained for analysis in conjunction with each tumor sample. In the validation cohort, only unpaired Formalin-Fixed, Paraffin-Embedded (FFPE) tumor tissue samples were obtained for analysis. Specimens from patients with PCNSL in both the discovery and validation cohorts were subjected to WES for genomic profiling.

Additionally, WES (n = 12) /WGS (n = 24) data of 36 patients newly diagnosed with PCNSL from Fujian Cancer Hospital and The First Affiliated Hospital of Fujian Medical University between February 2012 and October 2020 were included in this study, as an independent external Chinese PCNSL cohort. The general clinical characteristics of these 36 PCNSL patients and the sequencing protocol have been previously described [17].

None of the patients had received steroid treatment for PCNSL, and PCNSL tumors were diagnosed on the basis of the World Health Organization (WHO) criteria [13]. All patients underwent a 2-deoxy-2[F-18] fluoro-D-glucose positron emission tomography/computed tomography (FDG PET/CT) scan and bone marrow aspiration to exclude systemic tumor manifestation on the basis of the guidelines of the International PCNSL Collaborative Group [18] and the European Association of Neuro-Oncology [19]. PCNSL patients for whom ocular involvement was present were excluded. Each tumor tissue sample was independently reviewed by at least 2 pathologists to confirm that the tumor sample was histologically consistent with the PCNSL. All patients underwent long-term follow-up, which ran to December 30, 2022.

Treatment Regimens

All PCNSL patients were treated with HD-MTX-based combination immunochemotherapy. The following chemotherapy protocols were used for induction: HD-MTX combined with idarubicin (IDA), HD-MTX combined with rituximab (R), HD-MTX combined with IDA and R, or a combination of Bruton’s tyrosine kinase (BTK) inhibitors. Whole-brain radiation therapy or stem cell transplantation is a form of consolidation therapy. The detailed therapeutic schedule was the same as that previously described [20].

Sample processing

The collection and processing of samples for this study were carried out by the Department of Pathology or Department of Neurosurgery at Huashan Hospital of Fudan University, Renji Hospital of Shanghai Jiao Tong University, and Shanghai Cancer Center of Fudan University. In the discovery cohort, surplus tumor tissues and paired blood samples obtained during surgical procedures were collected. To guarantee the quality of these samples, each specimen was promptly and accurately labeled with relevant patient information within 30 min of collection and then immediately snap-frozen in liquid nitrogen. These samples were stored at -80 °C until being shipped to GenomiCare Biotechnology (Shanghai, China). In the validation cohort, the residual tumor tissue specimens were subjected to FFPE. These FFPE samples were transported in a dry ice container to Sinotech Genomics Technologies in Shanghai, China, accompanied by a time and temperature tracker to ensure proper monitoring during shipment. Upon arrival, the samples were promptly stored in liquid nitrogen to maintain their integrity until further processing.

DNA extraction

For frozen fresh tissue and matched peripheral blood samples, total genomic DNA (gDNA) was extracted via the Maxwell RSC Blood DNA Kit (AS1400, Promega) on a Maxwell RSC system (AS4500, Promega) following the manufacturer’s instructions. For FFPE tissue, total gDNA was extracted via the QIAamp DNA FFPE Tissue Kit (56404, Qiagen) following the manufacturer’s instructions. The integrity and concentration of the gDNA were determined via the Qsep100 System (BIOptic, China) and a Qubit 3.0 fluorometer (Thermo Fisher Scientific). The OD260 was measured with a NanoDrop One (Thermo Fisher Scientific).

WES library preparation and sequencing

For frozen tissue and matched peripheral blood samples, exome DNA was extracted via the SureSelect Human All Exon V7 Kit (5991-9039EN, Agilent). The library was subsequently prepared via the SureSelectXT Low Input Target Enrichment and Library Preparation system (G9703-90000, Agilent). For the FFPE sample, the exome DNA was captured via the SureSelect Human All Exon V8 Kit (5191-6874, Agilent) and prepared into a library via the SureSelect XT HS2 DNA Reagent Kit (G9983A, Agilent). The library was validated via the use of an Agilent 2100 Bioanalyzer and a Qubit 3.0 fluorometer. Paired-end 150 bp read sequencing was performed on an Illumina NovaSeq 6000. Image analysis and base calling were performed via onboard RTA3 software (Illumina). During bioinformatic processing, ComBat batch correction was applied at the feature matrix level to minimize residual systematic differences between FF and FFPE samples after initial quality control and filtering.

Somatic mutation calling

Quality control was conducted via FastQC software. The cleaned and trimmed FASTQ files were aligned to the UCSC human reference genome (hg19) via the Burrows–Wheeler Aligner (BWA) with default parameters. For each paired blood sample, SNVs and short insertions/deletions (INDELs) were identified via Sentieon TNseq [21] with default parameters. The identified somatic mutations were removed if they did not satisfy any one of the following criteria: a variant allele frequency (VAF) of at least 0.05, support from a minimum of three reads, annotation by the Variant Effect Predictor (VEP) package [22], and transformation into a mutation annotation format (MAF) file for further analysis with maftools [23].

When a paired normal sample (FFPE sample) was not available, a panel of normal (PON) files was used as a paired control to call the SNVs and InDels. A panel of normal files was created using the reads from clinical blood samples that showed no evidence of tumor contamination from 100 different individuals collected by Genomicare Biotechnology (Shanghai). The reads were retained when they passed the standard WES quality control, as described in the WES subsection. The reads for reference alleles and alternative alleles were gathered for each candidate site in all 100 samples, except for false positives, which were those exhibiting a candidate allele frequency exceeding 0.05 in more than 5 samples.

We applied an additional PON mask for all the candidate somatic SNV and INDEL sites to exclude germline mutations, as previously reported [10, 24]. In general, in addition to the standard retention rules, mutations were retained if they were known driver alterations highlighted as being biologically significant in the COSMIC database. Mutations with a tumor mutation frequency ≥20% were also retained. Other germline mutations were retained if their existence ratio in the Exome Aggregation Consortium (ExAC) exac_all was < 0.00005, their existence ratio in the Asian population exac_eas was < 0.0003, and their existence ratio in previous GenomiCare samples germline_gc was < 0.0006. Retained mutations were considered somatic mutations.

Thus, the PON file and PON mask limit possible recurrent artifacts of sequencing and the presence of germline mutations in somatic mutations that are falsely selected owing to a lack of paired blood data.

Somatic copy number alteration calling and profiling

Using the method described in the ExomeCNV package [25], we implemented a normalized depth‒coverage ratio method to identify CNVs in paired samples. To correct for five potential biases (the size of exonic regions, batch effects, both the quantity and quality of sequencing data, local GC content, and genomic mappability) that affect the raw read counts, we utilized a standard normal distribution model. Genes with a haploid copy number ≥3 or ≤1.2 were defined as amplified or deleted, respectively, and a minimum tumor content (purity) of 20% was needed.

Significance analysis of recurrent somatic copy number alterations

Sequenza [26] was used to generate the segment files. The segment files were used as the input for GISTIC2.0 [27] to detect recurrent arm-level and focal peaks in copy number alterations via the parameters -js 40 -conf 0.99 -ta 0.2 -td 0.2 -brlen 0.8, with other parameters set to the defaults.

Cancer-driver gene mutations filtering

After the aforementioned steps of calling SNVs and CNVs, we further filtered the resulting mutated genes by intersecting them with a designated group of genes recognized as cancer-related genes. This group of genes was sourced from two prominent public cancer gene databases. Part 1 of the group was obtained from the OncoKB curated cancer gene list (https://www.oncokb.org/cancerGenes) [28]. These genes are classified as cancer genes by OncoKB on the basis of their inclusion in various sequencing panels, the Sanger Cancer Gene Census, or the criteria established by Vogelstein et al. [29]. Part 2 was derived from the ranked CIViC gene candidates table (https://github.com/griffithlab/civic-server/blob/master/public/downloads/RankedCivicGeneCandidates.tsv) [30]. The final list of cancer-related genes consisted of all genes in Part 1 and genes in Part 2 only when its ‘panel_count’ value (in the CIViC tsv file listed above) was ≥ 2.

Bioinformatic analysis

The mutational signature classification was based on all single-base substitution (SBS) signatures of the COSMIC mutational signatures [31]. Tumor mutational burden (TMB) was defined as the aggregate count of somatic nonsynonymous mutations, including SNVs or INDELs, within the tumor exome for each patient [32]. This count was then divided by the overall size of the targeted regions (35 for WES), resulting in the TMB, expressed in counts per megabase (counts/Mb). The mutant-allele tumor heterogeneity (MATH) score [33] was calculated on the basis of the width of the VAF distribution, utilizing maftools for the analysis [23]. The pathway map was generated via maftools as previously described [34].

Classification model for identifying molecular types

CoxNet survival analyses and least absolute shrinkage and selection operator (LASSO) regressions were performed via the scikit-learn package in Python (version 3.9.7). K‒M survival curves (log-rank test) were used for overall survival (OS) analysis via the survival and survminer packages in R (version 4.1.2).

  1. (1)

    Selection of 24 initial features in the discovery cohort

    Molecular features derived from SNVs were selected to develop subsequent classification models.

    First, to identify genes with high variability for use in clustering, those with low mutation frequencies were filtered out. This process led to the selection of genes with a mutation frequency greater than 5% for clustering purposes. A total of 150 genes were screened for further analysis.

    In the second phase, OS was analyzed as the dependent variable, with the aforementioned 150 genes serving as independent variables. The feature selection steps were as follows: (1) a comprehensive approach involving CoxNet survival analyses and LASSO regressions was employed to identify characteristic genes; (2) the importance score of each feature in the model was calculated and ranked in descending order; (3) features that ranked in the top 50% were recorded as potential features once; (4) processes 1-3 were repeated 500 times, and all potential features were recorded; and (5) among the feature sets obtained in the above process, those recorded more than 250 times were selected.

    Ultimately, 24 feature genes, namely, EP300, KMT2D, MPEG1, TUSC3, MYD88, PTCH1, DST, ZNF521, BTG1, NOTCH2, PRDM1, ADGRL3, CDH11, CREBBP, ATRX, TCL1A, GNA13, ETV6, IRF4, HIST2H3D, CD79B, GRM8, MYH11, and SPTA1, were selected.

  2. (2)

    Differential feature selection

    To determine the most suitable features among the initial 24 for the classification model, Kaplan‒Meier analyses were conducted. Features qualifying for further analysis met the following criteria: a log-rank test P value of less than 0.2 and any group with more than 5 patients. Ultimately, 6 genes, namely, EP300, CREBBP, DST, MYD88, BTG1, and IRF4, were identified.

  3. (3)

    Model training utilizing random combinations of 6 differential features

    To facilitate prognostic assessment and individualized management in accordance with clinical practice guidelines, patients with PCNSL were categorized into three prognostic subgroups-good, intermediate, and poor-based on their actual OS. Six differential features were ultimately selected for building a classification model to classify patients into the three subtypes. These features were randomly combined to form three types, yielding a total of 288 combinations. To identify the most appropriate classification model among these combinations, Kaplan‒Meier analyses were performed. Models were chosen on the basis of a log-rank test P value of less than 0.05 and subgroup sample sizes greater than 8. Ultimately, seven candidate molecular subtype combinations met these criteria: (molecular markers shown in the format of subtype 1 vs. subtype 2 vs. subtype 3): EP300 (predominant) vs. MYD88/BTG1/IRF4 ≥ 2 vs. others; EP300 vs. MYD88/BTG1/IRF4 ≥ 2 (predominant) vs. others; BTG1 (predominant) vs. CREBP/DST ≥ 1 vs. others; BTG1 vs. CREBP/DST ≥ 1 (predominant) vs. others; BTG1/IRF4 ≥ 1 vs. EP300/CREBP/DST ≥ 1 vs. others; BTG1/IRF4 ≥ 1 vs. EP300/DST ≥ 1 vs. others; and CREBP/DST ≥ 1 vs. MYD88/BTG1/IRF4 ≥ 2 vs. others.

  4. (4)

    Validation of candidate molecular subtypes

The WES data and OS data of 82 patients in the validation cohort were included to validate the candidate molecular subtypes. The mutation frequencies of EP300, CREBBP, DST, MYD88, BTG1, and IRF4 in the validation cohort were greater than 5%. Kaplan–Meier survival analysis was used to validate the seven candidate molecular subtype combinations. The performance of [EP300 (predominant) vs. MYD88/BTG1/IRF4 ≥ 2 vs. others] (3-way P = 0. 0202) and [EP300 vs. MYD88/BTG1/IRF4 ≥ 2 vs. others (predominant)] (3-way P = 0. 0346) was better than those of the other five molecular subtypes (3-way P > 0.05 for both). Additionally, compared with the [EP300 vs. MYD88/BTG1/IRF4 ≥ 2 vs. others (predominant)] molecular subtype, the Kaplan–Meier survival curve of [EP300 (predominant) vs. MYD88/BTG1/IRF4 ≥ 2 vs. others] showed no crossover. Finally, the performance of [EP300 (predominant) vs. MYD88/BTG1/IRF4 ≥ 2 vs. others] was the best, so we selected it as the molecular subtype of Chinese PCNSL patients. In other words, PCNSL patients with both EP300 and BMI (BTG1, IRF4, and MYD88) variants were categorized into the E3 subtype.

Hematoxylin & eosin and immunohistochemical staining

PCNSL tumor tissue was fixed with 10% paraformaldehyde and embedded in paraffin. The paraffin-embedded tissues were cut into 4-µm-thick sections. These sections were then subjected to staining with hematoxylin and eosin (H&E) or for immunohistochemistry. For H&E staining, the sections were stained with hematoxylin and eosin following standard protocols (ab245880, Abcam). Immunohistochemistry was performed for ki-67 (ab15580, Abcam), CD2 (ab314761, Abcam), CD5 (ab75877, Abcam), CD10 (ab256494, Abcam), CD19 (ab134114, Abcam), CD20 (ab78237, Abcam), CD79a (ab79414, Abcam), BCL2 (MA5-11757, Thermo Scientific), BCL6 (PA5-14259, Thermo Scientific), MUM1 (ab247079, Abcam), c-Myc (MA1-980, Thermo Scientific), and p53 (MA5-14516, Thermo Scientific) via an automated Leica BOND-III staining system. H&E and immunohistochemistry images were obtained with a KFBIO scanner (KF-PRO-005-EX, Zhejiang, China) and visualized via KFBIO SlideViewer software. Each staining was independently reviewed by at least 2 pathologists.

In situ hybridization of fluorescence

Fluorescence in situ hybridization was performed in accordance with standard protocols via the following commercial probes: MYC break-apart, BCL-2 break-apart, and BCL-6 break-apart probes (LBP Medicine Science & Technology Co., Ltd., Guangzhou, China). Staining was independently evaluated by two hematopathologists, and discrepancies were resolved by another hematopathologist.

HIV, HBV, and EBV detection

Antibody and antigen levels of human immunodeficiency virus (HIV) were measured via an Elecscy HIV combi PT assay kit (Roche, Germany) with a Cobas e 601 analyzer (Roche). Hepatitis B virus (HBV)-specific PCR was performed via an HBV virus nucleic acid test kit (Sansure Biotech, China) with an Applied Biosystems 7500 Real-Time PCR System (Thermo Scientific). EBV-encoded RNA in situ hybridization (EBER-ISH) was performed to assess the presence of Epstein-Barr virus (EBV) in tumor tissue. The testing utilized a fluorescein isothiocyanate -coupled specific peptide nucleic acid probe (Roche Diagnostics, Mannheim, Germany).

Statistical analysis

All statistical analyses were performed using R version 4.1.2, GraphPad Prism version 9, and Python version 3.9.7. The specific statistical analyses employed for each figure are detailed in the respective legends. We used PASS software to calculate the sample size for our study, with a significance level of 0.05 and a power of 0.80. The observed group proportions were 0.6, 0.2, and 0.2, whereas the expected proportions were 0.33 for each group. The sample size required to meet these criteria was 25 participants. Consequently, the sample size utilized in this study is deemed adequate. Whole-exome sequencing data were processed via nf-core/sarek v3.4.2 [35] of the nf-core collection of workflows [36], which uses reproducible software environments from the Bioconda [37] and Biocontainers [38] projects. The pipeline was executed with Nextflow v24.04.2 [39].

Results

PCNSL genomic landscape

Using WES, we detected mutations in 140 patients who were newly diagnosed with PCNSL at one of three cancer centers; in 59% of cases, blood samples were lacking. Because EBV-positive PCNSL is recognized as an immunodeficiency-associated PCNSL in the current WHO CNS5 classification [40, 41], all participants in the study who tested negative for EBV were included. Additionally, DLBCL patients with HBV and HIV infection may have different mutational profiles [42, 43]; thus, all participants in the study who tested negative for HIV and HBV were included. The key demographic and clinical characteristics of the patients are summarized in Figure S1 and Table S1. The study design and flow chart are displayed in Fig. 1A. After filtering, we identified 115,938 genetic variation events in the 140 PCNSL samples analyzed (Figure S2A, Figure S3A).

Fig. 1: Study design and the PCNSL mutational landscape in Chinese patients.
Fig. 1: Study design and the PCNSL mutational landscape in Chinese patients.The alternative text for this image may have been generated using AI.
Full size image

A Study design, including workflow and data composition for the discovery cohort and validation cohort. B Number and frequency of recurrent mutations. Gene‒sample matrix of recurrently mutated genes ranked by mutation frequency (n = 140). The total mutation density across the cohort is displayed at the top, with the variant allele fraction and trinucleotides at the bottom. C GISTIC2.0 results of significant recurrent focal amplifications (red) and deletions (blue). Genes affected by each focal event are annotated (n = 140). X-axis: plot of chromosomes; Y-axis: G score.

We applied a filter to the 140 PCNSLs on the basis of the genes curated in the CIViC and OncoKB databases to identify candidate cancer genes with hallmark mutations in PCNSL. Some of the mutated genes were PIM1 (62.1%), MYD88 (55.0%), KMT2D (48.6%), and so on (Fig. 1B), which are involved in chromatin histone modification, BCR-TLR-mediated NF-κB signaling, immune signaling, the cell cycle, PI3 kinase signaling, MAPK signaling, and Wnt/β-catenin signaling (Figure S2B) [3, 5, 9,10,11, 44,45,46,47,48,49,50,51]. In this study, we also identified several other pathways in PCNSL, such as pathways related to genome integrity, RTK signaling, RNA abundance, TGFB signaling, TOR signaling, and apoptosis (Figure S2B), many of which have defined roles in other cancers [34]. The mutually exclusive or cooccurring relationships of these genes are shown in Figure S3D. Survival associated with candidate cancer genes with mutation frequencies greater than 15% plus the NOTCH2 and EZH2 genes [9] is shown in Figure S4.

In the search for focal copy number alterations (CNAs) in the 140 PCNSLs, we detected significant recurrent amplifications at chromosomal locations 1q21.1, 4q16.3, 11p15.5, and so on (Fig. 1C). The arm-level CNAs among all 140 PCNSL samples are displayed in Figure S2C. The pattern of somatic mutations caused by the different mutational processes in the genome, termed mutational signatures, was calculated and compared to the well-established signatures in COSMIC [52]. The mutational signature contributions are shown in Figure S2D. The COSMIC 5, COSMIC 45, and COSMIC 1 signatures contributed the most to the PCNSL genome (Figure S2E). Taken together, these results establish a comprehensive genomic landscape of PCNSL in Chinese individuals.

Comparison of the genomic landscapes of the discovery and validation cohorts

Due to differences in sample types (fresh vs. FFPE) and matched sample availability between the discovery and validation cohorts, additional filtering was applied to the validation cohort to reduce false positives prior to genomic landscape comparison. Key clinical features are summarized in Table 1. We identified 12,222 genetic variation events (Figure S3B, Figure S5A) in the analyzed discovery cohort of PCNSL samples and 103,716 genetic variation events (Figure S3C, Figure S5B) in the analyzed validation cohort of PCNSL samples. The genetic variation profiles of the discovery and validation cohort of PCNSL samples were similar, as revealed by principal component analysis (Fig. 2A), but the mutation rate was higher in the validation cohort of PCNSL samples (Fig. 2B).

Fig. 2: Comparison of mutation profiles between the discovery cohort and validation cohort of PCNSL patients.
Fig. 2: Comparison of mutation profiles between the discovery cohort and validation cohort of PCNSL patients.The alternative text for this image may have been generated using AI.
Full size image

A Principal component analysis of whole-exome sequencing (WES) data between the discovery cohort and validation cohort of PCNSL patients. B Mutation rate of WES data between the discovery cohort and validation cohort of PCNSL patients. C A mirror bar plot showing the frequencies of genetic alterations in the discovery cohort of PCNSL patients compared with those in the validation cohort. D Arm-level copy number alterations of the discovery cohort and validation cohort samples are displayed. The frequencies of amplifications and deletions were compared. E Comparison of the tumor mutational burden (TMB) between the discovery cohort and the validation cohort of PCNSL patients. F The levels of mutant-allele tumor heterogeneity (MATH) in the discovery cohort and the validation cohort of PCNSL patients were compared. G The levels of microsatellite instability (MSI) in the discovery cohort and the validation cohort of PCNSL patients were compared. H The variant allele frequency (VAF) in the discovery cohort and the validation cohort of PCNSL patients were compared. The discovery cohort included blood-paired fresh tumor tissue samples (n = 58), and the validation cohort included unpaired FFPE tumor tissue samples (n = 82). Independent-samples t tests and chi-squared tests were used. *P < 0.05; **P < 0.01; ***P < 0.001.

Table 1 Patient characteristics of discovery cohort and validation cohort.

PIM1, MYD88, CD79B, and KMT2D were the most frequently mutated candidate cancer genes of PCNSL in the discovery cohort (Figure S5C) and validation cohort (Figure S5D). However, several of the less frequently mutated candidate cancer genes in PCNSL whose mutation frequencies were significantly different when the discovery and validation cohorts were compared included ZFHX3, TRRAP, SZT2, and so on (all P < 0.05) (Fig. 2C). The arm-level CNAs between the discovery and validation cohort of PCNSL samples were mostly similar and are displayed in Fig. 2D. Significant differences in both recurrent amplifications (1q, 8q, and 12p) and deletions (6p, 8p, 19p, and 19q) were observed. The results of significant recurrent focal amplifications and deletions were visualized via GISTIC 2.0 in the discovery (Figure S5E) and validation (Figure S5F) cohorts.

Although the TMB (Fig. 2E), MATH score (Fig. 2F), microsatellite instability (MSI) (Fig. 2G), and the VAF (Fig. 2H) were greater in the validation cohort than in the discovery cohort (P < 0.001), the mean values were relatively close. The 3 signatures extracted from the discovery samples presented cosine similarities of 86.7%, 93.1% and 94.4% to COSMIC signatures 6, 45, and 10a, respectively (Figure S5G), whereas those extracted from the validation samples presented similarities of 80.4%, 70.9%, and 77.9% to COSMIC signatures 5, 1, and 23, respectively (Figure S5H). Taken together, these results suggest that the genetic variation profiles of the discovery and validation PCNSL samples exhibit certain discrepancies.

The unique genomic landscape of PCNSLs in Chinese patients

The frequencies of recurrent genetic alterations in PCNSL patients were compared with those in DLBCL patients [49]. Mirror bar plots (Fig. 3A) revealed that a set of genes, such as PIM1 (62.1% vs. 16.88%), MYD88 (55% vs. 18.01%), IRF4 (15% vs. 3.98%), and EP300 (19.29% vs. 5.97%), had significantly higher mutation rates in Chinese PCNSL patients than in DLBCL patients, suggesting genetic diversity of the cancer genome between diseases. Compared with those in PCNSLs in French patients [14] (Fig. 3B), the mutation frequencies of EP300 (19.29% vs. 3.48%) and KMT2D (48.6% vs. 22.61%) were significantly higher in the PCNSLs of Chinese patients. Compared with those in the PCNSLs [3] of Japanese patients (Fig. 3C), the mutation frequencies of MYD88 (55% vs. 85.6%) and BTG2 (38.6% vs. 87.8%) were significantly lower in the PCNSLs of Chinese patients. These data revealed a unique mutational landscape of Chinese PCNSL patients in terms of the frequency of SNVs. Using GISTIC2.0, we compared the frequencies of recurrent SNAs between Chinese PCNSL and DLBCL patients. At the focal level, Chinese PCNSL samples showed significant deletions in 11q12.4, 18p11.21, 19q13.42, 22q11.1, and 22q11.21, and amplifications in 1q21.1, 4p16.3, 11q12.2, and 17q12 (Fig. 3D). Similar alterations were observed when compared with Japanese PCNSL samples, highlighting distinct genomic features in Chinese patients (Fig. 3E). These results suggest genetic diversity in the PCNSL genome between races.

Fig. 3: Mutation profiles of Chinese PCNSLs, DLBCLs, and PCNSLs of other races.
Fig. 3: Mutation profiles of Chinese PCNSLs, DLBCLs, and PCNSLs of other races.The alternative text for this image may have been generated using AI.
Full size image

A A mirror bar plot showing the frequencies of genetic alterations between Chinese PCNSL patients (n = 140) and DLBCL patients (n = 955). B A mirror bar plot showing the frequencies of genetic alterations between Chinese PCNSL patients (n = 140) and French PCNSL patients (n = 115). C A mirror bar plot showing the frequencies of genetic alterations between Chinese PCNSL patients (n = 140) and Japanese PCNSL patients (n = 41). D GISTIC-defined recurrent copy number focal deletions (blue) and gains (red) as mirror plots in DLBCL (TCGA, n = 46) and Chinese PCNSLs (n = 140) are shown. X-axis: plot of chromosomes; Y-axis: G score. E The GISTIC2.0-defined recurrent copy number focal deletions (blue) and gains (red) in Japanese PCNSL patients (n = 41) and Chinese PCNSL patients (n = 140) are shown as mirror plots. Y-axis: plot of chromosomes; X-axis: G score. The chi-squared test was used. *P < 0.05. (A-C) Mirror plots showing the mutation frequencies of genes present in the intersection of the top 100 most frequently mutated genes from each disease cohort. In the figure legend of A-C, ‘Inferred’ means that the mutation types were not directly provided but were calculated from the supplemental tables of the corresponding paper; thus, the results are ‘Inferred’, and the in facto mutation number will not be less than the inferred value.

Owing to the genetic diversity of the PCNSL genome across races and the genetic diversity between the PCNSL and DLBCL cancer genomes, the existing molecular subtyping methods for PCNSL [14] and DLBCL [9,10,11,12, 53, 54] may not be applicable to Chinese PCNSL patients. We assessed the prognostic value of these published molecular subtypes for OS and progression-free survival (PFS) in our Chinese PCNSL cohort. Actionable PCNSL classification can perfectly differentiate prognosis, thereby improving precision medicine strategies. According to cell-of-origin molecular subtyping [53, 54], there were no significant differences in OS (P = 0.42) or PFS (P = 0.67) between the germinal center type (GCB) and non-GCB subtypes (Figure S6A). Similarly, according to LymphPlex molecular subtyping [12], no significant differences in OS (P = 0.68) or PFS (P = 0.71) were observed between the BN2-like, MCD-like, EZB-like, N1-like, TP53, and other subtypes (Figure S6B). We further categorized our data on the basis of the DLBCL subtyping (C0-C5) proposed by M. A. Shipp et al. [10] Within our cohort, the C4 subtype was associated with shorter OS (P = 0.009, P = 0.013) and PFS (P = 0.096, P = 0.047) than the C0 and C5 subtypes, respectively (Figure S6C). However, the overall efficacy of this DLBCL subtyping system in predicting outcomes was less than satisfactory (OS, P = 0.074). The data were also analyzed via the five subtypes (BN2, EZB, MCD, N1, and others) [9] and seven subtypes (A53, BN2, EZB, MCD, N1, ST2, and others) [11] of DLBCL proposed by Louis M Staudt et al. There were no significant differences in OS (P > 0.05) or PFS (P > 0.05) between BN2, N1, and other categories within the 5-subtype model (Figure S6D). Overall, the seven-subtype model (Figure S6E) had inferior performance, with no significant differences observed in OS (P = 0.61) or PFS (P = 0.47). With respect to the four subtypes (CS1-4) of PCNSL proposed by A Alentorn [14], our data revealed that the OS (P = 0.12) and PFS (P = 0.54) of patients with the CS3 subtype were shorter than those of patients with the other three subtypes (Figure S6F), but the differences were not significant. Furthermore, the double-hit gene expression signature defines a distinct subgroup of germinal center B-cell-like DLBCL, which is associated with a poor prognosis [13]. However, in this study, there were no significant differences in OS (P > 0.05) or PFS (P > 0.05) between the double- or triple-hit PCNSL patients and the other PCNSL patients, as shown in Figure S7. Taken together, these results suggest that the molecular subtyping of PCNSL and DLBCL, as previously published, may not be fully applicable to Chinese PCNSL patients.

PCNSL molecular subtypes with clinical outcome implications

To establish a molecular classification system for PCNSL, consensus clustering was conducted based on WES data from 58 patients in the discovery cohort who received HD-MTX-based immunochemotherapy. The resulting molecular subtypes were validated in an independent cohort of 82 patients treated with the same regimen and further confirmed using WES/WGS data from an external Chinese cohort of 36 newly diagnosed PCNSL patients [17].

The molecular classification scheme, presented in Fig. 4A, Table S2 and Figure S8, was determined independently of clinical information and finalized before the analysis of clinical data, allowing us to analyze the relationships between genetic subtypes and survival in this entire cohort. Finally, EP300, BTG1, MYD88, and IRF4 were included in the molecular subtyping of Chinese PCNSLs.

Fig. 4: Integrated genetic drivers reveal PCNSL molecular subtypes with clinical outcome implications.
Fig. 4: Integrated genetic drivers reveal PCNSL molecular subtypes with clinical outcome implications.The alternative text for this image may have been generated using AI.
Full size image

A Schematic of the typing strategies used to identify PCNSL molecular subtypes in the discovery cohort (n = 58) and in the validation cohort (n = 82). B Kaplan‒Meier estimates of overall survival (OS) and progression-free survival (PFS) among patients belonging to each molecular subtype in the discovery cohort. C Kaplan‒Meier estimates of overall survival (OS) and progression-free survival (PFS) among patients belonging to each molecular subtype in the validation cohort. D Kaplan‒Meier estimates of overall survival (OS) among patients belonging to each molecular subtype in an independent external cohort (n = 36). E The prevalence of EP300 mutations in PCNSL patients of different races. F Kaplan‒Meier analyses for comparisons of the overall survival (OS) and progression-free survival (PFS) of patients with mutations within and those with mutations outside the EP300 domain. G Multivariate Cox regression analysis after adjusting for key clinical confounders, including age, IELSG score, MSKCC score, LDH levels, deep brain location, CSF protein levels, and consolidation therapy (WBRT, stem cell transplant), in PCNSL patients (n = 140).

All nonsynonymous mutations in EP300, BTG1, MYD88, and IRF4 were visualized within the functional domains of the encoded proteins (Figure S9). As previously reported [10, 55, 56], MYD88 (L265P) was the most prevalent mutation, while EP300, BTG1, and IRF4 all harbored scattered mutations. We evaluated the prognostic significance of these four identified genes for OS and PFS. The survival outcomes of patients with EP300 mutations were significantly poorer than those of patients with wild-type EP300, and patients harboring mutations in either BTG1 (OS, P = 0.017; PFS, P = 0.12), MYD88 (OS, P = 0.016; PFS, P = 0.069), or IRF4 (OS, P = 0.045; PFS, P = 0.018) had favorable OS (Figure S10A) and PFS (Figure S10B) relative to individuals with their wild-type counterparts.

In the discovery cohort, we identified three subtypes of PCNSL: BMI (n = 16, 27.59%), E3 (n = 9, 15.52%), and UC (n = 33, 56.90%). The E3 subtype was defined as those with mutations in the EP300 gene. The PCNSLs that carried mutations in at least two of the three genes, BTG1, MYD88, and IRF4, were characterized as the BMI subtype. The other PCNSLs that did not fit into either of these categories were classified under the UC subtype (Unclassified). The three subtypes differed significantly in OS (P < 0.001) and PFS (P = 0.043), the BMI subtype (P < 0.001, P = 0.023) had much more favorable outcomes than the E3 and UC subtypes did, and the E3 subtype (P < 0.001, P = 0.037) had far worse outcomes than the BMI and UC subtypes did (Fig. 4B). These differences between the three subtypes of PCNSL were detected in the validation cohort (OS, P = 0.020; PFS, P = 0.048; n = 82, tumor-only sample), and patients with E3 had significantly shorter survival times than those with BMI (OS, P = 0.007) or UC (OS, P = 0.043) (Fig. 4C). Similar results were observed in the entire cohort (n = 140, OS, P < 0.0001; PFS, P = 0.0004; Figure S10C): 26 patients were classified as BMI (18.57%), 27 as E3 (19.29%), and 87 as UC (62.14%). Furthermore, the raw WES or WGS data, along with follow-up data from an independent cohort of 36 Chinese patients with PCNSL [17], were obtained. The three identified subtypes significantly differed in OS (P = 0.0004; Fig. 4D). Specifically, the BMI subgroup tended to have more favorable outcomes than the E3 and UC subgroups (P = 0.0004 and P = 0.005, respectively), whereas the E3 subtype exhibited poorer outcomes compared with the BMI and UC subgroups (P = 0.0004 and P = 0.0231, respectively). We additionally conducted sensitivity analyses stratified by sequencing platform (WGS vs. WES). The WGS-based analysis demonstrated significant differences in OS among the three subtypes (P = 0.0050): the BMI subtype had significantly more favorable outcomes than the E3 and UC subtypes (P = 0.0101 and P = 0.0500, respectively), whereas the E3 subtype had significantly worse outcomes compared with both BMI and UC subtypes (P = 0.0101 and P = 0.0280, respectively) (Figure S11A). Due to the limited sample size, the WES-based analysis (n = 12) did not reveal statistically significant differences among the three subtypes (P = 0.1774); however, a consistent trend was observed, with the E3 subtype showing the poorest prognosis and the BMI subtype the most favorable (Figure S11B). While these findings support the potential clinical relevance of our molecular classification, they warrant further validation in larger cohorts given the limited sample size.

BTK inhibitors have been shown to improve the prognosis of PCNSL patients by blocking B-cell receptor signaling pathways, which influence the growth and survival of B cells. Consequently, we further analyzed whether the use of BTK inhibitors affects the robustness of our classification. There were no significant differences in OS (P > 0.05, Figure S12A) or PFS (P > 0.05, Figure S12B) between patients who received BTK inhibitors (n = 8) and those who did not receive BTK inhibitors (n = 132). Additionally, after initial treatment options (HD-MTX combined with IDA, HD-MTX combined with R, HD-MTX combined with IDA and R, or a combination of BTK inhibitors) and consolidated therapy (WBRT, stem cell transplant) were adjusted, multivariable Cox regression analysis revealed that the use of different treatment options did not affect the robustness of our classification (Table S3).

When High Grade B-cell lymphoma cases (double hit) were excluded, significant differences in OS among the three subtypes remained (P < 0.0001): the BMI subtype had significantly more favorable outcomes than the E3 and UC subtypes (P = 0.0011 and P = 0.0004, respectively), whereas the E3 subtype had significantly worse outcomes compared with both BMI and UC subtypes (P = 0.0101 and P = 0.0158, respectively) (Figure S13A). Similar results were observed in both the discovery (Figure S13B) and validation cohorts (Figure S13C).

For the E3 subtype, the mean mutation prevalence in the PCNSLs of Asian patients [3, 16, 17] (16.98%) was significantly higher than that in the PCNSLs of Western patients [4, 5, 14, 57,58,59] (4.53%) (Fig. 4E). Patients with the E3 subtype had significantly poorer survival, which was not affected by the mutation site (OS, P = 0.82; PFS, P = 0.69; Fig. 4F). Univariate Cox regression analysis revealed that PCNSL molecular subtypes are associated with clinical outcomes (Table S4), which was also confirmed by multivariate Cox regression hazard ratio analysis after adjusting for important confounders (age, IELSG, MSKCC, LDH levels, deep brain location, CSF protein levels, and consolidated therapy [WBRT, stem cell transplant]) (Fig. 4G). These findings provide supportive evidence for a potential association between PCNSL molecular subtypes and clinical outcomes. The key demographic and clinical characteristics of the three MS patients are summarized in Table S5.

We also constructed a Sankey diagram to visualize the correspondence between our samples and the previously described genetic subtypes of PCNSL and DLBCL (Figure S14). The Chinese PCNSL subtypes were significantly distinct from those previously described. Taken together, these results indicate that three unique molecular subtypes of Chinese PCNSL are associated with clinical outcomes.

Genomic landscape and tumor microenvironment across PCNSL subtypes

We further aimed to determine whether these three molecular subtypes of Chinese PCNSL patients could affect the mutational landscape, oncogenic pathways and tumor microenvironment. Intergroup mutation distribution analysis was performed to identify unique and shared mutations. A total of 6,102, 41,548, and 73,440 mutated genes were identified in the BMI, E3, and UC subtypes, respectively, with 518 mutated genes shared among the three subtypes (Fig. 5A and S15A). Key candidate cancer genes in the 140 PCNSL patients, grouped by these three subtypes, are presented in Fig. 5B. The mutation frequencies of EP300, MYD88, IRF4, and so on were significantly different (P < 0.05) among these three subtypes. Arm-level CNAs among the samples of the three subtypes are displayed in Fig. 5C, showing significant differences in recurrent deletions (in 6q and 12p) but not in amplifications (Fig. 5D). The results of significant recurrent focal amplifications and deletions were visualized by GISTIC 2.0 in the BMI, E3, and UC subtypes (Figure S15B). There was no significant difference (P > 0.05) in the MATH (Fig. 5E) or MSI (Fig. 5F) scores among the three subtypes, but the TMB was greater in the E3 subgroup than in the BMI (P < 0.0001) or UC subgroup (P < 0.001) (Fig. 5G). Furthermore, the VAF was lower in the E3 subgroup than in the BMI (P < 0.0001) or UC subgroup (P < 0.001) (Fig. 5H).

Fig. 5: Mutation profiles across three molecular subtypes of PCNSL.
Fig. 5: Mutation profiles across three molecular subtypes of PCNSL.The alternative text for this image may have been generated using AI.
Full size image

A Venn diagram showing unique and shared mutations among the three identified PCNSL subtypes. B Number and frequency of recurrent mutations and the gene‒sample matrix of recurrently mutated genes among the three PCNSL subtypes. The relative abundance across the molecular subtypes is displayed on the right. C Arm-level copy number alterations among the three molecular subtype samples are displayed. D The frequencies of amplifications and deletions were compared among the three molecular subtype samples. E The levels of mutant-allele tumor heterogeneity (MATH) were compared among the three molecular subtype samples. F The levels of microsatellite instability (MSI) were compared among the three MS samples. G The tumor mutational burden (TMB) was compared among the three MS samples. H The levels of variant allele frequency (VAF) were compared among the three MS samples. Independent-samples t tests and chi-squared tests were used. *P < 0.05; **P < 0.01; ***P < 0.001; ns P > 0.05.

The contributions of mutational signatures across the three subtypes are illustrated in Figure S15C. The three signatures extracted from the E3 subtype samples displayed cosine similarities of 82.0%, 93.8%, and 95.2% to the COSMIC signatures 6, 10a, and 45, respectively (Figure S15D). In contrast, those extracted from BMI subtype samples presented similarities of 77.3%, 81.7%, and 58.3% to COSMIC signatures 5, 84, and 87, respectively (Figure S15D). The signatures from the UC subtype samples were 80.8%, 70.3%, and 91.7% and similar to the COSMIC signatures 5, 1, and 45 (Figure S15D), respectively.

H&E and immunohistochemical staining of CD2, CD5, CD10, CD19, CD20, CD79a, Ki-67, BCL2, BCL6, MUM1, c-Myc, and p53 from samples of different PCNSL subtypes revealed significant malignant progression in patients with the E3 subtype of PCNSL, as shown in Figure S16A and Table S6. Although PCNSLs generally have a high proliferative tendency and high Ki-67 expression, the E3 subtype is more poorly differentiated and aggressively growing, as reflected by its Ki-67 expression levels, compared with the other two subtypes. Immunostaining revealed a greater presence of CD2 + , CD5 + , CD10 + , CD19 + , CD20 + , and CD79a+ cells in the E3 subtype. Additionally, a greater presence of BCL2 + , BCL6 + , MUM1 + , and c-Myc+ cells, which are markers associated with the diagnosis of B-cell lymphomas [60], was observed in the E3 subtype.

To explore potential therapeutic strategies for the genetic subtypes of PCNSL, we examined groups of genetic aberrations that target oncogenic signaling pathways (Figure S16B). Genetic events affecting NF-κB regulators that negatively regulate the stability of NF-κB-dependent mRNAs were detected in 85.2% and 84.6% of the E3 and BMI subtype samples but not in the UC subtype samples (P < 0.05). These findings suggest that the E3 and BMI subtypes may be more responsive to BTK inhibitors [61]. The PI3 kinase pathway, a pathway that can indirectly activate NF-κB, is genetically altered in 66.7% of E3 patients [62]. We also noted a higher frequency of genetic alterations in chromatin histone modifiers (100%), RTK signaling (77.8%), protein homeostasis/ubiquitination (66.7%), cell cycle pathways (77.8%), genome integrity (82.5%), Wnt/B-catenin signaling (59.3%), the chromatin SWI/SNF complex (70.4%), and splicing (37%) in E3 patients. Therefore, a combination of cyclin D-Cdk4,6 and PI3 kinase inhibitors might be beneficial for patients with the E3 subtype.

Taken together, these results suggest that the three molecular subtypes of PCNSL in Chinese patients each have a unique genomic landscape, tumor microenvironment, and oncogenic pathways.

Mutational landscape and oncogenic pathways in PCNSLs with different sites of onset

Next, we aimed to determine whether the site of PCNSL onset in Chinese patients influences the mutational landscape and oncogenic pathways. Figure 6A shows that the areas affected by PCNSL included the basal ganglia, brainstem, cerebellum, corpus callosum, frontal lobe, occipital lobe, parietal lobe, temporal lobe, thalamus, ventricles, and combinations of multiple sites. The most frequently involved sites were the frontal lobe (22.14%), temporal lobe (15.71%), and basal ganglia (8.57%).

Fig. 6: Mutational profiles across different sites of onset in PCNSL.
Fig. 6: Mutational profiles across different sites of onset in PCNSL.The alternative text for this image may have been generated using AI.
Full size image

A The number and percentage of PCNSL patients with various molecular subtypes at different disease onset sites. B Venn diagram of unique and shared mutations among PCNSLs with different sites of onset. C The number and frequency of recurrent mutations, along with a gene‒sample matrix of recurrently mutated genes in PCNSL patients at different sites of onset, are presented. The relative abundance across the molecular subtypes is displayed on the right. D Arm-level copy number alterations across different sites of onset are displayed. The frequencies of amplifications and deletions were compared. E The levels of mutant-allele tumor heterogeneity (MATH) were compared among the samples representing different sites of onset. F Comparisons of the tumor mutational burden (TMB) among the different sites of onset samples were performed. G The levels of microsatellite instability (MSI) were compared among the samples representing different sites of onset. H Pathways affected by oncogenes in PCNSL patients with different sites of onset. The chi-squared test and one-way ANOVA were used. *P < 0.05. Comparisons without asterisks are not statistically significant (P > 0.05).

First, we found that there was no significant difference in the proportion of different disease sites among the three subtypes (chi-squared test, corpus callosum, P = 0.887; basal ganglia, P = 0.645; thalamus, P = 0.317; brainstem, P = 1.000; ventricle, P = 0.631; front lobe, P = 0.576; multiple, P = 0.949; temporal lobe, P = 0.214; cerebellum, P = 0.831; occipital lobe, P = 0.778; parietal lobe, P = 0.887).

Next, we explored whether the tumor sites differed between the discovery and validation cohorts. As shown in Figure S17A, there was no significant difference in the proportions of different disease sites between the discovery and validation cohorts (chi-square test P = 0.610). Furthermore, the proportions of deep brain lesions in the discovery and validation cohorts were not significantly different (P = 0.478).

Sixteen mutated genes were represented among all the sites (Fig. 6B). Key candidate cancer genes in the 140 PCNSL patients, grouped by site of disease onset, are presented in Fig. 6C. Arm-level CNAs among the samples of these sites are displayed in Fig. 6D, and significant differences in both recurrent amplifications (2q, 10q, and 10p) and deletions (9p) were observed. The difference in MATH scores across these sites was not significant (P > 0.05; Fig. 6E). However, significant differences in TMB (P < 0.05; Fig. 6F) and MSI scores (P < 0.05; Fig. 6G) were observed between several sites.

Next, we further divided the tumor sites into deep brain and shallow brain tumor subgroups. A total of 854 (81%) mutated genes were represented between the deep brain and shallow brain tumor subgroups (Figure S17B). The mutational landscape between deep brain and shallow brain tumors is shown in Figure S17C. The mutation frequencies of LRP1B, FAT1, KMT2B, CIC, PTPRB, MAP3KI, ARID1B, and CPS1 in the deep brain and shallow brain tumors were significantly different (P < 0.05). Arm-level CNAs between deep brain and shallow brain tumors are displayed in Figure S17D, and no significant differences (P > 0.05) in either recurrent amplification or deletion were observed. There was no significant difference (P > 0.05) in the MATH (Figure S17E), TMB (Figure S17F), or MSI (Figure S17G) score between deep brain and shallow brain tumors.

We further conducted pathway enrichment analysis (Fig. 6H) and revealed significant involvement of NF-κB signaling in the tumors of patients with onset in the parietal lobe and brainstem, each with an involvement rate of 100%. Furthermore, we noted a significant difference (P < 0.05) in the frequency of genetic alterations in the chromatin SWI/SNF complex but not in the other pathways (Figure S17H). Taken together, these results indicate that the mutational landscape in Chinese patients with PCNSL varies between sites of onset. Larger studies are needed to confirm this observation.

Discussion

Owing to the genetic, phenotypic, and tumor microenvironment heterogeneity of PCNSL, identifying classification and prognostic biomarkers for patients is extremely challenging, which in turn makes developing effective new therapies challenging. Here, we performed a study of 176 Chinese PCNSL patients that was adequately powered to expand the genomic landscape and explore its clinical significance utilizing an unprecedented sample size and a multicenter approach. We defined recurrent mutations, CNAs, and associated cancer candidate genes in Chinese PCNSL patients and compared these comprehensive genetic signatures with those of PCNSL patients and systemic DLBCL patients of other races, revealing that genetic heterogeneity is a defining feature of PCNSL. Our results highlight the complexity of PCNSLs, which have a median of 19.06 mutations/Mb and a median of 949.5 variants per sample.

Although the genetic variation profiles of the PCNSL discovery and validation samples were similar, notable differences were evident in the mutation frequencies of certain genes, the TMB, the MSI, and the MATH score. These discrepancies may largely be attributed to differences in specimen type—fresh frozen tissue in the discovery cohort versus FFPE in the validation cohort—and the lack of paired samples in the latter [63]. FFPE specimens are known to introduce characteristic artefacts, including DNA fragmentation, crosslinking, and cytosine deamination, which can compromise sequencing quality and affect downstream variant calling. Furthermore, genetic heterogeneity between patient populations [15] and biological differences between PCNSL and DLBCL [13] may be the main reasons why the published classifications for DLBCL and PCNSL are not applicable to Chinese PCNSL patients. Currently, the classification frameworks for systemic DLBCL offer valuable insights into the underlying biology based on molecular mechanisms. However, it is important to note that biological insights do not necessarily correlate with clinical outcomes, especially given that treatment paradigms may evolve over time.

Importantly, in our study, we identified three potential molecular subtypes of PCNSL in Chinese patients: BMI, E3, and UC. Distinct clinical outcomes, activated cellular pathways, and alterations in the genetic landscape distinguish the three subtypes and could be utilized to personalize therapy for patients with different subtypes. For the E3 subtype, the mean mutation rate was significantly higher in the PCNSLs of Chinese and Japanese patients (16.88%) than in those of Western PCNSL patients (4.53%). This difference may be attributed to variations in age, lifestyle, and EBV infection rates [17, 41, 64]. Patients with the E3 subtype had significantly poorer survival outcomes, a finding that is consistent with reported findings in DLBCL [65]. EP300 and CREBBP are two closely related members of the KAT3 family of histone acetyltransferases that function similarly [66]. However, in this study, the survival of patients with EP300 mutations was significantly poorer, whereas CREBBP mutations did not contribute to an inferior prognosis (Figure S4). This discrepancy might stem from the distinct roles of EP300 and CREBBP in PCNSL. Specifically, in a recent study, EP300, but not CREBBP, was reported to play an essential role in supporting the viability of classical Hodgkin’s lymphoma by directly modulating the expression of the oncogenic MYC/IRF4 network, surface receptor CD30, immunoregulatory cytokine interleukin 10, and immune checkpoint protein PD-L1 [67]. The molecular mechanisms through which EP300 mutations facilitate the progression of PCNSL are still not understood. We plan to explore these mechanisms in a forthcoming study.

Furthermore, the higher mutation frequencies observed (Fig. 3C) in the Japanese cohort may be partly explained by differences in sample size, sample type, patient age, and population genetics. The Japanese study included 41 fresh tumor tissues, whereas our cohort comprised 58 fresh frozen and 82 FFPE samples (total n = 140). Differences in sample preservation methods can affect DNA quality, sequencing performance, and mutation detection rates. In addition, the mean age of patients was higher in the Japanese cohort (63 vs. 59 years), which may reflect differences in clinical characteristics and disease biology [64]. Underlying genetic differences between Japanese and Chinese populations may also contribute to these discrepancies.

This study has several limitations. First, the proportion of unclassified PCNSLs (UC subtypes) is high, possibly due to the lack of multiomics data needed for classification. However, the three-class classification based on WES proposed in this study has strong clinical practicality and translational value, as it relies solely on the four genes identified through WES. In other words, clinicians can predict patient outcomes and further perform individualized management of PCNSLs by detecting mutations in these four genes without the need for additional multiomics testing. Second, the three identified molecular subtypes can be used to predict patient outcomes. However, it remains challenging to determine whether these three molecular subtypes can still be used to accurately distinguish biological subtypes without additional RNA-seq data layers.

In summary, by performing genomic sequencing on tumor specimens from 176 Chinese PCNSL patients, we identified three potential molecular subtypes that can be used to predict patient outcomes. These genetic signatures associated with PCNSL outcomes illuminate the path toward individualized management of PCNSL patients and the discovery of novel treatment strategies for this challenging disease.