Abstract
The genetic landscape and molecular subtypes of primary central nervous system lymphoma (PCNSL) remain inadequately characterized, presenting a major obstacle to the development of effective therapeutic strategies. We conducted a genomic study of 176 PCNSLs from five tertiary centers with long-term follow-up to expand the genomic landscape and identify new molecular subtypes. We first confirmed that the molecular subtyping of diffuse large B-cell lymphoma, as previously published, may not be fully applicable to Chinese PCNSL patients. We then identified (n = 58) and validated (n = 82) three prominent genetic subtypes related to different clinical and molecular features of PCNSL, and further confirmed by an independent external Chinese PCNSL cohort (n = 36). We called these BMIs (from the co-occurrence of mutations in two genes among BTG1, MYD88, and IRF4), which are associated with favorable outcomes; E3s (so-called EP300 mutations), which are associated with unfavorable outcomes; and UCs (unclassified, without characteristic mutations). Importantly, EP300 was mutated in more Asians (16.98%) than in Western PCNSLs (<4.53%), resulting in unfavorable outcomes independent of the specific mutation site. Our analysis comprehensively reveals the genomic landscape of Chinese PCNSL and emphasizes the clinical value of molecular classification for improving precision medicine strategies.

Similar content being viewed by others
Introduction
Primary central nervous system lymphoma (PCNSL), a rare subtype of extranodal non-Hodgkin lymphoma, is associated with an unfavorable prognosis even when treated with standard high-dose methotrexate (HD-MTX)-based regimens [1, 2]. PCNSL is a unique lymphoma that differs from other tumors and is characterized as a genetically heterogeneous disorder marked by a variety of low-frequency mutations, somatic copy number alterations, and structural variants [3,4,5,6]. Thus, combining standard immunochemotherapy with promising novel agents that target specific pathways via different molecular clusters may improve patient prognosis [7, 8].
Pathologically, nearly 95% of PCNSLs are diffuse large B-cell lymphomas (DLBCLs). Recently, DLBCL has been categorized into various molecular clusters on the basis of genomic sequencing and RNA sequencing, including quartet categorization by L.M. Staudt’s team [9], quintet categorization by M.A. Shipp’s team [10], L.M. Staudt’s team’s refined septet method [11], and the LymphPlex classification proposed by W.L. Zhao’s team [12]. However, PCNSL has been proven to be a biological entity that is molecularly distinct from DLBCL [13], so these molecular clusters of DLBCL may not be applicable to PCNSL. Notably, the integration of genome-wide data from multiomic studies by A. Alentorn et al. [14] showed four molecular patterns of PCNSL mainly using gene expression data. These patterns have a distinctive prognostic impact, providing a basis for future clinical stratification and subtype-based targeted interventions. Owing to the molecular heterogeneity between various ethnic groups of PCNSL patients [3, 15], the applicability of the proposed classifications for PCNSL to other populations requires further confirmation. No effective biomarker or molecular classification scheme exists to tailor therapies for individual PCNSL patients. Furthermore, the issue of PCNSL molecular heterogeneity in Chinese patients has not been adequately addressed, primarily because the data have come from small, single-center studies of Chinese PCNSL patients [16, 17].
To address these issues, we conducted whole-exome sequencing (WES) on specimens obtained from 140 patients with newly diagnosed PCNSL, and divided into a discovery cohort and a validation cohort. Additionally, WES/Whole Genome Sequencing (WGS) data were obtained from 36 Chinese PCNSL patients at two other tertiary care cancer centers [17], serving as an independent external cohort. To enable clinical translation, we defined three molecular subtypes based solely on single-nucleotide variations (SNVs), each associated with distinct outcomes. This study delineates the genomic landscape of Chinese PCNSL and proposes a potential classification framework that may inform precision oncology.
Materials/Subjects and Methods
Sample collection and clinicopathological features of PCNSL patients
In this study, a total of 140 newly diagnosed PCNSL patients were retrospectively enrolled from January 2010 to December 2020 from three independent medical centers, namely, Huashan Hospital of Fudan University, Renji Hospital of Shanghai Jiao Tong University, and Shanghai Cancer Center of Fudan University. The study comprised two cohorts: the discovery cohort, which included 58 patients from Huashan Hospital of Fudan University, Shanghai; and the validation cohort (n = 82), which consisted of 25 patients from Huashan Hospital of Fudan University, 51 patients from Renji Hospital of Shanghai Jiao Tong University, and 6 patients from Shanghai Cancer Center of Fudan University. In the discovery cohort, paired blood samples were also obtained for analysis in conjunction with each tumor sample. In the validation cohort, only unpaired Formalin-Fixed, Paraffin-Embedded (FFPE) tumor tissue samples were obtained for analysis. Specimens from patients with PCNSL in both the discovery and validation cohorts were subjected to WES for genomic profiling.
Additionally, WES (n = 12) /WGS (n = 24) data of 36 patients newly diagnosed with PCNSL from Fujian Cancer Hospital and The First Affiliated Hospital of Fujian Medical University between February 2012 and October 2020 were included in this study, as an independent external Chinese PCNSL cohort. The general clinical characteristics of these 36 PCNSL patients and the sequencing protocol have been previously described [17].
None of the patients had received steroid treatment for PCNSL, and PCNSL tumors were diagnosed on the basis of the World Health Organization (WHO) criteria [13]. All patients underwent a 2-deoxy-2[F-18] fluoro-D-glucose positron emission tomography/computed tomography (FDG PET/CT) scan and bone marrow aspiration to exclude systemic tumor manifestation on the basis of the guidelines of the International PCNSL Collaborative Group [18] and the European Association of Neuro-Oncology [19]. PCNSL patients for whom ocular involvement was present were excluded. Each tumor tissue sample was independently reviewed by at least 2 pathologists to confirm that the tumor sample was histologically consistent with the PCNSL. All patients underwent long-term follow-up, which ran to December 30, 2022.
Treatment Regimens
All PCNSL patients were treated with HD-MTX-based combination immunochemotherapy. The following chemotherapy protocols were used for induction: HD-MTX combined with idarubicin (IDA), HD-MTX combined with rituximab (R), HD-MTX combined with IDA and R, or a combination of Bruton’s tyrosine kinase (BTK) inhibitors. Whole-brain radiation therapy or stem cell transplantation is a form of consolidation therapy. The detailed therapeutic schedule was the same as that previously described [20].
Sample processing
The collection and processing of samples for this study were carried out by the Department of Pathology or Department of Neurosurgery at Huashan Hospital of Fudan University, Renji Hospital of Shanghai Jiao Tong University, and Shanghai Cancer Center of Fudan University. In the discovery cohort, surplus tumor tissues and paired blood samples obtained during surgical procedures were collected. To guarantee the quality of these samples, each specimen was promptly and accurately labeled with relevant patient information within 30 min of collection and then immediately snap-frozen in liquid nitrogen. These samples were stored at -80 °C until being shipped to GenomiCare Biotechnology (Shanghai, China). In the validation cohort, the residual tumor tissue specimens were subjected to FFPE. These FFPE samples were transported in a dry ice container to Sinotech Genomics Technologies in Shanghai, China, accompanied by a time and temperature tracker to ensure proper monitoring during shipment. Upon arrival, the samples were promptly stored in liquid nitrogen to maintain their integrity until further processing.
DNA extraction
For frozen fresh tissue and matched peripheral blood samples, total genomic DNA (gDNA) was extracted via the Maxwell RSC Blood DNA Kit (AS1400, Promega) on a Maxwell RSC system (AS4500, Promega) following the manufacturer’s instructions. For FFPE tissue, total gDNA was extracted via the QIAamp DNA FFPE Tissue Kit (56404, Qiagen) following the manufacturer’s instructions. The integrity and concentration of the gDNA were determined via the Qsep100 System (BIOptic, China) and a Qubit 3.0 fluorometer (Thermo Fisher Scientific). The OD260 was measured with a NanoDrop One (Thermo Fisher Scientific).
WES library preparation and sequencing
For frozen tissue and matched peripheral blood samples, exome DNA was extracted via the SureSelect Human All Exon V7 Kit (5991-9039EN, Agilent). The library was subsequently prepared via the SureSelectXT Low Input Target Enrichment and Library Preparation system (G9703-90000, Agilent). For the FFPE sample, the exome DNA was captured via the SureSelect Human All Exon V8 Kit (5191-6874, Agilent) and prepared into a library via the SureSelect XT HS2 DNA Reagent Kit (G9983A, Agilent). The library was validated via the use of an Agilent 2100 Bioanalyzer and a Qubit 3.0 fluorometer. Paired-end 150 bp read sequencing was performed on an Illumina NovaSeq 6000. Image analysis and base calling were performed via onboard RTA3 software (Illumina). During bioinformatic processing, ComBat batch correction was applied at the feature matrix level to minimize residual systematic differences between FF and FFPE samples after initial quality control and filtering.
Somatic mutation calling
Quality control was conducted via FastQC software. The cleaned and trimmed FASTQ files were aligned to the UCSC human reference genome (hg19) via the Burrows–Wheeler Aligner (BWA) with default parameters. For each paired blood sample, SNVs and short insertions/deletions (INDELs) were identified via Sentieon TNseq [21] with default parameters. The identified somatic mutations were removed if they did not satisfy any one of the following criteria: a variant allele frequency (VAF) of at least 0.05, support from a minimum of three reads, annotation by the Variant Effect Predictor (VEP) package [22], and transformation into a mutation annotation format (MAF) file for further analysis with maftools [23].
When a paired normal sample (FFPE sample) was not available, a panel of normal (PON) files was used as a paired control to call the SNVs and InDels. A panel of normal files was created using the reads from clinical blood samples that showed no evidence of tumor contamination from 100 different individuals collected by Genomicare Biotechnology (Shanghai). The reads were retained when they passed the standard WES quality control, as described in the WES subsection. The reads for reference alleles and alternative alleles were gathered for each candidate site in all 100 samples, except for false positives, which were those exhibiting a candidate allele frequency exceeding 0.05 in more than 5 samples.
We applied an additional PON mask for all the candidate somatic SNV and INDEL sites to exclude germline mutations, as previously reported [10, 24]. In general, in addition to the standard retention rules, mutations were retained if they were known driver alterations highlighted as being biologically significant in the COSMIC database. Mutations with a tumor mutation frequency ≥20% were also retained. Other germline mutations were retained if their existence ratio in the Exome Aggregation Consortium (ExAC) exac_all was < 0.00005, their existence ratio in the Asian population exac_eas was < 0.0003, and their existence ratio in previous GenomiCare samples germline_gc was < 0.0006. Retained mutations were considered somatic mutations.
Thus, the PON file and PON mask limit possible recurrent artifacts of sequencing and the presence of germline mutations in somatic mutations that are falsely selected owing to a lack of paired blood data.
Somatic copy number alteration calling and profiling
Using the method described in the ExomeCNV package [25], we implemented a normalized depth‒coverage ratio method to identify CNVs in paired samples. To correct for five potential biases (the size of exonic regions, batch effects, both the quantity and quality of sequencing data, local GC content, and genomic mappability) that affect the raw read counts, we utilized a standard normal distribution model. Genes with a haploid copy number ≥3 or ≤1.2 were defined as amplified or deleted, respectively, and a minimum tumor content (purity) of 20% was needed.
Significance analysis of recurrent somatic copy number alterations
Sequenza [26] was used to generate the segment files. The segment files were used as the input for GISTIC2.0 [27] to detect recurrent arm-level and focal peaks in copy number alterations via the parameters -js 40 -conf 0.99 -ta 0.2 -td 0.2 -brlen 0.8, with other parameters set to the defaults.
Cancer-driver gene mutations filtering
After the aforementioned steps of calling SNVs and CNVs, we further filtered the resulting mutated genes by intersecting them with a designated group of genes recognized as cancer-related genes. This group of genes was sourced from two prominent public cancer gene databases. Part 1 of the group was obtained from the OncoKB curated cancer gene list (https://www.oncokb.org/cancerGenes) [28]. These genes are classified as cancer genes by OncoKB on the basis of their inclusion in various sequencing panels, the Sanger Cancer Gene Census, or the criteria established by Vogelstein et al. [29]. Part 2 was derived from the ranked CIViC gene candidates table (https://github.com/griffithlab/civic-server/blob/master/public/downloads/RankedCivicGeneCandidates.tsv) [30]. The final list of cancer-related genes consisted of all genes in Part 1 and genes in Part 2 only when its ‘panel_count’ value (in the CIViC tsv file listed above) was ≥ 2.
Bioinformatic analysis
The mutational signature classification was based on all single-base substitution (SBS) signatures of the COSMIC mutational signatures [31]. Tumor mutational burden (TMB) was defined as the aggregate count of somatic nonsynonymous mutations, including SNVs or INDELs, within the tumor exome for each patient [32]. This count was then divided by the overall size of the targeted regions (35 for WES), resulting in the TMB, expressed in counts per megabase (counts/Mb). The mutant-allele tumor heterogeneity (MATH) score [33] was calculated on the basis of the width of the VAF distribution, utilizing maftools for the analysis [23]. The pathway map was generated via maftools as previously described [34].
Classification model for identifying molecular types
CoxNet survival analyses and least absolute shrinkage and selection operator (LASSO) regressions were performed via the scikit-learn package in Python (version 3.9.7). K‒M survival curves (log-rank test) were used for overall survival (OS) analysis via the survival and survminer packages in R (version 4.1.2).
-
(1)
Selection of 24 initial features in the discovery cohort
Molecular features derived from SNVs were selected to develop subsequent classification models.
First, to identify genes with high variability for use in clustering, those with low mutation frequencies were filtered out. This process led to the selection of genes with a mutation frequency greater than 5% for clustering purposes. A total of 150 genes were screened for further analysis.
In the second phase, OS was analyzed as the dependent variable, with the aforementioned 150 genes serving as independent variables. The feature selection steps were as follows: (1) a comprehensive approach involving CoxNet survival analyses and LASSO regressions was employed to identify characteristic genes; (2) the importance score of each feature in the model was calculated and ranked in descending order; (3) features that ranked in the top 50% were recorded as potential features once; (4) processes 1-3 were repeated 500 times, and all potential features were recorded; and (5) among the feature sets obtained in the above process, those recorded more than 250 times were selected.
Ultimately, 24 feature genes, namely, EP300, KMT2D, MPEG1, TUSC3, MYD88, PTCH1, DST, ZNF521, BTG1, NOTCH2, PRDM1, ADGRL3, CDH11, CREBBP, ATRX, TCL1A, GNA13, ETV6, IRF4, HIST2H3D, CD79B, GRM8, MYH11, and SPTA1, were selected.
-
(2)
Differential feature selection
To determine the most suitable features among the initial 24 for the classification model, Kaplan‒Meier analyses were conducted. Features qualifying for further analysis met the following criteria: a log-rank test P value of less than 0.2 and any group with more than 5 patients. Ultimately, 6 genes, namely, EP300, CREBBP, DST, MYD88, BTG1, and IRF4, were identified.
-
(3)
Model training utilizing random combinations of 6 differential features
To facilitate prognostic assessment and individualized management in accordance with clinical practice guidelines, patients with PCNSL were categorized into three prognostic subgroups-good, intermediate, and poor-based on their actual OS. Six differential features were ultimately selected for building a classification model to classify patients into the three subtypes. These features were randomly combined to form three types, yielding a total of 288 combinations. To identify the most appropriate classification model among these combinations, Kaplan‒Meier analyses were performed. Models were chosen on the basis of a log-rank test P value of less than 0.05 and subgroup sample sizes greater than 8. Ultimately, seven candidate molecular subtype combinations met these criteria: (molecular markers shown in the format of subtype 1 vs. subtype 2 vs. subtype 3): ① EP300 (predominant) vs. MYD88/BTG1/IRF4 ≥ 2 vs. others; ② EP300 vs. MYD88/BTG1/IRF4 ≥ 2 (predominant) vs. others; ③ BTG1 (predominant) vs. CREBP/DST ≥ 1 vs. others; ④ BTG1 vs. CREBP/DST ≥ 1 (predominant) vs. others; ⑤ BTG1/IRF4 ≥ 1 vs. EP300/CREBP/DST ≥ 1 vs. others; ⑥ BTG1/IRF4 ≥ 1 vs. EP300/DST ≥ 1 vs. others; and ⑦ CREBP/DST ≥ 1 vs. MYD88/BTG1/IRF4 ≥ 2 vs. others.
-
(4)
Validation of candidate molecular subtypes
The WES data and OS data of 82 patients in the validation cohort were included to validate the candidate molecular subtypes. The mutation frequencies of EP300, CREBBP, DST, MYD88, BTG1, and IRF4 in the validation cohort were greater than 5%. Kaplan–Meier survival analysis was used to validate the seven candidate molecular subtype combinations. The performance of [EP300 (predominant) vs. MYD88/BTG1/IRF4 ≥ 2 vs. others] (3-way P = 0. 0202) and [EP300 vs. MYD88/BTG1/IRF4 ≥ 2 vs. others (predominant)] (3-way P = 0. 0346) was better than those of the other five molecular subtypes (3-way P > 0.05 for both). Additionally, compared with the [EP300 vs. MYD88/BTG1/IRF4 ≥ 2 vs. others (predominant)] molecular subtype, the Kaplan–Meier survival curve of [EP300 (predominant) vs. MYD88/BTG1/IRF4 ≥ 2 vs. others] showed no crossover. Finally, the performance of [EP300 (predominant) vs. MYD88/BTG1/IRF4 ≥ 2 vs. others] was the best, so we selected it as the molecular subtype of Chinese PCNSL patients. In other words, PCNSL patients with both EP300 and BMI (BTG1, IRF4, and MYD88) variants were categorized into the E3 subtype.
Hematoxylin & eosin and immunohistochemical staining
PCNSL tumor tissue was fixed with 10% paraformaldehyde and embedded in paraffin. The paraffin-embedded tissues were cut into 4-µm-thick sections. These sections were then subjected to staining with hematoxylin and eosin (H&E) or for immunohistochemistry. For H&E staining, the sections were stained with hematoxylin and eosin following standard protocols (ab245880, Abcam). Immunohistochemistry was performed for ki-67 (ab15580, Abcam), CD2 (ab314761, Abcam), CD5 (ab75877, Abcam), CD10 (ab256494, Abcam), CD19 (ab134114, Abcam), CD20 (ab78237, Abcam), CD79a (ab79414, Abcam), BCL2 (MA5-11757, Thermo Scientific), BCL6 (PA5-14259, Thermo Scientific), MUM1 (ab247079, Abcam), c-Myc (MA1-980, Thermo Scientific), and p53 (MA5-14516, Thermo Scientific) via an automated Leica BOND-III staining system. H&E and immunohistochemistry images were obtained with a KFBIO scanner (KF-PRO-005-EX, Zhejiang, China) and visualized via KFBIO SlideViewer software. Each staining was independently reviewed by at least 2 pathologists.
In situ hybridization of fluorescence
Fluorescence in situ hybridization was performed in accordance with standard protocols via the following commercial probes: MYC break-apart, BCL-2 break-apart, and BCL-6 break-apart probes (LBP Medicine Science & Technology Co., Ltd., Guangzhou, China). Staining was independently evaluated by two hematopathologists, and discrepancies were resolved by another hematopathologist.
HIV, HBV, and EBV detection
Antibody and antigen levels of human immunodeficiency virus (HIV) were measured via an Elecscy HIV combi PT assay kit (Roche, Germany) with a Cobas e 601 analyzer (Roche). Hepatitis B virus (HBV)-specific PCR was performed via an HBV virus nucleic acid test kit (Sansure Biotech, China) with an Applied Biosystems 7500 Real-Time PCR System (Thermo Scientific). EBV-encoded RNA in situ hybridization (EBER-ISH) was performed to assess the presence of Epstein-Barr virus (EBV) in tumor tissue. The testing utilized a fluorescein isothiocyanate -coupled specific peptide nucleic acid probe (Roche Diagnostics, Mannheim, Germany).
Statistical analysis
All statistical analyses were performed using R version 4.1.2, GraphPad Prism version 9, and Python version 3.9.7. The specific statistical analyses employed for each figure are detailed in the respective legends. We used PASS software to calculate the sample size for our study, with a significance level of 0.05 and a power of 0.80. The observed group proportions were 0.6, 0.2, and 0.2, whereas the expected proportions were 0.33 for each group. The sample size required to meet these criteria was 25 participants. Consequently, the sample size utilized in this study is deemed adequate. Whole-exome sequencing data were processed via nf-core/sarek v3.4.2 [35] of the nf-core collection of workflows [36], which uses reproducible software environments from the Bioconda [37] and Biocontainers [38] projects. The pipeline was executed with Nextflow v24.04.2 [39].
Results
PCNSL genomic landscape
Using WES, we detected mutations in 140 patients who were newly diagnosed with PCNSL at one of three cancer centers; in 59% of cases, blood samples were lacking. Because EBV-positive PCNSL is recognized as an immunodeficiency-associated PCNSL in the current WHO CNS5 classification [40, 41], all participants in the study who tested negative for EBV were included. Additionally, DLBCL patients with HBV and HIV infection may have different mutational profiles [42, 43]; thus, all participants in the study who tested negative for HIV and HBV were included. The key demographic and clinical characteristics of the patients are summarized in Figure S1 and Table S1. The study design and flow chart are displayed in Fig. 1A. After filtering, we identified 115,938 genetic variation events in the 140 PCNSL samples analyzed (Figure S2A, Figure S3A).
A Study design, including workflow and data composition for the discovery cohort and validation cohort. B Number and frequency of recurrent mutations. Gene‒sample matrix of recurrently mutated genes ranked by mutation frequency (n = 140). The total mutation density across the cohort is displayed at the top, with the variant allele fraction and trinucleotides at the bottom. C GISTIC2.0 results of significant recurrent focal amplifications (red) and deletions (blue). Genes affected by each focal event are annotated (n = 140). X-axis: plot of chromosomes; Y-axis: G score.
We applied a filter to the 140 PCNSLs on the basis of the genes curated in the CIViC and OncoKB databases to identify candidate cancer genes with hallmark mutations in PCNSL. Some of the mutated genes were PIM1 (62.1%), MYD88 (55.0%), KMT2D (48.6%), and so on (Fig. 1B), which are involved in chromatin histone modification, BCR-TLR-mediated NF-κB signaling, immune signaling, the cell cycle, PI3 kinase signaling, MAPK signaling, and Wnt/β-catenin signaling (Figure S2B) [3, 5, 9,10,11, 44,45,46,47,48,49,50,51]. In this study, we also identified several other pathways in PCNSL, such as pathways related to genome integrity, RTK signaling, RNA abundance, TGFB signaling, TOR signaling, and apoptosis (Figure S2B), many of which have defined roles in other cancers [34]. The mutually exclusive or cooccurring relationships of these genes are shown in Figure S3D. Survival associated with candidate cancer genes with mutation frequencies greater than 15% plus the NOTCH2 and EZH2 genes [9] is shown in Figure S4.
In the search for focal copy number alterations (CNAs) in the 140 PCNSLs, we detected significant recurrent amplifications at chromosomal locations 1q21.1, 4q16.3, 11p15.5, and so on (Fig. 1C). The arm-level CNAs among all 140 PCNSL samples are displayed in Figure S2C. The pattern of somatic mutations caused by the different mutational processes in the genome, termed mutational signatures, was calculated and compared to the well-established signatures in COSMIC [52]. The mutational signature contributions are shown in Figure S2D. The COSMIC 5, COSMIC 45, and COSMIC 1 signatures contributed the most to the PCNSL genome (Figure S2E). Taken together, these results establish a comprehensive genomic landscape of PCNSL in Chinese individuals.
Comparison of the genomic landscapes of the discovery and validation cohorts
Due to differences in sample types (fresh vs. FFPE) and matched sample availability between the discovery and validation cohorts, additional filtering was applied to the validation cohort to reduce false positives prior to genomic landscape comparison. Key clinical features are summarized in Table 1. We identified 12,222 genetic variation events (Figure S3B, Figure S5A) in the analyzed discovery cohort of PCNSL samples and 103,716 genetic variation events (Figure S3C, Figure S5B) in the analyzed validation cohort of PCNSL samples. The genetic variation profiles of the discovery and validation cohort of PCNSL samples were similar, as revealed by principal component analysis (Fig. 2A), but the mutation rate was higher in the validation cohort of PCNSL samples (Fig. 2B).
A Principal component analysis of whole-exome sequencing (WES) data between the discovery cohort and validation cohort of PCNSL patients. B Mutation rate of WES data between the discovery cohort and validation cohort of PCNSL patients. C A mirror bar plot showing the frequencies of genetic alterations in the discovery cohort of PCNSL patients compared with those in the validation cohort. D Arm-level copy number alterations of the discovery cohort and validation cohort samples are displayed. The frequencies of amplifications and deletions were compared. E Comparison of the tumor mutational burden (TMB) between the discovery cohort and the validation cohort of PCNSL patients. F The levels of mutant-allele tumor heterogeneity (MATH) in the discovery cohort and the validation cohort of PCNSL patients were compared. G The levels of microsatellite instability (MSI) in the discovery cohort and the validation cohort of PCNSL patients were compared. H The variant allele frequency (VAF) in the discovery cohort and the validation cohort of PCNSL patients were compared. The discovery cohort included blood-paired fresh tumor tissue samples (n = 58), and the validation cohort included unpaired FFPE tumor tissue samples (n = 82). Independent-samples t tests and chi-squared tests were used. *P < 0.05; **P < 0.01; ***P < 0.001.
PIM1, MYD88, CD79B, and KMT2D were the most frequently mutated candidate cancer genes of PCNSL in the discovery cohort (Figure S5C) and validation cohort (Figure S5D). However, several of the less frequently mutated candidate cancer genes in PCNSL whose mutation frequencies were significantly different when the discovery and validation cohorts were compared included ZFHX3, TRRAP, SZT2, and so on (all P < 0.05) (Fig. 2C). The arm-level CNAs between the discovery and validation cohort of PCNSL samples were mostly similar and are displayed in Fig. 2D. Significant differences in both recurrent amplifications (1q, 8q, and 12p) and deletions (6p, 8p, 19p, and 19q) were observed. The results of significant recurrent focal amplifications and deletions were visualized via GISTIC 2.0 in the discovery (Figure S5E) and validation (Figure S5F) cohorts.
Although the TMB (Fig. 2E), MATH score (Fig. 2F), microsatellite instability (MSI) (Fig. 2G), and the VAF (Fig. 2H) were greater in the validation cohort than in the discovery cohort (P < 0.001), the mean values were relatively close. The 3 signatures extracted from the discovery samples presented cosine similarities of 86.7%, 93.1% and 94.4% to COSMIC signatures 6, 45, and 10a, respectively (Figure S5G), whereas those extracted from the validation samples presented similarities of 80.4%, 70.9%, and 77.9% to COSMIC signatures 5, 1, and 23, respectively (Figure S5H). Taken together, these results suggest that the genetic variation profiles of the discovery and validation PCNSL samples exhibit certain discrepancies.
The unique genomic landscape of PCNSLs in Chinese patients
The frequencies of recurrent genetic alterations in PCNSL patients were compared with those in DLBCL patients [49]. Mirror bar plots (Fig. 3A) revealed that a set of genes, such as PIM1 (62.1% vs. 16.88%), MYD88 (55% vs. 18.01%), IRF4 (15% vs. 3.98%), and EP300 (19.29% vs. 5.97%), had significantly higher mutation rates in Chinese PCNSL patients than in DLBCL patients, suggesting genetic diversity of the cancer genome between diseases. Compared with those in PCNSLs in French patients [14] (Fig. 3B), the mutation frequencies of EP300 (19.29% vs. 3.48%) and KMT2D (48.6% vs. 22.61%) were significantly higher in the PCNSLs of Chinese patients. Compared with those in the PCNSLs [3] of Japanese patients (Fig. 3C), the mutation frequencies of MYD88 (55% vs. 85.6%) and BTG2 (38.6% vs. 87.8%) were significantly lower in the PCNSLs of Chinese patients. These data revealed a unique mutational landscape of Chinese PCNSL patients in terms of the frequency of SNVs. Using GISTIC2.0, we compared the frequencies of recurrent SNAs between Chinese PCNSL and DLBCL patients. At the focal level, Chinese PCNSL samples showed significant deletions in 11q12.4, 18p11.21, 19q13.42, 22q11.1, and 22q11.21, and amplifications in 1q21.1, 4p16.3, 11q12.2, and 17q12 (Fig. 3D). Similar alterations were observed when compared with Japanese PCNSL samples, highlighting distinct genomic features in Chinese patients (Fig. 3E). These results suggest genetic diversity in the PCNSL genome between races.
A A mirror bar plot showing the frequencies of genetic alterations between Chinese PCNSL patients (n = 140) and DLBCL patients (n = 955). B A mirror bar plot showing the frequencies of genetic alterations between Chinese PCNSL patients (n = 140) and French PCNSL patients (n = 115). C A mirror bar plot showing the frequencies of genetic alterations between Chinese PCNSL patients (n = 140) and Japanese PCNSL patients (n = 41). D GISTIC-defined recurrent copy number focal deletions (blue) and gains (red) as mirror plots in DLBCL (TCGA, n = 46) and Chinese PCNSLs (n = 140) are shown. X-axis: plot of chromosomes; Y-axis: G score. E The GISTIC2.0-defined recurrent copy number focal deletions (blue) and gains (red) in Japanese PCNSL patients (n = 41) and Chinese PCNSL patients (n = 140) are shown as mirror plots. Y-axis: plot of chromosomes; X-axis: G score. The chi-squared test was used. *P < 0.05. (A-C) Mirror plots showing the mutation frequencies of genes present in the intersection of the top 100 most frequently mutated genes from each disease cohort. In the figure legend of A-C, ‘Inferred’ means that the mutation types were not directly provided but were calculated from the supplemental tables of the corresponding paper; thus, the results are ‘Inferred’, and the in facto mutation number will not be less than the inferred value.
Owing to the genetic diversity of the PCNSL genome across races and the genetic diversity between the PCNSL and DLBCL cancer genomes, the existing molecular subtyping methods for PCNSL [14] and DLBCL [9,10,11,12, 53, 54] may not be applicable to Chinese PCNSL patients. We assessed the prognostic value of these published molecular subtypes for OS and progression-free survival (PFS) in our Chinese PCNSL cohort. Actionable PCNSL classification can perfectly differentiate prognosis, thereby improving precision medicine strategies. According to cell-of-origin molecular subtyping [53, 54], there were no significant differences in OS (P = 0.42) or PFS (P = 0.67) between the germinal center type (GCB) and non-GCB subtypes (Figure S6A). Similarly, according to LymphPlex molecular subtyping [12], no significant differences in OS (P = 0.68) or PFS (P = 0.71) were observed between the BN2-like, MCD-like, EZB-like, N1-like, TP53, and other subtypes (Figure S6B). We further categorized our data on the basis of the DLBCL subtyping (C0-C5) proposed by M. A. Shipp et al. [10] Within our cohort, the C4 subtype was associated with shorter OS (P = 0.009, P = 0.013) and PFS (P = 0.096, P = 0.047) than the C0 and C5 subtypes, respectively (Figure S6C). However, the overall efficacy of this DLBCL subtyping system in predicting outcomes was less than satisfactory (OS, P = 0.074). The data were also analyzed via the five subtypes (BN2, EZB, MCD, N1, and others) [9] and seven subtypes (A53, BN2, EZB, MCD, N1, ST2, and others) [11] of DLBCL proposed by Louis M Staudt et al. There were no significant differences in OS (P > 0.05) or PFS (P > 0.05) between BN2, N1, and other categories within the 5-subtype model (Figure S6D). Overall, the seven-subtype model (Figure S6E) had inferior performance, with no significant differences observed in OS (P = 0.61) or PFS (P = 0.47). With respect to the four subtypes (CS1-4) of PCNSL proposed by A Alentorn [14], our data revealed that the OS (P = 0.12) and PFS (P = 0.54) of patients with the CS3 subtype were shorter than those of patients with the other three subtypes (Figure S6F), but the differences were not significant. Furthermore, the double-hit gene expression signature defines a distinct subgroup of germinal center B-cell-like DLBCL, which is associated with a poor prognosis [13]. However, in this study, there were no significant differences in OS (P > 0.05) or PFS (P > 0.05) between the double- or triple-hit PCNSL patients and the other PCNSL patients, as shown in Figure S7. Taken together, these results suggest that the molecular subtyping of PCNSL and DLBCL, as previously published, may not be fully applicable to Chinese PCNSL patients.
PCNSL molecular subtypes with clinical outcome implications
To establish a molecular classification system for PCNSL, consensus clustering was conducted based on WES data from 58 patients in the discovery cohort who received HD-MTX-based immunochemotherapy. The resulting molecular subtypes were validated in an independent cohort of 82 patients treated with the same regimen and further confirmed using WES/WGS data from an external Chinese cohort of 36 newly diagnosed PCNSL patients [17].
The molecular classification scheme, presented in Fig. 4A, Table S2 and Figure S8, was determined independently of clinical information and finalized before the analysis of clinical data, allowing us to analyze the relationships between genetic subtypes and survival in this entire cohort. Finally, EP300, BTG1, MYD88, and IRF4 were included in the molecular subtyping of Chinese PCNSLs.
A Schematic of the typing strategies used to identify PCNSL molecular subtypes in the discovery cohort (n = 58) and in the validation cohort (n = 82). B Kaplan‒Meier estimates of overall survival (OS) and progression-free survival (PFS) among patients belonging to each molecular subtype in the discovery cohort. C Kaplan‒Meier estimates of overall survival (OS) and progression-free survival (PFS) among patients belonging to each molecular subtype in the validation cohort. D Kaplan‒Meier estimates of overall survival (OS) among patients belonging to each molecular subtype in an independent external cohort (n = 36). E The prevalence of EP300 mutations in PCNSL patients of different races. F Kaplan‒Meier analyses for comparisons of the overall survival (OS) and progression-free survival (PFS) of patients with mutations within and those with mutations outside the EP300 domain. G Multivariate Cox regression analysis after adjusting for key clinical confounders, including age, IELSG score, MSKCC score, LDH levels, deep brain location, CSF protein levels, and consolidation therapy (WBRT, stem cell transplant), in PCNSL patients (n = 140).
All nonsynonymous mutations in EP300, BTG1, MYD88, and IRF4 were visualized within the functional domains of the encoded proteins (Figure S9). As previously reported [10, 55, 56], MYD88 (L265P) was the most prevalent mutation, while EP300, BTG1, and IRF4 all harbored scattered mutations. We evaluated the prognostic significance of these four identified genes for OS and PFS. The survival outcomes of patients with EP300 mutations were significantly poorer than those of patients with wild-type EP300, and patients harboring mutations in either BTG1 (OS, P = 0.017; PFS, P = 0.12), MYD88 (OS, P = 0.016; PFS, P = 0.069), or IRF4 (OS, P = 0.045; PFS, P = 0.018) had favorable OS (Figure S10A) and PFS (Figure S10B) relative to individuals with their wild-type counterparts.
In the discovery cohort, we identified three subtypes of PCNSL: BMI (n = 16, 27.59%), E3 (n = 9, 15.52%), and UC (n = 33, 56.90%). The E3 subtype was defined as those with mutations in the EP300 gene. The PCNSLs that carried mutations in at least two of the three genes, BTG1, MYD88, and IRF4, were characterized as the BMI subtype. The other PCNSLs that did not fit into either of these categories were classified under the UC subtype (Unclassified). The three subtypes differed significantly in OS (P < 0.001) and PFS (P = 0.043), the BMI subtype (P < 0.001, P = 0.023) had much more favorable outcomes than the E3 and UC subtypes did, and the E3 subtype (P < 0.001, P = 0.037) had far worse outcomes than the BMI and UC subtypes did (Fig. 4B). These differences between the three subtypes of PCNSL were detected in the validation cohort (OS, P = 0.020; PFS, P = 0.048; n = 82, tumor-only sample), and patients with E3 had significantly shorter survival times than those with BMI (OS, P = 0.007) or UC (OS, P = 0.043) (Fig. 4C). Similar results were observed in the entire cohort (n = 140, OS, P < 0.0001; PFS, P = 0.0004; Figure S10C): 26 patients were classified as BMI (18.57%), 27 as E3 (19.29%), and 87 as UC (62.14%). Furthermore, the raw WES or WGS data, along with follow-up data from an independent cohort of 36 Chinese patients with PCNSL [17], were obtained. The three identified subtypes significantly differed in OS (P = 0.0004; Fig. 4D). Specifically, the BMI subgroup tended to have more favorable outcomes than the E3 and UC subgroups (P = 0.0004 and P = 0.005, respectively), whereas the E3 subtype exhibited poorer outcomes compared with the BMI and UC subgroups (P = 0.0004 and P = 0.0231, respectively). We additionally conducted sensitivity analyses stratified by sequencing platform (WGS vs. WES). The WGS-based analysis demonstrated significant differences in OS among the three subtypes (P = 0.0050): the BMI subtype had significantly more favorable outcomes than the E3 and UC subtypes (P = 0.0101 and P = 0.0500, respectively), whereas the E3 subtype had significantly worse outcomes compared with both BMI and UC subtypes (P = 0.0101 and P = 0.0280, respectively) (Figure S11A). Due to the limited sample size, the WES-based analysis (n = 12) did not reveal statistically significant differences among the three subtypes (P = 0.1774); however, a consistent trend was observed, with the E3 subtype showing the poorest prognosis and the BMI subtype the most favorable (Figure S11B). While these findings support the potential clinical relevance of our molecular classification, they warrant further validation in larger cohorts given the limited sample size.
BTK inhibitors have been shown to improve the prognosis of PCNSL patients by blocking B-cell receptor signaling pathways, which influence the growth and survival of B cells. Consequently, we further analyzed whether the use of BTK inhibitors affects the robustness of our classification. There were no significant differences in OS (P > 0.05, Figure S12A) or PFS (P > 0.05, Figure S12B) between patients who received BTK inhibitors (n = 8) and those who did not receive BTK inhibitors (n = 132). Additionally, after initial treatment options (HD-MTX combined with IDA, HD-MTX combined with R, HD-MTX combined with IDA and R, or a combination of BTK inhibitors) and consolidated therapy (WBRT, stem cell transplant) were adjusted, multivariable Cox regression analysis revealed that the use of different treatment options did not affect the robustness of our classification (Table S3).
When High Grade B-cell lymphoma cases (double hit) were excluded, significant differences in OS among the three subtypes remained (P < 0.0001): the BMI subtype had significantly more favorable outcomes than the E3 and UC subtypes (P = 0.0011 and P = 0.0004, respectively), whereas the E3 subtype had significantly worse outcomes compared with both BMI and UC subtypes (P = 0.0101 and P = 0.0158, respectively) (Figure S13A). Similar results were observed in both the discovery (Figure S13B) and validation cohorts (Figure S13C).
For the E3 subtype, the mean mutation prevalence in the PCNSLs of Asian patients [3, 16, 17] (16.98%) was significantly higher than that in the PCNSLs of Western patients [4, 5, 14, 57,58,59] (4.53%) (Fig. 4E). Patients with the E3 subtype had significantly poorer survival, which was not affected by the mutation site (OS, P = 0.82; PFS, P = 0.69; Fig. 4F). Univariate Cox regression analysis revealed that PCNSL molecular subtypes are associated with clinical outcomes (Table S4), which was also confirmed by multivariate Cox regression hazard ratio analysis after adjusting for important confounders (age, IELSG, MSKCC, LDH levels, deep brain location, CSF protein levels, and consolidated therapy [WBRT, stem cell transplant]) (Fig. 4G). These findings provide supportive evidence for a potential association between PCNSL molecular subtypes and clinical outcomes. The key demographic and clinical characteristics of the three MS patients are summarized in Table S5.
We also constructed a Sankey diagram to visualize the correspondence between our samples and the previously described genetic subtypes of PCNSL and DLBCL (Figure S14). The Chinese PCNSL subtypes were significantly distinct from those previously described. Taken together, these results indicate that three unique molecular subtypes of Chinese PCNSL are associated with clinical outcomes.
Genomic landscape and tumor microenvironment across PCNSL subtypes
We further aimed to determine whether these three molecular subtypes of Chinese PCNSL patients could affect the mutational landscape, oncogenic pathways and tumor microenvironment. Intergroup mutation distribution analysis was performed to identify unique and shared mutations. A total of 6,102, 41,548, and 73,440 mutated genes were identified in the BMI, E3, and UC subtypes, respectively, with 518 mutated genes shared among the three subtypes (Fig. 5A and S15A). Key candidate cancer genes in the 140 PCNSL patients, grouped by these three subtypes, are presented in Fig. 5B. The mutation frequencies of EP300, MYD88, IRF4, and so on were significantly different (P < 0.05) among these three subtypes. Arm-level CNAs among the samples of the three subtypes are displayed in Fig. 5C, showing significant differences in recurrent deletions (in 6q and 12p) but not in amplifications (Fig. 5D). The results of significant recurrent focal amplifications and deletions were visualized by GISTIC 2.0 in the BMI, E3, and UC subtypes (Figure S15B). There was no significant difference (P > 0.05) in the MATH (Fig. 5E) or MSI (Fig. 5F) scores among the three subtypes, but the TMB was greater in the E3 subgroup than in the BMI (P < 0.0001) or UC subgroup (P < 0.001) (Fig. 5G). Furthermore, the VAF was lower in the E3 subgroup than in the BMI (P < 0.0001) or UC subgroup (P < 0.001) (Fig. 5H).
A Venn diagram showing unique and shared mutations among the three identified PCNSL subtypes. B Number and frequency of recurrent mutations and the gene‒sample matrix of recurrently mutated genes among the three PCNSL subtypes. The relative abundance across the molecular subtypes is displayed on the right. C Arm-level copy number alterations among the three molecular subtype samples are displayed. D The frequencies of amplifications and deletions were compared among the three molecular subtype samples. E The levels of mutant-allele tumor heterogeneity (MATH) were compared among the three molecular subtype samples. F The levels of microsatellite instability (MSI) were compared among the three MS samples. G The tumor mutational burden (TMB) was compared among the three MS samples. H The levels of variant allele frequency (VAF) were compared among the three MS samples. Independent-samples t tests and chi-squared tests were used. *P < 0.05; **P < 0.01; ***P < 0.001; ns P > 0.05.
The contributions of mutational signatures across the three subtypes are illustrated in Figure S15C. The three signatures extracted from the E3 subtype samples displayed cosine similarities of 82.0%, 93.8%, and 95.2% to the COSMIC signatures 6, 10a, and 45, respectively (Figure S15D). In contrast, those extracted from BMI subtype samples presented similarities of 77.3%, 81.7%, and 58.3% to COSMIC signatures 5, 84, and 87, respectively (Figure S15D). The signatures from the UC subtype samples were 80.8%, 70.3%, and 91.7% and similar to the COSMIC signatures 5, 1, and 45 (Figure S15D), respectively.
H&E and immunohistochemical staining of CD2, CD5, CD10, CD19, CD20, CD79a, Ki-67, BCL2, BCL6, MUM1, c-Myc, and p53 from samples of different PCNSL subtypes revealed significant malignant progression in patients with the E3 subtype of PCNSL, as shown in Figure S16A and Table S6. Although PCNSLs generally have a high proliferative tendency and high Ki-67 expression, the E3 subtype is more poorly differentiated and aggressively growing, as reflected by its Ki-67 expression levels, compared with the other two subtypes. Immunostaining revealed a greater presence of CD2 + , CD5 + , CD10 + , CD19 + , CD20 + , and CD79a+ cells in the E3 subtype. Additionally, a greater presence of BCL2 + , BCL6 + , MUM1 + , and c-Myc+ cells, which are markers associated with the diagnosis of B-cell lymphomas [60], was observed in the E3 subtype.
To explore potential therapeutic strategies for the genetic subtypes of PCNSL, we examined groups of genetic aberrations that target oncogenic signaling pathways (Figure S16B). Genetic events affecting NF-κB regulators that negatively regulate the stability of NF-κB-dependent mRNAs were detected in 85.2% and 84.6% of the E3 and BMI subtype samples but not in the UC subtype samples (P < 0.05). These findings suggest that the E3 and BMI subtypes may be more responsive to BTK inhibitors [61]. The PI3 kinase pathway, a pathway that can indirectly activate NF-κB, is genetically altered in 66.7% of E3 patients [62]. We also noted a higher frequency of genetic alterations in chromatin histone modifiers (100%), RTK signaling (77.8%), protein homeostasis/ubiquitination (66.7%), cell cycle pathways (77.8%), genome integrity (82.5%), Wnt/B-catenin signaling (59.3%), the chromatin SWI/SNF complex (70.4%), and splicing (37%) in E3 patients. Therefore, a combination of cyclin D-Cdk4,6 and PI3 kinase inhibitors might be beneficial for patients with the E3 subtype.
Taken together, these results suggest that the three molecular subtypes of PCNSL in Chinese patients each have a unique genomic landscape, tumor microenvironment, and oncogenic pathways.
Mutational landscape and oncogenic pathways in PCNSLs with different sites of onset
Next, we aimed to determine whether the site of PCNSL onset in Chinese patients influences the mutational landscape and oncogenic pathways. Figure 6A shows that the areas affected by PCNSL included the basal ganglia, brainstem, cerebellum, corpus callosum, frontal lobe, occipital lobe, parietal lobe, temporal lobe, thalamus, ventricles, and combinations of multiple sites. The most frequently involved sites were the frontal lobe (22.14%), temporal lobe (15.71%), and basal ganglia (8.57%).
A The number and percentage of PCNSL patients with various molecular subtypes at different disease onset sites. B Venn diagram of unique and shared mutations among PCNSLs with different sites of onset. C The number and frequency of recurrent mutations, along with a gene‒sample matrix of recurrently mutated genes in PCNSL patients at different sites of onset, are presented. The relative abundance across the molecular subtypes is displayed on the right. D Arm-level copy number alterations across different sites of onset are displayed. The frequencies of amplifications and deletions were compared. E The levels of mutant-allele tumor heterogeneity (MATH) were compared among the samples representing different sites of onset. F Comparisons of the tumor mutational burden (TMB) among the different sites of onset samples were performed. G The levels of microsatellite instability (MSI) were compared among the samples representing different sites of onset. H Pathways affected by oncogenes in PCNSL patients with different sites of onset. The chi-squared test and one-way ANOVA were used. *P < 0.05. Comparisons without asterisks are not statistically significant (P > 0.05).
First, we found that there was no significant difference in the proportion of different disease sites among the three subtypes (chi-squared test, corpus callosum, P = 0.887; basal ganglia, P = 0.645; thalamus, P = 0.317; brainstem, P = 1.000; ventricle, P = 0.631; front lobe, P = 0.576; multiple, P = 0.949; temporal lobe, P = 0.214; cerebellum, P = 0.831; occipital lobe, P = 0.778; parietal lobe, P = 0.887).
Next, we explored whether the tumor sites differed between the discovery and validation cohorts. As shown in Figure S17A, there was no significant difference in the proportions of different disease sites between the discovery and validation cohorts (chi-square test P = 0.610). Furthermore, the proportions of deep brain lesions in the discovery and validation cohorts were not significantly different (P = 0.478).
Sixteen mutated genes were represented among all the sites (Fig. 6B). Key candidate cancer genes in the 140 PCNSL patients, grouped by site of disease onset, are presented in Fig. 6C. Arm-level CNAs among the samples of these sites are displayed in Fig. 6D, and significant differences in both recurrent amplifications (2q, 10q, and 10p) and deletions (9p) were observed. The difference in MATH scores across these sites was not significant (P > 0.05; Fig. 6E). However, significant differences in TMB (P < 0.05; Fig. 6F) and MSI scores (P < 0.05; Fig. 6G) were observed between several sites.
Next, we further divided the tumor sites into deep brain and shallow brain tumor subgroups. A total of 854 (81%) mutated genes were represented between the deep brain and shallow brain tumor subgroups (Figure S17B). The mutational landscape between deep brain and shallow brain tumors is shown in Figure S17C. The mutation frequencies of LRP1B, FAT1, KMT2B, CIC, PTPRB, MAP3KI, ARID1B, and CPS1 in the deep brain and shallow brain tumors were significantly different (P < 0.05). Arm-level CNAs between deep brain and shallow brain tumors are displayed in Figure S17D, and no significant differences (P > 0.05) in either recurrent amplification or deletion were observed. There was no significant difference (P > 0.05) in the MATH (Figure S17E), TMB (Figure S17F), or MSI (Figure S17G) score between deep brain and shallow brain tumors.
We further conducted pathway enrichment analysis (Fig. 6H) and revealed significant involvement of NF-κB signaling in the tumors of patients with onset in the parietal lobe and brainstem, each with an involvement rate of 100%. Furthermore, we noted a significant difference (P < 0.05) in the frequency of genetic alterations in the chromatin SWI/SNF complex but not in the other pathways (Figure S17H). Taken together, these results indicate that the mutational landscape in Chinese patients with PCNSL varies between sites of onset. Larger studies are needed to confirm this observation.
Discussion
Owing to the genetic, phenotypic, and tumor microenvironment heterogeneity of PCNSL, identifying classification and prognostic biomarkers for patients is extremely challenging, which in turn makes developing effective new therapies challenging. Here, we performed a study of 176 Chinese PCNSL patients that was adequately powered to expand the genomic landscape and explore its clinical significance utilizing an unprecedented sample size and a multicenter approach. We defined recurrent mutations, CNAs, and associated cancer candidate genes in Chinese PCNSL patients and compared these comprehensive genetic signatures with those of PCNSL patients and systemic DLBCL patients of other races, revealing that genetic heterogeneity is a defining feature of PCNSL. Our results highlight the complexity of PCNSLs, which have a median of 19.06 mutations/Mb and a median of 949.5 variants per sample.
Although the genetic variation profiles of the PCNSL discovery and validation samples were similar, notable differences were evident in the mutation frequencies of certain genes, the TMB, the MSI, and the MATH score. These discrepancies may largely be attributed to differences in specimen type—fresh frozen tissue in the discovery cohort versus FFPE in the validation cohort—and the lack of paired samples in the latter [63]. FFPE specimens are known to introduce characteristic artefacts, including DNA fragmentation, crosslinking, and cytosine deamination, which can compromise sequencing quality and affect downstream variant calling. Furthermore, genetic heterogeneity between patient populations [15] and biological differences between PCNSL and DLBCL [13] may be the main reasons why the published classifications for DLBCL and PCNSL are not applicable to Chinese PCNSL patients. Currently, the classification frameworks for systemic DLBCL offer valuable insights into the underlying biology based on molecular mechanisms. However, it is important to note that biological insights do not necessarily correlate with clinical outcomes, especially given that treatment paradigms may evolve over time.
Importantly, in our study, we identified three potential molecular subtypes of PCNSL in Chinese patients: BMI, E3, and UC. Distinct clinical outcomes, activated cellular pathways, and alterations in the genetic landscape distinguish the three subtypes and could be utilized to personalize therapy for patients with different subtypes. For the E3 subtype, the mean mutation rate was significantly higher in the PCNSLs of Chinese and Japanese patients (16.88%) than in those of Western PCNSL patients (4.53%). This difference may be attributed to variations in age, lifestyle, and EBV infection rates [17, 41, 64]. Patients with the E3 subtype had significantly poorer survival outcomes, a finding that is consistent with reported findings in DLBCL [65]. EP300 and CREBBP are two closely related members of the KAT3 family of histone acetyltransferases that function similarly [66]. However, in this study, the survival of patients with EP300 mutations was significantly poorer, whereas CREBBP mutations did not contribute to an inferior prognosis (Figure S4). This discrepancy might stem from the distinct roles of EP300 and CREBBP in PCNSL. Specifically, in a recent study, EP300, but not CREBBP, was reported to play an essential role in supporting the viability of classical Hodgkin’s lymphoma by directly modulating the expression of the oncogenic MYC/IRF4 network, surface receptor CD30, immunoregulatory cytokine interleukin 10, and immune checkpoint protein PD-L1 [67]. The molecular mechanisms through which EP300 mutations facilitate the progression of PCNSL are still not understood. We plan to explore these mechanisms in a forthcoming study.
Furthermore, the higher mutation frequencies observed (Fig. 3C) in the Japanese cohort may be partly explained by differences in sample size, sample type, patient age, and population genetics. The Japanese study included 41 fresh tumor tissues, whereas our cohort comprised 58 fresh frozen and 82 FFPE samples (total n = 140). Differences in sample preservation methods can affect DNA quality, sequencing performance, and mutation detection rates. In addition, the mean age of patients was higher in the Japanese cohort (63 vs. 59 years), which may reflect differences in clinical characteristics and disease biology [64]. Underlying genetic differences between Japanese and Chinese populations may also contribute to these discrepancies.
This study has several limitations. First, the proportion of unclassified PCNSLs (UC subtypes) is high, possibly due to the lack of multiomics data needed for classification. However, the three-class classification based on WES proposed in this study has strong clinical practicality and translational value, as it relies solely on the four genes identified through WES. In other words, clinicians can predict patient outcomes and further perform individualized management of PCNSLs by detecting mutations in these four genes without the need for additional multiomics testing. Second, the three identified molecular subtypes can be used to predict patient outcomes. However, it remains challenging to determine whether these three molecular subtypes can still be used to accurately distinguish biological subtypes without additional RNA-seq data layers.
In summary, by performing genomic sequencing on tumor specimens from 176 Chinese PCNSL patients, we identified three potential molecular subtypes that can be used to predict patient outcomes. These genetic signatures associated with PCNSL outcomes illuminate the path toward individualized management of PCNSL patients and the discovery of novel treatment strategies for this challenging disease.
Data availability
The raw sequence data have been deposited in the Genome Sequence Archive [68] in the National Genomics Data Center [69], China National Center for Bioinformation/Beijing Institute of Genomics, Chinese Academy of Sciences (GSA-Human HRA006122), which is publicly accessible at https://ngdc.cncb.ac.cn/gsa-human. Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request. The raw sequencing data of the DLBCL WES data are accessible from the TCGA database and were reanalyzed via pipelines and filter settings that were identical to those used in the present study. The raw WES data of the Japanese PCNSL patients were obtained from the Japanese Genotype–Phenotype Archive (JGA, http://trace.ddbj.nig.ac.jp/jga) and reanalyzed via pipelines and filtering settings that were identical to those used in the present study, which is hosted by DDBJ under the accession number JGAS00000000021 [3]. The raw sequencing WES/WGS data of an external independent Chinese PCNSL cohort (n = 36) [17] were obtained from the China National Center for Bioinformation/Beijing Institute of Genomics, Chinese Academy of Sciences and reanalyzed via pipelines and filtering settings that were identical to those used in the present study, under the accession number GSA-Human HRA002475. The frequencies of recurrent genetic alterations in DLBCLs [49] and French PCNSLs [14] are accessible from the manuscript or supplemental information and were not reanalyzed via identical pipelines and filtering settings as those used in the present study.
References
Chen T, Liu Y, Wang Y, Chang Q, Wu J, Wang Z, et al. Evidence-based expert consensus on the management of primary central nervous system lymphoma in China. J Hematol Oncol. 2022;15:136.
Ferreri AJM, Calimeri T, Cwynarski K, Dietrich J, Grommes C, Hoang-Xuan K, et al. Primary central nervous system lymphoma. Nat Rev Dis Prim. 2023;9:29.
Fukumura K, Kawazu M, Kojima S, Ueno T, Sai E, Soda M, et al. Genomic characterization of primary central nervous system lymphoma. Acta Neuropathol (Berl). 2016;131:865–75.
Braggio E, Van Wier S, Ojha J, McPhail E, Asmann YW, Egan J, et al. Genome-Wide Analysis Uncovers Novel Recurrent Alterations in Primary Central Nervous System Lymphomas. Clin Cancer Res. 2015;21:3986–94.
Radke J, Ishaque N, Koll R, Gu Z, Schumann E, Sieverling L, et al. The genomic and transcriptional landscape of primary central nervous system lymphoma. Nat Commun. 2022;13:2558.
Vater I, Montesinos-Rongen M, Schlesner M, Haake A, Purschke F, Sprute R, et al. The mutational pattern of primary lymphoma of the central nervous system determined by whole-exome sequencing. Leukemia. 2015;29:677–85.
Rosenwald A, Wright G, Chan WC, Connors JM, Campo E, Fisher RI, et al. The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. N Engl J Med. 2002;346:1937–47.
Wilson WH, Young RM, Schmitz R, Yang Y, Pittaluga S, Wright G, et al. Targeting B cell receptor signaling with ibrutinib in diffuse large B cell lymphoma. Nat Med. 2015;21:922–6.
Schmitz R, Wright GW, Huang DW, Johnson CA, Phelan JD, Wang JQ, et al. Genetics and Pathogenesis of Diffuse Large B-Cell Lymphoma. N Engl J Med. 2018;378:1396–407.
Chapuy B, Stewart C, Dunford AJ, Kim J, Kamburov A, Redd RA, et al. Molecular subtypes of diffuse large B cell lymphoma are associated with distinct pathogenic mechanisms and outcomes. Nat Med. 2018;24:679–90.
Wright GW, Huang DW, Phelan JD, Coulibaly ZA, Roulland S, Young RM, et al. A Probabilistic Classification Tool for Genetic Subtypes of Diffuse Large B Cell Lymphoma with Therapeutic Implications. Cancer Cell. 2020;37:551–568.e14.
Shen R, Fu D, Dong L, Zhang M-C, Shi Q, Shi Z-Y, et al. Simplified algorithm for genetic subtyping in diffuse large B-cell lymphoma. Signal Transduct Target Ther. 2023;8:145.
Swerdlow SH, Campo E, Pileri SA, Harris NL, Stein H, Siebert R, et al. The 2016 revision of the World Health Organization classification of lymphoid neoplasms. Blood. 2016;127:2375–90.
Hernández-Verdin I, Kirasic E, Wienand K, Mokhtari K, Eimer S, Loiseau H, et al. Molecular and clinical diversity in primary central nervous system lymphoma. Ann Oncol. 2023;34:186–99.
Yuan X, Yu T, Zhao J, Jiang H, Hao Y, Lei W, et al. Analysis of the genomic landscape of primary central nervous system lymphoma using whole-genome sequencing in Chinese patients. Front Med. 2023;17:889–906.
He X, Fan X, Shan Y, Ji X, Su L, Wang Y. Analysis of genomic alterations in primary central nervous system lymphoma. Medicine (Baltim). 2023;102:e34931.
Zhu Q, Wang J, Zhang W, Zhu W, Wu Z, Chen Y, et al. Whole-Genome/Exome Sequencing Uncovers Mutations and Copy Number Variations in Primary Diffuse Large B-Cell Lymphoma of the Central Nervous System. Front Genet. 2022;13:878618.
Abrey LE, Batchelor TT, Ferreri AJM, Gospodarowicz M, Pulczynski EJ, Zucca E, et al. Report of an international workshop to standardize baseline evaluation and response criteria for primary CNS lymphoma. J Clin Oncol. 2005;23:5034–43.
Hoang-Xuan K, Bessell E, Bromberg J, Hottinger AF, Preusser M, Rudà R, et al. Diagnosis and treatment of primary CNS lymphoma in immunocompetent patients: guidelines from the European Association for Neuro-Oncology. Lancet Oncol. 2015;16:e322–332.
Li Q, Ma J, Ma Y, Lin Z, Kang H, Chen B. Improvement of outcomes of an escalated high-dose methotrexate-based regimen for patients with newly diagnosed primary central nervous system lymphoma: a real-world cohort study. Cancer Manag Res. 2021;13:6115–22.
Freed D, Aldana R, Weber JA, Edwards JS The Sentieon Genomics Tools–A fast and accurate solution to variant calling from next-generation sequence data. BioRxiv 2017; 115717.
McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GRS, Thormann A, et al. The Ensembl Variant Effect Predictor. Genome Biol. 2016;17:122.
Mayakonda A, Lin D-C, Assenov Y, Plass C, Koeffler HP. Maftools: efficient and comprehensive analysis of somatic variants in cancer. Genome Res. 2018;28:1747–56.
Wang Y, Bae T, Thorpe J, Sherman MA, Jones AG, Cho S, et al. Comprehensive identification of somatic nucleotide variants in human brain tissue. Genome Biol. 2021;22:92.
Sathirapongsasuti JF, Lee H, Horst BAJ, Brunner G, Cochran AJ, Binder S, et al. Exome sequencing-based copy-number variation and loss of heterozygosity detection: ExomeCNV. Bioinformatics. 2011;27:2648–54.
Favero F, Joshi T, Marquard AM, Birkbak NJ, Krzystanek M, Li Q, et al. Sequenza: allele-specific copy number and mutation profiles from tumor sequencing data. Ann Oncol. 2015;26:64–70.
Mermel CH, Schumacher SE, Hill B, Meyerson ML, Beroukhim R, Getz G. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol. 2011;12:R41.
Chakravarty D, Gao J, Phillips SM, Kundra R, Zhang H, Wang J, et al. OncoKB: A Precision Oncology Knowledge Base. JCO Precis Oncol. 2017;2017:PO.17.00011.
Vogelstein B, Papadopoulos N, Velculescu VE, Zhou S, Diaz LA, Kinzler KW. Cancer genome landscapes. Science. 2013;339:1546–58.
Griffith M, Spies NC, Krysiak K, McMichael JF, Coffman AC, Danos AM, et al. CIViC is a community knowledgebase for expert crowdsourcing the clinical interpretation of variants in cancer. Nat Genet. 2017;49:170–4.
Alexandrov LB, Kim J, Haradhvala NJ, Huang MN, Tian Ng AW, Wu Y, et al. The repertoire of mutational signatures in human cancer. Nature. 2020;578:94–101.
Chalmers ZR, Connelly CF, Fabrizio D, Gay L, Ali SM, Ennis R, et al. Analysis of 100,000 human cancer genomes reveals the landscape of tumor mutational burden. Genome Med. 2017;9:34.
Mroz EA, Rocco JW. MATH, a novel measure of intratumor genetic heterogeneity, is high in poor-outcome classes of head and neck squamous cell carcinoma. Oral Oncol. 2013;49:211–5.
Sanchez-Vega F, Mina M, Armenia J, Chatila WK, Luna A, La KC, et al. Oncogenic Signaling Pathways in The Cancer Genome Atlas. Cell. 2018;173:321–337.e10.
Garcia M, Juhos S, Larsson M, Olason PI, Martin M, Eisfeldt J, et al. Sarek: A portable workflow for whole-genome sequencing analysis of germline and somatic variants. F1000Res. 2020;9:63.
Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, et al. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020;38:276–8.
Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, et al. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018;15:475–6.
da Veiga Leprevost F, Grüning BA, Alves Aflitos S, Röst HL, Uszkoreit J, Barsnes H, et al. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017;33:2580–2.
P DT, M C, Ew F, Pp B, E P, CN Nextflow enables reproducible computational workflows. Nature biotechnology 2017;35. https://doi.org/10.1038/nbt.3820.
Gandhi MK, Hoang T, Law SC, Brosda S, O’Rourke K, Tobin JWD, et al. EBV-associated primary CNS lymphoma occurring after immunosuppression is a distinct immunobiological entity. Blood. 2021;137:1468–77.
Kaulen LD, Denisova E, Hinz F, Hai L, Friedel D, Henegariu O, et al. Integrated genetic analyses of immunodeficiency-associated Epstein-Barr virus- (EBV) positive primary CNS lymphomas. Acta Neuropathol. 2023;146:499–514.
Ren W, Ye X, Su H, Li W, Liu D, Pirmoradian M, et al. Genetic landscape of hepatitis B virus-associated diffuse large B-cell lymphoma. Blood. 2018;131:2670–81.
Berhan A, Bayleyegn B, Getaneh Z. HIV/AIDS Associated Lymphoma: Review. Blood Lymphat Cancer. 2022;12:31–45.
Pasqualucci L, Trifonov V, Fabbri G, Ma J, Rossi D, Chiarenza A, et al. Analysis of the coding genome of diffuse large B-cell lymphoma. Nat Genet. 2011;43:830–7.
Morin RD, Mendez-Lago M, Mungall AJ, Goya R, Mungall KL, Corbett RD, et al. Frequent mutation of histone-modifying genes in non-Hodgkin lymphoma. Nature. 2011;476:298–303.
Lohr JG, Stojanov P, Lawrence MS, Auclair D, Chapuy B, Sougnez C, et al. Discovery and prioritization of somatic mutations in diffuse large B-cell lymphoma (DLBCL) by whole-exome sequencing. Proc Natl Acad Sci USA. 2012;109:3879–84.
Morin RD, Mungall K, Pleasance E, Mungall AJ, Goya R, Huff RD, et al. Mutational and structural analysis of diffuse large B-cell lymphoma using whole-genome sequencing. Blood. 2013;122:1256–65.
de Miranda NFCC, Georgiou K, Chen L, Wu C, Gao Z, Zaravinos A, et al. Exome sequencing reveals novel mutation targets in diffuse large B-cell lymphomas derived from Chinese patients. Blood. 2014;124:2544–53.
Reddy A, Zhang J, Davis NS, Moffitt AB, Love CL, Waldrop A, et al. Genetic and Functional Drivers of Diffuse Large B Cell Lymphoma. Cell. 2017;171:481–494.e15.
Bruno A, Boisselier B, Labreche K, Marie Y, Polivka M, Jouvet A, et al. Mutational analysis of primary central nervous system lymphoma. Oncotarget. 2014;5:5065–75.
Chapuy B, Roemer MGM, Stewart C, Tan Y, Abo RP, Zhang L, et al. Targetable genetic features of primary testicular and primary central nervous system lymphomas. Blood. 2016;127:869–81.
Alexandrov LB, Nik-Zainal S, Wedge DC, Aparicio SAJR, Behjati S, Biankin AV, et al. Signatures of mutational processes in human cancer. Nature. 2013;500:415–21.
Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, et al. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature. 2000;403:503–11.
Hans CP, Weisenburger DD, Greiner TC, Gascoyne RD, Delabie J, Ott G, et al. Confirmation of the molecular classification of diffuse large B-cell lymphoma by immunohistochemistry using a tissue microarray. Blood. 2004;103:275–82.
Phelan JD, Young RM, Webster DE, Roulland S, Wright GW, Kasbekar M, et al. A multiprotein supercomplex controlling oncogenic signalling in lymphoma. Nature. 2018;560:387–91.
Ngo VN, Young RM, Schmitz R, Jhavar S, Xiao W, Lim K-H, et al. Oncogenically active MYD88 mutations in human lymphoma. Nature. 2011;470:115–9.
Geng H, Mo SS, Chen L, Ballapuram A, Tsang M, Lu M, et al. Identification of Genomic Biomarkers of Disease Progression and Survival in Primary CNS Lymphoma. Blood Adv 2024; bloodadvances.2024014460.
Zorofchian S, El-Achi H, Yan Y, Esquenazi Y, Ballester LY. Characterization of genomic alterations in primary central nervous system lymphomas. J Neurooncol. 2018;140:509–17.
Barakat M, Albitar M, Whitney R, Abdulhaq H. Diversity of genetic alterations of primary central nervous system lymphoma in Hispanic versus non-Hispanic patients. Cancer Treat Res Commun. 2021;27:100310.
van Imhoff GW, Boerma E-JG, van der Holt B, Schuuring E, Verdonck LF, Kluin-Nelemans HC, et al. Prognostic impact of germinal center-associated proteins and chromosomal breakpoints in poor-risk diffuse large B-cell lymphoma. J Clin Oncol. 2006;24:4135–42.
Iwasaki H, Takeuchi O, Teraguchi S, Matsushita K, Uehata T, Kuniyoshi K, et al. The IκB kinase complex regulates the stability of cytokine-encoding mRNA induced by TLR-IL-1R by controlling degradation of regnase-1. Nat Immunol. 2011;12:1167–75.
Kloo B, Nagel D, Pfeifer M, Grau M, Düwel M, Vincendeau M, et al. Critical role of PI3K signaling for NF-kappaB-dependent survival in a subset of activated B-cell-like diffuse large B-cell lymphoma cells. Proc Natl Acad Sci USA. 2011;108:272–7.
Spencer DH, Sehn JK, Abel HJ, Watson MA, Pfeifer JD, Duncavage EJ. Comparison of clinical targeted next-generation sequence data from formalin-fixed and fresh-frozen tissue specimens. J Mol Diagn. 2013;15:623–33.
Yokoyama A, Kakiuchi N, Yoshizato T, Nannya Y, Suzuki H, Takeuchi Y, et al. Age-related remodelling of oesophageal epithelia by mutated cancer drivers. Nature. 2019;565:312–7.
Huang Y-H, Cai K, Xu P-P, Wang L, Huang C-X, Fang Y, et al. CREBBP/EP300 mutations promoted tumor progression in diffuse large B-cell lymphoma through altering tumor-associated macrophage polarization via FBXW7-NOTCH-CCL2/CSF1 axis. Signal Transduct Target Ther. 2021;6:10.
Meyer SN, Scuoppo C, Vlasevska S, Bal E, Holmes AB, Holloman M, et al. Unique and Shared Epigenetic Programs of the CREBBP and EP300 Acetyltransferases in Germinal Center B Cells Reveal Targetable Dependencies in Lymphoma. Immunity. 2019;51:535–547.e9.
Wei W, Song Z, Chiba M, Wu W, Jeong S, Zhang J-P, et al. Analysis and therapeutic targeting of the EP300 and CREBBP acetyltransferases in anaplastic large cell lymphoma and Hodgkin lymphoma. Leukemia. 2023;37:396–407.
Chen T, Chen X, Zhang S, Zhu J, Tang B, Wang A, et al. The Genome Sequence Archive Family: Toward Explosive Data Growth and Diverse Data Types. Genomics Proteom Bioinforma. 2021;19:578–83.
CNCB-NGDC Members and Partners. Database Resources of the National Genomics Data Center, China National Center for Bioinformation in 2022. Nucleic Acids Res. 2022;50:D27–D38.
Acknowledgements
We are deeply grateful to Prof. Xingfu Wang (Department of Pathology, The First Affiliated Hospital of Fujian Medical University, Fuzhou, China) and Prof. Gang Chen (Department of Pathology, Fujian Cancer Hospital, Fujian Medical University Cancer Hospital, Fuzhou, China) for generously sharing clinical and sequencing data and for their invaluable support of this study.
Author information
Authors and Affiliations
Contributions
SJ L, CX L, DH L, WJ C, ZG X, and Y M conceived and designed the project. SJ L, DH L, JZ C, J R, G C, XF W, and ZG X collected the clinical samples. SJ L, JN W, JZ C, and Y S analyzed the WES data and performed bioinformatic analyses. SJ L, DH L, J R, G C, XF W, and JN W integrated the sequencing data, drew the display items. SJ L, DH L, and CX L wrote the manuscript. WJ C, CX L and Y M oversaw the ethical guidelines and data regulation. WJ C and Y M supervised the project. All of the authors contributed to the final version of the paper.
Corresponding authors
Ethics declarations
Ethics approval and consent to participate
This study was conducted with the approval of the Institutional Review Boards at Huashan Hospital of Fudan University (Approval No. 2022-529), Shanghai Cancer Center of Fudan University (Approval No. 1612167-18), and Renji Hospital of Shanghai Jiao Tong University (Approval No. LY2024-112-C). All participants provided written informed consent before participating in the clinical study and before the collection of tumor tissues. The research was carried out in strict adherence to the ethical principles of the Declaration of Helsinki and conformed to international norms of good clinical practice.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Li, S., Li, D., Xia, Z. et al. Genomic landscape and molecular subtypes of primary central nervous system lymphoma. Blood Cancer J. 16, 8 (2026). https://doi.org/10.1038/s41408-025-01421-7
Received:
Revised:
Accepted:
Published:
Version of record:
DOI: https://doi.org/10.1038/s41408-025-01421-7








