Fig. 4: Significantly mutated noncoding REs.

a, Methodology to localize noncoding REs in CLL primary cells. These REs were defined across the whole genome based on chromatin state data from CLL primary cells. We intersected H3K27ac peaks and open chromatin regions defined by ATAC-seq (derived from 104 and 106 primary CLL samples, respectively)27. Next, these regions were annotated using genome-wide segmentations of seven CLL samples (five mutated and two unmutated IGHV cases) with available chromatin immunoprecipitation followed by sequencing (ChIP–seq) data for six histone marks: H3K4me3, H3K4me1, H3K27ac, H3K36me3, H3K27me3 and H3K9me3. As our annotations of noncoding variants were based on CLL samples from different cohorts, chromatin states defined by ChIP–seq were considered only for regions seen in at least two samples. Common regions based on shared overlaps were used to define these REs. REs active exclusively in samples with m-IGHV or exclusively in samples with u-IGHV mutational status were also defined. REs were linked to target genes by correlating RNA expression (gene) with H3K27ac signal (RE) (Pearson correlation cutoff of 0.3, FDR ≤ 0.05) within topologically associated domains of GM12878 defined by Hi-C30. For additional annotations and more details, see Methods. b, Candidate noncoding drivers, including UTRs, promoters and enhancers affected by SNVs/indels, were identified using several discovery algorithms, and regions with FDR below the significance threshold were selected. Single-site hotspots and regions with high mutational density/kataegis were also reported, and regions with FDR below the significance threshold were selected. Annotation and postfiltering of somatic noncoding hits included exclusion of immunoglobulin loci and known false positives, annotation of AID and APOBEC signatures, and additional genomic and functional annotations from the literature. c,d, Significantly mutated REs whose target genes are CLL drivers or are listed in the COSMIC database (c), or are other genes (d).
Upper panel, number of samples mutated; middle panel, proportion of variants with a signature attributed to AID, APOBEC or other processes; lower panel, FDR for the likelihood that these regions are mutated more frequently than expected. e, Gene set enrichment analysis of the target genes of all noncoding candidate drivers for the gene ontology biological process (GO:BP) and human phenotype ontology (HP) terms. We applied a hypergeometric test with multiple-testing correction of P values using the g:SCS algorithm41.
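
As an illustration of the hypergeometric test mentioned for panel e, the following minimal Python sketch computes an upper-tail enrichment P value for a single term. The function name, its parameters and the toy numbers are assumptions made for this example and are not taken from the study; the g:SCS multiple-testing correction is not implemented here.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_in_term, n_query, n_overlap):
    """Upper-tail hypergeometric P value, P(X >= n_overlap).

    Hypothetical helper for illustration only:
      n_background : number of genes in the background set
      n_in_term    : background genes annotated to the term
      n_query      : number of query genes (e.g. RE target genes)
      n_overlap    : query genes annotated to the term
    """
    upper = min(n_in_term, n_query)
    total = comb(n_background, n_query)
    # Sum the probability of every overlap at least as large as observed.
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_query - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy numbers (assumed): 20 background genes, 5 in the term,
# 4 query genes, 2 of which fall in the term.
p_value = hypergeom_enrichment_p(20, 5, 4, 2)
print(f"{p_value:.4f}")
```

In an actual analysis this raw P value would be computed per term and then adjusted for multiple testing; the sketch stops before that step.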