Fig. 2: Biological features of coding mutations and CNAs.

a, Annotations of genes. CNAs, presence/absence of CNAs affecting the gene; COSMIC, proportion of variants reported in the COSMIC database; High impact, proportion of nonsense variants; Median CCF, median cancer cell fraction of variants; Prot domain, proportion of variants occurring in a protein domain from the Prot2HG database40. b, Distribution of cancer cell fractions in selected recurrent regions of CNAs (all regions are shown in Extended Data Fig. 3e). The boxplot shows the minimum and maximum values and the interquartile range. c, Candidate drivers classified in ten main pathways described in CLL3,7. Genes in bold are present in more than 3% and genes in red font are candidate drivers. Other drivers are not shown because they are not involved in these ten main pathways. d, Detection of variants of interest (n = 118) by RNA-seq (with a minimum depth of five) in 73 selected samples. The difference in variant allele frequency (VAF) between RNA-seq and WGS shows the allelic skew of variants. The ratio of expression, in transcripts per million (TPM), in the sample carrying the variant to that in all other samples reflects the change in gene expression. Selected variants are annotated with gene names; all data are in Supplementary Table 13. DP, depth. e, Enrichment of genomic features in different CLL subgroups using a two-sided Fisher’s exact test (plot showing the median, minimum and maximum values). The groups were (1) stage: relapsed/refractory (R/R) versus frontline (N = 443 frontline versus 30 R/R), (2) TP53: altered versus WT (N = 420 WT versus 65 disrupted), (3) IGHV mutational status: unmutated versus hypermutated (N = 197 hypermutated versus 288 unmutated), where an enrichment for the former is indicated by an odds ratio greater than one. Adjusted P values (FDR) are shown.
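The enrichment test in panel e can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the feature names and carrier counts are hypothetical, only the group sizes (288 unmutated, 197 hypermutated) come from the legend, and the Benjamini-Hochberg procedure is an assumption, since the legend says only that P values were FDR-adjusted. The odds ratio is the simple sample odds ratio, oriented so that a value greater than one indicates enrichment in the first-named group.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]].

    Rows are groups, columns are feature present / absent.
    Returns (sample odds ratio, two-sided P value).
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x feature-positive cases in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum over all tables at least as unlikely as the observed one.
    p_value = sum(p for p in map(prob, range(lo, hi + 1))
                  if p <= p_obs * (1 + 1e-7))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(1.0, p_value)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Group sizes from the legend (IGHV unmutated versus hypermutated).
n_unmutated, n_hypermutated = 288, 197

# Hypothetical carrier counts per feature: (unmutated, hypermutated).
hypothetical_carriers = {
    "feature_A": (60, 10),
    "feature_B": (12, 9),
    "feature_C": (5, 20),
}

results = {
    name: fisher_exact_two_sided(k_u, n_unmutated - k_u,
                                 k_h, n_hypermutated - k_h)
    for name, (k_u, k_h) in hypothetical_carriers.items()
}
fdr = benjamini_hochberg([p for _, p in results.values()])

for (name, (odds_ratio, p)), q in zip(results.items(), fdr):
    print(f"{name}: OR={odds_ratio:.2f}, P={p:.3g}, FDR={q:.3g}")
```

With this orientation, a feature such as the hypothetical `feature_A` yields an odds ratio above one (enriched in unmutated cases), whereas `feature_C` yields an odds ratio below one.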