Fig. 1: Identification of coding mutations and structural variants.

a, Methodology used for the discovery of candidate coding drivers. Discovery method 1A selected genes with an FDR below the significance threshold for two out of four algorithms. Discovery method 1B combined the P values of the four algorithms using the weighted Stouffer and weighted harmonic mean methods; genes with an FDR below the significance threshold for at least one of these combined results were selected. With discovery method 2, CNAs were used to define minimally affected regions (by copy number loss or gain). Genes within these genomic regions were then selected as candidate drivers if they presented at least five SNVs/indels impacting the coding sequence, focality and recurrence scores greater than the threshold, and a mechanism of action consistent with the type of CNA (loss for TSGs and gain for oncogenes). An additional list, not considered as candidate drivers, included genes fulfilling all requirements except the SNV/indel count threshold (permissive list; see Methods for more details). b, Number of SNVs/indels (left axis) and proportion in the cohort (right axis) of the 58 candidate drivers. Other CLL cohorts used as comparators are described in Supplementary Table 5 and Methods. c, The 76 regions recurrently affected by CNAs. The y axis is shown on a log10 scale. Known CLL drivers are indicated in blue and putative driver genes identified as hotspots are indicated in yellow. d, Candidate drivers found by integrating both CNAs and SNVs/indels. The score represents the combined focality and recurrence scores derived from MutComFocal, integrating SNV/indel data with CNA data (Discovery method 2; Methods); NS, not significant. Known CLL drivers are indicated in blue and putative driver genes identified as hotspots are indicated in yellow. e, All translocation breakpoint pairs found in more than three samples (out of 495), including those occurring in coding and noncoding regions.
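As an illustration of the P value combination described for discovery method 1B in panel a, the following is a minimal Python sketch, not the authors' pipeline. It assumes one P value per algorithm per gene, a standard weighted Stouffer Z-score, an uncalibrated weighted harmonic mean, and Benjamini-Hochberg correction as the FDR procedure. The function names, the weights, the gene labels and the `fdr_threshold=0.1` default are all hypothetical placeholders; the caption does not specify them.

```python
from math import sqrt
from statistics import NormalDist

_NORM = NormalDist()  # standard normal distribution


def _clip(p, lo=1e-300, hi=1 - 1e-12):
    """Keep P values strictly inside (0, 1) so the inverse CDF is defined."""
    return min(max(p, lo), hi)


def weighted_stouffer(pvalues, weights):
    """Combine one-sided P values with the weighted Stouffer Z-score method."""
    z = [-_NORM.inv_cdf(_clip(p)) for p in pvalues]  # small P -> large positive Z
    combined_z = sum(w * zi for w, zi in zip(weights, z)) / sqrt(
        sum(w * w for w in weights)
    )
    return _NORM.cdf(-combined_z)  # upper-tail probability of the combined Z


def weighted_harmonic_mean(pvalues, weights):
    """Raw weighted harmonic mean of P values (no asymptotic calibration)."""
    return sum(weights) / sum(w / _clip(p) for w, p in zip(weights, pvalues))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        qvalues[i] = running_min
    return qvalues


def select_genes(per_gene_pvalues, weights, fdr_threshold=0.1):
    """Keep genes whose FDR passes for at least one of the two combined results.

    per_gene_pvalues maps a gene label to its list of per-algorithm P values.
    fdr_threshold is a placeholder, not a value taken from the paper.
    """
    genes = list(per_gene_pvalues)
    q_stouffer = benjamini_hochberg(
        [weighted_stouffer(per_gene_pvalues[g], weights) for g in genes]
    )
    q_harmonic = benjamini_hochberg(
        [weighted_harmonic_mean(per_gene_pvalues[g], weights) for g in genes]
    )
    return [
        g
        for g, qs, qh in zip(genes, q_stouffer, q_harmonic)
        if min(qs, qh) < fdr_threshold
    ]
```

Calling `select_genes({"GENE_A": [1e-6, 1e-5, 1e-4, 1e-6], "GENE_B": [0.5, 0.6, 0.4, 0.7]}, [1, 1, 1, 1])` with these made-up inputs keeps only `GENE_A`. Taking the minimum of the two adjusted values mirrors the "at least one result" rule in the caption; a stricter design would require both to pass.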