Fig. 5: Frequency of rare putative loss-of-function (pLoF) variants across cohorts in comparison to individuals from gnomAD.
From: The impact of rare germline variants on human somatic mutation processes

Frequencies of rare pLoF variants among individuals (y-axis) of the discovery cohort (TCGA-WES; n = 6799 individuals) and the validation cohort (PCAWG + Hartwig-WGS; n = 4683 individuals) (rows), shown for different variant sets (columns) and gene sets (x-axis). The known deficient homologous recombination (dHR) gene set includes BRCA1, BRCA2, PALB2, and RAD51C; the known deficient DNA mismatch repair (dMMR) gene set includes MSH2, MSH6, MLH1, and PMS2; the replicated 1% false discovery rate (FDR) set includes genes replicating at an FDR of 1% after excluding known dMMR and dHR genes; and the replicated 2% FDR only set includes all remaining genes that replicated at an FDR of 2%. The pLoF variant sets are protein-truncating variants (PTVs) only, and PTVs plus missense variants classified as damaging by the in silico prediction tool CADD (thresholds of ≥ 25 or ≥ 15). Colors show the frequency of individuals carrying rare pLoF variants for the gene sets in the cancer genomic datasets (red), for matching variants in control samples of non-Finnish European ancestry from the gnomAD dataset (blue), and for length-matched, randomly selected protein-coding gene sets in the cancer datasets (yellow). Random selection of length-matched genes was performed 10 times, and the resulting distribution is shown as a boxplot. The center line of each boxplot shows the median, the bounds of the box are the 25th and 75th percentiles, and the whiskers extend to the smallest and largest values, excluding values more than 1.5 times the interquartile range from the hinges. Only rare pLoF variants that were found in gnomAD were considered. Data are provided as a Source Data file.
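
The boxplot convention described in the legend can be illustrated with a minimal sketch. It is not the authors' plotting code. The function name `boxplot_summary` is hypothetical, and the quartile interpolation used here (Python's "inclusive" method) is an assumption, since the legend does not state which quantile definition was applied.

```python
import statistics


def boxplot_summary(values):
    """Summarise values using the 1.5 x IQR whisker rule from the legend.

    Hypothetical helper for illustration only; the quartile method is
    an assumption, not taken from the source.
    """
    xs = sorted(values)
    # Quartiles: 25th percentile, median, 75th percentile.
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme values still inside the fences.
    inside = [x for x in xs if low_fence <= x <= high_fence]
    outside = [x for x in xs if x < low_fence or x > high_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "excluded": outside,
    }
```

Applied to the 10 frequencies obtained from the 10 random length-matched gene sets, such a function would return the median, the box bounds, and the whisker ends, with any value lying more than 1.5 times the interquartile range beyond the box reported separately.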