Extended Data Fig. 9: Calculations of natural saturation mutagenesis. | Nature

From: Sex and smoking bias in the selection of somatic mutations in human bladder

a) Comparison of the density of protein-affecting mutations in 14 genes across two cohorts of bladder tumors (muscle-invasive and non-muscle-invasive) and in the normal urothelium of the 45 donors. Mutation density in tumors is calculated by dividing the number of observed mutations (normally 1) by the gene length in megabases (Mb).

b) Percentage of amino acid residues in each gene with zero, one, two, or three or more mutations observed across 892 bladder tumor samples from the intOGen cohort. The order of the genes is as in Fig. 5a to facilitate visual comparison.

c) Theoretical and observed curves of saturation mutagenesis for genes not shown in Fig. 5b. The grey dashed line represents the kinetics of saturation mutagenesis under the theoretical assumption of no selection, in which mutations are observed based only on their neutral probability of occurrence. The red circle denotes the degree of saturation achieved by probing the 79 samples in the cohort. The red dashed line is constructed through successive depth down-samples of the current observation and represents the observed kinetics of natural saturation mutagenesis (see details in Supplementary Note 12).

d) Natural saturation mutagenesis of EP300 in normal bladder urothelium. In addition to the tracks described in Fig. 5c for TP53, the following have been added: 3D clusters obtained via Oncodrive3D (second), dN/dS truncating and dN/dS missense values for each exon (fourth), and the distribution of tumor mutations (from intOGen; see Methods) along the sequence of the gene (last). The same types of plots are presented for the rest of the genes in the study in Supplementary Figs. 2 and 3. Right plot, EP300 3D structure with residues under significant site selection highlighted in blue.

e) dN/dS truncating and dN/dS missense values for each domain of EP300. The vertical lines represent the 95% confidence intervals of the dN/dS estimate. A solid border indicates significant dN/dS values (p-value < 0.05) according to Omega (Supplementary Note 6). N = 79 samples.
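The two calculations named in panels a and c can be illustrated with a minimal Python sketch. It is not the authors' method, which is described in Supplementary Note 12. The function names and the independence assumption (each mutation lands on a site with a fixed neutral probability, independently of the others) are hypothetical simplifications introduced here for illustration only.

```python
from typing import Sequence


def mutation_density(n_mutations: int, gene_length_bp: int) -> float:
    """Mutations per megabase: observed count divided by gene length in Mb."""
    return n_mutations * 1_000_000 / gene_length_bp


def expected_neutral_saturation(site_weights: Sequence[float], n_mutations: int) -> float:
    """Expected fraction of sites hit at least once after n_mutations draws.

    Assumption (not from the source): each mutation falls on site i with
    probability proportional to site_weights[i], independently of the others.
    """
    total = sum(site_weights)
    probs = [w / total for w in site_weights]  # normalize to probabilities
    # A site with per-draw probability p is still unmutated after n draws
    # with probability (1 - p) ** n, so it is hit with 1 - (1 - p) ** n.
    hit = sum(1 - (1 - p) ** n_mutations for p in probs)
    return hit / len(probs)


if __name__ == "__main__":
    # One mutation observed in a hypothetical 2,000 bp gene.
    print(mutation_density(1, 2000))
    # Hypothetical neutral weights for four sites; saturation grows with draws.
    weights = [0.4, 0.3, 0.2, 0.1]
    for n in (1, 5, 20, 100):
        print(n, round(expected_neutral_saturation(weights, n), 3))
```

Under these assumptions the expected curve rises quickly while highly mutable sites fill up, then flattens as only low-probability sites remain, which is the qualitative shape a no-selection reference curve would have.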
