Fig. 3: In vivo saturation mutagenesis in oral epithelium.

a, Mutation bar plot for TP53. The x axis represents coordinates along the coding sequence. Exons and protein domains are indicated along the x axis. The y axis represents the number of mutations, either in the 1,042 TwinsUK oral epithelium samples used in this study (top) or in squamous cell carcinoma from the COSMIC database (bottom). Mutations are coloured according to mutation consequence category. Grey shading indicates cumulative duplex coverage across TwinsUK buccal swab samples. b, Numbers of mutations per gene found in this study and in the COSMIC catalogue (obtained either from all whole-genome sequencing (WGS) and whole-exome sequencing (WES) studies or from only squamous cell carcinoma (SCC) WGS and WES studies), for a selection of driver genes. c, Mutation bar plots for NOTCH1, PPM1D, TP63 and RAC1. Elements are as indicated in a; COSMIC mutations are not shown. d, Diagrams of the three-dimensional structure of RAC1, showing the clustering of sites under significant positive selection around the GDP/GTP binding pocket. Residues with site-level dN/dS q < 0.01 are coloured. Shading intensity denotes the degree of significance. e, dN/dS ratios for driver sites under significant positive selection based on the withingenednds method. Driver sites are classified into six groups according to mutation consequence. Labels in grey indicate genes not identified as significant by gene-level dN/dS analyses. f, Mutation bar plot for TP53, including all mutations (top) and synonymous or non-coding mutations only (bottom). The x axis represents genomic coordinates along the gene body, with coding exons (red) and untranslated regions (UTRs) (blue) indicated by the gene diagram on top and the shading within each histogram. The grey line denotes cumulative duplex coverage across TwinsUK buccal swab samples. Coding mutation counts are coloured according to mutation consequence as indicated in a. TSS, transcription start site.