Extended Data Fig. 3: Identification of MM+ oncogenes by a permutation test across cancers. | Nature


From: Landscape and function of multiple mutations within individual oncogenes


a, Number of mutated samples and frequency of MM+ samples for 30 oncogenes (as in Fig. 1a) after excluding microsatellite instability (MSI)-high or POLE/POLD1-mutated samples in the discovery cohort. b, Frequency of MM+ samples in PIK3CA, EGFR, CTNNB1, ERBB2 and ERBB3 according to tumour purity in primary samples from the TCGA cohort (n = 8,699). c, Number of mutated samples and frequency of MM+ samples for 30 oncogenes (as in Fig. 1a) in the TCGA, TARGET, HM, FM and GENIE cohorts. d, Representation of the permutation-based framework. In the standard approach, to identify genes significantly affected by driver mutations, the expected number of samples with mutations in gene X (the gene of interest; green) is estimated by permuting all coding mutations randomly across the coding region in all samples (for example, samples A–F). Statistical significance is determined by comparing the observed number of samples with nonsynonymous mutations in gene X against the expected distribution. In our approach, to identify genes significantly affected by putative driver–driver MMs, the expected number of samples with MMs in gene X (red) is estimated by permuting all coding mutations other than gene X mutations in samples harbouring gene X mutations (samples C–F). Statistical significance is determined by comparing the observed number of samples with MMs in gene X against the expected distribution. e, In the random-choice model, mutations are moved to another position with equal probability (P), whereas in the weighted-choice model, mutations are moved with unequal probability (P´), reflecting expression and DNA replication time. f, Synonymous mutation frequency per megabase according to expression and DNA replication time. g, Pathways related to 60 oncogenes analysed here. MM+ oncogenes identified in pan-cancer and cancer type-specific analyses are indicated in red (q < 0.001)/pink (q < 0.01) and green, respectively. h, Proportion of synonymous to total mutations according to MM status in TSGs and NFGs. i, Proportion of synonymous to total mutations according to MM status in MM oncogenes under high and low selective pressure (that is, oncogenes in which the proportion of synonymous to total mutations in samples with single mutations is less than 15% and more than 15%, respectively). The proportion of synonymous mutations was substantially increased in MM+ samples, even in MM oncogenes under high selective pressure. b, h, i, Two-sided Fisher’s exact test. The numbers examined are shown in parentheses.
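
The permutation logic of panels d and e can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the per-mutation landing probability, the use of a single relative rate to stand in for the expression and replication-time weighting, and the add-one empirical P value are all assumptions made for illustration. Each sample that already carries a gene X mutation has its other coding mutations re-placed at random, and it counts as an expected MM sample if at least one of them lands in gene X.

```python
import random

def landing_probability(gene_len, total_len, rate_gene=1.0, rate_mean=1.0):
    """Chance that one permuted mutation lands in gene X (hypothetical helper).

    Random-choice model: leave both rates at 1.0, so the chance is length-proportional.
    Weighted-choice model: pass a relative background rate for gene X (assumed here to
    summarise expression and replication time) and the exome-wide mean rate.
    """
    return gene_len * rate_gene / (total_len * rate_mean)

def expected_mm_counts(other_mutation_counts, p_land, n_perm=1000, seed=0):
    """Null distribution of the number of samples with MMs in gene X.

    other_mutation_counts: for each sample harbouring a gene X mutation, the number
    of its coding mutations outside gene X (these are the ones that get permuted).
    """
    rng = random.Random(seed)
    null_counts = []
    for _ in range(n_perm):
        mm_samples = 0
        for n_other in other_mutation_counts:
            # The sample becomes MM+ if any permuted mutation falls in gene X.
            if any(rng.random() < p_land for _ in range(n_other)):
                mm_samples += 1
        null_counts.append(mm_samples)
    return null_counts

def empirical_p(observed, null_counts):
    """One-sided empirical P value with an add-one correction (an assumption)."""
    hits = sum(1 for c in null_counts if c >= observed)
    return (hits + 1) / (len(null_counts) + 1)

if __name__ == "__main__":
    # Toy inputs, not data from the study.
    p = landing_probability(gene_len=3_000, total_len=30_000_000)
    null = expected_mm_counts([50, 120, 10, 400], p_land=p, n_perm=200)
    print(empirical_p(observed=2, null_counts=null))
```

Switching from the random-choice to the weighted-choice model only changes the landing probability passed in, which keeps the counting step identical for both null models.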
