Extended Data Fig. 8: TOP1-mediated mutagenesis causes increased 2–5 bp deletions in cancer.
From: Signatures of TOP1 transcription-associated mutagenesis in cancer and germline

a, Of all indels, only 2–5 bp deletions are significantly increased in CLL with biallelic RNASEH2B loss. Box, 25–75%; line, median; whiskers, 5–95%, with data points for values outside this range. WT (2 copies), n = 201; monoallelic loss (1 copy), n = 131; biallelic loss (0 copies), n = 16 independent tumours. Indels as percentage of all variants per sample (GEL and ICGC data combined). q-values, two-sided Mann–Whitney test with 5% FDR. b, c, In RNase H2 null CLL, 2 bp deletions predominantly occur at STR and SNMH sequences (b), and at the TNT sequence motif (c), consistent with TOP1-mediated mutagenesis. Mean ± s.e.m., percentage of all variants per sample. GEL and ICGC data combined. n = 1,711; 1,244; 443 2 bp indels identified in 201, 131, 16 biologically independent tumours, respectively. d, ID4 contribution in RNase H2 null CLL is greater in transcribed regions. Two-sided Fisher’s exact test, ID4 indels vs other indels (P = 9.2 × 10⁻¹⁶). e, Pan-cancer transcript expression data divided into ten expression strata for ubiquitously expressed genes (used in panel h and Fig. 5b analysis). Data points, median/maximum expression across cancer types for individual genes. Genes with similar median and maximum TPMs were considered to be ubiquitously expressed and divided into expression groups from low (1) to high (10) expression. f, 2 bp deletions in cancer preferentially occur at STRs. g, ID-TOP1 deletions increase in frequency with TOP1 cleavage activity (measured by TOP1-seq; ref. 38). Dotted line, relative rate in lowest TOP1-seq category set to 1. Solid lines, relative deletion rate. ID-TOP1, 2–5 bp MH and SSTR deletions containing the TN*T sequence motif. h, ID-TOP1 deletions, but not deletions in other sequence contexts, correlate with transcription. i, 2–5 bp deletions from prostate adenocarcinoma are most enriched amongst the top 10% of highly expressed prostate ‘tissue-restricted’ genes.
Odds ratio (OR): number of 2–5 bp deletions in the top 10% of tissue-restricted genes vs 2–5 bp deletions in other genes, relative to expected frequency from all other tissues. j, ID4 is not detected in the indel signature of irinotecan-treated colorectal cancers. Untreated (n = 78), treated (n = 39). k, 2–5 bp deletion frequency in cancer corresponds to TOP1 cleavage activity, in both genic and non-genic regions. Data analysed from PCAWG (ref. 50), all tumours in e, h; ID4-positive tumours in g, k; Genomics England in j. In g, h and k, solid line, relative deletion rate; shading indicates 95% confidence intervals from 1,000 (g, k) or 100 (h) bootstrap replicates.
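
For readers who want to see how a relative deletion rate with a bootstrap confidence interval of the kind shown in g, h and k might be computed, the following is a minimal sketch and not the authors' pipeline. It assumes per-tumour deletion counts in each TOP1-seq (or expression) category, assumes tumours are the resampling unit, and uses percentile intervals; the legend specifies none of these choices. The function names, the `counts` and `sizes` inputs and the toy numbers are all hypothetical.

```python
import random

def relative_rates(counts, sizes):
    """Pooled deletion rate per category, scaled so the first (lowest) category is 1.

    counts: one row per tumour, one column per category (hypothetical layout).
    sizes:  total sequence length (bp) of each category.
    """
    totals = [sum(row[j] for row in counts) for j in range(len(sizes))]
    rates = [t / s for t, s in zip(totals, sizes)]
    return [r / rates[0] for r in rates]

def bootstrap_ci(counts, sizes, n_boot=1000, alpha=0.05, seed=0):
    """Percentile confidence intervals, resampling tumours with replacement (assumed unit)."""
    rng = random.Random(seed)
    n = len(counts)
    reps = []
    while len(reps) < n_boot:
        sample = [counts[rng.randrange(n)] for _ in range(n)]
        if sum(row[0] for row in sample) == 0:
            continue  # reference rate undefined for this resample; draw again
        reps.append(relative_rates(sample, sizes))
    lo_idx = max(0, round(alpha / 2 * n_boot))
    hi_idx = min(n_boot - 1, round((1 - alpha / 2) * n_boot) - 1)
    cis = []
    for j in range(len(sizes)):
        vals = sorted(rep[j] for rep in reps)
        cis.append((vals[lo_idx], vals[hi_idx]))
    return cis

# Toy data: 4 tumours x 3 categories of equal size (invented numbers).
counts = [[2, 4, 8], [1, 3, 5], [3, 5, 9], [2, 4, 10]]
sizes = [1_000_000, 1_000_000, 1_000_000]

point = relative_rates(counts, sizes)          # solid line in the plots
cis = bootstrap_ci(counts, sizes, n_boot=200)  # shaded band in the plots
for rate, (lo, hi) in zip(point, cis):
    print(f"{rate:.2f} ({lo:.2f}-{hi:.2f})")
```

Because every replicate is rescaled to its own lowest category, the interval for that reference category collapses to exactly 1, which matches the dotted reference line described for g.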