Extended Data Fig. 9: Human germline de novo indels are enriched for ID-TOP1 deletions. | Nature

From: Signatures of TOP1 transcription-associated mutagenesis in cancer and germline

a, Most de novo 2 bp deletions occur at SSTR, STR and SNMH sequences. b, c, A TNT sequence motif is present at the majority of 2 bp STR and SNMH deletions (b). Sequence logos: 2-bit representation of the sequence context of 2 bp deletions. Top, all deletions, with those containing A (except AT/TA) reverse complemented, and deletions right-aligned on T (where present). Bottom, STR/SNMH deletions only (c). d, TN*T motifs extend beyond 2 bp deletions, with enrichment above expectation for 2 bp deletions at TNT, 3 bp deletions at TNNT and 4 bp deletions at TNNNT motifs (P < 0.001; two-tailed empirical P-value determined for each category). Expected frequencies of sequences matching TN*T motifs were derived by bootstrap sampling (n = 1,000) of 2, 3 and 4 bp STR/MH sequences genome-wide. Sampling was performed to match the numbers of deletions at repeats observed in the Gene4Denovo database for each category defined by repeat type, repeat unit length and total repeat length. Histograms, distribution of the number of repeats matching TN*T motifs over these samplings. Solid blue lines, kernel density estimates for these distributions. Dotted red lines, number of deletions observed in Gene4Denovo matching TN*T motifs for each category. e, ID-TOP1 correlates with germline expression level. ID-TOP1 is defined as 2–5 bp MH and SSTR deletions containing the TN*T sequence motif. Shading, 95% confidence intervals from 100 bootstrap replicates.
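
The comparison in panel d can be illustrated with a minimal Python sketch. It is not the authors' code, and it rests on several assumptions that the legend does not confirm: that a k bp deletion is scored against a sequence context of k + 1 bases of the form T, k - 1 arbitrary bases, T (TNT for 2 bp, TNNT for 3 bp, TNNNT for 4 bp); that the two-tailed empirical P-value is twice the smaller tail fraction with no pseudocount; and that a single category is sampled with replacement. The function names (matches_tnt, simulate_counts, empirical_two_tailed_p) and the background list are hypothetical.

```python
import random
import re
from typing import List, Sequence


def matches_tnt(context: str, deletion_length: int) -> bool:
    """True if `context` is T, then (deletion_length - 1) bases, then T.

    Assumption: the motif spans deletion_length + 1 bases, as suggested by
    the legend's pairing of 2 bp with TNT, 3 bp with TNNT, 4 bp with TNNNT.
    """
    if deletion_length < 2:
        raise ValueError("deletion_length must be at least 2")
    pattern = f"T[ACGT]{{{deletion_length - 1}}}T"
    return re.fullmatch(pattern, context.upper()) is not None


def simulate_counts(
    background: Sequence[str],
    n_draws: int,
    deletion_length: int,
    n_samples: int = 1000,
    seed: int = 0,
) -> List[int]:
    """Draw `n_draws` contexts with replacement, `n_samples` times, and
    count how many in each draw match the motif.

    `background` stands in for genome-wide repeat contexts of one category
    (hypothetical input; the legend matches several categories at once).
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_samples):
        sample = rng.choices(background, k=n_draws)
        counts.append(sum(matches_tnt(c, deletion_length) for c in sample))
    return counts


def empirical_two_tailed_p(observed: int, simulated: Sequence[int]) -> float:
    """Two-tailed empirical P-value: twice the smaller tail fraction,
    capped at 1. One common convention, assumed here."""
    n = len(simulated)
    upper = sum(s >= observed for s in simulated) / n
    lower = sum(s <= observed for s in simulated) / n
    return min(1.0, 2 * min(upper, lower))
```

Under this convention, an observed count that lies beyond all 1,000 simulated counts gives an empirical P-value of 0, which would be reported as P < 0.001. A variant that adds a pseudocount to the numerator and denominator would give a slightly larger value.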
