Fig. 4: Association of duplications arising in tsk-4 mutants with genomic features.

A Diagram of overlap quantification between genomic regions and the duplications. Created in BioRender. Thomson, G. (2025) https://BioRender.com/poypy51. B Overlap of 481 independent duplications on the genome (red), 10,000 random simulations (gray), and their average (green). C Overlap of duplications with chromatin states45; constitutive heterochromatin (H1-6), facultative heterochromatin (F1-6), intergenic (I1-3), and euchromatin (E1-11). Blue arrows indicate observed overlap is in the bottom 2.5% of shuffled datasets. Line colors same as (B). D Diagram of the duplication breakends (+/−50 bp). Created in BioRender. Thomson, G. (2025) https://BioRender.com/poypy51. E Mean GC content of observed duplication breakends (red line) relative to the means of 10,000 random simulations (gray histogram). F Intersection of CNV breakends with relative timing of DNA replication during S phase44. E: Early S; EM: Early-Mid S; M: Mid S; ML: Mid-Late S; L: Late S. Line colors same as (B). Yellow and blue arrows indicate observed overlap is in the top or bottom 2.5% of shuffled datasets, respectively. G Intersection of duplication breakends with annotated protein coding genes97, or transposons122. Colors same as (B). H Mean number of breakend overlaps with S phase expressed genes116 in observed duplication breakends (red line) relative to the means of 10,000 random simulations (gray histogram). I Diagram of the border regions (+/−20 kb) around the duplication breakends. Created in BioRender. Thomson, G. (2025) https://BioRender.com/poypy51. J–N Mean intersection of genomic features with the duplication border regions. Observed levels (red lines) are plotted relative to 10,000 random simulations (gray histograms). These features are (J) tandem duplications in Arabidopsis accessions50,123, (K) the amount of annotated short tandem repeat sequence, (L) transposon sequence122, (M) chromatin boundaries63, and (N) T-DNA insertions67.
In all graphs, the simulation mean histograms are scaled from zero to one. Pr is the probability (0 to 1) of obtaining a simulated set of duplications having a lower value relative to the observed. Source data are provided as a Source Data file.
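As a minimal illustration of how a value like Pr could be computed, the sketch below takes one observed statistic and a list of simulated statistics and returns the fraction of simulations that fall below the observed value. The function names `empirical_pr` and `tail_flag`, the strict "lower than" comparison, and the 2.5% cutoff used for flagging are assumptions made for illustration; this is not the authors' analysis code.

```python
from typing import Sequence


def empirical_pr(observed: float, simulated: Sequence[float]) -> float:
    """Fraction of simulated values strictly lower than the observed value.

    Assumption: ties are not counted as "lower". The caption does not say
    how ties were handled.
    """
    if not simulated:
        raise ValueError("at least one simulated value is required")
    lower = sum(1 for value in simulated if value < observed)
    return lower / len(simulated)


def tail_flag(pr: float, alpha: float = 0.025) -> str:
    """Classify the observed value as lying in the bottom or top tail.

    Assumption: "bottom 2.5%" corresponds to pr <= alpha and "top 2.5%"
    to pr >= 1 - alpha.
    """
    if pr <= alpha:
        return "bottom"
    if pr >= 1 - alpha:
        return "top"
    return "none"


if __name__ == "__main__":
    # Hypothetical toy numbers, not data from the study.
    simulated_means = [1, 2, 3, 4, 6, 7, 8, 9, 10, 11]
    pr = empirical_pr(5, simulated_means)
    print(pr, tail_flag(pr))
```

Under these assumptions, a Pr near 0 means that almost no simulated set fell below the observed value, and a Pr near 1 means that almost all of them did.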