Fig. 2: Candidate IGC events. | Nature


From: Increased mutation and gene conversion within human segmental duplications


a, Method to detect IGC. The assembled human haplotype query sequence from 1:1 syntenic alignments was fragmented into 1-kbp windows in 100-bp increments and realigned back to T2T-CHM13 v1.1 independent of the flanking sequence information using minimap2 v2.24 to identify each window’s single best alignment position. These alignments were compared to their original syntenic alignment positions, and if they were not overlapping, we considered them to be candidate IGC windows. Candidate IGC windows were then merged into larger intervals and realigned when windows were overlapping in both the donor and the acceptor sequence. We then used the CIGAR string to identify the number of matching and mismatching bases at the ‘donor’ site and compared that to the number of matching and mismatching bases at the acceptor site determined by the syntenic alignment to calculate the number of supporting SNVs.

b, The amount of SDs (in megabase pairs) predicted to be affected by IGC per haplotype, as a function of the minimum number of SNVs that support the IGC call. Dashed lines represent individual haplotypes and the solid line represents the average.

c, Empirical cumulative distribution of the megabase pairs of candidate IGC observed in HPRC haplotypes, as a function of the minimum underlying P-value threshold used to define the IGC callset (see Methods for IGC P-value calculation). Dashed lines represent individual haplotypes and the solid line represents the average.

d, Correlation between IGC length and the number of supporting SNVs.

e, Distribution of the distance between predicted IGC acceptor and donor sites for intrachromosomal events by chromosome.
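The windowing, overlap test and CIGAR counting described for panel a can be illustrated with a minimal Python sketch. It is not the authors' pipeline. All function names are hypothetical, and the sketch assumes the alignments carry extended CIGAR strings that distinguish matches (`=`) from mismatches (`X`). It also assumes that the number of supporting SNVs is the excess of mismatches at the acceptor site over those at the donor site; the caption says only that the two sites are compared.

```python
import re
from typing import Iterator, Tuple

WINDOW = 1000  # 1-kbp windows, as stated in the caption
STEP = 100     # 100-bp increments, as stated in the caption


def sliding_windows(seq_len: int, window: int = WINDOW,
                    step: int = STEP) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) coordinates of full-length windows."""
    for start in range(0, seq_len - window + 1, step):
        yield start, start + window


def intervals_overlap(a_start: int, a_end: int,
                      b_start: int, b_end: int) -> bool:
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def is_candidate_window(best: Tuple[str, int, int],
                        syntenic: Tuple[str, int, int]) -> bool:
    """A window is a candidate when its best realignment position
    (chrom, start, end) does not overlap its syntenic position."""
    same_chrom = best[0] == syntenic[0]
    return not (same_chrom and
                intervals_overlap(best[1], best[2], syntenic[1], syntenic[2]))


CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def count_matches_mismatches(cigar: str) -> Tuple[int, int]:
    """Count matching (=) and mismatching (X) bases in an extended CIGAR.
    Assumption: the aligner emitted =/X operations rather than M."""
    matches = mismatches = 0
    for length, op in CIGAR_OP.findall(cigar):
        if op == "=":
            matches += int(length)
        elif op == "X":
            mismatches += int(length)
    return matches, mismatches


def supporting_snvs(donor_cigar: str, acceptor_cigar: str) -> int:
    """Assumed definition: acceptor mismatches minus donor mismatches."""
    _, donor_mm = count_matches_mismatches(donor_cigar)
    _, acceptor_mm = count_matches_mismatches(acceptor_cigar)
    return acceptor_mm - donor_mm
```

For example, a window whose donor alignment is `1000=` and whose syntenic (acceptor) alignment is `990=10X` would, under the assumed definition, be supported by 10 SNVs.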
