Fig. 3: Paralog groups with low within-group diversity.
From: Genome-wide profiling of highly similar paralogous genes using HiFi sequencing

a Haplotypes of the AMY1 paralog group in one sample, realigned to AMY1A, showing two copies each of AMY1A, AMY1B and AMY1C. Reads in blue are consistent with a single haplotype. Reads in gray are consistent with more than one haplotype, which occurs where two or more haplotypes are identical over a region. The ends of the haplotypes extend into downstream non-homologous regions, which allows each haplotype to be assigned to one of the three genes. b–d PCA of haplotype sequences of AMY1A/AMY1B/AMY1C (b), BOLA2-SLX1B-SULT1A4/BOLA2B-SLX1A-SULT1A3 (three paralog groups in tandem, genotyped as one region by Paraphase) (c) and CTAG1A/CTAG1B (d). Each dot represents one haplotype in the population. Colors indicate the gene within the paralog group to which each haplotype is assigned, based on the haplotype's ending sequence, which extends into non-homologous regions. e Sequence divergence between haplotypes in cis vs. trans in three palindromic paralog groups. In each boxplot, the center line denotes the median; the box extends from the 25th to the 75th percentile; the whiskers extend from the box to the minimum (maximum) value within 1.5 times the interquartile range below (above) the 25th (75th) percentile; dots denote outliers. Each paralog group is labeled by one of its genes: CENPVL1 for CENPVL1/CENPVL2 (cis n = 93, trans n = 80), SSX2 for SSX2/SSX2B (cis n = 117, trans n = 163) and SSX4 for SSX4/SSX4B (cis n = 275, trans n = 308).
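The whisker rule described for panel e is the standard Tukey 1.5 × IQR convention. The following is a minimal Python sketch of that rule, not the authors' plotting code. It assumes quartiles are computed by linear interpolation (`method="inclusive"` in `statistics.quantiles`, which matches numpy's default); the caption does not state which percentile method was used. The function name `tukey_boxplot_stats` is hypothetical. The input can be any list of values, for example the cis or trans divergence values for one paralog group.

```python
from statistics import quantiles


def tukey_boxplot_stats(values, whis=1.5):
    """Summarize values as in a Tukey boxplot: box from Q1 to Q3,
    whiskers to the most extreme data points within whis * IQR of
    the box, and every point beyond the fences reported as an outlier."""
    # Assumption: linear-interpolation quartiles ("inclusive" method).
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whis * iqr
    high_fence = q3 + whis * iqr
    # Whiskers stop at the most extreme observed values inside the fences.
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


stats = tukey_boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(stats)
```

For the values 1 through 9 plus 100, the quartiles are 3.25 and 7.75, so the fences fall at -3.5 and 14.5. The whiskers therefore end at 1 and 9, and 100 is reported as the only outlier. The same convention is the default (`whis=1.5`) in matplotlib's `boxplot`.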