Fig. 4: Global population diversity in allele-specific copy number variation.

a, PCAs of allele-specific copy numbers based on the union of PA genotyping results and assembly annotations. b, Distribution of total autosomal gene copy numbers, including pseudogenes, among unrelated 1kGP samples (AFR, n = 685; admixed AMR, n = 352; EAS, n = 511; EUR, n = 522; SAS, n = 516). Boxes show the median and interquartile range; whiskers extend to 1.5× the interquartile range, and points beyond the whiskers are outliers. c, Population differentiation of duplications among continental populations, measured by F statistics. Genes with at least one paralogous subgroup whose F statistic exceeds 0.35 are labeled. d, Mean absolute variation in copy number and RPD in sequence. Using our genotyping results for unrelated 1kGP genomes, we selected genes whose copy number differs from the population median in more than 20 samples. For each such gene, we computed the mean absolute difference in aggreCN between individuals (MAE) and estimated the average sequence difference between paralogs relative to the difference between orthologs (RPD). e, mLD between pairs of CNV genes less than 100 kb apart. The x axis shows the larger MAE of the two genes in each pair. Total locus length is the distance from the start of the first gene to the end of the last gene.
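The panel d gene filter and MAE can be illustrated with a minimal sketch. It assumes that each gene is represented by a list of integer aggreCN values, one per unrelated individual, and that "mean absolute difference between individuals" means the mean |x_i − x_j| over all unordered pairs of individuals. The function names `is_cnv_gene` and `mean_abs_pairwise_difference` are hypothetical, and using `median_low` to define the population median is an assumption, not the authors' stated method.

```python
from statistics import median_low


def is_cnv_gene(copy_numbers, min_samples=20):
    """Return True if more than `min_samples` individuals differ from the population median.

    Assumption: median_low is used so the reference value is always an observed
    integer copy number; the caption does not specify how ties are handled.
    """
    med = median_low(copy_numbers)
    n_deviating = sum(1 for cn in copy_numbers if cn != med)
    return n_deviating > min_samples


def mean_abs_pairwise_difference(copy_numbers):
    """Mean |x_i - x_j| over all unordered pairs i < j, computed in O(n log n).

    After sorting, the k-th value (0-indexed) is the larger member of k pairs
    and the smaller member of n - 1 - k pairs, so its net contribution to the
    sum of absolute differences is x_k * (2k - n + 1).
    """
    xs = sorted(copy_numbers)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two individuals")
    total = sum(x * (2 * k - n + 1) for k, x in enumerate(xs))
    return total / (n * (n - 1) / 2)


# Usage: keep only CNV genes, then compute the MAE of each (toy data).
genes = {
    "GENE_A": [2] * 50 + [3] * 21,  # 21 samples deviate from the median of 2
    "GENE_B": [2] * 30 + [3] * 20,  # only 20 deviate, so this gene is excluded
}
mae = {g: mean_abs_pairwise_difference(cns) for g, cns in genes.items() if is_cnv_gene(cns)}
```

Sorting replaces the naive O(n²) loop over all pairs. This matters when the analysis covers thousands of genomes per gene.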