Fig. 3: MAF and LD distributions of benchmark datasets from the 1000 Genomes Project. | Nature Communications

From: STICI: Split-Transformer with integrated convolutions for genotype imputation

MAF and maximum LD distributions are presented as kernel density estimation plots for SNVs and SVs in (a) the HLA region on chromosome 6, (b) deletions on chromosome 22, (c) SVs on chromosome 22, (e) SVs on chromosome 6, (f) SVs on chromosome 10, (g) SVs on chromosome 16, and (h) SVs on chromosome 20. Overall, SVs exhibit low LD values, which poses a significant challenge for imputation methods. Plot (d), which shows LD among different SV types on chromosome 22, indicates that structural events are commonly correlated with deletions. Furthermore, deletion, copy number variation, and duplication events appear across a range of LD values, whereas the remaining event types are limited to LD ≤ 0.1. Lastly, most SVs correlated with deletions are themselves deletions, which makes deletions a suitable separate dataset for our experiments. Source data are provided as a Source Data file.
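
The per-variant quantities summarized in these plots can be illustrated with a minimal NumPy sketch. It rests on several assumptions that the caption does not confirm: genotypes are a phased 0/1 haplotype matrix (haplotypes × variants), LD is measured as r² (squared Pearson correlation), "maximum LD" is the largest r² between a variant and any other variant in the same dataset (the original analysis may use a window), and the density is a Gaussian KDE with Scott's rule bandwidth. The function names `minor_allele_frequency`, `max_ld_r2`, and `gaussian_kde` are hypothetical and are not taken from the STICI code.

```python
import numpy as np


def minor_allele_frequency(haplotypes: np.ndarray) -> np.ndarray:
    """MAF per variant from a (n_haplotypes, n_variants) 0/1 matrix (assumed layout)."""
    alt_freq = haplotypes.mean(axis=0)
    return np.minimum(alt_freq, 1.0 - alt_freq)


def max_ld_r2(haplotypes: np.ndarray) -> np.ndarray:
    """For each variant, the largest r^2 with any *other* variant (assumed LD measure)."""
    x = haplotypes.astype(float)
    x = x - x.mean(axis=0)
    std = x.std(axis=0)
    polymorphic = std > 0
    # Standardize columns; monomorphic columns stay at zero, so their r^2 is 0.
    z = np.zeros_like(x)
    z[:, polymorphic] = x[:, polymorphic] / std[polymorphic]
    r = (z.T @ z) / x.shape[0]          # Pearson correlation matrix
    r2 = np.clip(r ** 2, 0.0, 1.0)      # guard against floating-point overshoot
    np.fill_diagonal(r2, 0.0)           # ignore each variant's self-correlation
    return r2.max(axis=1)


def gaussian_kde(samples, grid, bandwidth=None) -> np.ndarray:
    """1-D Gaussian kernel density estimate evaluated on `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = samples.size
    if bandwidth is None:
        # Scott's rule of thumb for one-dimensional data.
        bandwidth = samples.std(ddof=1) * n ** (-1 / 5) if n > 1 else 0.0
    if bandwidth <= 0:
        bandwidth = 1e-3  # fallback for degenerate (constant) input
    u = (grid[:, None] - samples[None, :]) / bandwidth
    return np.exp(-0.5 * u ** 2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    haps = rng.integers(0, 2, size=(100, 50))  # toy data, not 1000 Genomes
    grid = np.linspace(0.0, 1.0, 101)
    maf_density = gaussian_kde(minor_allele_frequency(haps), grid)
    ld_density = gaussian_kde(max_ld_r2(haps), grid)
    print(maf_density.max(), ld_density.max())
```

Computing the full correlation matrix keeps the sketch short but scales quadratically in the number of variants; a windowed computation would be the usual choice for chromosome-scale data.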
