Extended Data Fig. 1: Variant calling and graph construction. | Nature Genetics

From: Pangenome-based genome inference allows efficient and accurate genotyping across a wide spectrum of variant classes

a) Shown are haplotype-resolved assemblies for three samples and the corresponding variant calls made relative to a reference genome. On the right, we show how these variants are represented in a (simplified) VCF file. The VCF file is biallelic: it contains one record per distinct variant allele detected across the assemblies. b) Shown is the pangenome representation of the variants detected in panel a). Variants are represented as bubble structures, and sets of overlapping variants are merged into a single multi-allelic bubble (see the first and last bubbles for examples). Each haplotype can be represented as a path through the graph. We represent the pangenome as a VCF file containing one record per bubble, with alleles corresponding to the branches of the bubble (right). As illustrated in the VCF representation, we keep track of the callset variants from which each branch of the bubble was constructed. This allows genotypes derived for a bubble to be converted back into genotypes for each individual variant inside the bubble. Note that our VCFs contain the actual allele sequences in their ‘ALT’ column; in this figure, we replaced them with their IDs for simplicity.
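The bubble construction and the bubble-to-variant genotype conversion described in panel b) can be illustrated with a small sketch. It is not the authors' implementation and rests on several assumptions of our own: variants are VCF-like records with 0-based reference positions, each haplotype is given as a set of the variant IDs it carries (a hypothetical input format), variants "overlap" when their reference intervals intersect, bubble alleles are keyed by the set of callset variants they were built from, and no haplotype carries two overlapping variants. All names (`Variant`, `cluster_overlapping`, `build_bubble`, `bubble_to_variant_genotypes`, the IDs `V1` to `V3`) are hypothetical. For illustration, the bubble genotype is read directly from the known haplotypes rather than inferred from reads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    vid: str  # hypothetical variant ID, e.g. "V1"
    pos: int  # 0-based start position on the reference
    ref: str  # reference allele sequence
    alt: str  # alternative allele sequence

    @property
    def end(self) -> int:
        return self.pos + len(self.ref)  # half-open reference interval [pos, end)


def cluster_overlapping(variants):
    """Group variants whose reference intervals overlap; each group becomes one bubble."""
    clusters = []
    for v in sorted(variants, key=lambda v: (v.pos, v.end)):
        if clusters and v.pos < max(u.end for u in clusters[-1]):
            clusters[-1].append(v)  # overlaps the current bubble: merge (multi-allelic)
        else:
            clusters.append([v])    # starts a new bubble
    return clusters


def build_bubble(cluster, reference, haplotypes):
    """Return (start, end, alleles, hap_allele) for one bubble.

    alleles[i] = (branch sequence, frozenset of variant IDs the branch was built from);
    allele 0 is the reference branch. hap_allele maps each haplotype to its allele index.
    """
    start = min(v.pos for v in cluster)
    end = max(v.end for v in cluster)
    alleles = [(reference[start:end], frozenset())]
    hap_allele = {}
    for name, carried in haplotypes.items():
        used = [v for v in cluster if v.vid in carried]
        ids = frozenset(v.vid for v in used)
        # Splice carried variants into the reference segment, right to left so offsets stay valid.
        seq = reference[start:end]
        for v in sorted(used, key=lambda v: v.pos, reverse=True):
            off = v.pos - start
            seq = seq[:off] + v.alt + seq[off + len(v.ref):]
        for idx, (_, a_ids) in enumerate(alleles):
            if a_ids == ids:
                break  # branch already present
        else:
            alleles.append((seq, ids))  # new branch of the bubble
            idx = len(alleles) - 1
        hap_allele[name] = idx
    return start, end, alleles, hap_allele


def bubble_to_variant_genotypes(cluster, alleles, genotype):
    """Convert a bubble genotype (a1, a2) into an unphased 0/1 genotype per callset variant."""
    result = {}
    for v in cluster:
        gt = [int(v.vid in alleles[a][1]) for a in genotype]
        result[v.vid] = f"{min(gt)}/{max(gt)}"
    return result


# Toy example: V1 (deletion) and V2 (SNV) overlap and form one bubble; V3 (insertion) forms another.
reference = "ACGTACGTACGT"
variants = [Variant("V1", 2, "GTA", "G"), Variant("V2", 3, "T", "C"), Variant("V3", 9, "C", "CTT")]
haplotypes = {"s1_h1": {"V1"}, "s1_h2": {"V2", "V3"}, "s2_h1": set(), "s2_h2": {"V1", "V3"}}

for cluster in cluster_overlapping(variants):
    start, end, alleles, hap_allele = build_bubble(cluster, reference, haplotypes)
    print(start, end, [(seq, sorted(ids)) for seq, ids in alleles])
    s1_gt = (hap_allele["s1_h1"], hap_allele["s1_h2"])
    print(bubble_to_variant_genotypes(cluster, alleles, s1_gt))
```

Keying alleles by the set of contributing variant IDs is what makes the back-conversion a simple membership test: a variant is present on a haplotype exactly when the branch that haplotype traverses was built from that variant.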
