Fig. 1: Overview of methods. | Nature Genetics

From: Assembly of a pan-genome from deep sequencing of 910 humans of African descent

Raw reads are aligned to GRCh38, and unaligned reads are assembled with MaSuRCA. Assembled contigs are then filtered for contaminants with Centrifuge, and contigs shorter than 1 kb are removed (blue box). Where possible, assembled contigs are placed based on the alignment locations of their mates, by checking whether >95% of mates align to the same location. If such a placement is found, the exact breakpoint is determined via a nucmer alignment to the region for each end of the contig (yellow box). Contig placement locations are then compared across all individuals, nearby placements are clustered, and a representative is chosen for each cluster. All contigs are then aligned to the representatives to determine which samples contain a given placed insertion. Contigs in or aligning to placed clusters are removed from the unplaced set, and the remaining unplaced contigs are aligned to one another with nucmer to remove redundancy, yielding a final nonredundant set of unplaced contigs (purple box). EP, end placed; 1EP, one end placed; 2EP, two end placed; L, left; R, right.
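
The placement and clustering steps in the caption can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: "same location" is approximated by fixed-size genomic bins (`bin_size`), "nearby" placements are those within `max_gap` bases of the previous one, and the representative of a cluster is taken to be its longest contig. The names `place_by_mates`, `cluster_placements`, and `Placement` are hypothetical; only the >95% threshold comes from the caption.

```python
from collections import Counter, namedtuple

# Hypothetical record for a placed contig (not from the paper).
Placement = namedtuple("Placement", ["chrom", "pos", "length", "contig_id"])


def place_by_mates(mate_positions, bin_size=1000, min_fraction=0.95):
    """Return the (chrom, bin_start) supported by more than min_fraction
    of mate alignments, or None if no location reaches that support.

    Assumption: two mates share a "location" if they fall in the same
    fixed-size bin; the caption does not define this precisely.
    """
    mates = list(mate_positions)
    if not mates:
        return None
    bins = Counter(
        (chrom, (pos // bin_size) * bin_size) for chrom, pos in mates
    )
    locus, count = bins.most_common(1)[0]
    # Strictly greater than the threshold, matching ">95%" in the caption.
    return locus if count / len(mates) > min_fraction else None


def cluster_placements(placements, max_gap=100):
    """Group placements on the same chromosome whose positions lie within
    max_gap of the previous placement, and pick one representative each.

    Assumption: the representative is the longest contig in the cluster.
    Returns a list of (representative, members) pairs.
    """
    clusters = []
    for p in sorted(placements, key=lambda q: (q.chrom, q.pos)):
        last = clusters[-1][-1] if clusters else None
        if last and last.chrom == p.chrom and p.pos - last.pos <= max_gap:
            clusters[-1].append(p)
        else:
            clusters.append([p])
    return [(max(c, key=lambda q: q.length), c) for c in clusters]


if __name__ == "__main__":
    # 97 of 100 mates agree on one bin, so the contig is placed there.
    mates = [("chr1", 10_050)] * 97 + [("chr2", 500)] * 3
    print(place_by_mates(mates))

    placed = [
        Placement("chr1", 1000, 1500, "a"),
        Placement("chr1", 1040, 2200, "b"),
        Placement("chr2", 1000, 1800, "c"),
    ]
    for rep, members in cluster_placements(placed):
        print(rep.contig_id, [m.contig_id for m in members])
```

In the pipeline described above, breakpoint refinement and the sample-to-representative alignments are done with nucmer; this sketch covers only the thresholding and grouping logic and does not model those alignment steps.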
