Extended Data Fig. 3: Flow charts of assembly pipelines used to generate high-quality assemblies in this study. | Nature


From: Towards complete and error-free genome assemblies of all vertebrate species

Extended Data Fig. 3

a, Standard VGP assembly pipeline for sequencing data from a single individual, which generated the highest-quality assemblies: generate primary pseudo-haplotype and alternate haplotype contigs with CLR using FALCON-Unzip (ref. 17); generate scaffolds with linked reads using Scaff10x (ref. 74); break mis-joins and further scaffold with optical maps using Solve (ref. 87); generate chromosome-scale scaffolds with Hi-C reads using Salsa2 (ref. 79); fill in gaps and polish base errors with CLR using Arrow (Pacific Biosciences); perform two or more rounds of short-read polishing with linked reads using FreeBayes (ref. 85); and perform expert manual curation to correct potential assembly errors using gEVAL (refs. 25,95). b, Standard VGP trio assembly pipeline, used when DNA is available for a child and both parents (ref. 20). The dashed line indicates that the other haplotype went through the same steps before curation. In addition to the curated assemblies of both haplotypes, a representative haplotype containing both sex chromosomes is submitted. c, Mitochondrial assembly pipeline. The figure key applies to a–c. Steps newly introduced in v1.5–v1.6 are highlighted in light blue. c, contigs; p, purged false duplications from primary contigs; q, purged alternate contigs; s, scaffolds; t, polished scaffolds. Further details and instructions are available elsewhere (ref. 33) and at https://github.com/VGP/vgp-assembly.
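The step order of panel a can be summarized as a small ordered data structure. The following is a hypothetical Python encoding that uses only the tools, data types and figure-key codes named in the caption. It does not invoke any of the tools, and the names `Stage`, `PANEL_A`, `KEY` and `expand_stages` are illustrative only; they are not part of the VGP code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One step of the panel a pipeline, as listed in the caption."""
    action: str  # what the step does
    data: str    # input data type named in the caption
    tool: str    # tool named in the caption


# Hypothetical encoding of panel a, in the order given in the caption.
PANEL_A = [
    Stage("primary and alternate contigs", "CLR", "FALCON-Unzip"),
    Stage("scaffolding", "linked reads", "Scaff10x"),
    Stage("break mis-joins, further scaffolding", "optical maps", "Solve"),
    Stage("chromosome-scale scaffolding", "Hi-C reads", "Salsa2"),
    Stage("gap filling and base polishing", "CLR", "Arrow"),
    Stage("short-read polishing", "linked reads", "FreeBayes"),
    Stage("expert manual curation", "(none specified)", "gEVAL"),
]

# Figure-key codes for intermediate assemblies, copied from the caption.
KEY = {
    "c": "contigs",
    "p": "purged false duplications from primary contigs",
    "q": "purged alternate contigs",
    "s": "scaffolds",
    "t": "polished scaffolds",
}


def expand_stages(stages, polish_rounds=2):
    """Return tool names in run order, unrolling the repeated
    short-read polishing step (the caption specifies two or more rounds)."""
    if polish_rounds < 2:
        raise ValueError("the caption specifies two or more polishing rounds")
    order = []
    for stage in stages:
        if stage.action == "short-read polishing":
            # One entry per polishing round, numbered from 1.
            order.extend(f"{stage.tool} round {i}" for i in range(1, polish_rounds + 1))
        else:
            order.append(stage.tool)
    return order


if __name__ == "__main__":
    for name in expand_stages(PANEL_A):
        print(name)
```

With the default of two polishing rounds, running the module prints eight steps, from FALCON-Unzip through gEVAL. Passing `polish_rounds=3` adds a third FreeBayes round. Unrolling the rounds explicitly keeps the "two or more rounds" requirement visible in the run order, rather than hiding it inside a single step.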
