Table 1 Comparison of human de novo assembly and haplotype phasing summary statistics

From: De novo assembly and phasing of a Korean human genome

 

|  | AK1 | HuRef | YH_2.0 | NA12878 | GRCh38 |
| --- | --- | --- | --- | --- | --- |
| Assembly approach | WGS and BAC | WGS | WGS and fosmid | WGS | BAC and fosmid |
| Sequencing and physical mapping | PacBio and BioNano | Sanger | Illumina and CG | PacBio and BioNano | Sanger, FISH, OM and fingerprint contigs |
| De novo assembly algorithm | FALCON | Celera | SOAPdenovo2 | Celera and FALCON | Multiple methods |
| Phasing approach | De novo | Reference-guided | De novo | Reference-guided | NA |
| Scaffold/contig N50 (Mb) | 44.85/17.92 | 17.66/0.11 | 20.52/0.02 | 26.83/1.56 | 67.79/56.41 |
| Scaffold/contig L50 | 21/50 | 48/7,164 | 39/40,005 | 37/532 | 16/19 |
| No. of scaffolds/contigs | 2,832/4,206 | 4,530/71,333 | 125,643/361,157 | 18,903/21,235 | 735/1,385 |
| No. of gaps | 264* | 68,109 | 235,514 | 2,332* | 999 |
| Total gap length (Mb) | 37.34 | 34.43 | 105.20 | 146.35 | 159.97 |
| Total bases/non-N bases in assembly (bp) | 2,904,207,288/2,866,687,809 | 2,844,000,504/2,809,571,127 | 2,911,235,363/2,806,031,133 | 3,176,574,379/3,030,222,093 | 3,209,286,105/3,049,316,098 |
| Phased block N50 (Mb) | 11.55 | 0.35 | NA | 0.15 | NA |
| No. of haplotigs | 18,964 | NA | 24,597 | NA | NA |
| Haplotig N50 (kb) | 875 | NA | 484 | NA | NA |
| Haplotig sum (bp) | 4,804,460,182 | NA | 5,152,727,603 | NA | NA |

  1. We compared the sequencing platforms, algorithms, and assembly and phasing statistics of the human assemblies published so far. The comparison demonstrates the power of single-molecule technologies to generate assemblies with statistics superior to those achieved by short-read sequencing. Assembly statistics were obtained from NCBI; where summary statistics were not available from NCBI, the numbers were taken directly from the relevant papers. The accession numbers for the HuRef (ref. 7), YH_2.0 (ref. 8), NA12878 (ref. 6) and GRCh38 assemblies are GCA_000002125.2, GCA_000004845.2, GCA_001013985.1 and GCA_000001405.15, respectively. CG, Complete Genomics; FISH, fluorescence in situ hybridization; NA, not applicable; OM, optical mapping; WGS, whole-genome shotgun.
  2. *Number of spanned gaps.
  3. Number of spanned and unspanned gaps.
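
The N50 and L50 rows use the conventional assembly definitions. With sequences sorted from longest to shortest, N50 is the length of the sequence at which the running total first reaches half of the total length, and L50 is the number of sequences needed to get there. The sketch below illustrates that calculation. The function name `n50_l50` and the toy lengths are assumptions made for illustration; this is not the code used to produce the table, and NCBI's reported values may rest on conventions not reproduced here.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of sequence lengths.

    Illustrative helper only: the name and interface are hypothetical.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    # Longest sequences first, so the running total grows as fast as possible.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            # 'length' is the N50; 'count' is the L50.
            return length, count


# Toy contig lengths (arbitrary units), not data from the table.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50_l50(toy_contigs))  # (70, 2): 80 + 70 reaches half of the 300 total
```

A higher N50 together with a lower L50 means that fewer, longer sequences cover half of the assembly, which is the pattern the AK1 and GRCh38 columns show relative to the short-read YH_2.0 assembly.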