Table 1 Assembly statistics

From: Chromosome-scale, haplotype-resolved assembly of human genomes

Sample | HG002 (NA24385) | NA12878 | PGP1 | HG00733
Assembly algorithm | Trio Canu | Trio Peregrine | DipAsm | DipAsm | DipAsm | DipAsm | Strand-seq | Falcon-Phase
Long-read coverage | 29.7 (HiFi) | 30.1 (HiFi) | 23.9 (HiFi) | 33.4 (HiFi) | 93.0 (CLR)
Long-read N50 (bp) | 13,480 | 10,004 | 12,974 | 11,769 | 33,090
Hi-C read coverage |  | 38.5 | 44.8 | 261.7 | 35.5 |  | 67.1
Scaffolding |  | 3D-DNA | HiRise | HiRise | 3D-DNA |
Paternal/maternal contig size (Gbp) | 2.96/3.04 | 2.81/2.88 | 2.98/2.97 | 2.97/2.97 | 2.98/2.98 | 2.93/2.93 | 2.90/2.90 | 2.89/2.89
Paternal/maternal contig NG50 (Mbp) | 15.5/18.3 | 16.6/15.2 | 25.2/24.3 | 19.6/18.7 | 15.1/18.4 | 25.2/26.2 | 28.5/23.6 | 22.3/22.3
Paternal/maternal contig NGA50 (Mbp) | 10.2/12.8 | 11.0/10.6 | 14.3/13.5 | 12.7/12.1 | 10.3/11.0 | 16.0/16.6 | 15.8/15.8 | 14.3/13.7
Phasing switch/Hamming error rate (%) | 0.38/0.23 | 0.38/0.31 | 0.50/0.49 | 0.15/2.13 | 0.21/1.63 | 0.16/0.60 | 0.30/0.70 | 0.43/35.8
SNP/INDEL false-positive rate (×10⁻⁶) | 1.9/31.6 | 2.6/32.0 | 2.4/27.7 | 2.0/4.2
SNP/INDEL false-negative rate (%) | 4.31/5.85 | 3.28/5.00 | 0.36/2.09 | 0.56/1.22 | 3.32/– | 4.00/– | 7.89/–
SV sensitivity/precision (%) | 90.7/92.8 | 90.6/92.6 | 93.4/92.6

HiFi read N50: 50% of HiFi reads are longer than this number. Contig NG50: minimum contig length needed to cover 50% of the known genome (GRCh38). Contig NGA50: similar to NG50 but based on contig alignment lengths to GRCh38 rather than contig sizes. Phasing switch error rate: percentage of adjacent SNP pairs wrongly phased. Phasing Hamming error rate: percentage of SNPs wrongly phased in comparison to true phases. Gbp, giga base pairs. Mbp, mega base pairs.
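The NG50 and phasing error definitions in the note can be made concrete with a short Python sketch. This is an illustration of the definitions as worded there, not the evaluation code behind the table. The function names, the 0/1 encoding of haplotype phases, and the choice to score the Hamming rate against the better of the two haplotype labelings are all assumptions made for this example.

```python
from typing import Optional, Sequence


def ng50(contig_lengths: Sequence[int], genome_size: int) -> Optional[int]:
    """Length of the contig at which the running total, taken longest
    first, first reaches half of genome_size; None if it never does."""
    target = genome_size / 2
    covered = 0
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    return None


def _check_same_length(truth: Sequence[int], called: Sequence[int]) -> None:
    if len(truth) != len(called):
        raise ValueError("truth and called phases must have the same length")


def switch_error_rate(truth: Sequence[int], called: Sequence[int]) -> float:
    """Percentage of adjacent SNP pairs whose relative phase is wrong."""
    _check_same_length(truth, called)
    agree = [t == c for t, c in zip(truth, called)]
    pairs = len(agree) - 1
    if pairs < 1:
        return 0.0
    # A switch occurs wherever agreement with the truth flips between neighbours.
    switches = sum(a != b for a, b in zip(agree, agree[1:]))
    return 100.0 * switches / pairs


def hamming_error_rate(truth: Sequence[int], called: Sequence[int]) -> float:
    """Percentage of SNPs assigned to the wrong haplotype."""
    _check_same_length(truth, called)
    n = len(truth)
    if n == 0:
        return 0.0
    wrong = sum(t != c for t, c in zip(truth, called))
    # Assumption: haplotype labels are arbitrary, so take the better labeling.
    return 100.0 * min(wrong, n - wrong) / n


if __name__ == "__main__":
    # Toy contig set: half of 120 is 60, reached at the second contig (40 + 30).
    print(ng50([40, 30, 20, 10], genome_size=120))  # 30

    # One switch in the middle of six SNPs.
    truth = [0, 0, 0, 0, 0, 0]
    called = [0, 0, 0, 1, 1, 1]
    print(switch_error_rate(truth, called))   # 20.0
    print(hamming_error_rate(truth, called))  # 50.0
```

In the toy example a single switch in the middle of a block gives a low switch error rate but a high Hamming error rate, which is why the two numbers in a row of the table can differ widely. Under the note's definition, NGA50 would be obtained by passing the lengths of contig alignments to GRCh38 to `ng50` instead of the contig sizes.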