Table 2 Systematic comparison of assembly quality.

From: An ethnically relevant consensus Korean reference genome is a step towards personal reference genomes

| Assembly | Total sequence length (bp) | Scaffold or contig N50 (Mb)/L50 | GRCh38 recovery rate (%) | Segmental duplication length (bp) | Repeat length (bp) | Detected RefSeq genes (intact only) |
|---|---|---|---|---|---|---|
| GRCh38^C | 3,209,286,105 | 67.79/16 | - | 212,777,868 (6.63%) | 1,564,209,365 (48.74%) | 20,135 |
| KOREF_C^S,L,M | 3,211,075,818 | 26.46/35 | 88.47 (scaffolds) | 149,353,191 (4.65%) | 1,452,404,484 (45.23%) | 17,758 |
| CHM1_PacBio_r2^L | 2,996,426,293 | 26.90/30 | 88.02 | 205,559,250 (6.86%) | 1,541,211,387 (51.43%) | 17,657 |
| CHM1_1.1^S,B | 3,037,866,619 | 50.36/20 | - | 157,426,845 (5.18%) | 1,417,977,130 (46.68%) | 18,040 |
| NA12878_single^L,M | 3,176,574,379 | 26.83/37 | 88.26 | 168,652,649 (5.31%) | 1,545,168,387 (48.64%) | 6,610 |
| NA12878_Allpaths^S | 2,786,258,565 | 12.08/67 | 82.89 | 90,343,965 (3.24%) | 1,250,655,296 (44.89%) | 16,995 |
| HuRef^C | 2,844,000,504 | 17.66/48 | 85.85 | 134,317,812 (4.72%) | 1,411,487,301 (49.63%) | 16,968 |
| Mongolian^S | 2,881,945,563 | 7.63/111 | 86.54 | 121,384,034 (4.21%) | 1,399,420,366 (48.56%) | 17,189 |
| YH_2.0^S | 2,911,235,363 | 20.52/39 | 86.31 | 127,254,909 (4.37%) | 1,397,013,571 (47.99%) | 17,125 |
| African^S | 2,676,008,911 | 0.062/11,689 | 69.47 | 55,830,170 (2.09%) | 968,988,149 (36.21%) | 9,167 |

  1. NGS, next-generation sequencing. Major sequencing and mapping data used in the assembly are marked by superscript letters: C, chain-terminating Sanger sequences; B, indexed BAC end sequences; L, long reads; M, genome maps; S, NGS short reads.
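
The table reports N50 together with L50 but does not say how they were computed. As a minimal sketch, assuming the commonly used definition (N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length, and L50 is the number of sequences in that set), the two values could be derived from a list of scaffold or contig lengths as follows. The function name `n50_l50` and the toy lengths are illustrative assumptions and are not taken from the article.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of scaffold or contig lengths in bp.

    Assumes the common definition: walk the sequences from longest to
    shortest until the running total reaches half of the assembly length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            # length is the N50; count is the L50
            return length, count


# Toy lengths for illustration only; not values from any assembly in the table.
toy_lengths = [50, 40, 30, 20, 10]
n50, l50 = n50_l50(toy_lengths)
print(f"N50 = {n50} bp, L50 = {l50}")
```

Under this definition a more contiguous assembly has a larger N50 and a smaller L50, which matches the pattern in the table: for example, 67.79 Mb/16 for GRCh38 against 0.062 Mb/11,689 for the African assembly.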