Table 1 Comparison of main assembly characteristics and quality metrics.

From: An integrated personal and population-based Egyptian genome reference

 

| Metric | EGYPT | EGYPT_wtdbg2 | EGYPT_falcon | AK1 | YORUBA |
|---|---|---|---|---|---|
| Assembly level | meta | contig | scaffold | scaffold | chromosome |
| Effective genome size | 2,820,489,739 | 2,733,934,177 | 2,897,551,797 | NA | NA |
| # Sequences | 3235 | 3106 | 1615 | 2832 | 1647 |
| Longest sequence | 88,566,048 | 88,566,048 | 84,324,762 | 113,921,103 | 248,986,603 |
| # N’s per 100 kbp | 0 | 0 | 209.01 | 1285.7 | 7180.2 |
| Base level QV | 42.4 | 42.9 | 43.0 | 50.4ᵃ | NA |
| # Genes (thereof # partial) | 20,908 (3226) | 20,613 (3229) | 21,176 (1578) | 21,047 (1396) | 21,077 (1721) |
| Genome fraction w.r.t. GRCh38 (%) | 94.174 | 92.247 | 95.924 | 95.177 | 95.391 |
| Duplication ratio w.r.t. GRCh38 | 1.01 | 0.999 | 1.018 | 1.023 | 1.088 |
| Largest GRCh38 alignment | 75,492,126 | 75,492,126 | 56,458,009 | 58,219,133 | 65,512,502 |
| Total GRCh38 aligned length | 2,800,100,449 | 2,713,712,375 | 2,865,356,241 | 2,829,006,639 | 2,832,740,986 |
| NG50 w.r.t. GRCh38 | 20,857,787 | 20,857,787 | 28,071,354 | 39,609,866 | 145,208,384 |
| LG50 w.r.t. GRCh38 | 35 | 35 | 33 | 24 | 9 |
| NGA50 w.r.t. GRCh38 | 11,187,777 | 11,187,777 | 8,226,500 | 13,028,687 | 19,529,238 |
| LGA50 w.r.t. GRCh38 | 71 | 71 | 95 | 66 | 43 |
| # GRCh38 differences >1 kb (thereof # outside centromeres) | 1276 (1103) | 1276 (1103) | 3499 (2832) | 1952 (1685) | 1756 (1472) |
| # GRCh38 mismatches per 100 kbp | 139 | 138.72 | 143.64 | 126.92 | 141.56 |
| # GRCh38 indels per 100 kbp | 32.09 | 31.74 | 40.06 | 32.77 | 46.95 |
| K-mer-based compl. w.r.t. GRCh38 (%) | 86.01 | 85.15 | 87.75 | 87.68 | 85.82 |

  1. The table lists the final EGYPT meta assembly, the two alternative base assemblies EGYPT_wtdbg2 and EGYPT_falcon, and two publicly available assemblies of the genomes of a Korean individual (AK1) and a Yoruba individual (YORUBA). Metrics were largely obtained with QUAST-LG. The complete QUAST-LG report and additional assembly metrics are provided in Supplementary Data 2.
  2. ᵃ Based on the error rate estimated by the AK1 authors, Seo et al.¹⁶.
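
For readers unfamiliar with the metrics in the table, the sketch below shows how two of them are conventionally defined. It is an illustration and not the pipeline used for the table: QUAST-LG computes NG50 and LG50 itself, and how the base level QV values were derived is not stated here. The function names and the toy contig lengths are hypothetical. The QV conversion assumes the standard Phred scale, QV = -10 · log10(error rate); NG50 is taken as the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the reference genome size, and LG50 as the number of sequences in that set.

```python
import math


def phred_qv(error_rate):
    """Phred-scaled quality value for a per-base error rate in (0, 1]."""
    if not 0 < error_rate <= 1:
        raise ValueError("error_rate must be in (0, 1]")
    return -10 * math.log10(error_rate)


def error_rate_from_qv(qv):
    """Inverse of phred_qv: per-base error rate implied by a QV."""
    return 10 ** (-qv / 10)


def ng50_lg50(lengths, genome_size):
    """Return (NG50, LG50) for sequence lengths against a reference size.

    Sequences are taken longest first until their cumulative length
    reaches half of genome_size. Returns (None, None) if the assembly
    is too short to reach that threshold.
    """
    total = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        total += length
        if 2 * total >= genome_size:
            return length, count
    return None, None


if __name__ == "__main__":
    # Toy contig lengths, invented for illustration only.
    toy_lengths = [50, 40, 30, 20, 10]
    print(ng50_lg50(toy_lengths, genome_size=200))

    # Error rate per 100 kbp implied by a QV of 42.4 on the Phred scale.
    print(error_rate_from_qv(42.4) * 100_000)
```

On the Phred scale, a QV of 42.4 corresponds to roughly 6 errors per 100 kbp and a QV of 50.4 to roughly 1 error per 100 kbp, which gives a sense of the gap between the EGYPT and AK1 values in the table.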