Table 1 Phased block statistics

From: Platanus-allee is a de novo haplotype assembler enabling a comprehensive access to divergent heterozygous regions

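| Species | Assembler | Input data | Total (Mbp) | Bubble total (Mbp) | Bubble-total / genome-size | Scaffold NG50 (kbp) | Scaffold LG50 (#) | Contig NG50 (kbp) | Contig LG50 (#) | % gaps | BUSCO duplicate complete (%) | % exact-match MP15k pairs |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| P. polytes | Platanus-allee | PE + 4 MP | 442 | 393 | 0.818 | 404 | 344 | 102 | 1,282 | 2.80 | 79.03 | 35.83 |
| P. polytes | Platanus-allee | PE + 4 MP + PacBio (20×) | 473 | 456 | 0.950 | 3,225 | 51 | 161 | 868 | 2.79 | 89.15 | 33.82 |
| P. polytes | Platanus-allee | PE + 4 MP + 10X | 449 | 407 | 0.848 | 698 | 207 | 113 | 1,214 | 2.60 | 83.25 | 35.61 |
| P. polytes | Platanus-allee | PE + 4 MP + PacBio (20×) + 10X | 476 | 460 | 0.959 | 2,392 | 65 | 143 | 987 | 2.71 | 88.94 | 33.57 |
| P. polytes | FALCON-Unzip | PacBio (99×) | 481 | 404 | 0.843 | 413 | 352 | 413 | 352 | 0.00 | 70.68 | 31.10 |
| P. polytes | FALCON-Unzip, Pilon, PH | PacBio (99×) + PE | 471 | 422 | 0.880 | 421 | 353 | 421 | 353 | 0.00 | 77.48 | 34.60 |
| P. polytes | Supernova | 10X | 313 | 122 | 0.253 | 79 | 789 | 31 | 2,362 | 1.92 | 24.32 | 29.98 |
| B. japonicum | Platanus-allee | PE + 3 MP | 720 | 686 | 0.880 | 1,090 | 194 | 47 | 4,406 | 3.79 | 86.30 | |
| B. japonicum | Platanus-allee | PE + 3 MP + PacBio (20×) | 739 | 715 | 0.916 | 1,514 | 142 | 48 | 4,300 | 4.40 | 87.53 | |
| B. japonicum | Platanus-allee | PE + 3 MP + 10X | 732 | 694 | 0.889 | 1,155 | 172 | 33 | 6,271 | 3.94 | 84.15 | |
| B. japonicum | Platanus-allee | PE + 3 MP + PacBio (20×) + 10X | 750 | 720 | 0.923 | 1,516 | 140 | 34 | 6,124 | 4.53 | 85.28 | |
| B. japonicum | FALCON-Unzip | PacBio (156×) | 918 | 378 | 0.484 | 172 | 1,179 | 172 | 1,179 | 0.00 | 74.34 | |
| B. japonicum | FALCON-Unzip, Pilon, PH | PE + PacBio (156×) | 978 | 852 | 1.092 | 1,075 | 162 | 1,075 | 162 | 0.00 | 80.98 | |
| B. japonicum | Supernova | 10X | 697 | 177 | 0.227 | 18 | 8,779 | 11 | 17,306 | 2.98 | 41.41 | |
| C. elegans | Platanus-allee | PE + 3 MP | 195 | 179 | 0.897 | 470 | 112 | 43 | 1,242 | 4.89 | 77.09 | |
| C. elegans | Platanus-allee | PE + 3 MP + PacBio (20×) | 205 | 198 | 0.988 | 902 | 64 | 60 | 992 | 4.95 | 86.05 | |
| C. elegans | FALCON-Unzip | PacBio (192×) | 243 | 224 | 1.121 | 511 | 105 | 511 | 105 | 0.00 | 82.28 | |
| C. elegans | FALCON-Unzip, Pilon, PH | PacBio (192×) + PE | 232 | 222 | 1.109 | 512 | 105 | 512 | 105 | 0.00 | 82.49 | |
| H. sapiens | Platanus-allee | PE + 4 MP | 3,898 | 2,018 | 0.325 | 4 | 194,055 | 2 | 454,273 | 6.25 | 6.35 | |
| H. sapiens | Platanus-allee | PE + 4 MP + PacBio (20×) | 5,684 | 5,406 | 0.872 | 306 | 5,294 | 19 | 78,213 | 6.72 | 59.71 | |
| H. sapiens | Platanus-allee | PE + 4 MP + 10X | 4,918 | 4,025 | 0.649 | 59 | 23,612 | 13 | 118,218 | 2.64 | 39.15 | |
| H. sapiens | Platanus-allee | PE + 4 MP + PacBio (20×) + 10X | 5,673 | 5,460 | 0.881 | 658 | 2,584 | 23 | 71,898 | 3.53 | 68.33 | |
| H. sapiens | FALCON-Unzip | PacBio (77×) | 4,721 | 3,691 | 0.595 | 109 | 13,508 | 109 | 13,508 | 0.00 | 30.36 | |
| H. sapiens | FALCON-Unzip, Pilon, PH | PacBio (77×) + PE | 4,851 | 3,783 | 0.610 | 124 | 11,817 | 124 | 11,817 | 0.00 | 30.98 | |
| H. sapiens | Supernova | 10X | 5,405 | 5,028 | 0.811 | 2,489 | 675 | 124 | 13,651 | 1.38 | 72.38 | |
| H. sapiens | Mostovoy et al. 2016 | PE + 1 MP + 10X + Bionano | 5,535 | 5,353 | 0.863 | 3,998 | 423 | 9 | 184,955 | 8.25 | 75.10 | |
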
  1. Statistics were calculated for phased blocks of length ≥ 500 bp. A bold value indicates the best one for each species. Bubbles are phased heterozygous regions. Genome sizes and heterozygosities were estimated from the k-mer frequency information of PEs using GenomeScope [26]. Bubble-total/genome-size, NG50s and LG50s were calculated based on the estimated diploid genome sizes (P. polytes, 480 Mbp; B. japonicum, 780 Mbp; C. elegans, 200 Mbp; H. sapiens, 6.2 Gbp). Estimated heterozygosities are shown in Supplementary Table 5. BUSCO [27] (version 3.0.2) was used to estimate the rate of the phased single-copy genes for P. polytes, B. japonicum, C. elegans and H. sapiens with the endopterygota set (2442 orthologs), the metazoa set (978 orthologs), the nematoda set (982 orthologs) and the euarchotoglires set (6192 orthologs), respectively.
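
The genome-size-based statistics described in the footnote can be illustrated with a minimal sketch. It assumes the conventional definitions of NG50 (the length of the block at which the cumulative length, taken from the longest block down, first reaches half of the estimated genome size) and LG50 (the number of blocks needed to get there). The function names `ng50_lg50` and `bubble_ratio` and the toy block lengths are illustrative only; the authors' own scripts are not shown in the source.

```python
def ng50_lg50(lengths, genome_size, min_length=500):
    """Return (NG50, LG50) for phased-block lengths in bp.

    genome_size is the estimated diploid genome size in bp.
    Blocks shorter than min_length are ignored (500 bp in the footnote).
    Returns (None, None) if the blocks cover less than half the genome.
    """
    half = genome_size / 2
    running = 0
    kept = sorted((n for n in lengths if n >= min_length), reverse=True)
    for count, length in enumerate(kept, start=1):
        running += length
        if running >= half:
            return length, count
    return None, None


def bubble_ratio(bubble_total_mbp, genome_size_mbp):
    """Bubble total divided by the estimated diploid genome size."""
    return round(bubble_total_mbp / genome_size_mbp, 3)


# Toy block lengths (hypothetical), with a 20 kbp "genome".
blocks = [5000, 4000, 3000, 2000, 1000, 300]
print(ng50_lg50(blocks, 20_000))

# Ratio for a row of the table: 456 Mbp of bubbles over 480 Mbp (P. polytes).
print(bubble_ratio(456, 480))
```

Because the denominator is the estimated genome size rather than the assembly size, the ratio can exceed 1 when the phased total is larger than the estimate, as for 852 Mbp over 780 Mbp. Ratios recomputed from the rounded Mbp values in the table may differ from the printed ones in the last digit.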