Table 5 Assembltrie’s compression performance is comparable to our entropy approximation for E. coli read collections

From: Optimal compressed representation of high throughput sequence data via light assembly

| Sample | L / cov | LZ(G) | \(H({\cal R}^\star \mid G)\) | \(H({\cal R} \mid {\cal R}^\star)\) | \(H({\cal R})\) | Assembltrie | Orcom |
|---|---|---|---|---|---|---|---|
| DH10B 1 | 120/40 | 0.048 | 0.020 | 0.047 | 0.115 | 0.146 | 0.372 |
| DH10B 2 | 100/40 | 0.048 | 0.025 | 0.053 | 0.126 | 0.163 | 0.399 |
| DH10B 3 | 80/40 | 0.048 | 0.028 | 0.042 | 0.119 | 0.164 | 0.379 |
| DH10B 4 | 100/25 | 0.077 | 0.031 | 0.053 | 0.162 | 0.194 | 0.507 |
| DH10B 5 | 100/80 | 0.024 | 0.017 | 0.053 | 0.094 | 0.141 | 0.292 |
| DH10B 6 | 100/40 | 0.048 | 0.025 | 0.018 | 0.092 | 0.123 | 0.290 |

  1. Entropy approximation of the reads in the E. coli read collections above. L / cov gives the read length and sequencing coverage of each collection. LZ(G) denotes the number of bits per base obtained by compressing each genome with gzip; \(H({\cal R}^\star \mid G)\) is the entropy approximation based on a multinomial sampling of each reference genome; \(H({\cal R} \mid {\cal R}^\star)\) is the binary entropy of the error process; and \(H({\cal R})\) is the overall entropy approximation for each read collection. Finally, we compare these values with the compression results of Assembltrie (run with a single thread) and Orcom.
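The numbers in Table 5 are consistent with an additive decomposition, up to rounding in the third decimal: \(H({\cal R}) \approx \mathrm{LZ}(G) + H({\cal R}^\star \mid G) + H({\cal R} \mid {\cal R}^\star)\). For DH10B 1, for example, 0.048 + 0.020 + 0.047 = 0.115. The Python sketch below is a reader-side check, not the authors' code. It re-derives this sum for every row and computes a few ratios from the transcribed values. It assumes that all columns share the same per-read-base unit, which the sums suggest. It also assumes that LZ(G) is amortized over read bases, since LZ(G) × coverage comes out near 1.92 in every row. The names `ROWS` and `check_rows` are hypothetical, introduced only for this illustration.

```python
import math

# Values transcribed from Table 5: (sample, read length, coverage,
# LZ(G), H(R*|G), H(R|R*), H(R), Assembltrie, Orcom)
ROWS = [
    ("DH10B 1", 120, 40, 0.048, 0.020, 0.047, 0.115, 0.146, 0.372),
    ("DH10B 2", 100, 40, 0.048, 0.025, 0.053, 0.126, 0.163, 0.399),
    ("DH10B 3", 80, 40, 0.048, 0.028, 0.042, 0.119, 0.164, 0.379),
    ("DH10B 4", 100, 25, 0.077, 0.031, 0.053, 0.162, 0.194, 0.507),
    ("DH10B 5", 100, 80, 0.024, 0.017, 0.053, 0.094, 0.141, 0.292),
    ("DH10B 6", 100, 40, 0.048, 0.025, 0.018, 0.092, 0.123, 0.290),
]


def check_rows(rows, tol=0.0015):
    """Check H(R) ~ LZ(G) + H(R*|G) + H(R|R*) per row and compute ratios.

    tol allows for the third-decimal rounding in the published table
    (hypothetical helper, not part of the paper).
    """
    out = []
    for name, length, cov, lz, h_star, h_err, h_total, asm, orcom in rows:
        approx = lz + h_star + h_err
        out.append({
            "sample": name,
            # Does the three-term sum reproduce the H(R) column?
            "sum_matches": math.isclose(approx, h_total, abs_tol=tol),
            # Assumption: LZ(G) is spread over all read bases, so
            # multiplying by coverage recovers bits per genome base.
            "lz_per_genome_base": lz * cov,
            # How far Assembltrie sits above the entropy approximation.
            "assembltrie_over_entropy": asm / h_total,
            # Relative size of Orcom's output versus Assembltrie's.
            "orcom_over_assembltrie": orcom / asm,
        })
    return out


if __name__ == "__main__":
    for r in check_rows(ROWS):
        print(f"{r['sample']}: sum ok={r['sum_matches']}, "
              f"LZ*cov={r['lz_per_genome_base']:.3f}, "
              f"Assembltrie/H(R)={r['assembltrie_over_entropy']:.2f}, "
              f"Orcom/Assembltrie={r['orcom_over_assembltrie']:.2f}")
```

Run on these rows, the check shows that the three-term sum reproduces \(H({\cal R})\) within 0.001 in every collection. Assembltrie stays between roughly 1.2 and 1.5 times the entropy approximation, while Orcom's output is more than twice Assembltrie's in each case.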