Fig. 2

K-mer frequency distribution and estimated genome size of Cafeteria roenbergensis strain E4-10P. Frequency distribution of 19-mers in the quality-trimmed MiSeq read set of CrE4-10P. The major peak at ~120× coverage corresponds to the bulk of homozygous k-mers of the diploid (2n) genome, and the smaller peak at half that coverage comprises haplotype-specific (1n) k-mers. Small peaks at 3n and 4n represent regions with higher copy numbers. Low-coverage k-mers derive from sequencing errors and bacterial contamination. Taken together, the k-mer distribution suggests an approximate haploid genome size of 40 Mbp.
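The caption does not state exactly how the 40 Mbp figure was derived. A common back-of-the-envelope approach, sketched here as an assumption, divides the total number of k-mer occurrences (after discarding low-coverage error k-mers) by the coverage of the homozygous (2n) peak. For a diploid genome, homozygous k-mers occur at coverage C and haplotype-specific k-mers at C/2, but the latter come in two copies (one per haplotype), so the total equals G × C for a haploid size of G k-mers. The function name, the `error_cutoff` parameter and the toy histogram below are hypothetical and are not the authors' data or pipeline.

```python
def estimate_haploid_genome_size(hist, error_cutoff, peak_cov=None):
    """Rough haploid genome size (in k-mers, roughly bp) from a k-mer histogram.

    hist: dict mapping coverage -> number of distinct k-mers at that coverage.
    error_cutoff: k-mers at or below this coverage are treated as errors or
        contamination and ignored (hypothetical threshold, chosen by eye).
    peak_cov: coverage of the homozygous (2n) peak; if None, the coverage with
        the most distinct k-mers above the cutoff is used. This assumes the 2n
        peak is the tallest one, as in this figure.
    """
    # Keep only "solid" k-mers above the error cutoff.
    solid = {cov: n for cov, n in hist.items() if cov > error_cutoff}
    if peak_cov is None:
        # Tallest remaining peak, assumed here to be the homozygous 2n peak.
        peak_cov = max(solid, key=solid.get)
    # Total k-mer occurrences = sum over coverage classes of coverage * count.
    total_occurrences = sum(cov * n for cov, n in solid.items())
    # Homozygous k-mers contribute C each; each haplotype's specific k-mers
    # contribute C/2, so total / C recovers the haploid size.
    return total_occurrences / peak_cov, peak_cov


# Toy diploid example (hypothetical numbers): 1,000 haploid k-mers, 20% of them
# heterozygous, homozygous peak at 120x, plus 5,000 error k-mers at 1x.
toy_hist = {1: 5000, 60: 400, 120: 800}
size, peak = estimate_haploid_genome_size(toy_hist, error_cutoff=5)
print(size, peak)  # 1000.0 120
```

In practice, dedicated tools fit a model to the whole histogram rather than summing discrete peaks, which also accounts for the 3n/4n repeat peaks and overlapping error tails.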