Extended Data Fig. 8: The workflow of threshold selection and the correlations of the sequencing accuracies for large-scale epi-bit storage.
From: Parallel molecular data storage by printing epigenetic bits on DNA

a, The workflow of threshold selection. The methylation-calling threshold for each epi-bit site was determined independently to avoid the effect of DNMT1 context dependency. The methylation-calling results were first grouped by the DNA sequences of the carriers. Next, the methylation probabilities at each site on the carriers were fitted with a Gaussian mixture model (GMM). At each site, the fitted curve showed two peaks. In the fitted results, 95.73% of sites followed clearly bimodal distributions, and in the remaining sites (4.27%) 0s and 1s were indistinguishable. b, The correlations of the accuracies among 16 nanopore sequencing reactions in the panda image experiments. Here, the correlations are Pearson’s correlation coefficients of per-site accuracy for every pair of sequencing batches. Note that single-read methylation probabilities were used for the GMM-based site-specific threshold determination.
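The per-site thresholding in a and the batch correlation in b can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the caption does not state how the GMM was fitted or how the threshold was read off the fit, so the two-component EM fit, the choice of threshold as the point between the two fitted means where both components are equally likely, and the rule that returns `None` for an indistinguishable site are all assumptions. The function names are hypothetical.

```python
import math


def normal_pdf(x, mu, sd):
    """Density of a normal distribution N(mu, sd^2) at x."""
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_gmm2(values, n_iter=200, tol=1e-9, min_sd=1e-3):
    """Fit a two-component 1D Gaussian mixture by EM (assumed model form)."""
    xs = sorted(values)
    n = len(xs)
    mean_all = sum(xs) / n
    sd0 = max(math.sqrt(sum((x - mean_all) ** 2 for x in xs) / n), min_sd)
    # Initialise the means at the lower and upper quartiles.
    w, mu, sd = [0.5, 0.5], [xs[n // 4], xs[(3 * n) // 4]], [sd0, sd0]
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: responsibility of component 1 for each read.
        resp, ll = [], 0.0
        for x in xs:
            p0 = w[0] * normal_pdf(x, mu[0], sd[0])
            p1 = w[1] * normal_pdf(x, mu[1], sd[1])
            tot = p0 + p1 + 1e-300
            resp.append(p1 / tot)
            ll += math.log(tot)
        # M-step: update weights, means and standard deviations.
        n1 = max(sum(resp), 1e-9)
        n0 = max(n - n1, 1e-9)
        mu = [sum((1 - r) * x for r, x in zip(resp, xs)) / n0,
              sum(r * x for r, x in zip(resp, xs)) / n1]
        sd = [max(math.sqrt(sum((1 - r) * (x - mu[0]) ** 2
                                for r, x in zip(resp, xs)) / n0), min_sd),
              max(math.sqrt(sum(r * (x - mu[1]) ** 2
                                for r, x in zip(resp, xs)) / n1), min_sd)]
        w = [n0 / n, n1 / n]
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return w, mu, sd


def site_threshold(values):
    """Return a calling threshold for one site, or None if no clear split."""
    w, mu, sd = fit_gmm2(values)
    lo, hi = (0, 1) if mu[0] <= mu[1] else (1, 0)

    def diff(x):
        # Positive where the high-mean (methylated) component dominates.
        return (w[hi] * normal_pdf(x, mu[hi], sd[hi])
                - w[lo] * normal_pdf(x, mu[lo], sd[lo]))

    a, b = mu[lo], mu[hi]
    if not (diff(a) < 0 < diff(b)):
        return None  # assumed criterion for an indistinguishable site
    for _ in range(60):  # bisection for the crossing between the two means
        m = 0.5 * (a + b)
        if diff(m) < 0:
            a = m
        else:
            b = m
    return 0.5 * (a + b)


def pearson(a, b):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)
```

In this sketch, `site_threshold` would be called once per epi-bit site on the single-read methylation probabilities grouped by carrier sequence, and `pearson` would be applied to the per-site accuracy vectors of each pair of sequencing batches to fill a 16 × 16 correlation matrix.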