Supplementary Figure 1: Cross-validation procedures used to assess the prediction of the epigenome from DNA motifs.

Schematic of the cross-validation procedures used to validate our predictions. Top left: the standard five-fold cross-validation procedure. Below it: shuffling of test sequences during cross-validation. Shuffling is done to check that prediction performance comes from the motifs rather than from simple sequence biases such as GC content. Top right: sequences that are removed during SSB are still tested, using a model trained on the entire set of sequences that were selected during SSB.
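The first two panels can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`k_fold_indices`, `shuffle_sequence`, `cross_validation_splits`) are hypothetical, and the shuffle used here, a random permutation of the bases within each test sequence, is an assumption chosen because it preserves base composition (and hence GC content) while disrupting motif occurrences. The shuffling scheme actually used in the figure may differ.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Split indices 0..n-1 into k disjoint folds of near-equal size."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    # Striding over the shuffled indices gives k non-overlapping folds.
    return [idx[i::k] for i in range(k)]


def shuffle_sequence(seq, rng):
    """Permute the bases of one sequence (assumed shuffling scheme).

    Base counts, and therefore GC content, are unchanged; the order of
    bases, and therefore any motif occurrence, is randomized.
    """
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def gc_content(seq):
    """Fraction of G and C bases in a sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def cross_validation_splits(sequences, k=5, seed=0):
    """Yield (train, test, shuffled_test) lists for each of the k folds."""
    rng = random.Random(seed)
    for held_out in k_fold_indices(len(sequences), k, seed):
        held = set(held_out)
        train = [s for i, s in enumerate(sequences) if i not in held]
        test = [sequences[i] for i in held_out]
        shuffled_test = [shuffle_sequence(s, rng) for s in test]
        yield train, test, shuffled_test


if __name__ == "__main__":
    # Toy sequences for illustration only.
    toy = ["ACGTACGTAC", "GGGCCCATAT", "TTTTAAAACG", "CGCGCGATAT", "ATATATGCGC"]
    for fold, (train, test, shuffled) in enumerate(cross_validation_splits(toy)):
        print(fold, len(train), test, shuffled)
```

In each fold a model would be trained on `train` and scored on both `test` and `shuffled_test`. If performance on the shuffled sequences falls to chance level, the predictions are unlikely to be driven by composition biases alone.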