Figure 1

Overall design of the present study. The study consisted of three steps. (a) Molecular dynamics (MD) simulations were performed for all unique tetramers, each flanked by a fixed tetramer at both termini. The conformational trajectory of the central four nucleotides was converted into a conformational ensemble by defining equal-frequency ensemble bins from the entire data set. (b) A set of 65 SVR models was trained, one for each of the five ensemble bins of the 13 conformational parameters. Given a nucleotide sequence as input, the models predict 65 features (the ensemble bin occupancies) for a nucleotide in the corresponding sequence environment. The effectiveness of DynaSeq was benchmarked on the models’ ability to recall PDB-deposited structures (using predicted occupancy-weighted averages of ensemble bins) and DREAM5 TF specificities (from the ensemble occupancies for a sequence window). (c) DynaSeq’s ability to discriminate TFBS from genomic controls was benchmarked. Predictors were trained either on all 65 features pooled together or on the 5-bin ensemble of a single conformational parameter at a time as the sequence feature.
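The binning and averaging steps named in the caption can be illustrated with a minimal sketch for a single conformational parameter. This is not the DynaSeq implementation: the function names are hypothetical, and representing each bin by the mean of the pooled samples that fall into it is an assumption, since the caption does not state how a bin is reduced to a single value.

```python
import numpy as np

N_BINS = 5  # five ensemble bins per conformational parameter, as in the caption


def equal_frequency_edges(pooled_values, n_bins=N_BINS):
    """Interior bin edges that split the pooled data into roughly equal-count bins."""
    quantiles = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    return np.quantile(pooled_values, quantiles)


def bin_occupancies(trajectory, edges):
    """Fraction of trajectory frames that fall into each bin."""
    idx = np.searchsorted(edges, trajectory, side="right")
    counts = np.bincount(idx, minlength=len(edges) + 1)
    return counts / counts.sum()


def bin_representatives(pooled_values, edges):
    """Assumed representative value per bin: mean of the pooled samples in that bin.

    Requires every bin to be non-empty, which holds for equal-frequency
    bins unless the data are dominated by tied values.
    """
    idx = np.searchsorted(edges, pooled_values, side="right")
    return np.array([pooled_values[idx == b].mean() for b in range(len(edges) + 1)])


def occupancy_weighted_average(occupancies, representatives):
    """Collapse a bin-occupancy vector back to a single parameter value."""
    return float(np.dot(occupancies, representatives))


# Toy data standing in for one parameter pooled over all simulations.
pooled = np.arange(100.0)
edges = equal_frequency_edges(pooled)
representatives = bin_representatives(pooled, edges)

# Toy trajectory of the same parameter for one sequence context.
trajectory = np.array([0.0, 5.0, 10.0, 25.0, 90.0])
occupancies = bin_occupancies(trajectory, edges)
estimate = occupancy_weighted_average(occupancies, representatives)
```

Repeating the occupancy calculation for each of the 13 parameters would give the 13 × 5 = 65 values that the SVR models in step (b) are trained to predict from sequence.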