Extended Data Fig. 7: Sequence match relative to the map resolution.
From: CryoREAD: de novo structure modeling for nucleic acids in cryo-EM maps using deep learning

To compute sequence match, we first identified, for each nucleotide in the reference structure, the corresponding nucleotide in the model as the one with the smallest average atom distance, and then checked whether the two bases are identical. Sequence match considers only nucleotides in the reference structure that have a corresponding nucleotide in the model (an average atom-pair distance of less than 5 Å). In this figure, we compare sequence match for the initial assignment and after the sequence alignment. The initial assignment is the base type obtained from the base predictions at the base nodes of the atomic structures being built. It differs from the base moiety accuracy reported in Fig. 2a and Extended Data Fig. 5: those figures concern the initial grid-based accuracy of bases predicted by deep learning, whereas the initial sequence assignment here concerns the accuracy of the base assignment in the modeled tertiary structure, where the base positions are determined in consideration of the other atoms in the nucleic acids, including phosphate and sugar positions. Seq Match is the base type reassigned by sequence assignment to backbone paths.
a. Overall sequence match. For the initial assignment, the regression line is y = −0.125x + 0.984 (Pearson correlation coefficient: −0.782, p-value: 3.380e-15, standard error: 0.012). For Seq Match, the regression line is y = −0.110x + 0.997 (Pearson correlation coefficient: −0.684, p-value: 1.230e-10, standard error: 0.014).
b. Sequence match of Adenine (A) relative to the map resolution. For the initial assignment, the regression line is y = −0.169x + 1.045 (Pearson correlation coefficient: −0.684, p-value: 1.307e-11, standard error: 0.022). For Seq Match, the regression line is y = −0.143x + 1.040 (Pearson correlation coefficient: −0.591, p-value: 1.127e-7, standard error: 0.024).
c. Sequence match of Uracil/Thymine (U/T). For the initial assignment, the regression line is y = −0.137x + 1.018 (Pearson correlation coefficient: −0.671, p-value: 3.771e-10, standard error: 0.019). For Seq Match, the regression line is y = −0.132x + 1.042 (Pearson correlation coefficient: −0.680, p-value: 1.881e-10, standard error: 0.018).
d. Sequence match of Cytosine (C). For the initial assignment, the regression line is y = −0.072x + 0.873 (Pearson correlation coefficient: −0.482, p-value: 3.141e-8, standard error: 0.016). For Seq Match, the regression line is y = −0.074x + 0.911 (Pearson correlation coefficient: −0.452, p-value: 1.095e-4, standard error: 0.018).
e. Sequence match of Guanine (G). For the initial assignment, the regression line is y = −0.114x + 0.997 (Pearson correlation coefficient: −0.578, p-value: 2.381e-7, standard error: 0.020). For Seq Match, the regression line is y = −0.100x + 1.012 (Pearson correlation coefficient: −0.595, p-value: 9.017e-8, standard error: 0.017).
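
The sequence match computation described in this legend can be illustrated with a minimal Python sketch. It is not the authors' code. The data layout (a list of `(base, {atom_name: (x, y, z)})` tuples), the function names, and the choice to average distances over atom names shared by the two nucleotides are assumptions made for illustration; only the nearest-nucleotide assignment, the 5 Å cutoff, and the base identity check come from the text. Bases are compared exactly as given, so any U/T equivalence would have to be handled by the caller.

```python
import math

CUTOFF = 5.0  # Å; the correspondence cutoff stated in the legend


def avg_atom_distance(ref_atoms, model_atoms):
    """Mean distance over atom names present in both nucleotides.

    Pairing atoms by shared name is an assumption of this sketch.
    """
    shared = ref_atoms.keys() & model_atoms.keys()
    if not shared:
        return math.inf
    total = sum(math.dist(ref_atoms[name], model_atoms[name]) for name in shared)
    return total / len(shared)


def sequence_match(reference, model, cutoff=CUTOFF):
    """Fraction of matched reference nucleotides whose base agrees with the model.

    reference, model: lists of (base, {atom_name: (x, y, z)}) tuples
    (a hypothetical layout). Returns None if no reference nucleotide
    has a model nucleotide within the cutoff.
    """
    matched = 0
    identical = 0
    for ref_base, ref_atoms in reference:
        # Assign the model nucleotide with the smallest average atom distance.
        best_base, best_dist = None, math.inf
        for model_base, model_atoms in model:
            d = avg_atom_distance(ref_atoms, model_atoms)
            if d < best_dist:
                best_base, best_dist = model_base, d
        # Reference nucleotides without a partner closer than the cutoff are skipped.
        if best_dist >= cutoff:
            continue
        matched += 1
        if best_base == ref_base:
            identical += 1
    return identical / matched if matched else None


# Toy example: two reference nucleotides have a partner, one does not.
reference = [
    ("A", {"P": (0.0, 0.0, 0.0), "C4'": (1.0, 0.0, 0.0)}),
    ("G", {"P": (10.0, 0.0, 0.0), "C4'": (11.0, 0.0, 0.0)}),
    ("C", {"P": (50.0, 0.0, 0.0), "C4'": (51.0, 0.0, 0.0)}),
]
model = [
    ("A", {"P": (0.5, 0.0, 0.0), "C4'": (1.5, 0.0, 0.0)}),
    ("U", {"P": (10.0, 1.0, 0.0), "C4'": (11.0, 1.0, 0.0)}),
]
print(sequence_match(reference, model))  # 0.5
```

In the toy example, the third reference nucleotide lies 40 Å from its nearest model nucleotide and is therefore excluded from the denominator, so one identical base out of two matched nucleotides gives 0.5.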