Supplementary Figure 11: The precision of ECs depends on the amount of available sequence information. | Nature Methods

From: Protein structure determination by combining sparse NMR data with evolutionary couplings

(A) The amount of sequence information available for EC inference is varied by sampling from the full alignment of P74712 (Neff/L = 227). The number of predicted high-confidence EC pairs, defined as those exceeding the non-informative background level of coupling by a factor of two or more, decreases sharply once more than 75% of sequences have been removed (dashed blue line). (B) The number of high-confidence EC pairs for each sampled alignment is a good predictor of the overall precision of the top L EC pairs when evaluating their distance in the protein structure. The size of the high-confidence EC set correlates with the size of the sequence alignment (Pearson r = 0.81) and starts to saturate at Neff/L = 75, beyond which adding sequences yields no further gain (panel A). For this particular protein, the high-confidence set ranges from 89 EC pairs for the full alignment down to zero pairs for the smallest alignment. As one would expect, the proportion of ECs that are close in the crystal structure (out of the top 150 ECs) correlates positively with the number of sequences in the alignment, saturating at about Neff/L = 100 (Pearson r = 0.91, panel B). The number of true positives among the top L ECs is higher than the number of high-confidence ECs deduced from the corresponding alignment. This indicates that the scoring method is conservative and that the number of high-confidence ECs for a given alignment sets a lower bound on the number of true positives.
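
The three quantities discussed in this caption (the size of the high-confidence EC set, the precision of the top-ranked ECs, and the Pearson correlation between series) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 5 Å contact cutoff, and all toy values are assumptions made for illustration, and the background level is simply passed in as a number because the caption does not say how it is estimated.

```python
import numpy as np


def count_high_confidence(scores, background, factor=2.0):
    """Count EC pairs scoring at least `factor` times the background level."""
    scores = np.asarray(scores, dtype=float)
    return int(np.sum(scores >= factor * background))


def top_k_precision(pairs, scores, dist, k, cutoff=5.0):
    """Fraction of the k highest-scoring pairs that are close in the structure.

    `dist` is a residue-by-residue distance matrix; `cutoff` (assumed 5.0,
    in the same units as `dist`) defines what counts as "close".
    """
    order = np.argsort(scores)[::-1][:k]  # indices of the k best scores
    hits = [dist[pairs[n][0], pairs[n][1]] <= cutoff for n in order]
    return float(np.mean(hits))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long series."""
    return float(np.corrcoef(x, y)[0, 1])


# Toy example with made-up values (not data from the figure).
pairs = [(0, 3), (1, 4), (0, 2), (2, 4)]
scores = np.array([0.9, 0.5, 0.2, 0.1])

dist = np.full((5, 5), 20.0)
for (i, j), d in zip(pairs, [4.0, 4.5, 12.0, 3.0]):
    dist[i, j] = dist[j, i] = d

n_high = count_high_confidence(scores, background=0.2)
precision_top3 = top_k_precision(pairs, scores, dist, k=3)
r = pearson_r([1, 2, 3, 4], [0.2, 0.4, 0.6, 0.8])
```

In this toy case two pairs pass the twofold threshold, and both of them are also contacts among the top three pairs. Repeating the calculation for each subsampled alignment would yield the per-alignment series that the caption correlates with alignment size.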
