Correction to: Scientific Reports https://doi.org/10.1038/s41598-020-76317-6, published online 10 November 2020


The original version of this Article contained errors.


Following the publication of this Article, the Authors detected programming errors that affected the accuracy of some of the results:


  • In some cases, the atan() function was used instead of atan2() to compute the arctangent. atan() returns angles only in the − 90 to 90 degree range, while atan2() takes the quadrant information into account and returns angles in the − 180 to 180 degree range. The predicted backbone angles lie in the range − 180 to 180. All calculations have now been re-done using the atan2() function.

  • The difference between the two angles +190 and − 175 is 365. However, considering the periodicity of − 180 to 180, a difference can only lie within 0 to 180, so the difference of 365 should be reported as 5. Similarly, the difference between 170 and − 185 is 355, which should also be reported as 5. To compute the difference correctly in both cases, an abs() function must be used in the formula min(D, abs(360 − D)). The abs() was previously omitted; the calculations have now been re-done using the correct version of the formula.

  • In the MAE computation function, a parameter was passed by reference but was mistakenly assumed to be passed by value: it was modified within the function on the understanding that the change would not propagate outside the function. This assumption was incorrect, as the modification did affect the caller's data. The calculations have now been re-done to reflect this.
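For illustration, the three fixes above can be sketched in a few lines. This is a minimal sketch in Python (the Article does not state its implementation language), and the helper names angle_diff, mae_with_side_effect, and mae_fixed are hypothetical, not the Authors' code:

```python
import math

def angle_diff(a, b):
    """Smallest difference between two angles under -180..180 periodicity,
    using the corrected formula min(D, abs(360 - D)); result lies in 0..180."""
    d = abs(a - b)
    return min(d, abs(360 - d))

# atan2() vs atan(): atan() collapses opposite quadrants together.
x, y = -1.0, -1.0
assert round(math.degrees(math.atan2(y, x))) == -135  # correct quadrant
assert round(math.degrees(math.atan(y / x))) == 45    # quadrant information lost

# The two worked examples from the text. Without abs(), the first case
# would give min(365, -5) = -5 instead of 5.
assert angle_diff(190, -175) == 5
assert angle_diff(170, -185) == 5

# Pass-by-reference pitfall: in Python, mutating a list parameter also
# changes the caller's list; working on a local copy avoids the side effect.
def mae_with_side_effect(errors):
    errors[:] = [abs(e) for e in errors]  # mutates the caller's list
    return sum(errors) / len(errors)

def mae_fixed(errors):
    errors = [abs(e) for e in errors]     # local copy; caller unaffected
    return sum(errors) / len(errors)

data = [3, -4, 5]
mae_with_side_effect(data)
print(data)   # [3, 4, 5] -- original data silently changed

data = [3, -4, 5]
mae_fixed(data)
print(data)   # [3, -4, 5] -- original data preserved
```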


As a result of correcting these programming errors, in the Abstract,


“We then empirically show that SAP can significantly outperform existing state-of-the-art methods on well-known benchmark datasets: for some types of angles, the differences are 6–8 in terms of mean absolute error (MAE).”


now reads:


“We then empirically show that SAP significantly outperforms existing state-of-the-art methods on well-known benchmark datasets: for some types of angles, the differences are above 3 in mean absolute error (MAE).”


In the Introduction,


“We then empirically show that SAP can significantly outperform the existing state-of-the-art methods SPOT-1D and OPUS-TASS6 on well-known benchmark datasets: for ψ and τ , the differences are 6–8 in terms of mean absolute error (MAE).”


now reads:


“We then empirically show that SAP significantly outperforms the existing state-of-the-art methods SPOT-1D and OPUS-TASS6 on well-known benchmark datasets: for ψ and τ , the differences are above 3 in mean absolute error (MAE).”


In the Results section, under the subheading ‘Calculating Absolute Errors’,


“Then, we take AE = min(D,360−D) as the absolute error (AE) for that predicted angle.”


now reads:


“Then, we take AE = min(D,|360−D|) as the absolute error (AE) for that predicted angle.”


In the Results section, under the subheading ‘Determining Best Settings’,


“Moreover, prediction of trigonometric ratios is better for ψ while prediction of direct angles is better for \(\phi\) , θ , and \(\tau\) . While not using ASA appears to be better than using, in contrast, using 7PCP appears to be better than not using. Overall, the best SAP settings to predict the 4 types of angles are listed below. Henceforth, we use these angle specific settings in further analysis.”


now reads:


“Moreover, prediction of direct angles is better than that of trigonometric ratios. While not using ASA appears to be better than using, in contrast, using 7PCP appears to be better than not using. Overall, the best SAP setting is using 7PCP, range-based normalisation, direct angle prediction, and window size 5. Henceforth, we use this setting in further analysis.”


In the same section, the following text was removed:

  • \(\phi\): 7PCP, range-based normalisation, direct angle prediction, and window size 5

  • ψ: 7PCP, z-score based normalisation, trigonometric ratio prediction, and window size 13

  • θ: 7PCP, range-based normalisation, direct angle prediction, and window size

  • τ: 7PCP, range-based normalisation, direct angle prediction, and window size 5


In the Results section, still under the subheading ‘Determining Best Settings’,


“However, in Table 3, we show the performance of the best angle specific SAP settings when run with DNNs having 2 and 4 hidden layers. In most cases DNNs having 3 hidden layers show the best results (shown in bold in Table 3); where this is not the case, DNNs with three hidden layers are a close second (shown in italics in Table 3), with the difference being < 0.05. So for the rest of the paper, we have chosen DNNs with 3 hidden layers as the selected SAP settings”


now reads:


“However, in Table 3, we show the performance of the best angle specific SAP setting when run with DNNs having 2 and 4 hidden layers. In most cases DNNs having 3 hidden layers show the best results (shown in bold in Table 3); where this is not the case, DNNs with three hidden layers are a close second (shown in italics in Table 3), with the difference being < 0.09. So for the rest of the paper, we have chosen the DNN with 3 hidden layers as the selected SAP setting.”


In the Results section, under the subheading ‘Performing cross-validation’,


“In Table 4, we again show the MAE values but only for the best settings of SAP”


now reads:


“In Table 4, we again show the MAE values but only for the best setting of SAP”


In the Results section, under the subheading ‘Comparison with state-of-the-art predictors’,


“Since SPOT-1D and OPUS-TASS show their performance on two subsets namely TEST2016 and TEST2018 of the testing proteins, we also do the same although we show the accumulated results for all testing proteins. Note that both SPOT-1D’s and OPUS-TASS’s performances are not worse than their reported values as shown in the bottom part of Table 5. Moreover, notice from the table that SAP significantly outperforms both SPOT-1D and OPUS-TASS in all cases.”


now reads:


“Since SPOT-1D and OPUS-TASS show their performance on two subsets namely TEST2016 and TEST2018 of the testing proteins, we also do the same although we show the accumulated results for all testing proteins. Notice from the table that SAP significantly outperforms both SPOT-1D and OPUS-TASS in all cases.”


In the Results section, still under the subheading ‘Comparison with state-of-the-art predictors’,


“To test the generality of performance of SAP over other datasets, we have run SAP on 71 proteins of PDB150 dataset and 55 proteins of CAMEO93 datasets. In Table 6, we also compare SAP’s performance with SPOT-1D’s performance on the PDB150 proteins and with OPUS-TASS’s performance on the CAMEO93 proteins. Notice that SAP significantly outperforms SPOT-1D and OPUS-TASS in ψ, θ, and τ angles, but performs worse in \(\phi\) prediction. We have performed t-tests to compare the performances of SPOT-1D and OPUS-TASS with SAP and the p values are < 0.01 in all cases, indicating the differences are statistically significant. Nevertheless, the margins in ψ and τ remain huge for SAP compared to the other methods.”


now reads:


“Although our results are in Table 5, to test the generality of performance of SAP over other datasets, we have run SAP on 71 proteins of PDB150 dataset and 55 proteins of CAMEO93 datasets. In Table 6, we also compare SAP’s performance with SPOT-1D’s performance on the PDB150 proteins and with OPUS-TASS’s performance on the CAMEO93 proteins. The performance of various methods is rather mixed here. We have performed t-tests to compare the performances of SPOT-1D and OPUS-TASS with SAP and the p values are < 0.05 in all cases, indicating the differences are statistically significant.”


In the Results section, under the subheading ‘Comparison on Protein Length Groups’,


“From the table, we see that for all four types of angles, SAP’s prediction accuracy gradually decreases as the protein length increases. When protein lengths are 300 or below, the MAE values are less than the overall MAE values and for protein lengths above 300, the MAE values are greater than the overall MAE values.”


now reads:


“From the table, we see that for all four types of angles, SAP’s prediction accuracy gradually decreases, with minor exceptions, as the protein length increases. When protein lengths are 300 or below (with minor exception for θ), the MAE values are less than the overall MAE values and for protein lengths above 300, the MAE values are greater than the overall MAE values.”


In the Results section, under the subheading ‘Using Angle Ranges from Predicted Secondary Structures’,


“When we do that for the residues that belong to SS types G, H and I, we get MAE values respectively 16.91, 8.78, and 24.02 for \(\phi\) and 27.71, 9.12, 22.04 for ψ. In contrast the MAE values for SAP predictions are respectively 12.39, 5.43, 11.34 for \(\phi\) for SS types G, H, and I, and 12.41, 5.73, 12.06 for ψ.”


now reads:


“When we do that for the residues that belong to SS types G, H and I, we get MAE values respectively 27.71, 9.12, and 22.04 for \(\phi\) and 18.71, 8.83, 21.17 for ψ. In contrast the MAE values for SAP predictions are respectively 12.40, 5.43, 11.34 for \(\phi\) for SS types G, H, and I, and 16.08, 6.40, 15.16 for ψ.”


In the Results section, under the subheading ‘Comparison of angle distributions’,


“Notice that the largest peaks of the predicted values are higher than the largest peaks of the actual values except in predicted ψ distributions of OPUS-TASS, SPOT-1D and SPIDER2. One noticeable fact in the ψ chart is OPUS-TASS, SPOT-1D and SPIDER2 predicted values are outside of [−90, 90] but actual values are roughly within the range. Another noticeable fact is in the θ chart: there are actual values between 0 and 90 although with almost zero probability, and these values are not much captured by the predictors.”


now reads:


“Notice that the largest peaks of the predicted values are higher than the largest peaks of the actual values. One noticeable fact is in the \(\theta\) chart: there are actual values between 0 and 90 although with almost zero probability, and these values are not much captured by the predictors.”


Finally, in the Results section, under the subheading ‘Comparison on Correct Prediction Per Protein’,


“We choose the threshold values to be 6 and 18 in the charts because SAP’s minimum and maximum MAE values are close to 6 and 18 respectively, for example for θ and τ.”


now reads:


“We choose the threshold values to be 6 and 18 in the charts.”


Additionally, Figure 4, Figure 5, Figure 6, and Figure 7 were corrected with the updated results for SAP. The original versions of these figures are reproduced below for the record.

Figure 4

Performance of SAP, OPUS-TASS, SPOT-1D, SPIDER2 on the testing proteins when residues are grouped based (Top Four) on their SS types and (Bottom Four) on their AA types. In the charts, y-axis shows MAE values and x-axis shows SS or AA types. The dashed horizontal line in each chart shows the overall MAE value for SAP.

Figure 5

Distributions of actual angles of testing proteins and predictions of SAP, OPUS-TASS, SPOT-1D, and SPIDER2.

Figure 6

RMSD values for SAP, SPOT-1D, and OPUS-TASS on TEST2018 proteins.

Figure 7

Percentages of proteins (y-axis) that have a given percentage of residues (x-axis) with AE at most a given threshold T where T is 6 and 18 and are denoted by T6 and T18. The lower the threshold, the better the prediction quality.


Furthermore, Table 2, Table 3, Table 4, Table 5, Table 6, and Table 7 were corrected to reflect updated results for SAP. The original versions of these tables are reproduced below for the record.

Table 2 Performance of SAP settings on 1206 testing proteins. In the table, column ASA denotes whether accessible surface area is used (Yes/No), column 7PCP denotes whether 7 physicochemical properties are used (Yes/No), column OR denotes output representation is in direct angles (D) or trigonometric ratios (R), column NM denotes normalisation method for input feature encoding is [0,1] range based (R) or Z-score based (Z), WS denotes the best size of the sliding window. Note that the emboldened cells denote the best performance for each combination of ASA and 7PCP while the boxed plus emboldened cells in each respective column denote the best performance among all SAP settings.
Table 3 Performance of the best angle specific SAP settings when the numbers of hidden layers in the DNNs are varied.
Table 4 Average performance of the best settings of SAP after 10-fold cross validation is performed.
Table 5 Performances of SPIDER2, SPOT-1D, SAP, and OPUS-TASS on our testing dataset and its subsets TEST2016 and TEST2018. The emboldened values are the winning numbers for the corresponding types of angles and datasets. OPUS-TASS does not predict θ and τ angles while the other three methods predict all four types of angles.
Table 6 Performances of SPIDER2, SPOT-1D, OPUS-TASS, and SAP on filtered PDB150 and CAMEO93 proteins. The emboldened values are the winning numbers for the corresponding types of angles and datasets. OPUS-TASS does not predict θ and τ angles while the other three methods predict all four types of angles.
Table 7 Performance of SAP, OPUS-TASS, SPOT-1D, and SPIDER2 when our testing proteins are grouped based on their lengths. In the table, ΔMAE of a system (e.g. OPUS-TASS, SPOT-1D or SPIDER2) is its MAE minus the MAE of SAP. As such, the greater the value of ΔMAE, the worse the performance of the system w.r.t. the performance of SAP. The horizontal lines in SAP columns split those columns such that the upper parts have MAE values less than the overall MAE values and the lower parts have MAE values greater (a slight exception for θ).

Finally, the legends of Tables 3, 4, and 7 were also updated. In the legend of Table 3,


“Performance of the best angle specific SAP settings when the numbers of hidden layers in the DNNs are varied”


now reads:


“Performance of the best SAP setting when the numbers of hidden layers in the DNNs are varied”


In the legend of Table 4,


“Average performance of the best settings of SAP after 10-fold cross validation is performed.”


now reads:


“Average performance of the best setting of SAP after 10-fold cross validation is performed.”


and in the legend of Table 7,


“As such, the greater the value of ΔMAE, the worse the performance of the system w.r.t. the performance of SAP. The horizontal lines in SAP columns split those columns such that the upper parts have MAE values less than the overall MAE values (a slight exception for θ) and the lower parts have MAE values greater (a slight exception for θ).”


now reads:


“As such, the greater the value of ΔMAE, the worse the performance of the system w.r.t. the performance of SAP.”


The original version of the Article has been corrected.