Correction to: Scientific Reports https://doi.org/10.1038/s41598-020-76317-6, published online 10 November 2020


The original version of this Article contained errors.


Following the publication of this Article, the Authors detected programming errors that affected the accuracy of some of the results:


  • In some cases, the atan() function was used instead of atan2() to compute the arctangent. atan() returns angles only in the − 90 to 90 degree range, while atan2() takes the quadrant information into account and returns angles in the − 180 to 180 degree range. The predicted backbone angles lie in the range − 180 to 180. All calculations have now been re-done using the atan2() function.

  • The difference between the two angles +190 and − 175 is 365. However, considering the periodicity of − 180 to 180, a difference can only lie within 0 to 180, so the difference of 365 should be reported as 5. Similarly, the difference between 170 and − 185 is 355, which should also be reported as 5. To compute the difference correctly in both cases, an abs() function must be used in the formula min(D, abs(360 − D)). The abs() was previously omitted; the calculations have now been re-done using the correct version of the formula.

  • In the MAE computation function, a parameter was passed by reference but was mistakenly assumed to be passed by value: it was modified within the function on the understanding that the change would not propagate outside the function. This assumption was incorrect, as the modification did affect the caller's data. The calculations have now been re-done to reflect this.
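For illustration, the three fixes above can be sketched in a few lines. This is a minimal sketch in Python (the Article does not state its implementation language), and the helper names angle_diff, mae_with_side_effect, and mae_fixed are hypothetical, not the Authors' code:

```python
import math

def angle_diff(a, b):
    """Smallest difference between two angles under -180..180 periodicity,
    using the corrected formula min(D, abs(360 - D)); result lies in 0..180."""
    d = abs(a - b)
    return min(d, abs(360 - d))

# atan2() vs atan(): atan() collapses opposite quadrants together.
x, y = -1.0, -1.0
assert round(math.degrees(math.atan2(y, x))) == -135  # correct quadrant
assert round(math.degrees(math.atan(y / x))) == 45    # quadrant information lost

# The two worked examples from the text. Without abs(), the first case
# would give min(365, -5) = -5 instead of 5.
assert angle_diff(190, -175) == 5
assert angle_diff(170, -185) == 5

# Pass-by-reference pitfall: in Python, mutating a list parameter also
# changes the caller's list; working on a local copy avoids the side effect.
def mae_with_side_effect(errors):
    errors[:] = [abs(e) for e in errors]  # mutates the caller's list
    return sum(errors) / len(errors)

def mae_fixed(errors):
    errors = [abs(e) for e in errors]     # local copy; caller unaffected
    return sum(errors) / len(errors)

data = [3, -4, 5]
mae_with_side_effect(data)
print(data)   # [3, 4, 5] -- original data silently changed

data = [3, -4, 5]
mae_fixed(data)
print(data)   # [3, -4, 5] -- original data preserved
```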


As a result of correcting these programming errors, in the Abstract,


“We then empirically show that SAP can significantly outperform existing state-of-the-art methods on well-known benchmark datasets: for some types of angles, the differences are 6–8 in terms of mean absolute error (MAE).”


now reads:


“We then empirically show that SAP significantly outperforms existing state-of-the-art methods on well-known benchmark datasets: for some types of angles, the differences are above 3 in mean absolute error (MAE).”


In the Introduction,


“We then empirically show that SAP can significantly outperform the existing state-of-the-art methods SPOT-1D and OPUS-TASS6 on well-known benchmark datasets: for ψ and τ , the differences are 6–8 in terms of mean absolute error (MAE).”


now reads:


“We then empirically show that SAP significantly outperforms the existing state-of-the-art methods SPOT-1D and OPUS-TASS6 on well-known benchmark datasets: for ψ and τ , the differences are above 3 in mean absolute error (MAE).”


In the Results section, under the subheading ‘Calculating Absolute Errors’,


“Then, we take AE = min(D,360−D) as the absolute error (AE) for that predicted angle.”


now reads:


“Then, we take AE = min(D,|360−D|) as the absolute error (AE) for that predicted angle.”


In the Results section, under the subheading ‘Determining Best Settings’,


“Moreover, prediction of trigonometric ratios is better for ψ while prediction of direct angles is better for \(\phi\) , θ , and \(\tau\) . While not using ASA appears to be better than using, in contrast, using 7PCP appears to be better than not using. Overall, the best SAP settings to predict the 4 types of angles are listed below. Henceforth, we use these angle specific settings in further analysis.”


now reads:


“Moreover, prediction of direct angles is better than that of trigonometric ratios. While not using ASA appears to be better than using, in contrast, using 7PCP appears to be better than not using. Overall, the best SAP setting is using 7PCP, range-based normalisation, direct angle prediction, and window size 5. Henceforth, we use this setting in further analysis.”


In the same section, the following text was removed:

  • \(\phi\): 7PCP, range-based normalisation, direct angle prediction, and window size 5

  • ψ: 7PCP, z-score based normalisation, trigonometric ratio prediction, and window size 13

  • θ: 7PCP, range-based normalisation, direct angle prediction, and window size

  • τ: 7PCP, range-based normalisation, direct angle prediction, and window size 5


In the Results section, still under the subheading ‘Determining Best Settings’,


“However, in Table 3, we show the performance of the best angle specific SAP settings when run with DNNs having 2 and 4 hidden layers. In most cases DNNs having 3 hidden layers show the best results (shown in bold in Table 3); where this is not the case, DNNs with three hidden layers are a close second (shown in italics in Table 3), with the difference being < 0.05. So for the rest of the paper, we have chosen DNNs with 3 hidden layers as the selected SAP settings”


now reads:


“However, in Table 3, we show the performance of the best angle specific SAP setting when run with DNNs having 2 and 4 hidden layers. In most cases DNNs having 3 hidden layers show the best results (shown in bold in Table 3); where this is not the case, DNNs with three hidden layers are a close second (shown in italics in Table 3), with the difference being < 0.09. So for the rest of the paper, we have chosen the DNN with 3 hidden layers as the selected SAP setting.”


In the Results section, under the subheading ‘Performing cross-validation’,


“In Table 4, we again show the MAE values but only for the best settings of SAP”


now reads:


“In Table 4, we again show the MAE values but only for the best setting of SAP”


In the Results section, under the subheading ‘Comparison with state-of-the-art predictors’,


“Since SPOT-1D and OPUS-TASS show their performance on two subsets namely TEST2016 and TEST2018 of the testing proteins, we also do the same although we show the accumulated results for all testing proteins. Note that both SPOT-1D’s and OPUS-TASS’s performances are not worse than their reported values as shown in the bottom part of Table 5. Moreover, notice from the table that SAP significantly outperforms both SPOT-1D and OPUS-TASS in all cases.”


now reads:


“Since SPOT-1D and OPUS-TASS show their performance on two subsets namely TEST2016 and TEST2018 of the testing proteins, we also do the same although we show the accumulated results for all testing proteins. Notice from the table that SAP significantly outperforms both SPOT-1D and OPUS-TASS in all cases.”


In the Results section, still under the subheading ‘Comparison with state-of-the-art predictors’,


“To test the generality of performance of SAP over other datasets, we have run SAP on 71 proteins of PDB150 dataset and 55 proteins of CAMEO93 datasets. In Table 6, we also compare SAP’s performance with SPOT-1D’s performance on the PDB150 proteins and with OPUS-TASS’s performance on the CAMEO93 proteins. Notice that SAP significantly outperforms SPOT-1D and OPUS-TASS in ψ, θ, and τ angles, but performs worse in \(\phi\) prediction. We have performed t-tests to compare the performances of SPOT-1D and OPUS-TASS with SAP and the p values are < 0.01 in all cases, indicating the differences are statistically significant. Nevertheless, the margins in ψ and τ remain huge for SAP compared to the other methods.”


now reads:


“Although our results are in Table 5, to test the generality of performance of SAP over other datasets, we have run SAP on 71 proteins of PDB150 dataset and 55 proteins of CAMEO93 datasets. In Table 6, we also compare SAP’s performance with SPOT-1D’s performance on the PDB150 proteins and with OPUS-TASS’s performance on the CAMEO93 proteins. The performance of various methods is rather mixed here. We have performed t-tests to compare the performances of SPOT-1D and OPUS-TASS with SAP and the p values are < 0.05 in all cases, indicating the differences are statistically significant.”


In the Results section, under the subheading ‘Comparison on Protein Length Groups’,


“From the table, we see that for all four types of angles, SAP’s prediction accuracy gradually decreases as the protein length increases. When protein lengths are 300 or below, the MAE values are less than the overall MAE values and for protein lengths above 300, the MAE values are greater than the overall MAE values.”


now reads:


“From the table, we see that for all four types of angles, SAP’s prediction accuracy gradually decreases, with minor exceptions, as the protein length increases. When protein lengths are 300 or below (with minor exception for θ), the MAE values are less than the overall MAE values and for protein lengths above 300, the MAE values are greater than the overall MAE values.”


In the Results section, under the subheading ‘Using Angle Ranges from Predicted Secondary Structures’,


“When we do that for the residues that belong to SS types G, H and I, we get MAE values respectively 16.91, 8.78, and 24.02 for \(\phi\) and 27.71, 9.12, 22.04 for ψ. In contrast the MAE values for SAP predictions are respectively 12.39, 5.43, 11.34 for \(\phi\) for SS types G, H, and I, and 12.41, 5.73, 12.06 for ψ.”


now reads:


“When we do that for the residues that belong to SS types G, H and I, we get MAE values respectively 27.71, 9.12, and 22.04 for \(\phi\) and 18.71, 8.83, 21.17 for ψ. In contrast the MAE values for SAP predictions are respectively 12.40, 5.43, 11.34 for \(\phi\) for SS types G, H, and I, and 16.08, 6.40, 15.16 for ψ.”


In the Results section, under the subheading ‘Comparison of angle distributions’,


“Notice that the largest peaks of the predicted values are higher than the largest peaks of the actual values except in predicted ψ distributions of OPUS-TASS, SPOT-1D and SPIDER2. One noticeable fact in the ψ chart is OPUS-TASS, SPOT-1D and SPIDER2 predicted values are outside of [−90, 90] but actual values are roughly within the range. Another noticeable fact is in the θ chart: there are actual values between 0 and 90 although with almost zero probability, and these values are not much captured by the predictors.”


now reads:


“Notice that the largest peaks of the predicted values are higher than the largest peaks of the actual values. One noticeable fact is in the \(\theta\) chart: there are actual values between 0 and 90 although with almost zero probability, and these values are not much captured by the predictors.”


Finally, in the Results section, under the subheading ‘Comparison on Correct Prediction Per Protein’,


“We choose the threshold values to be 6 and 18 in the charts because SAP’s minimum and maximum MAE values are close to 6 and 18 respectively, for example for θ and τ.”


now reads:


“We choose the threshold values to be 6 and 18 in the charts.”


Additionally, Figure 4, Figure 5, Figure 6, and Figure 7 were corrected with the updated results for SAP. The original versions of these figures are reproduced below for the record.

Figure 4

Performance of SAP, OPUS-TASS, SPOT-1D, SPIDER2 on the testing proteins when residues are grouped based (Top Four) on their SS types and (Bottom Four) on their AA types. In the charts, y-axis shows MAE values and x-axis shows SS or AA types. The dashed horizontal line in each chart shows the overall MAE value for SAP.

Figure 5

Distributions of actual angles of testing proteins and predictions of SAP, OPUS-TASS, SPOT-1D, and SPIDER2.

Figure 6

RMSD values for SAP, SPOT-1D, and OPUS-TASS on TEST2018 proteins.

Figure 7

Percentages of proteins (y-axis) that have a given percentage of residues (x-axis) with AE at most a given threshold T where T is 6 and 18 and are denoted by T6 and T18. The lower the threshold, the better the prediction quality.


Furthermore, Table 2, Table 3, Table 4, Table 5, Table 6, and Table 7 were corrected to reflect updated results for SAP. The original versions of these tables are reproduced below for the record.

Table 2 Performance of SAP settings on 1206 testing proteins. In the table, column ASA denotes whether accessible surface area is used (Yes/No), column 7PCP denotes whether 7 physicochemical properties are used (Yes/No), column OR denotes output representation is in direct angles (D) or trigonometric ratios (R), column NM denotes normalisation method for input feature encoding is [0,1] range based (R) or Z-score based (Z), WS denotes the best size of the sliding window. Note that the emboldened cells denote the best performance for each combination of ASA and 7PCP while the boxed plus emboldened cells in each respective column denote the best performance among all SAP settings.
Table 3 Performance of the best angle specific SAP settings when the numbers of hidden layers in the DNNs are varied.
Table 4 Average performance of the best settings of SAP after 10-fold cross validation is performed.
Table 5 Performances of SPIDER2, SPOT-1D, SAP, and OPUS-TASS on our testing dataset and its subsets TEST2016 and TEST2018. The emboldened values are the winning numbers for the corresponding types of angles and datasets. OPUS-TASS does not predict θ and τ angles while the other three methods predict all four types of angles.
Table 6 Performances of SPIDER2, SPOT-1D, OPUS-TASS, and SAP on filtered PDB150 and CAMEO93 proteins. The emboldened values are the winning numbers for the corresponding types of angles and datasets. OPUS-TASS does not predict θ and τ angles while the other three methods predict all four types of angles.
Table 7 Performance of SAP, OPUS-TASS, SPOT-1D, and SPIDER2 when our testing proteins are grouped based on their lengths. In the table, ΔMAE of a system (e.g. OPUS-TASS, SPOT-1D or SPIDER2) is its MAE minus the MAE of SAP. As such, the greater the value of ΔMAE, the worse the performance of the system w.r.t. the performance of SAP. The horizontal lines in SAP columns split those columns such that the upper parts have MAE values less than the overall MAE values and the lower parts have MAE values greater (a slight exception for θ).

Finally, the legends of Tables 3, 4, and 7 were also updated. In the legend of Table 3,


“Performance of the best angle specific SAP settings when the numbers of hidden layers in the DNNs are varied”


now reads:


“Performance of the best SAP setting when the numbers of hidden layers in the DNNs are varied”


In the legend of Table 4,


“Average performance of the best settings of SAP after 10-fold cross validation is performed.”


now reads:


“Average performance of the best setting of SAP after 10-fold cross validation is performed.”


and in the legend of Table 7,


“As such, the greater the value of ΔMAE, the worse the performance of the system w.r.t. the performance of SAP. The horizontal lines in SAP columns split those columns such that the upper parts have MAE values less than the overall MAE values (a slight exception for θ) and the lower parts have MAE values greater (a slight exception for θ).”


now reads:


“As such, the greater the value of ΔMAE, the worse the performance of the system w.r.t. the performance of SAP.”


The original version of the Article has been corrected.