Scientific Reports
Leveraging learned representations and multitask learning for lysine methylation site discovery
  • Article
  • Open access
  • Published: 23 February 2026

  • François Charih1,2,3,
  • Mullen Boulter2,
  • Kyle K. Biggar2,3 &
  • …
  • James R. Green1,3 

Scientific Reports (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Biochemistry
  • Cancer
  • Computational biology and bioinformatics

Abstract

Lysine methylation is a dynamic and reversible post-translational modification of proteins carried out by lysine methyltransferase enzymes. The role of this modification in epigenetics and gene regulation is relatively well understood, but our understanding of the extent and role of lysine methylation of non-histone substrates remains limited. Several lysine methyltransferases that methylate non-histone substrates are overexpressed in a number of cancers and are believed to be key drivers of cancer progression. There is therefore great incentive to map the lysine methylome, as this is a key step in identifying drug targets. While numerous computational models have been developed in the last decade to identify novel lysine methylation sites, their accuracy has been modest, leaving much room for improvement. In this work, we leverage recent advances in deep learning and present a transformer-based model for lysine methylation site prediction that achieves state-of-the-art accuracy. In addition, we show that other post-translational modifications of lysine are informative and that multitask learning is an effective way to integrate this prior knowledge into our lysine methylation site predictor, MethylSight 2.0. Finally, we validate our model by means of parallel reaction monitoring mass spectrometry experiments and identify 68 novel lysine methylation sites. This work contributes to the completion of a comprehensive map of the lysine methylome by revising the estimate of its extent to approximately 155,000 sites. Of those, MethylSight 2.0 is expected to correctly detect ~47,000, substantially more than expected with competing methods, which we show to be less sensitive on a subset of experimentally validated novel methylation sites. We foresee that MethylSight 2.0, whose performance significantly surpasses that of competing models, will facilitate the discovery of a large number of novel methylation sites.
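The two estimates quoted above imply a rough sensitivity for the predictor. The short calculation below is only a back-of-the-envelope reading of those two rounded figures; the variable names are illustrative, and the resulting ratio is not a sensitivity value reported in the abstract.

```python
# Approximate figures quoted in the abstract.
estimated_methylome_size = 155_000  # revised estimate of lysine methylation sites
expected_detected_sites = 47_000    # sites MethylSight 2.0 is expected to detect

# Implied sensitivity (recall): detected true sites / all true sites.
# Assumption: both figures refer to the same set of true sites.
implied_sensitivity = expected_detected_sites / estimated_methylome_size
expected_missed_sites = estimated_methylome_size - expected_detected_sites

print(f"Implied sensitivity: {implied_sensitivity:.1%}")
print(f"Sites expected to remain undetected: {expected_missed_sites:,}")
```

Under these assumptions, roughly three in ten of the estimated sites would be recovered, leaving about 108,000 undetected.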

Data availability

The source code (models and model weights) required to run MethylSight 2.0 and the associated datasets are available on GitHub ([https://github.com/GreenCUBIC/MethylSight2.git](https://github.com/GreenCUBIC/MethylSight2.git)).

Funding

This research was funded by Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grants awarded to Kyle K. Biggar (RGPIN-2023-04651) and James R. Green (RGPIN-2021-04184).

Author information

Authors and Affiliations

  1. Department of Systems and Computer Engineering, Carleton University, Ottawa, ON, Canada

    François Charih & James R. Green

  2. Institute of Biochemistry, Department of Biology, Carleton University, Ottawa, ON, Canada

    François Charih, Mullen Boulter & Kyle K. Biggar

  3. NuvoBio Corp, Ottawa, ON, Canada

    François Charih, Kyle K. Biggar & James R. Green

Contributions

**François Charih:** Conceptualization, Methodology, Software, Investigation, Formal analysis, Data Curation, Visualization, Writing - Original Draft. **Mullen Boulter:** Formal analysis. **Kyle K. Biggar:** Conceptualization, Resources, Formal analysis, Writing - Review & Editing, Funding acquisition. **James R. Green:** Conceptualization, Writing - Review & Editing, Funding acquisition. All authors approved the manuscript.

Corresponding author

Correspondence to François Charih.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary Material 1

Supplementary Material 2

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Charih, F., Boulter, M., Biggar, K.K. et al. Leveraging learned representations and multitask learning for lysine methylation site discovery. Sci Rep (2026). https://doi.org/10.1038/s41598-026-39136-9

  • Received: 02 September 2025

  • Accepted: 03 February 2026

  • Published: 23 February 2026

  • DOI: https://doi.org/10.1038/s41598-026-39136-9

Keywords

  • Lysine methylation
  • Lysine methylome
  • Deep learning
  • Transformers
  • Multitask learning