Deep learning prediction of ribosome profiling with Translatomer reveals translational regulation and interprets disease variants

He, Jialin; Xiong, Lei; Shi, Shaohui; Li, Chengyu; Chen, Kexuan; Fang, Qianchen; Nan, Jiuhong; Ding, Ke; Mao, Yuanhui; Boix, Carles A.; Hu, Xinyang; Kellis, Manolis; Li, Jingyun; Xiong, Xushen

doi:10.1038/s42256-024-00915-6

Article
Published: 23 October 2024

Jialin He^1,2,na1^,
Lei Xiong^3,4,na1^ (ORCID: orcid.org/0000-0002-2392-114X),
Shaohui Shi^1,2^,
Chengyu Li^1,2^,
Kexuan Chen^1,2^,
Qianchen Fang^1,2^,
Jiuhong Nan^1,2^,
Ke Ding^1,2^,
Yuanhui Mao^1^,
Carles A. Boix^5^,
Xinyang Hu^1,2^,
Manolis Kellis^3^ (ORCID: orcid.org/0000-0001-7113-9630),
Jingyun Li^6^ &
Xushen Xiong^1,2^ (ORCID: orcid.org/0000-0001-7090-7503)

Nature Machine Intelligence volume 6, pages 1314–1329 (2024)

A preprint version of the article is available at bioRxiv.

Abstract

Gene expression involves transcription and translation. Despite large datasets and increasingly powerful methods devoted to calculating genetic variants’ effects on transcription, discrepancy between messenger RNA and protein levels hinders the systematic interpretation of the regulatory effects of disease-associated variants. Accurate models of the sequence determinants of translation are needed to close this gap and to interpret disease-associated variants that act on translation. Here we present Translatomer, a multimodal transformer framework that predicts cell-type-specific translation from messenger RNA expression and gene sequence. We train the Translatomer on 33 tissues and cell lines, and show that the inclusion of sequence improves the prediction of ribosome profiling signal, indicating that the Translatomer captures sequence-dependent translational regulatory information. The Translatomer achieves accuracies of 0.72 to 0.80 for the de novo prediction of cell-type-specific ribosome profiling. We develop an in silico mutagenesis tool to estimate mutational effects on translation and demonstrate that variants associated with translation regulation are evolutionarily constrained, both in the human population and across species. In particular, we identify cell-type-specific translational regulatory mechanisms independent of the expression quantitative trait loci for 3,041 non-coding and synonymous variants associated with complex diseases, including Alzheimer’s disease, schizophrenia and congenital heart disease. The Translatomer accurately models the genetic underpinnings of translation, bridging the gap between messenger RNA and protein levels as well as providing valuable mechanistic insights for uninterpreted disease variants.
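The in silico mutagenesis idea mentioned in the abstract can be illustrated with a minimal sketch: substitute one base, re-run a predictor, and compare the predicted signal with that of the reference sequence. Everything below is an assumption made for illustration, not the authors' tool: `predict` is a hypothetical stand-in for a trained model (the real Translatomer also takes RNA expression as input), `toy_predict` is a deliberately trivial scorer, and defining the effect as the difference in total predicted signal is one possible choice among several.

```python
NUCLEOTIDES = "ACGT"


def single_base_variants(seq, pos):
    """Return {alt_base: mutated_sequence} for each substitution at 0-based pos."""
    ref = seq[pos]
    return {
        alt: seq[:pos] + alt + seq[pos + 1:]
        for alt in NUCLEOTIDES
        if alt != ref
    }


def mutation_effects(seq, pos, predict):
    """Effect of each substitution: total predicted signal (alt) minus (ref).

    `predict` is a hypothetical callable mapping a sequence to a list of
    per-position signal values; it stands in for a trained model.
    """
    ref_total = sum(predict(seq))
    return {
        alt: sum(predict(mutated)) - ref_total
        for alt, mutated in single_base_variants(seq, pos).items()
    }


def toy_predict(seq):
    # Toy scorer for demonstration only: 1.0 at every G, 0.0 elsewhere.
    return [1.0 if base == "G" else 0.0 for base in seq]


# Mutate the G at index 3 of a short example sequence.
effects = mutation_effects("GCCGCCATGG", 3, toy_predict)
```

With the toy scorer, every substitution removes one G, so each alternative base gets the same negative effect; a real model would give base-specific and context-specific values.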

**Fig. 1: Model design and performance of Translatomer.**

**Fig. 2: Translatomer enables accurate de novo prediction of ribosome profiling.**

**Fig. 3: Contributions of input modalities on translation prediction and in silico mutagenesis effect estimation.**

**Fig. 4: Translatomer reveals translation-dependent evolutionary constraints and interprets underpinnings of genetic diseases in a context-dependent manner.**

Data availability

All data are publicly available via the Gene Expression Omnibus database at https://www.ncbi.nlm.nih.gov/geo/ (ref. ⁷⁶), with detailed information and accession numbers provided in Supplementary Tables 1 and 2. The example data and pretrained model are available via Zenodo at https://zenodo.org/records/13751434 (ref. ⁷⁷).

Code availability

Code for the ribosome profiling data processing and Translatomer model training is available via GitHub at https://github.com/xiongxslab/Translatomer and via Zenodo at https://zenodo.org/records/13777392 (ref. ⁷⁸).

References

1. Liu, Y., Beyer, A. & Aebersold, R. On the dependency of cellular protein levels on mRNA abundance. Cell 165, 535–550 (2016).
2. Fortelny, N., Overall, C. M., Pavlidis, P. & Freue, G. V. C. Can we predict protein from mRNA levels? Nature 547, E19–E20 (2017).
3. Buccitelli, C. & Selbach, M. mRNAs, proteins and the emerging principles of gene expression control. Nat. Rev. Genet. 21, 630–644 (2020).
4. Franks, A., Airoldi, E. & Slavov, N. Post-transcriptional regulation across human tissues. PLoS Comput. Biol. 13, e1005535 (2017).
5. Edfors, F. et al. Gene-specific correlation of RNA and protein levels in human cells and tissues. Mol. Syst. Biol. 12, 883 (2016).
6. Ward, L. D. & Kellis, M. Interpreting noncoding genetic variation in complex traits and human disease. Nat. Biotechnol. 30, 1095–1106 (2012).
7. Tak, Y. G. & Farnham, P. J. Making sense of GWAS: using epigenomics and genome engineering to understand the functional relevance of SNPs in non-coding regions of the human genome. Epigenetics Chromatin 8, 57 (2015).
8. Võsa, U. et al. Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat. Genet. 53, 1300–1310 (2021).
9. Kerimov, N. et al. A compendium of uniformly processed human gene expression and splicing quantitative trait loci. Nat. Genet. 53, 1290–1299 (2021).
10. GTEx Consortium The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).
11. Connally, N. J. et al. The missing link between genetic association and regulatory function. eLife 11, e74970 (2022).
12. Huang, D. et al. QTLbase2: an enhanced catalog of human quantitative trait loci on extensive molecular phenotypes. Nucleic Acids Res. 51, D1122–D1128 (2023).
13. Alberts, B. et al. Molecular Biology of the Cell (Garland Science, 2002).
14. Khan, Z. et al. Primate transcript and protein expression levels evolve under compensatory selection pressures. Science 342, 1100–1104 (2013).
15. Battle, A. et al. Genomic variation. Impact of regulatory variation from RNA to protein. Science 347, 664–667 (2015).
16. Ingolia, N. T., Ghaemmaghami, S., Newman, J. R. S. & Weissman, J. S. Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science 324, 218–223 (2009).
17. Brar, G. A. & Weissman, J. S. Ribosome profiling reveals the what, when, where and how of protein synthesis. Nat. Rev. Mol. Cell Biol. 16, 651–664 (2015).
18. Witte, F. et al. A trans locus causes a ribosomopathy in hypertrophic hearts that affects mRNA translation in a protein length-dependent fashion. Genome Biol. 22, 191 (2021).
19. Li, Q. et al. Genome-wide search for exonic variants affecting translational efficiency. Nat. Commun. 4, 2260 (2013).
20. Long, E., Wan, P., Chen, Q., Lu, Z. & Choi, J. From function to translation: decoding genetic susceptibility to human diseases via artificial intelligence. Cell Genomics 3, 100320 (2023).
21. Eraslan, G., Avsec, Ž., Gagneur, J. & Theis, F. J. Deep learning: new computational modelling techniques for genomics. Nat. Rev. Genet. 20, 389–403 (2019).
22. Huang, X., Rymbekova, A., Dolgova, O., Lao, O. & Kuhlwilm, M. Harnessing deep learning for population genetic inference. Nat. Rev. Genet. 25, 61–78 (2023).
23. Acosta, J. N., Falcone, G. J., Rajpurkar, P. & Topol, E. J. Multimodal biomedical AI. Nat. Med. 28, 1773–1784 (2022).
24. Cui, H., Hu, H., Zeng, J. & Chen, T. DeepShape: estimating isoform-level ribosome abundance and distribution with Ribo-seq data. BMC Bioinf. 20, 678 (2019).
25. Hu, H. et al. Riboexp: an interpretable reinforcement learning framework for ribosome density modeling. Brief. Bioinform. 22, bbaa412 (2021).
26. Tunney, R. et al. Accurate design of translational output by a neural network model of ribosome distribution. Nat. Struct. Mol. Biol. 25, 577–582 (2018).
27. Shao, B. et al. Riboformer: a deep learning framework for predicting context-dependent translation dynamics. Nat. Commun. 15, 2011 (2024).
28. Tian, T., Li, S., Lang, P., Zhao, D. & Zeng, J. Full-length ribosome density prediction by a multi-input and multi-output model. PLoS Comput. Biol. 17, e1008842 (2021).
29. Vaswani, A. et al. Attention is all you need. Adv. Neural Inf. Process. Syst. 30 (2017).
30. Imataka, H., Gradi, A. & Sonenberg, N. A newly identified N-terminal amino acid sequence of human eIF4G binds poly(A)-binding protein and functions in poly(A)-dependent translation. EMBO J. 17, 7480–7489 (1998).
31. Wells, S. E., Hillner, P. E., Vale, R. D. & Sachs, A. B. Circularization of mRNA by eukaryotic translation initiation factors. Mol. Cell 2, 135–140 (1998).
32. Tarun, S. Z. Jr & Sachs, A. B. Association of the yeast poly(A) tail binding protein with translation initiation factor eIF-4G. EMBO J. 15, 7168–7177 (1996).
33. Castillo Bennett, J., Roggero, C. M., Mancifesta, F. E. & Mayorga, L. S. Calcineurin-mediated dephosphorylation of synaptotagmin VI is necessary for acrosomal exocytosis. J. Biol. Chem. 285, 26269–26278 (2010).
34. Roggero, C. M. et al. Protein kinase C-mediated phosphorylation of the two polybasic regions of synaptotagmin VI regulates their function in acrosomal exocytosis. Dev. Biol. 285, 422–435 (2005).
35. Umezu, T., Yamanouchi, H., Iida, Y., Miura, M. & Tomooka, Y. Follistatin-like-1, a diffusible mesenchymal factor determines the fate of epithelium. Proc. Natl Acad. Sci. USA 107, 4601–4606 (2010).
36. Geng, Y. et al. Follistatin-like 1 (Fstl1) is a bone morphogenetic protein (BMP) 4 signaling antagonist in controlling mouse lung development. Proc. Natl Acad. Sci. USA 108, 7058–7063 (2011).
37. Sun, W. et al. FSTL1 promotes alveolar epithelial cell aging and worsens pulmonary fibrosis by affecting SENP1-mediated DeSUMOylation. Cell Biol. Int. 47, 1716–1727 (2023).
38. Cockman, E., Anderson, P. & Ivanov, P. TOP mRNPs: molecular mechanisms and principles of regulation. Biomolecules 10, 969 (2020).
39. Meyuhas, O. Synthesis of the translational apparatus is regulated at the translational level. Eur. J. Biochem. 267, 6321–6330 (2000).
40. Kozak, M. The scanning model for translation: an update. J. Cell Biol. 108, 229–241 (1989).
41. Kozak, M. Point mutations define a sequence flanking the AUG initiator codon that modulates translation by eukaryotic ribosomes. Cell 44, 283–292 (1986).
42. Tuller, T. et al. An evolutionarily conserved mechanism for controlling the efficiency of protein translation. Cell 141, 344–354 (2010).
43. Verma, M. et al. A short translational ramp determines the efficiency of protein synthesis. Nat. Commun. 10, 5774 (2019).
44. Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
45. Rhead, B. et al. The UCSC Genome Browser database: update 2010. Nucleic Acids Res. 38, D613–D619 (2010).
46. Sun, L. et al. Predicting dynamic cellular protein-RNA interactions by deep learning using in vivo RNA structures. Cell Res. 31, 495–516 (2021).
47. Siepel, A., Pollard, K. S. & Haussler, D. New methods for detecting lineage-specific selection. in Research in Computational Molecular Biology 190–205 (Springer, 2006).
48. Josephs, E. B., Lee, Y. W., Stinchcombe, J. R. & Wright, S. I. Association mapping reveals the role of purifying selection in the maintenance of genomic variation in gene expression. Proc. Natl Acad. Sci. USA 112, 15390–15395 (2015).
49. Park, C. Y. et al. Genome-wide landscape of RNA-binding protein target site dysregulation reveals a major impact on psychiatric disorder risk. Nat. Genet. 53, 166–173 (2021).
50. Landrum, M. J. et al. ClinVar: improvements to accessing data. Nucleic Acids Res. 48, D835–D844 (2020).
51. Turco, E. et al. Reconstitution defines the roles of p62, NBR1 and TAX1BP1 in ubiquitin condensate formation and autophagy initiation. Nat. Commun. 12, 5212 (2021).
52. Bjørkøy, G. et al. p62/SQSTM1 forms protein aggregates degraded by autophagy and has a protective effect on huntingtin-induced cell death. J. Cell Biol. 171, 603–614 (2005).
53. Rubino, E. et al. SQSTM1 mutations in frontotemporal lobar degeneration and amyotrophic lateral sclerosis. Neurology 79, 1556–1562 (2012).
54. Ma, S., Attarwala, I. Y. & Xie, X.-Q. SQSTM1/p62: a potential target for neurodegenerative disease. ACS Chem. Neurosci. 10, 2094–2114 (2019).
55. Lin, F. & Worman, H. J. Structural organization of the human gene encoding nuclear lamin A and nuclear lamin C. J. Biol. Chem. 268, 16321–16326 (1993).
56. Kamat, A. K., Rocchi, M., Smith, D. I. & Miller, O. J. Lamin A/C gene and a related sequence map to human chromosomes 1q12.1-q23 and 10. Somat. Cell Mol. Genet. 19, 203–208 (1993).
57. Tan, J. et al. Cell-type-specific prediction of 3D chromatin organization enables high-throughput in silico genetic screening. Nat. Biotechnol. 41, 1140–1150 (2023).
58. Yin, Q., Wu, M., Liu, Q., Lv, H. & Jiang, R. DeepHistone: a deep learning approach to predicting histone modifications. BMC Genomics 20, 193 (2019).
59. Li, Z. et al. Applications of deep learning in understanding gene regulation. Cell Rep. Methods 3, 100384 (2023).
60. Matsumoto, K., Wassarman, K. M. & Wolffe, A. P. Nuclear history of a pre-mRNA determines the translational activity of cytoplasmic mRNA. EMBO J. 17, 2107–2121 (1998).
61. Nott, A., Meislin, S. H. & Moore, M. J. A quantitative analysis of intron effects on mammalian gene expression. RNA 9, 607–617 (2003).
62. Gudikote, J. P., Imam, J. S., Garcia, R. F. & Wilkinson, M. F. RNA splicing promotes translation and RNA surveillance. Nat. Struct. Mol. Biol. 12, 801–809 (2005).
63. Moore, M. J. & Proudfoot, N. J. Pre-mRNA processing reaches back to transcription and ahead to translation. Cell 136, 688–700 (2009).
64. Shaul, O. How introns enhance gene expression. Int. J. Biochem. Cell Biol. 91, 145–155 (2017).
65. Pamudurti, N. R. et al. Translation of circRNAs. Mol. Cell 66, 9–21.E7 (2017).
66. Jacob, A. G. & Smith, C. W. J. Intron retention as a component of regulated gene expression programs. Hum. Genet. 136, 1043–1057 (2017).
67. Legnini, I. et al. Circ-ZNF609 Is a circular RNA that can be translated and functions in myogenesis. Mol. Cell 66, 22–37.E9 (2017).
68. Sinha, T., Panigrahi, C., Das, D. & Chandra Panda, A. Circular RNA translation, a path to hidden proteome. Wiley Interdiscip. Rev.: RNA 13, e1685 (2022).
69. Hwang, H. J. & Kim, Y. K. Molecular mechanisms of circular RNA translation. Exp. Mol. Med. 56, 1272–1280 (2024).
70. Gjoneska, E. et al. Conserved epigenomic signals in mice and humans reveal immune basis of Alzheimer’s disease. Nature 518, 365–369 (2015).
71. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
72. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
73. Ramírez, F. et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 44, W160–W165 (2016).
74. Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
75. Shrikumar, A., Greenside, P. & Kundaje, A. Learning important features through propagating activation differences. In Proc. 34th International Conference on Machine Learning 70, 3145–3153 (PMLR, 2017).
76. Edgar, R., Domrachev, M. & Lash, A. E. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 30, 207–210 (2002).
77. He, J. Example data and pretrained Translatomer model. Zenodo https://doi.org/10.5281/zenodo.13751434 (2024).
78. He, J. xiongxslab:Translatomer. Zenodo https://doi.org/10.5281/zenodo.13777392 (2024).

Acknowledgements

We thank X. Li for sharing the luciferase reporter plasmid for the experimental validation of the identified disease risk loci. We also thank L. Hou for the discussion and suggestions and the members of the Xiong laboratory for discussion and suggestions throughout the project. We acknowledge support from the core facilities and computing platform of Liangzhu Laboratory at Zhejiang University. This work was supported by the National Natural Science Foundation of China (nos. 32422017, 32370609 and 92353301 to X.X. and no. 82303974 to J.L.) and funding from Liangzhu Laboratory at Zhejiang University and the State Key Laboratory of Transvascular Implantation Devices to X.X.

Author information

These authors contributed equally: Jialin He, Lei Xiong.

Authors and Affiliations

The Second Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, China
Jialin He, Shaohui Shi, Chengyu Li, Kexuan Chen, Qianchen Fang, Jiuhong Nan, Ke Ding, Yuanhui Mao, Xinyang Hu & Xushen Xiong
State Key Laboratory of Transvascular Implantation Devices, The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
Jialin He, Shaohui Shi, Chengyu Li, Kexuan Chen, Qianchen Fang, Jiuhong Nan, Ke Ding, Xinyang Hu & Xushen Xiong
Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
Lei Xiong & Manolis Kellis
Department of Genetics, Stanford University, Stanford, CA, USA
Lei Xiong
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
Carles A. Boix
Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
Jingyun Li

Contributions

This study was designed by J.H., L.X. and X.X., and directed and coordinated by X.X. J.H. trained and fine-tuned the model with help from C.L., J.N., K.D., Y.M. and C.A.B., and under the supervision of L.X., M.K. and X.X. S.S., K.C. and Q.F. performed the experimental validation under the supervision of X.H. and J.L. All authors participated in the discussion of the project. J.H., L.X. and X.X. wrote the manuscript.

Corresponding authors

Correspondence to Lei Xiong or Xushen Xiong.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Bin Shao and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Features and performance of Translatomer model.

a, Sketch plot showing the architecture of the transformer layer used in this study. The full Translatomer model is shown in Fig. 1a. b, Model evaluation based on Spearman correlation (left) and MSE loss (right), comparing Translatomer with other cutting-edge ribosome profiling prediction models, including iXnos, RiboMIMO and Riboformer, using an 11-fold cross-validation strategy in K562, epithelial cells, and brain datasets. The bars represent the mean values, and the error bars represent the standard errors. Each bar contains 11 replicates derived from the 11-fold cross-validation. c, Comparison of the key features between Translatomer and other ribosome profiling prediction models. d, Pearson correlation coefficient (PCC) increases and converges as the number of training epochs increases. Multi-input and single-input models are in blue and red, respectively. Training accuracy is represented by a dotted line and validation accuracy is represented by a solid line. The accuracy difference between multi-input and single-input models is calculated based on the validation accuracy. e, Mean-squared error loss decreases and converges as the number of training epochs increases. f, Table showing the performance of different hyper-parameters tested during model construction using the datasets from 33 tissues and cell lines.
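The three metrics named in this legend (Pearson correlation, Spearman correlation and MSE) can be computed for a pair of observed and predicted tracks roughly as follows. This is a minimal standard-library sketch under stated assumptions, not the authors' evaluation code: the function names and the toy `observed` and `predicted` values are illustrative, and constant tracks (zero variance) are not handled.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)  # undefined if either track is constant


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mse(x, y):
    """Mean-squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


# Toy tracks for illustration only.
observed = [0.0, 1.0, 2.0, 4.0, 8.0]
predicted = [0.1, 0.9, 2.5, 3.5, 9.0]
```

Because the toy prediction preserves the ordering of the observed values, its Spearman correlation is 1 even though the MSE is non-zero, which is one reason the legend reports rank-based and error-based metrics side by side.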

Extended Data Fig. 2 Translatomer accurately predicts ribosome profiling signal for new data.

a, Heatmap showing the pairwise Spearman correlation coefficients between the observed and predicted ribosome profiling across the four tissues or cell types evaluated. Hierarchical clustering was performed to evaluate the similarity between different datasets. b, Pearson (left) and Spearman (right) correlation coefficient between the predicted signal of a certain cell type and the observed signal in that cell type (in yellow), and between the predicted signal of a certain cell type and the observed signal in epithelial cells (in green) for the FSTL1 gene. c, MSE loss between the predicted signal of a certain cell type and the observed signal in that cell type (in yellow), and between the predicted signal of a certain cell type and the observed signal in epithelial cells (in green) for the FSTL1 gene. d, Observed and predicted ribosome profiling tracks in epithelial cells and non-epithelial cells for the ACTB gene. The Pearson correlation coefficient against the observed ribosome profiling in epithelial cells is labeled at the top right. e, Observed RNA-seq tracks of ACTB in epithelial and non-epithelial cells. The Pearson correlation coefficient is calculated against the RNA-seq signal in epithelial cells and is labeled at the top right. f, Evaluations of the human-data-trained model on the de novo prediction across 16 mouse datasets, with MSE loss (top), Spearman correlation coefficient (middle), and Pearson correlation coefficient (bottom) shown. The datasets were sorted based on the Pearson correlation coefficient. g, Evaluations of the mouse-data-trained model on the de novo prediction across 37 human datasets.

Extended Data Fig. 3 Validation of Translatomer based on in silico mutagenesis of Kozak sequence.

a, Example track showing the predicted Ribo-seq signal and the sequence contribution score along the RPSA mRNA. The pooled sequence contribution score was calculated by aggregating the scores in bins of 128 bp. The contribution of the 5′ TOP sequence is zoomed in and visualized. b, The predicted effect on translation upon the in silico mutagenesis from G to other nucleotides at position −3. P-value (unadjusted) is calculated using the two-sided Wilcoxon rank-sum test. The box shows the 25th–75th percentile; the line shows the median; the whiskers show 1.5 × IQR. c, The predicted effect on translation upon the in silico mutagenesis from T to other nucleotides at position −3. P-value (unadjusted) is calculated using the Wilcoxon rank-sum test. No multiple-testing correction was applied. The box shows the 25th–75th percentile; the line shows the median; the whiskers show 1.5 × IQR. d, Scatter plot showing the correlation between the in silico mutagenesis effects based on the translation initiation ramp (x-axis) versus the whole coding region (y-axis). The R and P-value (unadjusted) of the correlation analysis are shown. e, The predicted effect on translation upon the in silico mutagenesis from G (left) and T (right) to other nucleotides at position −3. The effect was estimated based on the whole coding region. P-value (unadjusted) is calculated using the Wilcoxon rank-sum test. The box shows the 25th–75th percentile; the line shows the median; the whiskers show 1.5 × IQR. f, The predicted effect on translation upon the in silico mutagenesis from G to other nucleotides at position +4. The effect was estimated based on the whole coding region. P-value (unadjusted) is calculated using the Wilcoxon rank-sum test. The box shows the 25th–75th percentile; the line shows the median; the whiskers show 1.5 × IQR.
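The pooling of per-base contribution scores into 128 bp bins described in panel a can be sketched as below. This is an illustrative assumption rather than the authors' code: the legend says only that scores are aggregated, so summation is an assumed choice (a mean would be equally plausible), and `pool_scores` and the toy input are hypothetical.

```python
def pool_scores(scores, bin_size=128):
    """Sum per-base scores over consecutive, non-overlapping bins.

    Summation is an assumed aggregation; the final bin may be shorter
    than bin_size when the track length is not a multiple of it.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    return [
        sum(scores[start:start + bin_size])
        for start in range(0, len(scores), bin_size)
    ]


# Toy track: 300 positions, each with a contribution score of 1.0.
per_base = [1.0] * 300
pooled = pool_scores(per_base)
```

For the 300-position toy track this yields two full bins and one partial bin of 44 positions.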

Extended Data Fig. 4 Evolutionary constraints interrogation and disease variants interpretation by Translatomer.

a, Effect size of in silico mutagenesis on translation across different ranges of PhyloP score, which represents evolutionary constraint across species. P-value was calculated using the two-sided Wilcoxon rank-sum test. The box shows the 25th–75th percentile; the line shows the median; the whiskers show 1.5 × IQR. The number of data points in each group is indicated in the figure. b, Effect size of in silico mutagenesis on translation across different ranges of minor allele frequency, which represents evolutionary constraint within the human population. P-value was calculated using the two-sided Wilcoxon rank-sum test. The box shows the 25th–75th percentile; the line shows the median; the whiskers show 1.5 × IQR. The number of data points in each group is indicated in the figure. c, Procedure for the identification of translation-dependent ClinVar variants based on in silico mutagenesis. d, Number of translation-dependent ClinVar variants identified by Translatomer in brain-related disorders. e, Number of translation-dependent ClinVar variants identified by Translatomer in heart-related disorders. f, Correlation between the predicted mutagenesis effect and the gene length (top), and between the predicted mutagenesis effect and the translation level of the gene evaluated (bottom). Fitted lines and P-values were calculated based on linear regression, with correlations and P-values (unadjusted) labeled. g, Distribution of the absolute in silico mutagenesis effect on translation across the gnomAD variants. A threshold of 0.24, which corresponds to the effect ranking at the top 5%, is selected to define the candidate variants that influence translation efficiency. h, Number of ClinVar variants that are dependent (red) and independent (blue) of their impacts on translation. The percentage of the translation-dependent variants for each disease is labeled. i, The translation-dependent ClinVar variants showing eQTL significance are not lead eQTL SNPs in the corresponding loci. j, The number of translation-mediated variants identified for each disease curated by the ClinVar database. The sharing of the cell type/tissue contexts is shown at the bottom. k, Example tracks showing the effects of chr1:156,134,495:G > T on the translation of the LMNA gene in the contexts of heart, brain, neuron and macrophage.
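Panel g defines candidate variants by a cutoff on the absolute mutagenesis effect at the top 5% of the distribution. A minimal sketch of that kind of cutoff is given below; it is not the authors' procedure, the function names are hypothetical, and the toy effects are synthetic, so the resulting threshold is unrelated to the 0.24 reported for gnomAD variants.

```python
def top_fraction_threshold(effects, fraction=0.05):
    """Smallest absolute effect among the top `fraction` of variants."""
    if not effects:
        raise ValueError("effects must not be empty")
    ranked = sorted((abs(e) for e in effects), reverse=True)
    # Keep at least one variant, and round to avoid float truncation.
    k = max(1, round(len(ranked) * fraction))
    return ranked[k - 1]


def candidate_variants(effects, threshold):
    """Effects whose absolute value reaches the threshold."""
    return [e for e in effects if abs(e) >= threshold]


# Synthetic effects for illustration: +/- i/100 for i = 1..40.
toy_effects = [(-1) ** i * i / 100 for i in range(1, 41)]
threshold = top_fraction_threshold(toy_effects)
candidates = candidate_variants(toy_effects, threshold)
```

Ranking on the absolute value treats variants that increase and decrease predicted translation symmetrically, which matches the legend's use of the absolute effect.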

Supplementary information

Reporting Summary

Supplementary Tables 1–6

Supplementary Tables.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

He, J., Xiong, L., Shi, S. et al. Deep learning prediction of ribosome profiling with Translatomer reveals translational regulation and interprets disease variants. Nat Mach Intell 6, 1314–1329 (2024). https://doi.org/10.1038/s42256-024-00915-6

Received: 05 March 2024
Accepted: 19 September 2024
Published: 23 October 2024
Issue date: November 2024
DOI: https://doi.org/10.1038/s42256-024-00915-6