Abstract
Gene expression involves transcription and translation. Despite large datasets and increasingly powerful methods devoted to calculating genetic variants’ effects on transcription, discrepancy between messenger RNA and protein levels hinders the systematic interpretation of the regulatory effects of disease-associated variants. Accurate models of the sequence determinants of translation are needed to close this gap and to interpret disease-associated variants that act on translation. Here we present Translatomer, a multimodal transformer framework that predicts cell-type-specific translation from messenger RNA expression and gene sequence. We train the Translatomer on 33 tissues and cell lines, and show that the inclusion of sequence improves the prediction of ribosome profiling signal, indicating that the Translatomer captures sequence-dependent translational regulatory information. The Translatomer achieves accuracies of 0.72 to 0.80 for the de novo prediction of cell-type-specific ribosome profiling. We develop an in silico mutagenesis tool to estimate mutational effects on translation and demonstrate that variants associated with translation regulation are evolutionarily constrained, both in the human population and across species. In particular, we identify cell-type-specific translational regulatory mechanisms independent of the expression quantitative trait loci for 3,041 non-coding and synonymous variants associated with complex diseases, including Alzheimer’s disease, schizophrenia and congenital heart disease. The Translatomer accurately models the genetic underpinnings of translation, bridging the gap between messenger RNA and protein levels as well as providing valuable mechanistic insights for uninterpreted disease variants.
Data availability
All data are publicly available via the Gene Expression Omnibus database at https://www.ncbi.nlm.nih.gov/geo/ (ref. 76), with detailed information and accession numbers provided in Supplementary Tables 1 and 2. The example data and pretrained model are available via Zenodo at https://zenodo.org/records/13751434 (ref. 77).
Code availability
Code for the ribosome profiling data processing and Translatomer model training is available via GitHub at https://github.com/xiongxslab/Translatomer and via Zenodo at https://zenodo.org/records/13777392 (ref. 78).
References
Liu, Y., Beyer, A. & Aebersold, R. On the dependency of cellular protein levels on mRNA abundance. Cell 165, 535–550 (2016).
Fortelny, N., Overall, C. M., Pavlidis, P. & Freue, G. V. C. Can we predict protein from mRNA levels? Nature 547, E19–E20 (2017).
Buccitelli, C. & Selbach, M. mRNAs, proteins and the emerging principles of gene expression control. Nat. Rev. Genet. 21, 630–644 (2020).
Franks, A., Airoldi, E. & Slavov, N. Post-transcriptional regulation across human tissues. PLoS Comput. Biol. 13, e1005535 (2017).
Edfors, F. et al. Gene-specific correlation of RNA and protein levels in human cells and tissues. Mol. Syst. Biol. 12, 883 (2016).
Ward, L. D. & Kellis, M. Interpreting noncoding genetic variation in complex traits and human disease. Nat. Biotechnol. 30, 1095–1106 (2012).
Tak, Y. G. & Farnham, P. J. Making sense of GWAS: using epigenomics and genome engineering to understand the functional relevance of SNPs in non-coding regions of the human genome. Epigenetics Chromatin 8, 57 (2015).
Võsa, U. et al. Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat. Genet. 53, 1300–1310 (2021).
Kerimov, N. et al. A compendium of uniformly processed human gene expression and splicing quantitative trait loci. Nat. Genet. 53, 1290–1299 (2021).
GTEx Consortium The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).
Connally, N. J. et al. The missing link between genetic association and regulatory function. eLife 11, e74970 (2022).
Huang, D. et al. QTLbase2: an enhanced catalog of human quantitative trait loci on extensive molecular phenotypes. Nucleic Acids Res. 51, D1122–D1128 (2023).
Alberts, B. et al. Molecular Biology of the Cell (Garland Science, 2002).
Khan, Z. et al. Primate transcript and protein expression levels evolve under compensatory selection pressures. Science 342, 1100–1104 (2013).
Battle, A. et al. Genomic variation. Impact of regulatory variation from RNA to protein. Science 347, 664–667 (2015).
Ingolia, N. T., Ghaemmaghami, S., Newman, J. R. S. & Weissman, J. S. Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science 324, 218–223 (2009).
Brar, G. A. & Weissman, J. S. Ribosome profiling reveals the what, when, where and how of protein synthesis. Nat. Rev. Mol. Cell Biol. 16, 651–664 (2015).
Witte, F. et al. A trans locus causes a ribosomopathy in hypertrophic hearts that affects mRNA translation in a protein length-dependent fashion. Genome Biol. 22, 191 (2021).
Li, Q. et al. Genome-wide search for exonic variants affecting translational efficiency. Nat. Commun. 4, 2260 (2013).
Long, E., Wan, P., Chen, Q., Lu, Z. & Choi, J. From function to translation: decoding genetic susceptibility to human diseases via artificial intelligence. Cell Genomics 3, 100320 (2023).
Eraslan, G., Avsec, Ž., Gagneur, J. & Theis, F. J. Deep learning: new computational modelling techniques for genomics. Nat. Rev. Genet. 20, 389–403 (2019).
Huang, X., Rymbekova, A., Dolgova, O., Lao, O. & Kuhlwilm, M. Harnessing deep learning for population genetic inference. Nat. Rev. Genet. 25, 61–78 (2023).
Acosta, J. N., Falcone, G. J., Rajpurkar, P. & Topol, E. J. Multimodal biomedical AI. Nat. Med. 28, 1773–1784 (2022).
Cui, H., Hu, H., Zeng, J. & Chen, T. DeepShape: estimating isoform-level ribosome abundance and distribution with Ribo-seq data. BMC Bioinf. 20, 678 (2019).
Hu, H. et al. Riboexp: an interpretable reinforcement learning framework for ribosome density modeling. Brief. Bioinform. 22, bbaa412 (2021).
Tunney, R. et al. Accurate design of translational output by a neural network model of ribosome distribution. Nat. Struct. Mol. Biol. 25, 577–582 (2018).
Shao, B. et al. Riboformer: a deep learning framework for predicting context-dependent translation dynamics. Nat. Commun. 15, 2011 (2024).
Tian, T., Li, S., Lang, P., Zhao, D. & Zeng, J. Full-length ribosome density prediction by a multi-input and multi-output model. PLoS Comput. Biol. 17, e1008842 (2021).
Vaswani, A. et al. Attention is all you need. Adv. Neural Inf. Process. Syst. 30 (2017).
Imataka, H., Gradi, A. & Sonenberg, N. A newly identified N-terminal amino acid sequence of human eIF4G binds poly(A)-binding protein and functions in poly(A)-dependent translation. EMBO J. 17, 7480–7489 (1998).
Wells, S. E., Hillner, P. E., Vale, R. D. & Sachs, A. B. Circularization of mRNA by eukaryotic translation initiation factors. Mol. Cell 2, 135–140 (1998).
Tarun, S. Z. Jr & Sachs, A. B. Association of the yeast poly(A) tail binding protein with translation initiation factor eIF-4G. EMBO J. 15, 7168–7177 (1996).
Castillo Bennett, J., Roggero, C. M., Mancifesta, F. E. & Mayorga, L. S. Calcineurin-mediated dephosphorylation of synaptotagmin VI is necessary for acrosomal exocytosis. J. Biol. Chem. 285, 26269–26278 (2010).
Roggero, C. M. et al. Protein kinase C-mediated phosphorylation of the two polybasic regions of synaptotagmin VI regulates their function in acrosomal exocytosis. Dev. Biol. 285, 422–435 (2005).
Umezu, T., Yamanouchi, H., Iida, Y., Miura, M. & Tomooka, Y. Follistatin-like-1, a diffusible mesenchymal factor determines the fate of epithelium. Proc. Natl Acad. Sci. USA 107, 4601–4606 (2010).
Geng, Y. et al. Follistatin-like 1 (Fstl1) is a bone morphogenetic protein (BMP) 4 signaling antagonist in controlling mouse lung development. Proc. Natl Acad. Sci. USA 108, 7058–7063 (2011).
Sun, W. et al. FSTL1 promotes alveolar epithelial cell aging and worsens pulmonary fibrosis by affecting SENP1-mediated DeSUMOylation. Cell Biol. Int. 47, 1716–1727 (2023).
Cockman, E., Anderson, P. & Ivanov, P. TOP mRNPs: molecular mechanisms and principles of regulation. Biomolecules 10, 969 (2020).
Meyuhas, O. Synthesis of the translational apparatus is regulated at the translational level. Eur. J. Biochem. 267, 6321–6330 (2000).
Kozak, M. The scanning model for translation: an update. J. Cell Biol. 108, 229–241 (1989).
Kozak, M. Point mutations define a sequence flanking the AUG initiator codon that modulates translation by eukaryotic ribosomes. Cell 44, 283–292 (1986).
Tuller, T. et al. An evolutionarily conserved mechanism for controlling the efficiency of protein translation. Cell 141, 344–354 (2010).
Verma, M. et al. A short translational ramp determines the efficiency of protein synthesis. Nat. Commun. 10, 5774 (2019).
Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
Rhead, B. et al. The UCSC Genome Browser database: update 2010. Nucleic Acids Res. 38, D613–D619 (2010).
Sun, L. et al. Predicting dynamic cellular protein-RNA interactions by deep learning using in vivo RNA structures. Cell Res. 31, 495–516 (2021).
Siepel, A., Pollard, K. S. & Haussler, D. New methods for detecting lineage-specific selection. in Research in Computational Molecular Biology 190–205 (Springer, 2006).
Josephs, E. B., Lee, Y. W., Stinchcombe, J. R. & Wright, S. I. Association mapping reveals the role of purifying selection in the maintenance of genomic variation in gene expression. Proc. Natl Acad. Sci. USA 112, 15390–15395 (2015).
Park, C. Y. et al. Genome-wide landscape of RNA-binding protein target site dysregulation reveals a major impact on psychiatric disorder risk. Nat. Genet. 53, 166–173 (2021).
Landrum, M. J. et al. ClinVar: improvements to accessing data. Nucleic Acids Res. 48, D835–D844 (2020).
Turco, E. et al. Reconstitution defines the roles of p62, NBR1 and TAX1BP1 in ubiquitin condensate formation and autophagy initiation. Nat. Commun. 12, 5212 (2021).
Bjørkøy, G. et al. p62/SQSTM1 forms protein aggregates degraded by autophagy and has a protective effect on huntingtin-induced cell death. J. Cell Biol. 171, 603–614 (2005).
Rubino, E. et al. SQSTM1 mutations in frontotemporal lobar degeneration and amyotrophic lateral sclerosis. Neurology 79, 1556–1562 (2012).
Ma, S., Attarwala, I. Y. & Xie, X.-Q. SQSTM1/p62: a potential target for neurodegenerative disease. ACS Chem. Neurosci. 10, 2094–2114 (2019).
Lin, F. & Worman, H. J. Structural organization of the human gene encoding nuclear lamin A and nuclear lamin C. J. Biol. Chem. 268, 16321–16326 (1993).
Kamat, A. K., Rocchi, M., Smith, D. I. & Miller, O. J. Lamin A/C gene and a related sequence map to human chromosomes 1q12.1-q23 and 10. Somat. Cell Mol. Genet. 19, 203–208 (1993).
Tan, J. et al. Cell-type-specific prediction of 3D chromatin organization enables high-throughput in silico genetic screening. Nat. Biotechnol. 41, 1140–1150 (2023).
Yin, Q., Wu, M., Liu, Q., Lv, H. & Jiang, R. DeepHistone: a deep learning approach to predicting histone modifications. BMC Genomics 20, 193 (2019).
Li, Z. et al. Applications of deep learning in understanding gene regulation. Cell Rep. Methods 3, 100384 (2023).
Matsumoto, K., Wassarman, K. M. & Wolffe, A. P. Nuclear history of a pre-mRNA determines the translational activity of cytoplasmic mRNA. EMBO J. 17, 2107–2121 (1998).
Nott, A., Meislin, S. H. & Moore, M. J. A quantitative analysis of intron effects on mammalian gene expression. RNA 9, 607–617 (2003).
Gudikote, J. P., Imam, J. S., Garcia, R. F. & Wilkinson, M. F. RNA splicing promotes translation and RNA surveillance. Nat. Struct. Mol. Biol. 12, 801–809 (2005).
Moore, M. J. & Proudfoot, N. J. Pre-mRNA processing reaches back to transcription and ahead to translation. Cell 136, 688–700 (2009).
Shaul, O. How introns enhance gene expression. Int. J. Biochem. Cell Biol. 91, 145–155 (2017).
Pamudurti, N. R. et al. Translation of circRNAs. Mol. Cell 66, 9–21.E7 (2017).
Jacob, A. G. & Smith, C. W. J. Intron retention as a component of regulated gene expression programs. Hum. Genet. 136, 1043–1057 (2017).
Legnini, I. et al. Circ-ZNF609 Is a circular RNA that can be translated and functions in myogenesis. Mol. Cell 66, 22–37.E9 (2017).
Sinha, T., Panigrahi, C., Das, D. & Chandra Panda, A. Circular RNA translation, a path to hidden proteome. Wiley Interdiscip. Rev.: RNA 13, e1685 (2022).
Hwang, H. J. & Kim, Y. K. Molecular mechanisms of circular RNA translation. Exp. Mol. Med. 56, 1272–1280 (2024).
Gjoneska, E. et al. Conserved epigenomic signals in mice and humans reveal immune basis of Alzheimer’s disease. Nature 518, 365–369 (2015).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Ramírez, F. et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 44, W160–W165 (2016).
Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
Shrikumar, A., Greenside, P. & Kundaje, A. Learning important features through propagating activation differences. In Proc. 34th International Conference on Machine Learning 70, 3145–3153 (PMLR, 2017).
Edgar, R., Domrachev, M. & Lash, A. E. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 30, 207–210 (2002).
He, J. Example data and pretrained Translatomer model. Zenodo https://doi.org/10.5281/zenodo.13751434 (2024).
He, J. xiongxslab:Translatomer. Zenodo https://doi.org/10.5281/zenodo.13777392 (2024).
Acknowledgements
We thank X. Li for sharing the luciferase reporter plasmid for the experimental validation of the identified disease risk loci. We also thank L. Hou for discussion and suggestions, and the members of the Xiong laboratory for discussion and suggestions throughout the project. We acknowledge support from the core facilities and computing platform of Liangzhu Laboratory at Zhejiang University. This work was supported by the National Natural Science Foundation of China (nos. 32422017, 32370609 and 92353301 to X.X. and no. 82303974 to J.L.) and funding from Liangzhu Laboratory at Zhejiang University and the State Key Laboratory of Transvascular Implantation Devices to X.X.
Author information
Contributions
This study was designed by J.H., L.X. and X.X., and directed and coordinated by X.X. J.H. trained and fine-tuned the model with help from C.L., J.N., K.D., Y.M. and C.A.B., and under the supervision of L.X., M.K. and X.X. S.S., K.C. and Q.F. performed the experimental validation under the supervision of X.H. and J.L. All authors participated in the discussion of the project. J.H., L.X. and X.X. wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Machine Intelligence thanks Bin Shao and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Features and performance of Translatomer model.
a, Schematic of the architecture of the transformer layer used in this study. The full Translatomer model is shown in Fig. 1a. b, Model evaluation based on Spearman correlation (left) and mean-squared error (MSE) loss (right), comparing Translatomer with other state-of-the-art ribosome profiling prediction models, including iXnos, RiboMIMO and Riboformer, using an 11-fold cross-validation strategy in K562, epithelial cell and brain datasets. The bars represent the mean values, and the error bars represent the standard errors. Each bar summarizes 11 replicates derived from the 11-fold cross-validation. c, Comparison of the key features of Translatomer and other ribosome profiling prediction models. d, Pearson correlation coefficient (PCC) increases and converges as the number of training epochs increases. Multi-input and single-input models are in blue and red, respectively. Training accuracy is represented by a dotted line and validation accuracy by a solid line. The accuracy difference between multi-input and single-input models is calculated based on the validation accuracy. e, MSE loss decreases and converges as the number of training epochs increases. f, Table showing the performance of the different hyperparameter settings tested during model construction using the datasets from 33 tissues and cell lines.
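The three metrics named in this caption (Pearson correlation, Spearman correlation and MSE loss) can be illustrated with a minimal sketch. It uses only the Python standard library and toy numbers; the function names and the `observed` and `predicted` values are illustrative assumptions and are not taken from the Translatomer code base.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def mse(x, y):
    """Mean-squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


# Toy signal values (hypothetical, for illustration only).
observed = [0.0, 1.0, 4.0, 9.0, 16.0]
predicted = [0.5, 1.5, 3.0, 8.0, 20.0]

print("PCC:", pearson(observed, predicted))
print("Spearman:", spearman(observed, predicted))
print("MSE:", mse(observed, predicted))
```

Because Spearman correlation depends only on rank order, it is insensitive to monotonic rescaling of the predicted track, whereas MSE penalizes differences in absolute signal level.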
Extended Data Fig. 2 Translatomer accurately predicts ribosome profiling signal for new data.
a, Heatmap showing the pairwise Spearman correlation coefficients between the observed and predicted ribosome profiling signals across the four tissues or cell types evaluated. Hierarchical clustering was performed to evaluate the similarity between different datasets. b, Pearson (left) and Spearman (right) correlation coefficients between the predicted signal of a given cell type and the observed signal in that cell type (in yellow), and between the predicted signal of a given cell type and the observed signal in epithelial cells (in green) for the FSTL1 gene. c, MSE loss between the predicted signal of a given cell type and the observed signal in that cell type (in yellow), and between the predicted signal of a given cell type and the observed signal in epithelial cells (in green) for the FSTL1 gene. d, Observed and predicted ribosome profiling tracks in epithelial cells and non-epithelial cells for the ACTB gene. The Pearson correlation coefficient against the observed ribosome profiling signal in epithelial cells is labeled at the top right. e, Observed RNA-seq tracks of ACTB in epithelial and non-epithelial cells. The Pearson correlation coefficient is calculated against the RNA-seq signal in epithelial cells and is labeled at the top right. f, Evaluations of the human-data-trained model on the de novo prediction across 16 mouse datasets, with MSE loss (top), Spearman correlation coefficient (middle), and Pearson correlation coefficient (bottom) shown. The datasets were sorted based on the Pearson correlation coefficient. g, Evaluations of the mouse-data-trained model on the de novo prediction across 37 human datasets.
Extended Data Fig. 3 Validation of Translatomer based on in silico mutagenesis of Kozak sequence.
a, Example track showing the predicted Ribo-seq signal and the sequence contribution score along the RPSA mRNA. The pooled sequence contribution score was calculated by aggregating the scores in bins of 128 bp. The contribution of the 5′ TOP sequence is shown in a zoomed-in view. b, The predicted effect on translation of in silico mutagenesis from G to other nucleotides at position −3. The P-value (unadjusted) is calculated using the two-sided Wilcoxon rank-sum test. The box shows the 25th–75th percentile; the line shows the median; the whiskers show 1.5 × the interquartile range (IQR). c, The predicted effect on translation of in silico mutagenesis from T to other nucleotides at position −3. The P-value (unadjusted) is calculated using the Wilcoxon rank-sum test; no multiple-testing correction was applied. The box shows the 25th–75th percentile; the line shows the median; the whiskers show 1.5 × IQR. d, Scatter plot showing the correlation between the in silico mutagenesis effects based on the translation initiation ramp (x-axis) versus the whole coding region (y-axis). The correlation coefficient (R) and P-value (unadjusted) of the correlation analysis are shown. e, The predicted effect on translation of in silico mutagenesis from G (left) and T (right) to other nucleotides at position −3. The effect was estimated based on the whole coding region. The P-value (unadjusted) is calculated using the Wilcoxon rank-sum test. The box shows the 25th–75th percentile; the line shows the median; the whiskers show 1.5 × IQR. f, The predicted effect on translation of in silico mutagenesis from G to other nucleotides at position +4. The effect was estimated based on the whole coding region. The P-value (unadjusted) is calculated using the Wilcoxon rank-sum test. The box shows the 25th–75th percentile; the line shows the median; the whiskers show 1.5 × IQR.
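The pooling step described in panel a (aggregating per-base contribution scores in bins of 128 bp) might look like the sketch below. The caption does not say which aggregation function was used, so summation is an assumption here, and `pool_scores` is a hypothetical helper name rather than a function from the Translatomer repository.

```python
def pool_scores(scores, bin_size=128):
    """Aggregate per-base scores into consecutive, non-overlapping bins.

    Assumption: scores within a bin are summed. The final bin may cover
    fewer than `bin_size` positions if the sequence length is not a
    multiple of `bin_size`.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    return [
        sum(scores[start:start + bin_size])
        for start in range(0, len(scores), bin_size)
    ]


# Toy input: a 300-nt track in which every base has a score of 1.0.
per_base = [1.0] * 300
print(pool_scores(per_base))
```

Replacing `sum` with a mean would make bins of different lengths comparable, at the cost of losing the total contribution per bin.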
Extended Data Fig. 4 Evolutionary constraints interrogation and disease variants interpretation by Translatomer.
a, Effect size of in silico mutagenesis on translation across different ranges of PhyloP score, which represents evolutionary constraint across species. The P-value was calculated using the two-sided Wilcoxon rank-sum test. The box shows the 25th–75th percentile; the line shows the median; the whiskers show 1.5 × IQR. The number of data points in each group is indicated in the figure. b, Effect size of in silico mutagenesis on translation across different ranges of minor allele frequency, which represents evolutionary constraint within the human population. The P-value was calculated using the two-sided Wilcoxon rank-sum test. The box shows the 25th–75th percentile; the line shows the median; the whiskers show 1.5 × IQR. The number of data points in each group is indicated in the figure. c, Procedure for the identification of translation-dependent ClinVar variants based on in silico mutagenesis. d, Number of translation-dependent ClinVar variants identified by Translatomer in brain-related disorders. e, Number of translation-dependent ClinVar variants identified by Translatomer in heart-related disorders. f, Correlation between the predicted mutagenesis effect and the gene length (top), and between the predicted mutagenesis effect and the translation level of the gene evaluated (bottom). Fitted lines and P-values were calculated based on linear regression, with correlations and P-values (unadjusted) labeled. g, Distribution of the absolute in silico mutagenesis effect on translation across the gnomAD variants. A threshold of 0.24, which corresponds to the top 5% of effects, was selected to define the candidate variants that influence translation efficiency. h, Number of ClinVar variants that are translation-dependent (red) and translation-independent (blue). The percentage of translation-dependent variants for each disease is labeled. i, The translation-dependent ClinVar variants showing eQTL significance are not lead eQTL SNPs in the corresponding loci. j, The number of translation-mediated variants identified for each disease curated by the ClinVar database. The sharing of the cell type/tissue contexts is shown at the bottom. k, Example tracks showing the effects of chr1:156,134,495:G>T on the translation of the LMNA gene in the contexts of heart, brain, neuron and macrophage.
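Panel g describes choosing a cutoff at the top 5% of absolute mutagenesis effects. A minimal sketch of that kind of thresholding is given below, assuming a simple rank-based cutoff on a toy list of effects. The helper name `top_fraction_threshold` and the toy values are hypothetical, and the authors' exact procedure may differ.

```python
def top_fraction_threshold(effects, fraction=0.05):
    """Return the smallest absolute effect that still lies in the top `fraction`.

    Assumption: the cutoff is the k-th largest absolute effect, where
    k = floor(n * fraction), with at least one value always retained.
    """
    if not effects:
        raise ValueError("effects must not be empty")
    abs_sorted = sorted((abs(e) for e in effects), reverse=True)
    k = max(1, int(len(abs_sorted) * fraction))
    return abs_sorted[k - 1]


# Toy effects (hypothetical): alternating signs, magnitudes 0.01 to 1.00.
effects = [(-1) ** i * i / 100 for i in range(1, 101)]

threshold = top_fraction_threshold(effects)
candidates = [e for e in effects if abs(e) >= threshold]

print("threshold:", threshold)
print("number of candidate variants:", len(candidates))
```

Taking the absolute value before ranking treats variants predicted to increase translation and those predicted to decrease it symmetrically, which matches the caption's use of the absolute effect.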
Supplementary information
Supplementary Tables 1–6
Supplementary Tables.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
He, J., Xiong, L., Shi, S. et al. Deep learning prediction of ribosome profiling with Translatomer reveals translational regulation and interprets disease variants. Nat Mach Intell 6, 1314–1329 (2024). https://doi.org/10.1038/s42256-024-00915-6
DOI: https://doi.org/10.1038/s42256-024-00915-6