Abstract
Measuring protein abundance at the single-cell level can facilitate a high-resolution understanding of biological mechanisms in cellular processes and disease progression. However, current single-cell proteomic technologies face challenges such as limited coverage, constrained throughput and sensitivity, batch effects, high costs and stringent experimental operations. Inspired by the translation procedure in both natural language processing and the genetic central dogma, we propose a pre-trained, large generative model named single-cell translator (scTranslator). scTranslator can generate multi-omics data by inferring the missing single-cell proteome based on the transcriptome. Through systematic benchmarking and validation on independent datasets, we have confirmed the accuracy, stability and flexibility of scTranslator across various profiling techniques (for example, CITE-seq, spatial CITE-seq, REAP-seq, NEAT-seq), cell types (for example, monocytes, macrophages, T cells, B cells), tissues (for example, blood, lung, brain) and a wide range of disease contexts, including infectious, metabolic and oncologic conditions. Furthermore, scTranslator shows its superiority in assisting various downstream analyses and applications, including gene/protein interaction inference, perturbation prediction, cell clustering, batch correction and cell origin recognition in pan-cancer data.
Data availability
All datasets used in this study are publicly available from established community or institutional repositories. TCGA data were accessed from the National Cancer Institute (https://www.cancer.gov). The CPTAC and MSKCC datasets were accessed through cBioPortal for Cancer Genomics (https://www.cbioportal.org). The Broad Institute Cancer Cell Line Encyclopedia dataset was obtained from the Broad Institute (https://sites.broadinstitute.org/ccle). All single-cell datasets used for second-stage pretraining and the CITE-seq DCs dataset were accessed via the Single-cell Proteomic DataBase portal (https://scproteomicsdb.com). The Perturb-CITE-seq dataset was downloaded from the Broad Institute Single Cell Portal (https://singlecell.broadinstitute.org/single_cell/study/SCP1064). Other datasets were accessed from the NCBI Gene Expression Omnibus: Seurat CITE-seq and ECCITE-seq PBMCs datasets (GSE164378), REAP-seq PBMCs dataset (GSE100501), CITE-seq CBMCs dataset (GSE100866), CITE-seq MCs dataset (GSE163120), CITE-seq mouse dataset (GSE150599), spatial CITE-seq dataset (GSE213264), NEAT-seq dataset (GSE178707) and Single-cell pan-cancer dataset (GSE154763). Source data are provided with this paper.
Code availability
The source code is available under the Apache License 2.0 via GitHub at https://github.com/TencentAILabHealthcare/scTranslator. For reproducibility, the scripts for all experiments and result analyses are also included in this repository.
Acknowledgements
We thank Z. Zheng and D. Tang for their valuable suggestions and discussion during the preparation of this manuscript. This research was substantially sponsored by the research projects (grant numbers 32170654 and 32000464 (K.-C.W.)) supported by the National Natural Science Foundation of China and was substantially supported by the Shenzhen Research Institute, City University of Hong Kong. The work described in this paper was substantially supported by the grant from the Research Grants Council of the Hong Kong Special Administrative Region (CityU 11203723 (K.-C.W.)). This project was substantially funded by the Strategic Interdisciplinary Research Grant of City University of Hong Kong (project number 2021SIRG036 (K.-C.W.)). The work described in this paper was partially supported by the grant from City University of Hong Kong (CityU 9667265 (K.-C.W.)) and Key-Area Research and Development Program of Guangdong Province (2021B0101420005 (K.-C.W.)). F.Y. was supported by the Young Elite Scientists Sponsorship Program by CAST (2023QNRC001).
Author information
Contributions
L.L., F.Y. and J.Y. conceived the project. L.L. designed the algorithm framework and developed the method. L.L. performed research and conducted experiments under the supervision of K.-C.W., F.Y. and J.Y. L.L. and F.Y. analysed the results and wrote the manuscript. K.-C.W. and J.Y. revised the manuscript. W.L. provided suggestions for downstream analysis, assisted with figure polishing and contributed to manuscript improvement. F.W. and Y.L. contributed to data collection and manuscript improvement. L.-K.H. provided suggestions and a plan for incremental learning. All authors reviewed and approved the manuscript.
Ethics declarations
Competing interests
Authors L.L., W.L., F.W., F.Y. and J.Y. are inventors on patent applications related to this work filed by Tencent Technology (Shenzhen) Company Ltd. F.W., L.-K.H., F.Y. and J.Y. are employees of Tencent AI for Life Science Lab. The other authors declare no competing interests.
Peer review
Peer review information
Nature Biomedical Engineering thanks Xiaohui Fan, Nguyen Quoc Khanh Le and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Extended data
Extended Data Fig. 1 Joint plots for evaluating prediction performance at individual level, related to Fig. 2b.
Joint plots of distribution histogram and scatter plot for CD1 (a), CD49 (b), CD158 (c), and CD307 (d) protein family members.
Extended Data Fig. 2 Pre-training data size observation and ablation study results, related to Fig. 2.
a, Influence of pre-training dataset sizes. All plots are reported based on 10 independent runs with different random seeds. Lines show the mean performance, and shaded error bands represent the 95% confidence interval (CI) across replicates. b, Ablation study of re-indexed GPE module. Each box plot represents results from 10 technical replicates with varying random seeds. Box plots show medians (center lines), interquartile range (boxes), whiskers extending to 1.5 × IQR, and outliers as individual points.
Extended Data Fig. 3 Influence of sample amount on scTranslator in fine-tuning stage.
Each data point represents the mean performance computed across 10 independent runs using different random seeds. Shaded error bands denote the 95% confidence interval (CI) around the mean. a, Experiment results of sample amount effect in aligned mode. b, Experiment results of sample amount effect in un-aligned mode.
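The captions above report a mean and a 95% confidence interval across 10 seeded runs but do not state how the interval was computed. The following is a minimal sketch assuming a t-based interval for the mean; the function name `mean_ci95`, the critical value constant and the example scores are hypothetical and are not taken from the paper or its repository.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t with 9 degrees of freedom.
# Assumption: a t-based interval over 10 runs; the captions do not name the method.
T_CRIT_DF9 = 2.262


def mean_ci95(values, t_crit=T_CRIT_DF9):
    """Return (mean, lower, upper) for a t-based 95% CI of the mean.

    The default t_crit is only appropriate for exactly 10 values;
    pass a different critical value for other sample sizes.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two runs to estimate a CI")
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    half_width = t_crit * sem
    return mean, mean - half_width, mean + half_width


# Hypothetical metric values from 10 runs with different random seeds.
runs = [0.81, 0.79, 0.83, 0.80, 0.82, 0.78, 0.84, 0.80, 0.81, 0.82]
mean, lower, upper = mean_ci95(runs)
print(f"mean={mean:.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
```

A bootstrap interval, which some plotting libraries use by default, would be an equally plausible reading of the captions and can give slightly different bands.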
Extended Data Fig. 4 Benchmarking results of aligned mode evaluated by cosine similarity, PCC, MSE, and MAE of SOTA methods on the test sets.
All metrics are computed over 10 independent runs using different random seeds. Box plots show medians, interquartile ranges, and whiskers extending to 1.5 × IQR. All replicate values are visualized as jittered dots to reflect the distribution across runs.
Extended Data Fig. 5 Benchmarking results of un-aligned mode evaluated by cosine similarity, PCC, MSE, and MAE of SOTA methods on the test sets.
All metrics are computed over 10 independent runs using different random seeds. Box plots show medians, interquartile ranges, and whiskers extending to 1.5 × IQR. All replicate values are visualized as jittered dots to reflect the distribution across runs.
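The four metrics named in the two benchmarking captions above have standard definitions, sketched below for a single pair of predicted and observed protein vectors. This is an illustration only: the function names and example values are hypothetical, and the captions do not say whether the authors compute the metrics per cell, per protein or over the flattened matrix.

```python
import math


def cosine_similarity(pred, obs):
    """Cosine of the angle between the two vectors."""
    dot = sum(p * o for p, o in zip(pred, obs))
    norm_p = math.sqrt(sum(p * p for p in pred))
    norm_o = math.sqrt(sum(o * o for o in obs))
    return dot / (norm_p * norm_o)


def pearson_cc(pred, obs):
    """Pearson correlation coefficient (PCC)."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_o = sum(obs) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov / math.sqrt(var_p * var_o)


def mse(pred, obs):
    """Mean squared error."""
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(pred)


def mae(pred, obs):
    """Mean absolute error."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(pred)


# Hypothetical observed and predicted abundances for four proteins in one cell.
observed = [0.0, 1.0, 2.0, 3.0]
predicted = [0.5, 1.0, 2.5, 3.0]
print(cosine_similarity(predicted, observed), pearson_cc(predicted, observed))
print(mse(predicted, observed), mae(predicted, observed))
```

Cosine similarity and PCC are higher when prediction and observation agree, whereas MSE and MAE are lower, so the two pairs of metrics read in opposite directions in the box plots.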
Extended Data Fig. 6 Gene regulatory and protein interaction investigation.
a, Heatmaps of attention score matrices, where the left and right panels correspond to the encoder’s (gene regulatory) and decoder’s (protein interaction) attention scores, respectively. b, Zoomed-in attention matrix, related to Fig. 5a. c, Bar plots show the most highly focused gene for proteins of TCR α/β, TCR γ/δ, CD45, and CLEC2. d, The overlap among TCR α/β, TCR γ/δ, and CLEC2. e, The overlap among TCR α/β, TCR γ/δ, and CD45.
Extended Data Fig. 7 Gene knockout results of IFN-γ response pathway.
Violin plots show the distribution of predicted protein levels for downregulated (left panels) and upregulated (right panels) proteins following gene knockout of (a) JAK1 (n = 360 cells), (b) JAK2 (n = 401 cells), (c) IFNGR1 (n = 287 cells), and (d) IFNGR2 (n = 435 cells), compared to the control group (n = 12,627 cells). Statistical significance was assessed using a two-sided Mann–Whitney U test (no correction for multiple comparisons). Significance levels are denoted as: P < 0.05 (*), P < 0.01 (**), P < 0.001 (***). Outliers were excluded using z-score filtering (|z| < 3). Each subplot displays the density distribution of protein levels for both perturbed and control cells, with embedded box plots showing the median, interquartile range, and whiskers extending to 1.5 × IQR.
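The statistical procedure described in this caption (z-score filtering, a two-sided Mann–Whitney U test and star annotation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the test uses the normal approximation with tie and continuity corrections, the filter uses the population standard deviation and is applied to each group separately, and all function names and example values are hypothetical.

```python
import math
import statistics


def zscore_filter(values, threshold=3.0):
    """Keep values with |z| < threshold (population standard deviation assumed)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return list(values)
    return [v for v in values if abs((v - mean) / sd) < threshold]


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test; returns (U statistic of x, p-value).

    Uses the normal approximation with tie-corrected variance and a
    continuity correction, which is reasonable for large samples.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j for this tie group
        size = j - i
        tie_term += size ** 3 - size
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mu = n1 * n2 / 2.0
    var = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return u_x, 1.0
    z = max((abs(u_x - mu) - 0.5) / math.sqrt(var), 0.0)
    return u_x, min(math.erfc(z / math.sqrt(2.0)), 1.0)


def significance_stars(p):
    """Map a p-value to the star notation in the caption."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""  # assumption: the caption defines no label above 0.05


# Hypothetical predicted protein levels for perturbed and control cells.
knockout = [0.4, 0.6, 0.5, 0.7, 0.5, 0.6]
control = [1.0, 1.2, 0.9, 1.1, 1.3, 1.0, 1.2, 1.1]
u_stat, p_value = mann_whitney_u(zscore_filter(knockout), zscore_filter(control))
print(u_stat, p_value, significance_stars(p_value))
```

Because the caption states that no multiple-comparison correction was applied, each p-value here is interpreted on its own, as in the sketch.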
Extended Data Fig. 8 Visualization of markers.
UMAP plots of predicted protein abundance colored by the expression level of potential markers for monocytes (a), NK cells (b), DCs (c), and B cells (d).
Extended Data Fig. 9 Visualization to observe batch effect.
UMAP plots of protein abundance on Dataset 1, colored by cell type in the top panel and by batch in the bottom panel. The left plots show protein abundance predicted by scTranslator and the right plots show observed protein abundance.
Extended Data Fig. 10
Illustration of gene position embedding and expression value embedding.
Supplementary information
Supplementary Tables
Supplementary Tables 1–7.
Source data
Source Data Figs. 2, 4, 5 and 6 and Extended Data Figs. 2 and 3
Statistical source data.
About this article
Cite this article
Liu, L., Li, W., Wang, F. et al. A pre-trained large generative model for translating single-cell transcriptomes to proteomes. Nat. Biomed. Eng (2025). https://doi.org/10.1038/s41551-025-01528-z
DOI: https://doi.org/10.1038/s41551-025-01528-z