Nature Communications
scLong: a billion-parameter foundation model for capturing long-range gene context in single-cell transcriptomics
  • Article
  • Open access
  • Published: 05 February 2026

  • Ding Bai (affiliation 1; equal contribution; ORCID: orcid.org/0009-0004-4624-4155)
  • Shentong Mo (affiliation 1; equal contribution)
  • Ruiyi Zhang (affiliation 2; equal contribution)
  • Yingtao Luo (affiliation 3; ORCID: orcid.org/0000-0003-1794-3657)
  • Jiahao Gao (affiliation 4)
  • Jeremy Parker Yang (affiliation 5)
  • Qiuyang Wu (affiliation 6)
  • Hamidreza Rahmani (affiliation 7; ORCID: orcid.org/0000-0003-0222-6989)
  • Tiffany Amariuta (affiliations 4, 8)
  • Danielle Grotjahn (affiliation 7; ORCID: orcid.org/0000-0001-5908-7882)
  • Sheng Zhong (affiliation 6)
  • Nathan Lewis (affiliations 6, 9; ORCID: orcid.org/0000-0001-7700-3654)
  • Wei Wang (affiliations 5, 10; ORCID: orcid.org/0000-0003-4377-5060)
  • Trey Ideker (affiliations 4, 6; ORCID: orcid.org/0000-0002-1708-8454)
  • Pengtao Xie (affiliations 1, 2, 4, 6, 8, 11; ORCID: orcid.org/0000-0003-0521-174X)
  • Eric Xing (affiliations 1, 3; ORCID: orcid.org/0000-0002-3683-4280)

Nature Communications, Article number:  (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Gene expression
  • Gene ontology
  • Gene regulatory networks
  • Machine learning
  • RNA sequencing

Abstract

Single-cell RNA sequencing (scRNA-seq) has revolutionized the study of cellular heterogeneity by measuring gene expression at single-cell resolution, yielding insights into rare cell populations, cell-cell interactions, and gene regulation. Foundation models pretrained on large-scale scRNA-seq datasets have shown great promise for analyzing such data, but existing approaches often model only a small subset of highly expressed genes and do not integrate external gene-specific knowledge. To address these limitations, we present scLong, a billion-parameter foundation model pretrained on 48 million cells. scLong performs self-attention across the entire set of 28,000 genes in the human genome, which enables it to capture long-range dependencies among all genes, including lowly expressed genes and genes with zero measured expression. Such genes often play critical roles in cellular processes but are typically excluded by existing foundation models. Additionally, scLong integrates gene knowledge from the Gene Ontology through a graph convolutional network, enriching its contextual understanding of gene functions and relationships. In extensive evaluations, scLong surpasses both state-of-the-art scRNA-seq foundation models and task-specific models across diverse tasks, including predicting transcriptional responses to genetic and chemical perturbations, forecasting cancer drug responses, and inferring gene regulatory networks.
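The abstract names two architectural ingredients: gene embeddings informed by the Gene Ontology through a graph convolutional network, and self-attention over every gene, zero-expressed genes included. The toy sketch below illustrates both ideas under stated assumptions; it is not scLong's implementation. All of the following are hypothetical choices made for illustration: the GO-derived gene graph is supplied as an edge list (for example, linking genes that share a GO term), the GCN uses the symmetric normalization of Kipf and Welling, expression values are injected by adding a scalar-scaled projection to each gene embedding, and there is a single GCN layer and a single attention head with random weights. The helper `attention_matrix_gib` estimates the memory a naive dense attention matrix would need; the abstract does not state how scLong handles this cost.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalized_adjacency(edges, n):
    """Symmetric GCN normalization D^-1/2 (A + I) D^-1/2 over an undirected gene graph."""
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # self-loops
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1.0
    deg = [sum(row) for row in adj]
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(a_hat, h, w):
    """One graph-convolution layer: ReLU(A_hat @ H @ W)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, h), w)]


def softmax(row):
    """Numerically stable softmax of one row of attention scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product attention in which every gene attends to every gene."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(q[0]))
    scores = [[sum(a * b for a, b in zip(qr, kr)) / scale for kr in k] for qr in q]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights


def random_matrix(rows, cols, rng):
    """Hypothetical random weights standing in for learned parameters."""
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


def encode_cell(expression, go_edges, dim=4, seed=0):
    """Toy encoder: GO-graph gene embeddings plus expression, then full self-attention."""
    rng = random.Random(seed)
    n = len(expression)
    a_hat = normalized_adjacency(go_edges, n)
    gene_ids = random_matrix(n, dim, rng)  # per-gene identity embeddings
    gene_emb = gcn_layer(a_hat, gene_ids, random_matrix(dim, dim, rng))
    w_expr = [rng.gauss(0.0, 0.5) for _ in range(dim)]  # projects a scalar expression value
    # Zero-expressed genes are kept as tokens; they simply contribute no expression term.
    tokens = [[g + e * w for g, w in zip(gene_emb[i], w_expr)] for i, e in enumerate(expression)]
    wq, wk, wv = (random_matrix(dim, dim, rng) for _ in range(3))
    return self_attention(tokens, wq, wk, wv)


def attention_matrix_gib(n_genes, bytes_per_entry=4):
    """Memory of one dense n x n attention matrix (one head, one cell) in GiB."""
    return n_genes * n_genes * bytes_per_entry / 2**30


if __name__ == "__main__":
    expression = [2.3, 0.0, 0.7, 0.0, 1.1]  # toy cell with five genes, two unexpressed
    out, weights = encode_cell(expression, go_edges=[(0, 1), (1, 2), (3, 4)])
    print(len(out), len(weights[0]))  # 5 5
    print(round(attention_matrix_gib(28_000), 2))  # 2.92
```

With 28,000 genes, a single dense float32 attention matrix for one head and one cell already needs about 2.92 GiB (28,000² × 4 bytes), which helps explain why attending over every gene is costly and why many models restrict attention to a subset of genes or use approximate attention.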

Data availability

The pretraining datasets were collected from public datasets hosted on CELLxGENE (https://cellxgene.cziscience.com/datasets), Cell Blast (https://cblast.gao-lab.org/), and the Human Cell Atlas (https://www.humancellatlas.org/). The datasets used for downstream tasks are accessible from the following links: genetic perturbation dataset (https://github.com/snap-stanford/GEARS); chemical perturbation dataset (https://github.com/njpipeorgan/L1000-bayesian); single drug and drug combination response datasets (https://github.com/kimmo1019/DeepCDR and https://github.com/Sinwang404/DeepDDS); GRN inference datasets (https://github.com/HantaoShu/DeepSEM); zero-shot batch integration dataset (https://figshare.com/articles/dataset/Benchmarking_atlas-level_data_integration_in_single-cell_genomics_-_integration_task_datasets_Immune_and_pancreas_/12420968); and marker gene clustering dataset (https://zenodo.org/records/3357167). The datasets curated and utilized in this study, trained model parameters, and other files necessary to reproduce the experimental results, figures, and tables can be accessed at https://mbzuaiac-my.sharepoint.com/:f:/g/personal/ding_bai_mbzuai_ac_ae/EpvKzQW4hI5Bnb88-iM7vE0B_e2_U5r_ZGXb_FILCLTw3Q and https://figshare.com/account/articles/30105148. Source data are provided with this paper.

Code availability

The source code for this work is available at https://github.com/BaiDing1234/scLong and is archived at https://zenodo.org/records/17510567 (ref. 89).

References

  1. The Tabula Muris Consortium. A single-cell transcriptomic atlas characterizes ageing tissues in the mouse. Nature 583, 590–595 (2020).

  2. Domcke, S. et al. A human cell atlas of fetal chromatin accessibility. Science 370, eaba7612 (2020).

  3. Han, X. et al. Construction of a human cell landscape at single-cell level. Nature 581, 303–309 (2020).

  4. Grün, D. et al. Single-cell messenger RNA sequencing reveals rare intestinal cell types. Nature 525, 251–255 (2015).

  5. Jin, S. et al. Inference and analysis of cell-cell communication using CellChat. Nat. Commun. 12, 1088 (2021).

  6. Gasperini, M. et al. A genome-wide framework for mapping gene regulation via cellular genetic screens. Cell 176, 377–390.e19 (2019).

  7. Yang, F. et al. scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data. Nat. Mach. Intell. 4, 852–866 (2022).

  8. Theodoris, C. V. et al. Transfer learning enables predictions in network biology. Nature 618, 616–624 (2023).

  9. Cui, H. et al. scGPT: towards building a foundation model for single-cell multi-omics using generative AI. Nat. Methods 21, 1470–1480 (2024).

  10. Hao, M. et al. Large scale foundation model on single-cell transcriptomics. Nat. Methods 21, 1481–1491 (2024).

  11. Yang, X. et al. GeneCompass: deciphering universal gene regulatory mechanisms with knowledge-informed cross-species foundation model. Cell Res. https://doi.org/10.1038/s41422-024-01034-y (2024).

  12. Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. In Proc. 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (eds Burstein, J., Doran, C. & Solorio, T.) 4171–4186. https://aclanthology.org/N19-1423 (Association for Computational Linguistics, Minneapolis, Minnesota, 2019).

  13. Vaswani, A. et al. Attention is all you need. In Advances in Neural Information Processing Systems Vol. 30 (eds Guyon, I. et al.). https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf (Curran Associates, Inc., 2017).

  14. Roohani, Y., Huang, K. & Leskovec, J. Predicting transcriptional outcomes of novel multigene perturbations with GEARS. Nat. Biotechnol. 42, 927–935 (2024).

  15. Shu, H. et al. Modeling gene regulatory networks using neural network architectures. Nat. Comput. Sci. 1, 491–501 (2021).

  16. Rosen, Y. et al. Universal cell embeddings: a foundation model for cell biology. bioRxiv https://doi.org/10.1101/2023.11.28.568918 (2023).

  17. Sha, Y., Phan, J. H. & Wang, M. D. Effect of low-expression gene filtering on detection of differentially expressed genes in RNA-seq data. In Proc. 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) 6461–6464 https://api.semanticscholar.org/CorpusID:11532112 (2015).

  18. Zhao, H. et al. Lowly-expressed lncRNA GAS5 facilitates progression of ovarian cancer through targeting miR-196-5p and thereby regulating HOXA5. Gynecol. Oncol. 151, 345–355 (2018).

  19. Yang, L., Takuno, S., Waters, E. R. & Gaut, B. S. Lowly expressed genes in Arabidopsis thaliana bear the signature of possible pseudogenization by promoter degradation. Mol. Biol. Evol. 28, 1193–1203 (2011).

  20. Zhou, Z. et al. Codon usage is an important determinant of gene expression levels largely through its effects on transcription. Proc. Natl. Acad. Sci. 113, E6117–E6125 (2016).

  21. Huang, M. et al. SAVER: gene expression recovery for single-cell RNA sequencing. Nat. Methods 15, 539–542 (2018).

  22. Ashburner, M. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25–29 (2000).

  23. Hamilton, W., Ying, Z. & Leskovec, J. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems Vol. 30 (eds Guyon, I. et al.). https://proceedings.neurips.cc/paper_files/paper/2017/file/5dd9db5e033da9c6fb5ba83c7a7ebea9-Paper.pdf (Curran Associates, Inc., 2017).

  24. Wu, F. et al. Simplifying graph convolutional networks. In Proc. 36th International Conference on Machine Learning 6861–6871 (PMLR, 2019).

  25. Choromanski, K. M. et al. Rethinking attention with performers. In International Conference on Learning Representations https://openreview.net/forum?id=Ua6zuk0WRH (2021).

  26. Dixit, A. et al. Perturb-seq: dissecting molecular circuits with scalable single-cell RNA profiling of pooled genetic screens. Cell 167, 1853–1866 (2016).

  27. Norman, T. M. et al. Exploring genetic interaction manifolds constructed from rich single-cell phenotypes. Science 365, 786–793 (2019).

  28. Ahlmann-Eltze, C., Huber, W. & Anders, S. Deep learning-based predictions of gene perturbation effects do not yet outperform simple linear methods. Nat. Methods 22, 1657–1661 (2025).

  29. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B (Methodol.) 57, 289–300 (1995).

  30. Al-Lazikani, B., Banerji, U. & Workman, P. Combinatorial drug therapy for cancer in the post-genomic era. Nat. Biotechnol. 30, 679–692 (2012).

  31. Subramanian, A. et al. A next generation connectivity map: L1000 platform and the first 1,000,000 profiles. Cell 171, 1437–1452.e17 (2017).

  32. Pham, T.-H., Wang, Y., Xu, J. & Zhang, P. A deep learning framework for high-throughput mechanism-driven phenotype compound screening and its application to COVID-19 drug repurposing. Nat. Mach. Intell. 3, 247–257 (2021).

  33. Kuenzi, B. M. et al. Predicting drug response and synergy using a deep learning model of human cancer cells. Cancer Cell 38, 672–684.e6 (2020).

  34. Unger, F. T., Witte, I. & David, K. A. Prediction of individual response to anticancer therapy: historical and future perspectives. Cell. Mol. Life Sci. 72, 729–757 (2015).

  35. Kipf, T. N. & Welling, M. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR). https://openreview.net/forum?id=SJU4ayYgl (2017).

  36. Liu, Q., Hu, Z., Jiang, R. & Zhou, M. DeepCDR: a hybrid graph convolutional network for predicting cancer drug response. Bioinformatics 36, i911–i918 (2020).

  37. Pedregosa, F. et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).

  38. Zhao, S. et al. Systems pharmacology of adverse event mitigation by drug combinations. Sci. Transl. Med. 5, 206ra140 (2013).

  39. O’Neil, J. et al. An unbiased oncology compound screen to identify novel combination strategies. Mol. Cancer Ther. 15, 1155–1162 (2016).

  40. Menden, M. P. et al. Community assessment to advance computational prediction of cancer drug combinations in a pharmacogenomic screen. Nat. Commun. 10, 1–17 (2019).

  41. Wang, J., Liu, X., Shen, S., Deng, L. & Liu, H. DeepDDS: deep graph neural network with attention mechanism to predict synergistic drug combinations. Brief. Bioinform. 23, bbab390 (2021).

  42. Berthelot, C., Villar, D., Horvath, J. E., Odom, D. T. & Flicek, P. Complexity and conservation of regulatory landscapes underlie evolutionary resilience of mammalian gene expression. Nat. Ecol. Evol. 2, 152–163 (2018).

  43. Thompson, D., Regev, A. & Roy, S. Comparative analysis of gene regulatory networks: from network reconstruction to evolution. Annu. Rev. Cell Dev. Biol. 31, 399–428 (2015).

  44. Pratapa, A., Jalihal, A. P., Law, J. N., Bharadwaj, A. & Murali, T. M. Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data. Nat. Methods 17, 147–154 (2020).

  45. Chu, L.-F. et al. Single-cell RNA-seq reveals novel regulators of human embryonic stem cell differentiation to definitive endoderm. Genome Biol. 17, 1–20 (2016).

  46. Higgins, I. et al. beta-VAE: learning basic visual concepts with a constrained variational framework. In International Conference on Learning Representations. https://openreview.net/forum?id=Sy2fzU9gl (2017).

  47. Oki, S. et al. ChIP-Atlas: a data-mining suite powered by full integration of public ChIP-seq data. EMBO Rep. 19, e46255 (2018).

  48. Huynh-Thu, V. A., Irrthum, A., Wehenkel, L. & Geurts, P. Inferring regulatory networks from expression data using tree-based methods. PLoS ONE 5, e12776 (2010).

  49. Luecken, M. D. et al. Benchmarking atlas-level data integration in single-cell genomics. Nat. Methods 19, 41–50 (2022).

  50. Hie, B. et al. Computational methods for single-cell RNA sequencing. Annu. Rev. Biomed. Data Sci. 3, 339–364 (2020).

  51. Argelaguet, R., Cuomo, A. S., Stegle, O. & Marioni, J. C. Computational principles and challenges in single-cell data integration. Nat. Biotechnol. 39, 1202–1215 (2021).

  52. Heumos, L. et al. Best practices for single-cell analysis across modalities. Nat. Rev. Genet. 24, 550–572 (2023).

  53. Kedzierska, K. Z., Crawford, L., Amini, A. P. & Lu, A. X. Zero-shot evaluation reveals limitations of single-cell foundation models. Genome Biol. 26, 101 (2025).

  54. Lopez, R., Regier, J., Cole, M. B., Jordan, M. I. & Yosef, N. Deep generative modeling for single-cell transcriptomics. Nat. Methods 15, 1053–1058 (2018).

  55. Zheng, G. X. et al. Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 8, 14049 (2017).

  56. Bai, D., Ellington, C. N., Mo, S., Song, L. & Xing, E. P. AttentionPert: accurately modeling multiplexed genetic perturbations with multi-scale effects. Bioinformatics 40, i453–i461 (2024).

  57. Geeleher, P., Cox, N. J. & Huang, R. S. Clinical drug response can be predicted using baseline gene expression levels and in vitro drug sensitivity in cell lines. Genome Biol. 15, 1–12 (2014).

  58. Fabregat, A. et al. The Reactome pathway knowledgebase. Nucleic Acids Res. 46, D649–D655 (2018).

  59. Szklarczyk, D. et al. The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res. 49, D605–D612 (2020).

  60. Buenrostro, J. D., Wu, B., Litzenburger, U., Greenleaf, W. J. & Chang, H. Y. Single-cell chromatin accessibility reveals principles of regulatory variation. Nature 523, 486–490 (2015).

  61. Selvaraju, R. R. et al. Grad-CAM: visual explanations from deep networks via gradient-based localization. Int. J. Comput. Vis. 128, 336–359 (2020).

  62. Han, S., Pool, J., Tran, J. & Dally, W. J. Learning both weights and connections for efficient neural network. In Neural Information Processing Systems. https://api.semanticscholar.org/CorpusID:2238772 (2015).

  63. Hinton, G., Vinyals, O. & Dean, J. Distilling the knowledge in a neural network. In NIPS Deep Learning and Representation Learning Workshop. http://arxiv.org/abs/1503.02531 (2015).

  64. Domínguez Conde, C. et al. Cross-tissue immune cell analysis reveals tissue-specific features in humans. Science 376, eabl5197 (2022).

  65. Yazar, S. et al. Single-cell eQTL mapping identifies cell type–specific genetic control of autoimmune disease. Science 376, eabf3041 (2022).

  66. The Tabula Sapiens Consortium. The Tabula Sapiens: a multiple-organ, single-cell transcriptomic atlas of humans. Science 376, eabl4896 (2022).

  67. Sikkema, L. et al. An integrated cell atlas of the lung in health and disease. Nat. Med. 29, 1563–1577 (2023).

  68. Perez, R. K. et al. Single-cell RNA-seq reveals cell type–specific molecular and genetic associations to lupus. Science 376, eabf1970 (2022).

  69. Cao, Z.-J., Wei, L., Lu, S., Yang, D.-C. & Gao, G. Searching large-scale scRNA-seq databases via unbiased cell embedding with Cell BLAST. Nat. Commun. 11, 3458 (2020).

  70. Lindeboom, R. G., Regev, A. & Teichmann, S. A. Towards a human cell atlas: taking notes from the past. Trends Genet. 37, 625–630 (2021).

  71. Harrison, P. W. et al. Ensembl 2024. Nucleic Acids Res. 52, D891–D899 (2023).

  72. Du, J. et al. Gene2vec: distributed representation of genes based on co-expression. BMC Genom. 20, 82 (2019).

  73. Booeshaghi, A. S. & Pachter, L. Normalization of single-cell RNA-seq counts by log(x + 1) or log(1 + x). Bioinformatics 37, 2223–2224 (2021).

  74. Maas, A. L. et al. Rectifier nonlinearities improve neural network acoustic models. In Proc. ICML Vol. 30, 3 (PMLR, Atlanta, GA, 2013).

  75. Li, S. et al. PyTorch distributed: experiences on accelerating data parallel training. Proc. VLDB Endow. 13, 3005–3018 (2020).

  76. Kingma, D. P. & Ba, J. Adam: A method for stochastic optimization. In Proc. International Conference on Learning Representations (ICLR). https://arxiv.org/abs/1412.6980 (2015).

  77. Ioffe, S. & Szegedy, C. Batch normalization: accelerating deep network training by reducing internal covariate shift. In Proc. 32nd International Conference on Machine Learning (ICML-15), 448–456 (PMLR, 2015).

  78. Waskom, M. L. Seaborn: statistical data visualization. J. Open Source Softw. 6, 3021 (2021).

  79. Grover, A. & Leskovec, J. node2vec: Scalable feature learning for networks. In Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Association for Computing Machinery, 2016).

  80. Barretina, J. et al. The cancer cell line encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607 (2012).

  81. Iorio, F. et al. A landscape of pharmacogenomic interactions in cancer. Cell 166, 740–754 (2016).

  82. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014).

  83. Kingma, D. P. & Welling, M. Auto-Encoding Variational Bayes. In 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings. http://arxiv.org/abs/1312.6114v10 (2014).

  84. Hinton, G. Lecture 6e rmsprop: divide the gradient by a running average of its recent magnitude. https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf (2012).

  85. Barski, A. et al. High-resolution profiling of histone methylations in the human genome. Cell 129, 823–837 (2007).

  86. Qiu, Y., Wang, J., Lei, J. & Roeder, K. Identification of cell-type-specific marker genes from co-expression patterns in tissue samples. Bioinformatics 37, 3228–3234 (2021).

  87. Pliner, H. A., Shendure, J. & Trapnell, C. Supervised classification enables rapid annotation of cell atlases. Nat. Methods 16, 983–986 (2019).

  88. Zhang, Z. et al. SCINA: a semi-supervised subtyping algorithm of single cells and bulk samples. Genes 10, 531 (2019).

  89. Bai, D. et al. BaiDing1234/scLong: scLong v1.0. https://doi.org/10.5281/zenodo.17510567 (2025).

Acknowledgements

P.X. acknowledges funding support from NIH R35GM157217, NSF IIS2405974, and NSF IIS2339216. E.X. acknowledges funding support from NSF CNS2414087, NSF BCS2040381, NSF IIS2123952, NSF IIS1955532, NSF IIS2311990, NIH R01GM140467, NGA HM04762010002, SRC AIHW award 2024AH3210, NIGMS R01GM140467, and DARPA ECOLE HR00112390063.

Author information

Author notes
  1. These authors contributed equally: Ding Bai, Shentong Mo, Ruiyi Zhang.

Authors and Affiliations

  1. Mohamed bin Zayed University of Artificial Intelligence, Masdar City, Abu Dhabi, UAE

    Ding Bai, Shentong Mo, Pengtao Xie & Eric Xing

  2. Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA, USA

    Ruiyi Zhang & Pengtao Xie

  3. Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA

    Yingtao Luo & Eric Xing

  4. Department of Medicine, University of California San Diego, La Jolla, CA, USA

    Jiahao Gao, Tiffany Amariuta, Trey Ideker & Pengtao Xie

  5. Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, CA, USA

    Jeremy Parker Yang & Wei Wang

  6. Department of Bioengineering, University of California San Diego, La Jolla, CA, USA

    Qiuyang Wu, Sheng Zhong, Nathan Lewis, Trey Ideker & Pengtao Xie

  7. Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA

    Hamidreza Rahmani & Danielle Grotjahn

  8. Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA

    Tiffany Amariuta & Pengtao Xie

  9. Department of Pediatrics, School of Medicine, University of California San Diego, La Jolla, CA, USA

    Nathan Lewis

  10. Department of Cellular and Molecular Medicine, School of Medicine, University of California San Diego, La Jolla, CA, USA

    Wei Wang

  11. School of Biological Sciences, University of California San Diego, La Jolla, CA, USA

    Pengtao Xie


Contributions

D.B., S.M., and R.Z. contributed to conceptualization, methodology, software, investigation, analysis, writing (original draft), and writing (editing). Y.L. contributed to conceptualization, methodology, and software. J.G., J.Y., Q.W., H.R., T.A., D.G., S.Z., N.L., W.W., and T.I. contributed to investigation, analysis, and writing (editing). P.X. and E.X. contributed to conceptualization, methodology, investigation, analysis, writing (original draft), and writing (editing).

Corresponding authors

Correspondence to Pengtao Xie or Eric Xing.

Ethics declarations

Competing interests

T.I. is a cofounder, member of the advisory board and has an equity interest in Data4Cure and Serinus Biosciences. T.I. is a consultant for and has an equity interest in IDEAYA Biosciences. The terms of these arrangements for T.I. have been reviewed and approved by the University of California, San Diego, in accordance with its conflict of interest policies. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Reporting Summary

Transparent Peer Review file

Source data

Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article


Cite this article

Bai, D., Mo, S., Zhang, R. et al. scLong: a billion-parameter foundation model for capturing long-range gene context in single-cell transcriptomics. Nat Commun (2026). https://doi.org/10.1038/s41467-026-69102-y


  • Received: 24 March 2025

  • Accepted: 23 January 2026

  • Published: 05 February 2026

  • DOI: https://doi.org/10.1038/s41467-026-69102-y

Nature Communications (Nat Commun)

ISSN 2041-1723 (online)
