Abstract
Single-cell RNA sequencing (scRNA-seq) has revolutionized the study of cellular heterogeneity by providing gene expression data at single-cell resolution, uncovering insights into rare cell populations, cell-cell interactions, and gene regulation. Foundation models pretrained on large-scale scRNA-seq datasets have shown great promise in analyzing such data, but existing approaches are often limited to modeling a small subset of highly expressed genes and lack the integration of external gene-specific knowledge. To address these limitations, we present scLong, a billion-parameter foundation model pretrained on 48 million cells. scLong performs self-attention across the entire set of 28,000 genes in the human genome. This enables the model to capture long-range dependencies between all genes, including lowly expressed ones (and even unexpressed genes with zero expression values), which often play critical roles in cellular processes but are typically excluded by existing foundation models. Additionally, scLong integrates gene knowledge from the Gene Ontology using a graph convolutional network, enriching its contextual understanding of gene functions and relationships. In extensive evaluations, scLong surpasses both state-of-the-art scRNA-seq foundation models and task-specific models across diverse tasks, including predicting transcriptional responses to genetic and chemical perturbations, forecasting cancer drug responses, and inferring gene regulatory networks.
Introduction
Single-cell transcriptomics enables the study of gene expression at the individual cell level, offering insights into cellular heterogeneity that bulk methods cannot reveal1,2,3. It allows for the identification of rare cell populations4, uncovers cell-cell interactions5, and provides a detailed map of gene regulation6, making it an essential tool for advancing personalized medicine, drug discovery, and understanding cellular diversity. Foundation models have shown great promise in analyzing single-cell transcriptomics data7,8,9,10,11. Pretrained on large-scale single-cell RNA sequencing (scRNA-seq) datasets using self-supervised learning12, these models can capture complex gene expression patterns across diverse cell types. One of the key mechanisms in these models is self-attention13, which computes relationships between genes by allowing every gene to attend to every other gene. This helps the model capture important gene interactions, contextualize gene expression, and understand long-range dependencies between genes. With fine-tuning, foundation models can be adapted for various downstream tasks, such as cell type classification7, gene perturbation prediction14, and reconstruction of gene regulatory networks (GRNs)15, even in settings with limited data.
Despite the significant progress foundation models have made in analyzing single-cell transcriptomics data, they still face critical limitations that hinder their ability to fully capture the complexity of gene expression. One key limitation is that, to save computational cost, these models typically perform self-attention on a small subset of genes (e.g., 2048 in Geneformer8 and scFoundation10, 2000 in scGPT9, and 1536 in UCE16), often selected based on high expression levels8. This approach excludes many lowly expressed genes that play essential roles in cellular processes and regulatory networks17,18,19. By restricting self-attention to only a fraction of the transcriptome, current models miss important regulatory signals and fail to capture long-range gene interactions across the entire genome20,21. As a result, they provide incomplete representations of gene regulatory mechanisms, overlooking subtle but critical gene interactions that are key to understanding complex cellular functions. Another limitation is the lack of integration of external gene-specific knowledge, such as that provided by the Gene Ontology (GO)22, which encodes relationships among genes, biological processes, and molecular functions. Current models8,9,10 rely predominantly on patterns derived solely from gene expression data, which restricts their ability to capture context related to gene functions and regulatory interactions. Without leveraging such rich functional information, these models may struggle to fully understand the roles of genes, especially in cases where direct expression data offers limited insight into a gene’s activity within a broader regulatory framework.
To overcome these limitations, we present scLong, a billion-parameter scRNA-seq foundation model. First, instead of focusing on a small subset of genes, scLong performs self-attention across the entire human transcriptome, encompassing around 28,000 genes. This enables the model to capture long-range interactions and dependencies between all genes, including those with low expression levels that may still play crucial roles in cellular processes. By including every gene in the analysis, scLong offers a more comprehensive and unbiased representation of GRNs, avoiding the pitfalls of restricting attention to highly expressed genes. Second, scLong integrates external gene knowledge from the GO using a graph convolutional network (GCN)14,23 to learn gene representations. This allows the model to incorporate hierarchical and functional relationships between genes, providing deeper functional context to its predictions. By leveraging this structured information, scLong enhances its ability to capture gene functions and interactions, even when direct expression data is sparse or ambiguous. Together, these two mechanisms—self-attention across all genes and integration of GO knowledge—enable scLong to generate more accurate and functionally relevant representations, effectively addressing the limitations of current foundation models in transcriptomics data analysis. scLong has one billion parameters, making it substantially larger than scFoundation10, GeneCompass11 and UCE16, which have 100 million, 100 million and 650 million parameters, respectively.
Results
scLong overview
scLong takes a cell’s gene expression vector as input, generating a representation for each element in the vector (Fig. 1a). Each element corresponds to a specific gene, with its value indicating the level of gene transcription into RNA at a given moment, which may reflect potential protein production. scLong includes a gene encoder, an expression encoder, and a contextual encoder. The expression encoder, a multi-layer perceptron (MLP), produces a representation vector for each scalar expression value. The gene encoder leverages GO22 to extract a representation vector for each gene. For each element in the expression vector—defined by a gene ID and its expression value—we combine the gene’s representation (from the gene encoder) with its expression representation (from the expression encoder) to represent the element. These element representations are then fed into the contextual encoder, which learns contextualized representations that capture relationships among elements (Methods).
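As a rough illustration of how an element representation is formed, the toy sketch below combines a per-gene vector from a gene-encoder stand-in with an MLP encoding of the scalar expression value. The dimensions, the MLP shape, and the use of element-wise summation to combine the two vectors are all illustrative assumptions, not scLong's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 8          # representation dimension (illustrative)
n_genes = 5    # tiny toy transcriptome

# Gene-encoder stand-in: one learned vector per gene ID.
gene_reprs = rng.normal(size=(n_genes, D))

def expression_encoder(x, W1, b1, W2, b2):
    """Tiny MLP mapping a scalar expression value to a D-dim vector."""
    h = np.maximum(0.0, x * W1 + b1)   # hidden layer with ReLU
    return h @ W2 + b2

W1, b1 = rng.normal(size=16), rng.normal(size=16)
W2, b2 = rng.normal(size=(16, D)), rng.normal(size=D)

expr = np.array([0.0, 1.2, 0.3, 4.5, 0.0])  # one cell's expression vector

# Combine gene and expression representations element-wise (sum assumed here).
elements = np.stack([
    gene_reprs[g] + expression_encoder(expr[g], W1, b1, W2, b2)
    for g in range(n_genes)
])
print(elements.shape)  # (5, 8): one D-dim representation per element
```

The stack of element representations would then be passed to the contextual encoder.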
a Model architecture of scLong. scLong generates a representation for each element in a cell’s gene expression vector using three main components: a gene encoder, an expression encoder, and a contextual encoder. The expression encoder, a multi-layer perceptron (MLP), produces a representation vector for each scalar expression value, while the gene encoder utilizes Gene Ontology to derive a representation vector for each gene. These representations are combined for each element and fed into the contextual encoder, which learns context-aware representations that capture inter-element relationships. Specifically, the gene encoder constructs a gene graph from Gene Ontology and applies a graph convolutional network (GCN) to learn gene-specific representations. To capture long-range relationships between genes, the contextual encoder leverages self-attention. To optimize efficiency and representation quality, scLong employs two Performers of different sizes, with high-expression elements processed by a larger Performer for detailed interaction modeling, and low-expression elements by a smaller Performer for efficiency. The outputs from these two encoders are then passed through a final full-length Performer, generating the final scLong representations. b scLong is pretrained by reconstructing masked expression values. For each input cell, we randomly mask a subset of expression values and use scLong to learn representations for both the masked and unmasked elements. The representations of the masked elements are passed to an MLP-based decoder to predict their expression values. A reconstruction loss is calculated between the predicted and actual values, and pretraining involves minimizing this reconstruction loss. c The pretraining data for scLong includes 48 million cells and 27,874 genes (~20,000 protein-coding and 8000 non-coding genes) derived from 1,618 scRNA-seq datasets spanning over 50 tissues.
The gene encoder constructs a gene graph using the GO and applies a GCN14,23 to this graph to learn gene representations. The GO22 offers a structured vocabulary for describing gene functions, organized into three primary domains: Biological Process, which refers to the biological roles or processes in which a gene is involved, such as cell division or metabolic pathways; Molecular Function, which specifies the biochemical activities of a gene product, such as enzyme activity or binding; and Cellular Component, indicating the cellular locations where a gene product operates, such as the nucleus or mitochondria. Each gene’s functions are annotated with GO terms from this vocabulary. The gene graph is constructed based on the method in ref. 14, where each node represents a gene. For each pair of genes, u and v, the Jaccard index is calculated to measure the overlap between their sets of annotated GO terms. If the overlap is sufficiently high, an edge is added between the two genes in the graph. The gene graph captures functional relationships between genes based on shared GO annotations. Genes with overlapping GO terms are connected, reflecting similarities in biological processes, molecular functions, and cellular localization. For example, genes involved in related biological processes, such as metabolic pathways, are linked, suggesting shared roles in complex cellular functions. Genes with similar molecular functions, like enzymatic activities or binding properties, are also connected, indicating biochemical similarities or cooperative interactions. Additionally, genes localized to the same cellular components, such as the nucleus or mitochondria, are linked, suggesting potential spatial co-localization. On top of the gene graph, we construct a GCN24, which learns representations for each gene. Through a process called message passing, the GCN enables each node to aggregate information from its neighboring nodes, effectively capturing the relationships between genes.
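The graph construction described above can be sketched as follows. Gene names, GO term IDs, toy features, and the Jaccard threshold of 0.4 are all illustrative assumptions, not the values used by scLong; a single mean-aggregation step stands in for the GCN's message passing.

```python
import numpy as np

# Toy GO annotation sets per gene (hypothetical gene names and GO term IDs)
go_terms = {
    "GATA1": {"GO:0030218", "GO:0006355", "GO:0005634"},
    "TAL1":  {"GO:0030218", "GO:0006355", "GO:0005634"},
    "HBB":   {"GO:0015671", "GO:0005833"},
}

def jaccard(a, b):
    """Jaccard index: |intersection| / |union| of two GO-term sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

threshold = 0.4  # assumed cutoff; the paper's exact value is not given here
genes = sorted(go_terms)
edges = [
    (u, v)
    for i, u in enumerate(genes)
    for v in genes[i + 1:]
    if jaccard(go_terms[u], go_terms[v]) >= threshold
]
print(edges)  # [('GATA1', 'TAL1')]

# One message-passing step: each gene averages its own and its
# neighbors' feature vectors (a minimal stand-in for a GCN layer).
vecs = {g: np.eye(3)[i] for i, g in enumerate(genes)}  # toy initial features
neighbors = {g: set() for g in genes}
for u, v in edges:
    neighbors[u].add(v)
    neighbors[v].add(u)
updated = {
    g: np.mean([vecs[g]] + [vecs[n] for n in neighbors[g]], axis=0)
    for g in genes
}
```

After aggregation, GATA1 and TAL1 (which share all GO terms in this toy example) pull each other's features together, while HBB keeps its own.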
The contextual encoder employs self-attention13 to capture long-range relationships between genes in the context of the input cell. It takes the initial representations generated by the gene and expression encoders and learns a contextualized representation for each element. Self-attention calculates pairwise correlations among elements, capturing their interdependencies. To balance computational efficiency with representation quality, we use a large Performer25 encoder and a mini Performer encoder to process elements with varying expression levels. Specifically, we rank each cell’s gene expression elements in descending order, dividing them into two groups: a high-expression group, containing the top-ranked elements, and a low-expression group with the remaining ones. The high-expression group, which carries core biological information critical for modeling gene interactions and regulatory pathways, is processed by the larger Performer encoder with more layers and parameters. The low-expression group, offering less critical information, is processed by the smaller encoder, optimizing computational efficiency.
While low-expression genes (LEGs) are less prominent in terms of overall abundance, they play essential roles in a range of biological processes and cannot be disregarded. Many LEGs are involved in regulatory mechanisms that influence the behavior of high-expression genes, acting as switches or modulators in complex cellular networks17. These genes can also be crucial in rare or specialized cell types, where their subtle expression may drive specific phenotypes or responses to environmental stimuli18,19. Ignoring them could lead to incomplete models that overlook important aspects of cellular function. Moreover, LEGs often participate in context-specific pathways that become active only under certain conditions, such as stress responses, immune signaling, or disease progression18,19. These genes may also be important for rare cell populations, whose contributions to tissue function or disease states could be missed if low-expression signals are not adequately represented20,21. Thus, while high-expression genes often drive primary biological processes, LEGs provide the fine-tuned regulation and specialized functions necessary for a complete understanding of cellular behavior.
After processing by the large and mini Performer encoders, each element obtains a contextualized representation vector of uniform dimension. These vectors are then input into a full-length Performer encoder, which performs self-attention across all elements. This final encoder, configured with the same number of layers as the mini encoder, produces the final representations for each expression element.
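The routing of elements between the two Performers can be sketched as below. The split point k and the scalar stand-in encoder functions are illustrative assumptions; the real encoders are multi-layer Performer networks operating on representation vectors.

```python
import numpy as np

rng = np.random.default_rng(1)
expr = rng.gamma(1.0, 1.0, size=10)   # toy expression values for one cell

k = 4  # assumed split point between high- and low-expression groups
order = np.argsort(-expr)             # descending expression rank
high_idx, low_idx = order[:k], order[k:]

def large_encoder(x):  # stand-in for the large Performer (more layers)
    return x * 2.0

def mini_encoder(x):   # stand-in for the mini Performer (fewer layers)
    return x * 0.5

# Route each group through its encoder, then restore the original gene order
out = np.empty_like(expr)
out[high_idx] = large_encoder(expr[high_idx])
out[low_idx] = mini_encoder(expr[low_idx])

# A final full-length encoder would then attend across all 10 elements.
```

Because the two groups are written back to their original positions, the final full-length encoder sees the elements in a consistent gene order regardless of routing.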
To pretrain scLong, we compiled a large-scale scRNA-seq dataset comprising ~48 million human cells from diverse tissues and cell types (Fig. 1c), covering 27,874 human genes (Methods). Pretraining involves reconstructing masked expression values12 (Fig. 1b). For each input cell, we randomly mask a subset of values, then use scLong to learn representations for both masked and unmasked elements. The representations of masked elements are fed into a decoder to predict their expression values. A reconstruction loss is calculated between the predicted and actual values. Pretraining is performed by minimizing these reconstruction losses (Methods).
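A minimal sketch of the masked-reconstruction objective follows. The masking rate, the use of 0.0 as the mask value, and the mean-predicting stand-in model are assumptions for illustration; scLong's actual decoder is an MLP over learned representations.

```python
import numpy as np

rng = np.random.default_rng(2)
expr = rng.gamma(1.0, 1.0, size=(3, 6))  # 3 toy cells x 6 genes

mask_rate = 0.3  # assumed masking fraction
mask = rng.random(expr.shape) < mask_rate
mask[0, 0] = True  # ensure at least one masked position in this tiny example

masked_input = np.where(mask, 0.0, expr)  # masked values hidden from the model

def model_predict(x):
    """Stand-in for scLong + MLP decoder: predict every expression value."""
    return np.full_like(x, x.mean())      # trivially predicts the global mean

pred = model_predict(masked_input)
# Reconstruction loss: mean squared error on masked positions only
loss = np.mean((pred[mask] - expr[mask]) ** 2)
print(round(float(loss), 4))
```

Pretraining minimizes this loss over many cells, so the model must infer hidden expression values from the unmasked context.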
scLong predicts transcriptional outcomes of genetic perturbations
Predicting transcriptional outcomes of genetic perturbations involves forecasting how changes to specific genes, such as knockouts or overexpressions, impact the overall gene expression profile of a cell26,27,28. This capability is essential for understanding gene function and regulatory networks, as each perturbation can reveal how genes interact with each other and contribute to cellular behavior. Accurately predicting transcriptional outcomes offers a deeper understanding of pathways associated with disease, helping to pinpoint potential therapeutic targets and advance precision medicine. Additionally, in synthetic biology, understanding transcriptional responses supports the design of gene circuits and engineered cells with specific desired properties.
For this task, the input comprises a cell’s pre-perturbation gene expression vector and its corresponding perturbation conditions, while the output is the cell’s post-perturbation gene expression vector. We utilized scLong to generate representations for pre-perturbation gene expressions and employed GEARS14 to derive representations for the perturbation conditions (Fig. 2a). These representations were summed and processed by a GEARS decoder to predict the post-perturbation gene expression vector (Methods). The perturbation conditions included both single and double gene perturbations, where either one or two genes were altered simultaneously in each cell. The Norman dataset27, consisting of 91,205 cell samples, 5045 genes, and 236 unique perturbation conditions, was used for this task, providing a training set of 58,134 cells, a validation set of 6792 cells, and a test set of 26,279 cells. Each test sample was categorized into one of four scenarios: (1) neither gene in a double-gene perturbation conducted on the test sample is present in the training data (Seen 0/2); (2) one gene in a double-gene perturbation is absent from the training data (Seen 1/2); (3) both genes in a double-gene perturbation are present in the training data (Seen 2/2); and (4) the gene in a single-gene perturbation is absent from the training data (Seen 0/1). This categorization helps assess the model’s ability to generalize to unseen perturbations. Following GEARS, prediction performance was evaluated using two metrics: Pearson correlation and mean squared error (MSE) on the top 20 differentially expressed (DE) genes14 (Methods). We compared scLong with four state-of-the-art scRNA-seq foundation models: Geneformer8, scGPT9, scFoundation10, and UCE16. Geneformer, pretrained on 29.9 million cells, has 30 million parameters. scFoundation, with 100 million parameters, was pretrained on 50 million cells. scGPT, with 50 million parameters, was pretrained on 33 million cells. UCE, with 650 million parameters, was pretrained on 36 million cells. In addition, we compared scLong with task-specific approaches for predicting gene expression outcomes following genetic perturbations, including the deep neural network GEARS14 and an additive linear model (ALM)28 (Methods), neither of which involves pretraining. Additionally, following ref. 28, we included a simple baseline, No-Change, which directly uses the input gene expression vector as the predicted output.
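The evaluation protocol on top-20 DE genes can be sketched as follows on toy data. Here DE genes are chosen purely by absolute mean expression change, a simplification of the DE calling used in the actual pipeline, and the gene counts are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n_genes, k = 100, 20
ctrl = rng.normal(size=n_genes)              # mean control expression
true_post = ctrl + rng.normal(size=n_genes)  # mean post-perturbation expression
pred_post = true_post + 0.1 * rng.normal(size=n_genes)  # model prediction

# Top-k differentially expressed genes by absolute mean change (simplified;
# DE calling in practice involves statistical testing, not just effect size)
de = np.argsort(-np.abs(true_post - ctrl))[:k]

# Metrics are computed only on the selected DE genes
mse = np.mean((pred_post[de] - true_post[de]) ** 2)
pearson = np.corrcoef(pred_post[de], true_post[de])[0, 1]
print(round(float(mse), 3), round(float(pearson), 3))
```

Restricting both metrics to the strongest DE genes focuses the evaluation on the genes most affected by the perturbation rather than on the largely unchanged remainder of the transcriptome.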
a Model architecture for fine-tuning the pretrained scLong to predict transcriptional outcomes of genetic perturbations. b scLong outperformed scRNA-seq foundation models, including Geneformer, scGPT, scFoundation and UCE, as well as the task-specific GEARS method, in terms of Pearson correlation (higher is better) and mean squared error (lower is better) on the top 20 differentially expressed genes across four testing scenarios: Seen 0/2, Seen 1/2, Seen 2/2, and Seen 0/1. c In classifying double-gene perturbations into two genetic interaction types, synergy and suppressor, scLong’s magnitude score achieved a significantly higher Pearson correlation with the ground truth than GEARS, underscoring its enhanced capability to distinguish between these interaction types. Each cross denotes a double-gene perturbation. d The top 15 synergistic double-gene perturbations identified by scLong showed a greater overlap with the ground truth compared to GEARS, and the same held for suppressor double-gene perturbations. This further demonstrated that scLong provides more accurate predictions of synergistic and suppressive interactions in double-gene perturbations. e scLong’s mean absolute prediction errors for individual genes (columns) across different double-gene perturbation conditions (rows). The 90 genes and 40 conditions with the largest errors were visualized. Hierarchical clustering of error patterns (row vectors) effectively grouped perturbation conditions involving the same gene together. In (b), bar heights represent the mean and error bars indicate the standard deviation across n = 5 independent training runs with different random seeds. Results from individual runs are shown as dot points. Source data are provided as a Source Data file. Two-sided t-tests with Benjamini-Hochberg correction were used; see Supplementary Table 11 for detailed statistics.
scLong outperformed the seven baseline methods in most cases, across both Pearson correlation and MSE metrics, and under various test scenarios, including Seen 0/2, Seen 1/2, Seen 2/2, and Seen 0/1 (Fig. 2b and Supplementary Table 1; since the errors of ALM and No-Change are considerably larger than those of other baselines, we excluded them from Fig. 2 and instead provided their results in Supplementary Table 1). The improvement of scLong was particularly notable in the Seen 0/1 and Seen 0/2 scenarios, where the perturbation conditions in the test data are not encountered during training. For example, in the Seen 0/1 scenario, scLong achieved a Pearson correlation of 0.625, compared to 0.561 (P = 0.001, two-sided t-test after the Benjamini-Hochberg procedure of multiple hypothesis correction29; the detailed results of the two-sided t-test, with sample size = 5 and most effect sizes >1, are provided in Supplementary Table 11), 0.576 (P = 0.002), 0.577 (P = 0.002), 0.581 (P = 0.002), 0.530 (P < 0.001), 0.509 (P < 0.001), and −0.012 (P < 0.001) for GEARS, Geneformer, scGPT, scFoundation, UCE, ALM, and No-Change, respectively. In the Seen 0/2 scenario, scLong obtained an MSE of 0.170, while the baseline models recorded errors of 0.218 (P = 0.001), 0.185 (P = 0.107), 0.199 (P = 0.005), 0.190 (P = 0.005), 0.276 (P < 0.001), 0.346 (P < 0.001), and 0.503 (P < 0.001), respectively. This demonstrates that scLong has a stronger out-of-domain generalization capability compared to the baseline models.
We evaluated scLong’s capability to classify double-gene perturbations into two genetic interaction (GI) types: synergy and suppressor. Following ref. 27, we used a magnitude score to distinguish these interaction types. This score measures the correlation between the effect of a double-gene perturbation (i, j) and the linear combination of the effects of the corresponding single-gene perturbations i and j (Methods). Higher scores indicate stronger synergy between i and j. For each double-gene perturbation, we calculated magnitude scores using ground-truth post-perturbation expressions, scLong-predicted post-perturbation expressions, and GEARS-predicted post-perturbation expressions. We then computed Pearson correlation coefficients (PCC) between scLong and the ground truth, as well as between GEARS and the ground truth. scLong achieved a higher PCC than GEARS (Fig. 2c), demonstrating its superior ability to identify the true GI type. To further illustrate this, we ranked the magnitude scores of ground truth, scLong, and GEARS in descending order. The top 15 and bottom 15 double-gene perturbations were classified as having synergy and suppressor GI types, respectively. For both interaction types, the overlap between scLong and the ground truth exceeded that of GEARS and the ground truth (Fig. 2d), further demonstrating that scLong more accurately predicts synergistic and suppressive gene interactions in double-gene perturbations.
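One plausible reading of the magnitude score described above is to fit the double-perturbation effect as a linear combination of the two single-perturbation effects and summarize the fitted coefficients' magnitude; the exact formula is given in the Methods, so the sketch below is an assumption-laden illustration rather than the paper's definition.

```python
import numpy as np

def magnitude_score(delta_i, delta_j, delta_ij):
    """Fit delta_ij ≈ c_i * delta_i + c_j * delta_j by least squares and
    summarize the fitted coefficients (one plausible reading of the score;
    the paper's Methods give the exact form)."""
    X = np.stack([delta_i, delta_j], axis=1)
    coef, *_ = np.linalg.lstsq(X, delta_ij, rcond=None)
    return float(np.sqrt(coef[0] ** 2 + coef[1] ** 2))

rng = np.random.default_rng(4)
d_i = rng.normal(size=50)   # expression change under perturbation i alone
d_j = rng.normal(size=50)   # expression change under perturbation j alone

additive = d_i + d_j              # purely additive double effect
synergistic = 2.0 * (d_i + d_j)   # amplified (synergy-like)
suppressed = 0.3 * (d_i + d_j)    # dampened (suppressor-like)

print(magnitude_score(d_i, d_j, additive))  # ~sqrt(2) for coefficients (1, 1)
```

Under this toy construction, a synergy-like double effect yields a larger score than a suppressor-like one, matching the intuition that higher scores indicate stronger synergy.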
Figure 2e shows scLong’s mean absolute prediction errors for individual genes (columns) across various perturbation conditions (rows). We visualized 90 genes and 40 perturbation conditions with the highest prediction errors. Hierarchical clustering of perturbation conditions was performed, grouping those with similar error patterns (row vectors). The clustering results are sensible: conditions involving the same gene, such as CEBPB + LYL1 and PTPN12 + CEBPB, or CNN1 + UBASH3A and CBL + CNN1, were grouped together. Under conditions involving CEBPE (e.g., ZC3HAV1 + CEBPE, PTPN12 + CEBPE), scLong achieved near-zero errors across all genes. This is because CEBPE has minimal regulatory influence on other genes; perturbing CEBPE did not markedly affect gene expression, which simplified the prediction of transcriptional changes. In contrast, conditions involving ZBTB10, SNAI1, or DLX2 exhibited notably higher prediction errors, as these genes exerted substantial regulatory influence on others. Perturbing them triggered significant transcriptomic shifts, posing a greater challenge for accurate prediction. Generally, prediction errors were low across most genes; however, errors for the HBZ gene were particularly high due to its sensitivity to regulatory effects from other genes. Perturbing these regulators substantially altered HBZ expression, making its post-perturbation state more challenging to predict.
scLong predicts transcriptional outcomes of chemical perturbations
Beyond predicting genetic perturbations, we applied scLong to predict gene expression profiles in response to de novo chemical perturbations, which is crucial for drug discovery and personalized medicine30. By forecasting how novel compounds affect gene activity, researchers can rapidly screen for potential therapeutic effects or adverse reactions, significantly accelerating the drug development process. This capability also provides insights into the molecular pathways and cellular processes targeted by new compounds, helping to uncover their specific mechanisms of action. Additionally, it reduces the need for extensive experimental validation, saving time and resources, while enabling more precise, data-driven decisions in both clinical and research settings.
In this task, we used a subset of the L1000 dataset31, which contains 7 distinct cell lines, 978 genes, and 810 drug compounds, with drugs tested at 6 different dosage levels. The prediction model takes two inputs: (1) the index of the perturbed cell line and (2) the molecular graph and dosage of the drug used to perturb it. The output is the gene expression profile of the cell line after perturbation. The dataset does not include pre-perturbation gene expression data. Each data sample in L1000 consists of these inputs and outputs, totaling 5005 examples, with 3965 used for training, 544 for validation, and 496 for testing. We used scLong to extract representation vectors for each gene and a GCN to extract representations from the drug molecule graph (Fig. 3a). These representations are passed through a multi-head cross-attention module13, combined with embeddings of cell line indices and dosage information, and then fed into an MLP to predict post-perturbation gene expression (Methods). We compared scLong with four foundation models, Geneformer, scGPT, scFoundation, and UCE, as well as the task-specific model DeepCE32. Evaluation metrics included root mean square error (RMSE), Spearman and Pearson correlation scores, and top-100 precision for the highest (Pos-P@100) and lowest (Neg-P@100) predicted expression values (Methods). For RMSE, lower values indicate better performance, while higher values are better for the other metrics. scLong significantly outperformed all baseline methods across all evaluation metrics (Fig. 3b) (all P < 0.04, two-sided t-test with sample sizes of 5 and effect sizes greater than 2; see Supplementary Table 12).
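The Pos-P@100 and Neg-P@100 metrics are defined in the Methods; the sketch below assumes they are simple top-k set-overlap precisions, shown here with k = 3 on toy data.

```python
import numpy as np

def precision_at_k(pred, true, k, highest=True):
    """Fraction of the k most extreme predicted genes that are also among
    the k most extreme observed genes (Pos-P@k if highest, Neg-P@k if not).
    Assumed formulation; the paper's Methods give the exact definition."""
    sign = -1 if highest else 1
    top_pred = set(np.argsort(sign * pred)[:k])
    top_true = set(np.argsort(sign * true)[:k])
    return len(top_pred & top_true) / k

true = np.array([5.0, 4.0, 3.0, 0.0, -1.0, -4.0, -5.0, 1.0])
pred = np.array([4.8, 3.5, 0.5, 0.2, -0.5, -3.9, -5.2, 2.9])

pos_p = precision_at_k(pred, true, 3, highest=True)   # 2/3: two of the
                                                      # predicted top-3 match
neg_p = precision_at_k(pred, true, 3, highest=False)  # 1.0: bottom-3 all match
```

These metrics reward a model for correctly ranking the most strongly up- and down-regulated genes, which matters more for downstream drug screening than uniform accuracy across all genes.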
a Model architecture for fine-tuning the pretrained scLong for this prediction task. b scLong demonstrated superior results over scRNA-seq foundation models, including Geneformer, scGPT, scFoundation, and UCE, as well as the task-specific DeepCE method, across metrics including root mean squared error (RMSE), Spearman and Pearson correlations, Pos-P@100 and Neg-P@100. Higher values indicate better performance for all metrics except RMSE. In (b), bar heights represent the mean and error bars indicate the standard deviation across n = 5 independent training runs with different random seeds. Results from individual runs are shown as dot points. Source data are provided as a Source Data file. Two-sided t-tests with Benjamini-Hochberg correction were used; see Supplementary Table 12 for detailed statistics.
scLong predicts cancer drug response
Cancer drug response prediction involves forecasting how individual cancer cells or tumors will react to specific treatments33. This process is essential because cancer is a highly heterogeneous disease, and not all patients respond to the same drugs in the same way. By accurately predicting drug response, personalized treatment plans can be developed, improving the effectiveness of therapies and minimizing adverse effects. It enables oncologists to tailor treatment strategies based on the molecular profile of a patient’s cancer, leading to better outcomes. Additionally, it accelerates drug discovery by identifying promising drug candidates and reducing the need for extensive clinical trials.
In this task, the input includes the molecular structure of a potential cancer drug and the bulk gene expression profile of a cancer cell line. The output is a prediction of the drug’s efficacy against the cancer cell line, measured by its half-maximal inhibitory concentration (IC50) value34. We use scLong to extract a representation vector from the input gene expression data, which is then concatenated with the drug molecule representation obtained through a GCN35 (Fig. 4a). The combined representation is subsequently fed into a regression module to predict the IC50 value (Methods). We used the dataset from DeepCDR36, which includes 102,074 training examples and 5,372 testing examples. We compared scLong with other foundation models, including Geneformer, scGPT, scFoundation, and UCE. Additionally, we evaluated its performance against task-specific models, including the deep neural network DeepCDR and a linear model36,37. Pearson correlation was used as the evaluation metric, where higher values indicate better performance. scLong outperformed all baselines (Fig. 4b), with a Pearson correlation score of 0.878, surpassing Geneformer’s score of 0.852 (P = 0.001), scGPT’s 0.841 (P = 0.001), scFoundation’s 0.867 (P = 0.025), UCE’s 0.837 (P = 0.001), DeepCDR’s 0.837 (P = 0.001), and the linear model’s 0.746 (P < 0.001) (Supplementary Table 3).
a Model architecture for fine-tuning the pretrained scLong for this prediction task. b, c scLong achieved higher Pearson correlation and area under the receiver-operating characteristic curve (AUROC) than foundation models including Geneformer, scGPT, scFoundation and UCE, as well as specialized approaches including DeepCDR (b) and DeepDDS (c). In (b, c), bar heights represent the mean and error bars indicate the standard deviation across n = 5 independent training runs with different random seeds. Results from individual runs are shown as dot points. Source data are provided as a Source Data file. Two-sided t-tests with Benjamini-Hochberg correction were used; see Supplementary Table 13 for detailed statistics.
We further explored whether scLong improves the prediction of cancer cell responses to synergistic drug combinations, focusing on the response to drug pairs rather than individual drugs38. Drug combinations can target multiple pathways or mechanisms simultaneously, potentially leading to better therapeutic outcomes than single-drug treatments. They can also decrease the likelihood of drug resistance developing, as it is harder for cancer cells to adapt to multiple pharmacological agents at once. Despite its promise in cancer therapy, the exponential increase in potential drug pairings poses a significant challenge in identifying the most effective combinations. For this task, the input consists of a cancer cell line and a drug pair, while the output is a binary label indicating whether the cell responds. The model architecture closely resembles that used for single-drug response prediction (Methods). We used a large-scale oncology screening dataset39 with over 12,000 examples for training and a separate dataset from AstraZeneca’s drug combination dataset40 with 668 samples for testing. The test dataset has a different distribution from the training data, allowing us to assess the models’ out-of-distribution generalization capabilities. We compared scLong with Geneformer, scGPT, scFoundation, UCE, and task-specific models including DeepDDS41 and random forest37,41. scLong outperformed all baselines in terms of the area under the receiver operating characteristic curve (AUROC) (Fig. 4c), with an AUROC score of 0.652, surpassing Geneformer’s score of 0.635 (P = 0.006), scGPT’s 0.616 (P = 0.002), scFoundation’s 0.593 (P < 0.001), UCE’s 0.603 (P = 0.002), DeepDDS’s 0.604 (P = 0.001), and random forest’s 0.533 (P < 0.001) (Supplementary Table 4). 
The results of the two-sided t-test after multiple hypothesis correction for both single-drug response prediction and drug-combination response prediction are presented in Supplementary Table 13, with a sample size of 5 for each test and an effect size greater than 2.
scLong infers gene regulatory networks
GRNs represent the intricate interactions between genes and their regulators, such as transcription factors, that control gene expression within cells42,43. These networks determine which genes are turned on or off, guiding important cellular activities like differentiation, proliferation, and responses to environmental signals. Inferring GRNs from experimental data, such as scRNA-seq, is crucial for uncovering the regulatory mechanisms that drive these processes. Reconstructing these networks provides valuable insights into the molecular foundations of health and disease, highlighting key regulatory elements that may serve as potential therapeutic targets or biomarkers. Accurate GRN inference also enhances the ability to model and predict cellular behavior in response to specific conditions or treatments.
The input for this task consists of gene expression vectors from a collection of cells, with the output being a GRN represented as an adjacency matrix. We used gene expression data from Nc = 758 human embryonic stem cells (hESC)44,45, encompassing Ng = 17,735 genes. The pretrained scLong model is applied to extract representations of these genes, and an adjacency matrix is generated by calculating cosine similarities between these gene representations (Fig. 5a). This matrix is further refined using the beta variational autoencoder46 in DeepSEM15 (Methods). We evaluated this GRN by comparing it to a ground-truth GRN derived from ChIP-Seq47 data. The area under the precision-recall curve ratio (AUPR) and the early precision ratio (EPR)44, where higher values indicate better performance, were used as evaluation metrics (Methods).
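The adjacency step described above can be sketched as follows. This is a minimal illustration of building a candidate GRN from pairwise cosine similarities of gene embeddings; `cosine_adjacency` and the toy representations are illustrative, not scLong's actual implementation.

```python
import numpy as np

def cosine_adjacency(gene_reps: np.ndarray) -> np.ndarray:
    """Build a candidate GRN adjacency matrix from gene representations.

    gene_reps: (n_genes, dim) array of gene embeddings extracted by a
    pretrained model. Returns an (n_genes, n_genes) matrix of pairwise
    cosine similarities, which would subsequently be refined (e.g. by
    a beta-VAE as in DeepSEM) before evaluation.
    """
    norms = np.linalg.norm(gene_reps, axis=1, keepdims=True)
    unit = gene_reps / np.clip(norms, 1e-12, None)  # avoid division by zero
    return unit @ unit.T

# Toy example: 4 "genes" with 3-dimensional representations.
reps = np.array([[1.0, 0.0, 0.0],
                 [1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.0, -1.0, 0.0]])
adj = cosine_adjacency(reps)  # identical genes score 1, opposed genes -1
```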
a Model architecture for fine-tuning the pretrained scLong for this inference task. b scLong achieves higher area under the precision-recall curve ratio (AUPR) and early precision ratio (EPR) compared to Geneformer, scGPT, scFoundation, UCE, and task-specific DeepSEM and GENIE3. In (b), bar heights represent the mean and error bars indicate the standard deviation across n = 5 independent training runs with different random seeds. Results from individual runs are shown as dot points. Source data are provided as a Source Data file. Two-sided t-tests with Benjamini-Hochberg correction were used; see Supplementary Table 14 for detailed statistics.
We compared scLong’s performance with Geneformer, scGPT, scFoundation, UCE, and task-specific methods including DeepSEM and GENIE348. Additionally, we included a simple baseline, GO Graph, which directly utilizes the corresponding subgraph of the gene graph (Fig. 1a) derived from the GO as the GRN for the 17,735 genes. scLong outperformed all baselines across both metrics (Fig. 5b and Supplementary Table 5). For instance, scLong achieved an AUPR of 1.35, significantly surpassing Geneformer (1.12, P < 0.001), scGPT (1.17, P < 0.001), scFoundation (1.04, P < 0.001), UCE (1.10, P = 0.001), DeepSEM (1.11, P < 0.001), GENIE3 (1.08, P < 0.001), and GO Graph (1.02, P < 0.001) (Supplementary Table 5). These results indicate that scLong’s learned representations effectively capture gene interactions. A two-sided t-test (Supplementary Table 14) confirmed the significance of these improvements (P < 0.04 for both AUPR and EPR comparisons after multiple hypothesis correction), with sample sizes of 5 and most effect sizes exceeding 2.
scLong supports zero-shot batch integration
Single-cell transcriptomic datasets often exhibit batch effects49, which are systematic variations in gene expression caused by differences in experimental conditions, such as sample preparation, sequencing platforms, reagent batches, and handling procedures. A batch refers to a group of cells processed under the same conditions. These technical variations can obscure biological signals and lead to misleading conclusions if uncorrected. To address batch effects, batch integration methods50,51,52 have been developed to adjust for these variations, enabling the harmonization of data from multiple experiments while preserving genuine biological differences.
We evaluated the zero-shot capability of scLong in batch integration, comparing it with foundation models including Geneformer, scGPT, scFoundation, and UCE. Zero-shot means that these models were neither pretrained nor fine-tuned on the dataset used for this task. Following53, each foundation model was used to extract cell representations from different batches. The goal was to assess whether batch effects, present in the original gene expression vectors, were reduced or eliminated in the extracted representations. In scLong, cell representations were formed by concatenating contextualized representations with reconstructed expressions, followed by sum-pooling (Methods). We conducted this analysis using the widely used pancreas dataset49, which consists of 19,093 genes and 16,382 cells from six batches collected with different scRNA-seq technologies. In addition to foundation models, we compared scLong with baselines from53, including: (1) Raw, which directly uses the raw expression values without performing batch integration; (2) HVG53, which selects the top 2000 highly variable genes (HVGs) and retains only their expression values for each cell; and (3) a task-specific method scVI54, trained on the pancreas dataset. Batch integration performance was evaluated using the modified batch-wise average silhouette width (batch ASW)49, where higher scores indicate better performance.
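The evaluation metric can be illustrated with a simplified sketch. The actual batch ASW49 is computed within cell-type groups; the version below scores all cells globally, which suffices to show the intuition that well-mixed batches yield silhouette widths near zero and hence scores near 1. The simulated data are illustrative.

```python
import numpy as np
from sklearn.metrics import silhouette_samples

def batch_asw(X: np.ndarray, batch_labels: np.ndarray) -> float:
    """Simplified batch ASW: silhouette widths are computed with respect
    to batch labels, then transformed to 1 - |s| per cell and averaged.
    Scores near 1 indicate that batches are well mixed in the
    representation space (i.e., batch effects are removed)."""
    s = silhouette_samples(X, batch_labels)
    return float(np.mean(1.0 - np.abs(s)))

# Simulated cell representations from two batches.
rng = np.random.default_rng(0)
labels = np.array([0, 1] * 100)
mixed = rng.normal(size=(200, 5))            # no batch effect
shifted = mixed + labels[:, None] * 50.0     # strong simulated batch effect
```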
scLong achieved a batch ASW score of 0.96, markedly surpassing all baselines (Fig. 6), including Raw (0.70), HVG (0.89), scVI (0.85), Geneformer (0.62), scGPT (0.89), scFoundation (0.71), and UCE (0.83). These results indicate that the representations learned by scLong effectively mitigate batch effects. Notably, despite not being pretrained or fine-tuned on the pancreas dataset, scLong outperformed scVI, which was trained on this dataset, highlighting its strong zero-shot capability.
scLong achieved a higher batch average silhouette width (batch ASW) than Geneformer, scGPT, scFoundation, and UCE. It also outperformed the task-specific scVI method and two additional baselines, Raw and HVG. As this is a zero-shot task that does not involve fine-tuning, the results are deterministic and do not include error bars.
Ablation studies of scLong demonstrate the effectiveness of modeling low-expression genes and incorporating the Gene Ontology graph
We conducted ablation studies to assess the contributions of two components in scLong: (1) modeling low-expression genes (LEGs) and (2) incorporating the GO graph. The studies covered three downstream tasks: predicting transcriptional responses to genetic perturbations, inferring GRNs, and integrating batches. To evaluate the impact of modeling LEGs, we compared the full scLong model, which learns representations for both high-expression genes and LEGs, with two ablation settings (Methods). In the first setting, w/o LEG, we retained only the top 4,096 high-expression genes per cell, omitting LEGs. Consequently, the mini Performer encoder, responsible for learning representations of LEGs, was removed. In the second setting, Random LEG, the weight parameters of the mini Performer encoder were replaced with values randomly sampled from a normal distribution. Across the three downstream tasks, scLong outperformed both ablation settings in most cases (Fig. 7a). For example, in GRN inference, scLong achieved an AUPR of 1.35, exceeding the 1.11 AUPR of w/o LEG (P < 0.001) and the 1.10 AUPR of Random LEG (P < 0.001). Similarly, in batch integration, scLong attained a batch ASW of 0.96, compared to 0.88 and 0.90 for the ablation settings. These results underscore the benefits of incorporating LEGs into the model.
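The Random LEG setting amounts to discarding an encoder's trained weights and re-sampling them from a normal distribution. A minimal PyTorch sketch of this ablation, using a small stand-in module rather than the actual mini Performer encoder, and with the standard deviation chosen arbitrarily for illustration:

```python
import torch
import torch.nn as nn

def randomize_weights(module: nn.Module, std: float = 0.02) -> None:
    """Replace every parameter of `module` in place with samples from
    N(0, std^2), mimicking an ablation in which trained weights are
    discarded while the architecture is kept intact."""
    with torch.no_grad():
        for p in module.parameters():
            p.copy_(torch.randn_like(p) * std)

# Stand-in for the mini encoder (illustrative only, not a Performer).
torch.manual_seed(0)
mini_encoder = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 8))
randomize_weights(mini_encoder)
```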
a In the ablation study assessing the impact of modeling low-expression genes (LEGs), the full scLong model significantly outperformed two ablation settings: (1) omitting LEGs and the mini Performer encoder (w/o LEG) and (2) assigning random weights to the mini Performer encoder (Random LEG). b In the ablation study evaluating the role of the GO graph, scLong significantly outperformed two ablation settings: (1) removing the GO graph and the graph convolutional network (w/o GO) and (2) replacing the GO graph with a random graph (Random GO). Higher values indicate better performance across all metrics, except for mean squared error. In (a, b), bar heights represent the mean and error bars indicate the standard deviation across n = 5 independent training runs with different random seeds. Results from individual runs are shown as dot points. Source data are provided as a Source Data file. Two-sided t-tests with Benjamini-Hochberg correction were used; see Supplementary Tables 15, 16 for detailed statistics.
In the second ablation study evaluating the impact of incorporating the GO graph, we compared the full scLong model with two ablation settings (Methods). In the first setting, w/o GO, we removed the GO graph and the GCN used to learn gene representations. Instead, Gene2Vec representations were directly used as the final gene representations. In the second setting, Random GO, we replaced the GO graph with a randomly generated graph. Across the three downstream tasks, scLong generally outperformed both ablation settings (Fig. 7b). For instance, in GRN inference, scLong achieved an AUPR of 1.35, exceeding the 1.12 AUPR of w/o GO (P < 0.001) and the 1.12 AUPR of Random GO (P < 0.001). Likewise, in batch integration, scLong achieved a batch ASW of 0.96, compared to 0.91 and 0.93 for the ablation settings. These findings demonstrate the importance of incorporating GO in learning gene representations.
scLong clusters marker genes associated with different cell types
Figure 8 illustrates the clustering of gene representations obtained from scLong for two cell types in the Zheng68K dataset55 (with the other 9 cell types shown in Supplementary Fig. 3). For each cell type, we randomly sampled 50 cells, used scLong to extract their gene representations, computed pairwise cosine similarity between genes, and conducted hierarchical clustering on the resulting similarity matrix (Methods). Displayed are the 50 genes with the highest similarity scores. The results show that marker genes for each cell type, highlighted in red, were grouped into the same cluster. These cell-type-specific genes were generally highly expressed within their respective cell types. Marker genes and non-marker genes were assigned to separate clusters. These results demonstrate scLong’s capability to capture gene co-expression patterns within specific cell types.
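The clustering step can be sketched as follows: similarity is converted to a distance (1 - cosine similarity) and passed to average-linkage hierarchical clustering. The toy similarity matrix and the choice of linkage method are illustrative assumptions, not scLong's exact procedure.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cluster_genes(sim: np.ndarray, n_clusters: int) -> np.ndarray:
    """Hierarchically cluster genes from a cosine-similarity matrix.
    Converts similarity to distance (1 - sim), applies average linkage,
    and returns one cluster label per gene."""
    dist = 1.0 - sim
    np.fill_diagonal(dist, 0.0)
    dist = (dist + dist.T) / 2.0               # enforce symmetry
    condensed = squareform(dist, checks=False)  # condensed distance vector
    Z = linkage(condensed, method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")

# Toy similarity matrix: genes 0-1 and 2-3 form two tight groups.
sim = np.full((4, 4), 0.1)
sim[:2, :2] = 0.9
sim[2:, 2:] = 0.9
np.fill_diagonal(sim, 1.0)
labels = cluster_genes(sim, 2)
```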
a Hierarchical clustering of the cosine similarity matrix computed from scLong-output gene representations for CD14+Monocytes. The top 50 genes with the highest similarity scores are shown. Marker genes (highlighted in red) cluster together, while marker and non-marker genes are separated into distinct clusters. b The corresponding hierarchical clustering map for CD19+B cells.
Discussion
scLong offers a valuable advancement in single-cell transcriptomics, providing a foundation model that accommodates the full spectrum of gene expression within single cells. Its billion-parameter architecture and dual-encoder strategy allow it to handle both high- and low-expression genes, addressing limitations in existing models that often overlook LEGs critical for cellular regulation. By integrating gene-specific information from the GO, scLong brings contextual depth to its predictions, enhancing its ability to capture nuanced interactions across diverse cellular contexts. This comprehensive approach not only improves prediction accuracy but also broadens the model’s applicability in studying condition-specific responses and complex gene regulatory mechanisms. These capabilities make scLong a valuable tool for advancing research in precision medicine, drug discovery, and cellular biology, supporting new insights into gene expression dynamics and informing more targeted therapeutic approaches.
In predicting transcriptional outcomes of genetic perturbations, scLong’s superior performance over existing foundation models can be attributed to two key advantages: comprehensive self-attention across all genes and the integration of GO knowledge. First, scLong’s self-attention spans all 28,000 genes, capturing interactions among both highly and lowly expressed genes, unlike baseline models that restrict attention to a small subset of highly expressed genes. Although LEGs are often less abundant, they play essential roles in gene regulation and cellular signaling, acting as modulators that influence how high-expression genes respond to perturbations17,18,19. By attending to all genes, scLong identifies a more complete picture of regulatory dynamics, capturing subtle but important gene interactions crucial for accurately predicting the transcriptional effects of genetic perturbations. This broad gene attention allows scLong to account for dependencies and feedback mechanisms that baseline models may overlook due to their limited gene focus. The effectiveness of modeling LEGs for predicting transcriptional outcomes of genetic perturbations was verified through ablation studies (Fig. 7a). Second, scLong incorporates GO knowledge through a GCN, providing each gene with a representation enriched by its biological functions, processes, and cellular roles. GO offers structured, hierarchical insights that enable the model to understand not only direct gene interactions but also the broader functional context of each gene’s role within the cellular environment22,27,56. In the context of genetic perturbations, such functional insights are vital, as they allow the model to infer how perturbing one gene might affect other related genes within the same biological pathways or processes. Baseline models that lack GO knowledge miss this critical functional layer. Ablation studies confirmed the effectiveness of incorporating the GO graph (Fig. 7b). 
Together, scLong’s inclusive self-attention and GO-enhanced representations equip it to generate highly context-aware predictions, leading to its stronger performance in predicting transcriptional responses to genetic perturbations.
scLong’s enhanced performance over the task-specific model GEARS in predicting transcriptional outcomes of genetic perturbations is largely due to its extensive pretraining on 48 million scRNA-seq samples—a foundational step that GEARS lacks. This large-scale pretraining enables scLong to learn generalizable patterns in gene expression across diverse cell types and conditions, equipping it with a comprehensive understanding of cellular behaviors, gene interactions, and regulatory networks. Genetic perturbations often result in complex regulatory cascades and cross-gene effects that are not fully represented within narrow task-specific datasets. Through exposure to tens of millions of expression profiles, scLong learns robust gene representations that capture both common and rare expression patterns, including context-specific dependencies likely to be triggered by perturbations. This pretraining allows scLong to identify subtle transcriptional shifts and regulatory changes that may be pivotal in accurately predicting perturbation outcomes. In contrast, GEARS, lacking this extensive pretraining, relies solely on task-specific data, which limits its exposure to the broad spectrum of cellular states and gene regulatory mechanisms that scLong acquires through pretraining. Consequently, GEARS may struggle to capture nuanced gene interactions or transcriptional changes, particularly in response to less common or complex perturbations. Additionally, scLong’s pretraining fosters the formation of robust gene representations, reducing overfitting to specific datasets and enhancing its predictive performance across diverse perturbation scenarios. This foundational knowledge enables scLong to make more accurate, context-aware predictions of transcriptional outcomes, leading to its stronger performance relative to GEARS in the task of genetic perturbation response prediction. 
Similarly, the superior performance of scLong over the ALM stems from its ability to learn comprehensive representations of both highly and lowly expressed genes, its integration of the GO for gene representation learning, and its extensive pretraining on a large-scale dataset. These features enable scLong to capture complex gene interactions and to learn generalizable patterns of expression across diverse conditions. This capability is especially important for accurately predicting transcriptional outcomes of genetic perturbations not present in the training data, resulting in substantial gains over the ALM, particularly in the Seen 0/2 and Seen 1/2 scenarios. A similar explanation accounts for scLong’s advantage in predicting transcriptional outcomes of chemical perturbations, where it outperforms both existing foundation models and the task-specific model DeepCE.
In predicting cancer drug response, scLong’s superior performance over existing foundation models is again attributed to its comprehensive self-attention across both highly and lowly expressed genes and its integration of GO knowledge. First, LEGs, while less abundant, often act as modulators within cellular pathways, serving as “switches” or fine-tuners in signaling and gene regulation that indirectly influence how high-expression genes respond to drug interventions18,19,21. In cancer cells, these subtle regulatory roles are particularly significant, as LEGs can control key pathways linked to drug resistance, cell survival, or proliferation17,19,20,21. By attending to LEGs, scLong captures a more complete view of the cellular network, identifying intricate dependencies and regulatory feedback loops that are essential for accurately predicting cancer cell responses to drug interventions. Second, incorporating functional relationships from GO enriches scLong’s predictions by embedding structured knowledge about gene functions and pathways. Since drugs often target or disrupt specific cellular processes, GO allows the model to recognize interactions among genes involved in critical processes like apoptosis, cell proliferation, and drug metabolism—central to a cancer cell’s response to treatment. Additionally, GO annotations enable scLong to identify context-dependent roles of genes, including those that might be inactive under normal conditions but become essential under drug-induced stress. This integration of GO knowledge allows scLong to make more context-aware predictions, enhancing its ability to anticipate complex drug response patterns in cancer cells.
scLong’s superior performance over task-specific models, including DeepCDR and DeepDDS, in predicting cancer drug response is again attributed to its extensive pretraining on 48 million scRNA-seq data points. Predicting cancer drug response requires understanding the complex interactions and regulatory networks that dictate how cancer cells respond to treatment34. This pretraining exposes scLong to a diverse range of gene expression profiles, enabling it to learn the underlying relationships and dependencies among genes, including those critical for modulating drug response in cancer cells. Additionally, gene expression patterns associated with drug resistance, metastasis, or apoptosis often appear only in specific cancer subtypes or under certain treatment conditions34,57. By capturing these rare but pivotal patterns, pretraining equips scLong to better understand and predict responses in heterogeneous cancer cell populations.
In GRN inference, scLong again outperforms baselines due to its self-attention mechanism operating across the entire set of ~28,000 genes, encompassing both highly and lowly expressed genes. This broad inclusion allows scLong to detect critical regulatory patterns that might be overlooked when focusing solely on highly expressed genes, as existing foundation models do. Lowly expressed genes play crucial roles in cellular regulation, including acting as fine-tuners in regulatory networks or contributing to rare but essential cellular processes21. By accounting for the expression patterns of these genes, scLong can infer a more complete and biologically accurate regulatory network. This comprehensive approach enables scLong to capture a broader range of gene interactions and dependencies, which are particularly important for uncovering regulatory relationships that influence rare or condition-specific cellular states. Ablation studies confirmed the value of including LEGs in GRN inference (Fig. 7a). Furthermore, scLong leverages the GO to construct functionally enriched representations of genes, adding another layer of precision in regulatory inference. By incorporating GO, scLong is effectively pre-conditioned with structured knowledge about gene functions, pathways, and interrelations. This knowledge-rich representation provides a strong inductive bias, guiding scLong in recognizing functionally relevant connections between genes and their roles in regulatory networks. In contrast, existing foundation models lack this knowledge-based guidance, limiting their ability to identify meaningful relationships in cases where gene expression alone does not reveal functional interactions. The effectiveness of incorporating the GO graph was demonstrated through ablation studies (Fig. 7b).
Furthermore, scLong outperforms the GO Graph baseline by a large margin (Supplementary Table 5), indicating that scLong does not merely replicate the GO graph but instead refines and extends its information during training. The GO-derived graph represents a cell type-agnostic view of gene interactions. In contrast, the task at hand requires inferring a GRN tailored to hESC using hESC-specific gene expression data. By incorporating such data, scLong is able to capture critical cell type-specific gene interaction patterns, leading to a more accurate GRN compared to the direct use of the GO graph.
In batch integration, scLong outperforms baselines for two key reasons. First, it mitigates batch effects by applying self-attention across the entire transcriptome, encompassing both high- and low-expression genes. By capturing long-range dependencies among all genes, scLong constructs a more comprehensive and biologically coherent representation of cells, reducing batch-specific artifacts. LEGs often participate in regulatory pathways that influence the expression of highly expressed genes. By integrating these regulatory interactions, scLong better distinguishes true biological variation from technical noise, resulting in more robust and transferable representations across batches. This comprehensive modeling of gene interactions helps preserve biological signals while minimizing batch-related discrepancies. Ablation studies demonstrated the benefit of incorporating LEGs in batch integration (Fig. 7a). Second, scLong enhances gene representations by incorporating the GO graph, which encodes hierarchical gene functions, molecular interactions, and pathway information. This structured knowledge enables the model to prioritize biologically meaningful signals over batch-specific artifacts. By leveraging functional relationships rather than relying solely on expression similarity, scLong aligns cell representations across datasets in a biologically consistent manner. This reduces overfitting to batch-specific noise and leads to more effective batch integration. Ablation studies confirmed the effectiveness of incorporating the GO graph (Fig. 7b).
scLong’s ability to outperform both state-of-the-art foundation models for scRNA-seq and task-specific models across diverse downstream tasks underscores its robustness and adaptability in single-cell transcriptomics. Its strong performance in predicting transcriptional responses to genetic and chemical perturbations highlights its potential to aid in uncovering gene functions, regulatory pathways, and cellular responses under various conditions—essential for understanding disease mechanisms and identifying therapeutic targets. In the context of cancer research, scLong’s accurate predictions of drug responses, both for individual drugs and synergistic drug combinations, present valuable insights for precision oncology. This capability could facilitate personalized treatment approaches by helping to identify the most effective therapies based on specific cancer cell profiles, potentially improving patient outcomes and reducing adverse effects. Additionally, scLong’s success in GRN inference signifies its capacity to map complex interactions among genes, supporting efforts to model cellular processes and regulatory circuits more precisely.
Balancing computational efficiency and representation quality presents a fundamental challenge in large-scale models like scLong, where both attributes are crucial yet often in conflict. Achieving high-quality representations typically requires complex processing, such as applying self-attention across long expression vectors, to capture intricate gene relationships accurately. However, this comprehensive modeling approach incurs significant computational costs, as attention operations scale quadratically with the number of elements, making such methods infeasible for gene expression vectors with tens of thousands of elements. Strategies aimed at enhancing efficiency, such as reducing the number of layers, attention heads, or hidden dimensions, often sacrifice representation richness and granularity, limiting the model’s ability to detect subtle yet significant gene interactions. Some approaches improve efficiency by shortening vector length, excluding LEGs under the assumption that they contribute less to primary cellular insights9,10. While this reduces memory and computational loads, it inherently sacrifices the model’s quality, as many LEGs are crucial for regulatory functions and context-specific cellular responses. scLong addresses this trade-off through a dual encoder strategy that selectively applies a larger encoder to high-expression genes, which typically convey more essential functional information, while a smaller encoder processes LEGs. This selective approach optimizes computational resources, allowing scLong to maintain high-quality representations for critical elements while managing efficiency. By retaining all genes in its representation and adjusting resource allocation appropriately, scLong preserves essential interactions among both high- and low-expression genes, achieving a balance between computational efficiency and comprehensive representation quality.
In our model, we treat all genes with zero expression values as potentially informative and explicitly model them. There are two main scenarios that can give rise to zero expression. First, the gene is expressed at a low level, but its expression is not detected due to limited sequencing depth. Second, the gene is not expressed in that particular cell. In both cases, the gene may carry valuable information for downstream analysis and should not be disregarded. In the first case, lowly expressed genes—those with non-zero but low expression levels—can play important biological roles despite their modest abundance, as discussed above. In the second case, genes that are not expressed (i.e., have true zero expression) can still provide important biological information in single-cell transcriptomics. The absence of expression is often cell-type specific and can help distinguish one cell population from another. Additionally, the repression of certain genes may reflect underlying regulatory programs or epigenetic states that define a cell’s identity or function. In developmental and disease contexts, gene silencing can indicate key shifts in cellular states, such as differentiation, dedifferentiation, or pathological transitions. Thus, modeling non-expressed genes helps capture both the presence and absence of regulatory signals that shape cellular behavior. scLong employs a dedicated Performer encoder to model these lowly expressed and unexpressed genes, in contrast to previous foundation models that typically disregard these genes. In the first scenario, although assigning a zero value to a lowly expressed gene introduces some noise, it remains a reasonable approximation since the true expression level is, by definition, very low and close to zero. As a result, modeling these zero values can still reflect the underlying patterns of low expression. 
In scLong, representation vectors are learned for these zero values using MLPs, enabling the model to capture informative patterns.
Despite its advancements, scLong has certain limitations that merit consideration. The model’s billion-parameter architecture, although optimized for efficiency, still demands significant computational resources for training and inference, which may hinder accessibility for groups lacking high-performance infrastructure. Additionally, scLong relies on static, predefined relationships from sources like the GO, which, while providing valuable contextual information, may restrict adaptability to dynamic gene interactions and condition-specific regulatory changes not represented in these databases. Another limitation is the potential sensitivity of scLong’s performance to the choice of high- and low-expression gene thresholds in its dual encoder design; selecting these thresholds inappropriately could lead to suboptimal representations, particularly in cell types with unusual gene expression distributions. Addressing these limitations could make scLong a more versatile and broadly applicable tool in single-cell transcriptomics research.
Future work on scLong can focus on several key areas to further enhance its capabilities and broaden its applications. One promising direction is the incorporation of additional biological datasets, such as pathway databases58, protein-protein interaction networks59, and epigenetic data60, to enrich the context-awareness of the model and improve its ability to capture more complex regulatory mechanisms. Expanding the model’s pretraining on diverse datasets from various species and tissues could also boost its generalizability across different biological contexts. Another area for improvement is model interpretability; future versions of scLong could integrate more advanced explainability techniques, such as attention-based visualization tools or saliency maps61, to provide clearer insights into the gene interactions driving its predictions. Additionally, exploring methods to reduce the computational demands of training and deploying scLong, such as model pruning62 or distillation63, would make the model more accessible to a wider range of researchers. Furthermore, applying scLong to novel downstream tasks, such as predicting cell signaling pathways5 or identifying gene interactions in rare cell populations, could further validate its versatility and expand its impact in single-cell biology.
Another direction for future development is to incorporate alternative sources of prior knowledge—beyond the GO—to improve gene representation learning. One complementary approach is exemplified by UCE16, which uses embeddings from protein language models trained on amino acid sequences to capture structural and biochemical properties in an unsupervised manner. This strategy reduces dependence on curated annotations while offering broad coverage. In contrast, scLong encodes structured biological semantics derived from GO, capturing functional roles, cellular localization, and biological processes with curated precision. These two strategies are complementary: sequence-derived embeddings offer generalizability and annotation-independence, while ontology-based embeddings provide access to higher-order biological context that may be difficult to infer from sequence alone. A systematic comparison of these approaches, along with the development of hybrid models that integrate both forms of prior knowledge, constitutes a valuable direction for future research.
Methods
Collection and preprocessing of large-scale transcriptomics pretraining data
We collected scRNA-seq data from three public repositories: CELLxGENE64,65,66,67,68, Cell Blast69, and the Human Cell Atlas70. Initially, around 1600 datasets were downloaded, comprising over 60 million cells. We filtered out non-human datasets and excluded those containing fewer than 1000 genes. Additionally, datasets normalized using unknown methods were removed. After this filtering process, 848 datasets remained.
Next, we performed gene selection. First, we removed all non-human genes from the 848 datasets, leaving ~66,000 human genes. From these, we selected the top 20,000 genes with the highest number of non-zero entries across the datasets. Additionally, we included all 19,748 protein-coding genes71, all 20,480 genes from the GO22, and all 20,184 genes from Gene2Vec72. To eliminate duplicates, we mapped gene IDs from these different sources—represented as gene symbols or NCBI IDs—into a unified format based on Ensembl IDs. After removing duplicates, we obtained a final list of 27,874 unique genes.
For each cell, we created a 27,874-dimensional gene expression vector based on the 27,874 selected genes, where the j-th element represents the expression value of gene j in that cell. If a gene was not expressed in the cell, its value was set to 0. A cell was removed if it had fewer than 300 non-zero expression values. We then checked whether the expression values were in raw counts or already log1p normalized73. If an expression value x was in raw count, we applied log1p normalization to it: \(x\leftarrow \log (x/10000+1)\). Next, we adjusted the normalized expression values by magnifying or clipping them so that the maximum value in each cell’s expression vector was 10. If the maximum value in an expression vector exceeded 10, all values greater than 10 were set to 10. If the maximum value was less than 10, each value in the vector was scaled by dividing it by the maximum value and then multiplying by 10. After removing duplicate cells, we retained 48,024,242 unique cells.
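For concreteness, the per-cell normalization and rescaling steps above can be sketched in NumPy (the function name and interface are illustrative, not from the released implementation):

```python
import numpy as np

def preprocess_cell(x, raw_counts=True, target_max=10.0):
    """Illustrative sketch of the per-cell preprocessing described above.

    x: 1-D array of expression values for one cell.
    """
    x = np.asarray(x, dtype=float)
    if raw_counts:
        # log1p normalization as stated in the text: x <- log(x/10000 + 1)
        x = np.log(x / 10000.0 + 1.0)
    m = x.max()
    if m > target_max:
        # clip: values greater than the target maximum are set to it
        x = np.minimum(x, target_max)
    elif m > 0:
        # magnify: rescale so the per-cell maximum equals the target
        x = x / m * target_max
    return x
```

Cells whose maximum normalized value falls below 10 are magnified, and those above 10 are clipped, so every non-empty cell vector attains a maximum of exactly 10.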
In the pretraining dataset, 9.37% of the 48 million cells contain non-zero LEGs, amounting to ~4.5 million cells. Each of these cells has, on average, 2514 non-zero LEGs. Please refer to Supplementary Section 1 for details.
scLong model architecture
Each element in a gene expression vector contains two components: the gene ID and its expression value. scLong employs a gene encoder to generate a representation vector for the gene and an expression encoder to produce a representation vector for the expression value. The final representation for each element is obtained by adding these two vectors. The expression encoder is an MLP that takes the scalar expression value as input and outputs a representation vector. This MLP consists of two layers, with ReLU activation74, and generates a representation vector with a dimension of 200.
The gene encoder first uses Gene2vec72 to obtain an initial 200-dimensional representation for each gene. It then constructs a gene graph based on the GO, which, along with the initial representations, is input into a GCN24 to learn a refined representation for each gene. The gene graph is constructed as follows14: for each gene pair, u and v, we retrieve their annotated GO terms from the GO, denoted as Nu and Nv. GO terms are standardized categories that describe various attributes of genes, focusing on their roles, processes, and locations within a cell. They are organized into three main categories: Molecular Function, Biological Process, and Cellular Component. Each gene can be associated with multiple GO terms, providing a comprehensive view of its functional and spatial characteristics within cellular and molecular systems. We then compute the Jaccard index \({J}_{u,v}=\frac{| {N}_{u}\cap {N}_{v}| }{| {N}_{u}\cup {N}_{v}| }\) between the two sets of GO terms, which quantifies the fraction of shared GO terms and indicates the functional similarity of each gene pair. Using this similarity measure, we construct a graph where each gene is represented as a node, and edges are assigned between gene pairs with high Jaccard index values. Specifically, for each gene u, we select the top 20 genes vi with the highest \({J}_{u,{v}_{i}}\) values and connect them to u.
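The Jaccard-based graph construction can be sketched as follows (a minimal illustration; `go_terms` is a hypothetical mapping from genes to their annotated GO term sets):

```python
import numpy as np

def jaccard(nu, nv):
    """Jaccard index between two sets of GO term annotations."""
    nu, nv = set(nu), set(nv)
    if not nu | nv:
        return 0.0
    return len(nu & nv) / len(nu | nv)

def top_k_neighbors(go_terms, k=20):
    """Connect each gene u to the k genes with the highest Jaccard index J(u, v).

    go_terms: dict mapping gene -> set of GO terms (illustrative structure).
    Returns a set of directed edges (u, v).
    """
    genes = list(go_terms)
    edges = set()
    for u in genes:
        scores = [(jaccard(go_terms[u], go_terms[v]), v) for v in genes if v != u]
        scores.sort(reverse=True)  # highest similarity first
        for _, v in scores[:k]:
            edges.add((u, v))
    return edges
```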
A one-layer GCN is constructed on the gene graph, taking X, a matrix containing initial representation vectors generated by Gene2vec, as input. This GCN learns refined 200-dimensional representations \({{{\bf{X}}}}^{{\prime} }\) for all genes using the following update equation:

$${{{\bf{X}}}}^{{\prime} }={{{\bf{D}}}}^{-1/2}\,\widehat{{{\bf{A}}}}\,{{{\bf{D}}}}^{-1/2}\,{{\bf{X}}}\,{{\mathbf{\Theta }}},$$

where Θ contains the GCN’s weight parameters. Here, \(\widehat{{{\bf{A}}}}={{\bf{A}}}+{{\bf{I}}}\), with A representing the adjacency matrix of the gene graph and I as the identity matrix. D is a diagonal matrix with entries \({D}_{ii}={\sum }_{j=1}^{K}{\widehat{A}}_{ij}\), where K is the total number of genes.
Given the extracted representation vectors for each element in the input gene expression vector, we feed them into self-attention layers13 to learn enhanced representations of these elements. Self-attention computes pairwise correlations between elements, capturing the relationships among them. To balance computational efficiency with representation effectiveness, we employ a large Performer25 encoder and a mini Performer encoder to process elements with varying expression magnitudes. First, we rank the elements in the gene expression vector in descending order of expression values and select the top 4096 with the highest values for processing by the large Performer encoder. This encoder applies self-attention across all 4096 high-expression elements, comprising 42 Performer layers with 32 attention heads and a hidden dimension of 1,280, and produces 200-dimensional output vectors. The remaining K − 4096 elements, where K = 27,874 represents the total number of genes, are processed by a mini Performer encoder tailored for lower expression values. This encoder performs self-attention across all K − 4096 elements. This mini encoder has 2 layers, 8 attention heads, and a hidden dimension of 200, yielding 200-dimensional output representations as well. After processing by these encoders, each element in the input expression vector has a 200-dimensional representation, derived from either the large or mini encoder. These representations are then fed into a full-length Performer encoder, which performs self-attention across all 27,874 elements. This final encoder has 2 layers, 8 attention heads, and a hidden dimension of 200. The resulting representations from the full-length encoder serve as the final outputs of scLong and are utilized for a range of downstream tasks.
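The routing of elements between the large and mini encoders can be sketched as follows (an illustrative helper, not the released code):

```python
import numpy as np

def split_by_expression(x, top_k=4096):
    """Route genes to the large vs. mini encoder by expression magnitude.

    Returns (indices of the top_k highest-expression elements, remaining indices),
    mirroring the descending ranking step described above.
    """
    order = np.argsort(-x, kind="stable")  # indices sorted by descending value
    return order[:top_k], order[top_k:]
```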
scLong pretraining
scLong was pretrained using a masked value reconstruction task. In this approach, 15% of the non-zero values in each input gene expression vector were randomly masked, and the model was trained to predict the masked values based on the unmasked portions of the vector. The 15% masking ratio followed that used in BERT12. Let Mx represent the set of indices corresponding to masked gene expressions in an input gene expression vector x. We create a masked expression vector \({{{\bf{x}}}}^{{\prime} }\) by assigning a special symbol [MASK] to each masked gene while leaving unmasked values intact:

$${x}_{i}^{{\prime} }=\left\{\begin{array}{ll}[{{\rm{MASK}}}],&i\in {M}_{{{\bf{x}}}}\\ {x}_{i},&i\,\notin\, {M}_{{{\bf{x}}}}\end{array}\right.$$
We then obtain a representation vector for each element in \({{{\bf{x}}}}^{{\prime} }\). For unmasked expression values xi, we apply an MLP to generate their representation vectors as previously described. For each [MASK] symbol, we use a learnable representation vector specific to [MASK]. These representation vectors for elements in \({{{\bf{x}}}}^{{\prime} }\) are subsequently fed into the remaining layers of scLong to compute a final representation for each element. Finally, the representation vector corresponding to each masked gene is processed through a gene-specific MLP, producing a scalar representing the reconstructed value for that gene’s masked expression. Let \({\widehat{x}}_{i}\) and xi represent the reconstructed value and the ground truth (pre-masking) value of a masked gene i, respectively. The reconstruction loss is measured as the MSE between \({\widehat{x}}_{i}\) and xi. Pretraining is performed by minimizing the reconstruction loss across the dataset. Let D denote the entire pretraining dataset. The overall pretraining loss is defined as:

$${{\mathcal{L}}}=\frac{1}{| D| }{\sum}_{{{\bf{x}}}\in D}\frac{1}{| {M}_{{{\bf{x}}}}| }{\sum}_{i\in {M}_{{{\bf{x}}}}}{\left({\widehat{x}}_{i}-{x}_{i}\right)}^{2},$$
where ∣ ⋅ ∣ denotes the size of the set.
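A sketch of this masking-and-reconstruction objective (`x_hat_fn` is a hypothetical stand-in for the model, and `np.nan` stands in for the [MASK] symbol):

```python
import numpy as np

def mask_and_loss(x, x_hat_fn, mask_ratio=0.15, rng=None):
    """Mask 15% of non-zero entries, predict them, and compute MSE on the
    masked positions (illustrative sketch of the pretraining objective).

    x_hat_fn maps a masked vector to reconstructed values (assumed interface).
    """
    rng = np.random.default_rng(rng)
    nonzero = np.flatnonzero(x)
    n_mask = max(1, int(round(mask_ratio * nonzero.size)))
    masked_idx = rng.choice(nonzero, size=n_mask, replace=False)
    x_masked = x.copy()
    x_masked[masked_idx] = np.nan  # stand-in for the [MASK] symbol
    x_hat = x_hat_fn(x_masked)
    # MSE over the masked positions only
    return np.mean((x_hat[masked_idx] - x[masked_idx]) ** 2)
```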
During pretraining, we divided the 48 million cells into two sets: 95% for training and 5% for validation. The pretraining process was implemented using PyTorch Distributed Data Parallelism75 and half-precision BFLOAT16 operations. The model was trained across 12 machines, each equipped with 8 A100 GPUs (80GB memory per GPU). Training was conducted over 5 epochs, taking a total of 35 GPU days. The batch size per GPU was set to 1, with a gradient accumulation step size of 200. We employed the Adam76 optimizer with default hyperparameters: β1 = 0.9, β2 = 0.999, ϵ = 10−8, and no weight decay. A cosine annealing scheduler was used to adjust the learning rate. The first cycle step size was 15, with cycle step magnification of 2. The maximum and minimum learning rates for the first cycle were 5 × 10−5 and 10−6, respectively. The linear warmup step size was 5. The max learning rate decreased by a factor of 0.9 in each cycle.
Prediction of transcriptional responses to genetic perturbations
For this task, we used the Norman dataset27, preprocessed with GEARS14, which includes 91,205 cell samples and 5,045 genes. The dataset features 236 perturbation conditions: 105 involving single-gene perturbations, such as “FOXA1” and “HOXB9”, and 131 involving double-gene perturbations, such as “{ZBTB10, SNAI1}”, “{CDKN1A, CDKN1B}”, and “{FOXA1, HOXB9}”. Each double-gene perturbation is a combination of two single-gene perturbations. We adopted the data split used in GEARS, comprising a training set of 58,134 cells, a validation set of 6792 cells, and a test set of 26,279 cells. Each test sample was assigned to one of four categories: (1) neither gene in a double-gene perturbation appears in the training data (Seen 0/2); (2) one gene in a double-gene perturbation is absent from the training data (Seen 1/2); (3) both genes in a double-gene perturbation are present in the training data (Seen 2/2); and (4) the gene in a single-gene perturbation is absent from the training data (Seen 0/1).
Supplementary Fig. 4 illustrates the model architecture used for this downstream task. The input includes a gene expression vector of a cell prior to perturbation and the associated perturbation condition. The output is the gene expression vector of the cell following perturbation. We use the pretrained scLong model to derive a representation vector for each element of the pre-perturbation expression vector, while the GEARS method generates a representation for the perturbation condition. These vectors are then combined and processed through the GEARS decoder to predict the post-perturbation gene expression vector. Specifically, GEARS generates a 200-dimensional representation vector for each single-gene perturbation. For a double-gene perturbation, its representation is obtained by summing the vectors of the two individual single-gene perturbations it comprises. The representation vector for the perturbation condition is added to the representation vector of each element in the gene expression vector extracted by scLong. A ReLU activation is then applied to each dimension of the resulting vectors. Each vector is subsequently passed through a three-layer MLP with hidden dimensions of 200, 400, and 200, followed by a batch normalization77 layer, producing a 200-dimensional post-perturbation representation for each expression element. Finally, each post-perturbation representation vector is processed by a decoder to generate post-perturbation values. The decoder begins with a one-layer MLP, which takes a post-perturbation representation as input and outputs an initial predicted post-perturbation value. Simultaneously, the decoder concatenates the post-perturbation representations of all expression elements, passing this combined vector through a two-layer MLP (with hidden dimensions of 5045 and 200) to produce a 200-dimensional vector. 
This vector is then concatenated with the initial predicted post-perturbation value of each element and fed into another MLP, which outputs an additional scalar prediction for each element. This scalar is then added to the corresponding pre-perturbation expression value to yield the final predicted post-perturbation value.
We assess the discrepancy between predicted and ground-truth post-perturbation expression vectors, denoted as \(\widehat{{{\bf{y}}}}\) and y, respectively, using a composite loss function adopted from GEARS, which includes an autofocus loss and a direction-aware loss. Let c denote the perturbation condition corresponding to y and Sc represent the set of post-perturbation expression vectors in the training data resulting from applying c. Define Zc as the subset of genes that exhibit non-zero expression in at least one vector within Sc. The autofocus loss is defined as follows:

$${{{\mathcal{L}}}}_{{{\rm{autofocus}}}}=\frac{1}{| {Z}_{c}| }{\sum}_{i\in {Z}_{c}}{\left| {\widehat{y}}_{i}-{y}_{i}\right| }^{2+\gamma },$$

where γ is a GEARS hyperparameter that places greater emphasis on genes with larger prediction errors.
Let x denote the pre-perturbation expression vector corresponding to y. The direction-aware loss function assesses the alignment of directional changes between the predicted and actual post-perturbation expressions relative to their pre-perturbation states:

$${{{\mathcal{L}}}}_{{{\rm{direction}}}}=\frac{\lambda }{| {Z}_{c}| }{\sum}_{i\in {Z}_{c}}{\left({{\rm{sign}}}({y}_{i}-{x}_{i})-{{\rm{sign}}}({\widehat{y}}_{i}-{x}_{i})\right)}^{2},$$

with λ a weighting hyperparameter,
where sign( ⋅ ) denotes the sign function, which determines the sign of a real number a:

$${{\rm{sign}}}(a)=\left\{\begin{array}{ll}1,&a \,>\, 0\\ 0,&a=0\\ -1,&a \,<\, 0\end{array}\right.$$
The model’s total loss function is a sum of the autofocus loss and the direction-aware loss. During training, the model aims to minimize this total loss across all data examples.
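Under the assumption that these losses follow the GEARS formulation (the exponent 2 + γ and the weight λ are GEARS hyperparameters; the default values used here are illustrative), they can be sketched as:

```python
import numpy as np

def autofocus_loss(y_hat, y, Zc, gamma=2.0):
    """Autofocus loss over the non-zero gene set Zc (GEARS-style sketch;
    gamma is an assumed hyperparameter value)."""
    d = np.abs(y_hat[Zc] - y[Zc])
    return np.mean(d ** (2.0 + gamma))

def direction_loss(y_hat, y, x, Zc, lam=0.1):
    """Direction-aware loss penalizing mismatched signs of expression change
    relative to the pre-perturbation state x (lam is an assumed weight)."""
    s = np.sign(y[Zc] - x[Zc]) - np.sign(y_hat[Zc] - x[Zc])
    return lam * np.mean(s ** 2)
```

The total training loss is then the sum `autofocus_loss(...) + direction_loss(...)`, averaged over training examples.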
The hyperparameters for our method were mostly the same as those used in GEARS. The hidden dimension of the GEARS encoder and decoder was set to 1024, with ReLU as the activation function. Model weights were optimized using the Adam76 optimizer (β1 = 0.9, β2 = 0.999, ϵ = 10−8, λ = 5 × 10−4, learning rate γ = 10−3). Training was conducted across 4 GPUs, with a batch size of 16 per GPU. With 8 gradient accumulation steps, the effective overall batch size was 16 × 8 × 4 = 512. Our model was trained for 16 epochs, while GEARS was trained for 20 epochs to replicate their reported results. For baseline foundation models with available fine-tuning implementations for this downstream task, such as scGPT and scFoundation, we directly used their provided code. For those without such implementations, we fine-tuned them using the same approach as for scLong. For each method, early stopping was applied when performance on the validation set started to decline.
To evaluate the model’s performance, we employed two metrics: MSE and PCC, focusing on the top 20 DE genes. The set of top 20 DE genes for a given perturbation condition c, denoted as Dc, was identified as the 20 genes with the highest variance across the expression vectors in Sc, the set of post-perturbation expression vectors under condition c. For each expression vector—predicted (\(\widehat{{{\bf{y}}}}\)), ground truth (y), and pre-perturbation (x)—under perturbation condition c, we extracted the subvector corresponding to Dc, denoted as \(\widehat{{{\bf{y}}}}[{D}_{c}]\), y[Dc], and x[Dc], respectively. The MSE on Dc is calculated as:

$${{\rm{MSE}}}({D}_{c})=\frac{1}{| {D}_{c}| }{\left\Vert \widehat{{{\bf{y}}}}[{D}_{c}]-{{\bf{y}}}[{D}_{c}]\right\Vert }_{2}^{2}.$$
The PCC on Dc assesses the correlation between the predicted and actual changes in expression from pre- to post-perturbation:

$${{\rm{PCC}}}({D}_{c})={{\rm{PCC}}}\left(\widehat{{{\bf{y}}}}[{D}_{c}]-{{\bf{x}}}[{D}_{c}],\,{{\bf{y}}}[{D}_{c}]-{{\bf{x}}}[{D}_{c}]\right),$$
where the Pearson Correlation Coefficient PCC(v, u) between two n-dimensional vectors, v and u, is given by:

$${{\rm{PCC}}}({{\bf{v}}},{{\bf{u}}})=\frac{{\sum }_{i=1}^{n}({v}_{i}-\bar{v})({u}_{i}-\bar{u})}{\sqrt{{\sum }_{i=1}^{n}{({v}_{i}-\bar{v})}^{2}}\sqrt{{\sum }_{i=1}^{n}{({u}_{i}-\bar{u})}^{2}}}.$$
In this formula, \(\bar{v}\) and \(\bar{u}\) represent the mean values of v and u, respectively. For each of the four test sample categories—Seen 0/2, Seen 1/2, Seen 2/2, and Seen 0/1—we calculated the MSE(Dc) and PCC(Dc) for each sample. The overall performance for each category was then determined by averaging MSE(Dc) and PCC(Dc) across all samples in that category.
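These two metrics can be sketched with illustrative helpers:

```python
import numpy as np

def pcc(v, u):
    """Pearson correlation coefficient between two vectors."""
    v, u = v - v.mean(), u - u.mean()
    return (v @ u) / (np.linalg.norm(v) * np.linalg.norm(u))

def de_metrics(y_hat, y, x, Dc):
    """MSE on the DE subset Dc, and PCC between predicted and actual
    expression changes (post- minus pre-perturbation), as defined above."""
    mse = np.mean((y_hat[Dc] - y[Dc]) ** 2)
    r = pcc(y_hat[Dc] - x[Dc], y[Dc] - x[Dc])
    return mse, r
```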
In classifying gene interaction types, we utilize a magnitude score14. For a double-gene perturbation {i, j}, scLong’s magnitude score is defined as follows. Let x denote a cell’s pre-perturbation expression vector and \(\widehat{{{\bf{y}}}}\) represent the post-perturbation expression vector predicted by scLong for the same cell under perturbation condition {i, j}. The difference \(\widehat{{{\bf{y}}}}-{{\bf{x}}}\) is considered the perturbation effect of {i, j} for this cell. Repeating this process for all test cells, we calculate the average perturbation effect, denoted as Δi,j. Similarly, we compute the average perturbation effect Δi for single-gene perturbation i and Δj for single-gene perturbation j. We then solve the following equation:

$${\Delta }_{i,j}=\alpha {\Delta }_{i}+\beta {\Delta }_{j}+{{\bf{c}}},$$
where α and β are scalars representing the linear combination coefficients, and c is an offset vector. The magnitude score is then defined as \(\sqrt{{\alpha }^{2}+{\beta }^{2}}\). Hierarchical clustering in Fig. 2e was conducted using the Seaborn library78.
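A sketch of the magnitude-score computation, with the offset c modeled here as a scalar intercept for simplicity (the exact fitting procedure in the original method may differ):

```python
import numpy as np

def magnitude_score(delta_ij, delta_i, delta_j):
    """Fit delta_ij ~ alpha*delta_i + beta*delta_j + c by least squares over
    genes, and return sqrt(alpha^2 + beta^2) as described above."""
    n = delta_ij.shape[0]
    # design matrix columns correspond to alpha, beta, and the intercept c
    M = np.column_stack([delta_i, delta_j, np.ones(n)])
    coef, *_ = np.linalg.lstsq(M, delta_ij, rcond=None)
    alpha, beta = coef[0], coef[1]
    return float(np.sqrt(alpha ** 2 + beta ** 2))
```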
Prediction of transcriptional responses to chemical perturbations
In this task, we used a subset of the L1000 dataset32, which comprises 7 distinct cell lines, 978 genes, and 810 drug compounds, each tested at 6 dosage levels. The prediction model takes two inputs: (1) the index of a perturbed cell line, and (2) the molecular graph and dosage level of the drug used to induce the perturbation. The model’s output is the gene expression profile of the cell line following perturbation. The dataset does not provide pre-perturbation gene expression data. Each example in the L1000 dataset includes these inputs and outputs, with a total of 5005 examples divided into 3965 for training, 544 for validation, and 496 for testing.
Supplementary Fig. 5 illustrates the model architecture for this task. The perturbed cell line is encoded using a 7-dimensional one-hot vector, where each dimension represents one of the 7 cell lines, and each is associated with a 4-dimensional learnable embedding. Similarly, the drug dosage is encoded using a 6-dimensional one-hot vector, with each dimension corresponding to one of the 6 dosages, and each is linked to a 4-dimensional learnable embedding. For each of the 978 genes, scLong extracts a 200-dimensional representation vector as the output of its GCN built on the gene graph. This vector is then passed through a linear projection layer to generate a 512-dimensional representation. We employ a GCN to extract a representation vector for each atom in the input drug molecule graph. The GCN consists of three convolutional layers, each with a hidden dimension of 128. Next, cross-attention13 is applied between genes and drug atoms to capture their interaction patterns. Specifically, the representation vectors of drug atoms extracted by the GCN serve as both the key and value vectors in the cross-attention module, while the gene representation vectors, obtained from scLong and the subsequent linear layer, serve as the query vectors. Prior to cross-attention, the drug atom representations are mapped to 512-dimensional vectors via a linear projection layer to align with the gene representation dimensions. The cross-attention module consists of 2 attention layers, 4 attention heads, and a hidden dimension of 512. Finally, the gene representations from the cross-attention module are integrated with drug, cell line, and dosage information. Specifically, the 512-dimensional representation vector of each gene obtained from cross-attention is concatenated with a 4-dimensional cell line embedding, a 4-dimensional dosage embedding, and a 128-dimensional drug representation vector averaged across the representations of all atoms in the drug. 
The resulting concatenated vector is then passed through a 2-layer MLP to predict the post-perturbation expression for each gene. These two layers have dimensions of 648 and 1, respectively, with ReLU serving as the activation function. The predictions for each gene are concatenated to form the final predicted gene expression vector.
The model was trained by minimizing the MSE between the predicted and ground-truth post-perturbation gene expression vectors, using the Adam optimizer with a learning rate of 2e-4, betas of (0.9, 0.999), epsilon of 1e-8, and no weight decay. A linear learning rate scheduler was used, with 5 warmup epochs and a final learning rate of 2e-5. The training was conducted with a batch size of 16 for a maximum of 100 epochs. The final performance was evaluated on the test set, using the checkpoint that achieved the best validation performance.
We compared our method with Geneformer, scGPT, scFoundation, UCE, and the task-specific DeepCE model, all of which have similar model configurations, differing only in their approaches for gene representation extraction. In DeepCE, gene representations are derived from the STRING protein interaction network59 using the Node2Vec method79. Baseline foundation models extract representations using their pretrained models. Spearman and Pearson correlation coefficients, root mean squared error (RMSE), and precision for top-K positive and negative predictions were used as evaluation metrics. Given predicted and ground truth gene expression vectors x = (x1, …, xn) and y = (y1, …, yn), we first rank the values in each vector in descending order. Let R(x) = (R(x1), …, R(xn)) and R(y) = (R(y1), …, R(yn)) represent the rank positions for values in x and y, respectively. The Spearman correlation (SC) score is defined as: SC(x, y) = ρ(R(x), R(y)), where ρ denotes Pearson correlation. The RMSE between x and y is calculated as:

$${{\rm{RMSE}}}({{\bf{x}}},{{\bf{y}}})=\sqrt{\frac{1}{n}{\sum }_{i=1}^{n}{({x}_{i}-{y}_{i})}^{2}}.$$
To compute the positive precision at K (Pos-P@K), we identify the top-K genes with the highest expression values in y as Gy and in x as Gx. Pos-P@K is defined as:

$$\,{\mbox{Pos-P@K}}\,=\frac{| {G}_{{{\bf{x}}}}\cap {G}_{{{\bf{y}}}}| }{K},$$
reflecting the proportion of genes in the predicted set Gx that are also present in the ground truth set Gy. The negative precision at K (Neg-P@K) is computed analogously, using the lowest-expressed genes in x and y. In our calculations, we set K = 100.
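These evaluation metrics can be sketched as follows (an illustrative implementation; ties in the Spearman ranking are ignored for brevity):

```python
import numpy as np

def spearman(x, y):
    """Spearman correlation: Pearson correlation of descending rank positions."""
    rx = np.argsort(np.argsort(-x))  # rank of each value, 0 = highest
    ry = np.argsort(np.argsort(-y))
    rx, ry = rx - rx.mean(), ry - ry.mean()
    return (rx @ ry) / (np.linalg.norm(rx) * np.linalg.norm(ry))

def rmse(x, y):
    """Root mean squared error between two expression vectors."""
    return float(np.sqrt(np.mean((x - y) ** 2)))

def pos_p_at_k(x, y, k=100):
    """Fraction of the top-k predicted genes also in the top-k ground truth."""
    gx = set(np.argsort(-x)[:k])
    gy = set(np.argsort(-y)[:k])
    return len(gx & gy) / k
```

Neg-P@K is computed analogously by taking the k lowest-expressed genes instead.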
Prediction of cancer drug responses
The dataset36 for this task was created by combining the Cancer Cell Line Encyclopedia80 and the Genomics of Cancer Drug Sensitivity81 datasets, resulting in 561 cancer cell lines and 238 drugs. Each cell line is represented by a bulk gene expression vector of 697 genes. Out of the total 561 × 238 = 133,518 possible (cell line, drug) interaction pairs, ~19.5% (26,072) had missing IC50 values, leaving 107,446 complete pairs. From these, 95% were allocated for training and 5% for testing.
Supplementary Fig. 6 illustrates the model architecture for this task. The input consists of the molecular structure of a potential cancer drug and the bulk gene expression profile of a cancer cell line. The output is a prediction of the drug’s efficacy against the cancer cell line, quantified by its half-maximal inhibitory concentration (IC50) value. We use the scLong model to extract a representation vector from the gene expression data, which is concatenated with the drug molecule representation obtained via a GCN. This combined representation is then passed through a regression module to predict the IC50 value. Specifically, the scLong model processes a 697-dimensional gene expression vector as input, generating a 200-dimensional representation for each gene, resulting in a 697 × 200 matrix. We apply average pooling across the 200 dimensions to reduce this matrix to a 697-dimensional vector. This representation is then passed through a two-layer MLP, with hidden dimensions of 256 and 100, respectively. The output of this MLP is a final 100-dimensional representation that captures the cell line’s features. The GCN consists of 3 convolutional layers, each with a hidden dimension of 100 and using the ReLU activation function. It learns a 100-dimensional representation for each atom in the molecular graph. To obtain a single representation for the entire molecule, average pooling is applied across the atom-level vectors. The regression module comprises a one-layer MLP with a hidden dimension of 300, followed sequentially by a convolutional neural network (CNN) with 3 layers and a final linear layer that outputs a scalar. The CNN layers have filter numbers and kernel sizes of (30, 150), (10, 5), and (5, 5), respectively. The scalar from the linear layer is passed through a sigmoid function to predict the IC50 value. To prevent overfitting, a dropout82 rate of 0.1 is applied to all layers.
The model was trained using a MSE loss function, which measures the difference between the predicted and ground truth IC50 values. We optimized the model parameters using the Adam optimizer76, with β1 = 0.9, β2 = 0.999, and ϵ = 10−8. The learning rate was set to a fixed value of 0.001, without the use of a learning rate scheduler. Training was performed for up to 500 epochs with a batch size of 64, using early stopping based on validation loss.
We compared scLong with other foundation models, including Geneformer, scGPT, scFoundation, and UCE. We also evaluated its performance against task-specific models, such as the deep neural network DeepCDR36 and a linear model36,37. The only difference between our method and DeepCDR as well as the baseline foundation models lies in how the representation vectors are extracted from the input gene expression data; the rest of the model architecture and hyperparameter settings remain identical. In DeepCDR, the raw gene expression vector is directly fed into the regression module without learning additional representations. For foundation models, we used the pretrained model to extract a representation from the gene expression vector before passing it to the regression module. We used Pearson correlation, defined similarly to Equation (7), as the evaluation metric.
Prediction of cancer cell line responses to synergistic drug combinations
In this task, each data sample consists of a bulk gene expression vector from a cancer cell line, a pair of drugs, and a binary label indicating whether the drug combination is effective against the cell line. The training dataset, sourced from39, includes 12,415 examples, spanning 36 anti-cancer drugs and 31 human cancer cell lines. The test dataset, obtained from an independent source40, contains 668 examples, covering 57 drugs and 24 cancer cell lines. Both datasets include gene expression values for 954 genes.
Supplementary Fig. 7 illustrates the model architecture for this task, which closely resembles the architecture used for single-drug response prediction (Supplementary Fig. 6). First, the 954-dimensional gene expression vector is processed by the scLong model, yielding a (954, 200) representation matrix. After applying average pooling across the 200 dimensions, we obtain a 954-dimensional representation vector. This vector is then passed through a 3-layer MLP with hidden dimensions of 512, 256, and 128, using the ReLU activation function. For the drug pair input, two separate 3-layer GCNs are employed to generate a representation for each drug. The GCN layers use ReLU as the activation function, with hidden dimensions of 1024, 512, and 156, respectively. The representations of the two drugs are concatenated with the gene expression representation obtained from the MLP. The combined representation is then passed through another 3-layer MLP to predict the binary output label, with ReLU activation and hidden dimensions of 1024, 512, and 128, respectively.
The model was optimized using cross-entropy loss with the Adam optimizer76. The learning rate was set to 1e-4, with β1 = 0.9, β2 = 0.999, ϵ = 1e − 08, and no weight decay. We used a linear learning rate scheduler with 50 warmup epochs and a final learning rate of 1e-5. The model was trained with a batch size of 256 for 1000 epochs. The baseline method, DeepDDS41, directly used raw gene expression vectors as the representations of cell lines, without learning additional latent representations. For the baseline foundation models, including Geneformer, scGPT, scFoundation, and UCE, the pretrained models were used to extract representations from the gene expression vectors. All other model settings and hyperparameters were kept consistent with our method. The performance of the models in this binary classification task was evaluated using the AUROC score.
Inference of gene regulatory networks
The input for this task consists of gene expression vectors from a collection of cells, with the output being a GRN represented as an adjacency matrix. In this matrix, each element indicates the interaction strength between two genes. We utilized gene expression data from Nc = 758 human embryonic stem cells (hESCs)44, covering Ng = 17,735 genes.
Supplementary Fig. 8 presents the model architecture used for this task. Initially, we applied the pretrained scLong model to extract a Ng × 200 representation matrix from each cell’s gene expression vector, where each row is a 200-dimensional representation vector of gene expression elements. Aggregating these matrices across cells yields an Nc × Ng × 200 tensor. By averaging over the last dimension of this tensor, we obtain a Nc × Ng matrix, where each column serves as a new representation vector for a gene across all cells. We then compute an Ng × Ng adjacency matrix A by calculating the cosine similarity between the Nc-dimensional representation vectors of each pair of genes, capturing gene-gene relationships. Following DeepSEM15, we use the preliminary adjacency matrix A as an initialization for further refinement with a beta variational autoencoder (beta-VAE)46. The beta-VAE framework uses A to model gene expression vectors and consists of a probabilistic encoder and decoder. The encoder defines a conditional distribution p(z∣x), where x is a gene expression vector and z is a 128-dimensional latent vector representing x. The decoder defines a conditional distribution p(x∣z). Both distributions are parameterized as multivariate Gaussians, with the mean and covariance for p(z∣x) computed by an encoder network receiving x as input, and for p(x∣z), by a decoder network taking z as input. The encoder network comprises a three-layer MLP followed by a GRN layer parameterized by A, while the decoder network includes a reverse GRN layer also parameterized by A, followed by a three-layer MLP. The GRN layer applies a linear transformation with parameters I − A, and the reverse GRN layer applies another linear transformation with parameters (I−A)−1.
The total loss for the beta-VAE, a weighted sum of reconstruction and KL divergence losses, optimizes the encoder and decoder MLPs and refines the adjacency matrix A to produce the final inferred GRN, leveraging the reparameterization trick83. Both the encoder and decoder MLPs have a hidden dimension of 128 and use Tanh as the activation function.
The model was optimized using the RMSProp optimizer84 with a learning rate of 2e-5 for the adjacency matrix and 1e-4 for other parameters. We set α = 0.99, ϵ = 1e − 08, weight decay to 0, and momentum to 0. A linear learning rate scheduler was applied with a step size of 0.99 and γ = 0.95. Training was conducted with a batch size of 64 over 120 epochs.
We evaluated scLong against several foundation models, including Geneformer, scGPT, scFoundation, and UCE, as well as task-specific approaches such as DeepSEM and GENIE348. For comparison, we also included a straightforward baseline, GO Graph, which constructs the GRN for the 17,735 genes by directly using the corresponding subgraph of the gene graph (Fig. 1a) derived from the GO. The primary difference between DeepSEM and foundation models lies in how each approach initializes the adjacency matrix A. DeepSEM initializes A with a uniform distribution in the range (0, 2e-4). Foundation models initialize A using cosine similarity between expression representations derived from their pretrained models. To evaluate these methods, we used a GRN derived from ChIP-Seq85 as the ground truth. This network comprises 2762 nodes representing genes, 436,563 edges representing gene interactions, and includes 487 transcription factors (TFs). We compared the GRNs inferred by different methods against this ground-truth GRN. Given the high sparsity of the ground-truth network—only a small fraction of gene pairs exhibit regulatory interactions (436,563 out of a possible 17,735 × 17,735 pairs)—we employed EPR44 and AUPR44 as evaluation metrics. EPR is defined as the ratio between the Pos-P@K score of an inferred adjacency matrix (as previously specified) and that of a random predictor. Here, we set K to 436,563, which corresponds to the number of edges in the ground truth GRN. Specifically, the edges with the top-K values from the inferred adjacency matrix A form an edge set Ep, and we calculate the proportion of these edges that also appear in the ground truth edge set Eg using the formula \(\frac{1}{K}| {E}_{p}\cap {E}_{g}|\). The random predictor estimates the presence of an edge between a gene pair with a probability of p = 436,563/(17,735 × 17,735), which represents the ratio of edges in the ground truth GRN to the total possible edges.
The Pos-P@K score for the random predictor is thus p. AUPR is computed as the ratio of the area under the precision-recall curve (AUPRC) for the inferred adjacency matrix to that of the random predictor, where the random predictor’s AUPRC equals p.
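The EPR definition above reduces to a few lines of array code. The sketch below is a minimal NumPy version (function names and the toy inputs are illustrative, not the authors' implementation): take the top-K entries of the inferred adjacency matrix, compute Pos-P@K against the ground-truth edge set, and divide by the random predictor's score p.

```python
import numpy as np

def epr(A, true_edges, n_genes, k):
    """Early precision ratio: Pos-P@K of inferred adjacency A over a random predictor.

    A:          (n_genes, n_genes) array of inferred edge scores.
    true_edges: set of (i, j) ground-truth edges (E_g).
    k:          number of top-scoring edges to keep (here, |E_g|).
    """
    top_flat = np.argsort(A, axis=None)[::-1][:k]               # indices of top-K scores
    pred_edges = {tuple(np.unravel_index(i, A.shape)) for i in top_flat}
    pos_p_at_k = len(pred_edges & true_edges) / k               # (1/K) |E_p ∩ E_g|
    p_random = len(true_edges) / (n_genes * n_genes)            # random predictor's score
    return pos_p_at_k / p_random
```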
Zero-shot batch integration
For the zero-shot batch integration task, we used the widely adopted pancreas dataset49, which comprises M = 16,382 cells and N = 19,093 genes across six batches, each generated using a distinct scRNA-seq technology. The gene expression vector \({{\bf{x}}}\in {{\mathbb{R}}}^{N}\) of each cell is processed by the pretrained scLong model to produce a cell-level embedding. Let \({{\bf{E}}}\in {{\mathbb{R}}}^{N\times D}\) denote the contextualized representations produced by the full-length Performer encoder (Fig. 1a) for the N genes, where D = 200 is the embedding dimension for each gene. Let \({{{\bf{x}}}}^{{\prime} }\in {{\mathbb{R}}}^{N}\) denote the reconstructed gene expression vector generated by the model’s MLP decoder (Fig. 1b). We concatenate E and \({{{\bf{x}}}}^{{\prime} }\) to form a matrix \({{\bf{V}}}\in {{\mathbb{R}}}^{N\times (D+1)}\). We then apply sum pooling across the rows of V to obtain the final cell representation \({{\bf{v}}}={\sum }_{i=1}^{N}{{{\bf{V}}}}_{i}\in {{\mathbb{R}}}^{D+1}\), where \({{{\bf{V}}}}_{i}\) denotes the i-th row of V.
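In code, this concatenate-and-pool step amounts to the following (a minimal NumPy sketch; the function name is ours):

```python
import numpy as np

def cell_embedding(E, x_recon):
    """Build a single cell representation from per-gene outputs.

    E:       (N, D) contextualized gene embeddings from the full-length encoder.
    x_recon: (N,)   reconstructed expression values from the MLP decoder.
    Returns a (D + 1,) vector: sum pooling over the N rows of [E | x_recon].
    """
    V = np.concatenate([E, x_recon[:, None]], axis=1)  # (N, D + 1)
    return V.sum(axis=0)                               # sum pooling across genes
```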
Ablation studies of scLong
We conducted ablation studies on downstream tasks to evaluate the impact of two key components of scLong: (1) modeling LEGs and (2) integrating the GO graph. To evaluate the contribution of LEGs, we compared the full scLong model—which incorporates both high-expression genes and LEGs—with two ablated variants. In the first variant, w/o LEG, we excluded LEGs entirely, retaining only the top 4096 high-expression genes per cell. Correspondingly, the mini Performer encoder responsible for processing LEGs was removed. The output of the large Performer encoder, which processes the 4096 high-expression genes, was fed directly into the subsequent full-length Performer encoder. In the second variant, Random LEG, we preserved the mini Performer encoder’s architecture (2 layers, 8 attention heads, hidden dimension of 200) but replaced its learned weights with values drawn independently from a normal distribution, i.e., each parameter was sampled from \({{\mathcal{N}}}(0,1)\).
For the second study evaluating the impact of the GO graph, we compared the full scLong model against two ablated variants. In the w/o GO setting, we removed the GO graph and its associated GCN, using only Gene2Vec embeddings as the final gene representations, which were directly added to the expression representations. In the Random GO setting, we replaced the GO graph with a randomly generated graph. Specifically, each edge weight in the GO graph was randomized by sampling from a uniform distribution, i.e., each edge weight was sampled from \({{\mathcal{U}}}(0,1)\).
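Both randomized ablations amount to resampling tensors while keeping shapes fixed. A hedged sketch, with parameter tensors and edge weights simplified to NumPy arrays and dictionaries (container names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def randomize_weights(params):
    """Random LEG ablation: replace each learned tensor with N(0, 1) samples,
    preserving its shape."""
    return {name: rng.standard_normal(w.shape) for name, w in params.items()}

def randomize_edges(edge_weights):
    """Random GO ablation: resample every edge weight from U(0, 1)."""
    return {edge: rng.uniform(0.0, 1.0) for edge in edge_weights}
```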
These studies were conducted across three tasks: predicting transcriptional outcomes of genetic perturbations, inferring GRNs, and zero-shot batch integration. For each task, the experimental setup—including model hyperparameters, optimizer, learning rate scheduler, and so on—was kept consistent with the original scLong configuration.
Clustering of gene representations extracted by scLong
For each cell type in the Zheng68K dataset55, which includes 11 cell types, 65,000 cells and 20,000 genes, we randomly sampled 50 cells. Using the pretrained scLong model, we extracted representations for each cell’s gene expression elements. To generate an overall representation vector for each gene, we averaged its representations across the 50 sampled cells. We then calculated the cosine similarity between each gene pair, where the cosine similarity of two vectors, x and y, is given by \(\frac{{{\bf{x}}}\cdot {{\bf{y}}}}{\parallel {{\bf{x}}}\parallel \parallel {{\bf{y}}}\parallel }\), with x ⋅ y = ∑ixiyi as the dot product and \(\parallel {{\bf{x}}}\parallel=\sqrt{{{\bf{x}}}\cdot {{\bf{x}}}}\) as the L2 norm. Cosine similarity values range from -1 to 1. We selected the 50 genes with the highest cumulative similarity scores for further analysis. Hierarchical clustering was then performed on the similarity matrix using the clustermap function from the Seaborn78 library. The Zheng68K dataset includes known marker genes identified from prior studies for each cell type55. Marker genes, or cell-type-specific genes, are typically expressed at high levels in a specific cell type and at low levels in others86. These genes are essential for manual or semi-supervised cell classification87,88, and the dataset provider used them to classify cells into the 11 defined types.
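The similarity computation and gene-selection step above can be sketched as follows (a minimal NumPy version producing the matrix that clustermap would consume; function names are ours):

```python
import numpy as np

def cosine_similarity_matrix(G):
    """Pairwise cosine similarity for gene vectors G of shape (n_genes, dim)."""
    norms = np.linalg.norm(G, axis=1, keepdims=True)
    Gn = G / np.clip(norms, 1e-12, None)  # L2-normalize rows; guard zero vectors
    return Gn @ Gn.T                      # entry (i, j) = cos(G_i, G_j)

def top_genes_by_similarity(S, k=50):
    """Indices of the k genes with the highest cumulative similarity scores."""
    return np.argsort(S.sum(axis=1))[::-1][:k]
```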
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The pretraining datasets were collected from public datasets hosted on CELLxGENE (https://cellxgene.cziscience.com/datasets), Cell Blast (https://cblast.gao-lab.org/), and the Human Cell Atlas (https://www.humancellatlas.org/). The datasets used for downstream tasks are accessible from the following links: genetic perturbation dataset (https://github.com/snap-stanford/GEARS); chemical perturbation dataset (https://github.com/njpipeorgan/L1000-bayesian); single drug and drug combination response datasets (https://github.com/kimmo1019/DeepCDR and https://github.com/Sinwang404/DeepDDS); GRN inference datasets (https://github.com/HantaoShu/DeepSEM); zero-shot batch integration dataset (https://figshare.com/articles/dataset/Benchmarking_atlas-level_data_integration_in_single-cell_genomics_-_integration_task_datasets_Immune_and_pancreas_/12420968); and marker gene clustering dataset (https://zenodo.org/records/3357167). The datasets curated and utilized in this study, trained model parameters, and other files necessary to reproduce the experimental results, figures, and tables can be accessed at https://mbzuaiac-my.sharepoint.com/:f:/g/personal/ding_bai_mbzuai_ac_ae/EpvKzQW4hI5Bnb88-iM7vE0B_e2_U5r_ZGXb_FILCLTw3Q and https://figshare.com/account/articles/30105148. Source data are provided with this paper.
Code availability
The source code for this work is available at https://github.com/BaiDing1234/scLong and is archived at https://zenodo.org/records/1751056789.
References
The Tabula Muris Consortium. A single-cell transcriptomic atlas characterizes ageing tissues in the mouse. Nature 583, 590–595 (2020).
Domcke, S. et al. A human cell atlas of fetal chromatin accessibility. Science 370, eaba7612 (2020).
Han, X. et al. Construction of a human cell landscape at single-cell level. Nature 581, 303–309 (2020).
Grün, D. et al. Single-cell messenger RNA sequencing reveals rare intestinal cell types. Nature 525, 251–255 (2015).
Jin, S. et al. Inference and analysis of cell-cell communication using CellChat. Nat. Commun. 12, 1088 (2021).
Gasperini, M. et al. A genome-wide framework for mapping gene regulation via cellular genetic screens. Cell 176, 377–390.e19 (2019).
Yang, F. et al. scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data. Nat. Mach. Intell. 4, 852–866 (2022).
Theodoris, C. V. et al. Transfer learning enables predictions in network biology. Nature 618, 616–624 (2023).
Cui, H. et al. scGPT: toward building a foundation model for single-cell multi-omics using generative AI. Nat. Methods 21, 1470–1480 (2024).
Hao, M. et al. Large scale foundation model on single-cell transcriptomics. Nat. Methods 21, 1481–1491 (2024).
Yang, X. et al. GeneCompass: deciphering universal gene regulatory mechanisms with a knowledge-informed cross-species foundation model. Cell Res. https://doi.org/10.1038/s41422-024-01034-y (2024).
Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. In Proc. 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (eds Burstein, J., Doran, C. & Solorio, T.) 4171–4186. https://aclanthology.org/N19-1423 (Association for Computational Linguistics, Minneapolis, Minnesota, 2019)
Vaswani, A. et al. Attention is all you need. In Advances in Neural Information Processing Systems Vol. 30 (eds Guyon, I. et al.). https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf (Curran Associates, Inc., 2017).
Roohani, Y., Huang, K. & Leskovec, J. Predicting transcriptional outcomes of novel multigene perturbations with GEARS. Nat. Biotechnol. 42, 927–935 (2024).
Shu, H. et al. Modeling gene regulatory networks using neural network architectures. Nat. Comput. Sci. 1, 491–501 (2021).
Rosen, Y. et al. Universal cell embeddings: a foundation model for cell biology. bioRxiv https://doi.org/10.1101/2023.11.28.568918 (2023).
Sha, Y., Phan, J. H. & Wang, M. D. Effect of low-expression gene filtering on detection of differentially expressed genes in RNA-seq data. In Proc. 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) 6461–6464 https://api.semanticscholar.org/CorpusID:11532112 (2015).
Zhao, H. et al. Lowly-expressed lncRNA GAS5 facilitates progression of ovarian cancer through targeting miR-196-5p and thereby regulating HOXA5. Gynecol. Oncol. 151, 345–355 (2018).
Yang, L., Takuno, S., Waters, E. R. & Gaut, B. S. Lowly expressed genes in Arabidopsis thaliana bear the signature of possible pseudogenization by promoter degradation. Mol. Biol. Evol. 28, 1193–1203 (2011).
Zhou, Z. et al. Codon usage is an important determinant of gene expression levels largely through its effects on transcription. Proc. Natl. Acad. Sci. 113, E6117–E6125 (2016).
Huang, M. et al. SAVER: gene expression recovery for single-cell RNA sequencing. Nat. Methods 15, 539–542 (2018).
Ashburner, M. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25–29 (2000).
Hamilton, W., Ying, Z. & Leskovec, J. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems Vol. 30 (eds Guyon, I. et al.). https://proceedings.neurips.cc/paper_files/paper/2017/file/5dd9db5e033da9c6fb5ba83c7a7ebea9-Paper.pdf (Curran Associates, Inc., 2017).
Wu, F. et al. Simplifying graph convolutional networks. In Proc. 36th International Conference on Machine Learning 6861–6871 (PMLR, 2019).
Choromanski, K. M. et al. Rethinking attention with performers. In International Conference on Learning Representations https://openreview.net/forum?id=Ua6zuk0WRH (2021).
Dixit, A. et al. Perturb-seq: dissecting molecular circuits with scalable single-cell RNA profiling of pooled genetic screens. Cell 167, 1853–1866 (2016).
Norman, T. M. et al. Exploring genetic interaction manifolds constructed from rich single-cell phenotypes. Science 365, 786–793 (2019).
Ahlmann-Eltze, C., Huber, W. & Anders, S. Deep learning-based predictions of gene perturbation effects do not yet outperform simple linear methods. Nat. Methods 22, 1657–1661 (2025).
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B (Methodol.) 57, 289–300 (1995).
Al-Lazikani, B., Banerji, U. & Workman, P. Combinatorial drug therapy for cancer in the post-genomic era. Nat. Biotechnol. 30, 679–692 (2012).
Subramanian, A. et al. A next generation connectivity map: L1000 platform and the first 1,000,000 profiles. Cell 171, 1437–1452.e17 (2017).
Pham, T.-H., Wang, Y., Xu, J. & Zhang, P. A deep learning framework for high-throughput mechanism-driven phenotype compound screening and its application to COVID-19 drug repurposing. Nat. Mach. Intell. 3, 247–257 (2021).
Kuenzi, B. M. et al. Predicting drug response and synergy using a deep learning model of human cancer cells. Cancer Cell 38, 672–684.e6 (2020).
Unger, F. T., Witte, I. & David, K. A. Prediction of individual response to anticancer therapy: historical and future perspectives. Cell. Mol. Life Sci. 72, 729–757 (2015).
Kipf, T. N. & Welling, M. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR). https://openreview.net/forum?id=SJU4ayYgl (2017).
Liu, Q., Hu, Z., Jiang, R. & Zhou, M. DeepCDR: a hybrid graph convolutional network for predicting cancer drug response. Bioinformatics 36, i911–i918 (2020).
Pedregosa, F. et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Zhao, S. et al. Systems pharmacology of adverse event mitigation by drug combinations. Sci. Transl. Med. 5, 206ra140 (2013).
O’Neil, J. et al. An unbiased oncology compound screen to identify novel combination strategies. Mol. Cancer Ther. 15, 1155–1162 (2016).
Menden, M. P. et al. Community assessment to advance computational prediction of cancer drug combinations in a pharmacogenomic screen. Nat. Commun. 10, 1–17 (2019).
Wang, J., Liu, X., Shen, S., Deng, L. & Liu, H. DeepDDS: deep graph neural network with attention mechanism to predict synergistic drug combinations. Brief. Bioinform. 23, bbab390 (2021).
Berthelot, C., Villar, D., Horvath, J. E., Odom, D. T. & Flicek, P. Complexity and conservation of regulatory landscapes underlie evolutionary resilience of mammalian gene expression. Nat. Ecol. Evol. 2, 152–163 (2018).
Thompson, D., Regev, A. & Roy, S. Comparative analysis of gene regulatory networks: from network reconstruction to evolution. Annu. Rev. Cell Dev. Biol. 31, 399–428 (2015).
Pratapa, A., Jalihal, A. P., Law, J. N., Bharadwaj, A. & Murali, T. M. Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data. Nat. Methods 17, 147–154 (2020).
Chu, L.-F. et al. Single-cell RNA-seq reveals novel regulators of human embryonic stem cell differentiation to definitive endoderm. Genome Biol. 17, 1–20 (2016).
Higgins, I. et al. beta-VAE: learning basic visual concepts with a constrained variational framework. In International Conference on Learning Representations. https://openreview.net/forum?id=Sy2fzU9gl (2017).
Oki, S. et al. ChIP-Atlas: a data-mining suite powered by full integration of public ChIP-seq data. EMBO Rep. 19, e46255 (2018).
Huynh-Thu, V. A., Irrthum, A., Wehenkel, L. & Geurts, P. Inferring regulatory networks from expression data using tree-based methods. PLoS ONE 5, e12776 (2010).
Luecken, M. D. et al. Benchmarking atlas-level data integration in single-cell genomics. Nat. Methods 19, 41–50 (2022).
Hie, B. et al. Computational methods for single-cell RNA sequencing. Annu. Rev. Biomed. Data Sci. 3, 339–364 (2020).
Argelaguet, R., Cuomo, A. S., Stegle, O. & Marioni, J. C. Computational principles and challenges in single-cell data integration. Nat. Biotechnol. 39, 1202–1215 (2021).
Heumos, L. et al. Best practices for single-cell analysis across modalities. Nat. Rev. Genet. 24, 550–572 (2023).
Kedzierska, K. Z., Crawford, L., Amini, A. P. & Lu, A. X. Zero-shot evaluation reveals limitations of single-cell foundation models. Genome Biol.26, 101 (2025).
Lopez, R., Regier, J., Cole, M. B., Jordan, M. I. & Yosef, N. Deep generative modeling for single-cell transcriptomics. Nat. Methods 15, 1053–1058 (2018).
Zheng, G. X. et al. Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 8, 14049 (2017).
Bai, D., Ellington, C. N., Mo, S., Song, L. & Xing, E. P. Attentionpert: accurately modeling multiplexed genetic perturbations with multi-scale effects. Bioinformatics 40, i453–i461 (2024).
Geeleher, P., Cox, N. J. & Huang, R. S. Clinical drug response can be predicted using baseline gene expression levels and in vitro drug sensitivity in cell lines. Genome Biol. 15, 1–12 (2014).
Fabregat, A. et al. The reactome pathway knowledgebase. Nucleic Acids Res. 46, D649–D655 (2018).
Szklarczyk, D. et al. The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res. 49, D605–D612 (2020).
Buenrostro, J. D., Wu, B., Litzenburger, U., Greenleaf, W. J. & Chang, H. Y. Single-cell chromatin accessibility reveals principles of regulatory variation. Nature 523, 486–490 (2015).
Selvaraju, R. R. et al. Grad-CAM: visual explanations from deep networks via gradient-based localization. Int. J. Comput. Vis. 128, 336–359 (2016).
Han, S., Pool, J., Tran, J. & Dally, W. J. Learning both weights and connections for efficient neural network. In Neural Information Processing Systems. https://api.semanticscholar.org/CorpusID:2238772 (2015).
Hinton, G., Vinyals, O. & Dean, J. Distilling the knowledge in a neural network. In NIPS Deep Learning and Representation Learning Workshop. http://arxiv.org/abs/1503.02531 (2015).
Domínguez Conde, C. et al. Cross-tissue immune cell analysis reveals tissue-specific features in humans. Science 376, eabl5197 (2022).
Yazar, S. et al. Single-cell eQTL mapping identifies cell type–specific genetic control of autoimmune disease. Science 376, eabf3041 (2022).
The Tabula Sapiens Consortium et al. The Tabula Sapiens: a multiple-organ, single-cell transcriptomic atlas of humans. Science 376, eabl4896 (2022).
Sikkema, L. et al. An integrated cell atlas of the lung in health and disease. Nat. Med. 29, 1563–1577 (2023).
Perez, R. K. et al. Single-cell RNA-seq reveals cell type–specific molecular and genetic associations to lupus. Science 376, eabf1970 (2022).
Cao, Z.-J., Wei, L., Lu, S., Yang, D.-C. & Gao, G. Searching large-scale scRNA-seq databases via unbiased cell embedding with Cell BLAST. Nat. Commun. 11, 3458 (2020).
Lindeboom, R. G., Regev, A. & Teichmann, S. A. Towards a human cell atlas: taking notes from the past. Trends Genet. 37, 625–630 (2021).
Harrison, P. W. et al. Ensembl 2024. Nucleic Acids Res. 52, D891–D899 (2023).
Du, J. et al. Gene2vec: distributed representation of genes based on co-expression. BMC Genom. 20, 82 (2019).
Booeshaghi, A. S. & Pachter, L. Normalization of single-cell RNA-seq counts by log(x + 1) or log(1 + x). Bioinformatics 37, 2223–2224 (2021).
Maas, A. L. et al. Rectifier nonlinearities improve neural network acoustic models. In Proc. ICML Vol. 30, 3 (PMLR, Atlanta, GA, 2013).
Li, S. et al. Pytorch distributed: experiences on accelerating data parallel training. Proc. VLDB Endow. 13, 3005–3018 (2020).
Kingma, D. P. & Ba, J. Adam: A method for stochastic optimization. In Proc. International Conference on Learning Representations (ICLR). https://arxiv.org/abs/1412.6980 (2015).
Ioffe, S. & Szegedy, C. Batch normalization: accelerating deep network training by reducing internal covariate shift. In Proc. 32nd International Conference on Machine Learning (ICML-15), 448–456 (PMLR, 2015).
Waskom, M. L. Seaborn: statistical data visualization. J. Open Source Softw. 6, 3021 (2021).
Grover, A. & Leskovec, J. node2vec: Scalable feature learning for networks. In Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Association for Computing Machinery, 2016).
Barretina, J. et al. The cancer cell line encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607 (2012).
Iorio, F. et al. A landscape of pharmacogenomic interactions in cancer. Cell 166, 740–754 (2016).
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014).
Kingma, D. P. & Welling, M. Auto-Encoding Variational Bayes. In 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings. http://arxiv.org/abs/1312.6114v10 (2014).
Hinton, G. Lecture 6e rmsprop: divide the gradient by a running average of its recent magnitude. https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf (2012).
Barski, A. et al. High-resolution profiling of histone methylations in the human genome. Cell 129, 823–837 (2007).
Qiu, Y., Wang, J., Lei, J. & Roeder, K. Identification of cell-type-specific marker genes from co-expression patterns in tissue samples. Bioinformatics 37, 3228–3234 (2021).
Pliner, H. A., Shendure, J. & Trapnell, C. Supervised classification enables rapid annotation of cell atlases. Nat. Methods 16, 983–986 (2019).
Zhang, Z. et al. SCINA: a semi-supervised subtyping algorithm of single cells and bulk samples. Genes 10, 531 (2019).
Bai, D. et al. BaiDing1234/scLong: scLong v1.0 https://doi.org/10.5281/zenodo.17510567 (2025).
Acknowledgements
P.X. acknowledges funding support from NIH R35GM157217, NSF IIS2405974, and NSF IIS2339216. E.X. acknowledges funding support from NSF CNS2414087, NSF BCS2040381, NSF IIS2123952, NSF IIS1955532, NSF IIS2311990, NIH R01GM140467, NGA HM04762010002, SRC AIHW award 2024AH3210, NIGMS R01GM140467, and DARPA ECOLE HR00112390063.
Author information
Authors and Affiliations
Contributions
D.B., S.M., and R.Z. contributed to conceptualization, methodology, software, investigation, analysis, writing-original draft, and writing-editing. Y.L. contributed to conceptualization, methodology, and software. J.G., J.Y., Q.W., H.R., T.A., D.G., S.Z., N.L., W.W., and T.I. contributed to investigation, analysis, and writing-editing. P.X. and E.X. contributed to conceptualization, methodology, investigation, analysis, writing-original draft, and writing-editing.
Corresponding authors
Ethics declarations
Competing interests
T.I. is a cofounder, member of the advisory board and has an equity interest in Data4Cure and Serinus Biosciences. T.I. is a consultant for and has an equity interest in IDEAYA Biosciences. The terms of these arrangements for T.I. have been reviewed and approved by the University of California, San Diego, in accordance with its conflict of interest policies. The remaining authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Source data
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Bai, D., Mo, S., Zhang, R. et al. scLong: a billion-parameter foundation model for capturing long-range gene context in single-cell transcriptomics. Nat Commun 17, 2380 (2026). https://doi.org/10.1038/s41467-026-69102-y
Received:
Accepted:
Published:
Version of record:
DOI: https://doi.org/10.1038/s41467-026-69102-y