Abstract
Sequence-to-function models that predict gene expression from genomic DNA sequence have proven valuable for many biological tasks, including understanding cis-regulatory syntax and interpreting noncoding genetic variation. However, current state-of-the-art models are trained largely on bulk expression profiles from healthy tissues or cell lines and have not learned the properties of precise cell types and states that are captured in large-scale single-cell transcriptomic datasets. Thus, they cannot perform these tasks at the resolution of specific cell types or states. Here we present Decima, a model that predicts the cell type- and condition-specific expression of a gene from its surrounding DNA sequence. Decima is trained on single-cell or single-nucleus RNA sequencing data from over 22 million cells and successfully predicts the cell-type-specific expression of unseen genes. We demonstrate Decima’s ability to reveal cis-regulatory mechanisms driving cell-type-specific gene expression and its changes in disease, predict noncoding-variant effects at cell type resolution and design context-specific regulatory DNA elements.
Data availability
Publicly available sc/snRNA-seq count matrices were downloaded from the following sources. SCimilarity (sc/snRNA-seq): Individual datasets were downloaded and prepared as described in ref. 15. Brain Cell Atlas (snRNA-seq): https://www.braincellatlas.org/dataSet. Human skin atlas: https://singlecell.broadinstitute.org/single_cell/study/SCP2738. Human retina atlas: https://cellxgene.cziscience.com/collections/4c6eaf5c-6d57-4c76-b1e9-60df8c655f1e. Genome annotation files containing gene and exon coordinates were obtained via CellRanger at https://www.10xgenomics.com/support/software/cell-ranger/latest and via the National Center for Biotechnology Information (NCBI) gene database at https://www.ncbi.nlm.nih.gov/gene. sc-eQTL variants were obtained from the EBI eQTL catalog (accession no. QTS000038). Sources for all GWAS variant datasets are presented in Supplementary Table 5. A reference set of fine-mapped eQTLs was obtained from the September 2022 release of Open Targets at https://ftp.ebi.ac.uk/pub/databases/opentargets/genetics/22.09/. Model weights and predictions made by Decima for all genes and variants are available via Zenodo at https://doi.org/10.5281/zenodo.18142522 (ref. 65).
Code availability
Decima is available via GitHub at https://github.com/Genentech/decima. Data were processed using Scanpy v1.10.2, AnnData v0.10.8 and pandas v2.1.4. The models used in this paper were trained and applied using Decima version 0.1, PyTorch v2.2.2, PyTorch Lightning v2.4.0, wandb v0.17 and Python v3.11.9. Models were trained on a single NVIDIA A100 GPU with CUDA 12.1. Analyses of the model’s predictions used modisco-lite v2.2.1, MEME suite v0.5.5 and gimmemotifs v0.18.0. Fine-mapping was performed using DENTIST v0.9.2.1, PolyFun v1.0.0 and susieR v0.11. Code used to process data, train models and perform all analyses in this paper is available via GitHub at https://github.com/Genentech/decima-applications and via Zenodo at https://doi.org/10.5281/zenodo.18142522 (ref. 65). A Snakemake-based pipeline for GWAS fine-mapping is available upon request.
References
Zhou, J. et al. Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk. Nat. Genet. 50, 1171–1179 (2018).
Trevino, A. E. et al. Chromatin and gene-regulatory dynamics of the developing human cerebral cortex at single-cell resolution. Cell 184, 5053–5069 (2021).
Chen, K. M., Wong, A. K., Troyanskaya, O. G. & Zhou, J. A sequence-based global map of regulatory activity for deciphering human genetics. Nat. Genet. 54, 940–949 (2022).
Eraslan, G., Avsec, Ž., Gagneur, J. & Theis, F. J. Deep learning: new computational modelling techniques for genomics. Nat. Rev. Genet. 20, 389–403 (2019).
Sasse, A., Chikina, M. & Mostafavi, S. Unlocking gene regulation with sequence-to-function models. Nat. Methods 21, 1374–1377 (2024).
Agarwal, V. & Shendure, J. Predicting mRNA abundance directly from genomic sequence using deep convolutional neural networks. Cell Rep. 31, 107663 (2020).
Avsec, Ž. et al. Base-resolution models of transcription-factor binding reveal soft motif syntax. Nat. Genet. 53, 354–366 (2021).
Avsec, Ž. et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat. Methods 18, 1196–1203 (2021).
Linder, J., Srivastava, D., Yuan, H., Agarwal, V. & Kelley, D. R. Predicting RNA-seq coverage from DNA sequence as a unifying model of gene regulation. Nat. Genet. 57, 949–961 (2025).
Holland, C. H. et al. Robustness and applicability of transcription factor and pathway analysis tools on single-cell RNA-seq data. Genome Biol. 21, 1–19 (2020).
Badia-i-Mompel, P. et al. Gene regulatory network inference in the era of single-cell multi-omics. Nat. Rev. Genet. 24, 739–754 (2023).
Schwessinger, R., Deasy, J., Woodruff, R. T., Young, S. & Branson, K. M. Single-cell gene expression prediction from DNA sequence at large contexts. Preprint at bioRxiv https://doi.org/10.1101/2023.07.26.550634 (2023).
Li, J. et al. Deep learning of cross-species single-cell landscapes identifies conserved regulatory programs underlying cell types. Nat. Genet. 54, 1711–1720 (2022).
Hingerl, J. C. et al. scooby: modeling multi-modal genomic profiles from DNA sequence at single-cell resolution. Nat. Methods 22, 2275–2285 (2025).
Heimberg, G. et al. A cell atlas foundation model for scalable search of similar human cells. Nature 638, 1085–1094 (2025).
Chen, X. et al. A brain cell atlas integrating single-cell transcriptomes across human brain regions. Nat. Med. 30, 2679–2691 (2024).
Fiskin, E. et al. Multi-modal skin atlas identifies a multicellular immune-stromal community associated with altered cornification and specific T cell expansion in atopic dermatitis. Nat. Commun. 17, 3194 (2026).
Li, J. et al. Integrated multi-omics single cell atlas of the human retina. Preprint at bioRxiv https://doi.org/10.1101/2023.11.07.566105 (2023).
Eraslan, G. et al. Single-nucleus cross-tissue molecular reference maps toward understanding disease gene function. Science 376, eabl4290 (2022).
Shrikumar, A., Greenside, P. & Kundaje, A. Learning important features through propagating activation differences. In International Conference on Machine Learning 3145–3153 (PMLR, 2017).
ENCODE Project Consortium, et al. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 583, 699–710 (2020).
Gasperini, M. et al. A genome-wide framework for mapping gene regulation via cellular genetic screens. Cell 176, 377–390 (2019).
Karollus, A., Mauermeier, T. & Gagneur, J. Current sequence-based models capture gene expression determinants in promoters but mostly ignore distal enhancers. Genome Biol. 24, 56 (2023).
Zhang, K. et al. A single-cell atlas of chromatin accessibility in the human genome. Cell 184, 5985–6001 (2021).
Grainger, S., Hryniuk, A. & Lohnes, D. Cdx1 and Cdx2 exhibit transcriptional specificity in the intestine. PLoS ONE 8, e54757 (2013).
Shrikumar, A. et al. Technical note on transcription factor motif discovery from importance scores (TF-MoDISco) version 0.5.6.5. Preprint at https://arxiv.org/abs/1811.00416 (2020).
Piasecki, B. P., Burghoorn, J. & Swoboda, P. Regulatory Factor X (RFX)-mediated transcriptional rewiring of ciliary genes in animals. Proc. Natl Acad. Sci. USA 107, 12969–12974 (2010).
Little, D. R. et al. Differential chromatin binding of the lung lineage transcription factor NKX2-1 resolves opposing murine alveolar cell fates in vivo. Nat. Commun. 12, 1–18 (2021).
Daniely, Y. et al. Critical role of p63 in the development of a normal esophageal and tracheobronchial epithelium. Am. J. Physiol. Cell Physiol. 287, C171–C181 (2004).
Yang, H., Lu, M. M., Zhang, L., Whitsett, J. A. & Morrisey, E. E. GATA6 regulates differentiation of distal lung epithelium. Development 129, 2233–2246 (2002).
Shiraishi, K. et al. Airway epithelial cell identity and plasticity are constrained by Sox2 during lung homeostasis, tissue regeneration, and in human disease. npj Regen. Med. 9, 1–14 (2024).
Kersbergen, A. et al. Lung morphogenesis is orchestrated through Grainyhead-like 2 (Grhl2) transcriptional programs. Dev. Biol. 443, 1–9 (2018).
Mall, M. et al. Myt1l safeguards neuronal identity by actively repressing many non-neuronal fates. Nature 544, 245–249 (2017).
Qureshi, I. A., Gokhan, S. & Mehler, M. F. REST and CoREST are transcriptional and epigenetic regulators of seminal neural fate decisions. Cell Cycle 9, 4477 (2010).
Masuda, S., Matsuura, K. & Shimizu, T. GATA6 regulates anti-angiogenic properties in human cardiac fibroblasts via modulating LYPD1 expression. Regen. Ther. 23, 8–16 (2023).
Song, S. et al. TEA domain transcription factor 1 (TEAD1) induces cardiac fibroblasts cells remodeling through BRD4/Wnt4 pathway. Signal Transduct. Targeted Ther. 9, 1–12 (2024).
Burgos Villar, K. N., Liu, X. & Small, E. M. Transcriptional regulation of cardiac fibroblast phenotypic plasticity. Curr. Opin. Physiol. 28, 100556 (2022).
Yazar, S. et al. Single-cell eQTL mapping identifies cell type-specific genetic control of autoimmune disease. Science 376, eabf3041 (2022).
Kerimov, N. et al. A compendium of uniformly processed human gene expression and splicing quantitative trait loci. Nat. Genet. 53, 1290–1299 (2021).
Rosenbauer, F. & Tenen, D. G. Transcription factors in myeloid development: balancing differentiation with transformation. Nat. Rev. Immunol. 7, 105–117 (2007).
Mostafavi, H., Spence, J. P., Naqvi, S. & Pritchard, J. K. Systematic differences in discovery of genetic effects on gene expression and complex traits. Nat. Genet. 55, 1866–1875 (2023).
Zhang, M. J. et al. Polygenic enrichment distinguishes disease associations of individual cells in single-cell RNA-seq data. Nat. Genet. 54, 1572–1580 (2022).
Alves-Bezerra, M. & Cohen, D. E. Triglyceride metabolism in the liver. Compr. Physiol. 8, 1 (2017).
Wenzel, P. Monocytes as immune targets in arterial hypertension. Br. J. Pharmacol. 176, 1966 (2018).
Hung, H. L. et al. Stimulation of NF-E2 DNA binding by CREB-binding protein (CBP)-mediated acetylation. J. Biol. Chem. 276, 10715–10721 (2001).
Kim, S. et al. DNA-guided transcription factor cooperativity shapes face and limb mesenchyme. Cell 187, 692–711 (2024).
Yeo, S.-Y. et al. A positive feedback loop bi-stably activates fibroblasts. Nat. Commun. 9, 3016 (2018).
Schreiber, S., Nikolaus, S. & Hampe, J. Activation of nuclear factor κB in inflammatory bowel disease. Gut 42, 477 (1998).
Han, Y. M. et al. NF-kappa B activation correlates with disease phenotype in Crohn’s disease. PLoS ONE 12, e0182071 (2017).
Taskiran, I. I. et al. Cell-type-directed design of synthetic enhancers. Nature 626, 212–220 (2024).
de Almeida, B. P. et al. Targeted design of synthetic enhancers for selected tissues in the Drosophila embryo. Nature 626, 207–211 (2024).
Gosai, S. J. et al. Machine-guided design of cell-type-targeting cis-regulatory elements. Nature 634, 1211–1220 (2024).
Lal, A., Gunsalus, L., Nair, S., Biancalani, T. & Eraslan, G. gReLU: a comprehensive framework for DNA sequence modeling and design. Nat. Methods 22, 2253–2257 (2025).
Wang, D., Tai, P. W. L. & Gao, G. Adeno-associated virus vector as a platform for gene therapy delivery. Nat. Rev. Drug Discov. 18, 358–378 (2019).
Ponjavic, J. et al. Transcriptional and structural impact of TATA-initiation site spacing in mammalian core promoters. Genome Biol. 7, R78 (2006).
Kulakovskiy, I. V. et al. HOCOMOCO: towards a complete collection of transcription factor binding models for human and mouse via large-scale ChIP-seq analysis. Nucleic Acids Res. 46, D252–D259 (2018).
Gupta, S., Stamatoyannopoulos, J. A., Bailey, T. L. & Noble, W. S. Quantifying similarity between motifs. Genome Biol. 8, R24 (2007).
Bruse, N. & van Heeringen, S. J. GimmeMotifs: an analysis framework for transcription factor motif analysis. Preprint at bioRxiv https://doi.org/10.1101/474403 (2018).
Majdandzic, A., Rajesh, C. & Koo, P. K. Correcting gradient-based interpretations of deep neural networks for genomics. Genome Biol. 24, 1–13 (2023).
Wang, G., Sarkar, A., Carbonetto, P. & Stephens, M. A simple new approach to variable selection in regression, with application to genetic fine mapping. J. R. Stat. Soc. Series B 82, 1273–1300 (2020).
Chen, W. et al. Improved analyses of GWAS summary statistics by reducing data heterogeneity and errors. Nat. Commun. 12, 7117 (2021).
Weissbrod, O. et al. Functionally informed fine-mapping and polygenic localization of complex trait heritability. Nat. Genet. 52, 1355–1363 (2020).
Gazal, S. et al. Functional architecture of low-frequency variants highlights strength of negative selection across coding and non-coding annotations. Nat. Genet. 50, 1600–1607 (2018).
Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
Lal, A. Decoding sequence determinants of gene expression in diverse cellular and disease states. Zenodo https://doi.org/10.5281/zenodo.18142522 (2024).
Acknowledgements
We thank the following for their helpful comments: A. Regev, J. Rock, O. Ursu, H. Jasper, T. Sterne-Weiler, X. Yao, S. Mostafavi, C. Cox, M. H. Celik, K. Fletez-Brant, D. Chang, N. Jorstad, J. Song and J. Hingerl.
Author information
Contributions
G.E., J.L.C. and A.L. processed single-cell data. A.L. trained the model. A.L., A.K., D.G., L.G., G.E. and A.M.T. performed analyses with assistance from S.N., M.G.G. and N.D. J.B., B.V.D.G. and T. Bhangale processed GWAS data. G.E., T. Biancalani, H.C.B. and G.S. supervised the work. G.E., A.L., A.K., D.G., L.G. and H.C.B. wrote the paper. All authors read and approved the paper.
Ethics declarations
Competing interests
All authors were employed by Genentech, Inc. while contributing to this study.
Peer review
Peer review information
Nature Methods thanks the anonymous reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Lin Tang, in collaboration with the Nature Methods team. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Decima predicts the expression patterns of cell type-specific genes.
a) Schematic illustrating the approach used to evaluate Decima’s performance on identification of cell type-specific genes. b) Histogram showing the area under the receiver operating characteristic curve (AUROC) for classification of specific vs. nonspecific genes in each cell type based on Decima’s predictions in the same cell type.
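As a minimal sketch of the evaluation summarized in panel b, assuming each gene has already been labelled as specific or nonspecific for a given cell type (the labelling procedure is not described in this caption), a per-cell-type AUROC can be computed from the ranks of the predicted expression values. The function names, cell types and numbers below are hypothetical illustrations and are not taken from the Decima codebase.

```python
def auroc(scores, labels):
    """Rank-based AUROC (Mann-Whitney U statistic), using average ranks for ties."""
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    n_pos = sum(1 for _, y in pairs if y)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative gene")

    rank_sum = 0.0  # sum of the (1-based) ranks of the positive genes
    i = 0
    while i < len(pairs):
        # Find the block of tied scores starting at index i.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        rank_sum += avg_rank * sum(1 for k in range(i, j) if pairs[k][1])
        i = j

    u = rank_sum - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def auroc_per_cell_type(predicted, is_specific):
    """Compute one AUROC per cell type from predictions in that same cell type."""
    return {ct: auroc(predicted[ct], is_specific[ct]) for ct in predicted}


# Hypothetical toy data: predicted expression of four genes in two cell types,
# with assumed labels marking which genes are specific to each cell type.
predicted = {
    "hepatocyte": [2.1, 0.3, 1.7, 0.2],
    "monocyte": [0.5, 1.9, 0.4, 0.45],
}
is_specific = {
    "hepatocyte": [True, False, True, False],
    "monocyte": [False, True, False, True],
}

results = auroc_per_cell_type(predicted, is_specific)
print(results)
```

A histogram of the values in `results` across all cell types would correspond to the kind of summary described in panel b. An AUROC of 0.5 indicates no separation between specific and nonspecific genes, and 1.0 indicates perfect separation.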
Extended Data Fig. 2 Design of a fibroblast-specific regulatory element in the context of Crohn’s disease using directed evolution.
a) Schematic showing promoter design through directed evolution with Decima. b) Predicted expression of the cargo gene across healthy and diseased fibroblast and non-fibroblast cells over 100 rounds of directed evolution. c) Predicted specificity of cargo gene expression in fibroblasts and disease fibroblasts; the design was optimized for cell-type specificity in rounds 0–50 and for disease-state specificity in rounds 50–100. d) In silico mutagenesis (ISM) of the synthetic regulatory element reveals key sequence motifs whose perturbation is predicted to uniquely affect expression of the cargo gene in fibroblasts. e, f) ISM with respect to disease fibroblast expression identifies key motifs generated in the design process for the fibroblast (top) and disease fibroblast (bottom) evolved sequences, including TWIST1, C/EBP and IRF motifs, which are implicated in fibroblast-specific and immune-specific regulation. g) These motifs match HOCOMOCO v12 motifs.
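The two-phase design schedule described in panels b and c can be illustrated with a minimal sketch of greedy directed evolution. This is an assumption-laden toy: `toy_predict` is a hypothetical stand-in for a sequence-to-expression model (it is not Decima), the three motifs and the two objective functions are invented for illustration, and the greedy single-substitution search is one simple variant of directed evolution rather than the procedure used in the paper.

```python
BASES = "ACGT"

# Hypothetical motifs used only by the toy predictor below.
SHARED_MOTIF = "CAGCTG"   # assumed to drive expression in all fibroblasts
DISEASE_MOTIF = "GAAACT"  # assumed to add expression in disease fibroblasts
OFF_TARGET = "TATAAA"     # assumed to drive expression in non-fibroblasts


def best_match(seq, motif):
    """Fraction of motif positions matched by the best-scoring window of seq."""
    k = len(motif)
    return max(
        sum(a == b for a, b in zip(seq[i:i + k], motif))
        for i in range(len(seq) - k + 1)
    ) / k


def toy_predict(seq):
    """Stand-in predictor (NOT Decima): toy expression per cell context."""
    shared = best_match(seq, SHARED_MOTIF)
    return {
        "fibroblast_healthy": shared,
        "fibroblast_disease": shared + best_match(seq, DISEASE_MOTIF),
        "other": best_match(seq, OFF_TARGET),
    }


def cell_type_specificity(pred):
    """Mean fibroblast expression minus non-fibroblast expression."""
    return (pred["fibroblast_healthy"] + pred["fibroblast_disease"]) / 2 - pred["other"]


def disease_specificity(pred):
    """Disease fibroblast expression minus healthy fibroblast expression."""
    return pred["fibroblast_disease"] - pred["fibroblast_healthy"]


def evolve(seq, objective, rounds, predict=toy_predict):
    """Each round, apply the single-base substitution that most improves the objective."""
    for _ in range(rounds):
        best_seq, best_score = seq, objective(predict(seq))
        for pos in range(len(seq)):
            for base in BASES:
                if base == seq[pos]:
                    continue
                candidate = seq[:pos] + base + seq[pos + 1:]
                score = objective(predict(candidate))
                if score > best_score:
                    best_seq, best_score = candidate, score
        if best_seq == seq:
            break  # no single substitution improves the objective
        seq = best_seq
    return seq


seed = "A" * 24
# Rounds 0-50: optimize cell-type specificity; rounds 50-100: disease-state specificity.
phase1 = evolve(seed, cell_type_specificity, rounds=50)
phase2 = evolve(phase1, disease_specificity, rounds=50)
print(phase1, cell_type_specificity(toy_predict(phase1)))
print(phase2, disease_specificity(toy_predict(phase2)))
```

In this sketch the second phase optimizes disease-state specificity alone, so nothing prevents it from eroding the cell-type specificity gained in the first phase. A real design would likely combine or constrain the two objectives.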
Supplementary information
Supplementary Information
Supplementary Figs. 1–20, Tables 2–8, Methods and references.
Supplementary Table 1
List of single-cell RNA-seq or single-nucleus RNA-seq datasets included in Decima’s training data.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Lal, A., Karollus, A., Gunsalus, L. et al. Decoding sequence determinants of gene expression in diverse cellular and disease states. Nat Methods (2026). https://doi.org/10.1038/s41592-026-03102-0
DOI: https://doi.org/10.1038/s41592-026-03102-0


