gReLU: a comprehensive framework for DNA sequence modeling and design

Lal, Avantika; Gunsalus, Laura; Nair, Surag; Biancalani, Tommaso; Eraslan, Gokcen

doi:10.1038/s41592-025-02868-z

Brief Communication
Open access
Published: 15 October 2025

Nature Methods volume 22, pages 2253–2257 (2025)

Abstract

Deep learning models trained on DNA sequences can predict cell-type-specific regulatory activity, reveal cis-regulatory grammar, prioritize genetic variants and design synthetic DNA. However, building and interpreting these models correctly remains difficult, and models and software built by different groups are often not interoperable. Here we present gReLU, a comprehensive software framework that enables advanced sequence modeling pipelines, including data preprocessing, modeling, evaluation, interpretation, variant effect prediction and regulatory element design.

Main

Deep learning models trained on DNA sequences along with functional data are capable of learning the cis-regulatory code across biological contexts^1,2. These models can be queried to perform in silico experiments, such as prioritizing functional noncoding variants³, conducting in silico genome engineering⁴ and designing synthetic regulatory elements^5,6. Furthermore, interpreting these models may reveal novel cis-regulatory mechanisms⁷.

However, such models are difficult to train, and minor errors can result in misleading predictions⁸. In addition, the field is hampered by a lack of interoperability between tools. Instead of a common underlying framework being adapted to create new models, new models are often accompanied by custom code for data processing, training and evaluation. This makes it difficult to combine and compare models, or fine-tune them for novel tasks. Furthermore, separate tools have been built for individual tasks such as sequence design⁹, or model interpretation¹⁰, making it difficult to chain these into a workflow.

While unifying software frameworks have been proposed^11,12,13,14, these remain limited in scope, being designed largely around convolutional models that produce scalar-valued predictions for short sequences. They lack support for modern transformer architectures, long-context profile models^15,16, advanced interpretation methods and comprehensive workflows, particularly those related to sequence design^6,9. Here, we present gReLU, an open-source Python framework that unifies diverse sequence models and downstream tasks under one umbrella, minimizing the need for custom code and switching between incompatible tools. Its main features are summarized below (Fig. 1) and detailed in Supplementary Note 1.

1. Data input: gReLU accepts DNA sequences or genomic coordinates along with functional data in standard formats. Given genomic coordinates, it can load the corresponding sequences and annotations from public databases.
2. Data processing: gReLU’s preprocessing functions include filtering sequences, matching genomic regions with similar sequence content, calculating sequencing coverage and splitting datasets for training, validation and testing. gReLU provides PyTorch dataset classes to load data from each input format, which support batching, data augmentation, normalization and transformation.
3. Model design: gReLU provides customizable architectures ranging from small convolutional models to large transformer models^15,16,17. Users can change parameters such as the number of layers, number and width of filters, and activation functions.
4. Model training: gReLU trains models to perform single- or multitask regression, single- or multitask binary classification, segmentation or multiclass classification. Suitable loss functions^7,16,18 are provided for each task (Supplementary Note 1). Class or example weighting allows users to emphasize subsets of examples. Training is performed using PyTorch Lightning, and logging and hyperparameter sweeps are enabled with Weights & Biases. Finally, gReLU saves model checkpoints that include comprehensive metadata, ensuring reproducibility and allowing all relevant information to be distributed in a single file.
5. Inference and evaluation: gReLU evaluates models on held-out data using appropriate metrics (Supplementary Note 1). For profile models, gReLU returns predictions along with the corresponding genomic coordinates, accounting for cropping and pooling performed by the model. It also enables data augmentation to improve robustness during inference.
6. Sequence interpretation: gReLU can score the importance of each base in an input sequence using in silico mutagenesis (ISM), DeepLift/SHAP or gradient-based⁹ methods. Unlike previous frameworks (Supplementary Table 1), it can also annotate important regions by scanning them with position weight matrices (PWMs). Finally, it can derive learned motifs with TF-MoDISco¹⁰ and match the results to known motifs.
7. Model interpretation: gReLU can generate synthetic DNA sequences in which motifs are shuffled, repositioned or inserted into random backgrounds at varying positions, orientations or spacing (Supplementary Note 1). Computing predictions on such sequences can reveal the regulatory grammar learned by the model^7,19,20. For transformer models, gReLU visualizes attention matrices, which can highlight distal enhancer–gene interactions¹⁵.
8. Variant effect prediction: Given genetic variants in tabular format, gReLU extracts sequences surrounding the reference and alternate alleles, and performs inference on them using any trained model. It computes an effect size for each variant by comparing the results for the two alleles. While this functionality already exists (Supplementary Table 1), gReLU provides additional robustness using data augmentation and statistical testing (Supplementary Note 2), along with improved interpretability via PWM scanning to identify motifs that are created or disrupted by a variant.
9. Sequence design: gReLU enables model-driven design of DNA using directed evolution or gradient-based approaches. Users can define the design objective, constrain which positions to edit and encourage or discourage specific patterns (for example, CpGs or motifs).
10. Prediction transform layers: Previous implementations of tasks such as interpretation, variant effect prediction and directed evolution were designed for short-context, single-task models that produce scalar outputs. These tasks become more complicated with multitask, long-context and/or profile models. gReLU introduces prediction transform layers, flexible layers that can be appended to such models to modify their output, for example, to compute the difference in predictions between two cell types, or the ratio of predictions over introns versus exons of a gene. This enables interpretation, variant effect prediction and design with reference to any derived function of the model’s output.
11. Model zoo: gReLU includes a freely available model zoo hosted on Weights & Biases, which contains widely applicable models^6,21,22 including Enformer¹⁵ and Borzoi¹⁶ (Supplementary Note 1). It stores model checkpoints alongside code, datasets and logs documenting their creation. gReLU includes functions to programmatically search the zoo and download any model.

**Fig. 1: An overview of the gReLU package.**

To illustrate how gReLU can nominate regulatory variants, we trained a regression model to predict DNase I hypersensitive site sequencing (DNase-seq) signal in GM12878 cells (Fig. 2a and Supplementary Table 2). Using this model, we predicted the effects of 28,274 single-nucleotide variants, of which ~2% were DNase-seq quantitative trait loci (dsQTLs) identified in lymphoblastoid cell lines^23,24. The model classified dsQTLs with an area under the precision–recall curve (AUPRC) of 0.27, outperforming both a random predictor and a published gkmSVM model²⁴ (Fig. 2b).

**Fig. 2: Example analyses with gReLU.**

Leveraging gReLU’s model zoo and prediction transform layers, we benchmarked Enformer¹⁵ on the same variants (Fig. 2b) and obtained a higher AUPRC of 0.60, probably due to its long input length, profile modeling and multispecies training. For both models, augmenting sequences with reverse complementation during inference increased the AUPRC (Supplementary Table 3).

To analyze the regulatory mechanisms underlying variant effects, we used gReLU to compute saliency scores surrounding both alleles of each variant and run TF-MoDISco¹⁰ (Supplementary Fig. 1a). dsQTLs were significantly (Fisher’s exact test, OR = 20, P value < 2.2 × 10⁻¹⁶) more likely than control single-nucleotide variants to overlap a TF-MoDISco identified motif, suggesting that dsQTLs tend to alter transcription factor (TF) binding motifs (Supplementary Fig. 1b). For example, gReLU’s motif scanning functions revealed that the rs10804244 variant weakens an interferon regulatory factor (IRF) binding site (Fig. 2c).

This example demonstrates several unique features of gReLU. First, a recent study²⁰ highlighted the difficulty of comparing convolutional models with long-context models such as Enformer, because Enformer makes predictions for ~100 kb of sequence at 128 bp resolution, whereas the convolutional model returns a scalar prediction for ~1 kb. gReLU streamlined this comparison by automatically generating sequences of the correct length for each model, identifying the bins in Enformer’s output corresponding to the output of the convolutional model, and subsetting and aggregating Enformer’s predictions over those bins before computing variant effects. Second, gReLU’s motif analysis functions enabled comparison of the model’s predictions to known biology. Finally, gReLU’s data augmentation functionality, the most comprehensive among current frameworks (Supplementary Table 1), improved performance during both training and inference.

Unlike previous frameworks (Supplementary Table 1), gReLU enables systematic interpretation and sequence design not only with small single-task models but also with multitask, long-context and profile models. To demonstrate, we applied the Borzoi model¹⁶ from the model zoo to the PPIF gene in humans. Using gReLU, we visualized the predicted RNA sequencing (RNA-seq) coverage and found that it closely mirrored ground-truth data, showing higher PPIF expression in monocytes than in T cells (Fig. 2d).

Visualizing the model’s attention matrix identified strong attention between PPIF and a known enhancer 61.7 kb upstream of the transcription start site⁴ (Fig. 2e). Using gReLU’s sequence manipulation tools, we simulated 5-bp tiled mutations across the enhancer and predicted their effect on PPIF expression. The results (Fig. 2f, top) were well correlated with experimental Variant-FlowFISH data⁴ from analogous THP-1²⁵ and Jurkat²⁶ cell lines (Spearman’s ρ = 0.58; Supplementary Fig. 2). Although the magnitude of the predicted effects was lower than that of the measured effects, as previously observed⁴, the predictions correctly identified a central region that is particularly sensitive to perturbation (Fig. 2f, bottom).

Using gReLU’s directed evolution and prediction transform functions, we iteratively modified the enhancer, aiming to maximize the difference in PPIF exon coverage between monocytes and T cells. With 20 base edits to the enhancer, we achieved a predicted 41.76% increase in monocyte expression with only a 16.75% increase in predicted T cell expression (Fig. 2g). ISM and motif scanning over the evolved enhancer (Supplementary Fig. 3) revealed novel CEBP motifs (Extended Data Fig. 1a). This is consistent with experiments showing that inserting a CEBP motif in this enhancer increases PPIF expression in THP-1 relative to Jurkat cells, and with the differential expression of the CEBPA TF between these cells⁴. The specificity of the evolved enhancer was also validated using orthogonal models from the gReLU model zoo (Extended Data Fig. 1b, Supplementary Figs. 4–7 and Supplementary Methods).

In conclusion, gReLU enables scientists to easily train, apply, fine-tune and interpret state-of-the-art sequence models and to design novel regulatory elements with complex properties. It is open-source and includes instructions for users to make contributions and suggestions, along with detailed tutorials. However, there remain many useful features that have not yet been included in the package and would be valuable future extensions. These include efficient architectures to model longer genomic contexts^27,28, additional design algorithms⁶, efficient training on larger, multispecies datasets, modeling biases in genomic assays²⁰ and modeling individual genomes²⁹.

Methods

Processing of GM12878 DNase-seq data

From the ENCODE portal, we downloaded data corresponding to the GM12878 DNase-seq experiment ENCSR000EJD. Specifically, we downloaded the read-depth normalized bigwig (ENCFF093VXI) and the peaks narrowPeak file (ENCFF588OCA), aligned to the hg38 reference genome.

Using gReLU, we extended the peaks 250 bp on each side of the summit and merged overlapping regions. We filtered the regions to autosomes and removed regions overlapping blacklisted regions³⁰. We obtained a set of GC-matched negative regions using the grelu.data.preprocess.get_gc_matched_intervals function with binwidth = 0.05. We used sequences from chromosome 10 as the validation set and sequences from chromosome 11 as the test set.
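The peak extension and merging step can be illustrated with a minimal sketch in plain Python. This is not gReLU's implementation; the function name `extend_and_merge`, the toy summit positions and the choice to merge intervals that merely touch are assumptions made for illustration.

```python
def extend_and_merge(summits, flank=250):
    """Extend each (chrom, summit) by `flank` bp per side, then merge overlaps."""
    # Sorting by chromosome and start lets us merge in a single pass.
    intervals = sorted((chrom, max(0, s - flank), s + flank) for chrom, s in summits)
    merged = []
    for chrom, start, end in intervals:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps (or touches) the previous interval on the same chromosome.
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged


# Hypothetical summit positions, not taken from the ENCODE peak file.
summits = [("chr1", 1000), ("chr1", 1300), ("chr1", 5000), ("chr2", 1100)]
regions = extend_and_merge(summits)
print(regions)
```

The two nearby chr1 summits collapse into one 800-bp region, while isolated summits yield 500-bp regions.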

Architecture and training of the DNase-seq regression model

We used the DilatedConvModel architecture provided in gReLU, which implements a dilated convolutional architecture, similar to BPNet⁷. The following parameters were used: channels = 512, n_conv = 9. Thus, the model consists of 9 convolution layers with 512 channels each.

We created a dataset using the BigWigSeqDataset class with an input length of 2,114 bp, an output length of 1,000 bp, label_aggfunc = ‘sum’, label_transform_func = np.log1p and augment_mode = ‘random’. Thus, each label is log(1 + x), where x is the summed count over the 1,000-bp output region, and the model is trained to take 2,114 bp of sequence as input and output the log-transformed total DNase-seq count in the central 1,000 bp.
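The label construction described above can be written out as a short sketch. It uses only the standard library rather than BigWigSeqDataset, and the function name `central_log_sum` and the toy coverage values are hypothetical; it assumes the output window is centered within the input.

```python
import math


def central_log_sum(coverage, out_len=1000):
    """log(1 + total coverage) over the central `out_len` positions."""
    offset = (len(coverage) - out_len) // 2  # (2114 - 1000) // 2 = 557
    center = coverage[offset:offset + out_len]
    return math.log1p(sum(center))


# Toy per-base coverage over a 2,114-bp input window.
coverage = [0.0] * 2114
coverage[557] = 3.0    # first position inside the central 1,000 bp
coverage[556] = 100.0  # one base outside the window, so it is ignored
label = central_log_sum(coverage)
print(label)
```

Only signal inside the central window contributes to the label; the 557-bp flanks on each side provide sequence context only.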

We performed a hyperparameter sweep using Bayesian optimization over different combinations of data augmentation parameters (rc = True or False, max_seq_shift = 0, 1 or 3, max_pair_shift = 0, 10, 50 or 100). Each model was trained using the mean squared error loss, a learning rate of 10⁻⁴ and batch size 512, for a total of 15 epochs. The model with the lowest validation set loss was selected as the best model, evaluated on the test set and used for variant effect prediction.

Variant effect prediction and interpretation

We downloaded the previously curated list of 574 lymphoblastoid cell line (LCL) dsQTL single-nucleotide polymorphisms (SNPs) and 27,735 control SNPs²⁴. We removed SNPs lying too close to chromosome ends for a full Enformer input window to be extracted, using the filter_chrom_ends function with genome = ‘hg19’ and pad = 98,304 (half the input length for Enformer). This resulted in 574 dsQTL SNPs and 27,700 control SNPs.
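The idea behind the chromosome-end filter can be sketched as follows. This is an illustrative reimplementation, not the filter_chrom_ends function itself; the name `drop_near_chrom_ends`, the chromosome `chrA`, its size and the exact boundary convention are assumptions.

```python
def drop_near_chrom_ends(variants, chrom_sizes, pad):
    """Keep variants whose +/- `pad` window lies fully inside the chromosome."""
    kept = []
    for chrom, pos in variants:
        if pos - pad >= 0 and pos + pad <= chrom_sizes[chrom]:
            kept.append((chrom, pos))
    return kept


PAD = 196_608 // 2  # 98,304: half of a 196,608-bp input window
chrom_sizes = {"chrA": 1_000_000}  # hypothetical chromosome, not hg19
variants = [("chrA", 50_000), ("chrA", 500_000), ("chrA", 950_000)]
kept = drop_near_chrom_ends(variants, chrom_sizes, PAD)
print(kept)
```

Applying the longest model's padding to every variant keeps the benchmark set identical across models with different input lengths.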

We scored the variants using grelu.variant.predict_variant_effects with compare_func = ‘subtract’ and genome = ‘hg19’, using the provided hg19 coordinates. Because the model predicts log-transformed counts, subtracting the predictions for the two alleles returns the log fold-change (LFC) of the predicted counts between the reference and the alternate alleles. We ran this step twice, setting rc = True and rc = False, respectively, to test the effect of reverse complement data augmentation. With rc = True, each allele-containing sequence is scored in both orientations and the two predictions are averaged.
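A minimal sketch of this scoring scheme follows. The function `toy_model` is a hypothetical stand-in for a trained model that outputs log-scale predictions, and the alternate-minus-reference direction of the subtraction is an assumption; none of this is gReLU code.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def toy_model(seq):
    """Hypothetical strand-asymmetric model returning a log-scale score."""
    return 0.1 * seq.count("G") + 0.05 * seq.count("A")


def predict(seq, rc=False):
    """Score a sequence, optionally averaging over both orientations."""
    if not rc:
        return toy_model(seq)
    return 0.5 * (toy_model(seq) + toy_model(reverse_complement(seq)))


def variant_effect(ref_seq, alt_seq, rc=False):
    """Subtracting log-scale predictions gives a log fold-change."""
    return predict(alt_seq, rc) - predict(ref_seq, rc)


ref, alt = "ACGTACGT", "ACGTGCGT"  # toy alleles differing at one base
print(variant_effect(ref, alt), variant_effect(ref, alt, rc=True))
```

Averaging over orientations removes the part of the score that depends only on which strand the model happens to see.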

For gkmSVM, we used the predictions provided by the authors²⁴. For Enformer, we used the model from the gReLU model zoo (project = ‘enformer’, model_name = ‘human’). We used grelu.variant.predict_variant_effects to compute variant LFCs comparable with our regression model. To do so, we applied a prediction transform that summed predictions in the eight bins centered around the variant (total width of 128 × 8 = 1,024 bp) for the ENCFF093VXI task and log transformed the sum, along with compare_func = ‘subtract’. We ran this step twice, setting rc = True and rc = False, respectively, to test the effect of reverse complement data augmentation.
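The transform applied to Enformer's output can be sketched in a few lines. This assumes 896 output bins of 128 bp with the variant at the center of the input, and it uses log(1 + x) as the log transform; both are assumptions not stated above, and the helper names are hypothetical.

```python
import math


def centered_bins(n_bins, width):
    """Indices of `width` bins centered on the middle of the output."""
    start = n_bins // 2 - width // 2
    return range(start, start + width)


def log_sum_over_bins(track, bins):
    """Sum a profile over the selected bins, then log-transform the sum."""
    return math.log1p(sum(track[i] for i in bins))


N_BINS = 896  # assumed number of 128-bp output bins
bins = centered_bins(N_BINS, 8)  # 8 x 128 bp = 1,024 bp around the variant

# Toy profiles for the two alleles (hypothetical values).
ref_track = [1.0] * N_BINS
alt_track = list(ref_track)
alt_track[448] = 9.0

lfc = log_sum_over_bins(alt_track, bins) - log_sum_over_bins(ref_track, bins)
print(list(bins), lfc)
```

Reducing the profile to one log-scale number per allele makes the result directly comparable with the scalar output of the convolutional model.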

We assessed performance using the sklearn.metrics.average_precision_score function with the provided labels and the absolute value of the predicted LFC. We generated the model interpretations at select loci using the grelu.interpret.score.get_attributions function with the parameters method = ‘saliency’, correct_grad = True, which apply the saliency method with gradient correction³¹.
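For readers unfamiliar with the metric, average precision can be computed by hand as the sum of precision values weighted by the increase in recall at each score threshold. The sketch below is a plain-Python illustration of that definition, not the scikit-learn source, and the labels and LFC values are made up.

```python
def average_precision(labels, scores):
    """Sum over thresholds of (increase in recall) x (precision)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_pos = sum(labels)
    tp = fp = 0
    ap = 0.0
    prev_recall = 0.0
    i = 0
    while i < len(order):
        # Consume all examples tied at the current score threshold.
        j = i
        while j < len(order) and scores[order[j]] == scores[order[i]]:
            tp += labels[order[j]]
            fp += 1 - labels[order[j]]
            j += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
        i = j
    return ap


labels = [0, 1, 1, 0]          # 1 = dsQTL, 0 = control (toy labels)
lfc = [-2.0, 0.1, 0.5, -0.3]   # toy predicted log fold-changes
ap = average_precision(labels, [abs(x) for x in lfc])
print(ap)
```

Taking the absolute value ranks variants by effect magnitude, so variants that increase and decrease accessibility are treated alike.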

For a global analysis, we performed TF-MoDISco on 100-bp windows centered at the reference and alternate versions for both dsQTLs and control SNPs using grelu.interpret.modisco.run_modisco with method = ‘saliency’, correct_grad = True, window = 100. From the output TF-MoDISco object, we plotted the motifs. Then, we extracted the seqlets (that is, motif instances) and computed the number of SNPs overlapping any motif separately for the dsQTL and control sets. A variant was counted as overlapping if either the reference or alternate versions overlap with a motif instance.

Inference and interpretation using the Borzoi model

We used the grelu.resources.load_model function to load the Borzoi model (replicate 0) from the public gReLU model zoo. We used the following CD14⁺ monocyte and CD4⁺ T cell RNA-seq tracks within Borzoi:

ENCFF023YXV+ (CD14-positive monocyte female),

ENCFF946ZPT (with multiple sclerosis; CD14-positive monocyte),

ENCFF853SNW (CD14-positive monocyte),

ENCFF735XXE (CD14-positive monocyte),

ENCFF848ZVQ (with multiple sclerosis; CD14-positive monocyte),

ENCFF926QTW (CD14-positive monocyte male adult (37 years)),

ENCFF623LHV (with multiple sclerosis; CD14-positive monocyte),

ENCFF004DOF (CD14-positive monocyte),

ENCFF579IBH+ (CD4-positive, alpha-beta T cell male adult (20 years)),

ENCFF515TIF+ (CD4-positive, alpha-beta T cell male adult (20 years)),

ENCFF223MUU (CD4-positive, alpha-beta T cell male adult (37 years)),

ENCFF089BOJ (CD4-positive, alpha-beta T cell male adult (21 years)).

Gene annotations were loaded using grelu.io.genome.read_gtf and filtered to remove overlapping transcripts using grelu.data.preprocess.filter_overlapping.

To predict and visualize RNA-seq coverage across the PPIF gene, we used the model.predict_on_seqs function, followed by conversion from genomic intervals to output bins with model.input_intervals_to_output_bins. The resulting predicted tracks were visualized using grelu.visualize.plot_tracks.

To quantify gene expression from model predictions, we used the ‘Aggregate’ prediction transform to compute the average predicted RNA-seq signal across bins overlapping the exons of the PPIF gene. These scores were used to compare relative expression between cell types and to track changes across sequence edits.
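The exon-level aggregation can be sketched as mapping each exon to the output bins it overlaps and averaging the predicted signal over those bins. The helper names, the 32-bp bin size, the window start and the exon coordinates below are all hypothetical, and `window_start` is assumed to be the genomic start of the model's output window after any cropping.

```python
def interval_to_bins(start, end, window_start, bin_size, n_bins):
    """Indices of output bins overlapping the half-open interval [start, end)."""
    first = max(0, (start - window_start) // bin_size)
    last = min(n_bins - 1, (end - 1 - window_start) // bin_size)
    return list(range(first, last + 1))


def mean_over_exons(track, exons, window_start, bin_size):
    """Average a predicted track over all bins that overlap any exon."""
    bins = sorted({b for s, e in exons
                   for b in interval_to_bins(s, e, window_start, bin_size, len(track))})
    return sum(track[b] for b in bins) / len(bins)


track = [float(i) for i in range(10)]  # toy predicted coverage, 10 bins
exons = [(1032, 1100), (1250, 1260)]   # toy exon coordinates
score = mean_over_exons(track, exons, window_start=1000, bin_size=32)
print(score)
```

Using a set of bin indices ensures that a bin shared by two exons is counted once.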

To produce the attention map surrounding the PPIF gene, we used the grelu.interpret.score.get_attention_scores function with the parameter block_idx = -1, which extracts attention weights from the model’s final attention layer. We then averaged these weights across all attention heads of this layer before visualizing the matrix.
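The head-averaging step amounts to an element-wise mean over per-head attention matrices from the last block. The sketch below uses nested lists and made-up weights; the data layout `blocks[block][head][row][col]` and the function name are assumptions, not the layout returned by gReLU.

```python
def mean_over_heads(attention):
    """Element-wise mean of per-head attention matrices (attention[head][i][j])."""
    n_heads = len(attention)
    n_rows = len(attention[0])
    n_cols = len(attention[0][0])
    return [[sum(head[i][j] for head in attention) / n_heads
             for j in range(n_cols)]
            for i in range(n_rows)]


# Two toy transformer blocks, each with two heads over two positions.
blocks = [
    [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]],
    [[[0.2, 0.8], [0.6, 0.4]], [[0.4, 0.6], [0.2, 0.8]]],
]
block_idx = -1  # Python-style index selecting the final block
avg = mean_over_heads(blocks[block_idx])
print(avg)
```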

ISM, sequence design and motif discovery

We modified the distal enhancer located at chr10:79285732–79287386 (hg38), using the grelu.sequence.mutate function to substitute 5-bp windows with an alternate sequence. Model predictions were computed for each mutated sequence, and expression scores were calculated as above. The predicted expression changes were compared with experimental results from Variant-FlowFISH assays in THP-1 and Jurkat cells, obtained from supplementary table 8 of Martyn et al.⁴, after converting coordinates from hg19 to hg38 using LiftOver³².
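Tiled mutagenesis of this kind can be sketched as a generator that replaces one non-overlapping 5-bp window at a time. The replacement rule used here (swapping each base for its complement) is an assumption chosen for illustration, since the alternate sequence is not specified above; the function name and the toy sequence are also hypothetical.

```python
SWAP = str.maketrans("ACGT", "TGCA")


def tiled_mutants(seq, width=5):
    """Yield (start, mutant) for each non-overlapping `width`-bp tile."""
    for start in range(0, len(seq) - width + 1, width):
        # Assumed replacement rule: complement every base in the tile.
        tile = seq[start:start + width].translate(SWAP)
        yield start, seq[:start] + tile + seq[start + width:]


enhancer = "ACGTACGTACGT"  # toy 12-bp sequence, not the PPIF enhancer
mutants = list(tiled_mutants(enhancer))
for start, mutant in mutants:
    print(start, mutant)
```

Each mutant keeps the original length, so predictions for all tiles remain positionally comparable with the reference.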

We computed ISM scores using the grelu.interpret.score.ISM_predict function, comparing predicted expression between the reference sequence and versions with single-base substitutions. These were visualized as log₂ fold-change heatmaps using grelu.visualize.plot_ISM.
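In silico mutagenesis can be sketched as scoring every single-base substitution relative to the reference. The scoring function `toy_expression` is a hypothetical stand-in for the expression score computed from model predictions, and `ism_log2fc` is not the gReLU function.

```python
import math


def ism_log2fc(seq, score_fn, alphabet="ACGT"):
    """Return {(pos, base): log2(score(mutant) / score(reference))}."""
    ref_score = score_fn(seq)
    out = {}
    for pos, ref_base in enumerate(seq):
        for base in alphabet:
            if base == ref_base:
                continue  # only true substitutions are scored
            mutant = seq[:pos] + base + seq[pos + 1:]
            out[(pos, base)] = math.log2(score_fn(mutant) / ref_score)
    return out


def toy_expression(seq):
    """Hypothetical score: 1 plus the number of 'GATA' occurrences."""
    return 1.0 + seq.count("GATA")


scores = ism_log2fc("CGATAC", toy_expression)
print(scores[(2, "C")], scores[(0, "G")])
```

Substitutions that disrupt the toy motif give a log₂ fold-change of −1, while substitutions outside it give 0, which is the pattern a heatmap of such scores makes visible.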

To design sequences with enhanced cell-type specificity, we defined a prediction transform using the ‘Specificity’ class to compute the difference in predicted RNA-seq expression between monocytes and T cells, averaged across bins overlapping the PPIF exons. This metric served as the objective for optimization. We then used the grelu.design.evolve function to iteratively modify the enhancer sequence. At each round, a single base was mutated to improve the specificity objective.
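The optimization loop can be illustrated with a greedy sketch in which every single-base substitution is scored each round and the best one is kept. The objective `toy_specificity` uses arbitrary sequence patterns as stand-ins for the two cell-type predictions, and `evolve` here is a simplified illustration rather than grelu.design.evolve.

```python
def evolve(seq, objective, n_rounds, alphabet="ACGT"):
    """Greedy directed evolution: keep the best single-base edit per round."""
    history = [(seq, objective(seq))]
    for _ in range(n_rounds):
        candidates = [seq[:p] + b + seq[p + 1:]
                      for p in range(len(seq))
                      for b in alphabet if b != seq[p]]
        seq = max(candidates, key=objective)  # ties resolve to the first candidate
        history.append((seq, objective(seq)))
    return history


def toy_specificity(seq):
    """Hypothetical objective: 'on-target' score minus 'off-target' score."""
    on_target = seq.count("TTGC")   # arbitrary pattern, for illustration only
    off_target = seq.count("GGAA")  # arbitrary pattern, for illustration only
    return on_target - off_target


history = evolve("GGAATTGA", toy_specificity, n_rounds=2)
for seq, score in history:
    print(seq, score)
```

Because the objective is a difference between two scores, an edit is rewarded either for raising the first score or for lowering the second.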

To investigate the mechanisms driving the evolved activity, we scanned the original and evolved enhancer for known motifs using grelu.interpret.motifs.scan_sequences. Motifs were extracted from the HOCOMOCO v12 database³³. Comparative analysis with the original sequence was performed using grelu.interpret.motifs.compare_motifs.

Validation of the edited PPIF enhancer using orthogonal models

To validate the design using orthogonal models, we used a chromatin accessibility model (human-atac-catlas) from the gReLU model zoo. This is a multitask binary classification model, trained on binarized pseudobulk single-cell assay for transposase-accessible chromatin using sequencing (ATAC-seq) data from 203 human cell types²². Given a 200-bp input sequence, it predicts the probability of accessibility of the sequence in each cell type. We applied this model to the original and evolved PPIF enhancer sequences. Predictions are shown for both monocyte-like (Macrophage General, Macrophage Gen or Alv, Fetal Macrophage Placental, Fetal Macrophage Hepatic 1, Fetal Macrophage General 1, Fetal Macrophage General 2, Fetal Macrophage General 3, Fetal Macrophage Hepatic 2, Fetal Macrophage Hepatic 3 and Fetal Macrophage General 4) and T cell-like (T Lymphocyte 1 (CD8⁺), T lymphocyte 2 (CD4⁺), Natural Killer T, Naive T, Fetal T Lymphocyte 1 (CD4⁺), Fetal T Lymphocyte 2 (Cytotoxic) and Fetal T Lymphocyte 3 (IL2⁺)) tracks. We further validated that these tracks are similar to the THP-1 and Jurkat cell lines used by Martyn et al. (Supplementary Fig. 4 and Supplementary Methods).

We additionally validated the cell type specificity of the enhancer using the following DNase tracks with the Borzoi model:

ENCFF678LXL (CD14-positive monocyte female),

ENCFF659BVQ (CD14-positive monocyte female),

ENCFF724HAH (CD14-positive monocyte male adult (21 years)),

ENCFF722MXM (CD14-positive monocyte male adult (37 years)),

ENCFF759CKK (CD14-positive monocyte female adult (34 years)),

ENCFF522NPH (CD4-positive, alpha-beta T cell male adult (37 years)),

ENCFF133LMW (CD4-positive, alpha-beta T cell male adult (37 years)),

ENCFF658FXQ (CD4-positive, alpha-beta T cell male adult (21 years)),

ENCFF502EPK (CD4-positive, alpha-beta T cell female adult (33 years)),

ENCFF131RGB (CD4-positive, alpha-beta T cell male adult (21 years)).

We used the same ten DNase tracks with the Enformer model. In these cases, grelu.transforms.prediction_transforms.Aggregate and score.compute were used to compute accessibility across the distal enhancer.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

All data used in this study are publicly available at the following locations: ENCODE ENCFF093VXI (GM12878 DNase-seq read depth normalized bigwig) at https://www.encodeproject.org/files/ENCFF093VXI/; ENCODE ENCFF588OCA (GM12878 DNase-seq peaks) at https://www.encodeproject.org/files/ENCFF588OCA/; ENCODE ENCSR636HFF (hg38 blacklist) at https://www.encodeproject.org/annotations/ENCSR636HFF/; list of dsQTL and control SNPs at https://static-content.springer.com/esm/art%3A10.1038%2Fng.3331/MediaObjects/41588_2015_BFng3331_MOESM26_ESM.xlsx; JASPAR 2024 vertebrate nonredundant position frequency matrices at https://jaspar.elixir.no/download/data/2024/CORE/JASPAR2024_CORE_vertebrates_non-redundant_pfms_jaspar.txt; HOCOMOCO v12 PWMs at https://hocomoco12.autosome.org/downloads_v12; and Martyn et al. Variant-FlowFISH data (Supplementary Table 8) at https://www.biorxiv.org/content/biorxiv/early/2023/12/21/2023.12.20.572268/DC1/embed/media-1.xlsx.

Code availability

gReLU is available under an MIT license via GitHub at https://github.com/Genentech/gReLU and via Zenodo at https://doi.org/10.5281/zenodo.15627612 (ref. ³⁴). Documentation is available at https://genentech.github.io/gReLU/. Tutorials are available via GitHub at https://github.com/Genentech/gReLU/tree/main/docs/tutorials. The model zoo is available at https://wandb.ai/grelu/ and includes all the models described in this Brief Communication. All models are also available via Zenodo at https://doi.org/10.5281/zenodo.15603450 (ref. ³⁵). Code used to perform the analyses in this Brief Communication is available via GitHub at https://github.com/Genentech/gReLU-applications and via Zenodo at https://doi.org/10.5281/zenodo.15603450 (ref. ³⁵). All experiments were run on a single NVIDIA DGX-1 using a single NVIDIA A100 graphics processing unit, with Python v3.11.8, PyTorch v2.2.2, PyTorch Lightning v2.2.4, CUDA v12.1.0 and gReLU v1.0.2.

References

1. Eraslan, G., Avsec, Ž., Gagneur, J. & Theis, F. J. Deep learning: new computational modelling techniques for genomics. Nat. Rev. Genet. 20, 389–403 (2019).
2. Lal, A. Deciphering the regulatory syntax of genomic DNA with deep learning. J. Biosci. 47, 47 (2022).
3. Trevino, A. E. et al. Chromatin and gene-regulatory dynamics of the developing human cerebral cortex at single-cell resolution. Cell 184, 5053–5069 (2021).
4. Martyn, G. E. et al. Rewriting regulatory DNA to dissect and reprogram gene expression. Cell 188, 3349–3366.e23 (2025).
5. Taskiran, I. I. et al. Cell-type-directed design of synthetic enhancers. Nature 626, 212–220 (2024).
6. Gosai, S. J. et al. Machine-guided design of cell-type-targeting cis-regulatory elements. Nature 634, 1211–1220 (2024).
7. Avsec, Ž. et al. Base-resolution models of transcription-factor binding reveal soft motif syntax. Nat. Genet. 53, 354–366 (2021).
8. Whalen, S., Schreiber, J., Noble, W. S. & Pollard, K. S. Navigating the pitfalls of applying machine learning in genomics. Nat. Rev. Genet. 23, 169–181 (2022).
9. Schreiber, J. & Lu, Y. Y. Ledidi: designing genomic edits that induce functional activity. Preprint at bioRxiv https://doi.org/10.1101/2020.05.21.109686 (2020).
10. Shrikumar, A. et al. Technical note on Transcription Factor Motif Discovery from Importance Scores (TF-MoDISco) version 0.5.6.5. Preprint at https://arxiv.org/abs/1811.00416 (2018).
11. Chen, K. M., Cofer, E. M., Zhou, J. & Troyanskaya, O. G. Selene: a PyTorch-based deep learning library for sequence data. Nat. Methods 16, 315–318 (2019).
12. Avsec, Ž. et al. The Kipoi repository accelerates community exchange and reuse of predictive models for genomics. Nat. Biotechnol. 37, 592–600 (2019).
13. Kopp, W., Monti, R., Tamburrini, A., Ohler, U. & Akalin, A. Deep learning for genomics using Janggu. Nat. Commun. 11, 3488 (2020).
14. Klie, A. et al. Predictive analyses of regulatory sequences with EUGENe. Nat. Comput. Sci. 3, 946–956 (2023).
15. Avsec, Ž. et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat. Methods 18, 1196–1203 (2021).
16. Linder, J., Srivastava, D., Yuan, H., Agarwal, V. & Kelley, D. R. Predicting RNA-seq coverage from DNA sequence as a unifying model of gene regulation. Nat. Genet. 57, 949–961 (2025).
17. Novakovsky, G., Fornes, O., Saraswat, M., Mostafavi, S. & Wasserman, W. W. ExplaiNN: interpretable and transparent neural networks for genomics. Genome Biol. 24, 154 (2023).
18. Lal, A. et al. Decoding sequence determinants of gene expression in diverse cellular and disease states. Preprint at bioRxiv https://doi.org/10.1101/2024.10.09.617507 (2024).
19. Toneyan, S. & Koo, P. K. Interpreting cis-regulatory interactions from large-scale deep neural networks. Nat. Genet. 56, 2517–2527 (2024).
20. Pampari, A. et al. ChromBPNet: bias factorized, base-resolution deep learning models of chromatin accessibility reveal cis-regulatory sequence syntax, transcription factor footprints and regulatory variants. Preprint at bioRxiv https://doi.org/10.1101/2024.12.25.630221 (2025).
21. Vu, H. & Ernst, J. Universal annotation of the human genome through integration of over a thousand epigenomic datasets. Genome Biol. 23, 9 (2022).
22. Zhang, K. et al. A single-cell atlas of chromatin accessibility in the human genome. Cell 184, 5985–6001 (2021).
23. Degner, J. F. et al. DNase I sensitivity QTLs are a major determinant of human expression variation. Nature 482, 390–394 (2012).
24. Lee, D. et al. A method to predict the impact of regulatory variants from DNA sequence. Nat. Genet. 47, 955–961 (2015).
25. Bosshart, H. & Heinzelmann, M. THP-1 cells as a model for human monocytes. Ann. Transl. Med. 4, 438 (2016).
26. Carrasco-Padilla, C. et al. T cell activation and effector function in the human Jurkat T cell model. Methods Cell Biol. 178, 25–41 (2023).
27. Poli, M. et al. Hyena hierarchy: towards larger convolutional language models. In Proc. 40th International Conference on Machine Learning 28043–28078 (ICML, 2023).
28. Gu, A. & Dao, T. Mamba: linear-time sequence modeling with selective state spaces. Preprint at http://arxiv.org/abs/2312.00752 (2023).
29. Drusinsky, S., Whalen, S. & Pollard, K. S. Deep-learning prediction of gene expression from personal genomes. Preprint at bioRxiv https://doi.org/10.1101/2024.07.27.605449 (2024).
30. Amemiya, H. M., Kundaje, A. & Boyle, A. P. The ENCODE Blacklist: identification of problematic regions of the genome. Sci. Rep. 9, 9354 (2019).
31. Majdandzic, A., Rajesh, C. & Koo, P. K. Correcting gradient-based interpretations of deep neural networks for genomics. Genome Biol. 24, 109 (2023).
32. Hinrichs, A. S. et al. The UCSC genome browser database: update 2006. Nucleic Acids Res. 34, D590–D598 (2006).
33. Vorontsov, I. E. et al. HOCOMOCO in 2024: a rebuild of the curated collection of binding models for human and mouse transcription factors. Nucleic Acids Res. 52, D154–D163 (2024).
34. Lal, A. et al. Genentech/gReLU: v1.0.6. Zenodo https://doi.org/10.5281/zenodo.15627612 (2025).
35. Nair, S. Genentech/gReLU-applications: v1.0. Zenodo https://doi.org/10.5281/zenodo.15603450 (2025).

Acknowledgements

We thank D. Garfield, O. Fornes, K. Fletez-Brant, M. Hasan Celik, K. Gjoni, C. de Donno, J. Kancherla, J.-P. Fortin and J. Kageyama for their advice, feedback and contributions to the package.

Author information

These authors contributed equally: Laura Gunsalus, Surag Nair.

Authors and Affiliations

Biology Research | AI Development, gRED Computational Sciences, Genentech, South San Francisco, CA, USA
Avantika Lal, Laura Gunsalus, Surag Nair, Tommaso Biancalani & Gokcen Eraslan

Contributions

A.L. and G.E. built the package. L.G. and S.N. performed analyses. T.B. provided mentorship and supervision. A.L., G.E., L.G. and S.N. wrote the paper. All authors read and approved the paper.

Corresponding author

Correspondence to Gokcen Eraslan.

Ethics declarations

Competing interests

All authors are employees of Genentech, Inc.

Peer review

Peer review information

Nature Methods thanks Alexander Sasse and the other, anonymous reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Lin Tang, in collaboration with the Nature Methods team. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Interpretation and validation of the edited PPIF enhancer.

a) ISM sequence logos of a subsequence in the edited PPIF enhancer, showing the emergence of new motifs. b) Predicted change in the probability of chromatin accessibility across the PPIF enhancer in T-cell/Jurkat-like and Monocyte/THP-1 like tracks using an orthogonal binary classification model trained to predict ATAC-seq peaks from DNA sequence.

Supplementary information

Supplementary Information

Supplementary Notes 1 and 2, methods, Tables 1–3 and Figs. 1–8.

Reporting Summary

Peer Review File

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Lal, A., Gunsalus, L., Nair, S. et al. gReLU: a comprehensive framework for DNA sequence modeling and design. Nat Methods 22, 2253–2257 (2025). https://doi.org/10.1038/s41592-025-02868-z

Received: 17 October 2024
Accepted: 17 September 2025
Published: 15 October 2025
Version of record: 15 October 2025
Issue date: November 2025
DOI: https://doi.org/10.1038/s41592-025-02868-z