iAODE for benchmarking and continuum modeling of single-cell chromatin accessibility

Fu, Zeyu; Chen, Chunlin; Wang, Song; Wang, Junping; Chen, Shilei

doi:10.1038/s42003-026-09768-8

Article
Open access
Published: 03 March 2026

Communications Biology volume 9, Article number: 507 (2026)

Abstract

Single-cell chromatin accessibility profiles are extremely sparse but reflect continuous developmental trajectories. Most existing methods for dimensionality reduction and trajectory analysis optimize reconstruction error or cluster separation, without encoding temporal continuity in the model or providing metrics tailored to this objective. We introduce iAODE, a variational autoencoder that couples a zero-inflated negative binomial likelihood with a latent Neural ODE, low-weight Kullback–Leibler (KL) regularization, and an interpretable reconstruction bottleneck to learn generative, temporally continuous latent spaces. Around iAODE, we build a standardized AnnData benchmark of 248 single-cell Assay for Transposase-Accessible Chromatin using sequencing (scATAC-seq) and 123 single-cell RNA sequencing (scRNA-seq) datasets and a 20-metric evaluation suite that quantifies latent-space continuity, embedding quality, and clustering-coupling structure. Simulations confirm that the metrics respond smoothly to controlled continuity perturbations, and large-scale benchmarks show that the ODE, low-β, and bottleneck components synergistically improve trajectory structure and robustness over established generative and manifold-learning baselines.

Introduction

Single-cell Assay for Transposase-Accessible Chromatin using sequencing (scATAC-seq) characterizes open chromatin states at single-cell resolution, but the resulting data are typically extremely sparse, high-dimensional, zero-inflated, and strongly affected by batch effects, making it difficult to distinguish technical dropouts from truly inaccessible regions^1,2. Deep generative models, especially variational autoencoders (VAEs) and their extensions, provide a unified probabilistic framework for addressing these challenges. SCALE builds on a VAE and models binarized peak accessibility as Bernoulli observations³; PeakVI further introduces more flexible likelihoods and explicit batch-effect modeling to improve the resolution of biological heterogeneity⁴; subsequent work pushes modeling down to fragment counts to exploit count intensity more directly⁵. On this basis, SAILER penalizes information correlated with sequencing depth and batch to learn “biologically invariant” representations⁶; scMVP extends generative modeling to joint scRNA-seq and scATAC-seq in a multimodal setting⁷; GFETM integrates pre-trained genomic embeddings into a topic model to inject prior knowledge⁸.

Beyond VAEs, diffusion-based generative models provide another class of techniques. scDiffusion proposes diffusion models for conditional generation of high-quality single-cell data⁹; scButterfly performs single-cell cross-modality translation via dual-aligned variational autoencoders¹⁰. Complementing algorithmic approaches under extreme scATAC-seq sparsity, large-scale atlas resources provide reference accessibility landscapes across tissues and cell types¹¹. However, as sequencing scales increase and multi-omics measurements become more common, current methods, though successful for static latent representations, batch correction, and multimodal integration, largely treat developmental trajectories as a post-processing task: typically, trajectories are inferred after representation learning using pseudotime or graph abstraction algorithms applied to the learned embeddings^12,13. Systematic benchmarking further shows that different models emphasize different aspects, such as embedding quality, clustering performance, or downstream tasks¹⁴, making it difficult to answer the core question: “which type of latent structure is most favorable for recovering temporal continuity and regulatory trajectories?” Overall, the current landscape highlights two key gaps: (i) a lack of generative frameworks that encode temporal continuity directly in model structure, and (ii) a lack of unified evaluation and benchmarking specifically targeting continuum modeling.

Continuum modeling aims to recover the ordered evolution of cell states in latent space and support trajectory inference and time-related downstream analyses^15,16. For scATAC-seq, this involves smooth transitions of open-chromatin patterns across states and preservation of trajectory topology and temporal consistency under strong noise, motivating a multi-layer evaluation system. At the clustering level, metrics such as average silhouette width (ASW), Calinski-Harabasz index (CAL), and Davies-Bouldin index (DAV) assess separation and organization of cell states^17,18,19. At the embedding level, distance correlation (DC) evaluates preservation of pairwise distances; local and global quality metrics (QL, QG) derived from co-ranking matrices characterize neighborhood preservation at different scales; trustworthiness and continuity quantify local neighborhood distortion from complementary perspectives^20,21,22. At the trajectory level, the dynverse framework proposes geodesic distance correlation, Hamming-Ipsen-Mikhailov (HIM) distance, and branch F1 scores to compare inferred trajectories with references^23,24,25; partition-based graph abstraction (PAGA) abstracts cell graphs at the cluster level to identify backbones and branch structures²⁶; continuity evaluation also considers concordance between pseudotime orderings and known time labels, and whether key marker regions exhibit smooth changes along pseudotime^15,24. For scATAC-seq, extreme sparsity makes peak-gene link inference sensitive to noise¹⁹, requiring coherence examination at peak, gene, and multi-modal integration levels simultaneously^27,28.

Recent benchmarks emphasize that evaluations should integrate trajectory topology, differential accessibility, and multi-modal integration under a unified framework^14,18,29. However, these metrics are mostly used for post-hoc evaluation and do not directly constrain temporal dynamics at the model level, leaving room for explicit dynamical modeling. Unlike earlier metric suites designed primarily around discrete clustering or specific trajectory inference algorithms, this work focuses on continuous trajectories and smooth transitions in latent or embedding spaces rather than “continuity” of raw accessibility counts. Building on previous work, we define a 20-metric evaluation framework focused on latent continuum modeling, organized along three complementary dimensions: continuum (intrinsic latent-space continuity), embedding quality, and clustering/coupling, enabling systematic assessment of whether models can jointly capture generative distributions, trajectory structure, and interpretable regulatory modules.

To simultaneously capture temporal continuity and coordinated changes across multiple features within a deep generative model, we need explicit constraints on both latent dynamics and information structure—this is the central design motivation for iAODE. Neural ordinary differential equations (Neural ODEs) parameterize a vector field with a neural network and combine it with an ODE solver, assuming that latent states are continuously differentiable in time, which fits gradual processes such as development and differentiation. DeepVelo used Neural ODEs to fit smooth RNA kinetics in single-cell transcriptomics³⁰; related approaches have extrapolated future transcriptional states, reconstructed past trajectories^31,32, and modeled complex trajectories with stochastic perturbations³³. On the representation side, standard VAE Kullback–Leibler (KL) regularization encourages independence across latent dimensions, which can break the modular coupling structures that naturally arise in biological processes^34,35. Reducing the KL weight (low-β configuration) retains correlations between dimensions, allowing the model to capture coordinated directions corresponding to regulatory modules. Information bottleneck (IB) theory formalizes learning compressed yet informative representations³⁶; biolord and scInfoVAE instantiate this principle for single-cell data, introducing constraints that encourage latent variables to be both informative and well-structured^37,38.
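The Neural ODE idea described above can be sketched in a few lines: a function of the latent state defines a vector field, and a numerical solver integrates it over time. The example below is a minimal pure-Python illustration with a fixed-step fourth-order Runge-Kutta solver and a tiny fixed-weight `tanh` field standing in for a trained network. The solver choice, the weights `W`, and all function names are assumptions for illustration, not the implementation used in iAODE.

```python
import math


def rk4_integrate(field, z0, t0, t1, n_steps=100):
    """Integrate dz/dt = field(t, z) from t0 to t1 with fixed-step RK4."""
    h = (t1 - t0) / n_steps
    t, z = t0, list(z0)
    for _ in range(n_steps):
        k1 = field(t, z)
        k2 = field(t + h / 2, [zi + h / 2 * ki for zi, ki in zip(z, k1)])
        k3 = field(t + h / 2, [zi + h / 2 * ki for zi, ki in zip(z, k2)])
        k4 = field(t + h, [zi + h * ki for zi, ki in zip(z, k3)])
        z = [
            zi + h / 6 * (a + 2 * b + 2 * c + d)
            for zi, a, b, c, d in zip(z, k1, k2, k3, k4)
        ]
        t += h
    return z


# Hypothetical fixed weights; a real model would learn these by backpropagation.
W = [[0.0, -1.0], [1.0, 0.0]]


def toy_field(t, z):
    """One-layer tanh vector field f(z) = tanh(W z); time is unused here."""
    return [math.tanh(sum(w * zj for w, zj in zip(row, z))) for row in W]


# Propagate a 2-D latent state from pseudotime 0 to pseudotime 1.
z_end = rk4_integrate(toy_field, [0.5, 0.0], 0.0, 1.0)
print(z_end)
```

Because the solution is a continuous function of time, latent states at nearby pseudotimes are close to each other by construction, which is the property that makes this family of models attractive for gradual processes.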

Combining these three elements yields a continuous and coupled dynamical picture in latent space: the information bottleneck retains feature combinations relevant to trajectories; low-β KL allows these features to form correlated clusters; the Neural ODE learns a vector field over this space to model directions of coordinated change and their temporal evolution, making trajectories appear as paths along which multiple regulatory modules change in a coordinated fashion^30,37,38,39. This is consistent with classical results on β-VAE⁴⁰: when β > 1, stronger KL regularization trades reconstruction accuracy for more disentangled latent factors; when β is near 1 or slightly below, one can maintain good reconstruction while retaining correlation across dimensions. In this work, we mainly explore the low-penalty region β≤1 to study its effects on latent-space continuity and geometry, using high-β configurations only as controls to verify that they can impair both reconstruction and continuity. Our goal is not to seek fully factorized representations but to use low-β configurations to enhance coupling structures related to regulatory modules in latent space.
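For reference, the β-weighted objective discussed here can be written in its standard β-VAE form. This is the generic objective only, not the full iAODE loss; the symbols for the encoder $q_\phi$, decoder $p_\theta$, and prior $p(z)$ are conventional notation assumed for this sketch.

```latex
\mathcal{L}_{\beta}(x)
  = \mathbb{E}_{q_\phi(z \mid x)}\bigl[\log p_\theta(x \mid z)\bigr]
  - \beta \, D_{\mathrm{KL}}\bigl(q_\phi(z \mid x) \,\|\, p(z)\bigr)
```

Setting $\beta = 1$ recovers the usual evidence lower bound, $\beta > 1$ strengthens the pull toward the factorized prior, and $\beta < 1$ weakens it, which is the low-penalty regime explored in this work.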

Existing deep generative models for scATAC-seq excel at static representation, batch correction, and multimodal integration but do not explicitly model continuous developmental dynamics; continuity relies on post-hoc metrics. We ask: how can one learn a latent space that is both generative and endowed with intrinsic continuous trajectory structure under extreme scATAC-seq sparsity? We propose iAODE (interpretable Accessibility ODE-based variational autoencoder), built on a variational autoencoder (VAE) with a zero-inflated negative binomial (ZINB) likelihood that accounts for discrete counts and zero inflation. The encoder predicts a normalized pseudotime, and a Neural ODE vector field in latent space describes continuous chromatin-state evolution; low KL weights (β≤0.1) preserve coupling across latent dimensions associated with regulatory modules; an interpretable reconstruction (irecon) bottleneck acts as an information-bottleneck constraint encouraging interpretable accessibility patterns. We build a standardized evaluation environment with simulated topologies and multi-scale real datasets to validate continuum metrics, perform component/ablation experiments, and benchmark iAODE against deep generative models (PoissonVI, scVI, PeakVI, scTour) and traditional methods (principal component analysis (PCA), independent component analysis (ICA), Diffusion Maps). Our contributions are: (1) an ODE-VAE framework jointly modeling generative distributions and continuous trajectories for scATAC-seq; (2) standardized benchmark resources and an evaluation suite spanning simulated and multi-scale real data; (3) multi-dimensional benchmarks demonstrating the potential of ODE-based frameworks for trajectory continuity, embedding quality, and biological interpretability.
We focus on VAEs and their variants, including a high-β variant (HB, β = 5) and mutual-information/total-correlation regularizers (InfoVAE, β-TCVAE, DIP-VAE) that strengthen decoupling, to enable fair comparisons of latent-structure constraints under unified probabilistic objectives, in contrast to iAODE's low-β, coupling-emphasizing, dynamics-centric design. Systematic benchmarking of diffusion and non-VAE paradigms is left to future work building on this evaluation framework.
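To make the ZINB likelihood concrete, the following sketch evaluates its log-probability for a single count. It assumes the common mean/dispersion parameterization of the negative binomial (mean `mu`, inverse dispersion `theta`) and a zero-inflation probability `pi`; the parameter names and this particular parameterization are assumptions and may not match the decoder outputs used in iAODE.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under NB with mean mu and inverse dispersion theta."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_log_pmf(x, mu, theta, pi):
    """Log-probability of count x under a zero-inflated NB.

    With probability pi the observation is a structural zero; otherwise it
    is drawn from NB(mu, theta). Requires mu > 0, theta > 0, 0 < pi < 1.
    """
    nb = nb_log_pmf(x, mu, theta)
    if x == 0:
        # A zero can come from either the inflation component or the NB itself.
        return math.log(pi + (1.0 - pi) * math.exp(nb))
    return math.log(1.0 - pi) + nb


# Hypothetical parameters for one peak in one cell.
for count in (0, 1, 5):
    print(count, zinb_log_pmf(count, mu=2.0, theta=1.5, pi=0.3))
```

The mixture at zero is what lets the model separate technical dropouts (the `pi` component) from low but genuine accessibility (the NB component).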

Results

iAODE framework and standardized multi-modal benchmark resources

We summarize the iAODE framework together with the standardized benchmark resources and evaluation protocol that support all experiments. To address the high-dimensional sparsity and heterogeneous origins of scATAC-seq data, we couple local model training with an online resource platform (Fig. 1A): core algorithms are implemented in Python/PyTorch, while a static Next.js/TypeScript site serves curated data samples, metrics, and documentation. scATAC-seq and scRNA-seq data from multiple species and platforms are obtained from public repositories and uniformly processed from Cell Ranger filtered matrices (filtered_feature_bc_matrix.h5; Fig. 1B). The initial collection contains 434 scATAC-seq samples from 93 studies and 183 scRNA-seq samples from 20 studies; after stratifying by cell count into Tiny (<5k), Small (5k–10k), Medium (10k–20k), and Large (>20k) tiers and discarding Tiny data samples, the final benchmark comprises 248 scATAC-seq and 123 scRNA-seq datasets that span diverse real biological topologies. The iAODE architecture (Fig. 1C) unifies static feature extraction and dynamic trajectory inference in a single probabilistic model: term frequency–inverse document frequency (TF-IDF)-normalized, highly variable peak/gene-filtered matrices enter an encoder that produces low-dimensional latents and a normalized pseudotime, which parametrizes a latent Neural ODE; a dual-path decoder reconstructs counts and aligns static and ODE-propagated latents via reconstruction and consistency losses under low-β KL regularization. For each dataset, we apply a fixed protocol (Fig. 1E) with a 70%/15%/15% train/validation/test split and a 20-metric suite covering continuum quality, embedding fidelity, and clustering/coupling. An interactive “Dataset Browser” and “Continuity Explorer” (Fig. 1D) expose these standardized resources and metrics, providing a reusable platform for single-cell continuum modeling.
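The size tiers and the fixed split can be sketched as below. The text gives the tier ranges but not how boundary values are assigned, so the handling of exactly 10k and 20k cells here is an assumption, as are the function names, the seed, and the use of a simple random (unstratified) split.

```python
import random


def size_tier(n_cells):
    """Assign a dataset to a tier by cell count (boundary handling assumed)."""
    if n_cells < 5_000:
        return "Tiny"    # discarded from the final benchmark
    if n_cells < 10_000:
        return "Small"
    if n_cells <= 20_000:
        return "Medium"
    return "Large"


def split_indices(n_cells, seed=0, fractions=(0.70, 0.15, 0.15)):
    """Shuffle cell indices and cut them into train/validation/test parts."""
    idx = list(range(n_cells))
    random.Random(seed).shuffle(idx)
    n_train = round(fractions[0] * n_cells)
    n_val = round(fractions[1] * n_cells)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


train, val, test = split_indices(8_000)
print(size_tier(8_000), len(train), len(val), len(test))
```

Fixing the seed per dataset keeps the split reproducible across models, so that all methods in a comparison see the same cells in each partition.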

**Fig. 1: Overall architecture of the iAODE platform, benchmark resources, and evaluation protocol.**

Topological simulations and continuum metrics on real scATAC data, behavior and hyperparameter priors

To validate metric behavior under controlled conditions, we evaluate continuum metrics on simulated datasets with cyclic, linear, and branching topologies in two scenarios: a Continuum scenario with fine-tuned continuity (0.85–0.95; Fig. 2A,B) and a Clustering to Continuum scenario transitioning from discrete clusters to continuous manifolds (0.00–1.00; Fig. 2C,D). In the Continuum scenario, six core metrics—Spectral Decay, Anisotropy, Participation Ratio, Trajectory Directionality, Manifold Dimensionality, and Noise Resilience—exhibit nearly linear, monotonic relationships with continuity (Fig. 2B; Supplementary Table 1). Across topologies, goodness-of-fit is consistently high (typically R² > 0.8), supporting smooth, approximately linear responses to small continuity perturbations. Clustering (ASW, CAL, DAV), embedding (DC, QL, QG), and latent coupling (COR) metrics show consistent patterns (Supplementary Fig. 1A,B). In the Clustering to Continuum scenario, linearity remains strong across the full 0–1 interval (Fig. 2D; Supplementary Fig. 1C; Supplementary Table 1), and most correlations are highly significant (p < 0.001), justifying metric use in real-data comparisons.

**Fig. 2: Validation of continuum metrics on simulated topologies and hyperparameter prescreening on real scATAC-seq data.**

We then perform hyperparameter prescreening on 248 real scATAC-seq datasets using a baseline VAE without an ODE module to isolate regularizer effects. Friedman tests across KL weight β and irecon weight I_rec show significant differences (global p < 0.001; Fig. 2E; Supplementary Fig. 1D; Supplementary Table 2). Lower β (≤0.1) consistently outperforms higher-β configurations: in Medium datasets, β = 0.1 achieves advantage scores of +0.15 and +0.09 for Manifold Dimensionality and Spectral Decay versus β = 10. Increasing I_rec improves geometric metrics and especially Noise Resilience; in Small datasets, β = 0.01, I_rec = 10 versus β = 0.01, I_rec = 0.1 yields +0.87 for Noise Resilience (most p < 0.001). Notably, metric variance across β and I_rec is substantial in this no-ODE setting, indicating high sensitivity when latent dynamics are absent. These results motivate focusing on low-β and moderate-to-high I_rec ranges for iAODE, while final defaults are determined by the robustness analyses below.
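The Friedman test used here treats each dataset as a block and each hyperparameter configuration as a treatment, ranking configurations within every dataset. The sketch below computes the classical test statistic in pure Python without the tie correction; the function names and the toy table are hypothetical. In practice, `scipy.stats.friedmanchisquare` returns both the statistic and a p-value.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(table):
    """Friedman chi-square for a table of rows = datasets, columns = configurations."""
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


# Hypothetical metric values: 3 datasets (rows) x 3 configurations (columns).
scores = [
    [0.10, 0.20, 0.30],
    [0.15, 0.22, 0.41],
    [0.05, 0.19, 0.33],
]
print(friedman_statistic(scores))
```

The statistic is compared against a chi-square distribution with k − 1 degrees of freedom; identical rankings in every dataset give the maximum value n(k − 1).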

Component synergy and ablations in multi-scale scATAC data

After validating metric behavior and hyperparameter ranges, we investigate the roles of low-β KL (LB, β = 0.01), irecon bottleneck (IR), and ODE module via additive and pairwise ablation experiments across 165 Small, 68 Medium, and 15 Large scATAC-seq datasets (Fig. 3A,B; Supplementary Table 3). In additive experiments (Fig. 3A), full iAODE outperforms baseline VAE (Base) in Small by +0.57 in Overall intrinsic quality, +0.86 in Noise Resilience, and +0.60 in Trajectory Directionality, with gains of +0.30–0.52 in Manifold Dimensionality, Spectral Decay, Participation Ratio, and Anisotropy (all p < 0.001). Compared to single-component variants (LB, IR, or ODE), Full gains +0.44–0.47 in Overall quality over LB and IR but only +0.13 over ODE; for Trajectory Directionality, gains are +0.48–0.50 relative to LB/IR and +0.08 relative to ODE, suggesting ODE substantially improves trajectory directionality while LB and IR refine geometry and denoising. For embedding quality under UMAP, Full versus Base yields +0.30, +0.26, +0.15, and +0.24 in DC, QL, QG, and OV; versus LB/IR gains are +0.17–0.25 for DC, QL, OV and +0.11–0.12 for QG, while versus ODE differences shrink to +0.02–0.05 for DC, QG, OV and +0.11 for QL (Supplementary Fig. 2A; Supplementary Table 4). Clustering metrics show Full versus Base advantages of +0.15 in ASW, +1,028 in CAL, and +4.52 in COR; Full versus ODE reduces to +0.11, +659, and +1.39 (Supplementary Fig. 3A; Supplementary Table 5). Medium-group differences are slightly larger, and Large-group trends remain consistent.

**Fig. 3: Component addition and pairwise ablations across scales.**

Pairwise ablations clarify component synergy (Fig. 3B; Supplementary Fig. 3B). In Small, LB+IR (no ODE) degrades most strongly: relative to Full, Overall quality drops by 0.35, Noise Resilience by 0.65, Trajectory Directionality by 0.36, while CAL and COR decrease by 889 and 2.94. In contrast, LB+ODE and IR+ODE differ minimally from Full: Overall quality is only 0.03–0.04 lower, Trajectory Directionality 0.01–0.04 lower, and Noise Resilience/COR differ by only 0.04–0.09 and 0.55–0.60, supporting that ODE provides the global backbone for trajectories while LB and IR refine geometry and denoising. Embedding metrics mirror these patterns (Supplementary Fig. 2B): Full versus LB+IR shows +0.17, +0.15, +0.09, and +0.14 for DC, QL, QG, and OV under UMAP, whereas Full versus LB+ODE/IR+ODE yields near-zero differences in DC, QG, OV (within 0–0.01) and only +0.03–0.06 in QL. Cross-scale results show component synergy persists: Full versus Base gains in Overall quality increase from +0.57 (Small) to +0.58 (Medium) and +0.63 (Large); CAL and COR gains grow from +1,028/+4.52 (Small) to +1,394/+4.67 (Medium) and +4,958/+5.34 (Large). Overall, the three components form a cooperative system where low-β KL frees correlated directions, the irecon bottleneck filters noise, and the ODE encodes trajectories, with the full configuration consistently delivering balanced performance across continuum, embedding, and clustering metrics.

Continuum and embedding benchmarks against deep generative models across multi-scale scATAC data

We benchmark iAODE against VAE-family and deep generative baselines across Small (165), Medium (68), and Large (15) scATAC-seq datasets (Fig. 4A–C; Supplementary Table 6). Baselines include a high-β variant (HB, β = 5), disentangling VAEs (DIP, TC, INFO), PoissonVI, scVI, PeakVI, and the ODE-based scTour; Friedman tests and repeated-measures analysis of variance (RM-ANOVA) indicate significant overall differences (global p < 0.001). On continuum metrics, iAODE outperforms the baselines on most metrics. In Small, Full versus scVI yields +0.29 in Overall quality and +0.54 in Noise Resilience; versus PeakVI +0.21 and +0.38; versus scTour +0.13 and +0.27 (all p < 0.001); Manifold Dimensionality, Spectral Decay, Participation Ratio, Anisotropy, and Trajectory Directionality improve by +0.15–0.33 versus scVI and +0.05–0.13 versus scTour. Medium and Large show similar or amplified trends: in Medium, iAODE versus scVI yields +0.35 and +0.74 in Overall quality and Noise Resilience; in Large, differences grow to +0.53 and +0.81. HB, DIP, INFO, and TC sometimes achieve slightly higher Participation Ratios due to stronger decoupling but fall behind on Noise Resilience and Trajectory Directionality, consistent with strongly disentangled configurations favoring isolated factors over continuous trajectories.

**Fig. 4: Continuum and embedding benchmarks across multi-scale scATAC-seq datasets for deep generative models.**

Embedding quality under Uniform Manifold Approximation and Projection (UMAP) and t-distributed stochastic neighbor embedding (t-SNE) also favors iAODE or places it among the top models (Fig. 4A–C; Supplementary Table 6). In Small (UMAP), iAODE versus scVI yields +0.07, +0.05, +0.05, and +0.06 in DC, QL, QG, and OV; versus scTour +0.02, +0.10, +0.02, and +0.05. In Medium, differences versus scVI in DC(UMAP), QL(UMAP), and OV(UMAP) rise to +0.18, +0.09, and +0.12. In Large, iAODE falls slightly behind scTour in DC/QG (−0.02 to −0.04, mostly non-significant) but remains better in QL and OV, suggesting prioritization of trajectory and noise robustness at high complexity while scTour retains competitive distance consistency. Clustering/coupling metrics further highlight iAODE's advantages (Supplementary Fig. 3C; Supplementary Table 7). In Small versus scTour, iAODE gains +0.11 in ASW (p < 0.001), while DAV, for which lower is better, improves by −0.12 to −0.58 versus scTour/PoissonVI/PeakVI; CAL increases by +252 to +599; COR by +1.58 to +2.25 (all p < 0.001). In Medium, gaps are larger: ASW differences versus scVI/PeakVI/scTour are +0.06–0.12; CAL versus PoissonVI is +1,431.5; COR versus scVI/PeakVI is +3.62/+3.23. Importantly, iAODE does not dominate every metric: PoissonVI, scVI, and PeakVI sometimes show slightly larger Participation Ratios in Medium, and scTour occasionally exhibits comparable DC/QG in Large, reflecting trade-offs between local focus and compact continuity-oriented structures. Yet, combining continuum, embedding, and clustering dimensions, iAODE offers a more balanced and stable deep generative solution.

Continuum and clustering benchmarks against linear dimensionality reduction and manifold-learning methods

We compare iAODE with six traditional methods—linear reductions (principal component analysis (PCA), independent component analysis (ICA), factor analysis (FA), and non-negative matrix factorization (NMF)) and non-linear manifold tools (Diffusion Maps, Palantir)—across Small (165), Medium (68), and Large (15) scATAC-seq datasets (Fig. 5A–C; Supplementary Table 8). Friedman and RM-ANOVA tests show significant differences for most metrics (overall p < 0.001). On continuum metrics, iAODE outperforms linear methods consistently: in Small, Overall quality versus ICA/FA/NMF/PCA differs by +0.74, +0.74, +0.41, and +0.14; Noise Resilience versus PCA by +0.39 (all p < 0.001); Manifold Dimensionality and Participation Ratio versus ICA/FA/NMF/PCA differ by +0.55, +0.55, +0.28, +0.09 and +0.87, +0.87, +0.27, +0.06, respectively. Similar patterns hold in Medium and Large, reflecting limitations of linear projections in capturing complex non-linear manifolds. Comparisons with Diffusion Maps and Palantir highlight complementary strengths: for Trajectory Directionality, iAODE shows substantially higher values, with differences of +0.77, +0.73, +0.77 versus Diffusion Maps and +0.67, +0.72, +0.76 versus Palantir (Small, Medium, Large; all p < 0.001 or p < 0.01), consistent with graph-based methods tending to fragment trajectories whereas iAODE’s latent ODE provides smoother interpolation. Clustering/coupling metrics amplify these differences (Supplementary Fig. 3D; Supplementary Table 9): in Small, CAL versus Palantir and Diffusion Maps differ by +8,294 and +8,629; COR by +6.12 and +6.23; in Medium, these increase to CAL +9,088/+9,317 and COR +5.34/+5.63.

**Fig. 5: Continuum and embedding benchmarks against linear dimensionality reduction and manifold-learning methods.**

Conversely, Palantir and Diffusion Maps retain advantages on local compactness. In Small, ASW differences versus Palantir and Diffusion Maps are about −0.17 and −0.15; in Medium and Large, the ASW gap widens to between −0.18 and −0.28; UMAP/t-SNE QL values are also sometimes slightly lower for iAODE versus Palantir, while QG and OV generally favor iAODE (Supplementary Table 8). This pattern suggests graph-based methods prioritize local neighborhood preservation, whereas iAODE allows slightly looser local packing to achieve stronger global topological continuity and coupling, which is particularly beneficial for reconstructing continuous trajectories spanning multiple lineages. Overall, iAODE shows substantial, scale-robust improvements over linear and classical manifold-learning methods in continuum and coupling metrics, while Palantir and Diffusion Maps retain strengths for local compactness. In practice, graph-based methods can serve as complementary tools for fine-grained subpopulation structure, whereas ODE-based deep generative models are more suitable when global trajectory topology and regulatory interpretability are primary objectives.

Cross-modal transferability of iAODE components in the scRNA-seq modality and clustering-coupling performance

Although iAODE is designed for sparse scATAC-seq data, its latent ODE-based continuous dynamical modeling is applicable to scRNA-seq. We systematically evaluated baseline VAE (Base), single-component variants (LB, IR, ODE), and full configuration (Full: LB+IR+ODE) on 62 Small and 61 Medium scRNA-seq datasets (Fig. 6A,B; Supplementary Table 10); Friedman tests indicate significant overall differences (global p < 0.001). For continuum metrics, full iAODE achieves performance patterns in RNA closely mirroring those in ATAC: compared with Base, Small shows median gains of approximately +0.73 and +0.95 in Overall intrinsic quality and Noise Resilience, and +0.79 in Trajectory Directionality; in Medium these gains are +0.75, +0.95, and +0.80. Manifold Dimensionality, Spectral Decay, Participation Ratio, Anisotropy, and Core intrinsic quality improve monotonically as components are added: in Medium, Full versus Base yields increases of roughly +0.59, +0.41, +0.81, +0.71, and +0.63. Comparisons with single-component variants show that ODE contributes most to continuity: Full's additional gains over ODE-only in Noise Resilience are only +0.37 (Small) and +0.50 (Medium), much smaller than its gains over Base (+0.95 in both groups). For embedding quality, Full outperforms Base on UMAP/t-SNE metrics (DC, QL, QG, OV) in both groups: in Medium, UMAP-based DC, QL, QG, and OV improve by +0.46, +0.36, +0.21, and +0.34 versus Base. Relative to ODE-only, QL and OV increase by +0.17 and +0.07, whereas DC is nearly unchanged, indicating that the ODE captures most global distance structure while LB and IR improve local neighborhood geometry.

**Fig. 6: Component addition and ablations in the scRNA-seq modality: continuity, embedding quality, and clustering-coupling.**

For clustering and coupling metrics, iAODE in RNA shows the same pattern of simultaneously improved cluster separation and continuity as in ATAC (Fig. 6C). Using Base as reference, Small yields gains of roughly +0.21, −0.92, and +2,714 in ASW, DAV, and CAL; for Medium these reach +0.24, −1.09, and +4,684 (all p < 0.001). Importantly, latent coupling (COR) rises markedly: compared with Base, COR increases by +6.38 and +6.48 in Small and Medium; even relative to ODE-only, gains of +2.90 and +3.03 remain, showing iAODE enhances cluster-level separation while strengthening coordinated variation. Pairwise ablations clarify cooperative roles (Supplementary Fig. 4A–D; Supplementary Table 11): LB+IR (without ODE) degrades most strongly in continuity and clustering; in Medium, Full versus LB+IR yields gains of +0.42, +0.47, and +0.70 in Overall quality, Trajectory Directionality, and Noise Resilience, and differences of approximately +4,291, −0.40, +0.11, and +3.99 in CAL, DAV, ASW, and COR. Overall, in scRNA-seq the three-way combination yields performance patterns highly consistent with ATAC: Full shows stable, significant gains over Base and single-component variants while maintaining or improving clustering geometry, supporting iAODE as a continuous modeling framework that transfers across single-cell modalities.

Robustness and deployability of iAODE across hyperparameters, encoder architectures, and computational cost

Having established iAODE’s relative performance across scales, baselines, and modalities, we assess its sensitivity to hyperparameters, encoder architectures, and computational requirements. iAODE is most sensitive to KL divergence weight β, while showing relaxed tolerance around reconstruction weight I_rec and mixing coefficient α (Fig. 7A-C; Supplementary Table 12; Supplementary Fig. 5A–C; Supplementary Table 13). In Medium (n = 68), increasing β from 0.1 to 10.0 causes monotonic decreases in Overall intrinsic quality, Noise Resilience, Manifold Dimensionality, Anisotropy, and Trajectory Directionality; at β = 10.0, Overall quality and Noise Resilience drop by 0.29 and 0.61, CAL is markedly reduced, DAV increases, and COR weakens (all p < 0.001), indicating overly strong KL regularization substantially impairs continuity. In contrast, varying I_rec has milder effects: reducing I_rec from 10.0 to 0.1 mainly lowers Noise Resilience and Overall quality (Medium declines of 0.42 and 0.15), while core geometric metrics remain stable. Mixing coefficient α is relatively robust for continuum metrics but impacts clustering: when α decreases from 0.75 to 0.25, Medium shows modest drops of 0.14 in Overall quality, but CAL decreases by 3,200 (p < 0.001), much larger than maximum CAL changes induced by I_rec tuning (800). Small (n = 165) exhibits consistent patterns. Based on these analyses, we use β = 0.1, I_rec = 10.0, and α = 0.75 in main benchmarks.
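The defaults chosen above and a one-at-a-time sensitivity sweep can be captured in a small configuration object. The class name, field names, and sweep helper below are hypothetical and do not reflect the actual iAODE API; the grid lists only the endpoint values quoted in the text, since the intermediate grid points are not given here.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class IAODEDefaults:
    """Main-benchmark settings as reported in the text (field names assumed)."""
    beta: float = 0.1            # KL divergence weight
    irecon_weight: float = 10.0  # I_rec, weight of the irecon bottleneck term
    alpha: float = 0.75          # mixing coefficient


# Endpoints of the ranges discussed in the sensitivity analysis.
ENDPOINT_GRID = {
    "beta": (0.1, 10.0),
    "irecon_weight": (0.1, 10.0),
    "alpha": (0.25, 0.75),
}


def one_at_a_time(defaults, grid):
    """Yield configurations that vary a single field while holding the rest fixed."""
    for name, values in grid.items():
        for value in values:
            yield replace(defaults, **{name: value})


for cfg in one_at_a_time(IAODEDefaults(), ENDPOINT_GRID):
    print(cfg)
```

Varying one hyperparameter at a time keeps the comparison interpretable, at the cost of not probing interactions between β, I_rec, and α.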

**Fig. 7: Hyperparameter sensitivity, encoder architectures, and computational scalability of iAODE.**

At the encoder level, Transformer-based architectures outperform multilayer perceptrons (MLPs) for most continuity and clustering metrics, at the cost of mildly lower distance correlation (DC) (Fig. 7D; Supplementary Fig. 5D; Supplementary Tables 12 and 13). In Medium, Transformers yield gains of 0.073, 0.060, 0.171, 0.129, 0.102, and 0.102 for Manifold Dimensionality, Spectral Decay, Anisotropy, Trajectory Directionality, Noise Resilience, and Overall quality (all p < 0.001); for clustering, CAL increases by 4,979.3, DAV decreases by 0.336, ASW increases by 0.107, and COR rises by 1.274 (all p < 0.001). However, DC(UMAP) and DC(t-SNE) for Transformers are lower than MLPs by 0.031 and 0.074 (p < 0.001), while QL improves by 0.062/0.065; QG and OV differences are negligible.

Regarding computational cost (Fig. 7E), we measured per-epoch wall-clock time and peak GPU memory as functions of cell number and fitted linear models. Because Large datasets are few and unevenly distributed, the fits use only Small and Medium datasets (up to ~10⁵ cells), where iAODE, scVI, PeakVI, and PoissonVI all show approximately linear scaling and iAODE incurs a modest constant-factor overhead that remains compatible with a single 24 GB GPU. Extrapolating these Small+Medium fits to a hypothetical 10⁶-cell dataset yields predicted per-epoch times of 250.5 s (iAODE), 16.4 s (scVI), 16.0 s (PeakVI), and 17.0 s (PoissonVI), and peak GPU memories of 64.4 GB, 27.3 GB, 17.1 GB, and 14.9 GB, respectively. These extrapolated values should be viewed as upper bounds under our current architecture and fixed-batch protocol. This applies particularly to the 64.4 GB estimate for iAODE, which would require memory-saving strategies such as gradient checkpointing or multi-GPU training. In the empirically evaluated regime (≲10⁵ cells), by contrast, iAODE is readily trainable on a single workstation-class GPU.

Multi-scale trajectory reconstruction and biological interpretability of latent dynamics

While the quantitative metrics above systematically compare models on continuum modeling, intuitive trajectory visualizations and biological interpretability remain crucial for assessing the practical utility of deep generative models. We therefore first visually compare iAODE and several representative baselines across three cell scales (approximately 5k, 10k, and 20k cells) on benchmark datasets (Fig. 8A). In contrast to the fragmented branches, over-clustered structures, or poorly oriented trajectories that sometimes appear in scVI, PoissonVI, and PeakVI, the latent embeddings produced by iAODE at all three scales form smooth, continuous “cell flows” with consistent directions, closely recovering the expected developmental backbone and branches.

**Fig. 8: Multi-scale trajectory visualization and biological interpretability of iAODE latent dynamics.**

To investigate biological meaning in latent space, we next align high-activation regions of specific latent dimensions with the spatial distributions of key marker genes and regulatory elements in a mouse brain dataset and a human peripheral blood mononuclear cell (PBMC) dataset (Fig. 8B). In the mouse brain data, high-valued regions in certain latent dimensions strongly co-localize with the distributions of canonical neurogenesis markers such as the Notch1 gene body and the Neurod2 promoter. In the human PBMC data, promoter regions of immunologically relevant genes such as BICDL1 are enriched along corresponding latent directions. This spatial concordance between “latent axes” and marker features suggests that iAODE’s latent dimensions are not merely abstract statistical factors but align with concrete transcriptional regulatory modules and epigenetic states.

Finally, we perform gene ontology (GO) enrichment analysis on genes associated with high-weight features from iAODE to assess higher-level functional relevance (Fig. 8C). In the mouse brain dataset, genes linked to prominent latent features are significantly enriched for terms such as “central nervous system development” and “neurogenesis”, consistent with tissue origin; in human PBMC, enriched terms include “lymphocyte activation” and “hematopoietic process”, matching immune and hematopoietic context. Overall, these analyses indicate that, in addition to recovering global trajectory topology and continuity, iAODE’s latent representations are biologically interpretable across marker genes, regulatory regions, and functional pathways, providing a foundation for integrating regulatory network inference and causal intervention design. We stress that the mouse brain and human PBMC analyses are computational case studies on well-characterized differentiation systems used to check whether iAODE produces plausible results under established marker and trajectory frameworks. Stronger biological validation—such as systematic comparisons with lineage tracing, time-course, and perturbation experiments—remains an important direction for future work.

Discussion

iAODE is a deep generative framework for single-cell chromatin accessibility that models continuity as a structural prior in latent space. It combines a Neural ODE vector field with a low-β KL setting and an interpretable reconstruction (irecon) bottleneck. Across simulated topologies and a standardized library of real scATAC-seq and scRNA-seq datasets, iAODE improves continuum geometry metrics, embedding quality, and clustering/coupling scores relative to VAE-family baselines and time-regularized models.

A key implication is that continuity can be enforced during representation learning rather than recovered only by downstream pseudotime or graph-based procedures⁴¹,⁴². In ablations, the ODE component primarily supports global trajectory directionality, while low-β and irecon improve denoising and local geometry, and the full combination yields the best balance.

Chromatin accessibility is often shaped by gradual regulatory remodeling⁴³,⁴⁴, making smooth latent dynamics a natural inductive bias for scATAC-seq. iAODE also transfers to scRNA-seq as a model of dominant trends, although RNA expression can include sharper transient changes that motivate alternative kinetic priors in some settings⁴⁵.

iAODE is not intended to replace all existing approaches. Topic and graph-based methods can be strong for discrete state discovery and local compactness, and diffusion-based generative models are increasingly effective for generation and augmentation, but they are not yet standardized for continuity-focused evaluation under a unified protocol. Our results suggest that when the primary objective is coherent long-range trajectory structure under strong sparsity, explicitly parameterizing latent dynamics (via an ODE) and relaxing overly factorizing regularization can provide a robust inductive bias.

Beyond the model, we provide a multi-metric evaluation suite spanning intrinsic continuum geometry, embedding fidelity, and clustering/coupling, together with a standardized benchmark collection to promote reproducible comparisons. These experiments clarify trade-offs between local compactness and global topology, and highlight iAODE as a practical option when coherent long-range structure is a priority.

The benchmarking component is an equally important contribution. By releasing a standardized multi-scale dataset collection and a multi-metric evaluation suite that separates intrinsic continuum geometry, embedding fidelity, and clustering/coupling behavior, we aim to reduce ambiguity in future comparisons. In practice, the results show clear trade-offs (e.g., local compactness versus global directionality) that can be obscured when a single metric or a single downstream task is used as a proxy.

Limitations remain. Performance depends on hyperparameters (notably β, I_rec, and the ODE mixing weight α), and a single smooth vector field may over-smooth abrupt or multi-phase processes. Some application settings may require explicit covariate modeling beyond our default protocol, and biological validation remains indirect when only snapshot data are available. Future work could incorporate stochastic or piecewise dynamics, extend the framework to joint multi-modal (RNA+ATAC) trajectory inference, and use time-course or lineage-tracing data to directly assess dynamical fidelity beyond marker-based interpretation⁴⁶.

Methods

iAODE latent ODE-VAE architecture and irecon bottleneck design

The encoder network E_ϕ transforms preprocessed scATAC-seq input vectors ${{\bf{x}}}\in {{\mathbb{R}}}^{d}$ (e.g., TF-IDF features) derived from raw count vectors x_raw to latent distribution parameters through a hierarchical neural network, where d is the number of input features (peaks for scATAC-seq):

$${{\bf{h}}}=\,{{\rm{ReLU}}}\,({{{\bf{W}}}}_{2}\,\,{{\rm{ReLU}}}\,({{{\bf{W}}}}_{1}{{\bf{x}}}+{{{\bf{b}}}}_{1})+{{{\bf{b}}}}_{2}).$$

(1)

The latent distribution parameters are computed as

$$\left[{{{\boldsymbol{\mu }}}}_{z},\log {{{\boldsymbol{\sigma }}}}_{z}^{2}\right]={{{\bf{W}}}}_{3}{{\bf{h}}}+{{{\bf{b}}}}_{3}.$$

(2)

Latent variables are sampled using the reparameterization trick:

$${{\bf{z}}}={{{\boldsymbol{\mu }}}}_{z}+{{{\boldsymbol{\sigma }}}}_{z}\odot {{\boldsymbol{\epsilon }}},\,{{\boldsymbol{\epsilon }}} \sim {{\mathcal{N}}}({{\bf{0}}},{{\bf{I}}}).$$

(3)

To capture temporal dynamics in chromatin accessibility, the encoder additionally predicts a time parameter

$$t=\sigma ({{{\bf{W}}}}_{t}{{\bf{h}}}+{{{\bf{b}}}}_{t}),$$

(4)

where σ(⋅) is the logistic sigmoid, which constrains t to the interval (0, 1); t represents the relative position of a cell along the accessibility trajectory.
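As an illustrative sketch (not the released implementation), Eqs. (1)–(4) can be written as a NumPy forward pass. The layer sizes, the random initialization, and the `params` dictionary are hypothetical, and the code uses the row-vector convention `x @ W`, which is the transpose of the `W x` form in the equations.

```python
import numpy as np

def relu(a):
    return np.maximum(a, 0.0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def encode(x, params, rng):
    """Forward pass for Eqs. (1)-(4) on a batch x of shape (n, d)."""
    h = relu(relu(x @ params["W1"] + params["b1"]) @ params["W2"] + params["b2"])  # Eq. (1)
    stats = h @ params["W3"] + params["b3"]                                        # Eq. (2)
    mu, logvar = np.split(stats, 2, axis=1)
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps                                            # Eq. (3)
    t = sigmoid(h @ params["Wt"] + params["bt"]).ravel()                           # Eq. (4)
    return z, mu, logvar, t

# Illustrative sizes only (assumptions, not the paper's settings).
rng = np.random.default_rng(0)
n, d, d_h, d_z = 8, 50, 16, 4
params = {
    "W1": rng.normal(0, 0.1, (d, d_h)), "b1": np.zeros(d_h),
    "W2": rng.normal(0, 0.1, (d_h, d_h)), "b2": np.zeros(d_h),
    "W3": rng.normal(0, 0.1, (d_h, 2 * d_z)), "b3": np.zeros(2 * d_z),
    "Wt": rng.normal(0, 0.1, (d_h, 1)), "bt": np.zeros(1),
}
x = rng.random((n, d))
z, mu, logvar, t = encode(x, params, rng)
```

Each cell receives one latent sample `z` and one scalar time `t`; the sigmoid keeps every `t` strictly between 0 and 1.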

The temporal evolution of chromatin accessibility states is modeled with a Neural ODE that learns a velocity field in latent space:

$$\frac{d{{\bf{z}}}}{dt}={f}_{\varphi }(t,{{\bf{z}}})={{{\bf{W}}}}_{f}^{(2)}\,\,{{\rm{ELU}}}\,({{{\bf{W}}}}_{f}^{(1)}{{\bf{z}}}+{{{\bf{b}}}}_{f}^{(1)})+{{{\bf{b}}}}_{f}^{(2)}.$$

(5)

This enables continuous modeling of chromatin-state transitions and regulatory dynamics inherent in scATAC-seq data.
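The velocity field in Eq. (5) can be sketched in NumPy as follows. Two choices here are assumptions made for illustration: a fixed-step fourth-order Runge–Kutta scheme stands in for the adaptive-step solver used for training, and each cell is integrated from time 0 to its own predicted time t. The parameter dictionary `p` and the layer sizes are hypothetical.

```python
import numpy as np

def elu(a):
    # np.minimum keeps exp() from overflowing on large positive inputs
    return np.where(a > 0, a, np.exp(np.minimum(a, 0.0)) - 1.0)

def velocity(t, z, p):
    """Eq. (5): two-layer field; t is accepted but unused, as in the equation."""
    return elu(z @ p["W1"] + p["b1"]) @ p["W2"] + p["b2"]

def integrate_rk4(z0, t_end, p, n_steps=20):
    """Integrate dz/dt = f(t, z) from 0 to t_end (one end time per cell)."""
    z = z0.copy()
    h = (t_end / n_steps)[:, None]          # per-cell step size, shape (n, 1)
    t = np.zeros_like(t_end)
    for _ in range(n_steps):
        k1 = velocity(t, z, p)
        k2 = velocity(t + h[:, 0] / 2, z + h / 2 * k1, p)
        k3 = velocity(t + h[:, 0] / 2, z + h / 2 * k2, p)
        k4 = velocity(t + h[:, 0], z + h * k3, p)
        z = z + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t = t + h[:, 0]
    return z

rng = np.random.default_rng(0)
d_z, d_f = 4, 8                              # illustrative sizes
p = {"W1": rng.normal(0, 0.3, (d_z, d_f)), "b1": np.zeros(d_f),
     "W2": rng.normal(0, 0.3, (d_f, d_z)), "b2": np.zeros(d_z)}
z0 = rng.standard_normal((5, d_z))
t_end = rng.random(5)
z_ode = integrate_rk4(z0, t_end, p)
```

A cell with t = 0 is left unchanged, so the distance between `z0` and `z_ode` grows with the predicted time and the local speed of the field.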

To enforce biologically meaningful and structured representations, we implement an interpretable reconstruction (irecon) module that creates a compressed bottleneck in latent space:

$${{{\bf{z}}}}_{I}=\,{{\rm{Linear}}}\,({{\bf{z}}};{{{\bf{W}}}}_{I},{{{\bf{b}}}}_{I}),\,\widehat{{{\bf{z}}}}=\,{{\rm{Linear}}}\,({{{\bf{z}}}}_{I};{{{\bf{W}}}}_{O},{{{\bf{b}}}}_{O}),$$

(6)

where ${{{\bf{z}}}}_{I}\in {{\mathbb{R}}}^{{d}_{I}}$ is the compressed irecon representation, d_z is the latent-space dimension, and d_I < d_z, so that z_I captures essential chromatin-accessibility patterns while reducing redundancy.

The decoder network D_ψ reconstructs scATAC-seq counts from latent representations. Given the discrete and sparse nature of accessibility measurements, we parameterize the reconstruction with a softmax-normalized output:

$$\widehat{{{\bf{x}}}}=\,{{\rm{Softmax}}}\,\,\left({{{\bf{W}}}}_{d}^{(2)}\,\,{{\rm{ReLU}}}\,({{{\bf{W}}}}_{d}^{(1)}{{\bf{z}}}+{{{\bf{b}}}}_{d}^{(1)})+{{{\bf{b}}}}_{d}^{(2)}\right).$$

(7)

For zero-inflated scenarios, the decoder additionally predicts dropout probabilities

$${{\boldsymbol{\pi }}}={{{\bf{W}}}}_{\pi }{{{\bf{h}}}}_{d}+{{{\bf{b}}}}_{\pi },$$

(8)

where ${{{\bf{h}}}}_{d}=\,{{\rm{ReLU}}}\,({{{\bf{W}}}}_{d}^{(1)}{{\bf{z}}}+{{{\bf{b}}}}_{d}^{(1)})$ is the decoder hidden state.

The overall iAODE-VAE objective combines multiple loss components:

$${{{\mathcal{L}}}}_{{{\rm{total}}}}={{{\mathcal{L}}}}_{{{\rm{recon}}}}+{{{\mathcal{L}}}}_{{{\rm{irecon}}}}+{{{\mathcal{L}}}}_{{{\rm{ODE}}}}+{{{\mathcal{L}}}}_{{{\rm{KL}}}}.$$

(9)

For scATAC-seq counts we use a ZINB reconstruction loss:

$${{{\mathcal{L}}}}_{{{\rm{recon}}}}=-{{\mathbb{E}}}_{q({{\bf{z}}}| {{\bf{x}}})}\left[\log {p}_{{{\rm{ZINB}}}}({{\bf{x}}}| \widehat{{{\bf{x}}}},{{\boldsymbol{\theta }}},{{\boldsymbol{\pi }}})\right],$$

(10)

where $\widehat{{{\bf{x}}}}=\,{{\rm{Softmax}}}\,({D}_{\psi }({{\bf{z}}}))\cdot \parallel {{{\bf{x}}}}_{{{\rm{raw}}}}{\parallel }_{1}$ ensures library-size normalization with $\parallel {{{\bf{x}}}}_{{{\rm{raw}}}}{\parallel }_{1}={\sum }_{j}{x}_{{{\rm{raw}}},j}$ being the cell library size, θ denotes peak-specific dispersion parameters, and π models technical dropouts.
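A minimal NumPy sketch of the ZINB log-likelihood in Eq. (10) is given below. It assumes that θ acts as an inverse-dispersion parameter (the common scVI-style parameterization) and that π has already been mapped to a probability in [0, 1); both are assumptions of this sketch, as are the function names and toy inputs.

```python
import math
import numpy as np

_lgamma = np.vectorize(math.lgamma, otypes=[float])

def zinb_log_prob(x, mu, theta, pi, eps=1e-8):
    """Elementwise log p_ZINB(x | mu, theta, pi)."""
    log_theta_mu = np.log(theta + mu + eps)
    # Negative binomial log-pmf with mean mu and inverse dispersion theta.
    nb = (_lgamma(x + theta) - _lgamma(theta) - _lgamma(x + 1.0)
          + theta * (np.log(theta + eps) - log_theta_mu)
          + x * (np.log(mu + eps) - log_theta_mu))
    nb_zero = theta * (np.log(theta + eps) - log_theta_mu)    # log NB(0)
    log_zero = np.log(pi + (1.0 - pi) * np.exp(nb_zero) + eps)
    log_nonzero = np.log(1.0 - pi + eps) + nb
    return np.where(x < 0.5, log_zero, log_nonzero)

def library_scaled_mean(logits, x_raw):
    """Softmax over peaks, rescaled by each cell's library size ||x_raw||_1."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    return probs * x_raw.sum(axis=1, keepdims=True)

# Toy inputs (synthetic, for illustration only).
rng = np.random.default_rng(0)
x_raw = rng.poisson(0.3, size=(6, 20)).astype(float)
mu_hat = library_scaled_mean(rng.standard_normal((6, 20)), x_raw)
theta = np.full(20, 2.0)                  # peak-specific dispersion
pi = np.full((6, 20), 0.3)                # dropout probabilities
loss_recon = -zinb_log_prob(x_raw, mu_hat, theta, pi).sum(axis=1).mean()
```

Because the softmax output sums to one per cell, the predicted means in each row of `mu_hat` sum to that cell's library size.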

The irecon module defines a compressed reconstruction loss

$${{{\mathcal{L}}}}_{{{\rm{irecon}}}}=-{{\mathbb{E}}}_{q({{{\bf{z}}}}_{I}| {{\bf{z}}})}\left[\log {p}_{{{\rm{ZINB}}}}({{\bf{x}}}| {\widehat{{{\bf{x}}}}}_{I},{{{\boldsymbol{\theta }}}}_{I},{{{\boldsymbol{\pi }}}}_{I})\right],$$

(11)

where ${{{\bf{z}}}}_{I}\in {{\mathbb{R}}}^{{d}_{I}}$ with d_I ≪ d_z enforces an information-constrained latent space that encourages learning of essential accessibility patterns. Here, θ_I and π_I play the same roles as θ and π in the main decoder, but are predicted from the compressed representation z_I and thus capture bottleneck-specific dispersion and dropout patterns. Analogously, ${\widehat{{{\bf{x}}}}}_{I}$ denotes the ZINB mean reconstructed from the compressed representation z_I and is rescaled by the same library size ∥x_raw∥₁ as $\widehat{{{\bf{x}}}}$, so that both reconstruction terms operate on a comparable count scale.

Temporal consistency is encouraged via an ODE loss:

$${{{\mathcal{L}}}}_{{{\rm{ODE}}}}=\parallel {{\bf{z}}}-{{{\bf{z}}}}_{{{\rm{ODE}}}}{\parallel }_{2}^{2},$$

(12)

where z_ODE is the evolved latent state produced by integrating $\frac{d{{\bf{z}}}}{dt}={f}_{\varphi }(t,{{\bf{z}}})$.

The standard KL divergence matches the posterior to the prior:

$${{{\mathcal{L}}}}_{{{\rm{KL}}}}={{\rm{KL}}}\left(q({{\bf{z}}}| {{\bf{x}}})\,\parallel \,p({{\bf{z}}})\right).$$

(13)

For trajectory inference in scATAC-seq data, we construct transition matrices from ODE-derived velocity fields:

$${{\bf{v}}}={f}_{\varphi }(t,{{\bf{z}}}),\,{{{\bf{z}}}}_{{{\rm{future}}}}={{\bf{z}}}+\epsilon \cdot {{\bf{v}}}.$$

(14)

Transition probabilities use adaptive Gaussian kernels with median-based bandwidth:

$${P}_{ij}=\frac{\exp (-\parallel {{{\bf{z}}}}_{i}-{{{\bf{z}}}}_{{{\rm{future}}},j}{\parallel }^{2}/2{\sigma }^{2})}{{\sum }_{k}\exp (-\parallel {{{\bf{z}}}}_{i}-{{{\bf{z}}}}_{{{\rm{future}}},k}{\parallel }^{2}/2{\sigma }^{2})}.$$

(15)

The final representation linearly combines static and dynamic latents:

$${{{\bf{z}}}}_{{{\rm{final}}}}=(1-\alpha )\,{{\bf{z}}}+\alpha \,{{{\bf{z}}}}_{{{\rm{ODE}}}},$$

(16)

where α ∈ [0, 1] controls the strength of ODE dynamics, enabling joint modeling of chromatin landscapes and regulatory dynamics.
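Eqs. (14)–(16) can be sketched in NumPy as below. The bandwidth rule (σ set to the median of the distances between current and projected states) is one plausible reading of "median-based bandwidth" and is an assumption, as are the function names and the default step size.

```python
import numpy as np

def transition_matrix(z, v, step=0.1):
    """Row-stochastic P from Eqs. (14)-(15); `step` plays the role of epsilon."""
    z_future = z + step * v                                    # Eq. (14)
    diff = z[:, None, :] - z_future[None, :, :]
    sq = (diff ** 2).sum(axis=2)                               # ||z_i - z_future,j||^2
    sigma = np.median(np.sqrt(sq))                             # assumed bandwidth rule
    logits = -sq / (2.0 * sigma ** 2)
    logits -= logits.max(axis=1, keepdims=True)                # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)                    # Eq. (15)

def mix_latents(z, z_ode, alpha=0.75):
    """Eq. (16): convex combination of static and ODE-evolved latents."""
    return (1.0 - alpha) * z + alpha * z_ode

rng = np.random.default_rng(0)
z = rng.standard_normal((6, 3))           # toy latent states
v = rng.standard_normal((6, 3))           # toy velocities f(t, z)
P = transition_matrix(z, v)
z_final = mix_latents(z, z + 0.1 * v)
```

Setting α = 0 returns the static latent unchanged and α = 1 returns the ODE-evolved state, so α interpolates between the two.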

Standardized benchmark library construction and quality control

To ensure reproducibility and facilitate systematic benchmarking, we constructed a standardized library of scATAC-seq and scRNA-seq datasets from publicly available sources in the Gene Expression Omnibus (GEO). All datasets were derived from Cell Ranger standard outputs: for scATAC-seq we collected filtered_peak_bc_matrix.h5 files containing fragment counts per cell for consensus peak regions, and for scRNA-seq we collected filtered_feature_bc_matrix.h5 files containing unique molecular identifier (UMI) counts per gene. Because Cell Ranger applies standard quality-control (QC) filters during matrix generation—removing low-quality cells based on total counts, fraction of reads in peaks or genes, and other technical metrics—these filtered matrices represent data that have already passed instrument- and protocol-specific QC thresholds. We therefore use these outputs directly without imposing additional uniform filtering criteria, recognizing that optimal QC varies across experimental protocols, tissue types, and sequencing depths.

After loading each filtered matrix, we categorized datasets by cell count N into four size groups: Tiny (N < 5,000), Small (5,000 ≤ N < 10,000), Medium (10,000 ≤ N < 20,000), and Large (N ≥ 20,000). Our initial collection comprised 434 scATAC-seq datasets distributed as 186 Tiny, 165 Small, 68 Medium, and 15 Large, and 183 scRNA-seq datasets. To ensure statistical robustness for manifold geometry estimation and continuum evaluation, we excluded Tiny-scale datasets and, for scRNA-seq, focused on Small and Medium scales due to limited availability of large-scale RNA data. After these filters, our final standardized benchmark library comprises 248 scATAC-seq datasets (165 Small, 68 Medium, 15 Large) and 123 scRNA-seq datasets (62 Small, 61 Medium). All datasets were converted to AnnData objects with consistent structure.
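The size grouping and the resulting scATAC-seq counts can be checked with a few lines of Python. The function name is hypothetical; the thresholds and counts are those stated above.

```python
def size_category(n_cells):
    """Assign a dataset to a size group by its cell count N."""
    if n_cells < 5_000:
        return "Tiny"
    if n_cells < 10_000:
        return "Small"
    if n_cells < 20_000:
        return "Medium"
    return "Large"

# scATAC-seq counts per group as reported in the text.
atac_counts = {"Tiny": 186, "Small": 165, "Medium": 68, "Large": 15}
n_initial = sum(atac_counts.values())
n_benchmark = n_initial - atac_counts["Tiny"]   # Tiny datasets are excluded
```

The totals reproduce the 434 collected and 248 retained scATAC-seq datasets.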

scATAC-seq preprocessing and selection of highly variable peaks

Across all benchmark datasets we use a unified scATAC-seq preprocessing and feature selection pipeline. Let ${{\bf{X}}}\in {{\mathbb{R}}}^{N\times d}$ be the raw fragment count matrix, where N is the number of cells and d the number of peaks. For each cell i we compute its library size

$${L}_{i}={\sum }_{j=1}^{d}{X}_{ij},$$

(17)

and obtain per-cell normalized term frequency (TF):

$${{{\rm{TF}}}}_{ij}=\frac{{X}_{ij}}{{L}_{i}}.$$

(18)

For each peak j, we compute the number of accessible cells n_j and its accessible fraction n_j/N, and then define the inverse document frequency (IDF) following standard Signac/SnapATAC2 practice:

$${{{\rm{idf}}}}_{j}=\log \left(1+\frac{N}{{n}_{j}}\right).$$

(19)

The resulting TF-IDF matrix is

$${\widetilde{X}}_{ij}={{{\rm{TF}}}}_{ij}\cdot {{{\rm{idf}}}}_{j}\cdot s,$$

(20)

where s is a global scaling factor (default s = 10⁴). We store $\widetilde{{{\bf{X}}}}$ as a sparse matrix to maximize efficiency for large datasets.
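Eqs. (17)–(20) amount to the following NumPy sketch. It uses a dense array for readability, whereas a sparse matrix would be preferable at real dataset sizes; the guards against empty cells and empty peaks are additions of this sketch, and the function name is hypothetical.

```python
import numpy as np

def tfidf(X, scale=1e4):
    """TF-IDF transform of a cells x peaks count matrix."""
    X = np.asarray(X, dtype=float)
    n_cells = X.shape[0]
    lib = X.sum(axis=1, keepdims=True)               # Eq. (17): library size L_i
    tf = X / np.maximum(lib, 1.0)                     # Eq. (18); guards empty cells
    n_j = (X > 0).sum(axis=0)                         # accessible cells per peak
    idf = np.log1p(n_cells / np.maximum(n_j, 1))      # Eq. (19); guards empty peaks
    return tf * idf * scale                           # Eq. (20)

X_toy = np.array([[1, 0, 3],
                  [0, 2, 2]])
X_tfidf = tfidf(X_toy)
```

In this toy matrix the first peak is open in one of two cells, so its IDF is log(1 + 2/1) = log 3, and the first entry becomes 0.25 · log 3 · 10⁴.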

For feature selection, we first pre-filter peaks by accessibility fraction: we retain only peaks whose accessibility rate lies in $[{p}_{\min },{p}_{\max }]$ (with default thresholds ${p}_{\min }=0.01$ and ${p}_{\max }=0.99$) to remove extremely rare and nearly ubiquitously accessible peaks. Denote the filtered peak set by ${{\mathcal{P}}}$. We then compute variance- or variance-to-mean-based variability measures for each peak in ${{\mathcal{P}}}$, rank them, and select the top n_top highly variable peaks as the final feature subset, used both for training iAODE and as inputs to comparison methods. By default we keep 20,000 highly variable peaks for scATAC-seq and 5,000 highly variable genes for scRNA-seq (using Scanpy’s default highly variable gene (HVG) selection pipeline).
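The two-stage peak selection can be sketched as follows. Plain per-peak variance is used as the variability measure, which is only one of the options mentioned above and is an assumption of this sketch; the function name and toy matrix are likewise illustrative.

```python
import numpy as np

def select_variable_peaks(X_raw, X_norm, n_top=20_000, p_min=0.01, p_max=0.99):
    """Return sorted column indices of the selected highly variable peaks."""
    frac = (np.asarray(X_raw) > 0).mean(axis=0)            # accessibility rate per peak
    keep = np.flatnonzero((frac >= p_min) & (frac <= p_max))
    var = np.asarray(X_norm)[:, keep].var(axis=0)           # assumed variability measure
    order = np.argsort(var)[::-1][:n_top]                   # highest variance first
    return np.sort(keep[order])

# Toy matrix: 100 cells x 5 peaks.
X = np.zeros((100, 5))
X[:, 1] = 1        # open in every cell: removed by p_max
X[:50, 2] = 1      # variance 0.25
X[:10, 3] = 1      # variance 0.09
X[:50, 4] = 4      # variance 4.0
selected = select_variable_peaks(X, X, n_top=2)
```

Peak 0 (never accessible) and peak 1 (always accessible) are removed by the pre-filter, and the two most variable remaining peaks, 2 and 4, are kept.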

iAODE training strategy and dataset splitting

After preprocessing and feature selection, each dataset is split into training, validation, and test sets with a default ratio of 0.7/0.15/0.15, using a fixed random seed for reproducibility. For each mini-batch, we form pairs (x, x_raw), where x is the TF-IDF-transformed and peak-filtered input and x_raw is the corresponding raw count vector. The encoder estimates the latent posterior q(z∣x) and time variable t in log-transformed space; the decoder predicts library-normalized means which are rescaled by the cell-specific library size L = ∥x_raw∥₁ to reconstruct counts. These predictions are used to compute the reconstruction loss ${{{\mathcal{L}}}}_{{{\rm{recon}}}}$ and irecon loss ${{{\mathcal{L}}}}_{{{\rm{irecon}}}}$ under a negative binomial (NB) or ZINB likelihood.
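A seeded 0.7/0.15/0.15 split can be sketched as below. The seed value and function name are hypothetical, and the released code may assign cells to subsets differently.

```python
import numpy as np

def split_indices(n_cells, ratios=(0.7, 0.15, 0.15), seed=42):
    """Shuffle cell indices once and cut them into train/validation/test."""
    rng = np.random.default_rng(seed)          # fixed seed for reproducibility
    perm = rng.permutation(n_cells)
    n_train = int(round(ratios[0] * n_cells))
    n_val = int(round(ratios[1] * n_cells))
    return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]

train_idx, val_idx, test_idx = split_indices(10_000)
```

Calling the function again with the same seed returns identical index sets, which is what makes per-dataset results repeatable.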

We then add the ODE consistency term ${{{\mathcal{L}}}}_{{{\rm{ODE}}}}=\parallel {{\bf{z}}}-{{{\bf{z}}}}_{{{\rm{ODE}}}}{\parallel }_{2}^{2}$ and KL regularization ${{{\mathcal{L}}}}_{{{\rm{KL}}}}={{\rm{KL}}}\left(q({{\bf{z}}}| {{\bf{x}}})\,\parallel \,p({{\bf{z}}})\right)$. In practice, we weight each loss component as

$${{{\mathcal{L}}}}_{{{\rm{total}}}}={\lambda }_{{{\rm{recon}}}}{{{\mathcal{L}}}}_{{{\rm{recon}}}}+{\lambda }_{{{\rm{irecon}}}}{{{\mathcal{L}}}}_{{{\rm{irecon}}}}+{\lambda }_{{{\rm{ODE}}}}{{{\mathcal{L}}}}_{{{\rm{ODE}}}}+\beta {{{\mathcal{L}}}}_{{{\rm{KL}}}},$$

(21)

where λ_recon, λ_irecon (denoted I_rec in hyperparameter analysis), λ_ODE, and β control the relative contributions of reconstruction, irecon bottleneck, ODE consistency, and KL regularization, respectively.

We optimize with Adam in a mini-batch setting. After every fixed number of epochs we compute validation loss and internal clustering/coupling metrics (ASW, CAL, DAV, COR) and use early stopping with patience: if validation loss does not improve for a predefined number of evaluations, training is terminated, and parameters are rolled back to the best epoch. This ensures convergence while mitigating overfitting and reducing compute cost.
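The early-stopping rule with rollback can be sketched in plain Python. The class name and the toy loss sequence are illustrative; `state` stands for whatever object holds the model parameters.

```python
import copy

class EarlyStopping:
    """Stop after `patience` validation checks without improvement."""

    def __init__(self, patience):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_state = None
        self.bad_checks = 0

    def update(self, val_loss, state):
        """Record one validation check; return True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_state = copy.deepcopy(state)   # snapshot for rollback
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience

# Toy validation curve (synthetic): improvement stops after the second check.
stopper = EarlyStopping(patience=3)
stopped_at = None
for check, loss in enumerate([1.0, 0.8, 0.9, 0.85, 0.81, 0.7]):
    if stopper.update(loss, {"check": check}):
        stopped_at = check
        break
```

Training halts at the fifth check (index 4) and the parameters saved at index 1 are restored; the later loss of 0.7 is never reached, which is the usual cost of a finite patience.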

For the iAODE objective, we set λ_recon = 1.0, λ_irecon = 1.0, and λ_ODE = 1.0. When illustrating the “LB” configuration we use a low-β KL weight β = 0.01 to enhance latent correlation. The baseline VAE and other VAE variants use β = 1 as standard, while a High-Beta (HB) variant uses a stronger KL weight β = 5 to represent more disentangled, cluster-oriented configurations. In the hyperparameter sensitivity analysis, we systematically vary β ∈ {0.1, 1, 10}, I_rec ∈ {0.1, 1, 10}, and the mixing coefficient α ∈ {0.25, 0.5, 0.75}. Based on these sensitivity results, the main benchmarks in this manuscript use β = 0.1, I_rec = 10.0, and α = 0.75. Latent ODE trajectories are solved using an adaptive-step numerical integrator from the PyTorch ODE stack, which automatically adjusts internal step sizes based on error control; explicit time-step size is thus not exposed as a separate hyperparameter.

Baseline models and unified implementation of regularization terms

For baseline and ablation comparisons we include common regularization strategies used in deep generative modeling to assess how different latent-structure constraints affect continuum modeling and representation quality in a unified framework. These regularizers are only enabled when reproducing classical β-VAE, DIP-VAE, β-TCVAE, and InfoVAE baselines and are not essential components of iAODE.

The β-VAE framework augments the reconstruction loss with a weighted KL penalty:

$${{{\mathcal{L}}}}_{{{\rm{VAE}}}}={{{\mathcal{L}}}}_{{{\rm{recon}}}}+\beta \,{{{\mathcal{L}}}}_{{{\rm{KL}}}},\,{{{\mathcal{L}}}}_{{{\rm{KL}}}}={{\rm{KL}}}(q({{\bf{z}}}| {{\bf{x}}})\,\parallel \,p({{\bf{z}}})),$$

(22)

where β > 1 encourages more disentangled latent factors, while β < 1 retains more correlation among them⁴⁰.

DIP-VAE further constrains the covariance of the latent means by penalizing deviations of Cov_q(z)[μ_z] from the identity:

$${{{\mathcal{L}}}}_{{{\rm{DIP}}}}={\left\Vert {{\rm{offdiag}}}\,\left({{{\rm{Cov}}}}_{q({{\bf{z}}})}[{{{\boldsymbol{\mu }}}}_{z}]\right)\right\Vert }_{F}^{2}+{\lambda }_{{{\rm{diag}}}}{\left\Vert {{\rm{diag}}}\,\left({{{\rm{Cov}}}}_{q({{\bf{z}}})}[{{{\boldsymbol{\mu }}}}_{z}]\right)-{{\bf{1}}}\right\Vert }_{2}^{2},$$

(23)

where offdiag and diag select off-diagonal and diagonal entries respectively, and λ_diag balances the two penalties⁴⁷.

β-TCVAE isolates total correlation (TC) in the KL decomposition and penalizes it explicitly:

$${{\rm{TC}}}({{\bf{z}}}) = {{\rm{KL}}} \left(\left.q({{\bf{z}}})\right\Vert {\prod }_{j}q({z}_{j})\right),$$

(24)

with regularizer ${{{\mathcal{L}}}}_{{{\rm{TC}}}}=\gamma \,{{\rm{TC}}}({{\bf{z}}})$, where γ controls the strength of disentanglement⁴⁸.

InfoVAE and related mutual-information extensions add mutual information and maximum mean discrepancy (MMD) terms to balance reconstruction fidelity and prior matching:

$${{{\mathcal{L}}}}_{{{\rm{Info}}}}={{{\mathcal{L}}}}_{{{\rm{recon}}}}+{\alpha }_{{{\rm{Info}}}}\,{{{\mathcal{L}}}}_{{{\rm{KL}}}}+{\lambda }_{{{\rm{MMD}}}}\,{{\rm{MMD}}}(q({{\bf{z}}}),p({{\bf{z}}})),$$

(25)

where MMD measures the discrepancy between q(z) and p(z) in kernel mean embedding space, and α_Info and λ_MMD weight KL and MMD respectively. In our baselines, we use the same encoder/decoder architecture and switch regularization terms and weights to reproduce or approximate DIP-VAE, β-TCVAE, InfoVAE, and related methods, enabling quantitative comparisons of different latent-structure constraints under the same implementation and metrics⁴⁹.

Comprehensive evaluation of continuity, embedding quality, clustering, and coupling

To assess the quality of learned latent spaces—especially the ability to preserve biological coupling and model continuous trajectories of cell states—we design a multi-dimensional evaluation system. It covers standard clustering performance as well as co-ranking analysis and spectral-geometric features to quantify intrinsic manifold properties independently of 2D visualization. Our final framework uses 20 metrics across three complementary dimensions: 8 intrinsic metrics describing latent-space continuity and manifold geometry, 8 structure-fidelity metrics based on 2D UMAP/t-SNE embeddings, and 4 clustering/coupling metrics.

Let ${{\bf{X}}}\in {{\mathbb{R}}}^{N\times d}$ be the original high-dimensional data and ${{\bf{Z}}}\in {{\mathbb{R}}}^{N\times {d}_{z}}$ the learned latent representation, where N is the number of cells and d_z is the latent dimension.

Intrinsic continuity and manifold-geometry metrics.

We consider a set of metrics based on the eigenvalue spectrum ${{\boldsymbol{\lambda }}}=[{\lambda }_{1},\ldots ,{\lambda }_{{d}_{z}}]$ of the covariance matrix in latent space to characterize intrinsic geometry, independent of explicit 2D projections.

Manifold dimensionality score (${M}_{\dim }$). This measures representational compactness. Higher values indicate that essential biological variation is captured with fewer effective dimensions:

$${M}_{\dim }=1-\frac{{d}_{{{\rm{eff}}}}-1}{{d}_{z}-1},$$

(26)

where d_eff is the number of dimensions required to explain 95% of the variance.

Spectral decay (S_decay). This quantifies how steeply eigenvalues decline. A stronger decay suggests a clear hierarchical structure, with dominant developmental axes and secondary variation patterns:

$${S}_{{{\rm{decay}}}}=\frac{1}{1+\exp (| \kappa | )}\cdot \frac{{\lambda }_{1}}{{\sum }_{i=1}^{{d}_{z}}{\lambda }_{i}},$$

(27)

where κ is the slope of a linear regression on log-eigenvalues.

Participation ratio (P_ratio). This measures how evenly variance is distributed across dimensions. Higher values indicate balanced representation of multiple biological processes rather than reliance on a few dominant factors:

$${P}_{{{\rm{ratio}}}}=\frac{{({\sum }_{i}{\lambda }_{i})}^{2}}{{d}_{z}{\sum }_{i}{\lambda }_{i}^{2}}.$$

(28)

Anisotropy score (${A}_{{{\rm{score}}}}$). Using a hyperbolic tangent transform of log-eigenvalues, this quantifies the directional strength of the manifold. Large values indicate strong anisotropy, often corresponding to clear differentiation pathways:

$${A}_{{{\rm{score}}}}=\tanh \left(\frac{\log ({\lambda }_{1})-\log ({\lambda }_{{d}_{z}}+\epsilon )}{4}\right),$$

(29)

where ϵ = 10⁻⁸ is a small constant that keeps the logarithm finite when the smallest eigenvalue is zero.

Trajectory directionality (T_dir) and noise resilience (${N}_{{{\rm{res}}}}$). The former measures dominance of the first principal component as a putative developmental axis; the latter approximates a signal-to-noise ratio assessing separation of biological signal from technical noise:

$${T}_{{{\rm{dir}}}}=\frac{{\lambda }_{1}}{{\sum }_{i=2}^{{d}_{z}}{\lambda }_{i}+\epsilon },\,{N}_{{{\rm{res}}}}=\min \left(\frac{{\lambda }_{1}+{\lambda }_{2}}{{\sum }_{i=3}^{{d}_{z}}{\lambda }_{i}+\epsilon }\cdot \frac{1}{10},1\right).$$

(30)

Composite manifold-quality scores.

We define two composite scores to summarize manifold quality, distinguishing core geometric integrity from trajectory-oriented performance.

Core intrinsic quality (${Q}_{{{\rm{core}}}}$). This aggregates ${M}_{\dim }$, S_decay, P_ratio, and ${A}_{{{\rm{score}}}}$ to capture basic geometric properties relevant for generic representation learning:

$${Q}_{{{\rm{core}}}}=\frac{1}{4}\left({M}_{\dim }+{S}_{{{\rm{decay}}}}+{P}_{{{\rm{ratio}}}}+{A}_{{{\rm{score}}}}\right).$$

(31)

Overall intrinsic quality (Q_overall). This task-oriented score combines core geometry with trajectory-specific metrics via a weighted sum:

$${Q}_{{{\rm{overall}}}}={w}_{{{\rm{core}}}}\cdot {Q}_{{{\rm{core}}}}+{w}_{{{\rm{dir}}}}\cdot {T}_{{{\rm{dir}}}}+{w}_{{{\rm{noise}}}}\cdot {N}_{{{\rm{res}}}},$$

(32)

with $({w}_{{{\rm{core}}}},{w}_{{{\rm{dir}}}},{w}_{{{\rm{noise}}}})=(0.5,0.3,0.2)$. This gives highest weight to core structure (${w}_{{{\rm{core}}}}=0.5$), substantial weight to trajectory directionality (w_dir = 0.3) to reflect developmental analysis needs, and moderate weight to noise resilience (w_noise = 0.2) to handle sparse, noisy single-cell data.
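The spectrum-based metrics and composite scores in Eqs. (26)–(32) can be sketched in NumPy as follows. Two details are assumptions of this sketch: κ is taken as the slope of log-eigenvalues regressed on eigenvalue index, and the sample covariance (`np.cov`) is used for the spectrum.

```python
import numpy as np

def intrinsic_metrics(Z, eps=1e-8, w=(0.5, 0.3, 0.2)):
    """Eigenvalue-based continuity metrics for a latent matrix Z (cells x d_z)."""
    Z = np.asarray(Z, dtype=float)
    d_z = Z.shape[1]
    lam = np.linalg.eigvalsh(np.cov(Z, rowvar=False))[::-1]    # descending order
    lam = np.clip(lam, 0.0, None)
    total = lam.sum()

    d_eff = int(np.searchsorted(np.cumsum(lam) / total, 0.95) + 1)
    m_dim = 1.0 - (d_eff - 1) / (d_z - 1)                                # Eq. (26)
    kappa = np.polyfit(np.arange(d_z), np.log(lam + eps), 1)[0]
    s_decay = lam[0] / total / (1.0 + np.exp(abs(kappa)))               # Eq. (27)
    p_ratio = total ** 2 / (d_z * (lam ** 2).sum())                      # Eq. (28)
    a_score = np.tanh((np.log(lam[0]) - np.log(lam[-1] + eps)) / 4.0)   # Eq. (29)
    t_dir = lam[0] / (lam[1:].sum() + eps)                               # Eq. (30)
    n_res = min((lam[0] + lam[1]) / (lam[2:].sum() + eps) / 10.0, 1.0)
    q_core = (m_dim + s_decay + p_ratio + a_score) / 4.0                # Eq. (31)
    q_overall = w[0] * q_core + w[1] * t_dir + w[2] * n_res             # Eq. (32)
    return {"M_dim": m_dim, "S_decay": s_decay, "P_ratio": p_ratio,
            "A_score": a_score, "T_dir": t_dir, "N_res": n_res,
            "Q_core": q_core, "Q_overall": q_overall}

# Synthetic latents: an isotropic cloud versus one stretched along a single axis.
rng = np.random.default_rng(0)
iso = rng.standard_normal((2000, 5))
stretched = iso * np.array([10.0, 1.0, 1.0, 1.0, 1.0])
m_iso, m_str = intrinsic_metrics(iso), intrinsic_metrics(stretched)
```

Stretching one axis raises trajectory directionality and the manifold dimensionality score, while the participation ratio falls because variance is no longer evenly spread.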

Embedding-quality metrics.

To assess structural fidelity of high-dimensional latent representations under commonly used 2D visualizations, we fix UMAP/t-SNE hyperparameters and project each model’s latent space to 2D under this common protocol, then compute distance-correlation and co-ranking-based metrics. Crucially, we are evaluating how well different latent spaces preserve structure under the same 2D operator, not comparing visualization algorithms themselves, which here serve only as standard embedding operators for downstream visualization scenarios.

Distance correlation (DC). We compute Spearman correlation between vectorized pairwise distance matrices in high and low dimensions, capturing global structural preservation:

$${\rho }_{{{\rm{dist}}}}={\rho }_{s}\left({{\rm{vec}}}({{{\bf{D}}}}_{{{\rm{high}}}}),{{\rm{vec}}}({{{\bf{D}}}}_{{{\rm{low}}}})\right),$$

(33)

where D_high and D_low are pairwise distances in the original and embedded spaces, ρ_s denotes Spearman’s rank correlation, and vec(⋅) vectorizes a matrix by stacking its columns.
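A NumPy-only sketch of Eq. (33) follows. Because the distance matrices are symmetric, only the upper triangle is vectorized, and ranks are computed without tie correction, which is adequate for continuous distances; `scipy.stats.spearmanr` is a tie-aware alternative. Function names and toy data are illustrative.

```python
import numpy as np

def pairwise_distances(A):
    """Vector of Euclidean distances over the upper triangle (i < j)."""
    sq = (A ** 2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * A @ A.T, 0.0)
    return np.sqrt(d2[np.triu_indices(A.shape[0], k=1)])

def ranks(v):
    """Ordinal ranks (ties broken by position)."""
    r = np.empty(len(v), dtype=float)
    r[np.argsort(v, kind="stable")] = np.arange(len(v))
    return r

def distance_correlation(high, low):
    """Eq. (33): Spearman correlation between pairwise-distance vectors."""
    return float(np.corrcoef(ranks(pairwise_distances(high)),
                             ranks(pairwise_distances(low)))[0, 1])

rng = np.random.default_rng(0)
high = rng.standard_normal((50, 10))      # toy latent space
low = high[:, :2]                         # toy 2D "embedding": first two axes
dc = distance_correlation(high, low)
```

Being rank-based, the score is unchanged by any uniform rescaling of the embedding, so it reflects the ordering of distances rather than their magnitudes.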

Local and global quality (QL, QG). Based on the co-ranking matrix Q, where Q_kl counts neighbor pairs ranked k-th in high dimension and l-th in low dimension, we define

$${Q}_{NX}(K)=\frac{1}{KN}{\sum }_{k=1}^{K}{\sum }_{l=1}^{K}{Q}_{kl}.$$

(34)

We use the local continuity meta-criterion (LCMC) to find the optimal neighborhood size

$${K}_{\max }=\arg {\max }_{K}{{\rm{LCMC}}}(K),$$

(35)

and define

$${Q}_{{{\rm{local}}}}=\frac{1}{{K}_{\max }}{\sum }_{k=1}^{{K}_{\max }}{Q}_{NX}(k),\,{Q}_{{{\rm{global}}}}=\frac{1}{N-{K}_{\max }}{\sum }_{k={K}_{\max }+1}^{N}{Q}_{NX}(k).$$

(36)

${K}_{\max }$ is an intermediate quantity, not reported as an independent metric.

We define overall embedding quality (OV) as

$${Q}_{{{\rm{embed}}}}=\frac{1}{3}({\rho }_{{{\rm{dist}}}}+{Q}_{{{\rm{local}}}}+{Q}_{{{\rm{global}}}}).$$

(37)
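The co-ranking quantities in Eqs. (34)–(36) can be sketched in NumPy as below. The sketch uses the standard definition LCMC(K) = Q_NX(K) − K/(N − 1) and lets K run to N − 1, since each cell has only N − 1 neighbors; both are assumptions about details not spelled out above, as are the function names.

```python
import numpy as np

def neighbor_ranks(A):
    """rank[i, j] = position of j among i's neighbors (1 = nearest, self = 0)."""
    n = A.shape[0]
    sq = (A ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * A @ A.T
    np.fill_diagonal(d2, -np.inf)                     # force self to rank 0
    order = np.argsort(d2, axis=1, kind="stable")
    rank = np.empty_like(order)
    rank[np.arange(n)[:, None], order] = np.arange(n)[None, :]
    return rank

def coranking_quality(high, low):
    n = high.shape[0]
    r_hi, r_lo = neighbor_ranks(high), neighbor_ranks(low)
    # A pair lies in the upper-left K x K block of Q iff max(rank_hi, rank_lo) <= K.
    worst = np.maximum(r_hi, r_lo)[~np.eye(n, dtype=bool)]
    counts = np.bincount(worst, minlength=n)[1:]      # pairs entering the block at each K
    ks = np.arange(1, n)
    q_nx = np.cumsum(counts) / (ks * n)               # Eq. (34), K = 1..N-1
    lcmc = q_nx - ks / (n - 1)
    k_max = int(ks[np.argmax(lcmc)])                  # Eq. (35)
    q_local = float(q_nx[:k_max].mean())              # Eq. (36)
    q_global = float(q_nx[k_max:].mean()) if k_max < n - 1 else float(q_nx[-1])
    return q_nx, k_max, q_local, q_global

rng = np.random.default_rng(0)
high = rng.standard_normal((40, 6))
low = high[:, :2]                                     # toy 2D "embedding"
q_nx, k_max, q_local, q_global = coranking_quality(high, low)
```

An embedding that preserves every neighbor rank gives Q_NX(K) = 1 for all K, so both Q_local and Q_global equal 1 in that case.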

Latent representations, clustering quality, and coupling.

Although our focus is on continuous trajectories and manifold structure rather than purely discrete clustering, we retain intrinsic clustering metrics that depend directly on latent distances to constrain continuity from the perspective of “state segments” (e.g., cell types or developmental stages). Unlike external metrics such as normalized mutual information (NMI) or adjusted Rand index (ARI), ASW, DAV, and CAL here are computed solely from latent-space distances and do not rely on external label assignments; they capture cluster compactness and separation and partly reflect latent geometry. High scores under continuum modeling indicate that the model preserves smooth trajectories and topology while maintaining local separability and group boundaries—important when distinguishing discrete types within continuous pseudotime.

Average silhouette width (ASW). This measures intra-cluster cohesion and inter-cluster separation:

$${{\rm{ASW}}}=\frac{1}{N}{\sum }_{i=1}^{N}\frac{{b}_{i}-{a}_{i}}{\max ({a}_{i},{b}_{i})},$$

(38)

where a_i is the mean distance from sample i to points in the same cluster and b_i is the mean distance to the nearest other cluster. Values near 1 indicate well-separated, compact clusters.

Calinski-Harabasz index (CAL). This is the ratio of between-cluster to within-cluster dispersion:

$${{\rm{CAL}}}=\frac{{{{\rm{SS}}}}_{B}/(C-1)}{{{{\rm{SS}}}}_{W}/(N-C)},$$

(39)

where SS_B and SS_W are between- and within-cluster sums of squares and C is the number of clusters. Larger CAL implies tighter and more separated clusters.

Davies-Bouldin index (DAV). This averages similarity between each cluster and its most similar neighbor:

$${{\rm{DAV}}}=\frac{1}{C}{\sum }_{i=1}^{C}{\max }_{j\ne i}\left(\frac{{S}_{i}+{S}_{j}}{{M}_{i,j}}\right),$$

(40)

where S_i is the average intra-cluster distance and M_i,j is the distance between cluster centroids. Lower values are better.

Latent coupling (COR). We compute the mean absolute Pearson correlation of latent dimensions:

$${{\rm{COR}}}=\frac{1}{{d}_{z}({d}_{z}-1)}{\sum }_{i < j}| {{\rm{corr}}}({{{\bf{z}}}}_{i},{{{\bf{z}}}}_{j})| ,$$

(41)

where z_i is the i-th latent dimension across cells. Higher COR indicates stronger coordinated variation among latent axes, consistent with modular biological programs.
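Two of the metrics above, CAL (Eq. 39) and COR (Eq. 41), can be sketched directly in NumPy. Note that the i < j sum in Eq. (41) contains d_z(d_z − 1)/2 terms, so the normalization as written yields half the plain mean over pairs; the hypothetical `pair_mean` flag returns that mean instead. Function names and toy inputs are illustrative.

```python
import numpy as np

def calinski_harabasz(Z, labels):
    """Eq. (39): between- over within-cluster dispersion."""
    Z, labels = np.asarray(Z, dtype=float), np.asarray(labels)
    classes = np.unique(labels)
    center = Z.mean(axis=0)
    ss_b = ss_w = 0.0
    for c in classes:
        Zc = Z[labels == c]
        mu_c = Zc.mean(axis=0)
        ss_b += len(Zc) * ((mu_c - center) ** 2).sum()
        ss_w += ((Zc - mu_c) ** 2).sum()
    C, n = len(classes), Z.shape[0]
    return (ss_b / (C - 1)) / (ss_w / (n - C))

def latent_coupling(Z, pair_mean=False):
    """Eq. (41) as written; pair_mean=True averages over the unordered pairs."""
    d_z = Z.shape[1]
    corr = np.corrcoef(Z, rowvar=False)
    upper = np.abs(corr[np.triu_indices(d_z, k=1)])       # pairs with i < j
    if pair_mean:
        return float(upper.mean())
    return float(upper.sum() / (d_z * (d_z - 1)))

cal = calinski_harabasz([[0.0], [1.0], [10.0], [11.0]], [0, 0, 1, 1])
Z_toy = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])   # two perfectly coupled axes
cor = latent_coupling(Z_toy)
```

In the toy clustering, SS_B = 100 and SS_W = 1 with C = 2 and N = 4, giving CAL = (100/1)/(1/2) = 200.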

iAODE software and visualization ecosystem for training, data browsing, and continuum exploration

To reduce the barrier to entry and ensure reproducibility, we build an ecosystem around iAODE that integrates model training, data management, and continuum evaluation. The core implementation is in Python 3.9+ and PyTorch, packaged as an open-source library iAODE (https://github.com/PeterPonyu/iAODE) and archived on Zenodo (https://doi.org/10.5281/zenodo.18453104). We provide a static web frontend hosted on GitHub Pages (https://peterponyu.github.io/iAODE) for browsing standardized datasets and interactively inspecting continuity metrics.

In a Python/Scanpy environment, users can pass preprocessed AnnData objects directly to iAODE, configure latent dimension, loss weights, and optimization parameters, launch training from the command line or notebooks, and access latent embeddings, pseudotime, and continuity metrics via a unified API. Alternatively, they can run the local FastAPI server and use a browser GUI to upload data, set hyperparameters, and export results.

The frontend has two main pages. The Dataset Browser shows metadata of standardized AnnData resources (cell/feature counts, species, platform, batches, etc.) and supports filtering and subsampling. The Continuity Explorer focuses on representative simulated datasets and interactively visualizes latent embeddings along with continuity metrics such as manifold dimensionality, spectral decay, trajectory directionality, and noise resilience, providing geometric intuition for numerical results. To facilitate reproduction of all experiments, the GitHub repository includes example scripts in notebooks/ and examples/ for both ATAC and RNA modalities, covering raw-matrix loading, TF-IDF normalization, highly variable (HV) peak/gene selection, iAODE training, trajectory inference, metric computation, and visualization.

Statistics and reproducibility

Unless noted otherwise, cross-dataset comparisons use dataset-level aggregation: each dataset yields one summary metric and is treated as an independent observation. Within a given size category (Small, Medium, Large) or modality (scATAC-seq, scRNA-seq), we use Friedman tests or repeated-measures ANOVA (RM-ANOVA) to assess global differences across multiple models, components, or hyperparameter configurations. When comparing three or more settings and normality is doubtful, we use Friedman tests; for smaller sample sizes with approximate homoscedasticity, we use RM-ANOVA. If the global test is significant (typically p < 0.001), we perform pairwise comparisons: for each pair of configurations, we compare their per-dataset metrics using paired Wilcoxon signed-rank tests or paired t-tests for continuous variables, and we apply Bonferroni correction to all pairwise p-values in multi-group comparisons.
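As an illustration of the global-test step described above, the following dependency-free Python sketch computes the Friedman statistic on a datasets-by-configurations table and applies Bonferroni correction to a list of pairwise p-values. The function names (average_ranks, chi2_sf, friedman_test, bonferroni) and the toy table are illustrative assumptions and are not taken from the iAODE code base. The sketch omits the tie correction that library implementations such as scipy.stats.friedmanchisquare apply, and in practice the pairwise p-values would come from a paired test such as scipy.stats.wilcoxon.

```python
import math


def average_ranks(values):
    """Rank values ascending (1-based); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        return math.exp(-h) * sum(h ** i / math.factorial(i) for i in range(df // 2))
    tail = sum(h ** (i + 0.5) / math.gamma(i + 1.5) for i in range(df // 2))
    return math.erfc(math.sqrt(h)) + math.exp(-h) * tail


def friedman_test(table):
    """table[d][m] is the summary metric of configuration m on dataset d.

    Returns (statistic, p_value) without tie correction.
    """
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:  # each dataset is one block, ranked independently
        for m, r in enumerate(average_ranks(row)):
            rank_sums[m] += r
    q = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return q, chi2_sf(q, k - 1)


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Toy table (not data from the study): 10 datasets x 3 configurations,
# with the same ordering of configurations on every dataset.
table = [[0.9 + 0.001 * d, 0.8 + 0.001 * d, 0.7 + 0.001 * d] for d in range(10)]
q, p = friedman_test(table)
print(q, p, bonferroni([0.01, 0.5, 0.02]))
```

Because each dataset contributes one row, the test treats datasets as blocks, which matches the dataset-level aggregation stated above.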

For interpretability, we report “advantage scores” in summary tables, e.g., Δ = Full − Variant for component analysis, representing the difference between full iAODE and a given variant on a given metric. In hyperparameter analyses, we use Δ = reference configuration − alternative configuration to summarize relative gains. For metrics where higher is better, positive Δ indicates that the first configuration is superior; for metrics where lower is better (e.g., DAV), the sign is read in reverse, so negative Δ favors the first configuration. For simulated data, we assess linear relationships between the user-specified continuum setting and metric scores via simple linear regression, reporting Pearson R, slope, R², and two-sided p-values. All statistical analyses are conducted in Python using scipy and numpy, with visualizations produced via matplotlib and seaborn.
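The sign convention for advantage scores and the regression summary can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: the names advantage, linear_fit, and LOWER_IS_BETTER are hypothetical, the metric label "hypothetical_metric" is a placeholder, and all numbers are toy values rather than results from the study. The two-sided p-value is not computed here; scipy.stats.linregress reports it alongside the slope and correlation.

```python
import math

# Assumption: DAV is the only lower-is-better metric named in the text;
# extend this set for any other metric with the same orientation.
LOWER_IS_BETTER = {"DAV"}


def advantage(first, second, metric):
    """Return (delta, first_is_better) with delta = first - second."""
    delta = first - second
    if metric in LOWER_IS_BETTER:
        return delta, delta < 0
    return delta, delta > 0


def linear_fit(x, y):
    """Ordinary least squares of y on x: slope, intercept, Pearson R, R^2."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    return {"slope": slope, "intercept": mean_y - slope * mean_x, "r": r, "r2": r * r}


# Toy values only.
print(advantage(0.40, 0.55, "DAV"))                  # negative delta favors the first
print(advantage(0.82, 0.75, "hypothetical_metric"))  # positive delta favors the first
fit = linear_fit([0.0, 0.25, 0.5, 0.75, 1.0], [0.1, 0.3, 0.5, 0.7, 0.9])
print(fit)
```

Keeping the raw Δ and the orientation flag separate leaves the tabulated value as the plain difference while still making explicit which configuration is favored.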

Hardware and computational-resource evaluation

All experiments are run on a single-GPU workstation equipped with an NVIDIA GeForce RTX 5090 Laptop GPU (24 GB VRAM), a 24-core CPU, and 64 GB RAM. To ensure fair comparisons across models and settings, we fix the number of mini-batches per epoch across data scales by adjusting batch sizes: 128, 256, and 512 for Small, Medium, and Large datasets, respectively, yielding approximately 27 mini-batches per epoch per dataset. The maximum number of epochs is 400, with validation every 5 epochs. Early stopping with a patience of 25 validation checks is used: if validation loss does not decrease across 25 consecutive evaluations, training stops and parameters are rolled back to the epoch with the minimum validation loss.
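With validation every 5 epochs and a patience of 25 checks, training stops after 125 epochs without improvement. The following sketch illustrates this schedule with a simulated validation loss. It assumes that "does not decrease" means "does not fall below the best loss seen so far", which is one common reading; the class EarlyStopper, the function run_training, and the dictionary standing in for model parameters are hypothetical and do not reproduce the iAODE training loop.

```python
import copy


class EarlyStopper:
    """Tracks the best validation loss and counts checks without improvement."""

    def __init__(self, patience=25):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_epoch = None
        self.best_state = None
        self.bad_checks = 0

    def update(self, epoch, val_loss, state):
        """Record one validation check; return True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.best_state = copy.deepcopy(state)  # snapshot for rollback
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience


def run_training(val_loss_at, max_epochs=400, val_every=5, patience=25):
    """Simulated loop; val_loss_at(epoch) stands in for a real validation pass."""
    stopper = EarlyStopper(patience)
    state = {"epoch": 0}  # stand-in for model parameters
    last_epoch = 0
    for epoch in range(1, max_epochs + 1):
        state["epoch"] = epoch  # stand-in for one epoch of parameter updates
        last_epoch = epoch
        if epoch % val_every == 0:
            if stopper.update(epoch, val_loss_at(epoch), state):
                break
    # Roll back to the snapshot with the minimum validation loss.
    return stopper.best_state, stopper.best_epoch, last_epoch


# Simulated loss with its minimum at epoch 100.
best_state, best_epoch, last_epoch = run_training(lambda e: abs(e - 100))
print(best_epoch, last_epoch)
```

In this simulation the best check is at epoch 100 and the 25th non-improving check falls at epoch 225, where training stops and the epoch-100 snapshot is returned.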

Data availability

All datasets analyzed in this study are publicly available from existing repositories as described in the Methods and in the Supplementary Information. Numerical results and summary statistics supporting the figures and plots are provided in the Supplementary Information (Supplementary Tables 1–13). The Supplementary Information is also available on Figshare (https://doi.org/10.6084/m9.figshare.31225099)⁵⁰.

Code availability

The iAODE source code is available at https://github.com/PeterPonyu/iAODE. The exact version used in this study is archived on Zenodo (https://doi.org/10.5281/zenodo.18453104)⁵¹.

References

Li, Z. et al. Chromatin-accessibility estimation from single-cell ATAC-seq data with scOpen. Nat. Commun. 12, 6386 (2021).
Rachid, Z. S. MOCHA’s advanced statistical modeling of scATAC-seq data enables functional genomic inference in large human cohorts. Nat. Commun. 15, 528 (2024).
Xiong, L. et al. SCALE method for single-cell ATAC-seq analysis via latent feature extraction. Nat. Commun. 10, 4576 (2019).
Ashuach, T., Reidenbach, D. A., Gayoso, A. & Yosef, N. PeakVI: A deep generative model for single-cell chromatin accessibility analysis. Cell Rep. Methods 2, 100182 (2022).
Martens, L. D., Fischer, D. S., Yépez, V. A., Theis, F. J. & Gagneur, J. Modeling fragment counts improves single-cell ATAC-seq analysis. Nat. Methods 21, 28–31 (2024).
Cao, Y., Jia, L., Wang, L. & Zhang, J. SAILER: Scalable and accurate invariant representation learning for single-cell ATAC-seq processing and integration. Bioinformatics 37, i317–i327 (2021).
Li, G. et al. A deep generative model for multi-view profiling of single-cell RNA-seq and ATAC-seq data. Genome Biol. 23, 20 (2022).
Fan, Y., Li, Y., Ding, J. & Li, Y. GFETM: Genome Foundation-Based Embedded Topic Model for scATAC-seq Modeling. In: Research in Computational Molecular Biology. RECOMB 2024. Lecture Notes in Computer Science (ed. Ma, J.) vol. 14758, pp. 314–319 (Springer, Cham, 2024).
Zheng, S. C. et al. scDiffusion: conditional generation of high-quality single-cell data using diffusion model. Bioinformatics 40, btae518 (2024).
Cao, Z. J. et al. scButterfly: a versatile single-cell cross-modality translation method via dual-aligned variational autoencoders. Nat. Commun. 15, 2885 (2024).
Zhang, K. et al. A single-cell atlas of chromatin accessibility in the human genome. Cell 184, 5985–6001.e19 (2021).
Baek, S., Song, K. & Lee, I. Single-cell foundation models: bringing artificial intelligence into cell biology. Exp. Mol. Med. 56, 2169–2181 (2024).
Gayoso, A. et al. A Python library for probabilistic analysis of single-cell omics data. Nat. Biotechnol. 40, 163–166 (2022).
Brombacher, E., Hackenberg, M. & Kreutz, C. The performance of deep generative models for learning joint embeddings of single-cell multi-omics data. Front. Mol. Biosci. 9, 962644 (2022).
Buenrostro, J. D. et al. Integrated single-cell analysis maps the continuous regulatory landscape of human hematopoietic differentiation. Cell 173, 1535–1548.e16 (2018).
Trevino, A. E. et al. Chromatin and gene-regulatory dynamics of the developing human cerebral cortex at single-cell resolution. Cell 184, 5053–5069.e23 (2021).
Song, Q. & Su, J. SMGR: a joint statistical method for integrative analysis of single-cell multi-omics data. NAR Genom. Bioinform. 4, lqac056 (2022).
Luo, S., Germain, P. L., Robinson, M. D. & von Meyenn, F. Benchmarking computational methods for single-cell chromatin data analysis. Genome Biol. 25, 225 (2024).
Chen, H. et al. Assessment of computational methods for the analysis of single-cell ATAC-seq data. Genome Biol. 20, 241 (2019).
Tian, T., Wan, J., Song, Q. & Wei, Z. Complex hierarchical structures in single-cell genomics data unveiled by deep hyperbolic manifold learning. Genome Res. 33, 821–835 (2023).
Ahlmann-Eltze, C. & Huber, W. Analysis of multi-condition single-cell data with latent embedding multivariate regression. Nat. Methods 21, 659–667 (2024).
Sidarta-Oliveira, D., Jara, C. P., Ferruzzi, A. J., Skaf, M. S. & Velloso, L. A. TopOMetry systematically learns and evaluates the latent space of high-dimensional data using topology and deep learning. eLife 13, RP100361 (2024).
Smolander, J., Junttila, S., Venäläinen, M. S. & Elo, L. L. Cell-connectivity-guided trajectory inference from single-cell data. Bioinformatics 39, btad515 (2023).
Van den Berge, K. et al. Trajectory-based differential expression analysis for single-cell sequencing data. Nat. Commun. 11, 1201 (2020).
Shi, Y. et al. scCRT: a contrastive-based dimensionality reduction model for single-cell RNA-seq data clustering and pseudo-time trajectory inference. Brief. Bioinform. 25, bbae204 (2024).
Wolf, F. A. et al. PAGA: graph abstraction reconciles clustering with trajectory inference through a topology preserving map of single cells. Genome Biol. 20, 59 (2019).
Du, J. H., Chen, T., Gao, M. & Wang, J. Joint trajectory inference for single-cell genomics using deep learning with a mixture prior. Proc. Natl. Acad. Sci. USA 121, e2316256121 (2024).
Xiao, C. et al. Benchmarking multi-omics integration algorithms across cell-type identification, trajectory inference and gene regulatory network reconstruction. Brief. Bioinform. 25, bbae095 (2024).
Luecken, M. D. et al. Benchmarking atlas-level data integration in single-cell genomics. Nat. Methods 19, 41–50 (2022).
Chen, Z., King, W. C., Hwang, A., Gerstein, M. & Zhang, J. DeepVelo: Single-cell transcriptomic deep velocity field learning with neural ordinary differential equations. Sci. Adv. 8, eabq3745 (2022).
Erbe, R., Stein-O’Brien, G. & Fertig, E. J. Transcriptomic forecasting with neural ordinary differential equations. Patterns 4, 100793 (2023).
Zhang, J., Xu, C. & Chen, Z. scNODE: generative model for temporal single cell transcriptomic data prediction. Bioinformatics 40, ii146–ii154 (2024).
Zhang, K., Zhu, J., Kong, D. & Zhang, Z. Modeling single cell trajectory using forward-backward stochastic differential equations. PLoS Comput. Biol. 20, e1012015 (2024).
Gao, Y., Huang, X., Chen, A., Sharma, A. & Zhang, L. Causal disentanglement for single-cell representations and controllable counterfactual generation. Nat. Commun. 16, 6775 (2025).
Yu, H. & Welch, J. D. MichiGAN: sampling from disentangled representations of single-cell data using generative adversarial networks. Genome Biol. 22, 158 (2021).
Alemi, A. A., Fischer, I., Dillon, J. V. & Murphy, K. Deep variational information bottleneck. In: International Conference on Learning Representations (ICLR); 2017 Apr 24–26; Toulon, France.
Piran, Z., Cohen, N., Hoshen, Y. & Nitzan, M. Disentanglement of single-cell data with biolord. Nat. Biotechnol. 42, 1678–1683 (2024).
Pan, W., Long, F. & Pan, J. ScInfoVAE: interpretable dimensional reduction of single cell transcription data with variational autoencoders and extended mutual information regularization. BioData Min. 16, 20 (2023).
Majima, K. et al. LineageVAE: reconstructing historical cell states and transcriptomes toward unobserved progenitors. Bioinformatics 40, btae520 (2024).
Higgins, I. et al. beta-VAE: Learning basic visual concepts with a constrained variational framework. In: 5th International Conference on Learning Representations (ICLR); 2017 Apr 24–26; Toulon, France.
Street, K. et al. Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics. BMC Genomics 19, 477 (2018).
Chen, R. T. Q., Rubanova, Y., Bettencourt, J. & Duvenaud, D. Neural ordinary differential equations. In: Advances in Neural Information Processing Systems (NeurIPS); 2018 Dec 3–8; Montréal, Canada. p. 6571–6583.
Meeussen, J. V. W. & Lenstra, T. L. Time will tell: comparing timescales to gain insight into transcriptional bursting. Trends Genet. 40, 160–174 (2024).
Kartha, V. K. et al. Functional inference of gene regulation using single-cell multi-omics. Cell Genom. 2, 100166 (2022).
Otto, D. J., Jordan, C., Dury, B., Dien, V. & Setty, M. Quantifying cell-state densities in single-cell phenotypic landscapes using Mellon. Nat. Methods 21, 1185–1195 (2024).
Li, C., Virgilio, M. C., Collins, K. L. & Welch, J. D. Multi-modal single-cell velocity inference with MultiVelo. Nature 618, 377–385 (2023).
Kumar, A., Sattigeri, P. & Balakrishnan, A. Variational inference of disentangled latent concepts from unlabeled observations. In: International Conference on Learning Representations (ICLR); 2018.
Chen, R. T. Q., Li, X., Grosse, R. B. & Duvenaud, D. K. Isolating sources of disentanglement in variational autoencoders. In: Advances in Neural Information Processing Systems (NeurIPS); 2018. p. 2610–2620.
Zhao, S., Song, J. & Ermon, S. InfoVAE: Balancing learning and inference in variational autoencoders. In: AAAI Conference on Artificial Intelligence; 2019. p. 5885–5892.
Fu, Z. Supplementary information for iAODE manuscript. Figshare https://doi.org/10.6084/m9.figshare.31225099 (2026).
Fu, Z. iAODE: Interpretable Accessibility ODE VAE for scATAC-seq (v0.3.0). Zenodo. https://doi.org/10.5281/zenodo.18453104 (2026).

Acknowledgements

This research was funded by the National Natural Science Foundation of China (grant numbers 82222060, 82430103, 82473572, 81930090, 81725019, 82073487, 81602790).

Author information

These authors contributed equally: Zeyu Fu, Chunlin Chen.

Authors and Affiliations

State Key Laboratory of Trauma and Chemical Poisoning, Institute of Combined Injury, Chongqing Engineering Research Center for Nanomedicine, College of Preventive Medicine, Army Medical University, Chongqing, China
Zeyu Fu, Song Wang, Junping Wang & Shilei Chen
Department of Rehabilitation Medicine, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
Chunlin Chen

Authors

Zeyu Fu
Chunlin Chen
Song Wang
Junping Wang
Shilei Chen

Contributions

Z.F.: Conceptualization, methodology, software, validation, formal analysis, investigation, data curation, resources, writing-original draft, writing-review and editing, visualization. C.C.: Formal analysis, data curation, visualization, writing-original draft, writing-review and editing. S.W.: Resources, supervision, project administration. J.W.: Resources, supervision, project administration, funding acquisition. S.C.: Writing-review and editing, resources, supervision, project administration, funding acquisition.

Corresponding authors

Correspondence to Zeyu Fu, Junping Wang or Shilei Chen.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Communications Biology thanks Yuge Wang and the other anonymous reviewer(s) for their contribution to the peer review of this work. Primary Handling Editors: Xiangtao Li and Mengtan Xing. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Transparent Peer Review file

Supplementary information

Reporting summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Fu, Z., Chen, C., Wang, S. et al. iAODE for benchmarking and continuum modeling of single-cell chromatin accessibility. Commun Biol 9, 507 (2026). https://doi.org/10.1038/s42003-026-09768-8

Received: 11 July 2025
Accepted: 18 February 2026
Published: 03 March 2026
Version of record: 09 April 2026
DOI: https://doi.org/10.1038/s42003-026-09768-8