A cell as a dynamical system

The number of possible states that a human cell could occupy in a roughly 20,000-dimensional gene expression space is immeasurably large. Yet, although the precise number is debated, there are likely no more than a few hundred distinct cell types in the human body1. One explanation for this relatively small number comes from dynamical systems theory2. In this perspective, the network of gene regulatory interactions in a cell generates a dynamical system, i.e., one whose evolution in time can be described by a set of coupled ordinary differential equations. Such a system has a finite number of stable states or “attractors” that correspond to discrete cell types. Transitions between these attractors constitute trajectories followed by cells during physiological development or under various perturbations3. This “dynamical paradigm”4, the quantitative basis of the field of systems biology, has yielded rich insights into cell behavior and cell fate choice5,6,7,8,9,10.
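To make the attractor picture concrete, the following is a minimal sketch, not taken from the cited work, of a two-gene circuit in which each gene represses the other through Hill-type kinetics. The function names, the parameter values (a = 4, n = 2), and the forward-Euler integrator are illustrative assumptions. For these values the rate equations have two stable fixed points at (2 + √3, 2 − √3) and its mirror image, so very different starting expression levels should collapse onto just two states.

```python
def toggle_rates(x, y, a=4.0, n=2):
    """Rate terms for two mutually repressing genes (hypothetical toy model).

    Each gene is produced at a rate repressed by the other gene (Hill form)
    and degraded linearly.
    """
    dx = a / (1.0 + y**n) - x
    dy = a / (1.0 + x**n) - y
    return dx, dy


def settle(x, y, dt=0.01, steps=5000):
    """Integrate the rate equations with forward Euler until the state relaxes."""
    for _ in range(steps):
        dx, dy = toggle_rates(x, y)
        x, y = x + dt * dx, y + dt * dy
    return x, y


# Several initial expression states, chosen arbitrarily for illustration.
for start in [(2.0, 1.0), (3.0, 0.1), (1.0, 2.0), (0.1, 3.0)]:
    end = settle(*start)
    print(start, "->", tuple(round(v, 3) for v in end))
```

Starting points with more of the first gene end in the "first gene high" state and vice versa, which is the sense in which a continuum of possible expression states reduces to a small number of discrete attractors.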

Single-cell biology in light of dynamical systems theory

The emergence of single-cell genomics methods over the past decade has provided extremely rich biological data sets, giving us unprecedented views of cell-to-cell heterogeneity, cellular developmental trajectories, and disease-associated pathology11,12,13, culminating in the assembly of a human cell atlas14. In addition to their biological richness, these data sets have the potential to revolutionize our understanding of disease pathology and the development of remedies15. However, a theoretical framework to understand these enormous data sets is still lacking. Importantly, is there a causal mechanism that explains the underlying data-generative process?

Dynamical systems theory, by explaining the layout of cellular states and trajectories in gene expression space, can serve as this much-needed quantitative framework and help us make sense of single-cell data. A major advantage of a dynamical formulation is that the rate equations involved are mechanistic, intuitive, and easily interpretable, with rate terms based on mass-action, Michaelis–Menten, or Hill-type biochemical kinetics16,17. This advantage cannot be overstated in the modern data analytics landscape where hard-to-interpret “black box” models are the norm. Dynamical systems theory is also simpler than alternative theoretical frameworks in that it makes fewer assumptions on how cells traverse the high-dimensional gene expression space. As such, the dynamical systems framework is more likely to describe the underlying generative process for single-cell gene expression data and can therefore be more predictive of cellular developmental trajectories. It gives a clear sense of the “arrow of time” in cell fate choice, given that the rate terms in the governing equations define a resultant vector for each cell with specific magnitude and direction in gene expression space. Conceptual innovations like RNA velocity18,19,20, which can be interpreted as gene-specific temporal rate terms, fit comfortably into a dynamical systems perspective21.

The dynamical systems framework would also explain common structural features of two-dimensional projections of single-cell gene expression data, which tend to show cells clustered by type in stable states corresponding to specific attractors of the underlying gene regulatory network. The dynamical systems formalism is fully compatible with the observation that cells occupy a lower-dimensional “manifold” in the ambient high-dimensional gene expression space, given the restriction of cell states and trajectories to specific attractors and optimal transitional paths between them22. It can also serve as a theoretical basis for the burgeoning field of gene regulatory network inference from single-cell data23, given that these networks are essentially complex dynamical systems. We should note that there is a long history of data-driven modeling of protein signaling networks dating back to before the dawn of the current single-cell biology era, often based on Bayesian approaches24,25,26,27. Below, we lay out possible approaches to derive dynamical rate equations for gene regulation from single-cell transcriptomics data (schematic in Fig. 1).

Fig. 1: A workflow for quantitative modeling of single-cell transcriptomic data using a dynamical systems framework.

Single-cell transcriptomics produces high-resolution snapshots of gene expression (a) but does not directly generate mechanistic predictive models of cellular states and transitions. A dynamical systems model, expressed as a set of coupled ordinary differential equations (ODEs) describing the evolution in time of n genes xi (i = 1 to n) (b), can potentially be derived from single-cell data using dynamical systems inference tools. The derived models can be used to map putative cell states as computed attractors on a gene expression landscape (c) and predict physiological developmental trajectories (solid blue arrows) as well as potential pathways of cell fate reprogramming (dashed red arrow). Panels (a, c) created in https://BioRender.com.

Deriving dynamical system equations

The most common criticism of the dynamical systems approach for modeling cell behavior is methodological rather than conceptual; namely, that it is only viable for small gene regulatory networks with known structure and previously characterized parameters28. However, there is an established field of data-driven dynamical system inference going back at least two decades29,30,31,32, which may be useful for the discovery of the governing rate equations underlying single-cell dynamics. As an example, an elegant mathematical approach called “sparse identification of nonlinear dynamics” (SINDy) has been proposed to derive interpretable governing equations of a dynamical system from time-course data33. Given a library of user-defined functions of the system variables, SINDy performs sparse regression to derive a minimal set of functions that capture the system dynamics. Thus, one can derive from data both the form of the rate equations for gene regulatory networks and the parameters therein. Subsequent tools have built on SINDy specifically to derive the governing equations of reactive systems34, to model implicit dynamics and rational nonlinearities35, and to handle smaller, noisy data sets36.
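The sparse-regression idea can be sketched in a few lines. The example below is a minimal re-implementation of sequential thresholded least squares on synthetic data, assuming NumPy is available; it is not the API of any SINDy software package. The two-gene toggle model, the candidate library (with Hill terms whose half-saturation constant and exponent are fixed in advance), the threshold of 0.1, and all function names are assumptions made for illustration. The data are noise-free and the derivatives are forward differences of Euler-integrated trajectories, so this is an idealized best case rather than a realistic benchmark.

```python
import numpy as np


def hill_rep(u, n=2):
    """Repressive Hill term with fixed (assumed) half-saturation 1 and exponent n."""
    return 1.0 / (1.0 + u**n)


def rates(state):
    """Ground-truth toy model used only to generate synthetic time courses."""
    x, y = state
    return np.array([4.0 * hill_rep(y) - x, 4.0 * hill_rep(x) - y])


def simulate(start, dt=0.01, steps=1000):
    """Forward-Euler time course from a given initial state."""
    traj = np.empty((steps + 1, 2))
    traj[0] = start
    for k in range(steps):
        traj[k + 1] = traj[k] + dt * rates(traj[k])
    return traj


TERMS = ["1", "x", "y", "hill(x)", "hill(y)", "x*y"]


def library(X):
    """Candidate functions of the state, one column per term in TERMS."""
    x, y = X[:, 0], X[:, 1]
    return np.column_stack([np.ones_like(x), x, y, hill_rep(x), hill_rep(y), x * y])


def stlsq(theta, dxdt, threshold=0.1, n_iter=10):
    """Sequential thresholded least squares: fit, zero small terms, refit the rest."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        for j in range(dxdt.shape[1]):
            keep = ~small[:, j]
            if keep.any():
                xi[keep, j] = np.linalg.lstsq(theta[:, keep], dxdt[:, j], rcond=None)[0]
    return xi


dt = 0.01
starts = [(0.1, 0.2), (3.0, 2.5), (1.0, 3.5), (3.5, 1.0), (2.0, 0.1), (0.2, 2.0)]
states, derivs = [], []
for s in starts:
    traj = simulate(s, dt)
    states.append(traj[:-1])
    derivs.append((traj[1:] - traj[:-1]) / dt)  # finite-difference derivative estimate

xi = stlsq(library(np.vstack(states)), np.vstack(derivs))
for j, gene in enumerate(["dx/dt", "dy/dt"]):
    kept = [f"{xi[i, j]:+.3f}*{TERMS[i]}" for i in range(len(TERMS)) if xi[i, j] != 0.0]
    print(gene, "=", " ".join(kept))
```

In this idealized setting the regression should retain only the decay term and the cross-repression term for each gene, recovering both the form of the rate equations and their coefficients. With real, noisy data the choice of library, threshold, and derivative estimator would matter far more.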

One problem here is that, given the destructive nature of current single-cell assays, we only have static, single-time-point snapshots of gene expression across a population of cells. However, one can make use of reordered “pseudotime” gene expression values along a cellular trajectory as a proxy for time-course data37. By providing a library of biologically plausible functions of the mass action, Michaelis–Menten, or Hill form to SINDy, we can potentially derive rate equations for the temporal evolution of the transcriptome of cells along a trajectory. The predictive power of the rate equations can then be tested on a different trajectory that the model has not previously “seen”. This approach, by weighting the more significant regulatory interactions in the gene network, would also yield the structure of the network. The rate terms in the derived governing equations can then be used to generate a quantitative map of the much-invoked Waddington gene expression landscape, identifying attractors on the landscape as optimal states of cell occupancy, and transition paths between them (Fig. 1). Importantly, a quantitative derivation of the gene expression landscape using a dynamical systems approach does not require the modeled network to be a “gradient system” with a closed-form potential function, a condition that is not met by realistic gene regulatory networks. Even for non-gradient networks, an incremental “quasi-potential” (as opposed to a closed-form potential) can be defined along a cellular trajectory that, upon numerical integration, yields the surface of the gene expression landscape38. This deterministic quasi-potential shapes the “valleys” around attractors, thus imposing constraints on the extent of permissible noise for a cell to remain in a specific stable attractor state without switching to a different neighboring attractor.
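The incremental quasi-potential can be illustrated on a toy model. The sketch below assumes one possible definition of the increment, ΔV = −[(dx/dt)Δx + (dy/dt)Δy] with Δx = (dx/dt)Δt, so that V can only decrease along a deterministic trajectory. The two-gene toggle rates, the parameter values, and the function names are hypothetical, and the step that aligns trajectories from different starting points into a single surface is not shown.

```python
def rates(x, y, a=4.0, n=2):
    """Toy rate terms: two mutually repressing genes (illustrative assumption)."""
    return a / (1.0 + y**n) - x, a / (1.0 + x**n) - y


def quasi_potential_path(x, y, dt=0.01, steps=3000):
    """Accumulate the quasi-potential along one trajectory by numerical integration.

    V is measured relative to the starting point (V = 0 there). Each step adds
    dV = -(dx/dt * dx + dy/dt * dy) = -((dx/dt)**2 + (dy/dt)**2) * dt.
    """
    v = 0.0
    path = [(x, y, v)]
    for _ in range(steps):
        fx, fy = rates(x, y)
        v -= (fx * fx + fy * fy) * dt
        x, y = x + fx * dt, y + fy * dt
        path.append((x, y, v))
    return path


path = quasi_potential_path(2.0, 1.5)
x_end, y_end, v_end = path[-1]
print(f"end state: ({x_end:.3f}, {y_end:.3f}), quasi-potential drop: {v_end:.3f}")
```

Because the increment is proportional to the squared magnitude of the rate vector, the accumulated value flattens out as the cell approaches an attractor, where the rates vanish. That flattening is what shapes the bottom of a "valley" in this construction.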

Challenges in applying dynamical systems theory to single-cell transcriptomics

There are potential challenges in using the dynamical systems approach to analyze single-cell RNA-sequencing (scRNA-seq) data. First, these data sets are often sparse, and it may be difficult to distinguish true biological non-expression from “dropout” due to technical artifacts, which can complicate downstream analysis tasks. This problem can be addressed in part by specialized computational tools that have been developed to impute the expression of missing genes in single-cell data39,40,41. Second, a characteristic feature of single-cell data is the presence of cell-to-cell variability even among cells of the same type, arising from noisy gene expression. The dynamical systems approach can be modified to account for the stochasticity in gene regulatory networks arising from gene expression noise. The Fokker–Planck or Langevin approximations to the chemical master equation, or the Gillespie algorithm in case of low molecular copy number, explicitly model noise in the biochemical reactions underlying cellular dynamics42. Notably, recent extensions to the SINDy approach have been shown to be robust to the presence of sparsity and noise in the data36.
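As a minimal illustration of explicit noise modeling at low copy number, the sketch below applies the Gillespie algorithm to the simplest possible case: a single transcript produced at a constant rate and degraded in proportion to its copy number. The rate constants, the random seed, and the function name are assumptions chosen for illustration and do not represent any particular gene. For this birth-death process the stationary copy-number distribution is Poisson with mean k_prod / k_deg, which gives a simple check on the simulation.

```python
import random


def gillespie_birth_death(k_prod=10.0, k_deg=1.0, t_end=2000.0, seed=0):
    """Exact stochastic simulation of mRNA production and degradation.

    Returns the time-averaged copy number over [0, t_end].
    """
    rng = random.Random(seed)
    t, n = 0.0, 0
    weighted_sum = 0.0
    while True:
        a_prod = k_prod          # propensity of making one transcript
        a_deg = k_deg * n        # propensity of degrading one transcript
        a_total = a_prod + a_deg
        tau = rng.expovariate(a_total)   # waiting time to the next reaction
        if t + tau >= t_end:
            weighted_sum += n * (t_end - t)
            break
        weighted_sum += n * tau
        t += tau
        # Choose which reaction fires, in proportion to its propensity.
        if rng.random() * a_total < a_prod:
            n += 1
        else:
            n -= 1
    return weighted_sum / t_end


mean_copies = gillespie_birth_death()
print(f"time-averaged copy number: {mean_copies:.2f} (deterministic steady state: 10)")
```

The deterministic rate equation for this system predicts a single value, whereas the stochastic simulation fluctuates around it. The same idea extends to regulatory networks by adding one propensity per reaction.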

Alternative and complementary approaches to modeling single-cell gene expression dynamics

Other approaches have been proposed for predictive modeling of cellular dynamics without relying on explicit derivation of systems of ODEs describing gene regulation. Dynamo43 builds on the RNA velocity concept to reconstruct transcriptomic vector fields from sparse and noisy single-cell data, and successfully predicted optimal paths and potential regulators of cell state transitions, as well as the results of genetic perturbations. Another tool, Scribe44, derives causal gene regulatory networks by estimating the strength of regulator-target gene interactions. Scribe demonstrated limitations in using pseudotime for network inference, emphasizing the importance of temporal coupling between gene expression measurements. Velorama is another approach for gene regulatory network inference using graph neural networks and Granger causality45. Traditionally, dynamical systems analysis has been applied solely to gene expression data. In the light of recent developments in multi-omic technologies, this approach can be improved upon by using, for example, data from epigenomics assays. An interesting example of such data integration is CellOracle46, which combines single-cell ATAC-seq (chromatin accessibility) and scRNA-seq data to infer quantitative models of gene regulatory networks and simulate network perturbations.

Other broadly applicable approaches for dynamical system inference specifically take into account the risk of overfitting associated with very complex models when experimental data is limited. Sir Isaac47, for example, builds parsimonious coarse-grained models of network dynamics by using a combination of nonlinear differential equations with power-law kinetics, continuous time sigmoidal networks, and Bayesian inference. Another class of models uses machine learning to first reduce the dimensionality of the data into a smaller number of “latent dimensions” before deriving the governing equations of that reduced system48,49. In the context of gene regulatory network inference, it would be interesting to see if these inferred latent variables correspond to meaningful groups of genes and thus provide new biological insights. Other computational tools have focused on formulating potential-based approaches to model the gene expression landscape. Geometric models based on a gradient system assumption and the resulting potential function have been proposed to explore the underlying spatiotemporal dynamics of gene networks50. Very recently, biologically-constrained neural networks using the concept of a potential function have been used to infer two-dimensional gene expression landscapes51.

Practical implications

Dynamical systems theory, as a mechanistic and causal framework, can allow us to progress from the question of “What do cells do?” (alterations in gene expression, for example), to “How do they do it?” (rewiring of underlying signaling pathways and gene regulatory networks), to “Why do they do it?” (cells are driven by a quasi-potential that is progressively minimized along a developmental trajectory till they reach a stable attractor). The question of whether dynamical systems theory is an appropriate theoretical framework to understand and predict cell fate choice and cellular transitions is of more than academic interest. There are several practical consequences to choosing a particular quantitative basis to decipher cellular dynamics. Adoption of the dynamical systems formalism would allow for the application of powerful associated tools to the analysis of cellular trajectories in health and disease. For example, the prediction of critical state transitions in temporally evolving systems relies on the theory of bifurcations, a core tenet of dynamical systems theory52. Indeed, bifurcation theory has been applied fruitfully to single-cell data to demonstrate critical transitions in cell state during hematopoiesis and differentiation of pluripotent stem cells53,54. Additionally, the prediction of such “tipping points” in developmental trajectories would be extremely useful in the analysis of chemical- and drug-induced toxicity, where a central and unsettled question is whether dose-response curves are smooth and gradual, or whether they exhibit abrupt transitions at critical dose thresholds55,56. A mechanistic understanding of this conundrum is vital to drug development and the establishment of rigorous quantitative standards for chemical risk assessment in environmental and public health.
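A toy calculation shows how a smooth change in a parameter can produce an abrupt change in the number of available cell states. The sketch below reuses a hypothetical two-gene mutual-repression model and treats the maximal production rate a as a stand-in for a dose-like control parameter; this mapping, the parameter values, and the function names are assumptions for illustration only. For Hill exponent n = 2, linear stability analysis of this model places the transition at a = 2: below it there is a single stable state, above it there are two.

```python
def settle(x, y, a, n=2, dt=0.01, steps=10000):
    """Relax a two-gene mutual-repression model to a steady state (forward Euler)."""
    for _ in range(steps):
        dx = a / (1.0 + y**n) - x
        dy = a / (1.0 + x**n) - y
        x, y = x + dt * dx, y + dt * dy
    return x, y


def count_attractors(a, tol=1e-3):
    """Count distinct end states reached from two mirror-image starting points."""
    x1, y1 = settle(2.0, 1.0, a)
    x2, y2 = settle(1.0, 2.0, a)
    same = abs(x1 - x2) < tol and abs(y1 - y2) < tol
    return 1 if same else 2


for a in (1.0, 1.5, 2.5, 4.0):
    print(f"a = {a}: {count_attractors(a)} stable state(s) reached")
```

Only two starting points are probed, so the count is a lower bound on the number of attractors in general. Values of a very close to the critical point are avoided because relaxation becomes slow there, which is itself a hallmark of an approaching critical transition.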