Fig. 1: Statistical inference of RNA velocity with a manifold-constrained framework for the cell cycle.

a, Schematic of a joint framework for parameterization of the gene expression manifold and RNA velocity field. b, Schematic of unconstrained velocity estimation as performed by standard approaches. c, Plate diagram of the probabilistic relationship among latent variables and observable data. S is sampled from the expectation, manifold coordinates and manifold geometry. U is sampled from the manifold information, kinetic parameters and velocity function. Coordinates define each cell’s position in the latent space and geometry defines expression changes along the manifold. d, Top: the manifold formulation is defined for the spliced counts (s) using cell-specific coordinates (x) and a gene-specific geometric family (f), with which observed data can be directly mapped to the high-dimensional space. Bottom: the velocity formulation is defined for the unspliced counts (u) as a velocity field function (V) with interlocked kinetic parameters (β, γ). A velocity estimate is obtained by applying the chain rule to these quantities, expressing velocity as a direct function of the manifold coordinates x(t). e, Schematic of manifold-constrained velocity estimation for periodic processes. First, manifold learning estimates the coordinates and geometry; second, velocity learning estimates the kinetic parameters and velocity function. f, Schematic of the new types of velocity analyses possible with VeloCycle: (i) statistical credibility testing between multiple samples and against a null hypothesis; (ii) posterior marginal distribution analysis of model parameters by MCMC sampling; (iii) velocity extrapolation to real biological time, verifiable by live microscopy; and (iv) transfer learning of the gene manifold from large references to small target datasets. The asterisk indicates statistical significance. NS, not significant.
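
The chain-rule step in panel d can be sketched as follows. This is an illustrative derivation under stated assumptions, not necessarily the exact parameterization used in the figure: the gene index g is added here for clarity, x is treated as a one-dimensional coordinate (for example, a cell cycle phase), and the splicing kinetics ds/dt = βu − γs are assumed to follow the standard RNA velocity model, which the caption does not state explicitly.

```latex
\begin{align}
% Assumption: expected spliced abundance lies on the manifold,
% and cells move along it according to the velocity function V.
s_g(t) &= f_g\bigl(x(t)\bigr), \qquad \frac{dx}{dt} = V(x), \\
% Chain rule: change in spliced abundance along the manifold.
\frac{ds_g}{dt} &= f_g'(x)\,\frac{dx}{dt} = f_g'(x)\,V(x), \\
% Assumption: standard splicing kinetics, solved for the unspliced abundance.
\frac{ds_g}{dt} &= \beta_g\,u_g - \gamma_g\,s_g
\;\;\Longrightarrow\;\;
u_g = \frac{f_g'(x)\,V(x) + \gamma_g\,f_g(x)}{\beta_g}.
\end{align}
```

Under these assumptions, the expected unspliced counts are fully determined by the manifold (x, f), the velocity function V and the kinetic parameters (β, γ), which is consistent with U being sampled from exactly these quantities in panel c.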