Fig. 2: Overview of the Genes2Genes alignment framework and workflow for comparing single-cell transcriptomic trajectories. | Nature Methods


From: Gene-level alignment of single-cell trajectories


Given log1p-normalized cell-by-gene expression matrices of a reference (R) and a query (Q), together with their pseudotime estimates, G2G infers an individual alignment for every gene of interest. It first interpolates the data, extending the mean-based interpolation of Alpert et al. (2018)7 to distributional interpolation, and then runs Gotoh's DP algorithm16 adapted to all five alignment states (M, W, V, I, D) defined in Fig. 1b. All inferred alignments are then clustered and used to derive statistics on the overall alignment between R and Q, supporting further downstream analyses. The DP algorithm uses a match cost function defined under the minimum message length (MML)26 inference framework (top left). Given a hypothesis (model) and data, MML defines the total message length of encoding both for lossless compression in an imaginary message transmission. G2G defines two hypotheses: (1) \(\varPhi\): time points \({R}_{j}\) and \({Q}_{i}\) mismatch; and (2) A: time points \({R}_{j}\) and \({Q}_{i}\) match. Under \(\varPhi\), the message length is the sum of the independent encoding lengths of each time point's interpolated expression data and its corresponding Gaussian distribution. Under A, the message length is the joint encoding length of both time points' interpolated expression data under a single Gaussian distribution (that of either \({R}_{j}\) or \({Q}_{i}\)). The match cost is computed as the difference between the per-datum encoding lengths under A and \(\varPhi\). The DP algorithm incorporates a symmetric five-state machine that generates a string over the alphabet \(\varOmega = \{M, W, V, D, I\}\) describing the optimal sequence of alignment states (Fig. 1b) between R and Q time points (middle left). Each arrow represents a state transition; arrows with the same hatch mark imply equal state-transition probabilities. G2G computes a pairwise Levenshtein distance matrix across all five-state alignment strings to cluster genes with similar alignment patterns (bottom left). Example output of five-state alignment strings for all genes (top right).
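The match cost described above can be illustrated with a short sketch. This is a simplified illustration under stated assumptions, not G2G's implementation: each time point's interpolated expression values are modeled by a maximum-likelihood Gaussian, the data part of each message is its negative log-likelihood in nats, and the cost of stating a Gaussian's parameters is replaced by a hypothetical constant `param_cost` (a full MML treatment would derive it from the optimal precision of the parameters). Under A, the sketch takes whichever of the two Gaussians gives the shorter encoding, which is also an assumption. The names `fit_gaussian`, `gaussian_nll`, and `match_cost` are hypothetical.

```python
import math
from statistics import fmean


def fit_gaussian(values, min_std=1e-3):
    """Maximum-likelihood mean and std, with a floor to avoid zero variance."""
    mu = fmean(values)
    var = fmean([(v - mu) ** 2 for v in values])
    return mu, max(math.sqrt(var), min_std)


def gaussian_nll(values, mu, sigma):
    """Negative log-likelihood (nats) of values under N(mu, sigma^2)."""
    return sum(
        0.5 * math.log(2 * math.pi * sigma ** 2) + (v - mu) ** 2 / (2 * sigma ** 2)
        for v in values
    )


def match_cost(r_vals, q_vals, param_cost=2.0):
    """Per-datum message length under A minus that under Phi.

    ASSUMPTION: param_cost is a hypothetical fixed cost (nats) for stating
    one Gaussian; real MML would compute this from parameter precision.
    """
    n = len(r_vals) + len(q_vals)
    mu_r, s_r = fit_gaussian(r_vals)
    mu_q, s_q = fit_gaussian(q_vals)

    # Phi (mismatch): two Gaussians, each encoding its own time point's data.
    len_phi = (2 * param_cost
               + gaussian_nll(r_vals, mu_r, s_r)
               + gaussian_nll(q_vals, mu_q, s_q))

    # A (match): one Gaussian (R_j's or Q_i's) encodes the pooled data.
    pooled = list(r_vals) + list(q_vals)
    len_a = param_cost + min(gaussian_nll(pooled, mu_r, s_r),
                             gaussian_nll(pooled, mu_q, s_q))

    return (len_a - len_phi) / n


# Toy interpolated expression values for one R and one Q time point.
r_j = [1.0, 1.1, 0.9, 1.05, 0.95]
q_i = [1.02, 0.98, 1.0, 1.08, 0.92]
print(match_cost(r_j, q_i))
```

Because both hypotheses encode the same \(n_R + n_Q\) values, the data-precision term of each encoding is identical and cancels in the difference, so the sketch omits it. In this sketch, lower costs favor a match; for identical inputs the cost reduces to `-param_cost / n`.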
Example clustermap showing the clustering structure of alignments resulting from agglomerative hierarchical clustering (bottom right). The color represents the Levenshtein distance. Illustrations were created using BioRender (https://biorender.com).
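The clustering step (bottom left and bottom right panels) can be sketched with the standard library alone. The Levenshtein distance counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one alignment string into another. The caption does not state the linkage criterion, so average linkage cut at a hypothetical distance `threshold` is an assumption here; the helper names and example strings are illustrative only. In practice, a condensed form of the distance matrix could instead be passed to `scipy.cluster.hierarchy.linkage`.

```python
from itertools import combinations


def levenshtein(a, b):
    """Edit distance between two strings via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution or match
        prev = curr
    return prev[-1]


def pairwise_distances(strings):
    """Symmetric Levenshtein distance matrix over all alignment strings."""
    n = len(strings)
    dist = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = levenshtein(strings[i], strings[j])
    return dist


def average_linkage_clusters(dist, threshold):
    """Agglomerative clustering (ASSUMED average linkage), stopping once the
    closest pair of clusters is farther apart than `threshold`."""
    clusters = [[i] for i in range(len(dist))]

    def avg(c1, c2):
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best_pair, best_dist = min(
            (((i, j), avg(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best_dist > threshold:
            break
        i, j = best_pair  # i < j, so deleting j leaves index i valid
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical toy five-state alignment strings for four genes.
alignment_strings = ["MMMMMM", "MMMMMI", "DDDIII", "DDDIIM"]
dist = pairwise_distances(alignment_strings)
print(average_linkage_clusters(dist, threshold=2))  # [[0, 1], [2, 3]]
```

For these toy strings, the two M-dominated strings and the two D/I-dominated strings each merge at distance 1, while the average distance between the two groups (5.5) exceeds the threshold, leaving two clusters.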
