Extended Data Fig. 2: Simulation Data Experiment 1.

a, Distributions of start-match lengths (following a false mismatch, if any), end-mismatch lengths (prior to a false match, if any), start-mismatch (false mismatch) lengths (number of false mismatches starting from time point 0), and the number of intermediate false mismatches within the match regions, in the 1500 Divergence alignments across the three bifurcation subgroups (that is, with approximate bifurcation points of 0.25, 0.5, and 0.75; each subgroup has 500 alignments), generated by Genes2Genes, TrAGEDyMINIMUM, and TrAGEDyNULL. 15 equispaced time points over pseudotime [0,1] were used for distribution interpolation and alignment. Colored boxes (blue, orange, and green) in the two leftmost columns display the possible ranges of expected match lengths corresponding to the three approximate bifurcation points (0.25, 0.5, and 0.75, respectively). Each violin plot shows the length distribution across all n = 500 alignments in each group as a kernel density estimate. Inside each violin, a box shows the interquartile range (from the 25% to the 75% quantile), with a point indicating the median.

b, Statistics analogous to those in a, reported for the 1500 Convergence alignments across the three bifurcation subgroups, generated by Genes2Genes, TrAGEDyMINIMUM, and TrAGEDyNULL: distributions of end-match lengths (prior to a false mismatch, if any), start-mismatch lengths (following a false match, if any), end-mismatch lengths (number of false mismatches until time point 1), and the number of intermediate false mismatches within the match regions.

c, Cluster diagnostic plots for the hierarchical agglomerative clustering of the 3500 alignments across all pattern classes (500 Matching alignments, 1500 Divergence alignments, and 1500 Convergence alignments), in terms of the mean Silhouette score when varying the Levenshtein distance threshold (or, equivalently, the number of clusters). The largest number of clusters corresponds to the number of unique 5-state alignment strings (that is, 113 strings). Bold highlighted circles mark the locally optimal mean Silhouette score, which gives 15 optimal clusters for the genes at a distance threshold of 0.22.

d, Mis-clustering rates of the CellAlign k-means clustering outputs for all 3500 alignments, versus the number of clusters (k), ranging from k = 7 to k = 50.
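
For readers unfamiliar with the distance used for the clustering in c, the following is a minimal sketch of the Levenshtein (edit) distance between two alignment strings. The length normalization in `normalized_levenshtein` is an assumption made here so that distances fall in [0, 1]; the caption does not state how the distance threshold is scaled. The function names and the example strings in the usage note are hypothetical and are not taken from the Genes2Genes code.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop to reduce memory use.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or exact match)
            ))
        previous = current
    return previous[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Edit distance divided by the longer string length.
    ASSUMPTION: this scaling is illustrative, not taken from the source."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0
```

As a usage example, `normalized_levenshtein("MMMM", "MMIM")` evaluates to 0.25, since one substitution separates the two four-character strings. A pairwise matrix of such distances could then be passed to an agglomerative clustering routine and cut at a chosen threshold.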