Extended Data Fig. 1: Systematic characterization of MethylTree performance in a homogeneous population.

a, b, Analysis of simulated single-cell expansion with more realistic features. a, Impact of division-free CpG mutations on lineage inference accuracy. After simulating clonal expansion as in Fig. 1d, we randomly mutated a given fraction of CpG sites in each of the 128 cells. b, Heatmap of lineage accuracy as a function of CpG coverage and of the variation in epimutation rate, controlled by the parameter \(\lambda\). Unlike in Fig. 1f, we modeled epimutation on a diploid genome, with a CpG-site-specific epimutation rate sampled from a uniform distribution with maximum value \(\lambda\). Each observed CpG status was obtained by sampling once, at that CpG site, from either of the two DNA molecules. c–j, MethylTree analysis of a clonal expansion dataset of human HEK 293T cells. c, Heatmap of the similarity matrix computed from the cell-by-CpG matrix without binning. d, Schematic of region selection. Non-overlapping 500-bp genomic bins with an intermediate methylation rate between \(m_0\) and \(m_1\) were selected. e, Merging of neighboring bins after the selection in d. This procedure was used to analyze all datasets in this article. f, Heatmap of MethylTree lineage accuracy on the 293T dataset using ‘merged’ genomic regions selected at different thresholds according to e. The parameters indicated on this plot (\(m_0 = 0.5\), \(m_1 = 0.9\)) were used to generate Fig. 1i–k. g, Scatter plot of the number of genomic regions obtained with each selection versus the accuracy of the corresponding MethylTree-inferred lineages, using the data from f. The selection parameters (\(m_0\), \(m_1\)) of some data points are highlighted. h, Number of detected CpG sites per cell, shown on the methylation embedding of 293T cells. i, Lineage accuracy using different metrics to compute cell–cell similarity. For the Euclidean distance matrix \(X\), we converted distances to similarities as \(1 - X/\max(X)\), where \(\max(X)\) is the largest value in the matrix.
j, Similarity heatmap ordered by the phylogenetic tree inferred with the neighbor-joining⁵⁶ (NJ, left) or FastME⁵⁷ (right) method.
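The division-free mutations in a can be illustrated with a short Python sketch. It assumes that "mutating" a CpG site means flipping a binary methylation call (0 ↔ 1), that sites are drawn independently for each cell without replacement, and that cells are stored as rows of a 0/1 matrix. The function name `add_division_free_mutations` is hypothetical and is not part of MethylTree.

```python
import numpy as np

def add_division_free_mutations(meth, fraction, rng=None):
    """Flip the methylation state of a random `fraction` of CpG sites in each cell.

    meth: (n_cells, n_cpg) array of 0/1 calls (assumed layout).
    Sites are chosen independently per cell, without replacement.
    """
    rng = np.random.default_rng(rng)
    out = meth.copy()
    n_cells, n_cpg = out.shape
    n_mut = int(round(fraction * n_cpg))
    for i in range(n_cells):
        sites = rng.choice(n_cpg, size=n_mut, replace=False)
        out[i, sites] = 1 - out[i, sites]  # flip 0 <-> 1 at the chosen sites
    return out
```

Because these flips are drawn independently in each cell, they carry no lineage information and act as noise on top of the heritable epimutations.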
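The diploid model in b can be sketched as follows. This is a minimal illustration under several assumptions not stated in the legend: expansion proceeds by synchronous binary divisions (7 generations give 128 cells), an epimutation flips the state of one allele at one site with the site-specific probability \(p_s \sim U(0, \lambda)\) at each division, and coverage is the per-site probability that a cell's CpG is observed. The function `simulate_diploid_expansion` is hypothetical.

```python
import numpy as np

def simulate_diploid_expansion(n_generations, n_cpg, lam, coverage, rng=None):
    """Return an (n_cells, n_cpg) matrix of observed calls (0/1, NaN = not covered)."""
    rng = np.random.default_rng(rng)
    # Site-specific epimutation rate p_s ~ Uniform(0, lam)
    rates = rng.uniform(0.0, lam, size=n_cpg)
    # Founder cell: 2 alleles (DNA molecules) x n_cpg sites with random 0/1 states
    cells = rng.integers(0, 2, size=(1, 2, n_cpg))
    for _ in range(n_generations):
        # Each cell divides; both daughters inherit both alleles (sisters stay adjacent)
        cells = np.repeat(cells, 2, axis=0)
        # Flip each allele at each site independently; `rates` broadcasts over sites
        flips = rng.random(cells.shape) < rates
        cells = np.where(flips, 1 - cells, cells)
    n_cells = cells.shape[0]
    # Each observed call samples one of the two alleles at that site
    allele = rng.integers(0, 2, size=(n_cells, n_cpg))
    observed = np.take_along_axis(cells, allele[:, None, :], axis=1)[:, 0, :].astype(float)
    # Sites not covered in a given cell are set to NaN
    observed[rng.random((n_cells, n_cpg)) >= coverage] = np.nan
    return observed
```

Sampling a single allele per call means that two cells can disagree at a site even when their diploid states are identical, which is one reason accuracy in b is expected to depend on both coverage and \(\lambda\).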
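The region selection and merging in d and e can be sketched for a single chromosome as below. The sketch assumes that each 500-bp bin already has a methylation rate (for example, averaged over cells), that "between \(m_0\) and \(m_1\)" means the open interval, and that "neighboring" means directly abutting bins. The helper name `select_and_merge_bins` is hypothetical.

```python
import numpy as np

def select_and_merge_bins(bin_starts, bin_meth, m0, m1, bin_size=500):
    """Return merged (start, end) regions built from bins with m0 < rate < m1.

    bin_starts: sorted start coordinates of non-overlapping bins on one chromosome.
    bin_meth: per-bin methylation rate (assumed precomputed).
    """
    bin_starts = np.asarray(bin_starts)
    bin_meth = np.asarray(bin_meth, dtype=float)
    # Keep bins with an intermediate methylation rate (NaN rates are dropped)
    keep = (bin_meth > m0) & (bin_meth < m1)
    regions = []
    for start in bin_starts[keep]:
        start = int(start)
        end = start + bin_size
        if regions and regions[-1][1] == start:
            # Selected bin directly abuts the previous region: extend that region
            regions[-1][1] = end
        else:
            regions.append([start, end])
    return [tuple(r) for r in regions]
```

For example, with \(m_0 = 0.5\) and \(m_1 = 0.9\), bins starting at 0, 500, 1500, 2000 and 3000 with rates 0.7, 0.6, 0.8, 0.55 and 0.6 are merged into the regions (0, 1000), (1500, 2500) and (3000, 3500).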
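The distance-to-similarity conversion in i, \(1 - X/\max(X)\), can be written directly. The sketch below assumes a complete cell-by-region matrix with no missing values, which real single-cell methylation data generally do not satisfy, and returns all ones if every cell is identical. The name `euclidean_similarity` is hypothetical.

```python
import numpy as np

def euclidean_similarity(features):
    """Cell-cell similarity 1 - X / max(X) from pairwise Euclidean distances X.

    features: (n_cells, n_regions) array; this sketch assumes no missing values.
    """
    features = np.asarray(features, dtype=float)
    diff = features[:, None, :] - features[None, :, :]
    X = np.sqrt((diff ** 2).sum(axis=-1))  # pairwise Euclidean distance matrix
    max_x = X.max()
    if max_x == 0:  # all cells identical: avoid division by zero
        return np.ones_like(X)
    return 1.0 - X / max_x
```

The most distant pair of cells receives similarity 0, and every cell has similarity 1 with itself, so the result lies in [0, 1] like the correlation-based similarities it is compared against.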
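Ordering a heatmap by a tree, as in j, only requires the tree topology. The from-scratch sketch below implements the standard neighbor-joining criterion on a distance matrix and returns the leaves in tree order. It is an illustration, not the implementation used for this figure: branch lengths are not tracked, ties are broken by the first minimum, and FastME is not covered. The name `neighbor_joining_order` is hypothetical, and labels are assumed not to be tuples.

```python
import numpy as np

def neighbor_joining_order(D, labels):
    """Return leaf labels in the left-to-right order of a neighbor-joining tree."""
    D = np.asarray(D, dtype=float).copy()
    nodes = list(labels)  # each node is a leaf label or a nested tuple (subtree)
    while len(nodes) > 2:
        n = len(nodes)
        r = D.sum(axis=1)
        # NJ criterion: Q_ij = (n - 2) D_ij - r_i - r_j
        Q = (n - 2) * D - r[:, None] - r[None, :]
        np.fill_diagonal(Q, np.inf)
        i, j = np.unravel_index(np.argmin(Q), Q.shape)
        # Distance from the new internal node u to every node k: (D_ik + D_jk - D_ij) / 2
        d_u = 0.5 * (D[i] + D[j] - D[i, j])
        keep = [k for k in range(n) if k not in (i, j)]
        new_D = np.zeros((n - 1, n - 1))
        new_D[:-1, :-1] = D[np.ix_(keep, keep)]
        new_D[-1, :-1] = d_u[keep]
        new_D[:-1, -1] = d_u[keep]
        D = new_D
        nodes = [nodes[k] for k in keep] + [(nodes[i], nodes[j])]

    def leaves(node):
        # Depth-first traversal of the nested tuples gives the display order
        if isinstance(node, tuple):
            for child in node:
                yield from leaves(child)
        else:
            yield node

    return list(leaves(tuple(nodes)))
```

A similarity matrix such as the one in c or i would first need to be converted to distances (for example, \(1 - S\), an assumption here). Mapping the returned labels back to row indices `idx` then allows the heatmap to be reordered with `S[np.ix_(idx, idx)]`, so that sister cells appear as adjacent blocks.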