Extended Data Fig. 1: Systematic characterization of MethylTree performance in a homogeneous population. | Nature Methods

From: High-resolution, noninvasive single-cell lineage tracing in mice and humans based on DNA methylation epimutations

Extended Data Fig. 1

a, b, Analysis of simulated single-cell expansion with more realistic features. a, The impact of division-free CpG mutations on lineage inference accuracy. After simulated clonal expansion as in Fig. 1d, we randomly mutated a given fraction of CpG sites in each of the 128 cells. b, Heatmap of lineage accuracy as a function of CpG coverage and the variation of epimutation rate, controlled by the parameter \(\lambda\). Compared with Fig. 1f, here we modeled epimutation on a diploid genome with a CpG-site-specific epimutation rate sampled from a uniform distribution with a maximum value \(\lambda\). Each observed CpG status was obtained by sampling that CpG site once from either of the two DNA molecules. c–j, MethylTree analysis of a clonal expansion dataset of human HEK 293T cells. c, Heatmap of the similarity matrix computed from the cell-by-CpG matrix, without binning. d, Schematic of region selection. Non-overlapping 500-bp genomic bins with an intermediate methylation rate between \({m}_{0}\) and \({m}_{1}\) were selected. e, Merging of neighboring bins after the selection in d. This procedure was used in analyzing all datasets in this article. f, Heatmap of MethylTree lineage accuracies on the 293T dataset using 'merged' genomic regions selected at different thresholds according to e. The parameters indicated on this plot (\({m}_{0}=0.5\), \({m}_{1}=0.9\)) were used to generate Fig. 1i–k. g, Scatter plot showing the number of genomic regions associated with each selection and the corresponding accuracy of MethylTree-inferred lineages, using the data from f. The selection parameters (\({m}_{0}\), \({m}_{1}\)) for some data points are highlighted. h, Number of detected CpG sites per cell, shown on the methylation embedding of 293T cells. i, Lineage accuracy using different metrics to compute the cell-cell similarity. For the Euclidean distance, we converted the distance matrix \(X\) to a similarity matrix as \(1-X/\max (X)\), where \(\max (X)\) is the largest value in \(X\). j, Similarity heatmap ordered with the phylogenetic tree inferred from the neighbor-joining (NJ, left; ref. 56) or FastME (right; ref. 57) method.
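
The region selection in d, the merging in e and the distance-to-similarity conversion in i can be illustrated with a minimal Python sketch. This is not the MethylTree implementation. The function names are hypothetical, the thresholds are treated as inclusive, bins are assumed to lie on a single chromosome and to be indexed in genomic order, and 'neighboring' is assumed to mean directly adjacent selected bins.

```python
import math


def select_bins(bin_rates, m0=0.5, m1=0.9):
    """Indices of bins whose methylation rate lies in [m0, m1].

    Inclusive bounds are an assumption; the legend only says 'between'.
    """
    return [i for i, rate in enumerate(bin_rates) if m0 <= rate <= m1]


def merge_neighbors(selected):
    """Group sorted bin indices into (first, last) runs of adjacent bins.

    Assumes all bins are on one chromosome and indexed in genomic order.
    """
    runs = []
    for idx in selected:
        if runs and idx == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], idx)  # extend the current run
        else:
            runs.append((idx, idx))  # start a new run
    return runs


def euclidean_distance_matrix(rows):
    """Pairwise Euclidean distances between per-cell feature vectors."""
    return [[math.dist(a, b) for b in rows] for a in rows]


def distance_to_similarity(X):
    """Convert a distance matrix X to a similarity matrix as 1 - X / max(X)."""
    x_max = max(max(row) for row in X)
    if x_max == 0:
        raise ValueError("all distances are zero; similarity is undefined")
    return [[1.0 - value / x_max for value in row] for row in X]


# Toy example with made-up numbers (not data from the article).
rates = [0.2, 0.6, 0.7, 0.95, 0.8, 0.5, 0.1]
regions = merge_neighbors(select_bins(rates))  # [(1, 2), (4, 5)]

cells = [[0.0, 0.0], [3.0, 4.0], [0.0, 0.0]]
similarity = distance_to_similarity(euclidean_distance_matrix(cells))
```

With this conversion the most distant pair of cells receives a similarity of 0 and identical cells receive a similarity of 1, so the Euclidean result sits on the same scale as metrics that are natively similarities.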