Fig. 1: Strategy of the CGE method to lineage-trace cells with distinct genome editing events using sequence tags with silent or near-silent mutations.

a, The CGE method uses a library of HDR templates with two experimental variants: original genomic sequence (blue) and desired mutation (orange). In addition, the HDR templates harbor sequence tags that can be identified by Illumina sequencing of the targeted locus, enabling lineage tracing of the edited clones and creating a large number of internal replicates in each experiment. The sequence tags are generated by mutating nucleotides flanking the region of interest with a probability of 24%, a strategy that typically introduces 2–3 mutated nucleotides (indicated with red diamonds; Extended Data Fig. 1), leaving most of the flanking sequence intact, as demonstrated by the position weight matrices. b, Experimental strategy using a mixture of HDR template libraries harboring the original and mutated sequences for the same target. The abundance of each HDR template in the cell population is analyzed from the sequence tags after different assays and compared to the respective baseline: cellular fitness (gDNA at day 8/day 2), TF binding (chromatin-immunoprecipitated DNA/input DNA) and mRNA expression (mRNA abundance/respective gDNA). c, The number of possible sequence variations with zero (n = 1), one (n = 30), two (n = 405) and three (n = 3,240) flanking mutations when the sequence tags are created by mutating ten nucleotides with a probability of 24%, and their abundance in the HDR template library as analyzed from read counts in the ChIP input sample of the edited SHMT2 E-box locus. The box plots indicate the median read count with upper and lower quartiles, and the whiskers extend to 1.5 times the interquartile range. The number of sequence tags recovered in each experiment is shown in Supplementary Table 3. d, The effect of E-box mutation at the RPL23 gene promoter on the fitness of HAP1 cells, shown by read count ratios for mutated/original sequences for each cell lineage pair harboring identical sequence tags with one flanking mutation (see also Extended Data Figs. 1b and 2b). Of note, the sequence tags with two flanking mutations are used in Fig. 2 for a more robust analysis (Methods).
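
The tag counts quoted in panel c can be checked with a short calculation. The sketch below is illustrative only and is not taken from the authors' analysis code. It assumes that each of the ten flanking positions is mutated independently and that a mutated position can take any of the three alternative bases; the function names are hypothetical.

```python
from math import comb

N_POSITIONS = 10      # flanking nucleotides subject to mutation (from the caption)
P_MUTATION = 0.24     # per-position mutation probability (from the caption)
N_ALTERNATIVES = 3    # assumption: any of the three other bases can replace the original


def n_possible_tags(k: int) -> int:
    """Number of distinct sequence tags carrying exactly k flanking mutations."""
    # Choose which k positions are mutated, then one of three bases at each.
    return comb(N_POSITIONS, k) * N_ALTERNATIVES ** k


def p_exactly_k(k: int) -> float:
    """Binomial probability that a template carries exactly k mutations,
    assuming independent positions (an assumption of this sketch)."""
    return comb(N_POSITIONS, k) * P_MUTATION ** k * (1 - P_MUTATION) ** (N_POSITIONS - k)


tag_counts = {k: n_possible_tags(k) for k in range(4)}
expected_mutations = N_POSITIONS * P_MUTATION

print(tag_counts)
print(f"expected mutations per template: {expected_mutations:.1f}")
for k in range(4):
    print(f"P(exactly {k} mutations) = {p_exactly_k(k):.3f}")
```

Under these assumptions the counts for zero to three mutations come out as 1, 30, 405 and 3,240, matching panel c, and the expected number of mutations per template is 2.4, consistent with the 2–3 mutated nucleotides described in panel a.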