Extended Data Fig. 7: Lineages of CTNNB1 and BET002 datasets before filtering out the potential artifact sequences. | Nature Biotechnology

From: Deep distributed computing to reconstruct extremely large lineage trees

a, b, Reconstructed lineages. Each dendrogram shows only the upstream lineage of the largest clades, each composed of fewer than 15,000 sequences. For each clade, the number of sequences, the proportions of their source sample wells and the entropy of the well proportions are shown. a, CTNNB1. b, BET002. c, d, Distributions of entropies for the clades with 1,000 or more sequences. c, CTNNB1. d, BET002. Differences between the entropy distributions and the null distributions given by random sequence-well assignment were tested with a two-sided Brunner-Munzel test. e, f, Unique read counts of the second-generation sequences that were uniquely best-matched to a single parental sequence in the expected or an unexpected well, and of those best-matched to multiple parental sequences. Sequences uniquely best-matched to a single parental sequence are color-coded by parental well. Second-generation sequences best-matched to a single parental sequence from an unexpected well can be assumed to be cross-contaminants introduced during the second EP-PCR or the subsequent steps. Second-generation sequences redundantly best-matched to multiple parental sequences can be assumed to derive from parental sequences that were either cross-contaminated before the second EP-PCR or had received an insufficient number of mutations. e, CTNNB1. f, BET002.
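The per-clade entropy and its null distribution can be illustrated with a minimal sketch. The caption does not state the logarithm base or exactly how the random sequence-well assignment was generated, so the following are assumptions made for illustration only: entropy is measured in bits, and the null is produced by shuffling well labels across all sequences while keeping clade sizes fixed. The function names `well_entropy` and `null_entropies` and the toy clades are hypothetical.

```python
import math
import random
from collections import Counter


def well_entropy(wells):
    """Shannon entropy (in bits, an assumed base) of the source-well
    proportions within one clade. `wells` holds one well label per sequence."""
    counts = Counter(wells)
    total = len(wells)
    # Sum of p * log2(1/p) over the wells present in the clade.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def null_entropies(clades, seed=0):
    """Entropies under one random sequence-well assignment (assumed scheme):
    well labels are shuffled across all sequences, clade sizes are kept."""
    rng = random.Random(seed)
    labels = [well for clade in clades for well in clade]
    rng.shuffle(labels)
    entropies, start = [], 0
    for clade in clades:
        entropies.append(well_entropy(labels[start:start + len(clade)]))
        start += len(clade)
    return entropies


# Toy clades (hypothetical data): one pure, one skewed, one evenly mixed.
clades = [["A"] * 8, ["B"] * 6 + ["A"] * 2, ["A", "B", "C", "D"] * 2]
observed = [well_entropy(clade) for clade in clades]
shuffled = null_entropies(clades)
```

A clade drawn from a single well has entropy 0, whereas a clade spread evenly over four wells has entropy 2 bits. With real data, the observed and shuffled entropy lists could then be compared with a two-sided Brunner-Munzel test, for example via `scipy.stats.brunnermunzel`, which is available in SciPy but not used in this sketch.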
