Figure 1

Construction of a phylogeny from scRNA-Seq data. (A) Generating a phylogeny from scRNA-Seq alignments consists of three major steps: (1) extracting the read counts, (2) calling and smoothing the genotypes, and (3) reconstructing the phylogeny. (B) The read count extraction process: We start with the alignments of cells from a single-cell RNA-Seq experiment. To extract matrices of read counts, we identify sites of interest by merging the alignments into a pseudobulk sample and calling variants. Then, at each variant site in each individual cell, we count the number of reads carrying the reference and alternate alleles. (C) The genotype calling and smoothing processes: We start from matrices of read counts of reference and alternate alleles across sites (rows) in single cells (columns). (i) Given the read counts, we assign a probability to each of the (R)eference, (A)lternate, and (H)eterozygous genotypes by integrating over a beta-binomial density function. (ii) To compare the genotypes of two cells, we sample 100 genotype profiles for each cell by drawing from its genotype probability distributions. (iii) Comparing every pair of cells yields a pairwise genetic similarity matrix. For each cell, the K nearest neighbors are the cells with the highest scores, excluding the cell itself. (iv) Using the nearest neighbors, we smooth the genotype probabilities of a cell by averaging them with the mean probabilities of its neighbors, weighted by \(\delta\). We then call the genotype with the highest probability. (D) Phylogenetic reconstruction: We use BEAST2 to infer the phylogeny and produce a final tree using the maximum clade credibility method.
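The calling and smoothing steps in panel C can be illustrated with a minimal Python sketch. It is not the implementation used to produce the figure, and several details are assumptions made only for illustration: the genotype probabilities come from a Beta(alt + 1, ref + 1) density integrated over thirds of the allele-fraction interval, the similarity of two cells is the fraction of sampled genotypes that agree, and smoothing is the convex combination \((1 - \delta)\,p_{\text{cell}} + \delta\,\bar{p}_{\text{neighbors}}\). All function names and the toy read counts are hypothetical.

```python
import math
import random

GENOTYPES = ("R", "H", "A")  # index 0, 1, 2


def genotype_probs(ref, alt, grid=300):
    """Return [P(R), P(H), P(A)] for one site in one cell.

    Assumption: the alternate-allele fraction has a Beta(alt + 1, ref + 1)
    density, integrated numerically over thirds of [0, 1].
    """
    points = [(i + 0.5) / grid for i in range(grid)]  # midpoints of grid cells
    logs = [alt * math.log(p) + ref * math.log(1 - p) for p in points]
    peak = max(logs)  # subtract the peak so exp() does not underflow
    mass = [0.0, 0.0, 0.0]
    for p, log_density in zip(points, logs):
        mass[min(int(p * 3), 2)] += math.exp(log_density - peak)
    total = sum(mass)
    return [m / total for m in mass]


def similarity(probs_a, probs_b, rng, draws=100):
    """Fraction of sampled genotypes that agree between two cells."""
    matches = 0
    for _ in range(draws):
        for pa, pb in zip(probs_a, probs_b):
            ga = rng.choices(range(3), weights=pa)[0]
            gb = rng.choices(range(3), weights=pb)[0]
            matches += ga == gb
    return matches / (draws * len(probs_a))


def nearest_neighbors(sim, k):
    """For each cell, the indices of the k most similar other cells."""
    n = len(sim)
    return [
        sorted((j for j in range(n) if j != i), key=lambda j: -sim[i][j])[:k]
        for i in range(n)
    ]


def smooth(cell_probs, neighbors, delta):
    """Blend each cell's probabilities with the mean of its neighbors'.

    cell_probs[c][s] is [P(R), P(H), P(A)] for cell c at site s.
    Assumption: smoothed = (1 - delta) * own + delta * neighbor mean.
    """
    smoothed = []
    for c, sites in enumerate(cell_probs):
        row = []
        for s, own in enumerate(sites):
            mean_nb = [
                sum(cell_probs[n][s][g] for n in neighbors[c]) / len(neighbors[c])
                for g in range(3)
            ]
            row.append([(1 - delta) * own[g] + delta * mean_nb[g] for g in range(3)])
        smoothed.append(row)
    return smoothed


def call(probs):
    """Genotype label with the highest probability."""
    return GENOTYPES[max(range(3), key=lambda g: probs[g])]


if __name__ == "__main__":
    # Toy (ref, alt) read counts: counts[cell][site]; values are made up.
    counts = [
        [(10, 0), (0, 8), (0, 0)],
        [(9, 1), (0, 6), (7, 0)],
        [(0, 9), (5, 5), (0, 7)],
    ]
    probs = [[genotype_probs(r, a) for r, a in cell] for cell in counts]
    rng = random.Random(0)
    n = len(probs)
    sim = [[similarity(probs[i], probs[j], rng) for j in range(n)] for i in range(n)]
    nbrs = nearest_neighbors(sim, k=1)
    smoothed = smooth(probs, nbrs, delta=0.5)
    print([[call(site) for site in cell] for cell in smoothed])
```

In this sketch, a site with no reads in a cell starts from a uniform genotype probability, so after smoothing its call is driven largely by the cell's neighbors.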