Fig. 3: Bayesian model for UMI-informed fitness inference in pooled screens. | Nature Communications


From: Accurate determination of CRISPR-mediated gene fitness in transplantable tumours


a Bayesian probabilistic graphical model schematic.

b Posterior distributions of guide fitness for three biological replicate transplants from C1368 transduced with a 192-guide library targeting 56 signaling genes. The right-hand panel displays posterior distributions from a group inference run using data from all three replicates. The mix-NB model is used. Numbers above each panel indicate the mean number of sgRNA-UMI clones per guide. Guide distributions coloured red (low fitness) and blue (high fitness) indicate estimates that differ from neutrality (fitness = 1) with a false discovery rate < 0.05 (Benjamini-Hochberg method).

c Bayesian goodness-of-fit test for different count likelihood models. Improved fit is obtained using mixtures of two distributions. (YS = Yule-Simon; NB = negative binomial; BNB = beta negative binomial; mix = weighted linear mix of 2 distributions of the same model type, with the same mixture weight for all sgRNAs in each sample, except mix-BNB local, in which a separate mixture parameter is used for each sgRNA.)

d Higher sgRNA-UMI diversity is associated with narrower credible intervals and closer correlation between biological replicates. The left panel shows the median fitness credible interval width against Shannon diversity. The right panel shows the Spearman rank correlation of guides against mean Shannon diversity in pairwise comparisons between biological replicates.

e Effect on screen resolution of reducing sgRNA-UMI numbers through in silico clone subsampling (unfilled circles), or of increasing numbers by combining biological replicate datasets (triangles). Data are shown for starting datasets from 5 different PDX lines (solid circles).

f Effect on the number of guide pairs with differentially resolved fitness as sgRNA-UMI numbers are varied by in silico clone subsampling (unfilled circles) or by combining biological replicate datasets (triangles). Resolved pairs are Hasse plot edges connecting guide pairs with non-overlapping 95% credible intervals (fitness a < b, and no c such that a < c < b). As fewer sgRNA-UMIs are sampled, the power to resolve pairwise fitness differences between guides decreases. Data are shown for starting datasets from 3 different PDX lines (solid circles).

Source data are provided as Source Data files.
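
The definition of resolved pairs in panel f can be illustrated with a minimal sketch. It assumes that guide a is ordered below guide b when the upper bound of a's 95% credible interval lies strictly below the lower bound of b's, and it keeps only the covering pairs (no intermediate guide c), which are the edges of the Hasse plot. The function name `resolved_pairs` and the example intervals are hypothetical and are not taken from the article or its source data; the authors' actual implementation may differ.

```python
from typing import Dict, List, Tuple

# A credible interval as (lower bound, upper bound); assumes lower <= upper.
Interval = Tuple[float, float]


def resolved_pairs(intervals: Dict[str, Interval]) -> List[Tuple[str, str]]:
    """Return Hasse plot edges (a, b) for guides ordered by non-overlapping intervals.

    Guide a is "below" guide b when a's upper bound is strictly less than
    b's lower bound. An edge (a, b) is kept only if no guide c satisfies
    a < c < b under the same ordering.
    """

    def below(a: str, b: str) -> bool:
        return intervals[a][1] < intervals[b][0]

    guides = sorted(intervals)  # fixed iteration order for reproducible output
    edges: List[Tuple[str, str]] = []
    for a in guides:
        for b in guides:
            if not below(a, b):
                continue
            # Drop the pair if some guide c sits strictly between a and b.
            if any(below(a, c) and below(c, b) for c in guides):
                continue
            edges.append((a, b))
    return edges


# Hypothetical 95% credible intervals for four guides (illustrative values only).
example = {
    "g1": (0.2, 0.4),
    "g2": (0.5, 0.7),
    "g3": (0.9, 1.1),
    "g4": (0.6, 1.0),
}

print(resolved_pairs(example))
# [('g1', 'g2'), ('g1', 'g4'), ('g2', 'g3')]
```

In this toy example g1 and g3 do not overlap, but they are not counted as a resolved pair because g2 lies between them. Widening every interval until all of them overlap leaves no edges, which mirrors the loss of resolution described for panel f when fewer sgRNA-UMIs are sampled.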
