Extended Data Fig. 11: Comparison of genetic-, transcriptomic- and drug-response-based clustering trees, genomic distances and CRISPR dependencies. | Nature

From: Genetic and transcriptional evolution alters cancer cell line drug response

Extended Data Fig. 11

a, Comparison of clustering trees using the Fowlkes–Mallows approach. The dendrograms were based on SNVs, gene-level CNAs, arm-level CNAs, gene expression profiles and drug response patterns and were all compared to each other. The Fowlkes–Mallows index (Bk) was computed for all potential numbers of clusters (k values) ranging from 5 to 26. The red lines indicate the observed Bk values, whereas the grey lines represent the 95% upper quantile of the randomized distribution. The maximum Bk value represents the degree of similarity between the compared pair of dendrograms. The grey shading represents the difference between the observed Bk values and those of the 95% upper quantile of the randomized distribution. b, Force-directed layout of screened lines using a similarity matrix determined by the probability of cell lines clustering together in dependency space. Cell lines (nodes) are coloured by lineage. c, Left, the overlap of dependencies in KPL1 and MCF7 using corrected CERES scores, with genes showing depletion effects in all cell lines (that is, pan-essential genes) excluded. The threshold for dependency was set as a CERES score <−0.5. Right, overlap in dependency with genes of indeterminate dependency status (CERES scores between −0.4 and −0.6) in either cell line excluded. d, A two-sample GSEA of MCF7 and KPL1 against the oestrogen response gene sets (n = 1 sample per group). Expression of the oestrogen signalling pathway is strongly enriched in MCF7. e, The correlation between ESR1 dependency values and the single-sample GSEA enrichment scores of the oestrogen response hallmark gene sets (n = 27 cell lines). The difference in oestrogen response signalling between MCF7 and KPL1 predicts their differing levels of dependency on ESR1. f, The correlation between GATA3 dependency and GATA3 protein levels (z-scored values for reverse-phase protein arrays; n = 27 cell lines). 
The difference in GATA3 protein levels between MCF7 and KPL1 predicts their differing levels of dependency on GATA3. Spearman’s ρ and P values indicate the strength and significance of the correlations, respectively. g, Top, comparison of proliferation rates between a parental MCF7 population and its single-cell-derived clones. Bottom, comparison of proliferation rates between two cultures of the same single-cell clone, separated by six months of continuous passaging. Box plots show the population doubling time of each sample. Bar, median; box, 25th and 75th percentiles; whiskers, data within 1.5× IQR of lower or upper quartile; circles, all data points. Two-tailed t-test; n, replicate wells. h, Top, comparison of the sensitivity to oestrogen depletion between a parental MCF7 population and its single-cell-derived clones. Bottom, comparison of the sensitivity to oestrogen depletion between two cultures of the same single-cell clone, separated by six months of continuous passaging. Box plots show the relative growth rate in oestrogen-depleted medium. Bar, median; box, 25th and 75th percentiles; whiskers, data within 1.5× IQR of lower or upper quartile; circles, all data points. Two-tailed t-test; n, replicate wells. i, The correlation between sensitivity to tamoxifen (relative viability at 20 μM) and the sensitivity to oestrogen depletion (relative growth rate), across the parental MCF7 populations and their single-cell clones (n = 7). Spearman’s ρ and P values indicate the strength and significance of the correlation, respectively. j, Correlation plots between various measures of the distance between cell line strains (n = 351 strain pairs). CNA distances (based on ultra-low-pass whole-genome sequencing or targeted sequencing), SNV distances, gene expression distances and drug response distances were compared to each other. CNA distance based on ultra-low-pass whole-genome DNA-sequencing was determined by the fraction of the genome affected by discordant CNA calls.
CNA and SNV distances based on targeted sequencing were determined by Jaccard indices. Gene expression and drug-response distances were determined by Euclidean distances. Spearman’s ρ and P values indicate the strength and significance of the correlation, respectively.
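
As an illustration of the comparison in panel a, the following is a minimal sketch of how a Fowlkes–Mallows index Bk and a 95% upper quantile of a randomized distribution could be computed for one value of k. It assumes that both dendrograms have already been cut into k clusters, so that each cell line strain carries one cluster label per tree. The function names, the label-shuffling randomization and the number of permutations are assumptions made for this example and are not taken from the article.

```python
import math
import random
from collections import Counter


def fowlkes_mallows(labels_a, labels_b):
    """Fowlkes-Mallows index B_k between two flat clusterings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))  # contingency table m_ij
    rows = Counter(labels_a)                  # row sums m_i.
    cols = Counter(labels_b)                  # column sums m_.j
    t_k = sum(c * c for c in joint.values()) - n
    p_k = sum(c * c for c in rows.values()) - n
    q_k = sum(c * c for c in cols.values()) - n
    if p_k == 0 or q_k == 0:
        # No co-clustered pairs in at least one clustering: index is undefined,
        # reported here as 0.0 (a convention chosen for this sketch).
        return 0.0
    return t_k / math.sqrt(p_k * q_k)


def null_upper_quantile(labels_a, labels_b, n_perm=1000, q=0.95, seed=0):
    """Upper quantile of B_k after randomly shuffling the labels of one tree.

    Assumption: shuffling labels stands in for the randomization used in the
    figure; the article does not specify the procedure in this legend.
    """
    rng = random.Random(seed)
    shuffled = list(labels_b)
    values = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        values.append(fowlkes_mallows(labels_a, shuffled))
    values.sort()
    index = min(len(values) - 1, math.ceil(q * len(values)) - 1)
    return values[index]


# Hypothetical cluster labels for six strains under two different trees.
snv_labels = [0, 0, 0, 1, 1, 1]
expression_labels = [0, 0, 1, 1, 2, 2]
observed = fowlkes_mallows(snv_labels, expression_labels)
threshold = null_upper_quantile(snv_labels, expression_labels)
```

Repeating this for each k from 5 to 26 would give one observed curve and one randomized-quantile curve per pair of trees, with the gap between them corresponding to the grey shading described in the legend.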
