Extended Data Fig. 1: Quality control metrics and cell annotation procedures. | Nature Genetics

From: Single-cell transcriptomic analysis of endometriosis

(a) The number of genes detected per cell does not differ significantly across the major study classes or between fresh and frozen samples. (b) Observed cell number is positively correlated with observed cell number (Pearson correlation and analysis of variance). (c) The number of cells passing QC filters and (d) the number of reads per cell do not differ significantly across the major study classes or between fresh and frozen samples. Coral denotes samples that were processed immediately ('Fresh'); teal denotes samples that were dissociated into single cells, viably cryopreserved and thawed before capture ('Frozen'). Number of samples in (a), (b) and (d): endometrioma, fresh = 6, frozen = 2; endometriosis, fresh = 12, frozen = 11; eutopic endometrium, fresh = 2, frozen = 8; no endometriosis detected, fresh = 2, frozen = 2; unaffected ovary, fresh = 1, frozen = 3. (e) Decision tree showing the workflow for cell-type assignment. Black boxes indicate actions/processes; blue boxes indicate cell-type assignment endpoints. (f) Heatmap showing expression of cell-type-specific markers by cluster, scaled from 0 to 1. Column labels indicate possible cell-type assignments, with the number of marker genes for each cell type given in brackets. For a list of marker genes, see Methods and panel (g). (g) Expression of cell-type-specific markers across the 96 clusters. (h) UMAP plot with 114 clusters identified by Seurat shared nearest neighbor (SNN) clustering at a resolution parameter of 3. (i) Correlation values for pairwise comparisons of the 6 unassigned clusters, shown against a background distribution of correlation values for 100 randomly selected pairs of clusters; red dots indicate the correlation value used for cell-type assignment. (j) Pearson's correlation between clusters, based on expression of a union set of 1,960 genes differentially expressed by one or more clusters (log2 FC > 0; P < 0.05). Clusters with no cell-type markers were assigned the identity of the most correlated cluster with a known cell type (see Methods). (k) Correlation of gene expression across major cell types, comparing fresh and cryopreserved specimens; Pearson's correlation values are shown above each plot. (l) Principal component analysis based on the cell-type composition of endometriosis and control tissues. In the box-and-whisker plots in (a), (c), (d) and (i), boxes denote the interquartile range (IQR) and the bar denotes the median; whiskers extend to 1.5 × IQR, and outliers are shown as individual dots. In (i), red dots denote the correlation value between the given cluster and the cluster used to assign its identity.
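Panel (f) scales expression from 0 to 1 but does not state along which axis. A minimal Python sketch, assuming min-max scaling of each gene (row) across clusters; the function name `scale_rows_0_1` and the input values are hypothetical, and setting constant rows to 0 is an assumption made only to avoid division by zero:

```python
def scale_rows_0_1(matrix):
    """Min-max scale each row (gene) of a genes x clusters matrix to [0, 1].

    Assumption: rows with no variation across clusters are set to 0.0.
    """
    scaled = []
    for row in matrix:
        lo, hi = min(row), max(row)
        span = hi - lo
        if span == 0:
            scaled.append([0.0 for _ in row])
        else:
            scaled.append([(x - lo) / span for x in row])
    return scaled


# Hypothetical mean expression of two genes across three clusters.
mean_expr = [
    [0.2, 1.4, 3.0],  # a marker that varies between clusters
    [5.0, 5.0, 5.0],  # a gene with no variation across clusters
]
scaled = scale_rows_0_1(mean_expr)
print(scaled)
```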
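The correlation-based assignment in (i) and (j) can be illustrated with a short sketch. It is not the authors' pipeline (their procedure is described in Methods): it assumes each cluster is summarised by a mean expression vector over a shared gene set, gives an unlabelled cluster the cell type of its most correlated labelled cluster, and draws a background of Pearson correlations from 100 random cluster pairs. All function names and the toy profiles are hypothetical.

```python
import random
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def assign_by_correlation(profile, labelled):
    """Give `profile` the cell type of its most correlated labelled cluster.

    `labelled` is a list of (cell_type, mean_expression_vector) pairs
    (hypothetical input format). Returns (cell_type, r).
    """
    r, cell_type = max((pearson(profile, vec), label) for label, vec in labelled)
    return cell_type, r


def background_correlations(profiles, n_pairs=100, seed=0):
    """Pearson correlations for `n_pairs` randomly drawn pairs of distinct clusters."""
    rng = random.Random(seed)
    return [pearson(*rng.sample(profiles, 2)) for _ in range(n_pairs)]


# Hypothetical mean expression of four genes in three labelled clusters.
labelled = [
    ("epithelial", [9.0, 1.0, 0.5, 0.2]),
    ("stromal", [0.4, 8.0, 6.0, 0.3]),
    ("endothelial", [0.2, 0.5, 1.0, 7.5]),
]
unassigned = [0.5, 7.0, 5.5, 0.6]

cell_type, r = assign_by_correlation(unassigned, labelled)
background = background_correlations([v for _, v in labelled] + [unassigned])
print(cell_type, round(r, 3), "background median:", round(median(background), 3))
```

Comparing `r` with the background distribution indicates whether the best match is unusually strong relative to arbitrary cluster pairs, which is the role the background plays in panel (i).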
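For panel (l), one way to run PCA on cell-type composition is to convert per-sample cell counts to proportions, centre them and take the leading eigenvectors of the covariance matrix. The sketch below uses stdlib-only power iteration with deflation in place of a library PCA routine; the sample counts and function names are hypothetical, and any further transformation of the proportions used by the authors (for example a log-ratio) is not stated in this legend.

```python
import random


def proportions(counts):
    """Convert one sample's cell-type counts into proportions summing to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def pca_scores(rows, k=2, n_iter=1000, seed=0):
    """Scores of each sample (row) on the top-k principal components.

    Power iteration with deflation on the sample covariance matrix.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(p)]
           for i in range(p)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        # Random start: an all-ones vector lies in the null space of a
        # covariance matrix built from proportions, so it would stall.
        v = [rng.gauss(0.0, 1.0) for _ in range(p)]
        eigenvalue = 0.0
        for _ in range(n_iter):
            w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0.0:  # no variance left along any direction
                break
            v, eigenvalue = [x / norm for x in w], norm
        components.append(v)
        # Deflate so the next pass converges to the next component.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(p)]
               for i in range(p)]
    return [[sum(c[j] * v[j] for j in range(p)) for v in components]
            for c in centred]


# Hypothetical counts of (epithelial, stromal, immune) cells per sample.
control = [[500, 300, 200], [520, 280, 200], [480, 320, 200]]
lesion = [[100, 600, 300], [120, 580, 300], [80, 620, 300]]
scores = pca_scores([proportions(s) for s in control + lesion])
for label, (pc1, pc2) in zip(["control"] * 3 + ["lesion"] * 3, scores):
    print(f"{label}: PC1={pc1:+.3f} PC2={pc2:+.3f}")
```

The sign of each component is arbitrary, so only the separation of groups along PC1, not its direction, is meaningful.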
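The box-plot convention in the legend can be made concrete. A minimal sketch, assuming the common Tukey rule in which whiskers reach the most extreme observations within 1.5 × IQR of the box and points beyond are drawn as outliers, with quartiles from linear interpolation via `statistics.quantiles(..., method="inclusive")`; the function name `box_stats` and the input values are hypothetical.

```python
from statistics import quantiles


def box_stats(values):
    """Median, IQR box, Tukey whiskers (1.5 x IQR) and outliers for one group."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations inside the fences.
    inside = [v for v in values if lo_fence <= v <= hi_fence]
    return {
        "median": med,
        "box": (q1, q3),
        "whiskers": (min(inside), max(inside)),
        "outliers": sorted(v for v in values if v < lo_fence or v > hi_fence),
    }


# Hypothetical median genes detected per cell in seven samples.
values = [1800, 2000, 2100, 2200, 2300, 2400, 5000]
stats = box_stats(values)
print(stats)
```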
