Fig. 3: Generating a pseudo-time for DCIS.

a Principal component analysis (PCA) plot based on the most significant (p < 0.00001) differentially expressed genes between DCIS and co-occurring IDC. All samples are plotted according to principal components 1 and 2 (PC1 and PC2, respectively) with their fitted principal curve (left) and with their projection onto the curve (right). b Heatmap showing expression of each of the 53 genes, with samples ordered by their projection onto the principal curve. Top bars indicate AIMS subtype classification; ERBB2, PGR, and ESR1 status; age of the patient at the time of consent; tissue classification group for each sample; and patient distribution. Relative expression is provided as log2 counts per million (CPM) minus the mean log2 CPM for each gene. E1–E2 indicate the Early stage and L1–L2 indicate the Late stage. The ‘*’ assigned to ‘Yellow Not Pure DCIS’ and ‘Orange IDC’ marks samples used in the analysis comparing gene expression of DCIS vs IDC for co-occurring patients. ‘Blue Not Pure DCIS’ and ‘Red IDC’ are from tissue biopsies that did not have co-occurring DCIS and IDC in the same sections and were therefore not used for this expression analysis. c Boxplots illustrating per-sample expression data for highly differential genes found when comparing samples in the Early group (E1–E2) with those in the Late group (L1–L2). Differential expression analysis was done using limma-voom, and two-sided p-values were adjusted for multiple testing using Benjamini-Hochberg correction. Centre line represents the median, box limits represent the upper and lower quartiles, and whiskers extend to the minimum and maximum values lying within at most 1.5× the interquartile range. Each point represents a sample (n = 339 early and n = 287 late samples from 106 patients).
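
The relative expression scale used in the heatmap (log2 CPM minus the per-gene mean log2 CPM) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `relative_log2_cpm`, the input layout (genes as rows, samples as columns), and the 0.5 pseudocount with a library size offset of 1 are assumptions chosen for illustration, since the caption does not state how CPM values were computed.

```python
import math


def relative_log2_cpm(counts, pseudocount=0.5):
    """Return per-gene mean-centred log2 CPM values.

    counts: list of genes, each a list of raw counts per sample
            (hypothetical layout: rows are genes, columns are samples).
    pseudocount: assumed offset to avoid log2(0); not taken from the caption.
    """
    n_samples = len(counts[0])
    # Library size of each sample = total counts over all genes.
    lib_sizes = [sum(gene[j] for gene in counts) for j in range(n_samples)]

    relative = []
    for gene in counts:
        # log2 counts per million for this gene in every sample.
        log_cpm = [
            math.log2((gene[j] + pseudocount) / (lib_sizes[j] + 1.0) * 1e6)
            for j in range(n_samples)
        ]
        # Subtract the gene's mean log2 CPM so each row is centred on zero.
        gene_mean = sum(log_cpm) / n_samples
        relative.append([value - gene_mean for value in log_cpm])
    return relative


# Toy example: 2 genes x 3 samples (made-up counts, not study data).
toy_counts = [[10, 20, 40], [90, 80, 60]]
toy_relative = relative_log2_cpm(toy_counts)
```

Centring each gene on its own mean places genes with very different absolute abundance on a shared colour scale, so the heatmap shows relative change along the sample ordering rather than overall expression level.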