Fig. 2: Association of molecular features with pCR.

A Heatmap of genes differentially expressed between patients achieving pCR and patients with residual disease (RD) in at least one of the indicated comparisons (* = FDR < 10%). The association was evaluated in the overall population and separately in the two trial cohorts, using either baseline or Day14 measurements. Manually selected representative genes are highlighted. Gene-to-cell-type annotation based on an external HER2+ scRNA-seq dataset (ref. 22) is reported. Genes are assigned to a specific cell type when their expression is significantly higher in that cell type according to the interrogated single-cell dataset. Unassigned genes are in grey; genes not measured or not expressed in the single-cell dataset are in white; see Methods for details. An extended version reporting all the gene symbols is presented in Supplementary Fig. 3. B Heatmap of representative, manually selected genesets showing positive or negative enrichment in patients achieving pCR compared to RD (* = FDR < 0.1%). A comprehensive list of all significant genesets is presented in Supplementary Fig. 4A. C Scatterplot of ESR1 and B2M expression at baseline in the overall population. Dashed lines indicate the first tertile for ESR1 and the median value for B2M. D Distribution of the area under the ROC curve (AUC) after fitting 100 internally cross-validated regularized logistic regression models using either baseline or Day14 gene expression as candidate features to predict pCR. AUC distributions were compared using a two-sided Student’s t-test. E Association of the PAM50 subtypes defined at baseline with pCR, two-sided Fisher’s test. F Association between the presence of a PIK3CA (top) or TP53 (bottom) mutation and pCR, two-sided Fisher’s test. G Association of sTILs with pCR in the overall population and per treatment cohort, quantified either at baseline or in the Day14 biopsy, two-sided Student’s t-test.
H Association of iTILs with pCR in the overall population and per treatment cohort, quantified either at baseline or in the Day14 biopsy, two-sided Student’s t-test. All boxplots are defined as follows: centre line = median; box limits = upper and lower quartiles; whiskers = 1.5x interquartile range; points = outliers. sTILs = stromal tumour infiltrating lymphocytes; iTILs = intra-tumoral infiltrating lymphocytes; pCR = pathologic complete response; RD = residual disease; NES = normalised enrichment score. Source data are provided as a Source Data file.