Fig. 2: Investigating fibroblast heterogeneity in NSCLC through the integration of seven scRNA-seq datasets.

a Schematic overview of the data processing pipeline implemented to generate an integrated dataset for analysing fibroblast transcriptomic heterogeneity in NSCLC. b 2D visualisation (UMAP dimensionality reduction) of fibroblast transcriptomes, highlighting the three major subpopulations identified through unsupervised clustering. Further analysis is shown in Supplementary Fig. 1a–e. c Heatmap showing the sample-level expression (averaged over single cells) for the ten most significant markers of each subpopulation. Complete differential expression results are provided in Supplementary Data 2. d Bar plot showing the log2 fold change for the most significantly upregulated REACTOME pathways in each subpopulation, calculated through GSVA and empirical Bayes statistics for differential expression (exact Bonferroni-adjusted p values are also shown). Complete results from this analysis are provided in Supplementary Data 3. e Bar plot showing the proportion of different matrisome components differentially expressed by each subpopulation. f Boxplot showing sample-level expression (averaged over single cells) of genes encoding basement membrane components for each subpopulation. Nominal p values for the Wilcoxon signed-ranks test are also shown (n = 78 [adventitial], 87 [alveolar] and 92 [myo]). g Boxplot showing sample-level expression (averaged over single cells) of genes encoding interstitial collagens for each subpopulation. Nominal p values for the Wilcoxon signed-ranks test are also shown (n = 78 [adventitial], 87 [alveolar] and 92 [myo] independent samples). h Boxplot showing sample-level expression (averaged over single cells) of genes encoding interstitial collagens for each subpopulation split by tissue type. Nominal p values for the Wilcoxon signed-ranks test are also shown (n = 36/42 [adventitial, control/tumour], 38/49 [alveolar], 28/64 [myo]).
i Boxplot showing sample-level expression (averaged over single cells) of genes encoding myoCAF markers for each subpopulation split by tissue type. Nominal p values for the Wilcoxon signed-ranks test are also shown (n as per panel h). j Boxplot showing sample-level expression (averaged over single cells) of genes encoding iCAF markers for each subpopulation split by tissue type. Nominal p values for the Wilcoxon signed-ranks test are also shown (n as per panel h). All statistical tests carried out were two-sided and boxplots are displayed using the Tukey method (centre line, median; box limits, upper and lower quartiles; whiskers, last point within a 1.5x interquartile range). Source data for panels f–j are provided in the Source Data file.
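
For readers who want to reproduce the boxplot summaries from the Source Data, the two conventions named in this legend (sample-level averaging over single cells, and Tukey boxplot statistics) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the `(sample_id, expression)` input layout and the quartile interpolation rule (linear, "inclusive") are assumptions made for illustration only, since the legend does not specify them.

```python
from collections import defaultdict
from statistics import mean, quantiles


def sample_level_means(cells):
    """Average expression over the single cells of each sample.

    `cells` is an iterable of (sample_id, expression) pairs; this layout
    is a hypothetical choice for the sketch, not the Source Data format.
    """
    by_sample = defaultdict(list)
    for sample_id, value in cells:
        by_sample[sample_id].append(value)
    return {sample_id: mean(values) for sample_id, values in by_sample.items()}


def tukey_box_stats(values):
    """Summarise values with the Tukey boxplot convention.

    Centre line: median. Box limits: lower and upper quartiles.
    Whiskers: most extreme points within 1.5 x IQR of the box.
    Requires at least two values. Quartiles use linear interpolation
    (method="inclusive"), which is an assumption of this sketch.
    """
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    return {
        "median": med,
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


if __name__ == "__main__":
    # Toy values only; these are not data from the study.
    toy_cells = [("s1", 1.0), ("s1", 3.0), ("s2", 4.0)]
    print(sample_level_means(toy_cells))
    print(tukey_box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30]))
```

Under this convention, points beyond the whiskers are drawn individually as outliers rather than being dropped from the summary.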