Fig. 2: Identification of the initiation-associated epithelial markers and pathways.

a UMAP plots showing epithelial cells at different stages. Their proportions in the C1 and C2 clusters were circled on the map, and the statistics were shown in the bar plot. b Statistical results showing the CNV-meanSquare of epithelial cells at different initiation stages. Endothelial cells and fibroblasts together served as the baseline reference. Nonparametric unpaired t-tests were used to calculate the statistical significance. ***P < 0.001. c Characterization of the OSCC initiation process. ST feature plots from P6 showing cancer-related pathways (top). Changes in the activities of these pathways across the patients analyzed by ST; each dot indicated the median pathway activity in the corresponding N, DN, or T region (bottom). d Enriched GO functions of DEGs with gradually increasing expression among epithelial cells from the N, DN, and T stages. FDR q value < 0.05. e Changes in FRA pathway activities along the tumor initiation process in ST. Each dot indicated the median pathway activity in the corresponding region. f Dot plots showing the significance (−log10 P value) and strength (mean value) of specific interactions between T cells, myeloid cells, and cancer-associated fibroblasts (CAFs) and epithelial cells at the N, DN, and T stages. Interaction means and significance (P < 0.05) were calculated from the interactions, using a cell matrix normalized with Seurat normalization. g Heatmap of DEGs between epithelial cells at different initiation stages. The eight genes confirmed by ST were marked in red. h Heatmap summarizing the expression of the eight initiation-associated genes in ST feature plots (each row shared a color scale, while different columns did not). i Statistical analysis of the expression of the eight initiation-associated genes in five ST feature plots. The dot plots displayed statistical score values for the eight-gene set in each region of the tissue sections. P values were calculated with a two-tailed paired Student’s t-test. *P < 0.05.
j Spatial feature plots of TFAP2A in tissue sections from P6. k Statistical results showing the proportions of TFAP2A-positive spots in different tissue regions by ST analysis. l Gene-expression levels of TFAP2A in the siTFAP2A and siNC groups. Data were presented as the mean ± s.d. NC, control group. m Quantification of EdU-positive cell proportions in the siTFAP2A groups. Four representative images from each group were used for quantification. P values in l and m were calculated with a two-tailed unpaired Student’s t-test. *P < 0.05; ***P < 0.001.