Introduction

Keloids are dermal fibroproliferative disorders caused by abnormal wound healing, affecting approximately 4.5% to 16% of the population1. They are benign skin tumors that manifest clinically as raised, amorphous scars that typically extend beyond the original wound boundary and invade adjacent normal skin2. These lesions occur mainly on the earlobe, face, chest, and upper back, causing impaired skin function and aesthetic defects3. Additionally, keloids are often accompanied by pruritus and pain, which seriously affect patients’ physical and mental health2.

Various therapeutic approaches for keloids have been developed, including intralesional corticosteroid injection, cryotherapy, surgical resection, radiotherapy, and laser therapy4. Nevertheless, keloids still have a high recurrence rate, ranging from 70% to 100% after surgical resection alone5. Moreover, some of these treatments are associated with unavoidable side effects, such as pain and atrophy of surrounding normal tissues4. Because genetic, endocrine, and local tissue factors (such as wound tension or infection) are all closely related to keloid formation, phenotype and prognosis differ significantly both between lesions and within a single lesion6. Thus, keloids should not be regarded or treated as homogeneous growths. To date, the pathogenesis of keloids has not been clearly elucidated, and radical treatments remain lacking. Given these clinical dilemmas, exploring the complex cellular and molecular mechanisms and establishing a precise molecular classification of keloids may help uncover their pathogenesis and potential treatments. However, few studies have focused on constructing a diagnostic classification of keloids. A previous study found that downregulated cytotoxic CD8+ T cells regulated by the NKG2A-soluble HLA-E axis may serve as predictive biomarkers and potential therapeutic targets in keloids, and subsequently proposed the H/N classification7. Additionally, another study proposed a keloid classification based on glycosphingolipid metabolism-related genes8.

Fibroblasts are vital cells that promote keloid formation. Abnormal fibroblast proliferation and excessive collagen deposition constitute the histopathological characteristics of keloids4. With the development of single-cell RNA sequencing (scRNA-seq), analyzing the transcriptome landscapes, heterogeneity, and regulatory networks of fibroblasts under both homeostatic and pathological conditions at a single-cell level is feasible9. The latest scRNA-seq studies revealed that fibroblasts in the normal dermis could be divided into different subgroups and played various roles in skin development and injury repair10,11,12,13. Research showed that fibroblasts in keloids were significantly increased and could be divided into four subsets: secretory-papillary, secretory-reticular, mesenchymal, and pro-inflammatory cells. Among these, mesenchymal fibroblasts were expanding and played an essential role in excessive collagen deposition14. The results have also been validated in scleroderma, another fibrotic skin disease14. Additionally, a study combining scRNA-seq with spatial transcriptomics data found that co-localization of fibroblasts and endothelial cells, together with enhanced intercellular communication, may promote keloid development15.

Considering the importance of fibroblasts, this study reconstructed the de novo differentiation trajectory of keloid fibroblasts and identified 25 differentially expressed fibroblast-differentiation-related genes (DEFDRGs) associated with keloid development. Using integrated bioinformatics analysis of scRNA-seq data (3 keloids and 3 normal scars) and RNA-seq data (37 keloids and 27 healthy skin samples), we developed a keloid classification based on multi-dimensional analyses and experimental validation. This approach deepens our understanding of new mechanisms underlying keloid pathogenesis and contributes to a theoretical basis for clinical precision treatment of keloid patients.

Results

Single-Cell Atlas Reveals Dominant Cellular Composition in Keloids

After stringent quality control (QC) filtering, a total of 44,463 single cells (keloids: 23,638; normal scars: 20,825) from scRNA-seq data met the criteria and were included for downstream analyses. The experimental workflow is detailed in Fig. 1A and Supplementary Fig. 1. Uniform Manifold Approximation and Projection (UMAP)-based clustering identified 18 distinct cell clusters, which were annotated as 9 major cell types: endothelial cells, fibroblasts, glandular cells, immune cells, keratinocytes, lymphatic endothelial cells, melanocytes, smooth muscle cells, and neural cells (Fig. 1B). The distribution of these cell populations between keloid and normal scar tissues is also visualized in Fig. 1B. Feature plots demonstrate the expression levels and cellular distribution of canonical biomarkers retrieved from the CellMarker database across all annotated cell types (Supplementary Fig. 2A). Supplementary Fig. 2B further validates the cellular annotation accuracy through marker gene expression patterns. A Cleveland dot plot was used to describe expression levels of typical marker genes (DCN, KRT1, PTPRC, and VWF) and the proportions of the 9 cell types in different samples (Supplementary Fig. 2C). Figure 1C illustrates the number and proportion of each cell type. The results indicate that endothelial cells and fibroblasts are predominant across all samples, which is in line with previous research16.

Fig. 1: Comprehensive scRNA-seq analysis reveals cellular heterogeneity in keloids and normal scars.

A Schematic workflow of the study design and analytical pipeline. B UMAP dimensionality reduction clustering 44,463 cells into nine cell types, with color-coding depicting their distribution in keloid and normal scar tissues. C Quantitative analysis of cell number and proportional distribution of the nine cell types across samples, highlighting the predominant abundance of fibroblasts and endothelial cells. D, E Cell communication networks among cell types, demonstrating that fibroblasts participate in the most extensive signaling interactions. F Left: Comparison of interaction counts between keloids and normal scars; right: Weighted analysis of cell-type-specific interaction strengths in the two groups. LFDC low fibroblast differentiation classification, MFDC moderate fibroblast differentiation classification, HFDC high fibroblast differentiation classification, PCA principal component analysis, DEFDRGs differentially expressed fibroblast-differentiation-related genes, KL keloid, NS normal skin, CC cell communication, UMAP Uniform Manifold Approximation and Projection.

Fibroblasts emerge as central hubs in cell communication networks

Cell communication analysis demonstrated fibroblasts as the predominant cellular interactors, exhibiting the highest communication frequency across all cell types (Fig. 1D, E). Quantitative evaluation revealed significantly enhanced global communication activity in keloids compared to normal scar tissues (Fig. 1F). Notably, fibroblasts exhibited the strongest outgoing interaction strength in both keloids and normal scars (Supplementary Fig. 2D).

Single-cell profiling unveils fibroblast heterogeneity in keloids

A total of 13,925 fibroblasts were then extracted for further analysis. Through unsupervised clustering, 6 distinct clusters were identified and annotated as fibroblasts 1-6 (Fig. 2A). Among them, fibroblasts 1 and 2 predominated in normal scars, while fibroblast 3 predominated in keloids (Fig. 2B). The expression levels of typical marker genes (ACTA2, COL1A1, TGFBI, and TOP2A) were used to validate the accuracy of the fibroblast screening (Fig. 2C). The proportion of each fibroblast cluster in keloids and normal scars was visualized in a bar plot, further confirming that fibroblast 3 is predominantly present in keloids (Fig. 2C). Fibroblast 3 highly expresses POSTN, ASPN, and COMP, which have been reported to be involved in keloid formation (Fig. 2D)14. Extensive cell communication existed among the fibroblast subtypes. In particular, fibroblast 3 interacted most frequently with other subtypes, and COL1A1 and COL1A2 served as key intercellular association genes (Fig. 2E, F). Enriched hallmark signaling pathways in fibroblast 3 included collagen formation and extracellular matrix (ECM) receptor interaction (Supplementary Fig. 3A). Supplementary Fig. 3B, C shows the distinct cell cycle phases among fibroblast subpopulations.

Fig. 2: Identification of fibroblast subtypes and cell communication networks.

A UMAP dimensionality reduction reveals six distinct fibroblast subtypes across keloid and normal scar tissues. B Quantitative analysis of cell number and proportional distribution of fibroblast subtypes across samples. Fibroblast 3 is predominant in keloids, whereas Fibroblast 2 constitutes the major population in normal scars. C The Cleveland dot plot depicts expression of classical markers (ACTA2, COL1A1, TGFBI, TOP2A) in fibroblast subsets, with bar plots illustrating the proportional distribution of each cluster in keloid and normal scar samples. D Marker gene expression profiles highlight elevated ASPN, COMP, and POSTN expression in Fibroblast 3. E, F Cell communication between fibroblast subtypes. Fibroblast 3 engages in more extensive signaling interactions compared to other subtypes. KL keloid, NS normal skin, CC cell communication, UMAP Uniform Manifold Approximation and Projection.

Trajectory analysis decodes differentiation fates and expression dynamics in keloid fibroblasts

To delineate the differentiation hierarchy of keloid fibroblasts, we employed Monocle 2 for single-cell pseudotemporal ordering analysis. This computational framework reconstructed lineage trajectories through dimensionality reduction (DDRTree algorithm), revealing 7 distinct cellular states along fibroblast maturation paths (Fig. 3A–C). Furthermore, the pseudotime and cell lineage trajectory of each fibroblast cluster were illustrated separately in Fig. 3D. Notably, fibroblast 3 predominantly occupied intermediate-late pseudotime positions (States 3–6). Through trajectory analysis, differentially expressed genes (DEGs) associated with fibroblast differentiation were identified and named as fibroblast-differentiation-related genes (FDRGs). The top 100 FDRGs were visualized in a heatmap (Fig. 3E).

Fig. 3: Pseudotime trajectory analysis and identification of DEFDRGs.

A Pseudotime plot depicts the potential fibroblast differentiation sequence. B Identification of seven distinct differentiation states. C, D Visualization of fibroblast subtypes mapped onto differentiation trajectories. E Expression of the top 100 fate-determining genes differentially expressed across differentiation states. F Schematic workflow for identifying DEFDRGs, defined by the overlap of fibroblast differentiation-associated DEGs and clinical diagnosis/time-related DEGs. DEFDRGs differentially expressed fibroblast-differentiation-related genes, DEGs differentially expressed genes.

Integrated trajectory-clinical analysis identifies core fate-determining regulators in keloid fibroblasts

Through the aforementioned analyses, we mapped the transcriptomes of all fibroblast states and used Monocle 2 to identify 14,467 FDRGs (Monocle-sig, P < 0.05), which are the genes determining cell fate. Subsequent integration of RNA-seq profiles with clinical metadata from the SAR database through univariate Cox regression revealed 32 diagnosis-associated (Diagnosis-sig, P < 0.05) and 137 time-significant (Time-sig, P < 0.05) DEGs. Intersectional analysis of these three gene sets identified 25 core FDRGs, named DEFDRGs, exhibiting both differentiation-dependent expression patterns and clinical relevance (Fig. 3F and Supplementary Fig. 4A). To elucidate clinical correlations, a co-expression network of DEFDRGs was constructed using nonparametric tests and Pearson correlation (Supplementary Fig. 4B). Labels in various colors represented genes from different Monocle states. Different colored curves represented positive or negative correlations. The DEFDRGs showed tight associations, indicating underlying regulatory mechanisms of clinical outcomes and fibroblast fates. Most genes were positively correlated, while Kallikrein Related Peptidase 6 (KLK6) and Tryptase Beta 2 (TPSB2) were mostly negatively correlated with the others. Boxplot validation confirmed differential expression patterns across diagnosis and time (Supplementary Fig. 4C).
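The intersection step described above can be illustrated with a minimal Python sketch. The gene sets below are hypothetical toy placeholders, not the study's actual Monocle-sig, Diagnosis-sig, and Time-sig lists, and the function name `intersect_gene_sets` is introduced here for illustration only.

```python
# Hypothetical toy gene sets; the real lists come from Monocle 2 and the
# bulk RNA-seq analyses and are not reproduced here.
monocle_sig = {"KLK6", "TPSB2", "MDK", "PECAM1", "COL1A1", "POSTN"}
diagnosis_sig = {"KLK6", "TPSB2", "MDK", "PECAM1", "ACTA2"}
time_sig = {"KLK6", "TPSB2", "MDK", "VWF", "COL1A1"}


def intersect_gene_sets(*gene_sets):
    """Return the sorted list of genes shared by every input set."""
    if not gene_sets:
        return []
    shared = set(gene_sets[0])
    for genes in gene_sets[1:]:
        shared &= set(genes)
    return sorted(shared)


# Genes that are trajectory-associated AND diagnosis- AND time-associated.
defdrg_candidates = intersect_gene_sets(monocle_sig, diagnosis_sig, time_sig)
print(defdrg_candidates)
```

Because the result can never be larger than the smallest input set, the 32 diagnosis-associated genes bound the size of the final overlap.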

Construction of a fibroblast differentiation-based keloid classification (FDBKC) based on DEFDRGs

Through consensus clustering analysis of the 25 DEFDRGs, we established the FDBKC system to stratify patients into three molecular subtypes (k = 3, Supplementary Fig. 5A). Delta-area analysis confirmed optimal clustering stability via minimal cumulative distribution function (CDF) deviations (Supplementary Fig. 5B). We visualized the expression of DEFDRGs across multiple dimensions: consensus cluster, cell state, sex, age, and diagnosis (Fig. 4A). Gene expression levels in the three consensus clusters were associated with clinicopathological characteristics. Strikingly, strong connections between fibroblast states and consensus clusters were observed. The overall expression levels of fibroblast fate-determining genes were higher in cluster 1 and lower in clusters 2 and 3, which exhibited distinct expression patterns, demonstrating obvious heterogeneity among the clusters (Fig. 4A).
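The core quantity behind consensus clustering is the consensus matrix: for each pair of samples, the fraction of resampled clustering runs in which the two samples fall in the same cluster. The sketch below computes it in plain Python. The run assignments, sample names, and the helper `consensus_matrix` are hypothetical, and the software and resampling settings used in the study are not specified here.

```python
from itertools import combinations


def consensus_matrix(runs):
    """Map each sample pair to the fraction of runs that co-cluster it.

    `runs` is a list of dicts (sample -> cluster label), one per resampled
    clustering run. Pairs that were never sampled together map to None.
    """
    samples = sorted({sample for run in runs for sample in run})
    consensus = {}
    for a, b in combinations(samples, 2):
        together = [run for run in runs if a in run and b in run]
        if not together:
            consensus[(a, b)] = None
            continue
        same = sum(1 for run in together if run[a] == run[b])
        consensus[(a, b)] = same / len(together)
    return consensus


# Hypothetical cluster labels from four subsampled runs (labels are only
# meaningful within a run, so only "same vs. different" is compared).
runs = [
    {"s1": 0, "s2": 0, "s3": 1},
    {"s1": 1, "s2": 1, "s4": 0},
    {"s2": 0, "s3": 1, "s4": 1},
    {"s1": 0, "s3": 0, "s4": 1},
]
matrix = consensus_matrix(runs)
```

Values close to 0 or 1 for most pairs indicate stable cluster membership, which is what CDF and delta-area plots summarize across candidate values of k.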

Fig. 4: Consensus clustering and clinical correlation analysis.

A Heatmap depicting the expression profile of 25 DEFDRGs across three consensus clusters. B Distribution of keloid and healthy diagnoses across the three clusters, showing significant diagnostic composition differences. C Proportional representation of healthy samples and keloid tissues at different time points within each cluster. D PCA score plot illustrating sample segregation based on DEFDRG expression patterns. E Fibroblast score distribution across clusters, with Cluster 1 exhibiting the lowest score and Cluster 3 the highest. F Association analysis of high/low fibroblast score with consensus clusters, diagnosis, and time. G Differential expression of 25 DEFDRGs in three clusters. Many genes were highly expressed in Cluster 1, while TPSB2 and KLK6 were highly expressed in Cluster 3. LFDC low fibroblast differentiation classification, MFDC moderate fibroblast differentiation classification, HFDC high fibroblast differentiation classification, PCA principal component analysis, DEFDRGs differentially expressed fibroblast-differentiation-related genes.

Furthermore, chi-square tests confirmed that FDBKC was significantly associated with keloid diagnosis and time (P < 0.001) (Fig. 4B, C). Keloid time represented day 0 and day 42 after injury of keloid-prone patients. The stacked bar chart showed that healthy samples were predominantly in cluster 1, while keloid samples (including day 0 and day 42) were mostly in cluster 3, indicating that cluster 3 was associated with a more severe pathological state.
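A chi-square test of independence on a cluster-by-diagnosis table can be sketched as follows. The counts are hypothetical and chosen only so that the margins match the 27 healthy and 37 keloid samples; they are not the study's data. The closed-form P value used here holds only for 2 degrees of freedom, which is the case for a 3 x 2 table.

```python
import math


def chi_square_independence(table):
    """Return the Pearson chi-square statistic and degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: rows = clusters 1-3, columns = (healthy, keloid).
counts = [[20, 4], [5, 8], [2, 25]]
stat, dof = chi_square_independence(counts)

# With exactly 2 degrees of freedom, the chi-square survival function
# reduces to exp(-x / 2); other tables need a general chi-square CDF.
p_value = math.exp(-stat / 2) if dof == 2 else None
```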

In addition, to further clarify the molecular pathological features of fibroblasts in each FDBKC cluster and minimize bias introduced by categorical variable analysis, principal component analysis (PCA) was performed based on DEFDRGs to calculate the fibroblast score (FS) of each sample (Fig. 4D). We then tested the ability of PCA-based FS to distinguish different FDBKC clusters and found significant differences (Fig. 4E). In FDBKC, PCA FS was highest in cluster 3, lowest in cluster 1, and intermediate in cluster 2. Therefore, we named the three clusters of FDBKC as follows: cluster 1 (Low Fibroblast Differentiation Classification, LFDC), cluster 2 (Moderate Fibroblast Differentiation Classification, MFDC), and cluster 3 (High Fibroblast Differentiation Classification, HFDC). Samples were divided into high and low PCA FS groups using the median as the cutoff. The Sankey diagram illustrates the proportion of high and low PCA FS in each cluster and their association with diagnosis and time (Fig. 4F).
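The idea of a PCA-derived score with a median split can be sketched in plain Python. The exact formula used to derive FS from the principal components is not given above, so this example assumes the score is the projection onto the first principal component; the expression values and function name are hypothetical.

```python
import statistics


def first_pc_scores(rows, iterations=200):
    """Approximate the leading principal component by power iteration.

    Returns (loadings, scores). Assumes the data have non-zero variance and
    that the starting vector is not orthogonal to the leading component.
    """
    n_features = len(rows[0])
    means = [statistics.fmean(col) for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    # Sample covariance matrix (features x features).
    cov = [
        [sum(r[i] * r[j] for r in centered) / (len(rows) - 1)
         for j in range(n_features)]
        for i in range(n_features)
    ]
    vec = [1.0] * n_features
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(n_features))
               for i in range(n_features)]
        norm = sum(v * v for v in nxt) ** 0.5
        vec = [v / norm for v in nxt]
    # The sign of a principal component is arbitrary; make it deterministic.
    if sum(vec) < 0:
        vec = [-v for v in vec]
    scores = [sum(c * v for c, v in zip(row, vec)) for row in centered]
    return vec, scores


# Hypothetical expression matrix: 6 samples (rows) x 3 genes (columns).
expression = [
    [1.0, 2.0, 1.5],
    [1.2, 2.1, 1.4],
    [2.0, 3.0, 2.6],
    [2.4, 3.3, 2.8],
    [3.0, 4.2, 3.5],
    [3.1, 4.0, 3.6],
]
loadings, scores = first_pc_scores(expression)
cutoff = statistics.median(scores)
groups = ["high" if s > cutoff else "low" for s in scores]
```

Using the median as the cutoff yields balanced groups by construction, which is convenient for small cohorts but does not by itself imply a biologically meaningful threshold.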

The expression of the 25 DEFDRGs across the three FDBKC clusters was analyzed using the ssGSEA algorithm, revealing significant differences in gene expression levels among clusters (Fig. 4G). Almost all DEFDRGs were highly expressed in cluster 1, consistent with the above results (Fig. 4A). In addition, protein sequencing of clinical keloids, hypertrophic scars, and normal skin detected the protein products of 15 of the 25 DEFDRGs, 4 of which showed obvious expression differences (Supplementary Fig. 5C). Baseline patient information is provided in Supplementary Table 1. To strengthen the robustness of our findings, we additionally validated our results using three public datasets: GSE188952, GSE7890, and GSE44270 (refs. 17,18). Almost all DEFDRGs were identified in these datasets, and many genes showed statistically significant differences (Supplementary Fig. 5D–F and Supplementary Table 2).

Immune infiltration characteristics of FDBKC

The correlation between immune cells or immune function and PCA FS was demonstrated in Supplementary Fig. 6A. Additionally, we investigated the characteristics of immune infiltration for 23 immune cell types in all samples from the high- and low-risk groups. The percentage of each immune cell subtype in each sample is shown in Supplementary Fig. 6B. The differences in immune cell infiltration and function between high- and low-risk groups are also displayed in Supplementary Fig. 6C, D. The above results suggested little difference in the immune infiltration patterns between the high- and low-risk groups. The result was not consistent with previous studies, which found that mast cell heterogeneity was associated with keloid activity status19. This discrepancy may be attributed to the limited sample size and individual variations.

Identification of DEGs, differentially expressed transcription factors (DETFs), and signaling pathways in bulk transcriptomics

In the bulk RNA-seq analysis of 37 keloids and 27 healthy samples, 1828 genes satisfied P < 0.05 and were considered DEGs (Supplementary Fig. 7A). Heatmaps and volcano plots were used to visualize the differential expression of DEFDRGs (Supplementary Fig. 7B), 22 of which met P < 0.05. The results showed that the expression of these key genes was lower in keloid patients than in healthy controls, which was consistent with previous results based on FDBKC. Furthermore, four DETFs were screened: Homeobox C9 (HOXC9), LIM Domain Only 2 (LMO2), Pancreatic and Duodenal Homeobox 1 (PDX1), and RUNX Family Transcription Factor 1 (RUNX1) (Supplementary Fig. 7C). Finally, we used the Gene Set Variation Analysis (GSVA) algorithm to analyze the top 50 Hallmark signaling pathways in all samples and visualized the results via heatmaps and bar charts (Supplementary Fig. 7D).

DEFDRGs demonstrate significant clinical relevance

To clarify the clinical relevance of DEFDRGs, PCA and LASSO regression were used to establish a multivariable logistic regression model. LASSO regression was applied to prevent overfitting of the model. Then, 11 DEFDRGs were extracted for subsequent analyses (Fig. 5A). These DEFDRGs were highly expressed in healthy individuals, consistent with previous results of FDBKC (Fig. 5B). A total of 64 skin samples (37 keloids and 27 healthy skin samples) were randomly assigned to train and test sets. The train set included 39 samples (21 keloids and 18 healthy skin samples), while the test set comprised 25 samples (16 keloids and 9 healthy skin samples). We tested the expression levels of the 11 key genes in high- and low-risk groups across the total, train, and test sets and found significant differential expression. The high-risk group showed high expression levels of TPSB2 and KLK6 (Fig. 5C). Both are significantly correlated with the inflammatory response. TPSB2 is also closely related to fibrosis and acts as a pro-fibrotic factor.

Fig. 5: Multivariable logistic regression analysis of DEFDRGs for keloid classification.

A Cross-validation plot depicting binomial deviance as a function of log(λ) during Lasso regression model selection. B Heatmap illustrating expression profiles of DEFDRGs in keloid and healthy samples. C Differential expression of DEFDRGs in high- vs. low-risk groups across total, train, and test sets. D Clinical correlation analysis showing associations between risk scores and demographic/clinical parameters (age, sex, diagnosis, and time). E The ROC curve of the train and test sets demonstrating the predictive performance of the DEFDRG-based model for keloid classification (train set AUC = 0.995, test set AUC = 0.868). F GSEA enrichment plots for representative signaling pathways enriched in high- and low-risk groups. DEFDRGs differentially expressed fibroblast-differentiation-related genes, ROC receiver operator characteristic, AUC area under the curve, GSEA gene set enrichment analysis.

Clinical correlation analysis showed that the risk score was associated with age, sex, diagnosis, and time. Age under 30 years, female sex, prolonged time, and keloid diagnosis were associated with high risk (Fig. 5D). The receiver operator characteristic (ROC) curve of the PCA model showed an area under the curve (AUC) of 0.995 in the train set and 0.868 in the test set, indicating excellent predictive value for keloid diagnosis (Fig. 5E). Finally, downstream pathways were analyzed separately in the high- and low-risk groups, and the results are presented in Fig. 5F.
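For readers less familiar with the AUC, it equals the probability that a randomly chosen positive sample (here, a keloid) receives a higher model score than a randomly chosen negative sample. The sketch below computes it by direct pairwise comparison; the labels and scores are hypothetical and unrelated to the study's model output.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical held-out predictions: 1 = keloid, 0 = healthy skin.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)
```

A gap between train and test AUC, as reported above, is the usual reason to evaluate on samples that were not used for model fitting.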

Construction of regulatory networks and identification of subtype-specific candidate drugs for FDBKC

The subtype-specific and DEFDRG-centered regulatory network containing upstream DETFs, downstream signaling pathways, and potential regulation abilities of immune cells/immune functions was constructed in LFDC, MFDC, and HFDC, respectively (Fig. 6A). RUNX1 was a common TF in all subtypes of FDBKC. RUNX1 has been reported to be associated with fibrosis in various diseases20,21,22. However, there is a lack of research on it in skin fibrosis. This study indicated that RUNX1 might play a crucial role in keloid formation. Additionally, we illustrated the specific correlation coefficients from correlation analysis in each regulatory network using co-expression heatmaps (Fig. 6B).

Fig. 6: Regulatory network construction for keloid molecular subtypes and candidate drug identification.

A Subtype-specific regulatory networks centered on DEFDRGs, integrating upstream DETFs, downstream signaling pathways, and immune cell/function regulatory potential across LFDC, MFDC, and HFDC. B Co-expression patterns and correlation heatmaps of network components (DEFDRGs, DETFs, pathways) in LFDC, MFDC, and HFDC. C RNA-FISH results of COL1A1, VIM, TPSB2/PECAM1, and DAPI in human keloid samples. The white arrows indicate co-stained cells. Scale bars = 50 μm. D Expression profiles of core DEFDRGs and DETFs in LFDC, MFDC, and HFDC. E Candidate inhibitors predicted by Connectivity Map analysis in LFDC, MFDC, and HFDC. LFDC low fibroblast differentiation classification, MFDC moderate fibroblast differentiation classification, HFDC high fibroblast differentiation classification, PCA principal component analysis, DEFDRGs differentially expressed fibroblast-differentiation-related genes, DETFs differentially expressed transcription factors, RNA-FISH RNA fluorescence in situ hybridization.

In LFDC, the key DEFDRGs were C-Type Lectin Domain Containing 11A (CLEC11A) and Midkine (MDK). The upstream DETF highly correlated to CLEC11A was RUNX1 (R = 0.42), while both LMO2 (R = 0.36) and HOXC9 (R = 0.35) were closely related to MDK. For downstream pathways, bile acid metabolism (R = 0.42) showed a strong association with CLEC11A. For MDK, three related signaling pathways were identified, including TGF-beta signaling (R = −0.41) and mTORC1 signaling (R = 0.4).

In MFDC, Platelet and Endothelial Cell Adhesion Molecule 1 (PECAM1) and LIM Domain Only 2 (LMO2) occupied the central positions. The upstream DETF strongly correlated with PECAM1 was RUNX1 (R = 0.55). The DETF highly correlated with LMO2 was HOXC9 (R = 0.35). Downstream pathways highly correlated with PECAM1 were Myc targets v2 (R = 0.61), Interferon-alpha response (R = −0.55), and IL-2/STAT5 signaling (R = 0.52). Similarly, the downstream pathway highly correlated with LMO2 was hypoxia (R = 0.46).

In HFDC, KLK6 and Coagulation Factor XIII A Chain (F13A1) occupied the core positions, determining the correlations between DEFDRGs and other components. The upstream DETF with intense correlation with both KLK6 and F13A1 was RUNX1 (KLK6: R = −0.62; F13A1: R = 0.57). The downstream pathway highly correlated with KLK6 was E2F-targets (R = 0.53). For F13A1, the highly correlated downstream pathways were Wnt/β-Catenin signaling (R = 0.59) and Estrogen response late (R = 0.51).
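The R values reported for these networks are correlation coefficients between a DEFDRG and a candidate regulator or pathway score. As a minimal sketch, assuming Pearson correlation and using hypothetical expression vectors and placeholder names (`TF_A`, `TF_B`, `TF_C`), the strongest partner of a gene can be selected by absolute R:

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical expression of one gene and three candidate regulators
# measured across the same five samples.
gene = [1.0, 2.0, 3.0, 4.0, 5.0]
candidates = {
    "TF_A": [2.1, 3.9, 6.2, 8.0, 9.8],
    "TF_B": [5.0, 3.0, 4.0, 1.0, 2.0],
    "TF_C": [1.0, 3.0, 2.0, 3.0, 1.0],
}
correlations = {name: pearson_r(gene, values) for name, values in candidates.items()}

# Rank by absolute value so strong negative correlations are not discarded.
strongest = max(correlations, key=lambda name: abs(correlations[name]))
```

Ranking by absolute value matters here because several reported links are negative, such as KLK6 with RUNX1 (R = −0.62).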

Considering that PECAM1 is primarily a marker gene for endothelial cells and TPSB2 serves as a marker gene for mast cells, we performed RNA fluorescence in situ hybridization (RNA-FISH) staining (PECAM1 + COL1A1 + VIM; TPSB2 + COL1A1 + VIM) to further validate that keloid fibroblasts express these two genes. Staining results revealed co-stained cells in keloids, demonstrating that fibroblasts express PECAM1 and TPSB2 transcripts (Fig. 6C).

Furthermore, the marker genes for each FDBKC classification were identified based on high expression levels, role in the regulatory network, and relevant biological function: LFDC (cluster 1, markers: MMP1, F13A1), MFDC (cluster 2, markers: FOXC2, MDK), and HFDC (cluster 3, markers: KLK6, TPSB2).

The differential expression of key TFs and genes in the aforementioned regulatory networks is shown in Fig. 6D. Finally, using the Connectivity Map (CMap) algorithm, we identified small-molecule drugs targeting DEFDRGs and TFs in the regulatory network of each FDBKC subtype. The top 10 drugs with the lowest P values were visualized in a dot heatmap, aiming to discover potential therapeutics and provide a theoretical basis for precision treatment of keloids (Fig. 6E).

Pathological manifestations of different clinical subtypes of keloids

Informed consent was obtained from 19 keloid patients before discarded samples and relevant clinical information were collected. A representative marker from each FDBKC type was selected for immunohistochemistry (IHC). Given that HFDC is associated with more severe pathological manifestations and its identification is more informative for clinical treatment, two HFDC markers were chosen. IHC for MMP1, FOXC2, KLK6, and TPSB2 was then performed on keloid tissues, and professional pathologists scored the expression of these markers to determine clinical subtypes. The baseline information, Vancouver Scar Scale (VSS) scores, and IHC scores are displayed in Table 1. Among the 19 patients, 12 were classified as HFDC, 3 as MFDC, and the remaining 4 as LFDC. IHC results are presented in Fig. 7A. The LFDC group had the highest expression of MMP1, the MFDC group showed peak expression of FOXC2, and the HFDC group exhibited the highest expression levels of KLK6 and TPSB2.

Fig. 7: IHC validation in clinical samples and clinical phenotype correlation analysis.

A IHC staining depicts expression and semi-quantitative scoring of keloid subtype-specific markers in clinical keloid tissues (n = 3 independent samples per group; *P < 0.05, **P < 0.01, ***P < 0.001, and ****P < 0.0001; one-way ANOVA). Scale bars = 100 μm. Data are presented as mean ± SD. B–G Correlation analyses illustrate associations between keloid molecular subtypes and clinical phenotypic metrics: (B) VSS score, (C) lesion thickness, (D) tissue softness, (E) pain intensity, (F) itch severity, and (G) blood vessels. IHC immunohistochemistry, VSS Vancouver Scar Scale.

Table 1 Baseline information of keloid patients, clinical-pathological feature of keloids, and immunohistochemical scores of each marker (MMP1, FOXC2, TPSB2, and KLK6)

Regarding the clinical manifestations of patients with different subtypes, the average VSS score in the HFDC group was significantly higher than that in the other two groups (Fig. 7B). Keloids in the HFDC group were thicker than those in the other two groups, with nearly 83% measuring over 2 mm (Fig. 7C). Furthermore, keloids in the HFDC group were also harder, potentially owing to the disordered distribution of fibroblasts and collagen (Fig. 7D). The proportions of keloids with pain, itching, and vascular abnormalities were significantly higher in the HFDC group (Fig. 7E–G), reflecting more severe clinical manifestations. These findings were consistent with the scRNA-seq and bulk RNA-seq results, supporting the effectiveness of our keloid classification system.

Discussion

Keloid is a dermal fibroproliferative disease characterized by abnormal deposition of collagen and ECM, affecting patients’ quality of life through cosmetic and functional impairments and the consequent psychological disorders23,24. The specific mechanisms of keloid pathogenesis remain unclear, and current prevention and treatment strategies are unsatisfactory because of the high postoperative recurrence rate23,24. Fibroblasts play a central role in tissue repair and fibrosis because of their ability to produce, deposit, and remodel ECM proteins25. However, when wound-derived cytokines and inflammatory signals trigger profound fibroblast activation and excessive ECM deposition, fibrosis and scarring occur26,27. Therefore, fibroblasts, as key downstream mediators of scar formation and fibrosis, represent promising research targets for improving wound healing and fibrosis outcomes.

In this study, 44,463 cells from 3 keloids and 3 normal scars were clustered into 9 cell types. Then, 13,925 fibroblasts were clustered into 6 fibroblast subpopulations, with fibroblast 3 showing enriched distribution in keloids. Cell lineage trajectories were reconstructed using Monocle 2, and FDRGs were identified. Furthermore, we integrated FDRGs with genes significantly associated with clinical outcomes (diagnosis and time) to derive 25 clinical-related FDRGs, termed DEFDRGs. To address the lack of an effective genotyping system for keloids with distinct molecular expression patterns and differentiation states, we developed a consensus-based clustering system, FDBKC, and proposed a keloid classification based on DEFDRGs. The FDBKC comprised 3 clusters: LFDC (cluster 1, markers: MMP1, F13A1), MFDC (cluster 2, markers: FOXC2, MDK), and HFDC (cluster 3, markers: KLK6, TPSB2). The three molecular subtypes differed significantly in keloid development risk. Subsequently, we performed comprehensive bioinformatics analysis to verify the clinical relevance of FDBKC and clarify its medical rationale. Results showed that LFDC patients had a lower risk of keloid development, whereas the HFDC patients exhibited a higher risk. IHC of clinical keloid samples verified the expression of classification markers. Further clinical correlation analysis showed that HFDC keloids were harder, thicker, with vascular abnormalities and higher VSS scores, and were associated with a higher incidence of pain and itching, reflecting a more severe pathological phenotype in the HFDC group.

LFDC

In this study, many DEFDRGs, such as MMP1 and F13A1, were highly expressed in LFDC. Keloids are characterized by excessive ECM deposition, which is determined by the dynamic balance among matrix synthesis, accumulation, and degradation. Matrix metalloproteinases (MMPs) are a family of enzymes capable of degrading ECM components. MMP1, a zinc-dependent neutral endopeptidase, is a typical representative28. MMP1 is involved in different physiological processes (development, tissue morphogenesis, and wound repair) and various human diseases (cancer, rheumatoid arthritis, emphysema, and fibrotic diseases)29. In the wound repair process, MMP1 can cleave type I collagen, alleviating collagen-induced restriction of cell motility, thereby promoting re-epithelialization30 and reducing scar formation31. In keloid fibroblasts, knockdown of tissue inhibitor of metalloproteinase-1 (TIMP-1) promotes MMP1 release and enhances collagen degradation, suggesting a potential therapeutic strategy for keloids32.

F13A1, encoding Factor XIII-A transglutaminase (FXIII-A), plays a crucial role in the coagulation process33,34. Recent studies have shown that F13A1 is involved in not only hemostatic processes but also tissue repair and fibrosis. FXIII-A can participate in multiple stages of wound repair, including regulation of fibroblast function, stabilization of fibrin clots and ECM, and promotion of angiogenesis35,36. FXIII-A deficiency is associated with impaired tissue repair37,38. Additionally, FXIII could reduce collagen synthesis in progressive systemic sclerosis fibroblasts to levels comparable with normal fibroblasts39. Moreover, FXIII-A could prevent vascular leakage and decrease profibrotic stimuli, thereby mitigating myocardial fibrosis40. Particularly, FXIII-A is beneficial in scleroderma. Decreased skin stiffness and improved musculoskeletal symptoms were observed after systemic application of recombinant FXIII-A, demonstrating the therapeutic potential of FXIII-A regulation in fibrosis41.

MFDC

In MFDC, MDK and FOXC2 are highly expressed. MDK is a heparin-binding molecule and a recognized multifunctional protein involved in various biological processes, including development, inflammation, and oncogenesis42. MDK promotes ECM production, including type I and III collagens and glycosaminoglycans, in a dose- and time-dependent manner in dermal fibroblasts43. MDK may also contribute to pulmonary fibrosis by regulating inflammatory cell migration, promoting inflammatory cytokine expression, and mediating epithelial-mesenchymal transition (EMT)44,45. However, MDK exhibits protective effects in certain contexts. It has been shown to alleviate ventricular dysfunction and fibrosis after myocardial infarction by activating the Phosphatidylinositol 3-kinase (PI3K)/protein kinase B (Akt) and mitogen-activated protein kinase (MAPK) pathways46.

FOXC2, a member of the Forkhead transcription factor family, is involved in angiogenesis and mesenchymal-epithelial transformation (MET)/EMT, critical processes for embryonic development and wound healing. It is commonly accepted that abnormal angiogenesis and dysregulation of EMT/MET processes contribute to fibrosis and even keloid formation16,47.

In summary, MDK and FOXC2 exhibit dual properties. They promote wound repair via angiogenesis and EMT, yet also drive excessive vascular proliferation and ECM deposition, thereby promoting fibrosis and scar formation.

HFDC

The keloids in the HFDC group showed more severe clinical manifestations. Gene expression analysis revealed elevated expression of KLK6 and TPSB2 in the HFDC group. Keloids are characterized by persistent local inflammation and excessive collagen deposition. Many studies have shown that inflammation regulates collagen synthesis, and the intensity of inflammation is positively correlated with final scar size48,49,50.

KLK6 belongs to the family of serine proteases and is emerging as a universal biomarker of inflammation. It may promote inflammation and autoimmunity by cleaving protease-activated receptors (PARs) 1 and 251,52. KLK6 was up-regulated in psoriasis patients’ skin compared to normal skin, suggesting its involvement in psoriasis development53. This may be attributed to the enhanced inflammatory response induced by KLK654. Additionally, KLK6 could stimulate immune cells to secrete pro-inflammatory factors and decrease anti-inflammatory factor levels, biasing the immune system toward a pro-inflammatory state and exacerbating neuroinflammation and paralytic neurological impairment in multiple sclerosis patients55. These findings suggest that KLK6 may promote inflammatory responses and contribute to keloid pathogenesis.

Tryptase, a serine protease mainly produced by tissue mast cells, is stored in secretory granules alongside other pre-formed mediators like histamine and chemokines. Interestingly, our findings indicate that TPSB2 may be involved in fibroblast differentiation and keloid progression. Recent studies have verified that TPSB2 enhances collagen synthesis in fibroblasts and leads to peritoneal adhesion56. TPSB2 could also stimulate lung fibroblast proliferation via cleavage and activation of PAR2, promoting ECM deposition and pulmonary fibrosis57. Through the PAR2 pathway, TPSB2 could also maintain liver fibrosis and interstitial fibrosis in IgA nephropathy by activating TGF-β signaling in hepatic stellate cells and renal tubular cells, respectively58,59,60.

To date, there remains a lack of precision therapies targeting keloid pathophysiology molecular mechanisms. Computational in silico analyses are widely used to develop and evaluate pharmacological hypotheses, and are suitable for identifying and optimizing small molecules with specific target affinity61. Based on comprehensive CMap analysis, we identified specific inhibitors for FDBKC, especially for HFDC, which may reverse aberrant gene expression patterns. Among the top 10 predicted inhibitors of HFDC, Ouabain may be a drug worthy of future research. As a selective inhibitor of Na⁺/K⁺-ATPase, Ouabain is clinically used primarily for heart failure. Studies have shown that Ouabain also exhibits anti-fibrotic properties. Ouabain can ameliorate pulmonary fibrosis by inhibiting TGF-β-induced fibrotic signaling, myofibroblast differentiation, and the expression of smooth muscle α-actin (α-SMA), collagen-1, and fibronectin62. Moreover, Ouabain could also alleviate bleomycin-induced pulmonary fibrosis by directly inhibiting the proliferation and promoting the apoptosis of lung fibroblasts63. Therefore, assessing the efficacy and safety of Ouabain represents a promising future direction for this research.

This study differs from previous studies identifying biomarkers for keloids. Firstly, we classified keloids into three subtypes based on fibroblast differentiation-related genes and identified marker genes with potential therapeutic value. These markers are endogenously expressed in their cell of origin, as opposed to free-floating molecules in blood or cerebrospinal fluid. Secondly, unlike studies preselecting candidate biomarkers, we performed unbiased high-throughput analyses of transcripts from 44,463 single cells isolated from keloids and normal scars. Meanwhile, we quantified the typing markers by IHC. All results proved that the DEFDRGs-based keloid classification had high diagnostic value, as it better reflected clinical reality and distinguished keloid patients from healthy controls.

However, this study has limitations. Mast cells and endothelial cells have also been reported to be closely associated with keloid pathogenesis and activity16,19. In this study, however, keloid classification was conducted based only on fibroblast differentiation to facilitate clinical translation. Moreover, pure bioinformatics analysis may have limitations in fully recapitulating fibroblast lineage trajectories and cell differentiation states due to inherent biases. Further cellular and animal experiments are needed to validate the correlation between identified DEFDRGs and phenotypes and to verify the clinical relevance of the keloid classification. Furthermore, the absence of keloid prognosis-related sequencing data (especially for recurrence) in public databases restricts this keloid classification to interpreting keloid development rather than predicting prognosis, thereby limiting its clinical application.

In conclusion, this study elucidated the mechanisms of fibroblast differentiation in keloid pathogenesis, identified stage-dependent diagnostic gaps in keloids, and predicted new diagnostic biomarkers and therapeutic targets. Our research is of great significance for exploring keloid pathophysiological mechanisms and resolving clinical dilemmas in keloid management.

Methods

Data collection

Keloid and normal scar tissues were obtained from patients with confirmed clinical diagnoses, and scRNA-seq was performed using the 10x Genomics platform. The scRNA-seq data used in this study were derived from a previous publication (accession code: GSE163973) (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE163973)14. No new scRNA-seq analyses were performed on patient-derived tissues in this study. RNA-seq data of 64 samples were generated on HiSeq2500 and HiSeq3000 sequencers (Illumina, Cambridge) and are available in the Sequence Read Archive (SRA ID: SRP137071) or the NCBI-GEO database (accession code: GSE113619)64. Of these, 37 samples were from keloid-prone individuals and 27 from healthy matched controls. Samples were labeled as “keloid” or “healthy” based on their patient origin, and keloid samples were further categorized as DAY 0 or DAY 42 according to sampling time points. The Cistrome database (http://cistrome.org/)65 was used to construct the TF regulatory network. Immunologically relevant gene expression profiles were obtained from the Import database (http://www.import.org/)66, and gene sets and molecular signatures were downloaded from the Molecular Signatures Database (MSigDB, Version 7.1) (https://www.gsea-msigdb.org/gsea/msigdb/index.jsp)67. Biorender (https://biorender.com/) was used to build the study flow chart.

ScRNA-seq data processing

Sample processing and scRNA-seq were conducted using 10x Genomics Chromium (https://www.10xgenomics.com/instruments/chromium-x-series). Following demultiplexing, paired-end reads were trimmed to remove template switch oligo (TSO) and poly-A tail sequences. Clean reads were aligned to the GRCh38 (Version: 100) reference assembly and quantified using Cell Ranger Software (Version 5.0.0) (http://10xgenomics.com/).

Then, gene expression matrices were analyzed with the Seurat pipeline (Version: 5.1.0)68. Only cells with transcript expression greater than 100,000 and a mitochondrial gene proportion less than 10%, and only genes expressed in at least three cells, were included in subsequent analyses. Following QC, keloid and normal scar samples were merged into a single Seurat object. The data were scaled and standardized using the “ScaleData” function. In addition, the top 2000 highly variable genes (HVGs) were identified via the “vst” method. PCA was performed on HVGs for dimensionality reduction. To mitigate batch effects, Harmony (default parameters) was used to integrate samples while preserving biological variation68. DoubletFinder was applied to detect doublets during the analytical process. The first 20 harmonized principal components (PCs) were utilized for UMAP visualization.
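The cell- and gene-level QC filters described above can be illustrated with a minimal pure-Python sketch. It is not the Seurat implementation used in the study: the function name, the "MT-" prefix used to recognize mitochondrial genes, the order of the two filters, and the toy count threshold in the example are all assumptions made for illustration.

```python
def filter_cells_and_genes(counts, gene_names, min_counts,
                           max_mito_fraction=0.10, min_cells=3):
    """Toy QC filter on a cells x genes count matrix (list of lists).

    Assumptions (not from the study): mitochondrial genes are recognized by
    the "MT-" prefix, and the gene filter is evaluated on retained cells.
    """
    mito_idx = [j for j, g in enumerate(gene_names) if g.startswith("MT-")]
    kept_cells = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            continue  # an empty cell cannot pass any threshold
        mito_fraction = sum(row[j] for j in mito_idx) / total
        if total > min_counts and mito_fraction < max_mito_fraction:
            kept_cells.append(i)
    # Keep genes detected (count > 0) in at least `min_cells` retained cells.
    kept_genes = [
        j for j in range(len(gene_names))
        if sum(1 for i in kept_cells if counts[i][j] > 0) >= min_cells
    ]
    return kept_cells, kept_genes


# Toy example; min_counts=3 is an illustrative value, not the study threshold.
genes = ["COL1A1", "DCN", "KRT5", "MT-CO1"]
counts = [
    [5, 0, 1, 0],
    [4, 2, 0, 0],
    [1, 1, 0, 8],   # mitochondrial fraction 0.8, removed
    [3, 3, 2, 0],
    [1, 0, 0, 0],   # too few counts, removed
]
kept_cells, kept_genes = filter_cells_and_genes(counts, genes, min_counts=3)
```

In this toy matrix, cells 0, 1, and 3 pass both cell-level thresholds, and only COL1A1 is detected in at least three of the retained cells.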

RNA-seq data processing

First, we used FastQC and Trimmomatic to control the quality of the raw FASTQ files; flanking and low-quality sequences were removed to obtain clean, non-redundant data (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). All subsequent analyses were based on these clean data. We then mapped the clean data to the human reference genome (Ensembl GRCh38) using hierarchical indexing for spliced alignment of transcripts 2 (HISAT2) and obtained Sequence Alignment/Map (SAM) format files containing the alignment information. Next, we used SAMtools (Version: 1.8) to remove redundancy from the SAM files and to output Binary Alignment/Map (BAM) format files69. Finally, we used the featureCounts software package to quantify gene expression levels in all BAM files and obtain the gene-level read counts of each sample70. Integrating the counts of all samples yielded a gene expression matrix with rows representing gene names and columns representing sample IDs, on which all subsequent analyses were based.
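The final integration step, in which per-sample counts are combined into a single matrix with genes as rows and samples as columns, can be sketched as follows. The function name and the choice to fill genes missing from a sample with zero are assumptions for illustration; the sample and gene labels in the example are placeholders.

```python
def build_count_matrix(per_sample_counts):
    """Combine {sample_id: {gene: count}} into a genes x samples matrix.

    Assumption: a gene absent from a sample's table is given a count of 0.
    """
    sample_ids = sorted(per_sample_counts)
    genes = sorted({g for table in per_sample_counts.values() for g in table})
    matrix = [
        [per_sample_counts[s].get(g, 0) for s in sample_ids]
        for g in genes
    ]
    return genes, sample_ids, matrix


# Placeholder sample and gene labels, for illustration only.
per_sample = {
    "sample_2": {"geneA": 7, "geneC": 1},
    "sample_1": {"geneA": 3, "geneB": 5},
}
genes, sample_ids, matrix = build_count_matrix(per_sample)
```

Each row of `matrix` corresponds to one entry of `genes`, and each column to one entry of `sample_ids`.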

DEGs analysis

Using the “FindAllMarkers” function and “Wilcox” method, the scRNA-seq DEGs between keloids and normal scars were identified from HVGs only when the gene satisfied the condition of |log2Fold Change (FC)| > 0.5 as well as false discovery rate (FDR) < 0.05. These DEGs were then used as potential marker genes for downstream cell type annotations.
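The two-part DEG criterion (|log2FC| > 0.5 and FDR < 0.05) can be illustrated with a short sketch. The study does not state which multiple-testing correction produced the FDR values, so the Benjamini-Hochberg procedure used here is an assumption; the function names, gene labels, and input values are placeholders rather than results.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, log2_fc, p_values, fc_cutoff=0.5, fdr_cutoff=0.05):
    """Keep genes with |log2FC| > fc_cutoff and adjusted p < fdr_cutoff."""
    fdr = benjamini_hochberg(p_values)
    return [
        g for g, lfc, q in zip(genes, log2_fc, fdr)
        if abs(lfc) > fc_cutoff and q < fdr_cutoff
    ]


# Placeholder inputs, for illustration only.
genes = ["geneA", "geneB", "geneC", "geneD"]
log2_fc = [1.2, -0.8, 0.3, 2.0]
p_values = [0.001, 0.04, 0.03, 0.5]
fdr = benjamini_hochberg(p_values)
degs = select_degs(genes, log2_fc, p_values)
```

Here only geneA passes both filters: geneB has a sufficient fold change but an adjusted p-value just above 0.05, which shows why the two thresholds are applied jointly.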

Cell type annotation

To identify the cell type of each unsupervised cluster, we integrated the above DEGs in all sub-clusters with known specific biomarkers of a particular cell type from CellMarker71 for the comprehensive cell type annotation.

Considering the variability of gene expression patterns in keloids, we used a specific cellular annotation method. Firstly, the known cell types in keloids included endothelial cells (SELE, PECAM1), fibroblasts (DCN, COL1A1), glandular cells (SCGB2A2, DCD), immune cells (LYZ, PTPRC), keratinocytes (KRT5, KRT1), lymphatic endothelial cells (LYVE1, PROX1), melanocytes (MLANA, DCT), smooth muscle cells (MYH11, ACTA2), and neural cells (S100B, MPZ). The marker genes for these cell types were also known (in parentheses), which were used to annotate corresponding cell types. Besides, we also used the Seurat R package (Version: 3.2.2) and SCANPY module (Version: 1.7.1) in Python (Version 3.6) to display the cellular feature plots and violin plots72,73.

Fibroblast subtype analysis

To assess the distribution of different subtypes in fibroblasts, we extracted all fibroblasts from the integrated dataset, constructed a Seurat object, and performed subgroup clustering analysis. We then used RunUMAP, the FindAllMarkers function, and the Wilcoxon test method to identify DEGs of different fibroblast subgroups. Ultimately, the expression signatures of the top DEGs between different fibroblast subsets were shown by violin plots.

Cell communication analysis

To explore potential interactions and ligand-receptor pairs among different cell types in keloids, we used the iTALK R package (Version: 0.1.0) (https://github.com/Coolgenome/iTALK) to analyze cellular communication74. The top 1,500 HVGs were processed through the “rawParse” function to generate an expression matrix, which served as input for the iTALK object. Significant cell communication patterns were identified using the Wilcoxon rank-sum test (FDR < 0.05). The top 200 ligand-receptor interactions were visualized by ligand-receptor diagrams and iTALK networks using the edgebundleR package (https://github.com/garthtarr/edgebundleR)75. Besides, we analyzed 50 hallmark signaling pathways that were downloaded from the Molecular Signatures Database (MSigDB) (Version 7.1) (https://www.gsea-msigdb.org/gsea/msigdb/index.jsp) and quantified their activity scores across all samples by GSVA (Version: 1.38.0)76,77.

Trajectory analysis

To analyze pseudotime trajectories of keloid fibroblast lineages, we performed trajectory inference using the Monocle R package (Version: 2.18.0)78. Differentiation sequences of fibroblast subtypes were determined based on gene expression profiles. A matrix of Unique Molecular Identifiers (UMI) expression from all fibroblasts was used as input data for unsupervised pseudotime ordering. Then, the pseudotime values of each subtype in different samples were calculated via the “reduceDimension” function with the “DDRTree” method, and the results were visualized as trajectory plots78. Genes related to differentiation, named FDRGs, were extracted from the top variable genes among fibroblast clusters using the “differentialGeneTest” function. To validate the robustness of pseudotime ordering and characterize dynamic gene expression patterns, DEG clusters were analyzed using the “plot_pseudotime_heatmap” function in Monocle 2.

ConsensusClusterPlus analysis and construction of FDBKC

Firstly, we took the intersection of keloid-related DEGs and FDRGs, which was defined as DEFDRGs. Then, consensus clustering was conducted with the ConsensusClusterPlus R package to cluster all included samples based on their DEFDRG expression characteristics and to construct the FDBKC79,80. Based on the expression matrix of DEFDRGs, 80% of samples were randomly subsampled with replacement over 100 iterations. Each subsampled dataset was partitioned into k clusters (k = 2–10) via agglomerative hierarchical clustering (Euclidean distance, Ward’s linkage). Pairwise co-clustering frequencies across iterations were calculated as Consensus Values (CV), which were aggregated into a Consensus Matrix (CM) for each k. A distance matrix (1 – CV) was subsequently generated, and samples were re-clustered into k consensus subtypes using the same hierarchical method. Additionally, we assessed quantitative and visual evidence of clustering stability to evaluate the number of clusters and the confidence of the clustering results, and then selected the optimal k value79,80.
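The consensus value for a pair of samples is the fraction of subsampling iterations in which both were drawn and assigned to the same cluster. The sketch below illustrates this calculation only and is not the ConsensusClusterPlus implementation. Two simplifications are assumptions: each 80% subsample is drawn without duplicates, and the clustering step is a pluggable function, here a hypothetical one-dimensional gap-based clusterer (`gap_clusterer`) standing in for Ward hierarchical clustering.

```python
import random


def consensus_matrix(n_samples, cluster_fn, k, n_iter=100, fraction=0.8, seed=0):
    """Pairwise co-clustering frequency over repeated subsampling.

    cluster_fn(subset, k) must return {sample_index: cluster_label}.
    Assumption: subsamples contain no duplicated samples.
    """
    rng = random.Random(seed)
    together = [[0] * n_samples for _ in range(n_samples)]
    sampled = [[0] * n_samples for _ in range(n_samples)]
    size = max(2, round(fraction * n_samples))
    for _ in range(n_iter):
        subset = rng.sample(range(n_samples), size)
        labels = cluster_fn(subset, k)
        for a in subset:
            for b in subset:
                sampled[a][b] += 1
                if labels[a] == labels[b]:
                    together[a][b] += 1
    return [
        [together[a][b] / sampled[a][b] if sampled[a][b] else 0.0
         for b in range(n_samples)]
        for a in range(n_samples)
    ]


def gap_clusterer(values):
    """Hypothetical stand-in clusterer: cut sorted 1-D values at the k-1 largest gaps."""
    def cluster(subset, k):
        ordered = sorted(subset, key=lambda i: values[i])
        positions = sorted(
            range(1, len(ordered)),
            key=lambda p: values[ordered[p]] - values[ordered[p - 1]],
            reverse=True,
        )[:k - 1]
        cuts = set(positions)
        labels, current = {}, 0
        for pos, idx in enumerate(ordered):
            if pos in cuts:
                current += 1
            labels[idx] = current
        return labels
    return cluster


# Toy data: two well-separated groups of three samples each.
values = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
cm = consensus_matrix(len(values), gap_clusterer(values), k=2)
# Distance matrix (1 - CV) used for the final re-clustering step.
distance = [[1.0 - v for v in row] for row in cm]
```

For well-separated groups the consensus values approach 1 within groups and 0 between groups; intermediate values indicate unstable assignments, which is the information used when comparing candidate values of k.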

PCA and clinical correlation analysis

Since the consensus clustering results were categorical variables, PCA was used to calculate the FS of each cluster according to the expression patterns of DEFDRGs to further explore the molecular pathological characteristics of fibroblasts and reduce bias81. To illustrate the relationship between fibroblast differentiation and keloid development, we conducted a clinical correlation analysis integrating three data modalities, including categorical variables (FDBKC subtypes), continuous variables (FS), and clinical parameters (age, sex, diagnosis, time, and other relevant factors). This analysis aimed to validate the clinical relevance and translational potential of FDBKC while elucidating fibroblast-driven pathological mechanisms. The results were visualized through box plots, bar plots, violin plots, and heatmaps.
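How PCA can turn an expression matrix into a continuous per-sample score is sketched below. The exact derivation of FS in this study is not reproduced here; as an assumption, the sketch takes a sample's score to be its projection on the first principal component, estimated by power iteration. The function name, the starting vector, and the sign convention are illustrative choices.

```python
import math


def first_pc_scores(matrix, n_iter=200):
    """Project samples (rows) onto the first principal component.

    Assumptions: power iteration from a fixed starting vector, which must
    not be orthogonal to the first component; the sign is fixed so that the
    largest-magnitude loading is positive.
    """
    n, p = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in matrix]
    v = [1.0 / (j + 1) for j in range(p)]
    for _ in range(n_iter):
        # One multiplication by X^T X, computed as X^T (X v).
        proj = [sum(c[j] * v[j] for j in range(p)) for c in centered]
        w = [sum(centered[i][j] * proj[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance in the data
        v = [x / norm for x in w]
    pivot = max(range(p), key=lambda j: abs(v[j]))
    if v[pivot] < 0:
        v = [-x for x in v]
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return scores, v


# Toy matrix: four samples whose two genes increase together.
toy = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
scores, loadings = first_pc_scores(toy)
```

Because the sign of a principal component is arbitrary, any score derived this way needs an explicit orientation rule before it is compared across groups.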

DEGs identification of RNA-seq data

We utilized the “limma” R package to recognize DEGs between 37 keloid and 27 healthy samples based on RNA-seq data of patients82. Genes satisfying the screening criteria of |Log2 FC| > 1.0 and FDR < 0.05 were selected, and these genes were displayed by heatmaps and volcano plots. Similarly, keloid-related DETFs were determined with the same threshold and method as above.

Quantification of Hallmark signaling pathways by GSVA

We used GSVA to detect the absolute quantification of differentially expressed signaling pathways between keloids and normal scars from 50 hallmark signaling pathways downloaded from MSigDB76.

Functional enrichment analysis

To detect the important downstream signaling pathways, the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were both performed83,84. The GO analysis investigated DEG-associated biological processes (BPs), cellular components (CCs), and molecular functions (MFs), whereas the KEGG analysis focused on mapping signaling pathways. The screening thresholds of both GO and KEGG analyses were P < 0.01 and FDR < 0.05.
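Over-representation analyses of GO terms and KEGG pathways are commonly based on a one-sided hypergeometric test; whether the tool used in this study applied exactly this test is an assumption. The sketch below computes the probability of observing at least the given overlap between a DEG list and a gene set by chance. The function and argument names are illustrative.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_in_set, n_selected, n_overlap):
    """P(X >= n_overlap) for a hypergeometric draw.

    n_background: genes in the background universe
    n_in_set:     background genes annotated to the term or pathway
    n_selected:   genes in the DEG list
    n_overlap:    DEGs annotated to the term or pathway
    """
    upper = min(n_in_set, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_in_set, x) * comb(n_background - n_in_set, n_selected - x)
        for x in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy numbers: 10 background genes, 4 in the set, 3 selected, 2 overlapping.
p_value = hypergeom_enrichment_p(10, 4, 3, 2)
```

With these toy numbers the p-value is 40/120 = 1/3. In practice the p-values of all tested terms would then be corrected for multiple testing before the P < 0.01 and FDR < 0.05 thresholds are applied.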

Immune Cell Infiltration Patterns in Keloids

As a method to estimate the cell composition of complex tissues based on gene expression characteristics, the cell type identification by estimating relative subsets of RNA transcripts (CIBERSORT) algorithm is widely used to explore immune cell infiltration85. To determine the relationship between identified fibroblast clusters and immune cells in keloids, we uploaded the RNA-seq gene expression matrix to the CIBERSORT database (https://cibersort.stanford.edu/) to obtain the percentage of 22 purified immune cell clusters in each sample85. Genes and immune cells satisfying P < 0.05 were considered for the following analyses. The association between fibroblast clusters and the 22 immune cell types was assessed by nonparametric tests. In addition, a correlation analysis of these immune cells was performed to evaluate the relationships among immune cell types. We also estimated and quantified the enrichment of immune-related pathways in keloid samples using Single-Sample Gene Set Enrichment Analysis (ssGSEA)86. The correlations between DEGs and immune genes or immune-related pathways were analyzed with Pearson correlation analysis. P < 0.05 was considered statistically significant.

Construction of the FDBKC subtypes regulatory network

To further explore the potential key regulatory mechanisms of FDBKC subtypes, Pearson’s correlation analysis was carried out among DETFs, DEFDRGs, immune cells, and hallmark pathways, with a correlation coefficient R > 0.300 and P < 0.001 as thresholds. The resulting regulatory network was visualized using the igraph package in R87.
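The edge-selection rule of the regulatory network (R > 0.300 and P < 0.001) can be sketched as follows. The P-value here is obtained from the Fisher z-transformation with a normal approximation, which is an assumption chosen to keep the example free of external libraries; it is not necessarily the test used in the study. Feature names and values are placeholders.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided P-value via the Fisher z approximation (assumption; needs n > 3)."""
    if abs(r) >= 1.0:
        return 0.0
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def build_edges(features, r_cutoff=0.3, p_cutoff=0.001):
    """Return (node_a, node_b, r) for every pair passing both thresholds."""
    names = sorted(features)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(features[a], features[b])
            p = approx_p_value(r, len(features[a]))
            if r > r_cutoff and p < p_cutoff:
                edges.append((a, b, r))
    return edges


# Placeholder features measured across 20 samples.
base = [float(v) for v in range(20)]
features = {
    "a": base,
    "b": [2.0 * v + 1.0 for v in base],                      # strongly correlated with "a"
    "c": [1.0 if i % 2 == 0 else -1.0 for i in range(20)],   # unrelated to "a"
}
edges = build_edges(features)
```

The resulting edge list is the kind of input a graph library would take for visualization; only the pair ("a", "b") survives both thresholds in this toy example.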

CMap analysis

The CMap database (https://portals.broadinstitute.org/cmap/), which contains expression profiles from more than 7,000 cell culture systems and 1,309 small-molecule compounds approved by the Food and Drug Administration (FDA), was leveraged to identify correlations between bioactive drugs and target genes88. As a result, we could get some specific inhibitors targeting DETFs and DEFDRGs and establish a drug-target gene-disease network. P < 0.05 was the screening criterion. Eventually, we identified the top 10 candidate drugs for each FDBKC subtype based on minimal P-values and visualized them with dot heatmaps, hoping to provide some theoretical basis for the precise treatment of keloid patients.

Patients

Formalin-fixed paraffin-embedded human keloid tissues were used for IHC. Keloid, hypertrophic scar, and normal skin samples from patients were collected for protein sequencing. This study was approved by the Ethics Committees of the Shanghai Changhai Hospital (CHEC2023-088). Informed consent for sample collection was acquired from all participating patients prior to study commencement. All ethical regulations relevant to human research participants were followed.

IHC

IHC was performed on formalin-fixed paraffin-embedded human keloid tissues. Paraffin sections (4 μm) were deparaffinized, hydrated in ethanol, and subjected to heat-mediated antigen retrieval with citrate buffer (pH 6.0) or Tris/EDTA buffer (pH 9.0). Sections were blocked with a blocking solution for 30 min at room temperature and then incubated with primary antibody overnight at 4 °C, followed by incubation with a species-matched biotinylated secondary antibody. 3,3′-Diaminobenzidine was used as the chromogen, and cell nuclei were counterstained with hematoxylin. The primary antibodies were as follows: anti-MMP1 (ab52631; Abcam; 1:500), anti-KLK6 (ab190924; Abcam; 1:1000), anti-TPSB2 (K007207P; Solarbio; 1:500), and anti-FOXC2 (PA5-24588; Invitrogen; 1:50). The images were obtained using an Olympus light microscope (Japan).

Protein sequencing

Keloid, hypertrophic scar, and normal skin samples were collected. The BCA kit was employed to measure the protein concentration after protein extraction. Equal amounts of proteins from each sample were subjected to enzymatic digestion to obtain peptides. The peptides were dissolved and then loaded onto Evotip Pure columns according to the manufacturer’s protocol. Subsequently, the Evosep One ultra-high-performance liquid system was utilized for separation. After that, the peptides were ionized via the capillary ion source and then analyzed by the timsTOF Pro 2 mass spectrometer.

RNA-FISH

For RNA-FISH on tissue sections, antigen retrieval was performed using an EDTA buffer (pH 9.0), followed by three PBS washes. Endogenous peroxidase was blocked with 3% hydrogen peroxide at room temperature in the dark for 25 min. Sections were rehydrated in DEPC water and DEPC-PBS and then digested with 20 μg/mL proteinase K at 37 °C for 20 min. After prehybridization with prehybridization solution for 2 h at 37 °C, probes were diluted in prehybridization buffer, denatured at 85 °C, chilled on ice, and incubated at 42 °C. The probe mixture was applied for overnight hybridization at 37 °C in a humidified chamber. Sections were then washed sequentially with SSC buffers, incubated with a biotin-conjugated secondary antibody for 1 h, and subjected to TSA signal amplification. Finally, nuclei were counterstained with DAPI for 10 min in the dark at room temperature. Images were captured using a fluorescence microscope. The probes used in the experiment were designed and synthesized by Genomeditech (Shanghai, China).

The probe sequences were as follows (from 5’ to 3’):

VIM: TTTGAGTGGGTATCAACCAGAGGGAGT;

COL1A1: TTTCCACACGTCTCGGTCATGGTACCT;

PECAM1: AGGAAGATTCCAGTTCGGGCTTGGAAA;

TPSB2: TTTTGAACAGCAGGGGCTGGCTCTCC.

Statistics and reproducibility

Statistical analysis was performed using R version 4.0.3 (Institute for Statistics and Mathematics, Vienna, Austria, www.r-project.org), Python version 3.6 (https://www.python.org/), Strawberry Perl version 5.30.0.1 (https://www.perl.org/), and GraphPad Prism 9. For continuous variables in descriptive statistics, the mean ± standard deviation (SD) was used when the distribution was normal, while the median (range) was used when the distribution was non-normal. Only P < 0.05 and FDR < 0.05 were considered statistically significant (*P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001).

Ethics approval

This study was approved by the Ethics Committees of the Shanghai Changhai Hospital (CHEC2023-088).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.