Introduction

Non-healing chronic pedal wounds are a significant healthcare burden globally. Amongst these, diabetic foot ulcers (DFUs) are the most common. Worldwide, nearly half a billion people have diabetes, and roughly a quarter of them are estimated to develop a DFU within their lifetime1. The current standard of care for wound healing involves surgical debridement, off-loading, strict glucose control and, if necessary, antibiotics and revascularization. Despite this, more than half of wounds fail to heal within a 12-week period, persist as chronic wounds, and can lead to lower limb amputation in a considerable proportion of affected patients2. These poor outcomes stem from the multiple complications that diabetes causes. Patients with diabetes have an increased risk of atherosclerosis and peripheral artery disease, which leads to decreased distal tissue perfusion. From a microvascular standpoint, this patient population has a diminished nerve axon reflex and loses the ability to locally vasodilate blood vessels in response to injury. Additionally, diabetes leads to peripheral neuropathy, which makes patients more prone to tissue damage because of the loss of proprioception and temperature sensing3,4. As a result of these complications, and in contrast to normal healing of acute wounds, non-healing DFUs are characterized by persistent low-grade inflammation with limited progression to the next, proliferative phase of wound healing.

Early intervention with intensive care, including new treatments that carry a high cost, has the potential to improve the DFU healing rate and prevent amputations5,6. Identifying and characterizing the healing state of DFUs that become chronic, non-healing wounds is therefore very important, as it can guide the required intervention. Current methods are mainly based on changes in wound size over a 4-week period, but more accurate techniques are urgently needed7.

The main aim of this study was to identify biomarkers that could classify the state of healing of diabetic wounds. We first obtained wound samples from diabetic mice at different time points and analyzed their protein and microRNA profiles using standard techniques. We then investigated whether a small number of biomarkers could serve as sensors to monitor healing progress. To this end, we created a function mapping protein profiles to wound closure data, and then used machine learning feature selection techniques to select the most important proteins as candidate biomarkers. Furthermore, we performed miRNome landscape analysis at the same time points to explore the use of microRNAs as additional biomarkers. Finally, we incorporated sequential serum protein measurements taken over a 12-week period from diabetic patients with healing and non-healing DFUs to establish a potential correlation of healing outcome with systemic levels of specific proteins.

Materials and methods

Animal studies

Male db/db mice were obtained from Jackson Laboratories (stock # 000642). Mice were weighed once a week and their blood glucose levels were assessed with a commercially available glucometer (Contour, Bayer). Diabetes was confirmed by a random blood glucose measurement of > 250 mg/dL. For the wounding procedure, 19-week-old mice were anesthetized with isoflurane and given sustained-release buprenorphine subcutaneously prior to wounding. Two circular full-thickness wounds were created on the depilated and disinfected dorsum with a 6 mm biopsy punch (Integra Miltex). The wounds were then covered with an occlusive dressing (Tegaderm, 3M) for protection (Fig. 1A). Mice were housed individually after surgery and observed daily. Wounds were photographed and measured with a digital caliper (ThermoFisher, 14-648-17) by the same experimenter on days 0, 3, 5, 10, and 16. Each wound was treated as an ellipse and its surface area was calculated as π*α*β, where α and β are the semi-minor and semi-major axes. Wound closure was expressed as percentage healed compared to day 0 measurements. Mice were euthanized with a CO2 smartbox on days 3, 5, 10, and 16 for wound collection, and harvested wounds were processed for proteomics analyses. The BIDMC Institutional Animal Care and Use Committee approved all procedures (protocol #062-2021), which were conducted in compliance with all Federal, State and Local laws and regulations. The study is reported in accordance with ARRIVE guidelines.
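The area and closure calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names and the caliper readings are hypothetical, and the readings are assumed to be full axis lengths that are halved to obtain the semi-axes.

```python
import math

def ellipse_area(semi_minor_mm: float, semi_major_mm: float) -> float:
    """Area of an ellipse (mm^2) from its two semi-axes: pi * a * b."""
    return math.pi * semi_minor_mm * semi_major_mm

def percent_open(area_now: float, area_day0: float) -> float:
    """Current wound area as a percentage of the day 0 area."""
    return 100.0 * area_now / area_day0

def percent_healed(area_now: float, area_day0: float) -> float:
    """Complement of percent_open; negative if the wound has enlarged."""
    return 100.0 - percent_open(area_now, area_day0)

# Hypothetical caliper readings (full axis lengths in mm), halved to semi-axes.
day0_area = ellipse_area(6.0 / 2, 6.0 / 2)
day10_area = ellipse_area(4.0 / 2, 5.0 / 2)

print(f"open:   {percent_open(day10_area, day0_area):.1f}%")
print(f"healed: {percent_healed(day10_area, day0_area):.1f}%")
```

Because the percentage is taken relative to day 0, a wound that enlarges after injury gives an open area above 100% and a negative percentage healed.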

Fig. 1

Wound healing profile in db/db mice. (A) Experimental design of wound healing in db/db mice. Full-thickness 6-mm wounds were created on the dorsum on day 0. Wounds were collected on days 3, 5, 10, and 16 and processed for downstream analyses. Created with BioRender.com. (B) Wound size measurements as % open compared to day 0 (n = 18 wounds for Days 3, 5 and 10 and n = 20 wounds for Day 16). (C) Representative macroscopic wound images from wounding day and collection days.

Protein extraction from mouse wounds

Protein was extracted from snap-frozen wound tissues and day 0 uninjured skin tissue samples. Tissues were lysed in ice-cold complete lysis buffer, consisting of Tissue Protein Extraction Reagent (T-PER) (ThermoFisher, Cat. No. 78510) supplemented with 10 µl/ml protease and phosphatase inhibitors (ThermoFisher, Cat. No. 78430 and 78420), and homogenized using a TissueLyser LT (Qiagen) at 50 Hz for 5 min per cycle. Homogenization cycles were repeated until tissue samples appeared fully lysed. Lysed tissues were then incubated on ice for 30 min with periodic pipetting and vortexing, and afterwards centrifuged at 14,000 rpm for 30 min at 4 °C. Supernatants were collected and centrifuged again at 14,000 rpm for 10 min at 4 °C to obtain clean lysates. Protein concentrations were determined using the Pierce™ BCA Protein Assay Kit (ThermoFisher, Cat. No. 23225).

Luminex multiplex assay kits

For proteomics analysis, five magnetic bead panels of mouse MILLIPLEX® Multiplex Assays (EMD Millipore) were used according to the manufacturer’s instructions. Plates were read using the xMAP® MAGPIX® instrument and its plate reading software, Luminex xPONENT® (Luminex). The numbers of wound lysates quantitated per time point were: n = 6 for Day 0, n = 17 for Day 3, n = 18 for Day 5, n = 18 for Day 10 and n = 19 for Day 16. A list of the panels and all included analytes is shown in Table 1. Values for each sample were normalized to its total protein content and expressed as pg per mg of total protein.
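As a small worked example of this normalization, the sketch below divides an analyte concentration by the total protein concentration of the same lysate. It assumes that the multiplex readout is in pg/mL and the BCA result in mg/mL; the function name and the numbers are hypothetical.

```python
def normalize_to_total_protein(analyte_pg_per_ml: float,
                               total_protein_mg_per_ml: float) -> float:
    """Express an analyte as pg per mg of total protein in the same lysate."""
    if total_protein_mg_per_ml <= 0:
        raise ValueError("total protein concentration must be positive")
    return analyte_pg_per_ml / total_protein_mg_per_ml

# Hypothetical lysate: 850 pg/mL of an analyte, 2.5 mg/mL total protein (BCA).
normalized = normalize_to_total_protein(850.0, 2.5)
print(f"{normalized:.1f} pg/mg total protein")
```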

Table 1 The Milliplex multiplex assay kits and the analyzed proteins used in this study.

Wound exudate collection

We employed the polyvinyl alcohol (PVA) sponge model, adapted from Deskins et al.8 and Crane et al.9, for collection of wound exudate. Briefly, PVA sponges were cut into 1 cm2 pieces of 4 mm thickness and hydrated overnight in PBS buffer. The next day, hydrated sponges were autoclaved for 30 min. The following steps were carried out under aseptic conditions. In a laminar hood, PVA sponges were drained. Each sponge was held with tweezers and vacuumed to remove as much of the buffer as possible. Sponges were collected and laid in a large petri dish to dry overnight and were further sterilized under UV light. Dried sponges were used for subcutaneous implantation in the mouse dorsum, and were harvested on days 3, 5, 10, and 16 post-implantation.

Male db/db mice (stock # 000642) were obtained from Jackson Laboratories and were acclimated to the animal facility for at least one week before surgery. A vertical incision of about 2 cm was created on depilated and disinfected dorsum and a pocket was formed by separating the skin from the muscle to insert 4–6 PVA sponges. Once insertions were completed, the incisions were sutured and dermabond was applied.

On collection days, PVA sponges containing wound fluids were collected in empty 5 ml syringes placed in 15 ml falcon tubes and centrifuged for fluid collection at 3,000 rpm for 10 min at 4 °C. Five mice were used for each time point: n = 5 for Days 3, 5, 10 and 16. Supernatants were collected for miRNA isolation using the QIAGEN miRNeasy mini kit. 70 ng of extracted miRNA were sequenced with the QIAseq miRNA Library Kit (QIAGEN).

Bioinformatics analysis

Briefly, raw FASTQ files were processed using cutadapt10 for quality control and removal of the QIAseq miRNA library kit adapters. Manatee11 was then used to quantify miRNA expression with the GRCm38 (mm10) genome and transcriptome Bowtie12 indices and the Ensembl v102 annotation. Manatee was run in a strand-specific manner, allowing up to 50 multimapping loci, 5 unannotated read abundances per cluster, a cluster merging distance of 50 and one mismatch in mapping, and collapsing reads with the same genomic sequence.

Count matrices were then imported into R. miRNAs with fewer than 5 counts in either group were filtered out of the downstream analysis, and differential expression analysis was carried out using DESeq213, comparing Early phase wound healing (3 and 5 days post wounding, DPW) with Late phase wound healing (10 and 16 DPW). The miRNAs downregulated in Late phase wound healing were then used for miRPath v4 analysis14.
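The low-count filter can be illustrated with a short Python sketch. The analysis itself was run in R, and the text does not say whether the threshold of 5 applies to summed, mean or per-sample counts, so the version below is only one possible reading (summed counts within each group). The miRNA names and counts are hypothetical.

```python
def keep_mirna(early_counts, late_counts, min_count=5):
    """Keep a miRNA only if its summed raw count reaches min_count in BOTH groups.

    Assumption: the threshold is applied to the per-group sum of raw counts.
    """
    return sum(early_counts) >= min_count and sum(late_counts) >= min_count

# Hypothetical raw counts: name -> (early-phase samples, late-phase samples)
counts = {
    "miR-A": ([12, 8, 15], [3, 6, 9]),
    "miR-B": ([0, 1, 2], [10, 14, 9]),
    "miR-C": ([0, 0, 1], [0, 2, 1]),
}

kept = [name for name, (early, late) in counts.items() if keep_mirna(early, late)]
print(kept)
```

Here miR-B is dropped because its early-phase counts sum to 3, and miR-C because both groups fall below the threshold.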

Description of the diabetic human data

We employed data from a group of patients with DFUs who participated in a study that has already been published15. Exclusion criteria included active smokers, previous myocardial infarction, peripheral arterial disease, angina or any other serious illness. Ultimately, 39 patients with diabetes and DFUs were enrolled. Patients were followed biweekly for 12 weeks at the Beth Israel Deaconess Foot Center. Blood specimens were collected at each visit. Serum was profiled with multiplex Luminex assays. The study was approved by the Beth Israel Deaconess Medical Center IRB and all subjects provided a signed Informed Consent Form.

Machine learning methods for feature selection

db/db mice protein measurements

The Least Absolute Shrinkage and Selection Operator (LASSO) was employed to identify the key proteins associated with the wound healing process. LASSO solves the following optimization problem15,

$$\underset{{\varvec{\beta}}}{\text{min}}\sum_{i=1}^{N}{\left({Y}^{\left(i\right)}-\sum_{j=1}^{D}{\beta }_{j}{X}_{j}^{\left(i\right)}-{\beta }_{0}\right)}^{2}+\lambda \sum_{j=0}^{D}\left|{\beta }_{j}\right|,$$
(1)

where \(N\) denotes the number of data points, \({Y}^{\left(i\right)}\) is the target variable, \({X}_{j}^{\left(i\right)}\) is the j-th predictor, \({\varvec{\beta}}\) is the LASSO coefficient vector, and \(\lambda\) is the regularization parameter. Solving the optimization problem for a given \(\lambda\) results in a sparse solution, where the LASSO coefficients, \({\beta }_{j}\), of less important predictors vanish. The value of \(\lambda\) determines the sparsity of the solution, i.e., increasing \(\lambda\) results in a sparser solution. LASSO is one of the most widely used and robust methods for selecting the most relevant features, and it reduces co-linearity among the retained predictors.

LASSO was employed to identify key features associated with the wound healing process in the diabetic mice data. The percentage of open wound area on days 3, 5, 10, and 16 was used as the target variable, \(Y\), and the proteins were used as the \({\varvec{X}}\)’s in (1). All the variables (\({\varvec{X}},Y\)) were standardized such that each had zero mean and unit standard deviation. The LASSO model was trained using the “glmnet” package in R16.
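To make the sparsity mechanism concrete, the following Python sketch minimizes the objective in (1) by coordinate descent with soft-thresholding. It is an illustration under stated assumptions rather than the study's pipeline: the model in the paper was fitted with glmnet, whose Gaussian objective scales the squared-error term by 1/(2N), so its \(\lambda\) values are not directly comparable to the `lam` used here. The toy design matrix, the function names and the values of `lam` are hypothetical, and the intercept is omitted because all variables are standardized.

```python
import statistics

def standardize(values):
    """Rescale a list to zero mean and unit (population) standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def soft_threshold(rho, t):
    """Shrink rho toward zero by t; anything within [-t, t] becomes exactly 0."""
    if rho > t:
        return rho - t
    if rho < -t:
        return rho + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for sum_i (y_i - x_i . beta)^2 + lam * sum_j |beta_j|."""
    n, d = len(X), len(X[0])
    beta = [0.0] * d
    resid = list(y)  # residual y - X beta, starting from beta = 0
    for _ in range(n_iter):
        for j in range(d):
            col = [X[i][j] for i in range(n)]
            z = sum(c * c for c in col)
            # Inner product of feature j with the residual that excludes feature j
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid))
            new_bj = soft_threshold(rho, lam / 2.0) / z
            delta = new_bj - beta[j]
            if delta:
                resid = [r - c * delta for c, r in zip(col, resid)]
            beta[j] = new_bj
    return beta

# Toy design: three orthogonal, already standardized features, four samples.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
# Target built as 2*x0 - 0.5*x1 + 0.1*x2, then standardized like the features.
y = standardize([1.6, 2.4, -2.6, -1.4])

beta = lasso_cd(X, y, lam=1.0)
beta_large_lam = lasso_cd(X, y, lam=10.0)
print(beta)
print(beta_large_lam)
```

With `lam=1.0` the weakest feature receives a coefficient of exactly zero while the two stronger ones survive with opposite signs; with `lam=10.0` every coefficient is zero, which is the over-regularized regime.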

First, a tenfold cross validation was performed for a wide range of the regularization parameter, \(\lambda\) (Fig. 2A). The mean-squared error (MSE), which corresponds to the first term in (1), decreases as \(\lambda\) is increased from \(3\times {10}^{-7}\) to 0.150. Note that the dataset contains only 72 samples (17, 18, 18 and 19 wound lysates from days 3, 5, 10 and 16, respectively), while the number of predictors, i.e., proteins, is 69. Because of the relatively small data size, a machine learning model is likely to overfit the data, which explains the large cross-validation error for smaller values of \(\lambda\). As \(\lambda\) is increased, LASSO removes irrelevant features more aggressively by setting their coefficients, \({\beta }_{j}\), to zero, which prevents overfitting. However, when \(\lambda\) becomes too large, e.g., \(\lambda >0.150\), the sparsity penalty of LASSO dominates the optimization problem and yields a less accurate model, as shown by the increase of MSE for \(\lambda >0.150\).
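The selection rule in this paragraph, choosing the \(\lambda\) with the lowest mean cross-validated MSE, can be sketched as follows. The fold splitter, the sample count and the per-fold errors are hypothetical placeholders, not the study's values; the errors are typed in by hand rather than computed by fitting a model.

```python
import random
import statistics

def kfold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k nearly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def pick_lambda(cv_errors):
    """Return (lambda, mean MSE, SD across folds) for the lowest mean fold MSE."""
    summary = {lam: (statistics.mean(errs), statistics.stdev(errs))
               for lam, errs in cv_errors.items()}
    best = min(summary, key=lambda lam: summary[lam][0])
    return best, summary[best][0], summary[best][1]

# Hypothetical dataset of 30 samples split into 10 held-out folds.
folds = kfold_indices(30, k=10)

# Hypothetical per-fold MSEs for three candidate lambdas.
cv_errors = {
    0.001: [1.9, 2.4, 1.7, 2.8, 2.1],       # under-regularized: overfits
    0.1:   [0.40, 0.35, 0.45, 0.38, 0.42],  # lowest mean error
    1.0:   [0.95, 1.05, 0.90, 1.10, 1.00],  # over-regularized
}
best_lam, best_mse, best_sd = pick_lambda(cv_errors)
print(best_lam, round(best_mse, 3), round(best_sd, 3))
```

The standard deviation returned alongside the mean corresponds to the error bars that a cross-validation curve typically displays for each \(\lambda\).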

Fig. 2

LASSO regression results and feature selection of db/db mice wounds. (A) Mean-squared error (MSE) of LASSO regression against the regularization parameter, \(\lambda\). For better visualization, the horizontal axis is scaled using the logarithm (Log(\(\lambda\))). The error bar denotes one standard deviation of MSE from tenfold cross validation. The dotted vertical line indicates \(\lambda\) at the minimum of MSE (\(\lambda =0.150\)). (B) LASSO solution path. The lines indicate the changes of the \({\varvec{\beta}}\)’s with respect to \(\lambda\). In both (A) and (B), the numbers on the top of the plots indicate the number of non-zero \({\varvec{\beta}}\)’s, i.e., the number of relevant features selected by LASSO. (C) Feature selection by LASSO of db/db mice wounds. Eight proteins from db/db mice wound healing with non-zero β’s are selected by LASSO with λ = 0.150, i.e., λ at the minimum MSE. The proteins are ordered from left to right with respect to the absolute value of βj. The magnitude of the LASSO coefficient, βj, indicates the relative contribution to wound healing, while the sign shows the direction. For example, MMP-2 has the strongest correlation with the % open wound, and this correlation is negative. On the other hand, Leptin has the second strongest correlation but is positively correlated with % open wound. (69 proteins analyzed and n = 17 measurements for Day 3, n = 18 for Days 5 and 10, and n = 19 for Day 16).

The solution path of LASSO is shown in Fig. 2B. The solution path shows how the LASSO coefficients, \({\varvec{\beta}}\), change with respect to \(\lambda\). Each line corresponds to the LASSO coefficient, \({\beta }_{j}\), of a protein. The LASSO solution starts to shrink rapidly as \(\text{log}\left(\lambda \right)\) becomes larger than −10. At the minimum MSE (\(\lambda =0.150\)), the number of relevant features, i.e., proteins with non-zero \({\varvec{\beta}}\), is reduced to 8. The path also shows that, among these eight features, two proteins have the most noticeable correlation with the percentage of open wound area.

DFU patients’ serum measurements

LASSO was also applied for feature selection in the logistic regression on the human DFU data. Logistic regression learns a model for binary classification. When the target is a binary variable, \(Y\in \left\{\text{0,1}\right\}\), the probability of Y being one given a predictor vector, X, is computed as

$$p\left(Y=1|{\varvec{X}};{\varvec{\beta}}\right)=\frac{1}{1+\text{exp}\left(-{\varvec{X}}{\varvec{\beta}}\right)},$$
(2)

in which \({\varvec{X}}{\varvec{\beta}}=\sum_{j=1}^{D}{\beta }_{j}{X}_{j}\). With LASSO, the logistic regression model is trained by solving the following optimization problem,

$$\underset{{\varvec{\beta}}}{\text{min}}-\sum_{i=1}^{N}\left[{Y}^{\left(i\right)}{{\varvec{X}}}^{(i)}{\varvec{\beta}}-\text{log}\left(1+\text{exp}\left({{\varvec{X}}}^{(i)}{\varvec{\beta}}\right)\right)\right]+\lambda \sum_{j=0}^{D}\left|{\beta }_{j}\right|.$$
(3)

Once the logistic model is trained with LASSO, the coefficients,\({\beta }_{j}\), of less relevant features become zero. The LASSO logistic regression model is trained by using the “scikit-learn” package in Python16.
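A compact Python sketch of the two ingredients above, the logistic probability in (2) and an L1-penalized objective written as a negative log-likelihood so that smaller values are better, is given below. The data, coefficient vectors and `lam` are hypothetical, and no optimizer is included; the sketch only evaluates the objective for given coefficients.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def linear_score(x, beta):
    """Inner product X.beta for one sample."""
    return sum(b * v for b, v in zip(beta, x))

def penalized_nll(X, y, beta, lam):
    """Negative Bernoulli log-likelihood plus lam * sum_j |beta_j|."""
    nll = 0.0
    for x_i, y_i in zip(X, y):
        z = linear_score(x_i, beta)
        # Log-likelihood of one sample: y*z - log(1 + exp(z))
        nll -= y_i * z - math.log1p(math.exp(z))
    return nll + lam * sum(abs(b) for b in beta)

# Hypothetical standardized measurements for two samples and two features.
X = [[1.0, 0.0], [-1.0, 0.5]]
y = [1, 0]

null_objective = penalized_nll(X, y, [0.0, 0.0], lam=0.01)
fitted_objective = penalized_nll(X, y, [2.0, 0.0], lam=0.01)
print(round(null_objective, 4), round(fitted_objective, 4))
print(round(sigmoid(linear_score(X[0], [2.0, 0.0])), 3))
```

In scikit-learn, an L1-penalized logistic regression is available through `LogisticRegression(penalty="l1")` with the `liblinear` or `saga` solver; its `C` parameter is the inverse of the regularization strength, and how \(\lambda\) was mapped to `C` is not stated in the text.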

Among all healed patients, the data from two groups of patients were selected based on the number of days to wound healing (TH). The target variable, Y, was generated from the grouping; the faster healing group was assigned Y = 1 and the slower healing group Y = 0. To investigate the temporal nature of wound healing, the data set was further divided into three stages, Stages I to III. Each stage consists of the data from two consecutive clinical visits, i.e., the data from the first and second visits are grouped together as Stage I, the third and fourth as Stage II, and the fifth and sixth as Stage III. For each stage, the above LASSO-based logistic regression was applied to select the most important features that can discriminate faster from slower healers. Missing values were imputed with the K-nearest neighbor method, with K = 5.
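The labeling and staging scheme can be written down directly. In the sketch below the function names are hypothetical, and the default cutoffs of 40 and 100 days are the ones reported in the Results for the faster and slower healing groups; imputation is not covered.

```python
def healing_label(th_days, fast_cutoff=40, slow_cutoff=100):
    """Return 1 for faster healers, 0 for slower healers, None for excluded patients."""
    if th_days < fast_cutoff:
        return 1
    if th_days > slow_cutoff:
        return 0
    return None  # intermediate healers are left out of the comparison

def stage_of_visit(visit_number):
    """Map visits 1-2 to stage 1, visits 3-4 to stage 2, visits 5-6 to stage 3."""
    if not 1 <= visit_number <= 6:
        raise ValueError("only visits 1 to 6 are assigned to a stage")
    return (visit_number + 1) // 2

# Hypothetical healing times (days) for three patients.
labels = [healing_label(th) for th in (28, 70, 112)]
stages = [stage_of_visit(v) for v in range(1, 7)]
print(labels)
print(stages)
```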

db/db mice microRNA measurements

Finally, an ordinal regression with LASSO was employed to predict the number of days since wounding from the microRNA measurements of mice. Let the class label C = 1,2,3,4 denote days 3, 5, 10, and 16 after the initial wound. There is an ordinal structure between the classes, e.g., class 3 is closer to class 2 than to class 1, and class 3 comes directly after class 2. To account for this structure, the probability of the prediction Y being class k is modeled as

$$p\left(Y=k|{\varvec{X}};{\varvec{\theta}},{\varvec{\beta}}\right)=\Phi \left({\theta }_{k}-{\varvec{X}}{\varvec{\beta}}\right)-\Phi \left({\theta }_{k-1}-{\varvec{X}}{\varvec{\beta}}\right).$$
(4)

Here, \(\Phi\) denotes the cumulative distribution function of the standard normal distribution. The ordinal regression model is trained from a penalized negative log likelihood method,

$$\underset{{\varvec{\theta}},{\varvec{\beta}}}{\text{min}}-\sum_{i=1}^{N}\text{log}\left[\Phi \left({\theta }_{{C}^{\left(i\right)}}-{{\varvec{X}}}^{\left(i\right)}{\varvec{\beta}}\right)-\Phi \left({\theta }_{{C}^{\left(i\right)}-1}-{{\varvec{X}}}^{\left(i\right)}{\varvec{\beta}}\right)\right]+ \lambda \sum_{j=0}^{D}\left|{\beta }_{j}\right|,$$
(5)

where \({C}^{(i)}\) is the class label of the i-th data point, \(K=4\) is the number of classes, and we set \({\theta }_{0}=-\infty\) and \({\theta }_{K}=\infty\). The ordinal regression model was trained by a stochastic gradient descent method. In addition, we used the same ordinal regression method for the feature selection on synthetic data with both protein features and microRNAs.
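The class probabilities in (4) and the per-sample loss inside (5) can be evaluated with the standard library alone. The sketch below uses hypothetical cutpoints and scores, expresses \(\Phi\) through `math.erf`, and leaves out the L1 penalty and the stochastic gradient training.

```python
import math

def std_normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    if z == math.inf:
        return 1.0
    if z == -math.inf:
        return 0.0
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def class_probabilities(score, thresholds):
    """P(Y = k) for k = 1..K, given score = X.beta and K-1 increasing cutpoints."""
    cuts = [-math.inf] + list(thresholds) + [math.inf]  # theta_0 .. theta_K
    return [std_normal_cdf(cuts[k] - score) - std_normal_cdf(cuts[k - 1] - score)
            for k in range(1, len(cuts))]

def sample_nll(score, thresholds, true_class):
    """Negative log-probability of the observed class (classes are 1-indexed)."""
    return -math.log(class_probabilities(score, thresholds)[true_class - 1])

# Hypothetical cutpoints for K = 4 classes (days 3, 5, 10 and 16).
thetas = [-1.0, 0.0, 1.0]

early_probs = class_probabilities(-2.0, thetas)
late_probs = class_probabilities(2.0, thetas)
print([round(p, 3) for p in early_probs])
print([round(p, 3) for p in late_probs])
```

A low score places most of the probability mass on class 1 and a high score on class 4, which is why the sign of a coefficient can be read as an association with earlier or later days.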

Results

The wound healing and protein profile in db/db mice

Mice were sacrificed on days 3, 5, 10, and 16, which represent stages of inflammation, proliferation and remodeling17,18,19, for tissue collection, and wounds were processed for proteomics analysis and histological staining (Fig. 1A). Favorable wound healing signs such as partial or complete re-epithelialization, decreased wound sizes, and full wound coverage were also observed (Fig. 1C). Db/db mice largely showed a delayed healing profile as previously reported by us20,21,22 and others23,24,25, with a gradual decrease in the average percentage of open wound area from 112.9 ± 17.4% on day 3, to 86.4 ± 15.3% on day 5, to 58.5 ± 17.3% on day 10, and reaching 29.6 ± 23.6% on day 16 (Fig. 1B).

Five multiplex assay panels with a combined total of 69 distinct proteins were analyzed to investigate the longitudinal protein profile of the wounds along the healing course. Measured proteins included angiogenesis-associated proteins, growth factors, matrix metalloproteinases (MMPs), TGF-β pathway-associated proteins, and various cytokines and chemokines (Table 1). Protein data were linked to the wound closure rates, and several proteins were found to be increased or reduced depending on the wound healing stage. Unwounded skin measurements (day 0) were used as baseline levels. For example, MCP-1 and KC were elevated at all time points post-injury (Supplementary Fig. 1), while FGF-2 decreased on days 3 and 5, and then increased on days 10 and 16 (Supplementary Fig. 2). Other proteins, namely MMP-2, HGF, and soluble CD31, declined on day 3 and then gradually increased on days 5, 10 and 16 (Supplementary Fig. 3). All protein measurements are provided in Supplementary Table 1.

The microRNA profile in db/db mice wound exudates

We employed the subcutaneous sponge implantation model to ensure sufficient material for microRNA sequencing and collected exudate on days 3, 5, 10 and 16 post-implantation. We first performed exploratory data analysis after applying a variance stabilizing transformation to identify potential outlier samples in our dataset (Fig. 3A and B), and observed that most data points grouped together according to days post-implantation. We then performed Differential Expression Analysis (DEA), which revealed 40 upregulated and 39 downregulated microRNAs in Early stage wound healing (full list in Supplementary Table 2), including most prominently mir-122-5p as well as the mir-335 family (Fig. 3C). Subsequently, the top downregulated miRNAs were utilized to identify de-regulated pathways in Early vs Late stage wound healing. Figure 3D shows the pathways downregulated in the early stage of wound healing, while Fig. 3E shows the pathways upregulated in the early stage. Notably, processes like cellular response to fibroblast growth factor stimulus and angiogenesis were downregulated early on, while regulation of fibroblast proliferation was upregulated.

Fig. 3

The microRNA profile in db/db mice wound exudates. (A) Principal Component Analysis of the top 500 variable miRNAs after variance stabilizing transformation, colored by Days Post Wounding (DPW). (B) Heatmap of sample-to-sample Pearson correlation using the top 200 variable microRNAs. (C) Volcano plot of Early (DPW 3 and 5) vs Late (DPW 10 and 16) wound healing phases showing differentially expressed microRNAs. Positive Log2 Fold Changes (FC) indicate microRNAs upregulated in the early phase, whereas negative Log2 FC indicate microRNAs upregulated during the late phase. (D and E) DIANA-miRPath v4.0 analysis on the top de-regulated microRNAs in Early vs Late phase wound healing, using the collective miRNA targets to identify differentially regulated pathways: (D) pathways downregulated in the early phase and (E) pathways upregulated in the early phase. (n = 5 for each timepoint).

Feature selection for diabetic mice wound healing

Figure 2C shows the LASSO coefficients of the identified key proteins at the minimum MSE. The sign of \(\beta\) indicates a positive / negative association with the open wound area. In other words, a negative \(\beta\) indicates that the concentration of the protein is higher when the open wound area becomes smaller and vice versa. The absolute value of \(\beta\), i.e., magnitude of the coefficient, indicates the strength of the association. A larger absolute value of \(\beta\) indicates the protein has a stronger correlation with the open wound area.

Figure 4A shows the out-of-sample prediction of the percentage open wound area by a linear regression that uses only the 8 proteins shown in Fig. 2C as predictors. The out-of-sample prediction was performed by a tenfold cross validation. The goodness of fit (\({R}^{2}\)) of the prediction was 0.66, indicating that the LASSO model fit the data well. For comparison, Fig. 4B shows the out-of-sample prediction of a linear model (Ordinary Least Squares, OLS) trained with all 69 proteins as the input feature set. This result suggests that the full linear model overfits the data, which led to poor out-of-sample prediction accuracy (\({R}^{2}\) = 0.004). MMP-2 had the strongest correlation with the percent open wound, and this correlation was negative, while Leptin had the second strongest correlation but was positively correlated with % open wound. Macrophage inflammatory protein 2 (MIP-2), interleukins 2, 5 and 12, hepatocyte growth factor (HGF) and leukemia inhibitory factor (LIF) were the remaining proteins that arose as predictors (Fig. 2C).
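The text does not state which definition of goodness of fit was used for the out-of-sample predictions, so the sketch below shows two common candidates side by side: the coefficient of determination, 1 − SS_res/SS_tot, and the squared Pearson correlation between observed and predicted values. The two agree for an in-sample least-squares fit with an intercept but can differ markedly on held-out data. All numbers are hypothetical.

```python
def r2_determination(y_true, y_pred):
    """1 - SS_res / SS_tot; can be negative for poor out-of-sample predictions."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def r2_pearson(y_true, y_pred):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov * cov / (var_t * var_p)

# Hypothetical held-out % open wound values and two sets of predictions.
observed = [100.0, 80.0, 60.0, 40.0]
sparse_model = [90.0, 85.0, 55.0, 50.0]        # predictions stay in a sensible range
overfit_model = [-2000.0, 1500.0, 60.0, 40.0]  # wild extrapolation on unseen samples

print(r2_determination(observed, sparse_model), r2_pearson(observed, sparse_model))
print(r2_determination(observed, overfit_model), r2_pearson(observed, overfit_model))
```

For the overfit predictions the coefficient of determination is strongly negative, whereas the squared correlation stays between 0 and 1, so the choice of definition matters when comparing models on held-out samples.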

Fig. 4

Out-of-sample prediction of Percent Open Wound in db/db mice by (A) the proteins selected by LASSO and (B) all the proteins. The dotted lines denote the prediction, the circles denote the test samples, and the distance from the circles to the dotted lines indicates the prediction errors. The LASSO prediction was made by first selecting the proteins with a tenfold cross validation (Fig. 2) and then making out-of-sample predictions with another tenfold cross validation. The prediction by the full protein set is shown only when it is within 0 to 150% (x axis). Due to overfitting, the out-of-sample prediction of the full protein set ranges from −2397 to 1739%. The out-of-sample prediction error using only the selected proteins (A) is clearly smaller than that using the full protein set (B): the goodness of fit (\({R}^{2}\)) degrades from 0.66 (A) to 0.004 (B).

Feature selection for diabetic human wound healing

Of the 39 patients enrolled, 15 fully healed their DFU within 12 weeks, while 24 failed to heal. The hierarchical clustering analysis of the 28 proteins measured did not reveal strong correlations among the biomarkers (Supplementary Fig. 4). The patient cohorts were named “healers” and “nonhealers,” respectively. We focused our attention on the “healers”, the subjects whose ulcers successfully healed. Among the 15 healers, the data from two groups of patients were selected based on the number of days to complete wound healing (TH). The first group, the faster healing group, consisted of a total of 8 patients with TH < 40, while the second group, the slower healing group, consisted of a total of 4 patients with TH > 100; the remaining 3 patients with 40 < TH < 100 were excluded to keep the two selected groups sufficiently distinct. Table 2 lists the TH of the patients in the faster and slower healing groups. The target variable, Y, was generated from the grouping; the faster healing group was assigned Y = 1 and the slower healing group Y = 0. To investigate the temporal nature of wound healing, the data set was further divided into three stages, Stage I, II and III. Each stage consists of the data from two consecutive clinical visits, i.e., the data from the first and second visits were grouped together as Stage I, the third and fourth as Stage II, and the fifth and sixth as Stage III. Hence, each stage had a total of 24 data points: two clinical visits for each of the 12 patients.

Table 2 The number of days to wound healing (TH) for the two patient groups.

Three logistic regression models were trained, one for each stage. The input data were scaled such that each variable had zero mean and unit standard deviation. The regularization parameter was set to \(\lambda =0.01\). Figure 5 shows the prediction of the logistic regression models. For all stages, there is a clear separation between the faster and slower healing groups in the latent space, \(Z={\varvec{X}}{\varvec{\beta}}\), indicating a clear difference in the protein profiles of the two groups.

Fig. 5

Logistic Regression Model Prediction for (A) Stage I, (B) Stage II, and (C) Stage III in serum measurements of diabetic patients with healing DFUs. The horizontal axis denotes \({Z}^{(i)}={{\varvec{X}}}^{(i)}{\varvec{\beta}}\) and the vertical axis is the probability of being in the faster healing group, i.e., Y(i) = 1. Thus, the dashed lines indicate the patients in the latent space Z and their predicted probabilities of being in the faster healing group. The solid circles denote the ground truth. It is shown that in the latent space, Z, the fast and slow healing groups can be clearly distinguished.

The features selected by LASSO are listed in Table 3 for each stage. Among the 28 proteins, only 11 were relevant in distinguishing the faster healing group from the slower healing group in Stage I, which changes to 8 and 10 proteins in Stages II and III, respectively. A positive \({\beta }_{j}\) indicates that the protein was positively associated with faster wound healing, while proteins with a negative \({\beta }_{j}\) were negatively associated with faster wound healing. The relative magnitudes of the \({\beta }_{j}\)’s indicate the strength of the association. For example, Table 3 indicates that the amount of Fractalkine was the strongest indicator of the faster healing group in Stages I and II, while, in Stage III, FGF-2 became the strongest indicator.

Table 3 Feature Selection by LASSO for Stage I, II and III of the two patient groups. The sign of \(\beta\) indicates the direction of the correlation, while the magnitude shows a relative strength of the correlation.
Table 4 Feature Selection by LASSO for mice microRNA data. The sign of \(\beta\) indicates the temporal association, while the magnitude shows a relative importance. A negative \(\beta\) indicates a strong association with the early days and vice versa.
Table 5 Feature selection by LASSO from the synthetic data combining proteins and microRNAs.

Feature selection for mice microRNA data

The ordinal regression model was trained to predict the number of days since the initial wound from the mice microRNA data. All normalized counts for the microRNA data used are provided in Supplementary Table 3. The days from the initial wound, Y = 3,5,10,16 (days), were converted to a class label, C = 1,2,3,4, and the ordinal regression model aimed to predict the correct class label given the set of 413 microRNAs. Before training the model, all the features were scaled by their respective minimum and maximum values such that the input to the model lay in the hypercube \({\left[\text{0,1}\right]}^{D}\), in which the dimension \(D=413\). The regularization parameter, \(\lambda =0.3\), was set to allow only one misclassification among the 20 training data points, to prevent overfitting. After model training, all irrelevant microRNAs, i.e., those whose corresponding \({\beta }_{j}\) was smaller than 0.1 in absolute value, were removed.
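The two preprocessing and post-processing steps in this paragraph, min-max scaling of each feature and removal of coefficients below 0.1 in absolute value, can be sketched as follows. The microRNA names, counts and coefficients are hypothetical placeholders.

```python
def min_max_scale(values):
    """Scale one feature to [0, 1] using its own minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature carries no information
    return [(v - lo) / (hi - lo) for v in values]

def select_features(coefs, threshold=0.1):
    """Keep features whose coefficient is at least `threshold` in absolute value."""
    return {name: b for name, b in coefs.items() if abs(b) >= threshold}

# Hypothetical normalized counts of one microRNA across five samples.
scaled = min_max_scale([120.0, 300.0, 210.0, 480.0, 120.0])

# Hypothetical fitted coefficients for four microRNAs.
coefs = {"miR-A": -0.82, "miR-B": 0.04, "miR-C": 0.35, "miR-D": -0.09}
selected = select_features(coefs)
print(scaled)
print(selected)
```

Under the sign convention described for the microRNA model, a retained negative coefficient (miR-A here) would point to the early days and a retained positive one (miR-C) to the later days.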

The selected features are listed in Table 4. Among 413 microRNAs, only 12 were selected by LASSO. The signs of the coefficients, \(\beta\), indicate a temporal association. A negative value of \(\beta\) indicates the microRNA is related with the early days of wound healing, while a positive \(\beta\) means the microRNA is more pronounced in the later days. The magnitude of \(\beta\) is related with the strength of the temporal association. It is shown that mmu-miR-107-3p and mmu-miR-122-5p are strongly associated with the wound process in the early days, while mmu-miR-7a-2 and mmu-miR-1b are correlated with the later days.

Feature selection across modalities

Since the protein and microRNA data were not from the same mice, synthetic data were created to combine these features. In particular, we randomly selected a mouse with protein data and another mouse with microRNA features in the same healing stage and combined them. The features from the two mice were concatenated to form a new synthetic sample, and its healing stage remained unchanged. We also removed all redundancy so that an original mouse could not be used to generate two different synthetic samples. Consequently, we obtained 20 synthetic samples, each of which had the 8 selected protein features and the 12 selected microRNA features. As in the experiment on the microRNA data, ordinal regression with LASSO was employed to predict the number of days since the initial wound. In total, 4 microRNAs (mmu-miR-1b, mmu-miR-3473b, mmu-miR-107-3p, and mmu-miR-1306-5p) and 5 proteins (MMP-2, HGF, IL-5, IL-12, and Leptin) were selected as the key contributors. The selected features are listed in Table 5.
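One way to implement this pairing is to shuffle both sample lists within each healing stage and zip them, so that no original sample is used twice. The sketch below is an assumption about the procedure rather than the authors' code: the sample identifiers are hypothetical, the group sizes are the per-day protein lysate and exudate counts given in the Methods, and each protein lysate is treated as one unit.

```python
import random

def pair_within_stage(protein_ids, mirna_ids, rng):
    """Randomly pair protein and microRNA samples of one stage, each used at most once."""
    p, m = list(protein_ids), list(mirna_ids)
    rng.shuffle(p)
    rng.shuffle(m)
    return list(zip(p, m))  # zip stops at the shorter list

rng = random.Random(0)
# day -> (number of protein lysates, number of microRNA exudate samples)
stage_sizes = {3: (17, 5), 5: (18, 5), 10: (18, 5), 16: (19, 5)}

synthetic = []
for day, (n_protein, n_mirna) in stage_sizes.items():
    protein_ids = [f"prot_d{day}_{i}" for i in range(n_protein)]  # hypothetical IDs
    mirna_ids = [f"mir_d{day}_{i}" for i in range(n_mirna)]       # hypothetical IDs
    for p_id, m_id in pair_within_stage(protein_ids, mirna_ids, rng):
        # A real synthetic sample would concatenate the two feature vectors here.
        synthetic.append({"day": day, "protein_sample": p_id, "mirna_sample": m_id})

print(len(synthetic))
```

With five microRNA samples per time point, the pairing is limited by the smaller group and yields 5 pairs per stage, i.e., 20 synthetic samples in total.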

Discussion

In light of the relatively small sample sizes across our datasets, ranging from 4 to 19 samples per group, it is important to acknowledge that our findings are inherently limited in their generalizability and should be interpreted as preliminary insights specific to the studied cohorts rather than as definitive biomarkers for broader populations.

A well-established type 2 diabetic mouse model, the db/db mouse, was used to study impaired wound healing. In this study, we ran a proteomics analysis of wound tissue and compared the protein measurements to the wound sizes to establish a correlation between the observed protein profile and wound closure. In addition, our analysis highlighted biomarkers and targets that could not only help identify the extent of healing correctly, but could also potentially be leveraged to accelerate the repair process. MMP-2, Leptin, IL-2 and IL-5 were among the proteins that emerged as possible predictors of wound healing stages in diabetic mice, and they have previously been implicated in various ways in cutaneous repair processes. MMP-2, a gelatinase and key tissue remodeling molecule that is secreted by dermal fibroblasts and neutrophils, has been reported to be elevated in chronic wounds26,27. Both systemic and topical treatment with Leptin, a hormone produced by adipocytes, have been shown to accelerate wound closure in the ob/ob diabetic mouse model28,29. Notably, the mouse model employed in the present study is a Leptin receptor deficient animal, so this should also be taken into consideration when interpreting these findings. IL-2 is an important regulator of T-cell proliferation and differentiation and has been reported to affect granulation tissue strength30. Furthermore, IL-5 mediates eosinophil accumulation, and increased levels have been linked to delayed wound healing31. Finally, mounting evidence suggests that HGF promotes diabetic wound healing32,33.

To complement our murine proteome analysis, we additionally employed microRNA sequencing to profile multiple microRNAs that could be important for wound healing. To ensure collection of a sufficient amount of exudate, we opted for the PVA sponge implantation technique. Although this is a different model of injury than the excisional wound model that was used for protein collection, we are confident that this approach still enabled us to capture integral microRNAs that are associated with discrete stages of cutaneous repair. Exudate composition is a reflection of the wound microenvironment and can provide insights into cell types and processes being differentially activated. MicroRNAs are short, non-coding ribonucleic acids that can widely affect protein expression. Some work to silence RNAs by binding to the mRNAs to destabilize them prior to translation, while others can work as promoters to enhance transcription34,35. As shown in Table 4, some of the microRNAs identified in our study have positive beta-coefficients, which represent association with the later stages of healing, while others have negative beta-coefficients and are associated with earlier stages. The temporal abundance of a microRNA at earlier or later stages of wound healing does not necessarily indicate an inhibitory or pro-reparative effect; further functional investigations are required to elucidate its specific role and regulatory mechanisms in the wound healing process.

Amongst the microRNAs associated with the later healing stages, miR-122-5p has been linked to amplified angiogenesis. Lou et al. reported that miR-122-5p promoted angiogenesis and activated VEGF signaling. They found that this microRNA potentiates new capillary formation through enhanced fatty acid utilization in endothelial cells36. Similarly, miR-107-3p has also been documented to promote angiogenesis via a Dicer 1-dependent pathway37. Conversely, miR-199a was associated with the earlier stages of wound healing. Previous studies have described how this microRNA interferes with multiple pathways. By targeting the JAK-STAT signaling pathway, it inhibits macrophage-mediated inflammation38. More recently, Wang et al. found that this microRNA suppresses VEGF in diabetic foot ulcers and that its overexpression led to arrest of cell proliferation and migration39.

Importantly, we also analyzed human DFU patient-derived serum protein data to delineate chronological aberrancies in diabetic wound healing. By segregating the wounds into different stages, our findings demonstrate that the protein levels at each stage of wound healing are different. While some of these deficiencies are expected, others appear not to be well known. In the non-healer cohort during the early stages, there was a clear deficit of Fractalkine and sVCAM. Our model also showed that in the later stages of wound healing, non-healers displayed diminished levels of FGF2, IL-21 and IL-23. Fractalkine, the unique ligand for the chemokine receptor CX3CR1, is a membrane-bound cytokine that signals to the cell nucleus via a G protein-coupled receptor40. It is expressed on endothelial cells of blood vessels as well as multiple inflammatory cells such as T-cells and natural killer cells, and is known to potentiate the inflammatory state by attracting macrophages and fibroblasts41. It has also been shown that patients with diabetes have higher levels of Fractalkine, and this is a proposed mechanism as to why this cohort develops early atherosclerosis. However, its role in wound healing is unclear42.

Herein, our data show that early expression of Fractalkine (Stages I and II) could be a significant contributor (β > 5) to wound healing. While the mechanism is not currently well understood, one hypothesis is that Fractalkine acts as a pro-inflammatory cytokine. Studies have shown that patients with diabetes have a chronic low-grade inflammatory state which impedes wound healing43. In order to progress to the proliferative phase, an acute-like augmented inflammation is required to enable recruitment of fibroblasts, keratinocytes and mast cells15,44,45. It is possible that Fractalkine operates as an integral part of this inflammatory burst; however, further studies are necessary to corroborate this. Moreover, FGF-2 became the strongest predictor during Stage III, and this fits with its established role as a fibrosis mediator46 and regulator of epithelial-mesenchymal transition47, which take place later in the wound healing cascade.

Currently, wound care is dictated by clinical acumen and assessment of vascular perfusion. This immense amount of novel data can now set the groundwork for future experiments in both animals and humans. By accurately predicting the stage of wound healing, researchers and clinicians can exploit these data to accelerate wound healing by supplementing the missing proteins or inhibiting overexpressed ones.

The Least Absolute Shrinkage and Selection Operator (LASSO) method was employed to identify key biomarkers. LASSO solves a regularized optimization problem with an L1 (absolute-value) penalty on the model coefficients. As the magnitude of the regularization increases, LASSO removes less relevant features more aggressively, at the cost of prediction accuracy. To balance feature selection against prediction accuracy, tenfold cross-validation was performed.
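The trade-off between sparsity and accuracy, and the use of cross-validation to choose the regularization strength, can be illustrated with a small self-contained sketch. It is not the study's pipeline: the study used ordinal regression with LASSO, whereas this sketch swaps in ordinary squared-error regression fitted by coordinate descent, on invented toy data. All function names, the candidate penalty values and the data are assumptions.

```python
import random


def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_fit(X, y, lam, n_iter=100):
    """Minimize (1/2n)*||y - X b||^2 + lam*||b||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X b while b is all zeros
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, residual)) / n
            new_bj = soft_threshold(rho, lam) / z
            delta = new_bj - beta[j]
            if delta != 0.0:
                residual = [r - c * delta for r, c in zip(residual, col)]
                beta[j] = new_bj
    return beta


def mse(X, y, beta):
    """Mean squared prediction error of the linear model beta on (X, y)."""
    preds = [sum(b * x for b, x in zip(beta, row)) for row in X]
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y)


def cv_choose_lambda(X, y, lambdas, k=10, seed=0):
    """Return the lambda with the lowest mean held-out error over k folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = {}
    for lam in lambdas:
        errs = []
        for fold in folds:
            held = set(fold)
            X_tr = [X[i] for i in idx if i not in held]
            y_tr = [y[i] for i in idx if i not in held]
            beta = lasso_fit(X_tr, y_tr, lam)
            errs.append(mse([X[i] for i in fold], [y[i] for i in fold], beta))
        scores[lam] = sum(errs) / len(errs)
    return min(scores, key=scores.get), scores


# Toy data: 40 samples, 6 features, only the first two drive the response.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(40)]
y = [2.0 * row[0] - 1.5 * row[1] + rng.gauss(0, 0.1) for row in X]

lambdas = [0.01, 0.1, 0.5, 10.0]
best_lam, scores = cv_choose_lambda(X, y, lambdas)
beta_best = lasso_fit(X, y, best_lam)
beta_heavy = lasso_fit(X, y, 10.0)  # very strong penalty
print("chosen lambda:", best_lam)
print("coefficients at chosen lambda:", [round(b, 2) for b in beta_best])
print("coefficients at lambda = 10:", beta_heavy)
```

In this sketch, a very large penalty removes every feature and leaves no predictive signal, while cross-validation selects a weaker penalty that keeps the informative features.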

One of the biggest challenges in applying advanced ML approaches is the lack of sufficient training examples. In our problem, the size of the data is relatively small, ranging from around 24 to 64 samples, and the number of biomarkers is typically a few times larger than the number of samples. LASSO solves a convex optimization problem to find a global model, which has been shown to provide more robust feature selection for this kind of small-data, large-feature-set problem than advanced feature selection methods such as Random Forest48, Gradient Boosting49, and Stabl50. However, it is important to note that LASSO considers only first-order effects and cannot correctly identify nonlinear interactions between the features.
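For reference, the convexity and the first-order limitation can both be read off the generic LASSO objective. The form below is the standard squared-error version and is given only as an illustration; the ordinal models used in this study would replace the squared-error term with an ordinal negative log-likelihood, and the symbols n (samples), p (features), x_ij, y_i and λ are notation assumed here.

```latex
\hat{\beta}
  = \arg\min_{\beta \in \mathbb{R}^{p}}
    \; \frac{1}{2n} \sum_{i=1}^{n}
       \Bigl( y_i - \sum_{j=1}^{p} x_{ij}\,\beta_j \Bigr)^{2}
    + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert ,
  \qquad \lambda \ge 0 .
```

Both terms are convex in β, so any local minimum is a global one. The predictor is a sum of terms x_ij β_j, each involving a single feature; a product such as x_ij x_ik enters the model only if it is added explicitly as an extra column.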

The study has its limitations. The main limitation is that there was no DFU tissue analysis in the human study. The tissue that can be collected during routine debridement treatments is usually degraded and not suitable for high-quality proteomics quantitation. In addition, diabetic mice present with delayed wound healing, whereas non-healing DFUs do not heal even after a prolonged period of time. This is a well-known weakness of the mouse model but, nonetheless, the current consensus is that it is still the best model to conduct exploratory studies that can subsequently expand to human or other animal model studies. The animal wounding models we utilized for generation of protein and microRNA data were distinct (an excisional wound healing model versus a subcutaneous sponge implantation model), and this should be considered for data interpretation. Furthermore, it should be reiterated that the datasets analyzed, especially for the human subjects, were relatively small and additional studies are necessary to confirm our observations. Ultimately, while the reported associations of these markers with wound healing processes provide valuable insights, further mechanistic and clinical investigations are essential to establish their definitive functional roles, predictive value and therapeutic potential in human diabetic wound healing.

Conclusions

Understanding diabetic wound healing pathophysiology remains a critical component for researchers to improve outcomes. Despite standard of care treatments, DFUs still have very poor healing rates. By using proteomics from mice and human patients, as well as complementary microRNA mouse data, together with computational regression models, we can better understand the molecular and protein deficits associated with impaired diabetic wound healing. LASSO in particular is a useful method for these data sets as it helps direct the focus to a smaller subset of clinically important markers.