Introduction

Port-wine stain (PWS), also known as nevus flammeus, is a benign congenital vascular malformation characterized by the dilation and malformation of superficial dermal capillaries and post-capillary venules. The incidence of PWS in newborns is approximately 0.3–0.5%1. PWS frequently occurs in exposed areas, such as the face and neck, and generally does not regress spontaneously. If not treated promptly or appropriately, the lesions may enlarge, thicken, and darken as the patient ages, potentially leading to spontaneous bleeding2. PWS is a disfiguring condition that can lead to psychosocial issues and reduce patients’ overall quality of life. Furthermore, the pathogenesis of PWS remains unclear3, making it difficult to prevent the condition at its root cause. Current research therefore focuses primarily on optimizing treatment strategies.

In recent years, photodynamic therapy (PDT) has developed rapidly and has been shown to be an effective treatment for PWS. PDT induces photochemical reactions that lead to cell death by apoptosis, autophagy, or necrosis4. Hematoporphyrin monomethyl ether (HMME), also called Hemoporfin, is a novel second-generation porphyrin photosensitizer. It offers advantages such as safety, minimal side effects, non-invasiveness, and high selectivity for abnormal blood vessels. Hematoporphyrin monomethyl ether-PDT (HMME-PDT) has been approved for the treatment of PWS in China since 20175. The efficacy of HMME-PDT has been confirmed in both children and adults by years of clinical observation6. The application of HMME-PDT is not limited to the treatment of PWS: studies have shown that it is also effective against fungal infections and drug-resistant bacterial infections7,8. However, owing to the heterogeneity of vascular malformations and individual differences among patients, the efficacy of HMME-PDT varies considerably from patient to patient9.

Many non-invasive skin examination techniques are gradually being applied to predict, assess, and intraoperatively monitor treatment outcomes in PWS. These include dermoscopy, reflectance confocal microscopy, and high-frequency ultrasound. Research has shown that different vascular patterns identified using dermoscopy can predict the efficacy of HMME-PDT10. Real-time fluorescence spectroscopy can monitor the skin photosensitizer concentration during PDT for PWS and predict treatment responses11. However, comprehensive methods that predict PDT efficacy from multiple clinical factors in PWS patients are still lacking, and current approaches fall far short of clinical needs. Therefore, there is an urgent need for precise and comprehensive new methods to predict patient outcomes, which can assist clinicians in treatment decisions and support accurate and personalized therapy.

Machine learning has played a crucial role in medical research and clinical practice, given the advancement and development of big data technology. Machine learning techniques can autonomously learn hidden relationships in existing data and use this knowledge to predict outcomes for unknown data. Further, machine learning has garnered global attention due to its efficient data processing, deep analytical capabilities, and self-optimization attributes. Currently, machine learning has a unique value in several fields, such as basic medical research, clinical practice, and epidemiological studies12,13,14. For example, Munger et al. used the random forest machine learning algorithm in an observational cohort study to identify the main predictors of non-calcified coronary plaque burden in psoriasis patients15. In another report, Maintz et al. employed a machine learning gradient boosting approach to analyze the factors associated with the severity of atopic dermatitis16. Zheng et al. were the first to construct a risk prediction model for postherpetic itch using machine learning methods, including logistic regression, random forest, k-nearest neighbor, gradient boosting decision tree, and neural network17. Furthermore, in contrast to traditional statistical methods, machine learning algorithms use non-parametric models that do not rely on assumptions about data distribution. These algorithms can flexibly adapt to the nonlinearity and complexity of the data and adjust automatically without the need for human intervention. They can efficiently and objectively reveal the complex and hidden relationships between clinical characteristics and patient outcomes, as well as the relationships among various features18.

In summary, this study collected data on clinical characteristics of HMME-PDT of PWS patients, including the immediate fluorescence intensity (IFI) at the lesion site after HMME-PDT, dermoscopy vascular pattern, and the facial port-wine stain area and severity index (FSASI) score19. Feature selection was performed using the Recursive Feature Elimination (RFE) method, and efficacy prediction models for HMME-PDT were constructed by applying Extreme Gradient Boosting (XGBoost) and Random Forest (RF) algorithms. These models were comprehensively evaluated to determine the superior model. The relationships between various clinical features and the efficacy of HMME-PDT were explored by leveraging these models. The work aimed to establish a foundation for timely and accurate assessment and adjustment of HMME-PDT treatment plans.

Materials and methods

Research design and objectives

This study retrospectively analyzed clinically diagnosed facial PWS patients (n = 131) who received a single HMME-PDT session at the Second Xiangya Hospital of Central South University from May 2022 to January 2025. Patients enrolled between May 2022 and April 2024 were assigned to the training cohort (n = 78), while patients enrolled between May 2024 and January 2025 were assigned to the validation cohort (n = 53). Exclusion criteria were: (1) presence of other vascular malformations; (2) lack of clear clinical images for efficacy evaluation. This study was approved by the clinical research ethics committee of Second Xiangya Hospital of Central South University (LYEC2024-K0152). The research was performed in accordance with the principles of the Declaration of Helsinki. Written informed consent was obtained from all patients before treatment. The patients mentioned in this manuscript have given written informed consent to the release of images that may lead to identification. For minor patients, consent was obtained from their parents and legal guardians.
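The temporal split described above can be sketched as a simple date rule. This is a minimal illustration, assuming the cut-off is the last day of April 2024; the patient identifiers and enrollment dates are hypothetical and are not study data.

```python
from datetime import date

# Cut-off implied by the text: enrollment through April 2024 goes to training.
# Using the last calendar day of April is an assumption of this sketch.
TRAINING_END = date(2024, 4, 30)

def assign_cohort(enrolled_on: date) -> str:
    """Return 'training' or 'validation' based on the enrollment date."""
    return "training" if enrolled_on <= TRAINING_END else "validation"

# Hypothetical enrollment dates, for illustration only.
patients = {
    "P001": date(2022, 5, 12),
    "P002": date(2024, 4, 30),
    "P003": date(2024, 5, 2),
}
cohorts = {pid: assign_cohort(day) for pid, day in patients.items()}
```

Splitting by time rather than at random means the validation cohort is never seen during model development, which gives a stricter check than a random hold-out drawn from the same period.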

Treatment protocol

Pre-treatment preparation

All patients completed blood count, liver and kidney function tests, and ECG examination to evaluate treatment risks. An experienced operator captured lesion images using the VISIA-CR™ system (Canfield Scientific, Inc., United States) under consistent angle and light intensity. Dermoscopic images of the lesion were captured in the same area with a hand-held dermoscope at 50X magnification with polarized light (Guangzhou Chuanghong Medical Technology Co. Ltd., China). Each patient was assessed for the subtype of lesion and the FSASI score.

HMME-PDT treatment

Hemoporfin solution (Shanghai Fudan Zhangjiang Biopharmaceutical Co. Ltd., China) was prepared at a dosage of 5 mg/kg. The patient was made to lie down, the hair at the treatment site was shaved, and the lesions were fully exposed and cleaned. Normal skin within the irradiation area was covered with a light-blocking cloth while the operator wore protective goggles. The total administration time was 20 min for adults (infusion pump speed of 150 ml/h) and 5 min for children weighing ≤ 40 kg (infusion pump speed of 240 ml/h). Illumination was started at the 8th minute of drug infusion for adults and the 3rd minute for children. The lesion was irradiated using 532 nm LED green light (Wuhan YaGe Photoelectric Technology Co. Ltd., China) for 15–20 min for each spot. An auxiliary treatment head was used simultaneously for larger or more dispersed lesions. After treatment, a routine cooling spray or cold compress was applied for 30 min, and patients were given postoperative care instructions.
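As an arithmetic check on the figures above, the dose and the infused volume follow directly from the stated dosage, pump speed, and administration time. The sketch below only restates that arithmetic; the function names and the 60 kg example weight are hypothetical, and the volumes are derived values rather than numbers reported in the protocol.

```python
def hemoporfin_dose_mg(weight_kg: float, mg_per_kg: float = 5.0) -> float:
    """Total dose from body weight at the stated 5 mg/kg dosage."""
    return weight_kg * mg_per_kg

def infused_volume_ml(rate_ml_per_h: float, minutes: float) -> float:
    """Volume delivered by a pump running at a fixed rate for a given time."""
    return rate_ml_per_h * minutes / 60.0

adult_volume = infused_volume_ml(150, 20)   # 150 ml/h for 20 min
child_volume = infused_volume_ml(240, 5)    # 240 ml/h for 5 min
example_dose = hemoporfin_dose_mg(60)       # hypothetical 60 kg adult
```

Under these numbers the adult schedule corresponds to 50 ml infused and the pediatric schedule to 20 ml.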

Post-treatment and follow-up

Immediately after HMME-PDT, the lesion was irradiated to induce fluorescence response using UVA (385 ± 10 nm) as the excitation light source. Fluorescence images were recorded with an Apple iPhone XS mobile phone. After the single HMME-PDT, the patient underwent follow-up at the outpatient department. The same VISIA-CR™ system was utilized to capture lesion images under constant conditions.

Data collection

The clinical baseline data of each patient were collected, including age; gender; the subtype of the lesion; history of previous pulsed dye laser (PDL) treatment; pre-treatment FSASI19 score; dermoscopic images before treatment; immediate fluorescence images after HMME-PDT; clinical images before and after HMME-PDT; and the interval between two treatment sessions. The efficacy assessment followed the traditional classification method. Depending on the degree of regression in treated lesions, the clinical efficacy was categorized into two classes: effective (≥ 25% improvement) and ineffective (< 25% improvement). The post-PDT fluorescence was qualitatively classified into three levels. It was defined as weak when little fluorescence was observed, and intense when homogeneous, bright fluorescence was observed (Fig. 1). The intermediate level fell between these two. We focused on three main dermoscopic patterns, as follows: the superficial type included dotted, globular, and short clubbed blood vessels; the deep type included mainly reticular and linear blood vessels (Fig. 1). If the images included both of these vascular patterns, the lesion was defined as a mixed type. All of these evaluations were performed and recorded independently by two dermatologists with extensive clinical experience.
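The two labelling rules above (the 25% efficacy threshold and the three dermoscopic types) can be written down compactly. This is a minimal sketch under the assumption that each image has already been annotated with the vessel morphologies it shows; the function names and the "unclassified" fallback are hypothetical.

```python
# Vessel morphologies named in the text, grouped by dermoscopic type.
SUPERFICIAL = {"dotted", "globular", "short clubbed"}
DEEP = {"reticular", "linear"}

def efficacy_group(improvement_pct: float) -> str:
    """Binary outcome label: at least 25% improvement counts as effective."""
    return "effective" if improvement_pct >= 25 else "ineffective"

def dermoscopy_type(vessels: set) -> str:
    """Map the vessel morphologies seen in an image to one of three types."""
    has_superficial = bool(vessels & SUPERFICIAL)
    has_deep = bool(vessels & DEEP)
    if has_superficial and has_deep:
        return "mixed"
    if has_superficial:
        return "superficial"
    if has_deep:
        return "deep"
    return "unclassified"  # fallback added for this sketch only
```

For example, `dermoscopy_type({"dotted", "linear"})` returns `"mixed"`, and `efficacy_group(25)` returns `"effective"` because the threshold is inclusive.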

Fig. 1

Clinical pictures, dermoscopy and fluorescence imaging of port-wine stain (PWS) lesions before and after one session of hematoporphyrin monomethyl ether-photodynamic therapy (HMME-PDT). (a) A PWS patient that showed ≥ 25% improvement (effective). (a1) Before treatment and (a2) after HMME-PDT. (a3) Superficial type: dotted and globular, and short clubbed vessels were seen under dermoscopy before treatment. (a4) Immediate fluorescence intensity (IFI) at the lesion site after HMME-PDT where bright and homogeneous fluorescence was detected in the lesion. (b) A PWS patient that showed < 25% improvement (ineffective). (b1) Before treatment and (b2) after HMME-PDT. (b3) Deep type: reticular and linear vessels were seen under dermoscopy before treatment. (b4) Immediate fluorescence intensity at the lesion site after HMME-PDT where little fluorescence was detected in the lesion.

Machine learning development process

Data preprocessing

The training dataset comprised 78 patients with PWS and 9 variables (6 categorical and 3 continuous). The categorical variables were gender, subtype of lesion, PDL treatment history, IFI after HMME-PDT, dermoscopy vascular pattern, and the therapeutic effect. The continuous variables were age (integer), the interval between two treatment sessions (integer), and the FSASI score (decimal number). Depending on whether the categories had a natural order and on the number of categories, one-hot encoding was used for unordered categorical variables, while ordinal encoding was applied to ordered categorical variables. Missing values are detrimental to the generalization of predictive models and can reduce their practical utility, so appropriate data cleaning is essential to address this issue20,21. In this study, data cleaning was necessary because of a small number of missing values in the dermoscopy data. These data were categorical variables with values missing completely at random, and mode imputation was the primary cleaning method.
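The three preprocessing steps named above (mode imputation, ordinal encoding, one-hot encoding) can be sketched in plain Python. The text does not state which variables were treated as ordered, so treating IFI as ordinal (weak < intermediate < intense) and gender as unordered is an assumption of this example, and the sample values are hypothetical.

```python
from collections import Counter

def impute_mode(values):
    """Replace missing entries (None) with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]

def ordinal_encode(values, order):
    """Map ordered categories to integers following the given order."""
    rank = {label: i for i, label in enumerate(order)}
    return [rank[v] for v in values]

def one_hot_encode(values, categories):
    """One indicator column per category, for unordered variables."""
    return [[1 if v == c else 0 for c in categories] for v in values]

# Hypothetical records, for illustration only.
dermoscopy = impute_mode(["superficial", None, "deep", "superficial", "mixed"])
ifi = ordinal_encode(["weak", "intense", "intermediate"],
                     order=["weak", "intermediate", "intense"])
gender = one_hot_encode(["male", "female", "female"], categories=["female", "male"])
```

Mode imputation is only defensible here because the missingness is described as completely at random and limited to a few categorical entries; with more missing data it would bias the category frequencies toward the mode.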

Algorithm training and validation

After data preprocessing, we used RFE with an XGBoost model as the estimator to iteratively remove the least important features22, identifying the most significant and clinically relevant predictive factors. Based on the RFE feature selection results, the dataset of 78 PWS patients was randomly divided into training and testing sets at an 8:2 ratio using a random seed. This study applied a Bayesian optimization algorithm to select the optimal hyperparameters. We used a Gaussian process as the surrogate model and Expected Improvement (EI) as the acquisition function, with negative accuracy as the objective function return value. The hyperparameter search spaces for the XGBoost and RF algorithms were defined separately (Table 3). The iteration count was set to 100, and the termination criterion was reaching the maximum number of iterations (Supplementary Fig. 2–4). The synthetic minority over-sampling technique (SMOTE) was employed (strategy = auto, k_neighbors = 3) to oversample the smaller group and balance the sample sizes between groups. Prediction models were then developed using the XGBoost and RF methods. We comprehensively evaluated model performance and selected the optimal model using the confusion matrix, the receiver operating characteristic (ROC) curve, and evaluation measures such as accuracy, precision, recall, F1 score, and the area under the curve (AUC). The validation cohort was used to assess the performance of both models in the same way. We interpreted the impact of each feature in the model using Feature Importance and the SHapley Additive exPlanations (SHAP) method with Tree Explainer23.
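To make the oversampling step concrete, the core idea of SMOTE is to create each synthetic minority sample on the line segment between a real minority sample and one of its nearest minority-class neighbours. The sketch below is a simplified pure-Python illustration with k_neighbors = 3, not the library implementation used in the study; the sample points are hypothetical.

```python
import random

def smote(minority, n_new, k_neighbors=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k_neighbors])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical minority-class feature vectors, for illustration only.
minority_points = [[1.0, 2.0], [2.0, 3.0], [3.0, 1.0], [4.0, 4.0]]
new_points = smote(minority_points, n_new=5)
```

Because the interpolation produces fractional values, synthetic samples for encoded categorical features no longer correspond to real categories; tree-based models tolerate this, but it is worth keeping in mind when interpreting feature values.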

Table 3 Bayesian optimization search space and optimal values for hyperparameters in extreme gradient boosting (XGBoost) and random forest (RF) efficacy prediction models.
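The acquisition step of the Bayesian optimization described above can be sketched as follows. At each iteration the Gaussian process surrogate supplies a posterior mean and standard deviation for a candidate hyperparameter setting, and Expected Improvement scores how much that candidate is expected to beat the best objective value so far. Because the objective is negative accuracy, the problem is a minimization; the exploration offset `xi` and the numeric inputs are hypothetical.

```python
import math

def expected_improvement(mu, sigma, best, xi=0.0):
    """EI for a minimisation problem, given the surrogate's posterior mean
    (mu) and standard deviation (sigma) at a candidate point and the best
    (lowest) objective value observed so far."""
    improvement = best - mu - xi
    if sigma <= 0.0:
        return max(improvement, 0.0)
    z = improvement / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return improvement * cdf + sigma * pdf

# Hypothetical candidates: same predicted negative accuracy, different uncertainty.
best_so_far = -0.85
ei_certain = expected_improvement(mu=-0.85, sigma=0.01, best=best_so_far)
ei_uncertain = expected_improvement(mu=-0.85, sigma=0.05, best=best_so_far)
```

The second candidate scores higher even though its predicted accuracy is identical, which is how EI balances exploiting good regions against exploring uncertain ones.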

Statistical analysis

The baseline characteristics of the patients were compared between the different groups (Table 1, Supplementary Table 1). Categorical data were compared using Pearson’s chi-square or likelihood ratio chi-square tests, with results presented as frequency and percentage. Continuous variables were compared using the Mann-Whitney U test, and the results were expressed as M (Q1, Q3). The data analysis was conducted using SPSS 26.0 software. Feature selection, model training, validation, evaluation, and interpretation analyses were performed using Python 3.9.13. The primary data analysis packages included xgboost 1.7.6, scikit-learn 1.1.3, and shap 0.44.0. The model interpretation also utilized the built-in Feature Importance provided by the XGBoost algorithm. A p-value < 0.05 was deemed statistically significant.
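For readers who want to see what the Pearson chi-square comparison computes, the sketch below works through a contingency table by hand. The study itself used SPSS; this is only an illustration without continuity correction, and the closed-form p-value is implemented for 1 or 2 degrees of freedom only. The row and column totals in the example match the training cohort (36 male, 42 female; 21 ineffective, 57 effective), but the individual cell counts are hypothetical.

```python
import math

def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

def chi_square_p_value(stat, dof):
    """Upper-tail p-value; closed forms exist for 1 and 2 degrees of freedom."""
    if dof == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if dof == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("only 1 or 2 degrees of freedom are handled in this sketch")

# Rows: male, female. Columns: ineffective, effective. Cell counts are hypothetical.
gender_table = [[10, 26], [11, 31]]
stat, dof = pearson_chi_square(gender_table)
p_value = chi_square_p_value(stat, dof)
```

With cell counts this close to the expected values the statistic is near zero and the p-value is far above 0.05, which is the pattern a non-significant variable such as gender would show.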

Table 1 Patients’ baseline clinical characteristics analysis in the training cohort.

Results

Patient characteristics

This study enrolled 131 facial PWS patients, with 78 assigned to the training cohort and 53 to the validation cohort. The baseline characteristics in the training cohort are presented in Table 1. There were 36 male and 42 female patients, aged between 2 and 47 years. In the training cohort, treatment was ineffective in 21 patients and effective in 57 patients. There were no statistically significant differences between the two groups regarding gender, lesion subtype, PDL treatment history, age, and the interval between treatment sessions (p > 0.05). In contrast, significant differences were observed between the groups in terms of the dermoscopy vascular pattern, immediate fluorescence intensity following HMME-PDT, and the FSASI score (p < 0.05). This suggests that these factors may have an impact on the treatment efficacy.

Feature selection

We performed feature selection using XGBoost-RFE to obtain meaningful features. Further optimization of the feature combination was achieved using 3-fold cross-validation. The accuracy of the RFE model using XGBoost as the estimator peaked when retaining 4 or 5 features (Fig. 2, Table 2). Ultimately, the algorithm opted for a concise subset comprising 4 features. At this stage, the important features were dermoscopy vascular pattern, IFI after HMME-PDT, the FSASI score, and age.
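The elimination loop behind this selection can be sketched as below. In the study the importance scores came from an XGBoost model refitted at each round and the subset size was chosen by 3-fold cross-validation; here a fixed table of hypothetical scores stands in for the estimator, and the cross-validated accuracies are invented solely to illustrate a tie between 4 and 5 features. Breaking the tie toward the smaller subset is an assumption about how the choice was made.

```python
def recursive_feature_elimination(features, importance_fn, n_keep):
    """Drop the least important feature one at a time until n_keep remain.
    importance_fn(remaining) returns {feature: score} for the current subset."""
    remaining = list(features)
    eliminated = []
    while len(remaining) > n_keep:
        scores = importance_fn(remaining)
        weakest = min(remaining, key=lambda f: scores[f])
        remaining.remove(weakest)
        eliminated.append(weakest)
    return remaining, eliminated

def pick_n_features(mean_cv_accuracy):
    """Smallest subset size that reaches the best mean CV accuracy."""
    best = max(mean_cv_accuracy.values())
    return min(n for n, acc in mean_cv_accuracy.items() if acc == best)

# Hypothetical importance scores standing in for a refitted XGBoost model.
HYPOTHETICAL_SCORES = {
    "dermoscopy_pattern": 0.30, "ifi": 0.28, "fsasi": 0.18, "age": 0.10,
    "interval": 0.05, "lesion_subtype": 0.04, "pdl_history": 0.03, "gender": 0.02,
}

def toy_importance(remaining):
    return {f: HYPOTHETICAL_SCORES[f] for f in remaining}

# Hypothetical mean 3-fold accuracies by number of retained features.
cv_accuracy = {1: 0.70, 2: 0.74, 3: 0.77, 4: 0.81, 5: 0.81, 6: 0.79, 7: 0.78, 8: 0.78}
n_keep = pick_n_features(cv_accuracy)
kept, dropped = recursive_feature_elimination(
    list(HYPOTHETICAL_SCORES), toy_importance, n_keep)
```

Preferring the smaller of two equally accurate subsets keeps the model simpler and reduces the number of measurements a clinician has to collect.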

Fig. 2

Flowchart of the recursive feature elimination (RFE) with an Extreme Gradient Boosting (XGBoost) model as the estimator (XGBoost-RFE). The four curves in the figure represent the performance of the XGBoost-RFE model in 3-fold cross-validation and its mean performance. Based on the blue curve, the accuracy peaked when 4 or 5 features were retained.

Table 2 Performance evaluation results of the XGBoost-RFE model with varying numbers of retained features.

Comparing machine learning performance

Using the four selected features, Bayesian optimization was run within the search space described above, with a Gaussian process as the surrogate model and EI as the acquisition function, and the optimal hyperparameter values were obtained (Table 3, Supplementary Fig. 2–4). The HMME-PDT efficacy prediction models were developed on the training set using the XGBoost and RF algorithms under the optimal hyperparameter combination. The fitted models were then evaluated on the test set (Table 4). The overall accuracy of the XGBoost prediction model was 0.875. Its weighted average precision, recall, and F1 score all exceeded 0.85, demonstrating excellent fitting performance. The overall accuracy of the RF prediction model was 0.8125, with weighted average precision, recall, and F1 score exceeding 0.80. These results were also considered excellent. We further assessed the prediction models using ROC curve analysis. The ROC curves of both models lay well above the baseline of random classification, represented by the 45-degree diagonal line. The AUC values were 0.8638 for the XGBoost model and 0.8818 for the RF model (Fig. 3). In summary, both models exhibited excellent predictive performance. Although its AUC was slightly lower, the XGBoost model had the higher accuracy and F1 score, so its overall performance in predicting both effective and ineffective outcomes was judged to be better.
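The weighted-average measures reported above are all derived from the confusion matrix, as the sketch below shows. An 8:2 split of 78 patients would leave about 16 patients in the test set, so an accuracy of 0.875 is consistent with 14 of 16 correct predictions and 0.8125 with 13 of 16; this is an inference from the stated numbers, and the confusion matrix used here is hypothetical rather than taken from Table 4.

```python
def weighted_metrics(confusion):
    """confusion[i][j] = number of samples of true class i predicted as class j.
    Per-class precision, recall and F1 are averaged with class-size weights."""
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[i][i] for i in range(n_classes)) / total
    precision = recall = f1 = 0.0
    for k in range(n_classes):
        tp = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n_classes))
        actual_k = sum(confusion[k])
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = actual_k / total  # support-weighted average
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical test-set confusion matrix. Rows: true ineffective, true effective.
example_confusion = [[3, 1], [1, 11]]
metrics = weighted_metrics(example_confusion)
```

Note that support-weighted recall always equals overall accuracy, so on a test set this small the precision and F1 score carry most of the additional information.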

Table 4 Comparison of the performance of XGBoost and RF models by confusion matrix and evaluation measures.
Fig. 3

Receiver operating characteristic (ROC) curve of (a) XGBoost and (b) Random Forest (RF) on the testing dataset.
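The AUC summarized by these curves has a direct probabilistic reading: it is the chance that a randomly chosen effective case receives a higher predicted score than a randomly chosen ineffective case. A minimal sketch of that pairwise definition follows; the labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case is scored
    above a randomly chosen negative case (ties count one half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical labels (1 = effective) and predicted probabilities.
example_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Because AUC depends only on the ranking of scores while accuracy depends on a single decision threshold, the two measures can order models differently on a small test set.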

Explanatory analysis of the prediction model

For the global interpretation, the features of the XGBoost model ranked by feature importance score, from highest to lowest, were: IFI after HMME-PDT, dermoscopy vascular pattern, FSASI score, and age (Fig. 4). We used SHAP to quantify the contribution of each feature to the prediction results. Ranked in descending order of mean absolute SHAP value, the features were dermoscopy vascular pattern, IFI after HMME-PDT, FSASI score, and age (Figs. 5 and 6). Additionally, we utilized SHAP for local interpretation of the data. The SHAP distributions between groups exhibited significant imbalance for categorical variables. Superficial type under dermoscopy and intense fluorescence intensity both had SHAP values > 0, indicating correlation with better efficacy (Fig. 7a-b). There was a clear trend in the distribution of the FSASI score and age data points (Fig. 7c-d). The SHAP values for the FSASI score increased with the feature value and reached a peak, demonstrating a positive correlation with efficacy. Age was negatively correlated with efficacy.

Fig. 4

Importance of each feature during the training process of the XGBoost model.

Fig. 5

Ranking of the average absolute SHapley Additive exPlanations (SHAP) values for each feature. The x-axis represents the absolute value of the average SHAP values for each feature, indicating the degree of its influence on the prediction results. Higher SHAP values indicate a greater impact of the feature on predicting a specific outcome.

Fig. 6

Overview of SHAP distribution for each feature in the XGBoost model for PWS. The x-axis represents the SHAP values attributed to each feature, and the y-axis lists the features in descending order of importance. Each point in the plot represents a patient, with the color indicating the feature value in the dataset: red denotes higher values, while blue represents lower values. Positive SHAP values suggest that the feature prompts the model to predict the therapy as effective, while negative SHAP values suggest the feature causes the model to predict the therapy as ineffective.

Fig. 7

SHAP distribution of key clinical features in the XGBoost efficacy prediction model for PWS. (a) Dermoscopy vascular pattern; (b) IFI at the lesion site after HMME-PDT; (c) The facial port-wine stain area and severity index (FSASI) score; (d) Age. The x-axis represents the statistical values of each feature (with the x-axis in (a) and (b) showing the encoded values of the features), and the y-axis shows the SHAP values attributed to each feature. Each point on the plot indicates the marginal contribution of a sample to the model’s prediction. The color gradient, ranging from blue to red, illustrates the value of another feature that may interact with the primary feature displayed on the x-axis. The color scale is consistent across all subplots, with blue indicating lower values and red indicating higher values of the secondary feature.

Model validation

A comparison of baseline characteristics between the training cohort (n = 78) and validation cohort (n = 53) is presented in Supplementary Table 1. No statistically significant differences were observed between the two groups in terms of gender, age, PDL treatment history, lesion type, IFI, and FSASI scores (p > 0.05). In the validation cohort, the AUC values were 0.7672 for the XGBoost model and 0.7557 for the RF model (Supplementary Fig. 1). Supplementary Table 2 compares the F1 score, accuracy, precision, and recall of the two models. The XGBoost model achieved values greater than 0.73 for all of these metrics and had the better overall performance.

Discussion

PWS is a congenital disfiguring condition with an incidence rate of approximately 0.3–0.5%. Although PDL is the gold standard clinical treatment for PWS, only 10-20% of patients achieve near-complete lesion resolution after multiple treatments, with a high risk of recurrence24. Therefore, efficient treatment strategies are needed to reduce or eliminate skin lesions in PWS patients. HMME-PDT was introduced into clinical practice in 2017 as a novel technology. Due to its superior efficacy and safety, HMME-PDT is increasingly used in the treatment of PWS in China. It is noteworthy that the efficacy of HMME-PDT is influenced by various factors, such as the type of PWS, dermoscopic pattern, lesion area, site and thickness, patient’s age at treatment, and previous history of PDL treatment. Numerous studies have evaluated and predicted the efficacy of HMME-PDT in treating PWS. However, a single, precise indicator for accurately predicting HMME-PDT efficacy in real-world settings still needs to be identified. This limitation makes it difficult for clinicians to achieve precise and personalized treatment based on patient characteristics.

Machine learning has demonstrated extensive potential applications in clinical research and medical practice by leveraging its advantages in accuracy, large-scale data processing, and self-learning optimization. It has improved diagnostic accuracy and extended the comprehensive management of complex diseases, meeting various medical needs. In this study, we developed machine learning-based prediction models for predicting the efficacy of HMME-PDT in PWS patients. This lays the foundation for developing an artificial intelligence system that can be widely applied in clinical practice, offering a new approach for early efficacy assessment and timely intervention.

XGBoost is an efficient supervised learning algorithm suitable for regression, classification, and ranking problems. It is based on the principle of Gradient Boosting Trees, continuously adding decision trees to correct prediction errors and approximate the true values. The algorithm uses parallel computing and optimization techniques to reduce training costs and automatically handles missing values, enhancing the model’s predictive capability25,26. The other algorithm, RF, is an ensemble learning method that constructs multiple decision trees and combines their prediction results to augment the model’s performance27,28. Although both XGBoost and RF are decision tree-based machine learning algorithms, they differ significantly in their computational processes. The RF algorithm uses the Bootstrap Aggregating method to perform random sampling and independently train decision trees in parallel. In contrast, XGBoost builds each decision tree based on the performance of the previous tree, creating a serial relationship between the trees. Furthermore, the XGBoost algorithm has low training costs and fast computation speed, and it offers a rich set of hyperparameters for optimizing performance. Although RF is a powerful ensemble learning method, its performance can be affected by certain factors. If some classes have few samples in the dataset, RF may have difficulty adequately learning the features of those minority classes. XGBoost can better handle the sample imbalance problem by weighting the loss function. In addition, XGBoost is able to automatically capture the non-linear interactions between features, while RF is weaker in this aspect, which may lead to a decrease in prediction performance29,30. These differences were also reflected in the present study.
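The structural difference between the two ensembles can be illustrated with one-split regression trees (stumps) on a single feature. This is a deliberately simplified sketch: real random forests also subsample features, and XGBoost adds regularization and second-order gradient information, none of which is modelled here. The data points and function names are hypothetical.

```python
import random

def fit_stump(xs, ys):
    """Best single-threshold split on one feature, minimising squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    if best is None:  # all x identical: fall back to a constant prediction
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def bagging(xs, ys, n_trees, seed=0):
    """Random-forest style: each stump sees its own bootstrap sample and is
    trained independently; predictions are averaged."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in xs]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(s(x) for s in stumps) / len(stumps)

def boosting(xs, ys, n_trees, learning_rate=0.5):
    """Gradient-boosting style: each stump is fitted to the residuals left
    by the stumps before it, so the trees form a sequence."""
    stumps = []
    residuals = list(ys)
    for _ in range(n_trees):
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        residuals = [r - learning_rate * stump(x) for x, r in zip(xs, residuals)]
    return lambda x: learning_rate * sum(s(x) for s in stumps)

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical one-dimensional data, for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0]
forest = bagging(xs, ys, n_trees=25)
boosted = boosting(xs, ys, n_trees=10)
```

In `bagging` the loop iterations do not depend on one another and could run in parallel, whereas in `boosting` each iteration needs the residuals from the previous one, which is the serial relationship described above.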

This study primarily involved data preprocessing, feature selection, algorithm training and testing, and performance evaluation steps. Further validation was conducted using patient data from different time periods. After data preprocessing, the XGBoost-RFE method identified the dermoscopy vascular pattern classification, IFI after HMME-PDT, FSASI score, and age as predictive factors for HMME-PDT treatment efficacy in PWS. Two prediction models for assessing PWS therapeutic effect were developed using XGBoost and RF machine learning methods. Model performance was compared, with the XGBoost model showing a slightly lower AUC value but higher accuracy and F1 score compared to the RF model. Additionally, the XGBoost model demonstrated a more balanced performance between effective and ineffective groups. In the validation cohort, the XGBoost model outperformed the RF model, achieving higher values for all of the above metrics.

Machine learning algorithms have significant advantages in analyzing complex datasets and fitting potential relationships between features. However, they inevitably suffer from poor interpretability31. This study used feature importance analysis and SHAP to explain the decision-making process and outputs of the model32. We conducted global and local analyses to assess the impact of each clinical feature on the prediction model and the direction of its association with the outcome. The feature importance analysis and SHAP results were inconsistent in this study. The rankings of dermoscopy vascular pattern and IFI after HMME-PDT differed, possibly because the two methods rest on different theoretical foundations. Feature importance analysis evaluates the importance of features during node splitting in the model. It is often computed from the gain from node splitting, which measures the improvement in model performance after splitting on a particular feature. This evaluation reflects the degree to which a feature influences model performance during the decision tree construction33. SHAP, on the other hand, takes the interactions and nonlinear relationships between features into account. It calculates each feature’s contribution to a prediction as a weighted average of its marginal contributions over the possible permutations and combinations of features. This approach emphasizes the impact of features on the actual predictive outcomes of the model, making it more relevant to clinical research and practical applications. Given these advantages, our feature ranking was primarily based on SHAP.
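The permutation-based definition underlying SHAP can be made concrete with a brute-force computation of exact Shapley values. Tree Explainer obtains such values efficiently for tree ensembles; the sketch below instead enumerates every feature ordering, which is only feasible for a handful of features. The toy model, its coefficients, and the baseline are hypothetical and are not the fitted XGBoost model.

```python
from itertools import permutations
from math import factorial

def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one prediction. Features absent from a
    coalition are replaced by their baseline value."""
    names = list(instance)
    contributions = {name: 0.0 for name in names}
    for order in permutations(names):
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = instance[name]
            value = predict(current)
            contributions[name] += value - previous  # marginal contribution
            previous = value
    n_orders = factorial(len(names))
    return {name: total / n_orders for name, total in contributions.items()}

def toy_model(x):
    """Hypothetical score with an interaction between the two features."""
    return 0.2 + 0.3 * x["ifi"] + 0.2 * x["superficial"] + 0.1 * x["ifi"] * x["superficial"]

patient = {"ifi": 1, "superficial": 1}
reference = {"ifi": 0, "superficial": 0}
phi = shapley_values(toy_model, patient, reference)
```

The interaction term is split equally between the two features, and the values sum to the difference between the patient’s prediction and the baseline prediction. Gain-based importance has no such per-prediction additivity, which is one reason the two rankings can disagree.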

In local interpretations based on SHAP, when superficial dermoscopic vascular patterns and strong fluorescence intensity both have SHAP values > 0, the treatment efficacy is generally higher. Previous studies have shown that superficial vascular patterns in PWS correlate with better treatment responses34,35. Furthermore, a prospective cohort study of 163 PWS patients found a significant correlation between PDT efficacy and dermoscopic vascular patterns. Dot-like and short rod-like vascular patterns were highly associated with beneficial outcomes and cures, assisting in predicting treatment efficacy10. Consistent with these reports, our study found that superficial patterns correspond to better treatment outcomes. Hence, the dermoscopic assessment of vascular features in PWS helps to predict the response to PDT and manage patient expectations.

Fluorescence intensity related to PDT has been previously proposed as an indicator for observing treatment response in clinical settings. For example, Wang et al. established a fluorescence method that measures real-time fluorescence spectra of the skin during PDT to determine the local concentration of the photosensitizer. A therapeutic effect correlation index (TECI) was proposed as the area under the photosensitizer concentration-time curve during PDT. The correlation between TECI and PDT outcomes in 31 PWS patients revealed that fluorescence spectroscopy can monitor the concentration of the photosensitizer in the skin during PDT and predict treatment response11. However, because of its complexity, this technique has not yet been widely adopted in clinical practice. To the best of our knowledge, our work is the first to propose IFI at the lesion site after HMME-PDT as a new indicator and to evaluate its role in predicting therapeutic efficacy.

Our results also indicated that the IFI after HMME-PDT was the second most important factor for HMME-PDT. When the fluorescence intensity was categorized as intermediate or intense, all the SHAP values were greater than 0, indicating a significant impact on model prediction and association with better therapeutic outcomes. To address the varying fluorescence patterns of skin lesions among different patients, we hypothesize the following possible mechanisms. There is variability in the accumulation of HMME due to factors such as heterogeneity in vascular abnormalities of the lesions and individual patient differences36. After PDT, HMME and its photobleaching products can absorb UVA energy and emit fluorescence. Hence, variations in HMME accumulation result in differences in fluorescence intensity. To summarize, our study indicates that higher IFI after HMME-PDT is associated with better therapeutic outcomes. However, the specific mechanisms underlying the fluorescence response and associated influencing factors require further investigation.

The FSASI score is an area and severity index scoring system for facial PWS that divides the face into four regions: forehead, right malar, left malar, and perioral. It provides a comprehensive assessment of PWS lesions based on the percentage of the area affected, lesion color, and thickness. The total scores range from 0 to 42, with higher scores indicating greater severity of the condition. As the facial PWS lesions are lightened, the FSASI scores gradually decrease19. SHAP analysis indicated that the FSASI score significantly impacts the PWS efficacy prediction model, and the SHAP values for individual samples show a positive correlation with the outcome for PWS. Previous studies have shown that larger areas and greater thickness of PWS lesions are related to poorer curative effects37,38. However, our results are contrary to these reports. This discrepancy might be attributed to smaller, lighter, and thinner lesions showing little changes in the FSASI scores after a single HMME-PDT session. The difference may also reflect a limitation of the FSASI scoring method. Future research should focus on developing more objective and practical efficacy assessment methods for clinical use.

The age of PWS patients undergoing PDT also influences the efficacy. Studies have shown that treatment outcomes for PWS are better in patients under the age of 1839. Our study also reveals that younger PWS patients may benefit more from HMME-PDT, which aligns with existing research. This result suggests that early interventions can improve treatment outcomes.

This study presents the IFI after HMME-PDT as a novel metric and applies it to predict the efficacy of HMME-PDT in PWS. As a direct indicator of the distribution and activation status of the photosensitizer in the lesion tissue, fluorescence intensity may be closely associated with treatment outcomes. Our study innovatively employs two machine learning algorithms to construct predictive models for the efficacy of HMME-PDT for PWS. The prediction models can help physicians to better predict patients’ response to treatment, thereby optimizing the treatment plan and improving treatment outcomes. In addition, our study provides new ideas for building predictive models for treating PWS, especially by applying machine learning algorithms.

Our study had some limitations. As a retrospective study, it is subject to potential biases in the data collection process, which may reduce model stability and increase the risk of overfitting. Furthermore, as the first study to explore the IFI after HMME-PDT for assessing and predicting the efficacy of HMME-PDT in PWS, it was limited by a relatively small sample size and the lack of external validation data. This may limit the model’s ability to learn fully from the data, leading to increased instability and a potential risk of overfitting. These limitations underscore the need for future research, particularly large-scale, multi-center, prospective cohort studies, to further validate our findings and provide more robust and reliable evidence to support the treatment of PWS.

Conclusions

Our study utilized the XGBoost-RFE feature selection method to identify four key clinical features and built two predictive models for the therapeutic effect of HMME-PDT in PWS patients using two different machine learning algorithms. The XGBoost prediction model demonstrated better performance. Furthermore, for the first time, we proposed and explored the relationship between IFI at the lesion site after HMME-PDT and the treatment efficacy. Future studies could increase sample size and include multicenter data to validate model generalizability and robustness. This would allow physicians to quickly assess patients’ treatment responses in clinical practice.