Introduction

Facial nerve palsy (FNP) is classified as either central or peripheral, depending on the suspected primary site of the disease. Central FNP results from upper motor neuron injuries affecting structures such as the cerebral cortex, pons, and corticobulbar tract1,2. In contrast, peripheral FNP occurs due to facial nerve injury, causing paralysis on the ipsilateral side of the face, including the forehead2.

Peripheral FNP has diverse causes, including idiopathic factors, varicella-zoster virus infection, chronic otitis media, and trauma. The most common cause is idiopathic, referred to as Bell’s palsy. A study reported that the monthly prevalence of Bell’s palsy ranges from 7.7 to 9.1 per 100,000 population3. Since viral infection and inflammation are considered potential causes of Bell’s palsy, steroids are widely used for treatment1,4. Although most cases of Bell’s palsy resolve successfully, some patients experience incomplete recovery despite medical therapy1,4,5. Surgical treatment may be considered in cases of severe FNP with minimal likelihood of recovery, even after medical management1,4,6. However, its efficacy remains controversial, and it carries risks of postoperative complications and residual sequelae6,7.

Various evaluation tools have been developed to assess the current status of FNP, improve prognostic prediction, and guide early rehabilitation or more intensive treatment. To date, more than 10 grading systems, including the House–Brackmann facial paralysis scale (H–B grade)8 and the Sunnybrook facial grading system, have been introduced. The H–B grade system, developed in 1985, is the most widely used8. Electrophysiologic evaluations, including electroneurography (ENoG) and electromyography (EMG), have been used for assessment and prognostic evaluation. However, most tests have limited prognostic value due to poor reliability9. Notably, ENoG is considered a valuable tool for determining the need for surgical intervention in Bell’s palsy and predicting prognosis10. Despite this, ENoG has limitations in predicting recovery rates or the natural course of moderate FNP11. Given that most recovery occurs within the first 3 weeks and approximately 80% of Bell’s palsy cases achieve favorable outcomes4,5,12,13, predicting the prognosis of non-recovered cases at an early stage may support clinical decision-making.

Machine learning algorithms have emerged as promising tools for predicting clinical outcomes across various diseases using historical medical data. A previous study demonstrated the feasibility of predicting hearing recovery with a deep learning model14. Similarly, studies have demonstrated that Recurrent Neural Network (RNN) models can predict visual field recovery15,16. A recent study introduced a strategy using explainable deep learning approaches with attention-based mechanisms and the Synthetic Minority Oversampling Technique (SMOTE), which exhibited good performance in predicting brain tumors and Alzheimer’s disease by addressing data imbalance and improving both model stability and interpretability17. Efforts have also been made to apply deep learning to FNP, with some studies exploring models for the diagnosis and grading of FNP18,19. However, machine learning models for predicting the prognosis of FNP remain largely unexplored.

Herein, we aimed to predict the early prognosis of FNP with machine learning models, using clinical data obtained during the early stage after onset.

Materials and methods

Subjects inclusion and data curation

We retrospectively reviewed the electronic medical records of 407 patients who visited the clinic or emergency room for FNP between January 1, 2010, and March 31, 2024. Of these, 390 patients were initially selected who had an FNP grade recorded at the initial visit, follow-up information, an age between 10 and 100 years, and available data on sex, diabetes, and hypertension. Among them, 279 patients with at least two visits with FNP grade evaluations within 30 days of onset and at least one FNP grade evaluation between 30 and 300 days after onset were selected as the study subjects. The subjects were further divided into a main dataset for developing the machine learning models and a validation dataset for model evaluation. Considering previous sample size recommendations (approximately 1000 for LSTM and 500 for SVM)20,21,22,23,24,25 and the need for temporal and external validation to prevent information leakage and to simulate the clinical setting in which predictions are made for future patients26,27,28, January 1, 2024, was set as the cutoff date (Fig. 1). Consequently, the training and test datasets were constructed from 267 subjects, and the validation dataset from 12 subjects. In addition, a dataset for predicting the recovery of patients who had not recovered within 30 days (persistent FNP) was constructed from 151 subjects.
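The temporal split described above can be sketched as follows. This is a minimal illustration, not the authors’ code: the record structure and the `onset` field are hypothetical, and splitting on the onset date (rather than, for example, the first visit date) is an assumption, since the text states only the cutoff date.

```python
from datetime import date

# Cutoff date stated in the text; splitting on the onset date is an assumption.
CUTOFF = date(2024, 1, 1)


def temporal_split(patients, cutoff=CUTOFF):
    """Split patient records into development and temporal-validation sets.

    `patients` is a list of dicts with a hypothetical "onset" key holding a
    datetime.date. Records before the cutoff form the development set (later
    split into training and test data); records on or after it are held out.
    """
    development = [p for p in patients if p["onset"] < cutoff]
    validation = [p for p in patients if p["onset"] >= cutoff]
    return development, validation


# Usage with hypothetical records
records = [
    {"id": 1, "onset": date(2023, 12, 30)},
    {"id": 2, "onset": date(2024, 2, 3)},
]
dev, val = temporal_split(records)
print([p["id"] for p in dev], [p["id"] for p in val])  # [1] [2]
```

Holding out later patients, rather than a random subset, mimics applying the model to future patients, which is the rationale given in the text.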

Fig. 1

Subject inclusion. FNP, facial nerve palsy; H–B grade, House–Brackmann grade.

Information on demographic factors and the medical history of FNP was retrieved from electronic medical records. Because delayed treatment (treatment starting more than 7 days after onset), diabetes, and hypertension have been identified as important risk factors for poor FNP prognosis, these features were also retrieved to construct the machine learning models29,30,31,32,33. This study was approved by the Institutional Review Board of Seoul National University Boramae Medical Center (IRB No. 10-2022-90). Given its retrospective nature, the Institutional Review Board of Seoul National University Boramae Medical Center waived the requirement for informed consent. All methods were carried out in accordance with relevant guidelines and regulations. This study was conducted in accordance with the STROBE statement and the TRIPOD statement.

FNP grading system

The H–B grade system was used to evaluate the grades of FNP at each clinic visit (suppl. Table 1)8. In this study, ambiguous H–B grades, described as ranges between adjacent grades in the electronic medical records, were defined by adding 0.5 to the lower grade. A grade of II or better was considered acceptable recovery. The mean H–B grade was calculated by averaging the numerical values of the H–B grades for each patient.
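A minimal sketch of the grade conversion described above, assuming grades are stored as strings such as "III" or "III-IV". The accepted spellings and the helper names `parse_hb` and `acceptable_recovery` are hypothetical.

```python
import re

ROMAN = {"I": 1, "II": 2, "III": 3, "IV": 4, "V": 5, "VI": 6}


def parse_hb(text):
    """Convert a recorded H–B grade such as "III" or "III-IV" to a number.

    A range between adjacent grades becomes the lower grade + 0.5.
    The accepted spellings (Roman or Arabic numerals separated by "-", "–",
    or "~") are assumptions about how ranges appear in the records.
    """
    parts = [p.strip().upper() for p in re.split(r"[-–~]", text)]
    values = sorted(int(p) if p.isdigit() else ROMAN[p] for p in parts)
    if len(values) == 1:
        return float(values[0])
    if len(values) != 2 or values[1] - values[0] != 1:
        raise ValueError(f"not a single grade or adjacent-grade range: {text!r}")
    return values[0] + 0.5


def acceptable_recovery(grade):
    """H–B grade II or better counts as acceptable recovery."""
    return grade <= 2


print(parse_hb("III-IV"), parse_hb("II"), acceptable_recovery(parse_hb("II-III")))
# 3.5 2.0 False
```

Under this rule a recorded range such as II–III (2.5) does not count as acceptable recovery.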

Data preprocessing

Missing values and outliers were excluded during the subject inclusion process prior to data preprocessing. For each patient, \(d_i\) and \(g_i\) represent the date of the \(i\)th clinic visit and the corresponding H–B grade, respectively. \(d_1\) was defined as the day before FNP onset, with \(g_1\) assigned a grade of 1, and the first hospital visit date, \(d_2\), was set as day 0. Overlapping subsequences of three clinic visits within the first 30 days were constructed, such as \((d_1, d_2, d_3), (d_2, d_3, d_4), \ldots, (d_{t-2}, d_{t-1}, d_t)\), with \(d_k\) and \(g_k\) representing the time point and grade after 30 days. Each subsequence \((d_i, d_{i+1}, d_{i+2})\) was used to generate the features \((d_i, d_{i+1}, d_{i+2}, d_k, g_i, g_{i+1}, g_{i+2})\) for the machine learning models. Regression models used both the time and the grade after 30 days, whereas classification models used only the final outcome.
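The windowing step can be sketched as below. This is an illustration under stated assumptions rather than the authors’ implementation: visits are supplied as (day, grade) pairs that already include the pre-onset anchor, "within the first 30 days" is taken as day ≤ 30, and each window is paired with every later visit for regression but only with the final visit for classification, which is one reading of the text.

```python
def make_windows(visits, horizon_day=30, final_only=False):
    """Build overlapping three-visit windows, each paired with a later target visit.

    visits: (day, grade) pairs sorted by day, starting with the pre-onset anchor
            (d_1, g_1 = 1); the first hospital visit is day 0.
    horizon_day: windows use visits on or before this day; targets come after it.
    final_only: pair each window only with the last visit (classification);
                otherwise pair it with every later visit (regression).
    Returns tuples (d_i, d_i+1, d_i+2, d_k, g_i, g_i+1, g_i+2, g_k).
    """
    early = [v for v in visits if v[0] <= horizon_day]
    late = [v for v in visits if v[0] > horizon_day]
    if final_only:
        late = late[-1:]
    samples = []
    for i in range(len(early) - 2):
        (d_a, g_a), (d_b, g_b), (d_c, g_c) = early[i:i + 3]
        for d_k, g_k in late:
            samples.append((d_a, d_b, d_c, d_k, g_a, g_b, g_c, g_k))
    return samples


# Hypothetical patient: onset 2 days before the first visit (day 0)
visits = [(-3, 1), (0, 3), (5, 4), (12, 3), (45, 2), (90, 1)]
print(len(make_windows(visits)), len(make_windows(visits, final_only=True)))  # 4 2
```

Pairing each window with several later visits would explain why the regression datasets contain more data points than the classification datasets built from the same subjects.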

To standardize the time points by setting the present time point \(d_{i+2}\) to 0, the Selected Assessment Days (SAD1–SAD4) were defined as \(d_i - d_{i+2}\), \(d_{i+1} - d_{i+2}\), 0, and \(d_k - d_{i+2}\), and the Selected Assessment Grades (SAG1–SAG4) were defined as \(g_i\), \(g_{i+1}\), \(g_{i+2}\), and \(g_k\). Additional features included gender, age, hypertension, diabetes, and delayed treatment (treatment starting more than 7 days after onset).
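The SAD and SAG definitions above amount to shifting each window so that the present visit falls on day 0. A minimal sketch (the function name and tuple layout are hypothetical):

```python
def to_sad_sag(sample):
    """Re-express one window relative to the present visit d_{i+2}, so SAD3 = 0.

    sample: (d_i, d_i+1, d_i+2, d_k, g_i, g_i+1, g_i+2, g_k)
    Returns (SAD1..SAD4, SAG1..SAG4).
    """
    d_i, d_i1, d_i2, d_k, g_i, g_i1, g_i2, g_k = sample
    sad = (d_i - d_i2, d_i1 - d_i2, 0, d_k - d_i2)
    sag = (g_i, g_i1, g_i2, g_k)
    return sad, sag


print(to_sad_sag((0, 5, 12, 45, 3, 4, 3, 2)))
# ((-12, -7, 0, 33), (3, 4, 3, 2))
```

SAD1 and SAD2 are therefore always negative (past visits), and SAD4 is the number of days from the present visit to the target visit.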

For the classification task predicting patient recovery, patients were labeled as recovered (1) if \(g_k \le 2\) and as not recovered (0) if \(g_k > 2\). Eleven features (gender, age, hypertension, diabetes, delayed treatment, SAD1, SAD2, SAD4, SAG1, SAG2, and SAG3) were used to construct the model (Suppl. Fig. 1). For the regression models predicting the H–B grade at specific time points, 11 features were also used: gender, age, SAD1–SAD4, SAG1–SAG3, and two of the three variables hypertension, diabetes, and delayed treatment (Suppl. Fig. 1).
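The labeling rule and the feature sets listed above can be written down as follows. The snake_case feature names are hypothetical; the three regression combinations are generated with `itertools.combinations`, whose order happens to match Models 1–3 as defined in the Results.

```python
from itertools import combinations


def recovery_label(g_k):
    """1 = recovered (H–B grade <= 2), 0 = not recovered."""
    return int(g_k <= 2)


# Classification features as listed in the text (11 in total).
CLASSIFICATION_FEATURES = [
    "gender", "age", "hypertension", "diabetes", "delayed_treatment",
    "SAD1", "SAD2", "SAD4", "SAG1", "SAG2", "SAG3",
]

# Regression: gender, age, SAD1–SAD4, SAG1–SAG3, plus two of three clinical variables.
BASE_REGRESSION = ["gender", "age", "SAD1", "SAD2", "SAD3", "SAD4", "SAG1", "SAG2", "SAG3"]
CLINICAL = ["delayed_treatment", "diabetes", "hypertension"]
REGRESSION_FEATURE_SETS = {
    f"model_{n}": BASE_REGRESSION + list(pair)
    for n, pair in enumerate(combinations(CLINICAL, 2), start=1)
}

print(REGRESSION_FEATURE_SETS["model_3"][-2:])  # ['diabetes', 'hypertension']
```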

Machine learning models for predicting the prognosis of FNP

Model construction

The predictive models were implemented in Python 3.11. For the classification model predicting patient recovery, scikit-learn 1.5.2 was used; for the regression model predicting the patient’s H–B grade, TensorFlow 2.18.0 and Keras were used. To ensure reproducibility, all experiments were performed with fixed random seeds (random_state = 42 in scikit-learn and tf.random.set_seed(42) in TensorFlow).

Exploratory analysis of feature importance

To identify the key variables affecting model performance, permutation importance analysis was conducted for all subjects using a Support Vector Machine (SVM)34 with an RBF kernel. SAG1 was excluded from this analysis because it had a constant value of 1. All features were standardized to a mean of 0 and a variance of 1 before training. Permutation importance was computed on the entire dataset with accuracy as the metric, using 100 random permutations for each feature. This exploratory analysis assessed relative feature influence rather than model generalizability.
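A minimal sketch of this analysis with scikit-learn’s `permutation_importance`, assuming a feature matrix `X` with SAG1 already removed. The default SVM hyperparameters and the helper name are assumptions; the text does not state which settings were used for this exploratory step, nor how the reported percentages were normalized.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def svm_permutation_importance(X, y, feature_names, n_repeats=100, seed=42):
    """Mean drop in accuracy when each feature is shuffled (RBF-kernel SVM).

    The model is fitted and scored on the same full dataset, so the result
    describes relative feature influence, not generalization.
    Constant columns (such as SAG1) should be removed from X beforehand.
    """
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))  # mean 0, variance 1
    model.fit(X, y)
    result = permutation_importance(
        model, X, y, scoring="accuracy", n_repeats=n_repeats, random_state=seed
    )
    importances = dict(zip(feature_names, result.importances_mean))
    # Sort from most to least important
    return dict(sorted(importances.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    # Synthetic demo: only the first column carries signal
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = (X[:, 0] > 0).astype(int)
    print(svm_permutation_importance(X, y, ["informative", "noise_a", "noise_b"], n_repeats=10))
```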

Classification model for acceptable recovery of facial nerve palsy

SVM34 was used to predict the recovery of patients with FNP. Acceptable recovery was defined as an H–B grade of II or better within 300 days of onset. After preprocessing, 668 data points were obtained from 267 subjects and split into training (N = 534) and test (N = 134) sets in an 8:2 ratio. Independent-samples t tests confirmed no statistically significant differences between the training and test data (Suppl. Table 2). All values were normalized to a mean of 0 and a variance of 1. The same analysis was conducted on 393 data points from 151 subjects who had not recovered from FNP within 30 days, split into training (N = 314) and test (N = 79) sets in an 8:2 ratio (Suppl. Table 3).
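The split, balance check, and normalization can be sketched with scikit-learn and SciPy as below. With `test_size=0.2`, `train_test_split` places ⌈0.2 × n⌉ samples in the test set, which reproduces the reported 534/134 and 314/79 splits. Fitting the scaler on the training data only is an assumption; the text states only that values were normalized.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def split_and_check(X, y, seed=42):
    """8:2 train/test split, per-feature t tests, then standardization."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    # Independent-samples t test for each feature column (training vs test)
    p_values = ttest_ind(X_train, X_test, axis=0).pvalue
    scaler = StandardScaler().fit(X_train)  # mean 0, variance 1 on training data
    return scaler.transform(X_train), scaler.transform(X_test), y_train, y_test, p_values


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(668, 4))  # synthetic stand-in for 668 data points
    y = rng.integers(0, 2, size=668)
    X_tr, X_te, *_ = split_and_check(X, y)
    print(X_tr.shape, X_te.shape)  # (534, 4) (134, 4)
```

Reusing the training-set mean and variance for the test set keeps test statistics out of the preprocessing step.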

The SVM hyperparameters were tuned by grid search using scikit-learn, with fivefold cross-validation on the training data to reduce overfitting. The grid ranged from 0.01 to 30 for the regularization parameter C and from 0.001 to 10 for gamma. For the model involving all subjects, the parameters yielding the highest accuracy were selected.
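A sketch of this search with `GridSearchCV`. Only the bounds of the grid come from the text; the ten log-spaced points per parameter are an assumption. For a classifier, an integer `cv=5` makes scikit-learn use stratified fivefold splits, and the best configuration is refitted on the full training set.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Only the bounds come from the text; the log-spaced grid points are assumptions.
PARAM_GRID = {
    "C": np.geomspace(0.01, 30, 10),
    "gamma": np.geomspace(0.001, 10, 10),
}


def tune_svm(X_train, y_train):
    """Grid search over C and gamma for an RBF-kernel SVM with fivefold CV."""
    search = GridSearchCV(
        SVC(kernel="rbf"), param_grid=PARAM_GRID, scoring="accuracy", cv=5
    )
    search.fit(X_train, y_train)
    return search.best_estimator_, search.best_params_


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(120, 2))  # synthetic demo data
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    print(tune_svm(X, y)[1])
```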

The selected parameters were C = 2.7 and gamma = 0.09, with the radial basis function (RBF) kernel. For the classification model of patients with persistent FNP, the selected parameters were C = 5.6 and gamma = 0.09 (Suppl. Table 4), again with the RBF kernel. Model performance was evaluated using accuracy, sensitivity (recall), and F1-score derived from the confusion matrix.
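The three metrics follow directly from the confusion-matrix counts, with recovery (1) as the positive class. A minimal sketch on hypothetical labels:

```python
def confusion_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (recall), and F1-score from binary confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "accuracy": accuracy, "sensitivity": sensitivity, "f1": f1}


m = confusion_metrics([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 1, 0])
print(round(m["accuracy"], 3), m["sensitivity"], m["f1"])  # 0.667 0.75 0.75
```

Because sensitivity ignores true negatives, a model can have a recall close to 1 while still misclassifying many non-recovered patients; accuracy and F1-score partly capture that trade-off.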

Regression model for predicting the recovery course of facial nerve palsy

Long Short-Term Memory (LSTM)35,36 was used to model the recovery course of FNP. After preprocessing, 1,253 data points were obtained from 267 patients and split into training (N = 1002) and test (N = 251) sets in an 8:2 ratio. Independent-samples t tests confirmed no statistically significant differences between the training and test data (Suppl. Table 5). The same analysis was conducted for the 151 patients with persistent FNP, whose data (N = 869) were split into training (N = 695) and test (N = 174) sets in an 8:2 ratio. The training and test data showed no statistically significant differences in demographic factors, underlying diseases, clinic visit timing, or H–B grade distribution (Suppl. Table 6). The validation data consisted of 33 day-grade pairs from 12 individuals and were used to validate the regression model of FNP recovery for all subjects.

The model consists of three LSTM layers and one output layer. The input data are structured as a 4 × 3 matrix representing two past time points, one present time point (SAD3 = 0), and one future time point (SAD4 > 0). Each time point includes the corresponding H–B grade, except that the H–B grade of the future time point (the prediction target) is left blank so that the model learns to generate it during training. The input also includes four key features related to FNP: gender and age, which are considered essential factors influencing FNP, and two additional features selected from delayed treatment, diabetes, and hypertension, yielding three different feature combinations.
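The text fixes the input size at 4 × 3 but does not spell out the column layout. One layout consistent with that size, used here purely as an illustrative assumption, places the four time points in rows and, in the columns, the day offset, the grade, and one of the four static features; the blank target grade is filled with a placeholder value of 0, also an assumption.

```python
import numpy as np


def build_lstm_input(sad, sag, static, blank=0.0):
    """Assemble one 4 x 3 input matrix (hypothetical layout).

    sad:    (SAD1, SAD2, SAD3 = 0, SAD4) day offsets
    sag:    (SAG1, SAG2, SAG3) grades; the SAD4 grade is the target and is left blank
    static: four features, e.g. (gender, age, diabetes, hypertension)
    Row j = time point j; columns = (day offset, grade, one static feature).
    """
    if len(sad) != 4 or len(sag) != 3 or len(static) != 4:
        raise ValueError("expected 4 day offsets, 3 grades, and 4 static features")
    grades = list(sag) + [blank]  # target grade replaced by a placeholder
    return np.column_stack([sad, grades, static]).astype(np.float32)


x = build_lstm_input((-12, -7, 0, 33), (3, 4, 3), (1, 63, 0, 1))
print(x.shape)  # (4, 3)
```

Stacking such matrices gives a batch of shape (n, 4, 3), the `(timesteps, features)` layout that Keras LSTM layers expect per sample.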

The first LSTM layer has 64 units, the second 32 units, and the third 16 units. All LSTM layers use the tanh activation function, 20% dropout to prevent overfitting, and batch normalization to enhance training stability. The output layer is a Dense layer that produces a single continuous value. Mean squared error (MSE) was used as the loss function to minimize the difference between predicted and actual values and was also monitored as a performance metric. The model was optimized using the Adam optimizer with a learning rate of 0.005.
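The architecture above can be sketched in Keras as follows. Applying BatchNormalization and then Dropout(0.2) after each LSTM layer is an assumed ordering, since the text states only that both are used; only the last LSTM layer returns a single vector (`return_sequences=False`) so that the Dense layer outputs one value per sample.

```python
import tensorflow as tf
from tensorflow import keras


def build_model(input_shape=(4, 3), learning_rate=0.005, seed=42):
    """Three stacked LSTM layers (64, 32, 16 units) and a one-unit Dense output."""
    tf.random.set_seed(seed)
    model = keras.Sequential([keras.Input(shape=input_shape)])
    for units, return_sequences in ((64, True), (32, True), (16, False)):
        model.add(keras.layers.LSTM(units, activation="tanh",
                                    return_sequences=return_sequences))
        model.add(keras.layers.BatchNormalization())  # assumed order: BN, then dropout
        model.add(keras.layers.Dropout(0.2))
    model.add(keras.layers.Dense(1))  # single continuous H–B grade
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
        loss="mse",
        metrics=[keras.metrics.MeanSquaredError()],
    )
    return model
```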

During training, 10% of the total data was set aside as a validation set to continuously evaluate the model’s performance throughout the process. To efficiently derive the optimal model, Early Stopping and ReduceLROnPlateau were applied.

When using all data, the batch size was set to 128. Training was stopped if the validation loss did not improve for 40 epochs, and the learning rate was reduced by 50% if the validation loss did not improve for 20 epochs. For the subjects with persistent FNP, the batch size was adjusted to 64 due to differences in dataset size. In this setting, training was stopped if the validation loss did not improve for 80 epochs, and the learning rate was reduced by 50% if the validation loss did not improve for 30 epochs. To prevent excessive reduction in the learning rate, the minimum learning rate was set to 1e-6. Detailed model configurations and hyperparameters are provided in Supplementary Table 7. The model maintains the same structure regardless of the feature combination and is designed to perform consistently across all three combinations (Suppl. Fig. 2).
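The two training configurations can be expressed as Keras callbacks. The patience values, the 50% reduction factor, and the minimum learning rate come from the text; `restore_best_weights=True` is an assumption, and the dictionary keys are hypothetical.

```python
from tensorflow import keras

# Settings reported in the text for the two training scenarios.
TRAINING_SETTINGS = {
    "all_subjects": {"batch_size": 128, "stop_patience": 40, "lr_patience": 20},
    "persistent_fnp": {"batch_size": 64, "stop_patience": 80, "lr_patience": 30},
}


def make_callbacks(setting):
    """Early stopping and learning-rate reduction, both monitoring validation loss."""
    cfg = TRAINING_SETTINGS[setting]
    return [
        keras.callbacks.EarlyStopping(
            monitor="val_loss", patience=cfg["stop_patience"], restore_best_weights=True
        ),
        keras.callbacks.ReduceLROnPlateau(
            monitor="val_loss", factor=0.5, patience=cfg["lr_patience"], min_lr=1e-6
        ),
    ]


# Usage (X_train, y_train, and the epoch limit are placeholders):
# model.fit(X_train, y_train, validation_split=0.1,
#           batch_size=TRAINING_SETTINGS["all_subjects"]["batch_size"],
#           epochs=1000, callbacks=make_callbacks("all_subjects"))
```

Keras’ `validation_split` takes the last fraction of the provided samples before any shuffling, so shuffling the training arrays beforehand avoids a validation set drawn only from the end of the data.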

The model was evaluated using Root Mean Squared Error (RMSE), MSE, and Mean Absolute Error (MAE) as performance metrics.
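The three regression metrics differ only in how the per-prediction errors are aggregated. A minimal sketch on hypothetical grades:

```python
import math


def regression_metrics(y_true, y_pred):
    """MSE, RMSE, and MAE between actual and predicted H–B grades."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": sum(abs(e) for e in errors) / n}


print(regression_metrics([2, 3, 4], [2.5, 3, 3])["MAE"])  # 0.5
```

MAE and RMSE are expressed in H–B grade units, so an MAE below 0.5 means predictions deviate from the recorded grade by less than half a grade on average; RMSE weights large misses, such as the grade 5 cases, more heavily.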

Results

Demographics and clinical characteristics

The mean age of the subjects in the training set was 56.57 ± 17.59 years. There were 136 male subjects (50.94%) and 131 female subjects (49.06%). The mean H–B grade was 3.25 ± 0.84, with 18.35%, 40.45%, 33.33%, and 7.87% of subjects classified as H–B grades II, III, IV, and V, respectively (grades recorded as a range were assigned to the more severe grade). The first visit occurred a mean of 2.35 ± 3.11 days after FNP onset, and the mean follow-up duration was 88.84 ± 49.09 days. Of the subjects, 64.79% received treatment within the first week, whereas 35.21% started steroid administration more than 1 week after onset. Among all subjects, 25.09% had diabetes and 31.09% had hypertension (Table 1).

Table 1 Demographic factors and clinical characteristics of the subjects.

For the 12 subjects (4 males and 8 females) in the validation set, the mean age was 58.67 ± 19.15 years. Their detailed values for demographic factors, underlying diseases, H–B grades, delayed treatment status, and follow-up patterns are described in Table 1.

Clinical course of facial nerve palsy

The changes in FNP over time were examined and plotted for all 1425 day-grade pairs. FNP worsened abruptly during the first week after onset and then recovered gradually. The most severe grade occurred in the latter half of the first week, with a mean H–B grade of approximately 3.5. By the fifth week, the mean grade had improved to around H–B grade II, which represents an acceptable treatment outcome for FNP (Fig. 2).

Fig. 2

Clinical recovery course of facial nerve palsy. FNP, facial nerve palsy; H–B grade, House–Brackmann grade.

Prediction of prognosis in facial nerve palsy using artificial intelligence

Feature importance

Among the features used to construct the models, SAG3 had the highest importance (24.43%), followed by SAD4 (13.72%). Gender (10.57%), hypertension (10.01%), age (9.57%), and diabetes (7.58%) showed relatively high values, whereas the remaining SAD and SAG features showed lower importance (Fig. 3).

Fig. 3

Feature importance analysis was performed using clinical features. The analysis was conducted using permutation importance for all subjects. For each feature, 100 random permutations were performed. SAD, selected assessment day; SAG, selected assessment grade; Tx, treatment.

Classification models

The classification model for all subjects, which used H–B grades from two serial visits within 30 days after onset together with age, gender, delayed treatment, diabetes, and hypertension (Suppl. Table 3), achieved an accuracy of 0.903 (95% CI = 0.853–0.953), a recall of 0.991 (95% CI = 0.970–1.000), and an F1-score of 0.944 (95% CI = 0.912–0.975) (Fig. 4).

Fig. 4

Prediction scores (A) and confusion matrix (B) of the classification model for predicting acceptable recovery from facial nerve palsy. Prediction scores (C) and confusion matrix (D) for patients who had not recovered from facial nerve palsy by 30 days after the initial visit.

The same analysis was performed for individuals with persistent FNP. The classification model achieved an accuracy of 0.848 (95% CI = 0.769–0.927), a recall of 0.946 (95% CI = 0.885–1.000), and an F1-score of 0.898 (95% CI = 0.838–0.950) (Fig. 4).

Regression models

H–B grades from two consecutive visits within 30 days after onset, along with age and gender, were used as features in all regression models. In addition, two of the three variables delayed treatment, diabetes, and hypertension were included as features: delayed treatment and diabetes in Model 1, delayed treatment and hypertension in Model 2, and diabetes and hypertension in Model 3. The specific values of each feature are described in Supplementary Table 5. Model 1 had an MAE of 0.478 (95% CI = 0.415–0.550) in the test data and 0.460 (95% CI = 0.323–0.606) in the validation data, whereas Model 2 performed worse than Model 1 in both the test data (MAE 0.487, 95% CI = 0.420–0.559) and the validation data (MAE 0.581, 95% CI = 0.431–0.760). Model 3 had an MAE of 0.458 (95% CI = 0.401–0.532) in the test data and 0.481 (95% CI = 0.345–0.619) in the validation data (Table 2). Model 3 produced the most accurate predictions for the test data, except for H–B grade 5 at SAD4 (Table 2, Fig. 5), and its training and validation losses followed similar courses with minimal gaps (Suppl. Fig. 2). The course estimated from the validation data also followed a progression similar to the natural course of FNP (Fig. 6), with a low MAE (Table 2).

Table 2 Performance comparison of regression deep learning models based on features.
Fig. 5

Predicted House–Brackmann grades at the fourth selected assessment day in the test set, using the regression model constructed with age, gender, diabetes, and hypertension. Gray bars indicate the number of test data points. The predicted mean H–B grade is shown as a blue dot, and values beyond two standard deviations are marked with black circles. SAG, selected assessment grade.

Fig. 6

Recovery course from facial nerve palsy predicted by the long short-term memory model using age, gender, hypertension, and diabetes as features in the validation dataset.

The same analysis was conducted for individuals with persistent FNP. The MAEs of Models 1, 2, and 3 on the test data were 0.608 (95% CI = 0.515–0.701), 0.680 (95% CI = 0.591–0.777), and 0.637 (95% CI = 0.533–0.747), respectively (Table 2).

Discussion

The present study demonstrated that changes in H–B grades in patients with FNP could be estimated using classification and regression models based on SVM and LSTM, respectively. The classification models showed good accuracy for all patients. In addition, as shown in Figs. 4 and 5, our machine learning models accurately predicted the clinical course of FNP, with favorable MAE values for all patients, including those who had not recovered by the first month. Based on these models, grades at specific time points can be predicted, and the expected recovery rates can be derived from three consecutive grade evaluations.

This study proposed machine learning models that may serve as decision-support tools by estimating the recovery course of FNP, thereby helping clinicians to initiate early rehabilitation or cosmetic interventions for facial reanimation and to decide on aggressive treatment, such as facial nerve decompression4. Because most FNP cases recover within the first 3 weeks, most patients do not receive rehabilitation. In contrast, patients who do not fully recover tend to be lost to follow-up and to seek alternative treatments, such as acupuncture and moxibustion4. Given that appropriate rehabilitation for FNP can improve the recovery rate, providing consultation and rehabilitation based on the predicted recovery potential is crucial for patients with FNP37,38. Additionally, FNP can impose high socio-emotional costs, and consultation for reconstructive intervention should be offered to patients predicted to have an unfavorable prognosis4. However, existing evaluation tools for FNP are limited in predicting the recovery course, which may delay appropriate interventions9. In this regard, the results of this study may help clinicians provide appropriate consultation based on the expected changes in, and recovery of, FNP.

The accuracy of the classification model was 0.903 for all subjects, with a recall of 0.991. Given that other assessment tools have poor reliability and that ENoG is limited in predicting the prognosis of mild or severe FNP9,11, our models may offer more accurate prediction than conventional tools. Notably, predictive performance was largely maintained for patients with persistent FNP, with an accuracy of 0.848 and a recall of 0.946. Furthermore, the prediction method relies on serial H–B grades from three clinic visits and requires no additional examinations or uncomfortable procedures. Using the classification model, physicians can predict successful recovery without additional time, cost, or procedures while maintaining good predictive performance.

Predicting a patient’s H–B grade at a specific time point is a regression problem. Because a patient’s condition changes over time, it is important to track its course. The regression model incorporating diabetes, hypertension, age, and gender (model 3) achieved an MAE of 0.458. Because H–B grades are ordinal values, an MAE of 0.458 means that the predicted grades deviated from the actual grades by less than half a grade on average. Patients with persistent FNP also showed good prediction of H–B grades at SAD4, with an MAE of approximately 0.61 when demographics, diabetes, and delayed treatment were considered. Based on the regression model, the estimated natural course of FNP can be outlined. Therefore, the classification and regression models of this study may help clinicians explain each patient’s expected recovery course, reassure patients, and counsel them on further treatment.

According to the clinical course of FNP shown in Fig. 2, the mean H–B grade reached acceptable recovery about 1 month after onset. However, some patients in the present study did not recover within 1 month after the initial visit. Notably, our machine learning models also predicted grade changes and acceptable recovery well for these patients. Based on these models, clinicians may obtain more accurate predictions of facial palsy outcomes and provide appropriate interventions for patients with FNP without complex evaluations.

Some previous studies identified diabetes and hypertension as risk factors for FNP and found them to be associated with poor prognosis29,30,31. Delayed treatment has also been reported as a significant risk factor for poor prognosis in FNP32,33. Therefore, considering these factors when predicting FNP prognosis may improve classification accuracy and may have contributed to the high accuracy observed in our results. In the present study, the LSTM model incorporating diabetes and hypertension performed best for all subjects, consistent with the permutation importance findings, whereas the model including delayed treatment and diabetes yielded the lowest MAE for subjects with persistent FNP. However, results for all subjects were inconsistent between the training set and the validation data. This may be because no patients in the validation data received delayed treatment. Further studies with larger validation datasets may yield more accurate results that are more consistent with the training set.

In this study, SAG4 was most strongly associated with SAG3, followed by SAD4. As shown in Fig. 2, the typical natural course of FNP remains relatively stable over time, exhibiting a regression pattern. Therefore, the grade at the most recent evaluation and the interval to the target evaluation are likely to strongly influence the prediction of the grade at a given time point39.

Age was also relatively strongly associated with SAG4. Consistent with the present study, previous research showed a relationship between facial palsy recovery and age5. With increasing age, neural regeneration becomes limited40, endothelial function and capillary density decline41,42, low-grade inflammation persists43, muscle atrophy accelerates44, recovery of neuromuscular junctions diminishes44, and denervation atrophy progresses more rapidly44, all of which may contribute to poorer facial nerve recovery in older individuals. Therefore, as suggested by the present study, poorer outcomes in older patients should be anticipated by taking additional prognostic factors into account, and prompt, diverse therapeutic strategies should be implemented accordingly.

In the present study, gender was considered a potential factor for predicting FNP recovery. However, its role as a prognostic indicator remains controversial in previous studies45,46,47,48. One study reported that in women, earlier treatment and younger age were associated with better recovery47, whereas other studies found no significant association45,46,48. Evidence from animal and cellular studies suggests that females may exhibit faster or more effective nerve regeneration than males due to a combination of factors49,50,51,52. Enhanced Schwann cell activity, faster remyelination, a supportive distal nerve microenvironment, and estrogen contribute to this effect49,50,51,52. Given the limited research on gender differences in FNP recovery, further studies are warranted.

This study limited the subjects to Bell’s palsy, which usually has a favorable prognosis. Expanding the study to include Ramsay Hunt syndrome, which has a poorer prognosis, with a larger sample size may provide valuable insights into predicting recovery in intractable peripheral FNP associated with Ramsay Hunt syndrome.

The present study predicted future recovery and patient grades from time-series data using SVM and LSTM. SVM is a widely used machine learning method based on the well-established concept of maximizing the margin of the separating hyperplane34. LSTM, a type of RNN, is highly effective for handling sequential data35,36. It was specifically designed to overcome the vanishing gradient problem that traditional RNNs face during training. LSTM uses a cell state and gating mechanisms to retain important information over long periods while discarding unnecessary information, which enables the model to learn both long-term and short-term dependencies within the data.
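For reference, the standard textbook forms of the two models are given below; they are general formulations, not details reported in this study. In the LSTM equations, \(x_t\) is the input at time step \(t\) (for example, one time point of the input matrix), \(\sigma\) is the logistic sigmoid, \(\odot\) denotes element-wise multiplication, and the \(W\), \(U\), and \(b\) terms are learned weights and biases. In the SVM decision function, \(y_j \in \{-1, +1\}\) are training labels, \(\alpha_j\) are dual coefficients bounded by the regularization parameter \(C\), and \(\gamma\) is the RBF kernel parameter tuned by grid search.

\[
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
\]

\[
f(\mathbf{x}) = \operatorname{sign}\Big(\sum_{j} \alpha_j y_j K(\mathbf{x}_j, \mathbf{x}) + b\Big), \qquad
K(\mathbf{x}, \mathbf{x}') = \exp\big(-\gamma \lVert \mathbf{x} - \mathbf{x}' \rVert^2\big), \qquad
0 \le \alpha_j \le C
\]

The additive update of \(c_t\), gated by \(f_t\), is what allows gradients to propagate across time steps without vanishing as quickly as in a plain RNN; in the SVM, a larger \(\gamma\) makes the kernel more local and a larger \(C\) penalizes margin violations more strongly.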

The limitations of this study include the small sample size and the limited dataset available for applying machine learning models. Although many patients with FNP visited the clinic, a significant number were lost to follow-up after favorable recovery, and some pursued alternative treatments after completing initial therapy. Consequently, the number of subjects, particularly in the validation set, was limited, which restricted the evaluation of the classification model for patients with persistent FNP and may affect the stability and generalizability of the results. In addition, patients with H–B grade 5 were not accurately predicted at SAD4 because such cases were underrepresented. Moreover, the dataset was imbalanced across classes because too few patients were available for balanced sampling, and artificial balancing procedures, such as data synthesis or resampling, were not applied in order to preserve medical reliability. Despite these limitations, the present study demonstrated the potential of deep learning for predicting facial palsy. The distributions of the training and test datasets were similar, and the overall predictive performance of the model was satisfactory. Considering that the primary clinical objective of this study was to identify patients who would recover, the high recall for the recovery class indicates clinically acceptable discriminative performance, even with a slight bias toward the majority class. Because model performance generally improves with larger sample sizes, training and validating models on larger, more balanced datasets may provide better predictions for these patients. Further large-scale studies are warranted to confirm the clinical feasibility of this approach.

Conclusion

The present study demonstrated that changes in FNP grades can be accurately estimated from serial H–B grades at consecutive clinic visits and clinically relevant features, achieving an accuracy of 0.903 for predicting acceptable recovery and an MAE of approximately 0.46 for each clinic visit. Given that FNP results in significant social and emotional costs, appropriate intervention based on the estimated prognosis is necessary. Our machine learning models may assist clinicians in predicting prognosis and providing appropriate consultation and treatment for patients with FNP.