Introduction

The best way to manage vesicoureteral reflux (VUR) in infants remains under debate1. The initial standard of care for most affected children is low-dose antibiotic prophylaxis to prevent febrile urinary tract infections (fUTIs)2. However, prolonged antibiotic prophylaxis increases the risk of antibiotic-resistant bacteria3. Surgical intervention, on the other hand, is mainly favored for children with high-grade VUR, breakthrough UTIs, allergic reactions to antibiotics, poor compliance, or worsening renal scars, and occasionally at the parents’ request4.

Since the FDA approval of Deflux® for VUR, endoscopic treatment has become the first-line option in some centers when surgery is indicated5. Generally, patients with high-grade VUR, breakthrough UTIs, or worsening renal scars are selected for surgical intervention. However, more effective criteria could improve the selection of patients who should undergo surgery rather than continue continuous antibiotic prophylaxis (CAP). In addition, patients who are unlikely to respond to CAP might be identified by pediatricians/nephrologists at earlier visits and referred for surgical intervention sooner.

Here, we investigated the independent risk factors influencing the outcome of CAP in infants with VUR, trained machine learning models to predict this outcome, and evaluated which infants are most likely to benefit from CAP and which should be referred for endoscopic VUR correction at their first visits.

Materials and methods

Study design and participants

We conducted a retrospective study of infants ≤ 2 years of age diagnosed with VUR between 2009 and 2022 at two separate centers: 115 patients in the CAP group from a pediatric nephrology clinic and 110 patients in the endoscopic treatment (ET) group from a pediatric urology department. This study focused on the CAP group; patient information, treatment details, and outcomes for the ET group are provided in Supplementary Material 1.

All patients in the CAP group received CAP and had no history of anti-reflux procedures. Enrollment also required a radionuclide cystography (RNC) performed ≥ 12 months after the diagnosis of VUR. The main goal of CAP was to prevent fUTIs. CAP consisted of low-dose antibiotics (one-third to one-fourth of the therapeutic dose), recommended at bedtime. Antibiotics with high urinary excretion, including trimethoprim-sulfamethoxazole, nitrofurantoin, cefalexin, and ciprofloxacin, were the main choices.

Data collection

The data collected included gender, age at diagnosis, medications, uni- or bilaterality of VUR, dimercaptosuccinic acid (DMSA) differential renal function, VUR grade, dilating or non-dilating reflux on ultrasonography, and the presence of fUTI, prenatal hydronephrosis, ureteral anomaly, bladder dysfunction, neuropathic bladder, failure to thrive, and renal scarring.

VUR was graded on voiding cystourethrography (VCUG) according to the International System of Radiographic Grading of VUR6. In patients with bilateral VUR, the higher of the two grades was reported. Dilating VUR referred to dilation of the renal pelvis and/or ureter on ultrasonography. Bladder dysfunction was defined only in neurologically intact patients who underwent urodynamic studies, whereas neuropathic bladder was defined in children with neurological disorders. Renal scarring was assessed on pre-treatment DMSA scans.

Follow-up of the patients

Urinalysis and urine culture were recommended every three months and whenever the child had a fever. After the diagnosis of VUR, patients were advised to undergo RNC every 12 to 18 months. DMSA scans were also performed after any fUTI. The follow-up duration was defined as the time between the diagnosis of VUR on VCUG and the last RNC requested for the patient.

Clinical outcomes

The primary outcome was the occurrence of fUTI after the start of treatment, new renal scarring, or progression in the stage of a pre-existing scar during follow-up. fUTI was defined as leukocyturia (≥ 5 white blood cells per high-power field) or a positive dipstick for leukocyte esterase on urinalysis, together with growth of a single microorganism at ≥ 10⁵ colony-forming units (CFU)/ml and a body temperature ≥ 38.5 °C.
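The fUTI definition combines three conditions (pyuria, a significant single-organism culture, and fever), which can be written as one predicate. The sketch below is a minimal Python illustration, not the authors' code; the `UrineEpisode` record and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UrineEpisode:
    """One urine work-up; field names are hypothetical, not from the study database."""
    wbc_per_hpf: Optional[float]   # white blood cells per high-power field; None if not reported
    leukocyte_esterase_pos: bool   # positive dipstick for leukocyte esterase
    n_organisms: int               # number of distinct organisms grown in culture
    cfu_per_ml: float              # colony count of the isolated organism (CFU/ml)
    temperature_c: float           # body temperature in degrees Celsius


def is_febrile_uti(ep: UrineEpisode) -> bool:
    """True when pyuria, a single-organism culture >= 1e5 CFU/ml and fever >= 38.5 C coincide."""
    pyuria = (ep.wbc_per_hpf is not None and ep.wbc_per_hpf >= 5) or ep.leukocyte_esterase_pos
    significant_culture = ep.n_organisms == 1 and ep.cfu_per_ml >= 1e5
    fever = ep.temperature_c >= 38.5
    return pyuria and significant_culture and fever


# Example: 8 WBC/HPF, one organism at 2 x 10^5 CFU/ml, 39.0 C
print(is_febrile_uti(UrineEpisode(8, False, 1, 2e5, 39.0)))  # True
```

A mixed culture (two organisms) or a temperature below 38.5 °C makes the predicate false even when pyuria is present.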

The secondary outcome was persistence, improvement, or resolution of VUR on follow-up RNCs. RNC results were reported as no, mild, moderate, or severe VUR. To compare pre-treatment VCUGs with follow-up RNCs, VCUG grades I and II were considered equivalent to mild, grade III to moderate, and grades IV and V to severe VUR. Accordingly, in unilateral VUR, improvement was defined as a decrease in VUR grade on the refluxing side without de novo VUR on the contralateral side (e.g. pre-treatment VCUG grade IV to mild or moderate VUR on follow-up RNC). In bilateral VUR, improvement was defined as a decrease in VUR grade in both kidney-ureter units (KUUs), or a decrease in one KUU with resolution in the other.
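The grade equivalence and the improvement rules can be expressed as a small classifier. In this sketch the dictionaries and the functions `kuu_change` and `patient_outcome` are hypothetical. The improvement rules follow the text; treating resolution as "every refluxing KUU resolved, with no de novo VUR" and assigning every other pattern to failure are assumptions.

```python
# VCUG grades I-V coded 1-5 (0 = no reflux), mapped to the RNC categories used at follow-up.
VCUG_TO_RNC = {0: "none", 1: "mild", 2: "mild", 3: "moderate", 4: "severe", 5: "severe"}
SEVERITY = {"none": 0, "mild": 1, "moderate": 2, "severe": 3}


def kuu_change(vcug_grade: int, rnc_category: str) -> str:
    """Classify the change in one kidney-ureter unit (KUU) between VCUG and follow-up RNC."""
    before = SEVERITY[VCUG_TO_RNC[vcug_grade]]
    after = SEVERITY[rnc_category]
    if before == 0:
        return "de novo" if after > 0 else "no reflux"
    if after == 0:
        return "resolved"
    if after < before:
        return "improved"
    return "unchanged" if after == before else "worsened"


def patient_outcome(units: list[tuple[int, str]]) -> str:
    """Per-patient outcome from (VCUG grade, RNC category) pairs, one pair per KUU."""
    changes = [kuu_change(grade, rnc) for grade, rnc in units]
    refluxing = [c for c in changes if c not in ("no reflux", "de novo")]
    if not refluxing:
        raise ValueError("patient has no refluxing KUU at baseline")
    if "de novo" in changes:
        return "failed"    # de novo VUR excludes improvement; calling it failure is an assumption
    if all(c == "resolved" for c in refluxing):
        return "resolved"  # assumption: every refluxing KUU resolved
    if all(c in ("improved", "resolved") for c in refluxing):
        return "improved"  # unilateral decrease, or bilateral decrease/decrease + resolution
    return "failed"        # assumption: any unchanged or worsened KUU counts as failure


print(patient_outcome([(4, "moderate"), (0, "none")]))  # improved
```

Note that, under the grade equivalence, a grade II unit that is "mild" on RNC counts as unchanged rather than improved.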

Statistical analysis

Data were analyzed with the Statistical Package for the Social Sciences (SPSS) version 26 and Python version 3.10 with its packages, including scikit-learn. The normality of the variables was evaluated with the Shapiro-Wilk test. Quantitative variables were described as mean ± standard deviation (SD) when normally distributed and as median and interquartile range (IQR) otherwise. Qualitative variables were described as frequency and percentage (%). Binary logistic regression was used for univariable and multivariable analyses. A p-value < 0.05 was considered statistically significant.
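The rule for choosing a summary statistic can be sketched as follows. The `normal` flag is assumed to come from a Shapiro-Wilk test (for example `scipy.stats.shapiro`, taking p ≥ 0.05 as normal, which is an assumption). Quartiles use Python's `statistics.quantiles` with the exclusive method, which may differ slightly from SPSS output; the IQR is returned as a single width, matching how it is reported in the Results.

```python
import statistics


def describe_quantitative(values: list[float], normal: bool) -> dict:
    """Mean ± SD for normally distributed data, otherwise median and IQR width."""
    if normal:
        return {"mean": statistics.mean(values), "sd": statistics.stdev(values)}
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return {"median": statistics.median(values), "iqr": q3 - q1}


def describe_qualitative(flags: list[bool]) -> tuple[int, float]:
    """Frequency and percentage (one decimal) of True values."""
    count = sum(flags)
    return count, round(100 * count / len(flags), 1)


# Example: 69 boys among 115 infants, as in the baseline data
print(describe_qualitative([True] * 69 + [False] * 46))  # (69, 60.0)
```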

Machine learning algorithm

Our aim was to train machine learning-based models to predict the outcome of CAP in patients with VUR. Patients were therefore divided into training data (75% of patients), used to train the models, and validation data (25% of patients), used to test them. We used five common models: logistic regression, random forest, support vector machine (SVM), gradient boosting, and a fully connected neural network. Six-fold cross-validation was applied to each algorithm to obtain more robust performance estimates.
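A minimal, dependency-free sketch of the data split, assuming a simple random (non-stratified) 75/25 hold-out and six-fold cross-validation run within the training portion; the text does not state how the two were combined. In scikit-learn the same steps would typically use `train_test_split(..., test_size=0.25)` and `KFold(n_splits=6)`, with candidate classes such as `LogisticRegression`, `RandomForestClassifier`, `SVC`, `GradientBoostingClassifier`, and `MLPClassifier`, although the classes and settings actually used are not reported.

```python
import random
from typing import Iterator


def holdout_split(n_patients: int, train_frac: float = 0.75, seed: int = 0):
    """Shuffle patient indices and split them into training and validation sets."""
    idx = list(range(n_patients))
    random.Random(seed).shuffle(idx)
    cut = round(n_patients * train_frac)
    return idx[:cut], idx[cut:]


def k_fold(indices: list[int], k: int = 6) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (fit, held_out) index lists; each index is held out exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i, held_out in enumerate(folds):
        fit = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield fit, held_out


train, validation = holdout_split(115)
print(len(train), len(validation))  # 86 29
for fit, held_out in k_fold(train, k=6):
    pass  # fit each model on `fit` and score it on `held_out`
```

With only 115 patients, each of the six folds holds out 14 or 15 training patients, which is one reason fold-to-fold estimates can vary noticeably.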

Pre-processing

Variables with more than 50% missing data were removed from the models, and missing values in binary variables were imputed with the median. Neuropathic bladder was not entered as a feature to prevent overfitting, because only 1 of the 115 children had a neuropathic bladder. The following variables were entered to train the models: gender, age at diagnosis, uni- or bilaterality of VUR, DMSA differential renal function, VUR grade, dilating or non-dilating reflux on ultrasonography, and the presence of fUTI, prenatal hydronephrosis, ureteral anomaly, bladder dysfunction, failure to thrive, and renal scarring. DMSA differential renal function was entered as a binary feature: values below 40% were considered abnormal and values of 40% or above normal.
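The pre-processing rules (dropping sparse variables, excluding neuropathic bladder, median imputation of binary variables, and binarizing differential renal function at 40%) can be sketched in plain Python. Column names such as `dmsa_drf` and `neuropathic_bladder` are hypothetical, and coding abnormal function as 1 is an assumption. Because the median of an evenly split binary column is 0.5, the sketch uses `statistics.median_low` to keep imputed values binary.

```python
import statistics


def preprocess(rows, binary_cols, drf_col="dmsa_drf", drf_cutoff=40.0,
               max_missing=0.5, exclude=("neuropathic_bladder",)):
    """Apply the pre-processing rules to a list of patient dicts (None = missing)."""
    n = len(rows)
    columns = [c for c in rows[0] if c not in exclude]
    # Keep only variables with at most 50% missing values.
    kept = [c for c in columns if sum(r[c] is None for r in rows) / n <= max_missing]
    out = [{c: r[c] for c in kept} for r in rows]  # copies; the input rows are left untouched
    # DMSA differential renal function: 1 = abnormal (< 40%), 0 = normal (>= 40%).
    if drf_col in kept:
        for r in out:
            if r[drf_col] is not None:
                r[drf_col] = 1 if r[drf_col] < drf_cutoff else 0
    # Impute missing binary values with the column median; median_low keeps them 0/1.
    for c in dict.fromkeys(list(binary_cols) + [drf_col]):
        if c not in kept:
            continue
        fill = statistics.median_low([r[c] for r in out if r[c] is not None])
        for r in out:
            if r[c] is None:
                r[c] = fill
    return out
```

Binarizing before imputing means a missing differential renal function is filled with the majority class (normal or abnormal) rather than with a median percentage.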

Performance evaluation

The area under the receiver operating characteristic (ROC) curve (AUC), F1-score, accuracy, precision, and recall were calculated to evaluate model performance. For each endpoint, the model with the highest F1-score was chosen as the best model. Generally, an AUC of 0.5 indicates no discrimination, 0.7 to 0.8 is considered acceptable, 0.8 to 0.9 excellent, and more than 0.9 outstanding7. The measures were calculated as follows:

$$\text{Accuracy}=\frac{\text{True positive}+\text{True negative}}{\text{True positive}+\text{True negative}+\text{False positive}+\text{False negative}}$$
$$\text{Precision}=\frac{\text{True positive}}{\text{True positive}+\text{False positive}}$$
$$\text{Recall}=\frac{\text{True positive}}{\text{True positive}+\text{False negative}}$$
$$\text{F1-score}=\frac{2\times\text{Precision}\times\text{Recall}}{\text{Precision}+\text{Recall}}$$
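The formulas above translate directly into code. This sketch computes the confusion-matrix metrics, the AUC through its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half), and picks the model with the highest F1-score. It is an illustration with made-up counts, not the study's evaluation code; scikit-learn provides equivalents such as `f1_score` and `roc_auc_score`.

```python
def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def best_by_f1(results: dict) -> str:
    """Name of the model with the highest F1-score."""
    return max(results, key=lambda name: results[name]["f1"])


print(confusion_metrics(tp=8, fp=2, fn=4, tn=16))
```

For these hypothetical counts, accuracy and precision are both 0.80, recall is 0.67, and F1 is 16/22 ≈ 0.73, which also equals 2·TP / (2·TP + FP + FN).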

Ethical consideration

The research was approved by the Medical Research Ethics Committee of Tehran University of Medical Sciences [IR.TUMS.VCR.REC.1397.878] and the Medical Research Ethics Committee of Mashhad University of Medical Sciences [IR.MUMS.MEDICAL.REC.1399.461]. The study was conducted in accordance with the Declaration of Helsinki. Because of the retrospective design, the Medical Research Ethics Committees of Tehran University of Medical Sciences and Mashhad University of Medical Sciences waived the requirement for written informed consent.

Results

Baseline data

A total of 115 infants, 69 boys (60.0%) and 46 girls (40.0%), entered the study. The median age at the start of treatment was 6.0 months (IQR: 8.0). Baseline data are shown in Table 1. All cases had VUR during both the bladder filling and voiding phases; no case had VUR only during voiding. Eleven children (9.6%) had ureteral anomalies, some with more than one anomaly. The most common was ureteropelvic junction obstruction (UPJO) in 5 patients (45.5%), followed by duplicated ureters (n = 4, 36.4%), ureterovesical junction obstruction (UVJO) (n = 2, 18.2%), and ureterocele (n = 2, 18.2%).

Table 1 Baseline information.

Clinical outcomes

Twenty-six children (22.6%) had breakthrough fUTIs after the start of CAP, and all of them were hospitalized for these infections. Among them, 6 patients developed new renal scarring or progression of an existing scar. In addition, 1 child had no fUTI after the start of treatment but showed progression of a previous renal scar on DMSA.

At the last RNCs, among 177 refluxing KUUs, VUR had resolved in 97 (54.8%), improved in 31 (17.5%), remained unchanged in 37 (20.9%), and worsened in 12 (6.8%). In addition, de novo VUR was seen in 8 KUUs. On a per-patient basis, VUR resolved in 52 children (45.2%), improved in 19 (16.5%), and failed to improve or resolve in 44 (38.3%) (Table 2).

Table 2 The outcome of infants who underwent continuous antibiotic prophylaxis.

All 17 patients with bladder dysfunction (100.0%) received antimuscarinic agents; 2 (11.8%) also received alpha-blockers, and 1 (5.9%) underwent clean intermittent catheterization (CIC) because of a large bladder capacity. Only 1 of them (5.9%) had chronic constipation, which responded well to laxatives. Three of these patients (17.6%) had breakthrough UTIs, but none developed new scarring or scar progression. However, VUR resolved in only 3 of them (17.6%).

Predictive factors of outcome

To determine the independent predictors of fUTI and/or renal scarring, patients with fUTIs after the start of treatment and the 1 patient with progression of a previous renal scar were combined into one outcome group. The following variables were entered into the analysis as candidate predictors: age, gender, uni- or bilaterality of VUR, DMSA differential renal function, VUR grade, dilating or non-dilating reflux on ultrasonography, and the presence of fUTI, prenatal hydronephrosis, ureteral anomaly, bladder dysfunction, neuropathic bladder, failure to thrive, and renal scarring. In the univariable analysis, maximal VUR grade (p-value: 0.004; OR: 1.846; 95% CI: 1.219–2.795), renal scarring (p-value: < 0.001; OR: 7.519; 95% CI: 2.723–20.761), and DMSA differential renal function (p-value: 0.040; OR: 2.734; 95% CI: 1.048–7.132) were significantly associated with fUTI and/or renal scarring after treatment. When these three factors were entered into the multivariable analysis, only renal scarring remained significantly associated with post-treatment fUTI and/or renal scarring (p-value: 0.007; OR: 6.467; 95% CI: 1.651–25.326).
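Odds ratios and confidence intervals from logistic regression are related to each coefficient β and its standard error by OR = exp(β) and 95% CI = exp(β ± 1.96·SE). Assuming Wald-type intervals, which SPSS binary logistic regression reports, the sketch below computes these quantities and back-calculates β and SE from a reported OR and CI, a quick consistency check on published values. The function names are illustrative.

```python
import math

Z95 = 1.959963984540054  # 97.5th percentile of the standard normal (two-sided 95% interval)


def wald_or(beta: float, se: float) -> tuple[float, float, float, float]:
    """Odds ratio, 95% CI bounds and two-sided Wald p-value for one logistic coefficient."""
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return math.exp(beta), math.exp(beta - Z95 * se), math.exp(beta + Z95 * se), p


def recover_beta_se(odds_ratio: float, lo: float, hi: float) -> tuple[float, float]:
    """Back-calculate beta and SE from a reported OR and 95% CI (symmetric on the log scale)."""
    return math.log(odds_ratio), (math.log(hi) - math.log(lo)) / (2 * Z95)


beta, se = recover_beta_se(7.519, 2.723, 20.761)  # univariable renal scarring
print([round(v, 3) for v in wald_or(beta, se)])
```

Applied to the univariable renal-scarring estimate, the recovered interval matches the reported one to rounding and gives p < 0.001; the multivariable estimate (OR 6.467; 95% CI 1.651–25.326) gives p ≈ 0.007, as reported.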

To assess the independent predictors of VUR persistence, patients whose VUR improved or resolved were combined into one group and compared with patients whose VUR neither improved nor resolved. The same candidate variables as in the fUTI and/or renal scarring analysis were entered. In the univariable analysis, bladder dysfunction (p-value: < 0.001; OR: 10.578; 95% CI: 2.829–39.552), renal scarring (p-value: < 0.001; OR: 4.200; 95% CI: 1.876–9.402), and DMSA differential renal function (p-value: 0.002; OR: 4.359; 95% CI: 1.720–11.046) were significantly associated with VUR persistence after treatment. When these three factors were entered into the multivariable analysis, only bladder dysfunction remained significantly associated with post-treatment VUR persistence (p-value: 0.004; OR: 7.456; 95% CI: 1.886–29.472).
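The two binary endpoints used for modeling follow from the grouping rules described above. In this sketch the record keys are hypothetical. Applied to the reported cohort, the first label would be positive for 27 children (the 26 with breakthrough fUTI plus the 1 child with scar progression only), and the second for the 44 children whose VUR failed to improve or resolve.

```python
def endpoint_labels(patient: dict) -> tuple[int, int]:
    """Binary targets (hypothetical keys): (fUTI and/or renal scarring, VUR persistence)."""
    # Endpoint 1: breakthrough fUTI after starting CAP, or a new or progressing renal scar.
    futi_or_scar = int(bool(patient["breakthrough_futi"] or patient["new_or_progressed_scar"]))
    # Endpoint 2: persistence = VUR neither resolved nor improved at the last RNC.
    persistence = int(patient["vur_outcome"] not in ("resolved", "improved"))
    return futi_or_scar, persistence


print(endpoint_labels({"breakthrough_futi": False, "new_or_progressed_scar": True,
                       "vur_outcome": "improved"}))  # (1, 0)
```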

Machine learning-based model to predict outcome

Five models (logistic regression, random forest, SVM, gradient boosting, and neural network) were developed for each of our two outcomes: (1) fUTI and/or renal scarring and (2) VUR persistence. For both outcomes, random forest was the best model based on the F1-score (Table 3). The ROC curves for the models are shown in Fig. 1.

Table 3 The performance of models for febrile urinary tract infection and/or renal scarring and vesicoureteral reflux persistence.
Fig. 1 Receiver operating characteristic (ROC) curves for (A) febrile urinary tract infection and/or renal scarring and (B) vesicoureteral reflux persistence.

Discussion

The optimal treatment of VUR in infants is controversial, and there is no established method for deciding which patients should be treated with CAP and which should be referred to a urologist for surgical intervention8. We studied infants diagnosed with VUR to identify, from their presenting features, the patients unlikely to benefit from CAP who could be referred to a urologist. In our analyses, renal scarring and bladder dysfunction were associated with failure of CAP. In contrast, the success rates of endoscopic injection in patients with renal scarring and bladder dysfunction were acceptable. We also used machine learning-based models to predict the outcome of CAP, among which random forest performed best.

In our multivariable analyses, renal scarring was associated with post-treatment fUTIs and/or renal scarring, and bladder dysfunction was associated with VUR persistence. Although some studies have identified high-grade and bilateral VUR as independent risk factors for recurrent UTI during CAP9,10, others, consistent with our findings, showed that VUR grade did not predict VUR resolution11. Similar to our study, Loukogeorgakis et al.12 found that renal scarring was the only significant risk factor for breakthrough UTI, with affected patients three times more likely to develop breakthrough UTI. Nakamura et al.13 likewise reported that renal scarring, as well as female gender, was a risk factor for fUTI after stopping CAP. In addition, Sjöström et al.14 found that bladder dysfunction on videocystometry was an independent predictor of reflux persistence in congenital high-grade VUR. We therefore suggest that pediatricians/nephrologists refer patients with renal scarring and/or bladder dysfunction to a urologist for surgical intervention at their first visits. Endoscopic injection of dextranomer/hyaluronic acid (Dx/HA) in patients with renal scarring and bladder dysfunction resulted in excellent outcomes (Supplementary Material 1). Of note, in our center, we avoid open ureteral reimplantation in children under three years of age, because it may cause denervation hypersensitivity of the bladder due to incomplete neuromuscular development.

Only a few studies have used prediction models, especially machine learning, to predict the outcomes of CAP in children with VUR. Bertsimas et al.15 developed a machine learning model to identify the children most likely to benefit from CAP and analyzed the impact of CAP at various cutoffs of recurrent UTI risk reduction. They reported that, with a 10% cutoff for recurrent UTI risk reduction, the lowest population rate of recurrent UTI was achieved by giving CAP to 40% of patients rather than to everyone. In another study, Arlen et al.16 included 255 patients to predict the probability of breakthrough fUTI in primary VUR; among the models tested, a neural network with two hidden nodes performed best, with an AUC of 0.76. Here, random forest was the best model for both of our outcomes, fUTI and/or renal scarring and VUR persistence. We believe that machine learning-based prediction models can help physicians make informed decisions about starting CAP or referring patients early for surgical intervention. By tailoring treatment to each patient’s characteristics, this approach may support future efforts to integrate predictive models into daily practice and, if successful, make decision-making more precise and improve outcomes.

The main limitation of this study is its small sample size; larger samples would allow more reliable machine learning-based modeling. In addition, our analyses were based on the final outcome of each case at the last follow-up. Future studies could predict VUR resolution rates during follow-up (e.g. the annual resolution rate) and identify the best time to refer a child to a urologist. The follow-up period for VUR resolution was also relatively short, so longer follow-up is recommended in future research, although the first year of life carries both the highest chance of VUR resolution and the greatest risk of UTIs8. Another limitation is the retrospective collection of data from medical records. Moreover, variables not recorded in this study might have a significant impact on outcomes and prediction models; for instance, distal ureteral diameter, which had significant predictive value in previous studies17,18, was not recorded. Finally, bladder dysfunction was defined only in patients who underwent urodynamic studies, which may have led to its underestimation compared with the literature.

Conclusions

Renal scarring and bladder dysfunction should be considered important predictors of breakthrough fUTI and/or renal scarring and of VUR persistence, respectively, in patients receiving CAP. Therefore, pediatricians/nephrologists should consider referring these patients to a urologist for surgical intervention at the first visit. Our results showed that these high-risk patients benefited from endoscopic injection of dextranomer/hyaluronic acid (Dx/HA). In addition, random forest was the best prediction model for both breakthrough fUTI and/or renal scarring and VUR persistence.