Machine learning and SHAP-based risk assessment of PICC-related bloodstream infections in premature infants at the time of clinical suspicion

Guo, Yongqin; Dou, Yingying; Song, Wenxia; Wang, Lihong; Wang, Li

doi:10.1038/s41390-026-05049-6


Clinical Research Article
Open access
Published: 07 May 2026


Pediatric Research (2026)

Abstract

Background

Peripherally inserted central catheter-related bloodstream infections (PICC-CRBSI) pose a serious threat to preterm infants. This study aimed to develop and validate an interpretable machine learning model for risk assessment of PICC-CRBSI at the time of clinical suspicion.

Methods

A total of 490 preterm infants who underwent PICC insertion in a tertiary hospital NICU in China were prospectively enrolled from January 2024 to October 2025. After feature selection, prediction models were constructed using four machine learning algorithms. Model performance was evaluated using area under the receiver operating characteristic curve (AUC), calibration, and decision curve analysis. The optimal model was interpreted using Shapley additive explanations (SHAP).

Results

CRBSI occurred in 68 patients (13.88%). The Random Forest model demonstrated the best performance, with AUC values of 0.973 and 0.934 in the training and validation sets, and overall accuracy of 0.945 and 0.905, respectively. SHAP analysis revealed that C-reactive protein (CRP), white blood cell count, and respiratory rate had the greatest influence on the model’s predictions.

Conclusion

The random forest model demonstrated robust performance for risk assessment of PICC-CRBSI in preterm infants at the time of clinical suspicion. These findings may support clinical risk stratification and provide hypothesis-generating insights into key factors associated with PICC-CRBSI.

Impact

Develops a high-performance, interpretable random forest model for risk assessment of peripherally inserted central catheter-related bloodstream infection (PICC-CRBSI) in preterm infants at the time of clinical suspicion.
Addresses a gap in the literature by integrating the SHAP method to transparently identify and prioritize key risk-associated features (e.g., C-reactive protein and white blood cell count), enhancing the clinical interpretability of the model.
Provides a practical approach for risk stratification, with potential to support clinical assessment and improve the understanding of PICC-CRBSI risk in neonatal intensive care.

Introduction

Peripherally inserted central catheters (PICC) provide stable venous access and have become an essential tool for parenteral nutrition and drug administration in neonatal intensive care, particularly for preterm infants.^1,2 However, catheter placement itself is a well-established risk factor for catheter-related bloodstream infections (CRBSI).³ For premature infants with immature immune systems, CRBSI significantly increases the risk of complications such as delayed-onset sepsis and necrotizing enterocolitis and is closely associated with higher mortality rates, longer hospital stays, and poorer neurodevelopmental outcomes.^4,5 Early identification of CRBSI in premature infants faces dual challenges. First, clinical manifestations of infection are often nonspecific. Symptoms like fever, apnea, and feeding intolerance can easily be confused with other common neonatal conditions, leading to diagnostic uncertainty.⁶ Second, blood culture—the gold standard for diagnosis—faces limitations in the preterm population: obtaining sufficient blood for culture is challenging, potentially reducing sensitivity; and culture results typically require 48 to 72 h to obtain, failing to meet the clinical need for early intervention.^7,8 This diagnostic delay or false-negative result exposes clinicians to misjudgment risks during initial assessment, potentially delaying treatment or leading to unnecessary antibiotic exposure. In recent years, machine learning technologies have demonstrated significant potential in disease risk prediction due to their advantages in processing high-dimensional, nonlinear clinical data.^9,10 Researchers have attempted to develop models predicting catheter-related infection risk in adult populations.^11,12 However, studies specifically targeting predictive models for PICC-CRBSI in preterm infants remain limited. 
Preterm infants exhibit unique characteristics in baseline physiological parameters, inflammatory response patterns, and catheter maintenance details, rendering direct application of models from other populations unreliable. Furthermore, many complex machine learning models operate as black boxes, with internal decision-making logic difficult to interpret. This limits clinicians’ trust in their predictive outcomes and hinders the identification of actionable risk factors from the models to guide clinical practice.¹³ Therefore, this study aims to construct and validate a PICC-CRBSI risk prediction model specifically tailored for preterm infants. It systematically incorporates baseline characteristics, dynamic vital signs, catheter-related parameters, and a series of inflammatory markers. Multiple feature selection methods were employed to identify key variables, and the performance of different machine learning algorithms was compared. The core innovation of this study lies not only in developing a high-performance predictive tool but also in enhancing clinical transparency and acceptability by introducing SHAP (Shapley Additive exPlanations) for visualizing the predictive logic of the optimal model. It is hoped that this model will provide clinicians with an objective, supplementary risk assessment tool when signs of infection emerge but pathogenetic evidence remains inconclusive. This will facilitate early identification of high-risk pediatric patients, optimize catheter management strategies, and offer a reference basis for timely intervention initiation.

Methods

Study population

This prospective observational study was conducted in the Neonatal Intensive Care Unit (NICU) of a tertiary-level hospital in China from January 2024 to October 2025. The study population comprised preterm infants who underwent PICC insertion during this period. A total of 490 eligible infants were screened and enrolled. Inclusion criteria were as follows: (1) gestational age <37 weeks and birth weight <2000 g; (2) admission to the hospital NICU within 12 h of birth; (3) successful placement and retention of a PICC line at this center; (4) at least one blood culture performed while the catheter was in place because of clinical signs of infection (e.g., unexplained thrombocytopenia, apnea, feeding intolerance, abnormal temperature or heart rate); (5) no documented bloodstream infection prior to catheter placement; (6) informed consent obtained from legal guardians. Exclusion criteria were: (1) catheter removal within 48 h of PICC placement for non-infectious reasons (e.g., catheter dysfunction, completion of treatment); (2) incomplete clinical records or missing key variables. The study protocol was approved by the Institutional Review Board of Changzhi Maternal and Child Health Hospital (Ethics Approval No.: CZSFYLL2024.026).

Sample size calculation

According to previous literature, the incidence of PICC-associated bloodstream infections in neonates is approximately 22%.¹⁴ Although 34 candidate variables were initially considered, the number of variables ultimately included in the regression model was limited to fewer than 8. This study adopted an EPV value of 34. The required sample size was calculated using the following formula: Required sample size = (Number of variables included in the model × EPV)/(1 − Event incidence rate).¹⁵ Therefore, the estimated minimum sample size was 8 × 34/(1 − 0.22) ≈ 349. Considering an anticipated 10% data loss rate, a total of 384 subjects were required. Ultimately, this study enrolled 490 preterm infants, a sample size sufficient to meet subsequent statistical and modeling analysis requirements.
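The arithmetic in this paragraph can be reproduced with a short sketch. The function name and the way the 10% loss is applied (multiplying by 1.1, which is the reading that yields 384) are assumptions made for illustration; the formula itself is the one quoted in the text.

```python
import math

def required_sample_size(n_variables, epv, incidence, loss_rate=0.0):
    """Sample size from the formula quoted in the text.

    Returns (base, inflated): base is the rounded-up formula result,
    inflated adds the anticipated data loss (assumed here to be applied
    as base * (1 + loss_rate)).
    """
    base = math.ceil(n_variables * epv / (1 - incidence))
    inflated = math.ceil(base * (1 + loss_rate))
    return base, inflated

# Values stated in the text: 8 variables, EPV of 34, 22% incidence, 10% loss.
base, with_loss = required_sample_size(8, 34, 0.22, 0.10)
print(base, with_loss)
```

Under these assumptions the sketch returns 349 and 384, matching the figures reported above.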

Assessment of variables

Based on a literature review and expert discussions, we predefined a set of candidate variables. The first category comprised baseline characteristics such as gestational age, birth weight, biological sex, 5-min Apgar score, mode of delivery, number of fetuses, neonatal respiratory distress syndrome, and mechanical ventilation requirements. The second category comprised catheter-related factors, including age in days at catheter placement, catheter insertion site, number of puncture attempts, catheter repositioning, duration of PICC placement, frequency of dressing change, catheter occlusion, catheter connector wrapping, and catheter displacement. The third category comprised vital sign parameters, covering axillary temperature, heart rate, respiratory rate, mean blood pressure, and oxygen saturation. The fourth category comprised laboratory parameters, including red blood cell count (RBC), white blood cell count (WBC), platelet count (PLT), neutrophil percentage (NEUT), lymphocyte percentage (lymph), and C-reactive protein (CRP).

To comprehensively evaluate the association between laboratory parameters and CRBSI, two feature sets were defined based on blood collection timing: Sample 1 represents laboratory test data obtained within 72 h prior to blood culture collection, while Sample 2 represents laboratory test data obtained on the same day as blood culture collection. In this study, blood culture results served as the gold standard for diagnosing bloodstream infections and were used to determine CRBSI outcomes. In contrast, complete blood count (CBC) and CRP tests functioned solely as laboratory input variables for model training and prediction, supporting risk assessment rather than diagnostic determination.

Diagnostic criteria

The diagnostic criteria for CRBSI refer to the 2009 guidelines issued by the Infectious Diseases Society of America. Catheter-related bloodstream infection¹⁶ is defined as: within 48 h of having an intravascular catheter or after its removal, a patient presents with bacteremia or fungemia, accompanied by signs of infection such as fever (>38 °C), chills, hypotension, oliguria, etc., and no other clear source of infection apart from the intravascular catheter. Meanwhile, laboratory microbiological examination reveals the same pathogen (such as Gram-positive cocci, coagulase-negative cocci, Staphylococcus epidermidis, etc.) cultured from both a peripheral venous sample and the intravascular catheter.

Data collection methods

Following standardized training, two investigators independently reviewed and collected data using standardized data collection forms. Disease-related information and laboratory data were extracted from the electronic medical record system, with laboratory testing routinely performed every three days. Catheter-related variables were obtained from electronic nursing records. Vital sign data originated from bedside continuous monitoring records. Monitors recorded raw data hourly, while temperature measurements were extracted from nursing records. To assess infection-related physiological status, vital sign values within the 24-h period preceding blood culture collection were extracted, averaged, and used as predictor variables. Data accuracy was independently cross-verified and confirmed by two researchers. All procedures were executed according to standardized protocols.
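The 24-h averaging step described above could be sketched as follows. This is an illustration only: the function name, the record layout, and the choice to include the window start and exclude the culture time itself are assumptions, and the readings are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean

def mean_vital_before_culture(readings, culture_time, window_hours=24):
    """Average the readings taken in the window_hours before culture_time.

    readings: list of (timestamp, value) tuples, e.g. hourly monitor records.
    Returns None when no reading falls inside the window.
    """
    start = culture_time - timedelta(hours=window_hours)
    in_window = [value for ts, value in readings if start <= ts < culture_time]
    return mean(in_window) if in_window else None

# Hypothetical hourly respiratory-rate records (breaths/min), illustration only.
culture = datetime(2024, 3, 2, 8, 0)
rr = [(culture - timedelta(hours=h), 40 + h) for h in range(1, 31)]
rr_mean = mean_vital_before_culture(rr, culture)
print(rr_mean)
```

Only the 24 readings inside the window contribute to the average; the six older readings are ignored.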

Data preprocessing

First, missing data across all variables were assessed. Variables with missing values exceeding 20%, including umbilical cord blood pH and whether patient repositioning occurred during catheter insertion, were excluded from further analysis. For variables with less than 20% missing data, values were imputed using the mean of the corresponding gestational age group to minimize potential systematic bias. For this purpose, gestational age was grouped as <28 weeks, 28 ≤ gestational age <34 weeks, and 34 ≤ gestational age <37 weeks. To assess the applicability of this imputation strategy, data completeness was examined across different gestational age groups. The analysis revealed broadly similar patterns of missingness across groups, with no gestational age group exhibiting a missingness rate exceeding 20%. Therefore, the study concluded that the stratification-based mean imputation method was feasible and robust. Following data cleaning and preprocessing, 490 preterm infants were ultimately retained for subsequent model development and analysis. Variable-level missingness for variables with missing data, including gestational age, is summarized in Supplementary Table S1.
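A minimal sketch of stratified mean imputation is given below. It assumes gestational age is available for every record being imputed, uses the three strata defined above, and runs on hypothetical values; the function and field names are illustrative and are not taken from the study's code.

```python
from statistics import mean

def ga_group(ga_weeks):
    """Gestational-age strata as defined in the text."""
    if ga_weeks < 28:
        return "<28"
    if ga_weeks < 34:
        return "28-<34"
    return "34-<37"

def impute_by_ga_group(rows, column):
    """Fill None in `column` with the mean of observed values in the same stratum."""
    observed = {}
    for row in rows:
        if row[column] is not None:
            observed.setdefault(ga_group(row["ga_weeks"]), []).append(row[column])
    group_means = {group: mean(values) for group, values in observed.items()}

    filled = []
    for row in rows:
        new_row = dict(row)
        if new_row[column] is None:
            # Stays None if the stratum has no observed values at all.
            new_row[column] = group_means.get(ga_group(row["ga_weeks"]))
        filled.append(new_row)
    return filled

# Hypothetical records, for illustration only.
rows = [
    {"ga_weeks": 26.0, "crp": 4.0},
    {"ga_weeks": 27.0, "crp": None},
    {"ga_weeks": 27.5, "crp": 8.0},
    {"ga_weeks": 31.0, "crp": 2.0},
    {"ga_weeks": 32.0, "crp": None},
]
imputed = impute_by_ga_group(rows, "crp")
print([r["crp"] for r in imputed])
```

When a validation set is held out, computing the group means on the training portion only would avoid information leaking across the split; the text does not state which order was used.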

Statistical methods

Quantitative data following a normal distribution were expressed as mean ± standard deviation, with intergroup comparisons performed using the independent samples t-test. Quantitative data not following a normal distribution were expressed as median (P25, P75), with intergroup comparisons performed using the Mann–Whitney U test. Categorical variables were expressed as frequencies and percentages, with comparisons performed using chi-square tests or Fisher’s exact tests as appropriate. Given the large number of candidate variables involved in univariate analysis, to control the risk of false positives from multiple comparisons, all univariate P-values underwent false discovery rate (FDR) correction using the Benjamini-Hochberg method, yielding adjusted Q-values. All statistical analyses were performed using SPSS version 24.0 software. After FDR correction, a Q-value < 0.05 was defined as statistically significant.
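For readers unfamiliar with the Benjamini-Hochberg adjustment, the procedure can be sketched in a few lines. The study performed this step in SPSS; the Python function below is only an illustration of the standard step-up procedure, and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted Q-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values

# Hypothetical univariate p-values, for illustration only.
p = [0.001, 0.008, 0.039, 0.041, 0.20]
q = benjamini_hochberg(p)
significant = [i for i, value in enumerate(q) if value < 0.05]
print(q, significant)
```

In this toy example four raw p-values fall below 0.05, but only the first two remain below the 0.05 threshold after adjustment.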

Machine learning algorithms

Model development was performed using R software (version 4.4.2). All candidate variables—including demographic characteristics, vital signs, catheter-related factors, and both sets of laboratory indicators—were entered into machine learning algorithms for feature selection. Three feature selection methods were applied: Least Absolute Shrinkage and Selection Operator (LASSO) regression, Boruta algorithm, and Recursive Feature Elimination (RFE). The overlapping features consistently identified by the three feature selection methods were retained as a robust subset of key predictors. Based on these selected features, four machine learning models were constructed: Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), and Light Gradient Boosting Machine (LightGBM). The dataset was randomly split into a training set (70%) and a validation set (30%). For the training set, hyperparameter optimization and model selection were performed using grid search combined with 10-fold cross-validation. The final model performance was evaluated on the independent validation set. Model performance was assessed using the area under the receiver operating characteristic curve (AUC), calibration curves, and decision curve analysis (DCA). Accuracy, specificity, precision, recall, and F1 scores were calculated for each machine learning model. In addition, 95% confidence intervals were reported for AUC, accuracy, recall, and specificity to enhance the statistical clarity of model evaluation; confidence intervals for accuracy, recall, and specificity were calculated using the exact Clopper–Pearson method. The optimal model selection was based on a comprehensive evaluation of area under the curve (AUC) and accuracy. Statistical comparisons of AUC among models were performed using the DeLong test. Variable importance rankings were visualized using importance matrices, and the SHAP method was applied to interpret the contribution of each variable to model predictions.
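The exact Clopper–Pearson interval mentioned above can be illustrated without statistical libraries by inverting the binomial distribution numerically. This is a sketch, not the study's R code; the bisection helper is an implementation choice made here, and the count of 133 correct out of 147 is an inference from the reported validation accuracy of 0.905, not a figure given in the text.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def _solve_decreasing(func, target, lo=0.0, hi=1.0, iters=100):
    """Bisection for func(p) = target, where func is decreasing in p."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if func(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for x successes in n trials."""
    # Lower limit solves P(X >= x | p) = alpha / 2.
    lower = 0.0 if x == 0 else _solve_decreasing(
        lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    # Upper limit solves P(X <= x | p) = alpha / 2.
    upper = 1.0 if x == n else _solve_decreasing(
        lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper

# Assumed count: 133 of 147 rounds to the reported accuracy of 0.905.
low, high = clopper_pearson(133, 147)
print(round(low, 3), round(high, 3))
```

The same function applies to recall and specificity by passing the corresponding numerator and denominator.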

Results

Inclusion of pediatric patients

This study included 490 premature infants with indwelling PICC lines. Among them, 68 cases were confirmed as CRBSI, yielding an incidence rate of 13.88% (68/490). Specifically, 48 cases were confirmed as CRBSI in the training set, with an incidence rate of 13.99% (48/343); while 20 cases were confirmed in the validation set, yielding an incidence rate of 13.61% (20/147). A total of 76 strains of pathogenic bacteria were detected in 68 patients with positive blood cultures, predominantly Gram-negative bacteria (56 isolates, 73.68%). Among these, Klebsiella pneumoniae was the most common (31 isolates, 40.79%), followed by Escherichia coli (18 strains, 23.68%), Pseudomonas aeruginosa (4 strains, 5.26%), and other Gram-negative bacteria (3 strains, 3.95%). Among the 20 Gram-positive bacteria detected (26.32%), the main species included Staphylococcus epidermidis (10 strains, 13.16%), Staphylococcus aureus (6 strains, 7.89%), Group B Streptococcus (2 strains, 2.63%), and Enterococcus spp. (2 strains, 2.63%). Univariate comparisons were performed between the CRBSI group (n = 68) and the non-CRBSI group (n = 422) for baseline and clinical variables, with multiple comparisons adjusted using the Benjamini-Hochberg false discovery rate. After FDR correction, 13 variables demonstrated statistically significant differences between groups (corrected Q-value < 0.05), categorized as follows: Demographic and Perinatal Characteristics: Gestational age, birth weight; Treatment and Catheter-Related Factors: Mechanical ventilation, Age in days at catheter placement, number of puncture attempts, duration of PICC placement; Vital signs: Axillary temperature, respiratory rate, oxygen saturation; Laboratory parameters (Sample 2, collected on the same day as blood culture): red blood cell count in sample 2 (RBC2), white blood cell count in sample 2 (WBC2), neutrophil percentage in sample 2 (NEUT2), C-reactive protein in sample 2 (CRP2). 
All other compared variables showed no statistically significant differences after FDR adjustment (Q-value ≥ 0.05), as detailed in Table 1.

Table 1 Baseline characteristics of included pediatric patients

Feature selection using LASSO, Boruta, and RFE

Key features for constructing predictive models were initially selected from variables that remained statistically significant after FDR correction in univariate analysis (Q-value < 0.05). These were then input into three feature selection methods: LASSO regression, Boruta algorithm, and RFE. LASSO regression identified 7 non-zero coefficient features through cross-validation (Fig. 1a, b): birth weight, age at catheter placement, number of punctures, respiratory rate, blood oxygen saturation, WBC2, and CRP2. The Boruta algorithm evaluated and ranked the relative importance of all features (Fig. 1c); and the RFE method similarly selected 7 predictive features (Fig. 1d). LASSO and RFE each selected 7 features, whereas Boruta was used to rank all candidate features and the top 7 ranked features were retained for comparison. Six overlapping features were consistently identified across the three methods and were incorporated into subsequent model building: birth weight, age at catheter placement, number of punctures, respiratory rate, WBC2, and CRP2.

**Fig. 1: Feature selection based on LASSO regression, Boruta algorithm, and recursive feature elimination (RFE).**

Model development and performance comparison

The dataset was randomly split into training and validation sets at a 7:3 ratio. Among the four machine learning models trained on the training set, the random forest model demonstrated optimal discriminative performance, achieving an area under the receiver operating characteristic curve of 0.973 (95% confidence interval: 0.925–0.995) (Fig. 2a). DCA results indicated that the random forest model provided higher clinical net benefit across a broad range of threshold probabilities, suggesting its potential practical value (Fig. 2b). Calibration curves demonstrated good agreement between the model’s predicted probabilities and observed outcomes (Fig. 2c). On an independent validation set, the random forest model maintained excellent predictive performance with an AUC of 0.934 (95% confidence interval: 0.850–0.985) (Fig. 2d). The DCA curve and calibration curve on the validation set further confirmed the model’s strong clinical applicability and predictive accuracy (Fig. 2e, f). Table 2 summarizes the performance of the four models on the training and validation sets, including AUC, accuracy, specificity, precision, recall, and F1 score. Overall, the random forest model demonstrated the best performance across the primary evaluation metrics and was therefore selected as the optimal predictive model.
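The net benefit plotted in a decision curve follows the standard definition NB = TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. The sketch below computes it for a model and for the "treat all" reference strategy; the outcomes and predicted risks are hypothetical and are not taken from the study.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening when predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Net benefit of intervening on every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Hypothetical outcomes (1 = CRBSI) and predicted risks, for illustration only.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.6, 0.7, 0.2, 0.1, 0.1, 0.05, 0.05, 0.3, 0.2]
nb_model = net_benefit(y_true, y_prob, 0.5)
nb_all = net_benefit_treat_all(y_true, 0.5)
print(nb_model, nb_all)
```

A decision curve repeats this calculation across a range of thresholds; a model is considered useful at thresholds where its net benefit exceeds both the "treat all" value and zero ("treat none").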

**Fig. 2: Predictive performance of the machine learning model.**

Table 2 Performance evaluation of four machine learning algorithms

Statistical comparison of model discrimination capabilities

Furthermore, DeLong tests were performed to compare the AUCs of the random forest model with those of the other models. In the training set, the AUC of the random forest model was significantly higher than that of LR (p = 0.03), DT (p = 0.02), and LightGBM (p = 0.009). Similar results were observed in the validation set, where the random forest model also outperformed LR (p = 0.02), DT (p = 0.02), and LightGBM (p = 0.008).

Variable importance in the optimal random forest model

The relative importance of each predictor variable in the optimal random forest model was assessed using the MeanDecreaseGini method. As shown in Fig. 3, CRP2 exhibited the highest importance, followed by WBC2 and respiratory rate. These variables contributed most significantly to the model’s predictive performance. Other significant predictors, such as birth weight, age in days at catheter placement, and number of puncture attempts, exhibited lower contribution levels but remained statistically significant.

Optimal model interpretation based on SHAP

To gain deeper insight into the predictive logic of the random forest model and enhance its clinical interpretability, this study employed the SHAP method for model interpretation. As shown in Fig. 4, features are ranked in descending order by their mean absolute SHAP value, intuitively illustrating their contribution to the model output. CRP2 and WBC2 exhibited the highest SHAP values, identifying them as the most influential features for predicting PICC-associated bloodstream infections. The next most influential features were respiratory rate, number of puncture attempts, age in days at catheter placement, and birth weight. The SHAP plots (Fig. 4) further reveal nonlinear relationships between feature values and predicted risk. Specifically, elevated CRP2 levels, abnormal WBC2 (either increased or decreased), abnormal respiratory rate, a higher number of puncture attempts, younger age at catheter placement, and lower birth weight were all associated with increased predicted CRBSI risk. Notably, during feature preprocessing, birth weight was categorized into ordered levels (e.g., high, medium, low, coded as 0, 1, 2). In the SHAP plots, higher feature values (e.g., yellow regions in the image) therefore correspond to lower birth weight levels, which is consistent with the clinical observation that low birth weight increases susceptibility to CRBSI. The model’s interpretation is consistent with this risk trend. Through SHAP analysis, the model not only predicts risk but also visually quantifies the direction and magnitude of key clinical features’ influence. This provides transparent evidence for clinicians to understand model decision-making, aiding in identifying controllable risk factors and guiding early interventions.
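The idea behind SHAP values can be illustrated by computing exact Shapley values for a toy model. The sketch below enumerates every feature coalition and replaces absent features with a single baseline value. That is a simplification: SHAP implementations for tree models average over a background dataset and use faster algorithms. The risk function, feature values, and baseline are hypothetical and are not the study's random forest.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance by enumerating all coalitions.

    Features outside a coalition take their baseline value. The cost grows
    as 2**n_features, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                phi[i] += weight * (value(members | {i}) - value(members))
    return phi

# Hypothetical toy risk score with a CRP x WBC interaction, illustration only.
def toy_risk(z):
    crp, wbc, rr = z
    return 0.02 * crp + 0.01 * wbc + 0.005 * rr + 0.001 * crp * wbc

x = [30.0, 20.0, 60.0]        # instance to explain (hypothetical)
baseline = [5.0, 10.0, 45.0]  # reference values (hypothetical)
phi = shapley_values(toy_risk, x, baseline)
print(phi)
```

The contributions sum to the difference between the prediction for the instance and the prediction at the baseline, which is the additivity property that allows SHAP plots to be read as a decomposition of each individual prediction.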

Discussion

PICC-CRBSI rate and microbial pathogenology

Globally, the incidence of neonatal CRBSI ranges from 2 to 30%, with developing countries generally exhibiting higher rates.^17,18 This variation is typically associated with differences in infection control measures, neonatal population characteristics, and healthcare quality across regions. The PICC-CRBSI incidence rate in this study was 13.88%, falling within the lower range of reported values and comparable to rates in developed countries. This finding aligns with the range (7.25–13.78%) reported by Ren et al.¹⁹ for preterm infants. The observed CRBSI incidence and pathogen spectrum in this study may be influenced by the following factors: First, this cohort included a high proportion of very low birth weight (VLBW) and extremely low birth weight (ELBW) infants, most of whom had prolonged catheterization. Such infants exhibit immature immune systems and compromised skin barrier function, resulting in significantly increased susceptibility to infection. Second, this study strictly adhered to the 2021 revised CRBSI diagnostic criteria from the U.S. Centers for Disease Control and Prevention,¹⁵ emphasizing a comprehensive assessment of clinical presentation, catheter-related factors, and blood culture results. This approach may have more sensitively identified true CRBSI events. Regarding pathogens, Gram-negative bacteria—particularly Klebsiella pneumoniae and Escherichia coli—predominated in this study. This contrasts with pathogen profiles reported in some high-income countries, where Gram-positive bacteria like Staphylococcus aureus and Staphylococcus epidermidis are more prevalent.^20,21 This discrepancy may be related to the higher proportion of VLBW/ELBW infants in this cohort, who are particularly susceptible to Gram-negative bacterial infections. This demographic characteristic should be considered a limitation when extrapolating the study findings to populations with different gestational ages or weight distributions.

Clinical variables and their relationship with CRBSI

This study identified multiple clinical and catheter-related indicators significantly associated with CRBSI risk, quantifying their contribution through SHAP analysis. Inflammatory markers: Both the feature importance matrix and SHAP analysis confirmed CRP as the most critical predictor of PICC-CRBSI. Data showed that CRP2 levels in the CRBSI group were significantly higher than in the non-CRBSI group. As a classic acute-phase inflammatory marker, CRP rapidly increases during bacterial infection. Previous studies indicate that dynamic changes in CRP levels reflect the inflammatory burden and progression of catheter-related infections, offering early warning value.^22,23 Based on these findings, monitoring dynamic CRP changes—particularly unexplained elevations during catheter placement—provides crucial clues for early CRBSI detection. Abnormal white blood cell counts (elevated or decreased) represent another key predictor, reflecting immune dysregulation during sepsis.^24,25 In clinical practice, combined analysis of CRP and WBC enhances the efficacy of infection status assessment. Physiological Signs: In this study, abnormal respiratory rate was identified as a risk factor for CRBSI. During bloodstream infections, pathogens and inflammatory mediators may affect the immature central nervous system of preterm infants, leading to unstable respiratory drive manifested as apnea or altered respiratory rate.²⁶ Therefore, healthcare providers are advised to conduct close, dynamic monitoring of vital signs in infants with PICC lines. Abnormal respiratory rate changes should be regarded as non-specific early warning signs of infection, necessitating prompt evaluation and investigation of potential infection sources. Catheter-related factors: This study particularly emphasizes the importance of the catheterization procedure and its timing. 
A higher number of punctures correlates with increased CRBSI risk, as multiple punctures may increase opportunities for local tissue damage and microbial invasion. This suggests that efforts should be made to improve the success rate of single-puncture catheterization, performed by experienced healthcare providers, while strictly adhering to aseptic principles. Age in days at catheter placement is another critical factor. This study found a higher CRBSI incidence in infants placed within ≤7 days of life (e.g., 57% vs 43%). In the early postnatal period, preterm infants exhibit dual deficiencies in innate and adaptive immunity, coupled with an immature skin barrier function, making invasive procedures a potential portal for infection.^27,28 Therefore, for extremely preterm infants or very low birth weight infants requiring catheterization within the first week of life, stricter management and monitoring strategies should be implemented throughout the decision-making process, insertion procedure, and subsequent maintenance. Patient Baseline Characteristics: Birth weight, as an unmodifiable factor, has been consistently demonstrated in this study and multiple previous investigations to be negatively correlated with infection risk.²⁸ A U.S. study indicated that for every 100 g decrease in birth weight, the risk of bloodstream infection increases by 9.0%.²⁹ This suggests that preterm infants with lower birth weights should be prioritized for catheter-related infection prevention and control. Departments are advised to consider establishing specialized vascular access management teams to implement systematic, meticulous end-to-end catheter management for these high-risk patients, thereby comprehensively reducing infection risk.

Predictive performance and model interpretation

After completing feature selection and data preprocessing, we compared the performance of four machine learning algorithms in predicting the risk of PICC-associated bloodstream infections in preterm infants. Evaluation on an independent validation set revealed that the RF model demonstrated optimal discriminative performance, achieving an AUC of 0.934, an accuracy rate of 0.905, a precision of 0.625, and a specificity of 0.929. These results align with previous studies, indicating that the Random Forest model effectively captures nonlinear relationships in high-dimensional, complex clinical data by integrating multiple decision trees, demonstrating good predictive stability and discriminative capability.^26,30 Beyond predictive performance, we applied the SHAP method to enhance model interpretability. SHAP analysis does not provide insights into mechanisms or causality but quantifies the relative contribution of each variable to model predictions, thereby increasing transparency and facilitating clinical understanding of how input features influence risk assessment. This interpretability approach supports the use of the RF model as a risk stratification tool, while acknowledging the need for further external validation to confirm its robustness and generalization capability.
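As a consistency check on the validation-set figures, one can search for the confusion matrices that reproduce the reported accuracy, precision, and specificity to three decimals. The sketch assumes the metrics were computed on all 147 validation infants, 20 of whom had CRBSI as stated in the Results. The resulting counts are an inference and not values reported in the text.

```python
def consistent_confusion_matrices(n_pos, n_neg, accuracy, precision,
                                  specificity, tol=0.0005):
    """List (TP, FP, FN, TN) tuples that match the metrics to three decimals."""
    matches = []
    for tp in range(n_pos + 1):
        for tn in range(n_neg + 1):
            fp, fn = n_neg - tn, n_pos - tp
            if tp + fp == 0:
                continue  # precision is undefined with no positive predictions
            acc = (tp + tn) / (n_pos + n_neg)
            prec = tp / (tp + fp)
            spec = tn / n_neg
            if (abs(acc - accuracy) < tol
                    and abs(prec - precision) < tol
                    and abs(spec - specificity) < tol):
                matches.append((tp, fp, fn, tn))
    return matches

# 20 CRBSI and 127 non-CRBSI infants in the validation set (from the Results).
matches = consistent_confusion_matrices(20, 127, 0.905, 0.625, 0.929)
print(matches)
```

Under these assumptions a single matrix fits: 15 true positives, 9 false positives, 5 false negatives, and 118 true negatives. This would imply a recall of 15/20 = 0.75, which should be checked against Table 2.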

Implications for clinical practice

The core value of the predictive model developed in this study lies in providing objective evidence for clinical decision-making through systematic risk assessment. Based on the interpretation of key predictive variables, this model may support clinical practice in the following ways. First, the model facilitates early risk identification and stratified intervention. By integrating dynamic indicators such as CRP, white blood cell count, and respiratory rate, the model can identify pediatric patients at high risk of PICC-CRBSI before blood culture results are available or when clinical presentations are atypical. This prompts clinicians to intensify monitoring frequency and escalate nursing interventions (e.g., stricter aseptic technique, earlier empirical antimicrobial evaluation) for high-risk patients, thereby advancing the point of intervention. Second, the model’s analytical findings provide evidence for optimizing catheter management protocols. The study revealed that increased puncture attempts and early catheter placement (within ≤7 days of age) correlate with heightened infection risk. This suggests clinical practice should prioritize improving first-puncture success rates and assigning procedures to experienced teams. For extremely low/very low birth weight infants requiring catheterization within the first week of life, the necessity of placement should be carefully weighed, accompanied by stricter post-insertion monitoring. By reducing unnecessary punctures and rigorously controlling catheterization duration, infection risks can be mitigated at the source. Third, the model supports differentiated management and rational resource allocation. Accurate identification of low-risk patients avoids excessive interventions (such as unnecessary antibiotic use or frequent catheter changes), thereby reducing healthcare resource consumption and unnecessary patient exposure risks. This allows limited infection control resources to be concentrated on high-risk populations.
It is important to note that this model is not intended to replace comprehensive clinical judgment or microbiological diagnosis. Rather, it serves as an auxiliary tool providing quantitative risk reference during the early infection assessment phase to enhance the precision and timeliness of clinical decision-making. Integrating the model into clinical information systems and conducting prospective application studies represent critical steps for validating its practical efficacy and translational value.
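The trade-off between acting on high-risk infants and avoiding unnecessary interventions is what decision curve analysis quantifies through net benefit. The following is a minimal sketch of that calculation, not the authors' code: the function names, the toy outcomes and predicted risks, and the 0.5 threshold are all hypothetical and chosen only for illustration.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on infants whose predicted risk >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: act on every infant regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical toy data, NOT taken from the study cohort.
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
y_prob = [0.90, 0.10, 0.30, 0.70, 0.05, 0.20, 0.60, 0.40, 0.15, 0.10]

nb_model = net_benefit(y_true, y_prob, threshold=0.5)
nb_all = net_benefit_treat_all(y_true, threshold=0.5)
print(f"model: {nb_model:.3f}, treat all: {nb_all:.3f}")
```

A model is only useful at a given threshold if its net benefit exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is zero by definition.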

Limitations and future research

This study has several limitations. First, the model was developed and validated in a single-center prospective cohort and underwent only internal validation. Its generalizability across different institutions, populations, and clinical practice settings requires further verification. Future research should prioritize multicenter, large-sample external validation to objectively assess the model’s calibration and discriminative performance and to evaluate its robustness and applicability across diverse settings.³¹ Second, the model constructed in this study predicts the overall risk of CRBSI and cannot yet distinguish specific pathogen types. Future research may explore incorporating additional microbiological or host immune characteristics to develop models capable of predicting infections caused by specific pathogens (e.g., Gram-negative vs. Gram-positive bacteria). Such models could provide decision support for the selection of early empirical antibiotics, thereby helping to optimize antimicrobial stewardship, curb the emergence of drug-resistant bacteria, and potentially improve patient outcomes.
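As an illustration of what an external validation would measure, the sketch below computes one discrimination metric (AUC) and two simple calibration summaries (Brier score and calibration-in-the-large) from outcomes and predicted risks. It is a plain-Python sketch under stated assumptions, not the authors' pipeline: the function names are hypothetical and the eight-infant "external cohort" is invented toy data.

```python
def auc(y_true, y_prob):
    """AUC as the probability that a random case outranks a random non-case."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    # Ties between a case and a non-case count as half a win.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def calibration_in_the_large(y_true, y_prob):
    """Mean predicted risk minus observed event rate (0 is ideal)."""
    return sum(y_prob) / len(y_prob) - sum(y_true) / len(y_true)


# Hypothetical external cohort, NOT data from this study.
ext_true = [1, 0, 0, 1, 0, 0, 0, 1]
ext_prob = [0.8, 0.2, 0.4, 0.6, 0.1, 0.7, 0.3, 0.5]

ext_auc = auc(ext_true, ext_prob)
ext_brier = brier_score(ext_true, ext_prob)
ext_citl = calibration_in_the_large(ext_true, ext_prob)
print(f"AUC={ext_auc:.3f}, Brier={ext_brier:.3f}, CITL={ext_citl:+.3f}")
```

A positive calibration-in-the-large value indicates that the model overestimates risk on average in the new setting, which can occur even when the AUC is preserved.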

Conclusion

In summary, this study developed a machine learning-based prediction model using routinely collected clinical data for the early assessment of PICC-associated bloodstream infection risk in preterm infants. The random forest model demonstrated good discriminatory performance in internal validation and identified several clinical, physiological, and laboratory variables influencing risk assessment. These findings should be interpreted as hypothesis-generating results reflecting associations within this specific cohort, rather than evidence of causality or direct clinical benefit. Future prospective, multicenter studies are needed to validate the model’s performance, assess its robustness across settings, and determine its potential role in supporting clinical risk assessment for suspected PICC-related bloodstream infections in preterm infants.

Data availability

The data used to support the findings of this study are not publicly available due to the need to protect the privacy of neonatal participants, but are available from the corresponding author on reasonable request.

References

Teibel, H., Hood, K., Manasco, K. & Bhatia, J. Antibiotic administration prior to central venous catheter removal in neonates. J. Pharm. Pract. 34, 894–900 (2021).
Ponticelli, E. et al. Complete blood count collected via venipuncture versus peripherally inserted central catheter in hematological patients: a comparison of 2 methods. Cancer Nurs. 45, E36–E42 (2022).
van Tonder, D. J., Keough, N., van Niekerk, M. L. & van Schoor, A. N. The position of the common facial vein in neonates: an alternate route for central venous catheter placement. Clin. Anat. 34, 644–650 (2021).
Brown, R. & Burke, D. The hidden cost of catheter related blood stream infections in patients on parenteral nutrition. Clin. Nutr. ESPEN 36, 146–149 (2020).
Wu, Y. et al. A review of neonatal peripherally inserted central venous catheters in extremely or very low birthweight infants based on a 3-year clinical practice: complication incidences and risk factors. Front. Pediatr. 10, 987512 (2022).
Klinger, G. et al. Late-onset sepsis in very low birth weight infants. Pediatrics 152, e2023062223 (2023).
Bierlaire, S., Danhaive, O., Carkeek, K. & Piersigilli, F. How to minimize central line-associated bloodstream infections in a neonatal intensive care unit: a quality improvement intervention based on a retrospective analysis and the adoption of an evidence-based bundle. Eur. J. Pediatr. 180, 449–460 (2021).
Lu, X. et al. Sheathless and high-throughput elasto-inertial bacterial sorting for enhancing molecular diagnosis of bloodstream infection. Lab Chip 21, 2163–2177 (2021).
Triantafyllidis, A. et al. Computerized decision support and machine learning applications for the prevention and treatment of childhood obesity: a systematic review of the literature. Artif. Intell. Med. 104, 101844 (2020).
Fernandes, M. et al. Predicting intensive care unit admission among patients presenting to the emergency department using machine learning and natural language processing. PLoS ONE 15, e0229331 (2020).
Li, X. H., Yang, X. L., Dong, B. B. & Liu, Q. Predicting 28-day all-cause mortality in patients admitted to intensive care units with pre-existing chronic heart failure using the stress hyperglycemia ratio: a machine learning-driven retrospective cohort analysis. Cardiovasc. Diabetol. 24, 10 (2025).
Albu, E. et al. Hospital-wide, dynamic, individualized prediction of central line-associated bloodstream infections-development and temporal evaluation of six prediction models. BMC Infect. Dis. 25, 597 (2025).
Gao, S. et al. Systematic review finds risk of bias and applicability concerns for models predicting central line-associated bloodstream infection. J. Clin. Epidemiol. 161, 127–139 (2023).
Colacchio, K., Deng, Y., Northrup, V. & Bizzarro, M. J. Complications associated with central and non-central venous catheters in a neonatal intensive care unit. J. Perinatol. 32, 941–946 (2012).
Bu, Z. J. et al. Introduction and example analysis of common sample size estimation methods in clinical prediction model construction. Mod. Chin. Med. Clin. 31, 32–37 (2024).
Mermel, L. A. et al. Clinical practice guidelines for the diagnosis and management of intravascular catheter-related infection: 2009 update by the Infectious Diseases Society of America. Clin. Infect. Dis. 49, 1–45 (2009).
Gordon, A., Greenhalgh, M. & McGuire, W. Early planned removal versus expectant management of peripherally inserted central catheters to prevent infection in newborn infants. Cochrane Database Syst. Rev. 6, CD012141 (2018).
Rosenthal, V. D. et al. International Nosocomial Infection Control Consortium (INICC) report, data summary of 45 countries for 2012–2017: device-associated module. Am. J. Infect. Control 48, 423–432 (2020).
Ren, X. L. et al. Ultrasound to localize the peripherally inserted central catheter tip position in newborn infants. Am. J. Perinatol. 38, 122–125 (2021).
Jansen, S. J. et al. Central-line-associated bloodstream infection burden among Dutch neonatal intensive care units. J. Hosp. Infect. 144, 20–27 (2024).
Wang, J. et al. A risk prediction model for physical restraints among older Chinese adults in long-term care facilities: machine learning study. J. Med. Internet Res. 25, e43815 (2023).
Papadimitriou-Olivgeris, M. et al. Molecular characteristics and predictors of mortality among gram-positive bacteria isolated from bloodstream infections in critically ill patients during a 5-year period (2012–2016). Eur. J. Clin. Microbiol. Infect. Dis. 39, 863–869 (2020).
Futamura, A. et al. Factors associated with mortality in patients with catheter-related bloodstream infection: a multicenter retrospective study. In Vivo 38, 3041–3049 (2024).
Poggi, C., Lucenteforte, E., Petri, D., De Masi, S. & Dani, C. Presepsin for the diagnosis of neonatal early-onset sepsis: a systematic review and meta-analysis. JAMA Pediatr. 176, 750–758 (2022).
Gilfillan, M. & Bhandari, V. Biomarkers for the diagnosis of neonatal sepsis and necrotizing enterocolitis: clinical practice guidelines. Early Hum. Dev. 105, 25–33 (2017).
Jin, Y., Lan, A., Dai, Y., Jiang, L. & Liu, S. Development and testing of a random forest-based machine learning model for predicting events among breast cancer patients with a poor response to neoadjuvant chemotherapy. Eur. J. Med. Res. 28, 394 (2023).
Al-Matary, A. et al. Correlation between bronchopulmonary dysplasia and cerebral palsy in children: a comprehensive analysis using the national inpatient sample dataset. Children 11, 1129 (2024).
Emeriaud, G. et al. Executive summary of the second international guidelines for the diagnosis and management of pediatric acute respiratory distress syndrome (PALICC-2). Pediatr. Crit. Care Med. 24, 143–168 (2023).
Papoff, P. et al. The starting rate for high-flow nasal cannula oxygen therapy in infants with bronchiolitis: is clinical judgment enough? Pediatr. Pulmonol. 56, 2611–2620 (2021).
Cruz, A. F. et al. Sustained vs. intratidal recruitment in the injured lung during airway pressure release ventilation: a computational modeling perspective. Mil. Med. 188, 141–148 (2023).
Bliss, J. M. & Wynn, J. L. Editorial: the neonatal immune system: a unique host-microbial interface. Front. Pediatr. 5, 274 (2017).

Acknowledgements

We sincerely thank the Neonatal Intensive Care Unit and the Medical Laboratory Department of a tertiary hospital in Changzhi City, Shanxi Province for their strong support. At the same time, we express our heartfelt gratitude to the department director and head nurse for their encouragement and assistance during this study.

Funding

This work was supported by the Changzhi City Basic Research Program (Free Exploration Category), Project No. JC202439.

Author information

Authors and Affiliations

Neonatal Department of Changzhi Maternity and Child Health Hospital, Changzhi, Shanxi, China
Yongqin Guo, Yingying Dou & Li Wang
Hospital Office of Changzhi Maternity and Child Health Hospital, Changzhi, Shanxi, China
Wenxia Song & Lihong Wang

Contributions

Y.Q.G.: Methodology, writing of the original draft. Y.Y.D.: Data cleaning and analysis, model validation. W.X.S.: Thesis guidance. L.H.W.: Project administration. L.W.: Supervision.

Corresponding author

Correspondence to Yongqin Guo.

Ethics declarations

Competing interests

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Ethical approval

This study was reviewed and approved by the Medical Ethics Committee of Changzhi Maternal and Child Health Hospital (Ethics Approval No: CZSFYLL2024.026) and complies with the ethical principles of the Declaration of Helsinki. As a prospective study, informed consent was obtained from all participants. During data collection and processing, all personal information was anonymized using coded identifiers to protect participant privacy.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information (DOCX)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Guo, Y., Dou, Y., Song, W. et al. Machine learning and SHAP-based risk assessment of PICC-related bloodstream infections in premature infants at the time of clinical suspicion. Pediatr Res (2026). https://doi.org/10.1038/s41390-026-05049-6

Received: 20 November 2025
Revised: 24 March 2026
Accepted: 06 April 2026
Published: 07 May 2026
Version of record: 07 May 2026
DOI: https://doi.org/10.1038/s41390-026-05049-6
