Introduction

Sepsis is typically characterized by a dysregulated immune system response to bodily pathogens1,2. While it can impact individuals of all ages, sepsis poses a particularly grave threat to children3. In 2017, approximately 20.3 million new cases of sepsis and 2.9 million deaths occurred in children under five years of age, with the burden greatest in low-resource settings and in rural and remote communities4,5,6. Prevention, early recognition, and treatment are critical, as sepsis can significantly damage various organ systems7. Following a diagnosis, one-third of children may experience a new disability or fail to fully recover their previous health-related quality of life after one to three months8,9,10. The World Health Organization has positioned sepsis prevention, recognition, and treatment as a global priority, and there remain significant opportunities to improve pediatric sepsis prediction through the appropriate design and implementation of digitalized clinical tools3,11, including data-driven algorithms, machine learning (ML), and artificial intelligence (AI).

Compared with adults and neonates, the design of technologies for predicting pediatric sepsis—the focus of this review—is uniquely challenging, with emergency departments potentially missing 7.1% of cases in children aged 1–19 years old12. The sources and development of infection, maturation-based differences, age-based clinical differences, commonalities with other conditions and syndromes, and diagnostic modalities contribute to these complexities. For example, compared to adults, the absence of standardized diagnostic criteria and age-adjusted cardiovascular and respiratory vital signs make it challenging to distinguish symptom presentation from heterogeneous clinical baselines in children with comorbidities, common febrile infections, and other syndromes such as Kawasaki disease and multisystem inflammatory syndrome in children13. Recognizing early signs of sepsis in children is also complicated by their limited ability to articulate their symptoms and their capacity to maintain physiological function until reaching an advanced stage of septic shock compared to adults14.

Among children, sepsis etiology significantly differs by age and varies globally15. In neonates, different bacterial and viral pathogens are more commonly acquired in-hospital within high-income countries and through vertical transmission from the vaginal tract, or community-acquired in low- and middle-income countries15. In older children, the most common infection sources are through the respiratory tract and bloodstream, and viral sepsis accounts for almost one-third of childhood sepsis cases8. The neonatal adaptive and innate immune response is also distinctly altered compared to the developing immune system of older children; it is further complicated by differences in clinical signs and biomarkers16,17,18. For diagnosis, many clinicians prioritize microbiological findings over nonspecific organ dysfunction criteria in neonates19,20,21, whereas multiorgan dysfunction is prioritized in pediatric and adult populations1,22.

The current deployment of data-driven technologies focuses mainly on adult and neonatal sepsis prediction23,24, neglecting the non-neonatal pediatric age group. This age group has also been neglected in previous reviews25,26,27,28,29,30,31,32,33,34,35,36. Indeed, existing systematic and scoping reviews often exclude young infants and adolescents27,28,29,30,31,33,37,38 while also excluding the engineering disciplines from their searches25,30,31,39,40 and limiting their scope to specific ML tools41. Reviews that have not excluded young infants have not been comprehensive, reviewing only one25 or three36,42 non-neonatal pediatric articles, thereby lacking the specificity needed to inform technology development for this intermediate pediatric age range. However, prior reviews highlight the potential for non-invasive predictive measures30 and the possibility of enhancing early detection of sepsis by integrating prediction models with clinical judgment31. They also identify barriers to implementation, including the risk of bias, overfitting, and the absence of standardized validation protocols32. Yet, beyond the absence of a comprehensive pediatric-focused review, it is also important to recognize that most of these prior reviews primarily report on quantitative performance metrics. These reviews do not consider how these systems are designed for the clinical environment or the factors that may facilitate or hinder effective clinician interaction, such as situational awareness, interface design, and the timing of an alert25,27,28,29,30,31,32,33,34,35,36. With the anticipated widespread use of AI in healthcare in the coming decade, a thorough analysis of the existing literature on data-driven technologies for predicting pediatric sepsis is essential for directing research efforts to ensure appropriate and effective integration for this age group.

The literature on pediatric sepsis prediction technologies is highly varied, with differences in study design, development approaches, implementation stages, and evaluation methods. This variability highlights the importance of assessing the potential scope and depth of the available pediatric sepsis prediction research. Importantly, in this scoping review, we address the limitations of prior reviews by exploring both the data-driven development aspects and the human factors considerations of sepsis, severe sepsis, and septic shock prediction systems in the pediatric population. Focusing on young infants, children, and adolescents, the objective of this review is two-fold: (1) to explore current design strategies and performance metrics of developed technologies across healthcare settings, thereby providing valuable insights for future research and development, and (2) to critically analyze human factors aspects within the literature to inform current and future integration needs and research directions that support safe and effective clinical use. Our research questions are specifically designed to address the previously identified gaps and objectives and inform future clinical applications of this technology: (1) What are the current design and performance characteristics of data-driven pediatric sepsis technologies? (2) How have developed systems considered human factors in their design for clinical use? (3) What specific challenges and research gaps need to be addressed to inform the future clinical implementation of these technologies?

Results

There were 36,757 records identified. After removing 9,586 duplicates and screening titles and abstracts, 163 records were assessed for inclusion, resulting in 27 articles in the final review (Supplementary Fig. 1). Most articles (23 [85%]) involved single-center datasets (Table 1). Four (15%) were multi-site studies: one combined datasets from multiple continents, including low-middle-income countries43, and three (11%) used more than one healthcare site within North America or Australia44,45,46. There were 17 (63%) articles using development datasets from North America, six (22%) from Asia, three (11%) from Europe, two (7%) from Africa, and two (7%) from Oceania. Included healthcare settings were mostly emergency departments (17 [63%]) and intensive care units (15 [56%]). Six (22%) articles included inpatient units. The completed extraction template is included in Supplementary Data 1, with the categorization and organization of the model features in Supplementary Data 2.

Table 1 Characteristics of the included studies

Endpoint definitions

Most articles (20 [74%]) focused on predicting sepsis rather than severe sepsis or septic shock. International consensus definitions and guidelines were used to define sepsis endpoints in most articles (16 [59%] of 27), including Goldstein et al.47, Rhodes et al.48, Singer et al.22, and Weiss et al.14, as well as criteria from Eisenberg et al.49, Scott et al.50, and Sepanski et al.51, with modifications (Table 2)52,53. Other definitions included treatment or diagnostic codes (8 [30%] of 27). Less frequent approaches were mortality or extracorporeal membrane oxygenation use, Delphi processes, and outcomes from another screening algorithm43,54,55,56. One article led to the most recent international consensus criteria for pediatric sepsis and septic shock: the Phoenix Sepsis Score43.

Table 2 Summary of endpoint definitions for pediatric sepsis, severe sepsis, and septic shock

We noted variability in how clinicians defined sepsis46,57,58. Changes in treatment practices between data collection and model development were also noted, as were expected changes in diagnostic definitions requiring model re-validation59,60. One article mentioned the risk of including overtreated patients through the use of intention and treatment-based definitions61.

Model development approaches

More than half of the articles (16 [59%]) used logistic regression (Table 3)45,46,53,54,55,56,57,58,60,61,62,63,64,65,66,67,68. Others used gradient boosting (6 [22%])52,57,69,70,71,72 and random forest (4 [15%])55,59,60,62. Less common approaches were support vector machines or maximization (2 [7%])62,72, stacked or classification and regression tree modelling (2 [7%])43,62, tree augmented or Naïve Bayes (3 [11%])58,62,72, elastic net regularization (1 [4%])44, neural networks (1 [4%])55, and empirical derivations for an age and temperature-adjusted index score (1 [4%])73.

Table 3 Summary of development and performance outcomes reported for each article by sepsis-related endpoint

Patient demographics

We extracted the sample size based on the number of patients from most articles (19 [70%]), which had a median of 1238 (IQR 289–4403). Some reported the number of visits (3 [11%]), encounters (5 [19%]), or admissions (2 [7%]), which had a combined median of 35330 (IQR 2464–141510). Many articles (16 [59%]) reported median age statistics, ranging from 13.1 months to 10 years. One article specifically tested its models on different age groups, such as one-month to one-year-olds and those greater than one year old58. Twenty-two articles (81%) reported patient sex, and 10 (37%) included race or ethnicity data. Ten articles (37%) referenced the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guidelines43,44,45,46,54,58,59,61,64,65.

Most articles (20 [74%]) had a class imbalance in their datasets, with a low prevalence of sepsis-related outcomes (0.05% to 19.38%). Five articles (19%) used datasets in which more than half of the patients were classified with a sepsis-related outcome64,65,66,68,72, with a prevalence ranging from 50.80% to 74.50% and sample sizes ranging from 65 to 667. One article normalized a pediatric sepsis dataset to an adult sepsis dataset because of the lack of readily available data71.

Other patient risk factor characteristics were included in 16 (59%) articles. For example, one study focused on septic shock outcomes in a post-chemotherapy population63, and another included Kawasaki disease as a differentiating outcome from sepsis68. Other health and socio-demographic risk factors that were characterized across studies were those related to immunodeficiency or immunosuppression56,61, blastomas, bone marrow transplantation, aplastic anemia, organ transplantation57,67, healthcare utilization and diagnoses history44,45,69, hospital category (e.g., quaternary, dedicated pediatric, mixed) or resource setting43,46, surgical status59, complex and chronic conditions and comorbidities43,44,45,56,70, history of prematurity56,58, insurance status44,45,61,62, weight or malnutrition43,46,54, and medical technology needs43,44,45,59,61.

Predictive features

The minimum number of features used in any article was three; the maximum was 107. The average number of features used across all models was 17.3 (SD = 19.6). Of the 27 articles in this review, we extracted ranked feature information from 29 individual models among 17 (63%) articles. For models defining a sepsis endpoint, the six highest-ranking predictive features were age, platelet count, temperature, immunocompromised status, heart rate, and serum lactate (Supplementary Fig. 2). For severe sepsis endpoints, the six highest-ranking predictive features were mean heart rate, maximum diastolic blood pressure, age, maximum heart rate, minimum systolic blood pressure, and heart rate (Supplementary Fig. 3). For septic shock endpoints, the six highest-ranking predictive features were serum lactate, platelet count, maximum heart rate, maximum mean arterial pressure, Glasgow Coma Scale score, and respiratory sequential organ failure assessment score (Supplementary Fig. 4).

When categorized broadly (Fig. 1), blood pressure had the highest category ranking for each sepsis-related endpoint, followed by heart rate features for severe sepsis and septic shock, and thermoregulation for sepsis (e.g., temperature, fever, and hypothermia). Hematologic laboratory values (e.g., hematocrit, white blood cell count, platelet count, and basophils), immunological risk factors (e.g., up-to-date immunization, oncological comorbidity, cancer relapse, and chemotherapy), oxygen requirement (e.g., oxygen saturation and fraction of inspired oxygen), and demographics (e.g., age and home zip code) were also among the higher-ranking feature groups across the sepsis-related endpoints. Laboratory values were used in sepsis and septic shock models but not in the severe sepsis models included in this review.

Fig. 1: Frequency-weighted feature group rankings based on the top 20 extracted features.

The normalized rankings of combined feature groups used in models for pediatric (a) sepsis, (b) severe sepsis, and (c) septic shock, weighted by how often each feature group appeared within the three endpoints and listed from highest (top) to lowest (bottom) median value. The individual features included within the developed categorizations are available in Supplementary Data 2. The data contained in this graph are from 17 articles and 29 models reporting ranked features for pediatric sepsis, severe sepsis, and septic shock endpoints, with some models including more than one sepsis-related endpoint.

Most articles without ranked features had fewer than 20 total features, with one exception58. One article ranked features for differentiating Kawasaki disease from sepsis68. When examining the frequency of the remaining top 20 feature groups from all developed models, blood pressure, heart rate, and thermoregulation were the most frequently occurring for each sepsis-related endpoint (Fig. 2). Hematologic laboratory values were the highest occurring laboratory feature group for sepsis and septic shock, followed by lactate. Electrolyte laboratory values, perfusion (e.g., capillary refill), parent and caregiver factors, and anthropometric and nutritional factors only appeared in sepsis modelling approaches. The renal function indicator feature (i.e., urine output) only appeared in septic shock modelling approaches.

Fig. 2: Overall frequency of categorized features for pediatric sepsis, severe sepsis, and septic shock endpoints.

The frequency of each feature category is listed in descending order from highest to lowest occurrence. The individual features included within the developed categorizations are available in Supplementary Data 2. All articles with 20 or fewer features are included. For articles with more than 20 features, only the top 20 are included. Articles with a ranking for Kawasaki Disease vs Sepsis (n = 1) and articles with more than 20 non-ranked features (n = 1) are excluded from this figure. The “Biomarkers: Genes” category was only used in two articles, accounting for 100% of the model features when used, and so is only counted once for septic shock and twice for sepsis in this figure.

Articles mentioned feature limitations such as having routine or non-routine access to laboratory or diagnostic information44,45,46,53,59,69,70,74, differences in handling or excluding missing features or documentation45,57,58,62,65, and concerns about human error43,44,69,73. We also noted slightly different language for similar features (e.g., “perfusion” and “capillary refill”), feature exclusion due to reliability concerns63, including only the most concerning feature values within a time range58, using qualitative variables65, and not knowing a patient’s treatment time74.

Performance metrics

The average area under the receiver operating characteristic curve (AUROC) for sepsis endpoint logistic regression-based models developed into a score-based tool was 0.84. For the linear and tree-based sepsis models, the average AUROCs were 0.81 and 0.85, respectively. For the severe sepsis models, the average AUROCs for the linear and tree-based approaches were 0.74 and 0.84, respectively. For septic shock models, the average AUROCs for logistic regression-based models developed into a score-based tool, linear models, and tree-based models were 0.81, 0.83, and 0.89, respectively, with the latter average heavily weighted by two articles with AUROCs above 0.90 at multiple prediction time points57,72.
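As a point of reference for the AUROC values reported here, the metric can be read as the probability that a randomly chosen patient with the outcome receives a higher model score than a randomly chosen patient without it. The following is a minimal Python sketch of that rank-based calculation. The function name `auroc` and the toy labels and scores are illustrative assumptions and are not drawn from any included article.

```python
def auroc(labels, scores):
    """Rank-based AUROC: the fraction of (positive, negative) pairs in which
    the positive case has the higher score; tied pairs count as 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = sepsis-related outcome, 0 = no outcome.
toy_labels = [0, 0, 1, 0, 1, 1, 0, 1]
toy_scores = [0.10, 0.35, 0.40, 0.20, 0.80, 0.65, 0.50, 0.90]
toy_auroc = auroc(toy_labels, toy_scores)  # 15 of 16 pairs are ordered correctly
```

The pairwise loop is quadratic in the number of cases and is written for readability; a production analysis would more likely rely on an established statistics library.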

Few articles (5 [19%]) reported the area under the precision-recall curve (AUPRC)43,61,62,69,72. Three articles (11%) reported AUPRCs less than 0.50 for sepsis, severe sepsis, and septic shock, using stochastic gradient boosting, regression-based scoring tools, or random forest methods43,62,69. Two articles (7%) reported AUPRCs higher than 0.90 for a logistic regression-based scoring tool and gradient boosting model for sepsis and septic shock61,72. Most articles reported sensitivity (20 [74%]) and specificity (19 [70%]). Some reported positive predictive value (16 [59%]), negative predictive value (12 [44%]), positive or negative likelihood ratios (4 [15%]), and F1 scores (3 [11%]).
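The threshold-dependent metrics listed in this paragraph all derive from the same four confusion-matrix counts, which is one reason reporting them together aids comparison across articles. The sketch below, in Python, shows the standard definitions. The function names and the counts in the usage example are hypothetical and do not come from any included study.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def screening_metrics(tp, fp, tn, fn):
    """Threshold-dependent metrics from confusion-matrix counts."""
    return {
        "sensitivity": _ratio(tp, tp + fn),        # true positive rate (recall)
        "specificity": _ratio(tn, tn + fp),        # true negative rate
        "ppv": _ratio(tp, tp + fp),                # positive predictive value (precision)
        "npv": _ratio(tn, tn + fn),                # negative predictive value
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),    # harmonic mean of PPV and sensitivity
        # Likelihood ratios written in terms of counts:
        "lr_pos": _ratio(tp * (tn + fp), fp * (tp + fn)),  # sensitivity / (1 - specificity)
        "lr_neg": _ratio(fn * (tn + fp), tn * (tp + fn)),  # (1 - sensitivity) / specificity
    }


# Hypothetical screening result: 1000 encounters, 50 with the outcome (5% prevalence).
toy = screening_metrics(tp=40, fp=90, tn=860, fn=10)
```

With these made-up counts, sensitivity is 0.80 while the positive predictive value is roughly 0.31, which illustrates how a tool can detect most cases and still generate mostly false alerts when the outcome is uncommon.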

Automation type

We categorized nine articles (33%) as “analysis automation,” which is the automation of algorithms or other calculations to inform clinicians about certain sepsis criteria being met53,55,57,62,69,70,71,73,74. We categorized 18 (67%) articles as “decision automation,” in which the article described more details about how the technology provides additional information to support sepsis suspicions, stratify risk or triage levels, identify workflow actions, or achieve greater situational awareness43,44,45,46,52,54,56,58,59,60,61,63,64,65,66,67,68,72. All articles included manual provider-entered information within an electronic health record. Two articles (7%) involved “acquisition automation” through collecting data without human interaction75, such as continuous physiological monitoring55,60.

Prediction timings

Prediction timings included the onset of the sepsis-related condition52,71, within five to seven seconds after initial triage69, and between two and 12 hours before onset52,55,70,73. Other prediction timings were within 24 hours60,61,67 and up to 24 hours before sepsis-related outcome onset57. For nine (33%) articles, clinical suspicion was first required to initiate the model and provide a result44,45,46,58,59,64,65,66,72. The four (15%) remaining articles were identified as being used during a patient encounter43,62,63,68.

The prediction timings also varied with respect to other design and performance elements (Table 3). For example, one XGBoost model57 predicted the onset of septic shock 24 hours in advance across over 1200 patients with hematological malignancies in the inpatient unit (median age 58 months) using 24 features, with a reported AUROC of 0.93. One decision tree ensemble model52 predicted the onset of severe sepsis four hours in advance across over 9400 patients in the inpatient unit (median age 10 years) using seven features, with a reported AUROC of 0.718. Comparatively, the Phoenix Sepsis Criteria43, a regression-model informed integer-based scoring tool using up to 13 variables to identify sepsis and septic shock across over 200000 patient encounters in high- and low-income healthcare settings (median age 2.6–3.7 years), had a reported AUROC of 0.71–0.96 and an AUPRC of 0.14–0.48.

Interface and interaction design

Few articles described specific details related to how clinicians would interact or interface with the developed model or alerting systems. Two articles (7%) described a two-tiered notification system: an “alert” tier, with a score greater than or equal to 30, to prompt a team-based bedside evaluation, or an “aware” tier, with a score greater than or equal to 15, to increase situational awareness, including highlighting the score in a bright colour upon opening the patient’s electronic health record chart61,67. The purpose was to reduce false alerts and unnecessary and unplanned team huddles while ensuring children who met the “aware” threshold were discussed during rounds, handoffs, and planned huddles61,67. The screening scores were visualized on patient lists and on-screen banners, updated in real-time to prompt a huddle at the bedside67. The clinical variables and their scoring values contributing to the overall score were also displayed, including provider-specific steps when the “alert” tier was activated to call a sepsis huddle. Clinician feedback from an urban quaternary children’s hospital indicated higher compliance with the sepsis huddle process and minimal alert fatigue, but this was partly attributed to pre-implementation and pilot testing67.

Other articles mentioned a tier-like system design to identify risk levels or support triage of the most at-risk children54,56, support risk stratification43,46,52,56,60, identify low-risk post-chemotherapy children63, and identify children who are at a higher risk of hypotensive septic shock45. One article combined their data-driven sepsis decompensation risk score with a red or yellow stoplight colour to indicate patient status53, available to all emergency department staff and clinicians74. The red or yellow stoplight colour specified how often a bedside huddle was needed for the patient, which was every 30 or 60 min, respectively74. A tracking board was used to alert all clinicians with a checkerboard flag displayed if the sepsis score indicated a child at risk, suggesting the need for a bedside assessment, with the most recent risk score shown next to the patient on the tracking board for increased situational awareness74.

Clinical implementation and decision-making

Three (11%) technologies were implemented electronically53,67,74, one (4%) non-electronically46, and one (4%) in silent mode61. One-third (9 [33%]) of the articles mentioned specific human factors considerations regarding implementation44,45,46,58,59,61,67,72,74. This included descriptions of implementation within sepsis workflows46,61,67, the impact of the technology on team-based activities, including huddles and team situational awareness61,67,74, and considerations for minimizing alert fatigue61,67. Two articles found positive impacts on huddle compliance67,74, and after electronic implementation, one found a significant improvement in team-based assessments, situational awareness, patient huddles, and communication, as well as increased fluid bolus and antibiotics administration by three hours post-activation74. However, clinicians were not always blinded to the tool, which may have biased their decisions to diagnose more or fewer patients, or the study was not powered to determine the impact on patient outcomes46. Additionally, one study mentioned that early treatment or transfers may inadvertently perturb the measured outcome or lead to overusing healthcare resources67.

For decision support, two articles differentiated sepsis from other conditions, such as non-infectious systemic inflammatory response syndrome or Kawasaki disease59,68. One-third of the articles (9 [33%]) mentioned that users would interact with their system after clinician-initiated suspicion44,45,46,58,59,64,65,66,72. The predictive features would be entered into a smartphone-based calculator application54,66, a web calculator59,72, or an electronic health record43,44,45 once a provider thought a child might develop sepsis or septic shock. The system would then return a probability risk score or other information to support or refute the provider’s judgment.
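To make concrete how a calculator of this kind could turn entered features into a probability risk score, the Python sketch below applies the standard logistic function to a weighted sum of inputs, since logistic regression was the most common modelling approach among the included articles. Every name and number here is a hypothetical placeholder: the intercept, the coefficients, and the standardized feature values are invented for illustration, are not taken from any included model, and have no clinical validity.

```python
import math


def risk_probability(intercept, coefficients, features):
    """Logistic regression prediction: sigmoid of the intercept plus the
    coefficient-weighted sum of the supplied feature values."""
    linear = intercept + sum(
        weight * features[name] for name, weight in coefficients.items()
    )
    return 1.0 / (1.0 + math.exp(-linear))


# HYPOTHETICAL values for illustration only; not from any published model.
example_intercept = -3.0
example_coefficients = {"heart_rate_z": 0.8, "temperature_z": 0.5, "lactate_z": 1.1}
example_features = {"heart_rate_z": 1.5, "temperature_z": 1.0, "lactate_z": 2.0}

example_risk = risk_probability(
    example_intercept, example_coefficients, example_features
)
```

A real tool would also need to handle missing inputs and communicate the uncertainty around the returned probability, neither of which this sketch attempts.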

While at least one of the models was not ultimately designed to be clinically implemented46, none of the articles using complex data-driven modelling explicitly described the explainable aspects of their technologies if they were to be implemented. Notably, one article explained that “black box” models do not support direct inferences about how specific features impact outcome predictions59. Given the diversity of patient characteristics and demographics, features, prediction timings, healthcare settings, outcome definitions, and performance outcomes, we did not identify detailed considerations for providing prediction uncertainty information or evaluating clinician interpretations of data-driven predictions beyond limited measurements of huddle and treatment outcomes.

Potential biases

We noted risks of bias potentially impacting testing and validation, which may introduce uncertainties for appropriate clinical use. For example, selection bias was reported in one screening tool’s performance68, and feature data may be collected at different times between different technologies70. As some models were only applied to patients for whom clinicians suspected sepsis, this may introduce bias in performance outcomes44,46. Some patient groups, such as children with oncological diseases63, were more heavily represented in some datasets, and the model may perform better or worse if used in a different patient group52,54,55,56,60,61. One model was developed and tested on the same patient cohort65. Some models were intentionally biased for maximum sensitivity to alert clinicians earlier than existing technologies55, and datasets may not have been specifically collected for data-driven algorithmic or AI prediction purposes52,56,57,59.

Discussion

This scoping review is the first to comprehensively identify and summarize the current design and performance of data-driven technologies for predicting pediatric sepsis-related endpoints in healthcare settings, including aspects of human factors related to clinical interaction and implementation. Our findings encompass 27 studies exploring endpoint definitions, prediction approaches, datasets, feature combinations, performance metrics, automation types, prediction timings, interface elements and implementation outcomes. There are crucial research directions to consider for the future development and integration of data-driven technologies for pediatric sepsis prediction, particularly in terms of feature use, data inclusivity with standardized reporting, and human factors. Research focusing on these aspects is essential to facilitate effective comparisons among technologies, support performance outcomes across heterogeneous patient cohorts, and enable safe clinical integration that augments—rather than replaces—clinical decision-making.

Higher-ranking individual predictive features for pediatric sepsis, severe sepsis, and septic shock generally included cardiovascular vital signs (e.g., heart rate, mean arterial pressure, and capillary refill), laboratory values (e.g., lactate and platelet counts), respiratory vital signs (e.g., oxygen saturation and PaO2:FIO2), and neurological vital signs (e.g., Glasgow Coma Scale score). These outcomes align with the latest pediatric sepsis and septic shock international consensus criteria, which can guide future model development1. However, it is also important to recognize that the identified lower-ranking or less frequently occurring features in this review should not necessarily be seen as unimportant for pediatric sepsis outcomes—this may simply reflect the overrepresentation of high-income healthcare settings in which models were more commonly developed. Lower-ranked features may also have resulted from fewer articles that reported ranked features and the lack of investigation into potentially underappreciated features across models, such as parental or caregiver concerns76, reducing their frequency of occurrence.

Among the sepsis endpoint models, age was a notably high-ranking feature. Indeed, identifying abnormal vital signs often includes age-based thresholds in pediatric sepsis consensus definitions, such as for mean arterial pressure and creatinine1,48. In this review, age was an individual predictor for sepsis-related endpoints, with included models identifying older children (e.g., five-year-olds) or increasing age as being at higher risk of sepsis60,62,69 and higher risks for one-year-olds than the very youngest patients60. These findings could be due to challenges with underreporting or under-recognition of vital sign abnormalities among older infants and preschoolers62 and to how abnormal vital sign thresholds change with age69. There is a potential for future research to consider age as a predictive feature in data-driven models, with further investigation into its importance.

In the few models with a severe sepsis endpoint, none included laboratory values as predictive features. While severe sepsis is often referred to as organ dysfunction (i.e., cardiovascular or respiratory or two other organ system dysfunctions) in addition to a sepsis diagnosis1, laboratory values for sepsis may already be confirmed before these models are intended for use. The utility of a severe sepsis model may be for identifying deterioration and increasingly abnormal organ dysfunction beyond an initial sepsis diagnosis. However, the term “severe sepsis” has been recently noted as redundant to sepsis and septic shock in the latest international consensus criteria1, suggesting that future model development should focus on the newest sepsis and septic shock definitions.

In contrast, laboratory values were the top two predictive features for septic shock models. Serum lactate emerged as the highest-ranking individual feature on the basis of median values, followed by platelet count. Lactate is a clinical indicator for hemodynamic instability in children and, in infected children, indicates a greater risk of multiorgan dysfunction syndrome and mortality77. Lactate is also included in the latest pediatric septic shock criteria1. Yet, controversy surrounds this measurement as it may not always be available in lower-resource settings1, and children with normal lactate levels are not necessarily excluded from a sepsis diagnosis1,78. Low platelet counts are most clearly understood in the pediatric sepsis literature as a “consumptive coagulopathy” leading to simultaneously increased bleeding and clotting79. Other research suggests that low platelet counts may also be a good predictor of 28-day mortality in children with septic shock80. However, similar to lactate, acquiring platelet counts in all settings may not be possible. Scoring tools with built-in redundancies if laboratory values are unavailable, or models that do not rely on laboratory values, may have greater utility and practicality in lower-resource environments43,44,62,73, and this consideration for resiliency in response to resource availability should be emphasized in future model development.

Patient datasets varied widely across studies with respect to the volume and balance of sepsis and non-sepsis outcomes, as well as patient demographics. This is a crucial challenge to ensuring the appropriate representativeness of patient groups in model development and for classification-based approaches. Current trends in research predominantly originate from North America and Asia, focusing on single-site studies, with the United States of America contributing the majority of pediatric datasets43. Notably, only four studies have aggregated data from multiple healthcare sites, one of which included low-to-middle-income settings43,44,45,46. This lack of inclusion highlights substantial limitations with respect to model development and technology access in remote and rural communities, Indigenous communities, and low-to-middle-income countries, which are disproportionately affected by sepsis4,5,6. These areas are frequently overlooked in the design, validation, and implementation of predictive technologies, resulting in a scarcity of studies representing these environments43,44,45,54.

Furthermore, there is a risk of over-representing the same patient populations across the various studies, contributing to the prevalent issue of class imbalance within the available datasets. In some instances, multiple approaches have been explored using identical datasets from single-site studies, and both multi-site and single-site studies have used the same datasets, including those from the Chinese pediatric intensive care dataset81. Future research should address the issue of class imbalance by employing appropriate model evaluation methods tailored for imbalanced datasets, implementing strategies to balance the datasets, and refining algorithmic approaches to manage class disparities better. Ultimately, there is an urgent need for more international collaboration and diverse data collection practices to increase patient representativeness, alongside a focus on innovations that democratize access to pediatric sepsis prediction technologies in underserved regions. Without these advancements, relying on individual healthcare sites to independently develop and validate their own systems risks perpetuating disparities in under-resourced settings.
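To illustrate why evaluation methods tailored to imbalanced datasets matter, the following minimal Python sketch compares overall accuracy with imbalance-aware measures. The function name, the toy cohort (5 sepsis cases among 100 patients), and the "always negative" classifier are hypothetical and are not drawn from any reviewed study.

```python
def imbalance_aware_metrics(y_true, y_pred):
    """Return accuracy alongside measures that remain informative
    when sepsis cases are rare (labels: 1 = sepsis, 0 = no sepsis)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Avoid division by zero when a class or prediction is absent.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ratio(tp, tp + fp),
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Hypothetical cohort: 5 sepsis cases among 100 patients.
y_true = [1] * 5 + [0] * 95
# A trivial classifier that never predicts sepsis.
always_negative = [0] * 100

metrics = imbalance_aware_metrics(y_true, always_negative)
print(metrics)
```

In this toy case, accuracy is 0.95 even though no sepsis case is detected, whereas sensitivity (0.0) and balanced accuracy (0.5) expose the failure. This is one reason accuracy alone can be misleading for rare outcomes.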

Data-driven pediatric sepsis prediction tools should also work effectively with children of all ages. Current model development shows wide variations in the age ranges of patients in the studies reviewed. Additionally, some articles specifically included or excluded certain pediatric age ranges55,56,68,72 and reported small sample sizes of children in different age groups55,58,62,64,72. This is significant because pediatric sepsis involves age-based differences in symptom manifestation26. Excluding different pediatric age groups runs the risk of model overfitting, leading to technologies that may not equitably apply to all children, complicating performance comparisons among articles and warranting caution for widespread clinical implementation. Future research should prioritize collaboration and develop robust dataset-sharing methods while exploring opportunities to ensure the optimal performance of these data-driven prediction models for all ages82.

In addition to concerns about pediatric age ranges, the reporting and inclusion of patient sex, race, and ethnicity are inconsistent, especially in studies with relatively large sample sizes. Only a few articles referenced the TRIPOD reporting guidelines, and some articles reported neither patient sex nor race or ethnicity. Limited transparency about this aspect of developing data-driven prediction tools may result in hidden discrimination, such as the differential prioritization of treatment actions among children from different socio-demographic backgrounds. In fact, considerable concerns have been raised about the risks of codifying racism into data-driven prediction modelling for pediatric sepsis and exacerbating inequalities in the medical care of children83. In one article, race was included as a predictive feature in a top-performing model62, whereas another study that included Aboriginal descent as a prediction feature reported that the algorithm performed similarly when this feature was removed in a smaller model46. The potential impacts on diverse pediatric populations have yet to be fully understood, but ensuring that all predictive models perform equally among racialized and historically disadvantaged groups must be prioritized in future research, with the ultimate aim of providing equitable healthcare outcomes once implemented. The reporting of patient demographics in future research should follow the most up-to-date best practices, such as the latest TRIPOD + AI guidelines84, and ensure strong calibration to reduce the uncertainties of unequal and inequitable model performance among diverse pediatric patients.
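One simple way to examine calibration across patient groups is to compare the mean predicted risk with the observed event rate within each group. The Python sketch below is illustrative only: the function name, the group labels, and the toy records are hypothetical, and a full assessment would normally add calibration curves and confidence intervals.

```python
from collections import defaultdict


def calibration_by_group(records):
    """Compare mean predicted risk with the observed event rate per group.

    records: iterable of (group, predicted_risk, observed_outcome),
    where observed_outcome is 1 (sepsis) or 0 (no sepsis).
    """
    sums = defaultdict(lambda: [0.0, 0, 0])  # [risk sum, events, n]
    for group, risk, outcome in records:
        entry = sums[group]
        entry[0] += risk
        entry[1] += outcome
        entry[2] += 1

    summary = {}
    for group, (risk_sum, events, n) in sums.items():
        mean_predicted = risk_sum / n
        observed_rate = events / n
        summary[group] = {
            "mean_predicted": mean_predicted,
            "observed_rate": observed_rate,
            # Negative gap: the model underestimates risk in this group.
            "gap": mean_predicted - observed_rate,
        }
    return summary


# Hypothetical records for two illustrative groups.
records = [
    ("group_a", 0.2, 0),
    ("group_a", 0.4, 1),
    ("group_b", 0.1, 1),
    ("group_b", 0.3, 1),
]
summary = calibration_by_group(records)
for group, stats in summary.items():
    print(group, stats)
```

A markedly larger gap in one group than in another would suggest that the model's risk estimates are less trustworthy for that group, which is the kind of disparity that transparent demographic reporting makes detectable.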

As a data-driven, standardized definition of pediatric sepsis has only been recently published1,43, the endpoint definitions among the reviewed models varied over time. This finding was unsurprising, given the variability in clinical definitions from international research surveys85. Definition inconsistency was a substantial hindrance to comparing quantitative outcomes of model performance and potential outcomes, given modifications and variable use of treatment and diagnostic codes, which risk clinician bias. Adopting the latest international consensus criteria holds significant promise for standardizing future development1,43. It is crucial for future research to consolidate the use of pediatric sepsis definitions for data-driven prediction tools in ways that optimize equitable outcomes for all children, and to support comparative analyses among future prospective studies for this high-risk population.

Despite the development of some higher-performing models in this review, it is unlikely that future developments will achieve perfectly generalizable performance in predicting pediatric sepsis or septic shock across diverse healthcare environments and patient populations. Consequently, human factors will be crucial in supporting clinical users in understanding the functionality of these models—how they work or fail to work—and in guiding when and how to incorporate this information into decision-making processes alongside other indicators for determining pediatric sepsis prognoses and preventative measures. From the reviewed articles, several aspects of current model development and design will influence how clinicians engage with these technologies in a healthcare environment, contributing to a more holistic view of understanding overall system performance.

The type of automation implemented will influence workflows, and the timing of predictions could be crucial in whether a prediction model assists or complicates sepsis-related decision-making24. Early predictions generated through analysis automation may enable a greater frequency of timely and effective interventions. As noted in this review, decision automation has the potential to prompt or guide clinical actions both before and following the onset of clinical suspicion of sepsis. Furthermore, automation in data acquisition may decrease the risk of human error in data entry. Additional research is necessary to identify the most effective automation strategies for pediatric sepsis technologies, including the possible integration of diverse approaches within sepsis workflows that enhance clinical judgment and situational awareness, ultimately leading to improved patient outcomes compared with current practices.

The variety of prediction timings highlights the lack of a standardized target timepoint for optimal use, which may differ depending on the healthcare context and the specific decisions involved. For example, predictions made several hours before the onset of sepsis may be more beneficial for determining the timing of patient transfers to specialized pediatric centres or better-equipped facilities. In contrast, shorter prediction windows may be beneficial for prompting earlier clinical interventions to prevent deterioration. This review indicates that prediction timings can range from during a patient encounter to as much as 24 h before the onset of septic shock. The timing of a prediction and its associated risk introduce an element of uncertainty regarding integration into decision-making processes, particularly when other vital signs or indicators may not reflect sepsis-related risks or decompensation. Consequently, future research must focus on how clinicians will navigate the uncertainties tied to predictions of a potential septic state in children, as well as how they might respond to and act upon different levels of risk prediction. This should encompass considerations for effectively displaying or communicating such early prediction information, along with its limitations, to support safer and more effective integration into practice.

Research has shown that model interpretability is important for the acceptance of data-driven technologies among clinicians86. However, a recent systematic review and meta-analysis of 106 experiments on human-AI systems found that including explanations had no significant impact on human-AI effectiveness. Instead, human-AI performance was better when the human knew when to trust their own judgment and when to trust the prediction87. In the context of pediatric sepsis, the variability in patient characteristics among training and testing datasets introduces uncertainties when clinicians make decisions for patients whose characteristics a model has not encountered. Understanding whether the training and testing datasets include representative patient characteristics, including the healthcare environments and available features, can affect perceptions of fairness in model outputs. This highlights the ethical considerations that clinicians must navigate to use model outputs responsibly88,89,90. Future research should focus on practical strategies to assist clinicians and clinical teams in determining how and when to trust their judgment alongside prediction outputs. These strategies should mitigate cognitive overload while enhancing situational awareness when these technologies are used in clinical decision-making.

This review aimed to capture the implementation stage of data-driven pediatric sepsis prediction technologies. However, some articles did not focus on identifying significant differences in outcomes, such as mortality rates, and lacked detailed descriptions of implementation factors. Additionally, some studies primarily evaluated performance metrics and alert firing rates. Among the articles reporting on implementation outcomes, data-driven pediatric sepsis prediction technologies may benefit clinical workflows in emergency departments and inpatient settings in high-income countries. These technologies can improve situational awareness with shared displays that communicate model outputs as patient decompensation risk scores, employing colour coding to facilitate understanding74 while imposing minimal alert fatigue on clinicians67. Furthermore, their implementation may reinforce compliance with workflow processes, such as bedside huddles and timely administration of antibiotics and fluids67,74. However, more research involving collaboration among clinicians, data scientists, and human factors engineers is needed to explore the factors that either promote or hinder the effective implementation of these technologies in various clinical environments, including the digital formats in which they are presented. Additionally, future longitudinal studies are essential to assess the experiences and impacts of model design and implementation on clinical teams and patient stakeholders. Such research will help identify the potential influence on decision-making and clinician workflow, as well as the possibilities for improved patient outcomes and new risks for human error compared with current practices and screening tools.

While our search strategy was broad, this approach to a scoping review for pediatric sepsis prediction was necessary for multiple reasons: inconsistencies in data reporting, endpoint definitions, patient characteristics, and the range of development strategies. The significant heterogeneity among the included studies subsequently limited our ability to compare performance outcomes systematically. Notably, most studies were retrospective, and some included small sample sizes. However, the reviewed studies, including some of those that were excluded (Supplementary Table 1), can inform eligibility criteria in future systematic reviews and meta-analyses. Future research can also address the identified limitations for future reviews through standardized reporting and analysis with specific patient cohorts—including high-risk segments, social determinants of health, and specific age groups. Aligning future endpoint definitions with the 2024 consensus criteria will also increase the capacity for rigorous and systematic analyses in this critical area.

Overall, this comprehensive scoping review highlights the heterogeneity of data-driven approaches for predicting sepsis-related outcomes in children and the challenges in translating current research into clinical practice. While promising approaches for improving pediatric sepsis prediction exist, they must be pursued with an awareness of the nuanced disparities across healthcare settings and patient demographics. In working towards enhancing patient outcomes in this vulnerable population, the 2024 international consensus criteria for pediatric sepsis and septic shock1 and standardized reporting practices can guide future research. Furthermore, there is a need to integrate clinical user perspectives and prioritize human factors in both the development and evaluation of predictive systems, specifically for prediction timings, approaches to automation, and supporting clinical judgment. The incorporation of this design component in future research will be critical to advancing the safe and effective implementation of data-driven pediatric sepsis prediction technologies.

Methods

Search strategy

The review adhered to the PRISMA-ScR guidelines91. Following the published protocol92, we searched the ACM Guide to Computing Literature, CINAHL, Embase, IEEE, PubMed, SCOPUS, and Web of Science for peer-reviewed studies published until March 1st, 2024. The search strategy for each database is outlined in Supplementary Table 2. Our search used keywords related to sepsis, the pediatric population, and digital technologies. We also used snowball sampling to find additional articles. The overall search was a complex process due to heterogeneity among the articles regarding development approaches and objectives, sepsis-related outcome definitions, patient demographics, reported performance characteristics, and outcome measures for implemented systems. We worked with an Information Specialist at the University of Waterloo (KM) to ensure a thorough and robust search strategy despite the lack of standardized reporting and data.

Selection criteria

The inclusion criteria encompassed articles published in English about data-driven approaches to predict sepsis-related outcomes for children aged 90 days to 21 years, with no restrictions on country of origin or commercial availability. Articles were only included if there was a digital aspect in their current or future use within a computerized system, such as an electronic health record, bedside monitor, or mobile or online application. Studies focusing solely on sepsis mortality risk were excluded, as this review’s primary focus was on predicting sepsis, severe sepsis, and septic shock before mortality, unless mortality outcomes were used as a defining criterion.

Screening, data extraction, and analysis

Two members (RT and JG) independently completed the abstract and title screening and the full-text review, with support from a third member (JK). A fourth member (KM) resolved disagreements. One member (RT) initially completed the data extraction, which was verified by a second member (JG or JK). Study quality was evaluated by two members (RT and JK) using a simplified version of the Oxford Centre for Evidence-Based Medicine levels of evidence, based on the study design93.

We extracted general article information, the prediction task and automation type75, the prediction method, patient demographics, prediction features, validation measures, outcomes related to implementation, and human factors considerations. We extracted details about the best performers for articles outlining multiple approaches, as identified by the authors of each article when applicable. We attempted to contact all corresponding authors for missing information, but only ten responded to provide clarifications or confirm the availability of additional data. We summarized the data in tables using descriptive statistics and plotted the predictive features in graphical format using R (v 4.1.2). We calculated normalized frequency-weighted rankings to analyze the features when the articles included data on feature importance, coefficient values, or odds ratios. We visualized the weighted rankings for the top 20 most predictive individual features of the developed approaches—organized by sepsis, severe sepsis, and septic shock endpoints—by multiplying each feature’s rank by occurrence across all studies with ranked features. Features that included multiple related aspects were separated and given the same ranking. For example, if “fever or hypothermia” was listed 10th (Rank 10), it was separated into “fever” and “hypothermia,” both with the same rank of 10. We also categorized features and visualized weighted rankings of these grouped features.
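The ranking procedure described above can be illustrated with a minimal Python sketch. The exact normalization is not specified in the text, so the scoring rule below is an assumption made for illustration: each feature receives a within-study score of (n - rank + 1) / n, and scores are summed across studies so that features ranked highly and reported often rise to the top. The function names and toy feature lists are also hypothetical, and the original analysis was performed in R.

```python
from collections import defaultdict


def split_compound(feature):
    """Split a compound feature such as 'fever or hypothermia'
    into its parts; each part inherits the original rank."""
    return [part.strip() for part in feature.split(" or ")]


def weighted_rankings(studies):
    """studies: list of feature lists, each ordered from most to
    least important as reported by one study.

    Returns (feature, score) pairs sorted from highest to lowest score.
    """
    totals = defaultdict(float)
    for ranked_features in studies:
        n = len(ranked_features)
        for rank, feature in enumerate(ranked_features, start=1):
            # Assumed normalization: rank 1 scores 1.0, last rank scores 1/n.
            score = (n - rank + 1) / n
            for part in split_compound(feature):
                totals[part] += score
    # Sort by descending score, breaking ties alphabetically.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical ranked feature lists from three studies.
studies = [
    ["lactate", "platelet count", "fever or hypothermia"],
    ["heart rate", "lactate"],
    ["lactate", "fever"],
]
ranking = weighted_rankings(studies)
for feature, score in ranking:
    print(f"{feature}: {score:.3f}")
```

Summing normalized scores rewards both high rank and frequent occurrence. Other weighting schemes, such as a mean rank multiplied by an occurrence count, would follow the same structure with a different scoring line.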