Introduction

Enteral nutrition (EN) is a nutritional support therapy that delivers nutrients via the gastrointestinal tract. Guidelines1,2 explicitly state that EN is the preferred nutritional intervention for patients at nutritional risk and/or with malnutrition, provided that the gastrointestinal tract is functional and can be used safely. EN can be given orally or through feeding tubes (nasogastric or nasoenteric tubes, or gastrostomy or jejunostomy tubes) placed directly into the gastrointestinal tract to meet the patient’s nutritional requirements. Aspiration refers to the entry of food or non-food substances, oral secretions, gastroesophageal reflux contents, or similar material into the airway below the glottis3. Studies have reported that the incidence of aspiration in patients receiving nasogastric feeding ranges from 15% to 50%4,5. Aspiration pneumonia, the most severe complication of aspiration6, has been reported7 to carry a mortality rate as high as 58.33%, to extend hospital stay by an average of 9.43 days, and to increase hospital costs by an average of $9,100.29. The adverse consequences of aspiration have therefore become a key issue affecting the feeding safety of hospitalized patients receiving nasogastric feeding. Prediction models are mathematical models that combine multiple predictors to estimate the probability of an outcome8. In recent years, numerous researchers have developed aspiration risk prediction models for patients receiving nasogastric EN. However, these models have not been comprehensively and systematically compared with respect to their construction process, performance evaluation, and data sample bias, and it remains unclear whether they can be applied accurately and reliably in clinical practice.
Therefore, this study systematically reviewed risk prediction models for aspiration in patients receiving nasogastric EN. The aims were to guide clinical staff in selecting aspiration risk prediction models, to promote the wider use of prediction models in aspiration risk management during nasogastric EN, to improve the scientific rigor and accuracy of clinical decision-making, and to provide safer and more effective care for patients.

Methods

The study protocol has been registered with PROSPERO (registration number: CRD42024594672).

Inclusion and exclusion criteria

Inclusion criteria

(1) Participants were patients aged ≥ 18 years receiving EN through nasal feeding tubes (nasogastric or nasoenteric). (2) The study developed and/or validated an aspiration risk prediction model for patients receiving nasogastric EN. (3) The study used a prospective or retrospective design. (4) The outcome was respiratory aspiration during nasogastric EN. Aspiration was diagnosed according to the criteria in the expert consensus of the North American Association9: coughing, wheezing, increased respiratory rate, lip cyanosis, and EN fluid residue in the oral or nasal cavity during nasogastric feeding; suspected EN fluid obtained during sputum suction, with a reading above 11.1 mmol/L on a blood glucose meter; or evidence of aspiration on lung CT or detection of pepsin in bronchial secretions. (5) The study was published in Chinese or English.

Exclusion criteria

(1) duplicate publications; (2) case reports, review articles, conference abstracts, and similar publication types; (3) articles whose full text was unavailable.

Search strategy

We conducted a comprehensive search of Chinese and English databases, including China National Knowledge Infrastructure (CNKI), the Full Text Database of Chinese Medical Journals (Yiigle), SinoMed, Wanfang, PubMed, Embase, Medline, and Web of Science, from database inception to May 10, 2025. The search combined subject headings and free-text terms. Search terms included “enteral nutrition”, “gut nutrition”, “intestinal nutrition”, “nasal feeding”, “nasogastric feeding”, “respiratory aspiration”, “aspiration”, “risk factor”, “forecast”, “predict”, “predict model”, “risk prediction model”, “forecast model”, “nomogram”, “alignment diagram”, “nomographic chart”, and “alignment chart”. The full search strategy is provided in Supplementary Material (Appendix A1). Additional studies were identified by screening the reference lists of relevant studies.

For this systematic review, we used the PICOTS framework, which is recommended by the Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (CHARMS) checklist10 and helps define the review’s objective, search strategy, and inclusion and exclusion criteria11. The key items of our systematic review are as follows:

P (Population): Hospitalized patients aged ≥ 18 years receiving EN through a nasogastric or nasoenteric tube.

I (Index model): Published aspiration risk prediction models developed for patients receiving nasogastric EN.

C (Comparator): No competing model.

O (Outcome): The outcome focused on respiratory aspiration.

T (Timing): The outcome was predicted after evaluating basic information, scoring scale results, and laboratory indicators.

S (Setting): The risk prediction models are intended to predict respiratory aspiration in patients receiving nasogastric EN, so that preventive measures can be implemented to avoid adverse events.

Study screening

Duplicates were first removed from all retrieved records using EndNote 20 reference management software. Titles and abstracts were then screened against the inclusion and exclusion criteria, followed by a full-text screening that included evaluation of the methods section. Two researchers performed the entire screening process independently, and any disagreements were adjudicated by a third researcher. If information in an eligible study was incomplete or questionable, we attempted to contact the original authors before deciding whether to include it.

Data extraction

A data collection form was developed based on the CHARMS checklist for data extraction in systematic reviews of prediction modelling studies, proposed by Moons et al.10. The form included the first author, publication year, country, study design, participants, incidence of aspiration, diagnostic criteria for aspiration, model development method, variable selection, number of models, sample size, handling of missing data, handling of continuous variables, predictors, model performance, calibration method, validation method, and model presentation. One researcher extracted the data, and a second researcher checked them for accuracy and consistency.

Quality evaluation

The Prediction Model Risk of Bias Assessment Tool (PROBAST) was used to assess the risk of bias and applicability of the included studies of diagnostic or prognostic multivariable prediction models12. Risk of bias was assessed in four domains and applicability in three domains, with applicability assessed in a similar manner to risk of bias. Each domain was rated as "low," "high," or "unclear." Two researchers independently assessed risk of bias and applicability, and disagreements were adjudicated by a third researcher. (1) Risk of bias: the four domains were participants, predictors, outcome, and analysis, comprising 20 signalling questions in total (2 to 9 per domain). Each question was answered "yes," "probably yes," "no," "probably no," or "no information." A domain was rated as high risk of bias if any of its questions was answered "no" or "probably no," and as low risk of bias if all of its questions were answered "yes" or "probably yes." (2) Applicability: the three domains were participants, predictors, and outcome. Each domain was rated as having "good applicability," "poor applicability," or "unclear" applicability, and overall applicability was considered low if any domain had poor applicability. The best model was selected on the basis of these assessments. PROBAST can be used both to evaluate individual models and to compare multiple models.
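The domain-level rule described above can be written as a short function. The sketch below is a simplified, hypothetical encoding rather than part of PROBAST itself: it assumes that a domain is "unclear" when no answer is "no"/"probably no" but at least one is "no information," and that the overall rating is high if any domain is high and low only if all domains are low. PROBAST leaves room for reviewer judgement, which this rule ignores, and the answers in the example are invented.

```python
from typing import Dict, List

# Answer categories for PROBAST signalling questions, as listed in the text.
YES_LIKE = {"yes", "probably yes"}
NO_LIKE = {"no", "probably no"}
NO_INFO = {"no information"}
VALID = YES_LIKE | NO_LIKE | NO_INFO


def rate_domain(answers: List[str]) -> str:
    """Rate one risk-of-bias domain from its signalling-question answers.

    Simplified rule (assumption): any "no"/"probably no" -> "high";
    all "yes"/"probably yes" -> "low"; otherwise -> "unclear".
    """
    normalized = [a.strip().lower() for a in answers]
    invalid = [a for a in normalized if a not in VALID]
    if invalid:
        raise ValueError(f"unrecognized answers: {invalid}")
    if any(a in NO_LIKE for a in normalized):
        return "high"
    if all(a in YES_LIKE for a in normalized):
        return "low"
    return "unclear"


def rate_overall(domain_ratings: Dict[str, str]) -> str:
    """Overall risk of bias: "high" if any domain is high, "low" only if all are low."""
    ratings = set(domain_ratings.values())
    if "high" in ratings:
        return "high"
    if ratings == {"low"}:
        return "low"
    return "unclear"


# Hypothetical answers for one study (2, 3, 6 and 9 signalling questions per domain).
study = {
    "participants": ["no", "yes"],
    "predictors": ["yes", "yes", "probably yes"],
    "outcome": ["yes"] * 5 + ["no information"],
    "analysis": ["no", "yes", "no", "no information", "no", "yes", "no", "yes", "yes"],
}
ratings = {domain: rate_domain(answers) for domain, answers in study.items()}
print(ratings)  # {'participants': 'high', 'predictors': 'low', 'outcome': 'unclear', 'analysis': 'high'}
print(rate_overall(ratings))  # high
```

In practice, reviewers record the answer to each signalling question and their judgement for each domain in the assessment table; a rule like this is only a consistency check on those judgements.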

Statistical analysis

Meta-analyses of the predictors included in the models and of the area under the curve (AUC) values of the validated models were performed using Stata 17. Heterogeneity was assessed with the Q test and the I² statistic. Heterogeneity was considered small when P > 0.1 and I² < 50%, in which case a fixed-effect model was used. When P ≤ 0.1 or I² ≥ 50%, heterogeneity was considered substantial and a sensitivity analysis was performed; if the heterogeneity could not be eliminated, a random-effects model was used. P < 0.05 was considered statistically significant. Egger’s test was used to assess publication bias, with P > 0.05 indicating a low likelihood of publication bias.
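The heterogeneity assessment and model choice can be sketched as follows. This is a minimal illustration, not the Stata implementation used in the study. It assumes that effects are on the log odds ratio scale with known variances, that heterogeneity counts as small only when P > 0.1 and I² < 50%, and that the random-effects model uses the DerSimonian-Laird estimator of between-study variance. The intermediate sensitivity-analysis step is omitted. The P value of Q is computed with the exact closed-form series for a chi-square distribution with integer degrees of freedom, so only the standard library is needed.

```python
import math
from typing import List, Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail P value of a chi-square statistic with integer df (exact series form)."""
    if df < 1:
        raise ValueError("df must be >= 1 (at least two studies are needed)")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        half = x / 2.0
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: 2*(1 - Phi(sqrt x)) + 2*phi(sqrt x) * sum_r sqrt(x)^(2r-1) / (1*3*...*(2r-1))
    root = math.sqrt(x)
    density = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(root / math.sqrt(2.0)) + 2.0 * density * total


def pool_effects(effects: List[float], variances: List[float]) -> Tuple[str, float, float, float, float]:
    """Inverse-variance pooling with Cochran's Q, I^2 and a fixed/random switch.

    Returns (model, pooled estimate, standard error, P value of Q, I^2 in %).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    p_q = chi2_sf(q, df)
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Assumed rule: heterogeneity is "small" only if P > 0.1 and I^2 < 50%.
    if p_q > 0.1 and i2 < 50.0:
        return "fixed", fixed, math.sqrt(1.0 / sum_w), p_q, i2

    # DerSimonian-Laird between-study variance tau^2, then random-effects weights.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return "random", pooled, math.sqrt(1.0 / sum(w_re)), p_q, i2


# Hypothetical log odds ratios and variances for one predictor (not data from this review).
model, est, se, p_q, i2 = pool_effects([0.1, 0.9, 1.5], [0.04, 0.04, 0.04])
print(model, round(math.exp(est), 2), round(i2, 1))  # random 2.3 91.9
```

Pooling is done on the log scale because odds ratios are asymmetric. The pooled OR is obtained by exponentiating the result.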

Results

Study selection

Figure 1 presents the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) 2020 flowchart of the search process and results. The systematic search identified 559 records, of which 503 remained after deduplication using EndNote 20. After the titles and abstracts were screened against the inclusion and exclusion criteria, 29 articles were retained. Full-text review then excluded 11 articles that did not match the research objectives and 7 that did not construct a model, leaving 11 articles13,14,15,16,17,18,19,20,21,22,23 with a total of 22 prediction models.

Fig. 1

Preferred reporting items for systematic reviews and meta-analyses (PRISMA) flowchart of literature search and selection.

Study characteristics

The included studies were published between 2021 and 2024; all were conducted in China (two were published in English), and all used a retrospective design. The reported incidence of aspiration in patients receiving nasogastric EN ranged from 9.46% to 49.87%. The basic characteristics of the included studies are presented in Table 1.

Table 1 Basic characteristics of the included studies.

Models characteristics

Eleven studies13,14,15,16,17,18,19,20,21,22,23 developed a total of 22 aspiration risk prediction models. For model construction, one study16 developed separate prediction models using machine learning and generalized linear regression, three studies17,19,23 developed separate models using machine learning and logistic regression, and the remaining studies used logistic regression only. For continuous variables, five studies13,16,17,18,19 kept them continuous, whereas six studies14,15,20,21,22,23 converted them into categorical variables. For variable selection, two studies16,23 used machine learning, and the remaining studies used univariate analysis. Details are shown in Table 2. The AUCs of the included models ranged from 0.809 to 0.992, sensitivity from 64.9% to 95.2%, and specificity from 75.8% to 97.4%; all models showed good predictive performance (AUC > 0.8). Nine studies13,14,15,16,18,19,20,21,22 reported both discrimination and calibration. For model validation, five studies14,18,19,21,23 performed external validation and ten studies13,14,15,16,17,18,19,20,21,23 performed internal validation, mainly by bootstrapping; of these ten, four studies14,17,18,21 did not report the internal validation method. One study22 developed a model without validating it. The most common presentation format was the nomogram (n = 7), followed by the classification and regression tree (CART) model (n = 2), a mathematical formula (n = 2), and a risk score derived from β coefficients (n = 1); one study18 did not present a specific model. The performance and presentation formats of the models are shown in Table 3.
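A nomogram, a mathematical formula, and a β-coefficient risk score are different ways of presenting the same fitted regression model. The hedged sketch below uses an invented intercept and invented coefficients, which are not taken from any included study. It computes a predicted risk with the logistic formula and then the sensitivity and specificity obtained at a chosen cutoff, the two threshold-dependent metrics reported above. The predictor names are hypothetical labels.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical intercept and coefficients (log odds), for illustration only.
INTERCEPT = -4.0
COEFFICIENTS: Dict[str, float] = {
    "history_of_aspiration": 1.2,       # 1 = yes, 0 = no
    "consciousness_disturbance": 0.9,   # 1 = yes, 0 = no
    "apache_ii": 0.08,                  # per score point, kept continuous
}


def predicted_risk(patient: Dict[str, float]) -> float:
    """Logistic model: risk = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear_predictor = INTERCEPT + sum(b * patient[name] for name, b in COEFFICIENTS.items())
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def sensitivity_specificity(risks: List[float], outcomes: List[int], cutoff: float) -> Tuple[float, float]:
    """Classify risk >= cutoff as predicted aspiration and compare with observed outcomes (1/0)."""
    tp = sum(1 for r, y in zip(risks, outcomes) if r >= cutoff and y == 1)
    fn = sum(1 for r, y in zip(risks, outcomes) if r < cutoff and y == 1)
    tn = sum(1 for r, y in zip(risks, outcomes) if r < cutoff and y == 0)
    fp = sum(1 for r, y in zip(risks, outcomes) if r >= cutoff and y == 0)
    return tp / (tp + fn), tn / (tn + fp)


patient = {"history_of_aspiration": 1, "consciousness_disturbance": 0, "apache_ii": 20}
print(round(predicted_risk(patient), 3))  # 0.231
print(sensitivity_specificity([0.9, 0.6, 0.4, 0.2], [1, 1, 0, 0], 0.3))  # (1.0, 0.5)
```

Because sensitivity and specificity change with the cutoff, they are only comparable across models when the cutoff used is reported alongside them.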

Table 2 Model establishment.
Table 3 Performance and presentation of models.

Results of quality assessment

Two researchers used PROBAST12 to assess the quality of the included studies and carefully cross-checked the results to ensure their accuracy. In the participants domain, all studies used a retrospective design and were at high risk of bias. In the predictors domain, all studies were at low risk of bias. In the analysis domain, all studies were at high risk of bias: three studies15,16,18 had insufficient sample sizes that did not meet the recommended threshold of more than 20 ‘events per variable’ (EPV); six studies14,15,20,21,22,23 converted continuous variables into categorical variables; none of the studies clearly described how missing data were handled; nine studies13,14,15,17,18,19,20,21,22 selected variables on the basis of univariate analysis; two studies17,23 did not report whether model calibration was assessed; and five studies14,17,18,21,22 did not perform or did not report internal validation. In the outcome domain, the risk of bias was unclear for two studies17,19 and low for the remaining studies. Regarding applicability, all studies had low applicability risk in the predictors domain. Overall, although all studies were at high risk of bias, the overall applicability risk of the models was low. Table 4 presents the results of the quality assessment.
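The EPV criterion can be checked with simple arithmetic. This sketch follows the usual convention that "events" means the number of patients in the less frequent outcome class. The denominator is the number of candidate predictor parameters considered during development, counting each dummy variable of a categorical predictor separately, rather than only the predictors kept in the final model. Treating an EPV exactly equal to the threshold as adequate is an assumption, and the cohort figures are hypothetical.

```python
def events_per_variable(n_total: int, n_with_outcome: int, n_candidate_parameters: int) -> float:
    """EPV = events in the less frequent outcome class / candidate predictor parameters."""
    if not 0 <= n_with_outcome <= n_total:
        raise ValueError("n_with_outcome must lie between 0 and n_total")
    if n_candidate_parameters <= 0:
        raise ValueError("need at least one candidate parameter")
    events = min(n_with_outcome, n_total - n_with_outcome)
    return events / n_candidate_parameters


def epv_is_adequate(epv: float, threshold: float = 20.0) -> bool:
    # Assumption: an EPV equal to the threshold counts as adequate.
    return epv >= threshold


# Hypothetical development cohort: 300 patients, 60 aspiration events, 8 candidate parameters.
epv = events_per_variable(300, 60, 8)
print(epv, epv_is_adequate(epv))  # 7.5 False
```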

Table 4 PROBAST results of the included studies.

Results of meta-analysis

Meta-analyses were performed for predictors reported in at least three studies. The number of comorbidities, history of aspiration, use of sedatives, depth of tube placement, amount of gastric residue, APACHE II score, consciousness disturbance, nutritional risk, and age were independent risk factors for aspiration in patients receiving nasogastric EN (P < 0.05). In a sensitivity analysis in which studies were omitted one at a time and the effect sizes re-pooled, no significant differences were found. The meta-analysis results are presented in Table 5, and the forest plots are provided in Supplementary Material A2. Because model development details were insufficiently reported in the included studies, only four studies met the criteria for AUC synthesis. Sun et al. compared two model development methods on the same dataset, so only the CART model was included. The pooled AUC calculated with a fixed-effect model was 0.92 (0.90–0.93) (Fig. 2). The I² value was 0.0% (P > 0.05), indicating low heterogeneity among studies. Begg’s test gave P > 0.05, indicating no significant publication bias (Appendix A3).
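The leave-one-out sensitivity analysis and the fixed-effect pooling of AUCs can be sketched as follows. The example assumes that each study reports its AUC with a symmetric 95% CI, so that the standard error can be back-calculated as the CI width divided by 2 × 1.96. It pools AUCs on the raw scale with inverse-variance weights. Some analyses pool on the logit scale instead. The AUC values below are invented and are not the four models pooled in this review.

```python
import math
from typing import List, Tuple

Z95 = 1.96


def se_from_ci(lower: float, upper: float) -> float:
    """Back-calculate a standard error from a symmetric 95% CI (assumption)."""
    return (upper - lower) / (2 * Z95)


def fixed_effect(estimates: List[float], ses: List[float]) -> Tuple[float, float, float]:
    """Inverse-variance fixed-effect pooled estimate with its 95% CI."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    se_pooled = math.sqrt(1.0 / sum(weights))
    return pooled, pooled - Z95 * se_pooled, pooled + Z95 * se_pooled


def leave_one_out(estimates: List[float], ses: List[float]) -> List[Tuple[float, float, float]]:
    """Re-pool after omitting each study in turn."""
    results = []
    for i in range(len(estimates)):
        kept_e = estimates[:i] + estimates[i + 1:]
        kept_s = ses[:i] + ses[i + 1:]
        results.append(fixed_effect(kept_e, kept_s))
    return results


# Hypothetical AUCs with 95% CIs (invented for illustration).
aucs = [(0.91, 0.87, 0.95), (0.93, 0.89, 0.97), (0.90, 0.85, 0.95), (0.92, 0.88, 0.96)]
est = [a for a, _, _ in aucs]
ses = [se_from_ci(lo, hi) for _, lo, hi in aucs]
print("all studies:", [round(x, 3) for x in fixed_effect(est, ses)])
for i, res in enumerate(leave_one_out(est, ses)):
    print(f"omit study {i + 1}:", [round(x, 3) for x in res])
```

If no single omission moves the pooled estimate outside the original CI, the result is usually described as robust. This matches the kind of conclusion reported above.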

Table 5 Meta-analysis of risk predictors.
Fig. 2

Forest plot of the random effects meta-analysis of pooled AUC estimates for 4 validation models.

Subgroup analyses by disease type were performed for predictors reported in at least three studies with I² > 50%. The subgroups were pancreatitis, neurological disease, and unrestricted disease. The subgroup analyses provided a preliminary indication that disease type was the main source of heterogeneity for depth of tube placement. Further examination of the original studies showed that patients in the pancreatitis group had nasoenteric tubes, whereas patients in the unrestricted disease group had nasogastric tubes; the resulting difference in tube insertion length between the two groups was the main source of heterogeneity. Even after stratification by disease type, I² remained above 50% for the number of comorbidities, history of aspiration, and consciousness disturbance, suggesting that the effects of these three predictors may be influenced by unmeasured confounders or may act through different mechanisms in different disease subgroups. Future studies should examine this further by incorporating additional clinical characteristics or biomarkers. Details of the subgroup analyses are provided in Appendix A4.

Discussion

This study found that the existing models have several strengths: excellent predictive performance (AUC > 0.8), diverse presentation formats that can meet the needs of different application scenarios, relatively low applicability risk, and potential for practical application. However, the models also have clear limitations, chiefly a high risk of methodological bias, which mainly reflects retrospective study designs, insufficiently standardized methods, and inadequate validation.

In this study, most models demonstrated high discrimination; however, model development, validation, and reporting had not been rigorously optimized. The full process of model construction includes clarifying the research question, determining the data source, selecting predictors, processing predictors, fitting the model, evaluating the model, presenting the model, and reporting the results24. Regarding the data source, all 22 models included in this study were based on retrospective data. Because cohort data are more representative, future model development or optimization should use prospective or registry data24 to reduce the risk of bias12,25. For predictor selection, most studies relied on univariate logistic regression analysis, which may increase the risk of inappropriate predictor selection26. Other approaches have been proposed27, such as penalized regression (LASSO, ridge, and elastic net regression), which can reduce the risk of overfitting. Future studies could adopt such methods, guided by the clinical context, to improve variable selection. In addition, transparent reporting and appropriate handling of missing data improve the reliability of the model28, so future studies should handle missing data appropriately. In variable processing, converting continuous variables into categorical variables for modeling may cause a loss of information and predictive performance; however, such conversion can be carried out at the clinical implementation stage for ease of use29. The core indicators of model performance are discrimination and calibration. Discrimination is commonly measured by the AUC or C-index, with values above 0.7 generally indicating good discrimination. Calibration is assessed using methods such as the Hosmer-Lemeshow test or calibration curves. Without performance evaluation, overfitting may go undetected, which limits the applicability of the model. The average sample size of the included studies was 307, which places them among small-sample studies; in such studies, the absence of internal validation may lead to overestimation of model performance, so internal validation is essential. Furthermore, external validation is an important means of assessing the generalizability of a model30, but the baseline characteristics of the development and validation data should be compared when it is performed. Overall, research on aspiration prediction models for patients receiving nasogastric EN has made some progress; however, model construction, validation, and reporting still require further investigation.
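The two performance measures discussed above can be illustrated with a minimal sketch. For a binary outcome, the C-statistic equals the AUC. It is the proportion of all event/non-event pairs in which the patient who had the event received the higher predicted risk, with tied pairs counted as one half. The observed-to-expected (O/E) ratio is a simple calibration-in-the-large summary that complements a calibration curve. It is not the Hosmer-Lemeshow test. The predictions below are hypothetical.

```python
from itertools import product
from typing import List


def c_statistic(risks: List[float], outcomes: List[int]) -> float:
    """Concordance (C) statistic: share of event/non-event pairs ranked correctly; ties count 0.5."""
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for e, n in product(events, non_events):
        if e > n:
            score += 1.0
        elif e == n:
            score += 0.5
    return score / (len(events) * len(non_events))


def observed_expected_ratio(risks: List[float], outcomes: List[int]) -> float:
    """Calibration-in-the-large: observed events divided by the sum of predicted risks."""
    return sum(outcomes) / sum(risks)


# Hypothetical predictions and outcomes.
risks = [0.9, 0.8, 0.3, 0.2]
outcomes = [1, 0, 1, 0]
print(c_statistic(risks, outcomes))                        # 0.75
print(round(observed_expected_ratio(risks, outcomes), 3))  # 0.909
```

An O/E ratio below 1 means the model predicts more events than were observed. A model can discriminate well and still be miscalibrated, which is why both measures are reported.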

The prediction models identified in this review also have clinical implications, and their most frequently used predictors are relevant to nursing practice and future research. The most commonly used predictors were depth of tube placement and history of aspiration, which appeared in 8 and 7 models, respectively, followed by consciousness disturbance (6 models) and nutritional risk (5 models). The number of comorbidities, APACHE II score, and age were each used in 4 models, and use of sedatives and amount of gastric residue were each used in 3 models. These nine predictors can be prioritized when developing new models, and most of them are objective and easy to collect.

EN via nasal tubes includes nasogastric and nasoenteric tube feeding. For patients at risk of reflux and aspiration or with feeding intolerance, post-pyloric feeding via a nasoenteric tube should be used. For most patients, it is sufficient for the tip of the nasoenteric tube to pass the pylorus; however, for patients at high risk of reflux and aspiration, the tube should be advanced to 95–105 cm, that is, through the duodenum, which can effectively reduce the risk of aspiration31. For patients receiving EN via a nasogastric tube, inserting the tube too deeply may stimulate gastric peristalsis, causing reflux of feed and thereby increasing the risk of aspiration15. If the insertion is too shallow, with the tip of the tube reaching only the cardia, external stimuli such as coughing readily cause reflux and aspiration. Qian et al.32 demonstrated that placing the tip of the nasogastric tube in the prepyloric antrum can reduce complications of EN such as aspiration and diarrhea. The recommended insertion length for a gastric tube is the nose-ear-xiphisternum (NEX) distance, approximately 45–55 cm in adults33,34. However, some studies have questioned this standard, suggesting that it is associated with a relatively high risk of aspiration35,36. Taylor et al.37 proposed that an insertion depth of less than 48 cm predisposes to malposition of the nasal feeding tube and thus to aspiration, and therefore suggested “NEX + 10”, that is, the NEX distance plus 10 cm, as the reference insertion depth for patients receiving nasal tube feeding.
Therefore, to prevent aspiration during EN, nurses should fully consider each patient’s individual circumstances when inserting a gastric tube, select an appropriate insertion depth, and, when necessary, use ultrasound to confirm the position of the tube tip.

In our study, heterogeneity was high for age, which was attributable to inconsistent age stratification criteria across the included studies. Advanced age has long been considered a risk factor for aspiration: in elderly patients, deteriorating swallowing function, a weakened cough reflex, and multiple chronic diseases substantially increase the risk of aspiration40,41,42. Nurses should therefore be alert to the possibility of aspiration in elderly patients receiving nasogastric EN. In addition, some studies have shown that intra-abdominal hypertension (IAH) is an independent risk factor for aspiration in patients receiving gastric tube feeding (OR = 0.225–0.329) and that mechanical ventilation and IAH increase the risk of aspiration43. However, none of the models included in this study incorporated IAH, and only one included mechanical ventilation. Further studies are needed to confirm the predictive value of these risk factors.

To improve clinical decision-making and practice guidance, it is particularly important to develop accurate prediction models for aspiration risk in patients receiving tube-fed EN. Future research on such models could be strengthened in the following areas: (1) Model development: new models can draw on the nine predictors identified in this meta-analysis to enhance predictive ability. (2) Data collection: data should be collected under blinding to ensure objectivity and accuracy. (3) Sample size: the required sample size can be determined scientifically using the EPV or the sample size calculation method proposed by Riley et al.39. (4) Variable handling: before continuous variables are converted into categorical variables, the basis for grouping should be specified to ensure that the conversion is rational and consistent. (5) Variable selection: variables should not be selected solely through univariate analysis; instead, machine learning algorithms should be combined with relevant clinical expertise for more comprehensive variable selection. (6) Research scope and scale: large, multicenter studies are recommended to cover broader patient populations and disease types, thereby improving the applicability and generalizability of the models. (7) Reporting: studies should follow the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement to enhance transparency.
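The sample size approach of Riley et al. combines several criteria and takes the largest resulting sample size. The sketch below implements only two of them, assuming their standard published forms for a binary outcome. The first is the number of participants needed for an expected uniform shrinkage of at least 0.9: n = p / ((S − 1) ln(1 − R²_CS / S)), where p is the number of candidate parameters and R²_CS is the anticipated Cox-Snell R². The second is the number needed to estimate the overall outcome proportion within ±0.05. The remaining criteria are omitted. The planning values are hypothetical, and the anticipated R²_CS in particular must come from prior studies.

```python
import math
from typing import Tuple


def n_for_shrinkage(n_parameters: int, r2_cs: float, shrinkage: float = 0.9) -> int:
    """n such that the expected uniform shrinkage factor is at least `shrinkage`.

    n = p / ((S - 1) * ln(1 - R2_cs / S)), with R2_cs the anticipated Cox-Snell R^2.
    """
    if not 0 < r2_cs < shrinkage:
        raise ValueError("anticipated R2_cs must be between 0 and the target shrinkage")
    return math.ceil(n_parameters / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage)))


def n_for_overall_risk(prevalence: float, margin: float = 0.05) -> int:
    """n to estimate the overall outcome proportion within +/- margin (95% confidence)."""
    return math.ceil((1.96 / margin) ** 2 * prevalence * (1 - prevalence))


def minimum_sample_size(n_parameters: int, r2_cs: float, prevalence: float) -> Tuple[int, float, float]:
    """Largest n across the two criteria, plus implied events and events per parameter."""
    n = max(n_for_shrinkage(n_parameters, r2_cs), n_for_overall_risk(prevalence))
    events = n * prevalence
    return n, events, events / n_parameters


# Hypothetical planning values: 10 candidate parameters, anticipated R2_cs = 0.15, prevalence 30%.
n, events, epp = minimum_sample_size(10, 0.15, 0.30)
print(n, round(events, 1), round(epp, 2))  # 549 164.7 16.47
```

In this example, the shrinkage criterion dominates. The implied 16.5 events per candidate parameter shows why a fixed EPV rule and the Riley criteria can give different answers for the same study.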

Limitations

This study has several limitations. First, all included studies were conducted in China, and no relevant studies from other countries were identified; the applicability of the results to populations in other countries may therefore be limited, and adjustments may be required when the models are applied in other regions. Second, the disease types in the included studies were inconsistent, and some predictors were reported in too few studies to be included in a meta-analysis, which may have affected the results. Finally, because this review included only studies published in English and Chinese, language bias is possible, and relevant studies published in other major languages were not included. In summary, all 22 prediction models included in this study showed good discrimination, but there is considerable room for improvement in model quality, and future studies require rigorous methods and transparent reporting.

Conclusion

This systematic review included 11 studies and 22 models. The pooled AUC of the 4 validated models was 0.92 (0.90–0.93), suggesting excellent discrimination. Nevertheless, all included studies were rated as being at high risk of bias according to PROBAST. Current aspiration risk prediction models for patients receiving nasogastric EN do not meet the PROBAST criteria. Researchers should therefore familiarize themselves with the PROBAST checklist and follow the TRIPOD reporting guidelines to improve the quality of future studies. Future research should prioritize developing new models with larger sample sizes, rigorous study designs, and multicenter external validation.