Introduction

Upper gastrointestinal bleeding (UGIB) is a potentially life-threatening condition defined as bleeding originating from a source proximal to the ligament of Treitz. UGIBs represent a leading cause of hospital admission worldwide with an estimated incidence of 50–100/100,000 individuals per year1,2 and mortality rates ranging from 5 to 10%3.

Despite recent advances in the emergency management of UGIB, the geriatric population remains a high-risk category. Approximately 35–45% of patients presenting to the hospital with UGIB are over 60 years of age4,5, and this proportion is expected to rise as the population ages. Advanced age critically influences morbidity and mortality rates associated with UGIB6,7,8 and the incidence of complicated gastrointestinal events5,9.

Common symptoms of serious UGIB include hematemesis and/or melena, syncope, or hemodynamic failure. In geriatric patients, minor clinical symptoms such as dyspepsia, heartburn or abdominal pain tend to be rarer than in younger individuals, which delays the diagnostic-therapeutic process10,11. Furthermore, beyond the high prevalence of risk factors (e.g., use of antiplatelets, anticoagulants, nonsteroidal anti-inflammatory drugs (NSAIDs) or corticosteroids)8,12, the presence of multiple comorbidities in geriatric patients often leads to severe gastrointestinal adverse events and clinical deterioration after UGIB8,13.

To predict relevant outcomes including mortality, severity of GI bleeding, rebleeding and length of hospital stay, several risk scores have been proposed, mostly taking into account demographic, biochemical and clinical data (pre-endoscopic risk scores) and endoscopic data (post-endoscopic risk scores). Prognostic scores reduce healthcare costs without affecting patient outcomes14 by stratifying patients with UGIB into “low-risk” patients, suitable for outpatient management, and “high-risk” patients, requiring hospitalization, urgent endoscopy or intensive care.

However, several prognostic scores do not consider age or advanced age in their assessment. In particular, among the risk scores most frequently used in clinical practice, the Glasgow-Blatchford Bleeding score (GBS)15 and its modified version (mGBS)16, the T-score17 and the MAP(ASH) score18 do not consider age as a prognostic factor whereas the Canada–United Kingdom–Adelaide (CANUKA)19 and the AIMS6520 attribute a score to subjects aged ≥ 65 years although without further differentiation by age group.

In this study we aimed to investigate the validity of pre-endoscopic risk scores that do not consider age or advanced age as a prognostic variable in predicting three outcomes in geriatric subjects hospitalized with UGIB: 30-day mortality from hospital admission, a clinically relevant composite outcome (need for red blood cell transfusion, endoscopic treatment or rebleeding) and length of hospital stay.

Methods

Study population and data collection

From November 2021 to May 2022, we conducted a prospective, multi-center and cross-sectional study on the performance of gastrointestinal bleeding risk scores in predicting clinically relevant outcomes in a geriatric population.

Given the aim of this study, patients aged ≥ 65 years with suspected UGIB were recruited whereas severe hematological disorders, inflammatory bowel disease, simultaneous bleeding from a non-gastrointestinal source and lack of data required for calculation of risk scores were considered exclusion criteria. The present study included 136 participants and on the basis of the median age of the study population (82 years old), the total sample was divided into two categories: “< 82 years” (65–81.9 years old) and “≥ 82 years” (82–100 years old).

All individuals visited the Emergency Department of Policlinico Riuniti, Foggia, Italy or the Emergency Department of Hospital “Vito Fazzi”, Lecce, Italy exhibiting symptoms of suspected UGIB including hematemesis and/or melena. Following clinical stabilization patients were hospitalized at the Liver Unit, University Hospital of Foggia or Internal Medicine Unit, Hospital “Vito Fazzi”, Lecce. Medical support was provided to each patient, and clinical signs and laboratory tests were performed at the time of admission. In each case, esophagogastroduodenoscopy (EGDS) was performed within 72 h of admission to the Emergency Department in light of the need for clinical stabilization and the overcrowding of the hospital facilities. All participants who required hemostasis successfully completed the procedure.

The following data were gathered: age, sex, exhibited symptoms at hospital admission (hematemesis, melena or syncope), vital signs (systolic blood pressure, heart rate), mental status, medical history and medication usage. Further, laboratory test findings (haemoglobin, albumin level, blood urea nitrogen, PT, and INR), endoscopic findings, the need for packed red blood cells transfusion, endoscopic management of bleeding and the need for repeated endoscopies due to rebleeding during hospitalization were also collected.

The study was approved by the Ethics Committee of the Teaching Hospital “Policlinico Riuniti” of Foggia and was conducted according to the ethical standards of our institutional research committee and the 1964 Declaration of Helsinki and its later amendments. All participants provided written informed consent to the study.

Gastrointestinal bleeding risk scores

For this study, the following gastrointestinal bleeding risk scores were considered: Glasgow-Blatchford Bleeding (GBS) and its modified version (mGBS), T-score, MAP(ASH), Canada–United Kingdom–Adelaide (CANUKA) and AIMS65. The variables of each scoring system are reported in Table S1. For each risk score, higher values imply higher risk, except for the T-score, for which the reciprocal score was used.

Other gastrointestinal bleeding risk scores, such as the Clinical Rockall Score, were excluded from this investigation because their categorization of the age variable prevents comparison with the scores considered.

Assessment of outcomes

Three outcomes were explored in the present study: mortality, defined as death occurring within 30 days of hospital admission; a composite outcome (need for red blood cell transfusion, endoscopic treatment or rebleeding); and in-hospital length of stay.

Blood transfusion was required in case of haemoglobin < 8 g/dl or hemodynamic instability following medical evaluation.

Endoscopic treatment included hemostatic electrocoagulation, application of endoscopic clips, injection of epinephrine, transarterial embolization or argon plasma coagulation.

Rebleeding was defined as bleeding from lesions previously detected or treated during EGDS.

The median of days of hospitalization (11 days) was considered as a threshold to distinguish short (< 11 days) from long (≥ 11 days) hospitalization.

Statistical analysis

Baseline characteristics of the study participants by age categories (“< 82 years” and “≥ 82 years”) were assessed using chi-square (χ2) tests for categorical variables and t-test and Wilcoxon rank sum test for continuous variables. Categorical variables were presented as frequencies or percentages and continuous variables as mean ± SD.

First, the receiver operating characteristic (ROC) curve, the area under the ROC curve (AUROC), sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), odds ratio (OR) and corresponding 95% confidence intervals were calculated to assess the prognostic value of each scoring system for mortality, composite outcome and in-hospital length of stay. ROC curves plot the true positive rate (sensitivity) as a function of the false positive rate (100 − specificity, in percent) for different cut-off points; each point on the curve corresponds to a specific decision threshold and its sensitivity/specificity pair.
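The quantities listed above can be illustrated with a minimal Python sketch. This is not the analysis code of the study (which was run in Stata); the function names and the toy data are hypothetical, a score at or above the cut-off is assumed to mean "high risk", and both outcome classes are assumed to be present.

```python
from typing import Dict, Sequence


def diagnostic_metrics(scores: Sequence[float], events: Sequence[int],
                       cutoff: float) -> Dict[str, float]:
    """Summarise a risk score at one cut-off (score >= cutoff means high risk)."""
    pairs = list(zip(scores, events))
    tp = sum(1 for s, e in pairs if s >= cutoff and e == 1)  # true positives
    fp = sum(1 for s, e in pairs if s >= cutoff and e == 0)  # false positives
    fn = sum(1 for s, e in pairs if s < cutoff and e == 1)   # false negatives
    tn = sum(1 for s, e in pairs if s < cutoff and e == 0)   # true negatives
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp) if tp + fp else nan,
        "npv": tn / (tn + fn) if tn + fn else nan,
        # The odds ratio is undefined when an off-diagonal cell is empty.
        "odds_ratio": (tp * tn) / (fp * fn) if fp and fn else nan,
    }


def auroc(scores: Sequence[float], events: Sequence[int]) -> float:
    """AUROC as the probability that a patient with the event scores higher
    than a patient without it (ties count one half)."""
    pos = [s for s, e in zip(scores, events) if e == 1]
    neg = [s for s, e in zip(scores, events) if e == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: eight patients, 1 = outcome occurred.
toy_scores = [2, 5, 7, 9, 11, 12, 4, 6]
toy_events = [0, 0, 0, 1, 1, 1, 0, 1]
print(diagnostic_metrics(toy_scores, toy_events, cutoff=7))
print(auroc(toy_scores, toy_events))
```

Evaluating `diagnostic_metrics` over every observed score value yields the points of the ROC curve, one sensitivity/specificity pair per threshold.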

AUROCs by age categories of each risk score were calculated and tested for equality with the DeLong method21. Then, in order to identify the relationship between the explored outcome and risk score in each age category, pairwise comparisons were performed. We examined the relationship between outcome events and each score separately, using logistic regression analysis.
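As a hedged illustration of how two AUROCs from independent groups (such as the two age categories) might be compared, the sketch below uses the DeLong variance estimate and a two-sided z-test. It is an assumption-laden toy version, not the Stata routine used in the study: all names and data are hypothetical, each group needs at least two patients with and two without the event, and the combined variance must be non-zero.

```python
from math import erfc, sqrt
from statistics import variance
from typing import Sequence, Tuple


def _psi(x: float, y: float) -> float:
    """Pairwise kernel: 1 if the event patient scores higher, 0.5 on a tie."""
    return 1.0 if x > y else 0.5 if x == y else 0.0


def delong_auc_var(scores: Sequence[float],
                   events: Sequence[int]) -> Tuple[float, float]:
    """Return (AUROC, DeLong variance estimate) for one group."""
    pos = [s for s, e in zip(scores, events) if e == 1]
    neg = [s for s, e in zip(scores, events) if e == 0]
    # Structural components: one per event patient and one per non-event patient.
    v10 = [sum(_psi(p, n) for n in neg) / len(neg) for p in pos]
    v01 = [sum(_psi(p, n) for p in pos) / len(pos) for n in neg]
    auc = sum(v10) / len(pos)
    var = variance(v10) / len(pos) + variance(v01) / len(neg)
    return auc, var


def compare_independent_aucs(group_a, group_b) -> Tuple[float, float]:
    """z statistic and two-sided p-value for AUROC_a - AUROC_b.

    Each group is a (scores, events) pair; the groups share no patients,
    so the two variances simply add.
    """
    auc_a, var_a = delong_auc_var(*group_a)
    auc_b, var_b = delong_auc_var(*group_b)
    z = (auc_a - auc_b) / sqrt(var_a + var_b)
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical toy groups standing in for the two age categories.
younger = ([2, 5, 7, 9, 11, 12, 4, 6], [0, 0, 0, 1, 1, 1, 0, 1])
older = ([3, 8, 6, 5, 9, 4, 7, 10], [0, 0, 1, 1, 1, 0, 0, 1])
print(compare_independent_aucs(younger, older))
```

Comparing several scores measured on the same patients, as in the pairwise comparisons, would additionally require the covariance between the scores' structural components, which this sketch does not cover.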

For each scoring system and outcome, the cut-off value as a function of each age category was determined by the maximum product of sensitivity and specificity according to Liu’s method22. Then, sensitivity and specificity and their 95% exact binomial confidence intervals were calculated by age categories and compared across different outcomes and risk scores.
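The cut-off rule and the exact binomial intervals described above can be sketched as follows. This is an illustrative Python version under stated assumptions, not the study's Stata code: names and data are hypothetical, a score at or above the cut-off is treated as "high risk", ties in the product keep the lowest cut-off, and the exact interval is taken to be the Clopper-Pearson interval, found here by bisection.

```python
from math import comb
from typing import Sequence, Tuple


def liu_cutoff(scores: Sequence[float],
               events: Sequence[int]) -> Tuple[float, float, float]:
    """Return (cutoff, sensitivity, specificity) maximising sens * spec."""
    n_pos = sum(events)
    n_neg = len(events) - n_pos
    best = None
    for c in sorted(set(scores)):  # every observed value is a candidate
        tp = sum(1 for s, e in zip(scores, events) if e == 1 and s >= c)
        tn = sum(1 for s, e in zip(scores, events) if e == 0 and s < c)
        sens, spec = tp / n_pos, tn / n_neg
        if best is None or sens * spec > best[1] * best[2]:
            best = (c, sens, spec)
    return best


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def _solve(k: int, n: int, target: float) -> float:
    """Find p with binom_cdf(k, n, p) == target (the CDF decreases in p)."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_binomial_ci(k: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Clopper-Pearson interval for k successes out of n."""
    lower = 0.0 if k == 0 else _solve(k - 1, n, 1 - alpha / 2)
    upper = 1.0 if k == n else _solve(k, n, alpha / 2)
    return lower, upper


# Hypothetical toy data: eight patients, 1 = outcome occurred.
toy_scores = [2, 5, 7, 9, 11, 12, 4, 6]
toy_events = [0, 0, 0, 1, 1, 1, 0, 1]
cutoff, sens, spec = liu_cutoff(toy_scores, toy_events)
print(cutoff, sens, spec)
print(exact_binomial_ci(3, 4))  # e.g. a specificity of 3 out of 4
```

With only a handful of patients per cell the exact interval is very wide, which is why small subgroups give imprecise sensitivity and specificity estimates.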

Finally, the equality of AUROCs across the risk scores was tested within each age group.

Values were considered significant if p-value < 0.05 (95% confidence interval). All statistical analyses were performed using Stata SE 15.0 (StataCorp, College Station, TX).

Results

Baseline characteristics of the study population

Baseline characteristics of participants by age category are summarized in Table 1. Out of the 136 participants, 67 (49.26%) were in the “< 82 years” group (mean age 74.01 ± 5.44) and 69 (50.74%) were in the “≥ 82 years” group (mean age 88.1 ± 4.26).

Table 1 Baseline characteristics of the study population by age categories (n = 136).

Participants in the older group had higher blood urea and lower albumin levels. Moreover, individuals in the older group presented higher scores of GBS, mGBS, CANUKA, MAP(ASH) and T-scores. No differences were found among the groups in terms of antiplatelets, anticoagulants and corticosteroids usage, EGDS findings or UGIB-related symptoms exhibited at the time of admission.

30-day mortality since hospitalization

No scoring system was superior in predicting mortality in the overall sample (Table S2). Although “≥ 82 years” participants showed smaller AUROCs than “< 82 years” participants (Fig. 1A), no significant differences between age groups were observed in predicting mortality for any risk score except the T-score, for which the “< 82 years” group (AUROC 0.88, 95% CI 0.77–0.99) showed a larger AUROC than the “≥ 82 years” group (AUROC 0.53, 95% CI 0.27–0.75) (Table 2, Fig. 1A).

Fig. 1

ROC curves of prediction of (A) mortality, (B) composite outcome and (C) length of stay among age groups. The dotted line corresponds to the reference line. “< 82 yrs.” (65–81.9 years old); “≥ 82 yrs.” (82–100 years old). GBS Glasgow-Blatchford Bleeding, mGBS modified Glasgow-Blatchford Bleeding, CANUKA Canada–United Kingdom–Adelaide; yrs. years.

Table 2 Comparison of ROC curves, sensitivities, and specificities among age categories.

In the pairwise comparison of the risk scores among “< 82 years” individuals, CANUKA score showed a larger AUROC compared to GBS score (p-value 0.006) and mGBS score (p-value 0.026). Similarly, in the same group, AUROC of T-score was superior compared to AUROCs of GBS score (p-value 0.007) and mGBS score (p-value 0.027) (Tables S3, S4).

Moreover, in predicting mortality, the risk scores had similar sensitivity and specificity across age groups (Table 2, Fig. 2A).

Fig. 2

Sensitivity and specificity of risk scores predicting (A) mortality, (B) composite outcome and (C) length of stay among age groups. “< 82 yrs.” (65–81.9 years old); “≥ 82 yrs.” (82–100 years old). GBS Glasgow-Blatchford Bleeding, mGBS modified Glasgow-Blatchford Bleeding, CANUKA Canada–United Kingdom–Adelaide.

Comparison of AUROCs revealed no difference in the total sample (p-value 0.145) or in the “≥ 82 years” group (p-value 0.421) but a statistically significant difference in the “< 82 years” group (p-value 0.026) (Fig. 3A).

Fig. 3

Comparison of risk scores predicting (A) mortality, (B) composite outcome and (C) length of stay among age groups. *Tests of the equality of AUROCs: p-value < 0.05. “< 82 yrs.” (65–81.9 years old); “≥ 82 yrs.” (82–100 years old). AUROC area under ROC curve, GBS Glasgow-Blatchford Bleeding, mGBS modified Glasgow-Blatchford Bleeding, CANUKA Canada–United Kingdom–Adelaide.

Composite outcome

T-score had the largest AUROC compared to the other risk scores in the overall sample (Table S2). No differences across AUROCs (Fig. 1B) and specificities (Fig. 2B) in the prediction of the composite outcome were found among age groups (Table 2). However, all risk scores in the “< 82 years” group had significantly higher sensitivities than those in the “≥ 82 years” group, except for the T-score (Table 2, Fig. 2B).

Pairwise comparison of risk scores among “< 82 years” participants showed inferiority of AIMS65 score compared to mGBS score (p-value 0.016), MAP(ASH) score (p-value 0.049), CANUKA score (p-value 0.011) and T-score (p-value 0.001) (Table S5) whereas no superiority of any risk score was found among “≥ 82 years” participants (Table S6).

Test of equality of AUROCs predicting composite outcome showed a significant difference in the total sample (p-value 0.008), in the “< 82 years” (p-value 0.047) and “≥ 82 years” (p-value 0.048) groups (Fig. 3B).

Length of stay

All risk scores showed similarly poor performance in predicting length of stay in the overall sample (Table S2). Comparison of the AUROCs of each risk score between age groups showed significantly poorer performance in the “≥ 82 years” group for the GBS score (p-value 0.010), mGBS score (p-value 0.001), MAP(ASH) score (p-value 0.009), T-score (p-value 0.049) and AIMS65 score (p-value 0.025), but not for the CANUKA score (p-value 0.113) (Table 2, Fig. 1C).

However, no differences were found in terms of sensitivities and specificities (Table 2, Fig. 2C) or superiority of the risk scores in the pairwise comparison among age groups (Tables S7, S8).

Finally, no risk score was superior when comparing AUROCs, either in the total sample (p-value 0.796) or in the “< 82 years” (p-value 0.499) and “≥ 82 years” (p-value 0.731) groups (Fig. 3C).

Discussion

Globally, an increasing number of individuals requiring advanced medical support visit Emergency Departments every day, with a significant impact on both patients and the healthcare system in terms of crowding and costs. Patients with suspected UGIB are usually admitted to the Emergency Department with immediate or very urgent priority. Several risk stratification scores have been developed over time to predict severe outcomes such as mortality, rebleeding, need for urgent endoscopic intervention or RBC transfusion, and to detect individuals who require hospitalization, early resuscitation and close monitoring. Identifying candidates for outpatient management is a further potential application of UGIB risk scores.

An ideal risk score for UGIB should be easy to calculate, contain accessible variables and predict significant outcomes accurately. To avoid misclassifying “high-risk” patients as “low-risk”, the sensitivity of a risk stratification score is important. Specificity, by contrast, matters less for patient safety and more for avoiding wasted resources.

Although endoscopic data would improve the performance of UGIB risk scores, endoscopy is not continuously available in all medical centres and, in geriatric individuals, is often poorly tolerated or not feasible23.

In contrast, age is a cost-free and easily accessible piece of information that could improve the validity of UGIB risk scores. With the aging population, the incidence of UGIB is expected to increase24, and the development of a validated tool able to predict clinically relevant outcomes in geriatric patients represents a crucial challenge.

Previous studies highlighted the increasing relative risk associated with advanced age7 or supplemented existing risk scores with adjustments for advanced age, improving the validity of risk stratification25,26.

A recently published study compared six pre-endoscopic scoring systems in older (age ≥ 65) and younger (age < 65) cohorts. One of these, the ABC score, which accounts for advanced age (60–74 years; ≥ 75 years), showed the best performance in predicting mortality and rebleeding in both age cohorts, despite significant differences between groups, whereas the MAP(ASH) score was the most accurate for predicting intervention in both groups27.

However, the most widely used scoring systems do not consider age or advanced age, which is an important flaw in the evaluation of geriatric patients with UGIB. Here, exploring the validity of the most widely used UGIB scoring systems in the prediction of different clinical outcomes in geriatric individuals, we found that all prognostic scores performed better in the “< 82 years” group (age 65–81.9 years) compared to the “≥ 82 years” group (age 82–100 years) and that none of the analysed scores was consistently superior to the others. Moreover, sensitivities were significantly higher in the prediction of the composite outcome in the “< 82 years” group, suggesting poor performance in the older population.

Guidelines currently recommend GBS as the most reliable UGIB scoring system28,29. In our study, GBS and its modified version mGBS were not superior in the prediction of any of the explored outcomes in either age cohort.

AIMS65 is an easily calculated scoring system that was initially developed to predict the mean length of stay and mortality20; however, here AIMS65 performed well only in the prediction of mortality in the “< 82 years” group.

The MAP(ASH) score accounts for six variables and was initially designed to predict mortality and intervention18; here it showed excellent performance in the prediction of mortality in the “< 82 years” group, whereas its performance in predicting the composite outcome and length of stay was barely acceptable or very low in either age group.

Similarly, CANUKA score showed acceptable performances among “< 82 years” participants but poor performances in the “≥ 82 years” age group.

Finally, T-score is a simple scoring system including four clinical parameters commonly assessed in the UGIB setting. In the prediction of composite outcome T-score showed superiority compared to the other risk scores in the total sample and both in the “< 82 years” and “≥ 82 years” cohorts. Moreover, in the “≥ 82 years” group T-score was less reliable in predicting mortality than in the “< 82 years” group. This is relevant as Tammaro et al.30 demonstrated a similar accuracy to GBS in mortality prediction. On the contrary, T-score was the only score showing a similar sensitivity of composite outcome prediction between “≥ 82 years” and “< 82 years” groups.

Strengths and limitations

Strengths of this study are the broad and clinically applicable outcomes explored, the prospective and multi-center design and the comparison of pre-endoscopic scoring systems with similar features but different variables. Some limitations should be acknowledged: rehospitalization was not considered; no distinction was made between bleeding due to esophageal varices and other causes; the number of deaths was relatively low; and ascertainment of the cause of death was limited, which could make the mortality outcome less accurate. Finally, the results of the analyses cannot be generalized due to the small sample size.

Conclusions

Although progress in modern medicine has increased life expectancy, multimorbidity and high consumption of drugs such as anticoagulants, antiplatelets or NSAIDs expose geriatric patients to more adverse outcomes after UGIB. Several scoring systems are available for UGIB to predict adverse outcomes. This study contributes to the evidence that some of the most recommended prognostic scores in clinical practice (i.e., GBS, mGBS, MAP(ASH), CANUKA, T-score and AIMS65) may be biased by not considering age, and especially advanced age, in assessing UGIB severity, resulting in low performance in the prediction of clinically relevant outcomes in the geriatric population. Therefore, we believe that UGIB scoring systems should not be recommended in advanced age, and further studies are urgently needed to develop new scores or validate existing ones in old and very old patients.