Introduction

An abdominal aortic aneurysm (AAA) refers to an irreversible localized dilatation of the abdominal aorta. Globally, the prevalence of AAA is reported to be around 2–8%. In Korea, its prevalence is approximately 2.8%1,2. In general, the size of AAA tends to gradually increase over time. As the size increases, the risk of rupture also increases3,4. Once a rupture occurs, the mortality rate has been reported to be as high as 81%5. When treating intact AAA, the 30-day mortality rate has been reported to range from 1.16 to 3.27%. For cases with ruptured AAA, among patients receiving treatment in hospitals, the 30-day mortality rate has been reported to be in the range of 30.2–39.6%6.

Because of this high mortality, several countries, including the United States, the United Kingdom, Sweden, and Denmark, have implemented AAA screening programs. Numerous randomized controlled trials (RCTs) and observational follow-up studies have been conducted in this context7,8,9,10. These screening programs typically target men aged 65 and older with a history of smoking. Although such programs do not significantly affect all-cause mortality, studies have reported that they can reduce the rate of ruptured AAA and decrease AAA-related mortality. These screenings have also been shown to be cost-effective in terms of healthcare resource utilization.

However, because these approaches have focused on men aged 65 and older with a history of smoking, patients outside this group may be missed. According to Summers et al., there are still significant high-risk groups that fall outside the current guidelines and could greatly benefit from AAA screening11. A model that predicts AAA occurrence from basic screening results would make it possible to expand screening to a wider population. This could reduce the chance of missing individuals at risk and minimize unnecessary screenings, ultimately improving cost-effectiveness.

Therefore, this study aimed to develop a predictive model for the presence of AAA using 11 years of data from the Korean National Health Insurance Service (NHIS) database and then to validate it. To the best of our knowledge, this is a novel approach.

Methods

This study was conducted in parallel with “Risk of various cancers in adults with abdominal aortic aneurysm” and “The risk of dementia in adults with abdominal aortic aneurysm” by Cho et al.12,13 and shares a similar protocol, patient group selection method, and statistical method.

Data source

The healthcare insurance system in Korea is described in the two previous parallel studies. The data used in this research were gathered from the NHIS database and spanned from 2009 to 2020.

Patients

The study initially enrolled 4,234,415 individuals aged 20 and above who underwent health examinations in 2009. Patients who had already been diagnosed with AAA at the time of the health examination were excluded (n = 2,409). Individuals who lacked data for one or more variables required for this study at the time of the health examination were also excluded (n = 284,471). The AAA patient group was defined using diagnostic codes and procedure codes, similar to previous studies (Appendix 1). Patients who were lost to follow-up within one year after the health examination, and those who developed AAA within one year, were excluded to establish a clear cause-and-effect relationship (n = 10,431). Of the remaining participants, 70% were randomly assigned to a development cohort for model training, and the remaining 30% were allocated to a validation cohort. The allocation was performed using simple random sampling without stratification to preserve the natural distribution of baseline characteristics (Fig. 1).

Fig. 1

Enrollment flow chart. AAA: Abdominal aortic aneurysm.
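The enrollment arithmetic and the 70/30 allocation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the exclusion counts come from the text, while the function name `split_cohort`, the seed, and the 1,000-person stand-in population are assumptions made for the example.

```python
import random

# Counts taken from the enrollment description in the text.
initial = 4_234_415
excluded = {
    "prior AAA diagnosis": 2_409,
    "missing variables": 284_471,
    "lost to follow-up or AAA within one year": 10_431,
}
eligible = initial - sum(excluded.values())


def split_cohort(ids, train_fraction=0.7, seed=0):
    """Simple random sampling without stratification (seed is hypothetical)."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Small stand-in population; the real cohort has `eligible` members.
development, validation = split_cohort(range(1000))
print(eligible, len(development), len(validation))
```

The eligible count computed this way equals the sum of the development (2,755,973) and validation (1,181,131) cohort sizes reported in the Results.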

Data collection and definition

Demographic data were gathered from the NHIS database, encompassing age, sex, smoking habits, alcohol consumption, physical activity, waist circumference, body mass index (BMI), and income level. Information regarding underlying health conditions, including hypertension, diabetes mellitus (DM), dyslipidemia, chronic kidney disease (CKD), and a history of cardiocerebrovascular disease (CVD), was also collated. Definitions of variables were similar to those described in previous papers. They are summarized in Appendix 1.

This study was conducted in accordance with the ethical standards of the Declaration of Helsinki and relevant regulations. The requirement for informed consent was waived because the study used de-identified data from the National Health Insurance Service (NHIS) database of Korea. This study was approved by the Institutional Review Board (IRB) of The Catholic University of Korea, Eunpyeong St. Mary’s Hospital, Seoul, Korea (IRB approval number: PC23ZASI0143).

Statistical analysis

Continuous variables are presented as mean ± SD or 95% CIs, while categorical variables are expressed as numbers and percentages (%). To compare characteristics between the patient and control groups, Student’s t-tests were used for continuous variables and chi-squared or Fisher’s exact tests were used for categorical variables. Incidence rates of AAA are presented per 1,000 person-years. To estimate hazard ratios (HRs) of the variables for the occurrence of AAA, the Cox proportional hazard regression model was employed. The variables were factors associated with AAA in the literature14 and were selected from data obtainable through health examination records.
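As a brief illustration of the incidence-rate convention, the sketch below divides events by person-years and scales to 1,000. The function name is hypothetical, and total person-years are approximated here as participants × mean follow-up using the development-cohort figures reported in the Results; the study itself would use each participant's exact follow-up time.

```python
def incidence_rate_per_1000(events, person_years):
    """Events per 1,000 person-years of follow-up."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return 1000.0 * events / person_years


# Approximation (assumption): person-years ~ participants x mean follow-up.
participants = 2_755_973
mean_follow_up_years = 10.11
events = 6_514

approx_person_years = participants * mean_follow_up_years
rate = incidence_rate_per_1000(events, approx_person_years)
print(f"{rate:.2f} per 1,000 person-years")
```

Under this approximation the overall rate in the development cohort is roughly 0.23 per 1,000 person-years; this figure is derived for illustration and is not reported in the study.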

Risk scores were allocated according to the HR for each risk factor identified in the final Cox hazard regression model. Each of the 10 variables (age, sex, obesity, smoking status, drinking, fasting glucose level, blood pressure, total cholesterol level, presence of CKD, and previous CVD) was assigned a score ranging from 0 to 100. The score for each variable is read by drawing a vertical line from its value to the score axis. Model performance was assessed in terms of calibration and discrimination. For calibration, predicted 5-year disease-free survival was plotted against observed 5-year disease-free survival to visually inspect the alignment15 (Appendix 2). For discrimination, receiver operating characteristic curves were generated and the area under the curve was examined16. All statistical analyses were performed using SAS version 9.4 (SAS Institute Inc., Cary, NC, USA) and the R Project for Statistical Computing version 3.3 (Vienna, Austria).
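One common way to turn hazard ratios into nomogram points is to scale each log hazard ratio so that the largest effect receives 100 points. The text does not state the exact scaling used, so the sketch below is an assumption; the hazard ratios are those reported in the Results, and the function name `nomogram_points` is hypothetical.

```python
import math

# Hazard ratios for the risk-increasing level of each factor (from the Results).
# The reported HR for DM is 0.64, so the points go to the "no DM" level.
hazard_ratios = {
    "age (oldest group)": 30.43,
    "male sex": 2.01,
    "obesity": 1.06,
    "smoking": 2.20,
    "no DM": 1 / 0.64,
    "HTN": 2.04,
    "dyslipidemia": 1.56,
    "CKD": 1.41,
    "CVD": 1.50,
}


def nomogram_points(hrs, max_points=100):
    """Scale log hazard ratios so the largest effect gets max_points."""
    betas = {name: math.log(hr) for name, hr in hrs.items()}
    largest = max(betas.values())
    return {name: round(max_points * b / largest) for name, b in betas.items()}


points = nomogram_points(hazard_ratios)
for name, value in points.items():
    print(f"{name}: {value}")
```

Under this assumed scaling, the rounded values for male sex, smoking, no DM, HTN, dyslipidemia, and CKD (20, 23, 13, 21, 13, and 10) agree with the points quoted in the worked examples in the Prediction model section. The values it produces for obesity and CVD are outputs of the assumption only, since their points are not stated in the text.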

Results

Baseline characteristics according to presence of AAA

The distributions of variables did not differ significantly between the development cohort and the validation cohort (Table 1). In the development cohort, the average follow-up period was 10.11 ± 1.28 years, and AAA occurred in 6,514 (0.24%) of 2,755,973 participants. At baseline, the mean age was 47.22 ± 14.01 years, and male patients accounted for 54.56%. The AAA patient group was older (62.88 ± 11.3 years vs. 47.19 ± 14 years, p-value < 0.001), had a higher proportion of males (67.88% vs. 54.53%, p-value < 0.001), and had a higher BMI (24.25 ± 3.1 vs. 23.7 ± 3.22, p-value < 0.001) than the control group. Additionally, the smoking rate was higher (31.98% vs. 26.01%, p-value < 0.001), while the proportion of patients who consumed alcohol was lower (38.52% vs. 48.31%, p-value < 0.001) in the AAA patient group. Interestingly, the AAA group exhibited higher levels of physical activity (21.08% vs. 17.96%, p-value < 0.001). In terms of comorbidities, the AAA group had higher prevalences of hypertension (55.63% vs. 25.37%, p-value < 0.001), hyperlipidemia (32.7% vs. 17.34%, p-value < 0.001), CKD (15.32% vs. 6.91%, p-value < 0.001), CVD (7.52% vs. 1.87%, p-value < 0.001), and DM (12.48% vs. 8.66%, p-value < 0.001).

Table 1 Clinical characteristics of study participants based on the occurrence of abdominal aortic aneurysm (AAA) in development and validation cohorts used for the predictive model.

Selection of variables

Among the variables listed in Table 1 that showed significant distribution differences between the AAA group and the control group, a total of 12 variables were selected. When similar or overlapping variables were present (e.g., BMI vs. obesity, or diabetes vs. diabetes severity), the variable with greater clinical interpretability or statistical contribution was retained to avoid redundancy and overfitting in the model (Tables 2 and 3). The Cox proportional hazard regression model was then used to estimate the HR for the occurrence of AAA. In multivariate analysis, we ultimately selected 10 statistically significant variables: age, sex, obesity, smoking, drinking, DM, HTN, dyslipidemia, CKD, and CVD. Old age [HR: 30.43 (95% CI: 26.48–34.97)], male sex [HR: 2.01 (95% CI: 1.88–2.16)], obesity [HR: 1.06 (95% CI: 1.01–1.11)], smoking [HR: 2.20 (95% CI: 2.05–2.36)], DM [HR: 0.64 (95% CI: 0.59–0.69)], HTN [HR: 2.04 (95% CI: 1.89–2.20)], dyslipidemia [HR: 1.56 (95% CI: 1.47–1.66)], CKD [HR: 1.41 (95% CI: 1.31–1.51)], and CVD [HR: 1.50 (95% CI: 1.67–1.65)] were significant predictive factors for the occurrence of AAA after adjusting for all 10 variables. A nomogram for risk scoring was constructed from the risk prediction model to estimate the five-year risk of AAA (Fig. 2).

Fig. 2

A nomogram for predicting the 5-year probability of abdominal aortic aneurysm occurrence. Note: The 10 variables—age, sex, obesity, smoking status, drinking, presence of diabetes mellitus or hypertension, chronic kidney disease, cardiocerebrovascular disease, and total cholesterol level—were each assigned scores ranging from 0 to 100. The corresponding score for each variable can be determined by drawing a straight line to the scoring axis. The total score, calculated as the sum of the scores for all variables, ranges from 0 to 226 and is displayed at the bottom of the nomogram. DM: Diabetes mellitus, IFG: Impaired fasting glucose, HTN: Hypertension, CKD: Chronic kidney disease, CVD: Cardiocerebrovascular disease.

Table 2 Hazard ratios (95% CIs) for the occurrence of abdominal aortic aneurysm (Univariate model).
Table 3 Hazard ratios (95% CIs) for the occurrence of abdominal aortic aneurysm (Multivariate model and final model).

Validation of the prediction model

In the validation cohort, the average follow-up period was 10.11 ± 1.29 years, and AAA occurred in 2,836 (0.24%) of 1,181,131 participants. At baseline, the mean age was 47.24 ± 14.02 years, and male patients accounted for 54.58%. When the prediction model was applied to the development cohort, the area under the curve (AUC) for predicting AAA occurrence was 0.807 (95% CI: 0.80–0.81). When applied to the validation cohort, the AUC was 0.803 (95% CI: 0.79–0.81) (Fig. 3). This suggests that the model is effective in predicting the occurrence of AAA.

Fig. 3

Receiver operating characteristic (ROC) curves of the model in the development and validation cohorts. Note: (A) ROC curve for the development cohort; (B) ROC curve for the validation cohort.
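For readers less familiar with the AUC, it can be read as the probability that a randomly chosen person who develops AAA has a higher score than a randomly chosen person who does not. The sketch below computes it by direct pairwise comparison; the function name and the toy scores and outcomes are hypothetical and are not study data.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random case outranks a random non-case.

    Ties between a case and a non-case count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical total scores and outcomes (1 = developed AAA), not study data.
toy_scores = [143, 120, 60, 95, 180, 60]
toy_labels = [0, 1, 0, 0, 1, 0]
auc = roc_auc(toy_scores, toy_labels)
print(auc)
```

The pairwise loop is quadratic in cohort size, so it is only suitable for illustration; on a cohort of millions, a rank-based computation in standard statistical software would be used instead.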

Prediction model

The total score was obtained by summing the scores of the 10 variables and ranged from 0 to a maximum of 226 points. For example, a male (20 points) aged 65 or older (100 points) who smoked (23 points), had a normal weight (0 points), and had diabetes (0 points) with no other underlying conditions (0 points) would have a total score of 143 points. On the other hand, a female (0 points) aged 40 (57 points) who smoked (23 points), had a normal weight (0 points), did not drink alcohol (12 points), and had no DM (13 points) but had HTN (21 points), hyperlipidemia (13 points), and CKD (10 points) would have a total score of 149 points (Table S1). In this case, it can be inferred that the probability of developing abdominal aortic aneurysm within 5 years is less than 0.5%.
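The two worked examples above can be reproduced with a small lookup-and-sum sketch. The point values are those quoted in the text; the dictionary keys and the function name are hypothetical labels chosen for the example, and levels not quoted in the text are omitted.

```python
# Points quoted in the worked examples; other levels are not listed here.
POINTS = {
    "male": 20,
    "age_65_or_older": 100,
    "age_40": 57,
    "smoker": 23,
    "non_drinker": 12,
    "no_dm": 13,
    "htn": 21,
    "hyperlipidemia": 13,
    "ckd": 10,
}


def total_score(factors):
    """Sum the points of the risk factors that apply to one person."""
    return sum(POINTS[f] for f in factors)


man = total_score(["male", "age_65_or_older", "smoker"])
woman = total_score(["age_40", "smoker", "non_drinker", "no_dm",
                     "htn", "hyperlipidemia", "ckd"])
print(man, woman)
```

Factors that contribute 0 points (female sex, normal weight, presence of DM, absence of other conditions) are simply left out of the list.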

In the patient group with the highest score range of 215 points or above in the prediction model, the incidence rate of AAA was 1.194 per 1,000 person-years (Fig. 4). Additionally, to assess potential overfitting of the model, we compared the incidence in the validation cohort. Similar patterns were observed in each score interval, supporting the robustness of this predictive model.

Fig. 4

Predicted incidence rate (per 1,000 person-years) using development and validation cohorts.

Discussion

Using the NHIS database in Korea, we developed and validated a straightforward yet effective risk prediction model for the occurrence of AAA. To the best of our knowledge, this is the first model capable of predicting the likelihood of AAA occurrence over a 5-year period using data obtained through long-term follow-up observations. Our model exhibited strong performance, with an AUC of 0.807 (95% CI: 0.80–0.81). Notably, older age, male sex, obesity, current smoking, non-drinking, absence of DM, and the presence of hypertension, hyperlipidemia, CKD, and CVD were identified as independent predictors of an increased risk of AAA.

Vascular and endovascular surgeons consider research on methods for predicting AAA growth to be a priority17. Accordingly, various studies are being conducted to predict the growth of AAA. Reported approaches include geometric analysis of CT images, the use of imaging modalities such as MRI to examine metabolism, and the exploration of associations with biomarkers18,19,20. Although predicting the growth of AAA is important, it is even more crucial to first predict the presence of AAA.

AAA screening is most widely conducted based on the U.S. Preventive Services Task Force guidelines21. There are also other guidelines, such as the slightly more sensitive Society for Vascular Surgery (SVS) guideline and the SVS expanded criteria22. The SVS guideline significantly broadens the scope of patients who could potentially be included in screening programs. Nevertheless, about one-quarter to one-third of patients undergoing treatment for AAA could still go unnoticed by any existing screening guideline23. This can also be seen in the model developed in this study. For instance, scores corresponding to the criteria of traditional guidelines targeting males over 60 who smoke can also be reached by individuals under 60, females, or non-smokers. This illustrates one strength of the model: it covers a somewhat broader range of patients while still identifying specific high-risk individuals within that range.

A second strength is that the variables included in this model are easily recognizable factors such as medical history, smoking history, age, and sex. Because of this simplicity, the general public can easily check their risk of developing AAA at any time. If a cutoff value is determined through subsequent research, strongly recommending screening to individuals with risk levels above this value could help improve the screening rate, addressing one of the issues with the current screening method. Lastly, because the model was developed with long-term data from a substantial number of patients, it is likely to be reliable. As the model developed in this study is an initial iteration, further research that adds other easily obtainable variables and validates the result could enhance the model’s accuracy. This aligns with the initial goal of the model development, which is to expedite the identification of AAA patients, enabling timely intervention before complications such as rupture and ultimately improving the survival rate.

The current study is subject to several limitations. Firstly, because NHIS data were used, defining the presence or absence of a patient’s diseases based on diagnosis codes and medication intake might have led to errors and inconsistencies. In particular, unlike in a group identified through screening, there may be undiagnosed asymptomatic AAA cases or patients who died from ruptured AAA without a diagnosis. Secondly, in the selection of variables included in this model, efforts were made to incorporate well-known risk factors based on available information. However, due to the nature of the data, only variables that could be confirmed through health check-ups were included, while factors such as family history, transplantation status, or the presence of aneurysms in other peripheral arteries could not be incorporated. Thirdly, since the model was developed for the Korean population, it is necessary to assess its applicability and effectiveness when applied to different racial groups. Fourthly, although the cohort has a follow-up period exceeding 10 years, the model was not separately validated for short-term and long-term outcomes. Therefore, we could not assess potential differences in predictive performance or variable importance across different time horizons. Future studies with time-specific modeling may help address this limitation.

Further research is needed to establish an appropriate cut-off level for recommending screening. Subsequently, when conducting screening on the identified patient group, assessing improvements in aspects such as all-cause mortality, AAA-related mortality, and cost-effectiveness will be essential.

Conclusions

We developed a multivariable risk model capable of predicting the onset of AAA. In this study, the model demonstrated excellent performance with an AUC value of 0.807, surpassing traditional screening methods. It is anticipated that this model can selectively identify patients from a slightly more comprehensive pool compared to existing screening approaches. Moreover, efforts should be directed towards proactive screening of high-risk individuals for AAA, aiming to reduce AAA-related mortality. For further optimization of the model’s performance, external validation is necessary, along with additional analysis and refinement using larger cohort data. The development of an appropriate surveillance program in response to these findings is also crucial.