Introduction

Inflammatory bowel disease (IBD), encompassing Crohn’s disease (CD) and ulcerative colitis (UC), is a chronic and progressive group of immune-mediated disorders of the gastrointestinal tract, characterized by debilitating symptoms including abdominal pain, diarrhea, hematochezia, fever, and extraintestinal manifestations. IBD has emerged as a global disease in the twenty-first century, with an estimated worldwide prevalence of 7 million cases1,2. In addition, based on IBD population data from 2010 to 2019, the number of IBD patients is projected to reach 2.2 million in North America and 2.5–3 million in Europe within the next 10 years3,4,5. Under the four-stage model of IBD epidemiology, many Asian countries, particularly in East Asia, have long remained in stage 1 and are now beginning to enter stage 2; some may even be approaching stage 3. IBD is difficult to cure definitively, relapses frequently, and is associated with multiple complications6. Following the initial disease peak, IBD patients often face a cyclical pattern of remission and relapse, in which recurring gastrointestinal symptoms significantly disrupt multiple dimensions of their lives7.

Substantial research has focused on elucidating the multifactorial etiology of IBD in order to mitigate its onset and progression. The pathophysiology of IBD is inherently complex, involving interactions between genetic susceptibility, dietary influences, environmental exposures, gut microbial dysbiosis, immune dysregulation, and other emerging risk factors8. Notably, mounting evidence supports a central role for genetic predisposition: epidemiological studies indicate that up to 12% of IBD patients report a positive family history of the condition, underscoring a significant hereditary component in disease pathogenesis9,10,11. Family and twin studies demonstrate that the heritability of IBD varies significantly by disease phenotype, with CD showing a stronger genetic component than UC12. Genome-wide association studies (GWAS) of IBD have so far identified more than 250 important loci9. While genetic factors establish susceptibility, environmental exposures shape IBD pathogenesis through interactions at multiple scales, with risk factors operating across individual, familial, communal, and geopolitical strata to modulate disease onset and progression13,14,15. Urbanization drives profound environmental shifts (including industrialized dietary patterns, heightened antibiotic use, sanitization-induced microbial deprivation, and escalating pollution) that collectively disrupt host-microbiome homeostasis and have therefore emerged as critical modifiable risk factors linked to IBD4,15. Among environmental risk factors, shifts in dietary patterns stand out as a major modulator of IBD pathogenesis, albeit without clear evidence of direct causation. Emerging research posits that dietary transitions may indirectly precipitate IBD onset by reshaping gut microbial ecosystems and triggering dysregulated immune responses, a hypothesis that places the diet-microbiota-immune axis at a critical interface in disease development16,17. While individual risk factors have been extensively investigated, few studies have integrated polygenic susceptibility, immunometabolic pathways, and environmental exposures, and fewer still have examined the dynamic interplay between these elements15,18,19.

In this study, the United Kingdom Biobank (UKB), a large prospective cohort, was utilized to investigate the pathogenic mechanisms of IBD and their multidimensional interactions. By integrating polygenic risk scores (PRS), we seek to advance the understanding of IBD etiology beyond conventional frameworks. Specifically, we aim to develop an etiological model that dissects the intricate interplay among pathogenic factors, thereby elucidating the cascading pathways that culminate in IBD onset.

Methods

Participants

Inclusion criteria specified participants who were evaluated at baseline (2006–2010) and had relevant diagnostic records. Exclusion criteria included participants diagnosed with intestinal cancer (including small intestinal cancer, rectum/colon/sigmoid colon cancer, and anal cancer) or IBD at baseline.

Variables and outcome measurements

Baseline characteristics variables

The baseline characteristics encompass population demographics, genetic risk profiles, lifestyle habit data, medication/disease histories, and intestinal surgery-related medical records. Detailed definitions of these variables are outlined in the methods section of the supplementary materials.

Dietary habit

We collected dietary data from the UKB datasets and classified dietary intake into seven food groups (fruits, vegetables, fish, processed meats, unprocessed red meats, whole grains, and refined grains) for subsequent analysis. Dietary habits were scored based on the intake of each food group20. This definition is based on a recognized healthy eating pattern, the Dietary Approaches to Stop Hypertension (DASH) diet21, which uses the intake of vegetables, fruits, low-fat dairy products, red meat, and processed foods to determine whether dietary habits are healthy. Participants were ultimately divided into those with a healthy dietary habit (meeting the criteria for ≥ 4 of the 7 food groups) and those with an unhealthy dietary habit (< 4 of the 7 food groups); the classification criteria are given in Supplement Table s1.
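The binary classification described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the per-group intake targets live in Supplement Table s1 and are not reproduced here, so the function assumes each participant already has a yes/no flag per food group. The names `FOOD_GROUPS` and `classify_dietary_habit` are hypothetical.

```python
from typing import Mapping

# The seven food groups named in the text. Whether a participant meets the
# target for a group is decided by Supplement Table s1 (not reproduced here),
# so this sketch takes those decisions as boolean inputs.
FOOD_GROUPS = (
    "fruits",
    "vegetables",
    "fish",
    "processed_meats",
    "unprocessed_red_meats",
    "whole_grains",
    "refined_grains",
)


def classify_dietary_habit(meets_target: Mapping[str, bool]) -> str:
    """Return 'healthy' if the target is met for >= 4 of 7 groups, else 'unhealthy'."""
    missing = [g for g in FOOD_GROUPS if g not in meets_target]
    if missing:
        raise ValueError(f"missing food groups: {missing}")
    score = sum(1 for g in FOOD_GROUPS if meets_target[g])
    return "healthy" if score >= 4 else "unhealthy"


# Hypothetical participant meeting the target for four groups.
example = {g: False for g in FOOD_GROUPS}
example.update(fruits=True, vegetables=True, fish=True, whole_grains=True)
print(classify_dietary_habit(example))
```

Taking pre-computed flags as input keeps the sketch independent of the exact intake thresholds, which would need to be taken from the supplementary table.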

Immune metabolism

Blood biochemical indicators, blood cell counts, and nuclear magnetic resonance (NMR) metabolic indicators were sourced from UKB, where they were measured as biomarkers in recruited participants under follow-up. The biomarker assay quality procedures are detailed in an openly accessible document (https://biobank.ndph.ox.ac.uk/showcase/ukb/docs/biomarker_issues.pdf). We collected a total of 220 biomarkers, which were classified according to their characteristics (for example, “cholesterol” and “liver function”)22,23.

Lifestyle factors

This study comprehensively examined how modifiable lifestyle factors influence IBD development by analyzing a spectrum of behavioral indicators, including tobacco exposure (smoking status, home/environmental secondhand smoke), alcohol consumption frequency, physical activity levels (IPAQ-categorized), sleep quality (duration and insomnia prevalence), dietary patterns (habits and recent changes), and early-life variables (breastfeeding history and delivery mode). Given these factors’ adaptability, identifying and addressing them offers actionable pathways to reduce IBD incidence. The research aimed to clarify their independent and synergistic impacts on disease risk, ultimately guiding the formulation of targeted prevention strategies that leverage lifestyle interventions to mitigate IBD burden. The research data on lifestyle are all sourced from UKB. For specific variables included, please refer to the variable description in the methods section of the supplementary materials.

Genetic risks

PRS were computed for each participant as proxies for individual genetic predisposition to CD and UC, respectively (PRS calculation method: https://www.medrxiv.org/content/10.1101/2022.06.16.22276246v2.full). This study used PRS from the official UKB dataset, generated through the UKB standardized PRS evaluation platform. These scores, which estimate individual genetic liabilities across multiple diseases and phenotypic traits, were computed using a baseline algorithm trained on integrated external cohorts and UKB’s internal genomic data. To validate comparability, both Standard (n = 486,150) and Enhanced (n = 104,600) PRS variants were benchmarked against a unified set of quality control metrics within a diverse UKB testing subset that deliberately included European, South Asian, African, and East Asian ancestral groups to enhance global demographic representation. Given its substantially larger sample size and broader population coverage, the Standard PRS was selected for the primary analysis to ensure robust statistical power and to minimize potential biases related to ancestral underrepresentation in genomic studies.

Body pain

This study investigated the association between chronic pain conditions and IBD by assessing multiple pain domains, including persistent abdominal/stomach pain (≥ 3 months), chronic back pain (≥ 3 months), chronic hip/knee pain (≥ 3 months), and oral/dental discomfort. By systematically evaluating the presence and localization of these pain symptoms, the analysis aimed to distinguish between pain directly attributable to IBD pathophysiology and potential comorbid pain conditions that may coexist but remain clinically unrecognized. The research data on body pain are all sourced from UKB. For specific variables included, please refer to the variable description in the methods section of the supplementary materials.

Outcome measurements

The study defined IBD cases using diagnostic criteria from the International Classification of Diseases (ICD-9 codes 555, 556; ICD-10 codes K50, K51). The endpoint was the incidence of first-time IBD diagnosis. Participants free of IBD at baseline were prospectively monitored from study initiation until the earliest occurrence of IBD diagnosis, death, loss to follow-up, or administrative censoring dates (September 30, 2021, for England/Wales; October 31, 2021, for Scotland). This longitudinal design enabled rigorous evaluation of IBD development trajectories over time, accounting for competing risks and regional variations in follow-up duration.
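The follow-up rule above (exit at the earliest of IBD diagnosis, death, loss to follow-up, or the regional administrative censoring date) can be illustrated with a short sketch. It is an assumption-laden example rather than the study's pipeline: the function name `follow_up`, the region keys, and the convention of counting a same-day diagnosis as an event are all hypothetical choices made here.

```python
from datetime import date
from typing import Optional, Tuple

# Administrative censoring dates stated in the text.
ADMIN_CENSOR_DATES = {
    "England": date(2021, 9, 30),
    "Wales": date(2021, 9, 30),
    "Scotland": date(2021, 10, 31),
}


def follow_up(
    baseline: date,
    region: str,
    ibd_date: Optional[date] = None,
    death_date: Optional[date] = None,
    lost_date: Optional[date] = None,
) -> Tuple[float, bool]:
    """Return (follow-up time in years, whether incident IBD was observed)."""
    # Each candidate exit is (date, is_ibd_event).
    candidates = [(ADMIN_CENSOR_DATES[region], False)]
    if ibd_date is not None:
        candidates.append((ibd_date, True))
    for censor_date in (death_date, lost_date):
        if censor_date is not None:
            candidates.append((censor_date, False))
    # Earliest exit wins; on a tie, prefer the IBD event (assumed convention).
    end, event = min(candidates, key=lambda c: (c[0], not c[1]))
    if end < baseline:
        raise ValueError("exit date precedes baseline")
    return (end - baseline).days / 365.25, event


years, event = follow_up(date(2008, 1, 1), "Scotland", ibd_date=date(2022, 1, 1))
print(round(years, 2), event)  # diagnosis falls after the censoring date, so event is False
```

The resulting pair of time and event indicator is the usual input to a Cox model; a diagnosis recorded after the administrative date is treated as censored.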

Statistical analysis

Categorical variables were reported as frequencies (proportions), while normally distributed continuous variables were presented as means ± standard deviations (SDs). To compare baseline characteristics between groups, parametric t-tests and non-parametric rank sum tests were used for continuous variables, and Chi-square tests were used for categorical data. To select the most informative variables, we used LASSO regression, an extension of generalized linear regression that reduces the variance of regression coefficients and prediction errors by adding a penalty term to the log-likelihood function to constrain model complexity. Model selection was guided by either ten-fold cross-validation or the Akaike Information Criterion. LASSO models were fitted with the “glmnet” package in R (R Foundation for Statistical Computing, Vienna, Austria).
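For reference, the penalized objective that LASSO minimizes can be written in a generic form. The notation is introduced here for illustration: \ell(\beta) stands for the log-likelihood of whichever model family was fitted (the text does not say which), n is the number of participants, p the number of candidate variables, and \lambda the penalty weight chosen by cross-validation. The scaling of the likelihood term differs between software packages, so this is a sketch of the idea rather than the exact objective used in the analysis.

```latex
\hat{\beta}(\lambda)
  = \arg\min_{\beta}
    \left\{ -\frac{1}{n}\,\ell(\beta)
            + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \right\}
```

Larger values of \lambda shrink more coefficients exactly to zero, which is what allows the method to act as a variable selector.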

Cox proportional hazards models were used to assess associations between exposures and the risk of incident IBD in the prospective cohort, expressed as hazard ratios (HRs) with 95% confidence intervals (CIs). The models were further adjusted for sex, age, BMI, Index of Multiple Deprivation, and PRS. An exploratory Pearson correlation analysis was conducted between peripheral biomarkers and the incidence of IBD.

Finally, structural equation modeling (SEM) was used to investigate the relationships among genetic risks, lifestyle, diet, diseases, immune metabolism, and IBD. Data curation included handling over 90% of missing data through multiple imputation and excluding incomplete records; missing data in the covariate matrix were imputed using the Multivariate Imputation by Chained Equations (MICE) algorithm. The principal component method was employed to extract common factors, with eigenvalues greater than 1 as the criterion for factor selection. The scree plot was also consulted to determine the optimal number of latent variables. The factor loading matrix was rotated using the Kaiser standardized maximum variance method (varimax rotation with Kaiser normalization), and items with low factor loadings were deleted to ensure that each item had a strong relationship with the construct it measured. All tests were two-sided, and P < 0.05 was considered statistically significant. All analyses were performed with R software, version 4.3.1.

Ethical statement

The UK Biobank received ethical approval from the North West-Haydock Research Ethics Committee [16/NW/0274], and all participants in this study signed informed consent forms at enrollment. The study was conducted in accordance with the principles of the Declaration of Helsinki and ICH-GCP.

Results

General overview

A total of 502,411 participants were evaluated at baseline (2006–2010) and had relevant diagnostic records. After excluding participants diagnosed at baseline with intestinal cancer (including small intestinal cancer, rectum/colon/sigmoid colon cancer, and anal cancer; n = 36,406) or IBD (n = 4,551), 461,454 participants remained for analysis (mean age 56.26 years; 247,771 female participants [53.7%]; 213,683 male participants [46.3%]), of whom 3,494 were diagnosed with IBD during follow-up. Our correlation analysis integrated three data domains: blood count measurements from 440,560 individuals, blood biochemistry profiles from 432,586 participants, and NMR-derived metabolic biomarker assessments from 111,897 participants. Following rigorous data curation procedures, a refined dataset of 97,229 participants was constructed. This cohort provided the input for subsequent structural equation modeling, enabling multivariable analysis of the relationships between hematological, biochemical, and metabolic parameters. Figure 1 provides a flowchart of the study’s methodological framework, detailing each analytical stage from data acquisition to final model estimation.
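The participant counts reported above can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the text; the variable names are illustrative, and the crude cumulative incidence at the end is derived here rather than reported by the authors.

```python
# Counts as stated in the text.
evaluated_at_baseline = 502_411
excluded_intestinal_cancer = 36_406
excluded_baseline_ibd = 4_551
female, male = 247_771, 213_683
incident_ibd = 3_494

# Analytic cohort after the two baseline exclusions.
analysed = evaluated_at_baseline - excluded_intestinal_cancer - excluded_baseline_ibd

# Sex split should add up to the analytic cohort.
sex_total = female + male
pct_female = round(100 * female / analysed, 1)

# Crude cumulative incidence over follow-up (derived here, not reported in the text).
crude_incidence_pct = round(100 * incident_ibd / analysed, 2)

print(analysed, sex_total, pct_female, crude_incidence_pct)
```

The exclusions and the sex split both reconcile to 461,454, which matches the cohort size used throughout the analysis.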

Fig. 1

Flowchart of the study.

Baseline characteristics

Table s2 outlines participant demographics and characteristics. Among all included participants, the mean age was 56.26 years (standard deviation [SD] = 8.11), the mean BMI was 27.44 kg/m2 (SD = 4.80), 53.7% were female, 86.4% lived in urban areas, and 90.3% were of white ethnicity. Compared with non-IBD participants, older age, male sex, higher BMI, urban residence, higher education level, having been breastfed as a baby, higher PRS, lower IPAQ activity, insomnia, shorter sleep duration, exposure to tobacco, smoking, daily drinking, unemployment, lower Index of Multiple Deprivation, mental illness, and body pain, among others, may be associated with IBD.

Survival analyses on the association between risk factors and IBD

To identify the variables most associated with IBD, LASSO regression was applied to filter variables among the 461,454 participants. Ten-fold cross-validation was used to select the penalty term, lambda (λ). The model error was minimized at log(λ) = -6.781 (λ = 0.001134172), and 5 key risk factors were identified: the standard PRS for UC, major dietary changes in the last 5 years, stomach/abdominal pain for 3+ months, ever having had bowel cancer screening, and maternal high blood pressure (Supplement Fig. 1A, B, C).

Unadjusted and adjusted hazard ratios for factors associated with the onset of IBD during 15 years of follow-up are shown in Table s3 and Fig. 2. Employment status played a key role in the risk of IBD: being unable to work because of sickness or disability was associated with an 80% increase in IBD risk (HR 1.80, 95% CI 1.53–2.11). Usually having insomnia was associated with a 23% increase (HR 1.23, 95% CI 1.12–1.35), major dietary changes because of illness with an 87% increase (HR 1.87, 95% CI 1.70–2.06), and an unhealthy dietary habit with a 15% increase (HR 1.15, 95% CI 1.07–1.23). Furthermore, smoking, stomach pain, back pain, knee pain, and hip pain were associated with increases in IBD risk of 46% (HR 1.46, 95% CI 1.31–1.63), 128% (HR 2.28, 95% CI 2.04–2.56), 27% (HR 1.27, 95% CI 1.17–1.38), 22% (HR 1.22, 95% CI 1.12–1.33), and 17% (HR 1.17, 95% CI 1.05–1.31), respectively. Participants who had ever had bowel cancer screening had a 182% increase in IBD risk (HR 2.82, 95% CI 2.61–3.04). Moreover, exposure to tobacco, a lower educational degree, a higher neuroticism score, and having previously seen a psychiatrist for nerves, anxiety, tension, or depression were also identified as risk factors. In contrast, longer sleep duration, frequent physical activity, and a maternal history of bowel cancer or high blood pressure were associated with a lower risk of IBD.
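The percentage increases quoted above follow directly from the hazard ratios, as (HR - 1) x 100. The short sketch below shows that conversion, together with a standard back-calculation of the standard error of log(HR) from a 95% CI; the function names are illustrative and the back-calculation assumes the interval was built symmetrically on the log scale with a 1.96 multiplier.

```python
import math


def percent_change_in_hazard(hr: float) -> float:
    """Convert a hazard ratio into a percentage change in hazard."""
    return (hr - 1.0) * 100.0


def log_hr_standard_error(ci_lower: float, ci_upper: float) -> float:
    """Approximate SE of log(HR) from a 95% CI, assuming log-scale symmetry."""
    return (math.log(ci_upper) - math.log(ci_lower)) / (2 * 1.96)


# Values reported in the text for being unable to work and for stomach pain.
print(round(percent_change_in_hazard(1.80)))       # 80
print(round(percent_change_in_hazard(2.28)))       # 128
print(round(log_hr_standard_error(1.53, 2.11), 3))
```

A hazard ratio below 1 gives a negative value from the same formula, which reads as a percentage reduction in hazard.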

Fig. 2

Hazard ratios for the associations of baseline characteristics, lifestyle factors, and diseases with IBD.

Immunometabolic markers and IBD

Associations between the 220 peripheral markers and IBD are shown in Fig. 3. The top 20 most relevant indicators were classified as liver function, inflammation, white blood cell count, glycolysis-related metabolites, red blood cell count, cholesterol esters, free cholesterol, cholesterol, platelets, and total lipids. Among these, albumin (r = 0.023, P < 0.001), C-reactive protein (r = 0.023, P < 0.001), neutrophil count (r = 0.022, P < 0.001), glycoprotein acetyls (r = 0.022, P < 0.001), and lymphocyte percentage (r = 0.020, P < 0.001) ranked as the top 5. Specific results are shown in Supplement Table s4.
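When one variable is a continuous biomarker and the other is a binary indicator of incident IBD, the Pearson coefficient reduces to what is often called the point-biserial correlation. The sketch below computes it from first principles on made-up toy data; it is an illustration of the statistic, not the study's analysis code, and `pearson_r` is a name chosen here.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Toy data (hypothetical): a biomarker level and a 0/1 incident-IBD flag.
biomarker = [1.0, 2.0, 3.0, 4.0]
ibd = [0, 0, 1, 1]
print(pearson_r(biomarker, ibd))
```

With a rare outcome and hundreds of thousands of participants, coefficients as small as those reported can still reach P < 0.001, so the magnitude of r and its significance should be read separately.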

Fig. 3

The association of peripheral markers with IBD.

Structural equation model of the etiology and pathogenesis of IBD

To validate the wheel model hypothesis of IBD etiology, a structural equation model was constructed to estimate the relationships among genetic risks, lifestyle, disease, diet, immune metabolism, and IBD in 97,229 participants. Confirmatory factor analysis showed that the measurement model fit well and could be used to build the structural equation model (Supplement Table s6). The factor loading matrix was rotated using the Kaiser standardized maximum variance method, and items with low factor loadings were removed to ensure a robust correlation between each item and the construct it measured. Four latent variables (23 items) were finally retained, and each latent variable was represented by at least three items (Supplement Table s5). The loading coefficients of each marker on its corresponding latent variable are shown in Fig. 4.

Fig. 4

IBD etiological model.

The SEM was built and fit well (Supplement Table s7), and it described the etiological relationships among polygenic risks, lifestyle, dietary habits, diseases, immunometabolism, and IBD. The standardized path coefficients are shown in Supplement Table s8. Genetic risks (β = 0.027, P < 0.001), lifestyle (β = 0.049, P < 0.001), and immune metabolism (β = 0.011, P = 0.003) were significant predictors of IBD. In addition, disease status and dietary habits exerted an indirect influence on IBD.
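In path-analytic terms, an indirect influence of this kind is conventionally quantified as the product of the standardized coefficients along the mediating path. The symbols below are introduced only for illustration and do not correspond to specific entries in Supplement Table s8: a is the path from an upstream factor X (for example, dietary habits) to a mediator M, b is the path from M to IBD, and c' is any direct path from X to IBD.

```latex
\text{indirect effect}_{X \to M \to \mathrm{IBD}} = a \cdot b,
\qquad
\text{total effect}_{X \to \mathrm{IBD}} = c' + a \cdot b
```

When no direct path is specified in the model, c' is fixed at zero and the total effect equals the indirect effect.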

Discussion

In this large cross-sectional and prospective cohort study of 461,454 participants, we identified several modifiable and non-modifiable risk factors associated with IBD. Specifically, work disability due to health issues, chronic insomnia, illness-related dietary alterations, unhealthy eating patterns, smoking, bodily pain, and prior bowel cancer screening emerged as significant risk factors. Conversely, extended sleep duration, regular physical activity, and a maternal history of bowel cancer or hypertension were associated with reduced IBD risk. In-depth analysis of 220 immunometabolic markers revealed cholesterol as a promising novel biomarker for IBD prediction. To further elucidate the disease’s complex etiology, we developed an SEM that supported the multifactorial pathogenesis hypothesis. The model indicated that genetic predisposition, lifestyle choices, and immunometabolic profiles directly predicted IBD risk. Additionally, disease status and dietary habits exerted indirect influences on IBD development through their interactions with these core predictors, underscoring the interplay between biological and environmental factors in IBD pathogenesis.

The link between polygenic risk and IBD is well established: genome-wide association studies (GWAS) have discovered over 250 genetic loci associated with IBD22,24. Although individual genetic variants account for only a small fraction of the genetic variability of IBD, a PRS that combines multiple risk loci can be used to identify individuals with higher genetic susceptibility to IBD. In this study, we also identified maternal bowel cancer and maternal high blood pressure as protective factors. One possible reason is that a family history of cancer may prompt individuals to be more vigilant about their health, leading them to take preventive measures such as regular health check-ups and improved lifestyle habits. This could indirectly reduce the risk of developing the disease.

Previous studies have explored the impact of lifestyle on IBD, suggesting that smoking25,26, mental health27, sleep25, and physical activity28,29,30 affect the onset of IBD; the impact of these factors on IBD onset, and their potential to reduce IBD incidence, warrant further investigation. In this study, smoking status played a clear role in the development of IBD, and people exposed to tobacco at home also had a higher risk of IBD. According to earlier research, patients with Crohn’s disease who are exposed to passive smoking may experience worse health and a higher chance of surgery26, and the risk of IBD rises in children exposed to passive smoking26. Another key factor in the development and prognosis of IBD is physical exercise. Consistent with previous research, our findings indicate that exercise is associated with a reduced risk of developing IBD. Research has indicated that regular exercise helps alleviate IBD symptoms by reducing inflammation and altering the microbiota, which in turn lowers the chance of developing IBD25,28,29,30. A population study found that adolescents who exercise regularly and are physically fitter are at decreased risk of developing Crohn’s disease and ulcerative colitis29. Animal studies also show that exercised mice differ from long-term sedentary mice in their microbiota, intestinal immune system, cytokine production, and oxygen free radicals30. Maintaining exercise also benefits the prognosis of colitis: another animal experiment showed that protective myokines released by working skeletal muscle promoted the healing of experimental colitis in mice28. Psychological factors also play an important role in the occurrence of IBD. We found a positive association between IBD risk and a higher neuroticism score, insomnia, short sleep, and sleep difficulties. This is probably related to regulation of the brain-gut axis27,31, or arises because accumulating stress, emotional swings, and sleep deprivation raise the chance of depression25. It is hypothesized that the prolonged stress of depression chronically activates the hypothalamic–pituitary–adrenal axis, which activates the neuroenteric pathway and induces systemic proinflammatory cytokines32. Furthermore, our findings indicate that lower educational levels and unstable employment correlate with increased incidence of IBD, potentially mediated by stress-related mechanisms. However, lacking post-secondary education may not be a predisposing risk factor for IBD; instead, IBD itself may impede individuals’ ability to pursue higher education. Similarly, individuals whose illness prevents them from working might already be experiencing IBD symptoms yet remain undiagnosed, which points to potential diagnostic delays. These observations underscore the need to reevaluate the direction of the relationships between educational outcomes, employment stability, and IBD. Longitudinal investigations are needed to decipher the temporal interplay between IBD onset, symptom evolution, educational interruptions, and occupational trajectories, and thereby to clarify the bidirectional interactions between health status and socioeconomic determinants. Individuals who have undergone bowel cancer screening may face a heightened risk of developing IBD, potentially because of underlying intestinal symptoms or conditions that could trigger the onset of IBD. Avoiding cancer screening would not prevent IBD, as the observed correlation may stem from increased health vigilance or shared risk factors.

Researchers have long been interested in the effect of immune metabolism on IBD. Our study performed a thorough correlation analysis of immunometabolic markers and identified those most closely associated with the development of IBD. Our results indicated that albumin is negatively correlated with IBD. Because it may help transport substances that protect and sustain immune molecules such as immunoglobulin, albumin may affect immune function33. A recent study showed that variations in serum albumin during the first two weeks of anti-TNF treatment were predictive of primary non-response (PNR) status, endoscopic outcome, time to colectomy, and the effect of anti-TNF treatment in patients with UC34. Some studies have pointed out that C-reactive protein can be used as an important indicator of the severity of IBD and of treatment effect33. Our findings suggest that C-reactive protein might also be linked to the occurrence of IBD, which implies that it may be associated with the pathogenesis of the disease. In addition, neutrophil count and glycoprotein acetyls were among the markers most correlated with IBD. Neutrophils play a key role in the immune system by engulfing and destroying bacteria and other pathogens35. Glycoprotein acetyls are proteins involved in immune response control, cell adhesion, and signaling36.

Unlike previous studies, we used the intake of different food groups to derive a binary variable representing participants’ overall dietary habits, which gives a more intuitive picture of the impact of dietary factors on the risk of IBD. Our findings indicate that unhealthy dietary habits are associated with an increased risk of IBD, suggesting that a balanced intake across food groups may help prevent its onset. High consumption of animal fat and low consumption of fruits and vegetables have been linked to an increased risk of IBD37. Moreover, relevant research has shown that appropriate nutritional management can improve the prognosis of IBD37.

The model provides further evidence for a multifactorial relationship. IBD is known to have comorbidities and extraintestinal symptoms. Our research indicated that individuals experiencing pain in specific body regions, such as the abdomen, back, knee, and hips, are more susceptible to IBD. That is, pain in certain parts of the body could be an early sign of disease before the onset of IBD. This increased risk may be attributed to systemic inflammation that arises during the initial phases of IBD38. The genetic marker HLA-B27 may contribute to intestinal or joint inflammation, which could explain the correlation between joint pain and IBD39. Studies have shown that increased dietary fat intake, irregular diets, excessive consumption of processed foods, and a lack of certain nutrients such as vitamin D and calcium can increase the risk of IBD20,37,40,41. Although a precise causal association between dietary change and the onset of IBD has not yet been fully established, this large cohort study offers additional data in support of that relationship.

This study has some limitations. First, because of healthy volunteer bias in the UKB dataset, most participants with high follow-up compliance may be healthier than the general population. Second, despite our best efforts to choose objective data for analysis, certain lifestyle data were collected using questionnaires, which are vulnerable to subjectivity and may introduce inaccuracy. Third, the incidence of IBD peaks in individuals under 35 years of age, making this age range a critical window for disease onset42, whereas the UKB population consists of middle-aged and elderly individuals between 37 and 73 years of age14. To reduce the impact of age on the results and to exclude the influence of earlier-onset disease on the outcomes, participants with IBD at baseline were excluded. Finally, because Europeans make up the bulk of the UKB population, the generalizability of the findings to other ethnic groups is limited. Subsequent research could enroll patients from different regions and ethnic groups.

Conclusions

In conclusion, this comprehensive cohort study has not only identified key hazard risks for IBD but has also constructed an etiological framework that disentangles the intricate pathogenic interplay among polygenic predispositions, lifestyle behaviors, immunometabolic profiles, dietary patterns, and disease trajectories. Our findings provide novel mechanistic insights into IBD pathogenesis, introduce promising predictive biomarkers, and propose innovative strategies for disease prevention. To deepen our understanding of IBD’s molecular underpinnings and uncover actionable therapeutic targets, future investigations should prioritize the elucidation of immunometabolic pathways at the molecular level, particularly those linking genetic susceptibility with environmental triggers in disease manifestation.