Introduction

Colorectal cancer (CRC) is the third most commonly diagnosed cancer and the second leading cause of cancer-related mortality worldwide. It accounts for around 1.9 million new cases and 935,000 deaths annually, approximately 10% of all cancer cases and deaths1. In recent years, particularly in the United States, CRC has risen from the third to the second leading cause of cancer-related deaths2. Numerous epidemiological investigations indicate that although advances in therapy have lowered the overall mortality rate of CRC, the burden of CRC is progressively shifting toward younger populations2,3,4. In stark contrast to the poor survival of advanced-stage patients, the five-year survival rate for stage I CRC patients is as high as 90%. However, because of its protracted latent period, CRC is typically detected at stage III or IV, often with distant metastasis5. Consequently, developing methods for the early diagnosis of CRC has become a major priority in this field.

CRC, as a heterogeneous disease, exhibits complex and diverse mechanisms of growth and progression. Recent studies have underscored the critical roles of inflammation and immunity in its onset and advancement. Shah et al. reported that individuals with long-standing ulcerative colitis or Crohn’s disease face a 2–3 fold increased risk of developing CRC6. Chronic inflammation caused by intestinal barrier dysfunction, Paneth cell defects, or innate immune response deficiencies can create a fertile environment for tumor development. Meanwhile, ectopic lymphoid structures (ELS) formed by immune cells in the gut can act as a tumor-suppressive mechanism, aiding in the elimination of transformed cells7. In the absence of carcinogenic stimuli, inflammation can induce oxidative stress that damages DNA, hence initiating carcinogenesis. Intestinal inflammation disrupts the epithelial barrier function, triggering proliferation and clonal expansion of the initiating tumor cells as well as de-differentiation of non-stem cells into stem cell-like cells, ultimately leading to tumorigenesis8. There is accumulating evidence that dietary factors and overall nutritional status play a significant role in the pathogenesis of CRC. Several meta-analyses have shown that dietary patterns rich in fiber, fruits, vegetables, and whole grains are associated with a lower risk of CRC, while high consumption of red and processed meats, alcohol, and saturated fats may increase risk9,10. Dahham et al. comprehensively reviewed the role of lifestyle and nutritional factors in the development and prevention of colorectal cancer (CRC), emphasizing that behaviors such as smoking, alcohol consumption, obesity, physical inactivity, and poor diet significantly contribute to increased CRC risk, while healthy dietary patterns and regular physical activity are associated with reduced incidence of CRC11. 
These findings highlight the etiologic relevance of nutrition in CRC and support the rationale for evaluating nutritional biomarkers such as the prognostic nutritional index (PNI).

PNI was initially proposed by Onodera et al.12. It is calculated from two indicators, serum albumin and lymphocyte count, and serves as a measure of an individual’s immune and nutritional condition. Over time, it has been widely used as a prognostic predictor of cancer outcomes13,14,15,16,17. Recent studies have demonstrated that PNI is also significantly associated with a broad range of other health outcomes, including albuminuria in adults18, diabetic nephropathy in patients with type 2 diabetes19, mortality in coronary artery disease20, cognitive function in the elderly21, and non-alcoholic fatty liver disease and advanced liver fibrosis22, among other conditions.

The early diagnosis of cancer remains a primary focus of cancer management. Although colonoscopy remains the gold standard for CRC screening, it is limited by poor patient compliance23 and by variability in endoscopists’ proficiency, which leads to suboptimal adenoma detection rates (ADR)24. There is therefore strong demand for an affordable, easily measured, and sufficiently informative biomarker for CRC25,26.

Methods

Study population

The National Health and Nutrition Examination Survey (NHANES)27,28, administered by the National Center for Health Statistics (NCHS), is a major program designed to assess the health and nutritional status of the U.S. population. Written informed consent was obtained from all participants before the survey, and the study protocol was approved by the NCHS Research Ethics Review Board. Additional information regarding this nationwide survey is available on the official website (https://wwwn.cdc.gov/nchs/nhanes/; accessed January 17, 2025).

This study used data from five consecutive survey cycles of the NHANES database27, encompassing a total of 50,588 participants between 2007 and 2016. These cycles were selected because all variables required for calculating the PNI, assessing the history of CRC, and adjusting for covariates were consistently and reliably collected during this period. Participants underwent extensive assessments, including demographic surveys, laboratory evaluations, and standardized questionnaires. To ensure methodological rigor and the validity of the findings, a stringent data screening and exclusion process was implemented. Because the objective of this study was to investigate the potential association between the PNI and CRC, individuals with incomplete or missing tumor history records (n = 21,416) and those lacking measurements of serum albumin and lymphocyte count (n = 2,943) were excluded. After these criteria were applied, a final cohort of 26,229 eligible participants was included in the analysis (Fig. 1).
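The exclusion steps described above can be checked with a short calculation. The following Python sketch uses only the counts reported in the text; the step labels and function name are illustrative and are not part of the NHANES data files.

```python
# Counts are taken from the text; labels and names are illustrative only.
START = 50_588
EXCLUSIONS = [
    ("incomplete or missing tumor history", 21_416),
    ("missing serum albumin or lymphocyte count", 2_943),
]

def apply_exclusions(start, exclusions):
    """Return (reason, n_excluded, n_remaining) after each sequential exclusion."""
    remaining = start
    flow = []
    for reason, n_excluded in exclusions:
        remaining -= n_excluded
        flow.append((reason, n_excluded, remaining))
    return flow

flow = apply_exclusions(START, EXCLUSIONS)
final_n = flow[-1][2]  # size of the analytic cohort

for reason, n_excluded, remaining in flow:
    print(f"Excluded {n_excluded:>6,} ({reason}); {remaining:,} remain")
```

The final value, `final_n`, matches the 26,229 participants reported for the analytic cohort.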

Fig. 1 Sample selection flowchart from NHANES 2007–2016.

Assessment of PNI

The exposure variable in this investigation was the PNI, calculated as PNI = 0.005 × absolute lymphocyte count (10³ cells/µL) + 10 × albumin (g/dL)22. PNI was categorized into quartiles for analysis. Albumin levels were measured by the bromocresol purple dye method29. Lymphocyte counts were obtained from the complete blood count (CBC), which was performed using the Beckman Coulter method of counting and sizing30.
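As a worked illustration, the formula above can be written as a small Python function. This is a minimal sketch that applies the formula exactly as stated in this section, with the units given in the text; the function name, argument names, and the example laboratory values are hypothetical.

```python
def prognostic_nutritional_index(albumin_g_dl, lymphocytes_k_per_ul):
    """PNI as written in this section: 0.005 x lymphocytes + 10 x albumin.

    albumin_g_dl         -- serum albumin in g/dL
    lymphocytes_k_per_ul -- absolute lymphocyte count in 10^3 cells/uL
    """
    if albumin_g_dl < 0 or lymphocytes_k_per_ul < 0:
        raise ValueError("laboratory values must be non-negative")
    return 0.005 * lymphocytes_k_per_ul + 10.0 * albumin_g_dl

# Hypothetical participant: albumin 4.2 g/dL, lymphocytes 2.1 x 10^3 cells/uL.
example_pni = prognostic_nutritional_index(albumin_g_dl=4.2, lymphocytes_k_per_ul=2.1)
print(f"PNI = {example_pni:.4f}")
```

With the units as stated, the albumin term contributes most of the score for this hypothetical participant (42.0 of roughly 42.01).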

Definition of CRC

Cancer history was obtained from the medical conditions questionnaire. Participants were asked (MCQ220), “Have you ever been told by a doctor or other health professional that you had cancer or a malignancy of any kind?” Those who responded affirmatively were asked a follow-up question (MCQ230a-d), “What kind was it?”, to identify the specific type of cancer. Individuals reporting colon cancer or rectal cancer were classified as “Yes”, while those reporting other types of cancer or no history of cancer were classified as “No”.
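The classification rule above can be sketched in Python as follows. The numeric response codes (1 = yes and 2 = no for MCQ220; 16 = colon and 31 = rectum for MCQ230a-d) are assumptions that should be verified against the NHANES codebook for each cycle, and the function name is hypothetical.

```python
# Assumed NHANES response codes; verify against the codebook for each cycle.
MCQ220_YES, MCQ220_NO = 1, 2
COLON_CODE, RECTUM_CODE = 16, 31

def classify_crc(mcq220, mcq230_answers):
    """Return "Yes", "No", or None (tumor history missing) for one participant.

    mcq220         -- response to the cancer-history question
    mcq230_answers -- responses to MCQ230a-d (unused slots may be None)
    """
    if mcq220 == MCQ220_NO:
        return "No"
    if mcq220 != MCQ220_YES:
        return None  # missing, refused, or "don't know": excluded from analysis
    reported = {code for code in mcq230_answers if code is not None}
    return "Yes" if reported & {COLON_CODE, RECTUM_CODE} else "No"

print(classify_crc(1, [16, None, None, None]))  # colon cancer reported
print(classify_crc(1, [14, None, None, None]))  # a different cancer type
print(classify_crc(None, []))                   # tumor history missing
```

Returning `None` for missing responses keeps such participants distinguishable from the “No” group, in line with their exclusion from the analytic cohort.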

Definition of covariates

Based on a preliminary analysis and existing literature, we selected 21 covariates to adjust for potential confounding. These variables were chosen for their biological plausibility, established associations with colorectal cancer (CRC), or their influence on nutritional and inflammatory status2,5,31,32,33,34,35,36.


Demographic variables included:

  • Age (< 65 or ≥ 65 years)

  • Gender (male or female)

  • Race/Ethnicity (Mexican American, Other Hispanic, Non-Hispanic White, Non-Hispanic Black, and Other Races)

  • Educational level (less than high school, high school graduate, and above high school)

  • Marital status (married/living with partner vs. living alone)

  • Poverty-income ratio (PIR)

Examination and questionnaire-based health variables included:

  • Body mass index (BMI) and waist circumference

  • Peripheral blood cells: white blood cell (WBC) count, monocyte count (MONO), neutrophil count (NE), platelet count (PLT)

  • Serum lipids: triglycerides (TG), total cholesterol (TC), high-density lipoprotein (HDL), and low-density lipoprotein (LDL)

  • Lifestyle behaviors: smoking status (≥ 100 vs. < 100 cigarettes in lifetime), alcohol consumption (≥ 12 drinks/year vs. < 12)

  • Health conditions (self-reported physician diagnoses): hypertension, hyperlipidemia, and diabetes

All questionnaire-based data were collected via Computer-Assisted Personal Interviews (CAPI) conducted by trained NHANES personnel in participants’ residences. All datasets are publicly available through the Centers for Disease Control and Prevention (CDC) NHANES portal.

Statistical analysis

Demographic characteristics of participants were compared across PNI quartiles using the Kruskal-Wallis rank-sum test for continuous variables and the chi-square test for categorical variables. Continuous variables are expressed as mean ± standard deviation (mean ± SD), and categorical variables are presented as percentages. Several analytical methods were applied to investigate the relationship between PNI and CRC. Logistic regression analysis was performed with CRC as a binary outcome across three models. For trend analysis, the continuous PNI variable was converted into quartile categories, and the median value of each quartile was entered into the model as a continuous variable to assess the linear trend between PNI and CRC. To examine a possible non-linear relationship between PNI and CRC, restricted cubic spline (RCS) regression was performed with 3 knots placed at the 10th, 50th, and 90th percentiles of the PNI distribution. An iterative approach was used to identify inflection points in the PNI-CRC relationship, enabling two-piece linear regression analysis on either side of the identified point. Subgroup analyses were conducted to clarify the relationship between PNI and CRC across age groups, racial backgrounds, poverty status, education levels, marital status, hypertension, hyperlipidemia, diabetes, smoking, and alcohol consumption. Interaction tests were performed by including multiplicative interaction terms between each stratifying variable and the quartiles of PNI (with median values assigned and treated as a continuous variable) in the multivariable logistic regression model. The statistical significance of each interaction was evaluated using the likelihood ratio test, comparing models with and without the interaction term.
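To make the spline specification concrete, the following Python sketch builds the single non-linear basis term of a 3-knot restricted cubic spline, using the truncated-power form with Harrell's scaling. It is an illustration under stated assumptions, not the software's internal implementation: the function names are hypothetical, and percentile definitions differ slightly between packages.

```python
import statistics

def knots_10_50_90(values):
    """Knots at the 10th, 50th, and 90th percentiles of the supplied values."""
    deciles = statistics.quantiles(values, n=10)  # nine cut points
    return deciles[0], deciles[4], deciles[8]

def rcs_term(x, knots):
    """Non-linear basis term of a 3-knot restricted cubic spline.

    The spline model contains x itself plus this term; a test of the term's
    coefficient corresponds to the test for non-linearity.
    """
    t1, t2, t3 = knots

    def cube(u):
        return max(u, 0.0) ** 3  # truncated cubic (u)+^3

    term = (cube(x - t1)
            - cube(x - t2) * (t3 - t1) / (t3 - t2)
            + cube(x - t3) * (t2 - t1) / (t3 - t2))
    return term / (t3 - t1) ** 2  # scaling keeps the term on the scale of x

# Hypothetical knots on the PNI scale, for illustration only.
example_knots = (38.0, 42.5, 47.0)
print(rcs_term(37.0, example_knots))  # zero below the first knot
print(rcs_term(45.0, example_knots))
```

By construction the term is zero below the first knot and linear beyond the last knot, which is what restricts the fitted curve to be linear in both tails.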
All analyses were conducted using DecisionLinnc version 1.0 (https://www.statsape.com/). Missing covariate values (ranging from 0.1% to 8.9%) were handled by multiple imputation by chained equations (MICE) with five imputed datasets, and estimates were combined using Rubin’s rules. Sex, age, and race/ethnicity served as key predictors in the imputation model because of their completeness and predictive value. Statistical significance was set at P < 0.05. This study follows a cross-sectional design, with both PNI and CRC history assessed at the same time point.
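Rubin's rules for combining estimates across imputed datasets can be sketched as follows. The five log odds ratios and their variances below are hypothetical values chosen for illustration, not results from this study, and the function name is an assumption.

```python
import math
import statistics

def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and variances using Rubin's rules.

    Returns the pooled estimate and its standard error.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching estimates and variances from >= 2 imputations")
    pooled = statistics.fmean(estimates)        # mean of the m estimates
    within = statistics.fmean(variances)        # average within-imputation variance
    between = statistics.variance(estimates)    # between-imputation variance (m - 1)
    total = within + (1.0 + 1.0 / m) * between  # total variance
    return pooled, math.sqrt(total)

# Hypothetical log odds ratios and variances from five imputed datasets.
log_ors = [-0.066, -0.064, -0.069, -0.067, -0.065]
var_log_ors = [0.00055, 0.00056, 0.00054, 0.00055, 0.00057]

pooled_log_or, pooled_se = pool_rubin(log_ors, var_log_ors)
print(f"pooled OR = {math.exp(pooled_log_or):.4f}, SE(log OR) = {pooled_se:.4f}")
```

The between-imputation component inflates the pooled standard error to reflect the uncertainty introduced by the missing values.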

Results

Baseline characteristics

This study included 26,229 participants, of whom 176 self-reported a history of CRC, an overall prevalence of 0.67% (Table 1). The mean age was 49.47 ± 17.67 years; 48.62% of participants were male and 51.38% were female. Non-Hispanic White participants accounted for 42.44%, and the mean PNI was 42.55 ± 3.42. For further analysis, PNI was divided into quartiles: Q1 (≤ 40.02), Q2 (40.02–43.01), Q3 (43.01–45.01), and Q4 (> 45.01). Compared with individuals in the lowest PNI quartile, those in the highest quartile were generally younger, more often male, more often non-Hispanic White, and better educated; they had a lower prevalence of overweight or obesity, were less likely to live in poverty, were more likely to consume alcohol, and had lower rates of hypertension, hyperlipidemia, and diabetes. Notably, the prevalence of CRC declined significantly with increasing PNI.
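The quartile assignment and the overall prevalence can be reproduced from the figures reported above. In this Python sketch the cut points are those given in the text; treating each upper bound as inclusive is an assumption, since the text does not state how values exactly equal to a cut point were handled.

```python
from bisect import bisect_left

# Quartile cut points reported in the text.
CUT_POINTS = (40.02, 43.01, 45.01)

def pni_quartile(pni):
    """Assign Q1-Q4, assuming each upper bound is inclusive (e.g. 40.02 -> Q1)."""
    return f"Q{bisect_left(CUT_POINTS, pni) + 1}"

# Overall prevalence of self-reported CRC: 176 cases among 26,229 participants.
prevalence_pct = round(100 * 176 / 26_229, 2)

print(pni_quartile(42.55))  # the reported mean PNI falls in Q2
print(prevalence_pct)
```

The computed prevalence agrees with the 0.67% reported for the cohort.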

Table 1 Baseline characteristics of participants.

Association of PNI and CRC

To further investigate the association between PNI and CRC, we performed multivariable logistic regression analysis; the results are shown in Table 2. The univariable analysis in Model 1 demonstrated a significant inverse association between PNI and CRC [OR: 0.8939; 95% CI: 0.8614, 0.9291; p < 0.001]. After adjustment for gender, age, and race in Model 2, the inverse association remained significant [OR: 0.9195; 95% CI: 0.8810, 0.9612; p < 0.001]. In Model 3, after adjustment for all covariates, the association persisted [OR: 0.9357; 95% CI: 0.8942, 0.9807; p = 0.005], indicating that each one-unit increase in PNI was associated with 6.43% lower odds of CRC. Receiver operating characteristic (ROC) curve analysis for Model 3 yielded an area under the curve (AUC) of 0.845, whereas the AUC for the model without PNI was 0.635, indicating that including PNI substantially improved the model’s discriminative ability (Fig. 2). Additionally, we categorized PNI (initially a continuous variable) into quartiles for sensitivity analysis. Compared with Quartile 1, the odds of CRC in Quartile 4 were 47.98% lower, a statistically significant difference. Furthermore, the p for trend was statistically significant in all models, supporting a robust inverse relationship between PNI and the odds of CRC (Table 2).
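The percentage statements above follow directly from the odds ratios. The Python sketch below converts an odds ratio into a percent change in odds and, as a consistency check, recovers an approximate p-value from the reported confidence interval. The p-value step assumes a Wald interval that is symmetric on the log-odds scale, which the text does not state; the function names are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def pct_change_in_odds(odds_ratio):
    """Percent change in odds implied by an odds ratio (negative = lower odds)."""
    return (odds_ratio - 1.0) * 100.0

def wald_p_from_ci(odds_ratio, lower, upper):
    """Approximate two-sided p-value, assuming a Wald interval on the log-odds scale."""
    se = (math.log(upper) - math.log(lower)) / (2.0 * Z_95)
    z = math.log(odds_ratio) / se
    return math.erfc(abs(z) / math.sqrt(2.0))

model3_change = pct_change_in_odds(0.9357)         # Model 3 OR from the text
model3_p = wald_p_from_ci(0.9357, 0.8942, 0.9807)  # Model 3 OR and 95% CI
quartile4_or = 1.0 - 47.98 / 100.0                 # OR implied by the 47.98% reduction

print(f"{model3_change:.2f}% change in odds per unit of PNI")
print(f"approximate p = {model3_p:.4f}; implied Q4 vs. Q1 OR = {quartile4_or:.4f}")
```

Under these assumptions, the Model 3 odds ratio corresponds to a 6.43% reduction in odds per unit of PNI, and the recovered p-value is close to the reported p = 0.005.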

Fig. 2 ROC curve of the diagnostic performance of PNI for CRC. (A) AUC of Model 3 with PNI = 0.845, (B) AUC of Model 3 without PNI = 0.635.

Table 2 Association between PNI and prevalence of CRC.

Additionally, RCS analysis revealed a linear inverse relationship between PNI and CRC (p for overall < 0.001, p for nonlinear = 0.542; Fig. 3). Threshold effect analysis likewise identified no statistically significant inflection point (Table 3).

Fig. 3 The linear association between the PNI and CRC.

Table 3 Threshold effect analysis of prognostic nutritional index on colorectal cancer (Location: United States, years: 2007–2016).

Subgroup analysis

Subgroup analyses were conducted to evaluate the strength and reliability of the association between PNI and CRC. Participants were stratified by gender, age, race, marital status, education level, poverty status, BMI, smoking, alcohol consumption, hypertension, hyperlipidemia, and diabetes. With the exception of gender, none of these factors significantly modified the association (p for interaction > 0.05) (Fig. 4). The inverse association between PNI and CRC appeared stronger among male participants (p for interaction = 0.022).
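The p for interaction values above come from likelihood ratio tests comparing models with and without the interaction term. The following Python sketch shows the calculation for a single interaction term (one degree of freedom, as when a binary stratifier is multiplied by the quartile-median PNI variable). The log-likelihood values are hypothetical, and stratifiers with more than two levels would require the chi-square distribution with more degrees of freedom.

```python
import math

def lrt_pvalue_df1(loglik_reduced, loglik_full):
    """Likelihood ratio test for one added parameter (chi-square, 1 df).

    Returns the test statistic and its p-value.
    """
    stat = 2.0 * (loglik_full - loglik_reduced)
    if stat < 0:
        raise ValueError("the full model cannot have a lower log-likelihood")
    # For 1 df, P(chi2 > stat) equals erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))

# Hypothetical log-likelihoods for models without and with the interaction term.
stat, p_interaction = lrt_pvalue_df1(loglik_reduced=-1002.0, loglik_full=-1000.0)
print(f"LR statistic = {stat:.2f}, p for interaction = {p_interaction:.4f}")
```

A small p-value indicates that adding the interaction term improves the fit, that is, the PNI-CRC association differs between strata.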

Fig. 4 Subgroup analysis.

Discussion

In this cross-sectional study of 26,229 participants from the NHANES database, we identified a statistically significant and independent inverse association between PNI and the odds of CRC. Restricted cubic spline analysis indicated that this association was linear. Subgroup analyses stratified by age, gender, race, education level, and other related variables further supported the consistency of this association across diverse subpopulations.

CRC ranks as the third most frequently diagnosed malignancy worldwide and the second leading cause of cancer-related mortality, posing a significant burden on patients and healthcare systems1,2. In patients with Tumor, Nodes, Metastasis (TNM) stage I or II CRC, the 5-year survival rate reaches up to 93%. However, survival rates decline to 60%, 42%, and 25% for those with TNM stage IIIA, IIIB, and IIIC disease, respectively. Notably, metastatic CRC (mCRC; stage IV) remains largely incurable, with a 5-year survival rate below 10%37,38. Despite multiple screening modalities and extensive public health efforts to promote CRC screening, approximately one-third of eligible individuals in the United States remain unscreened. Thus, a screening biomarker that is well tolerated, easily accessible, and conducive to high patient adherence could be an “optimal” approach in certain scenarios39.

PNI is calculated from serum albumin and lymphocyte count, both of which are readily available from routine blood tests. It serves as a biomarker of systemic inflammation, nutritional status, and immune function, all of which play a crucial role in the pathogenesis of CRC. A meta-analysis has demonstrated that dietary pro-inflammatory components are independently linked to a higher risk of CRC40. Chronic intestinal inflammation8,41,42 is widely recognized as a pivotal factor in the initiation and progression of CRC. Emerging evidence indicates that various inflammatory cytokines play a crucial role in tumorigenesis and cancer progression. Specifically, IFN-γ, interleukin-12 (IL-12), IL-15, IL-17F, and IL-18 exhibit tumor-suppressive properties in CRC, whereas IL-4, IL-6, IL-8, IL-11, IL-17A, IL-22, IL-23, IL-33, TNF, transforming growth factor-beta (TGF-β), and VEGF contribute to tumor promotion43.

As an integral component of PNI, albumin not only serves as an indicator of nutritional status but also exhibits antioxidant and anti-inflammatory properties44. Reduced serum albumin levels are closely linked to elevated inflammatory markers45, particularly pro-inflammatory cytokines such as IL-1, IL-6, and TNF-α46. A Mendelian randomization analysis indicates that higher serum albumin levels exert a causal protective effect against CRC risk47. Furthermore, hypoalbuminemia is linked to greater nutritional risk and the activation of systemic inflammatory responses, serving as a valuable independent prognostic marker in CRC patients48.

Taken together, persistent chronic inflammation, malnutrition, and immune dysregulation are key drivers of CRC progression, which is consistent with the significant inverse association we observed between PNI and CRC. Subgroup analysis likewise showed an inverse association between PNI and CRC; however, this association did not reach statistical significance among female participants. This discrepancy may be attributable to inherent physiological differences between the sexes, as well as differences in dietary intake and health-related behaviors.

Although no prior study has comprehensively assessed the association between PNI and CRC risk in the general population, existing literature has reported associations between its components (serum albumin and lymphocyte count) and CRC or overall cancer risk. For instance, Walts et al. reported a significant association between lower serum albumin levels and increased CRC risk in a prospective cohort49. Constantinescu et al. demonstrated through Mendelian randomization that genetically predicted lymphocyte traits were linked to CRC susceptibility50. Ghuman et al. further observed that inflammatory markers, such as white blood cell count, were associated with both CRC incidence and survival51. Our study builds upon these findings by evaluating the combined effect of albumin and lymphocytes through the PNI, providing new evidence of its inverse relationship with CRC in a nationally representative population. Moreover, we used multiple imputation to address missing covariates, thereby strengthening the robustness of our findings.

Nevertheless, the limitations of our study must be recognized. One notable limitation is the exclusion of approximately 48% of NHANES participants, most of them (about 42% of the original sample) because of missing data on cancer history. This exclusion raises the possibility of selection bias if the data are not missing completely at random. For instance, individuals with limited healthcare access or lower health literacy may be less likely to report their cancer history, and these characteristics could also be associated with PNI or CRC risk. As such, the analytic sample may not fully represent the broader population, potentially affecting the generalizability of our results.
Although we conducted a comparison of baseline characteristics between included and excluded participants (Supplementary Table S1) and found no significant differences in age, sex, or BMI, unmeasured confounding cannot be ruled out. Given the cross-sectional nature of this study, causality cannot be established. It is also possible that individuals with CRC may experience systemic inflammation and malnutrition as a consequence of the disease process, leading to reduced PNI scores. Thus, reverse causality remains a plausible explanation for the observed association. Although significant associations were observed in the Black and White subgroups, the lack of a significant interaction term (P for interaction > 0.05) suggests that these differences may not reflect true heterogeneity in effect across racial groups, but rather differences in statistical power. Furthermore, as the sample data were sourced from the U.S. NHANES database, regional and racial disparities may limit the generalizability and applicability of our findings. Last but not least, the absence of colonoscopy data precludes a definitive diagnosis of CRC. Reliance on self-reported interviews for CRC diagnosis may introduce recall bias and does not allow for the confirmation of whether the reported CRC cases are primary tumors.

Conclusion

Based on this cross-sectional study, higher PNI levels are inversely associated with the odds of CRC, and this relationship appears consistent across diverse demographic and clinical subgroups. Therefore, we hypothesize that PNI may reflect nutritional and inflammatory status associated with CRC, and could be explored in future research as a potential cost-effective and clinically relevant marker to aid in early detection. However, given the simultaneous measurement of PNI and CRC history, the direction of this association cannot be determined, and reverse causality cannot be ruled out. Further large-scale prospective cohort studies are warranted to validate these findings, determine temporality, and elucidate the biological mechanisms underlying the observed association.