Introduction

Body mass index (BMI) is a widely used measure in healthcare to assess an individual’s weight status and determine their risk for developing health problems related to obesity. However, it is essential to note that BMI does not distinguish between muscle and fat, so it may overestimate body fat in athletes and underestimate body fat in older adults. Additionally, BMI does not consider individual differences in body shape and composition. The World Health Organization (WHO) has recommended rescaling the BMI for Asian individuals1, based on studies performed in China that have produced new population-based BMI scales for obesity2. Indonesia, Turkey, Singapore, Japan, India, and other countries have done the same3. These BMI cutoffs vary depending on the data and resources available. Some measures used to set the new BMI cutoffs include adiposity, all-cause mortality, waist circumference, waist-to-hip ratio, and incidence rates of certain prevalent diseases4,5,6.  A 2019 study also recommended new BMI cutoffs for Black, Hispanic, and White males and females, based on the risk of developing one of three metabolic diseases, namely dyslipidemia, diabetes mellitus, and hypertension. These authors found differences in BMI cutoffs across race and gender for these three diseases7. The WHO recommends that population-based obesity BMI cutoffs be founded on morbidity and mortality6. Despite these developments, there remains a significant gap in knowledge for the United States. Previous studies have only identified race-specific BMI cutoffs in relation to particular diseases, which does not provide a comprehensive understanding of the general health risks associated with obesity. Additionally, population-specific BMI cutoffs have not been adequately researched or established for major racial groups within the U.S., particularly for Asian, Native American, and Black Americans.

Assessing the relationship between BMI and healthcare burden across race in the United States (U.S.) is crucial in determining a person’s overall health status and informing the development of a tailored treatment plan. This relationship between BMI and healthcare utilization is generally positive, meaning that as BMI increases, so does healthcare utilization8. Higher BMIs are often associated with a greater risk of obesity-related health conditions such as type 2 diabetes mellitus, cardiovascular disease, and certain cancers9. These health conditions can lead to increased hospitalizations, medication use, and doctor visits, thereby straining the healthcare system. While individuals with lower BMIs are less likely to require as many healthcare resources, regular healthcare utilization results in better healthcare outcomes regardless of BMI status10,11.

The Charlson Comorbidity Index (CCI) is a widely used tool for evaluating comorbidity in patients with chronic diseases and is often used to assess health outcomes12. This study aims to assess the relationship between BMI and CCI, stratified by race and healthcare utilization, among middle-aged patients in the U.S. We hypothesize that BMI does not operate equally across race. Testing this hypothesis will help address health disparities and inform the design of targeted interventions to improve health outcomes.

Materials and methods

Research design

This cross-sectional study used data from the Cerner HealthFacts® data warehouse for 2016–2017 that did not contain any identifying information. The data warehouse comprises more than 490 million encounters for over 70 million patients who received treatment at hospitals and clinics from 792 non-affiliated healthcare systems across the U.S. from 2001 to 2017. The data include information about the type of encounter, patient medical history, diagnoses, laboratory results, prescriptions, patient demographics, clinic type, and procedures performed. The Cerner HealthFacts® database did not capture waist circumference, which has been strongly correlated with multimorbidity in previous research. However, the database did capture insurance and visit types, and we used these two variables to investigate their association with the burden of multimorbidity across races. To ensure data quality, we included only healthcare centers that had recorded values for race, age, gender, and BMI for at least some patients. Of the 588 hospitals with data available during the study period, 474 met these criteria and were included in the analysis.

For this study, the patient inclusion criteria were: (1) age 45–64 years, (2) BMI value present and ranging between 18.5 and 75, (3) assigned race category, and (4) assigned gender (i.e., male or female). The exclusion criteria omitted patient groups in which standard BMI analysis could be confounded: (1) patients with a cancer diagnosis, (2) pregnant patients, and (3) patients with International Classification of Diseases-9th Version-Clinical Modification (ICD-9-CM) diagnostic codes. These exclusions were chosen based on their potential impact on the variables under investigation, particularly BMI and its associations with other comorbidities. Cancer and its treatments can significantly affect body weight, through either loss or gain, which can obscure the analysis of BMI and its direct relationship with other comorbidities. Excluding these patients allows for a clearer examination of the effects of BMI, uninfluenced by the extreme weight variations often seen in cancer patients. Pregnancy can temporarily alter weight and BMI in ways that do not reflect the patient’s usual health status, so including pregnant patients could distort the analysis of BMI’s effects on health outcomes. ICD-9-CM codes are less specific than ICD-10-CM codes, and using them could lead to inaccurate disease classifications, reducing the study’s precision in assigning comorbidity scores. ICD-9 and ICD-10 differ not only in the number and structure of codes but also in the clinical details they capture. This disparity makes a direct, one-to-one mapping between the two systems challenging and often inaccurate. Given that our study relies heavily on precise coding to assign CCI values accurately, using two different coding systems could introduce a significant risk of misclassification and bias in our analysis. We selected a two-year study period to enhance our analytical sensitivity in capturing healthcare utilization patterns and to gather a more comprehensive dataset.

We stratified the patients by race and divided them into three subgroups based on patterns of healthcare utilization. Patients with at least two outpatient visits within the two-year study period were classified as “utilizers.” Patients with no outpatient visits were classified as “non-utilizers,” while those with only one such visit were classified as “low utilizers.” It should be noted that patients in any utilization subgroup may have had inpatient and/or emergency department visits. Defining utilization by outpatient visits alone was intentional and is grounded in our study’s focus on regular preventive care. We chose to emphasize outpatient visits because they are a primary indicator of engagement in ongoing health maintenance, which can significantly reduce the burden of disease over time. Stratifying by utilization refers to grouping individuals or data based on their use of healthcare services, which can include the frequency of doctor visits, use of prescription medications, and hospitalization rates. By stratifying the data in this way, we can examine how different levels of healthcare utilization may be associated with various health outcomes. It is also crucial to stratify the data by race because some racial groups may have different levels of access to healthcare that influence health-seeking behavior. Stratifying by race allows us to examine how these differences in health-seeking behavior may alter associations with health outcomes for different racial groups.
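The utilization grouping described above can be sketched in a few lines. This is a minimal illustration rather than the study's code: the function name `classify_utilization` and the string labels are assumptions, and the input is assumed to be a patient's count of outpatient visits during 2016–2017.

```python
def classify_utilization(outpatient_visits: int) -> str:
    """Assign a utilization subgroup from a two-year outpatient visit count.

    Hypothetical helper: inpatient and emergency department visits are
    deliberately ignored, mirroring the outpatient-only definition.
    """
    if outpatient_visits < 0:
        raise ValueError("visit count cannot be negative")
    if outpatient_visits == 0:
        return "non-utilizer"
    if outpatient_visits == 1:
        return "low utilizer"
    return "utilizer"  # two or more outpatient visits
```

Applying this function within each racial category would yield the race-by-utilization cohorts analyzed in the remainder of the study.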

Ethical considerations

This research utilized a limited dataset that excluded the 16 identifiable variables. Per the National Institutes of Health policy, the University of Tennessee Health Science Center (UTHSC) Institutional Review Board (IRB) classified the research as exempt. This study was conducted in compliance with all other necessary research requirements.

Ethical considerations regarding data use agreements concerning specific American Indian/Alaska Native tribes are acknowledged. The study does not delve into specific tribes; therefore, we did not seek any data use agreements. We followed the recommendations for using American Indian/Alaska Native data by the Urban Indian Health Institute13, which recommends not censoring these data as much as possible.

Independent variables

Demographics, payer type, smoking, alcohol use, urbanism, and 2014–2015 healthcare utilization (i.e., patients who had an encounter during 2014–2015) were the primary independent variables. Demographic variables were race, age, gender, marital status, and body mass index (BMI), while racial categories were Black, Asian/Pacific Islander, White, and Native American. Given the limitations in sample size, we chose to include only these four racial categories due to their adequate representation within the Cerner HealthFacts dataset, despite the availability of additional racial categories. We categorized marital status as those patients currently with a partner (i.e., married or life partner), having an ex-partner (i.e., divorced, widowed, or legally separated), single, and unknown. We categorized payer types as those patients with public insurance, private insurance, self-pay, or unknown, based on the information available in the dataset. We limited BMI values to the range 18.5 ≤ BMI < 75, based on the distribution of this variable within our sample population. We defined 2014–2015 healthcare utilization as a patient having at least two outpatient visits during that period. In this study, the variables of age and BMI were standardized via min-max scaling to a uniform range of 0 to 1. This rescaling approach is critical in multivariate analyses as it normalizes the scales of variables with disparate measurement units and ranges, thereby ensuring that each variable contributes equitably to the regression model. By doing so, it mitigates the risk of introducing scale-induced bias into the model’s estimations, thus enhancing the interpretability and numerical stability of the estimated parameters. Patients with unknown urban status were assigned to the urban category, since it had the highest prevalence in the dataset.
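As an illustration of the min-max scaling applied to age and BMI, the sketch below rescales a list of values to the range 0 to 1. It is an assumption-laden example: the function name `min_max_scale` is hypothetical, and the text does not state whether the scaling bounds were the observed sample extremes or the inclusion limits, so both options are supported.

```python
from typing import List, Optional, Sequence


def min_max_scale(
    values: Sequence[float],
    lower: Optional[float] = None,
    upper: Optional[float] = None,
) -> List[float]:
    """Rescale values to [0, 1] using (x - lower) / (upper - lower).

    If lower/upper are omitted, the observed minimum and maximum are used.
    Passing explicit bounds (e.g., the inclusion limits) is an assumption
    about the study, not something the text confirms.
    """
    lo = min(values) if lower is None else lower
    hi = max(values) if upper is None else upper
    if hi <= lo:
        raise ValueError("upper bound must exceed lower bound")
    return [(x - lo) / (hi - lo) for x in values]


# Ages at the edges and midpoint of the 45-64 inclusion window.
scaled_age = min_max_scale([45, 54.5, 64])
```

After scaling, a regression coefficient describes the change across the full range of the variable rather than per year of age or per BMI unit, which matters when the odds ratios are interpreted.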

Outcome variable

Our primary outcome variable was the healthcare burden measure, CCI, calculated using the Quan version of the Charlson Comorbidity Index14 via the R comorbidity package from the International Classification of Diseases-10th Version-Clinical Modification (ICD-10-CM) diagnosis codes. The Quan version of the Charlson Comorbidity Index is an adaptation of the original CCI that has been enhanced for use with ICD-10 administrative data. This version improves the accuracy of comorbidity data capture by updating the coding algorithms to reflect changes in the classification and diagnosis of diseases, providing a more precise tool for assessing patient comorbidities in contemporary healthcare datasets. Because the CCI covers a fixed list of conditions, it is important to note that a patient might have morbidities or comorbidities that are not on this list. We chose CCI as our outcome variable since it is a reliable tool for evaluating health outcomes by quantifying the degree of illness in an individual patient12 and provides a standardized measure of disease that allows patient comparisons over time15. We transformed CCI scores into a binary variable: 0 for patients with a CCI of 0 and 1 for those with a CCI of 1 or higher. This transformation was chosen to simplify the analysis and enhance the interpretability of the data, particularly for identifying BMI cutoffs related to patients with any comorbidity burden. As we excluded patients with cancer, only the 16 other comorbidities in the Quan version of CCI are represented in our analysis.

Statistical analysis

In this study, we applied logistic regression with a binomial family distribution to the training dataset (70% of the final population) to model the relationship between the predictor variables and the binary outcome variable, CCI16. The Kolmogorov-Smirnov (KS) test was performed to compare the distribution of the outcome variable among the different cohorts17. The results indicated no statistically significant deviation from normality for any cohort except the White utilizer cohort, for which the KS test was statistically significant. We checked for multicollinearity, which refers to linear relationships between multiple variables18, using a generalized variance inflation factor (GVIF) analysis suitable for a combination of numerical and categorical variables19,20. We used the car R package to remove variables with a high GVIF^(1/(2·Df)) score (i.e., alcohol use)21 and repeated this process until no variable had a score above the conservative threshold of two20.
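For readers unfamiliar with the GVIF, the quantity being thresholded can be written out as follows. This is the standard Fox and Monette definition as we understand it; the matrix symbols are notation introduced here for illustration and do not appear in the study.

```latex
\[
\mathrm{GVIF}_j \;=\; \frac{\det(\mathbf{R}_{11})\,\det(\mathbf{R}_{22})}{\det(\mathbf{R})},
\qquad
\text{retain term } j \text{ only if } \;
\mathrm{GVIF}_j^{\,1/(2\,\mathrm{Df}_j)} \le 2 .
\]
```

Here R_11 is the correlation matrix of the regressor columns belonging to term j (for example, the dummy variables of one categorical predictor), R_22 is the correlation matrix of all remaining regressors, R is the correlation matrix of all regressors together, and Df_j is the number of columns in term j. For a single numeric predictor (Df_j = 1), the GVIF reduces to the ordinary variance inflation factor, so a threshold of two on GVIF^(1/(2·Df)) corresponds to a VIF of four.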

We examined how BMI is associated with CCI score by comparing the area under the curve (AUC) of the receiver operating characteristic (ROC) curve for the adjusted model with and without inclusion of BMI. We determined the best BMI cutoff for each group by evaluating each integer BMI cutoff from 20 to 38 and computing the AUC of the corresponding ROC curve. This yielded 19 ROC curves for each cohort, and we retained only the BMI cutoff with the highest AUC.
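The cutoff search can be sketched as a sweep over candidate thresholds. The example below is a simplified illustration under stated assumptions: it scores only a binary "BMI at or above cutoff" indicator against the binary CCI outcome, whereas the study evaluated an adjusted model with covariates; the function names `auc` and `best_bmi_cutoff` and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_bmi_cutoff(
    bmi: Sequence[float], outcome: Sequence[int], low: int = 20, high: int = 38
) -> Tuple[int, float]:
    """Return the integer cutoff in [low, high] whose indicator gives the highest AUC."""
    results: List[Tuple[int, float]] = []
    for cutoff in range(low, high + 1):  # 19 candidates for 20..38
        indicator = [1 if b >= cutoff else 0 for b in bmi]
        results.append((cutoff, auc(indicator, outcome)))
    # max() keeps the first (lowest) cutoff when AUC values tie.
    return max(results, key=lambda pair: pair[1])


# Toy data in which comorbidity appears exactly at BMI >= 30.
toy_bmi = [19, 22, 25, 28, 29.9, 30, 31, 35, 40, 45]
toy_cci = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
cutoff, score = best_bmi_cutoff(toy_bmi, toy_cci)
```

In the study itself, each candidate cutoff would presumably be evaluated within the full adjusted model on held-out data, so the AUC values would reflect all predictors rather than BMI alone.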

Results

In this study, the majority of patients in the combined cohorts were female and the average age was 55 years. Table 1 shows the breakdown of demographics by race. The percentages were determined in relation to the entire patient population. Table 2 shows the breakdown of the number of patients in different utilization cohorts by race.

Table 1 Patient demographics by race.
Table 2 Number of patients in utilization cohorts by race.

The Cerner HealthFacts® database included data from 1,500,636 middle-aged patients for 2016–2017. The study flow diagram showing how the final patient population was derived is presented in Appendix Fig. 1. After data cleaning procedures, which excluded patients with ICD-9-CM diagnostic codes, patients with a diagnosis of cancer or pregnancy, and patients who received treatment in multiple urban locations, the remaining patient population comprised 1,271,697 individuals. The mean age, BMI, and CCI scores for each racial cohort are shown in Table 3, while the distribution of BMI across races is shown in Appendix Fig. 2.

Fig. 1 Odds Ratios with Confidence Intervals Across Racial Groups for Utilizers. Estimates that are not statistically significant cross the reference line at 10^0 on the y-axis. The “Ex-partner” variable represents patients who are divorced, widowed, or legally separated.

Table 3 Mean and standard deviations for age, BMI, and CCI score by race for healthcare utilizer cohorts.
Fig. 2 Odds Ratios with Confidence Intervals Across Race for Non-Utilizers. Estimates that are not statistically significant cross the reference line at 10^0 on the y-axis. The “Ex-partner” variable represents patients who are divorced, widowed, or legally separated.

In our logistic regression model, the Charlson Comorbidity Index (CCI) serves as the outcome variable, with predictors including BMI, age, gender, health insurance type, smoking status, marital status, healthcare utilization during 2014–2015, and urbanism. The reference categories for the categorical variables are as follows:

  • Gender: Female.

  • Health Insurance Type: Self-pay.

  • Marital Status: Partnered.

  • Smoking Status: Non-smoking.

  • Urbanism: Rural.

  • Healthcare Utilization (2014–2015): Fewer than 2 outpatient visits.

Figures 1, 2 and 3 present the odds ratios for various health variables, stratified by race and healthcare utilization, displayed on a logarithmic scale. The logarithmic scale was employed to accommodate the significantly higher estimates associated with BMI compared to other variables, ensuring a clear visual representation across a broad range of values. Appendix Table 1 provides the non-logarithmically transformed estimates and their corresponding confidence intervals for each health variable. Based on the odds ratios, high BMI is significantly associated with the healthcare burden measure, CCI, in the Asian/Pacific Islander and White races across all utilization cohorts. In the healthcare utilizer cohort, BMI coefficient values for those two races were almost double those for the Black and Native American races. BMI was the strongest predictor of CCI across race and utilization. As the BMI variable was scaled between 0 and 1, the odds ratio represents the change in odds associated with moving from the lowest to the highest value of the BMI scale. It therefore quantifies the relative increase or decrease in odds for individuals at the high end of the BMI scale compared to those at the low end. The top three variables predicting CCI varied by healthcare utilization. For example, BMI, age, and smoking were the top three variables predicting CCI in the healthcare utilizer cohorts for all races. In the healthcare non-utilizer cohorts, however, BMI, age, and public insurance were the top three variables for the Black and Native American races, while the ranking was unchanged for the remaining races. For the healthcare low utilizers, BMI and smoking were the top two variables positively predicting CCI in the Black race; the third was unknown marital status, which was negatively associated with CCI. Age, smoking, and BMI remained the top three predictors for the remaining races in the low-utilizer cohorts.
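Because the BMI odds ratio spans the full scaled range, it can help to convert it to a per-unit figure. The sketch below does this under an explicit assumption: that the scaling endpoints were the inclusion bounds of 18.5 and 75 (if the observed sample minimum and maximum were used instead, those values should be substituted). The function name `per_unit_odds_ratio` and the example odds ratio of 10 are hypothetical and are not taken from the study's results.

```python
def per_unit_odds_ratio(
    full_range_or: float,
    bmi_min: float = 18.5,  # assumed lower scaling bound
    bmi_max: float = 75.0,  # assumed upper scaling bound
    units: float = 1.0,
) -> float:
    """Convert an odds ratio for scaled BMI (0 -> 1) to an odds ratio per `units` of raw BMI.

    With x_scaled = (x - bmi_min) / (bmi_max - bmi_min), the logit slope per
    raw BMI unit is beta / (bmi_max - bmi_min), so the odds ratio for a change
    of `units` is full_range_or ** (units / (bmi_max - bmi_min)).
    """
    if full_range_or <= 0:
        raise ValueError("odds ratio must be positive")
    span = bmi_max - bmi_min
    if span <= 0:
        raise ValueError("bmi_max must exceed bmi_min")
    return full_range_or ** (units / span)


# Hypothetical full-range odds ratio of 10, expressed per 5 BMI units.
or_per_5_units = per_unit_odds_ratio(10.0, units=5.0)
```

This conversion also explains why the BMI estimates appear far larger than those for binary predictors: a move from 0 to 1 on the scaled axis corresponds to a change of more than 50 raw BMI units.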

Fig. 3 Odds Ratios with Confidence Intervals Across Race for Low-Utilizers. Estimates that are not statistically significant cross the reference line at 10^0 on the y-axis. The “Ex-partner” variable represents patients who are divorced, widowed, or legally separated.

We assessed the association of BMI with CCI score by displaying the AUC of the ROC curve for the adjusted model with and without BMI in the healthcare utilizer cohort alongside the highest AUC of the adjusted model with BMI in the 20–38 range, as shown in Fig. 4. Among the healthcare utilizer cohorts, the highest AUC was achieved for the Asian/Pacific Islander cohort (0.71) and the lowest for the Black cohort (0.63). The cutoff for the highest AUC in the healthcare utilizer cohorts was 34 for the Black cohort, 27 for the Asian/Pacific Islander cohort, 32 for the White cohort, and 35 for the Native American cohort. The assessment of the remaining utilization cohorts is shown in Appendix Figs. 3 and 4. In the low utilization cohort, the highest AUC was achieved for the White racial group, followed by that for the Asian/Pacific Islander racial group, with the CCI for the Black racial group being predicted least well by the model. In the non-utilization cohort, AUC values ranged between 0.65 and 0.69, with the Native American cohort having the highest value and the Asian/Pacific Islander cohort the lowest. BMI cutoffs also varied with utilization. For the Asian/Pacific Islander cohort, BMI cutoffs were higher in the low and non-utilizer cohorts than in the utilizer cohort, whereas for the Black cohort the cutoff was lower in the low-utilizer cohort than in the utilizer cohort.

Fig. 4 Comparing Area Under the Curve (AUC) for different models to evaluate the association between BMI and CCI for healthcare utilizer cohorts. Red curves represent the unadjusted model without BMI, the blue curves represent the adjusted model with BMI, and the orange curves represent the highest AUC achieved and the corresponding BMI cutoff for that AUC value.

Discussion

This study assessed the relationship between BMI and the Charlson Comorbidity Index (CCI) across different racial groups. BMI was significantly associated with CCI, regardless of healthcare utilization and race. The AUC values suggest variations in the model’s ability to predict CCI, with the optimal cutoffs for the Black and Native American populations being higher than that for the White population. These data can help researchers and healthcare providers understand and address health disparities22,23,24,25. Additionally, the variables associated with CCI varied depending on the degree of healthcare utilization being analyzed. For the White and Asian/Pacific Islander populations, low utilizers of regular care demonstrated higher odds ratios for BMI. As racial minority populations tend to have greater utilization of emergency services26, it is important to factor utilization into this kind of research. Users of “real world” EHR data should be aware that utilization can create bias in obesity research and that stratification is necessary. This research supports the growing emphasis from the American Medical Association on understanding the limitations of BMI27. Our findings across racial groups reinforce the study’s central argument that relying solely on BMI without considering racial nuances can lead to suboptimal predictions of health outcomes.

Our study aligns with the growing movement highlighted in the work by Prillaman to move beyond an overreliance on BMI in obesity diagnosis and treatment28. The emphasis on racial differences adds a critical layer to this discussion, acknowledging that a one-size-fits-all approach is inadequate. In the context of the global demand for anti-obesity medications, our study raises concerns about the potential for BMI to continue as a predominant diagnostic tool, leading to the prescription of medications without thorough consideration of individual health contexts. This aligns with the Nature paper’s cautionary note about the impact of increasing reliance on BMI in the era of emerging weight-loss drugs. Studies consistently show that people from different ethnic and racial backgrounds may have different body compositions for a given BMI. For example, individuals of Asian descent may have a higher percentage of body fat at lower BMI values compared to individuals of European descent2. Some research suggests that health risks associated with obesity-related conditions may manifest at lower BMI levels in certain non-white populations compared to white populations7. Some countries and regions have started developing or adopting population-specific guidelines for assessing obesity and related health risks. These guidelines consider the unique characteristics of their populations, including genetic, cultural, and lifestyle factors. Our work adds to this mounting evidence by identifying the association between BMI and a common comorbidity burden score to identify cutoffs for BMI across racial groups in the United States. By factoring in utilization, we were able to control for a key characteristic that may bias results in other studies.

The study’s findings prompt reflection on the broader implications for clinical practice. While maximizing the AUC to identify BMI cutoffs represents a statistical optimization, its clinical relevance requires further investigation. Future research should focus on translating these statistical findings into meaningful clinical interventions and guidelines.

Limitations

A cross-sectional design may not fully capture the complexities of the relationship between BMI, healthcare burden, and healthcare utilization. It may not control for other important factors influencing healthcare utilization, such as genetics, lifestyle, and overall health. Additionally, the inherent limitations of a cross-sectional study restrict our ability to support causal assumptions, as it captures data at a single point in time rather than across different time periods. This prevents us from establishing temporal sequences and causality between the observed variables. Many patients were excluded from the analysis due to missing BMI, with a missingness rate of 44%. An ad hoc analysis, shown in Appendix Table 2, was conducted to compare patients with and without recorded BMI values across several key variables: age, race, sex, visit count, and insurance status. Our findings confirm that there are no significant differences in average age (45 vs. 45) or the percentage of White individuals (70% vs. 71%) between our study cohort and the cohort without BMI values, respectively. Furthermore, our study cohort shows a more balanced gender distribution (55% female and 45% male vs. 68% female and 39% male), closer to the real distribution in the United States. However, our study cohort exhibits a higher percentage of Black individuals (18% vs. 12%) and a lower rate of missing race data (2.9% vs. 8.2%). This observed difference in missingness might obscure the true proportion of Black individuals and is likely associated with utilization. Notably, the average visit rate in our study cohort is more than twice that of the cohort without BMI values (19 vs. 7) during the study period. Higher utilization is likely associated with the presence of a BMI value as well, since the more visits a patient has, the more opportunities there are to have this value collected. Additionally, fewer patients in the study cohort lack a recorded payer type, and a greater percentage have multiple payer types, which likely corresponds to their higher average visit count. The impact of these utilization-driven differences is controlled for by stratifying by utilization in our study29. We also limited our study to middle-aged patients, meaning our conclusions cannot be generalized to the whole population. Cerner HealthFacts had a limited number of variables that we could control for. Future work that includes more variables, such as education, physical activity, stress levels, and diet, and that includes younger and older age groups over a longer period of time could provide a more comprehensive approach for understanding the BMI-CCI relationship.

Conclusions

To our knowledge, this is the first study to assess the relationship between BMI and healthcare burden as stratified by race and healthcare utilization. This research demonstrated that the relationship between BMI and CCI varied across race within the same healthcare utilization cohorts. Some of this variation could be driven by access to healthcare resources. Most prior research regarding multimorbidity focused on a specific point in time and a few multimorbidities at the same time. Understanding how multimorbidity accumulates over time across populations has not been addressed. More work needs to be done to understand how multimorbidity, BMI, and healthcare burdens are associated across race.