Introduction

For decades, the prevalence of overweight and obesity has risen steadily worldwide1. Obesity is one of the major public health challenges worldwide and is associated with a substantial increase in the incidence of chronic diseases such as type 2 diabetes, cardiovascular disease and certain cancers2,3. In addition, abdominal (central or visceral) obesity is a significant risk factor for cardiovascular disease, diabetes and cancer, and plays a central role in the metabolic syndrome4. People living with obesity have different profiles and health needs, yet they are often referred to as a single entity, defined by a single parameter (i.e., body mass index [BMI]), or not discussed at all2,5. Identifying obesity accurately is therefore crucial for assessing the risk of associated disorders6. Simple metrics such as BMI have traditionally been used for diagnosis and classification; however, these do not always accurately reflect actual body composition or metabolic risk7,8. BMI is the metric currently used to relate weight to height in adults and to classify them into groups9. Current BMI-based measures of obesity may underestimate or overestimate adiposity and provide inadequate information on individual-level health, undermining medically sound approaches to health care and policy10. Furthermore, BMI is insufficient for accurate classification of obesity at the individual level because people with similar BMIs often have disparate health risks11, and because it does not consider variations in body composition, including fat mass (FM), lean mass and lean mass distribution12,13.

Because of this, there is a need for complementary indices to identify obesity in adults14. In this context, derived anthropometric indices, such as the fat mass index (FMI), fat-free mass index (FFMI) and skeletal muscle index (SMI), have demonstrated a greater discriminative ability to characterize different obesity profiles and nutritional status15,16,17. These indices offer a better understanding of obesity by differentiating between FM and fat-free mass (FFM), thus providing a more comprehensive assessment of health status18. One of the technologies commonly used to assess body composition, including in clinical trials, is bioelectrical impedance analysis (BIA)19. BIA allows the determination of FM and FFM20,21. Both FM and lean mass (kg) are then normalized by height squared (m2) to obtain the FMI and FFMI22. Therefore, BIA provides valuable anthropometric data that can help differentiate obesity phenotypes and guide better therapeutic approaches23.

However, it is important to note that BIA does not constitute a reference method for measuring body composition; methods such as dual X-ray absorptiometry (DXA) are considered reference standards because of their higher accuracy24. In addition, BIA is based on predictive models derived from algorithms developed for various populations, which implies that the validity of the results may vary according to the characteristics of the sample evaluated.

In this study, the equipment manufacturer’s default algorithm was used, which was developed and validated in a population with characteristics similar, but not identical, to those of the current sample. Although the instrument has been previously validated against DXA25, its accuracy has not been confirmed in populations with anthropometric profiles similar to those of the participants in this study, which represents a methodological limitation. Nevertheless, because the aim of the present study was to evaluate the relative predictive value of the BIA-derived indices within a supervised classification model, any absolute imprecision of the algorithm was not considered to critically compromise that aim. It should still be taken into account when interpreting the results, and algorithms validated for this population would allow the use of more accurate models.

Artificial intelligence (AI), including machine learning (ML), which can make use of sophisticated neural networks, has gained worldwide recognition26. AI algorithms can predict obesity; however, more research is needed to evaluate their effectiveness in analyzing obesity-related data and to examine more advanced AI methods27. ML can be classified into two main types: unsupervised learning, which operates without labelled data, and supervised learning, which relies on labelled data for guidance26. The development of ML algorithms offers an unprecedented opportunity to automate and improve the classification of individuals according to these parameters, facilitating clinical or public health decision-making with greater accuracy and efficiency27,28,29. The use of ML techniques has emerged as a key tool in the detection and management of obesity, allowing the analysis of large volumes of biometric and behavioral data to identify patterns associated with overweight30,31.

Studies using ML algorithms have shown that these models are highly effective in accurately predicting human obesity32,33. By considering various factors, such as demographic information, laboratory results, physical examination findings, and lifestyle factors, these models successfully identify crucial risk factors associated with the obese weight category34. The application of ML in this context not only improves the early detection of obesity but also optimizes prevention and treatment strategies, tailoring them to the individual needs of each patient33,35. In this sense, ML can transform three significant areas of biomedicine: clinical diagnosis, precision treatments and health monitoring, which aims to maintain health through various diseases and the normal ageing process36.

This study aimed to compare the performance of various supervised ML algorithms (e.g., random forest, gradient boosting and support vector machine) in classifying obesity levels in a large sample of adults, based on multiple anthropometric indices obtained through multifrequency bioimpedance. This approach aims to move towards more objective, interpretable and applicable tools in community or clinical settings.

Methods

Participants

A non-experimental, cross-sectional, comparative, and associative study was conducted with a total sample of 5372 adult participants (mean age: 34.6 ± 10.0 years), comprising 2727 women (29.3 ± 8.0 years) and 2645 men (37.8 ± 9.2 years). Participants were recruited through non-probabilistic sampling from a broad spectrum of urban and rural communities, ensuring demographic diversity. Eligibility criteria required individuals to be between 18 and 50 years of age, free from non-communicable chronic diseases, and, where applicable, not pregnant. All subjects provided signed informed consent. Individuals classified as physically active according to the World Health Organization (WHO) guidelines37, or those with medical conditions affecting body composition, were systematically excluded.

Instrumentation and procedure

All assessments were conducted in standardized environments within community health centers by licensed and certified nursing technicians. Prior to data collection, all technicians underwent rigorous training in the operation of BIA equipment. Participants were thoroughly briefed on the study procedures to ensure full compliance and understanding.

Measurements were obtained under strictly controlled conditions: ambient temperature averaged 20 °C, with a relative humidity of approximately 70%. Participants were instructed to refrain from engaging in intense physical activity, consuming alcohol, or taking diuretics for a minimum of 48 h prior to evaluation. Assessments were conducted following a minimum 4-hour fasting period, with prior gastric and bladder voiding. Subjects were measured barefoot and in undergarments, with all contact areas disinfected using 70% isopropyl alcohol in accordance with sanitation protocols.

A validated multi-frequency, octopolar bioelectrical impedance analyzer (InBody 270) was employed to assess body composition. Parameters extracted via the Lookin’ Body software suite included: body weight, height, BMI, fat mass (kg and %), FFM, total body water (TBW), skeletal muscle mass (SMM), and basal metabolic rate (BMR), among others.

Anthropometric indices

To comprehensively evaluate and classify adiposity and body composition, the following anthropometric indices were calculated:

  • BMI: Weight (kg) divided by height squared (m2).

  • FMI: FM (kg) divided by height squared (m2).

  • FFMI: FFM (kg) divided by height squared (m2).

  • SMI: Appendicular muscle mass (kg) divided by height squared (m2).

  • Muscle mass index (MMI): Total muscle mass (kg) divided by squared height (m2).
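The index definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the `BiaReading` container, the function names and the example values are hypothetical, and FFMI is assumed to use the same height-squared normalization described in the Introduction.

```python
from dataclasses import dataclass


@dataclass
class BiaReading:
    """Hypothetical container for one participant's BIA output."""
    weight_kg: float
    height_m: float
    fat_mass_kg: float
    fat_free_mass_kg: float
    appendicular_muscle_kg: float
    total_muscle_kg: float


def per_height_squared(mass_kg: float, height_m: float) -> float:
    """Normalize a mass in kg by height squared, giving kg/m^2."""
    if height_m <= 0:
        raise ValueError("height_m must be positive")
    return mass_kg / height_m ** 2


def anthropometric_indices(reading: BiaReading) -> dict:
    """Compute the five height-normalized indices for one participant."""
    h = reading.height_m
    return {
        "BMI": per_height_squared(reading.weight_kg, h),
        "FMI": per_height_squared(reading.fat_mass_kg, h),
        "FFMI": per_height_squared(reading.fat_free_mass_kg, h),
        "SMI": per_height_squared(reading.appendicular_muscle_kg, h),
        "MMI": per_height_squared(reading.total_muscle_kg, h),
    }


# Illustrative values only, not taken from the study data.
example = BiaReading(80.0, 1.75, 20.0, 60.0, 25.0, 33.0)
indices = anthropometric_indices(example)
print({name: round(value, 2) for name, value in indices.items()})
```

When FM and FFM sum to body weight, as in a two-compartment model, FMI and FFMI sum to BMI, which is why these two indices can be read as a decomposition of BMI into fat and lean components.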

Nutritional status was classified according to the thresholds proposed by Harty et al.38, adapted to general population parameters.

Statistical analysis

All statistical analyses were conducted using jamovi software (version 2.3.21). Descriptive data were expressed as mean ± standard deviation. Between-group comparisons by sex were performed using contingency tables and inferential statistics. The assumption of normality was assessed via the Shapiro-Wilk test, followed by analysis of variance (ANOVA) and Bonferroni post hoc corrections for pairwise comparisons. Effect size was estimated using Cohen’s d, with interpretation thresholds defined as follows: small (≥ 0.2–0.5), medium (> 0.5–0.8), and large (> 0.8)39. Statistical significance was set at p < 0.05.
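As a rough illustration of the effect size step, the sketch below computes Cohen’s d with a pooled standard deviation and labels it using the thresholds quoted above. The pooled-SD variant, the "negligible" label for values below 0.2 and the sample values are assumptions made for the example; the analysis itself was run in jamovi.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


def effect_label(d):
    """Map |d| to the interpretation thresholds stated in the text."""
    size = abs(d)
    if size > 0.8:
        return "large"
    if size > 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"  # assumption: label for values below the smallest threshold


# Illustrative BMI values for two hypothetical groups.
group_high = [25.0, 26.0, 27.0, 28.0, 29.0]
group_normal = [24.0, 25.0, 26.0, 27.0, 28.0]
d = cohens_d(group_high, group_normal)
print(round(d, 3), effect_label(d))
```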

Machine learning analysis

The ML analysis was implemented in Python using Jupyter Notebook.

  1. Data acquisition

A multidimensional dataset was compiled, integrating anthropometric parameters and lifestyle-related variables derived from structured questionnaires and physical measurements. Variables included weight, height, BMI, age, sex, physical activity level, caloric intake, and other health-related metrics. Rigorous data cleaning was applied before model training: incomplete entries were removed, and outliers were identified using z-scores and excluded, to ensure dataset integrity.
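A z-score filter of the kind described can be sketched as follows. The cut-off of |z| > 3 is an assumption, since the threshold is not stated in the text, and the function name, record layout and values are hypothetical.

```python
import statistics


def remove_outliers(rows, column, threshold=3.0):
    """Drop rows whose value in `column` lies more than `threshold` SDs from the mean."""
    values = [row[column] for row in rows]
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return list(rows)  # no spread, so nothing can be flagged
    return [row for row in rows if abs((row[column] - mean) / sd) <= threshold]


# Illustrative records: 19 plausible BMI values plus one implausible entry.
records = [{"id": i, "bmi": 22.0 + (i % 7)} for i in range(19)]
records.append({"id": 19, "bmi": 80.0})

cleaned = remove_outliers(records, "bmi")
print(len(records), "->", len(cleaned))
```

With very small samples a single extreme value inflates the standard deviation enough to hide itself, so a filter like this is only meaningful on datasets of reasonable size.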

  2. Data preprocessing

Prior to model training, a series of preprocessing steps were implemented:

  • Numerical normalization using minimum-maximum (min-max) scaling.

  • One-hot encoding for categorical variables such as gender and dietary habits.

  • The dataset was split into training (70%) and testing (30%) sets using stratified sampling, preserving the proportional distribution of obesity categories.
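The three preprocessing steps above might look like the following with scikit-learn. This is a sketch on synthetic data: the column choices, class proportions and random seeds are assumptions, and fitting the scaler and encoder on the training portion only is a design choice made here to avoid information leaking from the test set, not something the text specifies.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Synthetic stand-in data: two numeric indices, one categorical variable, three classes.
rng = np.random.default_rng(0)
n = 300
numeric = rng.normal(loc=[27.0, 7.0], scale=[3.0, 2.0], size=(n, 2))
sex = rng.choice(["female", "male"], size=(n, 1))
labels = rng.choice(["normal", "high", "very high"], size=n, p=[0.5, 0.3, 0.2])

# 70/30 split, stratified so each class keeps its share in both partitions.
num_train, num_test, sex_train, sex_test, y_train, y_test = train_test_split(
    numeric, sex, labels, test_size=0.30, stratify=labels, random_state=0
)

# Fit the transformers on the training data, then apply them to both partitions.
scaler = MinMaxScaler().fit(num_train)
encoder = OneHotEncoder(handle_unknown="ignore").fit(sex_train)

X_train = np.hstack([scaler.transform(num_train), encoder.transform(sex_train).toarray()])
X_test = np.hstack([scaler.transform(num_test), encoder.transform(sex_test).toarray()])

print(X_train.shape, X_test.shape)
```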

  3. Feature selection

To enhance model efficiency and interpretability, both statistical and algorithmic feature selection techniques were employed. Correlation analysis and recursive feature elimination with a random forest base estimator were utilized to identify and retain the most predictive variables while minimizing dimensionality.
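Recursive feature elimination with a random forest base estimator can be sketched with scikit-learn’s `RFE`. The synthetic dataset, the generic feature names and the choice to keep five features are assumptions for illustration; the number of features actually retained is not stated in the text.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Synthetic three-class problem standing in for the anthropometric dataset.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4, n_redundant=2,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

# Refit the forest repeatedly, dropping the least important feature each round.
selector = RFE(
    estimator=RandomForestClassifier(n_estimators=100, random_state=0),
    n_features_to_select=5,
    step=1,
)
selector.fit(X, y)

selected = [name for name, keep in zip(feature_names, selector.support_) if keep]
print(selected)
```

The `ranking_` attribute of the fitted selector gives the elimination order, which can be read alongside a correlation matrix to spot redundant variables.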

  4. Supervised classification algorithms

Multiple supervised ML algorithms were trained and benchmarked to classify individuals into distinct obesity categories. The models included:

  • Support vector machine with radial basis function kernel.

  • Random forest classifier with hyperparameter tuning.

  • K-nearest neighbors with optimal k selection.

  • Logistic regression.

  • Gradient Boosting for high-performance ensemble modeling.

  • Decision tree using recursive binary partitioning based on impurity measures (e.g., Gini index or information gain), producing a hierarchical, tree-structured model for classification or regression tasks.
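A benchmark over the six model families listed above could be organized as in the sketch below. It uses synthetic data and mostly default hyperparameters, because the tuned settings are not reported in the text; the scores it prints therefore say nothing about the results of the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic three-class data standing in for the obesity categories.
X, y = make_classification(
    n_samples=600, n_features=6, n_informative=4, n_redundant=1,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# Assumed settings; the hyperparameters used in the study are not given.
models = {
    "Support vector machine (RBF)": SVC(kernel="rbf"),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "K-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Gradient boosting": GradientBoostingClassifier(random_state=0),
    "Decision tree": DecisionTreeClassifier(criterion="gini", random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = f1_score(y_test, model.predict(X_test), average="weighted")

for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: weighted F1 = {score:.3f}")
```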

  5. Model performance evaluation

Models were assessed using a comprehensive suite of classification metrics:

  • Accuracy, precision, recall, and F1-score.

  • Confusion matrix to examine class-level performance.

  • 5-fold cross-validation to ensure generalizability.

  • Area under the receiver operating characteristic curve (AUC-ROC).

  • Feature importance plots based on SHapley Additive exPlanations (SHAP) values, used to interpret variable contributions in the tree-based models.
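The numeric metrics in this list can be obtained with scikit-learn as sketched below, again on synthetic data. The weighted averaging of precision, recall and F1 and the one-vs-rest macro AUC are assumptions, since the averaging schemes are not stated; SHAP plots are omitted from the sketch.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score, confusion_matrix, precision_recall_fscore_support, roc_auc_score,
)
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

X, y = make_classification(
    n_samples=600, n_features=6, n_informative=4, n_redundant=1,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)
model = RandomForestClassifier(n_estimators=100, random_state=0)

# 5-fold cross-validation on the training portion only.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_accuracy = cross_val_score(model, X_train, y_train, cv=cv, scoring="accuracy")

# Refit on the full training set and score once on the held-out test set.
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)

accuracy = accuracy_score(y_test, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="weighted", zero_division=0
)
cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted class
auc = roc_auc_score(y_test, y_proba, multi_class="ovr", average="macro")

print("CV accuracy per fold:", cv_accuracy.round(3))
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f} auc={auc:.3f}")
print(cm)
```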

  6. Final implementation and internal validation on a held-out test set

The best-performing model was then evaluated on the held-out test set to assess its robustness and generalization capability. The final model’s implications were discussed in terms of its clinical applicability and potential for integration into decision support systems in public health and preventive medicine frameworks.

Ethical considerations

This study was conducted in full compliance with the ethical principles outlined in the Declaration of Helsinki40 and the International Ethical Guidelines for Health-related Research Involving Humans issued by the Council for International Organizations of Medical Sciences (CIOMS)41. Prior to data collection, the study and evaluation protocols were approved by the Scientific Ethics Committee of the Universidad Viña del Mar (Code R62- 19a). All participants were thoroughly informed about the study’s aims, procedures, potential risks, and benefits, and provided written informed consent before inclusion. Confidentiality and anonymity were maintained by assigning coded identifiers and securing data in password-protected files accessible only to authorized personnel. Participants were informed of their right to withdraw from the study at any time without consequences. Furthermore, all assessments were conducted under standardized and safe conditions, with trained healthcare professionals present to ensure participants’ well-being and adherence to ethical protocols.

Results

Table 1 presents descriptive statistics (mean ± standard deviation) for body composition indices stratified by fat classification (normal, high, very high). Individuals in the “very high” fat category showed the highest BMI mean (29.7 ± 3.0) and FMI (9.8 ± 2.4), whereas the “normal” fat group exhibited lower values in both parameters (BMI: 25.4 ± 2.5; FMI: 5.6 ± 1.6). Notably, the SMI and muscle mass index remained relatively stable across groups, with minimal variation. Similarly, the FFMI showed marginal differences between categories. These findings suggest that increases in fat classification are primarily associated with adiposity-related measures, while lean mass components remain comparatively consistent.

Table 1 Comparison of the various indexes used according to the level of obesity classification.

Six supervised learning algorithms were compared for a multiclass classification (three classes: normal, high, and very high), with their performance evaluated in terms of accuracy, precision, recall, F1-score, training time, and AUC-ROC. The results are summarized in Table 2.

Table 2 Comparison of the quality of the machine learning models.

The results presented in Table 2 indicate that the random forest model achieved the best overall performance, with an accuracy of 84.21%, a precision of 83.66%, a recall of 84.21%, and an F1-score of 83.74%, outperforming all other evaluated models. Gradient boosting and k-nearest neighbors also demonstrated competitive performance, each yielding F1-scores above 80%. In contrast, the support vector machine model exhibited the lowest performance, with an F1-score of only 60.96%, highlighting its limited ability to handle the complexity of the classification task. These findings support the selection of tree-based models, particularly random forest, as the most suitable approach for the multiclass classification task in this study, due to their robustness, stability, and ability to generalize effectively in the presence of nonlinear relationships.
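The reported recall (84.21%) is identical to the reported accuracy (84.21%). That is what support-weighted averaging of per-class recall produces by construction, although the averaging scheme is not stated in the text, so this reading is an assumption. The sketch below checks the identity on a made-up confusion matrix; the counts are illustrative and unrelated to Table 2.

```python
def weighted_metrics(confusion):
    """Accuracy plus support-weighted precision, recall and F1.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[k][k] for k in range(n_classes)) / total

    precision = recall = f1 = 0.0
    for k in range(n_classes):
        true_positive = confusion[k][k]
        support = sum(confusion[k])  # samples whose true class is k
        predicted = sum(confusion[i][k] for i in range(n_classes))
        p_k = true_positive / predicted if predicted else 0.0
        r_k = true_positive / support if support else 0.0
        f_k = 2 * p_k * r_k / (p_k + r_k) if (p_k + r_k) else 0.0
        weight = support / total
        precision += weight * p_k
        recall += weight * r_k
        f1 += weight * f_k
    return accuracy, precision, recall, f1


# Hypothetical counts for the classes normal, high and very high.
confusion = [
    [80, 15, 5],
    [12, 60, 8],
    [3, 7, 50],
]
accuracy, precision, recall, f1 = weighted_metrics(confusion)
print(f"accuracy={accuracy:.4f} recall={recall:.4f} "
      f"precision={precision:.4f} f1={f1:.4f}")
```

Each class contributes (support / total) × (true positives / support) to the weighted recall, so the supports cancel and the sum reduces to total correct predictions over total samples, which is the accuracy.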

Fig. 1 Comparison of area under the receiver operating characteristic curves across machine learning models and obesity categories (normal, high, very high).

In Fig. 1 the AUC-ROC curves by model and class reveal that tree-based models, particularly random forest and gradient boosting, exhibit superior discriminative capacity across the three evaluated classes, consistently approaching the upper-left corner of the graph. This pattern reflects a high true positive rate coupled with a low false positive rate, indicative of excellent classification performance. In contrast, the support vector machine model displays noticeably flatter curves, especially for classes 0 and 1, suggesting a limited ability to correctly distinguish between classes. K-nearest neighbors, logistic regression, and decision tree show intermediate performance, with moderately high AUC-ROC curves for some classes but lacking the consistency and robustness observed in ensemble methods. Collectively, these results reinforce the conclusion that random forest provides the best balance between sensitivity and specificity in a multiclass context, further supporting its position as the most robust and generalized model for classifying fat level categories.

Fig. 2 SHapley Additive exPlanations value summary of feature contributions to obesity classification in a single prediction. BMI, body mass index.

Figure 2 presents a local SHAP explanation for a specific model prediction, illustrating how the final output f(x) = 0.91, starting from a baseline value of E[f(x)] = 0.225, results from the cumulative contributions of various input features. The FMI emerges as the primary positive driver, contributing +0.18 to the final probability, followed by the FFMI, BMI, and MMI, each contributing between +0.13 and +0.14. The SMI also exerts a positive influence (+0.09), while gender shows no significant impact. This analysis highlights that the high predicted probability is predominantly driven by features related to body composition, particularly fat and muscle mass, thereby supporting the physiological validity of the model’s behavior in classifying fat level categories.
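SHAP values are additive: the baseline plus the sum of the feature contributions reproduces the model output for that instance. As a quick arithmetic check, the sketch below adds up the contributions reported for this prediction, taking the per-feature values for FFMI, BMI and MMI from the description of Fig. 4. The variable names are illustrative.

```python
# Baseline E[f(x)] and per-feature SHAP contributions as reported in the text.
baseline = 0.225
contributions = {
    "FMI": 0.18,
    "FFMI": 0.14,
    "BMI": 0.14,
    "MMI": 0.13,
    "SMI": 0.09,
    "Gender": 0.00,
}

# Additivity: f(x) = E[f(x)] + sum of SHAP values for the instance.
reconstructed = baseline + sum(contributions.values())
print(f"reconstructed f(x) = {reconstructed:.3f}")
```

The total is approximately 0.905, which agrees with the reported output of 0.91 once the rounding of the individual contributions is taken into account.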

Fig. 3 Cumulative SHapley Additive exPlanations value plot showing feature impact on model output for obesity prediction.

Figure 3 shows a cumulative SHAP graph illustrating how each feature progressively contributes to the model’s prediction of a specific instance. The final model output converges to approximately 0.91, starting from a baseline value of around 0.225, which represents the model’s average output in the absence of individualized input. The strongest contributor to the elevated prediction is the FMI, followed by the FFMI, BMI, muscle mass index, and SMI, all of which incrementally drive the prediction upward. Gender appears once again as a neutral variable, with no meaningful impact. A color transition from blue (indicating negative impact and low values) to red (positive impact, high values) visually confirms that higher values in these physiological features are strongly associated with the predicted classification. This plot reinforces the interpretation that body composition is the primary determinant in the model’s output and that the model aligns with biologically grounded metrics in predicting body fat levels.

Fig. 4 Feature importance ranked by SHapley Additive exPlanations values for a single prediction in obesity modeling. BMI, body mass index; SHAP, SHapley Additive exPlanations.

The SHAP bar chart in Fig. 4 shows the individual contributions of the features to a specific prediction, ranked by importance. The FMI stands out as the most influential variable, contributing +0.18 to the final classification probability, followed by the FFMI and BMI, each contributing +0.14. These values suggest that the model relies not only on the absolute fat content but also on the balance between fat and lean mass in making its decision. Next in importance are the MMI (+0.13) and the SMI (+0.09), indicating that muscularity also plays a role in the model’s output. Gender shows no contribution, suggesting that the model does not exhibit bias toward this variable in this instance. Overall, this visualization reinforces the physiological coherence of the model: it predicts higher body fat levels when confronted with elevated values in key indicators of total body mass, fat, and muscle.

Fig. 5 SHapley Additive exPlanations summary plot depicting feature impact and value distribution on obesity classification. BMI, body mass index; SHAP, SHapley Additive exPlanations.

The SHAP beeswarm plot in Fig. 5 shows the impact of each feature on the model’s output across multiple instances, capturing both the magnitude and direction of influence. The FMI emerges as the most critical variable, with high values (shown in red) associated with positive SHAP values, thus increasing the model’s predicted probability. In contrast, low values (blue) correspond to negative contributions. A similar, though less pronounced, pattern is observed for the FFMI, BMI, and muscle mass index, all of which show moderate dispersion, suggesting a consistent and physiologically plausible influence on the model’s predictions. Gender and SMI, by comparison, display low impact, with SHAP values clustered around zero, indicating minimal or negligible overall effect on the classification outcome.

Fig. 6 SHapley Additive exPlanations heatmap showing feature contributions across individuals ordered by model output in obesity prediction. BMI, body mass index; SHAP, SHapley Additive exPlanations.

The SHAP heatmap in Fig. 6 visualizes the cumulative impact of each feature on model predictions across multiple instances. The top panel displays the variation in the model’s output f(x), while the heatmap below encodes the SHAP values of individual features using a colour scale, blue indicating negative contributions and red indicating positive ones. The FMI stands out as the most influential variable, with consistently strong positive effects (intense red) in instances where the model output is high. It is followed in importance by BMI, FFMI, and MMI, all of which exhibit increasing contributions in higher predictions, albeit with smaller magnitude. In contrast, Gender and SMI display near-neutral effects, with SHAP values close to zero. This visualization reinforces the conclusion that the model’s predictions are primarily driven by body composition variables, especially FM, and that its behavior remains consistent and physiologically plausible across the evaluated population.

Discussion

The results of the present study show that decision tree-based models, notably the random forest algorithm, performed best in classifying obesity levels, with superior values for accuracy (84.2%) and AUC-ROC (0.947). Another study42 evaluated the effectiveness of non-dietary factors, such as lifestyle, family history and demographic characteristics, in predicting obesity using ML models. In data from more than 2,000 individuals aged 14–61 years, several algorithms were tested, and random forest was the most accurate, with an AUC-ROC of 92.3% and an accuracy of 66.9%, demonstrating that it is possible to detect obesity without resorting to dietary data, which may facilitate more accessible and earlier preventive assessments43. Similar results were reported in a study that developed a prediction model for obesity levels using nine ML algorithms43, in which the random forest algorithm performed best, with an accuracy of 92.29%. Also, Dirik44 showed that random forest achieved the highest accuracy with 95.78%, while logistic regression followed closely with 95.22%. These findings are consistent with previous research indicating the robustness of ensemble models to non-linear relationships and multivariate data in public health45,46.

Moreover, the relative importance of the predictor variables, assessed by SHAP values, indicated that the most influential indices were FMI, FFMI and BMI. This result reaffirms the relevance of directly measuring body composition rather than relying exclusively on BMI to characterize excess fat47,48. Studies have shown that FMI provides a more accurate measure of adiposity and is more strongly associated with health risks such as hypertension, type 2 diabetes and cardiovascular disease compared to BMI alone49,50,51. In a previous study, Górnika et al.16 reported that FMI and FM percentage could be considered the best markers for the detection of obesity in adults, independent of sex. The low impact of sex as a predictor variable in our study is also in line with studies reporting that height-adjusted body composition can eliminate gender-related biases52. Furthermore, cross-sectional research has found that a high FMI is positively associated with a higher prevalence of metabolic syndrome independently of BMI and body fat percentage53.

The SHAP value is a unified measure of the importance of features used in ML models54,55. The use of explanatory tools such as SHAP therefore adds significant value to ML models in healthcare by providing insight into how and why a given classification occurs, improving clinical confidence and algorithmic transparency56. Furthermore, it addresses the problem of poor readability by making the model established by ML easier to interpret and to apply to the early detection, monitoring and intervention of obesity57. Lin et al.57 also used SHAP values to visualize the effects of features in their model; their results showed that waist circumference had a positive impact on predictive power and that female sex was a positive predictor of obesity, although their sample consisted of individuals with overweight.

It is striking that in our study the SMI showed almost neutral effects as a predictor of obesity, with SHAP values close to zero, below even those of BMI and FFMI. These results are unexpected considering that skeletal muscle plays a key role in health and metabolic efficiency58. The FFMI, in turn, had a SHAP value of +0.14, second only to the FMI as a contributor to the prediction of obesity. A higher FFMI is related to better physical fitness, higher metabolic rate and better overall health outcomes, suggesting that individuals with a higher ratio of FFM (lean tissue such as muscle and bone) to FM may have a more favourable fat distribution59. This is associated with better metabolic health, as lean mass, particularly muscle mass, is associated with a lower risk of metabolic syndrome, better insulin sensitivity and better overall physical function60. This highlights the importance of considering both fat and lean mass when assessing body composition, as different proportions of fat and lean tissue may have different implications for health59,61,62.

This study makes a significant contribution to the field of health and AI applied to diagnosis by demonstrating the effectiveness of supervised ML algorithms for classifying obesity levels based on anthropometric indices obtained through BIA. The main contributions include:

  • Validation of the BIA-based approach: it was demonstrated that bioelectrical impedance-derived indices, such as the FMI, FFMI, and BMI, are highly predictive variables for classifying obesity levels. This supports the use of BIA as a non-invasive and efficient tool in clinical and community settings.

  • Identification of the best predictive model: When comparing different supervised algorithms, the random forest model stood out as the most effective, achieving an accuracy of 84.2%, an F1 score of 83.7% and an AUC-ROC value of 0.947. This demonstrates its robustness and reliability for this type of classification task.

  • Application of model interpretability: By analyzing SHAP values, the study identified the most influential variables in the model’s predictions, highlighting the greater predictive value of body composition indices compared to demographic variables such as sex, which showed minimal predictive impact. This transparent interpretation facilitates confidence in the model among health professionals.

  • Contribution to personalized and preventive medicine: By establishing an accurate approach to classifying obesity levels using individual variables derived from BIA, the study offers a potential tool to support personalized clinical decisions aimed at the prevention and early management of overweight and obesity.

  • Advancing the integration of AI in healthcare: The work represents a breakthrough in the effective integration of AI techniques in the biomedical field, demonstrating that supervised models can not only automate classification tasks but also improve understanding of the factors underlying complex conditions such as obesity.

Since BIA-derived anthropometric indices are obtained rapidly and with minimal operational burden, the values could be integrated in an automated fashion into electronic medical records to generate real-time risk stratification, trigger clinical decision alerts (e.g., referral for nutritional counseling, metabolic assessment, or intensive follow-up), and support tiered resource allocation. At the public health level, the algorithm could be incorporated into screening campaigns in schools, community primary care or workplace settings, provided that standardized measurement protocols, calibration between devices and action thresholds adapted by age, sex and epidemiological context are established. It is essential to consider data interoperability, multicenter external validation prior to deployment, cost-effectiveness analysis and equity surveillance (to ensure that historically underserved populations are not excluded or misclassified). Framed in this way, the findings represent not only an algorithmic proof of concept but also a potential practical component of early obesity prevention and management networks.

Limitations

There are some limitations to this study. First, although a large sample was used, the sampling method was not probabilistic, which limits the generalizability of the results to the entire population and may have led to the overrepresentation or underrepresentation of certain population subgroups. Second, anthropometric data were collected at a single point in time, preventing the assessment of longitudinal changes. In addition, although the algorithms were evaluated by cross-validation, external studies in other populations and contexts would be required to verify their general applicability. The model did not include metabolic or clinical variables (e.g. lipids or glucose), which could have enriched the prediction, considering that obesity is a heterogeneous clinical entity with distinct subtypes based on genetic architecture and phenotypic biomarkers including measures of insulin sensitivity, glycaemia, fitness, body composition and cardiovascular risk63. Finally, despite the relatively large sample size, the results were not stratified by factors such as age group, sex, socioeconomic status, or race/ethnicity, further limiting the generalizability of the findings to other populations or settings.

Conclusions

This study demonstrates that supervised ML algorithms, particularly random forest, are effective and accurate tools for classifying obesity levels from multiple anthropometric indices. The incorporation of explanatory models such as SHAP allows for a clear interpretation of the factors influencing the classification, promoting a safer and more understandable application in clinical or public health contexts. The multivariate and interpretable model-based approach represents a relevant step towards personalized medicine, where decisions on nutritional diagnosis can be based on more complex and representative models than traditional BMI.