Introduction

The impacts of climate change have become increasingly evident across the globe, particularly in developing regions like sub-Saharan Africa (SSA), where populations are highly vulnerable to climatic shifts. As global temperatures rise, projections indicate that food production will decline, exacerbating the risk of malnutrition, particularly among vulnerable groups such as children, women, and the elderly1,2,3,4.

The nutritional status of under-five children, measured here by underweight, wasting, and stunting, is influenced by various environmental, social, and economic factors, with climate change playing a critical role5. In SSA, where the majority of the population relies on subsistence farming and natural resources for their livelihoods, even small changes in temperature can disrupt food production cycles and contribute to food shortages6,7,8. The direct and indirect consequences of climate change, including reduced agricultural productivity, increased food prices, and the degradation of food quality, are expected to have profound effects on nutritional outcomes in this region.

Research indicates that temperature increases can adversely affect crop yields through various mechanisms, including heat stress, changes in precipitation patterns, and increased pest and disease pressure9,10. For instance, studies have shown that maize, a dietary staple in many SSA countries, exhibits reduced yields with even modest increases in temperature11,12. This decline in agricultural output not only reduces food availability but can also lead to increased food prices, further limiting access to essential nutrients for low-income households. Beyond direct effects on food production, rising temperatures can indirectly influence nutritional status by affecting water quality and availability, which are crucial for both food production and human health13,14. This relationship between climatic factors and nutrition underscores the urgent need to understand how rising temperatures impact food systems and nutritional health in SSA.

While much research has focused on the effects of climate change on food systems, fewer studies have employed advanced analytical techniques, such as machine learning, to explore the implications for nutritional outcomes15. Climate change poses a growing threat to food systems in SSA by disrupting agricultural productivity, food availability, and dietary diversity. These disruptions heighten the risk of child undernutrition, particularly stunting, wasting, and underweight, which remain persistent public health challenges in the region. Understanding these complex and multi-dimensional relationships requires analytical approaches capable of capturing non-linear and interacting effects. Machine-learning methods offer a powerful complement to traditional regression models by identifying hidden patterns and heterogeneous vulnerabilities that shape how climate stressors influence nutritional outcomes16,17,18,19. Given sufficiently large datasets, such as satellite-derived temperature records from locations worldwide and health indicators from the Demographic and Health Surveys (DHS), machine learning can identify trends and correlations that may not be immediately apparent through traditional statistical methods. By combining data on temperature changes, agricultural yields, and nutritional indicators, machine learning can facilitate a more nuanced understanding of the potential consequences of rising temperatures on nutritional outcomes20.

Rising temperatures may compromise the nutritional status of children under five in SSA. We hypothesize that each 1 °C increase in mean temperature will reduce dietary diversity and increase the risks of stunting, wasting, and underweight. Using supervised machine-learning, we aim to uncover non-linear associations and interactions across socioeconomic gradients that conventional regression models may miss. This approach will reveal heterogeneous vulnerability patterns and identify sub-populations most at risk, providing critical evidence to inform targeted interventions against climate-driven nutritional deficits.

This study aims to predict the effects of rising temperatures as a climatic factor on nutritional status in SSA using both empirical and machine learning approaches. The findings of this research provide valuable insights for policymakers, healthcare providers, and agricultural experts working to mitigate the health impacts of climate change and support sustainable development in the region.

Results

Descriptive data

The study included DHS data on under-five children across 22 SSA countries. The average maternal age was 29 years; 39.1% of mothers had no formal education and 34.6% had primary education. Most participants (69.9%) resided in rural areas. Nutritional status indicators revealed that 33.6% of the children were stunted, 16.6% were underweight, and 7.8% were wasted, indicating a high prevalence of malnutrition in the population. The demographic characteristics of the study population are summarized in Table 1.

Table 1 Descriptive statistics of the study participants (N = 345837).

Meteorological data

The analysis of meteorological data from the ERA5 dataset showed a consistent increase in average temperatures over the past 18 years. The average temperature during the study period was 23.56 °C, with variability in temperature throughout the year. Temperature records indicate that 2006 and 2023 had elevated average temperatures across the 22 African countries, providing further evidence of the ongoing impact of climate change on regional climatic conditions, as shown in Fig. 1.

Fig. 1

(A, B) Average temperature from 2005 to 2023 and temperature variability by year, respectively, across 22 SSA countries.

Machine learning model performance

Machine learning models were developed to predict the effects of rising temperatures on nutritional status indicators. A total of six models were evaluated (Random Forest, Support Vector Machine, K-Nearest Neighbors, Logistic Regression, XGBoost, and Decision Tree), with cross-validation applied to assess model performance. The best-performing model for underweight achieved an accuracy of 0.7832, indicating strong predictive capability. For stunting, the Random Forest classifier outperformed all other models, achieving an accuracy of 0.7023, a recall of 0.6605, and a precision of 0.6801, as shown in Table 2. Its Area Under the ROC Curve (AUC) was 0.7207, indicating good overall model discrimination. ROC curves were generated to visually assess classification performance, as shown in Fig. 2.

Table 2 Evaluation metrics for predicting nutritional status across all 22 countries.
Fig. 2

ROC curves for predicting child nutritional status using the best-performing algorithms across all 22 countries (corresponding to Table 2).

Country-specific machine learning model performance

We trained and evaluated predictive models for stunting, underweight, and wasting separately for each country with sufficient DHS data collected between 2015 and 2023. Performance metrics, including accuracy, area under the ROC curve (AUC), precision, recall, and F1-score, varied considerably across countries. For instance, in Nigeria, the XGBoost model achieved the highest accuracy (0.9002) and AUC (0.8049) for predicting stunting, indicating strong predictive power. In contrast, Burundi showed lower performance across all outcomes, potentially reflecting greater heterogeneity or a limited sample size, as shown in Table 3.

Table 3 Country-specific machine learning model performance.

Across all countries, predictive performance was consistently higher for stunting than for wasting or underweight, indicating that stunting was more effectively captured by the available predictors. The application of SMOTE balanced class distributions and led to measurable improvements in recall and F1-scores. Overall, model performance achieved reasonable accuracy in several countries but varied across settings, likely reflecting differences in data quality, sample size, and feature relevance.

Causal–effect relationship

The regression results in Fig. 3 revealed a statistically significant adverse relationship between rising temperatures and nutritional status indicators. Specifically, for every 1 °C increase in average temperature, the odds of stunting increased with an odds ratio (OR) of 1.01 (95% CI 1.00, 1.10), the odds of underweight with an OR of 1.03 (95% CI 1.01, 1.06), and the odds of wasting with an OR of 1.10 (95% CI 1.08, 1.12); all p-values were < 0.005. These findings support the hypothesis that rising temperatures adversely affect the nutritional status of under-five children in SSA.

Fig. 3

Association between temperature variability and nutritional status across 22 countries. Note: the model was adjusted for place of residence, source of cooking fuel, access to safe water, access to a toilet facility, mother's educational level, mother's age, and sex of the child.

Further analysis revealed that socioeconomic factors, such as household income and maternal education, also significantly influenced nutritional outcomes. Higher maternal education was associated with lower odds of stunting (OR 0.86, 95% CI 0.82, 0.96), while components of the household wealth index, such as access to a toilet, access to safe water, and use of clean cooking fuel, were associated with improved nutritional status among children, as shown in Table S2 of the Supplementary materials. Interaction terms between temperature and socioeconomic factors were included in the models, revealing that vulnerable populations are disproportionately affected by rising temperatures.

Country-specific analysis for the causal–effect relationship

The country-specific causal analysis demonstrated that rising average temperature is significantly associated with increased risk of childhood stunting, underweight, and wasting, though the strength of these associations varied across the 22 countries studied. For example, in Burkina Faso, a 1 °C increase in temperature was associated with increased odds of wasting (OR: 1.42, 95% CI: 1.36–1.52) and underweight (OR: 1.11, 95% CI: 1.10–1.12). In Ethiopia, a similar temperature rise was linked to higher odds of wasting (OR: 1.11, 95% CI: 1.06–1.16). Sierra Leone also exhibited strong associations, with the highest odds ratios observed for stunting (OR: 1.08, 95% CI: 1.03–1.14). In contrast, some countries, such as Gabon and Kenya, showed smaller or non-significant associations (e.g., Kenya, stunting OR: 1.02, 95% CI: 0.98–1.06), suggesting possible mitigating effects from local interventions or socio-economic factors. While nearly all countries exhibited a positive relationship between temperature rise and adverse nutritional outcomes, the magnitude was context-dependent as indicated in Fig. 4. These findings emphasize the need for tailored, data-driven interventions at the country level, factoring in the heterogeneous impacts of climate variability on child nutrition across SSA.

Fig. 4

Country-specific analysis of the association between temperature variability and nutritional status. Note: the model was adjusted for place of residence, source of cooking fuel, access to safe water, access to a toilet facility, mother's educational level, mother's age, and sex of the child.

Discussion

Our study provides a comprehensive evaluation of multiple machine learning algorithms, including Random Forest, Support Vector Machine (SVM), K-Nearest Neighbors, XGBoost, Decision Tree, and Logistic Regression, to predict key nutritional outcomes (stunting, underweight, and wasting) among children under five across 22 SSA countries. In addition, a causal-effect analysis was performed to assess the association between annual temperature variability and nutritional status. We integrated demographic, economic, health, and weather data to identify robust, generalizable models that can inform targeted policy and intervention strategies for malnutrition reduction in SSA.

Our evaluation of multiple machine learning algorithms demonstrated that ensemble-based approaches, particularly XGBoost and Random Forest, provided superior predictive performance for childhood nutritional status across 22 SSA countries, especially in relation to the influence of rising temperatures. Consistent with prior research showing the advantages of ensemble methods for complex health prediction tasks15,19,21,22, XGBoost achieved the highest overall accuracy for underweight (0.7832), while Random Forest attained top performance for stunting prediction (accuracy = 0.7023; recall = 0.6605; precision = 0.6801; AUC = 0.7207). These results are in line with findings by Khudri et al. (2023)15 and Talukder et al. (2020)23, who reported increased accuracy and stability of supervised machine learning models in nutritional and health data contexts relative to traditional logistic regression. While model performance for all algorithms was generally robust for underweight and stunting, the predictive accuracy for wasting remained lower, reflecting well-documented challenges in capturing acute and often rapidly changing health outcomes19,24,25. Our use of cross-validation across the multi-country dataset further supports the generalizability and reliability of these results, reducing the risk of overfitting and supporting the models’ suitability for large-scale nutritional surveillance25,26. The ROC curves generated for our best-performing models confirmed good discrimination ability and support the implementation of machine learning techniques for timely identification of at-risk populations. However, as in previous studies, model accuracy remains subject to the availability and quality of input features, highlighting the ongoing need for enhanced health, demographic, and climate data integration27,28. Our findings reinforce the growing body of evidence supporting machine learning approaches for nutritional risk prediction in dynamic and resource-limited environments.

Our study reveals substantial heterogeneity in model performance across countries and nutritional outcomes. While stunting predictions were most accurate overall (mean AUCs generally above 0.75), wasting and underweight were more challenging to predict, likely owing to differences in data quality or reporting. Notably, the inclusion of meteorological data only marginally improved model performance, suggesting that, while weather factors may contribute to acute malnutrition, structural and household-level factors remain dominant predictors, especially for chronic outcomes like stunting. Still, further research leveraging finer-grained, high-frequency environmental data could yet uncover meaningful predictors at finer temporal or spatial scales. Consistent with prior studies19,28, our results demonstrate that supervised machine learning approaches, particularly Random Forest and XGBoost, outperform traditional logistic regression in most settings. In Uganda, for instance, the Random Forest model achieved particularly high predictive accuracy for stunting (AUC = 0.8532), outperforming all other approaches. Similar patterns were observed in Malawi and Lesotho, where supervised machine learning models consistently yielded strong performance across all nutritional outcomes. These findings align with the growing consensus that classification models are better equipped to capture the complex, multifactorial nature of nutritional status among children, reflecting intricate interplays between socio-economic, demographic, and environmental determinants29,30,31,32,33,34. Furthermore, our findings are broadly consistent with studies in SSA and South Asia that report rising temperatures as a risk factor for stunting and underweight in children35,36,37. Unlike these earlier studies, our approach integrates machine-learning methods with causal inference, allowing the detection of non-linear associations, interaction effects across socioeconomic gradients, and heterogeneous vulnerability patterns. This highlights both the generalizability of climate-related nutritional risks and the added value of advanced analytical approaches for identifying the most vulnerable sub-populations.

A unique contribution of our work is the integration of causal-effect modeling to examine the impacts of temperature variability on nutrition. The regression analysis revealed a statistically significant adverse association between rising temperatures and the nutritional status of children under five. Specifically, each 1 °C rise in average temperature was associated with higher odds of stunting (OR 1.01, 95% CI: 1.00–1.10), underweight (OR 1.03, 95% CI: 1.01–1.06), and wasting (OR 1.10, 95% CI: 1.08–1.12; all p < 0.005). Although the odds ratios for temperature effects are small and the individual clinical impact remains limited, even modest increases in risk may have meaningful public health implications given the large population of children exposed. The observed association between rising temperatures and poor nutritional outcomes may be explained by multiple interlinked mechanisms. Elevated temperatures are known to reduce crop yields and livestock productivity, thereby lowering household food availability and dietary diversity. This exacerbates food insecurity and may lead families to rely on cheaper, less nutritious diets, increasing the risk of stunting and underweight. Warmer conditions also favor the transmission of diarrheal and vector-borne diseases, which impair nutrient absorption and increase energy demands in children. Furthermore, higher temperatures often intensify water scarcity, undermining hygiene and sanitation practices and thereby compounding infection risks. Our results are consistent with previous studies investigating the impacts of climate change on malnutrition in SSA38,39,40,41,42. These findings robustly support the hypothesis that climate variability is exerting additional pressure on already vulnerable child populations, compounding existing risks from poverty, food insecurity, air pollution and poor health infrastructure43,44. Our country-specific models further suggest that these associations are most pronounced in lower-income and lower-education settings, confirming that socioeconomic disadvantage amplifies children’s susceptibility to environmental stresses45,46. However, certain country-level estimates, such as the elevated odds of wasting observed in Burkina Faso, warrant cautious interpretation, as they may arise from true contextual heterogeneity, variability in sample size, or potential data quality limitations.

The association between socioeconomic factors and nutritional outcomes remains strong throughout the dataset. Higher household income and maternal education levels were independently associated with decreased rates of stunting and underweight, consistent with previous studies47,48,49, underscoring the protective effect of social and economic development50. The observed interaction between temperature rise and these socioeconomic variables suggests that future nutrition interventions must be holistic, addressing both structural inequalities and environmental challenges to maximize their impact.

A notable strength of this study is the unprecedented scale and scope, incorporating data from 345,837 participants across 22 SSA countries and spanning nearly two decades (2005–2023). The use of multiple machine learning algorithms, advanced cross-validation techniques, and integration of socioeconomic, demographic, and meteorological data provides a comprehensive and robust framework for predicting child nutritional outcomes. By comparing six different machine learning models, our study offers nuanced insights into model strengths and limitations for diverse malnutrition indicators. Furthermore, the simultaneous analysis of causal relationships between rising temperatures and child malnutrition using large datasets enables a deeper understanding of climate-related health vulnerabilities across SSA. This combination of methodological rigor, extensive geographic coverage, and integrated climate-health assessment sets our study apart and significantly advances the evidence base for data-driven nutrition and public health policies in the region.

Future research should focus on further enriching models with granular spatial, temporal, and behavioral inputs, testing their real-time surveillance value, and piloting targeted interventions guided by model outputs. Additionally, expansion to incorporate broader environmental, food system, and policy variables across different regions will provide actionable insights to drive multisectoral efforts against child malnutrition in SSA.

Despite the strengths and novel insights of this analysis, several limitations should be acknowledged. First, although the DHS and ECMWF datasets are the gold standard in their domains, inconsistencies in data quality, variable measurement, and survey intervals across countries may introduce bias or limit generalizability. Second, while our models capture associations, the use of cross-sectional secondary data restricts causal inference and precludes the assessment of temporal ordering or unmeasured confounding. Third, DHS cluster coordinates are displaced by up to ~5 km for confidentiality; because most clusters remain in the same area, spatial mismatches with temperature data are likely minimal. Fourth, K-Nearest Neighbors and Support Vector Machine (SVM) classified poorly during prediction, which could be related to the quality of the datasets used. Fifth, although overall model accuracies were reasonable, predictive performance for acute outcomes such as wasting was modest, reflecting inherent challenges of machine-learning approaches in predicting low-prevalence or highly variable events. Finally, while machine learning models enhance predictive accuracy, they are often complex and less interpretable for policymakers. Future research should aim to develop user-friendly, explainable models that can be readily deployed by national and subnational health authorities.

In conclusion, our study demonstrates that integrating machine learning, causal inference, and environmental data substantially improves the prediction and understanding of child nutritional outcomes in SSA. We find that rising temperatures increase nutritional risks, particularly among socioeconomically vulnerable children, highlighting the need for targeted interventions such as region-specific nutrition programs and climate-resilient agricultural practices. Children in areas with limited access to resources are disproportionately affected, underscoring the importance of multisectoral strategies that integrate health, agriculture, and social protection measures. Investments in high-quality, harmonized datasets and capacity building for local decision-makers are essential to translate these modeling insights into sustainable improvements in child nutrition across the region.

Materials and methods

This study employs a quantitative research design to examine the effects of rising temperatures as a climatic factor on the nutritional status of under-five children in SSA. The analysis utilizes DHS data alongside meteorological data from the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA5 dataset. In this study, machine-learning models are used solely for prediction and to explore non-linear associations and interactions, whereas regression-based analyses are employed to estimate causal effects of temperature on nutritional outcomes.

The primary source of data for nutritional status indicators is the DHS database, which provides comprehensive information on health, nutrition, and demographic characteristics of populations in developing countries. The DHS are large-scale, nationally representative surveys that gather data on health, population, and nutrition indicators. Geographic coordinates for each survey cluster are collected on-site using GPS devices by trained survey teams, enabling spatial analysis of health and demographic patterns across regions. For this study, we focused specifically on the DHS data pertaining to under-five children across 22 SSA countries (Benin, Burkina Faso, Burundi, Cameroon, Democratic Republic of the Congo, Ivory Coast, Ethiopia, Gabon, Ghana, Guinea, Kenya, Lesotho, Liberia, Malawi, Mali, Nigeria, Rwanda, Sierra Leone, Tanzania, Uganda, Zambia, and Zimbabwe). We combined all standard surveys conducted in each country between 2005 and 2023, as listed in Table S1 of the Supplementary materials. Key variables extracted from the DHS included measurements of height, weight, and age, which were used to calculate height-for-age (stunting), weight-for-age (underweight), and weight-for-height (wasting) z-scores using the World Health Organization (WHO) growth standards51. The outcome variables were wasting, stunting, and underweight, each coded as a binary variable using the standard WHO cutoff (z-score < − 2) to reflect clinically meaningful thresholds and to facilitate interpretation of model predictions. In addition, demographic variables such as the age and sex of the child, the socioeconomic status of the surveyed household, maternal education, and household characteristics were included to control for confounding factors, in line with previous studies conducted in different countries52.
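As an illustration of the outcome coding described above, the following minimal Python sketch applies the z-score < −2 cutoff to the three anthropometric indices. It assumes the z-scores have already been computed against the WHO growth standards. The names ChildZScores and code_outcomes and the example values are hypothetical and are not taken from the study's code.

```python
from dataclasses import dataclass

WHO_CUTOFF = -2.0  # z-score threshold stated in the text


@dataclass
class ChildZScores:
    """Hypothetical container for one child's anthropometric z-scores."""
    haz: float  # height-for-age z-score (stunting)
    waz: float  # weight-for-age z-score (underweight)
    whz: float  # weight-for-height z-score (wasting)


def code_outcomes(z: ChildZScores) -> dict:
    """Return 1 when a z-score falls strictly below the cutoff, else 0."""
    return {
        "stunting": int(z.haz < WHO_CUTOFF),
        "underweight": int(z.waz < WHO_CUTOFF),
        "wasting": int(z.whz < WHO_CUTOFF),
    }


# Illustrative values only, not study data.
example_child = ChildZScores(haz=-2.4, waz=-1.7, whz=-0.3)
print(code_outcomes(example_child))
```

In this sketch a child exactly at −2 is coded as 0, because the cutoff in the text is a strict inequality.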

Meteorological data were sourced from the ERA5 dataset, which provides high-resolution climate data for various atmospheric variables at a spatial resolution of 0.25° × 0.25° in longitude and latitude (grid cells of approximately 30 km × 30 km)52,53. This dataset includes daily average temperature and precipitation data, allowing for a comprehensive temporal analysis, and it has been used in many studies to assess the relation between climate change and health outcomes54. Data were aggregated to calculate monthly and annual averages for temperature, corresponding to the time periods of the DHS data collection. Meteorological data were merged with the DHS data based on geographical identifiers (e.g., cluster coordinates and ID) and time periods to ensure that the temperature data correspond to the timing of the health survey.
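The spatial and temporal linkage described above can be sketched roughly as follows. This is a simplified illustration rather than the pipeline used in the study: it assumes each cluster is matched to the nearest node of a regular 0.25° grid and that daily temperatures are averaged by calendar year. The function names, the cluster record, and the sample coordinates and temperatures are all hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

GRID_STEP = 0.25  # grid spacing in degrees, as stated in the text


def nearest_grid_node(lat, lon, step=GRID_STEP):
    """Snap a cluster coordinate to the nearest node of a regular grid."""
    return (round(lat / step) * step, round(lon / step) * step)


def annual_means(daily_records):
    """Average (date, temperature) pairs by calendar year."""
    by_year = defaultdict(list)
    for day, temp_c in daily_records:
        by_year[day.year].append(temp_c)
    return {year: mean(temps) for year, temps in by_year.items()}


# Hypothetical daily series for one grid node (illustrative values only).
daily = [
    (date(2020, 1, 1), 22.0),
    (date(2020, 7, 1), 24.0),
    (date(2021, 1, 1), 25.0),
]
temperature_by_year = annual_means(daily)

# Hypothetical survey cluster, linked by grid node and survey year.
cluster = {"cluster_id": 17, "lat": -1.95, "lon": 30.06, "survey_year": 2020}
cluster["grid_node"] = nearest_grid_node(cluster["lat"], cluster["lon"])
cluster["mean_temp_c"] = temperature_by_year[cluster["survey_year"]]
print(cluster)
```

A real linkage would hold one temperature series per grid node. The sketch uses a single series to keep the join logic visible.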

Data processing and analysis

Data preprocessing involved cleaning the DHS data to include only under-five children for whom information on nutritional status and geographical coordinates was present. We performed a complete-case analysis, excluding observations with missing values. The proportion of missing data was low, and cases with missing values were similar to those with complete data, so any resulting bias is expected to be minimal. Nevertheless, we acknowledge that this approach could introduce some bias, particularly in multi-country pooled analyses. Meteorological data were merged with the DHS data based on geographical identifiers (e.g., cluster coordinates and ID), as illustrated in Figure S1 of the Supplementary Materials, and on time periods, to ensure that the temperature data correspond to the timing of the health survey. Descriptive statistics were calculated for demographic and nutritional variables to provide an overview of the study population. Multiple supervised machine learning algorithms were employed to predict the effects of rising temperatures on nutritional status (underweight, stunting, and wasting), with models trained on a subset of the data and cross-validation used to assess their performance. The dataset was split into training and testing subsets using an 80:20 split (total N = 345837; training samples = 276669; testing samples = 69168), so that model validation was conducted on unseen data and overfitting could be detected. Socioeconomic variables including household wealth, maternal education, and access to water and sanitation were incorporated in both causal-effect and machine-learning models to account for confounding and to explore potential effect modification of temperature impacts on child nutritional outcomes.
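The 80:20 split arithmetic reported above can be checked with a short sketch. The function split_indices is a hypothetical stand-in for whatever splitting routine was actually used. Rounding the test-set size up is an assumption, chosen here because it reproduces the reported counts, and the seed is arbitrary.

```python
import math
import random


def split_indices(n_samples, test_fraction=0.2, seed=42):
    """Shuffle row indices and hold out a fraction of them for testing.

    The test size is rounded up (an assumption, see the lead-in).
    """
    n_test = math.ceil(n_samples * test_fraction)
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(345837)
print(len(train_idx), len(test_idx))  # 276669 69168
```

Because the two index lists are disjoint, no test observation can influence model fitting, which is the property the paragraph relies on.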

We implemented several machine learning classifiers, namely logistic regression, random forest, XGBoost, K-Nearest Neighbors, Support Vector Machine (SVM), and decision tree, to predict nutritional status. Logistic regression models the log-odds of the outcome as a linear function of predictors. Decision trees partition the feature space to reduce impurity, measured by the Gini index, and random forests aggregate predictions from multiple decision trees trained on bootstrap samples using majority voting55. XGBoost sequentially builds trees to minimize logistic loss with regularization56. SVM finds a hyperplane that best separates classes by maximizing the margin between them, optionally using kernel functions to handle nonlinear boundaries57. K-Nearest Neighbors assigns each observation to the majority class among its k closest training observations.
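To make the impurity criterion mentioned above concrete, the sketch below computes the Gini index of a node and the impurity reduction achieved by a candidate split. It is a textbook illustration with hypothetical function names and toy labels, not an excerpt from the libraries used in the study.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the children."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted


# Toy node: two malnourished (1) and two non-malnourished (0) children.
parent_node = [0, 0, 1, 1]
print(gini(parent_node))                               # 0.5
print(impurity_decrease(parent_node, [0, 0], [1, 1]))  # 0.5
```

A tree learner evaluates many candidate splits and keeps the one with the largest decrease. Averaging these decreases per feature across trees gives the Gini importance used for tree-based models.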

For model interpretability and to identify the most influential predictors of nutritional outcomes, we determined feature importance using two complementary strategies. In tree-based models (Random Forest, XGBoost), importance was quantified based on each feature’s mean decrease in impurity (Gini importance) or average gain across splits, as shown in Figure S1 of the Supplementary materials. For logistic regression, standardized coefficients were considered as indicators of predictor influence. After model fitting, features were ranked by their calculated importance, and the top contributors were visualized to provide actionable insights for policy and intervention priorities, following a previous study58. Potential multicollinearity among predictors was assessed in the logistic regression models to ensure robustness of feature selection, as shown in Table S3 of the Supplementary Materials.

To address class imbalance, particularly for the wasting and underweight outcomes, which had substantially fewer positive cases, we applied the Synthetic Minority Over-sampling Technique (SMOTE) prior to model training. SMOTE was applied to the training set only, creating synthetic examples in the minority class by interpolating between existing observations and their nearest neighbors in feature space, as described in previous studies59,60. This approach improved the balance between classes, ultimately enhancing model recall and F1-score, especially for outcomes with severe imbalance. Because synthetic data can increase the risk of overfitting and may distort the original distribution, all models were evaluated on an independent test set that was not exposed to the oversampling process. This ensures that the reported model performance reflects the ability to generalize to unseen, real-world data rather than being influenced by synthetic samples.
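The interpolation step at the heart of SMOTE can be sketched in a few lines. This is a simplified, standard-library illustration of the idea described above, not the implementation used in the study. The function name smote_sketch, the default k, the seed, and the toy points are hypothetical.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=2, seed=0):
    """Create synthetic minority points by interpolating toward near neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority observations")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Candidate neighbours: every other minority point, closest first.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (nb - b) for b, nb in zip(base, neighbour))
        )
    return synthetic


# Toy minority-class feature vectors from a training set (illustrative only).
minority_train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
synthetic_points = smote_sketch(minority_train, n_synthetic=6)
print(len(synthetic_points))
```

Only training rows are passed in, which mirrors the safeguard in the text: held-out test observations are never used to generate synthetic samples.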

To evaluate the performance of the classification models developed for nutritional status prediction, we employed a combination of widely accepted evaluation metrics. The primary performance metrics used were Accuracy, F1 Score, and the Area Under the Receiver Operating Characteristic Curve (AUC-ROC). Accuracy quantifies the overall proportion of correctly classified instances, while the F1 Score provides a harmonic mean of Precision and Recall, offering a balanced evaluation. All model performance metrics, including accuracy, precision, recall, F1-score, and AUC, were calculated using predictions on the independent test dataset, which was not used during model training or tuning.
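For readers who want the metric definitions in executable form, the sketch below computes accuracy, precision, recall, and F1 from binary predictions, and AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The function names and toy labels are hypothetical. The study itself reports using standard Python libraries for these calculations.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy test-set labels, hard predictions, and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
print(classification_metrics(y_true, y_pred))
print(roc_auc(y_true, y_score))
```

With imbalanced outcomes such as wasting, accuracy can stay high while recall is poor, which is why F1 and AUC are reported alongside it.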

The relationship between rising temperatures and nutritional status was evaluated using multivariable logistic regression, with specific attention to the hypothesized adverse impact of temperature increases on indicators of nutritional status among under-five children. Figure 5 shows the methodological approach used for the analysis; more details on the cluster locations and countries included in the study are shown in Figure S2 of the Supplementary Materials.
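The link between a logistic regression coefficient and the odds ratios reported in the Results can be illustrated as follows. This sketch assumes a Wald-type 95% interval (z = 1.96) and a constant per-degree effect on the log-odds scale. The standard error is a made-up value for illustration, and the function names are hypothetical.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a log-odds coefficient and its standard error to an OR with a Wald CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def scale_odds_ratio(or_per_unit, delta):
    """OR for a change of `delta` units, assuming a linear effect on the log-odds."""
    return or_per_unit ** delta


# The reported wasting OR of 1.10 per 1 °C corresponds to beta = ln(1.10).
beta_wasting = math.log(1.10)
hypothetical_se = 0.01  # illustrative value, not taken from the study
or_est, ci_low, ci_high = odds_ratio_with_ci(beta_wasting, hypothetical_se)
print(round(or_est, 2), round(ci_low, 2), round(ci_high, 2))

# Under the linearity assumption, a 2 °C rise multiplies the odds by 1.10 squared.
print(round(scale_odds_ratio(1.10, 2), 2))  # 1.21
```

The compounding shown in the last line is one way to see why a per-degree odds ratio that looks small can still matter when warming accumulates over several degrees.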

Fig. 5

The methodological approaches for the data preprocessing and model evaluation.

All machine-learning analyses were performed using Python (via Jupyter Notebook) with standard libraries including scikit-learn, pandas, NumPy, and matplotlib. Causal inference analyses were conducted in R (version 4.4.1).

This study used publicly available secondary data; therefore, no ethical approval was required. Data usage adhered to the guidelines and regulations outlined by the DHS and ECMWF.