Multi-scalar risk drivers for a heat vulnerability assessment framework using machine learning algorithms

Li, Zecheng; Fong, Chng Saun; Aghamohammadi, Nasrin; Sulaiman, Nik Meriam; Ab Hamid, Siti Hafizah

doi:10.1038/s41598-026-44880-z


Article
Open access
Published: 30 March 2026

Zecheng Li¹,
Chng Saun Fong¹,
Nasrin Aghamohammadi^2,3,4,
Nik Meriam Sulaiman⁵ &
Siti Hafizah Ab Hamid⁶

Scientific Reports volume 16, Article number: 10594 (2026)

Abstract

This study addresses the challenge of quantifying amplified heat-related health risks in tropical nations by developing and validating a novel, data-driven framework in Malaysia that deconstructs the complex interplay between social vulnerability and environmental exposure. Methodologically, we constructed a Heat Vulnerability Index (HVI) and employed a Random Forest model to systematically evaluate whether integrating the HVI with local land surface physical characteristics or with ambient atmospheric conditions (Universal Thermal Climate Index (UTCI), Ozone, $PM_{2.5}$) yielded better all-cause mortality prediction. The findings reveal that the framework incorporating ambient atmospheric conditions achieved superior predictive power ($R^2 = 0.8623$), with the HVI, Ozone, and UTCI identified as the dominant predictors, while SHapley Additive exPlanations analysis further uncovered significant spatial heterogeneity in their impacts on mortality. Ultimately, this research provides a robust, evidence-based tool for policymakers, demonstrating that in a tropical context, combining macro-scale ambient atmospheric conditions with intrinsic social vulnerability is the most effective strategy for identifying high-risk communities and prioritizing targeted interventions, and it establishes a transferable protocol to mitigate heat-related health risks across the broader tropical zone.

Introduction

The growing frequency and intensity of heatwaves (HWs), driven by climate change, are emerging as significant global threats to public health, infrastructure, and the environment¹. The tropical and subtropical belts are home to nearly 40% of the global population and consist primarily of developing nations where rapid demographic growth often outpaces infrastructure resilience. Within this broader context, Southeast Asia (SEA) emerges as a critical hotspot. According to the IPCC Sixth Assessment Report (AR6)², the region is experiencing warming trends that are virtually certain to increase the intensity and frequency of hot extremes, posing severe risks to human systems. This climatological shift is particularly devastating in SEA’s rapidly urbanizing centers, where socioeconomic inequalities amplify vulnerability^3,4. Consequently, heatwaves have been increasingly linked to rising mortality and morbidity, especially in urban areas where the Urban Heat Island (UHI) effect exacerbates already intense air temperatures^5,6. Older adults, infants and young children, and people with pre-existing health conditions (e.g., cardiovascular, respiratory, or metabolic disease) are widely recognised as heat-vulnerable groups because age- and disease-related constraints on thermoregulation and cardiovascular reserve reduce their capacity to cope with acute thermal stress^7,8,9,10. For example, evidence syntheses indicate that heat-related mortality risk in older adults ($\ge$65 years) increases by about 2 to 5% per 1 °C rise in temperature, while a large multicentre study of U.S. paediatric hospitals reported a 17% increase in all-cause emergency-department visits on extreme-heat days (RR=1.17, 95% CI: 1.12–1.21)^7,11. Vulnerability is also shaped by economic status: low-income communities face heightened risks because of limited access to cooling assets, inadequate housing, and restricted healthcare services¹².

Between 2020 and 2025, the HVI has played a crucial role in assessing climate risks because it integrates diverse socioeconomic and environmental variables to reveal population sensitivity to extreme heat (EH) events^13,14,15. Conceptually, the HVI is built on three basic components: exposure, sensitivity, and adaptive capacity². The regional specificity of the HVI improves interpretability by distinguishing spatial contexts in which heat-related risks and their potential adverse consequences differ, particularly in communities characterised by higher levels of population vulnerability^16,17. Socioeconomic and demographic conditions repeatedly emerge as key correlates of EH vulnerability in recent research. Coates et al. (2014) examined long-term extreme heat exposure in Australia with a focus on elderly populations and socioeconomically disadvantaged groups, using historical event analysis to assess patterns of vulnerability. Their results show that elderly populations experience disproportionately high heat stress during EH events, indicating interacting adverse effects among age, economic status, and heat resilience^18,19. These findings underline the importance of incorporating population factors into the construction of the HVI^20,21. Occupations also differ in their levels of heat exposure during EH events. Previous research reveals higher heat risks for outdoor activities, which highlights the importance of integrating employment and occupation-related factors into the construction of the HVI^22,19. These integrations make the HVI a strong, evidence-based climate adaptation assessment tool worldwide^23,13.

Beyond climate adaptation assessment, the HVI also serves as general guidance for city-level policy-making on EH events. It is an effective tool for identifying high heat risk areas and thereby preparing targeted heat mitigation strategies^24,9. Regions such as SEA are identified as among the most vulnerable under current conditions. Tropical climate zones experience a higher frequency of EH events and higher urban heat stress^3,2,1. A warmer climate interacts with rapid urbanization, and together these factors exacerbate heat vulnerability in tropical areas^25,26. This amplified heat-related vulnerability motivates the further exploration of tropical urban areas in this study.

Malaysia is a representative tropical country: its rapid urbanization and high population density, along with other potential factors, contribute substantially to higher heat vulnerability. Over the past 10 years, urban expansion in major metropolitan areas has reduced green areas and has been closely associated with gradually increasing heat vulnerability¹². Despite the growing number of studies on heat stress in SEA, research explicitly tailored to Malaysia with a modified HVI construction remains unexplored²⁷. Using Malaysia as a case study also establishes wider relevance for other tropical countries.

To mitigate severe heat stress during EH events in tropical areas, research must address a core question: what are the dominant factors for EH events in urban areas? Previous studies conceptualize heat risk as a combination of environmental exposure, population sensitivity, and adaptive capacity^28,29. The traditional HVI, however, mainly reflects sensitivity and adaptive capacity^28,29. Arguably, combining simple air temperature with socioeconomic status and demographic factors cannot fully capture heat vulnerability in an area. It is valuable to consider other potential factors beyond the traditional human-activity-induced ones, such as green areas, day-night temperature difference, regional air pollution, and UTCI. Previous studies have also indicated synergistic, non-additive impacts of high temperature and PM$_{2.5}$ on mortality^30,31; however, they did not incorporate heat vulnerability.

This study addresses this gap by developing a comprehensive, multi-level assessment framework for heat-related health risks by taking Malaysia as a case study. We construct a data-driven HVI using Principal Component Analysis (PCA) to quantify social vulnerability. To resolve the debate on multi-scalar risk drivers, we then developed a novel, competitive evaluation framework. This framework systematically tests the predictive power of combining the HVI with two distinct environmental layers: layer 1 local land surface physical characteristics (diurnal temperature range (DTR), Normalized Difference Vegetation Index (NDVI)) and layer 2 ambient atmospheric conditions (UTCI, Ozone, $PM_{2.5}$). By applying a suite of machine learning algorithms to a decade of health data, this approach allows us to move beyond static vulnerability mapping and empirically identify which scale of environmental exposure, when integrated with social vulnerability, most accurately explains health outcomes.

Methods

This study encompasses all 13 states and three federal territories of Malaysia (Fig. 1), utilizing a longitudinal dataset spanning from 2010 to 2020. The state was chosen as the primary unit of analysis because it is the finest administrative level at which comprehensive and consistent data across all required domains (socioeconomic, environmental, health) could be obtained for the entire study period from OpenDOSM³², a platform supported by the Department of Statistics Malaysia (DOSM) that offers various types of census data for Malaysia. Most data collected from official websites are organized by state, the smallest unit from which we can derive information. The Moderate Resolution Imaging Spectroradiometer (MODIS) product MOD11A1 provides detailed 2m-Air Temperature (AT) day and night daily data for the Malaysia region from 2000 to 2022, and the MODIS product MOD13Q1 provides NDVI annual data. The CAMS global reanalysis (EAC4) monthly datasets support the Ozone analysis, and the Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2) supplies daily $PM_{2.5}$ data. Geographic data for spatial mapping were sourced from the GADM database (version 4.1).

Multi-level data framework for heat risk assessment

This study applies a framework similar to those used in previous studies and aligns with the IPCC Sixth Assessment framework for vulnerability². Table 1 summarizes the indicators commonly included in heat vulnerability assessments in different regions. The table also reflects the lack of heat vulnerability exploration in tropical SEA regions. The framework consists of a Base Layer quantifying social vulnerability and two distinct Exposure Layers representing environmental hazards at different spatial scales. All data were aggregated or resampled to a consistent state-year panel format.

Table 1 Clustered key indicators used in heat vulnerability studies, ordered by publication year (latest to oldest).

As demonstrated in various studies, the construction of HVI frequently incorporates a variety of demographic, socioeconomic, and environmental indicators. The indicators used in prior studies are summarized in Table 1. Commonly included indicators encompass age, social isolation, education, poverty, race/ethnicity, health conditions, infrastructure, environmental factors, language, and population density. Notably, factors such as health conditions and racial demographics are prevalent in many studies, particularly those from Western or multicultural regions.

Additionally, we treat AT as a key variable in validating and comparing different methods. This is crucial because the tropical climate of SEA has consistently high baseline temperatures^40,41, while short-term fluctuations in AT can drastically exacerbate EH and HW pressures, thereby undermining thermal comfort in public spaces^42,43. In Malaysia, incorporating AT into the HVI evaluation provides a critical layer for understanding heat vulnerability because it directly affects urban heat stress and is a primary driver of heatwave intensity in tropical climates^44,40. This addition aligns with our goal of creating a locally adapted HVI that captures both the environmental exposure and the societal influences of extreme heat, providing a more accurate evaluation tailored to Malaysia’s specific climate profile⁴⁵.

Based on the existing literature and the limited data available, sixteen indicators were selected as potential factors influencing heat vulnerability across regions. The final variables are used as inputs to the different models. All variables are listed in Table 2 along with their data sources and categories.

Table 2 Descriptive statistics of risk components for all state-years (2010-2022).

Adopting the risk framework from the IPCC Sixth Assessment Report, we first structured our dataset into three distinct categories: Hazard, Exposure, and Vulnerability⁴⁶. Most census-derived variables were classified as measures of Exposure or, in the more complex case, Vulnerability. For the latter, we used PCA to distill these indicators into their most significant components, creating a robust, lower-dimensional measure of vulnerability to EH. Other indicators, such as DTR and UTCI, are treated as Hazard and are therefore not included in the PCA.

Hazard and exposure

This study divides the modified HVI framework into four main components. We selected UTCI as the main indicator of heat stress because it accounts for factors beyond air temperature, such as humidity, wind speed, and solar radiation. To capture the physiological impact of temperature fluctuations, we use the day-night difference in land surface temperature (LST) as a proxy for DTR. Given that insufficient nighttime cooling is a critical driver of heat-related morbidity, this metric is particularly relevant; in our study area, values ranged from 4.61 to 13.43 °C. Ambient air pollutants such as Ozone and PM$_{2.5}$ can exacerbate heat-related health risks, as studies have established their compounding effect with thermal stress during HWs, which intensifies cardiopulmonary strain^47,48. In our study area, the mean concentration of PM$_{2.5}$ was 18.35 $\mu$g/m$^3$, while the 95th percentile of Ozone was 115.96 $\mu$g/m$^3$ (Table 2).

Exposure is mainly quantified by Population Density. Table 2 reports a mean of 884.6 persons/km$^2$, a maximum of 7266.7 persons/km$^2$, and a standard deviation of 1614.0 persons/km$^2$. Higher population density makes communities more vulnerable because of limited resources, green areas, healthcare services, and cooling centers^24,49.

Adaptive capacity and sensitivity

Sensitivity in this study comprises Elderly Population, Poverty rate, and Ethnicity. The elderly often have multiple illnesses, which makes them more vulnerable to extreme heat. The Poverty rate reflects low quality of life in an area, including severe health conditions, undesirable living environments, and limited resources. The Elderly population was identified as one of the strongest contributors to the HVI, in stark contrast to the stronger heat resilience and lower heat vulnerability of younger people^50,51. Health conditions and a reduced capacity to adapt to temperature changes are further reasons why the Elderly population is more vulnerable to EH events. Considering Ethnicity adds detail on the living habits and living environments of different ethnic groups. In particular, including Bumiputera Malay, Other Bumiputera, Chinese, and Indian populations supports the exploration of how the different residential patterns and socioeconomic disparities inherent to these groups within the Malaysian context mediate differential health outcomes from environmental stressors^52,53.

We derived Adaptive Capacity from a series of indicators reflecting the availability of resources at both the community and individual levels. Access to healthcare services and access to water represent direct medical resources and the basic needs of the population, respectively. Healthcare Access is high across all states in Malaysia (average 97.78%); however, pronounced regional disparities persist, with healthcare access in some areas as low as 87.81%. As indicated in previous literature, the ability to access healthcare services is a key determinant of the severity of heat-related mortality in an area^54,55. Water Access shows a similar level, with an average of 94.89% and a standard deviation of 9.33%, ranging from 55.50 to 100.00%. We also used Literacy Completion Rate as a surrogate measure for collective social capital and community heat-alert responsiveness⁵⁶. It reflects the ability of citizens to access, understand, and act on heat risk information^57,58. Research shows that highly literate communities are more resilient to environmental pressures, including intense heat, owing to greater access to resources and information^59,60. Higher literacy rates are associated with improved engagement with public health alerts and more effective HW adaptation behaviours^39,26. The literacy indicator in our dataset has a mean value of 99.66%, suggesting limited spatial variation but a generally high capacity for policy responsiveness in tropical contexts⁵¹. However, while literacy is key, it should be examined alongside other indicators, as it does not operate in isolation. The ability to act on available information is influenced by broader socioeconomic conditions and regional structures⁶¹. Finally, NDVI captures the region’s green area coverage. Shade provision and evapotranspiration have been shown to serve as critical physical infrastructure for enhancing community heat resilience^62,63; in our dataset, NDVI differs widely across areas, ranging from 0.36 to 0.77 with a mean of 0.65 and a standard deviation of 0.10.

Modified HVI construction

Data pre-processing

To ensure consistency across all data points, the datasets were merged using specific census district identifiers. Data acquisition and preprocessing utilized Google Earth Engine (GEE) via Google Colab for efficient cloud-based retrieval of high-resolution satellite imagery. Subsequently, all statistical analyses and machine learning modeling were standardized in a local Python 3.9 environment to ensure reproducibility and leverage GPU acceleration. To align the diverse environmental datasets with the census and mortality records, all predictors were aggregated to a unified spatiotemporal resolution (yearly). A detailed summary of the original data resolutions, sources, and their final analytical scales is provided in Table 3. The annual all-cause mortality data (2010 to 2022) were obtained from the DOSM. The dataset covers all 16 states and federal territories and represents the total number of registered deaths per year within each administrative boundary. This state-level aggregated count serves as the primary outcome variable for the annual modified HVI framework.

Table 3 Summary of data sources, original spatiotemporal resolutions, and final analytical scales.

NDVI and DTR

To generate a high-quality time series for each state, we first pre-processed NDVI and DTR data into 8-day composites. This is a standard temporal interval for satellite-derived indices, chosen to minimize atmospheric contamination (e.g., cloud cover) and other sources of noise in the daily data⁶⁴. These composite values were then averaged to obtain a single annual mean value for each state. During pre-processing, we found that MODIS and MERRA-2 data are not fully available at state-level resolution once the data are partitioned by state, so we applied k-nearest neighbor (KNN) interpolation to fill the missing values in the datasets.
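The compositing and gap-filling steps can be illustrated with a minimal sketch. The text does not state which features define "nearest" in the KNN interpolation, so the version below assumes spatial proximity between state centroids; the function names, the `coords` input, and `k=3` are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def annual_mean_from_daily(daily, window=8):
    """Average daily values into `window`-day composites, then into one annual mean.

    NaNs (e.g. cloud-masked days) are ignored inside each composite.
    """
    daily = np.asarray(daily, dtype=float)
    composites = []
    for start in range(0, len(daily), window):
        chunk = daily[start:start + window]
        if np.any(~np.isnan(chunk)):          # skip composites with no valid day
            composites.append(np.nanmean(chunk))
    return float(np.mean(composites)) if composites else float("nan")


def knn_fill(values, coords, k=3):
    """Fill NaN entries with the mean of the k nearest units that have data.

    Assumption: distance is Euclidean distance between centroid coordinates.
    """
    values = np.asarray(values, dtype=float).copy()
    coords = np.asarray(coords, dtype=float)
    known = ~np.isnan(values)                  # only originally observed units are donors
    known_values = values[known]
    for i in np.where(~known)[0]:
        dist = np.linalg.norm(coords[known] - coords[i], axis=1)
        nearest = np.argsort(dist)[:k]
        values[i] = known_values[nearest].mean()
    return values


# Hypothetical example: three units on a line, the middle one missing.
filled = knn_fill([1.0, np.nan, 3.0], [[0, 0], [1, 0], [2, 0]], k=2)
yearly = annual_mean_from_daily([1.0] * 8 + [3.0] * 8)
```

Only originally observed units act as donors here, so the result does not depend on the order in which gaps are filled.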

UTCI and air pollutant data

For thermal stress assessment, we utilized hourly data from the ECMWF ERA5-Land dataset to derive the UTCI. The primary input variables included 2m AT, 2m relative humidity, 10m u- and v-wind components, and surface solar radiation downwards.

For ambient air pollution, while multiple criteria pollutants exist, our analysis focuses specifically on Ozone (O$_3$) and fine particulate matter (PM$_{2.5}$). This selection is motivated by two key factors. First, these two pollutants are responsible for the vast majority of the global public health burden attributable to air pollution⁶⁵. Second, and critically for this study, both O$_3$ and PM$_{2.5}$ have well-documented synergistic effects with thermal stress, often co-occurring during HWs and compounding the risk of adverse health impacts⁴⁷.

This study derives three intermediate variables required for the UTCI calculation.

First, wind speed (v) at 10 meters was calculated from its zonal ($u_{10}$) and meridional ($v_{10}$) vector components as the magnitude of the wind vector (Eq. 1):

$$\begin{aligned} v = \sqrt{u_{10}^2 + v_{10}^2} \end{aligned}$$

(1)

Next, the surface albedo ($\alpha$), which represents the reflectivity of the surface, was dynamically computed. It was derived as one minus the ratio of net shortwave radiation ($R_{sw, net}$) to downward shortwave radiation ($R_{sw \downarrow }$) (see Eq. 2):

$$\begin{aligned} \alpha = 1 - \frac{R_{sw, net}}{R_{sw \downarrow }} \end{aligned}$$

(2)

Finally, using the dynamically calculated albedo, the Mean Radiant Temperature ($T_{mrt}$) was computed as shown in Eq. 3. This crucial variable quantifies the total radiative heat load on the human body and is formulated based on the Stefan-Boltzmann law:

$$\begin{aligned} T_{mrt} = \sqrt[4]{\frac{R_{lw \downarrow } + (1 - \alpha ) \cdot R_{sw \downarrow }}{\epsilon \cdot \sigma }} \end{aligned}$$

(3)

In these equations, v denotes wind speed ($\mathrm {m\,s^{-1}}$). $\alpha$ represents the unitless surface albedo, derived from net (ssr) and downward (ssrd) shortwave radiation fluxes, both expressed in $\mathrm {W\,m^{-2}}$. $T_{mrt}$ is the mean radiant temperature ($\textrm{K}$), calculated using the downward longwave radiation flux ($R_{lw\downarrow }$, obtained from strd), the surface emissivity ($\epsilon$, assumed to be 0.97), and the Stefan–Boltzmann constant ($\mathrm {\sigma }$). All variables were subsequently integrated using the calculate_utci Python package to obtain UTCI values in degrees Celsius.
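Equations 1 to 3 can be sketched in a few lines of NumPy. This is an illustrative sketch only: the function names are hypothetical, the inputs are assumed to be instantaneous fluxes in W m$^{-2}$ with positive downward shortwave radiation, and the handling of night-time values (where Eq. 2 is undefined) is not described in the text.

```python
import numpy as np

SIGMA = 5.670374419e-8   # Stefan-Boltzmann constant, W m^-2 K^-4
EPSILON = 0.97           # surface emissivity assumed in the text


def wind_speed(u10, v10):
    """Eq. 1: magnitude of the 10 m wind vector (m/s)."""
    return np.hypot(u10, v10)


def surface_albedo(ssr, ssrd):
    """Eq. 2: albedo = 1 - net / downward shortwave flux.

    Assumes ssrd > 0 (daytime); the ratio is undefined otherwise.
    """
    return 1.0 - np.asarray(ssr, dtype=float) / np.asarray(ssrd, dtype=float)


def mean_radiant_temperature(strd, ssrd, albedo):
    """Eq. 3: T_mrt in kelvin from downward longwave and shortwave fluxes."""
    absorbed = np.asarray(strd, dtype=float) + (1.0 - albedo) * np.asarray(ssrd, dtype=float)
    return (absorbed / (EPSILON * SIGMA)) ** 0.25


# Hypothetical single grid cell, values chosen only for illustration.
v = wind_speed(np.array([3.0]), np.array([4.0]))
alpha = surface_albedo(np.array([800.0]), np.array([1000.0]))
t_mrt = mean_radiant_temperature(np.array([400.0]), np.array([1000.0]), alpha)
```

The resulting `v` and `t_mrt`, together with air temperature and humidity, would then be passed to whichever UTCI routine is used; that step is not reproduced here.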

Hierarchical modeling framework

To comprehensively explore the potential interactions between heat vulnerability and confounding factors, this study groups potential interactive factors into two layers that are combined with the traditional HVI construction. The Base Layer represents intrinsic vulnerability derived from socioeconomic and demographic factors. Layer 1 (Physical Mitigation) captures the day-night temperature difference ($\Delta T$) and vegetation cover (NDVI), explicitly representing the capacity of urban greening to moderate heat stress, especially in urban settings. Surface water bodies were not included as a separate mitigation indicator, as they are typically characterised by near-zero NDVI values and limited residential exposure, and therefore fall outside the population-based vulnerability framework adopted in this study. Layer 2 (External Stressors) incorporates direct atmospheric hazards, including UTCI, PM$_{2.5}$, and Ozone. This layered structure allows for the assessment of how physical mitigation strategies and external environmental pressures compound intrinsic vulnerability. The specific features grouped by layer are summarized in Table 4. For data consistency, we calculated the annual mean of the daily $PM_{2.5}$ concentrations and then combined it with the annual mean Ozone data.

Table 4 Structure of the modeling framework and feature grouping.

Heat vulnerability index

Constructing the HVI is the first step after preprocessing, as the modified HVI framework is built upon it. We split the dataset into three parts: 2010-2018 as the training set, 2019 as the validation set, and 2020 as the test set. The training set is used to build the model, the validation set is used to tune the hyperparameters, and the test set is used to evaluate the performance of the model.

To establish a reliable way of summarizing the heat-related features in the HVI, we applied PCA at the outset, following several past studies that used PCA to build HVIs in different regions, including Hangzhou, China³³, Kuala Lumpur, Malaysia⁶⁶, and several cities in northern China⁶⁷.

Before building the HVI with PCA, we used Kaiser’s Criterion to determine the number of principal components to retain, keeping those with eigenvalue > 1. Factor scores were calculated at the district level to indicate heat vulnerability. To create a comprehensive HVI, we adopted a linear model approach, summing the weighted factor scores for each district.

To construct the composite, modified HVI, we applied PCA to the standardized Sensitivity and Adaptive Capacity indicators. We retained the first 4 principal components (eigenvalues > 1), which cumulatively explained 82.01% of the total variance. Figure 2 illustrates the component loadings and highlights the structural importance of ethnicity in the index construction. On Principal Component 2, most ethnic groups (excluding the Bumiputera Malay population) exhibit strong positive contributions, while each of the other retained components is also significantly driven by at least one ethnic demographic factor.

The final, composite HVI ($HVI_{Score}$) was constructed by calculating a weighted sum of all principal components ($PC_j$) that were retained based on the Kaiser Criterion. This approach ensures that all significant dimensions of vulnerability contribute to the final score, proportional to their importance. The primary formula is presented in Eq. 4:

$$\begin{aligned} HVI_{Score} = \sum _{j=1}^{k} (PC_j \cdot w_j) \end{aligned}$$

(4)

$HVI_{Score}$ is the final, single vulnerability score for an observation. $PC_j$ represents the score of the j-th principal component. The weight for each component, $w_j$, is not equal; it is determined by the proportion of variance that the component explains relative to the total variance explained by all retained components. This is calculated as in Eq. 5:

$$\begin{aligned} w_j = \frac{\lambda _j}{\sum _{i=1}^{k} \lambda _i} \end{aligned}$$

(5)

The total number of retained components is denoted by k. The weight for the j-th component is $w_j$, which is calculated using its eigenvalue ($\lambda _j$) as a proportion of the sum of all eigenvalues for the components kept.
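Equations 4 and 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: PCA is performed by eigen-decomposition of the correlation matrix of the standardized indicators, the function name `build_hvi` is hypothetical, and the demonstration data are synthetic.

```python
import numpy as np


def build_hvi(X):
    """Return (hvi_scores, weights, eigenvalues) for an indicator matrix X.

    X has one row per observation (e.g. a state-year) and one column per
    Sensitivity / Adaptive Capacity indicator. Columns must not be constant.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each indicator to zero mean and unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # PCA via eigen-decomposition of the correlation matrix.
    corr = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]            # descending eigenvalues
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Kaiser criterion: retain components with eigenvalue > 1.
    keep = eigvals > 1.0
    lam = eigvals[keep]
    pcs = Z @ eigvecs[:, keep]                   # component scores PC_j
    w = lam / lam.sum()                          # Eq. 5: w_j = lambda_j / sum(lambda_i)
    hvi = pcs @ w                                # Eq. 4: weighted sum of PC_j
    return hvi, w, lam


# Synthetic demonstration: six indicators driven by two latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(60, 2))
X_demo = np.hstack([
    latent[:, [0]] + 0.3 * rng.normal(size=(60, 3)),
    latent[:, [1]] + 0.3 * rng.normal(size=(60, 3)),
])
hvi, w, lam = build_hvi(X_demo)
```

One caveat: the sign of each eigenvector is arbitrary, so in practice the components need to be oriented (for example, so that higher scores mean higher vulnerability) before they are summed. The text does not say how this was handled.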

Combined model construction

Constructing the HVI requires several calculation steps; Layer 1 and Layer 2, however, do not need to be processed by PCA, as their features can be used directly in machine learning models. Layer 1 and Layer 2 are then combined with the HVI to form the modified HVI framework, which is the final output of our framework.

To evaluate the relative effectiveness of the layer combinations in constructing the modified HVI framework, we designed four layer combinations and paired them with different machine learning algorithms to compare model performance. The combinations are shown in Fig. 3.

We benchmarked a deliberately diverse set of models to identify the most predictive and temporally stable approach. Model selection was designed to span key modelling paradigms and the interpretability-flexibility trade-off: (i) regularised linear baselines (Ridge, Lasso, Elastic Net) to provide parsimonious reference models under multicollinearity; (ii) a robust regression model (Huber) to mitigate the influence of outliers typical in extreme-event-related exposures; (iii) an interpretable non-linear model (Explainable Boosting Machine-Generalized Additive Model, EBM-GAM) to capture smooth non-linear effects while retaining transparency; and (iv) high-capacity machine learning models (Support Vector Regression, Random Forest, XGBoost, and LightGBM) to capture complex non-linearities and interactions.

The evaluation protocol used a two-stage design to assess both static performance and temporal stability under sequential prediction. First, for each model and feature set combination, performance was assessed on a fixed validation and test set to establish a comparable baseline. Second, we performed a robustness check using walk-forward validation, in which each model was trained on an expanding window of historical data (e.g., 2010-2018, 2010-2019, etc.) and evaluated on the subsequent unseen year (2019, 2020, etc.). The resulting metric ($R^2_{robustness\_avg}$), reported in Table 5, served as a supplementary indicator of long-term stability, ensuring that the selected model not only performs well on static splits but also remains consistent over time. This stability evidence was used alongside static split performance to support the model comparison in Table 5.
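The walk-forward procedure can be sketched as below. This is a hedged illustration: the function names are hypothetical, the robustness metric is assumed (from its name) to be the plain average of the per-fold $R^2$ values, and an ordinary least-squares model stands in for the benchmarked learners so that the example needs only NumPy.

```python
import numpy as np


def walk_forward_splits(years, first_test_year):
    """Yield (train_years, test_year) pairs with an expanding training window."""
    years = sorted(set(years))
    for test_year in years:
        if test_year < first_test_year:
            continue
        train_years = [y for y in years if y < test_year]
        if train_years:
            yield train_years, test_year


def r2_score(y_true, y_pred):
    """Coefficient of determination."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def walk_forward_r2(year, X, y, fit, predict, first_test_year):
    """Average out-of-sample R^2 over all walk-forward folds."""
    year = np.asarray(year)
    scores = []
    for train_years, test_year in walk_forward_splits(year.tolist(), first_test_year):
        train = np.isin(year, train_years)
        test = year == test_year
        model = fit(X[train], y[train])
        scores.append(r2_score(y[test], predict(model, X[test])))
    return float(np.mean(scores))


def fit_ols(X, y):
    """Stand-in model: least squares with an intercept."""
    return np.linalg.lstsq(np.c_[np.ones(len(X)), X], y, rcond=None)[0]


def predict_ols(beta, X):
    return np.c_[np.ones(len(X)), X] @ beta


# Synthetic state-year panel: 16 units per year, 2010-2020.
rng = np.random.default_rng(0)
year_demo = np.repeat(np.arange(2010, 2021), 16)
X_demo = rng.normal(size=(len(year_demo), 2))
y_demo = 2.0 * X_demo[:, 0] + 1.0
splits = list(walk_forward_splits(year_demo.tolist(), 2019))
avg_r2 = walk_forward_r2(year_demo, X_demo, y_demo, fit_ols, predict_ols, 2019)
```

With 2019 as the first test year this produces two folds, matching the 2010-2018 and 2010-2019 training windows described in the text.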

Table 5 presents the performance of the model candidates. XGBoost and Random Forest achieved the highest training performance, with the top $R^2_{train}$ values being 1.0000, 0.9840, and 0.9794. However, the validation score of XGBoost ($R^2_{val} = 0.8831$) is lower than that of Random Forest (0.8927). A near-perfect training fit combined with weaker validation performance indicates that XGBoost overfits, memorizing the training data, whereas Random Forest performs better on the validation set and was therefore selected as the best model.

Table 5 Top 10 model performances ranked by robustness.

Random Forest fits the modified HVI framework well. Its ensemble-averaging mechanism, which combines predictions from many decision trees, provides high robustness and stability and helps to limit overfitting. The core concept of Random Forest is to combine individual decision trees, each trained on the data, and to output either the majority class (classification) or the mean prediction (regression) at inference. To ensure diversity among the trees, Random Forest introduces two essential randomization methods:

Bagging: Each tree is trained on a random subset of the training data, sampled with replacement. This means that each tree sees a slightly different view of the data.
Feature Randomness: When splitting nodes, RF randomly selects a subset of features rather than considering all features.

Each tree in the RF is trained independently under these randomization methods, and all trees follow the same core principle of maximizing variance reduction. When deciding how to split the data at each node, RF evaluates the candidate features and split points and selects the one that minimizes the weighted sum of the variances in the two resulting child nodes. This is equivalent to maximizing the variance reduction, defined in Eq. 6:

$$\begin{aligned} \text {VR}(S, A) = \text {Var}(y_S) - \left( \frac{|S_{\text {left}}|}{|S|}\text {Var}(y_{S_{\text {left}}}) + \frac{|S_{\text {right}}|}{|S|}\text {Var}(y_{S_{\text {right}}}) \right) \end{aligned}$$

(6)

In Eq. 6, S is the set of training samples at the current node and A is a candidate split rule. $S_{\text {left}}$ and $S_{\text {right}}$ are the two subsets produced by the split, and $\text {Var}(y)$ is the variance of the target variable y on the respective subset. By greedily selecting the split rule that maximizes $\text {VR}(S, A)$, each tree efficiently learns the local structure of the data.
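
A small worked example may make Eq. 6 concrete. The sketch below evaluates the variance reduction for threshold splits on a single feature and picks the best one greedily; the toy data and function names are illustrative assumptions, and the search over midpoints is one common choice rather than a detail taken from the study.

```python
def variance(values):
    """Population variance; 0.0 for an empty list."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def variance_reduction(x, y, threshold):
    """VR(S, A) from Eq. 6 for the split rule A: x <= threshold."""
    left = [yi for xi, yi in zip(x, y) if xi <= threshold]
    right = [yi for xi, yi in zip(x, y) if xi > threshold]
    n = len(y)
    weighted = len(left) / n * variance(left) + len(right) / n * variance(right)
    return variance(y) - weighted


def best_threshold(x, y):
    """Greedy search over midpoints between sorted unique feature values."""
    xs = sorted(set(x))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return max(candidates, key=lambda t: variance_reduction(x, y, t))


x = [1.0, 2.0, 3.0, 4.0]       # toy feature values
y = [10.0, 10.0, 20.0, 20.0]   # toy target values
t = best_threshold(x, y)
```

For this toy data the split at 2.5 separates the two target levels perfectly, so both child variances are zero and the variance reduction equals the parent variance of 25.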

This splitting process is applied recursively until terminal nodes, known as leaf nodes, are reached. The prediction value stored at each leaf node is determined during training. Specifically, the value of a leaf node j, denoted $\gamma _j$, is the arithmetic mean of the target values of all training samples that fall into that leaf (see Eq. 7):

$$\begin{aligned} \gamma _j = \frac{1}{|I_j|} \sum _{i \in I_j} y_i \end{aligned}$$

(7)

Here, $I_j$ is the set of training samples that belong to leaf j. Thus, when a new data point traverses a tree and lands on leaf j, that tree’s prediction is this pre-calculated value, $\gamma _j$.

Once all B independent trees $f_b(x)$ are constructed, the final ensemble prediction F(x) is formed by taking the arithmetic mean of the outputs of all trees. This averaging step (Eq. 8) is key to the model’s performance, as it averages out the noise and reduces the variance of the individual estimators:

$$\begin{aligned} F(x) = \frac{1}{B} \sum _{b=1}^{B} f_b(x) \end{aligned}$$

(8)
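
Eqs. 7 and 8 can be illustrated together with a deliberately simplified forest. In the sketch below each "tree" is a single-split stump with a fixed threshold, which is an assumption made to keep the example short; a real Random Forest learns its splits on bootstrap samples. The toy data and function names are hypothetical.

```python
def fit_stump(x, y, threshold):
    """One-split tree: each leaf stores the mean target of its samples (Eq. 7)."""
    left = [yi for xi, yi in zip(x, y) if xi <= threshold]
    right = [yi for xi, yi in zip(x, y) if xi > threshold]
    gamma_left = sum(left) / len(left)
    gamma_right = sum(right) / len(right)
    return lambda v: gamma_left if v <= threshold else gamma_right


def forest_predict(trees, v):
    """Ensemble prediction F(x): arithmetic mean over the B trees (Eq. 8)."""
    return sum(tree(v) for tree in trees) / len(trees)


x = [1.0, 2.0, 3.0, 4.0]       # toy feature values
y = [10.0, 12.0, 20.0, 22.0]   # toy target values

# Three stumps with different thresholds stand in for B = 3 diverse trees.
trees = [fit_stump(x, y, t) for t in (1.5, 2.5, 3.5)]
prediction = forest_predict(trees, 3.0)
```

For the input 3.0 the three stumps return 18, 21, and 14, so the ensemble prediction is their mean, 53/3.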

Finally, the selected Random Forest model, configured with Feature Set C (HVI + Layer 2), was implemented to construct the modified HVI framework. The model’s feature importance analysis (Fig. 4) reveals a distinct hierarchy of predictive influence. The HVI emerges as the dominant factor, with the highest importance of 0.7005 (Table 6), underscoring the critical role of intrinsic vulnerability. Ozone and UTCI follow with importances of 0.1699 and 0.0844, respectively, while $PM_{2.5}$ provides a more moderate, yet still valuable, contribution (0.0368).

Table 6 Feature importance scores for the predictive model.

Results

Spatio-temporal analysis

To examine the state-level spatial distribution of the indicators, we visualized them in Fig. 5. Figure 5a indicates that Selangor exhibits the highest mean annual mortality (25778 deaths). This high mortality aligns with high urbanization and population density, with an average density of 80.01. Johor and Perak rank next, with 25778 and 17937 mean deaths, respectively. HVI follows a different pattern: Kelantan, Sarawak, and Sabah record the highest HVI values despite lower mortality levels. For air pollution, high Ozone concentrations occur mainly in Kuala Lumpur, whereas $PM_{2.5}$ is distributed more evenly across Malaysia, with higher concentrations in southern regions such as Johor. Penang records the highest UTCI, and UTCI shares a similar spatial pattern with air pollution.

Temporally, Fig. 6 reveals a nationwide upward trend in heat stress (UTCI) and mortality rates, consistent with global climate change patterns. While policy often focuses on the acute danger of significant HWs, our findings reveal a more insidious and largely unrecognized threat. The HVI has systematically worsened since 2018, representing a creeping increase in the baseline vulnerability of the population. This hidden risk silently accumulates during non-extreme years, leaving communities progressively less prepared for future climate impacts⁶⁸.

Sensitivity and heterogeneity analysis

Sensitivity analysis was conducted by evaluating the model’s performance across different test years (Table 7). We employed rolling window validation using 2015 to 2017 as the baseline period (mean $R^2$ = 0.7411 ± 0.078 SD). No statistically significant degradation was detected across the test years (2015–2022). The highest accuracy occurred in 2015 ($R^2$ = 0.8013) and 2020 ($R^2$ = 0.8194), while relatively lower accuracy was observed in 2021 ($R^2$ = 0.6441), likely due to COVID-19 mortality data anomalies (Fig. 7).
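
The exact windowing scheme is not spelled out here, so the following is only one possible reading of rolling window validation: train on a fixed number of consecutive years and test on the year that follows. The function name, the window length of three years, and the year range are assumptions for illustration.

```python
def rolling_window_splits(years, window):
    """Yield (train_years, test_year): train on the `window` years before each test year."""
    years = sorted(years)
    for i in range(window, len(years)):
        yield years[i - window:i], years[i]


# Hypothetical setup: a three-year window sliding over 2015 to 2022.
splits = list(rolling_window_splits(range(2015, 2023), window=3))
for train_years, test_year in splits:
    print(train_years, "->", test_year)
```

Under this reading the model is never evaluated on a year it was trained on, which is the property that lets a drop such as the one in 2021 be attributed to a shift in the data rather than to leakage.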

Table 7 Temporal sensitivity analysis of model performance (2015–2022).

We also conducted a heterogeneity analysis to examine how each feature affects mortality, using SHAP dependence plots to describe the model. Specifically, we split Malaysia into Eastern Malaysia (EM) and Western Malaysia (WM) to observe the differences between the two regions. Fig. 8a shows that in EM, HVI has a nearly linear relationship with mortality: as HVI increases, its positive effect on mortality risk (indicated by higher SHAP values) becomes stronger. In contrast, WM shows a more complex relationship with HVI, with a threshold effect: when HVI is above 0, it steadily increases predicted mortality, whereas when it is below 0, its effect is negligible.

Unlike HVI’s direct effect, UTCI showed no consistent main effect; its health impact was primarily mediated through interaction with vulnerability (Fig. 8b). In WM, the relationship between UTCI and SHAP value is dispersed, without a clear linear trend, while in EM the interaction pattern is more pronounced: at similar UTCI values, observations with higher HVI have higher SHAP values.
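
To illustrate how an interaction can give the same UTCI value different attributions, the sketch below computes exact Shapley values for a two-feature toy model. It is not the SHAP library and not the study's model: `toy_risk` is a hypothetical function in which UTCI matters only through its product with HVI, and absent features are replaced by a single baseline value, which is a simplifying assumption.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features absent from a coalition are set to their baseline value.
    """
    names = list(x)
    phi = {name: 0.0 for name in names}
    orders = list(permutations(names))
    for order in orders:
        current = dict(baseline)
        prev = model(current)
        for name in order:
            current[name] = x[name]
            value = model(current)
            phi[name] += (value - prev) / len(orders)
            prev = value
    return phi


def toy_risk(f):
    # Hypothetical model: UTCI contributes only via its interaction with HVI.
    return 2.0 * f["HVI"] + f["HVI"] * f["UTCI"]


baseline = {"HVI": 0.0, "UTCI": 0.0}
low_vuln = shapley_values(toy_risk, {"HVI": 0.0, "UTCI": 1.0}, baseline)
high_vuln = shapley_values(toy_risk, {"HVI": 1.0, "UTCI": 1.0}, baseline)
```

With the same UTCI value of 1.0, the UTCI attribution is 0.0 when HVI is 0 and 0.5 when HVI is 1, mirroring the qualitative pattern in which UTCI shows no consistent main effect but a stronger contribution at higher vulnerability.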

Fig. 8c, d shows significant geographical heterogeneity. In WM, $PM_{2.5}$ has no consistent directional effect, with SHAP values clustered near 0. In contrast, in EM $PM_{2.5}$ is a strong risk feature: its concentrations show a positive association with mortality risk, and this effect is amplified by higher HVI scores. Ozone displays a strong, non-linear pattern in WM: while its effect is negligible at lower concentrations, it becomes strongly protective at higher concentrations, driving SHAP values sharply downward.

Discussion

In this study, we developed and validated a framework that integrates multiple potential factors to analyze the primary drivers of vulnerability to EH risk and their differing contributions. Our framework conceptualizes heat vulnerability, as measured by our HVI, as the baseline of population susceptibility to heat-related risks. Building on this foundation, the study systematically investigates which set of environmental exposures better predicts mortality. Specifically, we compare two competing models: one incorporating micro-scale factors (DTR and NDVI), and another using macro-scale ambient stressors (UTCI and air pollutant concentrations). Our final results indicate a strong association between mortality and air pollution ($PM_{2.5}$ and Ozone), and this is the first research work to apply a novel tiered analytical framework that explicitly incorporates thermophysiological indicators.

A significant spatial decoupling often exists between regions identified as highly vulnerable and those with the highest mortality rates. This mismatch occurs because vulnerability to heat risk is driven by a diverse set of factors that vary geographically. A consistent pattern, however, is that mortality is frequently concentrated in metropolitan areas, where high population density exacerbates cumulative environmental exposure⁶⁹. Therefore, a deep assessment of this phenomenon is essential for effective public health interventions. Kelantan and Sabah illustrate a different source of high heat vulnerability: their internal socioeconomic factors, including unequal healthcare services, literacy, and water access, largely intensify their vulnerability to heat, and all of these are captured by our basic HVI. Consistent with prior evidence, socioeconomic disadvantage and minority status are associated with a disproportionate burden of heat stress²⁵. These factors constitute a latent risk that significantly increases the heat exposure of the population. The danger of a heatwave is often magnified by other environmental factors. Although stressors like air pollution do not cause the EH itself, they can act as powerful amplifiers, strongly intensifying the adverse health consequences for the population⁷⁰. It is important to distinguish between acute heat risk and chronic heat vulnerability. While heat-related mortality is triggered by short-term meteorological events, the underlying susceptibility of a population is governed by structural factors, such as aging demographics, economic capacity, and urban infrastructure, that evolve on an annual or decadal scale. Consequently, our annual HVI captures the baseline adaptive capacity of each state. Rather than obscuring heat effects, the inclusion of socioeconomic determinants (e.g., poverty, ethnicity) highlights how systemic inequalities amplify the health impacts of thermal stress, identifying regions where physiological heat exposure translates most fatally into mortality due to a lack of societal resilience.

Our results show that UTCI combined with air pollutants as Layer 2 serves as the better set of predictive factors in our model. UTCI, Ozone, and $PM_{2.5}$ represent an unavoidable environmental background to which every individual is exposed⁷¹, while NDVI and DTR are more localized and vary between areas. Our heterogeneity test further supports this point: the components within Layer 2 exhibit complex and spatially heterogeneous contributions to mortality risk⁷². For instance, $PM_{2.5}$ is a pronounced risk factor in EM but has a negligible effect in WM, which potentially relates to the large amount of biomass burning⁷³ and specific industrial activities in EM, where a large portion of the area is urbanized. Ozone demonstrates pronounced non-linear risk associations in the densely populated Klang Valley of Peninsular Malaysia, consistent with documented photochemical reactions of traffic and industrial emission precursors in this region⁷⁴. Collectively, these findings delineate a paradigm wherein region-specific, persistent environmental stressors constitute the primary external drivers of health risks, while intrinsic community vulnerability determines the ultimate magnitude of their health impacts²⁹.

Our model selection prioritises predictive accuracy and feature-level explainability for mortality, because actionable insight depends on both. Although some machine learning models can achieve higher accuracy, a training $R^2$ of 1.0000 is more consistent with overfitting to the training data than with generalisability⁷⁵. This risk is aligned with XGBoost’s iterative decision tree optimisation, whereas Random Forest aggregates multiple independent trees to reduce variance and stabilise inference^76,77,74. On this basis, we selected Random Forest as the final model and retained SHAP as the primary explanation tool⁷⁸. This combination supports environmental health risk analysis where interpreting drivers and predicting outcomes are both required⁷⁹. This balance also supports transfer beyond Malaysia.

Transfer beyond Malaysia is supported by a framework that is built on reproducible data inputs and a portable modelling sequence. In the Malaysian setting, we map vulnerabilities linked to urban expansion, socioeconomic disparity, and intensifying extreme heat and air pollution, which are also present across SEA under similar climate and development trajectories. The framework is designed around globally harmonised, openly available datasets, including ERA5 reanalysis and satellite layers processed in Google Earth Engine, which reduces dependence on sparse local observations. This design allows the Hazard and Exposure components to be reproduced in Indonesia, Thailand, Vietnam, and the Philippines with minimal modification. The vulnerability layer requires national census inputs, but it uses standard indicators such as age structure, poverty, and access to essential services that are routinely collected across SEA. The PCA and subsequent machine learning stages remain data driven, so the dominant vulnerability dimensions are learned from each local context rather than imposed a priori. This local dependence is also the reason the study requires explicit limitations.

Limitations

This study has limitations related to outcome definition, shocks outside the training regime, and ecological inference. We use all-cause mortality in Malaysia, which can introduce bias when the outcome aggregates multiple pathways. This choice fits a model that includes non-thermal stressors such as air pollution and green space, but studies targeting heat-specific effects with cause-specific mortality should reconsider which non-thermal predictors remain appropriate. We observe reduced predictive performance in 2021, consistent with the COVID-19 pandemic as a systemic shock that shifted mortality patterns beyond the scope of the training data. The state-level design limits inference to population-level associations, and state-level vulnerability features may not capture within-state heterogeneity, which is the core ecological fallacy concern. The small number of spatial units, with 16 states and N $\approx$ 176, constrains the complexity of models that can be fitted reliably. This constraint also supports our use of Random Forest, which showed greater stability and lower overfitting risk than more complex boosting algorithms. These limitations frame how conclusions should be interpreted and applied.

Conclusions

These conclusions translate the results into policy actions for Malaysia and settings with similar constraints. Our findings, especially those linked to Layer 2 heat stressors, support a dual-track intervention strategy. In metropolitan high-mortality areas of Peninsular Malaysia, policies should prioritise exposure reduction, including tighter control of ozone precursor emissions and expanded urban green infrastructure to reduce heat stress. In high-vulnerability regions such as East Malaysia, priorities shift to adaptive capacity, including stronger healthcare systems, improved housing standards, climate-resilient employment pathways, and targeted education and early warning systems. These outputs can guide government and stakeholders in allocating resources for multi-risk reduction planning. Future work should prioritise individual-level cohort studies to test ecological associations and evaluate the cost-effectiveness of the dual-track strategy. Sustainable and resilient public health infrastructure remains a core requirement.

Data availability

The data that support the findings of this study are available from the first author upon reasonable request.

References

Perkins-Kirkpatrick, S. E. & Lewis, S. C. Increasing trends in regional heatwaves. Nat. Commun. 11, 3357 (2020).
IPCC. AR6 Synthesis Report: Climate Change 2023. Intergovernmental Panel on Climate Change (IPCC) (2023).
Dong, Z. et al. Heatwaves in southeast asia and their changes in a warmer world. Earth’s Future 9 (2021).
Tran, D. et al. Spatial patterns of health vulnerability to heatwaves in vietnam. Int. J. Biometeorol. 64, 863–872 (2020).
Conlon, K. et al. Mapping human vulnerability to extreme heat: A critical assessment of heat vulnerability indices created using principal components analysis. Environ. Health Perspect. 128, 1–14 (2020).
Guo, X., Huang, G., Jia, P. & Wu, J. Estimating fine-scale heat vulnerability in beijing through two approaches: Spatial patterns, similarities, and divergence. Remote Sens. 11, 2358 (2019).
Bernstein, A. et al. Warm season and emergency department visits to US children’s hospitals. Environ. Health Perspect. 130, 017001 (2022).
Inostroza, L., Palme, M. & Barrera, F. A heat vulnerability index: Spatial patterns of exposure, sensitivity and adaptive capacity for santiago de chile. PLoS ONE 11, e0162464 (2016).
Loughnan, M., Tapper, N. & Phan, T. Identifying vulnerable populations in subtropical brisbane, australia: A guide for heatwave preparedness and health promotion. Int. Scholar. Res. Notices 2014, 821759 (2014).
Siddiqui, S. et al. A Systematic review and meta-analysis of the impact of environmental heat exposure on cardiovascular diseases, chronic respiratory diseases and diabetes mellitus in Low- & Middle-Income Countries. Environ. Res. 121980 (2025).
Yu, W. et al. Daily average temperature and mortality among the elderly: a meta-analysis and systematic review of epidemiological evidence. Int. J. Biometeorol. 56, 569–581 (2012).
Kamal, N. et al. Extreme heat vulnerability assessment in tropical region: a case study in malaysia. Climate Dev. 14, 472–486 (2022).
Jeon, G. & Kim, W. Sub-district-level heat vulnerability assessment using ecostress: A case study of busan and daegu metropolitan cities. Korea. J. Remote Sens. 40, 1127–1139 (2024).
Ramsey, V., Scannell, C., Dunbar, T., Sanderson, M. & Lowe, J. Co-producing an urban heat climate service for UK cities: A case study of belfast, northern Ireland. Clim. Serv. 34, 100464 (2024).
Sestito, B., Reimann, L., Mazzoleni, M., Botzen, W. & Aerts, J. Identifying vulnerability factors associated with heatwave mortality: A spatial statistical analysis across europe. Environ. Res. Lett. (2025).
Guo, F., Zheng, R., Zhao, J., Zhang, H. & Dong, J. Framework of street grid-based urban heat vulnerability assessment: Integrating entropy weight method and bpnn model. Urban Climate 56, 102067 (2024).
Pham, C. & Lin, T. Assessing heat vulnerability in ho chi minh city: insights from local climate zones and the heat vulnerability index model. Proc. SPIE 13263, 30–34 (2025).
Coates, L., Haynes, K., O’Brien, J., McAneney, J. & Oliveira, F. Exploring 167 years of vulnerability: An examination of extreme heat events in australia 1844–2010. Environ. Sci. & Policy 42, 33–44 (2014).
Yang, J. et al. Leveraging machine learning to explore nonlinear associations between urban heat vulnerability and morbidity risk. Urban Clim. 59, 102320 (2025).
Hansen, A., Bi, L., Saniotis, A. & Nitschke, M. Vulnerability to extreme heat and climate change: is ethnicity a factor?. Glob. Health Action 6, 21364 (2013).
Hasan, M. Creating an urban heat vulnerability index (HVI) in the face of climate change employing geospatial technology in Halifax, Canada. Master’s thesis, Saint Mary’s University, Halifax, N.S. (2024).
Kjellstrom, T., Oppermann, E. & Lee, J. Climate Change, Occupational Heat Stress, Human Health, and Socioeconomic Factors (Springer, Cham, 2020).
Aboulnaga, M., Trombadore, A., Mostafa, M. & Abouaiana, A. Environmental framework for mitigating high temperatures in global cities exploiting urban greening: Two case studies: Cairo, egypt, and Rome, italy. Livable Cities (2024).
Christenson, M. et al. Heat vulnerability index mapping for milwaukee and wisconsin. J. Public Health Manag. Pract. 23, 396–403 (2017).
Harlan, S., Brazel, A., Prashad, L., Stefanov, W. & Larsen, L. Neighborhood microclimates and vulnerability to heat stress. Soc. Sci. Med. 63, 2847–2863 (2006).
Wang, Q. et al. The relationship between population heat vulnerability and urbanization levels: A county-level modeling study across china. Environ. Int. 156, 106742 (2021).
Fong, C., Aghamohammadi, N., Ramakreshnan, L., Sulaiman, N. & Mohammadi, P. Holistic recommendations for future outdoor thermal comfort assessment in tropical southeast asia: A critical appraisal. Sustain. Cities Soc. 46, 101428 (2019).
Adger, W. Vulnerability. Glob. Environ. Chang. 16, 268–281 (2006).
Cutter, S., Boruff, B. & Shirley, W. Social vulnerability to environmental hazards. Soc. Sci. Q. 84, 242–261 (2003).
Analitis, A. et al. Synergistic effects of ambient temperature and air pollution on health in Europe: Results from the PHASE project. Int. J. Environ. Res. Public Health 15, 1856 (2018).
Orellano, P., Reynoso, J., Quaranta, N., Bardach, A. & Ciapponi, A. Short-term exposure to particulate matter (pm10 and $PM_{2.5}$), nitrogen dioxide (no2), and ozone (o3) and all-cause and cause-specific mortality: Systematic review and meta-analysis. Environ. Int. 142, 105876 (2020).
Department of Statistics Malaysia (DOSM). OpenDOSM: Open Data Portal Malaysia. Available at: https://open.dosm.gov.my. Accessed: 10 January 2026.
Liu, X. et al. Mapping urban heat vulnerability of extreme heat in hangzhou via comparing two approaches. Complexity 2020, 9717658 (2020).
Nayak, S. et al. Development of a heat vulnerability index for new york state. Public Health 161, 127–137 (2018).
Kim, D., Deo, R., Lee, J. & Yeom, J. Mapping heatwave vulnerability in korea. Nat. Hazards 89, 35–55 (2017).
Bradford, K., Abrahams, L., Hegglin, M. & Klima, K. A heat vulnerability index and adaptation solutions for pittsburgh, pennsylvania. Environ. Sci. Technol. 49, 11303–11311 (2015).
Wolf, T. & McGregor, G. The development of a heat wave vulnerability index for london, united kingdom. Weather Clim. Extrem. 1, 59–68 (2013).
Loughnan, M., Nicholls, N. & Tapper, N. Mapping heat health risks in urban areas. Int. J. Populat. Res. 2012, 518687 (2012).
Reid, C. et al. Mapping community determinants of heat vulnerability. Environ. Health Perspect. 117, 1730–1736 (2009).
Hare, J. et al. A vulnerability assessment of fish and invertebrates to climate change on the northeast U.S. continental shelf. PLOS ONE 11, e0146756 (2016).
Pandey, R. & Jha, S. Climate vulnerability index - measure of climate change vulnerability to communities: A case of rural lower himalaya, india. Mitig. Adapt. Strat. Glob. Change 17, 487–506 (2012).
Fong, C. et al. Traits of adaptive outdoor thermal comfort in a tropical urban microclimate. Atmosphere 14, 852 (2023).
Thornton, P., Ericksen, P., Herrero, M. & Challinor, A. Climate variability and vulnerability to climate change: a review. Glob. Change Biol. 20, 3313–3328 (2014).
Glick, P., Stein, B. & Edelson, N. Scanning the Conservation Horizon: A Guide to Climate Change Vulnerability Assessment. (National Wildlife Federation, 2011).
Razgour, O. et al. Considering adaptive genetic variation in climate change vulnerability assessment reduces species range loss projections. Proc. Natl. Acad. Sci. U.S.A. 116, 10418–10423 (2019).
Cardona, O. et al. Determinants of risk: Exposure and vulnerability. In Managing the Risks of Extreme Events and Disasters to Advance Climate Change Adaptation: Special Report of the Intergovernmental Panel on Climate Change, 65–108 (2012).
Lee, W. et al. Synergic effect between high temperature and air pollution on mortality in northeast asia. Environ. Res. 178, 108735 (2019).
Newby, D. et al. Expert position paper on air pollution and cardiovascular disease. Eur. Heart J. 36, 83–93 (2015).
Wilson, B. & Chakraborty, A. Mapping vulnerability to extreme heat events: lessons from metropolitan chicago. J. Environ. Planning Manage. 62, 1065–1088 (2019).
Rowland, T. Thermoregulation during exercise in the heat in children: old concepts revisited. J. Appl. Physiol. 105, 718–724 (2008).
Smith, C. Pediatric thermoregulation: Considerations in the face of global climate change. Nutrients 11, 2010 (2019).
Benmarhnia, T., Deguen, S., Kaufman, J. & Smargiassi, A. Vulnerability to heat-related mortality: A systematic review, meta-analysis, and meta-regression analysis. Epidemiology 26, 781–793 (2015).
Gray, N., Lewis, A. & Moller, S. Evaluating disparities in air pollution as a function of ethnicity, deprivation and sectoral emissions in england. Environ. Int. 194, 109146 (2024).
Guo, Y. et al. Heat wave and mortality: A multicountry, multicommunity study. Environ. Health Perspect. 125, 087006 (2017).
Toloo, G., Yu, W., Aitken, P., FitzGerald, G. & Tong, S. The impact of heatwaves on emergency department visits in brisbane, australia: a time series study. Crit. Care 18, R69 (2014).
Cui, G. et al. The relationship among social capital, ehealth literacy and health behaviours in chinese elderly people: a cross-sectional study. BMC Public Health 21, 1–9 (2021).
Razzak, J. et al. Impact of community education on heat-related health outcomes and heat literacy among low-income communities in karachi, pakistan: A randomised controlled trial. BMJ Glob. Health 7, e006845 (2022).
Sørensen, K. et al. Health literacy and public health: A systematic review and integration of definitions and models. BMC Public Health 12, 1–13 (2012).
Fong, C., Aghamohammadi, N., Ramakreshnan, L. & Sulaiman, N. Evaluation of secondary school student’s outdoor thermal comfort during peak urban heating hours in greater kuala lumpur. Journal of Health and Translational Medicine (JUMMEC), 3–11 (2020).
Nunes, A. Exploring the interactions between vulnerability, resilience and adaptation to extreme temperatures. Nat. Hazards 109, 2261–2293 (2021).
Heaton, M. et al. Characterizing urban vulnerability to heat stress using a spatially varying coefficient model. Spatial Spatio-Temp. Epidemiol. 8, 23–33 (2014).
Bowler, D., Buyung-Ali, L., Knight, T. & Pullin, A. Urban greening to cool towns and cities: A systematic review of the empirical evidence. Landsc. Urban Plan. 97, 147–155 (2010).
Lafortezza, R., Chen, J., van den Bosch, C. & Randrup, T. Nature-based solutions for resilient landscapes and cities. Environ. Res. 165, 431–441 (2018).
Huete, A. et al. Overview of the radiometric and biophysical performance of the modis vegetation indices. Remote Sens. Environ. 83, 195–213 (2002).
Cohen, A. et al. Estimates and 25-year trends of the global burden of disease attributable to ambient air pollution: an analysis of data from the global burden of diseases study 2015. The Lancet 389, 1907–1918 (2017).
Salleh, S. et al. The development of the vulnerability index (vi) using principal component analysis (pca). Int. J. Sustain. Constr. Eng. Technol. 14, 16–36 (2023).
Niu, Y. et al. A systematic review of the development and validation of the heat vulnerability index: Major factors, methods, and spatial units. Curr. Clim. Change Reports 7, 87–97 (2021).
Georgescu, M., Morefield, P., Bierwagen, B. & Weaver, C. Urban adaptation can roll back warming of emerging megapolitan regions. Proc. Natl. Acad. Sci. U.S.A. 111, 2909–2914 (2014).
Vlahov, D. & Galea, S. Urbanization, urbanicity, and health. J. Urban Health (2002).
Gasparrini, A. et al. Mortality risk attributable to high and low ambient temperature: A multicountry observational study. The Lancet 386, 369–375 (2015).
Lelieveld, J., Evans, J., Fnais, M., Giannadaki, D. & Pozzer, A. The contribution of outdoor air pollution sources to premature mortality on a global scale. Nature 525, 367–371 (2015).
Monks, P. et al. Tropospheric ozone and its precursors from the urban to the global scale from air quality to short-lived climate forcer. Atmos. Chem. Phys. 15, 8889–8973 (2015).
Johnston, F. et al. Estimated global mortality attributable to smoke from landscape fires. Environ. Health Perspect. 120, 695–701 (2012).
Cutler, D. et al. Random forests for classification in ecology. Ecology 88, 2783–2792 (2007).
Cawley, G. & Talbot, N. On over-fitting in model selection and subsequent selection bias in performance evaluation. J. Mach. Learn. Res. 11, 2079–2107 (2010).
Bentéjac, C., Csörgő, A. & Martínez-Muñoz, G. A comparative analysis of gradient boosting algorithms. Artif. Intell. Rev. 54, 1937–1967 (2021).
Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
Lundberg, S. & Lee, S. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems 30 (NIPS 2017), 4765–4774 (2017).
Reichstein, M. et al. Deep learning and process understanding for data-driven earth system science. Nature 566, 195–204 (2019).
Kravchenko, J., Abernethy, A., Fawzy, M. & Lyerly, H. Minimization of heatwave morbidity and mortality. Am. J. Prev. Med. 44, 274–282 (2013).
Roux, A. et al. Neighborhood of residence and incidence of coronary heart disease. N. Engl. J. Med. 345, 99–106 (2001).

Funding

The authors would like to express their gratitude to the Universiti Malaya Living Lab Research Grant (Grant Ref. No: LL2024JNZ015) for providing the financial support necessary for this research. The support has been instrumental in facilitating the successful completion of this study.

Author information

Authors and Affiliations

Institute for Advanced Studies, Universiti Malaya, 50603, Kuala Lumpur, Malaysia
Zecheng Li & Chng Saun Fong
School of Design and Built Environment, Curtin University Sustainability Policy (CUSP) Institute, Perth, 6102, Australia
Nasrin Aghamohammadi
Department of Social and Preventive Medicine, Faculty of Medicine, Universiti Malaya, 50603, Kuala Lumpur, Malaysia
Nasrin Aghamohammadi
Harry Butler Institute, Murdoch University, Perth, 6150, Australia
Nasrin Aghamohammadi
Department of Chemical Engineering, Faculty of Engineering, Universiti Malaya, 50603, Kuala Lumpur, Malaysia
Nik Meriam Sulaiman
Department of Software Engineering, Faculty of Computer Science and Information Technology, Universiti Malaya, 50603, Kuala Lumpur, Malaysia
Siti Hafizah Ab Hamid

Authors

Zecheng Li
Chng Saun Fong
Nasrin Aghamohammadi
Nik Meriam Sulaiman
Siti Hafizah Ab Hamid

Contributions

Z. L.: Conceptualization, Methodology, Software, Formal Analysis, Data Curation, Visualization, Writing - Original Draft. C. S. F.: Methodology, Validation, Writing - Review & Editing, Supervision. N. A.: Resources, Project Administration, Writing - Review & Editing. N. M. S.: Methodology, Investigation. S. H. A. H.: Investigation, Resources, Funding Acquisition, Writing - Review & Editing.

Corresponding authors

Correspondence to Nasrin Aghamohammadi or Siti Hafizah Ab Hamid.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Li, Z., Fong, C.S., Aghamohammadi, N. et al. Multi-scalar risk drivers for a heat vulnerability assessment framework using machine learning algorithms. Sci Rep 16, 10594 (2026). https://doi.org/10.1038/s41598-026-44880-z

Received: 09 October 2025
Accepted: 16 March 2026
Published: 30 March 2026
Version of record: 31 March 2026
DOI: https://doi.org/10.1038/s41598-026-44880-z