Abstract
Accurate prediction in structured hospital monitoring data is challenging because inpatient datasets are high-dimensional and often contain redundant features and suboptimal hyperparameter settings. This problem is important because unreliable prediction can limit the effectiveness of hospital monitoring and clinical decision support. To address this, this study proposes a unified optimization framework that integrates the Al-Biruni Earth Radius (BER) metaheuristic with the Feature-Transformed Learning Model (FTLM) for both binary feature selection and continuous hyperparameter optimization. BER is first applied in a discrete search space to identify informative subsets of vital-sign, demographic, clinical, and temporal variables from the Patient Vital Signs and Event Tracking dataset, and then in a continuous space to tune FTLM hyperparameters under the same computational budget used for competing optimizers, including GWO, PSO, BA, WAO, SBO, SCA, FA, GA, and SAO. At baseline, FTLM achieved a mean squared error (MSE) of 0.012028 and \(R^2\) of 0.782413. After BER-based feature selection, performance improved to an MSE of \(9.80 \times 10^{-3}\) and \(R^2\) of 0.860654, with correlation of 0.848181 and Nash–Sutcliffe efficiency of 0.879577. Following BER-driven hyperparameter optimization, FTLM attained an MSE of \(7.43 \times 10^{-7}\), RMSE of \(8.62 \times 10^{-4}\), correlation coefficient of 0.955593543, \(R^2\) of 0.961124043, and Willmott Index of 0.963281686, achieving the strongest empirical performance among the evaluated optimizers under the same experimental setting. To further assess generalizability, an external validation experiment was conducted on an independent Human Vital Sign Dataset containing 200,000 samples, where BER + FTLM again achieved the strongest empirical performance among the evaluated optimizers. These findings show that BER provides stable convergence, reduced variance, and strong predictive alignment for structured clinical data modeling.
Similar content being viewed by others
Acknowledgements
Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2026R716), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Funding
No Fund
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Conflict of interest
All authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A External validation on an independent human vital signs dataset
Appendix A External validation on an independent human vital signs dataset
This appendix presents an external validation and robustness analysis of the proposed BER–FTLM framework using an independent structured vital-sign dataset. The purpose of this analysis is to examine whether the empirical behavior of the proposed optimization pipeline is preserved when transferred beyond the original Patient Vital Signs and Event Tracking dataset to a separate, structurally related human vital-sign dataset. Accordingly, this appendix should be interpreted as an additional experimental validation study rather than as the introduction of a new optimizer, a new prediction model, or a formal theoretical contribution.
The independent validation experiment uses the Human Vital Sign Dataset, publicly available on Kaggle, which contains structured physiological measurements intended for medical diagnostics, patient monitoring, and predictive analytics. The dataset includes both original physiological and demographic attributes and derived health indicators, including heart-rate variability, pulse pressure, body mass index, and mean arterial pressure. These characteristics make it suitable for evaluating the robustness of the proposed pipeline on a second vital-sign modeling problem with related clinical structure but distinct data composition.
A.1 Motivation for external validation
External validation is essential for determining whether a predictive modeling pipeline captures generalizable structure or primarily adapts to the statistical properties of a single dataset. Although the main experiments demonstrate the effectiveness of the BER–FTLM framework on the Patient Vital Signs and Event Tracking dataset, validation on only one dataset is insufficient for establishing broader empirical robustness. Therefore, an additional experiment was conducted on an independent human vital-sign dataset to assess whether the proposed BER-assisted optimization pipeline maintains its performance pattern under a different structured vital-sign data source.
The external dataset contains 200,000 samples and 17 columns, including heart rate, respiratory rate, timestamp, body temperature, oxygen saturation, systolic blood pressure, diastolic blood pressure, age, gender, weight, height, derived heart-rate variability, derived pulse pressure, derived body mass index, derived mean arterial pressure, patient identifier, and risk category. The inclusion of both direct physiological measurements and derived indicators provides a multivariate clinical representation that is appropriate for evaluating feature interaction learning, predictive stability, and optimizer behavior in a structured vital-sign modeling context.
The goal of this external validation is not to claim a new theoretical or algorithmic contribution. Rather, it evaluates whether the proposed BER–FTLM framework, which combines BER-based optimization with FTLM prediction, retains its empirical effectiveness when applied to a second dataset. This distinction is important because the contribution of the present work lies in the reproducible integration, benchmarking, and validation of a unified optimization-aware modeling pipeline, rather than in the formal invention of a new optimizer or a new convergence theory.
For this reason, the interpretation of the comparative optimization results is stated in empirical terms. BER is not presented as universally superior to all optimization methods. Instead, the external validation analysis evaluates whether BER shows the strongest empirical performance among the evaluated optimizers under the same experimental setting. This formulation preserves the comparative value of the results while avoiding unsupported general claims beyond the evaluated datasets, models, metrics, and computational conditions.
A.2 Independent dataset description
The external validation dataset was obtained from Kaggle: the Human Vital Sign Dataset, also described as the human vital dataset 2024. The dataset is released under the CC0: Public Domain license and contains the file human_vital_signs_dataset_2024.csv. It was designed to support research in medical diagnostics, patient monitoring, and predictive analytics by providing structured physiological measurements together with derived health indicators. Available at: https://www.kaggle.com/datasets/nasirayub2/human-vital-sign-dataset/data
The dataset contains 200,000 samples and 17 columns, providing a large-scale structured vital-sign dataset suitable for evaluating the generalization behavior of the proposed optimization-aware prediction pipeline. Its structure is clinically relevant because it combines direct physiological measurements, demographic variables, anthropometric variables, derived cardiovascular and body-composition indicators, and a categorical risk label within a single tabular representation. This makes the dataset appropriate for assessing whether the BER–FTLM framework remains empirically effective when transferred to a second vital-sign dataset with a different data source and a different target definition Table 8.
In addition to the original variables, the dataset contains derived physiological and anthropometric indicators. These derived features provide clinically interpretable transformations of the raw measurements and support multivariate modeling by encoding relationships that are not directly represented by individual variables alone. The derived variables are summarized in Table 9.
The target variable is Risk Category, with two categories: High Risk and Low Risk. High-risk cases are defined by abnormal vital-sign or body-composition thresholds, including heart rate greater than 90 bpm or less than 60 bpm, respiratory rate greater than 20 breaths per minute or less than 12 breaths per minute, body temperature greater than 37.5\(^{\circ }\)C or less than 36.0\(^{\circ }\)C, oxygen saturation less than 95%, systolic blood pressure greater than 140 mmHg or less than 110 mmHg, diastolic blood pressure greater than 90 mmHg or less than 70 mmHg, or body mass index greater than 30 or less than 18.5. Low-risk cases are those that do not satisfy any of these high-risk criteria. This target definition provides a clinically interpretable risk-stratification label and enables the external experiment to evaluate whether the proposed pipeline can model structured physiological risk patterns beyond the original dataset.
A.3 External dataset exploratory analysis
Exploratory analysis was conducted on the external Human Vital Sign Dataset to examine whether the dataset contains physiologically meaningful structure before applying baseline modeling and BER-based optimization. The analysis focuses on heart-rate distribution by risk category, heart-rate behavior across gender and risk strata, normalized multivariate patient profiles, and the distribution of derived heart-rate variability. These visual diagnostics provide evidence that the dataset contains both strong single-variable signals and weaker derived-feature patterns, thereby motivating the use of multivariate learning rather than reliance on isolated threshold rules.
Heart rate distribution by risk category in the external Human Vital Sign Dataset. The left panel shows the density distribution of heart rate for high-risk and low-risk groups, while the right panel compares heart-rate dispersion between the two risk categories using boxplots. The high-risk group shows a higher mean heart rate, approximately 83.6 bpm, compared with the low-risk group, approximately 75.1 bpm, indicating that heart-rate elevation contributes meaningfully to risk stratification in the external dataset.
Figure 48 supports the clinical plausibility of the external dataset because high-risk patients exhibit higher heart-rate values than low-risk patients. This pattern is consistent with the target definition, in which elevated heart rate is one of the abnormal physiological conditions contributing to high-risk classification. The figure also supports the use of vital-sign-based modeling because the risk groups are not arbitrary labels; rather, they reflect measurable differences in physiological parameters. At the same time, the visible overlap between the two groups indicates that heart rate alone is not sufficient to characterize all risk behavior, which motivates multivariate modeling using the full set of physiological, demographic, and derived features.
Heart rate distribution by gender and risk category. The left panel compares heart-rate density distributions for male and female patients, while the right panel reports mean heart rate stratified by gender and risk category. Mean heart rate is nearly identical between female and male patients, approximately 79.6 bpm and 79.5 bpm, respectively. However, within both genders, the high-risk subgroup shows higher mean heart rate than the low-risk subgroup.
Figure 49 indicates that the overall heart-rate distribution is highly similar across male and female patients, with nearly identical mean values. This suggests that the heart-rate difference associated with risk category is not primarily driven by gender imbalance. Within both gender groups, the high-risk subgroup demonstrates a higher mean heart rate than the low-risk subgroup, indicating that the risk-related signal is expressed within demographic strata rather than only through demographic separation. This observation supports the interpretation that the predictive models are expected to learn risk-relevant physiological structure rather than relying solely on gender-based stratification.
Normalized patient-profile radar chart comparing high-risk and low-risk groups. The radar chart summarizes normalized values for age, BMI, respiratory rate, temperature, systolic blood pressure, diastolic blood pressure, HRV, and \(\hbox {SpO}_2\). The visualization provides a compact multivariate comparison of risk-group profiles and illustrates that the external dataset contains multi-parameter physiological structure rather than a single-variable risk signal.
Figure 50 provides a compact multivariate profile of the external dataset by comparing normalized physiological and derived indicators across high-risk and low-risk groups. The radar chart shows that risk representation is distributed across several variables rather than being reducible to a single measurement. This is important because the proposed modeling framework is designed for structured vital-sign data in which physiological variables, demographic descriptors, and derived indicators may interact. The observed multi-parameter structure supports the suitability of the external dataset for evaluating the proposed BER–FTLM pipeline and aligns with the methodological premise that structured vital-sign prediction should preserve cross-variable dependencies instead of relying only on isolated thresholds.
Derived_HRV distribution analysis in the external dataset. The figure reports the overall Derived_HRV distribution, the Derived_HRV distribution by risk category, and a boxplot comparison between high-risk and low-risk groups. The mean and median Derived_HRV are both approximately 0.1000, and the risk-category boxplots show highly similar distributions.
Figure 51 shows that Derived_HRV has a stable overall distribution, with both the mean and median close to 0.1000. The high-risk and low-risk boxplots show highly similar distributions, indicating that Derived_HRV alone may not strongly separate the two risk categories in this dataset. This finding is useful because it demonstrates that not every derived feature provides strong marginal discrimination. Therefore, the external dataset is an appropriate test case for feature interaction learning and optimizer-guided modeling: strong prediction should arise from the combined structure of multiple physiological and derived variables rather than from a single-feature rule.
A.4 Baseline model performance on the external dataset
To establish a non-optimized reference point on the independent Human Vital Sign Dataset, the same baseline learning models were evaluated under the external validation setting. This baseline stage is important because it determines whether the model-ranking pattern observed in the main experiments is preserved when the framework is transferred to a separate structured vital-sign dataset. The results are reported in Table 10.
The baseline evaluation on the independent dataset shows that FTLM remains the strongest non-optimized model, achieving the lowest MSE (0.0478), RMSE (0.2186), MAE (0.1423), MBE (0.0952), and RRMSE (4.25), while maintaining strong correlation (\(r=0.949\)) and agreement (WI = 0.910). CTSM achieves the next-best baseline profile, with MSE = 0.1025, RMSE = 0.3202, \(r=0.944\), and WI = 0.907, indicating that contextual structured modeling remains competitive but does not exceed FTLM in error-based performance. VAST and LSTM show intermediate performance, whereas DTCN, TST, and VAE exhibit progressively larger errors and weaker agreement indices.
This baseline pattern supports the use of FTLM as the base learner for the subsequent optimization stage on the external dataset. More importantly, it demonstrates that the baseline model-ranking behavior observed in the original dataset is broadly reproducible in the independent dataset: FTLM remains the most favorable non-optimized configuration, while the remaining architectures follow a consistent degradation pattern across magnitude-based, correlation-based, and agreement-based metrics.
Facet grid of baseline model performance metrics on the external dataset.
Figure 52 provides a metric-wise visualization of the baseline model comparison. The error-based panels show that FTLM has the smallest values for MSE, RMSE, MAE, MBE, and RRMSE, whereas the agreement-oriented panels show that FTLM remains highly competitive in r, \(R^2\), NSE, and WI. The visual pattern confirms that FTLM is not favored by only one metric, but maintains a balanced baseline profile across the full evaluation suite.
Box plot with horizontal swarm overlay for baseline model metrics..
Figure 53 summarizes the distribution of baseline metric values across models using horizontal boxplots and swarm overlays. The figure shows that error metrics have wider dispersion across model families, particularly for MSE, RMSE, MAE, MBE, and RRMSE, while correlation- and agreement-based metrics are concentrated within a narrower range. This indicates that baseline models differ most strongly in magnitude accuracy, even though most models preserve relatively high association with the target structure.
Faceted MSE comparison across baseline learning models.
Figure 54 isolates the MSE behavior of each baseline learner. The faceted display clearly shows the progressive increase in MSE from FTLM to CTSM, VAST, LSTM, DTCN, TST, and VAE. This supports the tabulated results by showing that FTLM provides the lowest baseline squared-error magnitude, whereas VAE produces the largest MSE among the non-optimized models.
Mean metric comparison with error bars across baseline metrics.
Figure 55 provides an aggregate view of baseline metric behavior using mean values and error bars. The large scale of RRMSE relative to the remaining metrics highlights the importance of considering scale-normalized error separately from bounded agreement measures. The figure also illustrates that the baseline evaluation contains both low-range error metrics and high-range association metrics, reinforcing the need for a multi-metric assessment rather than reliance on a single performance criterion.
A.5 BER-based hyperparameter optimization on the external dataset
After establishing FTLM as the strongest baseline learner on the external Human Vital Sign Dataset, a second stage of evaluation was conducted to examine the effect of optimizer choice on the FTLM configuration. The goal of this stage is to determine whether the empirical optimization pattern observed in the primary experiments is preserved on an independent dataset. The comparative results for BER, GWO, PSO, BA, WAO, SBO, and SCA are reported in Table 11.
On the external dataset, BER + FTLM achieved the lowest MSE (0.00023), RMSE (0.01517), MAE (0.00062), MBE (0.00041), and RRMSE (1.52), while also obtaining the highest r (0.989), R2 (0.978), NSE (0.976), and WI (0.981). GWO + FTLM and PSO + FTLM provide the next strongest optimized configurations, while BA + FTLM, WAO + FTLM, SBO + FTLM, and SCA + FTLM show a gradual decline across the same performance criteria. These results reproduce the main empirical pattern observed in the original experimental setting: BER provides the strongest empirical configuration for FTLM under the evaluated optimization setting on the external Human Vital Sign Dataset.
At the same time, the interpretation of these results remains empirical and dataset-specific. The observed ranking indicates that BER is a stable and effective optimizer for this structured vital-sign modeling setting, but it does not establish universal optimizer dominance across all learning tasks, data distributions, or optimization scenarios. The primary significance of Table 11 is that the relative advantage of BER is reproduced on an independent dataset, thereby strengthening the robustness of the experimental findings.
Facet grid of optimized FTLM performance metrics across optimizers.
Figure 56 presents a metric-wise view of the optimized FTLM configurations. Across all error metrics, BER + FTLM occupies the most favorable position, followed by a largely monotonic progression through GWO + FTLM, PSO + FTLM, BA + FTLM, WAO + FTLM, SBO + FTLM, and SCA + FTLM. A complementary reverse ordering is visible for the agreement-based metrics, where BER + FTLM achieves the highest values. This figure confirms that the advantage of BER is consistent across multiple criteria rather than being concentrated in a single metric.
Box plot with horizontal swarm plot for optimized metric distributions..
Figure 57 summarizes the distribution of optimized metric values across all evaluated optimizers. The plot shows that the optimized models are relatively tightly grouped for bounded association metrics such as r, R2, NSE, and WI, whereas the spread is somewhat larger for RRMSE and the small-scale error metrics. This pattern indicates that, although all evaluated optimizers yield strong FTLM performance, optimizer choice still affects the absolute magnitude of the residual errors and relative-error behavior.
Faceted box-and-swarm comparison for each optimized metric..
Figure 58 provides a faceted comparison of each optimized metric across the seven optimizer-enhanced FTLM models. The figure makes the ranking particularly transparent: BER + FTLM occupies the most favorable position in every panel, and the remaining optimizers follow a regular deterioration pattern. The visual consistency across the metric panels supports the interpretation that the observed optimizer ranking is systematic under the current external validation setting.
Heatmap of optimized model metric comparison..
Figure 59 offers a compact tabular visualization of the optimized performance metrics. The heatmap clearly shows that BER + FTLM combines the smallest error values with the strongest agreement and association indices. The ordered gradient across rows also indicates that the differences between optimizers are not random fluctuations but follow a coherent comparative structure across the full metric set.
Q–Q plots for all optimized performance metrics.
Figure 60 shows Q–Q plots for the optimized metric values. The points align closely with the fitted reference lines for all reported metrics, reflecting a regular ordered progression of optimizer outcomes across the evaluated configurations. In the present context, the importance of this figure is descriptive: it shows that the metric values vary smoothly across optimizers and do not exhibit abrupt irregularities that would contradict the comparative pattern reported in Table 11.
Pair plot with regression lines for optimized model performance metrics.
Figure 61 illustrates pairwise relationships among the optimized performance metrics. Strong linear trends are visible between error-based measures, and inverse relationships appear between error measures and agreement-based metrics. This pairwise structure reinforces the internal consistency of the evaluation framework: optimizers that reduce residual error also improve correlation and agreement metrics, with BER + FTLM occupying the strongest overall position in that joint metric space.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Alzakari, S.A., Eid, M.M., Alhussan, A.A. et al. Structured vital sign prediction in hospital environments via an Al-Biruni earth radius optimization–driven unified metaheuristic framework. Sci Rep (2026). https://doi.org/10.1038/s41598-026-53812-w
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41598-026-53812-w
















