Table 2 Performance of clinical, supervised, and zero-shot models for CVD classification

From: Foundation models enable wearable signal screening for cardiovascular disease among people living with HIV

Model	Acc*	Recall	F1 Score	AUROC	AP Score
Clinical only
TabFPN	0.708 ± 0.10	0.500 ± 0.20	0.519 ± 0.19	0.667 ± 0.11*	0.412 ± 0.15*
ElasticNet	0.701 ± 0.09	0.500 ± 0.19	0.500 ± 0.17	0.705 ± 0.09**	0.340 ± 0.13
LightGBM	0.701 ± 0.10	0.500 ± 0.21	0.500 ± 0.17	0.667 ± 0.10*	0.362 ± 0.12
XGBoost	0.701 ± 0.09	0.500 ± 0.18	0.500 ± 0.16	0.641 ± 0.09*	0.421 ± 0.14*
Random Forest	0.779 ± 0.08	1.000 ± 0.00	0.469 ± 0.16	0.744 ± 0.08**	0.433 ± 0.11**
Decision Tree	0.543 ± 0.12	0.250 ± 0.20	0.317 ± 0.16	0.513 ± 0.13	0.210 ± 0.11
Framingham^†	0.480 ± 0.10	0.500 ± 0.18	0.231 ± 0.15	0.551 ± 0.10	0.208 ± 0.12
D:A:D Score^†	0.497 ± 0.11	0.500 ± 0.17	0.200 ± 0.13	0.462 ± 0.12	0.210 ± 0.11
Clinical + HRV features
TabFPN	0.677 ± 0.10	0.500 ± 0.21	0.444 ± 0.17	0.715 ± 0.09*	0.410 ± 0.14*
ElasticNet	0.677 ± 0.11	0.500 ± 0.20	0.444 ± 0.18	0.622 ± 0.11	0.321 ± 0.12
LightGBM	0.500 ± 0.00	0.000 ± 0.00	0.000 ± 0.00	0.745 ± 0.10*	0.444 ± 0.11*
XGBoost	0.677 ± 0.10	0.500 ± 0.19	0.444 ± 0.17	0.692 ± 0.09*	0.328 ± 0.13
Random Forest	0.631 ± 0.10	0.500 ± 0.18	0.364 ± 0.16	0.675 ± 0.10*	0.379 ± 0.12
Decision Tree	0.543 ± 0.12	0.250 ± 0.17	0.317 ± 0.15	0.551 ± 0.13	0.208 ± 0.11
Clinical + ECG embeddings
TabFPN	0.616 ± 0.10	0.500 ± 0.18	0.348 ± 0.16	0.667 ± 0.10*	0.395 ± 0.13*
ElasticNet	0.677 ± 0.09	0.500 ± 0.18	0.444 ± 0.17	0.705 ± 0.09**	0.340 ± 0.13
LightGBM	0.500 ± 0.00	0.000 ± 0.00	0.000 ± 0.00	0.756 ± 0.10*	0.322 ± 0.13
XGBoost	0.500 ± 0.00	0.000 ± 0.00	0.000 ± 0.00	0.667 ± 0.11*	0.460 ± 0.13*
Random Forest	0.555 ± 0.11	0.500 ± 0.17	0.286 ± 0.14	0.578 ± 0.12	0.291 ± 0.11
Decision Tree	0.497 ± 0.12	0.250 ± 0.17	0.240 ± 0.13	0.513 ± 0.12	0.192 ± 0.10
PPG-derived representations
PCA^u	0.528 ± 0.11	0.250 ± 0.12	0.300 ± 0.12	0.689 ± 0.10	0.208 ± 0.10
NormWear (PPG)^z^‡	0.552 ± 0.10	0.250 ± 0.12	0.333 ± 0.14	0.560 ± 0.09	0.226 ± 0.10
NormWear (PPG)^d^‡	0.677 ± 0.09	0.500 ± 0.18	0.333 ± 0.14	0.667 ± 0.08	0.322 ± 0.11
PaPaGei^c	0.903 ± 0.06	1.000 ± 0.00	0.560 ± 0.16	0.769 ± 0.07**	0.489 ± 0.12**

Models were trained or evaluated using clinical features, ECG embeddings, or raw PPG signals. NormWear was used in zero-shot mode and PaPaGei embeddings were evaluated using a downstream classifier. Metrics are reported as mean ± SD over 10 bootstrap resamples. Framingham and D:A:D scores are included solely to characterise the study population’s baseline cardiovascular risk profile; because these scores predict future events (5–10 year risk) rather than detect current abnormalities, they are not appropriate performance benchmarks for our diagnostic tool. Values in bold are the best for that metric. Acc* stands for Balanced Accuracy. u = unsupervised feature extraction followed by supervised classifier; z = zero-shot inference without supervised classifier; d = zero-shot inference with supervised classifier; c = pretrained embeddings followed by supervised classifier. Asterisks on PaPaGei indicate statistically significant improvement over Random Forest (Clinical only) by Wilcoxon signed-rank test (* p < 0.05; ** p < 0.01), though the modest effect size (AUROC difference 0.025) and small sample require cautious interpretation.
^† Framingham and D:A:D scores characterise baseline population risk for future events; they are not performance benchmarks for current CVD detection.
^‡ NormWear PPG represents a multiply constrained baseline: cross-modal application (ECG model on PPG), minimal preprocessing (128-sample windows, no resampling), and zero local training. This establishes a performance floor for resource-limited deployment rather than optimal PPG model capabilities.
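
The footnote defines Acc* as balanced accuracy and reports each metric as mean ± SD over 10 bootstrap resamples. A minimal sketch of that summary is given below, assuming that resampling is applied to held-out predictions and that the SD is the sample standard deviation; neither detail is stated in the table, and the labels are hypothetical rather than study data. It also illustrates why a classifier that predicts no positives (recall 0.000) scores a balanced accuracy of exactly 0.500 ± 0.00, the pattern seen in several LightGBM and XGBoost rows.

```python
import random
import statistics


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    return 0.5 * (tp / n_pos + tn / n_neg)


def bootstrap_mean_sd(y_true, y_pred, n_resamples=10, seed=0):
    """Resample (label, prediction) pairs with replacement; return mean and sample SD."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    while len(scores) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        if len(set(yt)) < 2:
            continue  # redraw: balanced accuracy is undefined with one class
        yp = [y_pred[i] for i in idx]
        scores.append(balanced_accuracy(yt, yp))
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical labels (assumption, not study data): 4 positives, 12 negatives.
y_true = [1] * 4 + [0] * 12
# A degenerate classifier that never predicts the positive class.
all_negative = [0] * len(y_true)

mean, sd = bootstrap_mean_sd(y_true, all_negative)
print(f"{mean:.3f} ± {sd:.2f}")  # 0.500 ± 0.00
```

Because sensitivity is 0 and specificity is 1 in every resample, the score is constant at 0.5 and the SD collapses to zero, so a row of 0.500 ± 0.00 signals a degenerate decision threshold rather than a stable model. Threshold-free metrics such as AUROC can still be informative for the same model.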
