Table 2 Performance of clinical, supervised, and zero-shot models for CVD classification

From: Foundation models enable wearable signal screening for cardiovascular disease among people living with HIV

| Model | Acc* | Recall | F1 Score | AUROC | AP Score |
|---|---|---|---|---|---|
| Clinical only | | | | | |
| TabFPN | 0.708 ± 0.10 | 0.500 ± 0.20 | 0.519 ± 0.19 | 0.667 ± 0.11* | 0.412 ± 0.15* |
| ElasticNet | 0.701 ± 0.09 | 0.500 ± 0.19 | 0.500 ± 0.17 | 0.705 ± 0.09** | 0.340 ± 0.13 |
| LightGBM | 0.701 ± 0.10 | 0.500 ± 0.21 | 0.500 ± 0.17 | 0.667 ± 0.10* | 0.362 ± 0.12 |
| XGBoost | 0.701 ± 0.09 | 0.500 ± 0.18 | 0.500 ± 0.16 | 0.641 ± 0.09* | 0.421 ± 0.14* |
| Random Forest | 0.779 ± 0.08 | 1.000 ± 0.00 | 0.469 ± 0.16 | 0.744 ± 0.08** | 0.433 ± 0.11** |
| Decision Tree | 0.543 ± 0.12 | 0.250 ± 0.20 | 0.317 ± 0.16 | 0.513 ± 0.13 | 0.210 ± 0.11 |
| Framingham | 0.480 ± 0.10 | 0.500 ± 0.18 | 0.231 ± 0.15 | 0.551 ± 0.10 | 0.208 ± 0.12 |
| D:A:D Score | 0.497 ± 0.11 | 0.500 ± 0.17 | 0.200 ± 0.13 | 0.462 ± 0.12 | 0.210 ± 0.11 |
| Clinical + HRV features | | | | | |
| TabFPN | 0.677 ± 0.10 | 0.500 ± 0.21 | 0.444 ± 0.17 | 0.715 ± 0.09* | 0.410 ± 0.14* |
| ElasticNet | 0.677 ± 0.11 | 0.500 ± 0.20 | 0.444 ± 0.18 | 0.622 ± 0.11 | 0.321 ± 0.12 |
| LightGBM | 0.500 ± 0.00 | 0.000 ± 0.00 | 0.000 ± 0.00 | 0.745 ± 0.10* | 0.444 ± 0.11* |
| XGBoost | 0.677 ± 0.10 | 0.500 ± 0.19 | 0.444 ± 0.17 | 0.692 ± 0.09* | 0.328 ± 0.13 |
| Random Forest | 0.631 ± 0.10 | 0.500 ± 0.18 | 0.364 ± 0.16 | 0.675 ± 0.10* | 0.379 ± 0.12 |
| Decision Tree | 0.543 ± 0.12 | 0.250 ± 0.17 | 0.317 ± 0.15 | 0.551 ± 0.13 | 0.208 ± 0.11 |
| Clinical + ECG embeddings | | | | | |
| TabFPN | 0.616 ± 0.10 | 0.500 ± 0.18 | 0.348 ± 0.16 | 0.667 ± 0.10* | 0.395 ± 0.13* |
| ElasticNet | 0.677 ± 0.09 | 0.500 ± 0.18 | 0.444 ± 0.17 | 0.705 ± 0.09** | 0.340 ± 0.13 |
| LightGBM | 0.500 ± 0.00 | 0.000 ± 0.00 | 0.000 ± 0.00 | 0.756 ± 0.10* | 0.322 ± 0.13 |
| XGBoost | 0.500 ± 0.00 | 0.000 ± 0.00 | 0.000 ± 0.00 | 0.667 ± 0.11* | 0.460 ± 0.13* |
| Random Forest | 0.555 ± 0.11 | 0.500 ± 0.17 | 0.286 ± 0.14 | 0.578 ± 0.12 | 0.291 ± 0.11 |
| Decision Tree | 0.497 ± 0.12 | 0.250 ± 0.17 | 0.240 ± 0.13 | 0.513 ± 0.12 | 0.192 ± 0.10 |
| PPG-derived representations | | | | | |
| PCA^u | 0.528 ± 0.11 | 0.250 ± 0.12 | 0.300 ± 0.12 | 0.689 ± 0.10 | 0.208 ± 0.10 |
| NormWear (PPG)^z | 0.552 ± 0.10 | 0.250 ± 0.12 | 0.333 ± 0.14 | 0.560 ± 0.09 | 0.226 ± 0.10 |
| NormWear (PPG)^d | 0.677 ± 0.09 | 0.500 ± 0.18 | 0.333 ± 0.14 | 0.667 ± 0.08 | 0.322 ± 0.11 |
| PaPaGei^c | 0.903 ± 0.06 | 1.000 ± 0.00 | 0.560 ± 0.16 | 0.769 ± 0.07** | 0.489 ± 0.12** |

  1. Models were trained or evaluated using clinical features, ECG embeddings, or raw PPG signals. NormWear was used in zero-shot mode, and PaPaGei embeddings were evaluated using a downstream classifier. Metrics are reported as mean ± SD over 10 bootstrap resamples. Acc* stands for Balanced Accuracy. Values in bold are the best for that metric. Model-name suffixes: u = unsupervised feature extraction followed by a supervised classifier; z = zero-shot inference without a supervised classifier; d = zero-shot inference with a supervised classifier; c = pretrained embeddings followed by a supervised classifier. Framingham and D:A:D scores are included solely to characterise the study population's baseline cardiovascular risk profile; because these scores predict future events (5–10 year risk) rather than detect current abnormalities, they are not appropriate performance benchmarks for our diagnostic tool. Asterisks on PaPaGei indicate statistically significant improvement over Random Forest (Clinical only) at p < 0.05 (*) and p < 0.01 (**) by the Wilcoxon signed-rank test, though the modest effect size (AUROC difference 0.025) and small sample require cautious interpretation.
  2. NormWear PPG represents a multiply constrained baseline: cross-modal application (ECG model on PPG), minimal preprocessing (128-sample windows, no resampling), and zero local training. This establishes a performance floor for resource-limited deployment rather than optimal PPG model capabilities.
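
To illustrate how a "mean ± SD over 10 bootstrap resamples" entry of the kind reported in the table can be produced, the following is a minimal sketch and not the authors' pipeline. It uses only the Python standard library. The function names, the toy labels and scores, the 0.5 decision threshold, the use of the sample standard deviation, and the choice to skip resamples that contain a single class are all assumptions made for this example. One property the sketch makes visible: a classifier that predicts no positives has recall 0 and balanced accuracy 0.5, which is consistent with the rows reporting 0.500 ± 0.00 and 0.000 ± 0.00.

```python
import random
import statistics


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on positives) and specificity."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_mean_sd(y_true, outputs, metric, n_resamples=10, seed=0):
    """Resample cases with replacement and summarise the metric as (mean, SD).

    Assumes y_true contains both classes; resamples missing a class are
    redrawn (an assumption of this sketch, not a documented choice).
    """
    rng = random.Random(seed)
    n = len(y_true)
    values = []
    while len(values) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:
            continue
        values.append(metric(ys, [outputs[i] for i in idx]))
    # statistics.stdev is the sample SD (n - 1 denominator); an assumption here.
    return statistics.mean(values), statistics.stdev(values)


# Hypothetical toy data: 3 positive cases out of 10, with made-up model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.90, 0.70, 0.40, 0.80, 0.30, 0.20, 0.35, 0.10, 0.45, 0.05]
y_pred = [1 if s > 0.5 else 0 for s in scores]  # assumed 0.5 threshold

for name, outputs, metric in [
    ("Balanced accuracy", y_pred, balanced_accuracy),
    ("AUROC", scores, auroc),
]:
    mean, sd = bootstrap_mean_sd(y_true, outputs, metric)
    print(f"{name}: {mean:.3f} ± {sd:.2f}")
```

With only 10 resamples the SD estimate is itself noisy, which is one more reason to read small differences between rows cautiously.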