Introduction

Hypertension is the leading risk factor for cardiovascular disease, making it an important target for intervention in routine clinical practice1. Blood pressure-lowering with antihypertensive treatment has been shown to be very effective at reducing the risk of cardiovascular events across all age groups2. However, blood pressure lowering is not without harm3. Previous studies have highlighted the increased risk of adverse events such as hypotension, syncope, falls, acute kidney injury, and electrolyte abnormalities, especially in older patients and those with frailty4. For some individuals, where the risk of harm is high, it may not be appropriate to prescribe antihypertensive treatment. In those already treated, interventions such as deprescribing may be considered5,6. To enable informed decision-making, clinicians need to understand an individual’s underlying risk of adverse events, so that this can be weighed against a patient’s likelihood of benefit from new or continued treatment.

In previous studies, clinical prediction models have been developed to estimate the risk of serious falls and acute kidney injury in patients indicated for antihypertensive therapy7,8. Although most patients in these studies with a high risk of acute kidney injury or falls also had a high risk of cardiovascular disease (CVD), a small number were shown to be at high risk of adverse events but low risk of cardiovascular disease8. In these individuals, new or continued treatment may not be appropriate. By using prediction models to understand an individual’s risk of specific adverse events, clinicians can personalise treatment strategies so that antihypertensive treatment is prescribed only to those with the most to gain.

In this study, we develop and externally validate three clinical prediction models for adverse events commonly associated with antihypertensive treatment—namely hypotension, syncope, and fracture—using data from over five million patients in the Clinical Practice Research Datalink. These models estimate baseline risk regardless of treatment status and are designed to support personalised prescribing decisions by identifying individuals at higher risk of adverse outcomes.

Results

Study population characteristics

A total of 1,773,224 patients were included in the model development cohort (CPRD GOLD) with a mean age of 59 years (SD 13 years) and a mean systolic blood pressure at study inclusion of 144 mmHg (SD 12 mmHg) (Table 1, figure S1). The 10-year prevalence of hazardous hypotension was 1.6% (n = 28,450), syncope was 2.2% (n = 39,898), and fracture was 4.1% (n = 73,491). The median follow-up time across the whole cohort was 6 years (IQR 2.6 to 10 years).

Table 1 Baseline characteristics of patients in the development dataset (CPRD Gold)

In the validation cohort, 3,805,366 patients were included, with 63,019 (1.7%) experiencing a hazardous hypotensive event, 84,262 (2.2%) a syncope event and 151,630 (4.0%) a fracture event during 10-year follow-up (table S3). Median follow-up time in the validation cohort was 7.0 years (IQR 2.9 to 10 years). Ethnicity data were more complete in the validation cohort than in the development cohort (81% vs 44% complete data).

Model Development

A total of 31 predictors were included in the final STRATIFY-Hypotension model, after the exclusion of covariates with little or no association with hypotension (Table 2). High social deprivation, current smoking status, previous hypotension, chronic kidney disease, and Parkinson’s disease were strong predictors of hypotension resulting in hospitalisation or death. Prescription of all types of antihypertensive medication was associated with an increased risk of hazardous hypotension, with ACE inhibitors (SHR 1.41, 95% CI 1.37 to 1.45), angiotensin II receptor antagonists (SHR 1.36, 95% CI 1.30 to 1.43) and alpha blockers (SHR 1.35, 95% CI 1.26 to 1.45) conferring the greatest risk.

Table 2 STRATIFY prediction models for Hypotension, Syncope and Fracture. Values represent sub-distribution hazard ratios and 95% confidence intervals

A total of 31 predictors were included in the final STRATIFY-Syncope model (Table 2). Covariates predictive of syncope were similar to those predictive of hypotension, with the exception of South Asian ethnicity, other ethnicity, dementia and heart failure which were all associated with a reduced risk of syncope requiring hospitalisation or leading to death, and antipsychotic medication prescription which was associated with increased risk.

A total of 37 predictors were included in the final STRATIFY-Fracture model (Table 2). Of these, 14 were unique predictors of fracture that were not included in the STRATIFY-Hypotension and STRATIFY-Syncope models. Strong predictors of fracture included heavy drinking, female sex, chronic liver disease, previous fracture, multiple sclerosis, epilepsy, osteoporosis and rheumatoid arthritis. All antihypertensive medications had a weak or no association with the risk of fracture. Other medications were associated with an increased risk of fracture, with the exception of hormone replacement therapy which conferred a lower risk of fracture (Table 2).

Age was not linearly related to any of the outcomes, so transformations were used. Miscalibration was observed across all models at 5 and 10 years, so they were recalibrated to the observed pseudo-values in the development dataset (figure S2).

External validation

The distribution of the prognostic index for the derivation and external validation datasets can be seen in the appendix (figure S3), and model performance statistics are given in Table 3, S4 and S5 and Fig. 1. The final STRATIFY-Hypotension model exhibited strong discriminative ability at 10 years (C-statistic 0.824, 95% CI 0.823 to 0.826) and close agreement between predicted and observed risks depending on time horizon (Observed/Expected [O/E] at 10 years 0.983, 95% CI 0.961 to 1.005). The STRATIFY-Syncope model also showed strong discriminative ability (C-statistic at 10 years 0.819, 95% CI 0.817 to 0.821) and close agreement between predicted and observed risks (O/E ratio at 10 years 1.028, 95% CI 1.009 to 1.047). The STRATIFY-Fracture model showed good discrimination (C-statistic at 10 years 0.790, 95% CI 0.789 to 0.792) and close agreement between predicted and observed risks but with some underprediction for low probabilities (O/E ratio at 10 years 1.13, 95% CI 1.11 to 1.14). Model performance varied more among smaller practices, with more consistent performance seen as practice size increased (figures S4 to S6).

Fig. 1: Calibration curves for the external validation performance of the final STRATIFY models (CPRD Aurum).

Green line corresponds to the line of equality, red line the calibration curve with 95% CI and yellow dots the deciles of predicted risk. The 5- and 10-year models were re-calibrated using the derivation dataset. For the one-year models, re-calibration was not required. Groups represent tenths of the linear predictor, as created between deciles. Histograms underneath each calibration plot show the distribution of predicted probabilities.

Table 3 Predictive performance statistics at 10 years for the final STRATIFY-Hypotension, STRATIFY-Syncope and STRATIFY-Fracture models upon external validation in CPRD Aurum

Using a threshold of 5% across all models, 732,598 (41%) of the patients were classified as high risk for at least one of the three adverse events at 10 years. Amongst the patients who were classified as high risk for at least one adverse event, 280,326 (38%) were classified as high risk for all three, 287,345 (39%) were classified high risk only for fracture, 31,289 (4.3%) only for syncope and 2,829 (0.4%) only for hypotension. A further 24,195 (3.3%) were at high risk for both syncope and hypotension (Fig. 2). Results were similar when using a 10% threshold to define high risk patients in each model (figure S7).

Fig. 2: Overlap of High-Risk Patients Identified by STRATIFY Models (CPRD Gold).

Venn diagrams showing the overlap of high-risk patients (≥5%) as classified using the final STRATIFY-Hypotension, STRATIFY-Syncope, and STRATIFY-Fracture in the CPRD Gold cohort (derivation). Denominator population is the total number of patients with either a high (≥5%) STRATIFY-Hypotension, STRATIFY-Syncope or STRATIFY-Fracture risk.

Decision curve analysis indicated that all three models had clinical utility across all three time points (Fig. 3). For example, using the STRATIFY-Hypotension model with a 10-year time horizon to guide decisions on prescribing would result in a higher net benefit compared to a “deprescribe/don’t treat anyone” strategy, and the same was true for the STRATIFY-Syncope model and the STRATIFY-Fracture model.

Fig. 3: Decision curve analysis of STRATIFY Models (CPRD Aurum).

Decision curves, showing the smoothed, standardised net benefit of using the prediction models across different threshold probabilities for assigning treatment. Treat all corresponds to introducing adverse event prevention measures (which may include deprescribing) for all patients and treat none corresponds to not introducing adverse event prevention measures for all patients.

Subgroup analyses of the 10-year risk models showed similar performance in younger (<65 years) and older patients (≥65 years) and in females and males (figures S8, S10; tables S6 and S7). There was some evidence of under-prediction of hypotension risk in patients of white, black and South Asian ethnicity (figures S11, S12, table S8), although net benefit was consistent across ethnic minority groups for all three models (figure S13).

Comparison with CVD risk

When using a 10% risk threshold for both cardiovascular disease and adverse events, no patients had a high risk of adverse events but low risk of cardiovascular disease. At the 5% threshold, among those patients with a low risk of cardiovascular disease at 10 years, 244 (0.01%) had a high risk of hypotension, 2,656 (0.2%) had a high risk of syncope and 17,040 (1%) had a high risk of fracture. Most patients had a high risk of cardiovascular disease but low risk of adverse events (Fig. 4).

Fig. 4: Comparison of 10-Year Cardiovascular Risk (QRisk2) with STRATIFY Risk Scores (CPRD Gold).

Comparison of 10-year cardiovascular risk (QRisk2) and final STRATIFY-Hypotension, STRATIFY-Syncope and STRATIFY-Fracture risk in the CPRD Gold dataset (derivation). Red dashed lines indicate the 5% risk threshold; black dashed lines indicate the 10% risk threshold. CVD, cardiovascular disease.

Discussion

This study developed three clinical prediction models for adverse events related to antihypertensive treatment, which estimate the baseline risk of hypotension, syncope and fracture over the next 1, 5 and 10 years. The models demonstrated good discrimination and suggested that individuals were most likely to be classified at high risk of a fracture, followed by syncope and hypotension. Calibration showed close agreement between predicted and observed risks for all models across all time horizons, with the exception of the hypotension model at 1 year, which tends to underestimate the risk. Some minor underprediction was also observed for the fracture model at 10 years for low predicted probabilities. Given this, we would suggest caution in using the 1-year hypotension model at this stage.

When compared to risk of cardiovascular disease, only a very small proportion of patients (<1%) were found to be at high risk of adverse events and low risk of cardiovascular disease. A large proportion of patients (39-56%) were identified to have high risk of cardiovascular disease and low risk of adverse events. This suggests that for the majority of people, the potential benefits of treatment will outweigh the risk of hypotension, syncope or fracture. The information from these models may therefore be useful in helping patients make informed decisions about their treatment options, potentially reducing unnecessary worry or apprehension.

As part of the clinical utility assessment, the predicted risks from each model were compared with the risk of cardiovascular disease, generated by the QRisk2 algorithm for specific thresholds9. This latter model was recommended by NICE during the study period and estimates the risk of cardiovascular disease in patients aged 35-84 years over a 10-year period, but does not consider the competing risk of death10. This can lead to overestimation of cardiovascular risk, particularly over shorter timeframes and in older patients with multiple health conditions where the competing risk of death from other causes is higher11. As a result, analyses showing that the risk of cardiovascular disease outweighs the risk of adverse events should be interpreted with caution, particularly over shorter timeframes.

Of the adverse events examined in the present study, fracture has most commonly been studied in previous risk prediction modelling studies12. Common examples include the FRAX score, Garvan Fracture Risk Calculator and the QFracture tool13,14,15. These typically focus on hip and osteoporotic fracture (FRAX and QFracture) and display varying performance upon external validation due to differences in underlying population and input variables12,14,15. However, unlike the present STRATIFY models, none of these previous models takes into account the competing risk of death and this has been shown to lead to significant over-prediction of fracture risk in older patients with multi-morbidity16. This is important when considering adverse event risk in particular, where one treatment strategy for high risk patients might include deprescribing, or not starting therapy which still carries benefit. Few studies have examined the risk of hypotension or syncope, but these tend to focus on risk prediction during emergency department admission and inpatient stays in hospital17,18,19,20. One study examined the risk of postural hypotension in the community and found moderate discrimination, but this model was not externally validated and calibration was not assessed21.

Clinical guidelines for the management of hypertension are increasingly recommending consideration of deprescribing antihypertensive therapy in specific circumstances, where the benefits of treatment may be outweighed by the harms22. In the UK, the National Institute for Health and Care Excellence currently advises that clinicians should use clinical judgement in blood pressure lowering treatment decisions in the presence of multimorbidity10. The purpose of developing these risk prediction models was to help clinicians estimate the baseline risk of adverse reactions. The models can be applied both to patients who are on antihypertensive treatment and to patients for whom treatment is being considered. Clinicians can then combine the estimated baseline risk with relative treatment effects obtained from well-conducted randomised clinical trials or observational studies to estimate how the risk is modified by starting, changing or altering the dosage of a medication3,4. Based on this updated risk, different treatment strategies can be considered depending on the outcome of interest. For example, in patients at high risk of hypotension and fracture, modification of antihypertensive treatment may be considered, whereas in patients at risk of fracture alone, other prevention strategies may be more appropriate. These tools should be used alongside CVD risk estimation tools to get a more complete picture of the harm/benefit profile of the patient. This can enable better informed decisions regarding when to prescribe, continue or deprescribe antihypertensive treatment. To this end, these algorithms could easily be integrated into electronic health records systems to work alongside existing risk stratification tools such as QRisk9.

All three models suggested net clinical benefit when compared to usual care (with the exception of the 1-year hypotension model), which typically would not involve modifying treatment to account for adverse event risk. However, very few patients who were at high risk of serious hypotension or syncope were also observed to be at low risk of cardiovascular disease (between 0.01% and 1%), thus implementing interventions which withhold or deprescribe treatment due to the risk of hypotension or syncope alone is only likely to be considered for a very small number of patients.

Slightly more individuals were at high risk of fracture and low risk of cardiovascular disease (1%) when using a fracture and cardiovascular risk threshold of 5%; however, the direct association between antihypertensive treatment and fractures is disputed and is likely to be small3,4,23,24. Therefore, enthusiasm for intervening in such patients should be tempered by a likely small effect from withholding or deprescribing treatment. For most patients, these models should be employed to provide reassurance that the risk of adverse events is low for the vast majority of the population even after taking into consideration treatment effects from other studies. This information can be useful for clinicians and patients in helping them to make informed decisions about their treatment options, potentially reducing unnecessary apprehension about starting treatment. Where patients are considered at high risk of adverse events such as fracture, other strategies besides modifying antihypertensive treatment may be more appropriate, such as monitoring or addressing some of the other risk factors that might be increasing the risk.

This analysis has several strengths, including the robust analytical approach with both internal and external validation across multiple practices, which demonstrates the reliability and generalisability of each model within the UK primary care population25. There are also some limitations. These data may not accurately capture all events of interest (due to incorrect or incomplete coding), potentially affecting the model’s performance, especially if certain events are systematically underreported or misclassified26. Findings from clinical utility analyses should be interpreted with caution, as they estimate factual risk (i.e., risk based on baseline characteristics). The net benefit in Decision Curve Analysis (DCA) reflects the clinical utility of using the model’s predictions for treatment decision-making, without accounting for the causal effect of altering antihypertensive medication on outcomes. This study primarily aims to predict baseline risk and does not evaluate how treatment changes will modify these risks. While the model enhances risk stratification and supports decision-making by identifying high- and low-risk patients, it does not provide direct insight into how starting or adjusting antihypertensive treatments affects patient outcomes. Thus, any conclusions regarding treatment effects should rely on additional evidence, such as randomised controlled trials or observational studies. However, it is reasonable to assume that a model with higher net benefit may still improve clinical outcomes by better targeting interventions.

The present study used large datasets of electronic health records to derive and externally validate three clinical prediction models to estimate the baseline risk of adverse events associated with antihypertensive therapy. These models were shown to perform well but revealed that only a small proportion of patients eligible for antihypertensive treatment are at high risk of adverse events in the short to medium term. Therefore, these models are most likely to be of clinical utility in providing reassurance to patients considering antihypertensive treatment, potentially reducing unnecessary worry or apprehension.

Methods

Ethics approval

The study protocol was approved by CPRD’s Independent Scientific Advisory Committee in February 2019 before obtaining the data relevant to the project (protocol given in the eAppendix in the Supplement). All data are fully anonymised so consent was not required. A project summary is published on the CPRD website (https://www.cprd.com/isac). Elements of the methodology used in this manuscript have been previously reported in related publications from the same research programme, using the same dataset and protocol and are summarised here for completeness7,8,27.

Design

We conducted a retrospective observational cohort study to develop three clinical prediction models using data from the Clinical Practice Research Datalink (CPRD) GOLD. This dataset includes primary care records from general practices that use the Vision electronic health record system (Cegedim Healthcare Solutions, London, England). The cohort comprised 11.33 million patients from 674 general practices, of whom 4.4 million were active (alive)28. For external validation, we used a second retrospective observational cohort based on CPRD Aurum, which contains data from practices using Egton Medical Information Systems (EMIS, Leeds, England) and included 19 million patients from 738 practices, with 7 million active patients29. Both CPRD GOLD and Aurum datasets are representative of the UK population in terms of age, sex, ethnicity, and deprivation28,29. Primary care data from both sources were linked to additional datasets, including Office for National Statistics (ONS) mortality data, Hospital Episode Statistics (HES), and the Index of Multiple Deprivation (IMD). The study protocol was approved by the Independent Scientific Advisory Committee (ISAC) for CPRD (protocol number 19_042; see Protocol S1 in the supplementary material).

Population

Patients were eligible for inclusion if they were registered at a general practice in England contributing linked data to CPRD GOLD between 1 January 1998 and 31 December 2018. To avoid duplicate entries, individuals appearing in both CPRD GOLD and CPRD Aurum due to transitions between electronic health record systems during the study period were excluded from the CPRD Aurum (validation) dataset. Inclusion criteria required patients to be aged 40 years or older at the time of data entry (with no upper age limit), registered with a CPRD “up-to-standard” practice (for GOLD only), and to have records available during the defined study period.

Patients entered the cohort at the point they became potentially eligible for antihypertensive therapy, defined as the date of their first systolic blood pressure measurement ≥130 mmHg following the start of the study period. Follow-up continued for a maximum of 10 years.

The 130 mmHg threshold was selected to align with the varying treatment initiation criteria outlined in international hypertension guidelines22,30. Patients with a systolic blood pressure measurement ≥180 mmHg were excluded from the cohort, as treatment would be indicated for them regardless of their estimated risk of adverse outcomes.

Baseline patient characteristics and model predictors were assessed at the index date, defined as 12 months after cohort entry. The same eligibility criteria and procedures for determining baseline characteristics were applied consistently to both the development and validation cohorts.

Patients exited the cohort at the end of follow-up (31 December 2018) or upon transferring out of a CPRD-registered practice, death, or occurrence of the specific outcome of interest.
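The entry, index and exit rules above can be illustrated with a short sketch. It is not the study's extraction code: the function names are hypothetical, the exclusion for systolic readings ≥180 mmHg is assumed to apply to the entry reading, and the 10-year maximum is assumed to be counted from the index date.

```python
from datetime import date

STUDY_END = date(2018, 12, 31)  # end of follow-up stated in the text


def add_years(d, years):
    """Shift a date by whole years; 29 February falls back to 28 February."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def cohort_dates(sbp_readings, death=None, transfer_out=None, outcome=None):
    """Return (entry, index, exit) dates, or None if the patient is not eligible.

    sbp_readings: list of (date, systolic mmHg) tuples within the study period.
    """
    qualifying = sorted((d, sbp) for d, sbp in sbp_readings if sbp >= 130)
    if not qualifying:
        return None
    entry, entry_sbp = qualifying[0]
    if entry_sbp >= 180:  # assumption: exclusion judged on the entry reading
        return None
    index = add_years(entry, 1)     # index date: 12 months after cohort entry
    horizon = add_years(index, 10)  # assumption: 10 years counted from index
    exits = [STUDY_END, horizon] + [d for d in (death, transfer_out, outcome) if d]
    return entry, index, min(exits)
```

In this sketch the exit date is simply the earliest of the study end, the 10-year horizon, and any recorded death, transfer or outcome date.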

Outcomes

For each model, the primary outcome was defined as any hospitalisation or death with a primary diagnosis of (hazardous) hypotension, syncope, or fracture occurring within 10 years of the index date. This 10-year time horizon aligns with those used in established cardiovascular risk prediction models9. Outcomes were identified using ICD-9 and ICD-10 codes recorded in Hospital Episode Statistics (HES) and Office for National Statistics (ONS) mortality data (see Table S1 for code lists). Pre-specified secondary outcomes included hypotension, syncope, or fracture (defined identically) occurring within 1 and 5 years of the index date, to account for potential short-term clinical relevance.

Model covariates

Potential predictors of hazardous hypotension, syncope, and fracture were identified based on published literature and consultation with clinical experts. A detailed summary of included variables is provided in Supplementary Table S2. A total of 40 predictors were assessed for the hypotension model, 41 for the syncope model, and 44 for the fracture model. These included demographic characteristics (such as age, sex, smoking history, and alcohol intake), medical history (e.g., prior relevant adverse events, diabetes, chronic kidney disease, stroke, atrial fibrillation, arrhythmias, osteoporosis, rheumatoid arthritis, epilepsy), and current medications (including but not limited to antihypertensives, opioids, sedatives, antidepressants, corticosteroids, and proton pump inhibitors; see Table S2). All comorbidities and clinical history were defined using relevant Read codes recorded any time before the index date. In contrast, medication exposure was defined by at least one prescription issued in the 12 months prior to index.

Sample size

A pre-specified sample size calculation was used to guide model development, yielding an estimated events-per-variable (EPV) range from 7 for the hypotension model to 20 for the fracture model. These estimates were based on assumptions of event rates between 18 and 51 per 10,000 person-years, a median follow-up duration of 7 years, an anticipated Nagelkerke’s R² value of 0.15, and a maximum of 40 candidate predictor parameters per model4,31. Under these assumptions, the required number of outcome events was estimated to range from 277 to 784. The development cohort from CPRD GOLD substantially exceeded these requirements.
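As a quick arithmetic check, the events-per-variable range quoted above follows from dividing the required number of events by the maximum of 40 candidate predictor parameters. The sketch below only reproduces that division; it does not reproduce the underlying sample size calculation, and the function name is illustrative.

```python
def events_per_variable(n_events, n_parameters):
    """Events per candidate predictor parameter (EPV)."""
    return n_events / n_parameters


MAX_PARAMETERS = 40  # maximum candidate predictor parameters per model (from the text)

# Lower and upper bounds of the required number of outcome events (from the text).
for label, events in [("lower bound", 277), ("upper bound", 784)]:
    epv = events_per_variable(events, MAX_PARAMETERS)
    print(f"{label}: {events} events / {MAX_PARAMETERS} parameters = EPV {epv:.1f}")
```

The two ratios are 6.9 and 19.6, which round to the EPV values of 7 and 20 reported for the hypotension and fracture models.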

For external validation, the syncope model required a minimum sample of ~8000 individuals, including at least 400 events, to ensure a 95% confidence interval width of 0.2 around the estimated calibration slope32. This estimate was derived under several assumptions: a skew-normal distribution for the linear predictor with mean 0.16, variance 0.5, skewness 1, and kurtosis 4; an exponential survival time distribution with a baseline hazard rate of 0.008 (corresponding to 89% survival at 10 years); and an exponential distribution for censoring times with a rate of 0.2 (implying ~87% censoring by 10 years). Comparable sample size estimates were obtained for the other outcomes. The CPRD Aurum validation dataset also exceeded these sample size thresholds.
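The survival and censoring percentages implied by these exponential assumptions can be approximated with a few lines of arithmetic. The sketch below is illustrative only: it replaces the skew-normal linear predictor with a plain normal distribution (same mean and variance, ignoring skewness and kurtosis), which is an assumption made here and not part of the original calculation.

```python
import math


def exp_survival(rate, t):
    """Survival probability at time t under an exponential distribution."""
    return math.exp(-rate * t)


def marginal_survival(base_rate, t, lp_mean, lp_var, grid=4001, z_max=8.0):
    """Average exp(-base_rate * t * exp(LP)) over LP ~ Normal(lp_mean, lp_var).

    Simple numerical integration on a grid of standard-normal z values.
    """
    sd = math.sqrt(lp_var)
    step = 2 * z_max / (grid - 1)
    total = 0.0
    for i in range(grid):
        z = -z_max + i * step
        weight = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
        total += weight * math.exp(-base_rate * t * math.exp(lp_mean + sd * z)) * step
    return total


censored_by_10y = 1 - exp_survival(0.2, 10)    # censoring rate 0.2 per year
baseline_surv_10y = exp_survival(0.008, 10)    # linear predictor fixed at zero
marginal_surv_10y = marginal_survival(0.008, 10, lp_mean=0.16, lp_var=0.5)
```

Under these simplifications the censored proportion is 1 - exp(-2), about 0.86, and the survival averaged over the linear predictor comes out close to 0.89, broadly in line with the figures quoted above.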

Statistical analysis

The analysis was conducted using the statistical software R versions 4.0.2 and 4.1.1 and Stata 16. All models are reported in line with the transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) guidelines for reporting of clinical prediction models (see Guideline S1 in appendix)33. Descriptive statistics were calculated for baseline characteristics in the model development and external validation cohorts separately.

Model development

Model development and internal validation were carried out by researchers at the University of Oxford (CK, AW, JPS). For each imputed dataset, multivariable models were constructed using Fine-Gray subdistribution hazard regression to account for the competing risk of death from other causes34. This approach was selected to avoid overestimating the cumulative incidence of adverse events in the presence of competing mortality35. Model coefficients are presented as subdistribution hazard ratios (SHRs) with corresponding 95% confidence intervals. Baseline cumulative incidence functions were estimated post-estimation using a Breslow-type method as outlined by Fine and Gray34. Analyses were undertaken using the fastcmprsk package in RStudio36. Automated selection algorithms were not employed; all predictors were predefined based on prior literature and expert clinical judgement. Given the large sample size, most variables would be statistically significant, so further filtering was used to improve parsimony. Specifically, predictors with SHRs close to 1 (i.e., between 0.95 and 1.05) and low prevalence were excluded in the final model-fitting stage. Shrinkage or penalisation methods to adjust for overfitting were not necessary due to the very large sample size.
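To illustrate how a subdistribution hazard model turns a baseline cumulative incidence and a linear predictor into an individual risk, a minimal sketch follows. It uses the standard Fine-Gray relationship, risk = 1 - (1 - baseline CIF)^exp(linear predictor). The baseline value is hypothetical, only a single coefficient is used, and the published algorithms (Equations S1 to S3, including transformations and recalibration) are not reproduced here.

```python
import math


def predicted_risk(baseline_cif, linear_predictor):
    """Cumulative incidence at a fixed horizon under a Fine-Gray model.

    baseline_cif: cumulative incidence at that horizon when the linear
                  predictor is zero.
    linear_predictor: sum of coefficient * covariate value (log-SHR scale).
    """
    return 1.0 - (1.0 - baseline_cif) ** math.exp(linear_predictor)


baseline_cif_10y = 0.01      # hypothetical baseline 10-year cumulative incidence
beta_acei = math.log(1.41)   # log of the SHR reported for ACE inhibitors

# Risk for a hypothetical patient whose only non-reference covariate
# is an ACE inhibitor prescription.
risk = predicted_risk(baseline_cif_10y, beta_acei)
print(f"Predicted 10-year risk: {risk:.4f}")
```

With a linear predictor of zero the function returns the baseline cumulative incidence unchanged, which is a convenient sanity check when wiring in real coefficients.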

To assess the linearity of continuous predictors (age, systolic and diastolic blood pressure, frailty index), fractional polynomial transformations were used37. The best-fitting transformation for each continuous covariate was applied uniformly across all imputations to ensure consistent coefficient estimates.
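For readers unfamiliar with fractional polynomials, the sketch below shows how candidate transformations are typically generated from the conventional power set, where a power of 0 denotes the natural logarithm and a repeated power adds a log-multiplied term. The powers actually selected for each predictor are not stated in this section, so the example values and the rescaling of age are purely illustrative.

```python
import math

# Conventional candidate powers for fractional polynomials (0 means log).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_term(x, power):
    """Single fractional polynomial term for a strictly positive x."""
    if x <= 0:
        raise ValueError("fractional polynomials require x > 0")
    return math.log(x) if power == 0 else x ** power


def fp2_terms(x, p1, p2):
    """Second-degree fractional polynomial terms for powers p1 and p2.

    When the two powers are equal, the second term is the first term
    multiplied by log(x).
    """
    first = fp_term(x, p1)
    if p1 == p2:
        return first, first * math.log(x)
    return first, fp_term(x, p2)


age_scaled = 65 / 10  # hypothetical rescaling of age before transformation
candidates = {p: fp_term(age_scaled, p) for p in FP_POWERS}
```

Model fitting would then compare candidate transformations by fit; that selection step is not shown here.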

Potential interactions between age, sex, and antihypertensive therapy were explored but were excluded due to convergence issues, model instability and for the sake of parsimony. The proportional hazards assumption for each covariate was evaluated using Schoenfeld residuals38.

Apparent validation using development data

Apparent calibration of the models was evaluated using calibration plots that compared predicted and observed risks at 1, 5, and 10 years. Observed outcome probabilities were estimated using pseudo-values—jackknife-based estimators that quantify an individual’s contribution to the cumulative incidence function for each outcome while accounting for the competing risk of death, derived using the Aalen–Johansen method39. To enhance stability, pseudo-values were computed separately within 50 groups stratified by linear predictor values. These calculations accounted for both competing risks and non-informative right censoring40,41. Calibration plots were constructed from the pseudo-values, incorporating a non-parametric smooth curve (symmetric nearest neighbour smoothing) with 95% confidence intervals to visualise the agreement between predicted and observed risks across the full risk spectrum42. Plots were generated separately for each imputed dataset, and consistency across imputations was assessed.
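The jackknife pseudo-value idea can be sketched in a few lines: the Aalen-Johansen cumulative incidence is computed on the full sample and again with each individual left out, and the pseudo-value is n times the full estimate minus (n - 1) times the leave-one-out estimate. This is a didactic implementation on a toy dataset, with hypothetical function names; it ignores the grouping by linear predictor described above and would be far too slow for cohorts of this size.

```python
def aalen_johansen_cif(times, events, horizon, cause=1):
    """Aalen-Johansen cumulative incidence of `cause` by `horizon`.

    events: 0 = censored, 1 = event of interest, 2 = competing event.
    """
    event_times = sorted({t for t, e in zip(times, events) if e != 0 and t <= horizon})
    surv = 1.0  # all-cause event-free probability just before the current time
    cif = 0.0
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        d_cause = sum(1 for u, e in zip(times, events) if u == t and e == cause)
        d_any = sum(1 for u, e in zip(times, events) if u == t and e != 0)
        cif += surv * d_cause / at_risk
        surv *= 1.0 - d_any / at_risk
    return cif


def pseudo_values(times, events, horizon, cause=1):
    """Jackknife pseudo-values for the cumulative incidence at `horizon`."""
    n = len(times)
    full = aalen_johansen_cif(times, events, horizon, cause)
    values = []
    for i in range(n):
        loo_times = times[:i] + times[i + 1:]
        loo_events = events[:i] + events[i + 1:]
        loo = aalen_johansen_cif(loo_times, loo_events, horizon, cause)
        values.append(n * full - (n - 1) * loo)
    return values


# Toy data with no censoring: pseudo-values reduce to event indicators.
toy_times = [1, 2, 3, 4, 5]
toy_events = [1, 2, 1, 1, 2]
toy_pseudo = pseudo_values(toy_times, toy_events, horizon=3.5)
```

With censored data the pseudo-values are no longer exactly 0 or 1 and can fall outside that range, which is why they are smoothed or modelled instead of being read as individual outcomes.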

When miscalibration was observed at any time point (1, 5, or 10 years), the original model was recalibrated for that time point by fitting a generalised linear model with a logit link to the observed pseudo-values in the development cohort. The recalibration model used only the original model’s linear predictor as the independent variable and allowed for non-linear recalibration via fractional polynomials. These recalibrated models were then subjected to external validation using the independent validation dataset.

External validation

External validation of the prediction models was conducted independently by researchers at the University of Birmingham (LA, KIES, RDR), separate from the model development team. The full prediction algorithms are presented in the supplementary appendix (Equations S1 to S3) and were applied to individuals in the validation dataset to generate predicted probabilities of hypotension, syncope, or fracture within 1, 5, and 10 years, accounting for the competing risk of death from other causes43. Calibration was assessed by comparing predicted risks to observed event probabilities, estimated using pseudo-values as described previously39.

Model performance was summarised using calibration plots, observed-to-expected (O/E) ratios, Harrell’s C-statistic, and Royston’s D-statistic along with its associated R², each calculated using the pseudo-values as described above. To evaluate variability in model performance across general practices, we used random-effects meta-analysis with restricted maximum likelihood estimation (REML), acknowledging that case mix and event incidence may differ between sites25,44. The O/E ratio was pooled on the natural log scale, the C-statistic on the logit scale (standard errors derived via the delta method), and the D-statistic was pooled on its original scale45,46. Pooled estimates are reported with prediction intervals (PI) to give an indication of expected model performance in a new GP practice.
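The scale transformations used before pooling can be sketched as follows. The logit transformation of the C-statistic with a delta-method standard error follows the description above; applying the same delta-method idea to the log O/E ratio is an assumption made for illustration. The practice-level inputs are hypothetical, and the REML random-effects pooling itself is not shown.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def expit(x):
    return 1 / (1 + math.exp(-x))


def c_stat_to_logit(c, se_c):
    """Logit-transform a C-statistic; delta-method SE is se_c / (c * (1 - c))."""
    return logit(c), se_c / (c * (1 - c))


def oe_to_log(oe, se_oe):
    """Log-transform an O/E ratio; delta-method SE is se_oe / oe (assumption)."""
    return math.log(oe), se_oe / oe


def back_transform_ci(estimate, se, inverse, z=1.96):
    """95% interval computed on the transformed scale, mapped back by `inverse`."""
    return inverse(estimate - z * se), inverse(estimate), inverse(estimate + z * se)


# Hypothetical practice-level C-statistic and standard error.
est, se = c_stat_to_logit(0.80, 0.02)
lower, point, upper = back_transform_ci(est, se, expit)
```

Working on the logit or log scale keeps back-transformed intervals inside the valid range of each statistic, which matters for small practices with wide uncertainty.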

Clinical utility was examined using decision curve analysis for a range of potential threshold probabilities47. A decision threshold is defined as the probability at which a patient is classified as high risk and thus a decision has to be made (i.e., treat, refer for further investigation, etc.). The range of probabilities should reflect potential decision thresholds for the STRATIFY models. In the UK, a CVD risk of 10% is considered high and, given that the STRATIFY models should be used alongside a CVD risk tool, a range of threshold probabilities of up to 20% was considered reasonable. The final models for each outcome were compared at 1, 5 and 10 years to ‘model-blind methods’ of (a) introducing adverse event prevention measures (which may include deprescribing) for all patients or (b) not introducing adverse event prevention measures for all patients, regardless of risk. If the models have higher net benefit than the strategies (a) or (b) then this would suggest using the models to inform prescribing would be preferable.
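The net benefit quantity underlying a decision curve can be illustrated with the standard formula, net benefit = TP/n - FP/n × pt/(1 - pt), where pt is the threshold probability. The sketch below assumes fully observed binary outcomes, which is a simplification: the study data are censored and subject to competing risks. The toy predictions and outcomes are invented for illustration.

```python
def net_benefit(predicted, outcomes, threshold):
    """Net benefit of intervening on patients with predicted risk >= threshold."""
    n = len(outcomes)
    tp = sum(1 for p, y in zip(predicted, outcomes) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(predicted, outcomes) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(outcomes, threshold):
    """Net benefit of intervening on everyone, regardless of predicted risk."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


def standardised(nb, outcomes):
    """Standardised net benefit: net benefit divided by outcome prevalence."""
    return nb / (sum(outcomes) / len(outcomes))


# Invented toy data: predicted risks and observed binary outcomes.
toy_pred = [0.02, 0.08, 0.12, 0.30, 0.04, 0.15]
toy_obs = [0, 0, 1, 1, 0, 0]
nb_model = net_benefit(toy_pred, toy_obs, threshold=0.10)
nb_all = net_benefit_treat_all(toy_obs, threshold=0.10)
nb_none = 0.0  # intervening on no one has zero net benefit by definition
```

A decision curve repeats this calculation over a range of thresholds (here, up to 20%) and plots the model against the two model-blind strategies.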

Direct comparison with existing cardiovascular risk prediction tools (e.g., QRisk2) using decision curve analysis is not possible. Therefore, to further explore clinical utility and potential implementation, we examined the relationship between predicted risk of hypotension, syncope, and fracture (at 1, 5, and 10 years) and 10-year cardiovascular risk using QRisk2 at 5% and 10% thresholds9. The overlap in patients identified as high-risk (≥5% and ≥10%) by the STRATIFY models versus QRisk2 was quantified and visualised using Venn diagrams.
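Counting the regions of such a Venn diagram amounts to recording, for each patient, which risks meet the threshold and tallying the resulting patterns. The sketch below shows one way to do this; the dictionary keys and patient values are hypothetical and do not come from the study data.

```python
from collections import Counter


def high_risk_pattern(risks, threshold=0.05):
    """Set of risk names for which the predicted risk meets the threshold."""
    return frozenset(name for name, risk in risks.items() if risk >= threshold)


def overlap_counts(patients, threshold=0.05):
    """Tally patients by high-risk pattern, dropping those with no high risk."""
    counts = Counter(high_risk_pattern(p, threshold) for p in patients)
    counts.pop(frozenset(), None)
    return counts


# Hypothetical 10-year predicted risks for four patients.
toy_patients = [
    {"hypotension": 0.06, "syncope": 0.07, "fracture": 0.09, "cvd": 0.12},
    {"hypotension": 0.01, "syncope": 0.02, "fracture": 0.08, "cvd": 0.03},
    {"hypotension": 0.01, "syncope": 0.01, "fracture": 0.02, "cvd": 0.15},
    {"hypotension": 0.01, "syncope": 0.01, "fracture": 0.01, "cvd": 0.02},
]
counts_5pct = overlap_counts(toy_patients, threshold=0.05)
```

Each key of the resulting counter corresponds to one region of the Venn diagram, and changing `threshold` to 0.10 gives the alternative definition of high risk.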

Missing data

To address missing data in both the development and validation cohorts, we employed multiple imputation using chained equations, generating ten imputed datasets for each cohort. Separate imputation procedures were conducted independently for the development and validation datasets. The imputation models included all model covariates within each dataset, along with the Nelson-Aalen estimator for the cumulative baseline cause-specific hazards for hypotension, syncope or fracture and for the competing event of death, and binary event indicators for each of these possible event types48,49. For comorbidity diagnoses and prescribed medications, missingness was handled under the assumption that absence of data reflected absence of diagnosis or prescription. Variables imputed included ethnicity, body mass index (BMI) category, smoking status, alcohol consumption, and (in the validation cohort only) the deprivation score.

Imputations were assessed for consistency and validity by comparing density plots, histograms, and summary statistics across imputations and back to complete values. Following imputation, model coefficients and performance metrics were estimated separately within each imputed dataset and then combined using Rubin’s Rules50. In instances where Rubin’s Rules were inappropriate due to non-normal posterior distributions, summary measures across imputations were reported using the median and interquartile range (IQR)51. A sensitivity analysis using a full case approach was also employed to compare with the imputed models.
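For reference, Rubin's Rules combine per-imputation estimates by averaging them and adding the within-imputation variance to an inflated between-imputation variance, T = W + (1 + 1/m)B. A minimal sketch is given below; the three log-SHR estimates and their variances are invented for illustration (the study itself used ten imputed datasets).

```python
import math
from statistics import mean, variance


def rubins_rules(estimates, variances):
    """Pool estimates across imputed datasets.

    estimates: point estimates, one per imputed dataset (e.g. log SHRs).
    variances: squared standard errors of those estimates.
    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)        # W: average within-imputation variance
    between = variance(estimates)   # B: sample variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


# Invented example: log-SHR estimates from three imputed datasets.
pooled_est, pooled_se = rubins_rules([0.30, 0.34, 0.32], [0.01, 0.01, 0.01])
pooled_shr = math.exp(pooled_est)  # back-transform to the SHR scale
```

Pooling is done on the log scale because that is where the estimates are approximately normal; the result is exponentiated only at the end.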

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.