Abstract
Emergency care systems are challenged by the emergence of an ageing population, requiring tailored inputs facilitated by early care needs assessment. We examined the potential of Machine Learning algorithms to identify in-hospital healthcare contacts in older patients after emergency admission, developed from linked electronic health record (EHR) data within South-East Scotland. Gradient-boosting (XGBoost) prediction models were trained on frailty markers and nursing risk assessments to predict healthcare contacts, adverse outcomes and requirements for specialist input between arrival and 72 hours following admission. Across 98,242 patients, the predicted contact error rate varied between 49% at point of emergency attendance and 34% at 72 hours post-admission. Area-under-the-curve reached 0.89 in predicting need for urgent geriatric services, and 0.83 for in-hospital rehabilitation. Pressure ulcer risk and its documentation were predictive of received contacts. EHR data can predict granular estimates of in-hospital activity after ED attendance, facilitating quicker allocation to appropriate urgent care pathways.
Similar content being viewed by others
Introduction
Older patients with multiple long-term conditions, functional decline and geriatric syndromes comprise around two-fifths of emergency department (ED) attendances1,2,3,4. This complexity makes allocation of the optimum hospital pathway for admitted patients more difficult, increasing the risk of fragmented care or delays to appropriate specialist care. A recent study by Roussel et al. highlighted the negative implications of overnight stays in the ED, indicating a poorer likelihood of survival to discharge5. This emphasises the need for risk-aware triage, targeting patients who may be subject to serious functional decline. Current models in urgent care, necessarily focused on rapid throughput, often fail to differentiate risks between older patients within the critical 72-h window after admission1,3,6. A more rigorous understanding of the predictors of likely healthcare inputs needed within a hospital admission, including rehabilitation, could help reduce preventable adverse events, such as hospital-acquired disability7,8 or function decline9. In turn, risk-attuned systems could improve the allocation of limited multidisciplinary resources, such as physiotherapy time.
Modern electronic health record (EHR) systems now capture multidimensional indicators of frailty10, that could explain health and care contacts in hospital. Furthermore, every unique contact with a health provider can be measured and timestamped to reflect in-hospital activity11. These contacts data represent a novel endpoint to understand system use. We have previously shown how these data can describe differences in care patterns for patients experiencing early emergency readmission12, and explored rehabilitation delivery between patients with and without COVID-19 infection13.
Several studies have adopted machine learning (ML) approaches using routine data to predict hospital endpoints for patients attending the ED. Traditional outcomes, including conversion to inpatient admission14, length of stay15, in-hospital death16 or admission to critical care17 have been previously used to produce high-performing risk estimates. While these outcomes are robust and easily extractable from EHRs, such measures do not provide granular detail on the patterns or dose of care processes delivered. Further, outcomes like length of stay are often influenced by factors outside of healthcare control, such as a patient’s social needs or family support.
Our aim was to develop and evaluate ML models to forecast in-hospital healthcare contacts and key outcomes such as the requirement for rehabilitation or specialist care, using comprehensive but routinely available data from community and hospital EHRs.
Results
Cohort summary
We identified 208,477 eligible admissions following ED attendance across the three acute hospital sites. Our analysis cohort consisted of 98,242 unique patients (72 ± 12 years old, 51% women) who received a median of 7 [3, 19] nursing or rehabilitation contacts during their admission. Rehabilitation was provided for 40,946 (42%) patients, including contacts with physiotherapy (36,838, 37%), occupational therapy (22,489, 23%) and speech and language therapy (5839, 6%). Baseline characteristics by quintiles of total contacts are shown in Table 1. The majority of urgent hospitalisations were within RIE, followed by WGH, with around one-fifth of attendances derived from the lower-volume SJH.
At baseline, patients at higher contact quintiles were older, with a higher proportion of women. Distributions by quintiles of deprivation (Scottish Index of Multiple Deprivation (SIMD)) were similar between contact groups. There was a steady increase in the proportion of patients with over 4 long-term conditions recorded in their medical history over the contact frequency levels (57% in the VH group vs 41% in the VL group), but numbers with physical-mental multimorbidity were similar across all contact groups. Proportions of patients at risk of delirium (4AT), malnutrition (MUST) and pressure ulcers (Waterlow score), as well as prior fall events, also increased in line with increased healthcare contacts. Patients under specialist services had more consistent coding for ED metadata, and those under older adult and rehabilitation services had a higher proportion of complete nursing risk assessments (Supplementary Table 2).
Patients at the highest level of received contacts (VH) had the highest prevalence of in-hospital deaths (13%), admissions to older adult specialist services (38%) and non-home discharges (30%). The annual incidence rates for each secondary outcome varied (Supplementary Fig. 1). Older individuals with an outcome were more likely to be living in lower deprivation areas (Supplementary Fig. 2). The log-transformed distribution of contacts was significantly higher in patients who experienced each of the secondary outcomes (see Supplementary Fig. 3). A further summary of the baseline characteristics by each of these outcomes is available in Supplementary Tables 3–6.
Health contact characteristics
Patterns of healthcare contact type are shown for each group in Table 2. Patients in the highest (VH) contact group had a median of 51 [34, 87] total hospital contacts, with a median 7 [5, 12] documented contacts each day of admission and significant multidisciplinary input (3 ± 1 unique provider types). This was characterised by an increase in both nursing (4 [3, 7] vs 2 [2, 3] contacts per day in H, p < 0.001), and rehabilitation contacts (93% vs 58% receiving rehabilitation in H, and 4 [2, 6] vs 2 [2, 3] contacts per day respectively, p < 0.001). Time to first rehabilitation contact was longer in the highest contact quintile (47 [21, 108] h in H vs 22 [15, 59] h in L), but the smaller proportion of patients within the lowest frequency (VL) group who received rehabilitation waited the longest to receive this (54 [21, 126] h). The total in-hospital healthcare contacts were elevated in all patients with any secondary outcome, with the highest level of care in those with extended stay and admission to older adult specialist services (see Supplementary Table 7). After log-transformation, the total number of contacts per individual remained significantly higher in RIE compared to WGH and SJH, particularly in rehabilitation provision (Supplementary Fig. 4, p < 0.001). The distributions of the average contacts per day, as well as the overall hospital length of stay, were different across the three sites, with WGH having the longest distribution tail for length of stay and RIE having the longest distribution tail for overall and rehabilitation contacts. Contact variability by admission seasons was present in rehabilitation provision, with significantly higher frequencies recorded during fall and winter time (Supplementary Fig. 5).
Model training summary
The preprocessed data for modelling included a stratified training set of 68,769 and a hold-out validation set of 29,743 samples at the point of ED attendance (median age 73 [62, 81], 51% women and 16% at the highest deprivation level in both sets). Using this model setup, the XGBoost regressor consistently outperformed other fine-tuned linear and non-linear estimators (see Supplementary Table 8). The training and validation sets were balanced by age, sex, SIMD, secondary outcome prevalence, total healthcare contacts, rehabilitation-specific contacts and number of healthcare disciplines involved during hospital stay (see Supplementary Table 9). The optimal model hyperparameters, identified using grid search, are listed in Supplementary Table 10.
Model performance for in-hospital healthcare contact prediction
Performance summary in healthcare contacts prediction is shown in Table 3. The XGBoost regression model showed a clear reduction in error rate when forecasting in-hospital contacts as more hospital episode data were included. The conditional mean absolute percentage error (cMAPE) score indicated a slight to reasonable forecasting ability of the XGBoost regressor (error rate between 49% [48%–49%] and 34% [33%–35%]).
The leave-one-hospital-out validation revealed a generalisation gap between the mean absolute error (MAE) and cMAPE scores across hospital sites (Fig. 1). When validated on the smaller sample of SJH patients, the MAE reduced up to 23%, and the cMAPE reduced by 19%, substantially lower than that of the WGH and RIE samples. Compared to the original performance, the cMAPE was 37% [36%-38%] for the WGH sample and 38% [38%-39%] for the RIE sample. The BACC and Cohen’s Kappa scores (CKSs) were more similar but slightly lower than the baseline at 72 h post-admission. By this time, 60% of patients in the validation set within the WGH sample remained in the study, while 53% remained within both the RIE and SJH groups.
Balanced accuracy and Cohen’s Kappa score were estimated after quintile-based discretisation of predictions into five frequency levels: ‘Very Low’, ‘Low’, ‘Medium’, ‘High’ and ‘Very High’. MAE mean absolute error, cMAPE conditional mean absolute percentage error, estimated by masking 0 values from the estimation.
Considering the ED arrival model as the baseline, we achieved an overall 5% reduction in error at point of hospitalisation (MAE = 0.93 [0.93–0.94] from 0.97 [0.97–0.98]), 12% reduction at the first day of admission (MAE = 0.86 [0.85–0.87]) and 18% reduction by the third day of admission (MAE = 0.80 [0.79–0.81]). Use of other linear and non-linear regression estimators showed an increase in error rate, compared to XGBoost (Supplementary Table 8). The 10-fold cross-validation strategy revealed comparable MAE and cMAPE measures for the baseline model (Supplementary Table 11).
The BACC score indicated more limited classification ability in estimating the contact frequency category across five intensity groups (between 0.28 [0.28–0.29] and 0.32 [0.32–0.33], with random choice being 0.20). The quality of stratification based on measures of agreement could be treated as fair (0.34 [0.33–0.35] at the point of ED attendance) up to moderate (0.46 [0.45–0.47] by the third day of admission). Additional error analysis on the validation set (Supplementary Fig. 6) revealed that the algorithm had the highest success rate when classifying patients requiring the highest level of contacts (37–46% across prediction timepoints). Meanwhile, classification of medium levels of care was most challenging (22–24%). A stratified analysis by age groups revealed significantly higher forecasting quality among the younger groups at point of ED attendance (MAE = 1.1 in 90+ group vs 0.87 in 50–59 group), but this gap in quality reduced in the later prediction windows (Supplementary Fig. 7). These differences were not observed across SIMD groups (Supplementary Fig. 8). After adjusting for non-home discharge outcomes (Supplementary Fig. 9), patients at the highest level of received contacts had the lowest probability of death by day 10 after ED arrival (1% vs 7% in medium-high). However, the survival advantage of the highest contact group diminished by day 30.
After testing the model on survivors to discharge (n = 92,149), we observed similar performance trajectories across the four regression quality metrics (Supplementary Fig. 10). However, when validated on patients strictly admitted within the COVID-19 lockdown period, performance improved substantially with the cMAPE score dropping to 24% and the MAE reaching 0.68 after 72 h of admission (Supplementary Fig. 11).
Model performance for secondary hospital outcomes
The XGBoost classification models achieved moderate to robust performance on the hospital endpoints, considering the same input features as the health contact forecasting model. Figure 2A, B shows the change in the trajectory of the receiver operating characteristic area-under-the-curve (ROC-AUC) and precision-recall area-under-the-curve (PR-AUC) scores, detailing overall discrimination ability and positive event classification rate, respectively. Predictive quality at 24 h from admission was excellent in admissions to older adult specialist services (ROC-AUC = 0.90 [0.89–0.91], PR-AUC = 0.64 [0.60–0.68]). At this stage of prediction, discrimination was also efficient for the forecasting of any rehabilitation requirements (ROC-AUC = 0.83 [0.82–0.83], PR-AUC = 0.81 [0.79–0.83]).
A Receiver operating characteristic area under the curve. B Precision-recall area under the curve indicated by the straight coloured lines, along with the change in outcome prevalence due to patient exclusion over time, marked by dashed lines and triangle points. C Proportion of patients with the health outcome over time. D Patient response rate after risk stratification, measuring the % of correctly captured individuals at the 10th decile of risk with the respective hospital outcome.
Across all prediction timepoints, there was a notable increase in discrimination ability for in-hospital death (by 9%, 0.85 [0.84–0.85] from 0.76 [0.75–0.77]), extended stay (by 8%, 0.80 [0.79–0.80] from 0.72 [0.71–0.72]) and any rehabilitation delivery (by 7%, 0.83 [0.82, 0.83] from 0.76 [0.75–0.76]) within the first day of admission. By comparison, the positive class detection rate (PR-AUC) increased steadily over time. The tradeoff between precision and recall was reasonable in the presence of imbalanced data, as the random choice threshold for adverse event detection (equal to the incidence rate of the target time point) was two to three times lower than the reported scores.
Table 4 showed that model sensitivity was maximised either at point of hospitalisation (0.75 [0.73, 0.77]) for non-home discharge or within 24 h of admission (0.81 [0.79–0.83] for in-hospital death, 0.76 [0.75–0.77] for extended stay, 0.78 [0.78–0.79] for rehabilitation and 0.86 [0.85–0.87] for admission to geriatric services), indicating that a majority of cases could be flagged early in the admission. Selection of individuals requiring rehabilitation was close to ideal (Fig. 2D), ranging from a 79% response rate at ED arrival up to 95% by the third day of admission.
Feature importances for healthcare contacts prediction
The TreeSHAP density plot highlighted the top predictors of received healthcare contacts and the direction of their relationship (Fig. 3). Most of the top predictors were linked to older age, ED measurements, nursing assessments, blood testing and morbidities.
Density plots include the top 20 ranked predictors (beginning with the most important, independent of risk direction). Low values (blue) indicate individual association with less than average healthcare contacts, while high values (red) indicate association with above average contacts. A Point of ED arrival, B Point of hospital admission, C 24 h post-admission, D 48 h post-admission, E 72 h post-admission. Blood markers: CRP C-reactive protein, ALT alanine transaminase, CK creatine kinase, ESR erythrocyte sedimentation rate, eGFR estimated glomerular filtration rate, GGT gamma-glutamyl transferase. *Marked features include missing data indicated as −1 (lowest value: light blue).
Interestingly, after the first day of admission, the Waterlow score became the most prominent indicator of received healthcare contacts, surpassing age in importance. Patients at moderate to high risk of pressure ulcers were predictive of less than average contacts, while no documented assessment was predictive of receiving more contacts. A similar pattern was also present in the 4AT score and mobility score specific to bathing dependence. In Supplementary Fig. 12, we identified possible relationships with survivorship bias within these variables, as patients who died in hospital and were assessed to be at no or low documented risk for delirium and pressure ulcers still had significantly more nursing and rehabilitation contacts compared to survivors at lower risk. However, the distributions were similar in the higher risk ranges. Patients with no documented nursing risk assessments were likely to receive more contacts, possibly as a result of complications in care continuity (Supplementary Fig. 13). The frequency of rehabilitation contacts was notably much higher in patients with no documented Waterlow scores.
In contrast, recent diagnoses for progressive neurological disease (including dementia) were predictive of higher contact counts. Mode of arrival is a known predictor of disease severity and health outcomes. Here, it was apparent that private transport was linked to fewer healthcare inputs in the hospital, while arrival via ambulance indicated higher contacts. Attendance to RIE was likely to influence lower overall contacts, while attendance to SJH was likely to influence higher than average contacts, particularly when staying in hospital for more than 48 h. Among the lab tests, C-reactive protein (CRP), Urea and Creatine Kinase (CK) were the most prominent predictors. In this case, elevated CK or CRP (as a result of tissue damage or inflammation), or elevated urea levels (as often seen in dehydration) were associated with having more healthcare contacts.
Discussion
We have described the clinical utility of an ML approach for predicting in-hospital healthcare contacts for patients from the point of ED attendance, developed entirely using routinely entered data. Forecasting for future healthcare contacts was feasible but had limited precision at the point of ED arrival, reaching moderate performance within 72 h of admission. Model agreement across care contact levels was limited, but certainty was greatest for those receiving the most care contacts. There was a notable generalisation gap across hospital sites, as the error rate was lower across timepoints in SJH, compared to RIE and WGH. The secondary models had excellent discrimination ability for in-hospital death and requirement for older adult specialist and rehabilitation services. Within 24 h of admission, 9 out of 10 individuals within the top decile of risk for requiring rehabilitation were correctly captured. We developed models for predicting flexible endpoints of in-hospital activity, covering relevant ageing, frailty and multimorbidity predictors captured during and before hospital admission. This identified the potentially hidden predictive utility of routine nursing risk assessments and blood tests measured at the critical points of patient movement through hospital and emergency services. Our approach demonstrates the potential of healthcare contacts forecasting while acknowledging the current limitations in routine healthcare data for care pathway optimisation, with the future aim to move from queue-based resource allocation to person-centred, data-driven approaches and reduce delays to accessing specialist care.
Age, mode of ED arrival, nursing assessments (pressure damage and delirium risk) and laboratory tests (infection, tissue damage and dehydration) were among the top indicators of received healthcare contacts. This reflects a combination of chronic determinants of received healthcare and potential surrogates of acute illness. The Waterlow skin health score was the most prominent predictor for fewer contacts after 24 h of admission. A high risk of pressure ulceration often indicates underlying frailty and a limited functional baseline prior to the acute illness precipitating hospitalisation. While increasing nursing attention is clearly indicated, there may be limited scope for rehabilitation contacts in this population, as there may be no goal to regain or improve mobility. Delirium, a form of acute cognitive decline, was also associated with reduced contacts. This contrasted with dementia, where higher interactions were documented. This distinction between acute and chronic cognitive impairment is clinically meaningful, with patients suffering from delirium often unable to participate in rehabilitation until the acute driver for this condition settles. There may also be some competing risk here, given the high association between delirium and mortality, effectively reducing observed inpatient contacts in this group18.
Importantly, the lack of documentation for pressure ulcer, delirium and mobility assessments was also predictive of more received contacts later in the pathway. This may be a signal of gaps in care coordination, increasing the likelihood of triggering further clinical reviews. For delirium, it is well-known that underreporting and underdiagnosis in hospital practice can lead to high rates of complications and subsequent adverse events19. Missing mobility or skin integrity assessments can also delay necessary repositioning interventions20, including pain control and wound management, directly contributing to unmet physical needs.
In our secondary analyses, we compared the quality of prediction in binary hospital endpoints, including in-hospital death, extended stay, non-home discharge, admission under older adult specialty services and requirement for rehabilitation. Discrimination ability was excellent for the requirements of older adult specialist services, even at the point of hospitalisation, suggesting a potential clinically important role in early triage of these patients to appropriate specialist care. Delivering such individualised care is often limited by system constraints21, despite numerous trials showing benefits of these person-centred approaches22,23. Some of the results from this observational study point to the future utility of data-driven approaches to support proactive care in high-risk patients. For example, our models show excellent discrimination for death during the hospital admission, using data obtained within the first 24 h of attendance. This could support clinicians in decisions on placement for appropriate monitoring and continuity of care where the risk of deterioration may not be obviously apparent, or to allow sensitive and targeted conversations to prepare patients and families. The tradeoff between precision and recall suggested a reasonable classification rate across all hospital endpoints, but generally better sensitivity than specificity. This is common in screening applications to support clinical teams, with an acceptance of more false negative cases to maximise identification of those at the highest risk of the harmful outcome.
A few ML studies have employed adjacent approaches to capture healthcare utilisation. Our study uses logged care contacts during hospitalisation with significant completeness across the full population. Ocagli et al. used a two-step approach using preliminary clustering of hospitalised individuals based on the Barthel index24, before classifying individuals into high- and low-consumption categories25. Tello et al. provided an ML approach for predicting healthcare utilisation via in-hospital bed days, which aligns with our key objective of proposing this work to help design optimum pathways in older individuals26. We argue that predicting received healthcare contacts can yield considerable benefits for both triage and service planning, reflective of requirements for staffing and multidisciplinary care.
Definitions of healthcare utilisation in hospitals vary significantly in the literature. For example, Blecker et al. use the term ‘EHR interactions’ to provide comparisons for delivered hospital care, specifically focusing on variation between workload on weekdays and weekends27,28. The study highlights the inherent fluctuations in healthcare delivery that vary by the day of admission. This variation in available resourcing makes individualised targeting to patients at higher contact levels even more important. A number of other works used claims data linked to continuity of care measures to study all-cause or disease-specific readmission risk29,30,31. Length of stay is another health outcome that is often used to determine the complexity of treatment. However, this often does not reflect received healthcare input, as discharge delays could occur while waiting for the availability of community and social care32.
Our study has several important strengths. We provide an extensive ML framework developed from rich primary and secondary care data across multiple hospitals. We used a reproducible and interpretable data pipeline to extract, select and explain features across a range of data sources from the EHR. Our derived nursing risk assessments provide a novel set of data features that do not form part of usual disease-based coded datasets, but which capture important markers of frailty early in a hospital admission. Our use of routine nursing and rehabilitation contacts provides a novel endpoint to explain healthcare utilisation. While there are definite relationships between this and administrative measures like length of stay, the prediction of individual elements of care could be pivotal for targeting resources and staff allocation. We have previously shown the added value of healthcare contacts in relation to early readmission risk after hospital discharge, with this association adding information beyond length of stay12. This approach is scalable and transferable to other health services using EHR systems, as we use a simple contact measure that should be extractable from most EHRs as auditable data. Our models generalised better when independently validated in attendance during the COVID-19 lockdown period in the UK. This may be indicative of the predictive qualities of the routinely collected data and their robustness to rapid shifts in healthcare delivery. In a previous study, we evaluated care coordination during the pandemic within NHS Lothian services, identifying substantial shifts in the variation of care in the acute hospitalisations of older adults29. Although patient care pathways did not appear to be streamlined in patients with COVID-19 infection, it is possible that the documentation of care became more consistent during this period due to reduced routine activities and a focus on respiratory interventions and infection control.
We must also acknowledge some important limitations. Our XGBoost regressor produces an important baseline for predicting future care contacts but falls short, particularly at the early stages of ED arrival, where forecasting accuracy is weaker. While this does improve at later prediction timepoints, performance would suggest that future deployment would require cautious use with clinical oversight, as is appropriate for any decision support tool. We must also accept that the error rate was poorer when independently validated against more acute and complex presentations, typically present in the higher-volume RIE and WGH sites. This is not particularly surprising as RIE and WGH act as regional referral centres for major trauma, where many patients arrive with acute or life-threatening conditions that require rapid escalation. As a district general hospital, SJH still offers services for mental health and skin injuries, but does not typically support major trauma cases, which lowers the comparative level of patient acuity at presentation. Thus, post-hoc calibration may be needed to correct for systematic differences in prediction between tertiary and district-level hospitals with regard to the serving of specialist care.
As with all models developed on retrospective data, it is important to acknowledge that predicted outcomes reflect current practice rather than optimised care. Application must not increase inequalities by bluntly refocusing activity based entirely on model predictions, as this may embed ineffective care. We also recognise that clinical application of such tools would require significant improvements in real-time data linkage of primary and secondary healthcare data to make these model outputs visible to hospital clinicians at the point of patient attendance. Initiatives such as NHS England’s Federated Data Platform33 are likely to move closer to this ambition, but these will require sustained investment to support appropriate use of ML/AI models to improve patient care.
Further inclusion of health and frailty markers prior to hospitalisation, such as the electronic Frailty Index10 from primary care, or Hospital Frailty Risk Score34 and more granular data on illness acuity, such as vital signs that comprise the National Early Warning Score (NEWS2)35, could significantly improve generalisation. We have used blood results from ED as a surrogate of illness acuity, but these might be less reliable markers in older adults36. Our significant findings on the Waterlow score might suggest that this is capturing severe illness acuity better than other variables. Further observational research integrating these data and engagement with stakeholders could aid in determining a suitable error threshold and useful timepoints for model outputs to inform resource allocation. We must acknowledge the potential implications of survivorship bias in a retrospective study cohort. Documentation in routine screenings, such as delirium and pressure ulcer risk, may cause overrepresentation of low-risk patients with future in-hospital death, because of unanticipated disease progression precluding comprehensive assessment. Subsequently, these patients may be less likely to have documented nursing and rehabilitation assessments, resulting in more flawed predictions. Thus, a multi-state or multi-label modelling approach accounting for the competing risk of death when estimating future contacts may prove to be more effective in strengthening clinical validity. We also recognise that our current definition of a healthcare contact does not provide a holistic representation of the care delivered across the hospital service. Patients are likely to receive inputs from many other members of a multidisciplinary team, such as doctors, pharmacists and dieticians, amongst many others, which could not be included in our models. Our predictive modelling reflects the downstream risk associated with conditions, lab testing and prescribing recorded in primary care, but does not explicitly capture GP interventions (e.g. preventative management, early referrals) that may influence acuity at hospital presentation. Future research could enrich models by incorporating data on GP consultations, care access patterns, and prescribing trajectories, which may provide additional insight into how primary care influences hospital demand. Importantly, our contacts measure captures EHR interaction in clinical notation, but this does not reflect every nursing or rehabilitation interaction with a patient.
This foundational work proposes the use of ML algorithms to improve hospital resource allocation using routinely entered healthcare data. We demonstrate that ML algorithms can use routinely entered healthcare data to predict fine-grained elements of care pertaining to resource allocation in urgent care. Data-driven methods are needed to improve the efficiency and effectiveness of care pathways in older people. We have demonstrated methods that could promote equitable, targeted and person-centred care, which is of great importance in complex systems where capacity and organisation are critical constraints.
Methods
Study design and participants
To determine our study population, we used retrospective research-linked EHR data in adults aged ≥50 years across three acute hospitals in the Lothian region of South-East Scotland:
-
Royal Infirmary of Edinburgh (RIE), Edinburgh: a tertiary hospital handling major trauma and a wide range of specialist acute care
-
Western General Hospital (WGH), Edinburgh: a tertiary hospital and the regional centre for cancer treatment and specialist neurology
-
Saint John’s Hospital (SJH), Livingston: a general hospital supporting district-level emergency services with a focus on elective care
Data linkage extracted eligible hospitalisations between 1st April 2016 and 1st February 2024. Hospital admission episodes were included if linked to prior ED attendance. We included only consecutive non-surgical admissions, excluding episodes with no transfer of care after ED arrival, immediate admissions under surgical services or very short admissions without any linked nursing or rehabilitation contact. We used the first (index) admission meeting the above criteria per patient to generate an analysis cohort of unique individuals.
Healthcare contacts data for outcome derivation
To derive the endpoints for received in-hospital contacts, we used available entries from nursing staff and specialised rehabilitation providers in physiotherapy, occupational therapy and speech and language therapy. Our primary dataset used hospital care contact data, collated from a shared EHR system (TrakCare, InterSystems, MA) used in the region. This includes each record of a delivered patient interaction where a clinical note was entered in a patient record. All three hospitals included in the study used fully digital records for recording clinical care throughout this period. Timestamped clinical notes were classified by the specialty of the healthcare professional. Contacts from doctors and other clinicians, such as pharmacists, were not available for this study. We linked these data to eligible episodes between index hospital admission and discharge to calculate the total number of health contacts delivered to each patient.
Other linked data sources
Relevant hospital episodes were determined using the Scottish Morbidity Record (SMR) 0137, a national administrative dataset provided by Public Health Scotland that was linked to the hospital EHR system, to add additional information about the associated ED attendance (triage urgency codes and mode of arrival). We used the attending hospital code and season of admission as additional baseline features. Inpatient discharge codes were used to determine whether episodes led to home discharges or transfers to a non-home environment, including continued complex care. Similarly, previous attendances in outpatient clinics were linked and summarised by specialty using the SMR00 dataset. Linked demographics data included age, sex and the SIMD38, measured in quintiles and describing relative socioeconomic deprivation by postcode area within the region.
Medication history was obtained from the Scottish Prescribing Information System, containing records of dispensed community prescriptions at any point prior to the index admission. Prescriptions were categorised by Anatomical Therapeutic Chemical (ATC) coding into 14 common prescribing groups. Laboratory data for common haematology and biochemistry tests were included from both community and hospital records, including the index episode, extracted from the local EHR system (Supplementary Table 1).
Morbidities were extracted using GP Read codes from primary care (Read version 2 data) and ICD-10 coding from secondary care data (SMR01). These were consolidated into long-term condition groups based on code lists produced by the CALIBER research group39, published in the Health Data Research UK phenotyping library. We utilised 25 high-level condition groups developed from a previous observational cohort study as training inputs12. Patient deaths were determined via linkage with the National Records of Scotland registration data.
EHR specialty codes were used to determine which patients were managed by older adult specialist (geriatric medicine) services. Finally, data from a range of nursing risk assessments were linked from the EHR, using a bundle of measures mandated within the first 24 h of all older adult hospital admissions as part of accreditation and care assurance standards40. These included assessments for mobility, falls, nutrition, bowel and bladder function, infection prevention, rationale for bedrail use, and three routinely coded risk scores:
-
The 4 As test (4AT)41 score: a cumulative score between 0 and 12, where a score \(\ge\)4 indicates possible delirium.
-
The Waterlow score42: a cumulative score between 0 and 25, used to estimate risk of developing pressure ulcers (bedsores), where a score \(\ge\)10 indicates the presence of risk and recommendations for skin health.
-
The Malnutrition Screening Tool (MUST) score43: a five-step assessment using body mass index (BMI), weight loss and nutritional intake, where a cumulative score \(\ge\)2 indicates high risk of malnutrition.
Outcomes
The primary outcome was the total received in-hospital healthcare contacts, defined via documented nursing and rehabilitation inputs during hospital stay. Secondary endpoints were in-hospital death, extended hospital stay (defined as ≥14 days), non-home discharge, admission under older adult specialist care and requirement for rehabilitation (any rehabilitation-coded contact). All patient outcomes were evaluated up to hospital discharge. To assess change in performance and risk trajectory, we provide predictions of these outcomes using data available up to 5 distinct timepoints: ED arrival, at hospital admission, and 24, 48 and 72 h post-admission. Predictions at ED arrival were defined as received healthcare contacts within the index hospital stay.
Data preprocessing for model development
We adopted a data-driven approach to extracting and selecting the relevant features for prediction. Figure 4 details the clinical variables extracted to train a supervised ML model across each prediction time point. Only data available up to the point of prediction were included in each model. Apart from the test value, temporal variables describing the days since test collection, along with the 365-day average and standard deviations, were used as additional feature inputs. Morbidity features were generated based on the total number of previously diagnosed long-term conditions among the list of 25 phenotype groups. Additional binary variables categorised physical-mental multimorbidity (at least 1 physical and 1 mental condition), simple multimorbidity (2 or more conditions of any type) and high-count multimorbidity (4 or more conditions of any type). Outpatient visits were categorised by total number of attendances by specialty code, including feature variants for non-attendances. All categorical data were one-hot-encoded by adding ‘dummy’ binary variables for each possible option. Prescription features were counted by ATC group. As the healthcare contacts variable contained significant skewness, we used a bounded Box-Cox transformation at the point of prediction44, minimising the negative log-likelihood function of the contacts distribution to empirically identify the ideal scalar parameter \(\lambda\). As the optimal \(\lambda\) was estimated to be in the range (−0.3; 0.3), this was effectively equivalent to the natural logarithm of the contacts data. Further details on the classification of all extracted clinical variables, thresholds and temporal criteria can be found in Supplementary Table 1.
Predictions at the point of ED arrival are based on historical data from primary and secondary care. Predictions at the point of hospitalisation included available information collected during ED attendance. Predictions at 24 h post-admission and beyond further include routine nursing assessments and in-hospital lab testing within the target episode. Created in BioRender. Georgiev, K. (2025) https://BioRender.com/205crtq.
Data missingness was addressed by introducing a ‘meaningless’ value of ‘−1’, where appropriate for categorical and continuous data and ‘−9999’, where ‘−1’ was a possible valid measurement. This can effectively encourage non-linear models to learn how to ignore features with significant missingness45. Data cleaning procedures removed sparse features with <1% completeness within the collection window. We reported the relevant missingness across the top-ranked predictors for in-hospital healthcare contacts, grouped by population-specific hospital endpoints (admission to older adult specialist services and rehabilitation), shown in Supplementary Table 2.
Feature selection procedures
Correlated features were removed using Pearson’s correlation coefficient with a threshold ≥0.9, determining significant linear relationships. We employed an additional cross-outcome feature selection procedure to filter meaningful predictor variables using the Boruta algorithm46. Boruta operates by generating random permutations called ‘shadow’ features, followed by training a Random Forest47 model to highlight the original features that scored higher than the most prominent ‘shadow’ features. Those features will be retained and marked as ‘important’. This process was repeated for 100 iterations, generating one importance set per outcome, where the union across all sets was used as the target feature set for training. Features marked as ‘tentative’ (the Boruta algorithm did not converge to a decision) were included in the importance set if the detected ranking score was ≤10 (among the top 10 tentative features for at least one outcome). The selected features used for model training are highlighted in Supplementary Table 1.
Machine learning procedures
We used an ensemble model with gradient-boosted trees (XGBoost48) to develop a regression estimator for in-hospital healthcare contacts and a set of five classification estimators for each secondary outcome (data pipeline shown in Fig. 5). To train the models, we used a random stratified split (70% for training and 30% held-out for validation), balanced for age, sex, SIMD, total health contacts and each secondary endpoint, where applicable. In addition to a hold-out validation set, we adopted a leave-one-hospital-out validation strategy, evaluating model performance on independent data partitions across the three hospital sites. We further used a 10-fold stratified cross-validation during training to evaluate performance on random data partitions.
Created in BioRender. Georgiev, K. (2025) https://BioRender.com/b80mij6.
For the regression estimator, we reported performance using the MAE and cMAPE49, estimated after masking 0 values within the health contacts variable. MAPE is undefined (division by zero) if any actual predicted value is zero, and is heavily biased towards infinity for small actual values. Therefore, we computed a ‘conditional’ MAPE by excluding data points where the actual value is zero. We used absolute error measures as they are less sensitive to outliers compared to squared error estimates. Percentage errors like cMAPE additionally provide an interpretable and scale-independent measure suitable for forecasting tasks49,50.
To accurately measure model agreement, we equally discretised the prediction set and the original data within the validation set to define a five-level categorical variable describing the level of contact frequency: very low (VL), low (L), medium (M), high (H) and very high (VH) contacts. We then measured the Balanced Accuracy (BACC) Score across categories and the inter-rater reliability using the CKS. According to guidelines by Landis and Koch51, a CKS of 0 indicates agreement by random chance, with estimates closer to 1 indicating a more reliable systematic assessment. According to the guidelines, values up to 0.2 indicate almost no agreement, whereas values within [0.21–0.39], [0.40–0.59], [0.60–0.79], [0.80–0.90], [0.90–1] indicate minimal, weak, moderate, strong and ideal agreement, respectively. CKS was included as a chance-corrected agreement measure, which mitigates the inflation of agreement in imbalanced classes previously applied in clinical reliability studies52. The BACC is evaluated over five classes, and classification by random chance is set at 0.20, with estimates over this threshold indicating better than average sensitivity. We additionally used BACC, which explicitly accounts for unequal class sizes by averaging sensitivity and specificity, providing a fairer performance estimate in rarer contact groups53.
For the classification estimators, we reported the ROC-AUC and the PR-AUC, measuring overall discrimination ability and error rate for positive events, respectively. For PR-AUC, a higher score than the baseline outcome prevalence is treated as ‘better than random choice’. We additionally measured model sensitivity and specificity selected using the maximum F1-Score achieved for predicting an adverse event. We used decile-based discretisation on the probability scores to generate equally-sized risk groups, testing response rates within patients at the top 10% of predicted risk in the validation set. The DeLong method was used to generate 95% confidence intervals (95% CI) for all performance estimates, using a non-parametric algorithm to compute covariance between false positive rates, optimised for large sample sizes54. Details on the hyperparameter setup and training procedures are provided in the Supplementary Methods section. Hyperparameter tuning was performed using a grid search strategy focused on refining the maximum tree depth, learning rate and positive class weight of the XGBoost algorithms.
We reported the top 20 predictors and displayed their feature importances for estimating healthcare contacts using the TreeSHAP framework (Shapley Additive eXplanations)55. The aggregated Shapley values were estimated using the internal validation set of each prediction time point, summarised in global-level density plots. These values provided risk contribution scores, indicating the level of impact on contacts prediction.
Models were developed using Python version 3.10.12, using the ‘xgboost’ package (version 2.0.3) for training, ‘scikit-learn’ (version 1.3.2) for validation procedures and ‘shap’ (version 0.46.0) for feature importance analysis.
Modelling setup and training strategy
All XGBoost models were trained over a maximum of 20,000 rounds with up to 100 early stopping rounds (patience threshold for the target loss function indicating when the model should interrupt its training). Hyperparameter tuning was conducted using a grid search strategy over an exponentially spaced grid for the learning rate and a linear grid for the maximum tree depth and any additional regularisation parameters. All classification estimators were trained with an additional prevalence weighting parameter (scale_pos_weight) based on the absolute ratio of patients with the outcome. The objective for optimisation was minimising the logistic loss, due to its convex nature and subsequent ability to produce well-calibrated probabilities in the presence of imbalanced data. The regression estimator used an objective function for minimising the pseudo-huber error, which balanced the properties of MAE and RMSE. Compared to the standard Huber loss, this modified loss provides a smoother approximation in gradient-based functions, making it less sensitive to the outliers present in the health contacts variable56.
Statistical and risk analysis
Baseline patient characteristics were reported by the contact frequency group. Statistical testing was performed using the Kruskal–Wallis H test in non-normally distributed data and the one-way analysis of variance (ANOVA) test in normally distributed continuous data. Statistical testing in categorical variables was reported using Pearson’s chi-squared test. When reporting the spread of healthcare contacts by each secondary outcome, we used a two-sided Mann–Whitney–Wilcoxon test with Bonferroni correction. Statistical significance was assumed at p < 0.001. We provided additional survival analysis using an Aalen-Johansen estimator examining competing risks between in-hospital death and non-home discharge stratified by contact frequency level57.
Study approval and consent
All data from EHR and national registries were linked and de-identified by the DataLoch service (Edinburgh, United Kingdom) and analysed within a Secure Data Environment (reference: DL_2022_001). DataLoch enables access to de-identified extracts of healthcare data from the South-East Scotland region to approved applicants: https://dataloch.org/. The study was reviewed and received approval under delegated authority from a regional National Health Service Research Ethics Committee (REC 22/NS/0093) and Caldicott Guardian. Individual patient consent was not required.
Data availability
The data that support the findings of this study are not openly available due to reasons of sensitivity, but summary data can be provided by the corresponding author upon reasonable request. Specific elements of the analysis may be made available upon reasonable request to the corresponding author, subject to standard application and approval processes.
Code availability
The code used to conduct the exploratory data analysis, model development and validation is publicly available on GitHub, accessible via: https://github.com/kgeorgiev42/RCHD-Geriatric-Risk-Analytics/tree/main/notebooks/geriatric_ml.
References
Hullick, C. J., McNamara, R. & Ellis, B. Silver Book II: an international framework for urgent care of older people in the first 72 hours from illness or injury. Age Ageing. 50, 1081–1083 (2021).
Hwang, U. et al. Transforming emergency care for older adults. Health Aff. 32, 2116–2121 (2013).
LaMantia, M. A. et al. Predicting hospital admission and returns to the emergency department for elderly patients. Acad. Emerg. Med. 17, 252–259 (2010).
de Gelder, J. et al. Predicting adverse health outcomes in older emergency department patients: the APOP study. Neth. J. Med. 74, 342–352 (2016).
Roussel, M. et al. Overnight stay in the emergency department and mortality in older patients. JAMA Intern. Med. 183, 1378–1385 (2023).
Mostafa, R., El-Atawi, K., Mostafa, R. & El-Atawi, K. Strategies to measure and improve emergency department performance: a review. Cureus 16, https://doi.org/10.7759/cureus.52879 (2024).
Loyd, C. et al. Prevalence of hospital-associated disability in older adults: a meta-analysis. J. Am. Med. Dir. Assoc. 21, 455–461.e5 (2020).
Reichardt, L. A. et al. Unravelling the potential mechanisms behind hospitalization-associated disability in older patients; the Hospital-Associated Disability and Impact on Daily Life (Hospital-ADL) cohort study protocol. BMC Geriatr. 16, 59 (2016).
Geyskens, L. et al. Patient-related risk factors for in-hospital functional decline in older adults: a systematic review and meta-analysis. Age Ageing 51, afac007 (2022).
Clegg, A. et al. Development and validation of an electronic frailty index using routine primary care electronic health record data. Age Ageing 45, 353–360 (2016).
Georgiev, K. et al. Comparing care pathways between COVID-19 pandemic waves using electronic health records: a process mining case study. J. Health. Inf. Res. 9, 41–66 (2025).
Georgiev, K. et al. Understanding hospital activity and outcomes for people with multimorbidity using electronic health records. Sci. Rep. 15, 8522 (2025).
Georgiev, K. et al. Understanding hospital rehabilitation using electronic health records in patients with and without COVID-19. BMC Health Serv. Res. 24, 1245 (2024).
King, Z. et al. Machine learning for real-time aggregated prediction of hospital admission for emergency patients. npj Digit. Med. 5, 1–12 (2022).
Chrusciel, J. et al. The prediction of hospital length of stay using unstructured data. BMC Med. Inform. Decis. Mak. 21, 351 (2021).
Wu, K. H. et al. Predicting in-hospital mortality in adult non-traumatic emergency department patients: a retrospective comparison of the Modified Early Warning Score (MEWS) and machine learning approach. PeerJ 9, e11988 (2021).
Ahmed, A. et al. Developing a decision model to early predict ICU admission for COVID-19 patients: a machine learning approach. Intell. Based Med. 9, 100136 (2024).
Anand, A., Cheng, M., Ibitoye, T., Maclullich, A. M. J. & Vardy, E. R. L. C. Positive scores on the 4AT delirium assessment tool at hospital admission are linked to mortality, length of stay and home time: two-centre study of 82,770 emergency admissions. Age Ageing 51, afac051 (2022).
Titlestad, I. et al. Delirium is frequently underdiagnosed among older hospitalised patients despite available information in hospital medical records. Age Ageing 53, afae006 (2024).
Jaul, E. Assessement and management of pressure ulcers in the elderly. Drugs Aging 27, 311–325 (2010).
Coulter, A. & Oldham, J. Person-centred care: What is it and how do we get there?. Future Hosp. J. 3, 114–116 (2016).
Berntsen, G. K. R. et al. Person-centred, integrated and pro-active care for multi-morbid elderly with advanced care needs: a propensity score-matched controlled trial. BMC Health Serv. Res. 19, 682 (2019).
Thevelin, S. et al. Experience of hospital-initiated medication changes in older people with multimorbidity: a multicentre mixed-methods study embedded in the OPtimising thERapy to prevent Avoidable hospital admissions in Multimorbid older people (OPERAM) trial. BMJ Qual. Saf. 31, 888–898 (2022).
Mahoney, F. I. & Barthel, D. W. Functional evaluation: the Barthel Index: a simple index of independence useful in scoring improvement in the rehabilitation of the chronically ill. Md. State Med. J. 14, 61–65 (1965).
Ocagli, H. et al. Profiling patients by intensity of nursing care: an operative approach using machine learning. J. Pers. Med. 10, 279 (2020).
Tello, M. et al. Machine learning based forecast for the prediction of inpatient bed demand. BMC Med Inf. Decis. Mak. 22, 55 (2022).
Blecker, S. et al. Electronic health record use, intensity of hospital care, and patient outcomes. Am. J. Med. 127, 216–221 (2014).
Blecker, S. et al. Impact of an intervention to improve weekend hospital care at an academic medical center: an observational study. J. Gen. Intern. Med. 30, 1657–1664 (2015).
Swanson, J. O., Vogt, V., Sundmacher, L., Hagen, T. P. & Moger, T. A. Continuity of care and its effect on readmissions for COPD patients: a comparative study of Norway and Germany. Health Policy 122, 737–745 (2018).
Amico, P., Pope, G. C., Meadow, A. & West, P. Episode-based payment for the medicare outpatient therapy benefit. Arch. Phys. Med. Rehabil. 97, 1323–1328 (2016).
Halfon, P., Eggli, Y., Morel, Y. & Taffé, P. The effect of patient, provider and financing regulations on the intensity of ambulatory physical therapy episodes: a multilevel analysis based on routinely available data. BMC Health Serv. Res. 15, 52 (2015).
Bayer-Oglesby, L., Zumbrunn, A. & Bachmann, N. Social inequalities, length of hospital stay for chronic conditions and the mediating role of comorbidity and discharge destination: a multilevel analysis of hospital administrative data linked to the population census in Switzerland. PLoS ONE 17, e0272265 (2022).
NHS England. NHS Federated Data Platform. Accessed 25 September 2025, https://www.england.nhs.uk/digitaltechnology/nhs-federated-data-platform/
Gilbert, T. et al. Development and validation of a Hospital Frailty Risk Score focusing on older people in acute care settings using electronic hospital records: an observational study. Lancet 391, 1775–1782 (2018).
Royal College of Physicians. National Early Warning Score (NEWS) 2: Standardising the Assessment of Acute-Illness Severity in the NHS. Updated Report of a Working Party https://www.rcp.ac.uk/media/a4ibkkbf/news2-final-report_0_0.pdf (RCP, 2017).
Flamant, L. et al. Association between admission biomarkers and clinical outcome in older adults diagnosed with an infection in the emergency department. Acta Clin. Belgica 78, 285–290 (2023).
Public Health Scotland. SMR01 Data Dictionary. https://publichealthscotland.scot/resources-and-tools/health-intelligence-and-data-management/national-data-catalogue/data-dictionary/search-the-data-dictionary/smr01-generalacute-inpatient-and-day-case (2021).
SIMD (Scottish Index of Multiple Deprivation). Scottish Index of Multiple Deprivation. Accessed 20 February 2023, http://simd.scot/.
Denaxas, S. et al. UK phenomics platform for developing and validating electronic health record phenotypes: CALIBER. J. Am. Med Inf. Assoc. 26, 1545–1559 (2019).
NHS Lothian report. Lothian Accreditation & Care Assurance Standards Framework. https://staff.nhslothian.scot/falls/wp-content/uploads/sites/27/2024/02/LACAS-Framework-2023.pdf (2023).
Tieges, Z. et al. Diagnostic accuracy of the 4AT for delirium detection in older adults: systematic review and meta-analysis. Age Ageing 50, 733–743 (2020).
Waterlow, J. From costly treatment to cost-effective prevention: using Waterlow. Br. J. Community Nurs. 10, S25–S30 (2005).
Stratton, R. J. et al. Malnutrition in hospital outpatients and inpatients: prevalence, concurrent validity and ease of use of the ‘malnutrition universal screening tool’(‘MUST’) for adults. Br. J. Nutr. 92, 799–808 (2004).
Box, G. E. & Cox, D. R. An analysis of transformations. J. R. Stat. Soc. Ser. B Stat. Methodol. 26, 211–243 (1964).
Lipton, Z. C., Kale, D. C. & Wetzel, R. Modeling missing data in clinical time series with RNNs. Mach. Learn. Healthc. 56, 253–270 (2016).
Kursa, M. B. & Rudnicki, W. R. Feature selection with the Boruta Package. J. Stat. Softw. 36, 1–13 (2010).
Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
Chen, T. & Guestrin, C. XGBoost: a scalable tree boosting system. In Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (ACM, 2016).
De Myttenaere, A., Golden, B., Le Grand, B. & Rossi, F. Mean absolute percentage error for regression models. Neurocomputing 192, 38–48 (2016).
Makridakis, S. Accuracy measures: theoretical and practical concerns. Int. J. Forecast. 9, 527–529 (1993).
Landis, J. R. & Koch, G. G. The measurement of observer agreement for categorical data. Biometrics 33, 159–174 (1977).
Sim, J. & Wright, C. C. The Kappa statistic in reliability studies: use, interpretation, and sample size requirements. Phys. Ther. 85, 257–268 (2005).
Diallo, R., Edalo, C. & Awe, O. O. Machine learning evaluation of imbalanced health data: a comparative analysis of balanced accuracy, MCC, and F1 score. In Practical Statistical Learning and Data Science Methods: Case Studies from LISA 2020 Global Network, USA (eds Awe, O. O. & Vance E. A.) 283–312. https://doi.org/10.1007/978-3-031-72215-8_12 (Springer Nature Switzerland, 2025).
Sun, X. & Xu, W. Fast implementation of DeLong’s algorithm for comparing the areas under correlated receiver operating characteristic curves. IEEE Signal Process. Lett. 21, 1389–1393 (2014).
Lundberg, S. & Lee, S. I. A unified approach to interpreting model predictions. NeurIPS. 30 (2017).
Tong, H. et al. Nonasymptotic analysis of robust regression with modified Huber’s loss. J. Complex. 76, 101744 (2023).
Aalen, O. O. & Johansen, S. An empirical transition matrix for non-homogeneous Markov chains based on censored observations. Scand. J. Stat. 5, 141–150 (1978).
Acknowledgements
K.G. is supported by a PhD Fellowship award from the Sir Jules Thorn Charitable Trust (21/01PhD) as part of the University of Edinburgh’s Precision Medicine PhD programme. D.D. is supported by the British Heart Foundation (PG/24/12136). NL.M. is supported by a Chair Award, Programme Grant, and Research Excellence Award (CH/F/21/90010, RG/20/10/34966, RE/18/5/34216, RE/24/130012) from the British Heart Foundation. This work was supported by DataLoch (dataloch.org), which is core-funded by the Data-Driven Innovation programme within the Edinburgh and South East Scotland City Region Deal (ddi.ac.uk), and the Chief Scientist Office, Scottish Government (cso.scot.nhs.uk).
Author information
Authors and Affiliations
Contributions
K.G. and A.A. conceived the study design. K.G. extracted the target population, developed the features and conceived the primary analysis and evaluation procedures. K.G. conducted the data analysis and model development. A.A. provided expert input for refining the input features and target endpoints. K.G. validated the models and conducted stratified and sensitivity analyses. D.D. provided feedback on the study design and conducted data analysis. K.G. and A.A. had access to and verified the raw data, analysis tables and model outputs. K.G. drafted the manuscript. All authors revised the manuscript critically for important intellectual content. NL.M., J.F., SD.S. and J.M. provided support and expert feedback on the manuscript. All authors provided their final approval of the version to be published. All authors are accountable for the work.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Georgiev, K., Doudesis, D., McPeake, J. et al. Machine learning-based predictions of healthcare contacts following emergency hospitalisation using electronic health records. npj Digit. Med. 8, 764 (2025). https://doi.org/10.1038/s41746-025-02138-4
Received:
Accepted:
Published:
Version of record:
DOI: https://doi.org/10.1038/s41746-025-02138-4







