Machine learning-based predictions of healthcare contacts following emergency hospitalisation using electronic health records

Georgiev, Konstantin; Doudesis, Dimitrios; McPeake, Joanne; Mills, Nicholas L.; Shenkin, Susan D.; Fleuriot, Jacques D.; Anand, Atul

doi:10.1038/s41746-025-02138-4

Download PDF

Article
Open access
Published: 17 December 2025

Machine learning-based predictions of healthcare contacts following emergency hospitalisation using electronic health records

Konstantin Georgiev¹,
Dimitrios Doudesis¹,
Joanne McPeake²,
Nicholas L. Mills¹,
Susan D. Shenkin³,
Jacques D. Fleuriot⁴ &
…
Atul Anand¹

npj Digital Medicine volume 8, Article number: 764 (2025) Cite this article

3820 Accesses
1 Altmetric
Metrics details

Subjects

Abstract

Emergency care systems are challenged by the emergence of an ageing population, requiring tailored inputs facilitated by early care needs assessment. We examined the potential of Machine Learning algorithms to identify in-hospital healthcare contacts in older patients after emergency admission, developed from linked electronic health record (EHR) data within South-East Scotland. Gradient-boosting (XGBoost) prediction models were trained on frailty markers and nursing risk assessments to predict healthcare contacts, adverse outcomes and requirements for specialist input between arrival and 72 hours following admission. Across 98,242 patients, the predicted contact error rate varied between 49% at point of emergency attendance and 34% at 72 hours post-admission. Area-under-the-curve reached 0.89 in predicting need for urgent geriatric services, and 0.83 for in-hospital rehabilitation. Pressure ulcer risk and its documentation were predictive of received contacts. EHR data can predict granular estimates of in-hospital activity after ED attendance, facilitating quicker allocation to appropriate urgent care pathways.

Predicting individual patient and hospital-level discharge using machine learning

Article Open access 18 November 2024

Using explainable machine learning to identify patients at risk of reattendance at discharge from emergency departments

Article Open access 02 November 2021

A comparative study of explainable ensemble learning and logistic regression for predicting in-hospital mortality in the emergency department

Article Open access 10 February 2024

Introduction

Older patients with multiple long-term conditions, functional decline and geriatric syndromes comprise around two-fifths of emergency department (ED) attendances^1,2,3,4. This complexity makes allocation of the optimum hospital pathway for admitted patients more difficult, increasing the risk of fragmented care or delays to appropriate specialist care. A recent study by Roussel et al. highlighted the negative implications of overnight stays in the ED, indicating a poorer likelihood of survival to discharge⁵. This emphasises the need for risk-aware triage, targeting patients who may be subject to serious functional decline. Current models in urgent care, necessarily focused on rapid throughput, often fail to differentiate risks between older patients within the critical 72-h window after admission^1,3,6. A more rigorous understanding of the predictors of likely healthcare inputs needed within a hospital admission, including rehabilitation, could help reduce preventable adverse events, such as hospital-acquired disability^7,8 or function decline⁹. In turn, risk-attuned systems could improve the allocation of limited multidisciplinary resources, such as physiotherapy time.

Modern electronic health record (EHR) systems now capture multidimensional indicators of frailty¹⁰, that could explain health and care contacts in hospital. Furthermore, every unique contact with a health provider can be measured and timestamped to reflect in-hospital activity¹¹. These contacts data represent a novel endpoint to understand system use. We have previously shown how these data can describe differences in care patterns for patients experiencing early emergency readmission¹², and explored rehabilitation delivery between patients with and without COVID-19 infection¹³.

Several studies have adopted machine learning (ML) approaches using routine data to predict hospital endpoints for patients attending the ED. Traditional outcomes, including conversion to inpatient admission¹⁴, length of stay¹⁵, in-hospital death¹⁶ or admission to critical care¹⁷ have been previously used to produce high-performing risk estimates. While these outcomes are robust and easily extractable from EHRs, such measures do not provide granular detail on the patterns or dose of care processes delivered. Further, outcomes like length of stay are often influenced by factors outside of healthcare control, such as a patient’s social needs or family support.

Our aim was to develop and evaluate ML models to forecast in-hospital healthcare contacts and key outcomes such as the requirement for rehabilitation or specialist care, using comprehensive but routinely available data from community and hospital EHRs.

Results

Cohort summary

We identified 208,477 eligible admissions following ED attendance across the three acute hospital sites. Our analysis cohort consisted of 98,242 unique patients (72 ± 12 years old, 51% women) who received a median of 7 [3, 19] nursing or rehabilitation contacts during their admission. Rehabilitation was provided for 40,946 (42%) patients, including contacts with physiotherapy (36,838, 37%), occupational therapy (22,489, 23%) and speech and language therapy (5839, 6%). Baseline characteristics by quintiles of total contacts are shown in Table 1. The majority of urgent hospitalisations were within RIE, followed by WGH, with around one-fifth of attendances derived from the lower-volume SJH.

Table 1 Patient characteristics at baseline grouped by in-hospital healthcare contact level

Full size table

At baseline, patients at higher contact quintiles were older, with a higher proportion of women. Distributions by quintiles of deprivation (Scottish Index of Multiple Deprivation (SIMD)) were similar between contact groups. There was a steady increase in the proportion of patients with over 4 long-term conditions recorded in their medical history over the contact frequency levels (57% in the VH group vs 41% in the VL group), but numbers with physical-mental multimorbidity were similar across all contact groups. Proportions of patients at risk of delirium (4AT), malnutrition (MUST) and pressure ulcers (Waterlow score), as well as prior fall events, also increased in line with increased healthcare contacts. Patients under specialist services had more consistent coding for ED metadata, and those under older adult and rehabilitation services had a higher proportion of complete nursing risk assessments (Supplementary Table 2).

Patients at the highest level of received contacts (VH) had the highest prevalence of in-hospital deaths (13%), admissions to older adult specialist services (38%) and non-home discharges (30%). The annual incidence rates for each secondary outcome varied (Supplementary Fig. 1). Older individuals with an outcome were more likely to be living in lower deprivation areas (Supplementary Fig. 2). The log-transformed distribution of contacts was significantly higher in patients who experienced each of the secondary outcomes (see Supplementary Fig. 3). A further summary of the baseline characteristics by each of these outcomes is available in Supplementary Tables 3–6.

Health contact characteristics

Patterns of healthcare contact type are shown for each group in Table 2. Patients in the highest (VH) contact group had a median of 51 [34, 87] total hospital contacts, with a median 7 [5, 12] documented contacts each day of admission and significant multidisciplinary input (3 ± 1 unique provider types). This was characterised by an increase in both nursing (4 [3, 7] vs 2 [2, 3] contacts per day in H, p < 0.001), and rehabilitation contacts (93% vs 58% receiving rehabilitation in H, and 4 [2, 6] vs 2 [2, 3] contacts per day respectively, p < 0.001). Time to first rehabilitation contact was longer in the highest contact quintile (47 [21, 108] h in H vs 22 [15, 59] h in L), but the smaller proportion of patients within the lowest frequency (VL) group who received rehabilitation waited the longest to receive this (54 [21, 126] h). The total in-hospital healthcare contacts were elevated in all patients with any secondary outcome, with the highest level of care in those with extended stay and admission to older adult specialist services (see Supplementary Table 7). After log-transformation, the total number of contacts per individual remained significantly higher in RIE compared to WGH and SJH, particularly in rehabilitation provision (Supplementary Fig. 4, p < 0.001). The distributions of the average contacts per day, as well as the overall hospital length of stay, were different across the three sites, with WGH having the longest distribution tail for length of stay and RIE having the longest distribution tail for overall and rehabilitation contacts. Contact variability by admission seasons was present in rehabilitation provision, with significantly higher frequencies recorded during fall and winter time (Supplementary Fig. 5).

Table 2 Summary of the healthcare contacts distribution in the target population, grouped by in-hospital contact frequency

Full size table

Model training summary

The preprocessed data for modelling included a stratified training set of 68,769 and a hold-out validation set of 29,743 samples at the point of ED attendance (median age 73 [62, 81], 51% women and 16% at the highest deprivation level in both sets). Using this model setup, the XGBoost regressor consistently outperformed other fine-tuned linear and non-linear estimators (see Supplementary Table 8). The training and validation sets were balanced by age, sex, SIMD, secondary outcome prevalence, total healthcare contacts, rehabilitation-specific contacts and number of healthcare disciplines involved during hospital stay (see Supplementary Table 9). The optimal model hyperparameters, identified using grid search, are listed in Supplementary Table 10.

Model performance for in-hospital healthcare contact prediction

Performance summary in healthcare contacts prediction is shown in Table 3. The XGBoost regression model showed a clear reduction in error rate when forecasting in-hospital contacts as more hospital episode data were included. The conditional mean absolute percentage error (cMAPE) score indicated a slight to reasonable forecasting ability of the XGBoost regressor (error rate between 49% [48%–49%] and 34% [33%–35%]).

Table 3 Model performance summary for healthcare contacts prediction

Full size table

The leave-one-hospital-out validation revealed a generalisation gap between the mean absolute error (MAE) and cMAPE scores across hospital sites (Fig. 1). When validated on the smaller sample of SJH patients, the MAE reduced up to 23%, and the cMAPE reduced by 19%, substantially lower than that of the WGH and RIE samples. Compared to the original performance, the cMAPE was 37% [36%-38%] for the WGH sample and 38% [38%-39%] for the RIE sample. The BACC and Cohen’s Kappa scores (CKSs) were more similar but slightly lower than the baseline at 72 h post-admission. By this time, 60% of patients in the validation set within the WGH sample remained in the study, while 53% remained within both the RIE and SJH groups.

**Fig. 1: Leave-one-hospital-out validation for in-hospital contacts forecasting across prediction timepoints.**

Considering the ED arrival model as the baseline, we achieved an overall 5% reduction in error at point of hospitalisation (MAE = 0.93 [0.93–0.94] from 0.97 [0.97–0.98]), 12% reduction at the first day of admission (MAE = 0.86 [0.85–0.87]) and 18% reduction by the third day of admission (MAE = 0.80 [0.79–0.81]). Use of other linear and non-linear regression estimators showed an increase in error rate, compared to XGBoost (Supplementary Table 8). The 10-fold cross-validation strategy revealed comparable MAE and cMAPE measures for the baseline model (Supplementary Table 11).

The BACC score indicated more limited classification ability in estimating the contact frequency category across five intensity groups (between 0.28 [0.28–0.29] and 0.32 [0.32–0.33], with random choice being 0.20). The quality of stratification based on measures of agreement could be treated as fair (0.34 [0.33–0.35] at the point of ED attendance) up to moderate (0.46 [0.45–0.47] by the third day of admission). Additional error analysis on the validation set (Supplementary Fig. 6) revealed that the algorithm had the highest success rate when classifying patients requiring the highest level of contacts (37–46% across prediction timepoints). Meanwhile, classification of medium levels of care was most challenging (22–24%). A stratified analysis by age groups revealed significantly higher forecasting quality among the younger groups at point of ED attendance (MAE = 1.1 in 90+ group vs 0.87 in 50–59 group), but this gap in quality reduced in the later prediction windows (Supplementary Fig. 7). These differences were not observed across SIMD groups (Supplementary Fig. 8). After adjusting for non-home discharge outcomes (Supplementary Fig. 9), patients at the highest level of received contacts had the lowest probability of death by day 10 after ED arrival (1% vs 7% in medium-high). However, the survival advantage of the highest contact group diminished by day 30.

After testing the model on survivors to discharge (n = 92,149), we observed similar performance trajectories across the four regression quality metrics (Supplementary Fig. 10). However, when validated on patients strictly admitted within the COVID-19 lockdown period, performance improved substantially with the cMAPE score dropping to 24% and the MAE reaching 0.68 after 72 h of admission (Supplementary Fig. 11).

Model performance for secondary hospital outcomes

The XGBoost classification models achieved moderate to robust performance on the hospital endpoints, considering the same input features as the health contact forecasting model. Figure 2A, B shows the change in the trajectory of the receiver operating characteristic area-under-the-curve (ROC-AUC) and precision-recall area-under-the-curve (PR-AUC) scores, detailing overall discrimination ability and positive event classification rate, respectively. Predictive quality at 24 h from admission was excellent in admissions to older adult specialist services (ROC-AUC = 0.90 [0.89–0.91], PR-AUC = 0.64 [0.60–0.68]). At this stage of prediction, discrimination was also efficient for the forecasting of any rehabilitation requirements (ROC-AUC = 0.83 [0.82–0.83], PR-AUC = 0.81 [0.79–0.83]).

**Fig. 2: Performance trajectory curves for the binary hospital outcomes reported with 95% confidence intervals across the five prediction timepoints.**

Across all prediction timepoints, there was a notable increase in discrimination ability for in-hospital death (by 9%, 0.85 [0.84–0.85] from 0.76 [0.75–0.77]), extended stay (by 8%, 0.80 [0.79–0.80] from 0.72 [0.71–0.72]) and any rehabilitation delivery (by 7%, 0.83 [0.82, 0.83] from 0.76 [0.75–0.76]) within the first day of admission. By comparison, the positive class detection rate (PR-AUC) increased steadily over time. The tradeoff between precision and recall was reasonable in the presence of imbalanced data, as the random choice threshold for adverse event detection (equal to the incidence rate of the target time point) was two to three times lower than the reported scores.

Table 4 showed that model sensitivity was maximised either at point of hospitalisation (0.75 [0.73, 0.77]) for non-home discharge or within 24 h of admission (0.81 [0.79–0.83] for in-hospital death, 0.76 [0.75–0.77] for extended stay, 0.78 [0.78–0.79] for rehabilitation and 0.86 [0.85–0.87] for admission to geriatric services), indicating that a majority of cases could be flagged early in the admission. Selection of individuals requiring rehabilitation was close to ideal (Fig. 2D), ranging from a 79% response rate at ED arrival up to 95% by the third day of admission.

Table 4 Model performance summary on the binary hospital outcomes across prediction timepoints

Full size table

Feature importances for healthcare contacts prediction

The TreeSHAP density plot highlighted the top predictors of received healthcare contacts and the direction of their relationship (Fig. 3). Most of the top predictors were linked to older age, ED measurements, nursing assessments, blood testing and morbidities.

**Fig. 3: TreeSHAP feature importance summaries for the forecasting of in-hospital healthcare contacts across prediction timepoints.**

Interestingly, after the first day of admission, the Waterlow score became the most prominent indicator of received healthcare contacts, surpassing age in importance. Patients at moderate to high risk of pressure ulcers were predictive of less than average contacts, while no documented assessment was predictive of receiving more contacts. A similar pattern was also present in the 4AT score and mobility score specific to bathing dependence. In Supplementary Fig. 12, we identified possible relationships with survivorship bias within these variables, as patients who died in hospital and were assessed to be at no or low documented risk for delirium and pressure ulcers still had significantly more nursing and rehabilitation contacts compared to survivors at lower risk. However, the distributions were similar in the higher risk ranges. Patients with no documented nursing risk assessments were likely to receive more contacts, possibly as a result of complications in care continuity (Supplementary Fig. 13). The frequency of rehabilitation contacts was notably much higher in patients with no documented Waterlow scores.

In contrast, recent diagnoses for progressive neurological disease (including dementia) were predictive of higher contact counts. Mode of arrival is a known predictor of disease severity and health outcomes. Here, it was apparent that private transport was linked to fewer healthcare inputs in the hospital, while arrival via ambulance indicated higher contacts. Attendance to RIE was likely to influence lower overall contacts, while attendance to SJH was likely to influence higher than average contacts, particularly when staying in hospital for more than 48 h. Among the lab tests, C-reactive protein (CRP), Urea and Creatine Kinase (CK) were the most prominent predictors. In this case, elevated CK or CRP (as a result of tissue damage or inflammation), or elevated urea levels (as often seen in dehydration) were associated with having more healthcare contacts.

Discussion

We have described the clinical utility of an ML approach for predicting in-hospital healthcare contacts for patients from the point of ED attendance, developed entirely using routinely entered data. Forecasting for future healthcare contacts was feasible but had limited precision at the point of ED arrival, reaching moderate performance within 72 h of admission. Model agreement across care contact levels was limited, but certainty was greatest for those receiving the most care contacts. There was a notable generalisation gap across hospital sites, as the error rate was lower across timepoints in SJH, compared to RIE and WGH. The secondary models had excellent discrimination ability for in-hospital death and requirement for older adult specialist and rehabilitation services. Within 24 h of admission, 9 out of 10 individuals within the top decile of risk for requiring rehabilitation were correctly captured. We developed models for predicting flexible endpoints of in-hospital activity, covering relevant ageing, frailty and multimorbidity predictors captured during and before hospital admission. This identified the potentially hidden predictive utility of routine nursing risk assessments and blood tests measured at the critical points of patient movement through hospital and emergency services. Our approach demonstrates the potential of healthcare contacts forecasting while acknowledging the current limitations in routine healthcare data for care pathway optimisation, with the future aim to move from queue-based resource allocation to person-centred, data-driven approaches and reduce delays to accessing specialist care.

Age, mode of ED arrival, nursing assessments (pressure damage and delirium risk) and laboratory tests (infection, tissue damage and dehydration) were among the top indicators of received healthcare contacts. This reflects a combination of chronic determinants of received healthcare and potential surrogates of acute illness. The Waterlow skin health score was the most prominent predictor for fewer contacts after 24 h of admission. A high risk of pressure ulceration often indicates underlying frailty and a limited functional baseline prior to the acute illness precipitating hospitalisation. While increasing nursing attention is clearly indicated, there may be limited scope for rehabilitation contacts in this population, as there may be no goal to regain or improve mobility. Delirium, a form of acute cognitive decline, was also associated with reduced contacts. This contrasted with dementia, where higher interactions were documented. This distinction between acute and chronic cognitive impairment is clinically meaningful, with patients suffering from delirium often unable to participate in rehabilitation until the acute driver for this condition settles. There may also be some competing risk here, given the high association between delirium and mortality, effectively reducing observed inpatient contacts in this group¹⁸.

Importantly, the lack of documentation for pressure ulcer, delirium and mobility assessments was also predictive of more received contacts later in the pathway. This may be a signal of gaps in care coordination, increasing the likelihood of triggering further clinical reviews. For delirium, it is well-known that underreporting and underdiagnosis in hospital practice can lead to high rates of complications and subsequent adverse events¹⁹. Missing mobility or skin integrity assessments can also delay necessary repositioning interventions²⁰, including pain control and wound management, directly contributing to unmet physical needs.

In our secondary analyses, we compared the quality of prediction in binary hospital endpoints, including in-hospital death, extended stay, non-home discharge, admission under older adult specialty services and requirement for rehabilitation. Discrimination ability was excellent for the requirements of older adult specialist services, even at the point of hospitalisation, suggesting a potential clinically important role in early triage of these patients to appropriate specialist care. Delivering such individualised care is often limited by system constraints²¹, despite numerous trials showing benefits of these person-centred approaches^22,23. Some of the results from this observational study point to the future utility of data-driven approaches to support proactive care in high-risk patients. For example, our models show excellent discrimination for death during the hospital admission, using data obtained within the first 24 h of attendance. This could support clinicians in decisions on placement for appropriate monitoring and continuity of care where the risk of deterioration may not be obviously apparent, or to allow sensitive and targeted conversations to prepare patients and families. The tradeoff between precision and recall suggested a reasonable classification rate across all hospital endpoints, but generally better sensitivity than specificity. This is common in screening applications to support clinical teams, with an acceptance of more false negative cases to maximise identification of those at the highest risk of the harmful outcome.

A few ML studies have employed adjacent approaches to capture healthcare utilisation. Our study uses logged care contacts during hospitalisation with significant completeness across the full population. Ocagli et al. used a two-step approach using preliminary clustering of hospitalised individuals based on the Barthel index²⁴, before classifying individuals into high- and low-consumption categories²⁵. Tello et al. provided an ML approach for predicting healthcare utilisation via in-hospital bed days, which aligns with our key objective of proposing this work to help design optimum pathways in older individuals²⁶. We argue that predicting received healthcare contacts can yield considerable benefits for both triage and service planning, reflective of requirements for staffing and multidisciplinary care.

Definitions of healthcare utilisation in hospitals vary significantly in the literature. For example, Blecker et al. use the term ‘EHR interactions’ to provide comparisons for delivered hospital care, specifically focusing on variation between workload on weekdays and weekends^27,28. The study highlights the inherent fluctuations in healthcare delivery that vary by the day of admission. This variation in available resourcing makes individualised targeting to patients at higher contact levels even more important. A number of other works used claims data linked to continuity of care measures to study all-cause or disease-specific readmission risk^29,30,31. Length of stay is another health outcome that is often used to determine the complexity of treatment. However, this often does not reflect received healthcare input, as discharge delays could occur while waiting for the availability of community and social care³².

Our study has several important strengths. We provide an extensive ML framework developed from rich primary and secondary care data across multiple hospitals. We used a reproducible and interpretable data pipeline to extract, select and explain features across a range of data sources from the EHR. Our derived nursing risk assessments provide a novel set of data features that do not form part of usual disease-based coded datasets, but which capture important markers of frailty early in a hospital admission. Our use of routine nursing and rehabilitation contacts provides a novel endpoint to explain healthcare utilisation. While there are definite relationships between this and administrative measures like length of stay, the prediction of individual elements of care could be pivotal for targeting resources and staff allocation. We have previously shown the added value of healthcare contacts in relation to early readmission risk after hospital discharge, with this association adding information beyond length of stay¹². This approach is scalable and transferable to other health services using EHR systems, as we use a simple contact measure that should be extractable from most EHRs as auditable data. Our models generalised better when independently validated in attendance during the COVID-19 lockdown period in the UK. This may be indicative of the predictive qualities of the routinely collected data and their robustness to rapid shifts in healthcare delivery. In a previous study, we evaluated care coordination during the pandemic within NHS Lothian services, identifying substantial shifts in the variation of care in the acute hospitalisations of older adults²⁹. Although patient care pathways did not appear to be streamlined in patients with COVID-19 infection, it is possible that the documentation of care became more consistent during this period due to reduced routine activities and a focus on respiratory interventions and infection control.

We must also acknowledge some important limitations. Our XGBoost regressor produces an important baseline for predicting future care contacts but falls short, particularly at the early stages of ED arrival, where forecasting accuracy is weaker. While this does improve at later prediction timepoints, performance would suggest that future deployment would require cautious use with clinical oversight, as is appropriate for any decision support tool. We must also accept that the error rate was poorer when independently validated against more acute and complex presentations, typically present in the higher-volume RIE and WGH sites. This is not particularly surprising as RIE and WGH act as regional referral centres for major trauma, where many patients arrive with acute or life-threatening conditions that require rapid escalation. As a district general hospital, SJH still offers services for mental health and skin injuries, but does not typically support major trauma cases, which lowers the comparative level of patient acuity at presentation. Thus, post-hoc calibration may be needed to correct for systematic differences in prediction between tertiary and district-level hospitals with regard to the serving of specialist care.

As with all models developed on retrospective data, it is important to acknowledge that predicted outcomes reflect current practice rather than optimised care. Application must not increase inequalities by bluntly refocusing activity based entirely on model predictions, as this may embed ineffective care. We also recognise that clinical application of such tools would require significant improvements in real-time data linkage of primary and secondary healthcare data to make these model outputs visible to hospital clinicians at the point of patient attendance. Initiatives such as NHS England’s Federated Data Platform³³ are likely to move closer to this ambition, but these will require sustained investment to support appropriate use of ML/AI models to improve patient care.

Further inclusion of health and frailty markers prior to hospitalisation, such as the electronic Frailty Index¹⁰ from primary care, or Hospital Frailty Risk Score³⁴ and more granular data on illness acuity, such as vital signs that comprise the National Early Warning Score (NEWS2)³⁵, could significantly improve generalisation. We have used blood results from ED as a surrogate of illness acuity, but these might be less reliable markers in older adults³⁶. Our significant findings on the Waterlow score might suggest that this is capturing severe illness acuity better than other variables. Further observational research integrating these data and engagement with stakeholders could aid in determining a suitable error threshold and useful timepoints for model outputs to inform resource allocation. We must acknowledge the potential implications of survivorship bias in a retrospective study cohort. Documentation in routine screenings, such as delirium and pressure ulcer risk, may cause overrepresentation of low-risk patients with future in-hospital death, because of unanticipated disease progression precluding comprehensive assessment. Subsequently, these patients may be less likely to have documented nursing and rehabilitation assessments, resulting in more flawed predictions. Thus, a multi-state or multi-label modelling approach accounting for the competing risk of death when estimating future contacts may prove to be more effective in strengthening clinical validity. We also recognise that our current definition of a healthcare contact does not provide a holistic representation of the care delivered across the hospital service. Patients are likely to receive inputs from many other members of a multidisciplinary team, such as doctors, pharmacists and dieticians, amongst many others, which could not be included in our models. Our predictive modelling reflects the downstream risk associated with conditions, lab testing and prescribing recorded in primary care, but does not explicitly capture GP interventions (e.g. preventative management, early referrals) that may influence acuity at hospital presentation. Future research could enrich models by incorporating data on GP consultations, care access patterns, and prescribing trajectories, which may provide additional insight into how primary care influences hospital demand. Importantly, our contacts measure captures EHR interaction in clinical notation, but this does not reflect every nursing or rehabilitation interaction with a patient.

This foundational work proposes the use of ML algorithms to improve hospital resource allocation using routinely entered healthcare data. We demonstrate that ML algorithms can use routinely entered healthcare data to predict fine-grained elements of care pertaining to resource allocation in urgent care. Data-driven methods are needed to improve the efficiency and effectiveness of care pathways in older people. We have demonstrated methods that could promote equitable, targeted and person-centred care, which is of great importance in complex systems where capacity and organisation are critical constraints.

Methods

Study design and participants

To determine our study population, we used retrospective research-linked EHR data in adults aged ≥50 years across three acute hospitals in the Lothian region of South-East Scotland:

Royal Infirmary of Edinburgh (RIE), Edinburgh: a tertiary hospital handling major trauma and a wide range of specialist acute care
Western General Hospital (WGH), Edinburgh: a tertiary hospital and the regional centre for cancer treatment and specialist neurology
Saint John’s Hospital (SJH), Livingston: a general hospital supporting district-level emergency services with a focus on elective care

Data linkage extracted eligible hospitalisations between 1st April 2016 and 1st February 2024. Hospital admission episodes were included if linked to prior ED attendance. We included only consecutive non-surgical admissions, excluding episodes with no transfer of care after ED arrival, immediate admissions under surgical services or very short admissions without any linked nursing or rehabilitation contact. We used the first (index) admission meeting the above criteria per patient to generate an analysis cohort of unique individuals.

Healthcare contacts data for outcome derivation

To derive the endpoints for received in-hospital contacts, we used available entries from nursing staff and specialised rehabilitation providers in physiotherapy, occupational therapy and speech and language therapy. Our primary dataset used hospital care contact data, collated from a shared EHR system (TrakCare, InterSystems, MA) used in the region. This includes each record of a delivered patient interaction where a clinical note was entered in a patient record. All three hospitals included in the study used fully digital records for recording clinical care throughout this period. Timestamped clinical notes were classified by the specialty of the healthcare professional. Contacts from doctors and other clinicians, such as pharmacists, were not available for this study. We linked these data to eligible episodes between index hospital admission and discharge to calculate the total number of health contacts delivered to each patient.

Outcomes

The primary outcome was the total received in-hospital healthcare contacts, defined via documented nursing and rehabilitation inputs during hospital stay. Secondary endpoints were in-hospital death, extended hospital stay (defined as ≥14 days), non-home discharge, admission under older adult specialist care and requirement for rehabilitation (any rehabilitation-coded contact). All patient outcomes were evaluated up to hospital discharge. To assess change in performance and risk trajectory, we provide predictions of these outcomes using data available up to 5 distinct timepoints: ED arrival, at hospital admission, and 24, 48 and 72 h post-admission. Predictions at ED arrival were defined as received healthcare contacts within the index hospital stay.

Data preprocessing for model development

We adopted a data-driven approach to extracting and selecting the relevant features for prediction. Figure 4 details the clinical variables extracted to train a supervised ML model across each prediction time point. Only data available up to the point of prediction were included in each model. Apart from the test value, temporal variables describing the days since test collection, along with the 365-day average and standard deviations, were used as additional feature inputs. Morbidity features were generated based on the total number of previously diagnosed long-term conditions among the list of 25 phenotype groups. Additional binary variables categorised physical-mental multimorbidity (at least 1 physical and 1 mental condition), simple multimorbidity (2 or more conditions of any type) and high-count multimorbidity (4 or more conditions of any type). Outpatient visits were categorised by total number of attendances by specialty code, including feature variants for non-attendances. All categorical data were one-hot-encoded by adding ‘dummy’ binary variables for each possible option. Prescription features were counted by ATC group. As the healthcare contacts variable contained significant skewness, we used a bounded Box-Cox transformation at the point of prediction⁴⁴, minimising the negative log-likelihood function of the contacts distribution to empirically identify the ideal scalar parameter \(\lambda\). As the optimal \(\lambda\) was estimated to be in the range (−0.3; 0.3), this was effectively equivalent to the natural logarithm of the contacts data. Further details on the classification of all extracted clinical variables, thresholds and temporal criteria can be found in Supplementary Table 1.

**Fig. 4: Flow diagram detailing the feature collection windows and individual prediction timepoints used to forecast healthcare contacts and related hospital outcomes.**

Data missingness was addressed by introducing a ‘meaningless’ value of ‘−1’, where appropriate for categorical and continuous data and ‘−9999’, where ‘−1’ was a possible valid measurement. This can effectively encourage non-linear models to learn how to ignore features with significant missingness⁴⁵. Data cleaning procedures removed sparse features with <1% completeness within the collection window. We reported the relevant missingness across the top-ranked predictors for in-hospital healthcare contacts, grouped by population-specific hospital endpoints (admission to older adult specialist services and rehabilitation), shown in Supplementary Table 2.

Feature selection procedures

Correlated features were removed using Pearson’s correlation coefficient with a threshold ≥0.9, determining significant linear relationships. We employed an additional cross-outcome feature selection procedure to filter meaningful predictor variables using the Boruta algorithm⁴⁶. Boruta operates by generating random permutations called ‘shadow’ features, followed by training a Random Forest⁴⁷ model to highlight the original features that scored higher than the most prominent ‘shadow’ features. Those features will be retained and marked as ‘important’. This process was repeated for 100 iterations, generating one importance set per outcome, where the union across all sets was used as the target feature set for training. Features marked as ‘tentative’ (the Boruta algorithm did not converge to a decision) were included in the importance set if the detected ranking score was ≤10 (among the top 10 tentative features for at least one outcome). The selected features used for model training are highlighted in Supplementary Table 1.

Machine learning procedures

We used an ensemble model with gradient-boosted trees (XGBoost⁴⁸) to develop a regression estimator for in-hospital healthcare contacts and a set of five classification estimators for each secondary outcome (data pipeline shown in Fig. 5). To train the models, we used a random stratified split (70% for training and 30% held-out for validation), balanced for age, sex, SIMD, total health contacts and each secondary endpoint, where applicable. In addition to a hold-out validation set, we adopted a leave-one-hospital-out validation strategy, evaluating model performance on independent data partitions across the three hospital sites. We further used a 10-fold stratified cross-validation during training to evaluate performance on random data partitions.

**Fig. 5: Overview of the data preprocessing, training and evaluation strategy.**

For the regression estimator, we reported performance using the MAE and cMAPE⁴⁹, estimated after masking 0 values within the health contacts variable. MAPE is undefined (division by zero) if any actual predicted value is zero, and is heavily biased towards infinity for small actual values. Therefore, we computed a ‘conditional’ MAPE by excluding data points where the actual value is zero. We used absolute error measures as they are less sensitive to outliers compared to squared error estimates. Percentage errors like cMAPE additionally provide an interpretable and scale-independent measure suitable for forecasting tasks^49,50.

To accurately measure model agreement, we equally discretised the prediction set and the original data within the validation set to define a five-level categorical variable describing the level of contact frequency: very low (VL), low (L), medium (M), high (H) and very high (VH) contacts. We then measured the Balanced Accuracy (BACC) Score across categories and the inter-rater reliability using the CKS. According to guidelines by Landis and Koch⁵¹, a CKS of 0 indicates agreement by random chance, with estimates closer to 1 indicating a more reliable systematic assessment. According to the guidelines, values up to 0.2 indicate almost no agreement, whereas values within [0.21–0.39], [0.40–0.59], [0.60–0.79], [0.80–0.90], [0.90–1] indicate minimal, weak, moderate, strong and ideal agreement, respectively. CKS was included as a chance-corrected agreement measure, which mitigates the inflation of agreement in imbalanced classes previously applied in clinical reliability studies⁵². The BACC is evaluated over five classes, and classification by random chance is set at 0.20, with estimates over this threshold indicating better than average sensitivity. We additionally used BACC, which explicitly accounts for unequal class sizes by averaging sensitivity and specificity, providing a fairer performance estimate in rarer contact groups⁵³.

For the classification estimators, we reported the ROC-AUC and the PR-AUC, measuring overall discrimination ability and error rate for positive events, respectively. For PR-AUC, a higher score than the baseline outcome prevalence is treated as ‘better than random choice’. We additionally measured model sensitivity and specificity selected using the maximum F1-Score achieved for predicting an adverse event. We used decile-based discretisation on the probability scores to generate equally-sized risk groups, testing response rates within patients at the top 10% of predicted risk in the validation set. The DeLong method was used to generate 95% confidence intervals (95% CI) for all performance estimates, using a non-parametric algorithm to compute covariance between false positive rates, optimised for large sample sizes⁵⁴. Details on the hyperparameter setup and training procedures are provided in the Supplementary Methods section. Hyperparameter tuning was performed using a grid search strategy focused on refining the maximum tree depth, learning rate and positive class weight of the XGBoost algorithms.

We reported the top 20 predictors and displayed their feature importances for estimating healthcare contacts using the TreeSHAP framework (Shapley Additive eXplanations)⁵⁵. The aggregated Shapley values were estimated using the internal validation set of each prediction time point, summarised in global-level density plots. These values provided risk contribution scores, indicating the level of impact on contacts prediction.

Models were developed using Python version 3.10.12, using the ‘xgboost’ package (version 2.0.3) for training, ‘scikit-learn’ (version 1.3.2) for validation procedures and ‘shap’ (version 0.46.0) for feature importance analysis.

Modelling setup and training strategy

All XGBoost models were trained over a maximum of 20,000 rounds with up to 100 early stopping rounds (patience threshold for the target loss function indicating when the model should interrupt its training). Hyperparameter tuning was conducted using a grid search strategy over an exponentially spaced grid for the learning rate and a linear grid for the maximum tree depth and any additional regularisation parameters. All classification estimators were trained with an additional prevalence weighting parameter (scale_pos_weight) based on the absolute ratio of patients with the outcome. The objective for optimisation was minimising the logistic loss, due to its convex nature and subsequent ability to produce well-calibrated probabilities in the presence of imbalanced data. The regression estimator used an objective function for minimising the pseudo-huber error, which balanced the properties of MAE and RMSE. Compared to the standard Huber loss, this modified loss provides a smoother approximation in gradient-based functions, making it less sensitive to the outliers present in the health contacts variable⁵⁶.

Statistical and risk analysis

Baseline patient characteristics were reported by the contact frequency group. Statistical testing was performed using the Kruskal–Wallis H test in non-normally distributed data and the one-way analysis of variance (ANOVA) test in normally distributed continuous data. Statistical testing in categorical variables was reported using Pearson’s chi-squared test. When reporting the spread of healthcare contacts by each secondary outcome, we used a two-sided Mann–Whitney–Wilcoxon test with Bonferroni correction. Statistical significance was assumed at p < 0.001. We provided additional survival analysis using an Aalen-Johansen estimator examining competing risks between in-hospital death and non-home discharge stratified by contact frequency level⁵⁷.

Study approval and consent

All data from EHR and national registries were linked and de-identified by the DataLoch service (Edinburgh, United Kingdom) and analysed within a Secure Data Environment (reference: DL_2022_001). DataLoch enables access to de-identified extracts of healthcare data from the South-East Scotland region to approved applicants: https://dataloch.org/. The study was reviewed and received approval under delegated authority from a regional National Health Service Research Ethics Committee (REC 22/NS/0093) and Caldicott Guardian. Individual patient consent was not required.

Data availability

The data that support the findings of this study are not openly available due to reasons of sensitivity, but summary data can be provided by the corresponding author upon reasonable request. Specific elements of the analysis may be made available upon reasonable request to the corresponding author, subject to standard application and approval processes.

Code availability

The code used to conduct the exploratory data analysis, model development and validation is publicly available on GitHub, accessible via: https://github.com/kgeorgiev42/RCHD-Geriatric-Risk-Analytics/tree/main/notebooks/geriatric_ml.

References

Hullick, C. J., McNamara, R. & Ellis, B. Silver Book II: an international framework for urgent care of older people in the first 72 hours from illness or injury. Age Ageing. 50, 1081–1083 (2021).
Article PubMed Google Scholar
Hwang, U. et al. Transforming emergency care for older adults. Health Aff. 32, 2116–2121 (2013).
Article Google Scholar
LaMantia, M. A. et al. Predicting hospital admission and returns to the emergency department for elderly patients. Acad. Emerg. Med. 17, 252–259 (2010).
Article PubMed PubMed Central Google Scholar
de Gelder, J. et al. Predicting adverse health outcomes in older emergency department patients: the APOP study. Neth. J. Med. 74, 342–352 (2016).
PubMed Google Scholar
Roussel, M. et al. Overnight stay in the emergency department and mortality in older patients. JAMA Intern. Med. 183, 1378–1385 (2023).
Article PubMed PubMed Central Google Scholar
Mostafa, R., El-Atawi, K., Mostafa, R. & El-Atawi, K. Strategies to measure and improve emergency department performance: a review. Cureus 16, https://doi.org/10.7759/cureus.52879 (2024).
Loyd, C. et al. Prevalence of hospital-associated disability in older adults: a meta-analysis. J. Am. Med. Dir. Assoc. 21, 455–461.e5 (2020).
Article PubMed Google Scholar
Reichardt, L. A. et al. Unravelling the potential mechanisms behind hospitalization-associated disability in older patients; the Hospital-Associated Disability and Impact on Daily Life (Hospital-ADL) cohort study protocol. BMC Geriatr. 16, 59 (2016).
Article PubMed PubMed Central Google Scholar
Geyskens, L. et al. Patient-related risk factors for in-hospital functional decline in older adults: a systematic review and meta-analysis. Age Ageing 51, afac007 (2022).
Article PubMed Google Scholar
Clegg, A. et al. Development and validation of an electronic frailty index using routine primary care electronic health record data. Age Ageing 45, 353–360 (2016).
Article PubMed PubMed Central Google Scholar
Georgiev, K. et al. Comparing care pathways between COVID-19 pandemic waves using electronic health records: a process mining case study. J. Health. Inf. Res. 9, 41–66 (2025).
Article Google Scholar
Georgiev, K. et al. Understanding hospital activity and outcomes for people with multimorbidity using electronic health records. Sci. Rep. 15, 8522 (2025).
Article CAS PubMed PubMed Central Google Scholar
Georgiev, K. et al. Understanding hospital rehabilitation using electronic health records in patients with and without COVID-19. BMC Health Serv. Res. 24, 1245 (2024).
Article PubMed PubMed Central Google Scholar
King, Z. et al. Machine learning for real-time aggregated prediction of hospital admission for emergency patients. npj Digit. Med. 5, 1–12 (2022).
Article Google Scholar
Chrusciel, J. et al. The prediction of hospital length of stay using unstructured data. BMC Med. Inform. Decis. Mak. 21, 351 (2021).
Article PubMed PubMed Central Google Scholar
Wu, K. H. et al. Predicting in-hospital mortality in adult non-traumatic emergency department patients: a retrospective comparison of the Modified Early Warning Score (MEWS) and machine learning approach. PeerJ 9, e11988 (2021).
Article PubMed PubMed Central Google Scholar
Ahmed, A. et al. Developing a decision model to early predict ICU admission for COVID-19 patients: a machine learning approach. Intell. Based Med. 9, 100136 (2024).
Article Google Scholar
Anand, A., Cheng, M., Ibitoye, T., Maclullich, A. M. J. & Vardy, E. R. L. C. Positive scores on the 4AT delirium assessment tool at hospital admission are linked to mortality, length of stay and home time: two-centre study of 82,770 emergency admissions. Age Ageing 51, afac051 (2022).
Article PubMed PubMed Central Google Scholar
Titlestad, I. et al. Delirium is frequently underdiagnosed among older hospitalised patients despite available information in hospital medical records. Age Ageing 53, afae006 (2024).
Article PubMed PubMed Central Google Scholar
Jaul, E. Assessement and management of pressure ulcers in the elderly. Drugs Aging 27, 311–325 (2010).
Article PubMed Google Scholar
Coulter, A. & Oldham, J. Person-centred care: What is it and how do we get there?. Future Hosp. J. 3, 114–116 (2016).
Article PubMed PubMed Central Google Scholar
Berntsen, G. K. R. et al. Person-centred, integrated and pro-active care for multi-morbid elderly with advanced care needs: a propensity score-matched controlled trial. BMC Health Serv. Res. 19, 682 (2019).
Article CAS PubMed PubMed Central Google Scholar
Thevelin, S. et al. Experience of hospital-initiated medication changes in older people with multimorbidity: a multicentre mixed-methods study embedded in the OPtimising thERapy to prevent Avoidable hospital admissions in Multimorbid older people (OPERAM) trial. BMJ Qual. Saf. 31, 888–898 (2022).
Article PubMed PubMed Central Google Scholar
Mahoney, F. I. & Barthel, D. W. Functional evaluation: the Barthel Index: a simple index of independence useful in scoring improvement in the rehabilitation of the chronically ill. Md. State Med. J. 14, 61–65 (1965).
CAS PubMed Google Scholar
Ocagli, H. et al. Profiling patients by intensity of nursing care: an operative approach using machine learning. J. Pers. Med. 10, 279 (2020).
Article PubMed PubMed Central Google Scholar
Tello, M. et al. Machine learning based forecast for the prediction of inpatient bed demand. BMC Med Inf. Decis. Mak. 22, 55 (2022).
Article Google Scholar
Blecker, S. et al. Electronic health record use, intensity of hospital care, and patient outcomes. Am. J. Med. 127, 216–221 (2014).
Article PubMed Google Scholar
Blecker, S. et al. Impact of an intervention to improve weekend hospital care at an academic medical center: an observational study. J. Gen. Intern. Med. 30, 1657–1664 (2015).
Article PubMed PubMed Central Google Scholar
Swanson, J. O., Vogt, V., Sundmacher, L., Hagen, T. P. & Moger, T. A. Continuity of care and its effect on readmissions for COPD patients: a comparative study of Norway and Germany. Health Policy 122, 737–745 (2018).
Article PubMed Google Scholar
Amico, P., Pope, G. C., Meadow, A. & West, P. Episode-based payment for the medicare outpatient therapy benefit. Arch. Phys. Med. Rehabil. 97, 1323–1328 (2016).
Article PubMed Google Scholar
Halfon, P., Eggli, Y., Morel, Y. & Taffé, P. The effect of patient, provider and financing regulations on the intensity of ambulatory physical therapy episodes: a multilevel analysis based on routinely available data. BMC Health Serv. Res. 15, 52 (2015).
Article PubMed PubMed Central Google Scholar
Bayer-Oglesby, L., Zumbrunn, A. & Bachmann, N. Social inequalities, length of hospital stay for chronic conditions and the mediating role of comorbidity and discharge destination: a multilevel analysis of hospital administrative data linked to the population census in Switzerland. PLoS ONE 17, e0272265 (2022).
Article CAS PubMed PubMed Central Google Scholar
NHS England. NHS Federated Data Platform. Accessed 25 September 2025, https://www.england.nhs.uk/digitaltechnology/nhs-federated-data-platform/
Gilbert, T. et al. Development and validation of a Hospital Frailty Risk Score focusing on older people in acute care settings using electronic hospital records: an observational study. Lancet 391, 1775–1782 (2018).
Article PubMed PubMed Central Google Scholar
Royal College of Physicians. National Early Warning Score (NEWS) 2: Standardising the Assessment of Acute-Illness Severity in the NHS. Updated Report of a Working Party https://www.rcp.ac.uk/media/a4ibkkbf/news2-final-report_0_0.pdf (RCP, 2017).
Flamant, L. et al. Association between admission biomarkers and clinical outcome in older adults diagnosed with an infection in the emergency department. Acta Clin. Belgica 78, 285–290 (2023).
Article CAS Google Scholar
Public Health Scotland. SMR01 Data Dictionary. https://publichealthscotland.scot/resources-and-tools/health-intelligence-and-data-management/national-data-catalogue/data-dictionary/search-the-data-dictionary/smr01-generalacute-inpatient-and-day-case (2021).
SIMD (Scottish Index of Multiple Deprivation). Scottish Index of Multiple Deprivation. Accessed 20 February 2023, http://simd.scot/.
Denaxas, S. et al. UK phenomics platform for developing and validating electronic health record phenotypes: CALIBER. J. Am. Med Inf. Assoc. 26, 1545–1559 (2019).
Article Google Scholar
NHS Lothian report. Lothian Accreditation & Care Assurance Standards Framework. https://staff.nhslothian.scot/falls/wp-content/uploads/sites/27/2024/02/LACAS-Framework-2023.pdf (2023).
Tieges, Z. et al. Diagnostic accuracy of the 4AT for delirium detection in older adults: systematic review and meta-analysis. Age Ageing 50, 733–743 (2020).
Article PubMed Central Google Scholar
Waterlow, J. From costly treatment to cost-effective prevention: using Waterlow. Br. J. Community Nurs. 10, S25–S30 (2005).
Article PubMed Google Scholar
Stratton, R. J. et al. Malnutrition in hospital outpatients and inpatients: prevalence, concurrent validity and ease of use of the ‘malnutrition universal screening tool’(‘MUST’) for adults. Br. J. Nutr. 92, 799–808 (2004).
Article CAS PubMed Google Scholar
Box, G. E. & Cox, D. R. An analysis of transformations. J. R. Stat. Soc. Ser. B Stat. Methodol. 26, 211–243 (1964).
Article Google Scholar
Lipton, Z. C., Kale, D. C. & Wetzel, R. Modeling missing data in clinical time series with RNNs. Mach. Learn. Healthc. 56, 253–270 (2016).
Google Scholar
Kursa, M. B. & Rudnicki, W. R. Feature selection with the Boruta Package. J. Stat. Softw. 36, 1–13 (2010).
Article Google Scholar
Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
Article Google Scholar
Chen, T. & Guestrin, C. XGBoost: a scalable tree boosting system. In Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (ACM, 2016).
De Myttenaere, A., Golden, B., Le Grand, B. & Rossi, F. Mean absolute percentage error for regression models. Neurocomputing 192, 38–48 (2016).
Article Google Scholar
Makridakis, S. Accuracy measures: theoretical and practical concerns. Int. J. Forecast. 9, 527–529 (1993).
Article Google Scholar
Landis, J. R. & Koch, G. G. The measurement of observer agreement for categorical data. Biometrics 33, 159–174 (1977).
Article CAS PubMed Google Scholar
Sim, J. & Wright, C. C. The Kappa statistic in reliability studies: use, interpretation, and sample size requirements. Phys. Ther. 85, 257–268 (2005).
Article PubMed Google Scholar
Diallo, R., Edalo, C. & Awe, O. O. Machine learning evaluation of imbalanced health data: a comparative analysis of balanced accuracy, MCC, and F1 score. In Practical Statistical Learning and Data Science Methods: Case Studies from LISA 2020 Global Network, USA (eds Awe, O. O. & Vance E. A.) 283–312. https://doi.org/10.1007/978-3-031-72215-8_12 (Springer Nature Switzerland, 2025).
Sun, X. & Xu, W. Fast implementation of DeLong’s algorithm for comparing the areas under correlated receiver operating characteristic curves. IEEE Signal Process. Lett. 21, 1389–1393 (2014).
Article Google Scholar
Lundberg, S. & Lee, S. I. A unified approach to interpreting model predictions. NeurIPS. 30 (2017).
Tong, H. et al. Nonasymptotic analysis of robust regression with modified Huber’s loss. J. Complex. 76, 101744 (2023).
Article Google Scholar
Aalen, O. O. & Johansen, S. An empirical transition matrix for non-homogeneous Markov chains based on censored observations. Scand. J. Stat. 5, 141–150 (1978).
Google Scholar

Download references

Acknowledgements

K.G. is supported by a PhD Fellowship award from the Sir Jules Thorn Charitable Trust (21/01PhD) as part of the University of Edinburgh’s Precision Medicine PhD programme. D.D. is supported by the British Heart Foundation (PG/24/12136). NL.M. is supported by a Chair Award, Programme Grant, and Research Excellence Award (CH/F/21/90010, RG/20/10/34966, RE/18/5/34216, RE/24/130012) from the British Heart Foundation. This work was supported by DataLoch (dataloch.org), which is core-funded by the Data-Driven Innovation programme within the Edinburgh and South East Scotland City Region Deal (ddi.ac.uk), and the Chief Scientist Office, Scottish Government (cso.scot.nhs.uk).

Author information

Authors and Affiliations

Institute of Neuroscience and Cardiovascular Research, Queen’s Medical Research Institute, University of Edinburgh, Edinburgh, UK
Konstantin Georgiev, Dimitrios Doudesis, Nicholas L. Mills & Atul Anand
Department of Public Health and Primary Care, The Healthcare Improvement Studies Institute, University of Cambridge, Cambridge, UK
Joanne McPeake
Ageing and Health Research Group and Advanced Care Research Centre, Usher Institute, Edinburgh BioQuarter, University of Edinburgh, Edinburgh, UK
Susan D. Shenkin
Artificial Intelligence and its Applications Institute, School of Informatics, University of Edinburgh, Edinburgh, UK
Jacques D. Fleuriot

Authors

Konstantin Georgiev
View author publications
Search author on:PubMed Google Scholar
Dimitrios Doudesis
View author publications
Search author on:PubMed Google Scholar
Joanne McPeake
View author publications
Search author on:PubMed Google Scholar
Nicholas L. Mills
View author publications
Search author on:PubMed Google Scholar
Susan D. Shenkin
View author publications
Search author on:PubMed Google Scholar
Jacques D. Fleuriot
View author publications
Search author on:PubMed Google Scholar
Atul Anand
View author publications
Search author on:PubMed Google Scholar

Contributions

K.G. and A.A. conceived the study design. K.G. extracted the target population, developed the features and conceived the primary analysis and evaluation procedures. K.G. conducted the data analysis and model development. A.A. provided expert input for refining the input features and target endpoints. K.G. validated the models and conducted stratified and sensitivity analyses. D.D. provided feedback on the study design and conducted data analysis. K.G. and A.A. had access to and verified the raw data, analysis tables and model outputs. K.G. drafted the manuscript. All authors revised the manuscript critically for important intellectual content. NL.M., J.F., SD.S. and J.M. provided support and expert feedback on the manuscript. All authors provided their final approval of the version to be published. All authors are accountable for the work.

Corresponding author

Correspondence to Konstantin Georgiev.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Georgiev, K., Doudesis, D., McPeake, J. et al. Machine learning-based predictions of healthcare contacts following emergency hospitalisation using electronic health records. npj Digit. Med. 8, 764 (2025). https://doi.org/10.1038/s41746-025-02138-4

Download citation

Received: 07 August 2025
Accepted: 31 October 2025
Published: 17 December 2025
Version of record: 17 December 2025
DOI: https://doi.org/10.1038/s41746-025-02138-4

Subjects

Abstract

Similar content being viewed by others

Predicting individual patient and hospital-level discharge using machine learning

Using explainable machine learning to identify patients at risk of reattendance at discharge from emergency departments

A comparative study of explainable ensemble learning and logistic regression for predicting in-hospital mortality in the emergency department

Introduction

Results

Cohort summary

Health contact characteristics

Model training summary

Model performance for in-hospital healthcare contact prediction

Model performance for secondary hospital outcomes

Feature importances for healthcare contacts prediction

Discussion

Methods

Study design and participants

Healthcare contacts data for outcome derivation

Other linked data sources

Outcomes

Data preprocessing for model development

Feature selection procedures

Machine learning procedures

Modelling setup and training strategy

Statistical and risk analysis

Study approval and consent

Data availability

Code availability

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Additional information

Supplementary information

Supplementary Information

Rights and permissions

About this article

Cite this article

Share this article

Search

Quick links