Introduction

Excess opioid prescribing at surgical discharge is a significant contributor to the global opioid crisis1. Inappropriate opioid prescribing is associated with higher rates of persistent opioid use and opioid-related deaths2. Whilst opioids are prescribed with the aim of reducing postoperative pain and discomfort to optimise recovery, accumulating evidence shows that the harms frequently outweigh the benefits3,4,5. Even among those who do not consume their prescribed opioids, diversion of unused opioids into the community poses serious risks of exacerbating the opioid epidemic6.

Opioids are frequently overprescribed at hospital discharge following surgery, with quantities larger than typically consumed by patients5,7,8. The ‘Opioid PrEscRiptions and usage After Surgery’ (OPERAS) study found that patients are prescribed more than double the quantity of opioids they would consume in the first week after surgery. It is also acknowledged that analgesic prescribing is complex, with multifactorial contributors driving clinician and patient decision making. Therefore, personalised, data-driven strategies are needed to guide opioid prescribing in the post-discharge setting, tailored to patients’ needs. However, accurately identifying these needs requires reliable measures of patient-reported opioid consumption after discharge8.

The OPERAS study enrolled a large, prospective cohort of surgical patients across a range of specialties with patient-reported, post-discharge opioid consumption data9. This offers an opportunity to reduce the quantity of unused opioid prescriptions based on robust, patient-reported opioid consumption data. We therefore aimed to develop and validate a model to predict opioid consumption within the first week after surgical discharge and to assess the potential clinical impact of applying such a model in this context on patient-reported outcomes.

Results

Between 4 April 2022 and 4 September 2022, data from 4267 patients across 144 hospitals in 25 countries were collected and analysed (2268 women, 53.2%; median age 50 years; median length of stay 2 [IQR 1–3] days). Patients were followed up at a median of 7 days (IQR 7–8). Overall, 1308 (31%) were discharged with an opioid prescription, but only 1014 (23.8%) consumed an opioid within 7 days post-discharge; 297 (22.7% of those prescribed an opioid) did not consume any opioid. Only 3 (0.1%) not prescribed an opioid consumed an opioid during this time.

Predictors of opioid consumption

Opioid consumption after discharge was more frequent in patients undergoing orthopaedic or emergency surgery; in those who consumed higher quantities of opioids in the 24 h prior to discharge; in those who used opioids in the 6 months prior to admission; and in those from a high-income country (Table 1). Age, gender, smoking status, urgency of surgery, surgery type, discharge day opioid use, and 6-month opioid use prior to admission were all considered as candidate predictors because they were available in both datasets, were known to plausibly influence opioid consumption, and were included in our a priori protocol. Feature importance was then compared across all candidate variables (Supplementary Figs. 1 and 2 and Supplementary Table 3) to guide feature selection. The variables alcohol use and BMI category had >10% missing data and were excluded10. Based on the mean dropout loss of the root mean square error (RMSE), ASA category and indication for surgery were excluded. Our final predictors used for model training were therefore age, gender, smoking status, urgency of surgery, procedure mapped to BUPA severity categories, discharge day opioid use, and prior 6-month opioid use (Supplementary Table 3).

Table 1 Demographics and candidate variables across datasets, by consumption of opioids post-discharge from surgery

In a multivariable model of the whole OPERAS cohort, in reference to appendicectomy, we found ACL repair surgery (odds ratio (OR) 7.61, 95% confidence interval (CI) 3.00–19.3; p < 0.001), cholecystectomy (OR 1.54, 95% CI 1.04–2.28; p = 0.031), hip arthroplasty (OR 4.20, 95% CI 2.19–8.09; p < 0.001), inguinal hernia repair (OR 2.78, 95% CI 1.67–4.64; p < 0.001), knee arthroplasty (OR 13.74, 95% CI 7.26–26.0; p < 0.001), oophorectomy and salpingectomy (OR 3.14, 95% CI 1.11–8.87; p = 0.031), rotator cuff repair (OR 21.15, 95% CI 6.15–72.67; p < 0.001), and shoulder arthroplasty (OR 22.83, 95% CI 4.29–121.57; p < 0.001) were associated with higher odds of opioid consumption. Malignant indication, day of discharge opioid use quantity, and prior 6-month opioid use were also significant predictors (Supplementary Fig. 3).

Model development and validation

A model with all candidate predictors and a refined model with the final set of predictors were compared using ROC and PR curves (with AUROC and AUPRC metrics), as reported in Supplementary Fig. 4 and Table 2. Results of internal validation in the OPERAS testing cohort and external validation in the Newcastle cohort are shown in Fig. 1 and Table 2. The internal validation AUROC was 0.84 (95% CI 0.81–0.88), and the external validation AUROC was 0.77 (95% CI 0.74–0.80), showing excellent and good discrimination, respectively. Brier scores were 0.13 (95% CI 0.12–0.14) and 0.19 (95% CI 0.17–0.20), respectively, indicating good calibration, with a tendency toward under-predicting the need for an opioid when visually assessing calibration plots. AUPRC was 0.64 (95% CI 0.57–0.72) and 0.62 (95% CI 0.56–0.68), respectively. Sensitivity, specificity, and positive and negative predictive values across different risk thresholds are summarised in Table 3. A cross-validation sensitivity analysis, excluding individual procedures, is reported in Supplementary Results.

Fig. 1: Model predictive performance plots.

A, B Receiver operating characteristic (ROC) curve, C, D calibration plot, E, F precision-recall (PR) curve, and G, H decision curve analysis of our model during internal (blue) and external (orange) validation. The shaded area represents 95% confidence intervals.

Table 2 Performance metrics of models in the derivation cohort, calculated using 10-fold cross-validation
Table 3 Proportion of patients that consumed an opioid stratified by model predictions across varying risk classification thresholds, tabulated with accompanying sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV)

Clinical implications

Taking the ground truth as a patient consuming opioids, in the OPERAS testing set, the sensitivity (99.2%) of clinician prescribing is expectedly high, as very few patients consume an opioid when not prescribed it, but its specificity of 89.2% corresponds to a false positive rate of 10.8%. In the external Newcastle cohort, the sensitivity of clinician prescribing was 100% and the specificity was 91.5%, with a false positive rate of 8.5%. Using the model could improve specificity relative to clinicians by targeting opioid prescriptions to individuals with >50% risk of consuming them (see Table 3: specificity of 93.3% and false positive rate of 6.7%). Setting the classification threshold at 50%, 15.6% were predicted to require an opioid, compared to 31.9% that were prescribed opioids. In the external Newcastle cohort, 15.9% were predicted to require an opioid versus the 40.1% of the cohort that were prescribed opioids (see example cases in Table 4).
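The relationship between sensitivity, specificity, and the false positive rate quoted above can be illustrated with a minimal sketch. The counts below are hypothetical (chosen only so that the ratios land near the reported 99.2% and 89.2%) and are not the study's confusion matrix; the names `ConfusionCounts` and `classification_metrics` are likewise our own.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts with 'consumed an opioid' as the positive ground truth."""
    tp: int  # prescribed (or predicted) and consumed
    fp: int  # prescribed (or predicted) but not consumed
    tn: int  # not prescribed and not consumed
    fn: int  # not prescribed but consumed


def classification_metrics(c: ConfusionCounts) -> dict:
    """Return the standard threshold-based metrics as proportions."""
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        # The false positive rate is the complement of specificity.
        "false_positive_rate": 1.0 - specificity,
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


# Hypothetical counts for illustration only.
example = ConfusionCounts(tp=248, fp=65, tn=537, fn=2)
metrics = classification_metrics(example)
```

Because the false positive rate is simply one minus specificity, any gain in specificity from the model translates directly into fewer prescriptions issued to patients who would not consume them.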

Table 4 Table of five example patients, their contributory predictor variables with their model-derived predicted risk of consuming an opioid in the first week after surgical discharge, alongside the prediction classification at a 50% threshold, the real-world clinician decision and real-world patient consumption outcome

To assess harm, we examined the quantity of opioids consumed by patients who were predicted not to require an opioid by the algorithm but who had consumed opioids after discharge (i.e. false negatives). Among those predicted to require an opioid in the OPERAS testing dataset, the median quantity of opioids consumed was 37.5 OME (IQR 0.0–90.0; mean 83.4 ± 172.0) compared to a median of 0.0 OME (IQR 0.0–0.0; mean 15.4 ± 52.3; p < 0.0001) among those predicted not to require an opioid (122 [16.9%] of whom consumed some opioid). Similarly, among those predicted to require an opioid in the Newcastle external cohort, the median quantity of opioids consumed was 37.5 OME (IQR 0.0–71.2; mean 48.8 ± 61.3) compared to a median of 0.0 OME (IQR 0.0–12.5; mean 15.4 ± 36.4; p < 0.0001) among those predicted not to require an opioid (196 [28.2%] of whom consumed some opioid; Fig. 2). Further, when the TabPFN regression model was applied to predict OME quantities consumed in the week after surgery, the median predicted quantity was 11.0 (IQR 1–25; range 0.7–56; mean 13.5 ± 13.3), which can be taken to mean that generally prescribing more than 10 OME per day in the first week after surgery is likely in excess of requirements (Supplementary Fig. 5).

Fig. 2: Box plots depicting total milligram morphine equivalents consumed stratified by patients predicted to be <50% and >50% risk of consuming an opioid in the first week after surgical discharge.

Left: internal validation (blue); right: external validation (orange). The box represents the interquartile range (IQR), and the whiskers extend to 1.5× the IQR, with outlier dots plotted beyond the whisker extents.

We then calculated the observed effect of prescribing opioids on quality of life, proportion of time spent in severe pain, and mean satisfaction with analgesia, and then formulated a counterfactual, i.e. quality of life, pain severity, and satisfaction when the model prediction was discordant with the clinician’s prescribing decision. Taking the difference between the observed and counterfactual effects provides an estimate of the impact of these errors (i.e. discordant prescribing decisions) on these outcomes. There were no differences in patient satisfaction (+0.6, 95% CI −0.19 to 1.5; p = 0.08), proportion of time spent in severe pain (−4.3%, 95% CI −17.7 to 8.6; p = 0.7) or quality of life (−0.03, 95% CI −0.12 to 0.06; p = 0.8) with the use of our model compared to the real-world opioid prescribing in the OPERAS testing set.

Global impact

Applying our predictions from this model to surgical case volumes globally, this personalised prescribing approach would result in 12,129,908 fewer prescriptions (22.5% reduction in opioid prescribing rate), and a 4.5% decrease in overall quantities of oral morphine equivalents (OMEs) prescribed translating to an average 667,596,123 (SD 4,405,314,710) fewer OMEs globally (Fig. 3).

Fig. 3: Global opioid reduction modelling.

Modelled global reductions in average oral morphine equivalents (OMEs) prescribed based on country-level case volume and predicted reductions in OMEs. Darker shades of blue indicate greater reductions in OME prescribed, while countries with missing data are shown in white.

Discussion

We have developed and validated an algorithm to predict the need for an opioid prescription at surgical discharge. Derived from a multispecialty, global dataset, this model generates generalisable and specific predictions across a diverse range of surgical patients. Our model offers guidance to support a personalised approach to analgesic prescribing that may facilitate reductions in overall opioid prescribing after surgery, particularly when they are unlikely to be used (reducing false positive prescriptions by >20%). Further counterfactual simulations suggest that this overall reduction in predicted opioid prescriptions could be achieved without negatively impacting patient-reported quality of life, satisfaction with analgesia, and pain severity. Additionally, when opioids were predicted not to be required where patients had actually consumed some in the real world, quantities consumed averaged less than 20 OME (approximately two 10 mg oral morphine tablets), such that opioid omission would have been safe.

Guidelines advocate for a personalised approach to opioid prescribing at surgical discharge11,12, yet little data is available to guide the decision to prescribe an opioid13. This work expands on recent work that has highlighted the large discrepancy between what is prescribed after surgery and what is actually consumed by patients8. Coupled with rising awareness of the harms of opioid overprescription, and possibly overstated analgesic benefits in the post-discharge setting5,14, there is an urgent need for a reliable method to curtail opioid overprescription. A web application that allows input of variables for the final predictors and reports the model’s predicted risk of patients consuming an opioid in the first week after surgical discharge can be accessed here: https://operas-model.shinyapps.io/OPERAS/.

Several rule-based, procedure-specific, or protocolised approaches to opioid prescribing exist. Some examples include specialty-specific standardisations15,16,17, while others stratify prescriptions based on inpatient opioid consumption18,19,20. Our data are congruent in finding inpatient opioid consumption to be an important predictor of analgesic needs based on patient-reported post-discharge opioid consumption. This work also advances on recent efforts using deep learning to predict opioid refills, offering improved discrimination in a broader surgical and geographic cohort21.

Importantly, broad rule-based approaches to opioid prescribing can inadvertently lead to protocolised overprescription22. This highlights the need for individualised prescribing and, in turn, the value of a predictive model in place of a one-size-fits-all approach. Additionally, administrative datasets and dispensing data may not accurately reflect patients’ opioid consumption. Our patient-reported consumption data, robustly obtained through standardised telephone interviews, therefore offer a more accurate reflection of patients’ need for an opioid after surgery8,9. This work is thus a substantial advance on previous single-centre and specialty-specific studies, offering personalised recommendations that incorporate patient-, procedure-, and pathology-specific predictors.

While this study does not explore long-term outcomes of implementing this algorithm in practice, our modelling is in agreement with previous literature, which shows that reducing opioid prescribing in a landscape where overprescription is so common can occur without negatively impacting patient recovery23. Our algorithm predicted a reduction in opioid prescribing by up to 73% without a significant predicted detriment in pain-satisfaction, quality of life, or pain-severity. Others have also shown that implementing protocols to reduce postsurgical opioid prescribing could significantly reduce chronic opioid use without negatively impacting patient recovery24,25. In further support, a recent meta-analysis of randomised evidence suggests the omission of opioids, particularly in the setting of minor surgery, was frequently associated with equivalent analgesia and fewer opioid-related adverse events14.

This work has significant implications for postoperative opioid stewardship globally. Our modelling estimates substantial reductions in the quantities of opioids that would be prescribed based on extrapolating the model-predicted versus observed differences in opioid prescribing. This is likely to have particular value in high-income countries where excess opioid prescribing is more common26, and this study supports that these personalised reductions can be achieved without detriment to patients’ quality of pain relief5,14.

A key limitation of the present study is that patients’ consumption patterns can be influenced by what they are prescribed by doctors8. Hence, a residual risk of overestimating opioid requirements persists in the absence of randomised evidence to offer counterfactual data. However, it is well known that clinicians prescribe in excess of what patients consume8; a predictive tool that is likely to result in overall reductions in opioid prescribing whilst better matching patients’ consumption needs within the first week after discharge therefore remains attractive. We also did not include some variables that were found to be important prognostically (such as alcohol use and BMI). This was due to high rates of missing data, which may introduce bias10 and also indicate challenges in using these parameters in routine practice. A further limitation is that whilst the OPERAS study included a range of surgical specialties and procedures, this list was not comprehensive, and these findings should be extrapolated to different surgeries and diverse cohorts with caution. Additionally, our external validation cohort had a more restrictive set of included procedures that lacked orthopaedic operations. Further external validation of this model would therefore be of value.

We envision this tool being used on the day of discharge by the discharging clinician in collaboration with patients and the wider clinical team. Algorithmic clinical decision making should not occur in isolation. Our algorithm, which considers individual patient factors, offers advantages compared to ‘one-size-fits-all’ protocols, to enable both judicious and personalised opioid prescribing. Such an algorithm, however, does not obviate the need for clinical judgement, consideration of patient circumstances and preferences, safe prescription quantities, and appropriate disposal advice. Prospective evaluation is now required ahead of widespread clinical implementation.

In conclusion, we have developed and validated a novel foundation model for predicting the consumption of opioid analgesia after surgical discharge. This tool is derived from a global cohort and is underpinned by patient-reported opioid consumption data, offering a generalisable tool for surgeons globally. Future studies should prospectively investigate the clinical impacts of utilising our model in practice.

Methods

Study design

This was an international, prospective, multicentre, observational cohort study. Analyses were based on a prespecified, published protocol, and the study was registered in the Australian New Zealand Clinical Trials Registry in October 2021 (ANZCTR: ACTRN12621001451897p)9. This is a secondary analysis of the OPERAS dataset, which addresses the per-protocol, a priori aim of identifying risk factors for opioid consumption at 7 days; we extend this work here through development and validation of a prognostic model using a pre-trained foundation model to aid clinical decision making regarding who should be prescribed opioids on the day of discharge following surgery. This study is reported in accordance with the TRIPOD+AI guidelines27.

Ethical approval

The Hunter New England Human Research Ethics Committee approved both the OPERAS study (2021/ETH11508) and data collection for the external validation cohort from Newcastle, Australia (2021/ETH11508). For the OPERAS study, institutional approvals were also obtained according to the requirements at each participating centre and verified by the central steering committee.

Development cohort

In the OPERAS dataset, all hospitals routinely performing general, orthopaedic, gynaecological and urological procedures were eligible to enrol. Data were collected prospectively whilst patients were in hospital from clinical records and discharge prescriptions, and post-discharge via standardised telephone interviews at 7 days28. Data collection took place over six predefined 14-day data collection periods between 4 April 2022 and 4 September 2022. Centres could choose to participate in any number of 14-day consecutive recruitment periods.

External validation cohort

An independent prospective cohort of patients aged 18 years or older undergoing elective or emergency general surgical procedures across two metropolitan hospitals in Newcastle, Australia, between March 2019 and March 2020 was used for external validation28. This cohort included patients undergoing thyroidectomies, which were not included in the OPERAS study. Prescribing was per the discharging clinician’s discretion.

Eligibility criteria

Centres participating in the OPERAS study prospectively screened and approached all consecutive patients who met eligibility criteria to obtain informed participant consent where this was a requirement of the site ethics approval (Supplementary Table 1). Participants could withdraw at any stage. Consecutive adult patients aged 18 years or older undergoing either elective (planned) or emergency (unplanned) common general surgical (cholecystectomy, appendicectomy, inguinal hernia repair, colon resection, fundoplication, or sleeve gastrectomy), orthopaedic (total or reverse shoulder arthroplasty, rotator cuff or labral repair, anterior cruciate ligament repair, or hip or knee arthroplasty), gynaecological (hysterectomy, oophorectomy, or salpingectomy and oophorectomy), and urological procedures (prostatectomy, cystectomy or nephrectomy) were eligible to be included9. Only patients discharged home or to a non-healthcare setting were included. Patients receiving medication-assisted treatment of opioid dependence with methadone, suboxone, or buprenorphine, discharged to rehabilitation, nursing-supported care services, another hospital, or discharged with palliative intent were excluded. Patients undergoing multivisceral resections or who required return to theatre were also excluded from the OPERAS study.

Outcome variable

The dependent variable for model development was the binary variable of whether patients had consumed any OMEs after discharge from the hospital or not. We pragmatically chose this outcome for prognostic modelling to avoid undue harms possibly associated with predicting excess quantity of opioids, owing to the skewed distribution of both quantitative opioid prescribing and consumption patterns as overviewed in our sentinel analysis7. Individuals predicted not to consume an opioid within 7 days were considered a target group in which opioids could be safely omitted. An exploratory regression analysis using the same model architecture applied to our primary categorical outcome was also performed.

Explanatory variables

Data were also collected on patient demographics (age, gender, tobacco use, vaping status, alcohol use, BMI), American Society of Anesthesiologists (ASA) physical status classification, comorbidities, diagnosis and procedure-specific details (indication, surgical approach, and urgency), opioid use in the 6 months prior to admission (6-month opioid use), opioid quantity consumed in the 24 h prior to hospital discharge (discharge day opioid use), and opioid prescription quantity at the time of discharge from hospital (including opioid type, dose, and quantity of pills). Procedures were mapped to BUPA severity classification codes (minor, intermediate, major)29. Data on opioid doses were converted to OMEs to account for the potencies of different medications and allow comparison. OMEs were calculated using conversion ratios defined by the Australian and New Zealand College of Anaesthetists (ANZCA) Faculty of Pain Medicine30. Where opioid conversion ratios were not defined by ANZCA, accepted conversion ratios were identified through a literature review9. Methods for the calculation of OMEs are further detailed in Supplementary Note 1 and Supplementary Table 2. Patient satisfaction with analgesia was measured on a numeric rating scale between 0 (lowest satisfaction) and 100 (highest satisfaction).
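The OME conversion described above amounts to multiplying each dose by a drug-specific ratio and summing. The following is a minimal sketch of that arithmetic only: the ratios in `ILLUSTRATIVE_OME_RATIOS` are placeholder values assumed for illustration and should not be read as the ratios used in the study (those are given in Supplementary Table 2), and the function name is hypothetical.

```python
# Placeholder oral conversion ratios, assumed for illustration only.
# The ratios actually applied in the study are listed in its supplement.
ILLUSTRATIVE_OME_RATIOS = {
    "morphine": 1.0,
    "oxycodone": 1.5,
    "tramadol": 0.2,
}


def total_ome(prescription):
    """Sum oral morphine equivalents over a prescription.

    `prescription` is a list of (drug, dose_mg_per_unit, n_units) tuples.
    """
    total = 0.0
    for drug, dose_mg, n_units in prescription:
        ratio = ILLUSTRATIVE_OME_RATIOS[drug.lower()]
        total += dose_mg * n_units * ratio
    return total


# Hypothetical discharge prescription: 10 x 5 mg oxycodone, 6 x 50 mg tramadol.
example_ome = total_ome([("oxycodone", 5, 10), ("tramadol", 50, 6)])
```

Expressing every opioid on one scale in this way is what allows prescribed and consumed quantities to be compared across drugs and countries.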

Data analysis

All statistical analyses were performed with R version 4.2.0 (R Foundation for Statistical Computing, Vienna, Austria) and Python version 3.9.6 (Python Software Foundation, Amsterdam, Netherlands). An a priori sample size calculation was performed, necessitating a minimum sample size of 852 for a multivariate linear regression model9. For a binary classification model targeting an AUC of 0.8, with 12 parameters and a 30% prevalence of opioid prescribing, a minimum sample size of 413 was required (with 124 events; events per predictor 10.32)31.
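As a quick consistency check on the figures quoted for the binary model, the expected number of events and the events per predictor follow directly from the sample size, prevalence, and parameter count. This sketch does not reproduce the sample size formula itself (see ref. 31); the function name is our own.

```python
def events_per_predictor(n, prevalence, n_parameters):
    """Return (expected events, events per candidate parameter)."""
    expected_events = n * prevalence
    return expected_events, expected_events / n_parameters


# Values quoted in the text: n = 413, prevalence 30%, 12 parameters.
events, epp = events_per_predictor(n=413, prevalence=0.30, n_parameters=12)
```

With these inputs the expected event count is just under 124 and the events per predictor is a little above 10, in line with the reported values.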

Descriptive statistics were used to compare demographic and prescription-specific variables based on whether or not patients were prescribed opioids at discharge using the χ2 test for categorical variables and the Kruskal-Wallis test for continuous variables. Data are presented as median (IQR) or mean ± SD. A two-tailed α level was set at 5% for interpretation of significance. Missing data were explored via visual inspection. For descriptive purposes, a mixed effects binomial logistic regression model, including individual procedures and country as the random effect, was produced and presented with associated OR and 95% CIs.

For feature selection, we utilised the descriptive machine learning explanations (DALEX) package in R, enabling model agnostic variable importance quantification across various model types32. This framework was employed to assess variable importance via a permutation-based approach. This was supplemented by the use of Shapley Additive Explanations.
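The permutation-based idea behind this variable importance measure can be sketched in a few lines. This is a simplified Python illustration of the general technique, not the DALEX implementation used in the analysis: a feature is shuffled, the loss (here RMSE) is recomputed, and the increase over the unshuffled baseline is taken as that feature's importance. The toy model, the feature names, and the data are all hypothetical.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def permutation_importance(predict, rows, y, n_repeats=10, seed=0):
    """Mean increase in RMSE after shuffling each feature in turn.

    `rows` is a list of dicts mapping feature name to value;
    `predict` maps one such dict to a numeric prediction.
    """
    rng = random.Random(seed)
    baseline = rmse(y, [predict(r) for r in rows])
    importance = {}
    for feature in rows[0]:
        losses = []
        for _ in range(n_repeats):
            shuffled = [r[feature] for r in rows]
            rng.shuffle(shuffled)
            # Copy each row with only the current feature replaced.
            permuted = [dict(r, **{feature: v}) for r, v in zip(rows, shuffled)]
            losses.append(rmse(y, [predict(r) for r in permuted]))
        importance[feature] = sum(losses) / n_repeats - baseline
    return importance


# Toy model that depends on one feature and ignores the other.
def toy_predict(row):
    return 0.8 * row["discharge_day_opioid_use"]


data_rng = random.Random(1)
toy_rows = [
    {"discharge_day_opioid_use": data_rng.random(), "noise": data_rng.random()}
    for _ in range(200)
]
toy_y = [toy_predict(r) for r in toy_rows]
toy_importance = permutation_importance(toy_predict, toy_rows, toy_y)
```

In this toy setting, shuffling the ignored feature leaves the loss unchanged, whereas shuffling the informative feature increases it; variables whose permutation barely changes the loss are candidates for exclusion.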

Models were trained with a state-of-the-art tabular foundation model (TabPFN)33, a recent model that is well suited to smaller (<10,000 sample) datasets, can handle missing data, and outperforms best-in-class machine learning approaches such as gradient boosting (comparisons with logistic regression and with hyperparameter-optimised random forest, gradient boosted, and neural network models are reported in Supplementary Tables 3 and 4). TabPFN is a generative transformer-based foundation model tailored for prediction tasks on tabular data, having been pre-trained on millions of synthetic datasets, including various patterns of missing data. For data preprocessing, categorical variables were converted to numeric codes and missing data were retained. Two TabPFN models were subsequently trained on an 80% subset of the original OPERAS cohort to classify the likelihood of opioid consumption in the first week after surgical discharge. The first model contained all candidate predictors, and the second model contained only the refined subset of predictors determined by the feature selection approaches described above. Cross-validation with a leave-one-out approach for each procedure was performed as a sensitivity analysis when using BUPA severity classifications in place of the procedure. Receiver operating characteristic (ROC) curves and precision-recall (PR) curves, with the respective area under the ROC curve (AUROC) and area under the PR curve (AUPRC) metrics, were examined to compare models. Thereafter, the results of the refined model in the testing and external Newcastle cohorts were presented. ROC curves, PR curves, calibration plots, and decision curve analysis plots were plotted with bootstrapped 95% CIs from 100 draws, and presented alongside their respective AUROC, AUPRC, and Brier score metrics (initially, using the >0.5 risk threshold for positive classification). AUC values of 0.5–0.59 were considered ‘poor’, 0.6–0.69 ‘moderate’, 0.7–0.79 ‘good’ and ≥0.8 ‘excellent’ discrimination34.
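The evaluation metrics described above can be sketched without any modelling library. The following is a minimal, standard-library illustration of AUROC (computed as the probability that a randomly chosen positive is ranked above a randomly chosen negative), the Brier score, an approximate percentile bootstrap interval from 100 draws, and the discrimination labels. It is not the code used in the study; the function names and the four-patient toy data are assumptions for illustration.

```python
import random


def auroc(y_true, y_score):
    """Probability a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def bootstrap_ci(metric, y_true, y_score, n_draws=100, seed=0):
    """Approximate 95% percentile interval from resampling with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    values = []
    while len(values) < n_draws:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUROC is undefined unless both classes are present
        values.append(metric(ys, [y_score[i] for i in idx]))
    values.sort()
    k = int(0.025 * n_draws)
    return values[k], values[n_draws - 1 - k]


def discrimination_label(auc):
    """Labels as defined in the text; values below 0.5 are not defined there."""
    if auc >= 0.8:
        return "excellent"
    if auc >= 0.7:
        return "good"
    if auc >= 0.6:
        return "moderate"
    if auc >= 0.5:
        return "poor"
    return None


# Hypothetical outcomes (1 = consumed an opioid) and predicted risks.
toy_y = [0, 0, 1, 1]
toy_risk = [0.1, 0.4, 0.35, 0.8]
toy_auc = auroc(toy_y, toy_risk)
toy_brier = brier_score(toy_y, toy_risk)
toy_ci = bootstrap_ci(auroc, toy_y, toy_risk)
```

The pairwise AUROC shown here is quadratic in sample size and is intended for clarity; rank-based implementations are preferable for cohorts of several thousand patients.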

To assess clinical utility and impact, we first compared the quantity of opioids consumed by patients in OMEs by prediction class in the OPERAS testing set and the external Newcastle cohort. A sample of example cases is presented with variables, predictions, the original clinician’s prescribing decision, and the patient’s actual consumption outcome. Further, the TabPFN regression model was applied to assess the range of quantities of opioids predicted to be consumed within 1 week post-discharge. In the OPERAS testing set, where patient-reported outcomes were available, we then developed counterfactual simulations to assess the impact on quality of life (EQ-5D score), proportion of time spent in severe pain (0–100; herein pain severity), and satisfaction with analgesia (0–10). First, the observed effect was calculated as the mean difference in EQ-5D score, pain severity, and satisfaction with analgesia between those originally discharged with an opioid and those not discharged with an opioid. The counterfactual effect was then defined as the mean difference in satisfaction between (a) the group that the model predicted requiring an opioid, but clinicians did not; and (b) the group where the model predicted not requiring an opioid, but clinicians did (i.e. when model predictions disagreed with clinician decisions). The impact of the model was determined by subtracting the effect size in the counterfactual model from the observed effect. We then assessed for statistical significance by bootstrapping 95% CIs with 1000 resamples.
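The counterfactual calculation described above can be sketched as follows, under our reading of the text: the observed effect is the mean outcome difference between patients who were and were not discharged with an opioid, the counterfactual effect is the mean difference between the two discordant groups, and the impact is the observed effect minus the counterfactual effect, with a percentile bootstrap interval. The record layout, function names, and the eight example records are hypothetical.

```python
import random
from statistics import StatisticsError, mean


def model_impact(records):
    """Observed effect minus counterfactual effect.

    Each record is (prescribed: bool, predicted: bool, outcome: float).
    """
    prescribed = [o for rx, _, o in records if rx]
    not_prescribed = [o for rx, _, o in records if not rx]
    # Discordant groups: model says yes / clinician no, and the reverse.
    model_only = [o for rx, pred, o in records if pred and not rx]
    clinician_only = [o for rx, pred, o in records if rx and not pred]
    observed = mean(prescribed) - mean(not_prescribed)
    counterfactual = mean(model_only) - mean(clinician_only)
    return observed - counterfactual


def bootstrap_impact_ci(records, n_resamples=1000, seed=0):
    """Approximate 95% percentile interval for the impact estimate."""
    rng = random.Random(seed)
    estimates = []
    while len(estimates) < n_resamples:
        sample = rng.choices(records, k=len(records))
        try:
            estimates.append(model_impact(sample))
        except StatisticsError:
            continue  # a resample with an empty group is redrawn
    estimates.sort()
    k = int(0.025 * n_resamples)
    return estimates[k], estimates[n_resamples - 1 - k]


# Hypothetical satisfaction scores for eight patients.
toy_records = [
    (True, True, 8.0), (True, True, 7.0),
    (True, False, 7.5), (True, False, 8.5),
    (False, True, 6.0), (False, True, 7.0),
    (False, False, 8.0), (False, False, 9.0),
]
toy_impact = model_impact(toy_records)
toy_impact_ci = bootstrap_impact_ci(toy_records)
```

An interval that spans zero, as reported for all three outcomes in the Results, indicates no detectable difference attributable to discordant prescribing decisions.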

The global impact of applying our predictions was modelled using country-level case volume data from Holmer et al.35. Opioid prescribing rates were taken from the OPERAS study for available countries, and supplemented by rates described by Kaafarani et al.26. First, the difference between the observed and predicted rates of opioid prescribing was calculated and described. Thereafter, the average quantity of OME prescribed per patient was multiplied by the average prescribing rate and by the absolute difference between observed and predicted rates, and the result was presented as mean and standard deviation (SD). These data are visualised with a choropleth.
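One plausible reading of this extrapolation is sketched below. It assumes that, for each country, the number of prescriptions avoided is the case volume multiplied by the difference between observed and predicted prescribing rates, and that the OME avoided is that number multiplied by the mean OME per prescription. This is an assumption about the calculation, not a reproduction of it, and the two countries and all of their values are hypothetical.

```python
from statistics import mean, stdev


def country_reduction(case_volume, observed_rate, predicted_rate, mean_ome_per_rx):
    """Return (prescriptions avoided, OME avoided) for one country."""
    fewer_prescriptions = case_volume * (observed_rate - predicted_rate)
    return fewer_prescriptions, fewer_prescriptions * mean_ome_per_rx


# Hypothetical inputs: (case volume, observed rate, predicted rate, mean OME).
toy_countries = {
    "Country A": (1000, 0.40, 0.20, 100.0),
    "Country B": (500, 0.30, 0.10, 80.0),
}

reductions = {name: country_reduction(*vals) for name, vals in toy_countries.items()}
total_fewer_rx = sum(rx for rx, _ in reductions.values())
ome_by_country = [ome for _, ome in reductions.values()]
mean_ome_reduction = mean(ome_by_country)
sd_ome_reduction = stdev(ome_by_country)
```

Because case volumes differ by orders of magnitude between countries, the standard deviation of the country-level reductions can greatly exceed their mean, as seen in the global estimate reported in the Results.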