Deep learning unlocks the true potential of organ donation after circulatory death with accurate prediction of time-to-death

Sun, Xingzhi; De Brouwer, Edward; Liu, Chen; Krishnaswamy, Smita; Batra, Ramesh

doi:10.1038/s41598-025-95079-7

Download PDF

Article
Open access
Published: 19 April 2025

Deep learning unlocks the true potential of organ donation after circulatory death with accurate prediction of time-to-death

Xingzhi Sun¹^na1,
Edward De Brouwer²^na1,
Chen Liu¹,
Smita Krishnaswamy^1,2 &
…
Ramesh Batra³

Scientific Reports volume 15, Article number: 13565 (2025) Cite this article

3889 Accesses
2 Citations
8 Altmetric
Metrics details

Subjects

Abstract

Increasing the number of organ donations after circulatory death (DCD) has been identified as one of the most important ways of addressing the ongoing organ shortage. While recent technological advances in organ transplantation have increased their success rate, a substantial challenge in increasing the number of DCD donations resides in the uncertainty regarding the timing of cardiac death after terminal extubation, impacting the risk of prolonged ischemic organ injury, and negatively affecting post-transplant outcomes. In this study, we trained and externally validated an ODE-RNN model, which combines recurrent neural network with neural ordinary equations and excels in processing irregularly-sampled time series data. The model is designed to predict time-to-death following terminal extubation in the intensive care unit (ICU) using the history of clinical observations. Our model was trained on a cohort of 3,238 patients from Yale New Haven Hospital, and validated on an external cohort of 1,908 patients from six hospitals across Connecticut. The model achieved accuracies of \(95.3~\pm ~1.0\%\) and \(95.4~\pm ~0.7\%\) for predicting whether death would occur in the first 30 and 60 minutes, respectively, with a calibration error of \(0.024~\pm ~0.009\). Heart rate, respiratory rate, mean arterial blood pressure (MAP), oxygen saturation (SpO2), and Glasgow Coma Scale (GCS) scores were identified as the most important predictors. Surpassing existing clinical scores, our model sets the stage for reduced organ acquisition costs and improved post-transplant outcomes.

A multicenter observational study to establish practice for circulatory death declaration for organ donation in South Korea

Article Open access 24 October 2024

Development of a deep learning model that predicts critical events of pediatric patients admitted to general wards

Article Open access 27 February 2024

Discrete-time survival analysis in the critically ill: a deep learning approach using heterogeneous data

Article Open access 14 September 2022

Introduction

Organ donation plays a critical role in saving lives and improving the quality of life for individuals suffering from end-organ failure. Historically, organs from donation after brain death (DBD) donors have constituted the predominant source of transplantable organs, with donation after circulatory death (DCD) contributing to a comparatively smaller, albeit recently increasing, fraction¹. A major reason for this disparity is the lower organ-yield from DCD donors, due to the reduced quality and longevity of allografts². However, in the last 5-years, technological explosion of normothermic machine perfusion (NMP) and normothermic regional perfusion (NRP) have improved organ quality from DCD donors, highlighting the unrecognized potential of DCD donors in the transplant community^3,4. Given these recent advances, there is now a growing consensus that augmenting DCD practice represents the largest and underutilized opportunity for expanding the organ donor pool⁵.

Although NMP and NRP work to improve the quality of organs procured from a DCD, the critical challenge limiting the volume of DCD practice is the unpredictability regarding whether, or when, a patient after terminal extubation (TE) will progress to meet the Uniform Declaration of Death Act (UDDA)⁶ criteria for organ donation. This uncertainty limits the ability of Organ Procurement Organizations (OPOs) to evaluate a potential DCD donor for organ donation and thus negatively impacts the organ yield from DCD donors. Indeed, while conventional guidelines stipulate that circulatory death must occur within a narrow time-frame following the cessation of life-sustaining treatment, only 59-72% of potential DCD donors die within the first hour^1,7. The goal of this study is to investigate the potential of advanced machine learning models to accurately predict time-to-death (TTD) after extubation.

Recognizing the complexities inherent in DCD, we trained and externally validated an ODE-RNN¹⁸ model, which combines recurrent neural network with neural ordinary equations and excels in processing irregularly-sampled time series data. The model is designed to predict the time-to-death (TTD) of a patient following terminal extubation in the intensive care unit (ICU), leveraging the history of clinical observations. Our model shows remarkable accuracy and calibration, underscoring its ability to accurately and reliably identify viable DCD organ donors, and enables a nuanced balance between the risks and benefits of a specific organ donation procedure.

A key challenge of modeling ICU data for TTD prediction is that the data consist of both static variables and a multidimensional time series of longitudinal variables, are measured at irregularly-spaced time points, and contain missing measurements in many variables. Previous efforts in predicting circulatory death within specified time frames include clinical risk scores such as the United Organ Sharing (UNOS) criteria⁸, and machine learning models such as XGBoost⁹, RNN¹⁰, LSTM¹¹, GRU¹², GRU-D¹³. However, these studies are predominantly based on conventional statistical models or basic machine learning architectures designed for regularly-sampled fixed-dimensional data. As a result, they cannot take full advantage of the data available, leading to insufficient performance and lower clinical reliability^{7,14,15,16,17}. The UNOS criteria and XGBoost, a tree-based machine learning model, only consider static variables and cannot use the rich history of longitudinal variables. Recurrent neural network models and their extensions (RNN, LSTM, GRU and GRU-D) are able to model the time series of longitudinal variables, but they are primarily designed for data regularly-measured in time and perform badly on irregularly-sampled data.

In contrast, we recommend using ODE-RNN, an architecture that builds upon recent advances in longitudinal modeling through the use of neural ordinary differential equations^18,19,20, which specifically address the challenges posed by the irregularly-sampled data. The proposed model also allows one to form patient phenoscape visualizations for a better understanding of the cohort’s structure and heterogeneity. By leveraging state-of-the-art deep learning and representation learning methodologies, our approach surpasses the limitations of previous models and sets the stage for a more accurate and clinically relevant prediction of time-to-death following extubation, thereby promising an increase in the DCD donor pool.

Methods

Patient cohort and data preparation

We used two separate cohorts of patients to develop and validate the model. The first cohort contained 3,238 patients at Yale New Haven hospital (YNHH) older than 18 years old, with a recorded TE in the ICU between 2014 and 2023. For unbiased validation, using same inclusion criteria, we formed an external cohort from six different hospitals with 1,908 patients. The hospitals from the external cohort are the Bridgeport hospital, Greenwich hospital, Lawrence and Memorial hospital, Saint Raphael hospital, Westerly hospital and the Yale New Haven Children’s hospital. The median time from extubation to death was 7.15 minutes in the first cohort and 8.28 minutes in the external cohort. The two cohorts are summarized in Table 1.

Table 1 Statistics of the two cohorts. BMI stands for body mass index, and TTD for time-to-death.

Full size table

For each patient, we extracted 5 static variables and 25 longitudinal variables, and collected historical measurements before extubation. Longitudinal variables were observed at a range of frequencies – from a single observation in more than 24 hours to one/minute. Missing values in the longitudinal records were imputed by performing a combination of forward-fill, backward-fill and mean-fill imputation²¹. In addition to imputation, presence of missing values was fed to the model by creating binary missingness indicators. TTD was defined as the time from terminal extubation to circulatory death. TTD was converted into an ordinal variable with 4 categories (0: 0\(\sim\)30 minutes, 1: 30\(\sim\)60 minutes, 2: 60\(\sim\)120 minutes, and 3: longer than 120 minutes), that were used as target labels in the machine learning model. We chose these time frames as most transplant centers use 0-60 minutes as “standard criteria” for kidney DCD donation and 60\(\sim\)120 minutes as an “extended spectrum”²²; similarly, 0\(\sim\)30 minutes as “standard criteria” for liver DCD donation and 30\(\sim\)60 minutes as an “extended spectrum”²³.

List of clinical variables used in the models

The following longitudinal and clinical variables were extracted for each patient.

Longitudinal variables B-type natriuretic peptide (BNP), carboxyhemoglobin, corneal reflex, fraction of inspired oxygen (FiO2), gag reflex, Glasgow Coma Scale (GCS), hemoglobin, lactate, mean arterial blood pressure (MAP), methemoglobin, O2-Hemoglobin, partial pressure of carbon dioxide (pCO2), positive end-expiratory pressure (PEEP), blood potential of hydrogen (pH), partial pressure of oxygen (pO2), pulse, respirations, oxygen saturation (SpO2), Troponin-I, Troponin-T, Dopamine, Epinephrine, Levothyroxine, Lidocaine, Norepinephrine.

Static variables Age, Body Mass Index (BMI), Dialysis, Sex, Weight.

Model description

To capture the impact of longitudinal variables on the target while accommodating for the irregular sampling of clinical trajectories, we used an Ordinary Differential Equation Recurrent Neural Network (ODE-RNN)¹⁸. ODE-RNNs combine two powerful architectures, recurrent neural networks (RNN) and neural ordinary differential equations (Neural ODE)²⁷, making them exceptionally apt at processing clinical time series^19,24. RNNs are neural networks specialized for processing sequences. At each time step, they maintain a hidden state which represents the whole previous information in the time series. Upon reading a clinical record, the RNN updates its hidden state by combining the previous hidden state with the new observation, using an update unit (here a gated recurrent unit (GRU)). However, RNNs assume regular time intervals between observations, an assumption typically not met in clinical time series. To address this limitation, ODE-RNN uses a Neural ODE, that describes dynamics in continuous time, to model the dynamics of the hidden state between observations. This uniquely allows the model to capture the full span of the clinical records of each patient, correctly accounting for the time interval between observations, and improving upon previous methods such as logistic regression, XGBoost, or the UNOS criteria, among others.

The model accumulates the history before extubation of the patient’s vitals, medications used (e.g. vasopressors), neurological assessments, and lab results, together with their demographic records, and produces a representation of the patient, which we refer to as the patient’s latent phenotype. This latent phenotype can be understood as a learnt compact clinical summary of a particular patient. This representation is then used as an input to a multi-layer perceptron classifier that predicts the probability of each label category (0, 1, 2, or 3, corresponding to the 4 time ranges).

A graphical depiction of the architecture is presented in Fig. 1. Panel A describes the high-level design and panel B illustrates the specific modules involved.

Our model contains the following neural networks:

\(f_{\textrm{static}}\): a multi-layer perceptron (MLP) that processes the static variables.
\(f_{\textrm{ODE}}\): an MLP that predicts the derivative of the hidden state dynamics between observations.
\(f_{\textrm{GRU}}\): a gated recurrent unit (GRU) that updates the hidden states at each observation point.
\(f_{\textrm{longitudinal}}\): an MLP that processes the final hidden state of the longitudinal variables.
\(f_{\textrm{fusion}}\): an MLP that fuses the latent states derived from static and longitudinal variables, producing a latent variable called the latent phenotype.
\(f_{\textrm{classifier}}\): an MLP that performs classification using the latent phenotype.

Algorithm 1 describes how the model predicts outcomes for a single patient. The model takes as input static variables \(s\in \mathbb {R}^{\ell }\), longitudinal variables \(x_1,\dots ,x_n\in \mathbb {R}^{k}\), observation times \(t_1,\dots ,t_n\in \mathbb {R}\), and boolean observation masks \(m_1,\dots ,m_n\in \mathbb {R}^{k}\), where \(m_i=(m_{i1},\dots ,m_{ik})\). The value \(m_{ij}=\textrm{true}\) if \(x_{ij}\), the \(j^{\textrm{th}}\) variable at time i, is observed, and \(m_{ij}=\textrm{false}\) otherwise. Including these observation masks enables the model to utilize “informative missingness,” which is correlated with the patient’s condition. For instance, a patient is unlikely to be suspected of heart failure if B-type Natriuretic Peptide (BNP) is not frequently measured. The model iterates through the observation time points, updating the hidden state h of the longitudinal variables using \(f_{\textrm{ODE}}\) and \(f_{\textrm{GRU}}\). Between observation points, \(f_{\textrm{ODE}}\) is integrated to continuously update h, while at observation times, \(f_{\textrm{GRU}}\) updates h using \(x_i, m_i, t_i\). The final h contains accumulated information from the entire history of the longitudinal variables, which is then processed by \(f_{\textrm{longitudinal}}\) and fused with the processed static variables s (via \(f_{\textrm{static}}\)) using \(f_{\textrm{fusion}}\). This results in a latent variable representing the patient’s condition, referred to as the latent phenotype. \(f_{\textrm{classifier}}\) uses the latent phenotype to make the final classification prediction.

Evaluation of model performance

We compared the performance of our approach with machine learning methods and clinical scores such as the UNOS criteria⁸. Machine learning methods directly learn associations from the available data while clinical scores consist of clinical criteria designed by experts. Within the machine learning methods, we distinguish between static methods and longitudinal methods. Unlike longitudinal methods, static methods cannot process a time series of clinical information. They are therefore trained on the last available clinical observation at the time of extubation only.

Static machine learning methods

1.
XGBoost XGBoost⁹ is an effective and widely used tree-based machine learning algorithm for predictive modeling. Utilizing an ensemble of decision trees, XGBoost improves model accuracy by combining weak learners into a strong one. It is designed with sparsity awareness, which renders it helpful in clinical health data. However, being a static method, XGBoost cannot process longitudinal data.

Longitudinal machine learning methods

1.
RNN Recurrent Neural Networks (RNNs)¹⁰ are a class of neural networks specialized for processing sequences, designed for handling time-series data or sequential information. A vanilla RNN processes sequences by iterating through elements, using its internal state to retain information from previous inputs.
2.
LSTM Long Short-Term Memory (LSTM)¹¹, an extension of vanilla RNNs, is designed to overcome the vanishing gradient problem by incorporating memory cells. These cells enable LSTMs to retain information over extended sequences, making them adept at tasks requiring longer-term dependencies.
3.
GRU Gated Recurrent Units (GRUs)¹² are a streamlined variant of LSTMs. Compared to the LSTM architecture, a GRU replaces the input, forget and output gates with the reset and update gates for higher efficiency.
4.
GRU-D GRU-D, an extension of GRUs¹³, integrates decay mechanisms to handle missing data in time-series. It modifies the GRU architecture to accommodate irregularly-sampled data, enhancing prediction accuracy in such scenarios.

Clinical scores Besides the machine learning models above, we also compared our approach to the most widely used clinical score for DCD candidates identification: the UNOS criteria⁸.

1.
UNOS UNOS criteria consisted of fourteen clinical variables developed by the UNOS DCD consensus committee, based on expert opinion⁸. Criteria include physiological measurements (e.g. heart rate <30) and respiratory characteristics (e.g. FiO2 >0.5). A final score was computed by adding up the number of UNOS criteria present in the patient at the time of extubation.

For comparison, we trained these various models on the YNHH cohort using a temporal data split. Data from patients before 2021 was used for training the models. Patients after 2021 were used for evaluation only. This ensured that our results were robust to distribution shifts over time²⁵. Standard errors were computed by training five different models with different initializations.

The models were trained to predict TTD as a categorical variable within a given time frame (0\(\sim\)30 min, 30\(\sim\)60 min, 60\(\sim\)120 min, or >120 min). We evaluated the different models according to the overall categorical accuracy as well as pairwise binary classification for different grouped time-frames (e.g., <30 min vs. >30 min). For these binary groupings, we also computed the positive predictive values (PPV), negative predicted values (NPV), area under the receiver operating characteristic curve (ROC-AUC) and the area under the precision-recall curve (PR-AUC). The ROC curve measures the trade-off between sensitivity and specificity, while the PR curve measures the trade-off between precision and recall. To assess the calibration of the models, we computed the expected calibration error (ECE).

Visualizing structures in high-dimensional patient phenoscape with PHATE

To predict TTD, our ODE-RNN model produces a latent phenotype for each patient, which can be intuitively understood as a learnt summary of the clinical history of the patient. We explored the space of latent phenotypes of all patients in the cohort, the patient phenoscape, showing its potential to provide new clinical insights.

The latent phenotype of each patient is high-dimensional, and thus cannot be directly visualized. Therefore, we first produced a two-dimensional representation of the phenotype using PHATE²⁶, a non-linear dimensionality reduction and visualization method that stays faithful to the geometry of the data and retains the inherent similarity between patients. In this visualization, each patient is represented as a point in a two-dimensional phenotypic space. The patient phenoscape is the set of the representations of all patients in the cohort. We colored each point according to their TTD and the value of certain clinical variables, enabling a fine-grained exploration of the impact of different clinical factors on the TTD.

Statements regarding human participants

All methods were carried out in accordance with relevant guidelines and regulations. Identifiable information, including human participants’ names and other HIPAA identifiers, has been removed from all sections of the manuscript, including supplementary materials.

The Yale University Institutional Review Board (IRB) waived the need for approval and informed consent for this study after reviewing the proposal, as this research is not considered human subject research since all the participants were deceased. Please refer to the definition for Human Subjects in the HHS Policy for Protection of Human Research Subjects 45 CFR 46.102: “Human subject means a living individual about whom an investigator (whether professional or student) conducting research obtains”.

Results

Modeling longitudinal clinical variables acquired at irregular time intervals

Since we aim to predict time-to-death using a patient’s history of clinical observations prior to terminal extubation, the input data contains both static and longitudinal variables. The latter pose challenges for statistical analysis and machine learning methods due to their irregular measurements over time and the presence of missing values. By using an Ordinary Differential Equation Recurrent Neural Network (ODE-RNN)¹⁸, we address both issues, as the model integrates a recurrent neural network (RNN)¹⁰ component tailored for sequential modeling, with a Neural ODE²⁷ component that interpolates between irregularly-sampled time points.

The model operates by accumulating longitudinal variables with static variables to create a summary of the clinical history of each patient, that we call the latent phenotype²⁸. The latent phenotype is then used by a classifier to predict patient outcomes (i.e. TTD). This procedure makes the ODE-RNN particularly effective at processing EHRs containing both static and longitudinal variables²⁴.

Predictive performance evaluation

We compared our method with the UNOS criteria, the most widely used clinical score for identifying DCD candidates⁸, and existing machine learning models that have been used for clinical outcome prediction (RNN¹⁰, LSTM¹¹, GRU¹², GRU-D¹², and XGBoost⁹).

All machine learning models were trained on the Yale New Haven Hospital (YNHH) cohort using a temporal data split. Data from patients before 2021 was used for training the models. Patients after 2021 were used for evaluation only to ensure robustness to distribution shifts over time.

The models were trained to predict time-to-death as a categorical variable, i.e., whether TTD fell within a given time frame (0-30 min, 30-60 min, 60-120 min, or >120 min). We evaluated the different models according to the overall categorical accuracy as well as pairwise binary classification for different grouped time frames (e.g., <30 min vs. >30 min). For these binary groupings, we also computed the area under the positive and negative predicted values (PPV, NPV), the receiver operating characteristic curve (ROC-AUC) and the area under the precision-recall curve (PR-AUC). To assess the calibration of the models, that is, how well the predicted values represent the true likelihood, we computed the expected calibration error (ECE)²⁹.

Table 2 displays the comparative performance of various models on the YNHH patient cohort after 2021 and the external validation cohort. As illustrated in Fig. 2 panel A, the ODE-RNN model consistently outperforms all other approaches in predictive performance for TTD forecasts at 30, 60, and 120 minutes. We note the high performance of the model despite the stringent experimental setup (temporal split and external validation), highlighting the robustness of the method. ODE-RNN also shows the best calibration, suggesting the probability outputs of the model are very reliable.

Table 2 Comparison of the performance results of various machine learning models and statistical models on the Yale New Haven Hospital (YNHH) test cohort and external validation cohort.

Full size table

The poor performance of XGBoost and UNOS can be explained by (1) the inability of UNOS criteria and XGBoost to capture the temporality of the patient’s data – they only use the last observation at the time of extubation, (2) the limited number of clinical variables used in UNOS (14 variables) compared to our ODE-RNN model (5 static and 25 longitudinal variables).

To accommodate certain DCD protocols with longer time thresholds, we also trained and evaluated the models over extended time frames of 30, 60, 120, 180 and 240 minutes, and our ODE-RNN remains competitive (see Supplementary Table S1). Additionally, in clinical deployment, a dedicated model that predicts TTD several hours before extubation is beneficial, as early prediction facilitates better preparation and planning for DCD. To demonstrate the capability of early prediction, we trained and evaluated the same models using data up to 12 hours before extubation, and ODE-RNN is again the most competitive method in this realistic setting (see Supplementary Table S2).

In Fig. 2 panel B, we report the calibration plot of the ODE-RNN model for the binary prediction (< 30 min vs. > 30 min) on the external validation cohort. Calibration plots for other binary tasks are available in the Supplementary Materials. The model tended to give a reliable but conservative estimate of the probability of death within 30 minutes. Importantly, the model appeared well calibrated for low and high predicted probabilities, highlighting its reliability.

Variable importance assessment

We assessed the importance and impact of the different clinical variables on the prediction of the models with permutation importance testing (Fig. 2 panel C). For longitudinal variables, we found that heart rate was the most important variable in the prediction of the ODE-RNN, followed by respiratory rate, mean arterial blood pressure (MAP), oxygen saturation (SpO2), and the Glasgow Coma Scale (GCS) score. Corneal reflex and gag reflex which are frequently used by transplant surgeons and OPOs, appear to have the least impact on the prediction. Notably, static variables were found significantly less important than longitudinal variables, by an order of magnitude. We also found a strong consistency in the variable importance across different binary tasks.

Patient phenoscape analysis

The patient phenotypes learnt by our model enable accurate TTD predictions because they faithfully represent the patients’ condition and past clinical history. As such, these phenotypes provide richer information about the patients, compared to the single numerical value of TTD prediction. These phenotypes form a continuous landscape of the patient cohort, which we refer to as the phenoscape. Within the phenoscape, we can observe clusters of patients with similar conditions and continuous transitions from one condition to another, thereby uncovering the underlying dynamics of circulatory death. We used PHATE²⁶, a dimensionality reduction method that preserves the underlyding data geometry^30,31,32,33, to visualize the phenoscape and provide examples of new insights drawn from such analysis.

Our patient phenoscape visualizations in Fig. 3 revealed that patients reside on a continuous spectrum of phenotype that goes beyond the TTD categorization. Patients were organized along an axis that corresponded with TTD but also with heart rate, SpO2, or GCS, among others. Finer investigation allowed us to identify the dynamical patterns most correlated with TTD, complementing the variable importance analysis above. For instance, in Fig. 3 panel A, patients were colored according to their average heart rate and to their range of heart rate measurements (defined as the difference between highest and lowest values). While the range correlated with TTD, the average value did not, suggesting the variation in the heart rate is more important than the average.

Our phenoscape visualization also showed a clear distinction between patients with TTD<120 minutes and another group with TTD>120 minutes (categories 0,1,2 vs. 3 in top-left of Fig. 3 panel A). This separation suggests an obvious clinical difference between patients with TTD<120 min and TTD>120 min. We performed a fine-grained analysis of the former cluster, to identify clinical drivers that make the difference between short-range (\(\approx\)30min) and medium-range (\(\approx\)60min) TTD. Panels B and C in Fig. 3 show a visualization of the identified group of patients with TTD<120min. From this zoomed-in analysis, we identified three clusters (A, B, and C). Guided by our visualization, we examined the specific phenotype of patients from cluster A, which appear to have higher TTD than rest of the patients. We found that patients from cluster A were characterized by higher TTD, higher range of heart rate, lower minimum SpO2, higher GCS, higher MAP and lower body mass index (BMI). This reveals that, over the period before TE, range of heart rate, minimum SpO2, average GCS, and the maximum MAP observed are all predictive of TTD.

Discussion

Augmenting the number of donations after circulatory death has been recognized as a crucial factor in mitigating the ongoing organ shortage, with the potential to increase the organ donor pool by as much as 30% in the United States³⁴. However, a major factor hindering the rapid increase of DCD is the unpredictability regarding the time of circulatory death after extubation, leading to an unmanageable risk of prolonged warm ischemic injury. In the United Kingdom, it is estimated that 40% of donation teams mobilized for potential DCD donations are unsuccessful due to unpredictably long ischemic injury³⁵. In the United States, only 59-72% of potential DCD donors die within the first hour after terminal extubation^1,7. Similarly, in our cohort, only 73.8% of the patients died within the first hour.

This low success rate worsened by total unpredictability results in the waste of essential and valuable health-care resources and increased distress for families. Therefore, the average cost per DCD organ is estimated to be 63% higher than a DBD organ, mostly attributable to the unpredictability of DCDs from these “dry runs”^36,37. The recent introduction of NMP and NRP, which have significantly improved the quality of organs from DCD by resuscitation of DCD organs prior to transplant; but that added expense has also contributed in making “dry runs” even more costly as the resources spent in mobilizing the NMP and NRP teams are still wasted on an unpredictable and thus failed DCD attempt^38,39. Unsuccessful DCD donations also result in wasted human effort, and an avoidable environmental cost linked to the inherent logistics (air/ground transport) of a failed DCD attempt; and of an immeasurable psychological burden on grieving families hoping to make sense of their tragedy with the hope of a successful organ donation⁴⁰.

These considerations highlight the importance and value of an accurate TTD prediction and have motivated the introduction of clinical scores, such as the UNOS criteria⁸ or the University of Wisconsin Donation after Circulatory Death evaluation tool (UW-DCD)⁴¹. However, these statistical methods show poor discrimination performance. For the UNOS criteria, PPV and NPV are reported to be 75.8% and 73%⁴². The UW-DCD, shows even worse performance (57.6% PPV and 61.8% NPV)⁴², and requires disconnecting the patient from ventilator for 10 minutes⁴¹. In contrast, our model achieved a PPV of 95.8%, a NPV of 93.9% and an accuracy of 95.4% for predicting whether the donor would die within the first hour on the external validation cohort, thereby only misclassifying 4.6% of the patients. Our model also does not require disconnecting the patient from ventilator, as seen in UW-DCD.

Predicting time to circulatory death has been previously attempted in the literature^{8,14,15,16,17,41}. Nevertheless, previous studies predominantly have relied on conventional statistics and machine learning architectures such as logistic regression⁷ or long short term memory (LSTM)^11,14. These models showed promising performance, but their simple architecture failed to fully capture the signal in data. Winter et al.¹⁶ proposed a model including only pediatric patients, with an ROC-AUC of 0.85, which is significantly lower than our model (ROC-AUC of \(98.7 \pm 0.3\) for ODE-RNN). It is noteworthy that pediatric patients (<18 years) only conforms to 5% of total deceased organ donations in United States, whereas our model is applicable for the 95% organ donors in U.S.⁴³. Furthermore, the performance evaluation in these studies was often limited and without calibration, preventing a thorough assessment of the maturity of models for a potential clinical use.

Our study aims at addressing these shortcomings by leveraging the most recent advances in machine learning and providing the most robust clinical evaluation possible. The ODE-RNN architecture is specifically designed to handle specific challenges of clinical time series, such as irregular sampling or missing data, which enables capturing all the relevant information in patient’s data. To evaluate the models as closely as possible to a realistic clinical practice scenario, we used temporal splitting, removing bias induced by a change of clinical practices over time. Notably, such an evaluation strategy was absent from previous works on TTD prediction. We also used an external validation cohort to remove the bias linked to the clinical practices at different hospitals.

In clinical setting, it is important that predictive models give a notion of certainty regarding their predictions. Indeed, having access to a probability of death within a timeframe is crucial in balancing the expected benefits and costs of a planned organ donation. Remarkably, we found that our model showed excellent calibration, suggesting that the predicted probabilities could be directly interpreted at face-value. Furthermore, we note that our model jointly predicts probabilities for all four time-frames (<30 min, 30\(\sim\)60 min, 60\(\sim\)120 min, >120 min), which enables a fine-grained evaluation. This facilitates work of OPOs and transplant centers, who plan procurement of particular organs based on their warm ischemia time acceptance criteria for that particular donor as standard or extended criteria.

Our variable importance analysis was generally consistent with the UNOS criteria variables with respiratory rate, heart rate, SpO2, PEEP, and norepinephrine ranking high. However, only dopamine, a variable in the UNOS model, was found to be one of the least important variables in our model. We also found that MAP and GCS, although absent from the UNOS criteria, were very important variables for the predictive accuracy in our model. It is noteworthy that, although UNOS model excludes both MAP and GCS, they have been previously identified as important predictors of death after TE^8,16.

Deep learning architecture like, ODE-RNN, bring added value with ability to handle irregular time series of arbitrary length and to provide a hidden state representation of the patient: a latent phenotype. Our experimental results showed, the ability to process the whole available time-series, and handle irregular sampling, resulted in better predictive performance. We further showed that our model enables a fine-grained analysis of the patient cohort by producing a latent phenotype for each patient, put together visualized as a phenoscape. The phenoscape identifies and separates the specific subgroups of patients with higher TTD and could potentially support clinical discovery essential to the prediction and identification for a DCD donor.

While the model developed in this study represents an important proof-of-concept and demonstrates compelling predictive performance, our study still suffers from some limitations. Notably, our patient cohorts were derived from various hospital records rather than directly from OPOs, meaning that some patients in our cohorts might have been ineligible for organ donation. Nevertheless, we are optimistic that the exceptional performance of our model will pave the way for future research involving a dedicated cohort of organ donors, ultimately enabling a definitive demonstration of the utility of machine learning models in improving the success rate of DCD.

Data Availability:

The data cannot be shared openly due to HIPPA regulations. However, they will be made available upon reasonable request to the corresponding authors.

References

Bellingham, J. M. et al. Donation after cardiac death: A 29-year experience. Surgery 150, 692–702 (2011).
Article PubMed Google Scholar
Israni, A. K. et al. Optn/srtr 2021 annual data report: Deceased organ donation. Am. J. Transplant. 23, S443–S474 (2023).
Article PubMed Google Scholar
Markmann, J. F. et al. Impact of portable normothermic blood-based machine perfusion on outcomes of liver transplant: The ocs liver protect randomized clinical trial. JAMA Surg. 157, 189–198 (2022).
Article PubMed PubMed Central Google Scholar
van Rijn, R. et al. Hypothermic machine perfusion in liver transplantationâ€”a randomized trial. N. Engl. J. Med. 384, 1391–1401 (2021).
Article PubMed Google Scholar
Sonnenberg, E. M., Hsu, J. Y., Reese, P. P., Goldberg, D. S. & Abt, P. L. Wide variation in the percentage of donation after circulatory death donors across donor service areas: a potential target for improvement. Transplantation 104, 1668–1674 (2020).
Article CAS PubMed PubMed Central Google Scholar
for the Study of Ethical Problems in Medicine, U. S. P. C., Biomedical & Research, B. Defining death: A report on the medical, legal and ethical issues in the determination of death (1981). Accessed: 2024-02-07.
Kotsopoulos, A., Böing-Messing, F., Jansen, N. E., Vos, P. & Abdo, W. F. External validation of prediction models for time to death in potential donors after circulatory death. Am. J. Transplant. 18, 890–896 (2018).
Article CAS PubMed Google Scholar
DeVita, M. et al. Donors after cardiac death: Validation of identification criteria (dvic) study for predictors of rapid death. Am. J. Transplant. 8, 432–441 (2008).
Article CAS PubMed Google Scholar
Chen, T., Guestrin, C. Xgboost: A scalable tree boosting system. In booktitleProceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining, 785–794 (2016).
Rumelhart, D. E., Hinton, G. E. & Williams, R. J. Learning internal representations by error propagation, parallel distributed processing, explorations in the microstructure of cognition, ed. de Rumelhart and J. Mcclelland vol. 1. 1986. Biometrika 71, 6 (1986).
Google Scholar
Schmidhuber, J. et al. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).
Article PubMed Google Scholar
Chung, J., Gulcehre, C., Cho, K., Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. Preprint at arXiv:1412.3555 (2014).
Che, Z., Purushotham, S., Cho, K., Sontag, D. & Liu, Y. Recurrent neural networks for multivariate time series with missing values. Sci. Rep. 8, 6085 (2018).
Article ADS PubMed PubMed Central Google Scholar
Huang, J., Shung, D., Burman, T., Krishnaswamy, S. & Batra, R. K. Prediction of death after terminal extubation, the machine learning way. J. Am. Coll. Surg. 235, S95–S96 (2022).
Article Google Scholar
Bradley, J., Pettigrew, G. & Watson, C. Time to death after withdrawal of treatment in donation after circulatory death (dcd) donors. Curr. Opin. Organ Transplant. 18, 133–139 (2013).
Article CAS PubMed Google Scholar
Winter, M. C. et al. Machine learning to predict cardiac death within 1 hour after terminal extubation. Pediatr. Crit. Care Med. 22, 161–171 (2021).
Article PubMed Google Scholar
He, X. et al. Nomogram for predicting time to death after withdrawal of life-sustaining treatment in patients with devastating neurological injury. Am. J. Transplant. 15, 2136–2142 (2015).
Article CAS PubMed Google Scholar
Rubanova, Y., Chen, R.T., Duvenaud, D.K. Latent ordinary differential equations for irregularly-sampled time series. Advances in neural information processing systems32 (2019).
De Brouwer, E., Simm, J., Arany, A., Moreau, Y. Gru-ode-bayes: Continuous modeling of sporadically-observed time series. Advances in neural information processing systems32 (2019).
Liu, C. et al. Imageflownet: Forecasting multiscale image-level trajectories of disease progression with irregularly-sampled longitudinal medical images. IEEE International Conference on Acoustics, Speech and Signal Processing (2025).
Bertsimas, D., Orfanoudaki, A. & Pawlowski, C. Imputation of clinical covariates in time series. Mach. Learn. 110, 185–248 (2021).
Article MathSciNet Google Scholar
Scalea, J. et al. Does dcd donor time-to-death affect recipient outcomes? implications of time-to-death at a high-volume center in the united states. Am. J. Transplant. 17, 191–200 (2017).
Article ADS CAS PubMed Google Scholar
Kalisvaart, M. et al. Donor warm ischemia time in dcd liver transplantation-working group report from the ilts dcd, liver preservation, and machine perfusion consensus conference. Transplantation 105, 1156–1164 (2021).
Article PubMed Google Scholar
De Brouwer, E., Gonzalez, J. & Hyland, S. Predicting the impact of treatments over time with uncertainty aware neural differential equations. In International Conference on Artificial Intelligence and Statistics (ed. De Brouwer, E.) 4705–4722 (PMLR, 2022).
Google Scholar
Guo, L. L. et al. Evaluation of domain generalization and adaptation on improving model robustness to temporal dataset shift in clinical medicine. Sci. Rep. 12, 2726 (2022).
Article ADS CAS PubMed PubMed Central Google Scholar
Moon, K. R. et al. Visualizing structure and transitions in high-dimensional biological data. Nat. Biotechnol. 37, 1482–1492 (2019).
Article CAS PubMed PubMed Central Google Scholar
Chen, R.T., Rubanova, Y., Bettencourt, J., Duvenaud, D.K. Neural ordinary differential equations. Advances in neural information processing systems31 (2018).
Richesson, R. L., Sun, J., Pathak, J., Kho, A. N. & Denny, J. C. Clinical phenotyping in selected national networks: demonstrating the need for high-throughput, portable, and computational methods. Artif. Intell. Med. 71, 57–61 (2016).
Article PubMed PubMed Central Google Scholar
Guo, C., Pleiss, G., Sun, Y. & Weinberger, K. Q. On calibration of modern neural networks. In International Conference on Machine Learning (ed. Guo, C.) 1321–1330 (PMLR, 2017).
Google Scholar
Acosta, J. N., Falcone, G. J., Rajpurkar, P. & Topol, E. J. Multimodal biomedical ai. Nat. Med. 28, 1773–1784 (2022).
Article CAS PubMed Google Scholar
Heumos, L. et al. Best practices for single-cell analysis across modalities. Nat. Rev. Genet. 24, 550–572 (2023).
Article CAS PubMed Google Scholar
Liu, C. et al. Cuts: A deep learning and topological framework for multigranular unsupervised medical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention (ed. Liu, C.) 155–165 (Springer, 2024).
Google Scholar
Sun, X. et al. Geometry-aware generative autoencoders for warped riemannian metric learning and generative modeling on data manifolds. Preprint at arXiv:2410.12779 (2024).
Jawitz, O. K. et al. Increasing the united states heart transplant donor pool with donation after circulatory death. J. Thorac. Cardiovasc. Surg. 159, e307–e309 (2020).
Article PubMed Google Scholar
Manara, A. R., Murphy, P. & O’Callaghan, G. Donation after circulatory death. Br. J. Anaesth. 108, i108–i121 (2012).
Article PubMed Google Scholar
Wall, A. E. et al. A cost comparison of liver acquisition fees for donation after circulatory death versus donation after brain death donors. Liver Transplantation 10–1097 (2024).
Lindemann, J. et al. Cost evaluation of a donation after cardiac death program: how cost per organ compares to other donor types. J. Am. Coll. Surg. 226, 909–916 (2018).
Article PubMed Google Scholar
Webb, A. N., Izquierdo, D. L., Eurich, D. T., Shapiro, A. J. & Bigam, D. L. The actual operative costs of liver transplantation and normothermic machine perfusion in a Canadian setting. PharmacoEconomics-Open 5, 311–318 (2021).
Article PubMed Google Scholar
Webb, A. N., Lester, E. L., Shapiro, A. M. J., Eurich, D. T. & Bigam, D. L. Cost-utility analysis of normothermic machine perfusion compared to static cold storage in liver transplantation in the canadian setting. Am. J. Transplant. 22, 541–551 (2022).
Article PubMed Google Scholar
Taylor, L. J. et al. Harms of unsuccessful donation after circulatory death: an exploratory study. Am. J. Transplant. 18, 402–409 (2018).
Article PubMed Google Scholar
Lewis, J. et al. Development of the university of Wisconsin donation after cardiac death evaluation tool. Prog. Transplant. 13, 265–273 (2003).
Article PubMed Google Scholar
Coleman, N. L., Brieva, J. L. & Crowfoot, E. Prediction of death after withdrawal of life-sustaining treatments. Crit. Care Resusc. 10, 278–284 (2008).
PubMed Google Scholar
Procurement, O., Network, T. Optn data for deceased donors recovered in the u.s. by donor age (2024). Accessed: 2024-09-18.

Download references

Author information

Xingzhi Sun and Edward De Brouwer contributed equally to this work.

Authors and Affiliations

Department of Computer Science, Yale University, New Haven, 06511, USA
Xingzhi Sun, Chen Liu & Smita Krishnaswamy
Department of Genetics, Yale University, New Haven, 06511, USA
Edward De Brouwer & Smita Krishnaswamy
Department of Surgery, Yale University, New Haven, 06511, USA
Ramesh Batra

Authors

Xingzhi Sun
View author publications
Search author on:PubMed Google Scholar
Edward De Brouwer
View author publications
Search author on:PubMed Google Scholar
Chen Liu
View author publications
Search author on:PubMed Google Scholar
Smita Krishnaswamy
View author publications
Search author on:PubMed Google Scholar
Ramesh Batra
View author publications
Search author on:PubMed Google Scholar

Contributions

S.K. and R.B. identified the research problem. R.B. provided the data. X.S., E.D.B. and C.L. conceived the experiments. X.S. and E.D.B. conducted the experiments. X.S., E.D.B. and C.L. analyzed the results. S.K. and R.B. provided advice and supervision. All authors wrote and reviewed the manuscript.

Corresponding authors

Correspondence to Smita Krishnaswamy or Ramesh Batra.

Ethics declarations

Competing Interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Sun, X., De Brouwer, E., Liu, C. et al. Deep learning unlocks the true potential of organ donation after circulatory death with accurate prediction of time-to-death. Sci Rep 15, 13565 (2025). https://doi.org/10.1038/s41598-025-95079-7

Download citation

Received: 09 December 2024
Accepted: 19 March 2025
Published: 19 April 2025
Version of record: 19 April 2025
DOI: https://doi.org/10.1038/s41598-025-95079-7