Table 2 Comparison of the performance results of various machine learning models and statistical models on the Yale New Haven Hospital (YNHH) test cohort and external validation cohort.

From: Deep learning unlocks the true potential of organ donation after circulatory death with accurate prediction of time-to-death

Yale New Haven Hospital Test Cohort (Temporal Split, After 2021)

| Metric | UNOS | XGBoost | RNN | LSTM | GRU | GRU-D | ODE-RNN |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Accuracy (4-way) \(\uparrow\) | \(0.627 \pm 0.000\) | \(0.831 \pm 0.000\) | \(0.816 \pm 0.014\) | \(0.861 \pm 0.010\) | \(0.862 \pm 0.021\) | \(0.856 \pm 0.014\) | \(\mathbf{0.878} \pm 0.007\) |
| Accuracy (<30 vs. >30) \(\uparrow\) | \(0.627 \pm 0.000\) | \(0.900 \pm 0.000\) | \(0.922 \pm 0.020\) | \(0.947 \pm 0.009\) | \(0.948 \pm 0.006\) | \(0.941 \pm 0.005\) | \(\mathbf{0.955} \pm 0.010\) |
| Accuracy (<60 vs. >60) \(\uparrow\) | \(0.711 \pm 0.000\) | \(0.928 \pm 0.000\) | \(0.892 \pm 0.012\) | \(0.939 \pm 0.007\) | \(0.946 \pm 0.010\) | \(0.941 \pm 0.006\) | \(\mathbf{0.953} \pm 0.003\) |
| Accuracy (<120 vs. >120) \(\uparrow\) | \(0.779 \pm 0.000\) | \(0.934 \pm 0.000\) | \(0.903 \pm 0.004\) | \(0.924 \pm 0.006\) | \(0.930 \pm 0.011\) | \(0.929 \pm 0.012\) | \(\mathbf{0.942} \pm 0.003\) |
| ROC-AUC (<30 vs. >30) \(\uparrow\) | \(0.584 \pm 0.000\) | \(0.962 \pm 0.000\) | \(0.952 \pm 0.023\) | \(0.978 \pm 0.012\) | \(0.973 \pm 0.010\) | \(0.972 \pm 0.008\) | \(\mathbf{0.987} \pm 0.004\) |
| ROC-AUC (<60 vs. >60) \(\uparrow\) | \(0.592 \pm 0.000\) | \(0.966 \pm 0.000\) | \(0.932 \pm 0.019\) | \(0.968 \pm 0.012\) | \(0.965 \pm 0.011\) | \(0.963 \pm 0.007\) | \(\mathbf{0.987} \pm 0.003\) |
| ROC-AUC (<120 vs. >120) \(\uparrow\) | \(0.623 \pm 0.000\) | \(0.975 \pm 0.000\) | \(0.922 \pm 0.018\) | \(0.961 \pm 0.014\) | \(0.957 \pm 0.011\) | \(0.951 \pm 0.006\) | \(\mathbf{0.984} \pm 0.002\) |
| PR-AUC (<30 vs. >30) \(\uparrow\) | \(0.734 \pm 0.000\) | \(0.972 \pm 0.000\) | \(0.953 \pm 0.034\) | \(0.981 \pm 0.014\) | \(0.971 \pm 0.012\) | \(0.975 \pm 0.012\) | \(\mathbf{0.987} \pm 0.003\) |
| PR-AUC (<60 vs. >60) \(\uparrow\) | \(0.799 \pm 0.000\) | \(0.986 \pm 0.000\) | \(0.956 \pm 0.024\) | \(0.982 \pm 0.010\) | \(0.974 \pm 0.010\) | \(0.976 \pm 0.008\) | \(\mathbf{0.995} \pm 0.001\) |
| PR-AUC (<120 vs. >120) \(\uparrow\) | \(0.857 \pm 0.000\) | \(0.993 \pm 0.000\) | \(0.962 \pm 0.019\) | \(0.985 \pm 0.008\) | \(0.977 \pm 0.010\) | \(0.974 \pm 0.009\) | \(\mathbf{0.996} \pm 0.001\) |
| F1 (<30 vs. >30) \(\uparrow\) | \(0.770 \pm 0.000\) | \(0.929 \pm 0.000\) | \(0.940 \pm 0.013\) | \(0.958 \pm 0.007\) | \(0.960 \pm 0.003\) | \(0.954 \pm 0.003\) | \(\mathbf{0.967} \pm 0.007\) |
| F1 (<60 vs. >60) \(\uparrow\) | \(0.831 \pm 0.000\) | \(0.949 \pm 0.000\) | \(0.926 \pm 0.008\) | \(0.957 \pm 0.004\) | \(0.963 \pm 0.006\) | \(0.959 \pm 0.003\) | \(\mathbf{0.968} \pm 0.002\) |
| F1 (<120 vs. >120) \(\uparrow\) | \(0.876 \pm 0.000\) | \(0.957 \pm 0.000\) | \(0.935 \pm 0.006\) | \(0.953 \pm 0.003\) | \(0.957 \pm 0.006\) | \(0.953 \pm 0.009\) | \(\mathbf{0.960} \pm 0.003\) |
| PPV (<30 vs. >30) \(\uparrow\) | \(0.627 \pm 0.000\) | \(0.882 \pm 0.000\) | \(0.930 \pm 0.024\) | \(0.942 \pm 0.010\) | \(0.945 \pm 0.010\) | \(0.935 \pm 0.010\) | \(\mathbf{0.963} \pm 0.010\) |
| PPV (<60 vs. >60) \(\uparrow\) | \(0.711 \pm 0.000\) | \(0.944 \pm 0.000\) | \(0.922 \pm 0.010\) | \(0.945 \pm 0.007\) | \(0.952 \pm 0.013\) | \(0.945 \pm 0.008\) | \(\mathbf{0.967} \pm 0.003\) |
| PPV (<120 vs. >120) \(\uparrow\) | \(0.779 \pm 0.000\) | \(0.966 \pm 0.000\) | \(0.925 \pm 0.015\) | \(0.937 \pm 0.015\) | \(0.935 \pm 0.018\) | \(0.927 \pm 0.019\) | \(\mathbf{0.976} \pm 0.010\) |
| NPV (<30 vs. >30) \(\uparrow\) | \(0.000 \pm 0.000\) | \(\mathbf{0.960} \pm 0.000\) | \(0.915 \pm 0.030\) | \(0.955 \pm 0.018\) | \(0.956 \pm 0.008\) | \(0.954 \pm 0.009\) | \(0.953 \pm 0.013\) |
| NPV (<60 vs. >60) \(\uparrow\) | \(0.000 \pm 0.000\) | \(0.886 \pm 0.000\) | \(0.826 \pm 0.023\) | \(0.922 \pm 0.025\) | \(\mathbf{0.935} \pm 0.006\) | \(0.928 \pm 0.011\) | \(0.922 \pm 0.011\) |
| NPV (<120 vs. >120) \(\uparrow\) | \(0.000 \pm 0.000\) | \(0.829 \pm 0.000\) | \(0.797 \pm 0.039\) | \(0.885 \pm 0.042\) | \(\mathbf{0.919} \pm 0.048\) | \(0.914 \pm 0.025\) | \(0.827 \pm 0.018\) |
| ECE \(\downarrow\) | \(0.054 \pm 0.000\) | \(0.092 \pm 0.000\) | \(0.051 \pm 0.004\) | \(0.048 \pm 0.007\) | \(0.055 \pm 0.014\) | \(0.055 \pm 0.008\) | \(\mathbf{0.033} \pm 0.008\) |

External Validation Cohort

| Metric | UNOS | XGBoost | RNN | LSTM | GRU | GRU-D | ODE-RNN |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Accuracy (4-way) \(\uparrow\) | \(0.641 \pm 0.000\) | \(0.785 \pm 0.000\) | \(0.791 \pm 0.015\) | \(0.800 \pm 0.015\) | \(0.821 \pm 0.010\) | \(0.809 \pm 0.012\) | \(\mathbf{0.866} \pm 0.010\) |
| Accuracy (<30 vs. >30) \(\uparrow\) | \(0.641 \pm 0.000\) | \(0.884 \pm 0.000\) | \(0.923 \pm 0.017\) | \(0.930 \pm 0.003\) | \(0.942 \pm 0.001\) | \(0.934 \pm 0.003\) | \(\mathbf{0.953} \pm 0.012\) |
| Accuracy (<60 vs. >60) \(\uparrow\) | \(0.713 \pm 0.000\) | \(0.898 \pm 0.000\) | \(0.892 \pm 0.014\) | \(0.913 \pm 0.009\) | \(0.932 \pm 0.005\) | \(0.919 \pm 0.008\) | \(\mathbf{0.954} \pm 0.007\) |
| Accuracy (<120 vs. >120) \(\uparrow\) | \(0.786 \pm 0.000\) | \(0.883 \pm 0.000\) | \(0.865 \pm 0.004\) | \(0.877 \pm 0.012\) | \(0.892 \pm 0.008\) | \(0.884 \pm 0.011\) | \(\mathbf{0.934} \pm 0.005\) |
| ROC-AUC (<30 vs. >30) \(\uparrow\) | \(0.508 \pm 0.000\) | \(0.943 \pm 0.000\) | \(0.946 \pm 0.012\) | \(0.964 \pm 0.003\) | \(0.966 \pm 0.004\) | \(0.965 \pm 0.003\) | \(\mathbf{0.989} \pm 0.005\) |
| ROC-AUC (<60 vs. >60) \(\uparrow\) | \(0.506 \pm 0.000\) | \(0.952 \pm 0.000\) | \(0.928 \pm 0.007\) | \(0.950 \pm 0.005\) | \(0.955 \pm 0.004\) | \(0.952 \pm 0.003\) | \(\mathbf{0.987} \pm 0.003\) |
| ROC-AUC (<120 vs. >120) \(\uparrow\) | \(0.534 \pm 0.000\) | \(0.945 \pm 0.000\) | \(0.896 \pm 0.005\) | \(0.915 \pm 0.006\) | \(0.926 \pm 0.004\) | \(0.923 \pm 0.004\) | \(\mathbf{0.971} \pm 0.004\) |
| PR-AUC (<30 vs. >30) \(\uparrow\) | \(0.695 \pm 0.000\) | \(0.961 \pm 0.000\) | \(0.950 \pm 0.018\) | \(0.971 \pm 0.005\) | \(0.956 \pm 0.015\) | \(0.967 \pm 0.008\) | \(\mathbf{0.994} \pm 0.003\) |
| PR-AUC (<60 vs. >60) \(\uparrow\) | \(0.752 \pm 0.000\) | \(0.978 \pm 0.000\) | \(0.951 \pm 0.012\) | \(0.972 \pm 0.004\) | \(0.959 \pm 0.012\) | \(0.967 \pm 0.006\) | \(\mathbf{0.995} \pm 0.001\) |
| PR-AUC (<120 vs. >120) \(\uparrow\) | \(0.824 \pm 0.000\) | \(0.985 \pm 0.000\) | \(0.946 \pm 0.007\) | \(0.968 \pm 0.005\) | \(0.960 \pm 0.004\) | \(0.963 \pm 0.004\) | \(\mathbf{0.993} \pm 0.001\) |
| F1 (<30 vs. >30) \(\uparrow\) | \(0.781 \pm 0.000\) | \(0.918 \pm 0.000\) | \(0.937 \pm 0.013\) | \(0.946 \pm 0.002\) | \(0.955 \pm 0.001\) | \(0.949 \pm 0.003\) | \(\mathbf{0.966} \pm 0.010\) |
| F1 (<60 vs. >60) \(\uparrow\) | \(0.833 \pm 0.000\) | \(0.932 \pm 0.000\) | \(0.926 \pm 0.007\) | \(0.941 \pm 0.005\) | \(0.954 \pm 0.003\) | \(0.945 \pm 0.004\) | \(\mathbf{0.967} \pm 0.004\) |
| F1 (<120 vs. >120) \(\uparrow\) | \(0.880 \pm 0.000\) | \(0.926 \pm 0.000\) | \(0.913 \pm 0.005\) | \(0.924 \pm 0.006\) | \(0.933 \pm 0.004\) | \(0.929 \pm 0.007\) | \(\mathbf{0.955} \pm 0.003\) |
| PPV (<30 vs. >30) \(\uparrow\) | \(0.641 \pm 0.000\) | \(0.871 \pm 0.000\) | \(0.947 \pm 0.013\) | \(0.937 \pm 0.009\) | \(0.949 \pm 0.005\) | \(0.937 \pm 0.007\) | \(\mathbf{0.961} \pm 0.009\) |
| PPV (<60 vs. >60) \(\uparrow\) | \(0.713 \pm 0.000\) | \(0.903 \pm 0.000\) | \(0.925 \pm 0.017\) | \(0.925 \pm 0.018\) | \(0.938 \pm 0.007\) | \(0.922 \pm 0.010\) | \(\mathbf{0.958} \pm 0.010\) |
| PPV (<120 vs. >120) \(\uparrow\) | \(0.786 \pm 0.000\) | \(0.905 \pm 0.000\) | \(0.892 \pm 0.016\) | \(0.891 \pm 0.022\) | \(0.902 \pm 0.016\) | \(0.890 \pm 0.016\) | \(\mathbf{0.966} \pm 0.011\) |
| NPV (<30 vs. >30) \(\uparrow\) | \(0.000 \pm 0.000\) | \(0.932 \pm 0.000\) | \(0.876 \pm 0.032\) | \(0.917 \pm 0.013\) | \(0.928 \pm 0.010\) | \(0.928 \pm 0.015\) | \(\mathbf{0.949} \pm 0.025\) |
| NPV (<60 vs. >60) \(\uparrow\) | \(0.000 \pm 0.000\) | \(0.889 \pm 0.000\) | \(0.820 \pm 0.015\) | \(0.886 \pm 0.019\) | \(0.919 \pm 0.008\) | \(0.910 \pm 0.015\) | \(\mathbf{0.939} \pm 0.010\) |
| NPV (<120 vs. >120) \(\uparrow\) | \(0.000 \pm 0.000\) | \(0.769 \pm 0.000\) | \(0.713 \pm 0.027\) | \(0.794 \pm 0.020\) | \(0.836 \pm 0.032\) | \(\mathbf{0.839} \pm 0.013\) | \(0.815 \pm 0.015\) |
| ECE \(\downarrow\) | \(0.032 \pm 0.000\) | \(0.111 \pm 0.000\) | \(0.035 \pm 0.008\) | \(0.085 \pm 0.020\) | \(0.074 \pm 0.007\) | \(0.074 \pm 0.007\) | \(\mathbf{0.024} \pm 0.009\) |

  1. ROC-AUC stands for area under the receiver operating characteristic curve, PR-AUC stands for area under the precision-recall curve, ECE stands for expected calibration error. Note that in 4-way classification, random chance corresponds to an accuracy of 0.25. XGBoost and UNOS have zero standard deviation because there is no stochasticity in the training procedure.
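
For readers unfamiliar with ECE, the following is a minimal sketch of one common way to compute it: predictions are grouped into equal-width confidence bins, and the gap between mean confidence and accuracy in each bin is averaged, weighted by bin size. The function name `expected_calibration_error`, the choice of 10 bins, and the use of the top-class probability as the confidence are assumptions made for illustration; the table does not state which binning scheme was used to produce the reported values.

```python
import numpy as np


def expected_calibration_error(probs, labels, n_bins=10):
    """Illustrative ECE with equal-width confidence bins (assumed scheme).

    probs  : array of shape (n_samples, n_classes), rows sum to 1
    labels : array of shape (n_samples,), integer class indices
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)

    confidences = probs.max(axis=1)        # top-class probability
    predictions = probs.argmax(axis=1)     # predicted class
    correct = (predictions == labels).astype(float)

    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lower, upper in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lower) & (confidences <= upper)
        if in_bin.any():
            # |accuracy - mean confidence| in this bin, weighted by bin share
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece


if __name__ == "__main__":
    # Toy 4-class example (synthetic values, not data from the study).
    toy_probs = np.array([
        [0.70, 0.10, 0.10, 0.10],
        [0.10, 0.70, 0.10, 0.10],
        [0.10, 0.10, 0.70, 0.10],
        [0.70, 0.10, 0.10, 0.10],
    ])
    toy_labels = np.array([0, 1, 2, 3])  # last prediction is wrong
    print(expected_calibration_error(toy_probs, toy_labels))
```

A lower value means that predicted confidences track observed accuracy more closely, which is why the table marks ECE with a downward arrow.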