Fig. 1: Overview of the IMPACT model and analyses.
From: Interpretable machine learning prediction of all-cause mortality

a We use the NHANES (1999-2014) dataset, which includes 151 variables and 47,261 samples. The variables fall into four groups: demographics, examination, laboratory and questionnaire. We train models for different follow-up times and different age groups. b IMPACT combines tree-based models with an explainable AI method. Specifically, IMPACT (1) trains tree-based models for mortality prediction using the NHANES dataset, and (2) uses TreeExplainer to provide local explanations for these models. c We illustrate the advantages of interpretable tree-based models over traditional linear models in epidemiological studies. d We further analyze all mortality models and demonstrate the effectiveness of IMPACT at verifying existing findings, identifying new discoveries, verifying reference intervals, obtaining individualized explanations, and comparing models across follow-up times and age groups. e We propose a supervised distance to explore feature redundancy. We further develop a supervised distance-based feature selection method that selects predictive and less-redundant features. f We build mortality risk scores that are applicable to professional and non-professional individuals with different cost-vs-accuracy tradeoffs. The individualized explanations of IMPACT show the impact of each risk factor on the overall risk score.