Fig. 2: Comparative performance of prediction and survival models across data categories, disease types, and prevalence levels.
From: UKB-MDRMF: a multi-disease risk and multimorbidity framework based on UK biobank data

Model performance in disease prediction (a–c) and risk assessment (d–f) on the test set. a Performance of disease prediction models across different data categories. Prediction starts from basic information alone and progressively adds further data categories. Seven machine learning and deep learning methods are compared. b Box plots of test-set performance grouped by the number of positive patients in the training set (horizontal axis). c Disease prediction performance of the FCNN model using six data categories. Individual FCNNs were trained for each disease type and compared with FCNNs trained collectively for all Phecodes. The values above each box plot are p values from two-sided Wilcoxon tests within each disease type; no multiple-comparison correction was applied. d Performance of risk assessment (survival) models. The test-set C-index is compared across four models for different input data categories. e Test-set performance grouped by the number of positive patients in the training set (horizontal axis). f Risk assessment performance of the DeepSurv model across 21 disease types. As in c, the values above each box plot are p values from two-sided Wilcoxon tests within each disease type; no multiple-comparison correction was applied. Box plots show the median (central line) and interquartile range (box); whiskers extend to the minimum and maximum values excluding outliers, which are defined as points more than 1.5× the interquartile range below the first quartile or above the third quartile.
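
The box plot convention described in this legend can be sketched in Python as follows. This is a minimal illustration, not the authors' plotting code: the function name `box_plot_stats` is hypothetical, and the quartile method (linear interpolation via `method="inclusive"`) is an assumption, since the legend does not state how quartiles were computed.

```python
import statistics


def box_plot_stats(values):
    """Summarise values using the 1.5x IQR outlier rule (hypothetical helper)."""
    data = sorted(values)
    # Assumption: quartiles by linear interpolation between data points.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Points outside the fences are outliers; whiskers span the rest.
    inliers = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inliers),
        "whisker_high": max(inliers),
        "outliers": outliers,
    }


stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
print(stats)
```

In this toy input the value 100 lies above the upper fence (7 + 1.5 × 4 = 13), so it is reported as an outlier and the upper whisker stops at 8.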