Fig. 1: Data curation and the algorithm pipeline. | Nature Communications

From: Identification of predictive subphenotypes for clinical outcomes using real world data and machine learning

a Construction of the full cohort and development and validation sub-cohorts. b Data preprocessing by extracting the feature vectors of patients from the EHR data. c Model deployment and derivation of subphenotypes in the development cohort. Our proposed model extracted efficient patient representations, clustered the patients into subphenotypes, and predicted survival distributions. d Model evaluation and reproduction of the subphenotypes on the validation cohort. Further analyses were conducted to interpret subphenotypes in both development and validation cohorts. ICI immune checkpoint inhibitor, NSCLC non-small cell lung cancer, GNN graph neural network. The microscope icon was made by iconnut, the health chart by Awicon, the medicine, syringe, and standing-up man by Freepik, and the neural network by pojok d, all obtained from Flaticon (www.flaticon.com) under appropriate licenses.
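Panel c describes a three-step flow: represent patients, cluster them into subphenotypes, and summarize survival. The caption does not say how the clustering or survival estimation is done. The sketch below is therefore only a generic stand-in and not the authors' GNN-based model. It assumes that patient representations are already available as numeric vectors. It uses plain k-means (Lloyd's algorithm) to form the subphenotypes and a per-cluster Kaplan-Meier estimate as the survival summary. The function names `kmeans`, `kaplan_meier`, and `subphenotype_survival` are hypothetical.

```python
import random
from collections import defaultdict


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means (a stand-in clustering step, not the paper's model).

    Returns one integer cluster label per input point.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (event time, S(t)) pairs.

    events[i] is 1 if patient i had the outcome at times[i], 0 if censored.
    """
    survival = 1.0
    curve = []
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


def subphenotype_survival(embeddings, times, events, k):
    """Hypothetical pipeline: cluster representations, then estimate survival per cluster."""
    labels = kmeans(embeddings, k)
    groups = defaultdict(list)
    for i, lab in enumerate(labels):
        groups[lab].append(i)
    curves = {
        lab: kaplan_meier([times[i] for i in idx], [events[i] for i in idx])
        for lab, idx in groups.items()
    }
    return labels, curves
```

In the toy call `subphenotype_survival([[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]], [2, 4, 6, 10, 12, 14], [1, 1, 1, 1, 0, 1], k=2)`, the two well-separated groups of vectors become two subphenotypes, and each receives its own survival curve. Censored patients (event 0) remain in the risk set until their censoring time. A learned model such as the one in the figure would replace the hand-supplied vectors and the k-means step.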