Fig. 5: Workflow of data processing and model building.
From: DNA methylation-based epigenetic signatures predict somatic genomic alterations in gliomas

A Data processing procedure for binary genetic alteration prediction in infiltrating gliomas. Step-by-step details are given in Supplementary Methods section 1.1.1.

B Data processing procedure for gene expression subtype prediction using GBM samples. Step-by-step details are given in “Methods”.

C Model building procedure for binary genetic alteration prediction. Samples were randomly split into three sets: a training set, a development set, and a test set. The training set was used for variable selection and to build candidate models, which were then applied to the development set. The final model was selected based on prediction accuracy in the development set and was then applied to the test set to evaluate model performance.

D Model building procedure for gene expression subtype prediction. GBM samples with both DNA methylation and gene expression data available were included, and DNA methylation probes were restricted to those shared by the HM27K and HM450K platforms. Samples were split into training, development, and test sets. Machine learning algorithms were evaluated on the training set to select candidate algorithms, and the development set was used to choose the final algorithm. The final model was built using the training and development sets combined and was validated in the test set.
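The train/development/test workflow in panels C and D can be illustrated with a minimal Python sketch. This is not the authors' code: the split fractions, the use of plain accuracy as the selection metric, and the names `three_way_split`, `accuracy`, and `select_and_evaluate` are hypothetical. Each candidate is assumed to be a `fit` function that takes training data and returns a callable model; any variable selection is assumed to happen inside `fit`, using only the data passed to it.

```python
import random
from typing import Any, Callable, Dict, List, Sequence, Tuple

Example = Tuple[Any, Any]          # (features, label)
Model = Callable[[Any], Any]       # features -> predicted label
Fitter = Callable[[List[Example]], Model]


def three_way_split(samples: Sequence[Example],
                    fractions: Tuple[float, float, float] = (0.6, 0.2, 0.2),
                    seed: int = 0):
    """Randomly split samples into training, development, and test sets.

    The default fractions are hypothetical; the caption does not give ratios.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * fractions[0])
    n_dev = round(len(shuffled) * fractions[1])
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]   # remainder goes to the test set
    return train, dev, test


def accuracy(model: Model, data: Sequence[Example]) -> float:
    """Fraction of (features, label) pairs the model predicts correctly."""
    return sum(1 for x, y in data if model(x) == y) / len(data)


def select_and_evaluate(candidates: Dict[str, Fitter],
                        train: List[Example],
                        dev: List[Example],
                        test: List[Example],
                        refit_on_train_dev: bool = False):
    """Pick a candidate by development-set accuracy, then score it on the test set.

    refit_on_train_dev=False mirrors panel C (final model built on training data);
    refit_on_train_dev=True mirrors panel D (final model rebuilt on training +
    development data before testing).
    """
    # Fit every candidate on the training set only and score it on the dev set.
    dev_scores = {name: accuracy(fit(train), dev) for name, fit in candidates.items()}
    # The highest dev accuracy wins; ties go to the first candidate listed.
    best = max(dev_scores, key=dev_scores.get)
    final_data = train + dev if refit_on_train_dev else train
    final_model = candidates[best](final_data)
    # The test set is used once, only to report the chosen model's performance.
    return best, dev_scores, accuracy(final_model, test)
```

Keeping the test set out of every fitting and selection step is the point of the three-way split: development-set accuracy drives the choice of model, so only the untouched test set gives an unbiased estimate of how the chosen model performs.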