Figure 4: Prediction of gender from genome-wide DNA methylation patterns.

Left: Using statistical learning, the gender (red: female; black: male) of test individuals (triangles) was assigned accurately on the basis of a training set (circles). The two-dimensional projection of the DNA methylome data shows the group centroids of predictors 1 and 2, used to discriminate male from female samples, on the x- and y-axes. Right: To assess the robustness and stability of the gender classification, the statistical learning algorithm (PLR) was run repeatedly (5-fold cross-validation, 50 iterations). Genomic regions whose methylation signatures were selected for modelling are shown on a graphical representation of the genome (the 19 chromosomes of the poplar genome; all scaffolds were concatenated into a single unit). The color shade indicates how often a feature was included in the PLR model for gender prediction (F4: > 80%; F3: ≤ 80%; F2: ≤ 20%; F1: ≤ 10% of all meta-features). The darker the marker, the greater its utility for prediction. The only marker shown in black marks the genomic region coinciding with the PbRR9 gene. This plot shows the results for CG methylation; the CHG and CHH contexts give nearly identical results.
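The shading in the right panel rests on a simple tally: how often each genomic region was selected across the repeated PLR fits. The following minimal sketch illustrates that tally under stated assumptions. It assumes that the frequency is the fraction of all model fits in which a region was selected (for example, 50 iterations of 5-fold cross-validation giving 250 fits), and it reads the caption's bins as F4 > 0.8, F3 in (0.2, 0.8], F2 in (0.1, 0.2] and F1 ≤ 0.1. The function names and region labels are hypothetical, and the fitting step itself is not shown.

```python
from collections import Counter


def selection_frequencies(selected_per_fit: list[set[str]]) -> dict[str, float]:
    """Return the fraction of model fits in which each feature was selected.

    Assumption: each set holds the features retained by one fitted model.
    """
    n_fits = len(selected_per_fit)
    counts = Counter(feature for fit in selected_per_fit for feature in fit)
    return {feature: count / n_fits for feature, count in counts.items()}


def frequency_class(freq: float) -> str:
    """Map a selection frequency (0..1) to a hypothetical F1-F4 shading class.

    Assumed reading of the caption's bins:
    F4 > 0.8, F3 in (0.2, 0.8], F2 in (0.1, 0.2], F1 <= 0.1.
    """
    if freq > 0.8:
        return "F4"
    if freq > 0.2:
        return "F3"
    if freq > 0.1:
        return "F2"
    return "F1"


# Toy data: 250 fits (5 folds x 50 iterations); region names are hypothetical.
selected = []
for i in range(250):
    fit = {"region_A"}          # selected in every fit
    if i < 100:
        fit.add("region_B")     # selected in 40% of fits
    if i < 30:
        fit.add("region_C")     # selected in 12% of fits
    if i < 10:
        fit.add("region_D")     # selected in 4% of fits
    selected.append(fit)

freqs = selection_frequencies(selected)
classes = {region: frequency_class(f) for region, f in freqs.items()}
print(classes)
```

In this sketch a region that the penalized model keeps in nearly every fit, such as `region_A`, falls into the darkest class. This mirrors how a consistently selected region would stand out in the figure.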