Figure 4 | Scientific Reports

Figure 4

From: Comparative performances of machine learning methods for classifying Crohn Disease patients using genome-wide genotyping data

Comparison of the best features selected by different linear and non-linear models with those associated with CD in the GWAS meta-analysis by Jostins et al. (ref. 3). Panel A shows the importance and genomic position of the best 140 (left) and 800 (right) SNPs selected by logistic regression with Lasso regularization and the weight criterion (LR weight), LightGBM with the gain criterion (LGBM gain), and a dense residual neural network with 3 hidden layers and the permutation feature importance criterion (ResDN3 PFI), together with those reported by Jostins et al. (GWAS). SNP importance is given by the criteria discussed in the main text, while for GWAS we show \(|\log(\mathrm{OR})|\). Dotted vertical lines mark the boundaries between chromosomes. Panel B shows the number of loci (as defined in the main text) shared between the GWAS analysis and each model and feature-selection criterion, as a function of the first x best selected loci. The random model was built using randomly weighted SNPs. Solid and dotted lines represent the mean values over all the subsets, while shaded regions span one standard deviation around the mean. The vertical dotted line indicates the limit of 140 for GWAS, while the diagonal shows the perfect-agreement baseline.
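The curves in Panel B can be illustrated with a minimal Python sketch. It assumes, since the caption does not say so, that the number of common loci at x is the size of the intersection between the first x loci of a model ranking and the first x loci of the GWAS ranking, and that the spread across subsets is a population standard deviation. The function names `common_loci_curve` and `mean_and_std` and the toy locus labels are hypothetical, and the definition of a locus given in the main text is not reproduced here.

```python
import statistics


def common_loci_curve(ranked_model, ranked_reference, max_k):
    """Count loci shared by the top-k of two rankings, for k = 1..max_k.

    Assumption: both inputs are lists of hashable locus identifiers,
    ordered from most to least important.
    """
    counts = []
    for k in range(1, max_k + 1):
        top_model = set(ranked_model[:k])
        # If the reference has fewer than k loci, the slice returns all of them.
        top_reference = set(ranked_reference[:k])
        counts.append(len(top_model & top_reference))
    return counts


def mean_and_std(curves):
    """Pointwise mean and population standard deviation over several curves."""
    means, stds = [], []
    for values in zip(*curves):
        means.append(statistics.mean(values))
        stds.append(statistics.pstdev(values))
    return means, stds


# Toy rankings (hypothetical labels, not data from the paper).
reference = ["b", "a", "e"]
subset_1 = ["a", "b", "c", "d"]
subset_2 = ["b", "a", "e", "f"]

curves = [common_loci_curve(s, reference, 4) for s in (subset_1, subset_2)]
means, stds = mean_and_std(curves)
```

Under these assumptions, a ranking identical to the reference gives a count equal to x at every x up to the length of the reference, which corresponds to the diagonal in the panel, and the curve can no longer grow once x exceeds the number of reference loci.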
