Figure 4 | Scientific Reports

Figure 4

From: Similar genomic patterns of clinical infective endocarditis and oral isolates of Streptococcus sanguinis and Streptococcus gordonii

Machine learning using Random Forest modelling based on the non-redundant count of individual protein domains in each genome. (a) ROC curves for the species model using LOU CV (red) and for the clinical vs. oral model using LOU CV (blue) and 5-fold CV (purple). (b) Histogram of the prediction probabilities for the LOU CV of the species and clinical vs. oral models. (c) Boxplots of the AUC determined from 100 runs using LOU (blue) and 5-fold CV (purple) on the clinical vs. oral model. The boxplots also show the values when random labelling is applied. (d) Boxplots of the MCC determined from 100 runs using LOU (blue) and 5-fold CV (purple) on the clinical vs. oral model. The boxplots also show the values when random labelling is applied. Boxplots show the distribution of the data by illustrating the minimum and maximum values, as well as the first and third quartiles (the box), with the median highlighted in white. Outliers are shown as circles outside the plot.
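As a reading aid for panels (c) and (d), the following is a minimal sketch of how the two summary metrics, AUC (area under the ROC curve) and MCC (Matthews correlation coefficient), can be computed from a set of prediction probabilities. It is not the authors' pipeline: the function names, the 0.5 decision threshold, and the example labels and scores are all assumptions made for illustration. The random-labelling baseline here simply shuffles the labels against fixed scores, a simplification; the caption does not say how random labelling was applied in the study.

```python
import math
import random


def auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores
    (equivalent to the Mann-Whitney formulation); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mcc(labels, scores, threshold=0.5):
    """Matthews correlation coefficient after thresholding the scores.
    The 0.5 threshold is an assumption, not taken from the source."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def random_label_baseline(labels, scores, metric, runs=100, seed=0):
    """Recompute a metric after shuffling the labels, once per run.
    Simplified stand-in for a random-labelling control."""
    rng = random.Random(seed)
    values = []
    for _ in range(runs):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        values.append(metric(shuffled, scores))
    return values


# Made-up example: 1 = clinical isolate, 0 = oral isolate (hypothetical data).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

observed_auc = auc(labels, scores)   # 8/9, about 0.889
observed_mcc = mcc(labels, scores)   # 1/3, about 0.333
baseline_auc = random_label_baseline(labels, scores, auc)
```

Comparing `observed_auc` with the spread of `baseline_auc` mirrors the comparison the boxplots make between real and randomly labelled data: a model carrying real signal should sit clearly above the shuffled-label distribution, which centres near 0.5 for AUC and near 0 for MCC.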
