Fig. 2: Model training and cross validation. | Nature Communications

From: Improved maximum growth rate prediction from microbial genomes by integrating phylogenetic information

a The phylogenetic distance between the training and test datasets is defined as the minimum average phylogenetic distance (\({D}_{p}\)) between species in the test set and those in the training set. Cutting the tree at time points \({D}_{c}\) closer to the present yields more clades, and this distance decreases accordingly. b \({D}_{c}\) denotes the time point at which the phylogenetic tree is cut into \(n\) clades. For cross-validation, we iteratively use each clade as the test dataset and treat the remaining clades as the training dataset. c, d Blocked cross-validation illustrated on phylogenetic trees cut at two different time points, yielding 10 and 50 clades, respectively.
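The blocked cross-validation scheme in panels a–d can be sketched in a few lines of Python. This is a minimal, hypothetical illustration rather than the authors' pipeline: the toy tree, its node ages, the species names A–F and all helper functions are invented for the example. The tree is assumed to be ultrametric, so the distance between two species is twice the age of their most recent common ancestor. \({D}_{p}\) is read here as the mean, over test species, of the distance to the closest training species; that reading of "minimum average" is an assumption.

```python
from itertools import combinations
from statistics import mean

# Hypothetical toy ultrametric tree: a leaf is a species name (str); an
# internal node is (age, [children]), age = time before present of the split.
TREE = (4.0, [
    (2.0, [(0.5, ["A", "B"]), "C"]),
    (3.0, ["D", (1.0, ["E", "F"])]),
])


def leaves(node):
    """Return all species names below a node."""
    if isinstance(node, str):
        return [node]
    _, children = node
    return [leaf for child in children for leaf in leaves(child)]


def cut_tree(node, d_c):
    """Cut the tree at time d_c before present and return the clades.

    A node younger than d_c lies entirely below the cut, so all of its
    descendants form one clade; older nodes are split into their children.
    """
    if isinstance(node, str) or node[0] < d_c:
        return [leaves(node)]
    return [clade for child in node[1] for clade in cut_tree(child, d_c)]


def pairwise_distances(node, dist=None):
    """Patristic distances, assuming an ultrametric tree: two species are
    separated by twice the age of their most recent common ancestor."""
    if dist is None:
        dist = {}
    if isinstance(node, str):
        return dist
    age, children = node
    groups = [leaves(child) for child in children]
    for g1, g2 in combinations(groups, 2):
        for x in g1:
            for y in g2:
                dist[x, y] = dist[y, x] = 2 * age
    for child in children:
        pairwise_distances(child, dist)
    return dist


def d_p(test, train, dist):
    """Assumed reading of D_p: mean over test species of the distance
    to the closest species in the training set."""
    return mean(min(dist[t, s] for s in train) for t in test)


def leave_one_clade_out(clades):
    """Blocked cross-validation: each clade is the test set once and the
    remaining clades together form the training set."""
    for i, test in enumerate(clades):
        train = [sp for j, clade in enumerate(clades) if j != i for sp in clade]
        yield train, test


dist = pairwise_distances(TREE)
for d_c in (3.5, 1.5):
    clades = cut_tree(TREE, d_c)
    scores = [d_p(test, train, dist) for train, test in leave_one_clade_out(clades)]
    print(d_c, len(clades), mean(scores))
# prints "3.5 2 8.0", then "1.5 4 5.0"
```

In this toy case, cutting closer to the present (smaller \({D}_{c}\)) produces more clades and a smaller average \({D}_{p}\) across folds, which mirrors the trend described in panel a. A real analysis would load the species tree from a file and, for non-ultrametric trees, sum branch lengths along the path between species instead of doubling node ages.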