Extended Data Fig. 3: CherryML matches EM accuracy on diverse datasets. | Nature Methods

From: CherryML: scalable maximum likelihood estimation of phylogenetic models

On diverse datasets from the QMaker paper (ref. 9), CherryML matches the accuracy of the EM method. The end-to-end runtime of each approach (including tree estimation) is shown. The runtime of the CherryML optimizer was negligible in all cases (less than 5 minutes), so end-to-end runtime was dominated by phylogeny reconstruction with FastTree, which took a few CPU hours depending on the dataset. In contrast, for the EM approach, the EM optimizer dominated runtime, leading to an overall 5- to 20-fold slowdown in end-to-end runtime compared with the CherryML approach. Since tree estimation is embarrassingly parallel, end-to-end estimation with the CherryML method using 32 CPU cores takes only a few minutes on all of these datasets. The diversity of the datasets means that LG is no longer the best-fitting rate matrix when compared with JTT and WAG. In fact, JTT is preferred in three of these datasets. This highlights the need to estimate new rate matrices for improved phylogenetic inference in specific applications (ref. 9). Training dataset sizes are included for reference.
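The runtime argument in the caption can be made concrete with a back-of-the-envelope calculation. The sketch below is only an illustration: the function name `end_to_end_minutes` and the input values are hypothetical, not measurements from the figure. It assumes that tree estimation scales perfectly across cores and that the optimizer step runs serially; the caption does not state either.

```python
def end_to_end_minutes(tree_cpu_hours: float,
                       optimizer_minutes: float,
                       n_cores: int) -> float:
    """Estimate wall-clock minutes for tree estimation plus optimization.

    Assumptions (not taken from the source): tree estimation divides
    evenly across cores, and the optimizer runs on a single core.
    """
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    # Convert CPU hours to minutes, then spread the work across cores.
    tree_minutes = tree_cpu_hours * 60.0 / n_cores
    return tree_minutes + optimizer_minutes


# Illustrative inputs only: 2 CPU hours of tree estimation and a
# 2-minute optimizer run, on 1 core versus 32 cores.
serial = end_to_end_minutes(2.0, 2.0, 1)     # 122.0 minutes
parallel = end_to_end_minutes(2.0, 2.0, 32)  # 5.75 minutes
print(f"1 core: {serial:.2f} min, 32 cores: {parallel:.2f} min")
```

Under these assumed inputs, the tree-estimation term shrinks from two hours to under four minutes, which is consistent with the caption's point that a few CPU hours of FastTree work become a few minutes on 32 cores.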
