Fig. 2: Performance comparison of phylogenetic tree updating methods. | Nature Communications

From: PhyloTune: An efficient method to accelerate phylogenetic updates using a pretrained DNA language model

a Schematic overview of tree reconstruction strategies. The original tree is built from sequences simulated on the ground-truth tree (gt), with one sequence removed to serve as the new sequence. Updates are performed using all sequences (complete tree), full-length sequences of a target subtree (full-length tree), or high-attention regions of the subtree (high-attention region tree). Taking the addition of species 4 as an example, the parts of the three trees updated using RAxML are highlighted in blue.

b, c Robinson-Foulds (RF) distance between each updated tree and gt, and tree construction time. Each box plot (n = 5 independent experiments) shows the median, the interquartile range (IQR; 25th to 75th percentile), and whiskers extending to the minima and maxima within 1.5 times the IQR.

d Example of updating the phylogenetic tree using PhyloTune. The original tree consisted of 677 species from 20 orders of Embryophyta. The tree was built using RAxML, with organisms colored by order. The scale represents the normalized fraction of total branch length. The rugged bars on the outer circle represent the normalized length of the input DNA sequences. (i) Update with out-of-distribution (OOD) sequences: the three newly added sequences belong to the order Fabales but not to any family or genus in the original tree, so only the Fabales subtree is updated. (ii) Update with in-distribution (ID) sequences: the two newly added sequences belong to the genus Primulina, so the Primulina subtree is updated.

e Time comparison for the example tree. Blue and orange curves show subtree reconstruction times using full-length sequences and high-attention regions (one-third of the full length), respectively. The red dotted line indicates the time needed to update the tree using all sequences (about 20.1 h). Source data are provided as a Source Data file.
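The RF distance used in panel b counts the bipartitions (splits of the leaf set induced by internal branches) that appear in one tree but not the other. The following is a minimal sketch of that idea, not the authors' implementation: it assumes trees are given as nested Python tuples of leaf names, and the function names `leaf_sets`, `bipartitions`, and `rf_distance` are hypothetical, chosen only for illustration.

```python
def leaf_sets(tree, out):
    """Return the leaf set under `tree`, appending every internal node's leaf set to `out`."""
    if isinstance(tree, str):  # a leaf is represented by its name
        return frozenset([tree])
    leaves = frozenset().union(*(leaf_sets(child, out) for child in tree))
    out.append(leaves)
    return leaves


def bipartitions(tree):
    """Non-trivial bipartitions of an unrooted tree, each stored as one canonical side."""
    clades = []
    all_leaves = leaf_sets(tree, clades)
    reference = min(all_leaves)  # arbitrary but fixed leaf used to pick a canonical side
    splits = set()
    for clade in clades:
        other = all_leaves - clade
        # Skip trivial splits that separate one leaf (or nothing) from the rest.
        if len(clade) > 1 and len(other) > 1:
            splits.add(other if reference in clade else clade)
    return splits


def rf_distance(tree_a, tree_b):
    """Unnormalized RF distance: number of bipartitions found in exactly one tree."""
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


# Toy example (made-up topologies, assumed to share the same leaf set).
gt = ((("A", "B"), "C"), ("D", "E"))
updated = ((("A", "C"), "B"), ("D", "E"))
print(rf_distance(gt, gt))       # identical topologies
print(rf_distance(gt, updated))  # topologies that differ in one internal branch
```

A distance of 0 means the updated tree has the same topology as gt; larger values indicate more disagreeing branches. The sketch ignores branch lengths and assumes both trees contain exactly the same leaves.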