Table 1 Features used in the machine-learning framework.

From: Harnessing machine learning to guide phylogenetic-tree search algorithms

| Feature # | Feature name | Details | Represented action | Tree considered |
|---|---|---|---|---|
| 1 | Total branch lengths | The sum of branch lengths in the starting tree | Shared for pruning and regrafting | Initial tree (a in Fig. 1) |
| 2 | Longest branch | The length of the longest branch in the starting tree | Shared for pruning and regrafting | Initial tree (a in Fig. 1) |
| 3–4 | Branch length | The length of the branch that was being pruned or regrafted | Both pruning and regrafting | Initial tree (a in Fig. 1) |
| 5 | Topology distance from the pruned node | The number of branches in the path between the regrafting and the pruning branches, not including these branches | Regrafting only | Initial tree (a in Fig. 1) |
| 6 | Branch length distance from the pruned node | The sum of branches in the path between the regrafting and the pruning branches, not including these branches | Regrafting only | Initial tree (a in Fig. 1) |
| 7 | New branch length | The approximated length of the new branch formed due to pruning (see Supplementary Note 1 for feature extraction details) | Regrafting only | Initial tree (a in Fig. 1) |
| 8–11 | Number of species | The number of leaves in the four subtrees | Both pruning and regrafting | Each of the four subtrees (b, c, c1, c2 in Fig. 1) |
| 12–15 | Total branch lengths | The sum of branch lengths in the four subtrees | Both pruning and regrafting | Each of the four subtrees (b, c, c1, c2 in Fig. 1) |
| 16–19 | Longest branch | The length of the longest branch in the four subtrees | Both pruning and regrafting | Each of the four subtrees (b, c, c1, c2 in Fig. 1) |

The table lists the 19 features on which the machine-learning algorithm is based, extracted for each data point. Features 1–7 are extracted from the starting tree, while the remaining features are extracted from the four subtrees in Fig. 1. Features 1 and 2 are not affected by SPR moves.
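
To make the definitions of features 1, 2, 5 and 6 concrete, the following is a minimal Python sketch that computes them on a small unrooted tree. It is only an illustration under stated assumptions, not the authors' implementation: the toy tree, its node names and branch lengths, and all function names (`total_branch_length`, `longest_branch`, `distance_between_branches`) are hypothetical. The tree is stored as a list of `(node, node, branch_length)` triples, and a branch is identified by its pair of end nodes.

```python
from collections import deque

# Hypothetical toy tree (not from the paper): 5 leaves A-E, internal nodes n1-n3.
EDGES = [
    ("A", "n1", 0.10),
    ("B", "n1", 0.20),
    ("n1", "n2", 0.05),
    ("C", "n2", 0.30),
    ("n2", "n3", 0.15),
    ("D", "n3", 0.25),
    ("E", "n3", 0.40),
]


def build_adjacency(edges):
    """Map each node to a dict of {neighbour: branch length}."""
    adj = {}
    for u, v, length in edges:
        adj.setdefault(u, {})[v] = length
        adj.setdefault(v, {})[u] = length
    return adj


def total_branch_length(edges):
    """Feature 1: sum of all branch lengths in the starting tree."""
    return sum(length for _, _, length in edges)


def longest_branch(edges):
    """Feature 2: length of the longest branch in the starting tree."""
    return max(length for _, _, length in edges)


def node_path(adj, start, goal):
    """Return the unique node path from start to goal in a tree (via BFS)."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for nxt in adj[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    path = [goal]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]


def distance_between_branches(edges, prune_branch, regraft_branch):
    """Features 5 and 6: number and summed length of the branches on the path
    between the two branches, excluding the two branches themselves.

    The shortest of the four endpoint-to-endpoint paths is the one that
    traverses neither the pruning nor the regrafting branch.
    """
    adj = build_adjacency(edges)
    best = None
    for p in prune_branch:
        for r in regraft_branch:
            path = node_path(adj, p, r)
            if best is None or len(path) < len(best):
                best = path
    steps = list(zip(best, best[1:]))
    return len(steps), sum(adj[u][v] for u, v in steps)


if __name__ == "__main__":
    print("Feature 1:", total_branch_length(EDGES))
    print("Feature 2:", longest_branch(EDGES))
    n_branches, length_sum = distance_between_branches(
        EDGES, prune_branch=("A", "n1"), regraft_branch=("D", "n3")
    )
    print("Feature 5:", n_branches)
    print("Feature 6:", length_sum)
```

In this toy example the pruning branch is `A`–`n1` and the regrafting branch is `D`–`n3`, so the path between them consists of the two internal branches `n1`–`n2` and `n2`–`n3`. The subtree features (8–19) would apply the same sums, maxima and leaf counts to each of the four subtrees rather than to the whole tree.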