Extended Data Fig. 5: Subtype deep branching and comparison of FBP and TBE using medium-sized HIV datasets.
From: Renewing Felsenstein’s phylogenetic bootstrap in the era of big data

As the taxa were randomly drawn from the full dataset, the supports and findings show some fluctuations. a, b, Trees obtained with two of the medium-sized datasets; branches with FBP > 70%: yellow dots; branches with TBE > 70%: blue dots; subtype clades: red stars, filled if support > 70% (see Methods and Fig. 1 legend for further details). c, Deep branching of the subtypes (ref. 19) and supports obtained on the full dataset (see also Fig. 1). Rare subtypes (H, J and K) are absent from the medium-sized datasets, and the subtype clades are almost perfectly recovered (only one incorrect taxon in the A clade for both trees). FBP supports are higher with the medium-sized datasets than with the full dataset (for example, 58% and 99% for subtype B in the two trees, versus 3% in Fig. 1). However, some subtype clades (for example, D) have only moderate FBP support, even though the clade matches the subtype perfectly. With TBE, all subtype supports are higher than 95%. The deep branching is the same for the full and all medium-sized datasets, and is identical to that found in a previous study (ref. 19); it is not supported by FBP, whereas TBE is larger than 70% for every branch (or path in Fig. 1). Again, the Indian and East African sub-epidemics of subtype C are supported by TBE, but not by FBP.
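The gap between FBP and TBE described in this legend can be illustrated with a toy calculation. The sketch below is not the authors' implementation. It assumes the usual definitions: FBP is the fraction of bootstrap trees that contain the reference branch exactly, and TBE is 1 minus the mean transfer index divided by p - 1, where p is the size of the smaller side of the reference bipartition. The function names, the set-based representation of a bipartition (one side only) and the eight-taxon example are all hypothetical.

```python
from typing import FrozenSet, List, Tuple

Bipartition = FrozenSet[str]  # one side of a branch; the other side is implied


def transfer_distance(b: Bipartition, b_star: Bipartition, n_taxa: int) -> int:
    """Minimum number of taxa to move so that b and b_star become identical."""
    d = len(b ^ b_star)          # mismatches for one orientation of b_star
    return min(d, n_taxa - d)    # the complementary orientation may be closer


def branch_supports(
    b: Bipartition,
    bootstrap_trees: List[List[Bipartition]],
    n_taxa: int,
) -> Tuple[float, float]:
    """Return (FBP, TBE) for reference branch b, assuming p >= 2."""
    p = min(len(b), n_taxa - len(b))
    # Transfer index: distance to the closest branch of each bootstrap tree,
    # capped at p - 1 (the value reached against a single-leaf branch).
    indices = [
        min(p - 1, min(transfer_distance(b, bs, n_taxa) for bs in tree))
        for tree in bootstrap_trees
    ]
    fbp = sum(1 for i in indices if i == 0) / len(indices)
    tbe = 1 - (sum(indices) / len(indices)) / (p - 1)
    return fbp, tbe


# Hypothetical example: 8 taxa, reference clade {a, b, c, d}.
clade = frozenset("abcd")
trees = [
    [frozenset("abcd"), frozenset("ab")],   # exact match
    [frozenset("abc"), frozenset("ef")],    # one taxon missing
    [frozenset("abcde"), frozenset("gh")],  # one extra taxon
    [frozenset("abcd"), frozenset("cd")],   # exact match
]
fbp, tbe = branch_supports(clade, trees, n_taxa=8)
print(f"FBP = {fbp:.2f}, TBE = {tbe:.2f}")
```

In this toy case the clade is recovered exactly in only half of the replicates, so FBP is moderate, while the two near misses differ by a single taxon and TBE stays high. This is the same qualitative pattern the legend reports for clades such as subtype D.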