Fig. 1

Compression of terminal branches of sequence-based phylogenetic trees detected through analysis of gene content decay curves. a If tree distances are proportional to the true evolutionary time, the fraction of genes shared by a subset of genomes will decay with the total length of the subtree that connects them, and the decay curves will be the same regardless of the number of genomes in the subset. For illustration, three subsets of 2 genomes are highlighted in brown, and two subsets of 4 genomes are highlighted in green. b Homologous recombination between pairs of closely related genomes erases recent sequence divergence, which results in an underestimation of the evolutionary times associated with terminal tree branches. Such underestimation leads to gene content decay curves that depend on the number of genomes included in the subset. Accordingly, the decay curve of subsets of 4 genomes differs from the decay curve of subsets of 2 genomes. c The gene content decay curves of the Bacillus thuringiensis/cereus/anthracis group are compatible with a scenario of recombination-driven shortening of the terminal branches of the phylogenetic (substitution-based) tree. The curves are based on the tree from Fig. 2a. d If the recombination model is used to correct for unobserved variation (fit in Fig. 2c, left panel), overlapping decay curves are obtained. Source data are provided as a Source Data file.
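
To make the construction in panels a and b concrete, the following minimal Python sketch computes the points of a gene content decay curve for subsets of k genomes. It is an illustration under stated assumptions, not the authors' implementation: the tree is a toy rooted tree stored as parent and branch-length dictionaries, the gene sets are invented, and the shared fraction is assumed to be the number of genes present in every genome of the subset divided by the mean number of genes per genome. All function and variable names are hypothetical.

```python
from itertools import combinations


def subtree_length(parent, length, leaves):
    """Total branch length of the minimal subtree connecting `leaves`.

    `parent` maps each node to its parent (the root has no entry);
    `length` maps each non-root node to the length of the branch above it.
    """
    k = len(leaves)
    below = {}  # node -> number of selected leaves at or below that node
    for leaf in leaves:
        node = leaf
        while node is not None:
            below[node] = below.get(node, 0) + 1
            node = parent.get(node)
    # A branch belongs to the connecting subtree only if some, but not all,
    # of the selected leaves lie below it.
    return sum(length[n] for n, c in below.items() if c < k and n in length)


def shared_fraction(genes, leaves):
    """Assumed definition: genes common to all genomes / mean genome size."""
    common = set.intersection(*(genes[g] for g in leaves))
    mean_size = sum(len(genes[g]) for g in leaves) / len(leaves)
    return len(common) / mean_size


def decay_points(parent, length, genes, k):
    """(subtree length, shared fraction) for every subset of k genomes."""
    return sorted(
        (subtree_length(parent, length, s), shared_fraction(genes, s))
        for s in combinations(sorted(genes), k)
    )


# Toy data (invented for illustration): tree ((A:1, B:2)X:1, C:3)R
parent = {"A": "X", "B": "X", "X": "R", "C": "R"}
length = {"A": 1.0, "B": 2.0, "X": 1.0, "C": 3.0}
genes = {
    "A": {1, 2, 3, 4},
    "B": {1, 2, 3, 5},
    "C": {1, 2, 6, 7},
}

pairs = decay_points(parent, length, genes, 2)
triples = decay_points(parent, length, genes, 3)
```

Plotting the points obtained for different values of k on the same axes gives the comparison described in the caption: curves that overlap across subset sizes correspond to the situation in panel a, whereas curves that separate by subset size correspond to the situation in panel b.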