Table 1 (A) The estimated completeness of the 10 Accumulibacter genomes in this study. (B) The expected probability of observing a given pattern of presence and absence across the set of 10 Accumulibacter genomes

From: Ancestral genome reconstruction identifies the evolutionary basis for trait acquisition in polyphosphate accumulating bacteria

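(A)

| Genome | AW09 | AW06 | CAPSK01 | AW08 | AW07 | AW12 | CAP2UW1 | CAP1UW1 | AW11 | AW10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Completeness | 0.92 | 0.92 | 0.87 | 0.91 | 0.89 | 0.88 | 1 | 0.85 | 0.89 | 0.88 |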

(B)

| Pattern | Calculation | Expected probability | Sum |
| --- | --- | --- | --- |
| PPPPPPPPPP | 0.92 × 0.92 × 0.87 × 0.91 × 0.89 × 0.88 × 1 × 0.85 × 0.89 × 0.88 | 0.3494 | 0.349 |
| APPPPPPPPP | **0.08** × 0.92 × 0.87 × 0.91 × 0.89 × 0.88 × 1 × 0.85 × 0.89 × 0.88 | 0.0304 | |
| PAPPPPPPPP | 0.92 × **0.08** × 0.87 × 0.91 × 0.89 × 0.88 × 1 × 0.85 × 0.89 × 0.88 | 0.0304 | |
| PPAPPPPPPP | 0.92 × 0.92 × **0.13** × 0.91 × 0.89 × 0.88 × 1 × 0.85 × 0.89 × 0.88 | 0.0522 | |
| PPPAPPPPPP | 0.92 × 0.92 × 0.87 × **0.09** × 0.89 × 0.88 × 1 × 0.85 × 0.89 × 0.88 | 0.0346 | |
| PPPPAPPPPP | 0.92 × 0.92 × 0.87 × 0.91 × **0.11** × 0.88 × 1 × 0.85 × 0.89 × 0.88 | 0.0432 | 0.391 (sum of the 10 single-absence patterns) |
| PPPPPAPPPP | 0.92 × 0.92 × 0.87 × 0.91 × 0.89 × **0.12** × 1 × 0.85 × 0.89 × 0.88 | 0.0476 | |
| PPPPPPAPPP | 0.92 × 0.92 × 0.87 × 0.91 × 0.89 × 0.88 × **0** × 0.85 × 0.89 × 0.88 | 0.0000 | |
| PPPPPPPAPP | 0.92 × 0.92 × 0.87 × 0.91 × 0.89 × 0.88 × 1 × **0.15** × 0.89 × 0.88 | 0.0617 | |
| PPPPPPPPAP | 0.92 × 0.92 × 0.87 × 0.91 × 0.89 × 0.88 × 1 × 0.85 × **0.11** × 0.88 | 0.0432 | |
| PPPPPPPPPA | 0.92 × 0.92 × 0.87 × 0.91 × 0.89 × 0.88 × 1 × 0.85 × 0.89 × **0.12** | 0.0476 | |

  1. Given the completeness estimates, it is possible to calculate the expected probability of observing a given pattern of presence and absence across the set of 10 Accumulibacter genomes. As an example, we present 11 patterns of presence and absence and show how the probability of each was calculated. The first pattern represents a gene family that is present in all genomes. The 10 patterns below it represent every possibility for a single absence. In the pattern, presence is indicated by a 'P' and absence by an 'A'; in the calculation, the factor for the absent genome (1 minus its completeness estimate) is shown in bold. For each pattern, the product of the completeness estimates was calculated over the genomes in which the gene family was present. This was then multiplied by the product of 1 minus the completeness estimate over the genomes in which the gene family was absent. The probabilities of all patterns with the same number of genomes containing the gene family may then be summed (the Sum column). Because each genome has two possible states (presence or absence), there are 2^10 (1024) possible patterns.
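
The calculation described in the footnote can be illustrated with a short Python sketch. This is not the authors' code: the names `COMPLETENESS`, `pattern_probability`, and `sum_by_presence_count` are hypothetical, and the completeness values are simply copied from part (A) in the table's genome order.

```python
from itertools import product
from math import prod

# Completeness estimates from part (A), in the table's genome order.
COMPLETENESS = {
    "AW09": 0.92, "AW06": 0.92, "CAPSK01": 0.87, "AW08": 0.91, "AW07": 0.89,
    "AW12": 0.88, "CAP2UW1": 1.0, "CAP1UW1": 0.85, "AW11": 0.89, "AW10": 0.88,
}


def pattern_probability(pattern, completeness=COMPLETENESS):
    """Expected probability of one presence/absence pattern.

    `pattern` is a string of 'P' (present) and 'A' (absent), one character
    per genome. Each 'P' contributes the genome's completeness c, and each
    'A' contributes 1 - c, as described in the footnote.
    """
    values = list(completeness.values())
    if len(pattern) != len(values):
        raise ValueError("pattern length must match the number of genomes")
    if set(pattern) - {"P", "A"}:
        raise ValueError("pattern may contain only 'P' and 'A'")
    return prod(c if state == "P" else 1.0 - c
                for state, c in zip(pattern, values))


def sum_by_presence_count(completeness=COMPLETENESS):
    """Sum pattern probabilities grouped by the number of 'P' genomes.

    Enumerates all 2^n patterns (1024 for n = 10). totals[k] is the summed
    probability of every pattern with exactly k genomes marked present.
    """
    n = len(completeness)
    totals = [0.0] * (n + 1)
    for states in product("PA", repeat=n):
        pattern = "".join(states)
        totals[pattern.count("P")] += pattern_probability(pattern, completeness)
    return totals


if __name__ == "__main__":
    print(f"{pattern_probability('PPPPPPPPPP'):.4f}")  # all-present row of part (B)
    totals = sum_by_presence_count()
    print(f"{totals[10]:.3f} {totals[9]:.3f}")  # Sum column: 10 and 9 genomes present
```

Under these assumptions the script should reproduce the rounded values in part (B): 0.3494 for the all-present pattern, and sums of 0.349 and 0.391 for patterns with ten and nine genomes present. The totals over all 1024 patterns add up to 1, which is a quick sanity check on the enumeration.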