Figure 2: Bayesian posterior predictive simulations for assessing model fit to the key enzymes implicated in the ménage à trois. | Nature Communications

From: Plastid establishment did not require a chlamydial partner

Posterior predictive simulations (ref. 25) are a technique for assessing whether a model adequately captures key properties of the sequence alignment, which in turn affects phylogenetic inference. Here we compared the ability of the LG and CAT+GTR models to capture the site-specific biochemical constraints experienced by the genes implicated in the ‘ménage à trois’ hypothesis. In sequence alignments, these constraints are manifest in the reduced number of different amino acids observed in any one alignment column, which is usually far fewer than the theoretical maximum of 20. (a) The mean observed number of different amino acids per site in the GlgC alignment was 7.78. Data simulated under the LG model showed mean per-site diversity values (dot) much higher than that of the real data, suggesting that this model did a poor job of modelling site-specific constraints. In contrast, the range (bars) of site-specific diversities predicted under the CAT+GTR model was comparable to that of the real data (P=0.33), suggesting adequate model fit with respect to this important metric. (b–d) The results of our analyses of GlgP, GlgX and UhpC were similar, with the CAT+GTR model better able to capture site-specific constraints, although neither model produced realistic predictions for the GlgP alignment. (e) Analyses of three different GlgA alignments under the CAT+GTR model. The original data set contained a large and highly diverse outgroup, leading to a high per-site diversity and poor model fit. An outgroup consisting only of the sequences most closely related to the relevant GlgA clade reduced per-site diversity and enabled adequate model fit; Dayhoff recoding of the original alignment also improved model fit relative to the unrecoded data. In both analyses in which adequate model fit was achieved, we did not recover a specific Chlamydiae/Archaeplastida clade, as discussed in the main text. Error bars represent s.e.; P-values were calculated using the ‘ppred’ and ‘readpb_mpi’ programmes in the PhyloBayes and PhyloBayes-MPI packages, respectively.
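
The per-site diversity statistic described in this legend can be illustrated with a minimal Python sketch. It is not the PhyloBayes implementation: the function names `mean_site_diversity` and `posterior_predictive_p` are hypothetical, gaps and ambiguity codes are simply ignored, and the P-value is assumed here to be the fraction of simulated data sets whose mean diversity is at or below the observed value, which may differ from the convention used by ‘ppred’ and ‘readpb_mpi’.

```python
from typing import Sequence

# The 20 standard amino acids; any other character (gap, X, ...) is ignored.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def mean_site_diversity(alignment: Sequence[str]) -> float:
    """Mean number of distinct amino acids per alignment column."""
    if not alignment or len(alignment[0]) == 0:
        raise ValueError("alignment must contain at least one non-empty sequence")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    total = 0
    for column in zip(*alignment):
        # Count the distinct standard residues observed in this column.
        total += len({res for res in column if res in AMINO_ACIDS})
    return total / length


def posterior_predictive_p(observed: float, simulated: Sequence[float]) -> float:
    """Assumed one-sided tail: fraction of simulated values <= observed."""
    if not simulated:
        raise ValueError("at least one simulated value is required")
    return sum(1 for value in simulated if value <= observed) / len(simulated)


if __name__ == "__main__":
    # Toy alignment (illustrative only, not data from the study).
    toy = ["ACDE", "ACDF", "AGD-"]
    observed = mean_site_diversity(toy)
    # Hypothetical mean diversities from replicate data sets simulated under a model.
    replicates = [1.0, 1.5, 2.0, 2.5]
    print(observed, posterior_predictive_p(observed, replicates))
```

Under this assumed convention, a model that overestimates diversity, as LG does for GlgC here, would yield simulated values mostly above the observed one and hence a P-value close to 0.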