Extended Data Fig. 7: Information on LECA’s reconstructed gene order.
From: EdgeHOG: a method for fine-grained ancestral gene order inference at large scale

a. Guide species tree for LECA reconstruction. This corresponds to the species tree of the OMA database version Nov2022, essentially a pruned version of the NCBI taxonomy tree, containing only the genomes present in OMA.

b. The LECA node is a polytomy with 9 child nodes and has 2 outgroups (Archaea and Bacteria).

c. Impact of gene duplication on biological process GO term enrichment of LECA’s contigs. The distribution on the left shows the number of modeled ancestral in-paralogs in contigs with and without biological process GO term enrichment. Contigs without enrichment are indicated in red (n = 816), while those with enrichment are shown in blue (n = 193). There is no significant difference in the number of modeled ancestral in-paralogs between enriched and non-enriched contigs (Mann-Whitney test, alternative = ‘greater’, p-value = 0.99). The distribution on the right displays the average number of descendant in-paralogs in extant species for each HOG/ancestral gene within a contig (with a zoomed-in view of the region with the highest density of points), grouped by whether the ancestral gene’s GO terms contribute to the enrichment of the contig’s GO terms (n = 349) or not (n = 1548). HOGs associated with GO term enrichment tend to exhibit a slightly higher number of descendant in-paralogs (Mann-Whitney test, alternative = ‘greater’, p-value = 2.2e-16; median = 1.22 for enriched HOGs, median = 1.13 for others).

d. Relationship between the degree (number of neighbors) of HOGs in LECA’s contigs and their Completeness Score. The plot gives the distribution of Completeness Scores for HOGs at the LECA level in the current OMA release that are internal to reconstructed contigs (degree = 2; n = 1139), that are terminal genes in contigs (degree = 1; n = 1848), or that are singletons and thus excluded from the ancestral genome (degree = 0; n = 37773). The vertical line in each ridge plot gives the median of the distribution for each degree level. The distributions show that the most reliable HOGs are located inside contigs and that singletons typically correspond to low-quality HOGs.
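The degree grouping used in panel d can be illustrated with a minimal sketch. It assumes that contigs are linear chains of adjacent HOGs, so each HOG has at most two neighbors. The function names, HOG identifiers, and Completeness Score values below are hypothetical toy inputs chosen for illustration; they are not EdgeHOG's API or data from the figure.

```python
from collections import defaultdict
from statistics import median


def hog_degrees(hogs, adjacencies):
    """Return the number of distinct neighbors of each HOG.

    `adjacencies` is an iterable of (hog_a, hog_b) pairs, one per
    reconstructed ancestral adjacency. HOGs absent from every pair
    get degree 0 (singletons).
    """
    neighbors = defaultdict(set)
    for a, b in adjacencies:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {h: len(neighbors.get(h, ())) for h in hogs}


def median_score_by_degree(degrees, scores):
    """Group scores by degree and return the median of each group."""
    groups = defaultdict(list)
    for hog, deg in degrees.items():
        groups[deg].append(scores[hog])
    return {deg: median(vals) for deg, vals in groups.items()}


# Toy example (hypothetical values): one contig A-B-C and one singleton D.
hogs = ["HOG_A", "HOG_B", "HOG_C", "HOG_D"]
adjacencies = [("HOG_A", "HOG_B"), ("HOG_B", "HOG_C")]
scores = {"HOG_A": 0.5, "HOG_B": 1.0, "HOG_C": 0.75, "HOG_D": 0.25}

degrees = hog_degrees(hogs, adjacencies)
medians = median_score_by_degree(degrees, scores)
```

In this toy input, HOG_B is internal to the contig (degree 2), HOG_A and HOG_C are terminal (degree 1), and HOG_D is a singleton (degree 0), matching the three categories compared in the ridge plot.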