Fig. 1: PAN-GO annotation process illustrated using the UAE family.
From: A compendium of human gene functions derived from evolutionary modelling

a, View of the PAINT software tool (Methods) showing the process of creating a function evolution model for the human ATG7 gene (top) that integrates function information from related genes. The phylogenetic tree (left) shows the evolutionary relationships between genes found in different organisms. Tree nodes represent speciation events (circles) and gene duplication events (squares); extant genes are labelled with the UniProt five-letter species code (ref. 51) and, when available, the gene symbol. For each extant gene, the sparse experimental function annotations are shown on the right (green squares; each column is a distinct GO class). Information in the gene tree and the primary GO annotations (green callouts) is used to construct a parsimonious model of function evolution (bottom callout, dark blue), in which the selected functional characteristics first arose in an ancestral, ATG7-like gene. These functions were then transmitted by inheritance to the human ATG7 gene (dashed yellow arrow). b, The PAN-GO evolutionary model and PAN-GO molecular function (MF) annotations for all human genes in the ubiquitin-activating enzyme (UAE) family. Gene duplication events and functional evolution have resulted in ten human genes that serve as activating enzymes (AEs) with different functions at the molecular (shown here), cellular and organism levels (see the full model at https://pantree.org/tree/family.jsp?accession=PTHR10953). In the PAN-GO function evolution model, circles mark gains of function, crosses mark losses of function and orange arrows mark inheritance of ancestral function. The last common ancestor (LCA) of the family had ‘sulfurtransferase activity’ (gain labelled 1), which was passed on to the human MOCS3 gene (arrow leading from 1); in other descendants, this function was modified (losses and gains labelled 2–11) to create the canonical UAEs, which differ in their specificity for different ubiquitin-like proteins (UBLs). For example, human UBA5 is specific for the UBL called UFM1.
Branch lengths represent the number of amino-acid substitutions per site. The tree was drawn using the iTOL tool (ref. 52).
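The gain, loss and inheritance logic of the evolutionary model can be illustrated with a small Python sketch. It is a simplified toy, not PAINT's implementation: PAINT models are built by curators, whereas here a gain is placed automatically at the lowest common ancestor of the genes that carry an experimental annotation (one simple parsimony-style heuristic), a loss is recorded by hand, and each extant gene inherits every ancestral gain that is not lost on its path from the root. The tree, gene names (gene_A, gene_B, gene_C), function labels (function_X, function_Y) and helper names (Node, lca, propagate) are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Gene-tree node: leaves are extant genes, internal nodes are speciation/duplication events."""
    name: str
    children: list[Node] = field(default_factory=list)
    gains: set[str] = field(default_factory=set)   # functions gained at this node
    losses: set[str] = field(default_factory=set)  # functions lost at this node


def leaves(node: Node):
    """Yield every extant gene (leaf) in the subtree rooted at node."""
    if not node.children:
        yield node
    for child in node.children:
        yield from leaves(child)


def lca(node: Node, targets: set[str]) -> Node:
    """Return the deepest node whose subtree contains every leaf named in targets."""
    for child in node.children:
        if targets <= {leaf.name for leaf in leaves(child)}:
            return lca(child, targets)
    return node


def propagate(node: Node, inherited: frozenset[str] = frozenset(),
              result: dict[str, set[str]] | None = None) -> dict[str, set[str]]:
    """Pass ancestral functions down the tree, applying gains then losses at each node."""
    if result is None:
        result = {}
    active = (set(inherited) | node.gains) - node.losses
    if not node.children:
        result[node.name] = active
    for child in node.children:
        propagate(child, frozenset(active), result)
    return result


# Hypothetical toy tree (not the real PTHR10953 topology):
#        root
#       /    \
#   gene_A   dup1 (duplication)
#            /    \
#        gene_B   gene_C
gene_a, gene_b, gene_c = Node("gene_A"), Node("gene_B"), Node("gene_C")
dup1 = Node("dup1", children=[gene_b, gene_c])
root = Node("root", children=[gene_a, dup1])

# Hypothetical sparse experimental annotations on extant genes.
experimental = {"function_X": {"gene_A", "gene_B"}, "function_Y": {"gene_C"}}

# Place each gain at the lowest common ancestor of its annotated genes.
for term, annotated in experimental.items():
    lca(root, annotated).gains.add(term)

# Hypothetical curator decision: gene_C lost function_X after the duplication.
gene_c.losses.add("function_X")

annotations = propagate(root)
print(annotations)
# {'gene_A': {'function_X'}, 'gene_B': {'function_X'}, 'gene_C': {'function_Y'}}
```

In this toy run, function_X is gained at the root and inherited by gene_A and gene_B, while the loss recorded on gene_C blocks that inheritance, analogous to how a cross in panel b interrupts the descent of an ancestral gain marked by a circle.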