Fig. 1: The Phrank information-content-based score.

a The conditional probability P(ϕ|pa(ϕ)) for a single node in the phenotype ontology. Because these probabilities are log-transformed, negated, and then summed in b along paths to the root node, nodes annotated with fewer genes contribute more to the final score (the match they represent is less expected by chance; see Methods). b The Phrank match score of any two sets of phenotypes, ΦA and ΦB, is defined as the information content of the intersection of the ancestor sets of both, anc(ΦA) and anc(ΦB), and can be computed as shown in step 3 (see Supplemental Note 1 for the derivation). We define the Phrank score of a disease for a particular patient as the Phrank score between the set of phenotypes associated with the disease and the set of phenotypes observed in the patient (see Methods). DAG, directed acyclic graph.
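
The computation described in panels a and b can be illustrated with a minimal Python sketch. It is not the authors' implementation. It rests on several assumptions that the caption does not state: pa(ϕ) is read as the single parent of ϕ (the toy ontology is a tree, whereas a real DAG node may have several parents); P(ϕ|pa(ϕ)) is estimated as the ratio of gene counts at the node and at its parent; the logarithm is base 2; and anc(Φ) includes the phenotypes in Φ themselves. The names `PARENT`, `GENE_COUNT`, `marginal_ic`, `ancestor_closure`, and `phrank_score`, and all counts, are hypothetical.

```python
import math

# Hypothetical toy ontology (a tree): node -> its single parent, None for the root.
PARENT = {"root": None, "A": "root", "B": "root", "A1": "A", "A2": "A"}

# Hypothetical number of genes annotated to each node, assumed to be already
# propagated upward so that a parent's count is never below its child's.
GENE_COUNT = {"root": 8, "A": 4, "B": 4, "A1": 1, "A2": 2}


def marginal_ic(node):
    """Return -log2 P(node | parent), with P assumed to be a gene-count ratio."""
    parent = PARENT[node]
    if parent is None:
        return 0.0  # the root is always matched, so it carries no information
    return -math.log2(GENE_COUNT[node] / GENE_COUNT[parent])


def ancestor_closure(phenotypes):
    """Return the given phenotypes together with all of their ancestors."""
    closure = set()
    for node in phenotypes:
        while node is not None and node not in closure:
            closure.add(node)
            node = PARENT[node]
    return closure


def phrank_score(phenotypes_a, phenotypes_b):
    """Sum the marginal information content over the shared ancestor nodes."""
    shared = ancestor_closure(phenotypes_a) & ancestor_closure(phenotypes_b)
    return sum(marginal_ic(node) for node in shared)


if __name__ == "__main__":
    print(phrank_score({"A1"}, {"A2"}))  # shared nodes: A and root
    print(phrank_score({"A1"}, {"A1"}))  # shared nodes: A1, A and root
    print(phrank_score({"A1"}, {"B"}))   # shared node: root only
```

In this toy example, node A1 is annotated with 1 of the 4 genes of its parent and so contributes 2 bits, while A contributes 1 bit. This mirrors the caption's point that sparsely annotated nodes weigh more in the final score.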