Figure 2

Assessment of CryptoNet.
(A) A performance curve shows that CryptoNet outperforms every individual network derived from a single data type. The x-axis represents the percentage coverage of the C. neoformans coding genome and the y-axis represents the percentage of gene pairs that share KEGG pathway annotations. Each data point represents a bin of 1,000 co-functional links ordered by log likelihood score (LLS). Data sets are named XX-YY, where XX is the species of origin of the data (CN, C. neoformans; HS, H. sapiens; SC, S. cerevisiae) and YY is the data type (CC, co-citation; CX, co-expression; DC, domain co-occurrence; GN, gene neighborhood; GT, genetic interaction; HT, high-throughput protein-protein interactions; LC, literature-curated protein-protein interactions; PG, phylogenetic profile similarity; TS, protein-protein interactions inferred from tertiary structure). (B) The Venn diagram illustrates the overlap among the co-functional links in CryptoNet associated with each of the three species. The numbers of genes and links in each compartment of the diagram are marked as ‘n:’ followed by the node (i.e., gene) count and ‘e:’ followed by the edge (i.e., link) count. The accuracy of each network, defined as the percentage of retrieved gene pairs that share KEGG pathway annotations, is indicated in parentheses. (C) A comparison of AUC scores (i.e., network prediction power) between CryptoNet and a C. neoformans gene network derived from YeastNet shows that CryptoNet has substantially higher predictive power for all three virulence phenotypes tested. (D) For 162 UniProtGOA biological process terms (only terms with more than five annotated genes were considered), CryptoNet shows a significantly higher range of AUC scores than YeastNet-associalogs (p-value < 2.2 × 10⁻¹⁶, Wilcoxon signed rank test), suggesting that the higher prediction power of CryptoNet generalizes to many biological processes.
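
The binning behind the curve in panel A can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made for illustration: the function name `performance_curve`, the input layout (links as `(gene_a, gene_b, score)` tuples and a gene-to-pathway-set dictionary), cumulative coverage counted over all genes seen so far, and precision computed only over pairs in which both genes carry a pathway annotation.

```python
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str, float]  # (gene_a, gene_b, score); layout is an assumption


def performance_curve(
    links: List[Link],
    pathways: Dict[str, Set[str]],
    genome_size: int,
    bin_size: int = 1000,
) -> List[Tuple[float, float]]:
    """Return one (coverage %, shared-annotation %) point per bin of links.

    Links are ranked by descending score and split into consecutive bins.
    Coverage is cumulative: genes seen in this bin or any higher-scoring bin.
    The y value is computed per bin, over annotated pairs only (assumption).
    """
    ranked = sorted(links, key=lambda link: link[2], reverse=True)
    seen: Set[str] = set()
    points: List[Tuple[float, float]] = []

    for start in range(0, len(ranked), bin_size):
        annotated = 0  # pairs where both genes have pathway annotations
        shared = 0     # annotated pairs sharing at least one pathway
        for gene_a, gene_b, _score in ranked[start:start + bin_size]:
            seen.update((gene_a, gene_b))
            terms_a = pathways.get(gene_a)
            terms_b = pathways.get(gene_b)
            if terms_a and terms_b:
                annotated += 1
                if terms_a & terms_b:
                    shared += 1
        coverage = 100.0 * len(seen) / genome_size
        precision = 100.0 * shared / annotated if annotated else float("nan")
        points.append((coverage, precision))

    return points
```

Called with the scored links of one data set, a pathway annotation table, and the number of coding genes, the function returns the points for one curve; repeating the call per data set (for example CN-CX or SC-LC) would give the individual curves that are compared against the integrated network.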