Figure 4 | Scientific Reports

Figure 4

From: Patterns of diverse gene functions in genomic neighborhoods predict gene function and phenotype

Semantically distant functions in gene neighborhoods are important for accurate inference of gene function. Bars show accuracy (as AUPRC score, measured in cross-validation) for predicting the eleven representative gene functions, using the types of neighborhood function profiles (NFP) listed in the legend. “Full profile” is the full NFP of the ‘biological process’ GO graph, while “CL/CLPar”, “Med/Par” and “Dist/Par” are the partial NFPs consisting only of close, medium-distance and distant functions, respectively (the “/Par” denotes that parent GO terms of the target functions were removed). The “CLPar” partial profiles contain only the selected function and its semantically close parents, meaning that “CLPar” is an implementation of the standard approaches that transfer functions across neighborhoods. In many cases, the close (but non-self), medium-distance and distant functions are more predictive than CLPar, and the complete profile is the most predictive. As a control, removing the significantly enriched functions (labeled “/Enr” in the legend) from the partial NFP strongly reduces accuracy, whether for the close (CL), medium-distance (Med) or distant (Dist) functions. Bars are average AUPRC scores over 200 runs of cross-validation of the Random Forest classifier, and error bars show the standard deviation across the 200 runs.
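The bar heights and error bars described in the caption can be illustrated with a minimal sketch, assuming AUPRC is computed as step-wise average precision and the error bar is the sample standard deviation. Neither detail is stated in the caption. The function names `average_precision` and `summarize_runs` are hypothetical, the toy labels and scores are not data from the article, and the Random Forest itself is not reproduced here.

```python
from statistics import mean, stdev


def average_precision(labels, scores):
    """Step-wise AUPRC (average precision) for binary labels (0/1) and scores."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank genes from highest to lowest predicted score
    # (ties keep their input order; an assumption of this sketch).
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at each newly recalled positive
    return total / n_pos


def summarize_runs(auprc_per_run):
    """Return (bar height, error bar) as mean and sample standard deviation."""
    return mean(auprc_per_run), stdev(auprc_per_run)


# Toy example: one cross-validation run with four genes, two of them positive.
toy_auprc = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])

# Hypothetical per-run AUPRC values standing in for the 200 runs.
bar, err = summarize_runs([0.5, 0.7])
```

In practice, each of the 200 runs would contribute one AUPRC value to the list passed to `summarize_runs`, giving one bar and one error bar per gene function and profile type.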