Figure 5
From: Patterns of diverse gene functions in genomic neighborhoods predict gene function and phenotype

Predicting phenotypes of individuals from the effects of structural variants on the composition of gene neighborhoods. (a) Distribution of the predictive models' AUC scores (top-left) and AUPRC scores (top-right) across 151 Escherichia coli phenotypes, estimated in cross-validation. The baseline classifier predicts phenotype from scores based on gene disruption by small variants. The PCA-NFP classifier predicts from neighborhood function profiles, which represent how structural variants affect genomic neighborhoods. The Ensemble classifier combines both sources of data (see Supplementary 1, Section S3.11). (b) Cross-validation receiver operating characteristic (ROC) curves of a baseline method based on small genetic variants and gene content (green) and of the ensemble method (blue), which also includes copy-number-neutral structural variants, shown for two example phenotypes. Additional examples are in Supplementary 1, Fig. S43.
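The caption reports AUC and AUPRC but does not say how they were computed. As a minimal sketch of what the two metrics measure for a single binary phenotype, the pure-Python functions below compute the area under the ROC curve as a pairwise ranking probability and the AUPRC as average precision. The function names `roc_auc` and `average_precision` and the toy labels and scores are hypothetical and for illustration only; the authors' pipeline may use a different estimator or library.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive sample scores higher than a randomly chosen negative one
    (tied pairs count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise estimate of the area under the precision-recall curve:
    the mean of the precision values at the rank of each positive sample.
    Assumes untied scores; ties are broken by sort order here."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` predictions
    if hits == 0:
        raise ValueError("need at least one positive sample")
    return total / hits


# Toy data (hypothetical): 1 = strain shows the phenotype, 0 = it does not.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]

print(f"AUC   = {roc_auc(labels, scores):.3f}")
print(f"AUPRC = {average_precision(labels, scores):.3f}")
```

In a cross-validation setting such as the one in panel (a), `scores` would hold the held-out predictions for one phenotype, and the two functions would be applied once per phenotype to obtain distributions like those plotted.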