Extended Data Fig. 8: Edit distance from wild-type sequence as a predictive model (UBE4B U-box domain). | Nature Biotechnology

From: Learning protein fitness models from evolutionary and assay-labeled data

Analogous to Fig. 6, but for the UBE4B U-box domain data set. We compared the performance of non-augmented evolutionary density models with that of two predictive models that use only the edit distance between a sequence and the wild type. In the first, the edit distance is the number of mutations separating the sequence from the wild type. In the second, the distance from the wild type is computed with BLOSUM62, and so accounts for both the number and the type of mutations. Each dot represents a UBE4B U-box domain sequence, with darker colors indicating larger distances from the wild type.
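
The two distances described in the caption can be illustrated with a minimal sketch. The caption does not give the exact BLOSUM62 formula, so the version here is an assumption: for each mutated position, it adds the drop in substitution score relative to keeping the wild-type residue. The sequences, the function names, and the small `BLOSUM62_SUBSET` table (a handful of BLOSUM62 entries, not the full matrix) are hypothetical and chosen only for illustration. Equal-length sequences are assumed, so the mutation count reduces to a Hamming distance.

```python
from typing import Dict, Tuple

# Assumption: a few BLOSUM62 entries, just enough for the toy sequences below.
# A real analysis would load the full matrix instead.
BLOSUM62_SUBSET: Dict[Tuple[str, str], int] = {
    ("L", "L"): 4,
    ("L", "I"): 2,
    ("L", "D"): -4,
    ("K", "K"): 5,
    ("K", "R"): 2,
}


def pair_score(a: str, b: str, matrix: Dict[Tuple[str, str], int]) -> int:
    """Look up a symmetric substitution score, trying both key orders."""
    if (a, b) in matrix:
        return matrix[(a, b)]
    return matrix[(b, a)]


def mutation_count(seq: str, wild_type: str) -> int:
    """Number of positions at which seq differs from the wild type."""
    if len(seq) != len(wild_type):
        raise ValueError("sequences must have equal length")
    return sum(s != w for s, w in zip(seq, wild_type))


def substitution_distance(
    seq: str, wild_type: str, matrix: Dict[Tuple[str, str], int]
) -> int:
    """Assumed definition: sum over mutated positions of
    score(wt, wt) - score(wt, mutant)."""
    if len(seq) != len(wild_type):
        raise ValueError("sequences must have equal length")
    return sum(
        pair_score(w, w, matrix) - pair_score(w, s, matrix)
        for s, w in zip(seq, wild_type)
        if s != w
    )


wild_type = "ALKL"  # hypothetical toy sequence, not the UBE4B U-box domain
for variant in ("AIKL", "ADKL", "AIRL"):
    n_mut = mutation_count(variant, wild_type)
    dist = substitution_distance(variant, wild_type, BLOSUM62_SUBSET)
    # Assumption: the predicted fitness is the negated distance, so that
    # sequences closer to the wild type rank higher.
    print(variant, n_mut, dist, -dist)
```

In this toy example, `AIKL` and `ADKL` each carry one mutation and are indistinguishable by mutation count, but the substitution-based distance separates the conservative L-to-I change from the non-conservative L-to-D change.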
