Fig. 4: Generated sequences recapitulate biologically relevant cleavage patterns across MMP functional classes.

A IceLogos for individual MMPs, grouped by relevant subclasses, are shown for the top 100 Z-scoring sequences in each set: CleaveNet-generated (green boxes) and mRNA-display training (blue boxes). The IceLogos are normalized by amino acid frequencies from the mRNA-display set. B Histograms of 3-mer frequencies among the top 100 Z-scoring MMP13 sequences in each of the CleaveNet-generated (top), mRNA-display training (middle), and site-independent baseline (bottom) sets. The frequencies of the five most frequent 3-mers in each set are summarized in the tables on the right, with 3-mers shared between the CleaveNet-generated and mRNA-display sets shown in bold. C Heat map of the top 25 Z-scoring substrates per MMP, colored by CleaveNet-predicted cleavage score and annotated by hierarchical clustering. The similarity cutoff and the resulting clustering of proteases into five groups with shared phylogeny are indicated.
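
The 3-mer comparison in panel B can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names, the toy peptides, and the definition of frequency (a 3-mer's share of all overlapping 3-mer occurrences in a set) are assumptions made for illustration only.

```python
from collections import Counter


def kmer_counts(sequences, k=3):
    """Count every overlapping k-mer across a list of peptide sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def top_kmers(sequences, k=3, n=5):
    """Return the n most frequent k-mers with their relative frequencies.

    Frequency here is the k-mer count divided by the total number of
    k-mer occurrences in the set (an assumed definition).
    """
    counts = kmer_counts(sequences, k)
    total = sum(counts.values())
    return [(kmer, c / total) for kmer, c in counts.most_common(n)]


# Hypothetical toy peptides standing in for the top-scoring sequences
# of two sets; these are not data from the study.
generated = ["PLGLAG", "PLGMAG", "APLGLR"]
training = ["PLGLWA", "GPLGLA", "PAGLAG"]

top_generated = top_kmers(generated)
top_training = top_kmers(training)

# 3-mers appearing in the top five of both sets (bolded in the figure tables).
shared = {kmer for kmer, _ in top_generated} & {kmer for kmer, _ in top_training}

for kmer, freq in top_generated:
    marker = "*" if kmer in shared else " "
    print(f"{marker} {kmer} {freq:.3f}")
```

Counting overlapping windows treats every position in the peptide equally; a position-specific analysis, such as the logos in panel A, would instead tabulate residues per site.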