Fig. 3: CleaveNet generates biophysically plausible MMP substrates. | Nature Communications

From: Deep learning guided design of protease substrates

A Unconditional generation from the CleaveNet Generator. Baseline sequences are produced by position-wise sampling from the amino acid distribution of peptides in the mRNA-display training set.

B IceLogo visualization of the position-specific amino acid composition of CleaveNet unconditional generations (n = 4000), with the Kullback–Leibler (KL) divergence between the generated and mRNA-display test distributions denoted (normalized by natural amino acid frequencies). Position-wise distributions are included in Fig. 13.

C Probability density functions of in silico-computed biophysical properties of generated (green, n = 4000) and mRNA-display test (blue, n = 3717) sequences.

D CleaveNet was used to generate 20,000 candidate substrates and score their cleavage profiles across 18 MMPs.

E Distributions of predicted cleavage scores, across 18 MMPs, for CleaveNet unconditional generations (green, n = 19,905), site-independent baseline sequences (yellow, n = 20,000), and mRNA-display sequences (blue, n = 18,583). Box plots show the median (center line), the interquartile range (box bounds, 25th to 75th percentile), whiskers extending to the most extreme points within 1.5 times the interquartile range (or to the minimum or maximum when no points lie beyond that range), and outliers plotted as points beyond the whiskers.

F Cumulative distribution of the frequency of unique k-mers present in CleaveNet generations (green, n = 19,905), site-independent baseline sequences (yellow, n = 20,000), and mRNA-display sequences (blue, n = 18,583) as a function of the total number of k-mers considered. The top-occurring k-mers in each set are provided (right).
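The site-independent baseline in panel A, the position-wise KL divergence in panel B, and the k-mer counts in panel F can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the pseudocount, and the toy peptides are assumptions made for illustration, and the sketch omits the normalization by natural amino acid frequencies mentioned in the caption. It also assumes equal-length peptides over the 20 standard amino acids.

```python
import math
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues (assumed alphabet)


def position_frequencies(seqs, pseudocount=1e-6):
    """Per-position amino acid frequencies for equal-length peptides.

    The pseudocount (an assumption of this sketch) keeps every
    probability non-zero so the KL divergence below is always defined.
    """
    length = len(seqs[0])
    freqs = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        total = sum(counts[aa] for aa in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
        freqs.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return freqs


def sample_baseline(freqs, n, rng):
    """Draw n peptides, sampling each position independently of the others."""
    letters = list(AMINO_ACIDS)
    peptides = []
    for _ in range(n):
        residues = [
            rng.choices(letters, weights=[pos[aa] for aa in letters])[0]
            for pos in freqs
        ]
        peptides.append("".join(residues))
    return peptides


def positionwise_kl(p_freqs, q_freqs):
    """KL(P || Q) at each position, in nats, without any normalization."""
    return [
        sum(p[aa] * math.log(p[aa] / q[aa]) for aa in AMINO_ACIDS)
        for p, q in zip(p_freqs, q_freqs)
    ]


def kmer_counts(seqs, k):
    """Count every overlapping k-mer across a set of peptides."""
    return Counter(s[i:i + k] for s in seqs for i in range(len(s) - k + 1))


# Toy usage with made-up peptides (not data from the paper).
train = ["ACDEF", "ACDEG", "GCDEF", "ACHEF"]
freqs = position_frequencies(train)
baseline = sample_baseline(freqs, n=10, rng=random.Random(0))
kl = positionwise_kl(position_frequencies(baseline), freqs)
top_kmers = kmer_counts(baseline, k=3).most_common(3)
```

Because each position is sampled independently, a baseline of this kind preserves per-position composition but discards any dependence between positions, which is why it serves as a useful comparison for the k-mer and cleavage-score distributions of the generated sequences.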
