Fig. 1

Design of a functional oligopeptide screening strategy using deep learning. a Workflow of the oligopeptide screening procedure. Briefly, the intrinsically disordered regions (IDRs) of proteins were used to build a functional peptide motif dataset. b Architecture of the deep learning-based oligopeptide generation model. The model comprises two submodels, namely, N-grams and Monte Carlo simulation. The ten amino acids (AAs) with the highest frequency are used as the “initial AAs”. N-grams are used to infer the conditional probabilities of the context words (residues) of the oligopeptides awaiting extension, and the Monte Carlo simulation then generates the extended oligopeptides according to the inferred probabilities. The overall model repeats this basic process in cycles, with one AA added in each cycle. The new oligopeptides obtained in each cycle serve as the oligopeptides awaiting extension in the next cycle until the end condition is reached (here, until the oligopeptides have been extended to a total of 10 residues). c Protein candidates were retrieved from UniProt based on their reported involvement in bone formation by using four osteogenic GO terms: “ossification”, “osteogenesis”, “osteoblast development”, and “osteoblast differentiation”. A total of 171 protein candidates were thus collected. d Amino acid frequency distributions in the IDRs and in the full-length sequences of the 171 ossification-annotated proteins.
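The extension cycle described in panel b can be sketched as follows. This is only a minimal illustration, assuming a bigram model (N = 2) in which the context is the single preceding residue; the function names, the value of N, the random seed, and the toy sequences are hypothetical and are not taken from the original work or its dataset.

```python
import random
from collections import Counter, defaultdict

# Toy stand-ins for IDR sequences (made-up strings, NOT real protein data).
toy_idr_sequences = ["SSPEGSKDDE", "EEDSSGSPEK", "GSPSSEDDKE"]


def top_initial_residues(sequences, k=10):
    """Return the k most frequent residues, used as the "initial AAs"."""
    counts = Counter("".join(sequences))
    return [aa for aa, _ in counts.most_common(k)]


def build_bigram_counts(sequences):
    """Count how often each residue follows each preceding residue."""
    table = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            table[prev][nxt] += 1
    return table


def extend_peptide(seed, table, target_len=10, rng=None):
    """Add one residue per cycle, sampled from the bigram distribution."""
    rng = rng or random.Random()
    peptide = seed
    while len(peptide) < target_len:
        followers = table.get(peptide[-1])
        if not followers:
            break  # no observed continuation for this context
        residues = list(followers)
        weights = [followers[r] for r in residues]
        # Monte Carlo step: draw the next residue in proportion to its count.
        peptide += rng.choices(residues, weights=weights, k=1)[0]
    return peptide


bigram_counts = build_bigram_counts(toy_idr_sequences)
seeds = top_initial_residues(toy_idr_sequences, k=3)  # k=10 in the caption
rng = random.Random(0)  # fixed seed so the sketch is reproducible
peptides = [extend_peptide(s, bigram_counts, target_len=10, rng=rng) for s in seeds]
print(peptides)
```

Sampling in proportion to the raw counts is equivalent to sampling from the normalized conditional probabilities, so the sketch does not normalize explicitly. A larger N would use a longer context string as the dictionary key.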