Figure 1 | Scientific Reports

Figure 1

From: Protein embeddings predict binding residues in disordered regions

Workflow of IDBindT5. To predict whether a residue in a disordered region (IDPR) is binding, IDBindT5 takes two inputs: numerical representations of the protein sequence (embeddings) generated by the protein language model (pLM) ProtT5 (ref. 18), and a per-residue annotation of (dis-)order, either predicted or experimental. IDBindT5 keeps the number of free parameters small, and thereby limits overfitting, by using a relatively simple feedforward network (FNN) with a single hidden layer. The (dis-)order input is a binary vector of dimension \(L \times 1\), where \(L\) is the number of residues in the given protein. The embeddings have shape \(L \times m\), where \(m\) depends on the specific pLM (for ProtT5, \(m = 1024\)). The output of IDBindT5 has shape \(L \times 1\) and represents a per-residue binding prediction.
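The shapes described in the caption can be traced with a minimal NumPy sketch of a single-hidden-layer feedforward pass. This is an illustration under stated assumptions, not the published IDBindT5 implementation: concatenating the (dis-)order flag onto each residue embedding, the hidden width of 64, the ReLU activation, the sigmoid output, and the names `init_params` and `predict_binding` are all hypothetical choices, and the weights are random rather than trained.

```python
import numpy as np


def init_params(m=1024, hidden=64, seed=0):
    """Random (untrained) weights for a one-hidden-layer network.

    The hidden width of 64 is an assumption; the caption does not state it.
    """
    rng = np.random.default_rng(seed)
    in_dim = m + 1  # assumed: embedding (m) plus one disorder flag per residue
    return {
        "W1": rng.normal(0.0, 0.02, size=(in_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.02, size=(hidden, 1)),
        "b2": np.zeros(1),
    }


def predict_binding(embeddings, disorder, params):
    """Map L x m embeddings and an L x 1 disorder vector to L x 1 scores."""
    # Assumed input construction: append the binary flag to each embedding.
    x = np.concatenate([embeddings, disorder], axis=1)      # L x (m + 1)
    h = np.maximum(x @ params["W1"] + params["b1"], 0.0)    # ReLU (assumed)
    logits = h @ params["W2"] + params["b2"]                # L x 1
    return 1.0 / (1.0 + np.exp(-logits))                    # sigmoid (assumed)


# Placeholder inputs standing in for real ProtT5 embeddings and annotations.
L, m = 120, 1024
rng = np.random.default_rng(1)
embeddings = rng.normal(size=(L, m))                        # L x m
disorder = rng.integers(0, 2, size=(L, 1)).astype(float)    # L x 1, binary

params = init_params(m=m)
scores = predict_binding(embeddings, disorder, params)
print(scores.shape)  # (120, 1): one score per residue
```

Because every residue is processed independently by the same small network, the parameter count depends only on \(m\) and the hidden width, not on the protein length \(L\).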