Fig. 1: Overview of FSFP. | Nature Communications

From: Enhancing efficiency of protein language models with minimal wet-lab data through few-shot learning

FSFP includes three stages: building auxiliary tasks for meta-learning, meta-training PLMs on the auxiliary tasks, and transferring PLMs to the target task. a Based on the wild-type sequence or structure of the target protein, labeled mutant datasets of two similar proteins are retrieved to serve as the first two tasks. In addition, an MSA-based method estimates the variant effects of the candidate mutants, providing pseudo-labels for the third task. b The MAML algorithm meta-trains the PLM on the constructed tasks, optimizing it into a meta-learner that provides a good parameter initialization for the target task (right). To prevent the PLM from overfitting on small training sets, LoRA constrains model updates to a limited number of parameters (left). c The meta-trained model is then transferred to the target few-shot learning task. FSFP treats fitness prediction as a ranking problem and leverages the learning-to-rank (LTR) technique for both meta-training and transfer learning. It trains PLMs to rank fitness by computing a listwise ranking loss between their predictions and the ground-truth permutation.
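The listwise ranking loss described in panel c can be sketched as a ListMLE-style objective: the negative log-likelihood of the ground-truth permutation under a Plackett-Luce model of the predicted scores. The helper below is a minimal NumPy illustration, not the paper's exact implementation; the function name and interface are assumptions for this sketch.

```python
import numpy as np

def listwise_ranking_loss(scores, labels):
    """ListMLE-style listwise loss (illustrative sketch, not FSFP's code).

    scores: model-predicted fitness scores for a list of mutants.
    labels: ground-truth fitness values used to define the true permutation.
    Returns the mean negative log-likelihood of the ground-truth ordering
    under a Plackett-Luce model of the scores.
    """
    order = np.argsort(-labels)       # ground-truth permutation, best mutant first
    s = scores[order]                 # scores arranged in the true order
    loss = 0.0
    for i in range(len(s)):
        tail = s[i:]
        m = tail.max()                # max-shift for numerical stability
        # -log P(item i ranked next among the remaining items)
        loss += (np.log(np.exp(tail - m).sum()) + m) - s[i]
    return loss / len(s)

# Scores that agree with the true ranking incur a lower loss
# than scores that invert it.
labels = np.array([3.0, 2.0, 1.0])
good = listwise_ranking_loss(np.array([3.0, 2.0, 1.0]), labels)
bad = listwise_ranking_loss(np.array([1.0, 2.0, 3.0]), labels)
```

In FSFP this loss drives both the MAML inner/outer updates during meta-training and the final few-shot transfer step, so the PLM is optimized for the relative ordering of mutants rather than their absolute fitness values.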
