Fig. 2: Low-dimensional representation of the training data.
From: Machine learning designs new GCGR/GLP-1R dual agonists with enhanced biological potency

a, Projection of the 125 training set sequences (grey) and 19 independent sequences from the literature (blue; ref. 15) onto the first (PC1, x axis) and second (PC2, y axis) principal components of the covariance matrix of the training set. All possible single-site mutants of human glucagon (cyan) and human GLP-1 (purple) are included as a reference to facilitate evaluation of sequence diversity across the training set samples. b, Projected training set sequences coloured by their potency category, following the colour scheme introduced in Fig. 1b.
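
The projection described in panel a can be illustrated with a minimal sketch. It assumes a one-hot encoding of equal-length peptides, which the legend does not specify, and uses short toy sequences rather than the actual training data; the function names (`one_hot`, `fit_pca`, `project`, `single_site_mutants`) are hypothetical. The principal components are fitted on the training set only, and the other sequences are then projected onto that same basis.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(seq):
    """Flatten a peptide into a binary vector of length len(seq) * 20 (assumed encoding)."""
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    for i, aa in enumerate(seq):
        x[i, AMINO_ACIDS.index(aa)] = 1.0
    return x.ravel()


def fit_pca(train_seqs, n_components=2):
    """Return the training mean and the leading eigenvectors of the training covariance matrix."""
    X = np.array([one_hot(s) for s in train_seqs])
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvecs[:, order]


def project(seqs, mean, components):
    """Project sequences onto principal components fitted on the training set."""
    X = np.array([one_hot(s) for s in seqs])
    return (X - mean) @ components


def single_site_mutants(seq):
    """Enumerate every sequence that differs from seq at exactly one position."""
    return [
        seq[:i] + aa + seq[i + 1:]
        for i in range(len(seq))
        for aa in AMINO_ACIDS
        if aa != seq[i]
    ]


# Toy 5-residue peptides, purely illustrative (not the training set of the paper).
toy_train = ["HSQGT", "HAEGT", "HSEGT", "YSQGT"]
mean, components = fit_pca(toy_train)

train_xy = project(toy_train, mean, components)  # analogue of the grey points
mutant_xy = project(single_site_mutants("HSQGT"), mean, components)  # analogue of the reference points
```

Fitting the basis on the training set alone means the reference mutants and independent sequences do not influence the axes, so their spread can be compared directly against that of the training samples.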