Table 1 Summary of encodings of protein sequences, models, and acquisition functions tested in this work

Encoding	Dimension per Residue		Description
AAIndex	4		Continuous fixed amino acid descriptors
Georgiev⁷¹	19		Continuous fixed amino acid descriptors
Onehot	20		Categorical (which amino acid)
ESM2³³	1280		Learned embedding from a protein language model (ESM2 with 650 million parameters)
Model	Bayesian?	Deep Learning?	Description
Boosting Ensemble	N	N	An ensemble of 5 boosting models
Gaussian Process (GP)	Y	N	A collection of continuous functions described by a posterior
DNN Ensemble	N	Y	An ensemble of 5 multilayer perceptrons (deep neural networks, DNNs)
Deep Kernel Learning (DKL)²⁹	Y	Y	A GP on the last layer of a deep neural network
Acquisition Function	Deterministic?		Description
Greedy	Y		Acquires the maximum value of the mean from the posterior
Upper Confidence Bound (UCB)	Y		Acquires the maximum value of a certain confidence interval from the posterior (tuned by a hyperparameter)
Thompson Sampling (TS)	N		Acquires the maximum value of a random function sampled from the posterior
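
As a minimal sketch of how the three acquisition functions in Table 1 differ, the following Python snippet applies them to predictions from a small ensemble. The function names, the toy array `preds`, the value of `beta`, and the use of ensemble spread as a stand-in for the posterior are illustrative assumptions, not the implementation used in this work.

```python
import numpy as np

# preds has shape (n_models, n_candidates): each row holds one ensemble
# member's predicted fitness for every candidate sequence. The ensemble is
# used here as an assumed approximation of the posterior.

def greedy(preds):
    """Pick the candidate with the highest posterior mean (deterministic)."""
    return int(np.argmax(preds.mean(axis=0)))

def ucb(preds, beta=2.0):
    """Pick the candidate maximizing mean + beta * std (deterministic).

    beta is the hyperparameter that sets the width of the confidence interval.
    """
    mean = preds.mean(axis=0)
    std = preds.std(axis=0)
    return int(np.argmax(mean + beta * std))

def thompson(preds, rng):
    """Pick the maximizer of one randomly drawn ensemble member (stochastic).

    Drawing a single member stands in for sampling a function from the posterior.
    """
    member = rng.integers(preds.shape[0])
    return int(np.argmax(preds[member]))

# Toy data: 5 ensemble members, 3 candidates. Candidate 0 is predicted
# confidently; candidate 2 has a lower mean but high disagreement.
preds = np.array([
    [0.9, 0.5, 0.1],
    [0.9, 0.7, 0.2],
    [0.9, 0.3, 1.5],
    [0.9, 0.5, 0.1],
    [0.9, 0.5, 0.1],
])

rng = np.random.default_rng(0)
print("Greedy:", greedy(preds))
print("UCB:", ucb(preds, beta=2.0))
print("TS:", thompson(preds, rng))
```

On this toy input, Greedy selects the confidently predicted candidate, UCB with `beta=2.0` selects the uncertain one, and TS selects either depending on which ensemble member is drawn. Setting `beta=0` reduces UCB to Greedy.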
