Fig. 1: Overview of our mixture of experts framework.

a First, separate machine learning models, each consisting of a feature extractor and a classification or regression head, are trained on separate learning tasks. b Next, the pre-trained experts are adapted to a downstream learning task along with a newly initialized head.