Fig. 2

Our attention-based NAS predictor consists mainly of an encoder and a regressor. We first encode the path information into continuous representations, which are then processed by three Transformer encoder layers; the regressor uses the output features of the Transformer encoder layers to derive the final prediction.
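The pipeline above (path encoding, three attention-based encoder layers, regression head) can be sketched as follows. This is a minimal illustrative mock-up, not the paper's implementation: the hyperparameters (`n_ops`, `d_model`), the random weights, and the simplified single-head attention layers (no layer normalization or multi-head projection) are all assumptions made for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class TinyPathPredictor:
    """Sketch of an encoder + regressor performance predictor.

    All sizes and weights are illustrative assumptions, not values
    from the paper; weights are random, so outputs are not meaningful.
    """

    def __init__(self, n_ops=8, d_model=16, n_layers=3):
        # Embedding table: maps discrete operation IDs on a path
        # to continuous representations.
        self.embed = rng.normal(scale=0.1, size=(n_ops, d_model))
        # One (W_q, W_k, W_v, W_ff) weight set per encoder layer.
        self.layers = [
            tuple(rng.normal(scale=0.1, size=(d_model, d_model))
                  for _ in range(4))
            for _ in range(n_layers)
        ]
        # Linear regression head.
        self.w_out = rng.normal(scale=0.1, size=(d_model,))

    def __call__(self, path):
        # 1) Encode the path into continuous representations.
        h = self.embed[np.asarray(path)]              # (seq_len, d_model)
        # 2) Three simplified self-attention encoder layers.
        for wq, wk, wv, wff in self.layers:
            q, k, v = h @ wq, h @ wk, h @ wv
            attn = softmax(q @ k.T / np.sqrt(h.shape[1]))
            h = h + attn @ v                          # residual attention
            h = h + np.maximum(h @ wff, 0.0)          # residual feed-forward
        # 3) Regressor: mean-pool token features, then a linear head
        #    to derive the final (scalar) prediction.
        return float(h.mean(axis=0) @ self.w_out)

predictor = TinyPathPredictor()
score = predictor([1, 4, 2, 7])  # a path given as operation IDs
print(score)
```

A trained predictor of this shape would be fit on (architecture, accuracy) pairs; here the output is just a scalar from random weights, shown only to trace the data flow from discrete path to prediction.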