Fig. 7: Architecture of assist action prediction.

Our approach to generating assist action sequences is built mainly on ResNet18 and transformer models. Given sensor inputs that contain a few milliseconds of history up to the current time, the model predicts an action sequence extending a few milliseconds ahead. The predicted assist action sequences are averaged by a temporal ensemble, and the averaged result is used as the final control command.
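
The temporal ensemble step can be illustrated with a minimal sketch. It assumes that the model emits a fixed-length action sequence at every control step, so that several earlier predictions overlap at the current step, and that these overlapping predictions are combined by a (possibly weighted) mean. The class name `TemporalEnsemble` and the parameters `horizon` and `decay` are hypothetical and introduced only for illustration; with `decay=0.0` the result is a plain average.

```python
import numpy as np


class TemporalEnsemble:
    """Average the overlapping action predictions that cover the current step.

    Illustrative sketch only: `horizon` and `decay` are assumed parameters,
    not values taken from the paper.
    """

    def __init__(self, horizon, decay=0.0):
        self.horizon = horizon  # number of steps predicted per model call
        self.decay = decay      # 0.0 -> plain mean; > 0 favors older predictions
        self.chunks = []        # list of (start_step, sequence), oldest first

    def update(self, step, sequence):
        """Store the sequence predicted at `step` and return the averaged action."""
        sequence = np.asarray(sequence, dtype=float)
        if sequence.shape[0] != self.horizon:
            raise ValueError("sequence length must equal horizon")
        self.chunks.append((step, sequence))
        # Keep only the predictions that still cover the current step.
        self.chunks = [(s, seq) for s, seq in self.chunks
                       if step - s < self.horizon]
        # Row i holds the action that the i-th oldest prediction assigns to `step`.
        candidates = np.stack([seq[step - s] for s, seq in self.chunks])
        weights = np.exp(-self.decay * np.arange(len(candidates)))
        weights /= weights.sum()
        return weights @ candidates


# Usage: three-step predictions of a one-dimensional action.
ensemble = TemporalEnsemble(horizon=3)
a0 = ensemble.update(0, [[0.0], [1.0], [2.0]])
a1 = ensemble.update(1, [[10.0], [11.0], [12.0]])
```

In this usage, `a1` is the mean of the two predictions available for step 1 (the second entry of the first sequence and the first entry of the second). Averaging in this way smooths the command when consecutive predictions disagree, at the cost of responding more slowly to abrupt changes.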