Fig. 4: Comparison of performance among different model components.

a F-score averaged over right and left leg motions, with the estimations treated as a binary classification. b Components of each model. VK-TR is the proposed approach. VK-T uses the same transformer architecture as VK-TR but omits ResNet. VK-S uses a Support Vector Machine with a radial basis function kernel, and VK-LR uses Logistic Regression. For VK-T, VK-S, and VK-LR, image features extracted with PCA alone are combined with kinematic information and used as input.
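
As an illustration of the metric in panel a, the following is a minimal sketch of one plausible reading: a binary F-score is computed separately for each leg and the two values are averaged. The function names, the 0/1 label encoding, and the toy data are assumptions made for this example and are not taken from the study; the order of averaging is also an assumption.

```python
def f_score(y_true, y_pred):
    """Binary F-score (F1) for 0/1 labels; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_leg_f_score(right_true, right_pred, left_true, left_pred):
    """Average of the per-leg F-scores (assumed interpretation of panel a)."""
    return 0.5 * (f_score(right_true, right_pred) + f_score(left_true, left_pred))


# Hypothetical toy labels: 1 = motion present, 0 = motion absent.
right_true, right_pred = [1, 1, 0, 0, 1], [1, 0, 0, 1, 1]
left_true, left_pred = [0, 1, 1, 0, 1], [0, 1, 1, 0, 0]

score = mean_leg_f_score(right_true, right_pred, left_true, left_pred)
print(round(score, 4))
```

If the estimations are continuous, they would first need to be thresholded into 0/1 labels; the threshold is not stated in this caption.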