Fig. 5: Comparison of F-scores between the original motion task and tasks involving new environments across different models.

The original motion refers to the task set used when constructing the proposed model. The tasks of squatting to pick up new objects (A and B) and climbing a step with new stairs (C and D) involve novel colors and shapes for generalization testing. The original model represents the performance of our proposed model, which experienced only the original motion during training. The finetuned model refers to the model obtained by fine-tuning the original model with a small amount of data from the step-climbing tasks with new stairs (C and D). Notably, when a separate model was trained from scratch using only the same limited data from climbing steps with new stairs, the F-scores for the two new stair types, (C and D), were 0.71 and 0.61, respectively, substantially lower than the scores of 0.95 and 0.95 achieved by the finetuned model. These results demonstrate that leveraging the pre-trained model significantly improves performance compared to training from scratch.