Extended Data Fig. 4: Ablation of cross-embodiment training: results on SimplerEnv.
From: What matters in building vision–language–action models for generalist robots

We evaluate four training recipes. On the WidowX+Bridge environments, we test three of them: (1) Bridge Finetune, which finetunes the VLA directly on the full Bridge datasets (tested tasks not included); (2) OXE Co-Train, which co-trains the VLA on the OXE dataset; and (3) Post-Train, which trains the OXE co-trained VLA on the Bridge datasets. On the Google Robot environments, we test all four: (1) RT-Partial Finetune, which finetunes the VLA on the tested RT tasks only; (2) RT Finetune, which finetunes the VLA on the full RT dataset (tested tasks included); (3) OXE Co-Train; and (4) Post-Train, in which the post-training stage uses the tested RT tasks.