Extended Data Table 5 The performance of the built VLAs based on VLMs with different image token numbers and VL pre-train data scales
From: What matters in building vision–language–action models for generalist robots
