Extended Data Fig. 6: Alternative DoFs and actuators.

Performance of the modified fly model with all 102 DoFs enabled and position actuators replaced with torque actuators. a,b, Flight imitation task as in Fig. 2 (a) and walking imitation task as in Fig. 3 (b). Top, middle: percentiles of errors between the fly model and the target fly in CoM position and body orientation. Bottom: learning curves of the original (blue) and modified (red) fly model, shown as episode return (i.e., cumulative episode reward) versus MuJoCo control steps during training. Training is slower for the modified fly model. For flight, the episode return at the end of training is similar in both cases. For walking, an additional multiplicative reward term is required to keep the now-enabled wing DoFs in the folded position, and this term accounts for most of the discrepancy between the two learning curves. Because the term is only approximately satisfied, it reduces the episode return of the trained model by a factor of ~0.69.
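Why a multiplicative term lowers the episode return can be shown with a minimal sketch. The wing-folding term scales every per-step imitation reward, so if it stays below 1 the cumulative return shrinks by roughly its average value. The Gaussian form of the folding term, its width `sigma`, the folded-pose angles and all function names are hypothetical choices for illustration, not the reward actually used to train the model.

```python
import math


def wing_fold_reward(wing_angles, folded_angles, sigma=0.5):
    """Hypothetical folding term in (0, 1]: equals 1 when the wing DoFs sit exactly
    at the (assumed) folded pose and decays as a Gaussian of the squared deviation."""
    sq_dev = sum((q - q0) ** 2 for q, q0 in zip(wing_angles, folded_angles))
    return math.exp(-sq_dev / sigma ** 2)


def step_reward(imitation_reward, wing_angles, folded_angles, sigma=0.5):
    """Multiplicative combination: the folding term scales the imitation reward."""
    return imitation_reward * wing_fold_reward(wing_angles, folded_angles, sigma)


def episode_return(step_rewards):
    """Episode return = cumulative reward summed over all steps of the episode."""
    return sum(step_rewards)


# Usage: a constant wing deviation gives a constant folding factor f < 1,
# so the episode return is exactly f times the imitation-only return.
folded = [0.0, 0.0, 0.0]
deviated = [0.1, 0.0, 0.0]
imitation = [0.9] * 10
f = wing_fold_reward(deviated, folded)
ratio = episode_return([step_reward(r, deviated, folded) for r in imitation]) / episode_return(imitation)
```

When the folding factor varies from step to step, the ratio becomes a reward-weighted average of that factor, which is one way an overall reduction such as ~0.69 can arise.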