Fig. 6: Transfer to locations outside the training domain and interpretation of RL policies.

From: Sensing flow gradients is necessary for learning autonomous underwater navigation

(A) Geocentric and (B) egocentric agents starting from initial conditions unseen during training: geocentric agents fail upstream of the target location but outperform egocentric agents downstream of the target (Supplementary Movie 3). Success rates and time consumed for (C, E) geocentric and (D, F) egocentric agents reaching a fixed target (*) from anywhere in the wake (green colormap). Both policies succeed 100% of the time within the training domain (black circle); outside it, the geocentric and egocentric policies succeed 58% and 66% of the time, respectively, and 60% and 68% across the entire domain. Both policies fail downstream: solid lines marking failures of the geocentric policy align with the direction of the “time-optimal” strategy (Supplementary Fig. 2A), and those of the egocentric policy align with the direction of the “drift-optimal” strategy (Supplementary Fig. 2B). The field of “preferred orientations”, defined by the stable fixed points of the average policy for (G) geocentric and (H) egocentric agents, explains the behavior of the trained agents inside and outside the training domain. Preferred orientations that align with the time-optimal and drift-optimal strategies are highlighted in green and orange, respectively.
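To illustrate the “preferred orientations” construction in panels (G, H), here is a minimal sketch, not the authors' code, of how stable fixed points of an average policy can be located numerically. It assumes only that, at a given wake location, the trained policy reduces to a mean angular update as a function of heading; the function `avg_turn` and the steering law inside it are hypothetical stand-ins for that reduced policy.

```python
import numpy as np

def avg_turn(theta):
    # Hypothetical stand-in for the average policy at one wake location:
    # mean angular update commanded at heading theta (radians).
    # This toy law steers the agent toward theta = pi/4.
    return -0.5 * np.sin(theta - np.pi / 4)

def stable_fixed_points(f, n=3600):
    """Return headings theta* with f(theta*) = 0 that are stable,
    i.e. where f crosses zero from positive to negative."""
    theta = np.linspace(-np.pi, np.pi, n, endpoint=False)
    step = 2 * np.pi / n
    dtheta = f(theta)
    fixed = []
    for i in range(n):
        j = (i + 1) % n  # wrap around the circle
        # A sign change from + to - marks a stable zero crossing:
        # small perturbations are turned back toward theta*.
        if dtheta[i] > 0 and dtheta[j] <= 0:
            # Linear interpolation for the crossing angle.
            t = dtheta[i] / (dtheta[i] - dtheta[j])
            root = theta[i] + t * step
            # Normalize back into [-pi, pi).
            fixed.append(((root + np.pi) % (2 * np.pi)) - np.pi)
    return fixed

print(stable_fixed_points(avg_turn))  # ~[0.785], i.e. pi/4
```

A heading counts as a preferred orientation when the commanded turn vanishes there and small perturbations are steered back toward it; repeating this at every point in the wake yields the orientation fields shown in panels (G, H).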
